
# 03_Plotting with ggplot

Basic plotting with ggplot and some extras. Part of the Zero to R hero series (http://zerotorhero.wordpress.com). January 31, 2013

## Transcript

1. Plotting in R
using ggplot2
Etienne Low-Decarie
material in part prepared
by Eric Pederson

2. www.meetup.com/Montreal-R-User-Group/

3. http://www.codeschool.com/courses/try-r

4. You
  Have you created a plot?
  With what data?
  What kind of plot?
  Have you plotted with R?
  Have you used ggplot?

5. Code and HTML available at:
  https://github.com/zerotohero/MBSU
  Recommendation
  create your own new script
  refer to provided code only if needed
  avoid copy-pasting or running the code directly from the script
  ggplot is also hosted on GitHub

6. Required packages
  install.packages("ggplot2")
  library(ggplot2)

7. Outline
  basic scatter plot
  Exercise 1
  grammar of graphics
  Available plot elements and when to use them
  Exercise 2
  saving a plot
  themes

8. ggplot
  plotting function : “qplot” (quick plot)
  ?qplot
  arguments
  data
  x
  y
  …
Basic scatter plot

9. ggplot
  look at built in “iris” data
  ?iris
  str(iris)
  names(iris)
Basic scatter plot

10. ggplot
Basic scatter plot
qplot(data=iris,
      x=Sepal.Length,
      y=Sepal.Width)

11. ggplot
Basic scatter plot (categorical)
qplot(data=iris,
      x=Species,
      y=Sepal.Width)

12. ggplot
  ?qplot
  other arguments
  xlab
  ylab
  main
  log
  …
Less basic scatter plot

13. ggplot
Scatter plot
qplot(data=iris,
      x=Sepal.Length,
      xlab="Sepal Length (cm)",
      y=Sepal.Width,
      ylab="Sepal Width (cm)",
      main="Sepal dimensions")

14. ggplot
Exercise 1
  produce a basic plot with built-in data
  CO2
  ?CO2
  BOD
  data()

15. Figure 1 (H. Wickham). Graphics objects produced by (from left to right): geometric objects, scales and coordinate system, plot annotations.
ggplot
1.  a graphic is made of elements (layers)
  data
  aesthetics (aes)
  transformation
  geoms (geometric objects)
  axis (coordinate system)
  scales
Grammar of graphics (gg)
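
The elements listed above can be made explicit by building a plot with `ggplot()` instead of `qplot()`. This is only a minimal sketch, assuming ggplot2 is installed; the object name `layered.plot` and the axis labels are illustrative choices, not part of the original slides.

```r
library(ggplot2)

# Each grammar element is added as its own piece.
layered.plot <- ggplot(data = iris,                      # data
                       aes(x = Sepal.Length,             # aesthetics
                           y = Sepal.Width)) +
  geom_point() +                                         # geom
  scale_x_continuous(name = "Sepal length (cm)") +       # scales
  scale_y_continuous(name = "Sepal width (cm)") +
  coord_cartesian()                                      # coordinate system

print(layered.plot)
```

Swapping any single line (for example the geom or the coordinate system) yields a different graphic from the same data.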

16. ggplot
  Aesthetics (aes) make data visible:
  x,y : position along the x and y axis
  colour: the colour of the point
  group: what group a point belongs to
  shape: the figure used to plot a point
  linetype: the type of line used (solid, dashed, etc)
  size: the size of the point or line
  alpha: the transparency of the point
Grammar of graphics (gg)
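
As a rough illustration of these aesthetics, the sketch below maps some of them to a variable (inside `aes()`) and sets others to a constant (outside `aes()`). It assumes ggplot2 is installed; the name `aes.plot` and the chosen size and alpha values are arbitrary.

```r
library(ggplot2)

aes.plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(colour = Species, shape = Species),  # mapped: vary with the data
             size = 3, alpha = 0.5)                   # set: the same for every point

print(aes.plot)
```

Mapped aesthetics get a legend automatically; constants set outside `aes()` do not.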

17. ggplot
  geometric objects(geoms)
  point: scatterplot
  line: line plot, where lines connect points by increasing x value
  path: line plot, where lines connect points in sequence of appearance
  boxplot: box-and-whisker plots, for categorical x data
  bar: barplots
  histogram: histograms (for 1-dimensional data)
Grammar of graphics (gg)
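
The difference between the line and path geoms is easiest to see on data whose rows are not sorted by x. A small sketch, assuming ggplot2 is installed; the `zigzag` data frame is made up for illustration.

```r
library(ggplot2)

# A small made-up data set whose rows are NOT sorted by x.
zigzag <- data.frame(x = c(1, 3, 2, 4), y = c(1, 2, 3, 4))

# geom_line() connects the points in order of increasing x.
line.plot <- ggplot(zigzag, aes(x = x, y = y)) + geom_point() + geom_line()

# geom_path() connects the points in the order the rows appear.
path.plot <- ggplot(zigzag, aes(x = x, y = y)) + geom_point() + geom_path()

print(line.plot)
print(path.plot)
```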

18. ggplot
  Aesthetics (aes) make data visible:
  x,y : position along the x and y axis
  colour: the colour of the point
  group: what group a point belongs to
  shape: the figure used to plot a point
  linetype: the type of line used (solid, dashed, etc)
  size: the size of the point or line
  alpha: the transparency of the point
Grammar of graphics (gg)

19. ggplot
Grammar of graphics (gg)

20. ggplot
2. editing an element produces a new graph
  just change the coordinate system
Grammar of graphics (gg)
Figure 16 (from "A Layered Grammar of Graphics"). Bar chart (left) and equivalent Coxcomb plot (right) of clarity distribution. The Coxcomb plot is a bar chart in polar coordinates. Note that the categories abut in the Coxcomb, but are separated in the bar chart: this is an example of a graphical convention that differs in different coordinate systems.
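
A figure of this kind can be approximated with the `diamonds` data that ships with ggplot2. This is a sketch under that assumption, not the code behind the original figure; the object names are illustrative.

```r
library(ggplot2)

# Bar chart of the clarity distribution; width = 1 makes the bars touch.
bar.chart <- ggplot(diamonds, aes(x = clarity)) + geom_bar(width = 1)

# Same data, same geom: only the coordinate system changes.
coxcomb <- bar.chart + coord_polar()

print(bar.chart)
print(coxcomb)
```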

21. ggplot
Grammar of graphics (gg)

22. ggplot
1.  create a simple plot object
  plot.object<-qplot()
2.  add layers to the plot object
  plot.object<-plot.object+layer()
  repeat step 2 until satisfied
3.  print your object to screen (or to a graphical device)
  print(plot.object)
How it works

23. ggplot
Scatter plot as an R object
basic.plot<-qplot(data=iris,
                  x=Sepal.Length,
                  xlab="Sepal Length (cm)",
                  y=Sepal.Width,
                  ylab="Sepal Width (cm)",
                  main="Sepal dimensions")
print(basic.plot)

24. ggplot
Basic scatter plot (categorical)
categorical.plot<-qplot(data=iris,
                        x=Species,
                        y=Sepal.Width)
print(categorical.plot)

25. ggplot
Scatter plot with colour, shape
and transparency
basic.plot<-qplot(data=iris,
                  x=Sepal.Length,
                  xlab="Sepal Length (cm)",
                  y=Sepal.Width,
                  ylab="Sepal Width (cm)",
                  main="Sepal dimensions",
                  colour=Species,
                  shape=Species,
                  alpha=I(0.5))
print(basic.plot)

26. ggplot
Scatter plot with linear
regression
  Add a geom (e.g. linear smooth)
plot.with.linear.smooth<-basic.plot+
      geom_smooth(method="lm", se=FALSE)
print(plot.with.linear.smooth)
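
When colour is mapped to Species, `geom_smooth(method="lm")` fits one regression per species. The sketch below rebuilds such a plot with `ggplot()` and computes the per-species slopes directly with `lm()` for comparison. It assumes ggplot2 is installed; `smooth.plot` and `species.slopes` are illustrative names.

```r
library(ggplot2)

# Scatter plot coloured by species, with one linear fit per species.
smooth.plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                                colour = Species)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
print(smooth.plot)

# The slopes of those lines, computed directly for each species.
species.slopes <- sapply(split(iris, iris$Species), function(d) {
  coef(lm(Sepal.Width ~ Sepal.Length, data = d))[["Sepal.Length"]]
})
print(species.slopes)
```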

27. ggplot
Exercise 2
  produce a colorful plot containing linear regressions with built-in data
  CO2
  ?CO2
  msleep
  ?msleep
  OrchardSprays
  data()

28. ggplot
print(categorical.plot)
print(categorical.plot+
      geom_boxplot())
categorical.plot<-qplot(data=iris,
                        x=Species,
                        y=Sepal.Width,
                        geom="boxplot")
print(categorical.plot)

29. ggplot
Basic plot 2
CO2.plot<-qplot(data=CO2,
                x=conc,
                y=uptake,
                colour=Treatment)
print(CO2.plot)

30. ggplot
Facets
plot.object<-plot.object + facet_grid(rows~columns)
CO2.plot<-CO2.plot+facet_grid(.~Type)
print(CO2.plot)
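
The `rows~columns` formula can also take a variable on each side. A small sketch using the same CO2 data, assuming ggplot2 is installed; `grid.plot` is an illustrative name.

```r
library(ggplot2)

# Treatment defines the rows of the grid, Type defines the columns.
grid.plot <- ggplot(CO2, aes(x = conc, y = uptake, colour = Treatment)) +
  geom_point() +
  facet_grid(Treatment ~ Type)

print(grid.plot)
```

A dot on one side of the formula, as in `.~Type`, means "do not split in that direction".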

31. ggplot
Groups
print(CO2.plot+geom_line())

32. ggplot
Groups
  Specify groups
CO2.plot<-CO2.plot+geom_line(aes(group=Plant))
print(CO2.plot)

33. Available elements
ggplot
Geoms
Geoms, short for geometric objects, describe the type of plot you will produce.
geom_abline: Line specified by slope and intercept.
geom_area: Area plot.
geom_bar: Bars, rectangles with bases on x-axis.
geom_bin2d: Add heatmap of 2d bin counts.
geom_blank: Blank, draws nothing.
geom_boxplot: Box and whiskers plot.
geom_contour: Display contours of a 3d surface in 2d.
geom_crossbar: Hollow bar with middle indicated by horizontal line.
geom_density: Display a smooth density estimate.
geom_density2d: Contours from a 2d density estimate.
geom_dotplot: Dot plot.
geom_errorbar: Error bars.
geom_errorbarh: Horizontal error bars.
geom_freqpoly: Frequency polygon.
geom_hex: Hexagon binning.
geom_histogram: Histogram.
geom_hline: Horizontal line.
geom_jitter: Points, jittered to reduce overplotting.
geom_line: Connect observations, ordered by x value.
geom_linerange: An interval represented by a vertical line.
geom_map: Polygons from a reference map.
geom_path: Connect observations in original order.
geom_point: Points, as for a scatterplot.
geom_pointrange
Depends: stats, methods
Imports: plyr, digest, grid, gtable, reshape2, scales, proto, MASS
Suggests: quantreg, Hmisc, mapproj, maps, hexbin, maptools, multcomp, nlme, testthat
Extends:
http://docs.ggplot2.org
for even more
help(package=ggplot2)

34. Exercise 3
ggplot
  Explore geoms and other plot elements with the data you have used
  msleep
  ?msleep
  OrchardSprays
  data()

35. ggplot
Save plots
  in RStudio

36. ggplot
Save plots
  in a script
pdf("./plots/todays_plots.pdf")
print(basic.plot)
print(plot.with.linear.smooth)
print(categorical.plot)
print(CO2.plot)
graphics.off()
  other methods
  ?ggsave
  ?jpeg
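
As one of the other methods, `ggsave()` writes a single plot to a file whose type is taken from the extension. The sketch below assumes ggplot2 is installed and writes to a temporary directory; the plot `p` and the file name are hypothetical. Note that the target directory must exist before saving, which also applies to `pdf()`.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()

# Create the output directory first; saving fails if it does not exist.
out.dir <- file.path(tempdir(), "plots")
dir.create(out.dir, showWarnings = FALSE)

# Hypothetical file name; width and height are in inches by default.
out.file <- file.path(out.dir, "sepal_plot.pdf")
ggsave(filename = out.file, plot = p, width = 6, height = 4)

file.exists(out.file)
```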

37. ggplot
Fine tuning: scales
CO2.plot+
scale_colour_manual(values=c("nonchilled"="red",
                             "chilled"="blue"))
CO2.plot+
scale_y_continuous(name = "CO2 uptake rate",
                   breaks = seq(5,50, by= 10),
                   labels = seq(5,50, by= 10), trans="log10")

38. ggplot
Fine tuning: themes
  theme_set(theme())
  or plot+theme()
  themes
  theme_bw()
  theme_grey()
  edit themes
  mytheme <- theme_grey() +
theme(plot.title = element_text(colour = "red"))
  p + mytheme

39. ggplot
base R plotting
  qplot is not the only way
  ?plot
  has many defaults for different
object types
  similar to qplot
plot(iris)
lm.SR <- lm(sr ~ pop15 + pop75 + dpi
+ ddpi, data = LifeCycleSavings)
plot(lm.SR)

40. reshape
ggplot likes it long… is your data wide?

Long

| ID variable | Factor | Measured value |
|---|---|---|
| ID 1 | Level 1 | Measured value |
| ID 1 | Level 2 | Measured value |
| ID 2 | Level 1 | Measured value |
| ID 2 | Level 2 | Measured value |

Wide

| ID variable | Level 1 | Level 2 |
|---|---|---|
| ID 1 | Measured value | Measured value |
| ID 2 | Measured value | Measured value |

41. Melt: go long
library(reshape)
molten.data<-melt(data,
                  id.vars=c("id.var.1", "id.var.2"),
                  measure.vars=c("measure.var.1", "measure.var.2"),
                  variable_name = "variable")

reshape
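
As a concrete case, the wide `iris` data can be melted into a long table with one measurement per row. This is a sketch assuming the reshape and ggplot2 packages are installed; the names `molten.iris` and `measure` are chosen here for illustration. In the newer reshape2 package the last argument is spelled `variable.name`.

```r
library(reshape)
library(ggplot2)

# Wide: four measurement columns per flower. Long: one measurement per row.
molten.iris <- melt(iris,
                    id.vars = "Species",
                    measure.vars = c("Sepal.Length", "Sepal.Width",
                                     "Petal.Length", "Petal.Width"),
                    variable_name = "measure")
str(molten.iris)

# Long data lets a single column drive the facets.
long.plot <- ggplot(molten.iris, aes(x = Species, y = value)) +
  geom_boxplot() +
  facet_grid(. ~ measure)
print(long.plot)
```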

42. ggplot
Working with your data
  Let us get you at least
one plot out of your data
  Don’t yet have data?
  Find some
  Create some
  Save your plots to a PDF

43. plyr
  library(plyr)
plyr

44. Split-Apply-Combine
  Equivalent
  SQL GROUP BY
  Pivot Tables (Excel, SPSS, …)
  Split
  Define a subset of your data
  Apply
  Do anything to this subset
  calculation, modeling, simulations, plotting
  Combine
  Repeat this for all subsets
  collect the results
Excerpt from the Journal of Statistical Software:
Figure 1: The three ways to split up a 2d matrix, labelled above by the dimensions that they slice. Original matrix shown at top left, with dimensions labelled. A single piece under each splitting scheme is colored blue.
Figure 2: The seven ways to split up a 3d array, labelled above by the dimensions that they slice up. Original array shown at top left, with dimensions labelled. Blue indicates a single piece of the output.
m*ply() takes a matrix, list-array, or data frame, splits it up by rows and calls the processing function supplying each piece as its parameters. Figure 3 shows how you might use this to draw random numbers from normal distributions with varying parameters.
Input: Data frame (d*ply)
When operating on a data frame, you usually want to split it up into groups based on combinations of variables in the data set. For d*ply you specify which variables (or functions of variables) to use. These variables are specified in a special way to highlight that they are…
Split
plyr
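
The three steps can be spelled out in base R before handing them to plyr. A minimal sketch using only built-in functions and the `iris` data; the names `pieces`, `results` and `species.means` are illustrative.

```r
# Split: one data frame per species.
pieces <- split(iris, iris$Species)

# Apply: compute a summary for each piece.
results <- lapply(pieces, function(subset.data) {
  data.frame(Species = subset.data$Species[1],
             mean.sepal.length = mean(subset.data$Sepal.Length))
})

# Combine: stack the per-piece results into one data frame.
species.means <- do.call(rbind, results)
print(species.means)
```

plyr wraps these three steps into a single call.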

45. my.function<-function(subset.data){
      results<-do.something(subset.data)
      return(data.frame(results))}
my.function can produce as many rows as subset.data (transform)
or fewer rows than subset.data (summarize)
returned.results<-ddply(.data=data,
                        .variables=c("variable1", "variable2"),
                        .fun=my.function)
How it works
Warning: idiosyncrasies present
plyr
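
The two cases mentioned above (as many rows as the subset, or fewer) might look like this on the `iris` data. A sketch assuming plyr is installed; the object and column names are made up for illustration.

```r
library(plyr)

# Fewer rows than the input: one summary row per species.
species.summary <- ddply(.data = iris, .variables = "Species",
                         .fun = function(subset.data) {
                           data.frame(mean.width = mean(subset.data$Sepal.Width))
                         })

# As many rows as the input: add a column centred within each species.
iris.centred <- ddply(.data = iris, .variables = "Species",
                      .fun = function(subset.data) {
                        subset.data$centred.width <-
                          subset.data$Sepal.Width - mean(subset.data$Sepal.Width)
                        subset.data
                      })

print(species.summary)
head(iris.centred)
```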

46. Example 1
  Calculate the mean of each measure for each species using the molten data set
molten.means<-ddply(.data=molten.iris,
                    .variables=c("Species", "measure"),
                    function(subset.data) data.frame(mean=mean(subset.data$value)))
plyr

47. Example 3
  Slope of width on length
plyr
length.on.width.slope<-function(subset.data){
  with(subset.data,{
    # keep only the slope (2nd coefficient); the 1st is the intercept
    slope.sepal<-lm(Sepal.Width~Sepal.Length)$coefficients[2]
    slope.petal<-lm(Petal.Width~Petal.Length)$coefficients[2]
    return(data.frame(slope.sepal=slope.sepal,
                      slope.petal=slope.petal))
  })
}
iris.slopes<-ddply(.data=iris,
                   .variables="Species",
                   function(x)length.on.width.slope(x))

48. change functions
  sd, length
  range=max()-min()
  apply to other data
  simesants, rats, iris, sipoo
plyr
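
One possible way to try the suggested functions on `iris`, assuming plyr is installed; `species.stats` and its column names are illustrative. Note that base R's `range()` returns the minimum and maximum as two values, so the width of the range is computed explicitly here.

```r
library(plyr)

species.stats <- ddply(.data = iris, .variables = "Species",
                       .fun = function(subset.data) {
                         with(subset.data,
                              data.frame(sd = sd(Sepal.Length),
                                         n = length(Sepal.Length),
                                         range = max(Sepal.Length) - min(Sepal.Length)))
                       })
print(species.stats)
```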

49. You
  What was most interesting/useful?
  What do you still need to
  find it easy to plot in R?
  to have fun using R?
  Please comment on our website on the MBSU page
  http://zerotorhero.wordpress.com/2012/12/14/mbsu/

50. Acknowledgements
  Reshape, plyr and ggplot2 are all brought to you on
GitHub by: