Etienne
January 31, 2013

# 03_Plotting with ggplot

Basic plotting with ggplot and some extras. Part of the Zero to R hero series (http://zerotorhero.wordpress.com).

## Transcript

1. ### Plotting in R using ggplot2

Etienne Low-Decarie; material in part prepared by Eric Pederson

4. ### You

- Have you created a plot?
  - With what data?
  - What kind of plot?
- Have you plotted with R?
- Have you used ggplot?
5. ### Follow along

- Code and HTML available at: https://github.com/zerotohero/MBSU
- Recommendation
  - create your own new script
  - refer to the provided code only if needed
  - avoid copy-pasting or running the code directly from the script
- ggplot is also hosted on GitHub: https://github.com/hadley/ggplot2

7. ### Outline

- your first R plot
- basic scatter plot
- Exercise 1
- grammar of graphics
- more advanced plots
- available plot elements and when to use them
- Exercise 2
- saving a plot
- fine tuning your plot
- themes
- we help you plot your data
8. ### ggplot: Basic scatter plot

- plotting function: "qplot" (quick plot)
- `?qplot`
- arguments
  - data
  - x
  - y
  - …
9. ### ggplot: Basic scatter plot

- look at the built-in "iris" data

    ?iris
    head(iris)
    str(iris)
    names(iris)

12. ### ggplot: Less basic scatter plot

- `?qplot`
- other arguments
  - xlab
  - ylab
  - main
  - log
  - …
13. ### ggplot: Scatter plot

    qplot(data=iris,
          x=Sepal.Length,
          xlab="Sepal Length (cm)",
          y=Sepal.Width,
          ylab="Sepal Width (cm)",
          main="Sepal dimensions")
14. ### ggplot: Exercise 1

- produce a basic plot with built-in data
  - `CO2`
  - `?CO2`
  - `BOD`
  - `data()`
15. ### ggplot: Grammar of graphics (gg)

1. A graphic is made of elements (layers):

- data
- aesthetics (aes)
- transformation
- geoms (geometric objects)
- axis (coordinate system)
- scales

Figure 1 in Wickham (2010): graphics objects produced by (from left to right) geometric objects, scales and coordinate system, plot annotations.
16. ### ggplot: Grammar of graphics (gg)

Aesthetics (aes) make data visible:

- x, y: position along the x and y axis
- colour: the colour of the point
- group: what group a point belongs to
- shape: the figure used to plot a point
- linetype: the type of line used (solid, dashed, etc.)
- size: the size of the point or line
- alpha: the transparency of the point
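The aesthetics listed above can be tried out with a short sketch. It assumes the ggplot2 package is installed and uses the built-in iris data; the object name `aes.plot` is made up for this illustration. Aesthetics placed inside `aes()` are mapped to columns of the data, while values given outside `aes()` are fixed for every point.

```r
library(ggplot2)

# Inside aes(): position, colour and shape are driven by columns of iris.
# Outside aes(): size and alpha are constants applied to all points.
aes.plot <- ggplot(iris, aes(x = Sepal.Length,
                             y = Sepal.Width,
                             colour = Species,
                             shape = Species)) +
  geom_point(size = 3, alpha = 0.5)

print(aes.plot)
```

Moving `colour = Species` out of `aes()` and replacing it with `colour = "blue"` would turn the mapping into a fixed setting.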
17. ### ggplot: Grammar of graphics (gg)

Geometric objects (geoms):

- point: scatterplot
- line: line plot, where lines connect points by increasing x value
- path: line plot, where lines connect points in sequence of appearance
- boxplot: box-and-whisker plots, for y data grouped by a categorical x variable
- bar: barplots
- histogram: histograms (for 1-dimensional data)
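The difference between the line and path geoms is easiest to see on data whose x values are out of order. The sketch below assumes ggplot2 is installed; the data frame `toy` and the plot names are hypothetical and only serve the illustration.

```r
library(ggplot2)

# Hypothetical toy data: the x values are deliberately not sorted.
toy <- data.frame(x = c(1, 3, 2, 4),
                  y = c(1, 2, 3, 4))

base.plot <- ggplot(toy, aes(x = x, y = y)) + geom_point()

# geom_line joins the points by increasing x (x = 1, 2, 3, 4).
line.plot <- base.plot + geom_line()

# geom_path joins the points in row order (x = 1, 3, 2, 4).
path.plot <- base.plot + geom_path()

print(line.plot)
print(path.plot)
```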

20. ### ggplot: Grammar of graphics (gg)

2. Editing an element produces a new graph

- e.g. just change the coordinate system

Figure 16 in Wickham (2010): bar chart (left) and equivalent Coxcomb plot (right) of clarity distribution. The Coxcomb plot is a bar chart in polar coordinates. Note that the categories abut in the Coxcomb, but are separated in the bar chart: this is an example of a graphical convention that differs in different coordinate systems.
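A rough sketch of the same idea in code: one plot object, two coordinate systems. It assumes ggplot2 is installed and that the "clarity distribution" in the figure comes from the `diamonds` data set shipped with ggplot2; the object names are invented here.

```r
library(ggplot2)

# Bar chart of diamond clarity counts in the default Cartesian coordinates.
# width = 1 makes neighbouring bars touch.
bar.chart <- ggplot(diamonds, aes(x = clarity, fill = clarity)) +
  geom_bar(width = 1)

# Same data, same geom: only the coordinate system is swapped.
coxcomb.plot <- bar.chart + coord_polar()

print(bar.chart)
print(coxcomb.plot)
```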

22. ### ggplot: How it works

1. create a simple plot object: `plot.object<-qplot()`
2. add graphical layers/complexity: `plot.object<-plot.object+layer()` (repeat step 2 until satisfied)
3. print your object to screen (or to a graphical device): `print(plot.object)`
23. ### ggplot: Scatter plot as an R object

    basic.plot<-qplot(data=iris,
                      x=Sepal.Length,
                      xlab="Sepal Length (cm)",
                      y=Sepal.Width,
                      ylab="Sepal Width (cm)",
                      main="Sepal dimensions")

    print(basic.plot)

25. ### ggplot: Scatter plot with colour, shape and transparency

- Add aesthetics

    basic.plot<-qplot(data=iris,
                      x=Sepal.Length,
                      xlab="Sepal Length (cm)",
                      y=Sepal.Width,
                      ylab="Sepal Width (cm)",
                      main="Sepal dimensions",
                      colour=Species,
                      shape=Species,
                      alpha=I(0.5))

    print(basic.plot)
26. ### ggplot: Scatter plot with linear regression

- Add a geom (e.g. linear smooth)

    plot.with.linear.smooth<-basic.plot+
      geom_smooth(method="lm", se=FALSE)

    print(plot.with.linear.smooth)
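The same figure can also be built without `qplot()`, which recent ggplot2 releases flag as deprecated. The following is a sketch of an equivalent using the full `ggplot()` interface, not the code from the slides; it assumes ggplot2 is installed, and `layered.plot` is a name chosen for this example.

```r
library(ggplot2)

# Data and aesthetic mappings first, then one layer per "+".
layered.plot <- ggplot(iris, aes(x = Sepal.Length,
                                 y = Sepal.Width,
                                 colour = Species,
                                 shape = Species)) +
  geom_point(alpha = 0.5) +                    # semi-transparent scatter layer
  geom_smooth(method = "lm", se = FALSE) +     # one regression line per species
  labs(x = "Sepal Length (cm)",
       y = "Sepal Width (cm)",
       title = "Sepal dimensions")

print(layered.plot)
```

Because colour is mapped to Species in the top-level `aes()`, the smooth layer inherits the grouping and fits a separate line for each species.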
27. ### ggplot: Exercise 2

- produce a colourful plot containing linear regressions with built-in data
  - `CO2`
  - `?CO2`
  - `msleep`
  - `?msleep`
  - `OrchardSprays`
  - `data()`
28. ### ggplot: Changing and adding geoms

Set the geom when creating the plot:

    categorical.plot<-qplot(data=iris,
                            x=Species,
                            y=Sepal.Width,
                            geom=c("boxplot"))
    print(categorical.plot)

Add a geom to an existing plot object:

    print(categorical.plot+
            geom_boxplot())

33. ### ggplot: Available elements

Geoms, short for geometric objects, describe the type of plot you will produce.

- geom_abline: Line specified by slope and intercept.
- geom_area: Area plot.
- geom_bar: Bars, rectangles with bases on x-axis.
- geom_bin2d: Add heatmap of 2d bin counts.
- geom_blank: Blank, draws nothing.
- geom_boxplot: Box and whiskers plot.
- geom_contour: Display contours of a 3d surface in 2d.
- geom_crossbar: Hollow bar with middle indicated by horizontal line.
- geom_density: Display a smooth density estimate.
- geom_density2d: Contours from a 2d density estimate.
- geom_dotplot: Dot plot.
- geom_errorbar: Error bars.
- geom_errorbarh: Horizontal error bars.
- geom_freqpoly: Frequency polygon.
- geom_hex: Hexagon binning.
- geom_histogram: Histogram.
- geom_hline: Horizontal line.
- geom_jitter: Points, jittered to reduce overplotting.
- geom_line: Connect observations, ordered by x value.
- geom_linerange: An interval represented by a vertical line.
- geom_map: Polygons from a reference map.
- geom_path: Connect observations in original order.
- geom_point: Points, as for a scatterplot.
- geom_pointrange

ggplot2 package description: Depends: stats, methods. Imports: plyr, digest, grid, gtable, reshape2, scales, proto, MASS. Suggests: quantreg, Hmisc, mapproj, maps, hexbin, maptools, multcomp, nlme, testthat.

For even more: http://docs.ggplot2.org and `help(package=ggplot2)`
34. ### ggplot: Exercise 3

- Explore geoms and other plot elements with the data you have used
  - `msleep`
  - `?msleep`
  - `OrchardSprays`
  - `data()`

36. ### ggplot: Save plots

- in a script

    pdf("./plots/todays_plots.pdf")
    print(basic.plot)
    print(plot.with.linear.smooth)
    print(categorical.plot)
    print(CO2.plot)
    graphics.off()

- other methods
  - `?ggsave`
  - `?jpeg`
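As a sketch of the `ggsave` route mentioned above: the example writes a single plot to a PDF in the session's temporary directory, so it does not depend on a `./plots` folder existing. It assumes ggplot2 is installed; the plot, file name and dimensions are arbitrary choices for the illustration.

```r
library(ggplot2)

saved.plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

# Write to a temporary location; replace with your own path as needed.
out.file <- file.path(tempdir(), "sepal_dimensions.pdf")

# The output format is picked from the file extension; width and height are in inches.
ggsave(filename = out.file, plot = saved.plot, width = 6, height = 4)

file.exists(out.file)
```

Unlike the `pdf()` approach, `ggsave` writes one plot per file, so a multi-page PDF of several plots still calls for `pdf()` followed by several `print()` calls.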
37. ### ggplot: Fine tuning: scales

    CO2.plot+
      scale_colour_manual(values=c("nonchilled"="red", "chilled"="blue"))

    CO2.plot+
      scale_y_continuous(name="CO2 uptake rate",
                         breaks=seq(5, 50, by=10),
                         labels=seq(5, 50, by=10),
                         trans="log10")
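The transcript does not include the slide that creates `CO2.plot`, so the sketch below guesses at a plausible definition (uptake against concentration, coloured by treatment, from the built-in CO2 data) purely so that the scale adjustments can be run. It assumes ggplot2 is installed and uses `scale_y_log10()` as a shorthand for a log10-transformed continuous y scale.

```r
library(ggplot2)

# Assumed definition: the original CO2.plot is not shown in the transcript.
CO2.plot <- ggplot(CO2, aes(x = conc, y = uptake, colour = Treatment)) +
  geom_point()

# Pick the colour for each level of Treatment by name.
recoloured.plot <- CO2.plot +
  scale_colour_manual(values = c("nonchilled" = "red", "chilled" = "blue"))

# Log-transformed y axis with a custom title and tick positions.
log.scaled.plot <- CO2.plot +
  scale_y_log10(name = "CO2 uptake rate",
                breaks = seq(5, 50, by = 10))

print(recoloured.plot)
print(log.scaled.plot)
```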
38. ### ggplot: Fine tuning: themes

- `theme_set(theme())` or `plot+theme()`
- themes
  - `theme_bw()`
  - `theme_grey()`
- edit themes

    mytheme <- theme_grey() + theme(plot.title = element_text(colour = "red"))
    p + mytheme
39. ### ggplot: base R plotting

- qplot is not the only way
- `?plot`
  - has many defaults for different object types
  - similar to qplot

    plot(iris)
    lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
    plot(lm.SR)
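40. ### reshape: ggplot likes it long… is your data wide?

Long:

| ID variable | Factor  | Measured value |
|-------------|---------|----------------|
| ID 1        | Level 1 | Measured value |
| ID 1        | Level 2 | Measured value |
| ID 2        | Level 1 | Measured value |
| ID 2        | Level 2 | Measured value |

Wide:

| ID variable | Level 1        | Level 2        |
|-------------|----------------|----------------|
| ID 1        | Measured value | Measured value |
| ID 2        | Measured value | Measured value |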
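A minimal sketch of going from wide to long, assuming the reshape2 package is installed. The built-in iris data is wide (one column per measurement); `melt()` stacks the four measurement columns into a single `value` column. The column names `measure` and `value` are chosen here to match the molten data used in the plyr examples, but how the original `molten.iris` was built is an assumption.

```r
library(reshape2)

# Species identifies each row; every other column is treated as a measurement.
molten.iris <- melt(iris,
                    id.vars = "Species",
                    variable.name = "measure",
                    value.name = "value")

head(molten.iris)
dim(molten.iris)   # 150 flowers x 4 measurements = 600 rows, 3 columns
```

In this long form a single ggplot call can map `measure` to an aesthetic such as colour or facet, which is not possible while the measurements sit in separate columns.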

42. ### ggplot: Working with your data

- Let us get you at least one plot out of your data
- Don't yet have data?
  - Find some
    - Google
    - http://datadryad.org
  - Create some
    - Survey your neighbours
- Save your plots to a PDF

44. ### plyr: Split-Apply-Combine

- Equivalent
  - SQL GROUP BY
  - Pivot Tables (Excel, SPSS, …)
- Split
  - Define a subset of your data
- Apply
  - Do anything to this subset
  - calculation, modeling, simulations, plotting
- Combine
  - Repeat this for all subsets
  - collect the results

Figures (from the Journal of Statistical Software page shown on the slide): Figure 1, the three ways to split up a 2d matrix, labelled by the dimensions that they slice; Figure 2, the seven ways to split up a 3d array, labelled by the dimensions that they slice up.

Excerpt from the same page: `m*ply()` takes a matrix, list-array, or data frame, splits it up by rows and calls the processing function supplying each piece as its parameters. Input: Data frame (`d*ply`). When operating on a data frame, you usually want to split it up into groups based on combinations of variables in the data set. For `d*ply` you specify which variables (or functions of variables) to use. These variables are specified in a special way to highlight that they are …
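The three steps can be written out by hand in base R before handing the job to plyr. This sketch uses only base functions and the built-in iris data; the object names are made up for the example.

```r
# Split: one data frame per species.
pieces <- split(iris, iris$Species)

# Apply: compute a summary for each piece.
results <- lapply(pieces, function(subset.data) {
  data.frame(Species = subset.data$Species[1],
             mean.sepal.length = mean(subset.data$Sepal.Length))
})

# Combine: stack the per-species results into one data frame.
combined <- do.call(rbind, results)

print(combined)
```

`ddply` wraps these three steps into a single call and takes care of labelling the output with the splitting variables.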
45. ### plyr: How it works

    my.function<-function(subset.data){
      results<-do.something(subset.data)
      return(data.frame(results))
    }

my.function can produce as many rows as subset.data (transform) or fewer rows than subset.data (summarize).

    returned.results<-ddply(.data=data,
                            .variables=c("variable1", "variable2"),
                            .fun=my.function)

Warning: idiosyncrasies present
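To make the transform versus summarize distinction concrete, here is a small sketch on the built-in iris data, assuming the plyr package is installed. The output column names are invented for the illustration.

```r
library(plyr)

# summarize: one row per group (fewer rows than the input).
species.means <- ddply(iris, "Species", summarize,
                       mean.sepal.length = mean(Sepal.Length))

# transform: one row per input row, with a new column computed within each group.
centred.iris <- ddply(iris, "Species", transform,
                      centred.length = Sepal.Length - mean(Sepal.Length))

nrow(species.means)   # 3, one per species
nrow(centred.iris)    # 150, same as iris
```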
46. ### plyr: Example 1

Calculate the mean of each measure for each species using the molten data set:

    molten.means<-ddply(.data=molten.iris,
                        .variables=c("Species", "measure"),
                        function(subset.data) data.frame(mean=mean(subset.data$value)))
47. ### plyr: Example 3

Slope of width on length:

    length.on.width.slope<-function(subset.data){
      with(subset.data, {
        slope.sepal<-lm(Sepal.Width~Sepal.Length)$coefficients[2]
        slope.petal<-lm(Petal.Width~Petal.Length)$coefficients[2]
        return(data.frame(slope.sepal=slope.sepal, slope.petal=slope.petal))
      })
    }

    iris.slopes<-ddply(.data=iris, .variables="Species", function(x) length.on.width.slope(x))
48. ### plyr: Your turn

- change functions
  - sd, length
  - range=max()-min()
- apply to other data
  - simesants, rats, iris, sipoo
49. ### You

- What was most interesting/useful?
- What do you still need
  - to find it easy to plot in R?
  - to have fun using R?
- Please comment on our website on the MBSU page: http://zerotorhero.wordpress.com/2012/12/14/mbsu/
50. ### Acknowledgements

- Reshape, plyr and ggplot2 are all brought to you on GitHub by Hadley Wickham (had.co.nz)
- Wickham, H. (2011). "The split-apply-combine strategy for data analysis." Journal of Statis.
- Wickham, H. (2010). "A layered grammar of graphics." Journal of Computational and Graphical Statistics 19(1): 3-28.
