
03_Plotting with ggplot

Etienne
January 31, 2013


Basic plotting with ggplot and some extras. Part of the Zero to R hero series (http://zerotorhero.wordpress.com).


Transcript

  1. You
    › Have you created a plot?
    › With what data?
    › What kind of plot?
    › Have you plotted with R?
    › Have you used ggplot?
  2. Follow along
    › Code and HTML available at:
      › https://github.com/zerotohero/MBSU
    › Recommendation
      › create your own new script
      › refer to the provided code only if needed
      › avoid copy-pasting or running the code directly from the script
    › ggplot is also hosted on GitHub
      › https://github.com/hadley/ggplot2
  3. Outline
    › your first R plot
    › basic scatter plot
    › Exercise 1
    › grammar of graphics
    › more advanced plots
    › available plot elements and when to use them
    › Exercise 2
    › saving a plot
    › fine tuning your plot
    › themes
    › we help you plot your data
  4. ggplot: Basic scatter plot
    › plotting function: "qplot" (quick plot)
    › ?qplot
    › arguments
      › data
      › x
      › y
      › …
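
As a minimal sketch of the arguments listed above, the block below draws a scatter plot from the built-in iris data. It is written with ggplot() and geom_point() rather than qplot(), on the assumption that a recent ggplot2 release is installed, where qplot() is marked as deprecated; the object name first.plot is illustrative only.

```r
library(ggplot2)

# Scatter plot of two columns from the built-in iris data set.
# ggplot() + geom_point() is the long-form counterpart of
# qplot(data = iris, x = Sepal.Length, y = Sepal.Width).
first.plot <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

# Printing the object draws it on the active graphics device.
print(first.plot)
```

The data, x and y arguments of qplot() correspond to the data frame and the aes() mapping here.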
  5. ggplot: Basic scatter plot
    › look at the built-in "iris" data
      › ?iris
      › head(iris)
      › str(iris)
      › names(iris)
  6. ggplot: Less basic scatter plot
    › ?qplot
    › other arguments
      › xlab
      › ylab
      › main
      › log
      › …
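
The extra arguments above (xlab, ylab, main, log) have direct counterparts when a plot is built from layers. The sketch below assumes the ggplot() interface, with labs() standing in for xlab/ylab/main and scale_x_log10() standing in for log = "x"; the label text and the name labelled.plot are illustrative.

```r
library(ggplot2)

# Same scatter plot, now with axis labels, a title and a log10 x axis.
labelled.plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  labs(x = "Sepal length (cm)",      # counterpart of xlab
       y = "Sepal width (cm)",       # counterpart of ylab
       title = "Sepal dimensions") + # counterpart of main
  scale_x_log10()                    # counterpart of log = "x"

print(labelled.plot)
```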
  7. ggplot: Exercise 1
    › produce a basic plot with built-in data
      › CO2
      › ?CO2
      › BOD
      › data()
  8. ggplot: Grammar of graphics (gg)
    1. a graphic is made of elements (layers)
      › data
      › aesthetics (aes)
      › transformation
      › geoms (geometric objects)
      › axis (coordinate system)
      › scales
    [Figure 1 from Wickham (2010): Graphics objects produced by (from left to right): geometric objects, scales and coordinate system, plot annotations.]
  9. ggplot: Grammar of graphics (gg)
    › Aesthetics (aes) make data visible:
      › x, y: position along the x and y axis
      › colour: the colour of the point
      › group: what group a point belongs to
      › shape: the figure used to plot a point
      › linetype: the type of line used (solid, dashed, etc.)
      › size: the size of the point or line
      › alpha: the transparency of the point
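
One point the list above leaves implicit is the difference between mapping an aesthetic to a column and setting it to a constant. The sketch below assumes the ggplot() interface: colour and shape are mapped inside aes(), so they vary with Species, while size and alpha are set outside aes(), so they apply to every point. The name aes.plot is illustrative.

```r
library(ggplot2)

aes.plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                             colour = Species,   # mapped: one colour per species
                             shape = Species)) + # mapped: one shape per species
  geom_point(size = 3,     # set: every point has the same size
             alpha = 0.5)  # set: every point is half transparent

print(aes.plot)
```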
  10. ggplot: Grammar of graphics (gg)
    › geometric objects (geoms)
      › point: scatterplot
      › line: line plot, where lines connect points by increasing x value
      › path: line plot, where lines connect points in sequence of appearance
      › boxplot: box-and-whisker plots, for categorical x data
      › bar: barplots
      › histogram: histograms (for 1-dimensional data)
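
The difference between the line and path geoms is easiest to see on data that is not sorted by x. The sketch below uses a small made-up data frame (walk is a hypothetical name, not part of the workshop material): geom_line() reorders the rows by x before connecting them, while geom_path() connects them in the order they appear.

```r
library(ggplot2)

# Three points deliberately stored out of x order.
walk <- data.frame(x = c(3, 1, 2), y = c(1, 2, 3))

line.plot <- ggplot(walk, aes(x = x, y = y)) + geom_line()  # connects by increasing x
path.plot <- ggplot(walk, aes(x = x, y = y)) + geom_path()  # connects in row order

print(line.plot)
print(path.plot)
```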
  11. ggplot: Grammar of graphics (gg)
    › Aesthetics (aes) make data visible:
      › x, y: position along the x and y axis
      › colour: the colour of the point
      › group: what group a point belongs to
      › shape: the figure used to plot a point
      › linetype: the type of line used (solid, dashed, etc.)
      › size: the size of the point or line
      › alpha: the transparency of the point
  12. ggplot: Grammar of graphics (gg)
    2. editing an element produces a new graph
      › just change the coordinate system
    [Figure 16 from Wickham (2010), "A layered grammar of graphics": Bar chart (left) and equivalent Coxcomb plot (right) of clarity distribution. The Coxcomb plot is a bar chart in polar coordinates. Note that the categories abut in the Coxcomb, but are separated in the bar chart: this is an example of a graphical convention that differs in different coordinate systems.]
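
The bar chart and Coxcomb pair described in the figure caption can be sketched as follows. This assumes the clarity column of the diamonds data set shipped with ggplot2 is the "clarity distribution" in the figure, and the object names are illustrative. The only change between the two plots is the coordinate system.

```r
library(ggplot2)

# Bar chart of how many diamonds fall in each clarity class.
bar.chart <- ggplot(diamonds, aes(x = clarity)) +
  geom_bar()

# Same data, same geom, same counts: only the coordinate system changes.
coxcomb <- bar.chart + coord_polar()

print(bar.chart)
print(coxcomb)
```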
  13. ggplot: How it works
    1. create a simple plot object
      › plot.object <- qplot()
    2. add graphical layers/complexity
      › plot.object <- plot.object + layer()
      › repeat step 2 until satisfied
    3. print your object to screen (or to graphical device)
      › print(plot.object)
  14. ggplot: Scatter plot as an R object
        # axis labels match the plotted columns; iris is measured in cm
        basic.plot <- qplot(data = iris,
                            x = Sepal.Length,
                            xlab = "Sepal Length (cm)",
                            y = Sepal.Width,
                            ylab = "Sepal Width (cm)",
                            main = "Sepal dimensions")

        print(basic.plot)
  15. ggplot: Scatter plot with colour, shape and transparency
    › Add aesthetics
        basic.plot <- qplot(data = iris,
                            x = Sepal.Length,
                            xlab = "Sepal Length (cm)",
                            y = Sepal.Width,
                            ylab = "Sepal Width (cm)",
                            main = "Sepal dimensions",
                            colour = Species,  # one colour per species
                            shape = Species,   # one point shape per species
                            alpha = I(0.5))    # I() sets a constant transparency

        print(basic.plot)
  16. ggplot: Scatter plot with linear regression
    › Add a geom (e.g. linear smooth)
        # method = "lm" fits a straight line; se = FALSE hides the confidence band
        plot.with.linear.smooth <- basic.plot +
          geom_smooth(method = "lm", se = FALSE)
        print(plot.with.linear.smooth)
  17. ggplot: Exercise 2
    › produce a colourful plot containing linear regressions with built-in data
      › CO2
      › ?CO2
      › msleep
      › ?msleep
      › OrchardSprays
      › data()
  18. ggplot: Changing and adding geoms
        # change the geom: box-and-whisker plots instead of points
        categorical.plot <- qplot(data = iris,
                                  x = Species,
                                  y = Sepal.Width,
                                  geom = c("boxplot"))
        print(categorical.plot)

        # add a geom: layer geom_boxplot() onto an existing plot object
        print(categorical.plot +
          geom_boxplot())
  19. ggplot: Available elements
    Geoms
    Geoms, short for geometric objects, describe the type of plot you will produce.
        geom_abline      Line specified by slope and intercept.
        geom_area        Area plot.
        geom_bar         Bars, rectangles with bases on x-axis.
        geom_bin2d       Add heatmap of 2d bin counts.
        geom_blank       Blank, draws nothing.
        geom_boxplot     Box and whiskers plot.
        geom_contour     Display contours of a 3d surface in 2d.
        geom_crossbar    Hollow bar with middle indicated by horizontal line.
        geom_density     Display a smooth density estimate.
        geom_density2d   Contours from a 2d density estimate.
        geom_dotplot     Dot plot.
        geom_errorbar    Error bars.
        geom_errorbarh   Horizontal error bars.
        geom_freqpoly    Frequency polygon.
        geom_hex         Hexagon binning.
        geom_histogram   Histogram.
        geom_hline       Horizontal line.
        geom_jitter      Points, jittered to reduce overplotting.
        geom_line        Connect observations, ordered by x value.
        geom_linerange   An interval represented by a vertical line.
        geom_map         Polygons from a reference map.
        geom_path        Connect observations in original order.
        geom_point       Points, as for a scatterplot.
        geom_pointrange
    Depends: stats, methods
    Imports: plyr, digest, grid, gtable, reshape2, scales, proto, MASS
    Suggests: quantreg, Hmisc, mapproj, maps, hexbin, maptools, multcomp, nlme, testthat
    Extends:
    › http://docs.ggplot2.org for even more
    › help(package=ggplot2)
  20. ggplot: Exercise 3
    › Explore geoms and other plot elements with the data you have used
      › msleep
      › ?msleep
      › OrchardSprays
      › data()
  21. ggplot: Save plots
    › in a script
        # open a PDF device; each print() adds one page
        pdf("./plots/todays_plots.pdf")
        print(basic.plot)
        print(plot.with.linear.smooth)
        print(categorical.plot)
        print(CO2.plot)
        # close all open graphics devices, which writes the file
        graphics.off()
    › other methods
      › ?ggsave
      › ?jpeg
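
For the ?ggsave route, a minimal sketch might look like the following. The plot, the file name and the use of tempdir() are assumptions made so the example runs anywhere; in a real script the path would point at a folder that already exists, since pdf() and ggsave() write into a directory but do not create it.

```r
library(ggplot2)

# A small plot to save (illustrative; any ggplot object works).
example.plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

# ggsave() picks the device from the file extension; width and height
# are in inches by default.
out.file <- file.path(tempdir(), "example_plot.pdf")
ggsave(filename = out.file, plot = example.plot, width = 6, height = 4)
```

Unlike the pdf() approach, ggsave() writes one plot per file and needs no separate call to close the device.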
  22. ggplot: Fine tuning: scales
        CO2.plot +
          scale_colour_manual(values = c("nonchilled" = "red", "chilled" = "blue"))

        CO2.plot +
          scale_y_continuous(name = "CO2 uptake rate",
                             breaks = seq(5, 50, by = 10),
                             labels = seq(5, 50, by = 10),
                             trans = "log10")
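
The transcript never shows how CO2.plot was built, so the definition below is an assumption: a scatter plot of uptake against conc from the built-in CO2 data, coloured by Treatment (whose levels are "nonchilled" and "chilled"). With that in place, both scale adjustments can be applied to one plot. scale_y_log10() is used here as a shorthand for a continuous y scale with a log10 transformation.

```r
library(ggplot2)

# Assumed definition of CO2.plot (not shown in the slides).
CO2.plot <- ggplot(CO2, aes(x = conc, y = uptake, colour = Treatment)) +
  geom_point()

# Manual colours for the two treatments plus a named, log10 y axis.
tuned.plot <- CO2.plot +
  scale_colour_manual(values = c("nonchilled" = "red", "chilled" = "blue")) +
  scale_y_log10(name = "CO2 uptake rate", breaks = seq(5, 50, by = 10))

print(tuned.plot)
```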
  23. ggplot: Fine tuning: themes
    › theme_set(theme())
    › or plot + theme()
    › themes
      › theme_bw()
      › theme_grey()
    › edit themes
      › mytheme <- theme_grey() + theme(plot.title = element_text(colour = "red"))
      › p + mytheme
  24. ggplot: base R plotting
    › qplot is not the only way
    › ?plot
      › has many defaults for different object types
      › similar to qplot
        plot(iris)
        lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
        plot(lm.SR)
  25. reshape: ggplot likes it long… is your data wide?
    Long
        ID variable   Factor    Measured value
        ID 1          Level 1   Measured value
        ID 1          Level 2   Measured value
        ID 2          Level 1   Measured value
        ID 2          Level 2   Measured value
    Wide
        ID variable   Level 1          Level 2
        ID 1          Measured value   Measured value
        ID 2          Measured value   Measured value
  26. reshape: Melt: go long
        library(reshape)

        # id.vars and measure.vars take character vectors built with c()
        molten.data <- melt(data,
                            id.vars = c("id.var.1", "id.var.2"),
                            measure.vars = c("measure.var.1", "measure.var.2"),
                            variable_name = "variable")

        head(iris)
        head(molten.iris)
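
The slide shows head(molten.iris) without the call that creates it. A plausible reconstruction, assuming the reshape package named on the slide and the column names used in the later plyr example (Species, measure, value), is sketched below. In the newer reshape2 package the equivalent argument is spelled variable.name.

```r
library(reshape)

# Melt iris from wide to long: Species identifies each row, and the four
# measurement columns are stacked into a "measure"/"value" pair.
molten.iris <- melt(iris, id.vars = "Species", variable_name = "measure")

head(iris)         # wide: one column per measurement
head(molten.iris)  # long: one row per measurement
```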
  27. ggplot: Working with your data
    › Let us get you at least one plot out of your data
    › Don't yet have data?
      › Find some
        › google
        › http://datadryad.org
      › Create some
        › Survey your neighbours
    › Save your plots to a PDF
  28. plyr: Split-Apply-Combine
    › Equivalent
      › SQL GROUP BY
      › Pivot Tables (Excel, SPSS, …)
    › Split
      › Define a subset of your data
    › Apply
      › Do anything to this subset
      › calculation, modeling, simulations, plotting
    › Combine
      › Repeat this for all subsets
      › collect the results
    [Page excerpt from Journal of Statistical Software. Figure 1: The three ways to split up a 2d matrix, labelled above by the dimensions that they slice. Figure 2: The seven ways to split up a 3d array, labelled above by the dimensions that they slice up.]
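
The three steps can be sketched in base R before reaching for plyr. This is only an illustration of the idea on the built-in iris data, with hypothetical object names; plyr wraps the same pattern in a single call.

```r
# Split: one data frame per species.
pieces <- split(iris, iris$Species)

# Apply: compute something on each piece (here, a mean).
piece.means <- lapply(pieces, function(piece) {
  data.frame(Species = piece$Species[1],
             mean.sepal.length = mean(piece$Sepal.Length))
})

# Combine: stack the per-piece results into one data frame.
combined <- do.call(rbind, piece.means)
print(combined)
```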
  29. plyr: How it works
        # the function receives one subset and returns a data frame
        my.function <- function(subset.data) {
          results <- do.something(subset.data)
          return(data.frame(results))
        }
    › my.function can produce as many rows as subset.data (transform) or fewer rows than subset.data (summarize)
        # pass the function itself; ddply calls it once per subset
        returned.results <- ddply(.data = data,
                                  .variables = c("variable1", "variable2"),
                                  .fun = my.function)
    › Warning: idiosyncrasies present
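
The transform versus summarize distinction can be made concrete with two small ddply() calls on iris. This is a sketch with illustrative column and object names: the first function returns one row per subset, the second returns every row of the subset with an added column.

```r
library(plyr)

# Summarize: fewer rows than the input, one row per species.
species.means <- ddply(.data = iris, .variables = "Species",
                       .fun = function(subset.data) {
  data.frame(mean.petal.length = mean(subset.data$Petal.Length))
})

# Transform: as many rows as the input, with a new column
# centred on the mean of each species.
centred.iris <- ddply(.data = iris, .variables = "Species",
                      .fun = function(subset.data) {
  subset.data$centred.petal.length <-
    subset.data$Petal.Length - mean(subset.data$Petal.Length)
  subset.data
})

print(species.means)
head(centred.iris)
```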
  30. plyr: Example 1
    › Calculate the mean of each measure for each species using the molten data set
        molten.means <- ddply(.data = molten.iris,
                              .variables = c("Species", "measure"),
                              function(subset.data) data.frame(mean = mean(subset.data$value)))
  31. plyr: Example 3
    › Slope of width on length
        length.on.width.slope <- function(subset.data) {
          with(subset.data, {
            slope.sepal <- lm(Sepal.Width ~ Sepal.Length)$coefficients[2]
            slope.petal <- lm(Petal.Width ~ Petal.Length)$coefficients[2]
            return(data.frame(slope.sepal = slope.sepal, slope.petal = slope.petal))
          })
        }
        iris.slopes <- ddply(.data = iris, .variables = "Species",
                             function(x) length.on.width.slope(x))
  32. plyr: Your turn
    › change functions
      › sd, length
      › range = max() - min()
    › apply to other data
      › simesants, rats, iris, sipoo
  33. You
    › What was most interesting/useful?
    › What do you still need
      › to find it easy to plot in R?
      › to have fun using R?
    › Please comment on our website on the MBSU page
      › http://zerotorhero.wordpress.com/2012/12/14/mbsu/
  34. Acknowledgements
    › Reshape, plyr and ggplot2 are all brought to you on GitHub by:
      › Hadley Wickham
      › had.co.nz
    Wickham, H. (2011). "The split-apply-combine strategy for data analysis." Journal of Statis.
    Wickham, H. (2010). "A layered grammar of graphics." Journal of Computational and Graphical Statistics 19(1): 3-28.