03_Plotting with ggplot

Etienne
January 31, 2013

Basic plotting with ggplot and some extras. Part of the Zero to R hero series (http://zerotorhero.wordpress.com).

  1. Plotting in R
    using ggplot2
    Etienne Low-Decarie
    material in part prepared
    by Eric Pederson

  2. www.meetup.com/Montreal-R-User-Group/

  3. http://www.codeschool.com/courses/try-r

  4. You
    ›  Have you created a plot?
    ›  With what data?
    ›  What kind of plot?
    ›  Have you plotted with R?
    ›  Have you used ggplot?

  5. Follow along
    ›  Code and HTML available at:
    ›  https://github.com/zeroto hero/MBSU
    ›  Recommendation:
    ›  create your own new script
    ›  refer to the provided code only if needed
    ›  avoid copy-pasting the code or running it directly from the script
    ›  ggplot2 is also hosted on GitHub
    ›  https://github.com/hadley/ggplot2

  6. Required packages
    ›  install.packages("ggplot2")
    ›  require(ggplot2)

  7. Outline
    ›  your first R plot
    ›  basic scatter plot
    ›  Exercise 1
    ›  grammar of graphics
    ›  more advanced plots
    ›  Available plot elements and when to use them
    ›  Exercise 2
    ›  saving a plot
    ›  fine tuning your plot
    ›  themes
    ›  we help you plot your data

  8. ggplot
    ›  plotting function: qplot (quick plot)
    ›  ?qplot
    ›  arguments:
    ›  data
    ›  x
    ›  y
    ›  …
    Basic scatter plot

  9. ggplot
    ›  look at the built-in iris data
    ›  ?iris
    ›  head(iris)
    ›  str(iris)
    ›  names(iris)
    Basic scatter plot

  10. ggplot
    Basic scatter plot
    qplot(data=iris,
          x=Sepal.Length,
          y=Sepal.Width)

  11. ggplot
    Basic scatter plot (categorical)
    qplot(data=iris,
          x=Species,
          y=Sepal.Width)

  12. ggplot
    ›  ?qplot
    ›  other arguments:
    ›  xlab
    ›  ylab
    ›  main
    ›  log
    ›  …
    Less basic scatter plot

  13. ggplot
    Scatter plot
    qplot(data=iris,
          x=Sepal.Length,
          xlab="Sepal Length (cm)",
          y=Sepal.Width,
          ylab="Sepal Width (cm)",
          main="Sepal dimensions")

  14. ggplot
    Exercise 1
    ›  produce a basic plot with built-in data
    ›  CO2
    ›  ?CO2
    ›  BOD
    ›  data()

  15. ggplot
    1. a graphic is made of elements (layers)
    ›  data
    ›  aesthetics (aes)
    ›  statistical transformations (stats)
    ›  geoms (geometric objects)
    ›  axis (coordinate system)
    ›  scales
    Grammar of graphics (gg)
    [Figure 1 from Wickham (2010): graphics objects produced by (from left to right) geometric objects, scales and coordinate system, plot annotations.]

  16. ggplot
    ›  Aesthetics (aes) make data visible:
    ›  x,y : position along the x and y axis
    ›  colour: the colour of the point
    ›  group: what group a point belongs to
    ›  shape: the figure used to plot a point
    ›  linetype: the type of line used (solid, dashed, etc)
    ›  size: the size of the point or line
    ›  alpha: the transparency of the point
    Grammar of graphics (gg)
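
A minimal sketch of how the aesthetics listed above map iris variables to visual properties. It uses ggplot2's ggplot() + aes() interface, the layered counterpart of qplot(); the choice of variables (including mapping Petal.Length to size) is purely illustrative.

```r
library(ggplot2)

# aes() maps variables in the data to aesthetics
aes.plot <- ggplot(data = iris,
                   aes(x = Sepal.Length,       # position along the x axis
                       y = Sepal.Width,        # position along the y axis
                       colour = Species,       # colour of each point
                       shape = Species,        # symbol used for each point
                       size = Petal.Length)) + # size of each point
  geom_point(alpha = 0.5)  # alpha set to a constant, not mapped to data

print(aes.plot)
```

Setting an aesthetic outside aes(), as alpha is here, fixes it for every point instead of mapping it to a variable; in qplot() the same effect needs I(), as in alpha=I(0.5).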

  17. ggplot
    ›  geometric objects (geoms)
    ›  point: scatterplot
    ›  line: line plot, where lines connect points by increasing x value
    ›  path: line plot, where lines connect points in their order of appearance in the data
    ›  boxplot: box-and-whisker plots, for continuous y data grouped by a categorical x
    ›  bar: barplots
    ›  histogram: histograms (for 1-dimensional data)
    Grammar of graphics (gg)

  18. ggplot
    Grammar of graphics (gg)

  19. ggplot
    Grammar of graphics (gg)

  20. ggplot
    2. editing an element produces a new graph
    ›  just change the coordinate system!
    Grammar of graphics (gg)
    [Figure 16 from Wickham (2010): bar chart (left) and equivalent Coxcomb plot (right) of clarity distribution. The Coxcomb plot is a bar chart in polar coordinates. Note that the categories abut in the Coxcomb, but are separated in the bar chart: this is an example of a graphical convention that differs in different coordinate systems.]
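
The figure's point, that only the coordinate system changes, can be sketched with the diamonds data shipped with ggplot2: the same bar layer is drawn once in Cartesian and once in polar coordinates. This is an illustrative reconstruction, not the code used for the figure.

```r
library(ggplot2)

# Bar chart: number of diamonds in each clarity category
bar.chart <- ggplot(diamonds, aes(x = clarity, fill = clarity)) +
  geom_bar()

# Same data, same geom; only the coordinate system changes
coxcomb <- bar.chart + coord_polar()

print(bar.chart)
print(coxcomb)
```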

  21. ggplot
    Grammar of graphics (gg)

  22. ggplot
    1. create a simple plot object
    ›  plot.object<-qplot()
    2. add graphical layers/complexity
    ›  plot.object<-plot.object+layer()
    ›  repeat step 2 until satisfied
    3. print your object to screen (or to a graphical device)
    ›  print(plot.object)
    How it works
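
The three steps above, sketched with ggplot2's ggplot() function on iris. The particular layers added (coloured points, one overall linear fit, axis labels) are illustrative choices.

```r
library(ggplot2)

# 1. create a simple plot object (nothing is drawn yet)
plot.object <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width))

# 2. add layers one at a time; each "+" returns a new plot object
plot.object <- plot.object + geom_point(aes(colour = Species))
plot.object <- plot.object + geom_smooth(method = "lm", se = FALSE)
plot.object <- plot.object + labs(x = "Sepal Length (cm)",
                                  y = "Sepal Width (cm)")

# 3. printing the object is what actually draws it
print(plot.object)
```

Because colour is mapped only inside geom_point(), the smooth layer fits a single line to all species; moving aes(colour = Species) into ggplot() would give one line per species.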

  23. ggplot
    Scatter plot as an R object
    basic.plot<-qplot(data=iris,
                      x=Sepal.Length,
                      xlab="Sepal Length (cm)",
                      y=Sepal.Width,
                      ylab="Sepal Width (cm)",
                      main="Sepal dimensions")
    print(basic.plot)

  24. ggplot
    Basic scatter plot (categorical)
    categorical.plot<-qplot(data=iris,
                            x=Species,
                            y=Sepal.Width)
    print(categorical.plot)

  25. ggplot
    Scatter plot with colour, shape
    and transparency
    ›  Add aesthetics
    basic.plot<-qplot(data=iris,
                      x=Sepal.Length,
                      xlab="Sepal Length (cm)",
                      y=Sepal.Width,
                      ylab="Sepal Width (cm)",
                      main="Sepal dimensions",
                      colour=Species,
                      shape=Species,
                      alpha=I(0.5))
    print(basic.plot)

  26. ggplot
    Scatter plot with linear
    regression
    ›  Add a geom (eg. linear smooth)
    plot.with.linear.smooth<-basic.plot+
      geom_smooth(method="lm", se=FALSE)
    print(plot.with.linear.smooth)

  27. ggplot
    Exercise 2
    ›  produce a colorful plot containing linear regressions with built-in data
    ›  CO2
    ›  ?CO2
    ›  msleep
    ›  ?msleep
    ›  OrchardSprays
    ›  data()

  28. ggplot
    Changing and adding geoms
    print(categorical.plot)
    print(categorical.plot+
          geom_boxplot())
    categorical.plot<-qplot(data=iris,
                            x=Species,
                            y=Sepal.Width,
                            geom="boxplot")
    print(categorical.plot)

  29. ggplot
    Basic plot 2
    CO2.plot<-qplot(data=CO2,
                    x=conc,
                    y=uptake,
                    colour=Treatment)
    print(CO2.plot)

  30. ggplot
    Facets
    plot.object<-plot.object + facet_grid(rows~columns)
    CO2.plot<-CO2.plot+facet_grid(.~Type)
    print(CO2.plot)

  31. ggplot
    Groups
    ›  add a geom (line)
    print(CO2.plot+geom_line())

  32. ggplot
    Groups
    ›  Specify groups
    CO2.plot<-CO2.plot+geom_line(aes(group=Plant))
    print(CO2.plot)

  33. Available elements
    ggplot
    Geoms
    Geoms, short for geometric objects, describe the type of plot you will produce.
    geom_abline: Line specified by slope and intercept.
    geom_area: Area plot.
    geom_bar: Bars, rectangles with bases on x-axis.
    geom_bin2d: Add heatmap of 2d bin counts.
    geom_blank: Blank, draws nothing.
    geom_boxplot: Box and whiskers plot.
    geom_contour: Display contours of a 3d surface in 2d.
    geom_crossbar: Hollow bar with middle indicated by horizontal line.
    geom_density: Display a smooth density estimate.
    geom_density2d: Contours from a 2d density estimate.
    geom_dotplot: Dot plot.
    geom_errorbar: Error bars.
    geom_errorbarh: Horizontal error bars.
    geom_freqpoly: Frequency polygon.
    geom_hex: Hexagon binning.
    geom_histogram: Histogram.
    geom_hline: Horizontal line.
    geom_jitter: Points, jittered to reduce overplotting.
    geom_line: Connect observations, ordered by x value.
    geom_linerange: An interval represented by a vertical line.
    geom_map: Polygons from a reference map.
    geom_path: Connect observations in original order.
    geom_point: Points, as for a scatterplot.
    geom_pointrange
    For even more: http://docs.ggplot2.org
    help(package=ggplot2)
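
A short sketch trying two of the geoms listed above on iris: a histogram for 1-dimensional data, and jittered points layered over a boxplot to reduce overplotting. The binwidth and jitter width are arbitrary choices.

```r
library(ggplot2)

# geom_histogram: distribution of a single continuous variable
hist.plot <- ggplot(iris, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.25)

# geom_boxplot + geom_jitter: raw points on top of the box-and-whiskers,
# jittered horizontally only, so the y values stay exact
jitter.plot <- ggplot(iris, aes(x = Species, y = Sepal.Width)) +
  geom_boxplot(outlier.shape = NA) +  # hide outliers; the jitter shows every point
  geom_jitter(width = 0.2, height = 0)

print(hist.plot)
print(jitter.plot)
```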

  34. Exercise 3
    ggplot
    ›  Explore geoms and other plot elements with the data you have used
    ›  msleep
    ›  ?msleep
    ›  OrchardSprays
    ›  data()

  35. ggplot
    Save plots
    ›  in RStudio

  36. ggplot
    Save plots
    ›  in a script
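    dir.create("./plots", showWarnings = FALSE)  # make sure the folder exists
    pdf("./plots/todays_plots.pdf")
    print(basic.plot)
    print(plot.with.linear.smooth)
    print(categorical.plot)
    print(CO2.plot)
    dev.off()  # close the PDF device so the file is written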
    ›  other methods
    ›  ?ggsave
    ›  ?jpeg
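
A minimal sketch of the ?ggsave route. The file name is hypothetical, and the file goes to R's temporary directory so nothing is left in the working directory; ggsave() picks the graphics device from the file extension.

```r
library(ggplot2)

basic.plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

# Hypothetical output file in R's temporary directory
out.file <- file.path(tempdir(), "basic_plot.pdf")

# width and height are in inches; the .pdf extension selects the PDF device
ggsave(filename = out.file, plot = basic.plot, width = 6, height = 4)

file.exists(out.file)
```

If plot = is left out, ggsave() saves the last plot that was displayed.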

  37. ggplot
    Fine tuning: scales
    CO2.plot+
      scale_colour_manual(values=c("nonchilled"="red",
                                   "chilled"="blue"))
    CO2.plot+
      scale_y_continuous(name="CO2 uptake rate",
                         breaks=seq(5, 50, by=10),
                         labels=seq(5, 50, by=10),
                         trans="log10")

  38. ggplot
    Fine tuning: themes
    ›  set the theme for all later plots: theme_set(theme_bw())
    ›  or for a single plot: plot+theme_bw()
    ›  built-in themes
    ›  theme_bw()
    ›  theme_grey()
    ›  edit themes
    ›  mytheme <- theme_grey() + theme(plot.title = element_text(colour = "red"))
    ›  p + mytheme
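
The theme commands above, put together as a runnable sketch. The plot p is an illustrative iris scatter plot; theme_set() returns the previous theme, which is used here to restore the default afterwards.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  ggtitle("Sepal dimensions")

# An edited theme: theme_grey() with a red title
mytheme <- theme_grey() + theme(plot.title = element_text(colour = "red"))
print(p + mytheme)                 # affects this plot only

# theme_set() changes the default and returns the previous theme
old.theme <- theme_set(theme_bw())
print(p)                           # now drawn with theme_bw()
theme_set(old.theme)               # restore the previous default
```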

  39. ggplot
    base R plotting
    ›  qplot is not the only way
    ›  ?plot
    ›  base R plot() has default methods for many different object types
    ›  its use is similar to qplot
    plot(iris)
    lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
    plot(lm.SR)

  40. reshape
    ggplot likes it long… is your data wide?
    Long
    ID variable   Factor    Measured value
    ID 1          Level 1   Measured value
    ID 1          Level 2   Measured value
    ID 2          Level 1   Measured value
    ID 2          Level 2   Measured value
    Wide
    ID variable   Level 1          Level 2
    ID 1          Measured value   Measured value
    ID 2          Measured value   Measured value
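
A minimal base R sketch of going from wide to long with iris. It uses stack() rather than the melt() function from reshape shown on the next slide; stack() names its output columns values and ind.

```r
# Wide: iris has one column per measurement
wide <- iris

# Long: stack() puts the four measurement columns one under the other,
# giving a "values" column and an "ind" column naming the measurement
long <- stack(wide[, 1:4])

# Repeat the ID variable once for each stacked column
long$Species <- rep(wide$Species, times = 4)

head(long)
dim(long)  # 600 rows (150 x 4), 3 columns
```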

  41. Melt: go long
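    library(reshape)
    molten.data<-melt(data,
                      id.vars=c("id.var.1", "id.var.2"),
                      measure.vars=c("measure.var.1", "measure.var.2"),
                      variable_name="variable")
    reshape
    molten.iris<-melt(iris, id.vars="Species", variable_name="measure")  # iris version used in the plyr examples
    head(molten.iris)
    head(iris)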

  42. ggplot
    Working with your data
    ›  Let us get you at least
    one plot out of your data
    ›  Don’t yet have data?
    ›  Find some
    ›  google
    ›  http://datadryad.org
    ›  Create some
    ›  Survey your neighbours
    ›  Save your plots to a PDF

  43. plyr
    ›  library(plyr)
    plyr

  44. Split-Apply-Combine
    ›  Equivalent to
    ›  SQL GROUP BY
    ›  Pivot Tables (Excel, SPSS, …)
    ›  Split
    ›  Define a subset of your data
    ›  Apply
    ›  Do anything to this subset
    ›  calculation, modeling, simulations, plotting
    ›  Combine
    ›  Repeat this for all subsets
    ›  collect the results
    [Figures 1 and 2 from Wickham (2011), Journal of Statistical Software: the three ways to split up a 2d matrix and the seven ways to split up a 3d array, labelled by the dimensions that they slice. Blue indicates a single piece of the output.]
    m*ply() takes a matrix, list-array, or data frame, splits it up by rows and calls the processing function supplying each piece as its parameters.
    Input: Data frame (d*ply). When operating on a data frame, you usually want to split it up into groups based on combinations of variables in the data set. For d*ply you specify which variables (or functions of variables) to use.
    Split
    plyr
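
The same three steps written out with base R's split(), lapply() and do.call(rbind, ...) instead of plyr, as a minimal sketch on iris: split by species, compute the mean sepal length of each piece, and stack the results.

```r
# Split: one data frame per species
pieces <- split(iris, iris$Species)

# Apply: summarise each piece (here, the mean sepal length)
summaries <- lapply(pieces, function(subset.data) {
  data.frame(Species = subset.data$Species[1],
             mean.sepal.length = mean(subset.data$Sepal.Length))
})

# Combine: stack the per-species results into one data frame
combined <- do.call(rbind, summaries)
print(combined)
```

plyr's ddply() wraps these three steps in a single call.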

  45. my.function<-function(subset.data){
      results<-do.something(subset.data)
      return(data.frame(results))
    }
    my.function can produce as many rows as subset.data (transform)
    or fewer rows than subset.data (summarize)
    returned.results<-ddply(.data=data,
                            .variables=c("variable1", "variable2"),
                            .fun=my.function)
    How it works
    Warning: idiosyncrasies present
    plyr

  46. Example 1
    ›  Calculate the mean of each measure for
    each species using the molten data set
    molten.means<-ddply(.data=molten.iris,
                        .variables=c("Species", "measure"),
                        function(subset.data) data.frame(mean=mean(subset.data$value)))
    plyr
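
A self-contained version of Example 1. As an assumption, molten.iris is built here in base R as a stand-in for the melt() output, with the column names (Species, measure, value) the example expects; the ddply() call is the one above.

```r
library(plyr)

# Long ("molten") iris built in base R: one row per plant and measure
molten.iris <- data.frame(
  Species = rep(iris$Species, times = 4),
  measure = rep(names(iris)[1:4], each = nrow(iris)),
  value   = unlist(iris[, 1:4], use.names = FALSE)
)

# Mean of each measure for each species
molten.means <- ddply(.data = molten.iris,
                      .variables = c("Species", "measure"),
                      .fun = function(subset.data)
                        data.frame(mean = mean(subset.data$value)))
print(molten.means)
```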

  47. Example 3
    ›  Slope of width on length
    plyr
    length.on.width.slope<-function(subset.data){
      with(subset.data, {
        slope.sepal<-lm(Sepal.Width~Sepal.Length)$coefficients[2]
        slope.petal<-lm(Petal.Width~Petal.Length)$coefficients[2]
        data.frame(slope.sepal=slope.sepal,
                   slope.petal=slope.petal)
      })
    }
    iris.slopes<-ddply(.data=iris,
                       .variables="Species",
                       length.on.width.slope)

  48. Your turn
    ›  change functions
    ›  sd, length
    ›  range=max()-min()
    ›  apply to other data
    ›  simesants, rats, iris, sipoo
    plyr
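
One way to try the suggested functions, sketched on iris with ddply(): the summarised column (Sepal.Length) is an illustrative choice, and range is computed as max() - min() as suggested above.

```r
library(plyr)

# Several summaries of sepal length, one row per species
sepal.summary <- ddply(.data = iris, .variables = "Species",
  .fun = function(subset.data) {
    x <- subset.data$Sepal.Length
    data.frame(mean = mean(x),
               sd = sd(x),
               n = length(x),
               range = max(x) - min(x))
  })
print(sepal.summary)
```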

  49. You
    ›  What was most interesting/useful?
    ›  What do you still need to
    ›  find it easy to plot in R?
    ›  have fun using R?
    ›  Please comment on the MBSU page of our website
    ›  http://zerotorhero.wordpress.com/2012/12/14/mbsu/

  50. Acknowledgements
    ›  Reshape, plyr and ggplot2 are all brought to you on GitHub by:
    ›  Hadley Wickham
    ›  had.co.nz
    Wickham, H. (2011). "The split-apply-combine strategy for data analysis." Journal of Statistical Software 40(1): 1-29.
    Wickham, H. (2010). "A layered grammar of graphics." Journal of Computational and Graphical Statistics 19(1): 3-28.

  51. Acknowledgements
    ›  Some material presented was first
    produced by Eric Pederson
