Slide 1

Slide 1 text

Plotting in R using ggplot2
Etienne Low-Decarie
Material in part prepared by Eric Pederson

Slide 2

Slide 2 text

www.meetup.com/Montreal-R-User-Group/

Slide 3

Slide 3 text

http://www.codeschool.com/courses/try-r

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

You
›  Have you created a plot?
   ›  With what data?
   ›  What kind of plot?
›  Have you plotted with R?
   ›  Have you used ggplot?

Slide 6

Slide 6 text

Follow along
›  Code and HTML available at:
   ›  https://github.com/zerotohero/MBSU
›  Recommendation
   ›  create your own new script
   ›  refer to the provided code only if needed
   ›  avoid copy-pasting or running the code directly from the script
›  ggplot is also hosted on GitHub
   ›  https://github.com/hadley/ggplot2

Slide 7

Slide 7 text

Required packages
›  install.packages("ggplot2")
›  require(ggplot2)

Slide 8

Slide 8 text

Outline
›  your first R plot
   ›  basic scatter plot
   ›  Exercise 1
›  grammar of graphics
›  more advanced plots
   ›  available plot elements and when to use them
   ›  Exercise 2
›  saving a plot
›  fine tuning your plot
   ›  themes
›  we help you plot your data

Slide 9

Slide 9 text

ggplot: Basic scatter plot
›  plotting function: "qplot" (quick plot)
›  ?qplot
›  arguments
   ›  data
   ›  x
   ›  y
   ›  …

Slide 10

Slide 10 text

ggplot: Basic scatter plot
›  look at the built-in "iris" data
   ›  ?iris
   ›  head(iris)
   ›  str(iris)
   ›  names(iris)

Slide 11

Slide 11 text

ggplot: Basic scatter plot
    qplot(data=iris,
          x=Sepal.Length,
          y=Sepal.Width)
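The same scatter plot can also be written with the full ggplot() interface, which may be useful because recent ggplot2 releases mark qplot() as deprecated. This is a minimal sketch, not part of the original slides; the object name scatter is only illustrative.

```r
library(ggplot2)

# Same data and mappings as the qplot() call above:
# ggplot() declares the data and aesthetics, geom_point() draws the points.
scatter <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

print(scatter)
```

Here the geom is stated explicitly, whereas qplot() picks one based on the arguments it receives.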

Slide 12

Slide 12 text

ggplot: Basic scatter plot (categorical)
    qplot(data=iris,
          x=Species,
          y=Sepal.Width)

Slide 13

Slide 13 text

ggplot: Less basic scatter plot
›  ?qplot
›  other arguments
   ›  xlab
   ›  ylab
   ›  main
   ›  log
   ›  …

Slide 14

Slide 14 text

ggplot: Scatter plot
    qplot(data=iris,
          x=Sepal.Length,
          xlab="Sepal Length (cm)",
          y=Sepal.Width,
          ylab="Sepal Width (cm)",
          main="Sepal dimensions")

Slide 15

Slide 15 text

ggplot: Exercise 1
›  produce a basic plot with built-in data
   ›  CO2
   ›  ?CO2
   ›  BOD
   ›  data()

Slide 16

Slide 16 text

ggplot: Grammar of graphics (gg)
1.  a graphic is made of elements (layers)
   ›  data
   ›  aesthetics (aes)
   ›  transformation
   ›  geoms (geometric objects)
   ›  axis (coordinate system)
   ›  scales
[Figure 1 in Wickham (2010): Graphics objects produced by (from left to right): geometric objects, scales and coordinate system, plot annotations.]

Slide 17

Slide 17 text

ggplot: Grammar of graphics (gg)
›  Aesthetics (aes) make data visible:
   ›  x, y: position along the x and y axis
   ›  colour: the colour of the point
   ›  group: what group a point belongs to
   ›  shape: the figure used to plot a point
   ›  linetype: the type of line used (solid, dashed, etc.)
   ›  size: the size of the point or line
   ›  alpha: the transparency of the point
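As a rough illustration of the aesthetics listed above, the sketch below maps colour and shape to a variable inside aes() and sets size and alpha to fixed values outside it. It uses the ggplot() interface rather than qplot(); the object name aes.plot is an assumption made for this example.

```r
library(ggplot2)

# Aesthetics inside aes() are mapped to data (one colour/shape per species).
# Values given outside aes() are constants applied to every point.
aes.plot <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                             colour = Species, shape = Species)) +
  geom_point(size = 3, alpha = 0.5)

print(aes.plot)
```

In qplot() the same distinction is made by wrapping constants in I(), as in alpha=I(0.5).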

Slide 18

Slide 18 text

ggplot: Grammar of graphics (gg)
›  geometric objects (geoms)
   ›  point: scatterplot
   ›  line: line plot, where lines connect points by increasing x value
   ›  path: line plot, where lines connect points in sequence of appearance
   ›  boxplot: box-and-whisker plots, for categorical x data
   ›  bar: barplots
   ›  histogram: histograms (for 1-dimensional data)

Slide 20

Slide 20 text

ggplot Grammar of graphics (gg)

Slide 21

Slide 21 text

ggplot: Grammar of graphics (gg)
2.  editing an element produces a new graph
   ›  just change the coordinate system
[Figure 16 in Wickham (2010), "A layered grammar of graphics": Bar chart (left) and equivalent Coxcomb plot (right) of clarity distribution. The Coxcomb plot is a bar chart in polar coordinates. Note that the categories abut in the Coxcomb, but are separated in the bar chart: this is an example of a graphical convention that differs in different coordinate systems.]

Slide 22

Slide 22 text

ggplot Grammar of graphics (gg)

Slide 23

Slide 23 text

ggplot: How it works
1.  create a simple plot object
   ›  plot.object<-qplot()
2.  add graphical layers/complexity
   ›  plot.object<-plot.object+layer()
   ›  repeat step 2 until satisfied
3.  print your object to screen (or to graphical device)
   ›  print(plot.object)
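The three steps above can be sketched as follows. This is only one possible concrete version, written with ggplot() instead of qplot(); the layers chosen (a linear smooth and a title) are examples, not a prescribed recipe.

```r
library(ggplot2)

# 1. create a simple plot object
plot.object <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

# 2. add graphical layers; each "+" returns a new plot object
plot.object <- plot.object + geom_smooth(method = "lm", se = FALSE)
plot.object <- plot.object + ggtitle("Sepal dimensions")

# 3. print the object to the screen (or to the current graphical device)
print(plot.object)
```

Nothing is drawn until step 3, so the object can be modified as many times as needed first.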

Slide 24

Slide 24 text

ggplot: Scatter plot as an R object
    basic.plot<-qplot(data=iris,
                      x=Sepal.Length,
                      xlab="Sepal Length (cm)",
                      y=Sepal.Width,
                      ylab="Sepal Width (cm)",
                      main="Sepal dimensions")

    print(basic.plot)

Slide 25

Slide 25 text

ggplot: Basic scatter plot (categorical)
    categorical.plot<-qplot(data=iris,
                            x=Species,
                            y=Sepal.Width)
    print(categorical.plot)

Slide 26

Slide 26 text

ggplot: Scatter plot with colour, shape and transparency
›  Add aesthetics
    basic.plot<-qplot(data=iris,
                      x=Sepal.Length,
                      xlab="Sepal Length (cm)",
                      y=Sepal.Width,
                      ylab="Sepal Width (cm)",
                      main="Sepal dimensions",
                      colour=Species,
                      shape=Species,
                      alpha=I(0.5))

    print(basic.plot)

Slide 27

Slide 27 text

ggplot: Scatter plot with linear regression
›  Add a geom (e.g. linear smooth)
    plot.with.linear.smooth<-basic.plot+
      geom_smooth(method="lm", se=FALSE)
    print(plot.with.linear.smooth)

Slide 28

Slide 28 text

ggplot: Exercise 2
›  produce a colourful plot containing linear regressions with built-in data
   ›  CO2
   ›  ?CO2
   ›  msleep
   ›  ?msleep
   ›  OrchardSprays
   ›  data()

Slide 29

Slide 29 text

ggplot: Changing and adding geoms
    print(categorical.plot)

    print(categorical.plot+
          geom_boxplot())

    categorical.plot<-qplot(data=iris,
                            x=Species,
                            y=Sepal.Width,
                            geom=c("boxplot"))
    print(categorical.plot)

Slide 30

Slide 30 text

ggplot: Basic plot 2
    CO2.plot<-qplot(data=CO2,
                    x=conc,
                    y=uptake,
                    colour=Treatment)

    print(CO2.plot)

Slide 31

Slide 31 text

ggplot: Facets
    plot.object<-plot.object + facet_grid(rows~columns)

    CO2.plot<-CO2.plot+facet_grid(.~Type)
    print(CO2.plot)

Slide 32

Slide 32 text

ggplot: Groups
›  add a geom (line)
    print(CO2.plot+geom_line())

Slide 33

Slide 33 text

ggplot: Groups
›  Specify groups
    CO2.plot<-CO2.plot+geom_line(aes(group=Plant))
    print(CO2.plot)
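The facet and group steps from the last few slides can be combined in a single call. The sketch below is an assumed equivalent written with ggplot(); CO2.sketch is an illustrative name, chosen so as not to overwrite CO2.plot.

```r
library(ggplot2)

CO2.sketch <- ggplot(CO2, aes(x = conc, y = uptake, colour = Treatment)) +
  geom_point() +
  # without group=Plant, one line would connect all points of a treatment;
  # with it, each of the 12 plants gets its own line
  geom_line(aes(group = Plant)) +
  # one panel per level of Type, arranged in columns
  facet_grid(. ~ Type)

print(CO2.sketch)
```

The group aesthetic is set only on the line layer, so the points keep the default grouping by colour.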

Slide 34

Slide 34 text

ggplot: Available elements

Geoms
Geoms, short for geometric objects, describe the type of plot you will produce.

geom_abline       Line specified by slope and intercept.
geom_area         Area plot.
geom_bar          Bars, rectangles with bases on x-axis.
geom_bin2d        Add heatmap of 2d bin counts.
geom_blank        Blank, draws nothing.
geom_boxplot      Box and whiskers plot.
geom_contour      Display contours of a 3d surface in 2d.
geom_crossbar     Hollow bar with middle indicated by horizontal line.
geom_density      Display a smooth density estimate.
geom_density2d    Contours from a 2d density estimate.
geom_dotplot      Dot plot.
geom_errorbar     Error bars.
geom_errorbarh    Horizontal error bars.
geom_freqpoly     Frequency polygon.
geom_hex          Hexagon binning.
geom_histogram    Histogram.
geom_hline        Horizontal line.
geom_jitter       Points, jittered to reduce overplotting.
geom_line         Connect observations, ordered by x value.
geom_linerange    An interval represented by a vertical line.
geom_map          Polygons from a reference map.
geom_path         Connect observations in original order.
geom_point        Points, as for a scatterplot.
geom_pointrange

›  http://docs.ggplot2.org
›  for even more
   ›  help(package=ggplot2)

Slide 35

Slide 35 text

ggplot: Exercise 3
›  Explore geoms and other plot elements with the data you have used
   ›  msleep
   ›  ?msleep
   ›  OrchardSprays
   ›  data()

Slide 36

Slide 36 text

ggplot Save plots ›  in RStudio

Slide 37

Slide 37 text

ggplot: Save plots
›  in a script
    pdf("./plots/todays_plots.pdf")
    print(basic.plot)
    print(plot.with.linear.smooth)
    print(categorical.plot)
    print(CO2.plot)
    graphics.off()

›  other methods
   ›  ?ggsave
   ›  ?jpeg
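For a single plot, ggsave() can be a shorter alternative to opening and closing a device by hand. The sketch below is illustrative only: the file name and the 6 x 4 inch size are assumptions, and the file is written to a temporary directory so that nothing in your project is overwritten.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

# hypothetical output path; ggsave() picks the device from the file extension
out.file <- file.path(tempdir(), "sepal_scatter.pdf")

# width and height are in inches by default
ggsave(filename = out.file, plot = p, width = 6, height = 4)
```

Unlike pdf(), which can collect several plots in one multi-page file, ggsave() writes one plot per file.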

Slide 38

Slide 38 text

ggplot: Fine tuning: scales
    CO2.plot+
      scale_colour_manual(values=c("nonchilled"="red",
                                   "chilled"="blue"))

    CO2.plot+
      scale_y_continuous(name = "CO2 uptake rate",
                         breaks = seq(5,50, by= 10),
                         labels = seq(5,50, by= 10),
                         trans="log10")
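Both scales can be added to the same plot. The following is a minimal self-contained sketch, assuming a ggplot() version of the CO2 plot and using scale_y_log10() as a shorthand for a log10-transformed continuous y scale; CO2.scaled is an illustrative name.

```r
library(ggplot2)

CO2.scaled <- ggplot(CO2, aes(x = conc, y = uptake, colour = Treatment)) +
  geom_point() +
  # named vector: each factor level is matched to a colour by name
  scale_colour_manual(values = c(nonchilled = "red", chilled = "blue")) +
  # log10 y axis with a custom title and tick positions
  scale_y_log10(name = "CO2 uptake rate", breaks = seq(5, 45, by = 10))

print(CO2.scaled)
```

Naming the colour values avoids depending on the order of the factor levels.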

Slide 39

Slide 39 text

ggplot: Fine tuning: themes
›  theme_set(theme())
   ›  or plot+theme()
›  themes
   ›  theme_bw()
   ›  theme_grey()
›  edit themes
   ›  mytheme <- theme_grey() + theme(plot.title = element_text(colour = "red"))
   ›  p + mytheme

Slide 40

Slide 40 text

ggplot: base R plotting
›  qplot is not the only way
›  ?plot
   ›  has many defaults for different object types
   ›  similar to qplot
    plot(iris)
    lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
    plot(lm.SR)

Slide 41

Slide 41 text

ggplot likes it long… is your data wide?

Wide
ID variable   Level 1          Level 2
ID 1          Measured value   Measured value
ID 2          Measured value   Measured value

        reshape

Long
ID variable   Factor    Measured value
ID 1          Level 1   Measured value
ID 1          Level 2   Measured value
ID 2          Level 1   Measured value
ID 2          Level 2   Measured value

Slide 42

Slide 42 text

Melt: go long
    library(reshape)

    molten.data<-melt(data,
                      id.vars=c("id.var.1", "id.var.2"),
                      measure.vars=c("measure.vars", "measure.vars"),
                      variable_name = "variable")

    head(iris)
    head(molten.iris)
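To make the wide-to-long idea concrete without any extra package, the sketch below builds a long version of iris with base R's stack(). It is not the melt() call used in the workshop: the object name molten.sketch and the column names measure and value are assumptions chosen to resemble the molten data used later.

```r
# The four measurement columns that are spread "wide" in iris
measure.cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# stack() puts all measurements in one column ("values") and records
# the name of the source column in a factor ("ind")
stacked <- stack(iris[, measure.cols])

# Repeat the ID column once per stacked measurement column
molten.sketch <- data.frame(
  Species = rep(iris$Species, times = length(measure.cols)),
  measure = stacked$ind,
  value   = stacked$values
)

head(molten.sketch)
```

The result has one row per flower per measurement (600 rows), which is the shape ggplot works with most easily.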

Slide 43

Slide 43 text

ggplot: Working with your data
›  Let us get you at least one plot out of your data
›  Don’t yet have data?
   ›  Find some
      ›  google
      ›  http://datadryad.org
   ›  Create some
      ›  Survey your neighbours
›  Save your plots to a PDF

Slide 44

Slide 44 text

plyr
›  library(plyr)

Slide 45

Slide 45 text

plyr: Split-Apply-Combine
›  Equivalent
   ›  SQL GROUP BY
   ›  Pivot Tables (Excel, SPSS, …)
›  Split
   ›  Define a subset of your data
›  Apply
   ›  Do anything to this subset
   ›  calculation, modeling, simulations, plotting
›  Combine
   ›  Repeat this for all subsets
   ›  collect the results
[Figure: excerpt from Wickham (2011), p. 7. Figure 1: The three ways to split up a 2d matrix, labelled above by the dimensions that they slice. Figure 2: The seven ways to split up a 3d array, labelled above by the dimensions that they slice up.]
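The three stages can be spelled out by hand in base R, which may help to see what plyr automates. This is a sketch for illustration only, not how ddply() is implemented; the names pieces, summaries and combined are made up for the example.

```r
# Split: one data frame per species
pieces <- split(iris, iris$Species)

# Apply: compute a one-row summary for each piece
summaries <- lapply(pieces, function(subset.data) {
  data.frame(Species = subset.data$Species[1],
             mean.sepal.length = mean(subset.data$Sepal.Length))
})

# Combine: bind the per-piece results back into a single data frame
combined <- do.call(rbind, summaries)

combined
```

With plyr, the split, apply and combine stages are handled by one ddply() call that takes the grouping variables and the function to apply.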

Slide 46

Slide 46 text

plyr: How it works
    my.function<-function(subset.data){
      results<-do.something(subset.data)
      return(data.frame(results))
    }

my.function can produce as many rows as subset.data (transform) or fewer rows than subset.data (summarize)

    returned.results<-ddply(.data=data,
                            .variables=c("variable1", "variable2"),
                            .fun=my.function)

Warning: idiosyncrasies present

Slide 47

Slide 47 text

plyr: Example 1
›  Calculate the mean of each measure for each species using the molten data set
    molten.means<-ddply(.data=molten.iris,
                        .variables=c("Species", "measure"),
                        function(subset.data)
                          data.frame(mean=mean(subset.data$value)))

Slide 48

Slide 48 text

plyr: Example 3
›  Slope of width on length
    length.on.width.slope<-function(subset.data){
      with(subset.data,{
        slope.sepal<-lm(Sepal.Width~Sepal.Length)$coefficients[2]
        slope.petal<-lm(Petal.Width~Petal.Length)$coefficients[2]
        return(data.frame(slope.sepal=slope.sepal,
                          slope.petal=slope.petal))
      })
    }
    iris.slopes<-ddply(.data=iris,
                       .variables="Species",
                       function(x)length.on.width.slope(x))

Slide 49

Slide 49 text

plyr: Your turn
›  change functions
   ›  sd, length
   ›  range=max()-min()
›  apply to other data
   ›  simesants, rats, iris, sipoo

Slide 50

Slide 50 text

You
›  What was most interesting/useful?
›  What do you still need to
   ›  find it easy to plot in R?
   ›  to have fun using R?
›  Please comment on our website on the MBSU page
   ›  http://zerotorhero.wordpress.com/2012/12/14/mbsu/

Slide 51

Slide 51 text

Acknowledgements
›  Reshape, plyr and ggplot2 are all brought to you on GitHub by:
   ›  Hadley Wickham
   ›  had.co.nz

Wickham, H. (2011). "The split-apply-combine strategy for data analysis." Journal of Statis.
Wickham, H. (2010). "A layered grammar of graphics." Journal of Computational and Graphical Statistics 19(1): 3-28.

Slide 52

Slide 52 text

Acknowledgements ›  Some material presented was first produced by Eric Pederson