Slide 1

Slide 1 text

Data Visualization with R & ggplot2
Karthik Ram
April 11, 2013

Slide 2

Slide 2 text

Download this PDF
github.com/karthikram/ggplot-lecture
https://speakerdeck.com/karthik/

Slide 3

Slide 3 text

Some housekeeping
Install some packages (make sure you also have recent copies of reshape2 and plyr):
    install.packages("ggplot2", dependencies = TRUE)

Slide 4

Slide 4 text

Base graphics
• Ugly, laborious, and verbose.
• There are better ways to describe statistical visualizations.

Slide 5

Slide 5 text

Why ggplot2?
• It follows a grammar, just like any language.
• A grammar defines the basic components that make up a sentence. In this case, the grammar defines the components of a plot.
• The term "grammar of graphics" was originally coined by Lee Wilkinson.

Slide 6

Slide 6 text

Why ggplot2?
• Supports a continuum of expertise.
• Get started right away, and with practice you can effortlessly build complex, publication-quality figures.

Slide 7

Slide 7 text

Section 1 Basics

Slide 8

Slide 8 text

Some terminology
• ggplot - the main function, where you specify the dataset and the variables to plot
• geoms - geometric objects
  • geom_point(), geom_bar(), geom_density(), geom_line(), geom_area()
• aes - aesthetics
  • shape, transparency (alpha), color, fill, linetype
• scales - define how your data will be plotted
  • continuous, discrete, log
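
As a rough sketch of how these four terms fit together in a single call (the choice of dataset, alpha value, and log scale is only an assumption made for illustration):

```r
library(ggplot2)

# ggplot(): the dataset plus the global aesthetic mappings (aes)
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  # geom: draw points; alpha is a constant here, so it sits outside aes()
  geom_point(alpha = 0.6) +
  # scale: draw the x axis on a log10 scale instead of the default continuous one
  scale_x_log10()

print(p)
```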

Slide 9

Slide 9 text

Section 2 Assembling your first ggplot

Slide 10

Slide 10 text

The iris dataset
    head(iris)
    ##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    ## 1          5.1         3.5          1.4         0.2  setosa
    ## 2          4.9         3.0          1.4         0.2  setosa
    ## 3          4.7         3.2          1.3         0.2  setosa
    ## 4          4.6         3.1          1.5         0.2  setosa
    ## 5          5.0         3.6          1.4         0.2  setosa
    ## 6          5.4         3.9          1.7         0.4  setosa

Slide 11

Slide 11 text

Let’s try an example
    library(ggplot2)
    ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
      geom_point()
[Plot: scatter plot of Sepal.Width (y) against Sepal.Length (x)]

Slide 12

Slide 12 text

Basic structure
    ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
      geom_point()
    # Equivalently, store the base plot in an object and add layers to it
    myplot <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width))
    myplot + geom_point()
• Specify the data and variables inside the ggplot function.
• Anything else that goes in here becomes a global setting.
• Then add layers of geometric objects, statistical models, and panels.
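
To illustrate what "global setting" means in practice, here is a minimal sketch (the object names base and p are made up for this example). Mappings given to ggplot() are inherited by every layer, while mappings given to a single geom apply only to that layer:

```r
library(ggplot2)

# Global mapping: every layer added to `base` inherits x and y
base <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width))

# Layer-level mapping: only the points are colored by Species,
# so geom_smooth() fits one line through all of the data
p <- base +
  geom_point(aes(color = Species)) +
  geom_smooth(method = "lm")
```

Moving color = Species into the ggplot() call would instead give one fitted line per species.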

Slide 13

Slide 13 text

Quick note
• Never use qplot (short for "quick plot").
• You’ll end up unlearning and relearning a good bit.

Slide 14

Slide 14 text

Increase the size of points
    ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +
      geom_point(size = 3)
[Plot: the same scatter plot of Sepal.Width against Sepal.Length, with larger points]

Slide 15

Slide 15 text

Add some color
    ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
      geom_point(size = 3)
[Plot: Sepal.Width against Sepal.Length, points colored by Species (setosa, versicolor, virginica)]

Slide 16

Slide 16 text

Differentiate points by shape
    ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
      geom_point(aes(shape = Species), size = 3)
[Plot: Sepal.Width against Sepal.Length, points colored and shaped by Species (setosa, versicolor, virginica)]

Slide 17

Slide 17 text

Exercise 1
    # Make a small sample of the diamonds dataset
    d2 <- diamonds[sample(1:dim(diamonds)[1], 1000), ]
Then generate the plot below.
[Plot: scatter plot of price (y) against carat (x), points colored by color (D, E, F, G, H, I, J)]

Slide 18

Slide 18 text

Section 3 Box plots

Slide 19

Slide 19 text

See ?geom_boxplot for a list of options
    library(MASS)
    ggplot(birthwt, aes(factor(race), bwt)) +
      geom_boxplot()
[Plot: box plots of bwt (y) for each level of factor(race) (1, 2, 3)]

Slide 20

Slide 20 text

Section 4 Histograms

Slide 21

Slide 21 text

See ?geom_histogram for a list of options
    h <- ggplot(faithful, aes(x = waiting))
    h + geom_histogram(binwidth = 30, colour = "black")
[Plot: histogram of waiting (x) against count (y)]

Slide 22

Slide 22 text

    h <- ggplot(faithful, aes(x = waiting))
    h + geom_histogram(binwidth = 8, fill = "steelblue", colour = "black")
[Plot: histogram of waiting (x) against count (y), steelblue bars outlined in black]

Slide 23

Slide 23 text

Section 5 Line plots

Slide 24

Slide 24 text

    climate <- read.csv("climate.csv", header = TRUE)
    ggplot(climate, aes(Year, Anomaly10y)) +
      geom_line()
[Plot: line plot of Anomaly10y (y) against Year (x)]
Alternatively, read the file directly from GitHub:
    climate <- read.csv(text = RCurl::getURL("https://raw.github.com/karthikram/ggplot-lecture/master/climate.csv"))

Slide 25

Slide 25 text

We can also plot confidence regions
    ggplot(climate, aes(Year, Anomaly10y)) +
      geom_ribbon(aes(ymin = Anomaly10y - Unc10y, ymax = Anomaly10y + Unc10y),
                  fill = "blue", alpha = 0.1) +
      geom_line(color = "steelblue")
[Plot: Anomaly10y (y) against Year (x) with a shaded uncertainty band]

Slide 26

Slide 26 text

Exercise 2
• Modify the previous plot so that it shows three lines instead of one line with a confidence band.
[Plot: Anomaly10y (y) against Year (x), drawn as three lines]

Slide 27

Slide 27 text

Section 6 Bar plots

Slide 28

Slide 28 text

    ggplot(iris, aes(Species, Sepal.Length)) +
      geom_bar(stat = "identity")
[Plot: bar plot of Sepal.Length (y) for each Species (x: setosa, versicolor, virginica)]

Slide 29

Slide 29 text

    library(reshape2)
    df <- melt(iris, id.vars = "Species")
    ggplot(df, aes(Species, value, fill = variable)) +
      geom_bar(stat = "identity")
[Plot: stacked bars of value (y) by Species (x), filled by variable (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)]

Slide 30

Slide 30 text

Section 7 plyr and reshape are key for using R

Slide 31

Slide 31 text

plyr and reshape
These two packages are the Swiss Army knives of R.
• plyr
  1 ddply
  2 llply
  3 join
• reshape2
  1 melt
  2 dcast
  3 acast
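
A brief sketch of three of these functions applied to iris may help. The variable names means, long, and wide are made up for this example, and melt() and dcast() are taken from reshape2:

```r
library(plyr)
library(reshape2)

# ddply: split iris by Species, summarise each piece, combine into a data frame
means <- ddply(iris, "Species", summarise,
               mean_sepal_length = mean(Sepal.Length))

# melt: wide to long, one row per (Species, variable, value)
long <- melt(iris, id.vars = "Species")

# dcast: long back to wide, aggregating each Species/variable cell with the mean
wide <- dcast(long, Species ~ variable, value.var = "value",
              fun.aggregate = mean)
```

The long format produced by melt() is the shape that ggplot2 expects when one column should drive an aesthetic such as fill.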

Slide 32

Slide 32 text

    iris[1:2, ]
    ##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    ## 1          5.1         3.5          1.4         0.2  setosa
    ## 2          4.9         3.0          1.4         0.2  setosa
    df <- melt(iris, id.vars = "Species")
    df[1:2, ]
    ##   Species     variable value
    ## 1  setosa Sepal.Length   5.1
    ## 2  setosa Sepal.Length   4.9

Slide 33

Slide 33 text

    ggplot(df, aes(Species, value, fill = variable)) +
      geom_bar(stat = "identity", position = "dodge")
[Plot: side-by-side bars of value (y) by Species (x), filled by variable (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)]

Slide 34

Slide 34 text

Exercise 3
Using the d2 dataset you created earlier, generate the plot below. Take a quick look at the data first to see if it needs to be binned.
[Plot: bar plot of count (y) by clarity (x: I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF), filled by cut (Fair, Good, Very Good, Premium, Ideal)]

Slide 35

Slide 35 text

Exercise 4
• Using the climate dataset, create a new variable called sign. Make it logical (TRUE/FALSE) based on the sign of Anomaly10y.
• Draw a bar plot and use the sign variable as the fill.
[Plot: bar plot of Anomaly10y (y) against Year (x), filled by sign (FALSE, TRUE)]

Slide 36

Slide 36 text

Section 8 Density Plots

Slide 37

Slide 37 text

Density plots
    ggplot(faithful, aes(waiting)) +
      geom_density()
[Plot: density curve of waiting (x) against density (y)]

Slide 38

Slide 38 text

Density plots
    ggplot(faithful, aes(waiting)) +
      geom_density(fill = "blue", alpha = 0.1)
[Plot: density curve of waiting with a light blue fill]

Slide 39

Slide 39 text

    ggplot(faithful, aes(waiting)) +
      geom_line(stat = "density")
[Plot: density of waiting drawn as a line only]

Slide 40

Slide 40 text

Section 9 Mapping Variables to colors

Slide 41

Slide 41 text

Colors
    # Set all points to one color (a constant goes outside aes())
    geom_point(color = "black")
    # Or map the color of the points to a variable
    geom_point(aes(color = variable))
    # Then add a scale for the colors. Below we manually
    # define colors, but there are other ways (see next slide).
    # Use scale_color_manual() for color and scale_fill_manual() for fill.
    scale_color_manual(values = c("color1", "color2"))
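
The difference between setting a constant color and mapping a variable to color can be sketched as follows. The color names and the objects p_set and p_map are arbitrary choices for this example:

```r
library(ggplot2)

# Setting: a constant color goes outside aes(), so no legend is drawn
p_set <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(color = "black")

# Mapping: the color depends on a variable, so it goes inside aes();
# scale_color_manual() then assigns one color to each Species level
p_map <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  scale_color_manual(values = c(setosa = "darkorange",
                                versicolor = "purple",
                                virginica = "steelblue"))
```

Writing aes(color = "black") would treat the string as a one-level variable, which draws the points in the default palette color and adds a legend entry labeled "black".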

Slide 42

Slide 42 text

The RColorBrewer package
    library(RColorBrewer)
    display.brewer.all()

Slide 43

Slide 43 text

Using a color brewer palette
    df <- melt(iris, id.vars = "Species")
    ggplot(df, aes(Species, value, fill = variable)) +
      geom_bar(stat = "identity", position = "dodge") +
      scale_fill_brewer(palette = "Set1")
[Plot: side-by-side bars of value (y) by Species (x), filled by variable using the Set1 palette]

Slide 44

Slide 44 text

Manual color scale
    ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
      geom_point() +
      facet_grid(Species ~ .) +
      scale_color_manual(values = c("red", "green", "blue"))
[Plot: Sepal.Width against Sepal.Length, one panel per Species (setosa, versicolor, virginica) stacked vertically, colored red, green, and blue]

Slide 45

Slide 45 text

Refer to a color chart for beautiful visualizations
http://tools.medialab.sciences-po.fr/iwanthue/

Slide 46

Slide 46 text

Section 10 Faceting

Slide 47

Slide 47 text

Faceting along columns
    ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
      geom_point() +
      facet_grid(Species ~ .)
[Plot: Sepal.Width against Sepal.Length, one panel per Species (setosa, versicolor, virginica) stacked vertically]

Slide 48

Slide 48 text

and along rows
    ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
      geom_point() +
      facet_grid(. ~ Species)
[Plot: Sepal.Width against Sepal.Length, one panel per Species (setosa, versicolor, virginica) placed side by side]

Slide 49

Slide 49 text

or just wrap your panels
    ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
      geom_point() +
      facet_wrap(~ Species)
[Plot: Sepal.Width against Sepal.Length, wrapped into one panel per Species (setosa, versicolor, virginica)]

Slide 50

Slide 50 text

Section 11 Adding smoothers

Slide 51

Slide 51 text

    ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
      geom_point(aes(shape = Species), size = 3) +
      geom_smooth(method = "lm")
[Plot: Sepal.Width against Sepal.Length, points colored and shaped by Species, with a linear fit for each Species]

Slide 52

Slide 52 text

    ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
      geom_point(aes(shape = Species), size = 3) +
      geom_smooth(method = "lm") +
      facet_grid(. ~ Species)
[Plot: the same points and linear fits, one panel per Species (setosa, versicolor, virginica) placed side by side]

Slide 53

Slide 53 text

Section 12 Themes

Slide 54

Slide 54 text

Adding themes
Themes are a great way to define custom plots.
    + theme()
    # see ?theme for more options

Slide 55

Slide 55 text

A themed plot
    ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
      geom_point(size = 1.2, shape = 16) +
      facet_wrap(~ Species) +
      theme(legend.key = element_rect(fill = NA),
            legend.position = "bottom",
            strip.background = element_rect(fill = NA),
            axis.title.y = element_text(angle = 0))

Slide 56

Slide 56 text

Adding themes
[Plot: Sepal.Width against Sepal.Length, one panel per Species (setosa, versicolor, virginica), colored by Species]

Slide 57

Slide 57 text

ggthemes library
    install.packages("ggthemes")
    library(ggthemes)
    # Then add one of these themes to your plot
    + theme_stata()
    + theme_excel()
    + theme_wsj()
    + theme_solarized()

Slide 58

Slide 58 text

Section 13 Create functions to automate your plotting

Slide 59

Slide 59 text

Write functions for day to day plots
    # Template only: whatever_geoms() and theme(...) are placeholders
    my_custom_plot <- function(df, title = "", ...) {
      ggplot(df, ...) +
        ggtitle(title) +
        whatever_geoms() +
        theme(...)
    }
Then just call your function to generate a plot. It’s a lot easier to fix one function than to do it over and over for many plots.
    plot1 <- my_custom_plot(dataset1, title = "Figure 1")
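
One possible concrete version of this template is sketched below. The helper name my_scatter, its mapping argument, and the specific geom and theme settings are all assumptions chosen for illustration:

```r
library(ggplot2)

# Hypothetical helper: a scatter plot with a shared look.
# `mapping` is an aes() specification supplied by the caller.
my_scatter <- function(df, mapping, title = "") {
  ggplot(df, mapping) +
    geom_point(size = 2) +
    ggtitle(title) +
    theme_bw() +
    theme(legend.position = "bottom")
}

# The same function produces consistent figures for different datasets
plot1 <- my_scatter(iris, aes(Sepal.Length, Sepal.Width, color = Species),
                    title = "Figure 1")
plot2 <- my_scatter(mtcars, aes(wt, mpg), title = "Figure 2")
```

A later change to the look, such as a different legend position, then needs to be made in one place only.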

Slide 60

Slide 60 text

Section 14 Scales

Slide 61

Slide 61 text

Commonly used scales
    scale_fill_discrete(), scale_colour_discrete()
    scale_fill_hue(), scale_color_hue()
    scale_fill_manual(), scale_color_manual()
    scale_fill_brewer(), scale_color_brewer()
    scale_linetype(), scale_shape_manual()
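
As a small sketch of how two of these scales combine in one plot (the palette name and the shape codes are arbitrary picks for this example), each scale controls one aesthetic:

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width,
                      color = Species, shape = Species)) +
  geom_point(size = 3) +
  # Brewer palette for the color aesthetic
  scale_color_brewer(palette = "Dark2") +
  # Hand-picked point shapes, one per Species level
  scale_shape_manual(values = c(16, 17, 15))
```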

Slide 62

Slide 62 text

Adding a continuous scale
    library(MASS)
    ggplot(birthwt, aes(factor(race), bwt)) +
      geom_boxplot(width = 0.2) +
      scale_y_continuous(labels = paste(1:4, " Kg"),
                         breaks = seq(1000, 4000, by = 1000))
[Plot: box plots of bwt (y, labeled 1 Kg to 4 Kg) for each level of factor(race) (1, 2, 3)]

Slide 63

Slide 63 text

Another continuous scale with custom labels
    # Assign the plot to an object
    dd <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
      geom_point(size = 4, shape = 16) +
      facet_grid(. ~ Species)
    # Now add a scale
    dd + scale_y_continuous(breaks = seq(2, 8, by = 1),
                            labels = paste(2:8, " cm"))

Slide 64

Slide 64 text

Gradients
    h + geom_histogram(aes(fill = ..count..), color = "black") +
      scale_fill_gradient(low = "green", high = "red")
[Plot: histogram of waiting (x) against count (y), bars filled on a green-to-red gradient by count]

Slide 65

Slide 65 text

Section 15 Publication quality figures

Slide 66

Slide 66 text

• If the plot is on your screen
    ggsave("~/path/to/figure/filename.png")
• If your plot is assigned to an object
    ggsave(plot1, file = "~/path/to/figure/filename.png")
• Specify a size
    ggsave(file = "/path/to/figure/filename.png", width = 6, height = 4)
• Or any format (pdf, png, eps, svg, jpg)
    ggsave(file = "/path/to/figure/filename.eps")
    ggsave(file = "/path/to/figure/filename.jpg")
    ggsave(file = "/path/to/figure/filename.pdf")
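
A self-contained sketch of saving a stored plot at a fixed size is shown below. The file name and the use of tempdir() are assumptions made so the example runs anywhere; substitute your own path:

```r
library(ggplot2)

plot1 <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point()

# Write to the session's temporary directory (illustrative location only)
out <- file.path(tempdir(), "figure1.pdf")

# The output format is inferred from the file extension;
# width and height are in inches by default
ggsave(filename = out, plot = plot1, width = 6, height = 4)
```

Naming the plot argument explicitly avoids relying on whichever plot was displayed last.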

Slide 67

Slide 67 text

Further help
• You’ve just scratched the surface with ggplot2.
• Practice
• Read the docs (either locally in R or at http://docs.ggplot2.org/current/)
• Work together