Slide 1

Slide 1 text

Solving 8 visualisation challenges with ggplot2
Hadley Wickham
@hadleywickham
Chief Scientist, RStudio
November 2016

Slide 2

Slide 2 text

http://fivethirtyeight.com/features/our-47-weirdest-charts-from-2015/

Slide 3

Slide 3 text

https://flowingdata.com/tag/upshot/

Slide 4

Slide 4 text

1 Labelling plots
A problem ignored for too long
Solved by Bob Rudis

Slide 5

Slide 5 text

[Figure: scatterplot of hwy (20 to 40) against displ (2 to 7), points coloured by class (2seater, compact, midsize, minivan, pickup, subcompact, suv).]
Title: Fuel efficiency generally decreases with engine size
Subtitle: Two seaters (sports cars) are an exception because of their light weight
Caption: Data from fueleconomy.gov

Slide 6

Slide 6 text

Accessed with the labs() function

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(se = FALSE, method = "loess") +
  labs(
    title = "Fuel efficiency generally ...",
    subtitle = "Two seaters (sports cars) ...",
    caption = "Data from fueleconomy.gov"
  )

Slide 7

Slide 7 text

2 Axes

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

Stages of visualisation system popularity
1. Someone used it and complained about a bug
2. Someone used it in an academic paper
3. Someone used it in a newspaper
4. Someone used it to commit academic fraud
5. So many people use it that Google has autocompletes for bad graphics ideas

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

Isenberg, Petra, et al. "A study on dual-scale data charts." IEEE Transactions on Visualization and Computer Graphics 17.12 (2011): 2469-2478. https://www.lri.fr/~isenberg/publications/papers/Isenberg_2011_ASO.pdf

Slide 14

Slide 14 text

But...

Slide 15

Slide 15 text

[Figure: scatterplot of hwy against displ (2 to 7), with a primary y axis in mpg (20 to 40) and a secondary y axis in l / 100 km (6 to 20, in steps of 2).]

Slide 16

Slide 16 text

ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  scale_y_continuous(
    "mpg",
    sec.axis = sec_axis(
      ~ 235 / .,
      name = "l / 100 km",
      breaks = seq(2, 20, by = 2)
    )
  )

Only 1-to-1 transformations are allowed
~ 235 / . is shorthand for function(x) { 235 / x }
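The one-to-one requirement can be checked with a few lines of arithmetic. This is only a sketch: the helper name mpg_to_l100km is made up for illustration, and 235 is the rounded conversion constant used on the slide.

```r
# Hypothetical helper: convert US miles per gallon to litres per 100 km,
# using the rounded constant 235 from the slide.
mpg_to_l100km <- function(x) 235 / x

hwy <- c(20, 30, 40)
l100km <- mpg_to_l100km(hwy)

# Applying the same function again recovers the input, so every mpg value
# corresponds to exactly one l / 100 km value and vice versa.
round_trip <- mpg_to_l100km(l100km)
```

Because each value on the primary axis maps to exactly one value on the secondary axis, both axes describe the same points and differ only in units.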

Slide 17

Slide 17 text

[Figure: the same scatterplot of hwy against displ, with mpg on the primary y axis and l / 100 km on the secondary y axis.]

Slide 18

Slide 18 text

3 Labelling data

Slide 19

Slide 19 text

[Figure: scatterplot of hwy against displ, coloured by class, with one model per class labelled (corvette, caravan 2wd, altima, forester awd, toyota tacoma 4wd, jetta, new beetle).]
Labels drawn with geom_text()

Slide 20

Slide 20 text

[Figure: the same labelled scatterplot of hwy against displ, coloured by class.]
Labels drawn with geom_label()

Slide 21

Slide 21 text

[Figure: the same labelled scatterplot of hwy against displ, coloured by class.]
Labels drawn with geom_label_repel()
https://github.com/slowkow/ggrepel
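The slides show the labelled plots but not the code behind them. A minimal sketch follows, assuming the labels mark the most fuel-efficient model in each class; best_in_class is a name introduced here for illustration, and the ggrepel package must be installed.

```r
library(ggplot2)
library(dplyr)
library(ggrepel)

# Assumption: label one car per class, the one with the highest hwy.
best_in_class <- mpg %>%
  group_by(class) %>%
  filter(row_number(desc(hwy)) == 1) %>%
  ungroup()

# geom_label_repel() nudges the labels so they overlap neither each other
# nor the points they describe.
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_label_repel(aes(label = model), data = best_in_class)
```

Swapping geom_label_repel() for geom_text() or geom_label() gives the plain, non-repelling variants.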

Slide 22

Slide 22 text

[Figure: the same labelled scatterplot of hwy against displ, coloured by class, as drawn by the dev version.]

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

Two differences between a factor and a string:
1. Fixed set of possible values
2. Arbitrary order
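Both differences can be seen in base R. The month data below is made up purely for illustration.

```r
# Illustrative data: a few month abbreviations and the levels we allow.
x <- c("Dec", "Apr", "Jan", "Mar")
month_levels <- c("Jan", "Feb", "Mar", "Apr", "Dec")

# Strings sort alphabetically.
sorted_strings <- sort(x)

# Factors sort by the order of their levels, which we chose ourselves.
sorted_factor <- sort(factor(x, levels = month_levels))

# A value outside the fixed set of levels becomes NA.
typo <- factor("Jam", levels = month_levels)
```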

Slide 25

Slide 25 text

Some data from the General Social Survey

relig <- gss_cat %>%
  group_by(relig) %>%
  summarise(
    tvhours = mean(tvhours, na.rm = TRUE),
    n = n()
  )

Slide 26

Slide 26 text

[Figure: dot plot of tvhours (x axis, 2 to 4) by relig (y axis) in its default level order: No answer, Don't know, Inter-nondenominational, Native american, Christian, Orthodox-christian, Moslem/islam, Other eastern, Hinduism, Buddhism, Other, None, Jewish, Catholic, Protestant.]

Slide 27

Slide 27 text

[Figure: the same dot plot with the y axis set to fct_reorder(relig, tvhours), so the levels are ordered by tvhours: Other eastern, Hinduism, Buddhism, Orthodox-christian, Moslem/islam, Jewish, None, No answer, Other, Christian, Inter-nondenominational, Catholic, Protestant, Native american, Don't know.]
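The plotting code is not shown on the slides. A sketch of how the reordered dot plot might be produced, assuming the forcats package (which provides gss_cat and fct_reorder()); the column name relig_ordered is introduced here for illustration.

```r
library(dplyr)
library(forcats)
library(ggplot2)

relig <- gss_cat %>%
  group_by(relig) %>%
  summarise(
    tvhours = mean(tvhours, na.rm = TRUE),
    n = n()
  )

# Reorder the levels of relig by the value of tvhours.
relig <- relig %>%
  mutate(relig_ordered = fct_reorder(relig, tvhours))

p <- ggplot(relig, aes(tvhours, relig_ordered)) +
  geom_point()
```

The data values do not change; only the order of the factor levels, and therefore the order of the axis, does.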

Slide 28

Slide 28 text

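You have the same problem with more dimensions

# Count each age/marital combination, then group by age so that
# prop is the share of each marital status within an age.
by_age <- gss_cat %>%
  filter(!is.na(age)) %>%
  count(age, marital) %>%
  group_by(age) %>%
  mutate(prop = n / sum(n))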

Slide 29

Slide 29 text

[Figure: line chart of prop (0 to 1) against age (20 to 80), one line per marital status. Legend in default level order: No answer, Never married, Separated, Divorced, Widowed, Married.]

Slide 30

Slide 30 text

[Figure: the same line chart of prop against age, with the legend reordered: Widowed, Married, Divorced, Never married, No answer, Separated.]
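The slides do not name the function used to reorder the legend. One way to get this ordering, assuming forcats::fct_reorder2() (which orders levels by the y value at the largest x value), is sketched below.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Share of each marital status within each age.
by_age <- gss_cat %>%
  filter(!is.na(age)) %>%
  count(age, marital) %>%
  group_by(age) %>%
  mutate(prop = n / sum(n))

# Assumption: fct_reorder2() is what aligns the legend with the line ends.
p <- ggplot(by_age, aes(age, prop, colour = fct_reorder2(marital, age, prop))) +
  geom_line() +
  labs(colour = "marital")

# Sanity check: the proportions within each age add up to 1.
totals <- summarise(by_age, total = sum(prop))
```

With this ordering the legend entries appear in the same vertical order as the lines at the right-hand edge of the plot, which makes them easier to match up.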

Slide 31

Slide 31 text

5 Missing values

Slide 32

Slide 32 text

An explicit missing value (NA) is the presence of an absence; an implicit missing value is the absence of a presence.
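A small sketch of the distinction, using made-up data and assuming the tidyr package: one value is explicitly NA, and one row is simply absent.

```r
library(tibble)
library(tidyr)

# Illustrative data: the return for 2015 Q4 is explicitly missing (NA);
# the row for 2016 Q1 is implicitly missing (it is not there at all).
stocks <- tibble(
  year   = c(2015, 2015, 2015, 2015, 2016, 2016, 2016),
  qtr    = c(1, 2, 3, 4, 2, 3, 4),
  return = c(1.88, 0.59, 0.35, NA, 0.92, 0.17, 2.66)
)

# complete() adds a row for every combination of year and qtr,
# turning the implicit missing value into an explicit NA.
completed <- complete(stocks, year, qtr)
```

After complete(), both kinds of missingness show up as NA, so a plot or summary can treat them the same way.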

Slide 33

Slide 33 text

Demo

Slide 34

Slide 34 text

6 Histograms

Slide 35

Slide 35 text

hist(1:4)

Slide 36

Slide 36 text

Equivalent ggplot2 code is a little longer

df <- tibble(x = 1:4)
df %>%
  ggplot(aes(x)) +
  geom_histogram(binwidth = 1)

Slide 37

Slide 37 text

[Figure: histogram of x (1 to 4) with count on the y axis (0 to 1). Bins: (0.5, 1.5], (1.5, 2.5], (2.5, 3.5], (3.5, 4.5].]
Thanks to Randall Pruim

Slide 38

Slide 38 text

df %>%
  ggplot(aes(x)) +
  geom_histogram(
    binwidth = 1,
    boundary = 0
  )

df %>%
  ggplot(aes(x)) +
  geom_histogram(
    binwidth = 1,
    boundary = 0,
    closed = "left"
  )

Slide 39

Slide 39 text

[Figure: histogram of x (1 to 4) with count on the y axis (0 to 2). Bins: [1, 2], (2, 3], (3, 4].]

Slide 40

Slide 40 text

[Figure: histogram of x (1 to 4) with count on the y axis (0 to 2). Interval labels on the slide: (0.5, 1.5], [1, 2), [2, 3), [3, 4].]
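The effect of closing bins on the right or on the left can be reproduced with base R's cut(). This is an illustration of the interval logic only, not ggplot2's own binning code.

```r
x <- 1:4
breaks <- 1:4

# Bins closed on the right: [1,2], (2,3], (3,4]
right_closed <- table(cut(x, breaks, right = TRUE, include.lowest = TRUE))

# Bins closed on the left: [1,2), [2,3), [3,4]
left_closed <- table(cut(x, breaks, right = FALSE, include.lowest = TRUE))
```

With right-closed bins the values 1 and 2 share the first bin; with left-closed bins the values 3 and 4 share the last one.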

Slide 41

Slide 41 text

[Figure: histogram of x (1 to 4) with count on the y axis (0 to 2). Interval labels on the slide: (0.5, 1.5], [0.99999, 1.99999), [1.99999, 2.99999), [2.99999, 4.00001].]

Slide 42

Slide 42 text

7 Bar charts

Slide 43

Slide 43 text

[Figure: bar chart of count (0 to 60) by class (2seater, compact, midsize, minivan, pickup, subcompact, suv).]

ggplot(mpg, aes(class)) +
  geom_bar(colour = "white")

Slide 44

Slide 44 text

[Figure: bar chart of count (0 to 60) by class, drawn with group = id.]

ggplot(mpg, aes(class, group = id)) +
  geom_bar(col = "white")
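The mpg data set that ships with ggplot2 has no id column, so the code above needs one to be added first. A minimal sketch, assuming id is simply a row number (the name mpg_id is introduced here for illustration):

```r
library(ggplot2)

# Assumption: id identifies each row (each car) in mpg.
mpg_id <- mpg
mpg_id$id <- seq_len(nrow(mpg_id))

# Grouping by id makes each car its own stacked segment within a bar.
p <- ggplot(mpg_id, aes(class, group = id)) +
  geom_bar(colour = "white")

# One computed rectangle per car, each with a count of 1.
bars <- layer_data(p)
```

The white outline then separates the individual observations, so each bar reads as a stack of cases rather than a single block.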

Slide 45

Slide 45 text

[Figure: bar chart of count (0 to 60) by class, drawn with group = id and filled by drv (4, f, r).]

ggplot(mpg, aes(class, group = id, fill = drv)) +
  geom_bar(col = "white")

Slide 46

Slide 46 text

[Figure: bar chart of count (0 to 60) by class, filled by drv (4, f, r).]

ggplot(mpg, aes(class, fill = drv)) +
  geom_bar(col = "white")

Slide 47

Slide 47 text

Another type of bar chart displays summaries

class_mpg <- mpg %>%
  group_by(class) %>%
  summarise(
    mean = mean(hwy),
    # 1.96 standard errors: the half-width of an approximate 95% interval
    se = 1.96 * sd(hwy) / sqrt(n())
  )

Slide 48

Slide 48 text

[Figure: bar chart of mean (0 to 20) by class.]

ggplot(class_mpg, aes(class, mean)) +
  geom_bar(stat = "identity")

Slide 49

Slide 49 text

[Figure: the same bar chart of mean by class.]

ggplot(class_mpg, aes(class, mean)) +
  geom_col()
# Thanks to Bob Rudis

Slide 50

Slide 50 text

[Figure: dot plot of mean (y axis, 20 to 28) by class.]

Slide 51

Slide 51 text

[Figure: dot plot of mean (y axis, 15 to 30) by class.]
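The code for these dot plots is not on the slides. One plausible version, assuming the wider y range on the second plot comes from drawing an interval of mean plus or minus se around each point, is sketched here with geom_pointrange().

```r
library(dplyr)
library(ggplot2)

class_mpg <- mpg %>%
  group_by(class) %>%
  summarise(
    mean = mean(hwy),
    se = 1.96 * sd(hwy) / sqrt(n())
  )

# Assumption: show each mean as a point with an interval of +/- se.
p <- ggplot(class_mpg, aes(class, mean)) +
  geom_pointrange(aes(ymin = mean - se, ymax = mean + se))
```

Unlike a bar, a point does not need the y axis to start at zero, so the plot can zoom in on the range where the means differ.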

Slide 52

Slide 52 text

8 ggplot2 extension 9 10 11

Slide 53

Slide 53 text

ggplot2 2.0.0 introduced a formal extension mechanism
https://www.ggplot2-exts.org, by Daniel Emaasit

Slide 54

Slide 54 text

ggraph, by Thomas Lin Pedersen https://github.com/thomasp85/ggraph

Slide 55

Slide 55 text

ggseas, by Peter Ellis
https://github.com/ellisp/ggseas
Uses X13-SEATS-ARIMA in the seasonal package

Slide 56

Slide 56 text

gganimate by David Robinson https://github.com/dgrtwo/gganimate

Slide 57

Slide 57 text

Conclusion

Slide 58

Slide 58 text

1 Labelling plots
A problem ignored for too long
Solved by Bob Rudis

Slide 59

Slide 59 text

2 Axes

Slide 60

Slide 60 text

3 Labelling data

Slide 61

Slide 61 text

No content

Slide 62

Slide 62 text

5 Missing values

Slide 63

Slide 63 text

6 Histograms

Slide 64

Slide 64 text

7 Bar charts

Slide 65

Slide 65 text

8 ggplot2 extension 9 10 11

Slide 66

Slide 66 text

Many of the features I discussed here have been added in recent versions of ggplot2.

See the release notes for more detail.

Slide 67

Slide 67 text

http://ggplot2.tidyverse.org