Slide 1

Slide 1 text

How to make graphs in R: From default to publication-quality graphics. @anchu, 2nd Hanoi UseRs Meetup

Slide 2

Slide 2 text

Thank you for coming!

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

About me

Slide 5

Slide 5 text

Download this PDF https://speakerdeck.com/chuvanan

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

This talk is . . .

Slide 8

Slide 8 text

This talk is . . . not really for beginners

Slide 9

Slide 9 text

This talk is . . . not really for beginners not really for experts either

Slide 10

Slide 10 text

This talk is . . . not really for beginners not really for experts either not a comprehensive treatment of data visualization

Slide 11

Slide 11 text

This talk is . . . not really for beginners not really for experts either not a comprehensive treatment of data visualization not about design

Slide 12

Slide 12 text

This talk is . . . not really for beginners not really for experts either not a comprehensive treatment of data visualization not about design not about R’s ggplot2 (sorry, ggplot2 folks)

Slide 13

Slide 13 text

This talk is . . .

Slide 14

Slide 14 text

This talk is . . . a gentle introduction to graphing data with R

Slide 15

Slide 15 text

This talk is . . . a gentle introduction to graphing data with R (mostly) about statistical graphics

Slide 16

Slide 16 text

This talk is . . . a gentle introduction to graphing data with R (mostly) about statistical graphics (entirely) about R’s base graphics

Slide 17

Slide 17 text

This talk is . . . a gentle introduction to graphing data with R (mostly) about statistical graphics (entirely) about R’s base graphics (hopefully) helping you to improve your visualization skills

Slide 18

Slide 18 text

This talk is . . . a gentle introduction to graphing data with R (mostly) about statistical graphics (entirely) about R’s base graphics (hopefully) helping you to improve your visualization skills . . . . (hopefully) fun and entertaining

Slide 19

Slide 19 text

Statistical Graphics

Slide 20

Slide 20 text

Some (frequent) encounters from the web/publications

Slide 21

Slide 21 text

Some (frequent) encounters from the web/publications

Slide 22

Slide 22 text

Some (frequent) encounters from the web/publications

Slide 23

Slide 23 text

Some (frequent) encounters from the web/publications

Slide 24

Slide 24 text

Some (frequent) encounters from the web/publications

Slide 25

Slide 25 text

We can do better

Slide 26

Slide 26 text

We can do better with R

Slide 27

Slide 27 text

Basic principles of visualization

Slide 28

Slide 28 text

Visualization components: visual encodings, coordinate system, scale, context (Data Points, Nathan Yau)

Slide 29

Slide 29 text

Decomposing visualization [Figure: scatterplot of stopping distance (ft) against speed (mph). Title: "Drivers: Keep your speed down!" Subtitle: "Higher speed, longer distance taken to stop". Source: Ezekiel, M. (1930). Wiley.]

Slide 30

Slide 30 text

[Figure: the same scatterplot of stopping distance (ft) against speed (mph), with its components labelled: Visual Cues, Coordinate System, Scale, Annotation. Title: "Drivers: Keep your speed down!" Subtitle: "Higher speed, longer distance taken to stop". Source: Ezekiel, M. (1930). Wiley.]

Slide 31

Slide 31 text

Visual encodings Mapping data into visual properties

Slide 32

Slide 32 text

Different methods of encoding the same data set [Figure: six panels, each showing categories A to E encoded in a different way]

Slide 33

Slide 33 text

Position, Length, Angle, Direction, Shapes, Area, Color hue, Color saturation, Shading

Slide 34

Slide 34 text

How to Choose Appropriate Visual Encodings

Slide 35

Slide 35 text

What is your question?

Slide 36

Slide 36 text

Graphical perception

Slide 37

Slide 37 text

Graphical perception

Slide 38

Slide 38 text

Graphical perception 1. Position along a common scale 2. Position on identical but nonaligned scales 3. Length 4. Angle. Slope 5. Direction 6. Area 7. Volume. Density. Color saturation 8. Color hue (W. Cleveland and R. McGill, 1985)

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

Example

Slide 41

Slide 41 text

Redesign [Figure: CPI by quarter, Q1 2010 to Q1 2017, y-axis from 0% to 15%, with the value 4.96% labelled. Title: "Q1 2017 CPI rose the most in the past 3 years". Subtitle: "Measured as the change from the same period of the previous year (CPI)". Source: GSO]

Slide 42

Slide 42 text

Coordinate system [Figure: Cartesian (axes x and y), Polar (radius r) and Geographic coordinate systems]

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

No content

Slide 45

Slide 45 text

Scales

Slide 46

Slide 46 text

Linear: 1, 2, 3, 4, 5
Logarithmic: 1, 10, 100, 1000, 10000
Categorical: A, B, C, D, E
Ordinal: strongly disagree, disagree, neutral, agree, strongly agree
Percent: 0%, 25%, 50%, 75%, 100%
Time: Jan, Feb, Mar, Apr, May
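As a rough sketch of how the linear and logarithmic scales above differ in base graphics, the same points can be drawn twice, once with the default axis and once with `log = "x"`. The y values are just the index of each point and carry no meaning.

```r
x <- 10^(0:4)                       # 1, 10, 100, 1000, 10000
op <- par(mfrow = c(1, 2))          # two panels side by side
plot(x, seq_along(x), type = "b", main = "Linear x-axis")
plot(x, seq_along(x), type = "b", log = "x", main = "Logarithmic x-axis")
par(op)                             # restore the previous layout
```

On the linear axis the first four points crowd together near zero; on the logarithmic axis they are evenly spaced.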

Slide 47

Slide 47 text

Example

Slide 48

Slide 48 text

[Figure: the same data drawn on a truncated axis (87 to 94) and on a full-scale axis (0 to 100)]

Slide 49

Slide 49 text

Example

Slide 50

Slide 50 text

[Figure: the same data drawn with a non-zero baseline (axis from 10k to 18k) and with a zero baseline (axis from 0 to 20k)]
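A minimal sketch of the baseline comparison, using made-up numbers (the `sales` vector is hypothetical, not the data behind the slide). In `barplot()`, `ylim` sets the axis range and `xpd = FALSE` clips the bars to the plot region.

```r
sales <- c(A = 15200, B = 16100, C = 17400)    # made-up values for illustration
op <- par(mfrow = c(1, 2))
barplot(sales, ylim = c(14000, 18000), xpd = FALSE,
        main = "Non-zero baseline")            # differences look exaggerated
barplot(sales, ylim = c(0, 20000),
        main = "Zero baseline")                # bar length matches the value
par(op)
```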

Slide 51

Slide 51 text

[Figure: "Petrol price chart (VND)", dates from 20/09/16 to 05/04/17 on the x-axis, prices from 16000 to 19000 on the y-axis]

Slide 52

Slide 52 text

Annotation “The annotation layer is the most important thing we do. . . otherwise it’s a case of here it is, you go figure it out.” (Amanda Cox, Graphics Editor, NYTimes)

Slide 53

Slide 53 text

Annotation: title, subtitle, axis labels, source, user guide

Slide 54

Slide 54 text

Example

Slide 55

Slide 55 text

Example

Slide 56

Slide 56 text

The R Plotting System

Slide 57

Slide 57 text

R graphics system [Diagram of packages: Cairo, tikzDevice, JavaGD, lattice, ggplot2, vcd, grImport, gridBase, maps, diagram, plotrix, gplots, pixmap, grid, graphics, grDevices]

Slide 58

Slide 58 text

R graphics system
grDevices: the graphics engine, which provides facilities such as selecting colors, fonts and output formats.
Two (largely incompatible) packages are built on top of the graphics engine:
graphics (aka base graphics): S’s legacy. It provides both high-level and low-level functions for creating plots.
grid: unique to R. It offers the low-level tools on which ggplot2 and lattice are built.

Slide 59

Slide 59 text

Base graphics vs ggplot2

Slide 60

Slide 60 text

Base graphics vs ggplot2
base graphics: accepts many types of input (vectors, data frames, matrices, ...); quicker to get going (lots of single-call functions); better performance (speedier); easier to customize (mimics the ’painter's model’: output occurs in steps); awkward workflow (sometimes).
ggplot2: data-frame-centered; steeper learning curve (to master the conceptual framework); better defaults (generally); highly extensible, more efficient in the long run (thanks to its paradigm-based design); seamless integration with the tidyverse.

Slide 61

Slide 61 text

Why I choose base graphics: both packages are super! But base graphics might be better for beginners who want to make graphs quickly, with as little mental effort as possible.

Slide 62

Slide 62 text

Graphical Functions

Slide 63

Slide 63 text

The base graphics model: first, call a high-level function to make a complete plot. Then, call low-level functions to add more output (if necessary).

Slide 64

Slide 64 text

plot(x = pressure$temperature, y = pressure$pressure, xlab = "temperature", ylab = "pressure")
[Output: scatterplot of pressure against temperature]

Slide 65

Slide 65 text

plot(x = pressure$temperature, y = pressure$pressure, ann = FALSE)
text(150, 600, "Pressure (mmHg)\nvs.\nTemperature (Celsius)")
[Output: the same scatterplot without axis labels; the three-line text "Pressure (mmHg) vs. Temperature (Celsius)" is drawn inside the plot region]

Slide 66

Slide 66 text

High-level plotting functions: plot(): scatterplot (or line plot); barplot(): bar chart; pie(): pie chart; boxplot(): box plot; hist(): histogram; stripchart(): 1-d scatterplot
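A quick tour of the functions listed above, one call each, might look like the sketch below. The choice of built-in data sets (cars, mtcars, faithful) is mine and is not taken from the demo scripts.

```r
# One call per high-level function, using data sets that ship with R.
op <- par(mfrow = c(2, 3))                 # 2 x 3 grid, filled row by row

plot(cars$speed, cars$dist)                # scatterplot
counts <- table(mtcars$cyl)                # number of cars per cylinder count
mids <- barplot(counts)                    # bar chart; returns the bar midpoints
pie(counts)                                # pie chart
boxplot(mpg ~ cyl, data = mtcars)          # one box per group
h <- hist(faithful$eruptions)              # histogram; returns the bin details
stripchart(mpg ~ cyl, data = mtcars)       # 1-d scatterplot per group

par(op)                                    # restore the previous layout
```

Several of these functions return useful values invisibly: `barplot()` gives the x positions of the bars and `hist()` the computed bins, which helps when adding labels afterwards.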

Slide 67

Slide 67 text

Visualizing distributions Demo visualize_distributions.R

Slide 68

Slide 68 text

Revealing changes Demo reveal_changes.R

Slide 69

Slide 69 text

Showing relationships Demo show_relationships.R

Slide 70

Slide 70 text

Making comparisons Demo make_comparisions.R

Slide 71

Slide 71 text

Some arguments are accepted by many high-level functions: xlab: x-axis label; ylab: y-axis label; main: plot title; sub: plot subtitle; col: color; lty: line type; lwd: line width; xlim: x-axis scale limits; ylim: y-axis scale limits
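To see these shared arguments working together, here is a small sketch with the built-in pressure data set. The title, colors and limits are arbitrary choices for illustration.

```r
plot(pressure$temperature, pressure$pressure,
     type = "l",                               # a line instead of points
     main = "Vapor pressure of mercury",       # plot title
     sub  = "Data: R's built-in pressure data set",
     xlab = "Temperature (deg C)",
     ylab = "Pressure (mm Hg)",
     col  = "steelblue", lty = 2, lwd = 2,     # dashed, thicker, blue line
     xlim = c(0, 400), ylim = c(0, 1000))      # axis ranges
usr <- par("usr")                              # extent of the plot region
```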

Slide 72

Slide 72 text

Going beyond the default

Slide 73

Slide 73 text

The layout of graphics [Figure: the Plot Region surrounded by Margin 1 (bottom), Margin 2 (left), Margin 3 (top) and Margin 4 (right)]

Slide 74

Slide 74 text

Control the margins: par() controls low-level graphics by setting graphical parameters. Set the margin sizes in inches: par(mai = c(2, 2, 1, 1)). Set the margin sizes in lines of text: par(mar = c(4, 4, 2, 2)).
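A common pattern, sketched here with arbitrary margin values, is to keep the value that `par()` returns so that the previous settings can be restored afterwards. The four numbers are ordered bottom, left, top, right.

```r
op <- par(mar = c(5, 5, 2, 1))   # bottom, left, top, right, in lines of text
plot(cars)
current <- par("mar")            # query the margins now in effect
par(op)                          # restore whatever was set before
```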

Slide 75

Slide 75 text

[Figure: margin lines of text labelled around the plot region: line 1 to line 4 on two sides, line 1 to line 2 on the other two]

Slide 76

Slide 76 text

Multifigure layouts: par() can be used to set up multiple figures on one page. Row by row: par(mfrow = c(2, 2)). Column by column: par(mfcol = c(2, 2)).
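The difference between the two fill orders is easiest to see by numbering the panels as they are drawn. In this sketch `draw_four()` is a hypothetical helper written for the illustration.

```r
# Label each panel with the order in which it was drawn.
draw_four <- function() {
  for (i in 1:4) {
    plot.new()                            # advance to the next figure
    box()                                 # outline the plot region
    text(0.5, 0.5, paste("figure", i))    # label in the middle of the panel
  }
}

op <- par(mfrow = c(2, 2), mar = c(1, 1, 1, 1))
layout_dims <- par("mfrow")   # rows and columns of the layout in effect
draw_four()                   # figure 2 lands to the right of figure 1
par(mfcol = c(2, 2))
draw_four()                   # figure 2 lands below figure 1
par(op)                       # restore the previous settings
```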

Slide 77

Slide 77 text

[Figure: a 2 x 2 layout with panels labelled figure 1, figure 2, figure 3 and figure 4]

Slide 78

Slide 78 text

Draw axes: axis() can be used to draw an axis on any of the four sides of a plot. The basic call is axis(side). Possible values of side: 1 = bottom, 2 = left, 3 = top, 4 = right.

Slide 79

Slide 79 text

axis() example
par(mfrow = c(2, 2))
plot(cars, ann = FALSE, axes = FALSE)
axis(1)
plot(cars, ann = FALSE, axes = FALSE)
axis(2)
plot(cars, ann = FALSE, axes = FALSE)
axis(3)
plot(cars, ann = FALSE, axes = FALSE)
axis(4)

Slide 80

Slide 80 text

[Output: four panels of the cars scatterplot, each with a single axis drawn on side 1, 2, 3 or 4]

Slide 81

Slide 81 text

Axis customization: axis(side, at = , labels = , las = , line = , cex.axis = , col.axis = ). at: positions of the tick marks. labels: text to draw at the tick marks. las: orientation of the labels (0, 1, 2, 3). line: the number of lines into the margin at which the axis line will be drawn. cex.axis and col.axis: size and color of the tick labels.

Slide 82

Slide 82 text

axis() example (cont.)
par(mfrow = c(2, 2), mar = c(2, 2, 2, 2))
plot(cars, ann = FALSE, axes = FALSE)
axis(1, at = c(4, 12, 15, 15.4, 19, 25), labels = c("min", "1st Qu. ", "median", "mean", "3rd Qu.", "ma
plot(cars, ann = FALSE, axes = FALSE)
axis(3, cex.axis = 2)
plot(cars, ann = FALSE, axes = FALSE)
axis(2, las = 1)
plot(cars, ann = FALSE, axes = FALSE)
axis(4, col.axis = "gray40")

Slide 83

Slide 83 text

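[Output: four panels of the cars scatterplot: a bottom axis labelled min, 1st Qu., 3rd Qu. and max; a top axis with enlarged labels; a left axis with horizontal labels; a right axis with gray labels]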
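The labels vector in the slide's first axis() call is cut off in this transcript. Rather than guess the missing part, the sketch below takes a different route and derives both the tick positions and their labels from `summary(cars$speed)`, so it is a variation on the example and not a reconstruction of it.

```r
s <- summary(cars$speed)                 # min, quartiles, median, mean, max
plot(cars, ann = FALSE, axes = FALSE)    # no labels, no default axes
axis(1, at = as.numeric(s),              # ticks at the summary statistics
     labels = names(s),                  # "Min.", "1st Qu.", and so on
     las = 2, cex.axis = 0.8)            # perpendicular, slightly smaller labels
axis(2, las = 1, col.axis = "gray40")    # horizontal, gray labels on the left
box()                                    # frame around the plot region
```

When neighbouring labels would overlap (the median and the mean are close here), axis() silently skips some of them, which may be why the slide's output shows only four labels.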

Slide 84

Slide 84 text

Plot annotation: mtext() can be used to place text in the margins of a plot. mtext(text = , side = ). side: the side of the plot to draw on (1 = bottom, 2 = left, 3 = top, 4 = right).

Slide 85

Slide 85 text

mtext() example
par(mar = c(4, 4, 4, 4))
plot(1:10, 1:10, ann = FALSE)
mtext("X-axis label", side = 1, line = 2)
mtext("Y-axis label", side = 2, line = 2)
mtext("Title", side = 3, line = 3)
mtext("Subtitle", side = 3, line = 1.5)
mtext("Source", side = 1, line = 3)

Slide 86

Slide 86 text

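[Output: plot of 1:10 against 1:10 with "X-axis label" and "Source" in the bottom margin, "Y-axis label" in the left margin, and "Title" and "Subtitle" in the top margin]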

Slide 87

Slide 87 text

Customizing plot annotation: line: the margin line on which the text is drawn, counting outwards from the plot. adj: alignment along the reading direction, in [0, 1]. outer: use outer margins if available (TRUE or FALSE). cex: size (expansion factor). col: color. font: 1 = regular, 2 = bold, 3 = italic, 4 = bold italic.

Slide 88

Slide 88 text

Recap: set the margins with par(mar = ), draw axes with axis(), add annotation with mtext()
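Put together, the three steps of the recap could look like the following sketch on the cars data. The margin sizes, line offsets and wording of the labels are illustrative choices only.

```r
op <- par(mar = c(5, 5, 4, 2))                       # 1. set the margins
plot(cars, ann = FALSE, axes = FALSE, pch = 19)      # bare plot, no decoration
axis(1)                                              # 2. draw the axes
axis(2, las = 1)                                     #    horizontal tick labels
mtext("Speed (mph)", side = 1, line = 2.5)           # 3. annotate the margins
mtext("Stopping distance (ft)", side = 2, line = 3)
mtext("Higher speed, longer distance taken to stop",
      side = 3, line = 2, adj = 0, font = 2)         # bold, left-aligned title
mtext("Data: R's built-in cars data set",
      side = 1, line = 4, adj = 1, cex = 0.8, col = "gray40")
par(op)                                              # restore the old margins
```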

Slide 89

Slide 89 text

Special topic: Color

Slide 90

Slide 90 text

Choosing color is hard

Slide 91

Slide 91 text

Choosing color is hard R has: 657 named colors (run: colors())

Slide 92

Slide 92 text

Choosing color is hard R has: 657 named colors (run: colors()) 7 default color sets (rainbow(), heat.colors(), terrain.colors(), topo.colors(), cm.colors(), gray.colors())

Slide 93

Slide 93 text

Choosing color is hard R has: 657 named colors (run: colors()) 7 default color sets (rainbow(), heat.colors(), terrain.colors(), topo.colors(), cm.colors(), gray.colors()) and a bunch of color packages (viridis, RColorBrewer, colorspace, ...)
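The built-in options are easy to inspect from the console. A short sketch, showing only two of the palette functions named above:

```r
n_named <- length(colors())          # number of built-in color names
pal <- heat.colors(5)                # five colors, returned as hex strings
gray_pal <- gray.colors(5)           # five shades of gray
barplot(rep(1, 5), col = pal, axes = FALSE,
        names.arg = seq_along(pal))  # equal-height bars as color swatches
```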

Slide 94

Slide 94 text

When choosing color: Above all, do no harm. (Edward R. Tufte)

Slide 95

Slide 95 text

Think carefully about what you are using color for. Fundamental uses of color in visualization: to label (color as noun); to measure (color as quantity); to represent and imitate reality (color as representation); to decorate (color as beauty) (Edward R. Tufte)

Slide 96

Slide 96 text

[Figure: submissions by hour of day (8h to 24h) for rtLab, R&A and rtSolution, with a color scale from "less submissions" to "more submissions". Title: "When do you submit your daily report?" Subtitle: "A survey of RTA's employees". Annotation: "rtLab's guys did it pretty early". Source: Form RTA_Daily_Report]

Slide 97

Slide 97 text

Consult experts:

Slide 98

Slide 98 text

Consult experts:

Slide 99

Slide 99 text

Going beyond the default Demo customize_traditional_graphics.R

Slide 100

Slide 100 text

No content

Slide 101

Slide 101 text

Adding details with graphical primitives

Slide 102

Slide 102 text

Some functions for drawing basic graphical primitives: points(): draw data symbols at (x, y); lines(): draw lines between locations (x, y); abline(): draw straight lines; segments(): draw line segments between (x0, y0) and (x1, y1); arrows(): draw line segments with arrowheads; rect(): draw rectangles; polygon(): draw one or more polygons; text(): draw text at locations (x, y); legend(): draw legends

Slide 103

Slide 103 text

Drawing points Basic call: points(x, y, pch = , col = ) pch: plotting symbols col: color of point

Slide 104

Slide 104 text

Plotting symbols [Figure: the symbols drawn for pch = 1 to pch = 21]
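A chart of the symbols can be regenerated with a few lines. This is only a sketch; the sizes and the gray fill are arbitrary.

```r
pch_values <- 1:21
plot(pch_values, rep(1, 21), pch = pch_values, cex = 2,
     ylim = c(0.5, 1.5), axes = FALSE, ann = FALSE,
     bg = "gray80")                               # fill color, used by pch = 21
text(pch_values, 0.8, labels = pch_values, cex = 0.8)  # number under each symbol
```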

Slide 105

Slide 105 text

Drawing connected lines. Basic call: lines(x, y, lty = , lwd = , col = , type = ). lty: line type (“blank”, “solid”, “dashed”, “dotted”, “dotdash”, “longdash”, “twodash”). lwd: line width. col: color of the lines. type: type of line drawn.

Slide 106

Slide 106 text

Line texture [Figure: sample lines drawn with lty = 1 to lty = 6]

Slide 107

Slide 107 text

Line graph variations Other forms can be made by lines(): type="l": line graph (default) type="s": step - horizontal first type="S": step - vertical first type="h": high density plot type="b": both points and lines type="o": over-plotting of points and lines

Slide 108

Slide 108 text

[Figure: six panels showing the same data drawn with type = "l", "s", "S", "h", "b" and "o"]
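The six variants can be compared side by side with a loop. The y values below are made up for the illustration.

```r
types <- c("l", "s", "S", "h", "b", "o")
x <- 1:10
y <- c(2, 5, 3, 8, 6, 9, 4, 7, 5, 10)          # made-up values for illustration
op <- par(mfrow = c(2, 3), mar = c(2, 2, 2, 1))
for (tp in types) {
  plot(x, y, type = "n", main = paste("type =", tp))  # empty frame with axes
  lines(x, y, type = tp)                               # add the line variant
}
par(op)                                        # restore the previous settings
```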

Slide 109

Slide 109 text

Drawing straight lines Basic calls: abline(a = , b = ) abline(h = ) abline(v = ) a and b: specifies a line intercept and slope h: horizontal lines v: vertical lines Other arguments: lty, col, lwd
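For instance, abline() can overlay a fitted regression line and reference lines on a scatterplot. This sketch uses `lm()` on the cars data, which is my choice of example.

```r
plot(cars)
fit <- lm(dist ~ speed, data = cars)            # simple linear regression
abline(a = coef(fit)[1], b = coef(fit)[2],      # intercept and slope
       col = "firebrick", lwd = 2)
abline(h = mean(cars$dist), lty = 2)            # horizontal line: mean distance
abline(v = mean(cars$speed), lty = 3)           # vertical line: mean speed
```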

Slide 110

Slide 110 text

Drawing line segments. Basic call: segments(x0, y0, x1, y1). (x0, y0, x1, y1) give the locations of the start and end points of the segments. Other arguments: lty, col, lwd

Slide 111

Slide 111 text

Drawing arrows. Basic call: arrows(x0, y0, x1, y1, code = , length = , angle = ). (x0, y0, x1, y1) give the locations of the start and end points of the arrows. code = 1: head at the start; code = 2: head at the end; code = 3: heads at both ends. length: length of the arrow head. angle: angle between the head and the shaft.
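A sketch combining arrows() and segments() to call out one observation. The offsets and the label text are arbitrary.

```r
plot(cars)
far <- cars[which.max(cars$dist), ]            # the longest stopping distance
arrows(x0 = far$speed - 6, y0 = far$dist,      # start: to the left of the point
       x1 = far$speed - 0.5, y1 = far$dist,    # end: just short of the point
       code = 2, length = 0.1, angle = 20)     # head at the end only
text(far$speed - 6, far$dist, "longest stop", pos = 2)  # label left of the arrow
segments(x0 = min(cars$speed), y0 = 0,         # dashed reference line at zero
         x1 = max(cars$speed), y1 = 0, lty = 2)
```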

Slide 112

Slide 112 text

Drawing rectangles Basic call: rect(x0, y0, x1, y1, col = , border = ) (x0, y0, x1, y1) gives the locations of opposite corners of the rectangles. col and border give color of the interior and border. Others: lty and lwd

Slide 113

Slide 113 text

Drawing polygons. Basic call: polygon(x, y, col = , border = ). x, y give the coordinates of the polygon vertices. col and border give the color of the interior and the border. Others: lty and lwd.
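Because output occurs in steps, shapes are usually drawn first and the data last. A sketch, in which the highlighted region and the colors are arbitrary; note that R names the rect() corners xleft, ybottom, xright, ytop.

```r
plot(cars, type = "n")                                  # empty frame with axes
rect(xleft = 10, ybottom = 0, xright = 15, ytop = 60,   # highlight a region
     col = "gray90", border = NA)
# Shade the area under the line of mean distance per speed.
m <- aggregate(dist ~ speed, data = cars, FUN = mean)   # sorted by speed
polygon(x = c(m$speed, rev(range(m$speed))),            # along the line, then
        y = c(m$dist, 0, 0),                            # back along y = 0
        col = "lightblue", border = "steelblue")
points(cars, pch = 19)                                  # data drawn on top
```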

Slide 114

Slide 114 text

Drawing text text(x, y, labels) x, y: locations of the text labels: actual strings Others: font, col, adj

Slide 115

Slide 115 text

Drawing legend Basic form: legend(x, y, legend = ,...)
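A legend usually mirrors the col and pch values used in the plot. In this sketch on mtcars, the color mapping `cols` is a hypothetical choice, and text() labels a single point.

```r
cols <- c("4" = "steelblue", "6" = "darkorange", "8" = "firebrick")
plot(mtcars$wt, mtcars$mpg, pch = 19,
     col = cols[as.character(mtcars$cyl)],            # color by cylinder count
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
heavy <- mtcars[which.max(mtcars$wt), ]               # the heaviest car
text(heavy$wt, heavy$mpg, labels = rownames(heavy),   # name it, left of the point
     pos = 2, cex = 0.8)
legend("topright", legend = names(cols), col = cols,  # keyword position
       pch = 19, title = "Cylinders", bty = "n")      # no box around the legend
```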

Slide 116

Slide 116 text

Adding details Demo add_details.R

Slide 117

Slide 117 text

Practice

Slide 118

Slide 118 text

“Critique by redesign”

Slide 119

Slide 119 text

Case 1

Slide 120

Slide 120 text

Case 1 - Redesign [Figure: admission rates: Cornell 12.5%, Dartmouth 10.4%, UPenn 9.2%, Brown 8.3%, Yale 6.9%, Princeton 6.1%, Columbia 5.8%, Harvard 5.2%. Title: "The most competitive universities in the US". Subtitle: "Compared by admission rate for the 2017-2021 cohort". Source: Business Insider]

Slide 121

Slide 121 text

Case 2

Slide 122

Slide 122 text

Case 2 - Redesign [Figure: PCI rankings from 2011 to 2015 for Hải Phòng, Hà Nội, Cần Thơ, Sài Gòn and Đà Nẵng; labelled ranks: 36, 24, 45, 28, 5, 1, 20, 6, 16, 14. Title: "PCI rankings of the 5 centrally governed cities". Subtitle: "The PCI measures the quality of the business environment, economic governance and administrative reform of the governments of 63 provinces/cities". Source: VCCI]

Slide 123

Slide 123 text

Case 3

Slide 124

Slide 124 text

Case 3 - Redesign [Figure: starting salary and maximum salary per country, x-axis from 20,000$ to 140,000$. Countries, in the order listed: Hungary, Slovak Republic, Estonia, Poland, Colombia, Czech Republic, Chile, Greece, Israel, Mexico, Slovenia, Iceland, Turkey, Scotland, Japan, England, Korea, New Zealand, Italy, France, Portugal, OECD Average, Sweden, Belgium, Austria, Finland, Ireland, Netherlands, Australia, Canada, Spain, Norway, United States, Denmark, Germany, Luxembourg. Title: "The gap in teacher salaries across OECD countries". Subtitle: "Comparing starting and maximum salaries of lower secondary school teachers, 2013". Source: OECD]

Slide 125

Slide 125 text

Where to go from here

Slide 126

Slide 126 text

Blogs to follow
http://flowingdata.com/, Nathan Yau (lots of R tutorials)
https://eagereyes.org/, Robert Kosara
http://junkcharts.typepad.com/, Kaiser Fung
http://www.perceptualedge.com/library.php, Stephen Few
http://www.thefunctionalart.com/, Alberto Cairo
http://www.visualisingdata.com/, Andy Kirk
http://www.randalolson.com/blog/, Randal S. Olson

Slide 127

Slide 127 text

Books to read. Classic
The Visual Display of Quantitative Information, Edward R. Tufte
Visual Explanations, Edward R. Tufte
Envisioning Information, Edward R. Tufte
Beautiful Evidence, Edward R. Tufte
The Elements of Graphing Data, William Cleveland
Visualizing Data, William Cleveland
Semiology of Graphics, Jacques Bertin
Exploratory Data Analysis, John W. Tukey

Slide 128

Slide 128 text

Books to read. Accessible and Recent
The Truthful Art, Alberto Cairo
The Functional Art, Alberto Cairo
Visualize This, Nathan Yau
Data Points, Nathan Yau
Information Dashboard Design, Stephen Few
Show Me the Numbers, Stephen Few
Now You See It, Stephen Few
Signal, Stephen Few
Storytelling with Data, Cole Nussbaumer Knaflic
Creating More Effective Graphs, Naomi B. Robbins
The Wall Street Journal Guide to Information Graphics, Dona M. Wong

Slide 129

Slide 129 text

Books to read. R-centric
R Graphics, Paul Murrell
ggplot2: Elegant Graphics for Data Analysis, Hadley Wickham
Lattice: Multivariate Data Visualization with R, Deepayan Sarkar
R Graphics Cookbook: Practical Recipes for Visualizing Data, Winston Chang
Data Visualisation with R: 100 Examples, Thomas Rahlf
Graphing Data with R, John Jay Hilfiger
Graphics for Statistics and Data Analysis with R, Kevin J. Keen
Graphical Data Analysis with R, Antony Unwin

Slide 130

Slide 130 text

Books to read. Design
Data Visualisation: A Handbook for Data Driven Design, Andy Kirk
Data Visualization: A Successful Design Process, Andy Kirk
Information Visualization: Perception for Design, Colin Ware
Visual Thinking for Design, Colin Ware
Designing Data Visualizations: Representing Informational Relationships, Noah Iliinsky
Visualization Analysis and Design, Tamara Munzner
Design for Information, Isabel Meirelles
The Non-designer’s Design Book, Robin Williams