Slide 1

Slide 1 text

Selecting and Doing with Tidy Eval

Slide 2

Slide 2 text

• Goal: Create functions around tidyverse pipelines
• Tidy eval, the easy way
• Focus on two flavours of tidyverse functions: Selecting and Doing

Slide 3

Slide 3 text

Why Tidy Eval?

Slide 4

Slide 4 text

Why Tidy Eval

Change the context of computation

    starwars[starwars$height < 200 & starwars$gender == "male", ]

    starwars %>% filter(
      height < 200,
      gender == "male"
    )

Slide 5

Slide 5 text

Why Tidy Eval

Change the context of computation

    starwars %>% filter(
      height < 200,
      gender == "male"
    )

    SELECT *
    FROM `starwars`
    WHERE ((`height` < 200.0) AND (`gender` = 'male'))

Slide 6

Slide 6 text

Why Tidy Eval

Need to delay computations

    list(
      height < 200,
      gender == "male"
    )
    Error: object 'height' not found

    starwars %>% filter(
      height < 200,
      gender == "male"
    )

Slide 7

Slide 7 text

Why Tidy Eval

How it works
• Delay computations by quoting
• Change the context and resume computation

    starwars %>% filter(
      height < 200,
      gender == "male"
    )
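The two steps can be sketched in base R, with `quote()` delaying the computation and `eval()` resuming it in the context of a data frame. This is only an illustration: the `people` data frame is made up, and dplyr relies on rlang's machinery rather than on these exact calls.

```r
# A small made-up data frame standing in for starwars
people <- data.frame(
  name = c("Luke", "Chewbacca", "Leia"),
  height = c(172, 228, 150)
)

# Step 1: delay the computation by quoting it.
# Nothing is evaluated yet, so `height` does not need to exist here.
blueprint <- quote(height < 200)

# Step 2: change the context and resume the computation.
# eval() looks up `height` among the columns of `people`.
keep <- eval(blueprint, people)

# Use the result to subset the rows
people[keep, ]
```

Evaluating `height < 200` directly at the console would fail with the "object 'height' not found" error, because no such object exists outside the data frame.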

Slide 8

Slide 8 text

Quoted code is like a blueprint

Flip side
• Programming requires modifying the blueprint
• !! is like a surgery operator for the blueprint
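As a minimal sketch of the blueprint metaphor, the rlang functions `sym()`, `expr()` and `eval_tidy()` can build a quoted expression, modify it with `!!`, and then evaluate it against data. The `people` data frame and the variable names are assumptions made for this example.

```r
library(rlang)

# Made-up data for illustration
people <- data.frame(height = c(172, 228, 150))

# A small piece of blueprint: a symbol naming a column
var <- sym("height")

# !! inserts the quoted symbol into the larger blueprint
blueprint <- expr(mean(!!var, na.rm = TRUE))
blueprint
#> mean(height, na.rm = TRUE)

# Resume the computation in the context of the data frame
result <- eval_tidy(blueprint, data = people)
```

Without `!!`, the blueprint would contain the name `var` itself rather than the column it stands for.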

Slide 9

Slide 9 text

Two flavours of tidy evaluation

Slide 10

Slide 10 text

Two flavours

    starwars %>% mutate(birth_year - 100)
    starwars %>% group_by(birth_year)
    starwars %>% select(birth_year)
    starwars %>% filter(birth_year < 50)

One of these things is not like the other things!

Slide 11

Slide 11 text

Two flavours

    starwars %>% mutate(birth_year - 100)   # Action
    starwars %>% group_by(birth_year)       # Action
    starwars %>% select(birth_year)         # Selection
    starwars %>% filter(birth_year < 50)    # Action

One of these things is not like the other things!

Slide 12

Slide 12 text

Most verbs take actions

    starwars %>% mutate(birth_year - 100)

1. New vectors are created
2. The data frame is modified

    tmp <- starwars$birth_year - 100
    starwars$`birth_year - 100` <- tmp

Slide 13

Slide 13 text

Some verbs take selections

    starwars %>% select(birth_year)

1. The position of columns is looked up
2. The data frame is reorganised

    tmp <- match("birth_year", colnames(starwars))
    starwars[, tmp]

Slide 14

Slide 14 text

Selections have special properties

1. c(), `-` and `:` understand positions and names
2. Selection helpers know about current variables

    starwars %>% select(c(1, height))
    starwars %>% select(1:height)
    starwars %>% select(-1, -height)

Slide 15

Slide 15 text

Selections have special properties

1. c(), `-` and `:` understand positions and names
2. Selection helpers know about current variables

    starwars %>% select(ends_with("color"))
    starwars %>% select(matches("^[nm]a"))
    starwars %>% select(10, everything())
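Both properties can be checked on a small table. The sketch below uses a made-up `people` tibble (an assumption, chosen so the column layout is easy to see) and shows that positions and names are interchangeable inside a selection, and that a helper such as `ends_with()` matches against the current column names.

```r
library(dplyr)

# Made-up tibble for illustration
people <- tibble(
  name = c("Luke", "Leia"),
  height = c(172, 150),
  mass = c(77, 49),
  hair_color = c("blond", "brown"),
  eye_color = c("blue", "brown")
)

# Positions and names are interchangeable in selections
by_position <- people %>% select(1:3)
by_name     <- people %>% select(name:mass)
mixed       <- people %>% select(1:mass)

# Selection helpers inspect the current column names
colors <- people %>% select(ends_with("color"))
names(colors)
```

All three of `by_position`, `by_name` and `mixed` contain the same columns, because a selection resolves every name to a column position before the data frame is reorganised.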

Slide 16

Slide 16 text

Sometimes they appear to work the same way...

    starwars %>% select(height)
    # A tibble: 87 x 1
      height
    1    172
    2    167
    3     96
    # … with 84 more rows

    starwars %>% transmute(height)
    # A tibble: 87 x 1
      height
    1    172
    2    167
    3     96
    # … with 84 more rows

Slide 17

Slide 17 text

Sometimes they appear to work the same way...

    starwars %>% select(1)
    # A tibble: 87 x 1
      name
    1 Luke Skywalker
    2 C-3PO
    3 R2-D2
    # … with 84 more rows

    starwars %>% transmute(1)
    # A tibble: 87 x 1
        `1`
    1     1
    2     1
    3     1
    # … with 84 more rows
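The difference is easy to reproduce on a toy table. In this sketch (the `people` tibble is made up), `select(1)` treats `1` as a column position, whereas `transmute(1)` treats it as a value to compute and recycle.

```r
library(dplyr)

# Made-up tibble for illustration
people <- tibble(
  name = c("Luke", "Leia"),
  height = c(172, 150)
)

# Selection: `1` is the position of a column
selected <- people %>% select(1)

# Action: `1` is a value, recycled into a new column named `1`
created <- people %>% transmute(1)

names(selected)
names(created)
```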

Slide 18

Slide 18 text

What about group_by()?

    starwars %>% group_by(gender)
    # A tibble: 87 x 13
    # Groups:   gender [5]
      name  height  mass hair_color skin_color eye_color
    1 Luke…    172    77 blond      fair       blue
    2 C-3PO    167    75 NA         gold       yellow
    3 R2-D2     96    32 NA         white, bl… red
    # … with 84 more rows, and 7 more variables

Slide 19

Slide 19 text

What about group_by()?

It takes actions!

    starwars %>% group_by(ends_with("color"))
    Error: No tidyselect variables were registered

Slide 20

Slide 20 text

What about group_by()?

It takes actions!

    starwars %>%
      group_by(height > 170) %>%
      summarise(n())
    # A tibble: 3 x 2
      `height > 170` `n()`
    1 FALSE             27
    2 TRUE              54
    3 NA                 6

Slide 21

Slide 21 text

Tip: Use the _at dplyr variants to pass selections!

    starwars %>% group_by_at(vars(ends_with("color")))
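A short sketch of the tip, on a made-up `people` tibble: `group_by_at()` accepts a selection wrapped in `vars()`, so a helper like `ends_with()` works where plain `group_by()` would fail. Recent dplyr releases mark the `_at` variants as superseded, but they remain available.

```r
library(dplyr)

# Made-up tibble for illustration
people <- tibble(
  name = c("Luke", "Leia", "Han"),
  height = c(172, 150, 180),
  hair_color = c("blond", "brown", "brown"),
  eye_color = c("blue", "brown", "brown")
)

# Pass a selection to a grouping verb through vars()
grouped <- people %>% group_by_at(vars(ends_with("color")))

# The selected columns become the grouping variables
group_vars(grouped)
```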

Slide 22

Slide 22 text

Creating tidy eval functions

Slide 23

Slide 23 text

Three challenges

1. Taking user selections or actions like a tidy eval function
2. Modifying these selections or actions
3. Passing selections or actions to other tidy eval functions

Slide 26

Slide 26 text

Passing the dots

Slide 27

Slide 27 text

• Tidy eval, the easy way
• Make use of existing components
• Solves challenges 1 and 3
• Limited but useful!

    my_component <- function(.data, ...) {
      .data %>% summarise(...)
    }
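To see the pattern in use, here is a sketch that calls `my_component()` on a made-up `people` tibble; the summary names `n` and `avg_height` are illustrative choices. Whatever is supplied through `...` is forwarded untouched to `summarise()`, which quotes and evaluates it in the context of the data.

```r
library(dplyr)

# The wrapper: take the dots and pass them on unchanged
my_component <- function(.data, ...) {
  .data %>% summarise(...)
}

# Made-up tibble for illustration
people <- tibble(
  height = c(172, 150, 228),
  mass = c(77, 49, 112)
)

# The actions refer to columns of `people`, exactly as they
# would in a direct call to summarise()
result <- people %>% my_component(
  n = n(),
  avg_height = mean(height)
)
result
```

The limitation is that the wrapper cannot inspect or modify what the user passed, which is why passing the dots addresses challenges 1 and 3 but not challenge 2.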

Slide 28

Slide 28 text

Three examples

1. Create a selection verb with dplyr
2. Transform that verb to take actions instead
3. Add a tidyr step to the pipeline

Pass the dots ... !

Slide 29

Slide 29 text

1. Selections with dplyr

• Most dplyr verbs have variants suffixed with _at
• They take selections within vars(...)
• mutate_at() and summarise_at() apply a function on each of those vars

    mutate_at()
    summarise_at()
    filter_at()
    rename_at()
    arrange_at()

Slide 30

Slide 30 text

    starwars %>% summarise_at(
      vars(height, mass),
      ~ mean(., na.rm = TRUE)
    )

    starwars %>% summarise(
      height = mean(height, na.rm = TRUE),
      mass = mean(mass, na.rm = TRUE)
    )

    # A tibble: 1 x 2
      height  mass
    1   174.  97.3

Slide 31

Slide 31 text

• Supports multiple functions
• Results spread across columns

    # The names of the list elements become the column suffixes
    summary_functions <- list(
      mean = ~ mean(., na.rm = TRUE),
      sd = ~ sd(., na.rm = TRUE)
    )

    starwars %>% summarise_at(
      vars(height:mass),
      summary_functions
    )
    # A tibble: 1 x 4
      height_mean mass_mean height_sd mass_sd
    1        174.      97.3      34.8    169.

Slide 32

Slide 32 text

Pass the dots ... !

    summarise_sels <- function(.data, ...) {
      summarise_at(.data, vars(...), summary_functions)
    }

Slide 33

Slide 33 text

    starwars %>% summarise_sels(height:mass)
    # A tibble: 1 x 4
      height_mean mass_mean height_sd mass_sd
    1        174.      97.3      34.8    169.

Slide 34

Slide 34 text

Works with groups!

    starwars %>%
      group_by(gender) %>%
      summarise_sels(height, mass)
    # A tibble: 5 x 5
      gender        height_mean mass_mean height_sd mass_sd
    1 female               165.      54.0      23.0    8.37
    2 hermaphrodite        175     1358        NA     NA
    3 male                 179.      81.0      35.4   28.2
    4 none                 200      140        NA     NA
    5 NA                   120       46.3      40.7   24.8

Slide 35

Slide 35 text

2. Actions with dplyr

• How could we pass actions instead of selections?
• In dplyr, transmute() is the fundamental action verb
• Returns as many columns as supplied actions

Slide 36

Slide 36 text

Pass the dots ... !

    summarise_sels <- function(.data, ...) {
      .data %>%
        summarise_at(vars(...), summary_functions)
    }

    summarise_acts <- function(.data, ...) {
      .data %>%
        transmute(...) %>%
        summarise_all(summary_functions)
    }
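The contrast between the two wrappers can be tried on a toy table. The sketch below assumes a named `summary_functions` list (the names supply the column suffixes) and a made-up `people` tibble; the selection version looks up existing columns, while the action version first creates new ones with `transmute()`.

```r
library(dplyr)

# Assumed definition: a named list, so outputs get _mean and _sd suffixes
summary_functions <- list(
  mean = ~ mean(., na.rm = TRUE),
  sd = ~ sd(., na.rm = TRUE)
)

# Selection flavour: the dots are column selections
summarise_sels <- function(.data, ...) {
  .data %>%
    summarise_at(vars(...), summary_functions)
}

# Action flavour: the dots are computations creating new columns
summarise_acts <- function(.data, ...) {
  .data %>%
    transmute(...) %>%
    summarise_all(summary_functions)
}

# Made-up tibble for illustration
people <- tibble(height = c(150, 200), mass = c(45, 80))

sels <- people %>% summarise_sels(height, mass)
acts <- people %>% summarise_acts(
  heightm = height / 100,
  bmi = mass / heightm^2
)
```

Note that `bmi` can refer to `heightm` because `transmute()` evaluates its actions in order, each one seeing the columns created before it.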

Slide 37

Slide 37 text

    starwars %>% summarise_acts(
      heightm = height / 100,
      bmi = mass / heightm^2
    )
    # A tibble: 1 x 4
      heightm_mean bmi_mean heightm_sd bmi_sd
    1         1.74     32.0      0.348   54.9

Slide 38

Slide 38 text

    starwars %>%
      group_by(gender) %>%
      summarise_acts(
        heightm = height / 100,
        bmi = mass / heightm^2
      )
    # A tibble: 5 x 5
      gender        heightm_mean bmi_mean heightm_sd bmi_sd
    1 female                1.65     18.8      0.230   3.71
    2 hermaphrodite         1.75    443.      NA      NA
    3 male                  1.79     25.7      0.354   6.49
    4 none                  2        35       NA      NA
    5 NA                    1.2      31.9      0.407   4.33

Slide 39

Slide 39 text

3. Gather with tidyr

• What if we'd like to gather results across rows?
• Let's develop the pipeline with a tidyr step
• Handling groups will be trickier

Slide 40

Slide 40 text

Pass the dots ... !

    gather_summarise_acts <- function(.data, ...) {
      .data %>%
        transmute(...) %>%
        gather("Variable", "Value", everything()) %>%
        group_by(Variable) %>%
        summarise_at(vars(Value), summary_functions)
    }

Slide 41

Slide 41 text

    starwars %>% gather_summarise_acts(
      heightm = height / 100,
      bmi = mass / heightm^2
    )
    # A tibble: 2 x 3
      Variable  mean     sd
    1 bmi      32.0  54.9
    2 heightm   1.74  0.348

Slide 42

Slide 42 text

    starwars %>%
      group_by(gender) %>%
      gather_summarise_acts(
        heightm = height / 100,
        bmi = mass / heightm^2
      )
    Warning messages:
    1: In mean.default(Value, na.rm = TRUE) :
      argument is not numeric or logical: returning NA
    2: In mean.default(Value, na.rm = TRUE) :
      argument is not numeric or logical: returning NA
    3: In mean.default(Value, na.rm = TRUE) :
      argument is not numeric or logical: returning NA

• gather() also gathers grouping variables
• Summaries can't be applied on character vectors

Slide 43

Slide 43 text

Solution: Remove the grouping variables from the gathering

    gather_summarise_acts <- function(.data, ...) {
      .data %>%
        transmute(...) %>%
        gather("Variable", "Value", -one_of(group_vars(.))) %>%
        # Keep the existing groups when adding Variable
        # (.add needs dplyr >= 1.0; older versions use add = TRUE)
        group_by(Variable, .add = TRUE) %>%
        summarise_at(vars(Value), summary_functions)
    }
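A self-contained sketch of the grouped pipeline follows, on a made-up `people` tibble. It rests on two assumptions that are not spelled out in the slides: `summary_functions` is a named list, and `group_by()` is called with `.add = TRUE` (dplyr 1.0 or later) so that the user's groups are kept alongside `Variable`, since a plain `group_by(Variable)` would replace them.

```r
library(dplyr)
library(tidyr)

# Assumed definition: names become the output column names
summary_functions <- list(
  mean = ~ mean(., na.rm = TRUE),
  sd = ~ sd(., na.rm = TRUE)
)

gather_summarise_acts <- function(.data, ...) {
  .data %>%
    transmute(...) %>%
    # Gather every column except the grouping variables
    gather("Variable", "Value", -one_of(group_vars(.))) %>%
    # Assumption: keep the existing groups and add Variable to them
    group_by(Variable, .add = TRUE) %>%
    summarise_at(vars(Value), summary_functions)
}

# Made-up tibble for illustration
people <- tibble(
  gender = c("a", "a", "b", "b"),
  height = c(150, 200, 100, 200),
  mass = c(45, 80, 20, 80)
)

result <- people %>%
  group_by(gender) %>%
  gather_summarise_acts(
    heightm = height / 100,
    bmi = mass / heightm^2
  )
result
```

Because the grouping column is excluded from the gathering, `Value` stays numeric and the summaries no longer produce warnings.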

Slide 44

Slide 44 text

    starwars %>%
      group_by(gender) %>%
      gather_summarise_acts(
        heightm = height / 100,
        bmi = mass / heightm^2
      )
    # A tibble: 10 x 4
    # Groups:   gender [?]
       gender        Variable   mean     sd
     1 female        bmi       18.8   3.71
     2 female        heightm    1.65  0.230
     3 hermaphrodite bmi      443.   NA
     4 hermaphrodite heightm    1.75 NA
     5 male          bmi       25.7   6.49
     6 male          heightm    1.79  0.354
     7 none          bmi       35    NA
     8 none          heightm    2    NA
     9 NA            bmi       31.9   4.33
    10 NA            heightm    1.2   0.407

Slide 45

Slide 45 text

summary()

• Pass dots to create tidy eval functions easily
• Do you need actions or selections?
• The _at variants and transmute() are useful
• Requires knowledge of tidyverse verbs (transferable)
• Think about grouped tibbles
