Why Tidy Eval
starwars %>%
  filter(
    height < 200,
    gender == "male"
  )
SELECT *
FROM `starwars`
WHERE ((`height` < 200.0) AND
(`gender` = 'male'))
Change the context of computation
Need to delay computations
list(
  height < 200,
  gender == "male"
)
Error: object 'height' not found
starwars %>%
  filter(
    height < 200,
    gender == "male"
  )
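Base R is enough to sketch both halves of the idea: quote() delays the computation by returning the expression unevaluated, and eval() resumes it later with the columns of a data frame in scope. This illustrates the principle only, not dplyr's actual machinery, and assumes dplyr is installed for the starwars data.

```r
library(dplyr)  # used here only for the starwars data

# Quoting delays the computation: `height` and `gender` are not looked up yet
blueprint <- quote(height < 200 & gender == "male")

# Evaluating with the data frame as the environment resumes the computation,
# so `height` and `gender` are found as columns of starwars
keep <- eval(blueprint, starwars)

# which() drops NA results, matching what filter() does with NA conditions
subset_rows <- starwars[which(keep), ]
```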
How it works
• Delay computations by quoting
• Change the context and resume computation
starwars %>%
  filter(
    height < 200,
    gender == "male"
  )
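The two steps can be combined in a toy verb. my_filter() below is a hypothetical, simplified stand-in for filter(), written with rlang's enquos() to quote the arguments and eval_tidy() to evaluate them in the data; it is a sketch under those assumptions, not how dplyr implements filter().

```r
library(dplyr)
library(rlang)

# Hypothetical, simplified filter(): not dplyr's implementation
my_filter <- function(.data, ...) {
  # 1. Delay: capture each argument as a quosure (expression + its environment)
  conditions <- enquos(...)
  keep <- rep(TRUE, nrow(.data))
  for (condition in conditions) {
    # 2. Change the context: evaluate with the columns of .data in scope
    keep <- keep & eval_tidy(condition, data = .data)
  }
  # Rows where a condition is NA are dropped, as filter() does
  .data[which(keep), ]
}

# The quosure remembers where `threshold` was defined
threshold <- 200
short_males <- my_filter(starwars, height < threshold, gender == "male")
```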
Quoted code is like a blueprint
Flip side
• Programming requires modifying the blueprint
• !! is like a surgery operator for the blueprint
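rlang's expr() quotes code the same way, and !! is where the surgery happens: it splices a value into the blueprint before anything is evaluated. In this sketch the column name is assumed to arrive as a string at run time, for example from a user.

```r
library(rlang)

# A column name that is only known when the program runs
col <- sym("height")

# expr() quotes its input like a blueprint; !! inserts `col` into it
blueprint <- expr(mean(!!col, na.rm = TRUE))
print(blueprint)
#> mean(height, na.rm = TRUE)
```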
Two flavours of tidy evaluation
Two flavours
starwars %>% mutate(birth_year - 100)
starwars %>% group_by(birth_year)
starwars %>% select(birth_year)
starwars %>% filter(birth_year < 50)
One of these things is not like the other things!
Action or selection? select() is the odd one out: it takes a selection, not an action.
tmp <- starwars$birth_year - 100
starwars$`birth_year - 100` <- tmp
starwars %>% mutate(birth_year - 100)
Most verbs take actions
1. New vectors are created
2. The data frame is modified
Some verbs take selections
1. The position of columns is looked up
2. The data frame is reorganised
starwars %>% select(birth_year)
tmp <- match("birth_year", colnames(starwars))
starwars[, tmp]
starwars %>% select(c(1, height))
starwars %>% select(1:height)
starwars %>% select(-1, -height)
Selections have special properties
1. c(), `-` and `:` understand positions and names
2. Selection helpers know about current variables
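The position lookup behind these properties can be inspected with tidyselect's eval_select(), which resolves a quoted selection against a data frame and returns the matching column positions, named by column. This assumes tidyselect 1.0 or later; the variable names are only for illustration.

```r
library(dplyr)
library(rlang)
library(tidyselect)

# c() mixes a position (1) and a name (height)
by_mix <- eval_select(expr(c(1, height)), starwars)

# `:` works between a position and a name
by_range <- eval_select(expr(1:height), starwars)

# A leading negative selection starts from all columns and removes some
dropped <- eval_select(expr(c(-1, -height)), starwars)
```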
starwars %>% select(ends_with("color"))
starwars %>% select(matches("^[nm]a"))
starwars %>% select(10, everything())
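The same function shows why helpers depend on the current variables: ends_with(), matches() and everything() only make sense once eval_select() supplies the column names of the data being selected from. Again a sketch assuming tidyselect 1.0 or later.

```r
library(dplyr)
library(rlang)
library(tidyselect)

# Helpers inspect the column names of the data passed to eval_select()
colors <- eval_select(expr(ends_with("color")), starwars)
na_ma  <- eval_select(expr(matches("^[nm]a")), starwars)

# everything() expands to all remaining columns, so column 10 moves first
reordered <- eval_select(expr(c(10, everything())), starwars)
```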
Sometimes they appear to work the same way...
starwars %>% select(height)
# A tibble: 87 x 1
height
1 172
2 167
3 96
# … with 84 more rows
starwars %>% transmute(height)
# A tibble: 87 x 1
height
1 172
2 167
3 96
# … with 84 more rows
starwars %>% select(1)
# A tibble: 87 x 1
name
1 Luke Skywalker
2 C-3PO
3 R2-D2
# … with 84 more rows
starwars %>% transmute(1)
# A tibble: 87 x 1
`1`
1 1
2 1
3 1
# … with 84 more rows
...but sometimes they don't: select() reads 1 as the position of the first column, while transmute() evaluates it as a constant.
What about group_by()?
starwars %>% group_by(gender)
# A tibble: 87 x 13
# Groups: gender [5]
name height mass hair_color skin_color eye_color
1 Luke… 172 77 blond fair blue
2 C-3PO 167 75 NA gold yellow
3 R2-D2 96 32 NA white, bl… red
# … with 84 more rows, and 7 more variables
starwars %>% group_by(ends_with("color"))
Error: No tidyselect variables were registered
What about group_by()? It takes actions!
starwars %>%
  group_by(height > 170) %>%
  summarise(n())
# A tibble: 3 x 2
`height > 170` `n()`
1 FALSE 27
2 TRUE 54
3 NA 6
Tip: Use the _at dplyr variants to pass selections!
starwars %>% group_by_at(vars(ends_with("color")))
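A quick check that group_by_at() groups by the selected columns. The second form uses across() inside group_by(), which assumes dplyr 1.0 or later, where the _at variants are superseded but still available.

```r
library(dplyr)

# The selection is wrapped in vars() and passed to the _at variant
by_at <- starwars %>% group_by_at(vars(ends_with("color")))

# Assumes dplyr >= 1.0: across() turns a selection into grouping columns
by_across <- starwars %>% group_by(across(ends_with("color")))
```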
Creating tidy eval functions
Three challenges
1. Taking user selections or actions like a tidy eval function
2. Modifying these selections or actions
3. Passing selections or actions to other tidy eval functions
Passing the dots
• Tidy eval, the easy way
• Make use of existing components
• Solves challenges 1 and 3
• Limited but useful!
my_component <- function(.data, ...) {
  .data %>% summarise(...)
}
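A usage sketch for my_component(): callers write actions exactly as they would inside summarise(), and grouping is handled by summarise() itself. The argument names n and avg_height are illustrative.

```r
library(dplyr)

# The dots are forwarded untouched to summarise()
my_component <- function(.data, ...) {
  .data %>% summarise(...)
}

# Actions are written as if calling summarise() directly
result <- starwars %>%
  group_by(gender) %>%
  my_component(n = n(), avg_height = mean(height, na.rm = TRUE))
```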
Three examples
1. Create a selection verb with dplyr
2. Transform that verb to take actions instead
3. Add tidyr step to the pipeline
Pass the dots ... !
1. Selections with dplyr
• Most dplyr verbs have variants suffixed with _at
• They take selections within vars(...)
• mutate_at() and summarise_at() apply a function to each of those vars
mutate_at()
summarise_at()
filter_at()
rename_at()
arrange_at()
starwars %>%
  summarise_at(
    vars(height, mass),
    ~ mean(., na.rm = TRUE)
  )
starwars %>%
  summarise(
    height = mean(height, na.rm = TRUE),
    mass = mean(mass, na.rm = TRUE)
  )
# A tibble: 1 x 2
height mass
1 174. 97.3
starwars %>%
  summarise_sels(height:mass)
# A tibble: 1 x 4
height_mean mass_mean height_sd mass_sd
1 174. 97.3 34.8 169.
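summarise_sels() is used here without its definition. One minimal way to get output shaped like the above, assuming my_summarisers is a named list of mean and sd functions with na.rm = TRUE, is to forward the dots into vars(); both definitions below are hypothetical reconstructions, not the author's code.

```r
library(dplyr)

# Hypothetical summary functions; list names become column suffixes
my_summarisers <- list(
  mean = function(x) mean(x, na.rm = TRUE),
  sd   = function(x) sd(x, na.rm = TRUE)
)

# Hypothetical: the dots are passed on as a selection inside vars()
summarise_sels <- function(.data, ...) {
  summarise_at(.data, vars(...), my_summarisers)
}

sels <- starwars %>% summarise_sels(height:mass)
grouped_sels <- starwars %>% group_by(gender) %>% summarise_sels(height, mass)
```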
starwars %>%
  group_by(gender) %>%
  summarise_sels(height, mass)
# A tibble: 5 x 5
gender height_mean mass_mean height_sd mass_sd
1 female 165. 54.0 23.0 8.37
2 hermaphrodite 175 1358 NA NA
3 male 179. 81.0 35.4 28.2
4 none 200 140 NA NA
5 NA 120 46.3 40.7 24.8
Works with groups!
2. Actions with dplyr
• How could we pass actions instead of selections?
• In dplyr, transmute() is the fundamental action verb
• It returns one column per supplied action (plus any grouping variables)
starwars %>%
  summarise_acts(
    heightm = height / 100,
    bmi = mass / heightm^2
  )
# A tibble: 1 x 4
heightm_mean bmi_mean heightm_sd bmi_sd
1 1.74 32.0 0.348 54.9
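summarise_acts() is likewise used without a definition. A hypothetical sketch: transmute() runs the actions, then summarise_all() summarises every resulting column. summarise_all() skips grouping variables, which is one way to make the grouped call below work as well; my_summarisers is the same assumed list as before.

```r
library(dplyr)

# Hypothetical summary functions; list names become column suffixes
my_summarisers <- list(
  mean = function(x) mean(x, na.rm = TRUE),
  sd   = function(x) sd(x, na.rm = TRUE)
)

# Hypothetical: actions go to transmute(), results are then summarised
summarise_acts <- function(.data, ...) {
  .data %>%
    transmute(...) %>%
    summarise_all(my_summarisers)
}

acts <- starwars %>%
  summarise_acts(heightm = height / 100, bmi = mass / heightm^2)
grouped_acts <- starwars %>%
  group_by(gender) %>%
  summarise_acts(heightm = height / 100, bmi = mass / heightm^2)
```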
starwars %>%
  group_by(gender) %>%
  summarise_acts(
    heightm = height / 100,
    bmi = mass / heightm^2
  )
# A tibble: 5 x 5
gender heightm_mean bmi_mean heightm_sd bmi_sd
1 female 1.65 18.8 0.230 3.71
2 hermaphrodite 1.75 443. NA NA
3 male 1.79 25.7 0.354 6.49
4 none 2 35 NA NA
5 NA 1.2 31.9 0.407 4.33
3. Gather with tidyr
• What if we'd like to gather results across rows?
• Let's develop the pipeline with a tidyr step
• Handling groups will be trickier
starwars %>%
  gather_summarise_acts(
    heightm = height / 100,
    bmi = mass / heightm^2
  )
# A tibble: 2 x 3
Variable mean sd
1 bmi 32.0 54.9
2 heightm 1.74 0.348
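A hypothetical first version of gather_summarise_acts() that ignores grouping: the action results are gathered into Variable and Value columns, then summarised per variable. my_summarisers is the same assumed list of mean and sd functions. Because this version gathers every column, grouped input runs into trouble.

```r
library(dplyr)
library(tidyr)

# Hypothetical summary functions; with a single summarised column,
# the list names become the column names
my_summarisers <- list(
  mean = function(x) mean(x, na.rm = TRUE),
  sd   = function(x) sd(x, na.rm = TRUE)
)

# Hypothetical first version: every column is gathered, groups included
gather_summarise_acts <- function(.data, ...) {
  .data %>%
    transmute(...) %>%
    gather("Variable", "Value") %>%   # one row per (variable, value) pair
    group_by(Variable) %>%
    summarise_at(vars(Value), my_summarisers)
}

long_summary <- starwars %>%
  gather_summarise_acts(heightm = height / 100, bmi = mass / heightm^2)
```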
starwars %>%
  group_by(gender) %>%
  gather_summarise_acts(
    heightm = height / 100,
    bmi = mass / heightm^2
  )
Warning messages:
1: In mean.default(Value, na.rm = TRUE) :
argument is not numeric or logical: returning NA
2: In mean.default(Value, na.rm = TRUE) :
argument is not numeric or logical: returning NA
3: In mean.default(Value, na.rm = TRUE) :
argument is not numeric or logical: returning NA
• gather() also gathers grouping variables, so the Value column becomes character
• Summaries such as mean() can't be applied to character vectors
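gather_summarise_acts <- function(.data, ...) {
  .data %>%
    transmute(...) %>%
    # Keep the grouping variables out of the gathered columns
    gather("Variable", "Value", -one_of(group_vars(.))) %>%
    # .add = TRUE keeps the existing groups (add = TRUE before dplyr 1.0)
    group_by(Variable, .add = TRUE) %>%
    summarise_at(vars(Value), my_summarisers)
}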
Solution: exclude the grouping variables from the gathered columns
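One more detail decides whether per-group results survive: after gathering, Variable must be added to the existing groups, not replace them. A minimal check of group_by()'s default behaviour, assuming dplyr 1.0 or later (older releases spelled the argument add = TRUE):

```r
library(dplyr)

by_gender <- starwars %>% group_by(gender)

# By default, group_by() replaces the existing grouping...
replaced <- by_gender %>% group_by(species)

# ...while .add = TRUE appends to it
appended <- by_gender %>% group_by(species, .add = TRUE)
```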
starwars %>%
  group_by(gender) %>%
  gather_summarise_acts(
    heightm = height / 100,
    bmi = mass / heightm^2
  )
# A tibble: 10 x 4
# Groups: gender [?]
gender Variable mean sd
1 female bmi 18.8 3.71
2 female heightm 1.65 0.230
3 hermaphrodite bmi 443. NA
4 hermaphrodite heightm 1.75 NA
5 male bmi 25.7 6.49
6 male heightm 1.79 0.354
7 none bmi 35 NA
8 none heightm 2 NA
9 NA bmi 31.9 4.33
10 NA heightm 1.2 0.407
summary()
• Pass dots to create tidy eval functions easily
• Do you need actions or selections?
• The _at variants and transmute() are useful
• Requires knowledge of tidyverse verbs, which is transferable
• Think about grouped tibbles