Selecting and Doing with Tidy Eval

Lionel Henry
January 18, 2019

Transcript

  1. Selecting and Doing with Tidy Eval

  2. • Goal: Create functions around tidyverse pipelines
    • Tidy eval, the easy way
    • Focus on two flavours of tidyverse functions: Selecting and Doing
  3. Why Tidy Eval?

  4. Why Tidy Eval: Change the context of computation

    starwars[starwars$height < 200 & starwars$gender == "male", ]

    starwars %>% filter(height < 200, gender == "male")
  5. Why Tidy Eval: Change the context of computation

    starwars %>% filter(height < 200, gender == "male")

    <SQL>
    SELECT * FROM `starwars`
    WHERE ((`height` < 200.0) AND (`gender` = 'male'))
  6. Why Tidy Eval: Need to delay computations

    list(height < 200, gender == "male")
    Error: object 'height' not found

    starwars %>% filter(height < 200, gender == "male")
  7. Why Tidy Eval: How it works

    • Delay computations by quoting
    • Change the context and resume computation

    starwars %>% filter(height < 200, gender == "male")
  8. Quoted code is like a blueprint

    Flip side:
    • Programming requires modifying the blueprint
    • !! is like a surgery operator for the blueprint
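
The quote, modify, and resume steps described above can be sketched directly with rlang. This is a minimal illustration, not code from the talk: the `people` data frame and the `threshold` variable are made up for the example.

```r
library(rlang)

# Quoting delays the computation: expr() returns the blueprint, not a result.
blueprint <- expr(height < 200)

# !! modifies the blueprint by inserting a value before evaluation.
threshold <- 180  # hypothetical value, for illustration only
modified <- expr(height < !!threshold)

# Resume the computation in a new context: the columns of a data frame.
people <- data.frame(height = c(150, 190, 210))  # hypothetical data
result_original <- eval_tidy(blueprint, data = people)
result_modified <- eval_tidy(modified, data = people)
```

Printing `modified` shows the value already inlined in the expression, which is what makes `!!` useful for programming with quoted code.
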
  9. Two flavours of tidy evaluation

  10. Two flavours

    starwars %>% mutate(birth_year - 100)
    starwars %>% group_by(birth_year)
    starwars %>% select(birth_year)
    starwars %>% filter(birth_year < 50)

    One of these things is not like the other things!
  11. Two flavours: Action, Selection

    starwars %>% mutate(birth_year - 100)
    starwars %>% group_by(birth_year)
    starwars %>% select(birth_year)
    starwars %>% filter(birth_year < 50)

    One of these things is not like the other things!
  12. Most verbs take actions

    1. New vectors are created
    2. The data frame is modified

    starwars %>% mutate(birth_year - 100)

    tmp <- starwars$birth_year - 100
    starwars$`birth_year - 100` <- tmp
  13. Some verbs take selections

    1. The position of columns is looked up
    2. The data frame is reorganised

    starwars %>% select(birth_year)

    tmp <- match("birth_year", colnames(starwars))
    starwars[, tmp]
  14. Selections have special properties

    1. c(), `-` and `:` understand positions and names
    2. Selection helpers know about current variables

    starwars %>% select(c(1, height))
    starwars %>% select(1:height)
    starwars %>% select(-1, -height)
  15. Selections have special properties

    1. c(), `-` and `:` understand positions and names
    2. Selection helpers know about current variables

    starwars %>% select(ends_with("color"))
    starwars %>% select(matches("^[nm]a"))
    starwars %>% select(10, everything())
  16. Sometimes they appear to work the same way...

    starwars %>% select(height)
    # A tibble: 87 x 1
      height
       <int>
    1    172
    2    167
    3     96
    # … with 84 more rows

    starwars %>% transmute(height)
    # A tibble: 87 x 1
      height
       <int>
    1    172
    2    167
    3     96
    # … with 84 more rows
  17. Sometimes they appear to work the same way...

    starwars %>% select(1)
    # A tibble: 87 x 1
      name
      <chr>
    1 Luke Skywalker
    2 C-3PO
    3 R2-D2
    # … with 84 more rows

    starwars %>% transmute(1)
    # A tibble: 87 x 1
        `1`
      <dbl>
    1     1
    2     1
    3     1
    # … with 84 more rows
  18. What about group_by()?

    starwars %>% group_by(gender)
    # A tibble: 87 x 13
    # Groups:   gender [5]
      name  height  mass hair_color skin_color eye_color
      <chr>  <int> <dbl> <chr>      <chr>      <chr>
    1 Luke…    172    77 blond      fair       blue
    2 C-3PO    167    75 NA         gold       yellow
    3 R2-D2     96    32 NA         white, bl… red
    # … with 84 more rows, and 7 more variables
  19. What about group_by()? It takes actions!

    starwars %>% group_by(ends_with("color"))
    Error: No tidyselect variables were registered
  20. What about group_by()? It takes actions!

    starwars %>%
      group_by(height > 170) %>%
      summarise(n())
    # A tibble: 3 x 2
      `height > 170` `n()`
      <lgl>          <int>
    1 FALSE             27
    2 TRUE              54
    3 NA                 6
  21. Tip: Use the _at dplyr variants to pass selections!

    starwars %>% group_by_at(vars(ends_with("color")))
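
To make the action versus selection distinction for grouping concrete, here is a small sketch on a made-up tibble (the `paints` data is an assumption for illustration, not from the talk). It groups once with an action and once with a selection, then inspects the resulting grouping variables.

```r
library(dplyr)

# Hypothetical data, for illustration only.
paints <- tibble(
  name = c("a", "b", "c"),
  hair_color = c("brown", "brown", "none"),
  eye_color = c("blue", "blue", "red"),
  height = c(172, 180, 96)
)

# group_by() takes actions: the expression is computed as a new column.
by_action <- paints %>% group_by(tall = height > 170)

# group_by_at() takes a selection wrapped in vars().
by_selection <- paints %>% group_by_at(vars(ends_with("color")))

action_groups <- group_vars(by_action)
selection_groups <- group_vars(by_selection)
```

Here `action_groups` holds the newly created column, while `selection_groups` holds the existing columns matched by the helper.
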
  22. Creating tidy eval functions

  23. Three challenges

    1. Taking user selections or actions like a tidy eval function
    2. Modifying these selections or actions
    3. Passing selections or actions to other tidy eval functions
  26. Passing the dots

  27. • Tidy eval, the easy way
    • Make use of existing components
    • Solves challenges 1 and 3
    • Limited but useful!

    my_component <- function(.data, ...) {
      .data %>% summarise(...)
    }
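
As a usage sketch for the `my_component()` wrapper above: because the dots are forwarded untouched, callers can write the same expressions they would pass to `summarise()`. The `measurements` tibble and the summary names are invented for this example.

```r
library(dplyr)

my_component <- function(.data, ...) {
  .data %>% summarise(...)
}

# Hypothetical data, for illustration only.
measurements <- tibble(
  group = c("a", "a", "b"),
  value = c(1, 3, 10)
)

# The dots carry the user's actions straight through to summarise().
overall <- measurements %>% my_component(avg = mean(value), n = n())

# Grouped input works too, since summarise() itself respects groups.
per_group <- measurements %>%
  group_by(group) %>%
  my_component(avg = mean(value))
```
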
  28. Three examples

    1. Create a selection verb with dplyr
    2. Transform that verb to take actions instead
    3. Add tidyr step to the pipeline

    Pass the dots ... !
  29. Example 1: Selections with dplyr

    • Most dplyr verbs have variants suffixed with _at
    • They take selections within vars(...)
    • mutate_at() and summarise_at() apply a function on each of those vars

    mutate_at(), summarise_at(), filter_at(), rename_at(), arrange_at()
  30. starwars %>% summarise_at(vars(height, mass), ~ mean(., na.rm = TRUE))

    starwars %>% summarise(
      height = mean(height, na.rm = TRUE),
      mass = mean(mass, na.rm = TRUE)
    )

    # A tibble: 1 x 2
      height  mass
       <dbl> <dbl>
    1   174.  97.3
  31. • Supports multiple functions
    • Results spread across columns

    summary_functions <- list(
      ~ mean(., na.rm = TRUE),
      ~ sd(., na.rm = TRUE)
    )

    starwars %>% summarise_at(vars(height:mass), summary_functions)
    # A tibble: 1 x 4
      height_mean mass_mean height_sd mass_sd
            <dbl>     <dbl>     <dbl>   <dbl>
    1        174.      97.3      34.8    169.
  32. Pass the dots ... !

    summarise_sels <- function(.data, ...) {
      summarise_at(.data, vars(...), summary_functions)
    }
  33. starwars %>% summarise_sels(height:mass)
    # A tibble: 1 x 4
      height_mean mass_mean height_sd mass_sd
            <dbl>     <dbl>     <dbl>   <dbl>
    1        174.      97.3      34.8    169.
  34. Works with groups!

    starwars %>% group_by(gender) %>% summarise_sels(height, mass)
    # A tibble: 5 x 5
      gender        height_mean mass_mean height_sd mass_sd
      <chr>               <dbl>     <dbl>     <dbl>   <dbl>
    1 female               165.      54.0      23.0    8.37
    2 hermaphrodite        175     1358        NA     NA
    3 male                 179.      81.0      35.4   28.2
    4 none                 200      140        NA     NA
    5 NA                   120       46.3      40.7   24.8
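
The selection verb can be tried end to end on a small made-up tibble. This is a sketch under two assumptions that are not in the slides: the `toy` data is invented, and the list of summary functions is given explicit names (`mean`, `sd`) so that the output columns carry `_mean` and `_sd` suffixes.

```r
library(dplyr)

# Assumption: the list is named here so that columns are suffixed
# with `_mean` and `_sd`.
summary_functions <- list(
  mean = ~ mean(., na.rm = TRUE),
  sd = ~ sd(., na.rm = TRUE)
)

# Selections passed through the dots are wrapped in vars().
summarise_sels <- function(.data, ...) {
  summarise_at(.data, vars(...), summary_functions)
}

# Hypothetical data, for illustration only.
toy <- tibble(
  gender = c("f", "f", "m", "m"),
  height = c(160, 170, 180, NA),
  mass = c(50, 60, 80, 90)
)

# Any selection syntax works: ranges, names, helpers.
overall <- toy %>% summarise_sels(height:mass)
by_gender <- toy %>% group_by(gender) %>% summarise_sels(height, mass)
```
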
  35. Example 2: Actions with dplyr

    • How could we pass actions instead of selections?
    • In dplyr, transmute() is the fundamental action verb
    • Returns as many columns as supplied actions
  36. Pass the dots ... !

    summarise_sels <- function(.data, ...) {
      .data %>%
        summarise_at(vars(...), summary_functions)
    }

    summarise_acts <- function(.data, ...) {
      .data %>%
        transmute(...) %>%
        summarise_all(summary_functions)
    }
  37. starwars %>% summarise_acts(
      heightm = height / 100,
      bmi = mass / heightm^2
    )
    # A tibble: 1 x 4
      heightm_mean bmi_mean heightm_sd bmi_sd
             <dbl>    <dbl>      <dbl>  <dbl>
    1         1.74     32.0      0.348   54.9
  38. starwars %>%
      group_by(gender) %>%
      summarise_acts(
        heightm = height / 100,
        bmi = mass / heightm^2
      )
    # A tibble: 5 x 5
      gender        heightm_mean bmi_mean heightm_sd bmi_sd
      <chr>                <dbl>    <dbl>      <dbl>  <dbl>
    1 female                1.65     18.8      0.230   3.71
    2 hermaphrodite         1.75    443.      NA      NA
    3 male                  1.79     25.7      0.354   6.49
    4 none                  2        35       NA      NA
    5 NA                    1.2      31.9      0.407   4.33
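
The action variant can be exercised the same way. As before, this is only a sketch: the `toy` tibble is invented, and naming the elements of `summary_functions` is an assumption made here to get `_mean` and `_sd` suffixes.

```r
library(dplyr)

# Assumption: named list, so output columns are suffixed `_mean` / `_sd`.
summary_functions <- list(
  mean = ~ mean(., na.rm = TRUE),
  sd = ~ sd(., na.rm = TRUE)
)

# Actions are computed by transmute(), then every resulting column
# (except grouping variables) is summarised.
summarise_acts <- function(.data, ...) {
  .data %>%
    transmute(...) %>%
    summarise_all(summary_functions)
}

# Hypothetical data, for illustration only.
toy <- tibble(
  gender = c("f", "f", "m", "m"),
  height = c(160, 170, 180, NA),
  mass = c(50, 60, 80, 90)
)

overall <- toy %>%
  summarise_acts(heightm = height / 100, bmi = mass / heightm^2)

by_gender <- toy %>%
  group_by(gender) %>%
  summarise_acts(heightm = height / 100, bmi = mass / heightm^2)
```

Later actions can refer to earlier ones (`bmi` uses `heightm`) because `transmute()` evaluates its arguments in order.
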
  39. Example 3: Gather with tidyr

    • What if we'd like to gather results across rows?
    • Let's develop the pipeline with a tidyr step
    • Handling groups will be trickier
  40. Pass the dots ... !

    gather_summarise_acts <- function(.data, ...) {
      .data %>%
        transmute(...) %>%
        gather("Variable", "Value", everything()) %>%
        group_by(Variable) %>%
        summarise_at(vars(Value), summary_functions)
    }
  41. starwars %>% gather_summarise_acts(
      heightm = height / 100,
      bmi = mass / heightm^2
    )
    # A tibble: 2 x 3
      Variable  mean     sd
      <chr>    <dbl>  <dbl>
    1 bmi      32.0  54.9
    2 heightm   1.74  0.348
  42. starwars %>%
      group_by(gender) %>%
      gather_summarise_acts(
        heightm = height / 100,
        bmi = mass / heightm^2
      )
    Warning messages:
    1: In mean.default(Value, na.rm = TRUE) :
      argument is not numeric or logical: returning NA
    2: In mean.default(Value, na.rm = TRUE) :
      argument is not numeric or logical: returning NA
    3: In mean.default(Value, na.rm = TRUE) :
      argument is not numeric or logical: returning NA

    • gather() also gathers grouping variables
    • Summaries can't be applied on character
  43. Solution: Remove the grouping variables from the gathering

    gather_summarise_acts <- function(.data, ...) {
      .data %>%
        transmute(...) %>%
        gather("Variable", "Value", -one_of(group_vars(.))) %>%
        group_by(Variable) %>%
        summarise_at(vars(Value), summary_functions)
    }
  44. starwars %>%
      group_by(gender) %>%
      gather_summarise_acts(
        heightm = height / 100,
        bmi = mass / heightm^2
      )
    # A tibble: 10 x 4
    # Groups:   gender [?]
       gender        Variable   mean     sd
       <chr>         <chr>     <dbl>  <dbl>
     1 female        bmi       18.8    3.71
     2 female        heightm    1.65   0.230
     3 hermaphrodite bmi      443.    NA
     4 hermaphrodite heightm    1.75  NA
     5 male          bmi       25.7    6.49
     6 male          heightm    1.79   0.354
     7 none          bmi       35     NA
     8 none          heightm    2     NA
     9 NA            bmi       31.9    4.33
    10 NA            heightm    1.2    0.407
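
A self-contained sketch of the grouped gather pipeline follows, run on invented data. It departs from the slide code in two labelled ways: the summary functions are named (an assumption, to get `mean` and `sd` columns), and `group_by()` is called with `.add = TRUE`, assuming dplyr 1.0.0 or later, so that the existing groups are kept alongside `Variable`. The per-gender output shown above appears to need something like this, but that is an inference rather than something the transcript states.

```r
library(dplyr)
library(tidyr)

# Assumption: named list, so output columns are called `mean` and `sd`.
summary_functions <- list(
  mean = ~ mean(., na.rm = TRUE),
  sd = ~ sd(., na.rm = TRUE)
)

gather_summarise_acts <- function(.data, ...) {
  .data %>%
    transmute(...) %>%
    # Exclude grouping variables so only computed columns are gathered.
    gather("Variable", "Value", -one_of(group_vars(.))) %>%
    # Assumption: `.add = TRUE` (dplyr >= 1.0.0) keeps the existing groups.
    group_by(Variable, .add = TRUE) %>%
    summarise_at(vars(Value), summary_functions)
}

# Hypothetical data, for illustration only.
toy <- tibble(
  gender = c("f", "f", "m", "m"),
  height = c(160, 170, 180, NA),
  mass = c(50, 60, 80, 90)
)

by_gender <- toy %>%
  group_by(gender) %>%
  gather_summarise_acts(heightm = height / 100, bmi = mass / heightm^2)
```

The result has one row per combination of group and gathered variable, which is the long layout the tidyr step is meant to produce.
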
  45. summary()

    • Pass dots to create tidy eval functions easily
    • Do you need actions or selections?
    • The _at variants and transmute() are useful
    • Requires knowledge of tidyverse verbs (transferable)
    • Think about grouped tibbles