Selecting and Doing with Tidy Eval

Lionel Henry

January 18, 2019

Transcript

  1. Selecting and Doing with Tidy Eval

  2. Selecting and Doing
    • Goal: Create functions around tidyverse pipelines
    • Tidy eval, the easy way
    • Focus on two flavours of tidyverse functions

  3. Why Tidy Eval?

  4. Why Tidy Eval
    starwars[starwars$height < 200 &
    starwars$gender == "male", ]
    starwars %>%
    filter(
    height < 200,
    gender == "male"
    )
    Change the context of computation

  5. Why Tidy Eval
    starwars %>%
    filter(
    height < 200,
    gender == "male"
    )

    SELECT *
    FROM `starwars`
    WHERE ((`height` < 200.0) AND
    (`gender` = 'male'))
    Change the context of computation

  6. Why Tidy Eval
    Need to delay computations
    list(
    height < 200,
    gender == "male"
    )
    Error: object 'height' not found
    starwars %>%
    filter(
    height < 200,
    gender == "male"
    )

  7. Why Tidy Eval
    How it works
    • Delay computations by quoting
    • Change the context and resume computation
    starwars %>%
    filter(
    height < 200,
    gender == "male"
    )
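
The two steps above (quote, then resume in a new context) can be sketched with rlang directly. This is a minimal illustration rather than how filter() is implemented; the `people` data frame is a hypothetical stand-in for starwars.

```r
library(rlang)

# Hypothetical toy data standing in for starwars
people <- data.frame(
  height = c(172, 167, 202),
  gender = c("male", "female", "male")
)

# Step 1: quote the code instead of running it (no "object not found" error)
blueprint <- expr(height < 200 & gender == "male")

# Step 2: resume the computation with the data frame columns in scope
keep <- eval_tidy(blueprint, data = people)

# Use the logical vector to subset rows, as filter() would
people[keep, ]
```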

  8. Quoted code is like a blueprint
    Flip side
    • Programming requires modifying the blueprint
    • !! is like a surgery operator for the blueprint
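
A small sketch of what "surgery on the blueprint" can look like, assuming rlang is available. The `threshold` variable and the toy data frame are hypothetical and only serve the illustration.

```r
library(rlang)

# Hypothetical value chosen by the programmer, not by the user
threshold <- 200

# !! evaluates `threshold` immediately and inlines its value in the quoted code
blueprint <- expr(height < !!threshold)
blueprint
#> height < 200

# The modified blueprint can then be evaluated in a data context
result <- eval_tidy(blueprint, data = data.frame(height = c(172, 202)))
```

Without `!!`, the quoted code would contain the symbol `threshold` itself rather than its value.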

  9. Two flavours of tidy evaluation

  10. Two flavours
    starwars %>% mutate(birth_year - 100)
    starwars %>% group_by(birth_year)
    starwars %>% select(birth_year)
    starwars %>% filter(birth_year < 50)
    One of these things is not like the other things!

  11. Two flavours
    starwars %>% mutate(birth_year - 100)
    starwars %>% group_by(birth_year)
    starwars %>% select(birth_year)
    starwars %>% filter(birth_year < 50)
    One of these things is not like the other things!
    Action: mutate(), group_by(), filter()
    Selection: select()

  12. Most verbs take actions
    1. New vectors are created
    2. The data frame is modified
    starwars %>% mutate(birth_year - 100)
    tmp <- starwars$birth_year - 100
    starwars$`birth_year - 100` <- tmp

  13. Some verbs take selections
    1. The position of columns is looked up
    2. The data frame is reorganised
    starwars %>% select(birth_year)
    tmp <- match("birth_year", colnames(starwars))
    starwars[, tmp]

  14. starwars %>% select(c(1, height))
    starwars %>% select(1:height)
    starwars %>% select(-1, -height)
    Selections have special properties
    1. c(), `-` and `:` understand positions and names
    2. Selection helpers know about current variables
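
To make "selections resolve to positions" concrete, here is a hedged sketch using `eval_select()` from the tidyselect package, which evaluates a selection against a data frame and returns named column positions. The `df` data frame is a hypothetical stand-in for starwars.

```r
library(rlang)
library(tidyselect)

# Hypothetical toy data; only the column names and their order matter here
df <- data.frame(name = "Luke", height = 172, mass = 77)

# `:` understands both positions and names
by_range <- eval_select(expr(1:height), df)

# `-` removes columns from the selection
by_negation <- eval_select(expr(-1), df)

# Helpers such as ends_with() know about the current variables
by_helper <- eval_select(expr(ends_with("ss")), df)
```

Each result is a named integer vector of positions, which is what a selecting verb uses to reorganise the data frame.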

  15. starwars %>% select(ends_with("color"))
    starwars %>% select(matches("^[nm]a"))
    starwars %>% select(10, everything())
    1. c(), `-` and `:` understand positions and names
    2. Selection helpers know about current variables
    Selections have special properties

  16. Sometimes they appear to work the same way...
    starwars %>% select(height)
    # A tibble: 87 x 1
    height

    1 172
    2 167
    3 96
    # … with 84 more rows
    starwars %>% transmute(height)
    # A tibble: 87 x 1
    height

    1 172
    2 167
    3 96
    # … with 84 more rows

  17. starwars %>% select(1)
    # A tibble: 87 x 1
    name

    1 Luke Skywalker
    2 C-3PO
    3 R2-D2
    # … with 84 more rows
    starwars %>% transmute(1)
    # A tibble: 87 x 1
    `1`

    1 1
    2 1
    3 1
    # … with 84 more rows
    Sometimes they appear to work the same way...

  18. What about group_by()?
    starwars %>% group_by(gender)
    # A tibble: 87 x 13
    # Groups: gender [5]
    name height mass hair_color skin_color eye_color

    1 Luke… 172 77 blond fair blue
    2 C-3PO 167 75 NA gold yellow
    3 R2-D2 96 32 NA white, bl… red
    # … with 84 more rows, and 7 more variables

  19. What about group_by()? It takes actions!
    starwars %>% group_by(ends_with("color"))
    Error: No tidyselect variables were registered

  20. What about group_by()? It takes actions!
    starwars %>%
    group_by(height > 170) %>%
    summarise(n())
    # A tibble: 3 x 2
    `height > 170` `n()`

    1 FALSE 27
    2 TRUE 54
    3 NA 6

  21. Tip: Use the _at dplyr variants to pass selections!
    starwars %>% group_by_at(vars(ends_with("color")))
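
A runnable sketch of the tip on a small data set, so the resulting groups can be inspected. The `df` tibble is hypothetical. In dplyr 1.0 and later the `_at` variants are superseded by `across()`, though they still work.

```r
library(dplyr)

# Hypothetical toy data with two "color" columns
df <- tibble(
  hair_color = c("blond", "blond", "brown"),
  eye_color  = c("blue", "blue", "brown"),
  height     = c(172, 180, 165)
)

# group_by() takes actions, so the selection helper goes through vars()
grouped <- df %>% group_by_at(vars(ends_with("color")))

# Both colour columns are now grouping variables
group_vars(grouped)
```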

  22. Creating tidy eval functions

  23. Three challenges
    1. Taking user selections or actions like a tidy eval function
    2. Modifying these selections or actions
    3. Passing selections or actions to other tidy eval functions

  26. Passing the dots

  27. • Tidy eval, the easy way
    • Make use of existing components
    • Solves challenges 1 and 3
    • Limited but useful!
    my_component <- function(.data, ...) {
    .data %>% summarise(...)
    }
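
A short usage sketch for `my_component()`, assuming dplyr is attached. The `df` tibble and the summary names `avg` and `n` are hypothetical. Because the dots are forwarded untouched, the wrapper accepts the same actions as summarise() and respects groups.

```r
library(dplyr)

my_component <- function(.data, ...) {
  # Forward the user's actions to summarise() unchanged
  .data %>% summarise(...)
}

# Hypothetical toy data
df <- tibble(g = c("a", "a", "b"), x = c(1, 3, 10))

# Named actions are passed through the dots
out <- df %>% my_component(avg = mean(x), n = n())

# Grouped input works too, since summarise() handles the groups
by_group <- df %>% group_by(g) %>% my_component(avg = mean(x))
```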

  28. Three examples
    1. Create a selection verb with dplyr
    2. Transform that verb to take actions instead
    3. Add tidyr step to the pipeline
    Pass the dots ... !

  29. • Most dplyr verbs have variants suffixed with _at
    • They take selections within vars(...)
    • mutate_at() and summarise_at() apply a function on each of those vars
    1. Selections with dplyr
    mutate_at(), summarise_at(), filter_at(), rename_at(), arrange_at()

  30. starwars %>%
    summarise_at(
    vars(height, mass),
    ~ mean(., na.rm = TRUE)
    )
    starwars %>%
    summarise(
    height = mean(height, na.rm = TRUE),
    mass = mean(mass, na.rm = TRUE)
    )
    # A tibble: 1 x 2
    height mass

    1 174. 97.3

  31. • Supports multiple functions
    • Results spread across columns
    # The list names (mean, sd) become the column suffixes
    summary_functions <- list(
    mean = ~ mean(., na.rm = TRUE),
    sd = ~ sd(., na.rm = TRUE)
    )
    starwars %>%
    summarise_at(
    vars(height:mass),
    summary_functions
    )
    # A tibble: 1 x 4
    height_mean mass_mean height_sd mass_sd

    1 174. 97.3 34.8 169.

  32. summarise_sels <- function(.data, ...) {
    summarise_at(.data, vars(...), summary_functions)
    }
    Pass the dots ... !

  33. starwars %>%
    summarise_sels(height:mass)
    # A tibble: 1 x 4
    height_mean mass_mean height_sd mass_sd

    1 174. 97.3 34.8 169.

  34. starwars %>%
    group_by(gender) %>%
    summarise_sels(height, mass)
    # A tibble: 5 x 5
    gender height_mean mass_mean height_sd mass_sd

    1 female 165. 54.0 23.0 8.37
    2 hermaphrodite 175 1358 NA NA
    3 male 179. 81.0 35.4 28.2
    4 none 200 140 NA NA
    5 NA 120 46.3 40.7 24.8
    Works with groups!

  35. • How could we pass actions instead of selections?
    • In dplyr, transmute() is the fundamental action verb
    • Returns as many columns as supplied actions
    2. Actions with dplyr

  36. summarise_acts <- function(.data, ...) {
    .data %>%
    transmute(...) %>%
    summarise_all(summary_functions)
    }
    summarise_sels <- function(.data, ...) {
    .data %>% summarise_at(vars(...), summary_functions)
    }
    Pass the dots ... !

  37. starwars %>%
    summarise_acts(
    heightm = height / 100,
    bmi = mass / heightm^2
    )
    # A tibble: 1 x 4
    heightm_mean bmi_mean heightm_sd bmi_sd

    1 1.74 32.0 0.348 54.9

  38. starwars %>%
    group_by(gender) %>%
    summarise_acts(
    heightm = height / 100,
    bmi = mass / heightm^2
    )
    # A tibble: 5 x 5
    gender heightm_mean bmi_mean heightm_sd bmi_sd

    1 female 1.65 18.8 0.230 3.71
    2 hermaphrodite 1.75 443. NA NA
    3 male 1.79 25.7 0.354 6.49
    4 none 2 35 NA NA
    5 NA 1.2 31.9 0.407 4.33

  39. • What if we'd like to gather results across rows?
    • Let's develop the pipeline with a tidyr step
    • Handling groups will be trickier
    3. Gather with tidyr

  40. gather_summarise_acts <- function(.data, ...) {
    .data %>%
    transmute(...) %>%
    gather("Variable", "Value", everything()) %>%
    group_by(Variable) %>%
    summarise_at(vars(Value), summary_functions)
    }
    Pass the dots ... !

  41. starwars %>%
    gather_summarise_acts(
    heightm = height / 100,
    bmi = mass / heightm^2
    )
    # A tibble: 2 x 3
    Variable mean sd

    1 bmi 32.0 54.9
    2 heightm 1.74 0.348

  42. starwars %>%
    group_by(gender) %>%
    gather_summarise_acts(
    heightm = height / 100,
    bmi = mass / heightm^2
    )
    Warning messages:
    1: In mean.default(Value, na.rm = TRUE) :
    argument is not numeric or logical: returning NA
    2: In mean.default(Value, na.rm = TRUE) :
    argument is not numeric or logical: returning NA
    3: In mean.default(Value, na.rm = TRUE) :
    argument is not numeric or logical: returning NA
    • gather() also gathers the grouping variables
    • Summaries can't be applied to character vectors

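  43. gather_summarise_acts <- function(.data, ...) {
    .data %>%
    transmute(...) %>%
    # Keep the grouping variables out of the gathered columns
    gather("Variable", "Value", -one_of(group_vars(.))) %>%
    # .add = TRUE (add = TRUE before dplyr 1.0) keeps the existing groups
    group_by(Variable, .add = TRUE) %>%
    summarise_at(vars(Value), summary_functions)
    }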
    Solution: Remove the grouping variables from the gathering

  44. starwars %>%
    group_by(gender) %>%
    gather_summarise_acts(
    heightm = height / 100,
    bmi = mass / heightm^2
    )
    # A tibble: 10 x 4
    # Groups: gender [?]
    gender Variable mean sd

    1 female bmi 18.8 3.71
    2 female heightm 1.65 0.230
    3 hermaphrodite bmi 443. NA
    4 hermaphrodite heightm 1.75 NA
    5 male bmi 25.7 6.49
    6 male heightm 1.79 0.354
    7 none bmi 35 NA
    8 none heightm 2 NA
    9 NA bmi 31.9 4.33
    10 NA heightm 1.2 0.407

  45. • Pass dots to create tidy eval functions easily
    • Do you need actions or selections?
    • The _at variants and transmute() are useful
    • Requires knowledge of tidyverse verbs — transferable
    • Think about grouped tibbles
    summary()
