purrr slides

Jennifer (Jenny) Bryan
August 12, 2016

Transcript

  1. DRAFT: these are not slides from a talk! I refer to them before and during live coding while teaching STAT 545 and DSCI 523, so don't expect them to stand on their own.
    More material is developing here: https://jennybc.github.io/purrr-tutorial/index.html
  2. what is purrr?
    functional programming blah blah blah
    ok, I admit it: FP is not actually front of mind when I use purrr
  3. what does purrr help me do?
    - iterate in a data-structure-informed way
    - tolerate list-columns in data frames
    with a consistent UI across a large family of functions, and return values that are ready for further computation
  4. for every X do Y, return combined results like Z
    X and Z will make reference to actual R data structures
    Y will be a function, possibly anonymous
    like for i in 1 to n … but much higher level
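As a rough illustration of "like for i in 1 to n, but much higher level", here is the same computation written both ways: an explicit for loop over indices, and purrr's map_dbl(). The input list is made up for the example, and the sketch assumes purrr is installed.

```r
library(purrr)

# Toy input, made up for illustration: a list of three integer vectors
x <- list(1:3, 4:6, 7:9)

# "for i in 1 to n": preallocate the output, loop over indices, fill it in
out_loop <- numeric(length(x))
for (i in seq_along(x)) {
  out_loop[i] <- mean(x[[i]])
}

# "for every X do Y, return Z": X = each element of x, Y = mean(), Z = a double vector
out_map <- map_dbl(x, mean)

identical(out_loop, out_map)
```

The map_dbl() version has no index bookkeeping and no preallocation, and the "_dbl" suffix states up front what kind of Z comes back.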
  5. iterate in a data-structure-informed way
    for every GitHub username, do GET https://api.github.com/users/username and give me the HTTP responses in a list
    https://jennybc.github.io/purrr-tutorial/ex03_github-api-json.html
  6. iterate in a data-structure-informed way
    for every HTTP response, extract the “name” element and give me a character vector
    https://jennybc.github.io/purrr-tutorial/ex03_github-api-json.html
  7. iterate in a data-structure-informed way
    for every HTTP response, extract the elements "login", "name", "id", "location" and give me a data frame
    https://jennybc.github.io/purrr-tutorial/ex03_github-api-json.html
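A minimal sketch of the "extract name" and "extract four elements" steps, assuming the HTTP responses have already been fetched and parsed into R lists. The gh_users object is a hypothetical, hand-made stand-in (no network call is made, and real responses carry many more fields). map_df() needs dplyr installed.

```r
library(purrr)

# Hypothetical stand-in for parsed responses from https://api.github.com/users/username:
# one named list per user; logins, names, ids and locations are made up
gh_users <- list(
  list(login = "user_a", name = "User A", id = 1L, location = "Vancouver"),
  list(login = "user_b", name = "User B", id = 2L, location = "Toronto")
)

# for every response, extract "name" and give me a character vector
# (a string used as .f is purrr's extractor shortcut)
user_names <- map_chr(gh_users, "name")

# for every response, extract four elements and give me a data frame:
# `[` keeps a named sublist per user, and map_df() binds those into rows
user_df <- map_df(gh_users, `[`, c("login", "name", "id", "location"))
```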
  8. iterate in a data-structure-informed way
    for every row in a data frame, create a MIME object and give me a list
    https://jennybc.github.io/purrr-tutorial/ex20_bulk-gmail.html
  9. iterate in a data-structure-informed way
    for every MIME object, send an email and return the send status as a list
    https://jennybc.github.io/purrr-tutorial/ex20_bulk-gmail.html
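Here is a sketch of the row-wise pattern from the bulk-email slides. It does not build real MIME objects or send anything: each row becomes a plain named list, and fake_send() is a hypothetical stub standing in for a real sending function. The addresses and messages are made up.

```r
library(purrr)

# Made-up recipients; each row will become one message
addresses <- data.frame(
  to      = c("a@example.com", "b@example.com"),
  subject = c("Your mark", "Your mark"),
  body    = c("You earned 90.", "You earned 85."),
  stringsAsFactors = FALSE
)

# for every row in a data frame, create a message and give me a list:
# a data frame is a list of columns, so pmap() walks it row by row and
# passes each column's value as the argument with the matching name
emails <- pmap(addresses, function(to, subject, body) {
  list(to = to, subject = subject, body = body)  # plain list, NOT a MIME object
})

# hypothetical sender stub: a real one would call an email API;
# this one only reports which address it "sent" to
fake_send <- function(msg) list(to = msg$to, status = "sent")

# for every message, "send" it and return the send status as a list
send_status <- map(emails, fake_send)
```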
  10. iterate in a data-structure-informed way
    for every tuple (string, positions where substrings start, positions where substrings end), extract the substrings and give me a list of character vectors
    https://jennybc.github.io/purrr-tutorial/ex10_trump-tweets.html
  11. map(.x, .f, ...)
    .x is a vector
    “for every X” = for every element of .x
    remember: lists are vectors
    remember: data frames are lists
  12. map(.x, .f, ...)
    .f is a function, possibly specified with shortcuts (all shown in the worked examples)
    “do Y” = .f(.x[[i]], ...)
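A few ways to specify .f, run on a made-up list. The formula form (~ sum(.x)) and the extractor form (a position or a name) are purrr shortcuts; the rest is plain R. The last line relies on a data frame being a list of columns.

```r
library(purrr)

# Toy input, made up for illustration
x <- list(a = 1:3, b = 4:6)

# .f as an ordinary function; extra arguments travel through `...`
# ("do Y" = .f(.x[[i]], ...), here paste(x[[i]], collapse = "-"))
pasted <- map(x, paste, collapse = "-")

# .f as an anonymous function, spelled out...
sums_fun <- map(x, function(v) sum(v))
# ...or via the formula shortcut, where .x stands for the current element
sums_formula <- map(x, ~ sum(.x))

# extractor shortcut: a position (or a name) pulls that piece out of each element
seconds <- map(x, 2)

# a data frame is a list, so map() iterates over its columns
col_means <- map(mtcars[c("mpg", "wt")], mean)
```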
  13. “give me a Z”
    map_lgl(.x, .f, ...)
    map_chr(.x, .f, ...)
    map_int(.x, .f, ...)
    map_dbl(.x, .f, ...)
    each returns an atomic vector of the requested type
  14. “give me a Z”
    map_df(.x, .f, ..., .id = NULL)
    basically: map() then dplyr::bind_rows()
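The typed variants on a toy list. The map_int(x, range) line is there on purpose, to show the check failing loudly when .f returns something that is not a single value. map_df() requires dplyr; its .id argument adds a column holding the names of .x.

```r
library(purrr)

# Toy input, made up for illustration
x <- list(a = 1:3, b = 4:6)

# each typed variant returns an atomic vector of the requested type
# (named, because x is named)
n_elems <- map_int(x, length)                       # integer
avgs    <- map_dbl(x, mean)                         # double
pasted  <- map_chr(x, ~ paste(.x, collapse = "-"))  # character
has_big <- map_lgl(x, ~ any(.x > 5))                # logical

# range() returns 2 values per element, so map_int() errors instead of
# quietly handing back some other shape
bad <- tryCatch(map_int(x, range), error = function(e) "error")

# map_df(): map() then dplyr::bind_rows(); .id = "name" records which element each row came from
summary_df <- map_df(x, ~ data.frame(min = min(.x), max = max(.x)), .id = "name")
```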
  15. “for every X”
    map2(.x, .y, .f, ...): X = (element i of .x, element i of .y)
    pmap(.l, .f, ...): X = tuple of the i-th elements of the lists in .l
    remember: a data frame is a list!
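map2() and pmap() on toy inputs. Passing a data frame to pmap() works because a data frame is a list of columns; here the column names were chosen to match rnorm()'s n, mean and sd arguments, so each row turns into one call to rnorm().

```r
library(purrr)

# Toy inputs, made up for illustration
x <- c(1, 2, 3)
y <- c(10, 20, 30)

# map2(): X = (element i of .x, element i of .y)
sums <- map2_dbl(x, y, ~ .x + .y)

# pmap(): X = tuple of the i-th elements of the lists in .l;
# row i becomes rnorm(n = n[i], mean = mean[i], sd = sd[i])
params <- data.frame(n = c(1, 3), mean = c(0, 100), sd = c(1, 0.1))
set.seed(2016)
draws <- pmap(params, rnorm)
```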
  16. how might you be doing such things today?
    maybe you don’t, because you don’t know how
    - for loops
    - apply(), [slvmt]apply(), split(), by()
    - the plyr package: [adl][adl_]ply()
    - with dplyr: df %>% group_by() %>% do()
  17. this is not my first R rodeo
    I have gone through intense, evangelical phases of iterating with the base “apply” functions and with plyr
    I highly recommend you give purrr a try
  18. relationship to base R approaches
    there’s nothing you can do with purrr that you cannot do with base R
    specifically: map() is basically lapply()
    main reasons to use purrr:
    - shortcuts facilitate anonymous functions for .f
    - greater encouragement of type safety
    - consistent API across a large family of functions
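A quick check of the "map() is basically lapply()" claim, plus one small case of the type-safety point: sapply() picks its output shape from the results, so empty input gives back an empty list, while map_dbl() always returns a double vector. The comments on the last two lines show what they print.

```r
library(purrr)

# Toy input, made up for illustration
x <- list(a = 1:3, b = 4:6)

# map() is basically lapply()
same_result <- identical(map(x, mean), lapply(x, mean))

# the formula shortcut stands in for a spelled-out anonymous function
same_doubled <- identical(map(x, ~ .x * 2), lapply(x, function(v) v * 2))

# output type depends on the input for sapply(), but not for map_dbl()
sapply(list(), length)   # list()
map_dbl(list(), length)  # numeric(0)
```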
  19. tolerate list-columns in data frames
    tidyverse lifestyle ~ work in a data frame when possible
    what about stuff that can’t be stored as an atomic vector?
    - stick it in a list-column
    but list-columns are awful!
    - get better at inspecting lists
    - get better at computing on lists: use purrr::map() and friends, probably inside dplyr::mutate()
  20. tolerate list-columns in data frames
    tidyverse lifestyle ~ work in a data frame when possible
    ok, there’s a whole section I want to write here, with more worked examples on the site, etc., but that’s not happening this round
    what follows are a few hints of what I will say
  21. every time someone asks “how can I iterate over a list, but also access the index i or the list names at the same time?”
    they should probably be working inside a data frame, with a list-column and a variable for i or the names
    use tibble::enframe() on your vexing_list and have at it with mutate(new_var = map_*(vexing_list, f)), or map2(), or pmap()
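A minimal sketch of the enframe() advice, assuming tibble, dplyr and purrr are installed. vexing_list is a made-up example. By default enframe() puts the names in a name column and the elements in a value list-column, so value is the column the map_*() calls iterate over, and the index is just another variable.

```r
library(dplyr)
library(purrr)
library(tibble)

# Made-up "vexing list": we want each element together with its name and index
vexing_list <- list(alpha = 1:3, beta = c("x", "y"), gamma = TRUE)

df <- enframe(vexing_list) %>%   # columns: name (character), value (list-column)
  mutate(
    i     = row_number(),                     # the index i as an ordinary variable
    n     = map_int(value, length),           # map_*() over the list-column
    label = map2_chr(name, n, ~ paste0(.x, " has ", .y, " element(s)"))
  )
```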
  22. a great example is Gapminder
    draw on http://r4ds.had.co.nz/many-models.html and the STAT 545 Gapminder materials (translate from plyr and dplyr)
    - it’s natural to nest at the country level and put the data in a list-column
    - fit models, etc. by mutating the data list-column
    - extract model summaries by mutating the fits with broom functions
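A small version of the nest-then-model workflow, with swapped-in data: the built-in mtcars stands in for Gapminder and cyl stands in for country, so nothing needs downloading. It assumes dplyr, purrr and tidyr 1.0 or later (for the nest(data = ...) syntax). Instead of broom, the summaries are pulled out by hand with coef() and summary(); broom::glance() or broom::tidy() would be the slide's suggested route.

```r
library(dplyr)
library(purrr)
library(tidyr)

by_cyl <- mtcars %>%
  nest(data = -cyl) %>%                                # one row per cyl; other columns go into a list-column
  mutate(
    fit       = map(data, ~ lm(mpg ~ wt, data = .x)),  # fit one model per group
    slope     = map_dbl(fit, ~ coef(.x)[["wt"]]),      # pull a scalar out of each fit
    r_squared = map_dbl(fit, ~ summary(.x)$r.squared)
  ) %>%
  arrange(cyl)
```

Every step is a mutate() on a list-column, and the last two steps bring the results back to ordinary double columns, which is the "get back to something simpler" move.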
  23. a more far-out example is https://jennybc.github.io/purrr-tutorial/ex24_xml-wrangling.html
    I put XML nodesets in a data frame; each row is one row of a Google Sheet
    I proceed to wrangle it on the way to getting the cell contents
  24. also, just to be clear: no one in their right mind enjoys having list-columns in a data frame
    but the benefits often outweigh the costs, especially if you have the right tools and a productive mindset
    it’s always a temporary state; the goal is always to get back to something simpler
  25. tweets:
    - My economic policy speech will be carried live at 12:15 P.M. Enjoy!
    - Join me in Fayetteville, North Carolina tomorrow evening at 6pm. Tickets now available at: https://t.co/Z80d4MYIg8
    - The media is going crazy. They totally distort so many things on purpose. Crimea, nuclear, "the baby" and so much more. Very dishonest!
    - I see where Mayor Stephanie Rawlings-Blake of Baltimore is pushing Crooked hard. Look at the job she has done in Baltimore. She is a joke!
    - Bernie Sanders started off strong, but with the selection of Kaine for V.P., is ending really weak. So much for a movement! TOTAL DISRESPECT
    - Crooked Hillary Clinton is unfit to serve as President of the U.S. Her temperament is weak and her opponents are strong. BAD JUDGEMENT!
    - The Cruz-Kasich pact is under great strain. This joke of a deal is falling apart, not being honored and almost dead. Very dumb!
    match_first: [[1]] -1, [[2]] -1, [[3]] 20, [[4]] 134, [[5]] 28 95, [[6]] 87 114, [[7]] 50 112 123
    match_last: [[1]] -3, [[2]] -3, [[3]] 24, [[4]] 137, [[5]] 33 98, [[6]] 90 119, [[7]] 53 115 126
    substring(text, first, last)
    pmap(list(text = tweets, first = match_first, last = match_last), substring)
    https://jennybc.github.io/purrr-tutorial/ex10_trump-tweets.html
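The same (string, first, last) pattern end to end, on made-up strings (not the tweets above) and a made-up regex. Here gregexpr() supplies the start positions (-1 where nothing matches) and the match lengths, and the end positions are computed from those. That arithmetic gives the same -1 / -3 pair for a non-match that appears in the slide's output, and substring() returns an empty string for it.

```r
library(purrr)

# Made-up strings and pattern, for illustration only
tweets <- c(
  "purrr makes iteration tidy",
  "no match here",
  "map and map2 and pmap"
)

# gregexpr(): per string, start positions of all matches (-1 if none),
# with the match lengths stored in the "match.length" attribute
matches     <- gregexpr("map[0-9]?|purrr", tweets)
match_first <- map(matches, as.vector)  # drop the attributes, keep the starts
match_last  <- map2(matches, match_first, ~ .y + attr(.x, "match.length") - 1)

# for every tuple (string, starts, ends), extract the substrings
substrings <- pmap(list(text = tweets, first = match_first, last = match_last), substring)
```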