Slide 1

Slide 1 text

purrr DRAFT

Slide 2

Slide 2 text

DRAFT
these are not slides from a talk!
I refer to them before and during live coding while teaching STAT 545 and DSCI 523
don’t expect them to stand on their own
more material developing here: https://jennybc.github.io/purrr-tutorial/index.html

Slide 3

Slide 3 text

what is purrr?
functional programming blah blah blah
ok I admit it: FP is not actually front of mind when I use purrr

Slide 4

Slide 4 text

what does purrr help me do?
iterate in a data-structure-informed way
tolerate list-columns in data frames
with a consistent UI across a large family of functions
and return values that are ready for further computation

Slide 5

Slide 5 text

for every X
do Y
return combined results like Z

Slide 6

Slide 6 text

for every X
do Y
return combined results like Z
X and Z will make reference to actual R data structures
Y will be a function, possibly anonymous
like for i in 1 to n … but much higher level
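A minimal sketch of the "like for i in 1 to n, but higher level" point, assuming a small made-up list `x`. The loop and the map() call are meant to produce the same result; the difference is who manages the index and the output container.

```r
library(purrr)

# Made-up input: a list of integer vectors
x <- list(1:3, 4:6, 7:9)

# for loop: you allocate the output and manage the index yourself
out_loop <- vector("list", length(x))
for (i in seq_along(x)) {
  out_loop[[i]] <- sum(x[[i]])
}

# map(): for every element of x (X), do sum (Y), return a list (Z)
out_map <- map(x, sum)
```
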

Slide 7

Slide 7 text

iterate in a data-structure-informed way
for every GitHub username
do GET https://api.github.com/users/username
and give me HTTP responses in a list
https://jennybc.github.io/purrr-tutorial/ex03_github-api-json.html

Slide 8

Slide 8 text

iterate in a data-structure-informed way
for every HTTP response
extract the “name” element
and give me a character vector
https://jennybc.github.io/purrr-tutorial/ex03_github-api-json.html

Slide 9

Slide 9 text

iterate in a data-structure-informed way
for every HTTP response
extract the elements "login", "name", "id", "location"
and give me a data frame
https://jennybc.github.io/purrr-tutorial/ex03_github-api-json.html
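A minimal sketch of the two extraction patterns, assuming a small hand-made list stands in for the parsed API responses. The `users` records below are invented for illustration and are not real GitHub output; real responses can have missing fields, which this sketch does not handle.

```r
library(purrr)

# Hypothetical stand-in for parsed JSON responses, one list per user
users <- list(
  list(login = "alice", name = "Alice A", id = 1L, location = "Vancouver"),
  list(login = "bob",   name = "Bob B",   id = 2L, location = "Toronto")
)

# "extract the name element and give me a character vector"
user_names <- map_chr(users, "name")

# "extract several elements and give me a data frame"
# (map_df() needs dplyr installed, since it binds rows with bind_rows())
user_df <- map_df(users, `[`, c("login", "name", "id", "location"))
```
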

Slide 10

Slide 10 text

iterate in a data-structure-informed way
for every row in a data frame
create a MIME object
and give me a list
https://jennybc.github.io/purrr-tutorial/ex20_bulk-gmail.html

Slide 11

Slide 11 text

iterate in a data-structure-informed way
for every MIME object
send an email
and return send status as a list
https://jennybc.github.io/purrr-tutorial/ex20_bulk-gmail.html

Slide 12

Slide 12 text

iterate in a data-structure-informed way
for every tuple (string, pos of substring starts, pos of substring ends)
extract the substrings
and give me a list of character vectors
https://jennybc.github.io/purrr-tutorial/ex10_trump-tweets.html

Slide 13

Slide 13 text

inspect
query
modify

Slide 14

Slide 14 text

inspect
str()
str(my_list, max.level = 1)
str(my_list[[i]], list.len = 10)
listviewer::jsonedit()
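A small sketch of the str() calls, assuming a made-up nested list named `my_list`. listviewer::jsonedit() is interactive, so it is left out here.

```r
# Hypothetical nested list, made up for illustration
my_list <- list(
  a = list(id = 1L, tags = c("x", "y"), meta = list(ok = TRUE)),
  b = list(id = 2L, tags = character(0), meta = list(ok = FALSE))
)

# Top level only: how many elements are there, and what is each one?
str(my_list, max.level = 1)

# Drill into one element, capping how many components get printed
str(my_list[[1]], list.len = 2)
```
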

Slide 15

Slide 15 text

map(.x, .f, ...)

Slide 16

Slide 16 text

map(.x, .f, ...)
.x is a vector
“for every X” = for every element of .x
remember lists are vectors
remember data frames are lists

Slide 17

Slide 17 text

map(.x, .f, ...)
.f is a function
possibly specified with shortcuts
all shown in the worked examples
“do Y” = .f(.x[[i]], ...)
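A sketch of the usual ways to specify .f, assuming a made-up list `x`. The first three calls are meant to be equivalent; the last one uses the position shortcut to extract an element instead of calling a function.

```r
library(purrr)

# Made-up input
x <- list(c(1, 2, 3), c(10, 20))

# .f as a named function; extra arguments travel through ...
by_name <- map(x, mean, na.rm = TRUE)

# .f as an anonymous function
by_anon <- map(x, function(v) mean(v))

# .f as a formula shortcut; .x stands for the current element
by_formula <- map(x, ~ mean(.x))

# .f as a position (or a name): shorthand for "extract that element"
firsts <- map(x, 1)
```
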

Slide 18

Slide 18 text

“give me a Z”
map(.x, .f, ...) can be thought of as map_list(.x, .f, ...)

Slide 19

Slide 19 text

“give me a Z”
map_lgl(.x, .f, ...)
map_chr(.x, .f, ...)
map_int(.x, .f, ...)
map_dbl(.x, .f, ...)
each returns an atomic vector of the requested type
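A quick sketch of the typed variants, assuming a made-up list `x`. Each call either returns a vector of the promised type or fails.

```r
library(purrr)

# Made-up input
x <- list(1:3, 4:6, 7:9)

lens   <- map_int(x, length)                    # integer vector
means  <- map_dbl(x, mean)                      # double vector
labels <- map_chr(x, ~ paste(.x, collapse = "-"))  # character vector
big    <- map_lgl(x, ~ all(.x > 3))             # logical vector
```
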

Slide 20

Slide 20 text

“give me a Z”
map_df(.x, .f, ..., .id = NULL)
basically: map() then dplyr::bind_rows()
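A sketch of the "map() then bind_rows()" reading, assuming a made-up named list of data frames `dfs` and a hypothetical helper `add_double()`. Both routes should give the same data frame, with the list names landing in the `.id` column.

```r
library(purrr)
library(dplyr)

# Hypothetical named list of small data frames
dfs <- list(
  a = data.frame(x = 1:2, y = c("p", "q"), stringsAsFactors = FALSE),
  b = data.frame(x = 3L,  y = "r",         stringsAsFactors = FALSE)
)

# Hypothetical helper: add one derived column
add_double <- function(d) mutate(d, x2 = x * 2)

# One step
one_step <- map_df(dfs, add_double, .id = "source")

# Two steps, spelled out
two_step <- bind_rows(map(dfs, add_double), .id = "source")
```
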

Slide 21

Slide 21 text

“give me a Z”
walk(.x, .f, ...) can be thought of as map_nothing(.x, .f, ...)
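A tiny sketch of walk(), assuming a made-up character vector. The function is called only for its side effect, and the input comes back invisibly.

```r
library(purrr)

x <- c("a", "b", "c")

# walk() calls .f for its side effect (here: printing a line per element)
out <- walk(x, ~ cat("processing", .x, "\n"))

# out is just x again, returned invisibly, so walk() can sit mid-pipe
```
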

Slide 22

Slide 22 text

“for every X”
map2(.x, .y, .f, ...)
X = (element i of .x, element i of .y)
pmap(.l, .f, ...)
X = tuple of the i-th elements of the lists in .l
remember a data frame is a list!
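A sketch of parallel iteration, assuming made-up inputs. With pmap(), the names in .l are matched to the argument names of .f, which is why a data frame with the right column names can be passed directly.

```r
library(purrr)

# map2(): walk along two vectors in parallel
sums <- map2_dbl(c(1, 2, 3), c(10, 20, 30), ~ .x + .y)

# pmap(): walk along any number of vectors, matched to .f's arguments by name
args <- list(x = c("apple", "banana"), start = c(1, 2), stop = c(3, 4))
pieces <- pmap_chr(args, substr)

# a data frame is a list of columns, so pmap() can go row by row
df <- data.frame(x = c("a", "b"), times = c(2, 3), stringsAsFactors = FALSE)
repeated <- pmap_chr(df, strrep)
```
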

Slide 23

Slide 23 text

how might you do such things today?
maybe you don’t, because you don’t know how
for loops
apply(), [slvmt]apply(), split(), by()
the plyr package: [adl][adl_]ply()
with dplyr: df %>% group_by() %>% do()

Slide 24

Slide 24 text

this is not my first R rodeo
I have gone through intense, evangelical phases of iterating with base “apply” functions and plyr
I highly recommend you give purrr a try

Slide 25

Slide 25 text

relationship to base R approaches
there’s nothing you can do with purrr that you cannot do with base
specifically: map() is basically lapply()
main reasons to use purrr:
- shortcuts facilitate anonymous functions for .f
- greater encouragement for type-safety
- consistent API across large family of functions
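A sketch of the base R comparison, assuming a made-up list `x`. It shows map() agreeing with lapply(), and one place where the typed variants are more predictable than sapply(): empty input.

```r
library(purrr)

# Made-up input
x <- list(a = 1:3, b = 4:6)

# map() and lapply() give the same list here
same <- identical(map(x, length), lapply(x, length))

# sapply() chooses the return type based on what comes back ...
sapply_full  <- sapply(x, length)        # named integer vector
sapply_empty <- sapply(list(), length)   # an empty list, not integer(0)

# ... while map_int() returns an integer vector or fails
map_full  <- map_int(x, length)
map_empty <- map_int(list(), length)     # integer(0)
```
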

Slide 26

Slide 26 text

tolerate list-columns in data frames
tidyverse lifestyle ~ work in a data frame when possible
what about stuff that can’t be stored as an atomic vector?
- stick it in a list-column
but list-columns are awful!
- get better at inspecting lists
- get better at computing on lists
use purrr::map() and friends
- probably inside dplyr::mutate()

Slide 27

Slide 27 text

tolerate list-columns in data frames
tidyverse lifestyle ~ work in a data frame when possible
ok there’s a whole section I want to write here, with more worked examples on the site, etc.
but that’s not happening this round
what follows are a few hints of what I will say

Slide 28

Slide 28 text

every time someone asks:
how can I iterate over a list, but also access the index i or the list names at the same time?
they should probably be working inside a data frame, with a list-column and a variable for i or the names
use tibble::enframe() on your vexing_list
and have at it with mutate(new_var = map_*(vexing_list, f))
or map2() or pmap()
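A sketch of the enframe() move, assuming a made-up `vexing_list`. enframe() turns the named list into a two-column data frame (`name` and a list-column `value`), so the names and the index become ordinary variables that map2() can see.

```r
library(purrr)
library(tibble)
library(dplyr)

# Made-up named list
vexing_list <- list(first = 1:3, second = 4:9, third = 10L)

df <- enframe(vexing_list) %>%   # columns: name, value (a list-column)
  mutate(
    i   = seq_along(value),      # the index, as an ordinary variable
    n   = map_int(value, length),
    lab = map2_chr(name, n, ~ paste0(.x, " has ", .y, " elements"))
  )
```
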

Slide 29

Slide 29 text

a great example is Gapminder
draw on http://r4ds.had.co.nz/many-models.html
and STAT 545 Gapminder materials (translate from plyr and dplyr)
natural to nest at country level and put data in list-column
fit models, etc. by mutating the data list-column
extract model summaries by mutating the fits with broom functions
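A sketch of the nest, fit, extract pattern. To stay self-contained it uses the built-in mtcars data nested by `cyl` rather than Gapminder nested by country; that swap, the model formula, and the use of coef() in place of broom are all assumptions made for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

by_cyl <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%        # one row per group; the rest of the data sits in a list-column
  ungroup() %>%
  mutate(
    fit   = map(data, ~ lm(mpg ~ wt, data = .x)),   # list-column of fitted models
    slope = map_dbl(fit, ~ coef(.x)[["wt"]])        # back to an atomic column
  )
```

The point of the last step is the one the slides make: the list-columns are a temporary state, and the summary column is the simpler thing you wanted.
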

Slide 30

Slide 30 text

a more far out example is https://jennybc.github.io/purrr-tutorial/ex24_xml-wrangling.html
where I put XML nodesets in a data frame
each row is one row of a Google Sheet
I proceed to wrangle it on the way to getting the cell contents

Slide 31

Slide 31 text

also, just to be clear:
no one in their right mind enjoys having list-columns in a data frame
but the benefits often outweigh the costs
especially if you have the right tools and a productive mindset
it’s always a temporary state
goal is always to get back to something simpler

Slide 32

Slide 32 text

ok this is where things just peter out and we go back to live coding

Slide 33

Slide 33 text

tweets
[[1]] My economic policy speech will be carried live at 12:15 P.M. Enjoy!
[[2]] Join me in Fayetteville, North Carolina tomorrow evening at 6pm. Tickets now available at: https://t.co/Z80d4MYIg8
[[3]] The media is going crazy. They totally distort so many things on purpose. Crimea, nuclear, "the baby" and so much more. Very dishonest!
[[4]] I see where Mayor Stephanie Rawlings-Blake of Baltimore is pushing Crooked hard. Look at the job she has done in Baltimore. She is a joke!
[[5]] Bernie Sanders started off strong, but with the selection of Kaine for V.P., is ending really weak. So much for a movement! TOTAL DISRESPECT
[[6]] Crooked Hillary Clinton is unfit to serve as President of the U.S. Her temperament is weak and her opponents are strong. BAD JUDGEMENT!
[[7]] The Cruz-Kasich pact is under great strain. This joke of a deal is falling apart, not being honored and almost dead. Very dumb!
match_first
[[1]] -1
[[2]] -1
[[3]] 20
[[4]] 134
[[5]] 28 95
[[6]] 87 114
[[7]] 50 112 123
match_last
[[1]] -3
[[2]] -3
[[3]] 24
[[4]] 137
[[5]] 33 98
[[6]] 90 119
[[7]] 53 115 126
substring(text, first, last)
pmap(list(text = tweets, first = match_first, last = match_last), substring)
https://jennybc.github.io/purrr-tutorial/ex10_trump-tweets.html
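A runnable sketch of the same pmap() + substring() pattern, assuming short made-up strings in place of the tweets and a made-up regex. No-match elements come through as -1 for first and -3 for last, as in the values above, and substring() turns those into an empty string.

```r
library(purrr)

# Short made-up strings standing in for the tweets
tweets <- c("no match here", "what a joke", "strong start, weak finish")
regex  <- "joke|strong|weak"   # hypothetical pattern, for illustration only

# gregexpr() gives match starts, with match lengths tucked into an attribute
matches      <- gregexpr(regex, tweets)
match_first  <- map(matches, as.vector)
match_length <- map(matches, attr, which = "match.length")
match_last   <- map2(match_first, match_length, ~ .x + .y - 1)

# for every tuple (text, first, last): extract the substrings
found <- pmap(
  list(text = tweets, first = match_first, last = match_last),
  substring
)
```
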