Data Rectangling


Talk about data rectangling and list-columns, given at RStudio Conf 2018 in San Diego.
Gist of code I showed:


Jennifer (Jenny) Bryan

February 02, 2018


  1. Data Wrangling @JennyBryan @jennybc  

  2. Data Wrangling → Data Rectangling @JennyBryan @jennybc

  6. Atomic vectors: logical, factor, integer, double

  7. Vectors of the same length? DATA FRAME!
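A minimal base-R sketch of slides 6 and 7: four atomic vectors of the same length (values invented for illustration) bundled into a data frame, one vector per column.

```r
# Four atomic vectors of the same length (hypothetical values)
lgl <- c(TRUE, FALSE, TRUE)
fct <- factor(c("lo", "hi", "lo"))
int <- c(1L, 2L, 3L)
dbl <- c(1.5, 2.5, 3.5)

# Same length, so they can become the columns of a data frame
df <- data.frame(lgl, fct, int, dbl)
str(df)
```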

  8. Vectors don’t have to be atomic. It works for lists too!

    → list-column
  9. Columns: name <chr>, stuff <list>. This is a data frame! A tibble, specifically.
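The slide shows a printed tibble with a character column and a list-column. A minimal sketch of building one, assuming the tibble package is installed; the column values are made up for illustration.

```r
library(tibble)

# An ordinary character column plus a list-column.
# Each element of `stuff` can be any R object, of any length.
df <- tibble(
  name  = c("a", "b", "c"),
  stuff = list(1:3, "hello", list(x = TRUE, y = 2.5))
)

df         # printing shows <chr> for name and <list> for stuff
class(df)  # a tibble is still a data.frame underneath
```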
  10. a list

  11. a homogeneous list

  12. Why work with lists? You have no choice.

    • String processing, e.g., splitting
    • JSON or XML, e.g., web APIs
    • Models, plots, & collections thereof
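As a small illustration of the string-processing case, base R's strsplit() already returns a list, because each input string can split into a different number of pieces. The input strings are hypothetical.

```r
# Three strings that split into 3, 2, and 1 pieces respectively
x <- c("a-b-c", "d-e", "f")
pieces <- strsplit(x, "-")

str(pieces)       # a list of character vectors of varying length
lengths(pieces)   # 3 2 1: a ragged result cannot be a single atomic vector
```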
  13. An API Of Ice And Fire

  14. "Combines the excitement of iris and mtcars, with the complexity of recursive lists. W00t!"

    install.packages("repurrrsive")

  16. got_chars[[9]][["name"]]

    got_chars[[9]][["titles"]]

  17. x[[i]] vs. x[i] vs. x (from …)
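A sketch of the difference between x[i] and x[[i]], using a small hypothetical list (modeled on the glue example later in the talk) instead of got_chars, so it runs without repurrrsive.

```r
# A hypothetical record, standing in for one element of got_chars
x <- list(name = "Jenny", born = "in Atlanta", alive = TRUE)

x["name"]              # single brackets: a list of length 1, still wrapped
x[["name"]]            # double brackets: the element itself, a character vector
x[c("name", "alive")]  # [ accepts several indices and returns a sub-list

# Indexing into a nested list chains [[ ]], as in got_chars[[9]][["name"]]
chars <- list(x)
chars[[1]][["born"]]
```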

  18. pit of success

  19. gentle hill of striving

  21. purrr::map(.x, .f, ...)

  22. map(.x, .f, ...): for every element of .x, apply .f

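  23. map(.x, .f, ...): .f has some special shortcuts to make common tasks easy

    map(.x, "TEXT") extracts the element named "TEXT" from each element of .x
    map(.x, i) extracts the i-th element from each element of .x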
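A sketch of both shortcuts, assuming purrr is installed; the people list is hypothetical. A character .f extracts by name, an integer .f extracts by position, and the last line spells out the equivalent anonymous function.

```r
library(purrr)

# Hypothetical records, each a named list with the same fields
people <- list(
  list(name = "Jenny", born = "in Atlanta"),
  list(name = "Tyrion Lannister", born = "In 273 AC, at Casterly Rock")
)

map(people, "name")                   # extract by name
map(people, 2)                        # extract by position: the 2nd field, born
map(people, function(p) p[["name"]])  # the long-hand version of map(people, "name")
```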
  24. .x = minis

  25. map(minis, "pants")
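The minifigure data appear on the slides only as images, so here is a hypothetical stand-in for minis (all field values invented) that makes map(minis, "pants") runnable, assuming purrr is installed.

```r
library(purrr)

# Hypothetical minis: each element is a named list describing one figure
minis <- list(
  list(head = "smiling", torso = "red shirt", pants = "blue jeans"),
  list(head = "grumpy",  torso = "lab coat",  pants = "black slacks"),
  list(head = "neutral", torso = "hoodie",    pants = "cargo shorts")
)

map(minis, "pants")  # a list holding the pants of each figure
```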

  26. go to R

  27. map_lgl(.x, .f, ...)

    map_int(.x, .f, ...)
    map_dbl(.x, .f, ...)
    map_chr(.x, .f, ...)
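A sketch of the typed variants on a hypothetical list, assuming purrr is installed. Each one returns an atomic vector of the named type instead of a list, and it errors rather than silently coercing when an element does not fit that type.

```r
library(purrr)

# Hypothetical records; each field has a consistent type across records
recs <- list(
  list(id = 1L, score = 2.5, label = "alpha", ok = TRUE),
  list(id = 2L, score = 4.0, label = "beta",  ok = FALSE)
)

map_int(recs, "id")     # integer vector: 1 2
map_dbl(recs, "score")  # double vector: 2.5 4.0
map_chr(recs, "label")  # character vector: "alpha" "beta"
map_lgl(recs, "ok")     # logical vector: TRUE FALSE

# By contrast, map_int(recs, "label") would fail: "alpha" is not an integer.
```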
  28. map_dfr(minis, `[`, c("pants", "torso", "head"))
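A runnable version of slide 28 on a hypothetical minis list. Here `[` is R's single-bracket subsetting function, and the extra argument c("pants", "torso", "head") is passed along to it for each element. This assumes dplyr is installed as well as purrr, since map_dfr() stacks the rows with dplyr::bind_rows().

```r
library(purrr)  # map_dfr() also needs dplyr installed (it calls dplyr::bind_rows())

# Hypothetical minis, with an extra field that the subsetting drops
minis <- list(
  list(head = "smiling", torso = "red shirt", pants = "blue jeans",   hat = "cap"),
  list(head = "grumpy",  torso = "lab coat",  pants = "black slacks", hat = "none")
)

# Each figure is reduced to 3 named fields, then the results become rows
figs <- map_dfr(minis, `[`, c("pants", "torso", "head"))
figs
```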

  29. "If everything is equally easy, everything is equally hard."

    (paraphrasing David Heinemeier Hansson re: Ruby on Rails)
  30. map(.x, .f, ...): .f can take many forms

    • existing function
    • anonymous function
    • formula
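The three forms side by side, assuming purrr is installed, each computing the same per-element mean on a made-up list of integer vectors. In the formula form, .x stands for the current element.

```r
library(purrr)

x <- list(1:3, 4:6, 7:9)

map_dbl(x, mean)                            # existing function
map_dbl(x, function(v) sum(v) / length(v))  # anonymous function
map_dbl(x, ~ sum(.x) / length(.x))          # formula
```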
  31. .x = minis

  32. map(minis, antennate)
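The slides don't show how antennate() is defined, so the version below is hypothetical: it only adds an antenna field to each figure, which is enough to show an existing, named function passed as .f (assumes purrr is installed; minis is again an invented stand-in).

```r
library(purrr)

# Hypothetical helper: the talk's own antennate() is not shown
antennate <- function(fig) {
  fig$antenna <- TRUE  # add a new field to this figure's list
  fig
}

minis <- list(
  list(head = "smiling", torso = "red shirt"),
  list(head = "grumpy",  torso = "lab coat")
)

minis_with_antennae <- map(minis, antennate)  # existing function as .f
map_lgl(minis_with_antennae, "antenna")       # TRUE TRUE
```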

  33. library(glue)

    glue_data(
      list(name = "Jenny", born = "in Atlanta"),
      "{name} was born {born}."
    )
    #> Jenny was born in Atlanta.
    glue_data(got_chars[[2]], "{name} was born {born}.")
    #> Tyrion Lannister was born In 273 AC, at Casterly Rock.
    glue_data(got_chars[[9]], "{name} was born {born}.")
    #> Daenerys Targaryen was born In 284 AC, at Dragonstone.
  34. glue_data(got_chars[[9]], "{name} was born {born}.")

    ~ glue_data(.x, "{name} was born {born}.")
    Replace the specific example, got_chars[[9]], with .x; prefix with ~ to say "it's a formula!"
  35. map_chr(got_chars, ~ glue_data(.x, "{name} was born {born}."))

    #> [1] "Theon Greyjoy was born In 278 AC or 279 AC, at Pyke."
    #> [2] "Tyrion Lannister was born In 273 AC, at Casterly Rock."
    #> [3] "Victarion Greyjoy was born In 268 AC or before, at Pyke."
    #> [4] "Will was born ."
    #> [5] "Areo Hotah was born In 257 AC or before, at Norvos."
    #> [6] "Chett was born At Hag's Mire."
    #> [7] "Cressen was born In 219 AC or 220 AC."
    #> [8] "Arianne Martell was born In 276 AC, at Sunspear."
    #> [9] "Daenerys Targaryen was born In 284 AC, at Dragonstone."
    The formula drops in to any member of the map_*() family.
  36. Columns: name <chr>, stuff <list>. This is a data frame! A tibble, specifically.
  37. Why put a list into a data frame? Safety & convenience

    • Manage multiple vectors holistically
    • Use the existing toolkit for filter, select, etc.
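A sketch of the "existing toolkit" point, assuming tibble and dplyr are installed; the data are hypothetical. The list-column titles stays aligned with name through filter() and select(), with no manual bookkeeping to keep the two in sync.

```r
library(tibble)
library(dplyr)

# Hypothetical data: one row per person, with a ragged list-column of titles
people <- tibble(
  name   = c("A", "B", "C"),
  titles = list(c("t1", "t2"), character(0), "t3")
)

# Drop people with no titles; each row's titles travel with its name
kept <- people %>%
  filter(lengths(titles) > 0) %>%
  select(name, titles)
kept
```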
  38. What happens in the data frame stays in the data frame.

  39. Last R example: a list in a data frame = list-column
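A minimal sketch of that workflow, assuming tibble, dplyr, and purrr are installed, with hypothetical records that share the name and titles fields of got_chars: simplify what you can with map_chr(), keep the ragged part as a list-column with map(), then compute on it inside mutate().

```r
library(tibble)
library(dplyr)
library(purrr)

# Hypothetical records; titles has a different length in each one
chars <- list(
  list(name = "A", titles = c("t1", "t2")),
  list(name = "B", titles = character(0))
)

df <- tibble(
  name   = map_chr(chars, "name"),  # simplifies to a character column
  titles = map(chars, "titles")     # stays a list: a list-column
) %>%
  mutate(n_titles = map_int(titles, length))  # compute on the list-column
df
```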

  40. Lists are part of life

    • RStudio Object Viewer helps
    • Tibbles are list-friendly
    • map() functions help you compute on & simplify lists