
Data Rectangling


Talk about data rectangling and list-columns at RStudio Conf 2018 in San Diego
https://www.rstudio.com/conference/
Gist of code I showed:
https://gist.github.com/jennybc/3afafce0a06fde314b5c9844912d6bd7

Jennifer (Jenny) Bryan

February 02, 2018

Transcript

  1. Why work with lists? You have no choice.
     • String processing, e.g., splitting
     • JSON or XML, e.g., web APIs
     • Models, plots, & collections thereof
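As a small illustration of the string-processing case, here is a sketch in base R. The names in `full_names` are made up for this example and do not come from the talk.

```r
# Base R only; the input strings are invented for illustration.
full_names <- c("Jenny Bryan", "Tyrion Lannister", "Daenerys Stormborn Targaryen")

# strsplit() returns a list: one character vector per input string,
# and the pieces can differ in length, so a plain vector will not do.
pieces <- strsplit(full_names, " ")

is.list(pieces)
#> [1] TRUE
lengths(pieces)
#> [1] 2 2 3
```

Because the number of pieces varies per string, the result cannot be a simple atomic vector, which is one way a list turns up whether you asked for it or not.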
  2. "Combines the excitement of iris and mtcars, with the complexity of recursive lists. W00t!"
     install.packages("repurrrsive")
  3. map(.x, .f, ...)
     .f has some special shortcuts to make common tasks easy:
     map(.x, "TEXT")
     map(.x, i)
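A minimal sketch of the two shortcuts, assuming the purrr package is installed. The `people` list is hypothetical data, shaped loosely like a web API response, and is not from the talk.

```r
library(purrr)

# Hypothetical nested list: each element is itself a named list.
people <- list(
  list(name = "Ada", langs = c("R", "Python")),
  list(name = "Bo",  langs = "R")
)

# A string extracts the element with that name from each item.
map(people, "name")

# An integer extracts the element at that position from each item.
map(people, 2)

# Type-specific variants return an atomic vector instead of a list.
map_chr(people, "name")
#> [1] "Ada" "Bo"
```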
  4. map(.x, .f, ...)
     .f can take many forms:
     • existing function
     • anonymous function
     • formula
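The three forms can be compared side by side. This is a sketch assuming purrr is installed; the input list `x` is made up for illustration.

```r
library(purrr)

# Made-up input: a list of integer vectors of varying length.
x <- list(1:3, 4:6, 7:10)

# 1. An existing function, passed by name.
by_function <- map_int(x, length)

# 2. An anonymous function, written inline.
by_anonymous <- map_int(x, function(v) length(v))

# 3. A formula; inside it, .x stands for the current element.
by_formula <- map_int(x, ~ length(.x))

by_formula
#> [1] 3 3 4
```

All three calls compute the same thing; the formula is simply shorthand for the anonymous function.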
  5. library(glue)
     library(repurrrsive)  # provides got_chars

     glue_data(
       list(name = "Jenny", born = "in Atlanta"),
       "{name} was born {born}."
     )
     #> Jenny was born in Atlanta.

     glue_data(got_chars[[2]], "{name} was born {born}.")
     #> Tyrion Lannister was born In 273 AC, at Casterly Rock.
     glue_data(got_chars[[9]], "{name} was born {born}.")
     #> Daenerys Targaryen was born In 284 AC, at Dragonstone.
  6. glue_data(got_chars[[9]], "{name} was born {born}.")
     ~ glue_data(.x, "{name} was born {born}.")
     • Replace your example with .x
     • Prefix with ~ to say "it's a formula!"
  7. map_chr(got_chars, ~ glue_data(.x, "{name} was born {born}."))
     #> [1] "Theon Greyjoy was born In 278 AC or 279 AC, at Pyke."
     #> [2] "Tyrion Lannister was born In 273 AC, at Casterly Rock."
     #> [3] "Victarion Greyjoy was born In 268 AC or before, at Pyke."
     #> [4] "Will was born ."
     #> [5] "Areo Hotah was born In 257 AC or before, at Norvos."
     #> [6] "Chett was born At Hag's Mire."
     #> [7] "Cressen was born In 219 AC or 220 AC."
     #> [8] "Arianne Martell was born In 276 AC, at Sunspear."
     #> [9] "Daenerys Targaryen was born In 284 AC, at Dragonstone."
     Drop-in to any member of the map_*() family.
  8. Why put a list into a data frame? Safety & convenience.
     • Manage multiple vectors holistically
     • Use existing toolkit for filter, select, etc.
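To make the list-column idea concrete, here is a sketch assuming the tibble, purrr, and dplyr packages are installed. The `chars` data and the `aliases` and `n_aliases` column names are hypothetical and were not shown in the talk.

```r
library(tibble)
library(purrr)
library(dplyr)

# Hypothetical data: `aliases` is a list-column, so each row can hold
# a character vector of any length (including zero).
chars <- tibble(
  name    = c("Ada", "Bo", "Cy"),
  aliases = list(c("A", "Countess"), character(0), "C")
)

# Compute on the list-column with map_int(), keeping the result
# aligned with the other columns in the same row.
chars <- mutate(chars, n_aliases = map_int(aliases, length))

# The usual data frame verbs keep working; the list-column comes along.
filter(chars, n_aliases > 0)
```

Keeping the list next to the vectors it belongs with means a `filter()` or `arrange()` cannot knock them out of sync, which is the safety half of the argument.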
  9. Lists are part of life.
     • RStudio Object viewer helps
     • tibbles are list-friendly
     • map() functions help you compute on & simplify lists