Data Rectangling

Talk about data rectangling and list-columns at RStudio Conf 2018 in San Diego
https://www.rstudio.com/conference/
Gist of code I showed:
https://gist.github.com/jennybc/3afafce0a06fde314b5c9844912d6bd7

Jennifer (Jenny) Bryan

February 02, 2018

Transcript

  1. Data Wrangling
    @JennyBryan
    @jennybc



  2. Data Rectangling
    @JennyBryan
    @jennybc


  6. atomic vectors
    logical, factor
    integer, double

  7. vectors of same length? DATA FRAME!


  8. vectors don’t have to be atomic
    works for lists too! list column


  9. name
    stuff
    this is a data frame!
    a tibble, specifically
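A minimal sketch of the idea on this slide, assuming the tibble package is installed. The column names `name` and `stuff` come from the slide; the values are made up for illustration.

```r
library(tibble)

# Hypothetical data: "name" is an atomic (character) column,
# "stuff" is a list-column whose elements differ in type and length.
df <- tibble(
  name  = c("a", "b", "c"),
  stuff = list(1:3, letters[1:2], TRUE)
)

is.data.frame(df)   # a tibble is still a data frame
is.list(df$stuff)   # the second column is a list, not an atomic vector
lengths(df$stuff)   # each element keeps its own length
```

The only requirement is that both columns have the same length (three here); what sits inside each element of `stuff` is unconstrained.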


  10. a list


  11. a homogeneous list


  12. Why work with lists?
    You have no choice.
    • String processing, e.g., splitting
    • JSON or XML, e.g., web APIs
    • Models, plots, & collections thereof
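As one illustration of the first bullet, here is a small base R sketch showing that splitting strings hands you a list whether you asked for one or not. The input strings are made up.

```r
# Hypothetical comma-separated inputs with different numbers of fields
x <- c("a,b,c", "d", "e,f")

# strsplit() returns one character vector per input string, wrapped in a list
pieces <- strsplit(x, ",")

is.list(pieces)   # the result is a list
lengths(pieces)   # the elements have different lengths, so no atomic vector or matrix fits
```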


  13. An API Of Ice And Fire
    https://anapioficeandfire.com
    https://cran.r-project.org/package=repurrrsive


  14. "Combines the excitement of iris and mtcars,
    with the complexity of recursive lists.
    W00t!"
    install.packages("repurrrsive")


  15. https://blog.rstudio.com/2017/08/22/rstudio-v1-1-preview-object-explorer/
    View(YOUR_HAIRY_LIST)


  16. got_chars[[9]][["name"]]
    got_chars[[9]][["titles"]]


  17. x[[i]]
    x[i]
    x
    from
    http://r4ds.had.co.nz/vectors.html#lists-of-condiments
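The difference between `x[i]` and `x[[i]]` can be sketched in base R as follows. The list `x` below is a made-up stand-in, not the one pictured on the slide.

```r
# Hypothetical nested list
x <- list(a = 1:3, b = "salt", c = list(TRUE, FALSE))

sub <- x["a"]     # single bracket: a smaller list that still wraps the element
el  <- x[["a"]]   # double bracket: the element itself

is.list(sub)      # still a list, of length 1
is.integer(el)    # the integer vector 1:3

# Double brackets can be chained to drill into nested lists
x[["c"]][[1]]
```

This is the same pattern as `got_chars[[9]][["name"]]`: one `[[` to pick a character, a second `[[` to pick a field.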


  18. http://blog.codinghorror.com/falling-into-the-pit-o
    pit of success


  19. https://shibumo.wordpress.com
    gentle hill of striving



  21. map(.x, .f, ...)
    purrr::


  22. map(.x, .f, ...)
    for every element of .x
    apply .f


  23. map(.x, .f, ...)
    .f has some special shortcuts
    to make common tasks easy
    map(.x, "TEXT")
    map(.x, i)
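A small sketch of the two shortcuts, assuming purrr is installed. The list `people` is a hypothetical stand-in for whatever `.x` you have.

```r
library(purrr)

# Hypothetical list of records
people <- list(
  list(name = "Ada",  langs = c("R", "C")),
  list(name = "Brin", langs = "Python")
)

# A string extracts the element with that name from every record,
# much like function(el) el[["name"]]
by_name <- map(people, "name")

# An integer extracts the element at that position from every record,
# much like function(el) el[[2]]
by_pos <- map(people, 2)
```

Both calls return a list with one element per record, which is what plain `map()` always does.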


  24. .x = minis


  25. map(minis, "pants")


  26. go to R


  27. map_lgl(.x, .f, ...)
    map_int(.x, .f, ...)
    map_dbl(.x, .f, ...)
    map_chr(.x, .f, ...)
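The typed variants return an atomic vector of the named type instead of a list. A minimal sketch, assuming purrr is installed and using a made-up input list:

```r
library(purrr)

# Hypothetical input: three small numeric vectors
x <- list(1:3, c(1.5, 2.5), 10L)

n       <- map_int(x, length)       # integer vector of lengths
avg     <- map_dbl(x, mean)         # double vector of means
is_int  <- map_lgl(x, is.integer)   # logical vector
type_of <- map_chr(x, typeof)       # character vector
```

Each variant raises an error if `.f` does not return a single value of the expected type, which is a useful safety check compared with plain `map()`.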


  28. map_dfr(minis, `[`,
    c("pants", "torso", "head"))
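The transcript does not include the `minis` object, so the sketch below builds a hypothetical stand-in with the same three field names. It assumes purrr and dplyr are both installed, since `map_dfr()` relies on dplyr for row-binding. In purrr 1.0.0 and later `map_dfr()` is superseded but still available.

```r
library(purrr)

# Hypothetical stand-in for `minis`: one named list per figure
minis <- list(
  list(pants = "blue",  torso = "red",   head = "yellow", hair = "brown"),
  list(pants = "black", torso = "white", head = "yellow", hair = "none")
)

# `[` keeps only the requested elements of each record;
# map_dfr() then binds the resulting records into rows of one data frame
parts <- map_dfr(minis, `[`, c("pants", "torso", "head"))
```

The extra argument `c("pants", "torso", "head")` is passed through `...` to `[`, so each call is effectively `record[c("pants", "torso", "head")]`.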


  29. If everything is equally easy,
    everything is equally hard.
    paraphrasing David Heinemeier Hansson re: Ruby on Rails


  30. map(.x, .f, ...)
    .f can take many forms
    • existing function
    • anonymous function
    • formula
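The three forms of `.f` listed above can be sketched side by side, assuming purrr is installed. The input list is made up.

```r
library(purrr)

# Hypothetical input
x <- list(c(1, 2, 3), c(10, 20))

# Existing function
totals <- map_dbl(x, sum)

# Anonymous function
halves_fn <- map_dbl(x, function(v) sum(v) / 2)

# Formula: ~ marks it as a formula, .x stands for the current element
halves_fm <- map_dbl(x, ~ sum(.x) / 2)
```

The anonymous function and the formula compute the same thing; the formula is just a shorter way to write it.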


  31. .x = minis


  32. map(minis, antennate)


  33. library(glue)

    glue_data(
      list(name = "Jenny", born = "in Atlanta"),
      "{name} was born {born}."
    )
    #> Jenny was born in Atlanta.

    glue_data(got_chars[[2]], "{name} was born {born}.")
    #> Tyrion Lannister was born In 273 AC, at Casterly Rock.

    glue_data(got_chars[[9]], "{name} was born {born}.")
    #> Daenerys Targaryen was born In 284 AC, at Dragonstone.


  34. glue_data(got_chars[[9]], "{name} was born {born}.")
    ~ glue_data( .x , "{name} was born {born}.")
    replace your
    example with .x
    prefix with ~ to say
    "it's a formula!"


  35. map_chr(got_chars, ~ glue_data(.x, "{name} was born {born}."))

    #> [1] "Theon Greyjoy was born In 278 AC or 279 AC, at Pyke."
    #> [2] "Tyrion Lannister was born In 273 AC, at Casterly Rock."
    #> [3] "Victarion Greyjoy was born In 268 AC or before, at Pyke."
    #> [4] "Will was born ."
    #> [5] "Areo Hotah was born In 257 AC or before, at Norvos."
    #> [6] "Chett was born At Hag's Mire."
    #> [7] "Cressen was born In 219 AC or 220 AC."
    #> [8] "Arianne Martell was born In 276 AC, at Sunspear."
    #> [9] "Daenerys Targaryen was born In 284 AC, at Dragonstone."
    drop-in to any member
    of the map_*() family


  36. name
    stuff
    this is a data frame!
    a tibble, specifically


  37. Why put a list into a data frame?
    safety & convenience
    • Manage multiple vectors holistically
    • Use existing toolkit for filter, select, etc.
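A minimal sketch of both bullets, assuming tibble, dplyr, and purrr are installed. The data are made up; the point is that filtering rows keeps the list-column aligned with the other columns.

```r
library(tibble)
library(dplyr)
library(purrr)

# Hypothetical tibble with a list-column
df <- tibble(
  name  = c("a", "b", "c"),
  stuff = list(1:3, letters[1:2], TRUE)
)

# Derive an ordinary column from the list-column, then filter as usual
kept <- df %>%
  mutate(n = map_int(stuff, length)) %>%
  filter(n > 1)

kept$name    # rows "a" and "b" survive
kept$stuff   # their list elements come along, still in matching order
```

Had `name` and `stuff` lived in separate objects, the filter would have had to be applied to each one by hand, with the risk of them falling out of sync.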


  38. What happens in the
    data frame
    stays in the data frame


  39. last R example:
    list in a data frame = list-column


  40. lists are part of life
    RStudio Object viewer helps
    tibbles are list-friendly
    map() functions help you
    compute on & simplify lists
