workflow: you should have one

Keynote talk at EARL London 2017 on the value of developing an intentional workflow. Find links to all the goodies here: https://github.com/jennybc/earl-london-2017-bryan#readme

Jennifer (Jenny) Bryan

September 13, 2017

Transcript

  1.  
    Jennifer Bryan 

    RStudio, University of British Columbia
    @JennyBryan @jennybc
    bit.ly/jenny-earl
    Go here for useful links to stuff
    mentioned in this talk!!

  2. workflow
    you should have one

  3. ‑decision fatigue
    ‑unique and special ❆❄❅
    ‐ predictability
    ‐ proficiency
    ‐ access to help

  4. Here’s my highly
    polished blog post
    about deep learning.
    Here’s how I
    organized the files and
    wrangled the data.

  5. Import
    Tidy
    Communicate
    Transform
    Visualise
    Model

  6. Import
    Tidy
    Communicate
    Transform
    Visualise
    Model

  7. Everything that exists in R is an object.
    Everything that happens in R is a function call.
    Interfaces to other software are part of R.
    — John Chambers

  8. Import
    Tidy
    Communicate
    Transform
    Visualise
    Model

  9. Import
    Tidy
    Communicate
    Transform
    Visualise
    Model

  10. http://readxl.tidyverse.org
    readxl
    www.rstudio.com

  11. http://googledrive.tidyverse.org

  12. googlesheets
    +
    googledrive
    =
    googlesheets4

  13. What is your development environment?
    How do you organize a project?
    How do you manage a project over time?
    What about collaboration?

  14. What is your default data receptacle?
    How do you manipulate data?
    How do you iterate?

  15. http://stat545.com

  16. Good enough practices in scientific computing
    Wilson, Bryan, Cranston, Kitzes, Nederbragt, Teal
    https://doi.org/10.1371/journal.pcbi.1005510
    http://bit.ly/good-enuff

  17. Excuse me, do you have a moment
    to talk about version control?
    https://doi.org/10.7287/peerj.preprints.3159v2

  18. happygitwithr.com

  19. http://reprex.tidyverse.org

  20. workflow
    example #1

  21. One folder per project
    That folder is an
    • RStudio Project (package? website? whatever)
    • Git repo, with associated GitHub remote
    Work on multiple projects at once w/ multiple
    instances of RStudio
    • Each gets own child R process
    • R & file browser have sane working directory

  22. If the first line of your #rstats script is
    setwd("C:\Users\jenny\path\that\only\I\have"),
    I will come into your lab and SET YOUR COMPUTER ON FIRE.
    — Mash-up of rage tweets by @jennybc and @tpoi.

  23. Use here package to build paths within a Project
    Paths are robust to different working directories
    within the Project
    • Render .R and .Rmd that live in sub-folders!
    • Write paths in tests and vignettes w/o fear!
    here wraps the more powerful rprojroot package
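
The idea of building paths from a project root, rather than from the working directory, can be sketched in base R. This is only an illustration under stated assumptions, not how here or rprojroot are implemented: `find_root()` and `root_path()` are hypothetical helpers, and the `.here` marker file is simply the convention chosen for this sketch.

```r
# Hypothetical helper: climb parent directories until one contains `marker`.
find_root <- function(start = getwd(), marker = ".here") {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    if (file.exists(file.path(dir, marker))) {
      return(dir)
    }
    parent <- dirname(dir)
    if (identical(parent, dir)) {
      stop("No directory containing '", marker, "' found above ", start)
    }
    dir <- parent
  }
}

# Hypothetical helper: build a path relative to the root, not the working directory.
root_path <- function(..., start = getwd()) {
  file.path(find_root(start), ...)
}

# Demo in a throwaway directory tree: here-demo/one/two/awesome.txt
demo <- file.path(tempdir(), "here-demo")
dir.create(file.path(demo, "one", "two"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(demo, ".here")))
writeLines("OMG this is so awesome!", file.path(demo, "one", "two", "awesome.txt"))

# The same path comes back whether we start at the root or in a sub-folder.
from_root <- root_path("one", "two", "awesome.txt", start = demo)
from_sub  <- root_path("one", "two", "awesome.txt", start = file.path(demo, "one"))
identical(from_root, from_sub)
readLines(from_sub)
```

In real projects the here package does this work; the sketch only shows why the resulting paths do not depend on where a script or .Rmd happens to be run from.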

  24. library(here)
    #> here() starts at /here-demo
    system("tree")
    #> .
    #> └── one
    #>     └── two
    #>         └── awesome.txt
    here("one", "two", "awesome.txt")
    #> [1] "/here-demo/one/two/awesome.txt"
    cat(readLines(here("one", "two", "awesome.txt")))
    #> OMG this is so awesome!
    setwd(here("one"))
    getwd()
    #> [1] "/here-demo/one"
    here("one", "two", "awesome.txt")
    #> [1] "/here-demo/one/two/awesome.txt"
    cat(readLines(here("one", "two", "awesome.txt")))
    #> OMG this is so awesome!

  25. workflow
    example #2

  26. list-columns
    EmbRAce
    tHe
    aWkwArd

  27. #rstats lists via lego

  28. map(.x, .f, ...)
    purrr::

  29. map(.x, .f, ...)
    for every element of .x
    apply .f
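
A minimal sketch of "for every element of .x, apply .f", assuming the purrr package is installed. The `words` list is made up for illustration; `map()` and `map_int()` are purrr functions.

```r
library(purrr)

# A small list standing in for `.x`.
words <- list("workflow", "purrr", "R")

# `.f` as a named function: apply nchar() to every element.
lengths_list <- map(words, nchar)

# `.f` as a formula shortcut, where .x is the current element.
shouted <- map(words, ~ toupper(.x))

# Typed variant: returns an integer vector instead of a list.
lengths_int <- map_int(words, nchar)
```

`map()` always returns a list of the same length as its input; the typed variants either return an atomic vector of the promised type or fail.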

  30. map(minis, antennate)

  31. map(minis, "pants")
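
Passing a string where `.f` is expected is a purrr shortcut for extracting the element with that name. A small sketch, assuming purrr is installed; the `minis` list below is a hypothetical stand-in for the lego figures on the slide.

```r
library(purrr)

# Hypothetical stand-in for the lego figures: each mini is a named list.
minis <- list(
  list(name = "pilot",  pants = "blue",  hat = "helmet"),
  list(name = "pirate", pants = "red",   hat = "tricorn"),
  list(name = "chef",   pants = "white", hat = "toque")
)

# A string in place of .f extracts the element with that name.
pants <- map(minis, "pants")

# Same extraction, simplified to a character vector.
pants_chr <- map_chr(minis, "pants")

# Equivalent long form with an explicit function.
pants_long <- map(minis, function(mini) mini[["pants"]])
```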

  32. .y = hair
    .x = minis

  33. map2(minis, hair, enhair)

  34. .y = weapons
    .x = minis

  35. map2(minis, weapons, arm)

  36. minis %>%
    map2(hair, enhair) %>%
    map2(weapons, arm)
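
The pipeline above can be made runnable with toy data. In this sketch, assuming purrr is installed, `enhair()` and `arm()` are hypothetical helpers named after the slide, and the `minis`, `hair`, and `weapons` objects are invented for illustration.

```r
library(purrr)

# Hypothetical data: three figures and parallel vectors of accessories.
minis   <- list(list(name = "pilot"), list(name = "pirate"), list(name = "chef"))
hair    <- c("brown", "black", "red")
weapons <- c("wrench", "cutlass", "whisk")

# Hypothetical helpers: each returns the mini with one more field attached.
enhair <- function(mini, hair)   c(mini, list(hair = hair))
arm    <- function(mini, weapon) c(mini, list(weapon = weapon))

# map2() walks .x and .y in parallel; piping chains the two steps.
equipped <- minis %>%
  map2(hair, enhair) %>%
  map2(weapons, arm)

str(equipped[[2]])
```

Each `map2()` call pairs the i-th mini with the i-th accessory, so both inputs need the same length (or one of them length 1).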

  37. this is a data frame!
    atomic
    vector
    list
    column

  38. data frame nested data frame

  39. gap_nested <- gapminder %>%
      group_by(country) %>%
      nest()
    gap_nested
    #> # A tibble: 142 × 2
    #> country data
    #>
    #> 1 Afghanistan
    #> 2 Albania
    #> 3 Algeria
    #> 4 Angola
    #> 5 Argentina
    #> 6 Australia
    #> 7 Austria
    #> 8 Bahrain
    #> 9 Bangladesh
    #> 10 Belgium
    #> # ... with 132 more rows

  40. gap_fits <- gap_nested %>%
      mutate(fit = map(data, ~ lm(lifeExp ~ year, data = .x)))

    gap_fits %>% tail(3)
    #> # A tibble: 3 × 3
    #> country data fit
    #>
    #> 1 Yemen, Rep.
    #> 2 Zambia
    #> 3 Zimbabwe
    canada <- which(gap_fits$country == "Canada")
    summary(gap_fits$fit[[canada]])
    #> . . .
    #> Coefficients:
    #>               Estimate Std. Error t value Pr(>|t|)
    #> (Intercept) -3.583e+02  8.252e+00  -43.42 1.01e-12 ***
    #> year         2.189e-01  4.169e-03   52.50 1.52e-13 ***
    #> . . .
    #> Residual standard error: 0.2492 on 10 degrees of freedom
    #> Multiple R-squared: 0.9964, Adjusted R-squared: 0.996
    #> F-statistic: 2757 on 1 and 10 DF, p-value: 1.521e-1

  41. gap_fits %>%
      mutate(rsq = map_dbl(fit, ~ summary(.x)[["r.squared"]])) %>%
      arrange(rsq)
    #> # A tibble: 142 × 4
    #> country data fit rsq
    #>
    #> 1 Rwanda 0.01715964
    #> 2 Botswana 0.03402340
    #> 3 Zimbabwe 0.05623196
    #> 4 Zambia 0.05983644
    #> 5 Swaziland 0.06821087
    #> 6 Lesotho 0.08485635
    #> 7 Cote d'Ivoire 0.28337240
    #> 8 South Africa 0.31246865
    #> 9 Uganda 0.34215382
    #> 10 Congo, Dem. Rep. 0.34820278
    #> # ... with 132 more rows
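
The nest, fit, and summarise pattern can be tried without the gapminder package. The sketch below swaps in the built-in `mtcars` data and a different model (`mpg ~ wt` per cylinder count), both of which are assumptions made for illustration; it assumes dplyr, tidyr, and purrr are installed.

```r
library(dplyr)
library(tidyr)
library(purrr)

# One row per cylinder count, with that group's data in a list-column.
cyl_fits <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(
    # Fit one linear model per nested data frame.
    fit = map(data, ~ lm(mpg ~ wt, data = .x)),
    # Pull a single number out of each model summary.
    rsq = map_dbl(fit, ~ summary(.x)[["r.squared"]])
  ) %>%
  ungroup() %>%
  arrange(rsq)

cyl_fits
```

Keeping the data, the fitted model, and the derived statistic in the same row means they cannot drift out of sync when the table is sorted or filtered.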

  42. gap_fits %>%
      mutate(coef = map(fit, broom::tidy)) %>%
      unnest(coef)
    #> # A tibble: 284 × 6
    #>        country        term      estimate    std.error  statistic
    #>
    #> 1  Afghanistan (Intercept)  -507.5342716 40.484161954 -12.536613
    #> 2  Afghanistan        year     0.2753287  0.020450934  13.462890
    #> 3      Albania (Intercept)  -594.0725110 65.655359062  -9.048348
    #> 4      Albania        year     0.3346832  0.033166387  10.091036
    #> 5      Algeria (Intercept) -1067.8590396 43.802200843 -24.379118
    #> 6      Algeria        year     0.5692797  0.022127070  25.727749
    #> 7       Angola (Intercept)  -376.5047531 46.583370599  -8.082385
    #> 8       Angola        year     0.2093399  0.023532003   8.895964
    #> 9    Argentina (Intercept)  -389.6063445  9.677729641 -40.258031
    #> 10   Argentina        year     0.2317084  0.004888791  47.395847
    #> # ... with 274 more rows, and 1 more variables: p.value

  43. How do you do such things today?
    for loops
    apply(), [slvmt]apply(), split(), by()
    with plyr: [adl][adl_]ply()
    with dplyr: df %>% group_by() %>% do()
    maybe you don’t, because it’s too painful
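
For comparison, here is a sketch of the base R options on this slide applied to a per-group regression. It uses the built-in `mtcars` data and the model `mpg ~ wt`, which are assumptions chosen so the example runs without extra packages.

```r
# Split the built-in mtcars data into one data frame per cylinder count.
by_cyl <- split(mtcars, mtcars$cyl)

# for-loop version: pre-allocate, then fill in one R-squared per group.
rsq_loop <- numeric(length(by_cyl))
names(rsq_loop) <- names(by_cyl)
for (g in names(by_cyl)) {
  fit <- lm(mpg ~ wt, data = by_cyl[[g]])
  rsq_loop[[g]] <- summary(fit)[["r.squared"]]
}

# lapply()/vapply() version: same computation without the bookkeeping.
fits <- lapply(by_cyl, function(d) lm(mpg ~ wt, data = d))
rsq_apply <- vapply(fits, function(f) summary(f)[["r.squared"]], numeric(1))

all.equal(rsq_loop, rsq_apply)
```

Both versions work, but the group labels, models, and statistics live in separate objects that have to be kept aligned by hand.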

  44. Many other worked examples here:
    https://jennybc.github.io/purrr-tutorial/

  45. @JennyBryan
    @jennybc


    bit.ly/jenny-earl
