  Jennifer Bryan 
 RStudio, University of British Columbia @JennyBryan @jennybc Go here for useful links to stuff mentioned in this talk!!

workflow: you should have one

‑decision fatigue ‑unique and special ❆❄❅ ‐ predictability ‐ proficiency ‐ access to help

Here’s my highly polished blog post about deep learning. Here’s how I organized the files and wrangled the data.

Import → Tidy → (Transform, Visualise, Model) → Communicate

Everything that exists in R is an object. Everything that happens in R is a function call. Interfaces to other software are part of R. — John Chambers

readxl

googlesheets + googledrive = googlesheets4

What is your development environment? How do you organize a project? How do you manage a project over time? What about collaboration?

What is your default data receptacle? How do you manipulate data? How do you iterate?

Good enough practices in scientific computing
Wilson, Bryan, Cranston, Kitzes, Nederbragt, Teal

Excuse me, do you have a moment to talk about version control?

workflow example #1

One folder per project
That folder is
• an RStudio Project (package? website? whatever)
• a Git repo, with associated GitHub remote
Work on multiple projects at once w/ multiple instances of RStudio
• Each gets its own child R process
• R & file browser have a sane working directory

If the first line of your #rstats script is setwd("C:\Users\jenny\path\that\only\I\have"), I will come into your lab and SET YOUR COMPUTER ON FIRE . — Mash-up of rage tweets by @jennybc and @tpoi.

Use the here package to build paths within a Project
Paths are robust to different working directories within the Project
• Render .R and .Rmd that live in sub-folders!
• Write paths in tests and vignettes w/o fear!
here wraps the more powerful rprojroot package

library(here)
#> here() starts at /here-demo

system("tree")
#> .
#> └── one
#>     └── two
#>         └── awesome.txt

here("one", "two", "awesome.txt")
#> [1] "/here-demo/one/two/awesome.txt"
cat(readLines(here("one", "two", "awesome.txt")))
#> OMG this is so awesome!

setwd(here("one"))
getwd()
#> [1] "/here-demo/one"
here("one", "two", "awesome.txt")
#> [1] "/here-demo/one/two/awesome.txt"
cat(readLines(here("one", "two", "awesome.txt")))
#> OMG this is so awesome!
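
For intuition about why the path survives the setwd() call above, here is a minimal base R sketch of the kind of upward search for a project marker that here and rprojroot perform. The function name find_root_sketch and the marker file demo.Rproj are hypothetical, invented for this illustration; the real packages support more criteria and are what you should use in practice.

```r
# Hypothetical helper (not part of here or rprojroot):
# walk up from `start` until a directory containing `marker` is found.
find_root_sketch <- function(start, marker) {
  dir <- normalizePath(start, winslash = "/", mustWork = TRUE)
  repeat {
    if (file.exists(file.path(dir, marker))) return(dir)
    parent <- dirname(dir)
    # At the filesystem root, dirname() returns its input: give up.
    if (identical(parent, dir)) stop("no project root found above ", start)
    dir <- parent
  }
}

# Build a throwaway project: <tmp>/demo/one/two, with a marker file at the top.
proj <- file.path(tempfile(), "demo")
dir.create(file.path(proj, "one", "two"), recursive = TRUE)
file.create(file.path(proj, "demo.Rproj"))

# Starting at the top or deep inside a sub-folder finds the same root.
root_from_top  <- find_root_sketch(proj, "demo.Rproj")
root_from_deep <- find_root_sketch(file.path(proj, "one", "two"), "demo.Rproj")
identical(root_from_top, root_from_deep)
```

Because the root is found by searching upward from wherever you are, a path built from the root does not depend on the current working directory.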

workflow example #2

list-columns EmbRAce tHe aWkwArd

#rstats lists via lego

purrr::map(.x, .f, ...)

map(.x, .f, ...) for every element of .x apply .f
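
A small sketch of map(.x, .f, ...) on toy data. The list x is made up for illustration; map(), map_dbl() and the pass-through of extra arguments are standard purrr behaviour.

```r
library(purrr)

# .x: a named list of three integer vectors (invented for this example).
x <- list(a = 1:3, b = 4:6, c = 7:9)

# map() always returns a list with the same length and names as .x.
sums <- map(x, sum)

# Typed variant: returns a double vector, or errors if it cannot.
sums_dbl <- map_dbl(x, sum)

# Anything after .f is passed along to every call of .f.
means <- map_dbl(x, mean, na.rm = TRUE)
```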

.x = minis

map(minis, antennate)

.x = minis

map(minis, "pants")
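
The two calls above differ in what .f is: a function in map(minis, antennate), a string in map(minis, "pants"). A sketch with a stand-in for the lego figures; the contents of minis and the body of antennate() are assumptions made up for this example, since the slides show them only as pictures.

```r
library(purrr)

# Stand-in for the minifigures: each mini is a named list (invented values).
minis <- list(
  list(pants = "blue",  torso = "striped"),
  list(pants = "red",   torso = "plain"),
  list(pants = "green", torso = "plaid")
)

# Hypothetical antennate(): returns the mini with an antenna added.
antennate <- function(mini) {
  mini$antenna <- TRUE
  mini
}

# .f as a function: called once per element, results collected in a list.
with_antennae <- map(minis, antennate)

# .f as a string: shorthand for extracting the element with that name.
pants <- map(minis, "pants")

# Typed variant, usable when every result is a single string.
pants_chr <- map_chr(minis, "pants")
```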

.x = minis, .y = hair

map2(minis, hair, enhair)

.x = minis, .y = weapons

map2(minis, weapons, arm)

minis %>% map2(hair, enhair) %>% map2(weapons, arm)
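
A runnable sketch of the same pipeline on toy data. The data and the bodies of enhair() and arm() are hypothetical stand-ins for the lego pictures; map2() itself walks .x and .y in parallel and calls .f on each pair.

```r
library(purrr)

# Invented stand-ins for the figures and their accessories.
minis   <- list(list(name = "a"), list(name = "b"), list(name = "c"))
hair    <- c("brown", "black", "red")
weapons <- c("sword", "bow", "axe")

# Hypothetical helpers: each returns the mini with one more field.
enhair <- function(mini, h) { mini$hair <- h; mini }
arm    <- function(mini, w) { mini$weapon <- w; mini }

# The piped list becomes .x; the second argument is .y.
equipped <- minis %>%
  map2(hair, enhair) %>%
  map2(weapons, arm)

equipped[[2]]
```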

this is a data frame! atomic vector list column

data frame nested data frame
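
To make the distinction between an atomic vector column and a list-column concrete, a minimal sketch with made-up values (the column names id and scores are invented for this example):

```r
library(tibble)

df <- tibble(
  id     = 1:3,                                 # atomic vector: one value per row
  scores = list(c(90, 85), 70, c(60, 65, 80))   # list-column: one object per row
)

is.atomic(df$id)     # TRUE
is.list(df$scores)   # TRUE
lengths(df$scores)   # each row can hold a different amount of data
```

A nested data frame is the same idea with a data frame, rather than a vector, stored in each cell of the list-column.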

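gap_nested <- gapminder %>%
  group_by(country) %>%
  nest()
gap_nested
#> # A tibble: 142 × 2
#>    country     data
#>  1 Afghanistan <tibble [12 × 5]>
#>  2 Albania     <tibble [12 × 5]>
#>  3 Algeria     <tibble [12 × 5]>
#>  4 Angola      <tibble [12 × 5]>
#>  5 Argentina   <tibble [12 × 5]>
#>  6 Australia   <tibble [12 × 5]>
#>  7 Austria     <tibble [12 × 5]>
#>  8 Bahrain     <tibble [12 × 5]>
#>  9 Bangladesh  <tibble [12 × 5]>
#> 10 Belgium     <tibble [12 × 5]>
#> # ... with 132 more rows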

gap_fits <- gap_nested %>%
  mutate(fit = map(data, ~ lm(lifeExp ~ year, data = .x)))

gap_fits %>% tail(3)
#> # A tibble: 3 × 3
#>   country     data              fit
#> 1 Yemen, Rep. <tibble [12 × 5]> <S3: lm>
#> 2 Zambia      <tibble [12 × 5]> <S3: lm>
#> 3 Zimbabwe    <tibble [12 × 5]> <S3: lm>

canada <- which(gap_fits$country == "Canada")
summary(gap_fits$fit[[canada]])
#> . . .
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -3.583e+02  8.252e+00  -43.42 1.01e-12 ***
#> year         2.189e-01  4.169e-03   52.50 1.52e-13 ***
#> . . .
#> Residual standard error: 0.2492 on 10 degrees of freedom
#> Multiple R-squared: 0.9964, Adjusted R-squared: 0.996
#> F-statistic: 2757 on 1 and 10 DF, p-value: 1.521e-13

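gap_fits %>%
  mutate(rsq = map_dbl(fit, ~ summary(.x)[["r.squared"]])) %>%
  arrange(rsq)
#> # A tibble: 142 × 4
#>    country          data              fit             rsq
#>  1 Rwanda           <tibble [12 × 5]> <S3: lm> 0.01715964
#>  2 Botswana         <tibble [12 × 5]> <S3: lm> 0.03402340
#>  3 Zimbabwe         <tibble [12 × 5]> <S3: lm> 0.05623196
#>  4 Zambia           <tibble [12 × 5]> <S3: lm> 0.05983644
#>  5 Swaziland        <tibble [12 × 5]> <S3: lm> 0.06821087
#>  6 Lesotho          <tibble [12 × 5]> <S3: lm> 0.08485635
#>  7 Cote d'Ivoire    <tibble [12 × 5]> <S3: lm> 0.28337240
#>  8 South Africa     <tibble [12 × 5]> <S3: lm> 0.31246865
#>  9 Uganda           <tibble [12 × 5]> <S3: lm> 0.34215382
#> 10 Congo, Dem. Rep. <tibble [12 × 5]> <S3: lm> 0.34820278
#> # ... with 132 more rows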

gap_fits %>%
  mutate(coef = map(fit, broom::tidy)) %>%
  select(country, coef) %>%
  unnest(coef)
#> # A tibble: 284 × 6
#>    country     term             estimate    std.error  statistic
#>  1 Afghanistan (Intercept)  -507.5342716 40.484161954 -12.536613
#>  2 Afghanistan year            0.2753287  0.020450934  13.462890
#>  3 Albania     (Intercept)  -594.0725110 65.655359062  -9.048348
#>  4 Albania     year            0.3346832  0.033166387  10.091036
#>  5 Algeria     (Intercept) -1067.8590396 43.802200843 -24.379118
#>  6 Algeria     year            0.5692797  0.022127070  25.727749
#>  7 Angola      (Intercept)  -376.5047531 46.583370599  -8.082385
#>  8 Angola      year            0.2093399  0.023532003   8.895964
#>  9 Argentina   (Intercept)  -389.6063445  9.677729641 -40.258031
#> 10 Argentina   year            0.2317084  0.004888791  47.395847
#> # ... with 274 more rows, and 1 more variables: p.value

How do you do such things today?
• maybe you don’t, because it’s too painful
• for loops
• apply(), [slvmt]apply(), split(), by()
• with plyr: [adl][adl_]ply()
• with dplyr: df %>% group_by() %>% do()
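
For comparison with the list-column approach, a sketch of two of the base R routes named above, doing the same kind of job: one linear model per group. It uses the built-in mtcars data rather than gapminder so that it runs without extra packages; the choice of mpg ~ wt grouped by cyl is an arbitrary illustration.

```r
# Split the data frame into one piece per group.
pieces <- split(mtcars, mtcars$cyl)

# Route 1: for loop. Pre-allocate the result, then fill it in by name.
slopes_loop <- numeric(length(pieces))
names(slopes_loop) <- names(pieces)
for (g in names(pieces)) {
  fit <- lm(mpg ~ wt, data = pieces[[g]])
  slopes_loop[[g]] <- coef(fit)[["wt"]]
}

# Route 2: split() + vapply(). Same computation, no manual bookkeeping.
slopes_apply <- vapply(
  pieces,
  function(d) coef(lm(mpg ~ wt, data = d))[["wt"]],
  numeric(1)
)

all.equal(slopes_loop, slopes_apply)
```

Both routes give the same slopes, but the models and the data they came from are no longer side by side in one object, which is the bookkeeping that the nested data frame keeps for you.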

Many other worked examples here:
