The workflowr R package: a framework for reproducible and collaborative data science
John Blischak (@jdblischak)
2018-07-11
useR! 2018 Brisbane, Australia
github.com/jdblischak/workflowr
My computational challenges
Organizing files
Tracking intermediate results
Sharing results
John Blischak - github.com/jdblischak/workflowr
R Markdown websites
rmarkdown.rstudio.com
Version control
version: 2rko6xn
message: Start new…
version: d1zyskv
message: Update parameters…
version: z6o3b97
message: Label axes…
git-scm.com
github.com/ropensci/git2r
Version control terminology
repository – the tracked files and their revision history
commit – a snapshot of the current state of the files
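To make the two terms concrete, here is a minimal sketch using git2r, the R interface to Git that workflowr relies on. It assumes git2r is installed; the directory, file name, identity, and commit message are hypothetical placeholders.

```r
library(git2r)

# A fresh, empty directory for a hypothetical project
path <- tempfile("vc-demo")
dir.create(path)

# repository: start tracking the directory with Git
repo <- init(path)

# Repository-local identity, so the commit does not depend on global Git settings
config(repo, user.name = "Example User", user.email = "user@example.com")

# Write a file and stage it for the next snapshot
writeLines("x <- rnorm(100)", file.path(path, "analysis.R"))
add(repo, "analysis.R")

# commit: record a snapshot of the staged files, with a message
snapshot <- commit(repo, message = "Start new analysis")

# The revision history now contains one commit, identified by its SHA-1 hash
length(commits(repo))
substr(sha(snapshot), 1, 7)
```

Each further call to commit() adds another snapshot to the history, which is what the version/message pairs on the previous slide depict.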
Web hosting
GitHub Pages – hosts one website per code repository
pages.github.com
workflowr
Organized
Reproducible
Shareable
Version-controlled websites
Organized
Start a new project
> wflow_start("myproject")
1. Creates a directory with template files
2. Changes the working directory to the new project
3. Initializes a Git repository and commits the files
Also available as an RStudio Project Template
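A minimal sketch of starting a project from a script, assuming workflowr is installed and Git's user.name and user.email are already configured (workflowr's wflow_git_config() can set them). The temporary path is a placeholder, and change_wd = FALSE is used only so the example leaves the current working directory alone.

```r
library(workflowr)

# Hypothetical location for the new project; it must not exist yet
project <- tempfile("myproject")

# Creates the template files, initializes a Git repository, and commits them.
# change_wd = FALSE skips step 2 so this script's working directory is unchanged.
wflow_start(project, name = "myproject", change_wd = FALSE)

# Spot-check the result
file.exists(file.path(project, "analysis", "index.Rmd"))
dir.exists(file.path(project, ".git"))
```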
Organized directory structure
[Figure: project directory tree with callouts for R Markdown files, HTML files, and website options]
Reproducible
Run code in a clean environment
> wflow_build(c("f1.Rmd", "f2.Rmd"))
[Figure: f1.Rmd and f2.Rmd, each built in its own fresh R session]
github.com/r-lib/callr
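The isolation comes from callr, which runs code in a separate R process. The snippet below is not workflowr's own code, only a small demonstration of the callr mechanism; the object name cached_result is hypothetical.

```r
library(callr)

# An object that exists only in this R session
cached_result <- 42
exists("cached_result")            # TRUE in this session

# r() runs the function in a new R process, so it cannot see cached_result
in_fresh_process <- r(function() exists("cached_result"))
in_fresh_process                   # FALSE
```

This is why an analysis that silently depends on an object left over in your interactive session will fail to build, rather than appearing to work.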
Combining rmarkdown and Git
Source code   Results
1ong9jt       ln412fy
wr1q7bk       3tg6lse
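A sketch of how these source/results pairs are produced with wflow_publish(), which commits the R Markdown source (if it changed), builds it, and commits the resulting HTML. It assumes workflowr and pandoc are installed and Git is configured; the project path and commit message are placeholders.

```r
library(workflowr)

# Hypothetical project created just for this example
project <- tempfile("myproject")
wflow_start(project, change_wd = FALSE)

# Commit the .Rmd source if needed, build it, then commit the generated HTML
wflow_publish(file.path(project, "analysis", "index.Rmd"),
              message = "Publish the home page",
              view = FALSE, project = project)

# The built page is written to docs/, the default output directory for the site
file.exists(file.path(project, "docs", "index.html"))
```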
View past results
Other reproducibility features
output: workflowr::wflow_html
Records the session information at the end of the document
Sets a seed before running the code
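wflow_html performs both steps automatically. The base R sketch below does them by hand to show what each provides; the seed 12345 is an arbitrary value chosen for illustration.

```r
# Setting the same seed before running code makes random draws repeatable
set.seed(12345)
first_run <- rnorm(3)

set.seed(12345)
second_run <- rnorm(3)

identical(first_run, second_run)   # TRUE

# Session information records the R version, platform, and loaded packages
info <- sessionInfo()
info$R.version$version.string
```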
Reproducibility report
Shareable
Installation
1. Install R
◦ (Recommended) Install RStudio
◦ (Optional) Install pandoc (RStudio already includes it)
◦ (Optional) Install Git (workflowr uses git2r)
2. Install workflowr from CRAN
◦ install.packages("workflowr")
3. Create an account on GitHub
Documentation: https://jdblischak.github.io/workflowr
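Before working through the steps above, it can help to check what is already in place. The helper check_workflowr_setup() below is hypothetical, not part of workflowr; it only reports status, and installs workflowr from CRAN when run interactively and the package is missing.

```r
# Hypothetical helper: report which installation steps are already done
check_workflowr_setup <- function() {
  has_pkg <- function(pkg) requireNamespace(pkg, quietly = TRUE)
  c(
    workflowr = has_pkg("workflowr"),
    # pandoc ships with RStudio; otherwise it must be installed separately
    pandoc    = has_pkg("rmarkdown") && rmarkdown::pandoc_available(),
    # A command-line Git is optional because workflowr talks to Git via git2r
    git       = nzchar(Sys.which("git")[[1]])
  )
}

status <- check_workflowr_setup()
status

# Step 2 from the slide, only in an interactive session and only if needed
if (!status[["workflowr"]] && interactive()) install.packages("workflowr")
```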
In summary, using workflowr…
Enables you to start working reproducibly immediately
Allows you to focus on your analysis
Shares your results online
Acknowledgements
Co-authors: Peter Carbonetto, Matthew Stephens
Early adopters for testing and feedback
Authors and contributors to knitr, rmarkdown, git2r, callr