The workflowr R package: a framework for reproducible and collaborative data science

The workflowr R package helps scientists organize their research in a way that promotes effective project management, reproducibility, collaboration, and sharing of results. workflowr combines literate programming (knitr and rmarkdown) and version control (Git, via git2r) to generate a website containing time-stamped, versioned, and documented results. Any R user can quickly and easily adopt workflowr, which includes four key features: (1) workflowr automatically creates a directory structure for organizing data, code, and results; (2) workflowr uses the version control system Git to track different versions of the code and results without the user needing to understand Git syntax; (3) to support reproducibility, workflowr automatically includes code version information in the webpages that display results; and (4) workflowr facilitates online web hosting (e.g., GitHub Pages) to share results. Our goal is for workflowr to make it easier for scientists to organize and communicate reproducible research results. Documentation and source code are available at


John Blischak

July 11, 2018


  1. The workflowr R package: a framework for reproducible and collaborative data science. John Blischak (@jdblischak), 2018-07-11, useR! 2018, Brisbane, Australia
  2. My computational challenges: organizing files; tracking intermediate results; sharing results
  4. Literate programming (figure labels: Source code, Results, file.Rmd)
  5. R Markdown websites
  6. Version control (example history): version: 2rko6xn, message: Start new…; version: d1zyskv, message: Update parameters…; version: z6o3b97, message: Label axes…
  7. Version control terminology: repository – the tracked files and their revision history; commit – a snapshot of the current state of the files
  8. Web hosting: GitHub Pages – hosts one website per code repository
  9. workflowr: version-controlled websites. Organized; reproducible; shareable
  10. Organized
  11. Start a new project: > wflow_start("myproject") (1) creates a directory with template files; (2) changes the working directory; (3) initializes a Git repository and commits the files. Also available as an RStudio Project Template
  12. Organized directory structure (figure labels: R Markdown files, HTML files, Website options)
  13. Reproducible
  14. Run code in clean environment: > wflow_build(c("f1.Rmd", "f2.Rmd")) (f1.Rmd and f2.Rmd are each built in a separate R session via r-lib/callr)
  15. Tracking intermediate results: > wflow_publish("analysis/file.Rmd") performs 3 steps: (1) commits analysis/file.Rmd; (2) builds analysis/file.Rmd; (3) commits docs/file.html and figure files
  16. Combining rmarkdown and Git (figure labels: Source code, Results: 1ong9jt, ln412fy; Source code, Results: wr1q7bk, 3tg6lse)
  17. View past results
  18. Other reproducibility features (output: workflowr::wflow_html): records the session information at the end; sets a seed prior to running code
  19. Reproducibility report
  20. Shareable
  21. Distribute results for sharing: create a new GitHub repository, then run > wflow_git_push() (© 2018 GitHub Inc.)
  22. Installation: (1) install R: (recommended) install RStudio, (optional) install pandoc, (optional) install Git; (2) install workflowr from CRAN: install.packages("workflowr"); (3) create an account on GitHub. Documentation:
  23. In summary, using workflowr… enables you to start working reproducibly immediately; allows you to focus on your analysis; shares your results online
  24. Acknowledgements: co-authors Peter Carbonetto and Matthew Stephens; early adopters for testing and feedback; authors and contributors to knitr, rmarkdown, git2r, callr
  25. workflowr: version-controlled websites. Organized; reproducible; shareable
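
The commands shown on the slides can be strung together into one end-to-end pass. The following is only a minimal sketch, not the speaker's own demo. It assumes workflowr is installed and that the Git user name and email are already configured. The helper name run_workflowr_demo and the commit message are hypothetical. The file analysis/index.Rmd is assumed to be one of the template files that wflow_start() creates. The push step is off by default because it needs an existing GitHub repository that is already configured as the project's remote.

```r
# Sketch of the workflow from the slides, wrapped in a function so that
# nothing runs (and no files are created) until it is called explicitly.
# Assumptions: workflowr is installed and Git user.name/user.email are set.
# run_workflowr_demo is a hypothetical helper, not part of workflowr.
run_workflowr_demo <- function(project_dir = "myproject",
                               rmd = "analysis/index.Rmd",
                               push = FALSE) {
  # Create the directory with template files, change the working directory
  # to it, and initialize a Git repository with a first commit.
  workflowr::wflow_start(project_dir)

  # Build the R Markdown file in a clean R session.
  workflowr::wflow_build(rmd)

  # Commit the source file, build it, then commit the HTML and figure files.
  # The commit message here is a placeholder.
  workflowr::wflow_publish(rmd, message = "Publish first analysis")

  # Pushing needs an existing GitHub repository set as the Git remote,
  # so it only runs when requested.
  if (push) workflowr::wflow_git_push()

  invisible(project_dir)
}

# Example call (creates ./myproject under the current working directory):
# run_workflowr_demo("myproject")
```

Wrapping the calls in a function keeps the example free of side effects until it is invoked, which matters here because wflow_start() changes the working directory and commits files.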