The workflowr R package: a framework for reproducible and collaborative data science

The workflowr R package helps scientists organize their research in a way that promotes effective project management, reproducibility, collaboration, and sharing of results. workflowr combines literate programming (knitr and rmarkdown) and version control (Git, via git2r) to generate a website containing time-stamped, versioned, and documented results. Any R user can quickly and easily adopt workflowr, which includes four key features: (1) workflowr automatically creates a directory structure for organizing data, code, and results; (2) workflowr uses the version control system Git to track different versions of the code and results without the user needing to understand Git syntax; (3) to support reproducibility, workflowr automatically includes code version information in webpages displaying results; and (4) workflowr facilitates online Web hosting (e.g. GitHub Pages) to share results. Our goal is that workflowr will make it easier for scientists to organize and communicate reproducible research results. Documentation and source code are available at


John Blischak

July 11, 2018


  1. The workflowr R package: a framework for reproducible and collaborative data science

    John Blischak (@jdblischak), 2018-07-11, useR! 2018, Brisbane, Australia
  2. Version control

    version: 2rko6xn  message: Start new…
    version: d1zyskv  message: Update parameters…
    version: z6o3b97  message: Label axes…
  3. Version control terminology

    repository – the tracked files and their revision history
    commit – a snapshot of the current state of the files
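To make the two terms concrete, here is a minimal sketch that uses git2r directly. git2r is the package workflowr relies on for Git. The sketch assumes git2r is installed, and the file name, commit messages, and Git identity are hypothetical placeholders. It creates a repository, commits two snapshots of the same file, and then reads back the revision history.

```r
# Sketch only: assumes the git2r package is installed. File name, messages,
# and identity are hypothetical placeholders.

# Repository: a directory whose files and revision history Git tracks.
repo_dir <- tempfile("vc-demo-")
dir.create(repo_dir)
repo <- git2r::init(repo_dir)

# Hypothetical identity, stored in this repository's local Git config only.
git2r::config(repo, user.name = "Example User", user.email = "user@example.org")

script <- file.path(repo_dir, "analysis.R")

# Commit 1: a snapshot of the file as it is now.
writeLines("x <- rnorm(10)", script)
git2r::add(repo, "analysis.R")          # path relative to the repository
git2r::commit(repo, "Start new analysis")

# Commit 2: a snapshot after editing the same file.
writeLines(c("x <- rnorm(100)", "hist(x)"), script)
git2r::add(repo, "analysis.R")
git2r::commit(repo, "Update parameters")

# Revision history: one entry per commit, newest first.
history <- git2r::commits(repo)
vapply(history, function(cm) cm$summary, character(1))
```

Each commit is a complete snapshot, so either version of analysis.R can be recovered later.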
  4. Web hosting

    GitHub Pages – hosts one website per code repository
  5. Start a new project

    > wflow_start("myproject")

    1. Creates a directory with template files
    2. Changes the working directory
    3. Initializes a Git repository and commits the files

    Also available as an RStudio Project Template
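Below is a minimal sketch of the call above. It assumes workflowr (version 1.0 or later, which accepts the user.name and user.email arguments) and Git are installed. The project is created under a temporary directory. Passing change_wd = FALSE leaves the current working directory alone, so step 2 is deliberately skipped. The Git identity is a hypothetical placeholder that is stored only in the new project's local Git config.

```r
# Sketch only: assumes workflowr (>= 1.0) and Git are installed.
library(workflowr)

# Create the project inside a fresh temporary folder.
parent <- tempfile("wflow-")
dir.create(parent)
project_dir <- file.path(parent, "myproject")

wflow_start(
  project_dir,
  change_wd  = FALSE,               # default TRUE would setwd() into the project
  user.name  = "Example User",      # hypothetical identity for this project only
  user.email = "user@example.org"
)

# Template files and directories created by wflow_start()
list.files(project_dir, recursive = TRUE)
```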
  6. Run code in a clean environment

    > wflow_build(c("f1.Rmd", "f2.Rmd"))

    Each file (f1.Rmd, f2.Rmd) is built in its own fresh R session, using the callr package.
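wflow_build() manages the separate sessions itself. The sketch below only illustrates why that matters, and it calls callr directly (assumed installed) rather than anything in workflowr. The object name is hypothetical. An object created interactively is visible in the current session but not in a fresh R process. Code that silently depended on that object would therefore fail during the build instead of producing a result nobody else can reproduce.

```r
# Sketch only: assumes the callr package is installed; no workflowr involved.
leftover_object <- "defined interactively"   # hypothetical stray object

# Current session: the object is visible.
exists("leftover_object")

# Fresh R process: its global environment starts empty.
in_fresh_session <- callr::r(function() exists("leftover_object"))
in_fresh_session
```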
  7. Tracking intermediate results

    > wflow_publish("analysis/file.Rmd")

    Performs 3 steps:
    1. Commits analysis/file.Rmd
    2. Builds analysis/file.Rmd
    3. Commits docs/file.html and figure files
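The three steps can be sketched with git2r directly (assumed installed). This is not how wflow_publish() is implemented. In particular, the "build" step here only writes a placeholder HTML file rather than rendering the R Markdown. The commit messages and Git identity are hypothetical. The placeholder does record which committed source version it came from, which mirrors the version information workflowr adds to its pages.

```r
# Sketch only: assumes git2r is installed; not workflowr's implementation.
root <- tempfile("publish-demo-")
dir.create(file.path(root, "analysis"), recursive = TRUE)
dir.create(file.path(root, "docs"))

repo <- git2r::init(root)
git2r::config(repo, user.name = "Example User", user.email = "user@example.org")

rmd  <- file.path(root, "analysis", "file.Rmd")
html <- file.path(root, "docs", "file.html")
writeLines(c("---", "title: Example", "---", "", "Some analysis."), rmd)

# Step 1: commit the source file.
git2r::add(repo, "analysis/file.Rmd")
source_commit <- git2r::commit(repo, "Update analysis/file.Rmd")

# Step 2: "build" (placeholder for rendering), recording the source version.
short_sha <- substr(source_commit$sha, 1, 7)
writeLines(sprintf("<p>Built from version %s</p>", short_sha), html)

# Step 3: commit the generated output.
git2r::add(repo, "docs/file.html")
git2r::commit(repo, "Build and commit docs/file.html")

length(git2r::commits(repo))
```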
  9. Other reproducibility features

    output: workflowr::wflow_html

    Records the session information at the end
    Sets a seed prior to running code
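A base-R sketch of what these two features buy you is shown below. It is not wflow_html() itself, and the seed value 20180711 is a hypothetical choice. With the seed set before the code runs, two runs draw identical random numbers. The recorded session information documents the R version, platform, and attached packages that produced the result.

```r
# Sketch only, in base R; not wflow_html(). 20180711 is a hypothetical seed.
run_analysis <- function(seed) {
  set.seed(seed)                       # set a seed prior to running code
  draws <- rnorm(3)                    # stand-in for code that uses random numbers
  list(draws = draws,
       session = sessionInfo())        # record the session information at the end
}

first  <- run_analysis(20180711)
second <- run_analysis(20180711)

# Same seed, same random numbers.
identical(first$draws, second$draws)

# R version, platform, and attached packages.
print(first$session)
```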
  10. Distribute results for sharing

    Create a new GitHub repository
    > wflow_git_push()
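Creating the repository happens on the GitHub website. The sketch below covers only the R side, assuming workflowr (version 1.0 or later) and Git are installed. It uses wflow_git_remote(), a workflowr function not shown on the slide, to point a throwaway project at GitHub. It then previews the push with dry_run = TRUE, so no network access or credentials are needed. "myname" and "myproject" are hypothetical placeholders for your GitHub username and repository name.

```r
# Sketch only: assumes workflowr (>= 1.0) and Git are installed.
# "myname"/"myproject" are hypothetical GitHub username/repository names.
library(workflowr)

parent <- tempfile("wflow-push-")
dir.create(parent)
project_dir <- file.path(parent, "myproject")
wflow_start(project_dir, change_wd = FALSE,
            user.name = "Example User", user.email = "user@example.org")

# Register the GitHub repository as the remote "origin" (HTTPS by default).
wflow_git_remote(remote = "origin", user = "myname", repo = "myproject",
                 project = project_dir)

# Preview the push; remove dry_run = TRUE to actually push to GitHub.
wflow_git_push(dry_run = TRUE, project = project_dir)
```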
  11. Installation

    1. Install R
       ◦ (Recommended) Install RStudio
       ◦ (Optional) Install pandoc
       ◦ (Optional) Install Git
    2. Install workflowr from CRAN
       ◦ install.packages("workflowr")
    3. Create an account on GitHub

    Documentation:
  12. In summary, using workflowr…

    Enables you to start working reproducibly immediately
    Allows you to focus on your analysis
    Shares your results online
  13. Acknowledgements

    Co-authors: Peter Carbonetto, Matthew Stephens
    Early adopters for testing and feedback
    Authors and contributors to knitr, rmarkdown, git2r, callr