Reproducible Science Workflow

mllewis
June 01, 2018

Transcript

  1. Workflow for Reproducible Science (+ a few NLP tools)
    Molly Lewis, 1 June 2018
  2. Ideal properties of research workflow
    • Reproducible by you and someone else (do the same thing again)
    • Transparent
    • Minimize errors and maximize forgiveness of errors
    • Fast and efficient
    • Easy to collaborate with other people
    • Longevity
  3. My workflow
    1. Directory organization and version control
        • Git/GitHub
    2. [Analysis preregistration]
    3. [Data collection]
    4. Data analysis
        • R, RStudio, tidyverse
        • R Markdown
        • Shiny
    5. Reproducible paper
        • R Markdown (papaja package)
  4. Case Study: Still suspicious: The suspicious coincidence effect revisited (Lewis & Frank, in press).
    • Series of 12 studies on Turk
    • Replicating previous experiments
    [Figure labels: “Suspicious Coincidence”, “dax”, Learning Exemplars, One, Subordinate, Basic, Superordinate, Generalization Exemplars]
  5. Directory Structure
    Thoughts on file naming (via Jenny Bryan):
    • Human readable: interpretable
    • Machine readable: easy to search, read and write names
    • Plays well with default order in OS: put something numeric first
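
The three naming properties on this slide can be sketched in a few lines of shell. The directory and file names below are hypothetical, and the zero-padded prefix plus `_`/`-` delimiters are one assumed convention, not something the slides prescribe.

```shell
set -eu

# Hypothetical scratch directory and file names, for illustration only.
project_dir="$(mktemp -d)"

# Numeric prefix first (plays well with default ordering), followed by a
# human-readable slug. Zero-padding keeps 02 ahead of 10 in a plain sort.
touch "$project_dir/01_clean-data.R" \
      "$project_dir/02_fit-models.R" \
      "$project_dir/10_make-figures.R"

# A plain listing now follows the order of the analysis steps.
listing="$(ls "$project_dir")"
echo "$listing"

# Machine readable: a simple glob picks out files by their slug.
matches="$(cd "$project_dir" && ls *_fit-*.R)"
echo "$matches"
```

Without zero-padding, `10_` would sort ahead of `2_`, so the padding is what makes the default order match the intended one.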
  6. Git/GitHub
    • Version control system: takes snapshots of your work periodically
    • Git = local; GitHub in the cloud
    • Lots of good tutorials (e.g. https://try.github.io/levels/1/challenges/1)
    • Command line interface and point-and-click GUIs
    [Diagram labels: https://github.com/, git push, git checkout / git pull / git clone]
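
As a rough sketch of the local/cloud split described on this slide, the commands below take a local snapshot, push it, and then clone and pull it elsewhere. A local bare repository stands in for GitHub so the example runs offline; all paths, file names, and the user identity are placeholders.

```shell
set -eu

# Hypothetical scratch locations; the bare repository stands in for GitHub.
work="$(mktemp -d)"
remote="$work/remote.git"
git init --quiet --bare "$remote"

# Local repository: each commit is a snapshot of the work.
git init --quiet "$work/project"
cd "$work/project"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "# My project" > README.md
git add README.md
git commit --quiet -m "Add README"

# Publish the current branch to the remote.
git remote add origin "$remote"
git push --quiet origin HEAD

# Someone else (or you, on another machine) gets a full copy with clone.
git clone --quiet "$remote" "$work/copy"

# Later snapshots travel the same way: commit, push, then pull in the copy.
echo "notes" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"
git push --quiet origin HEAD
cd "$work/copy"
git pull --quiet
```

With a real GitHub repository, the only change would be the remote URL passed to `git remote add` and `git clone`.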
  7. Some notes on git workflow
    • Public vs. private repositories (https://github.com/mllewis)
    • Can always go back to previous commit (with checkout)
    • Can link to OSF (useful for preregistrations): https://osf.io/yekhj/
    • .gitignore files
    • Readme.md
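
Two of the notes above, `.gitignore` files and going back to a previous commit with checkout, might look like this in practice. This is a minimal sketch in a throwaway repository; the ignore patterns, file names, and identity are made up for illustration.

```shell
set -eu

# Hypothetical throwaway repository.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# .gitignore lists patterns git should leave untracked (assumed patterns).
printf '%s\n' '*.log' 'raw_data/' > .gitignore
echo "scratch output" > run.log
echo "v1" > analysis.R
git add .
git commit --quiet -m "First version of analysis"
first_commit="$(git rev-parse HEAD)"

echo "v2" > analysis.R
git commit --quiet -am "Second version of analysis"
branch="$(git rev-parse --abbrev-ref HEAD)"   # remember the branch name

# Go back to the earlier snapshot, look at it, then return to the branch.
git checkout --quiet "$first_commit"
old_contents="$(cat analysis.R)"
git checkout --quiet "$branch"
new_contents="$(cat analysis.R)"

echo "earlier: $old_contents, current: $new_contents"
```

Checking out a commit hash leaves the repository in a detached HEAD state, which is fine for inspecting old work; checking out the branch name returns to the latest snapshot.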
  8. Data analysis via R markdowns
    • R markdowns allow for “literate programming”: integrate code with comments about code, output, and plots.
    • Contain three components:
        • Header
        • Body
        • Code chunks
    • Can render the raw file into pretty version (“knitting”): PDF, HTML, or Word document
    • Then can push to Rpubs.com as way to share with other people
    • Can also create interactive reports with Shiny (e.g., https://mlewis.shinyapps.io/xtmem_SI/)
  9. Writing papers in the markdown workflow
    • Just like normal writing of papers, but no copying and pasting of results!
    • If you change one aspect of your analysis, it automatically propagates throughout
    • Use package called papaja (https://github.com/crsh/papaja): use R Markdown to write APA style journal articles
    • Fewer errors but also way easier
  10. NLP tools
    https://github.com/bmschmidt/wordVectors
    install.packages("tidytext")
    install.packages("devtools")
    devtools::install_github("bmschmidt/wordVectors")
    Pretrained vectors on Wikipedia: https://github.com/facebookresearch/fastText