Reproducible Science Workflow

Workflow for Reproducible Science (+ a few NLP tools)
Molly Lewis, 1 June 2018

Ideal properties of a research workflow
• Reproducible by you and someone else (do the same thing again)
• Transparent
• Minimize errors and maximize forgiveness of errors
• Fast and efficient
• Easy to collaborate with other people
• Longevity

My workflow
1. Directory organization and version control
   • Git/GitHub
2. [Analysis preregistration]
3. [Data collection]
4. Data analysis
   • R, RStudio, tidyverse
   • R Markdown
   • Shiny
5. Reproducible paper
   • R Markdown (papaja package)

Case Study: Still suspicious: The suspicious coincidence effect revisited (Lewis & Frank, in press)
• Series of 12 studies on Mechanical Turk
• Replicating previous experiments
• “Suspicious Coincidence”
[Figure: “dax” word-learning task. Labels: Learning Exemplars (One, Subordinate, Basic, Superordinate); Generalization Exemplars]

Directory Structure
Thoughts on file naming (via Jenny Bryan):
• Human readable: interpretable
• Machine readable: names are easy to search, read, and write
• Plays well with the default order in the OS: put something numeric first
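A minimal sketch of these naming ideas, run in a throwaway directory. The folder and file names are hypothetical, and the zero-padded two-digit prefix is an assumption added here so that 10 sorts after 02; the slide only says to put something numeric first.

```shell
set -eu

# Hypothetical project skeleton; every name below is illustrative.
project="$(mktemp -d)/example_project"
mkdir -p "$project/data" "$project/analysis" "$project/paper"

# Numeric prefix first, so the default sort order matches the run order.
# The rest of each name says what the file does (human readable).
touch "$project/analysis/01_clean-data.Rmd" \
      "$project/analysis/02_fit-models.Rmd" \
      "$project/analysis/10_make-figures.Rmd"

# Machine readable: one glob selects every numbered script, already in order.
listing="$(cd "$project/analysis" && ls [0-9][0-9]_*.Rmd)"
echo "$listing"
```

Because the prefix is a fixed width, the same glob keeps working as more scripts are added.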

Git/GitHub
• Version control system: stores a snapshot of your work each time you commit
• Git = local; GitHub = in the cloud
• Lots of good tutorials (e.g. https://try.github.io/levels/1/challenges/1)
• Command line interface and point-and-click GUIs
[Diagram: local repository and https://github.com/, connected by git push and git checkout / git pull / git clone]
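As a rough illustration of the snapshot idea, the sketch below makes two commits in a temporary local repository. The file name, commit messages, and author identity are placeholders, and no GitHub remote is configured, so push, pull, and clone appear only in a comment.

```shell
set -eu

# Throwaway repository in a temp directory; all names are placeholders.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Identity is set for this repository only, so the sketch does not
# depend on any global Git configuration.
git config user.name "Example Author"
git config user.email "author@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add first draft of notes"

echo "second draft" > notes.txt
git commit -q -am "Revise notes"

# Each commit is one snapshot; the log lists them newest first.
git log --oneline

# With a GitHub remote configured (not done here), `git push` would upload
# these commits, and `git pull` / `git clone` would bring them down.
snapshots="$(git rev-list --count HEAD)"
```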

Some notes on git workflow
• Public vs. private repositories (https://github.com/mllewis)
• Can always go back to a previous commit (with checkout)
• Can link to OSF (useful for preregistrations): https://osf.io/yekhj/
• .gitignore files
• README.md
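Two of these notes can be sketched in a temporary repository: a .gitignore that keeps clutter out of version control, and checkout used to recover an earlier version of a file. The ignore patterns, file names, and contents are illustrative assumptions, not taken from the talk's repositories.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# .gitignore lists patterns Git should not track (illustrative choices).
printf '%s\n' '.Rhistory' '.RData' '*_cache/' > .gitignore
printf '# Example project\n\nWhat is here and how to rerun it.\n' > README.md
touch .Rhistory

echo "n = 10" > analysis.txt
git add .
git commit -q -m "Initial analysis"
first="$(git rev-parse HEAD)"

echo "n = 20" > analysis.txt
git commit -q -am "Change sample size"

# Go back: restore the file as it was in the first commit.
git checkout -q "$first" -- analysis.txt
restored="$(cat analysis.txt)"

# Tracked files; .Rhistory is absent because it matches .gitignore.
tracked="$(git ls-files)"
echo "$tracked"
```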

Data analysis via R Markdown
• R Markdown documents allow for “literate programming”: they integrate code with comments about the code, its output, and plots.
• They contain three components:
   • Header
   • Body
   • Code chunks
• The raw file can be rendered into a pretty version (“knitting”): PDF, HTML, or Word document
• The result can then be pushed to RPubs.com as a way to share with other people
• Can also create interactive reports with Shiny (e.g., https://mlewis.shinyapps.io/xtmem_SI/)
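To make the three components concrete, this sketch writes a minimal R Markdown file from the shell. The file name, title, and chunk label are hypothetical. The chunk delimiter (three backticks) is produced with an octal escape only so that it can be shown inside this listing.

```shell
set -eu

workdir="$(mktemp -d)"
rmd="$workdir/example_report.Rmd"

# Three backticks, the delimiter that opens and closes a code chunk.
fence="$(printf '\140\140\140')"

{
  # 1. Header: YAML metadata between two '---' lines.
  printf '%s\n' '---' 'title: "Example report"' 'output: html_document' '---' ''
  # 2. Body: ordinary Markdown text.
  printf '%s\n' 'This sentence is body text, written in Markdown.' ''
  # 3. Code chunk: R code that is run when the file is knitted.
  printf '%s\n' "${fence}{r example-chunk}" 'x <- c(1, 2, 3)' 'mean(x)' "$fence"
} > "$rmd"

cat "$rmd"

# To knit (requires R with the rmarkdown package installed):
#   Rscript -e 'rmarkdown::render("example_report.Rmd")'
```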

Writing papers in the markdown workflow
• Just like normal writing of papers, but no copying and pasting of results!
• If you change one aspect of your analysis, the change automatically propagates throughout the paper
• Use a package called papaja (https://github.com/crsh/papaja): write APA-style journal articles in R Markdown
• Fewer errors, and also much easier

NLP tools
https://github.com/bmschmidt/wordVectors
install.packages("tidytext")
install.packages("devtools")
devtools::install_github("bmschmidt/wordVectors")
Pretrained vectors on Wikipedia: https://github.com/facebookresearch/fastText
