Slide 1

Slide 1 text

Workflow for Reproducible Science (+ a few NLP tools)
Molly Lewis
1 June 2018

Slide 2

Slide 2 text

Ideal properties of research workflow
• Reproducible by you and someone else (do the same thing again)
• Transparent
• Minimize errors and maximize forgiveness of errors
• Fast and efficient
• Easy to collaborate with other people
• Longevity

Slide 3

Slide 3 text

My workflow
1. Directory organization and version control
   • Git/GitHub
2. [Analysis preregistration]
3. [Data collection]
4. Data analysis
   • R, RStudio, tidyverse
   • R Markdown
   • Shiny
5. Reproducible paper
   • R Markdown (papaja package)

Slide 4

Slide 4 text

Case Study: Still suspicious: The suspicious coincidence effect revisited (Lewis & Frank, in press)
• Series of 12 studies on Turk
• Replicating previous experiments
• “Suspicious Coincidence”
[Figure labels: “dax”; Learning Exemplars (One, Subordinate, Basic, Superordinate); Generalization Exemplars]

Slide 5

Slide 5 text

Directory Structure
Thoughts on file naming (via Jenny Bryan):
• Human readable: interpretable
• Machine readable: easy to search, read, and write names
• Plays well with default order in OS: put something numeric first
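The three naming properties can be seen together in a small shell sketch. The file names here are hypothetical and are not taken from the case-study repository; the convention assumed is a zero-padded number first, underscores between fields, and hyphens within a field.

```shell
# Hypothetical file names in a scratch directory (not from the talk's project).
workdir=$(mktemp -d)
touch "$workdir/01_clean-data.R" \
      "$workdir/02_fit-models.R" \
      "$workdir/10_make-figures.R"

# Default order: the zero-padded prefix makes the plain listing
# match the intended run order.
listing=$(ls "$workdir")
echo "$listing"

# Machine readable: a glob selects a subset of files by pattern.
matches=$(cd "$workdir" && echo 0*_*.R)
echo "$matches"

# Machine readable: the underscore splits a name into fields
# (step number, then description).
first_step=$(echo "$listing" | head -n 1 | cut -d _ -f 1)
echo "$first_step"
```

The descriptive part of each name (clean-data, fit-models) covers the human-readable property, while the prefix and delimiters cover the other two.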

Slide 6

Slide 6 text

Git/GitHub
• Version control system: saves snapshots of your work (commits)
• Git = local; GitHub = in the cloud
• Lots of good tutorials (e.g. https://try.github.io/levels/1/challenges/1)
• Command line interface and point-and-click GUIs
[Diagram labels: https://github.com/; git push; git checkout / git pull / git clone]
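A minimal command-line sketch of the snapshot idea follows. It assumes git is installed. To run offline, a local directory stands in for the GitHub remote, so `git clone` here takes a local path where it would normally take a GitHub URL. The file name, commit messages, and user identity are placeholders.

```shell
# A local repository stands in for the GitHub remote (assumption, so this runs offline).
repo=$(mktemp -d)
cd "$repo"
git init -q

# Identity for this throwaway repository only; name and email are placeholders.
git config user.name "Example User"
git config user.email "user@example.com"

# Take two snapshots (commits) of one hypothetical analysis file.
echo "x <- 1" > analysis.R
git add analysis.R
git commit -q -m "Add analysis script"

echo "y <- 2" >> analysis.R
git commit -q -a -m "Add second variable"

# git clone copies the files together with their full history.
copy=$(mktemp -d)
git clone -q "$repo" "$copy/project"
n_commits=$(git -C "$copy/project" log --oneline | wc -l | tr -d ' ')
echo "$n_commits"
```

With a real GitHub repository, `git push` would send new local commits to the cloud copy and `git pull` would fetch commits made elsewhere.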

Slide 7

Slide 7 text

Some notes on git workflow
• Public vs. private repositories (https://github.com/mllewis)
• Can always go back to a previous commit (with checkout)
• Can link to OSF (useful for preregistrations): https://osf.io/yekhj/
• .gitignore files
• Readme.md
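Two of these notes, going back to a previous commit and using a .gitignore file, can be sketched in a throwaway repository. It assumes git is installed. The file names, ignore patterns, and user identity are illustrative and not taken from the repositories linked above.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity for this throwaway repository only.
git config user.name "Example User"
git config user.email "user@example.com"

# .gitignore lists patterns for files git should not track
# (here: R history files and log files, as an example).
printf '%s\n' '.Rhistory' '*.log' > .gitignore
echo "version 1" > notes.txt
touch run.log
git add .
git commit -q -m "First snapshot"
first=$(git rev-parse HEAD)   # remember the ID of the first commit

echo "version 2" > notes.txt
git commit -q -a -m "Second snapshot"
tracked=$(git ls-files)       # run.log is absent because it is ignored

# Go back to the earlier snapshot: the working files revert to that commit.
git checkout -q "$first"
restored=$(cat notes.txt)

# "git checkout -" returns to where you were before.
git checkout -q -
latest=$(cat notes.txt)
echo "$restored / $latest"
```

Checking out an old commit only changes what is in the working directory; the later commit remains in the history.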

Slide 8

Slide 8 text

Data analysis via R Markdown
• R Markdown files allow for “literate programming”: integrating code with comments about the code, output, and plots
• Contain three components:
   • Header
   • Body
   • Code chunks
• Can render the raw file into a pretty version (“knitting”): PDF, HTML, or Word document
• Then can push to RPubs.com as a way to share with other people
• Can also create interactive reports with Shiny (e.g., https://mlewis.shinyapps.io/xtmem_SI/)

Slide 9

Slide 9 text

Writing papers in the markdown workflow
• Just like normal writing of papers, but no copying and pasting of results!
• If you change one aspect of your analysis, the change automatically propagates throughout the paper
• Use a package called papaja (https://github.com/crsh/papaja): use R Markdown to write APA style journal articles
• Fewer errors, but also way easier

Slide 10

Slide 10 text

NLP tools
https://github.com/bmschmidt/wordVectors
install.packages("tidytext")
install.packages("devtools")
devtools::install_github("bmschmidt/wordVectors")
Pretrained vectors on Wikipedia: https://github.com/facebookresearch/fastText