Reproducible Science Workflow

mllewis
June 01, 2018

  1. Workflow for Reproducible Science (+ a few NLP tools)
    Molly Lewis
    1 June 2018

  2. Ideal properties of a research workflow
    • Reproducible by you and by someone else (the same steps can be run again)
    • Transparent
    • Minimize errors, and make errors easy to recover from
    • Fast and efficient
    • Easy to collaborate with other people
    • Longevity

  3. My workflow
    1. Directory organization and version control
      • Git/GitHub
    2. [Analysis preregistration]
    3. [Data collection]
    4. Data analysis
      • R, RStudio, tidyverse
      • R Markdown
      • Shiny
    5. Reproducible paper
      • R Markdown (papaja package)

  4. Case Study: Still suspicious: The suspicious coincidence
    effect revisited (Lewis & Frank, in press).
    • A series of 12 studies on Amazon Mechanical Turk
    • Replicating previous experiments
    • “Suspicious Coincidence”
    [Figure: the novel word “dax” with learning exemplars (one, subordinate, basic, superordinate) and generalization exemplars]

  5. Directory Structure
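    Thoughts on file naming (via Jenny Bryan):
    • Human readable: the name tells you what the file contains
    • Machine readable: names are easy to search for, and easy to read and write programmatically
    • Plays well with the default ordering in your OS: put something numeric first (e.g., a zero-padded number or an ISO 8601 date)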
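    The ordering advice can be tried in a shell. This is a minimal sketch: the directory and file names are hypothetical, and the point is only that a plain alphabetical listing (what `ls` shows) keeps zero-padded numbers and ISO 8601 dates in the intended order, while unpadded numbers do not.

```shell
#!/bin/sh
# Sketch of numeric-first, machine-readable file names (all names are hypothetical).
set -e
proj=$(mktemp -d)
mkdir -p "$proj/analysis" "$proj/data" "$proj/unpadded"

# Zero-padded prefixes: a plain alphabetical sort keeps scripts in run order
touch "$proj/analysis/01_clean-data.R" \
      "$proj/analysis/02_fit-models.R" \
      "$proj/analysis/10_make-figures.R"

# ISO 8601 dates (YYYY-MM-DD) sort chronologically as plain text
touch "$proj/data/2018-05-01_study-01.csv" \
      "$proj/data/2018-05-15_study-02.csv"

# Without padding, 10_figures.R is listed before 2_fit.R
touch "$proj/unpadded/1_clean.R" "$proj/unpadded/2_fit.R" "$proj/unpadded/10_figures.R"

ls "$proj/analysis" "$proj/data" "$proj/unpadded"
```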

  6. Git/GitHub
    • Version control system: records a snapshot (commit) of your work each time you commit
    • Git runs locally; GitHub hosts your repositories in the cloud
    • Lots of good tutorials (e.g., https://try.github.io/levels/1/challenges/1)
    • Both a command-line interface and point-and-click GUIs
    [Diagram: local Git repository and GitHub (https://github.com/); arrows labeled git push and git checkout / git pull / git clone]
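    The push/clone/pull cycle can be tried offline. This is a minimal sketch under one stated assumption: a local bare repository (`remote.git`) stands in for a GitHub repository, so no account or network is needed; with GitHub you would use the repository's GitHub URL instead. Alice and Bob are hypothetical collaborators.

```shell
#!/bin/sh
# Offline sketch of push / clone / pull.
# Assumption: a local bare repository stands in for the GitHub remote.
set -e
work=$(mktemp -d)

# Stand-in for GitHub: an empty bare repository whose default branch is main
git init --bare "$work/remote.git"
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/main

# Alice starts the project locally and takes a first snapshot (commit)
git init "$work/alice"
cd "$work/alice"
git config user.name "Alice"               # hypothetical identity
git config user.email "alice@example.com"
echo "# Analysis repo" > README.md
git add README.md
git commit -m "Add README"
git branch -M main
git remote add origin "$work/remote.git"
git push -u origin main                    # upload local commits to the remote

# Bob gets his own full copy of the repository
git clone "$work/remote.git" "$work/bob"

# Alice takes another snapshot and pushes it
echo "Study 1: notes" >> README.md
git commit -am "Add study 1 notes"
git push

# Bob downloads and merges Alice's new commit
cd "$work/bob"
git pull --ff-only
git log --oneline
```

    Using `--ff-only` keeps the sketch simple: Bob has no local changes, so the pull is a pure fast-forward.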

  7. Some notes on Git workflow
    • Public vs. private repositories (https://github.com/mllewis)
    • You can always go back to a previous commit (with git checkout)
    • GitHub repositories can be linked to OSF (useful for preregistrations):
    https://osf.io/yekhj/
    • .gitignore files (files Git should not track)
    • README.md
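    Going back to a previous commit and ignoring files can be sketched together. The file names (`raw_data/`, `analysis.R`) are hypothetical. `git checkout <commit> -- <file>` restores a single file from an earlier commit; `git checkout <commit>` on its own would instead move the whole working tree to that commit (a "detached HEAD").

```shell
#!/bin/sh
# Sketch: .gitignore plus restoring a file from an earlier commit.
# All file names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Paths matching .gitignore patterns are skipped by `git add .`
printf 'raw_data/\n.Rhistory\n' > .gitignore
mkdir raw_data
echo "id,response" > raw_data/participants.csv
echo 'x <- 1' > analysis.R
git add .
git commit -qm "First version of analysis"
first=$(git rev-parse HEAD)

# Later change to the analysis
echo 'x <- 2' > analysis.R
git commit -qam "Change analysis"

# Bring back analysis.R exactly as it was in the first commit
git checkout "$first" -- analysis.R
cat analysis.R
```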

  8. Data analysis with R Markdown
    • R Markdown documents allow for “literate programming”: they integrate code with comments about the code, its output, and plots.
    • Contain three components:
      • Header (YAML metadata)
      • Body (Markdown text)
      • Code chunks
    • The raw file can be rendered (“knitted”) into a formatted PDF, HTML, or Word document
    • Knitted reports can then be published to RPubs.com to share with other people
    • You can also create interactive reports with Shiny (e.g., https://mlewis.shinyapps.io/xtmem_SI/)

  9. Writing papers in the R Markdown workflow
    • Just like writing a paper normally, but with no copying and pasting of results!
    • If you change one aspect of your analysis, the change automatically propagates throughout the paper
    • Use the papaja package (https://github.com/crsh/papaja) to write APA-style journal articles in R Markdown
    • Fewer errors, and also way easier

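  10. NLP tools
    wordVectors: https://github.com/bmschmidt/wordVectors
    install.packages("tidytext")                       # tidy text mining
    install.packages("devtools")                       # provides install_github()
    devtools::install_github("bmschmidt/wordVectors")  # word2vec-style word embeddings in R
    Pretrained fastText word vectors (trained on Wikipedia): https://github.com/facebookresearch/fastText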
