How to run a lab for reproducible research

Lorena A. Barba
February 21, 2017

Invited talk at the NSF SI2 PI Workshop, Arlington, VA (21 Feb. 2017).

Please cite as:
Barba, Lorena A. (2017): How to run a lab for reproducible research. figshare.
https://doi.org/10.6084/m9.figshare.4676170.v1

Summary:
As a principal investigator, how do you run your lab for reproducibility? I submit the following action areas: commitment, transparency and open science, onboarding, collaboration, community and leadership. Make a public commitment to reproducible research: what this means for you could differ from others, but an essential core is common to all. Transparency is an essential value, and embracing open science is the best route to realize it. Onboarding every lab member with a deliberate group “syllabus” for reproducibility sets expectations high. What is your list of must-read literature on reproducible research? I can share mine with you: my lab members helped to make it. For collaborating efficiently and building community, we take inspiration from the open-source world. We adopt its technology platforms to work on software and to communicate, openly and collaboratively. Key to the open-source culture is to give credit: give lots of credit for every contribution, whether code, documentation, tests, or issue reports! The tools and methods require training, but running a lab for reproducibility is your decision. Start here: commitment.

See also:
—Barba, Lorena A. (2016): "The hard road to reproducibility," Science, Vol. 354, Issue 6308, pp. 142 http://science.sciencemag.org/content/354/6308/142
—Barba, Lorena A. (2016): "Barba-group reproducibility syllabus," on Medium / Hacker Noon, https://hackernoon.com/barba-group-reproducibility-syllabus-e3757ee635cf#.a9qtcndo6
—Barba, Lorena A. (2012): Reproducibility PI Manifesto. figshare.
https://doi.org/10.6084/m9.figshare.104539.v1

Transcript

  1. NSF SI2 PI Workshop 2017
    How to run a lab for reproducible research
    @LorenaABarba

  2. “… if everyone on a research team knows
    that everything they do is going to someday be
    published for reproducibility, they’ll behave
    differently from day one.”
    Donoho et al., Comput. Sci. Eng. 2009

  3. Action items
    1. Commitment
    2. Transparency &
    Open Science
    3. Onboarding
    4. Collaboration
    5. Community &
    Leadership

  4. Commitment

  5. Jon F. Claerbout
    Professor Emeritus of Geophysics
    Stanford University
    … pioneered the use of computers
    in processing and filtering seismic
    exploration data [Wikipedia]
    … from 1991, he required theses
    to conform to a standard of
    reproducibility.

  6. Def.— Reproducible research
    Authors provide all the necessary data and the
    computer codes to run the analysis again,
    re-creating the results.
    Schwab, M., Karrenbach, M., Claerbout, J. (2000) “Making
    scientific computations reproducible,” Computing in Science and
    Engineering Vol. 2(6):61–67

  7. http://lorenabarba.com/gallery/reproducibility-pi-manifesto/

  8. ‣ I teach my graduate students about reproducibility
    ‣ All our research code (and writing) is under version control
    ‣ We always carry out verification & validation (and make them public)
    ‣ For main results, we share data, plotting script & figure under CC-BY
    ‣ We upload preprint to arXiv at the time of submission to a journal
    ‣ We release code at the time of submission of a paper to a journal
    ‣ We add a “Reproducibility” declaration at the end of each paper
    ‣ I develop a consistent open-science policy & keep an up-to-date web
    presence
    Reproducibility PI Manifesto (2012)

  9. Not everyone
    agrees
    Two points of
    contention:
    — scripted figures (vs.
    GUI-based tools)
    — version control

  10. “I’ve learned that interactive programs are slavery
    (unless they include the ability to arrive in any
    previous state by means of a script).”
    — Jon Claerbout
    GUIs

  12. Version control

  13. “private reproducibility”
    …we can rebuild our own past research
    results from the precise version of the code
    that was used to create them.

  14. What is Science?
    ‣ American Physical Society:
    - Ethics and Values, 1999
    "The success and credibility of science are anchored
    in the willingness of scientists to […] Expose their
    ideas and results to independent testing and
    replication by others. This requires the open
    exchange of data, procedures and materials."
    https://www.aps.org/policy/statements/99_6.cfm

  15. Transparency &
    Open Science
    Donoho, D. et al. (2009) “Reproducible research in computational
    harmonic analysis,” Computing in Science and Engineering Vol. 11(1):8–18.

  16. Data and Code Sharing Recommendations
    ‣ assign a unique identifier to every version of the data and code
    ‣ describe in each publication the computing environment used
    ‣ use open licenses and non-proprietary formats
    ‣ publish under open-access conditions (and/or post pre-prints)
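The first two recommendations above could be acted on along these lines. This is only a minimal sketch: the slide does not say which identifier scheme is meant (a DOI issued by an archive is a common choice), so the SHA-256 digest used here is an assumed stand-in, and `content_id` and `describe_environment` are hypothetical helper names.

```python
import hashlib
import platform
import sys


def content_id(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying this exact content.

    Assumption: a content hash stands in for the "unique identifier";
    any change to the bytes yields a different digest.
    """
    return hashlib.sha256(data).hexdigest()


def describe_environment() -> dict:
    """Collect basic facts about the computing environment in use."""
    return {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "machine": platform.machine(),
    }


if __name__ == "__main__":
    print(content_id(b"example data, version 1"))
    print(describe_environment())
```

The dictionary returned by `describe_environment` could be pasted into a paper's methods section or saved next to the results; a fuller record would also list the versions of the libraries used.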

  17. Open-source licenses:
    People can coordinate their work freely, within
    the confines of copyright law, while making
    access and wide distribution a priority.

  18. arXiv
    https://www.simonsfoundation.org/report2015/stories/arxiv.html

  20. http://dx.doi.org/10.1073/pnas.1421412111

  21. “The key is prevention via the training of
    more people on techniques for data
    analysis and reproducible research.”
    Leek & Peng, PNAS 2015

  22. Onboarding

  23. https://medium.com/@lorenaabarba

  24. A syllabus for research computing
    1. command line utilities in Unix/Linux
    2. an open-source scientific software ecosystem (our favorite is
    Python's)
    3. software version control (we like the distributed kind: our
    favorite is git / GitHub)
    4. good practices for scientific software development: code
    hygiene and testing
    5. knowledge of licensing options for sharing software
    https://barbagroup.github.io/essential_skills_RRC/

  25. Collaboration
    https://github.com/barbagroup

  27. Community & Leadership
    … clustering of similar areas of interest that
    allows for interaction, sharing, dialoguing,
    and thinking together
    —George Siemens, 2004

  29. ReproPacks
    For main results in a paper, we
    share data, plotting script &
    figure under CC-BY.
    File bundle with input data,
    running scripts, plotting scripts,
    and figure.
    We cite our own figure in the
    caption!
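A file bundle like the one described above might be assembled as follows. This is a minimal sketch, not the group's actual tooling: the zip layout, the `ROLES` names, the `MANIFEST.json` file, and the `build_repropack` function are all illustrative assumptions.

```python
import json
import zipfile
from pathlib import Path

# Assumed roles, mirroring the four kinds of files named on the slide.
ROLES = ("input_data", "run_scripts", "plot_scripts", "figure")


def build_repropack(archive, files_by_role, license_name="CC-BY"):
    """Zip the files for one figure, grouped by role, with a manifest.

    files_by_role maps each role in ROLES to a list of file paths.
    Returns the manifest dictionary that is also stored in the archive.
    """
    missing = [role for role in ROLES if not files_by_role.get(role)]
    if missing:
        raise ValueError(f"ReproPack is missing files for: {missing}")

    manifest = {"license": license_name, "files": {}}
    with zipfile.ZipFile(archive, "w") as zf:
        for role in ROLES:
            for path in files_by_role[role]:
                # Store each file under a folder named after its role.
                arcname = f"{role}/{Path(path).name}"
                zf.write(path, arcname)
                manifest["files"].setdefault(role, []).append(arcname)
        zf.writestr("MANIFEST.json", json.dumps(manifest, indent=2))
    return manifest
```

The resulting archive could then be deposited in a repository that issues a citable identifier, which is what makes it possible to cite the figure in its own caption.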

  30. NSF SI2 PI Workshop 2017
    How to run a lab for reproducible research
    @LorenaABarba
