
Science Reproducibility Taxonomy


Presentation slides for the 2017 Workshop on Reproducibility Taxonomies for Computing and Computational Science
July 25, 2017
https://collegeville.github.io/repeto/ReproducibilityWorkshop2017.html

Cite as:
Barba, Lorena A. (2017): Science Reproducibility Taxonomy. figshare.
https://doi.org/10.6084/m9.figshare.5248273.v1

Also follow the link above for the text of the presenter notes.

Lorena A. Barba

July 26, 2017

Transcript

  1. Reproducibility Taxonomies for Computing and Computational Science
    National Science Foundation, 25 July 2017
    Science Reproducibility Taxonomy
    @LorenaABarba


  2. Jon F. Claerbout
    Professor Emeritus of Geophysics
    Stanford University
    … pioneered the use of computers
    in processing and filtering seismic
    exploration data [Wikipedia]
    … from 1991, he required theses
    to conform to a standard of
    reproducibility.


  3. Def.— Reproducible research
    Authors provide all the necessary data and the
computer codes to run the analysis again,
    re-creating the results.
    Schwab, M., Karrenbach, N., Claerbout, J. (2000) “Making
    scientific computations reproducible,” Computing in Science and
    Engineering Vol. 2(6):61–67


  4. Invited paper at the October 1992 meeting of the
    Society of Exploration Geophysics
    http://library.seg.org/doi/abs/10.1190/1.1822162


  5. “In 1990, we set this sequence of goals:
    1. Learn how to merge a publication with its underlying computational analysis.
    2. Teach researchers how to prepare a document in a form where they themselves
    can reproduce their own research results a year or more later by “pressing a
    single button”.
    3. Learn how to leave finished work in a condition where coworkers can reproduce
    the calculation including the final illustration by pressing a button in its caption.
    4. Prepare a complete copy of our local software environment so that graduating
    students can take their work away with them to other sites, press a button, and
    reproduce their Stanford work.
    5. Merge electronic documents written by multiple authors (SEP reports).
    6. Export electronic documents to numerous other sites (sponsors) so they can
    readily reproduce a substantial portion of our Stanford research.”


  6. (image-only slide)

  7. “… because of the time, expense, and opportunism
    of many current epidemiologic studies, it is often
    impossible to fully replicate their findings. An
    attainable minimum standard is ‘reproducibility,’
    which calls for data sets and software to be made
    available for verifying published findings and
    conducting alternative analyses.”


  8. http://dx.doi.org/10.1109/MCSE.2009.15


  9. Yale Roundtable on Data and Code Sharing
    ‣ Nov. 2009: 14 contributed thought pieces
    ‣ “Data and Code Sharing Declaration”

    ... demanding a resolution to the credibility crisis from the
    lack of reproducible research in computational science.
    SEPT/OCT 2010 | COMPUTING IN SCIENCE AND ENGINEERING


  10. Data and Code Sharing Recommendations
    ‣ assign a unique identifier to every version of the data and code
    ‣ describe in each publication the computing environment used
    ‣ use open licenses and non-proprietary formats
    ‣ publish under open-access conditions (and/or post pre-prints)
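The second recommendation above, describing the computing environment used, can be sketched in a few lines of Python. Everything here (the function name, the fields chosen) is illustrative, not prescribed by the roundtable declaration:

```python
# Sketch: capture the computing environment for a publication's methods
# section. Field selection is an illustrative assumption.
import platform

def environment_summary():
    """Return a dict describing the interpreter and OS in use."""
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "machine": platform.machine(),
    }

if __name__ == "__main__":
    for key, value in environment_summary().items():
        print(f"{key}: {value}")
```

In practice one would also record library versions and compiler flags; this only shows the shape of an automated environment record.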


  11. (image-only slide)

  12. (image-only slide)

  13. Def.— Replication
    Arriving at the same scientific findings as
    another study, collecting new data (possibly
    with different methods) and completing new
    analyses.
    Roger D. Peng (2011), “Reproducible Research in Computational
    Science” Science, Vol. 334, Issue 6060, pp. 1226-1227


  14. http://dx.doi.org/10.1109/MCSE.2012.38


  15. “private reproducibility”
    …we can rebuild our own past research
    results from the precise version of the code
    that was used to create them.
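One way to sketch this notion of private reproducibility is to stamp every saved result with the exact code version that produced it. Using a Git commit hash as the version identifier is an assumption here, not something the slide prescribes:

```python
# Sketch: bundle a result with the precise code version that produced it,
# so the run can be rebuilt later. Git-hash versioning is an assumption.
import json
import subprocess

def current_commit():
    """Return the Git commit hash of the working tree, or None outside a repo."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None

def provenance_record(result, commit=None):
    """Attach a code-version stamp to a computed result."""
    return {"result": result, "code_version": commit or current_commit()}

if __name__ == "__main__":
    record = provenance_record({"mean": 0.42}, commit="abc123")
    print(json.dumps(record))
```

Rebuilding the past result then amounts to checking out the recorded commit and rerunning the analysis.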


  16. What is Science?
    ‣ American Physical Society:
    - Ethics and Values, 1999
    "The success and credibility of science are anchored
    in the willingness of scientists to […] Expose their
    ideas and results to independent testing and
    replication by others. This requires the open
    exchange of data, procedures and materials."
    https://www.aps.org/policy/statements/99_6.cfm


  17. http://dx.doi.org/10.1371/journal.pcbi.1003285


  18. Reproducible Research 10 Simple Rules
    1. For every result, keep track of how it was produced
    2. Avoid manual data-manipulation steps
    3. Archive the exact versions of all external programs used
    4. Version-control all custom scripts
    5. Record all intermediate results, when possible in standard formats
    6. For analyses that include randomness, note underlying random seeds
    7. Always store raw data behind plots
    8. Generate hierarchical analysis output, allowing layers of increasing detail
    to be inspected
    9. Connect textual statements to underlying results
    10. Provide public access to scripts, runs, and results
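Rule 6 above (note underlying random seeds) can be sketched as follows; the analysis itself is a toy stand-in, and the function name is illustrative:

```python
# Sketch of rule 6: fix and record the random seed so an analysis that
# includes randomness can be rerun exactly. The "analysis" is a toy.
import json
import random

def run_analysis(seed):
    """Run a toy stochastic analysis under a fixed, recorded seed."""
    rng = random.Random(seed)          # isolated generator, reproducible
    sample = [rng.random() for _ in range(5)]
    return {"seed": seed, "mean": sum(sample) / len(sample)}

if __name__ == "__main__":
    first = run_analysis(seed=2017)
    second = run_analysis(seed=2017)   # same seed, identical result
    assert first == second
    print(json.dumps(first))
```

Storing the seed in the output record (rather than only in the code) is what lets a reader verify the exact run later.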


  19. https://doi.org/10.1371/journal.pone.0080278


  20. https://doi.org/10.1371/journal.pbio.1002333


  21. NSF 17-022
    Dear Colleague Letter: Encouraging Reproducibility
    in Computing and Communications Research
    CISE, October 21, 2016
    https://www.nsf.gov/pubs/2017/nsf17022/nsf17022.jsp


  22. NSF SBE subcommittee on replicability in science:
    “reproducibility refers to the ability of a researcher to
    duplicate results of a prior study using the same materials as
    were used by the original investigator."
    “… new evidence is provided by new experimentation,
    defined in the NSF report as ‘replicability’.”
    SBE, May 2015


  23. (image-only slide)

  24. https://cos.io/our-services/top-guidelines/


  25. Reproducible Research Track
    (peer reviewed)
    Lorena A. Barba
    George Washington University
    [email protected]
    George K. Thiruvathukal
    Loyola University Chicago
    [email protected]
    https://www.computer.org/cise/


  26. Technical Consortium on High Performance Computing
    New initiative on Reproducibility, led by Barba.
    https://www.computer.org/web/tchpc


  27. http://joss.theoj.org
