
Science Reproducibility Taxonomy


Presentation slides for the 2017 Workshop on Reproducibility Taxonomies for Computing and Computational Science
July 25, 2017
https://collegeville.github.io/repeto/ReproducibilityWorkshop2017.html

Cite as:
Barba, Lorena A. (2017): Science Reproducibility Taxonomy. figshare.
https://doi.org/10.6084/m9.figshare.5248273.v1

Also follow the link above for the text of the presenter notes.

Lorena A. Barba

July 26, 2017

Transcript

  1. Reproducibility Taxonomies for Computing and Computational Science
    National Science Foundation, 25 July 2017
    Science Reproducibility Taxonomy
    @LorenaABarba


  2. Jon F. Claerbout
    Professor Emeritus of Geophysics
    Stanford University
    … pioneered the use of computers
    in processing and filtering seismic
    exploration data [Wikipedia]
    … from 1991, he required theses
    to conform to a standard of
    reproducibility.


  3. Def.— Reproducible research
    Authors provide all the necessary data and the
computer codes to run the analysis again,
    re-creating the results.
    Schwab, M., Karrenbach, N., Claerbout, J. (2000) “Making
    scientific computations reproducible,” Computing in Science and
    Engineering Vol. 2(6):61–67


  4. Invited paper at the October 1992 meeting of the
    Society of Exploration Geophysics
    http://library.seg.org/doi/abs/10.1190/1.1822162


  5. “In 1990, we set this sequence of goals:
    1. Learn how to merge a publication with its underlying computational analysis.
    2. Teach researchers how to prepare a document in a form where they themselves
    can reproduce their own research results a year or more later by “pressing a
    single button”.
    3. Learn how to leave finished work in a condition where coworkers can reproduce
    the calculation including the final illustration by pressing a button in its caption.
    4. Prepare a complete copy of our local software environment so that graduating
    students can take their work away with them to other sites, press a button, and
    reproduce their Stanford work.
    5. Merge electronic documents written by multiple authors (SEP reports).
    6. Export electronic documents to numerous other sites (sponsors) so they can
    readily reproduce a substantial portion of our Stanford research.”


  6. (image-only slide)

  7. “… because of the time, expense, and opportunism
    of many current epidemiologic studies, it is often
    impossible to fully replicate their findings. An
    attainable minimum standard is ‘reproducibility,’
    which calls for data sets and software to be made
    available for verifying published findings and
    conducting alternative analyses.”


  8. http://dx.doi.org/10.1109/MCSE.2009.15


  9. Yale Roundtable on Data and Code Sharing
    ‣ Nov. 2009: 14 contributed thought pieces
    ‣ “Data and Code Sharing Declaration”

    ... demanding a resolution to the credibility crisis from the
    lack of reproducible research in computational science.
    SEPT/OCT 2010 | COMPUTING IN SCIENCE AND ENGINEERING


  10. Data and Code Sharing Recommendations
    ‣ assign a unique identifier to every version of the data and code
    ‣ describe in each publication the computing environment used
    ‣ use open licenses and non-proprietary formats
    ‣ publish under open-access conditions (and/or post pre-prints)
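The second recommendation above, describing the computing environment used, can be sketched in a few lines of Python. Everything here (the function name, the fields chosen) is illustrative, not prescribed by the roundtable declaration:

```python
# Sketch: capture the computing environment for a publication's methods
# section. Field selection is an illustrative assumption.
import platform

def environment_summary():
    """Return a dict describing the interpreter and OS in use."""
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "machine": platform.machine(),
    }

if __name__ == "__main__":
    for key, value in environment_summary().items():
        print(f"{key}: {value}")
```

In practice one would also record library versions and compiler flags; this only shows the shape of an automated environment record.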


  11. (image-only slide)

  12. (image-only slide)

  13. Def.— Replication
    Arriving at the same scientific findings as
    another study, collecting new data (possibly
    with different methods) and completing new
    analyses.
    Roger D. Peng (2011), “Reproducible Research in Computational
    Science” Science, Vol. 334, Issue 6060, pp. 1226-1227


  14. http://dx.doi.org/10.1109/MCSE.2012.38


  15. “private reproducibility”
    …we can rebuild our own past research
    results from the precise version of the code
    that was used to create them.
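One way to sketch this notion of private reproducibility is to stamp every saved result with the exact code version that produced it. Using a Git commit hash as the version identifier is an assumption here, not something the slide prescribes:

```python
# Sketch: bundle a result with the precise code version that produced it,
# so the run can be rebuilt later. Git-hash versioning is an assumption.
import json
import subprocess

def current_commit():
    """Return the Git commit hash of the working tree, or None outside a repo."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return None

def provenance_record(result, commit=None):
    """Attach a code-version stamp to a computed result."""
    return {"result": result, "code_version": commit or current_commit()}

if __name__ == "__main__":
    record = provenance_record({"mean": 0.42}, commit="abc123")
    print(json.dumps(record))
```

Rebuilding the past result then amounts to checking out the recorded commit and rerunning the analysis.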


  16. What is Science?
    ‣ American Physical Society:
    - Ethics and Values, 1999
    "The success and credibility of science are anchored
    in the willingness of scientists to […] Expose their
    ideas and results to independent testing and
    replication by others. This requires the open
    exchange of data, procedures and materials."
    https://www.aps.org/policy/statements/99_6.cfm


  17. http://dx.doi.org/10.1371/journal.pcbi.1003285


  18. Reproducible Research 10 Simple Rules
    1. For every result, keep track of how it was produced
    2. Avoid manual data-manipulation steps
    3. Archive the exact versions of all external programs used
    4. Version-control all custom scripts
    5. Record all intermediate results, when possible in standard formats
    6. For analyses that include randomness, note underlying random seeds
    7. Always store raw data behind plots
    8. Generate hierarchical analysis output, allowing layers of increasing detail
    to be inspected
    9. Connect textual statements to underlying results
    10. Provide public access to scripts, runs, and results
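Rule 6 above (note underlying random seeds) can be sketched as follows; the analysis itself is a toy stand-in, and the function name is illustrative:

```python
# Sketch of rule 6: fix and record the random seed so an analysis that
# includes randomness can be rerun exactly. The "analysis" is a toy.
import json
import random

def run_analysis(seed):
    """Run a toy stochastic analysis under a fixed, recorded seed."""
    rng = random.Random(seed)          # isolated generator, reproducible
    sample = [rng.random() for _ in range(5)]
    return {"seed": seed, "mean": sum(sample) / len(sample)}

if __name__ == "__main__":
    first = run_analysis(seed=2017)
    second = run_analysis(seed=2017)   # same seed, identical result
    assert first == second
    print(json.dumps(first))
```

Storing the seed in the output record (rather than only in the code) is what lets a reader verify the exact run later.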


  19. https://doi.org/10.1371/journal.pone.0080278


  20. https://doi.org/10.1371/journal.pbio.1002333


  21. NSF 17-022
    Dear Colleague Letter: Encouraging Reproducibility
    in Computing and Communications Research
    CISE, October 21, 2016
    https://www.nsf.gov/pubs/2017/nsf17022/nsf17022.jsp


  22. NSF SBE subcommittee on replicability in science:
    “reproducibility refers to the ability of a researcher to
    duplicate results of a prior study using the same materials as
    were used by the original investigator."
    “… new evidence is provided by new experimentation,
    defined in the NSF report as ‘replicability’.”
    SBE, May 2015


  23. (image-only slide)

  24. https://cos.io/our-services/top-guidelines/


  25. Reproducible Research Track
    (peer reviewed)
    Lorena A. Barba
    George Washington University
    [email protected]
    George K. Thiruvathukal
    Loyola University Chicago
    [email protected]
    https://www.computer.org/cise/


  26. Technical Consortium on High Performance Computing
    New initiative on Reproducibility, led by Barba.
    https://www.computer.org/web/tchpc


  27. http://joss.theoj.org
