Science Reproducibility Taxonomy

Presentation slides for the 2017 Workshop on Reproducibility Taxonomies for Computing and Computational Science
July 25, 2017
https://collegeville.github.io/repeto/ReproducibilityWorkshop2017.html

Cite as:
Barba, Lorena A. (2017): Science Reproducibility Taxonomy. figshare.
https://doi.org/10.6084/m9.figshare.5248273.v1

Also follow the link above for the text of the presenter notes.

Lorena A. Barba

July 26, 2017
Transcript

  1. Reproducibility Taxonomies for Computing and Computational Science. National Science Foundation, 25 July 2017. Science Reproducibility Taxonomy. @LorenaABarba
  2. Jon F. Claerbout, Professor Emeritus of Geophysics, Stanford University … pioneered the use of computers in processing and filtering seismic exploration data [Wikipedia] … from 1991, he required theses to conform to a standard of reproducibility.
  3. Def.: Reproducible research. Authors provide all the necessary data and the computer codes to run the analysis again, re-creating the results. Schwab, M., Karrenbach, M., Claerbout, J. (2000), “Making scientific computations reproducible,” Computing in Science and Engineering, Vol. 2(6):61–67
  4. Invited paper at the October 1992 meeting of the Society of Exploration Geophysicists. http://library.seg.org/doi/abs/10.1190/1.1822162
  5. “In 1990, we set this sequence of goals:
     1. Learn how to merge a publication with its underlying computational analysis.
     2. Teach researchers how to prepare a document in a form where they themselves can reproduce their own research results a year or more later by “pressing a single button”.
     3. Learn how to leave finished work in a condition where coworkers can reproduce the calculation including the final illustration by pressing a button in its caption.
     4. Prepare a complete copy of our local software environment so that graduating students can take their work away with them to other sites, press a button, and reproduce their Stanford work.
     5. Merge electronic documents written by multiple authors (SEP reports).
     6. Export electronic documents to numerous other sites (sponsors) so they can readily reproduce a substantial portion of our Stanford research.”
  7. “… because of the time, expense, and opportunism of many current epidemiologic studies, it is often impossible to fully replicate their findings. An attainable minimum standard is ‘reproducibility,’ which calls for data sets and software to be made available for verifying published findings and conducting alternative analyses.”
  8. http://dx.doi.org/10.1109/MCSE.2009.15

  9. Yale Roundtable on Data and Code Sharing
     ‣ Nov. 2009: 14 contributed thought pieces
     ‣ “Data and Code Sharing Declaration” ... demanding a resolution to the credibility crisis from the lack of reproducible research in computational science.
     Sept/Oct 2010 | Computing in Science and Engineering
  10. Data and Code Sharing Recommendations
     ‣ assign a unique identifier to every version of the data and code
     ‣ describe in each publication the computing environment used
     ‣ use open licenses and non-proprietary formats
     ‣ publish under open-access conditions (and/or post pre-prints)
  13. Def.: Replication. Arriving at the same scientific findings as another study, collecting new data (possibly with different methods) and completing new analyses. Roger D. Peng (2011), “Reproducible Research in Computational Science,” Science, Vol. 334, Issue 6060, pp. 1226–1227
  14. http://dx.doi.org/10.1109/MCSE.2012.38

  15. “private reproducibility” … we can rebuild our own past research results from the precise version of the code that was used to create them.
  16. What is Science?
     ‣ American Physical Society, Ethics and Values, 1999: “The success and credibility of science are anchored in the willingness of scientists to […] Expose their ideas and results to independent testing and replication by others. This requires the open exchange of data, procedures and materials.” https://www.aps.org/policy/statements/99_6.cfm
  17. http://dx.doi.org/10.1371/journal.pcbi.1003285

  18. Reproducible Research: 10 Simple Rules
     1. For every result, keep track of how it was produced
     2. Avoid manual data-manipulation steps
     3. Archive the exact versions of all external programs used
     4. Version-control all custom scripts
     5. Record all intermediate results, when possible in standard formats
     6. For analyses that include randomness, note underlying random seeds
     7. Always store raw data behind plots
     8. Generate hierarchical analysis output, allowing layers of increasing detail to be inspected
     9. Connect textual statements to underlying results
     10. Provide public access to scripts, runs, and results
  19. https://doi.org/10.1371/journal.pone.0080278

  20. https://doi.org/10.1371/journal.pbio.1002333

  21. NSF 17-022 Dear Colleague Letter: Encouraging Reproducibility in Computing and Communications Research. CISE, October 21, 2016. https://www.nsf.gov/pubs/2017/nsf17022/nsf17022.jsp
  22. NSF SBE subcommittee on replicability in science: “reproducibility refers to the ability of a researcher to duplicate results of a prior study using the same materials as were used by the original investigator.” “… new evidence is provided by new experimentation, defined in the NSF report as ‘replicability’.” SBE, May 2015
  24. https://cos.io/our-services/top-guidelines/

  25. Reproducible Research Track (peer reviewed). Lorena A. Barba, George Washington University, labarba@gwu.edu; George K. Thiruvathukal, Loyola University Chicago, gkt@cs.luc.edu. https://www.computer.org/cise/
  26. Technical Consortium on High Performance Computing: new initiative on Reproducibility, led by Barba. https://www.computer.org/web/tchpc
  27. http://joss.theoj.org
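
Several of the rules listed on slide 18 (keep track of how each result was produced, archive the exact versions of the programs used, note random seeds), together with the slide 10 recommendation to describe the computing environment, can be illustrated with a short script. What follows is a minimal Python sketch and is not taken from the presentation: the function names `run_analysis` and `provenance_record`, the layout of the record, and the toy analysis itself are all hypothetical, chosen only to show the idea.

```python
import hashlib
import json
import platform
import random


def run_analysis(seed: int, n: int = 1000) -> float:
    """Toy stochastic analysis (hypothetical): the mean of n pseudo-random draws."""
    # A local generator means the seed alone determines the result,
    # independent of any other code that touches the global random state.
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n)) / n


def provenance_record(seed: int, result: float, code_bytes: bytes) -> dict:
    """Bundle a result with the information needed to regenerate it.

    The field names are an assumption made for this sketch, not a standard.
    """
    return {
        "result": result,
        "seed": seed,                                   # rule 6: note random seeds
        "python_version": platform.python_version(),    # rule 3: exact program versions
        "platform": platform.platform(),                # slide 10: computing environment
        # Fingerprint of the code that produced the result (rule 1).
        "code_sha256": hashlib.sha256(code_bytes).hexdigest(),
    }


if __name__ == "__main__":
    seed = 20170725  # arbitrary example seed
    # In practice code_bytes would be the contents of the analysis script;
    # a placeholder byte string keeps this sketch self-contained.
    record = provenance_record(seed, run_analysis(seed), code_bytes=b"placeholder")
    print(json.dumps(record, indent=2))
```

Storing such a record next to each output is one way to approach the “private reproducibility” of slide 15: rerunning `run_analysis` with the recorded seed, on the recorded code and interpreter version, should regenerate the stored result.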