How to write a reproducible paper #2

Damien Irving
February 25, 2016

Seminar for Data Science Hobart

Transcript

  1. How to write a reproducible paper
     Damien Irving, CSIRO
     @DrClimate
     https://drclimate.wordpress.com
  2. Irving D, Simmonds I (2015). A novel approach to diagnosing Southern Hemisphere planetary wave activity and its influence on regional climate variability. Journal of Climate, 28, 9041-57. doi:10.1175/JCLI-D-15-0287.1

     Irving D (in press). A minimum standard for publishing computational results in the weather and climate sciences. Bulletin of the American Meteorological Society. doi:10.1175/BAMS-D-15-00010.1
  3. The reproducibility crisis
     - Our field has rapidly transitioned to a computational science
     - Conventions around communicating our methods have hardly changed
       - Have you ever seen a paper provide (ancillary) code/software details?
     - It’s impossible to replicate the results presented in journal papers today
  4. The crisis response
     - Funding agencies + journals [1]
       - Some progress on dataset disclosure
         - Funders like NSF, ARC have policies
         - Most weather/climate journals have policies
         - Not consistently enforced
       - Weak or non-existent code requirements
     - It’s not all their fault
       - No examples to base new standards on
       - I set about addressing this deficiency…

     [1] Stodden et al. 2013. PLoS ONE, 8, e67111
  5. A plan for change
     1. Consult the literature
        - Why don’t people publish their code?
        - Best practices for scientific computing
     2. Devise and implement an approach
        - Irving & Simmonds (2015)
     3. Lobby journals
        - Propose a communication standard (BAMS)
        - Contact decision makers
     4. Help scientists improve their skills
        - Software Carpentry
        - Research Bazaar
  6. 1. The literature
     - Barriers to overcome [1]
       - Perceived lack of time
       - Low computational competency
       -> minimise time and complexity
     - Computational best practice [2]
       - Write scripts
       - Modularise, don’t copy/paste -> code library
       - Use version control

     [1] Stodden (2010). doi:10.2139/ssrn.1550193
     [2] Wilson et al. 2014. PLoS Biol, 12, e1001745
  7. 2. The approach
     - Add a computation section that contains:
       - Brief overview of software packages
         - Academic credit for software authors
       - Link to collection of supplementary materials:
         - Software description, code, log files
         - Host with journal, institution or Figshare/Zenodo

     http://dx.doi.org/10.6084/m9.figshare.1385387
  8. Software Description
     - Name, version number, release date, institution and DOI or URL
       - i.e. sufficient detail to recreate environment
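Part of the software description above (names and version numbers) can be gathered automatically. The following is a minimal sketch, assuming the analysis is written in Python; the function name `describe_software` and the example package list are hypothetical. Release dates, institutions and DOIs or URLs would still need to be added by hand.

```python
import platform
from importlib import metadata


def describe_software(package_names):
    """Return 'name version' lines for Python and the given installed packages."""
    lines = [f"Python {platform.python_version()}"]
    for name in package_names:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages rather than failing part-way through.
            version = "not installed"
        lines.append(f"{name} {version}")
    return lines


if __name__ == "__main__":
    # Hypothetical package list; replace with the packages your analysis imports.
    for line in describe_software(["numpy", "xarray"]):
        print(line)
```

Saving this output alongside the paper's supplementary materials gives readers a starting point for recreating the environment.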
  9. Code
     - [compulsory] Provide a copy of your version controlled code repository
     - [desirable] Link to an external hosting service
       - Allows readers to view updates & submit pull requests
     - Your everyday repository is fine
       - github.com/DamienIrving/climate-analysis
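When an everyday repository keeps changing, it can help to record which commit produced a given result. The slide does not prescribe this; the sketch below is one possible way to do it, assuming the code lives in a git repository and the `git` command is on the path. The function name `get_git_hash` is hypothetical.

```python
import subprocess


def get_git_hash(repo_dir="."):
    """Return the current commit hash of the repository at repo_dir, or None."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or repo_dir is not a git repository with commits.
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    print(get_git_hash())
```

The returned hash can be quoted in a log file or figure metadata so that readers can check out exactly the version of the code that was run.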
  10. Log files
      - Step-by-step account, from data download to final result
      - My suggestion: the NCO / CDO approach
        - Can generate timestamps with any language
        - Features: simple, readable and writeable by anyone, easy to regenerate (no manual editing)
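A log in this spirit can be produced with a few lines of code. The following is a minimal sketch, assuming the approach amounts to a plain-text record in which every command that was run is stored with a timestamp, newest entry first. The function names, the ISO 8601 timestamp format and the example script and file names are illustrative assumptions, not the author's code.

```python
import datetime
import sys


def history_entry(argv=None, now=None):
    """Return one log line: a timestamp followed by the command line."""
    argv = sys.argv if argv is None else argv
    now = datetime.datetime.now() if now is None else now
    # ISO 8601 timestamp (an assumed format; any unambiguous format would do).
    timestamp = now.isoformat(timespec="seconds")
    return f"{timestamp}: {' '.join(argv)}"


def prepend_entry(log_text, entry):
    """Put the newest entry first, keeping earlier entries unchanged below it."""
    return entry if not log_text else f"{entry}\n{log_text}"


if __name__ == "__main__":
    # Hypothetical two-step workflow; the script and file names are made up.
    log = ""
    log = prepend_entry(log, history_entry(["python", "calc_mean.py", "in.nc", "mean.nc"]))
    log = prepend_entry(log, history_entry(["python", "plot.py", "mean.nc", "fig.png"]))
    print(log)
```

Because each entry is generated by the script itself, the log never needs manual editing and can be regenerated simply by rerunning the workflow.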
  11. 3. Lobby decision makers
      - Proposed minimum standard:
        - Authors must include brief computation section which cites software and points to supplementary materials:
          - Software description
          - Code (suggest version controlled, public)
          - Log files
        - Authors not obliged to provide assistance
        - Reviewers only need to check availability
        - Editorial discretion re code privacy
  12. AMS Board on Data Stewardship
      - Still assessing impact of data standards
      - Follow community trends
        - Will you try the approach for your next paper?

      https://drclimate.wordpress.com/2015/11/05/a-call-for-reproducible-research-volunteers/
  13. 4. Helping scientists
      - Software Carpentry
        - AMOS Conference, 2013-15
        - DaSH, UNSW, 2015
      - Content (two days):
        - Unix Shell
        - Programming (in Python)
        - Version control
        - Workflow automation (with Make)
        - Data management: damienirving.github.io/capstone-oceanography/
  16. Aim higher!
      - Minimum standard is reproducible, but not very comprehensible
      - Ideas:
        - README files
        - Write packages
          - e.g. eofs, windspharm, SkewT
        - VisTrails / CWSLab workflow tool
          - http://cwslab.nci.org.au/
        - Docker
  17. https://github.com/CWSL/cwsl-mas/wiki/Tutorial

  18. Why aim for reproducibility?
      - To literally reproduce results / catch people out
        - Software evolves so quickly
        - Most don’t have access to suitable hardware
      - To build on each other’s ideas faster
        - The risk of people doing nothing with your work is much greater than the risk of being “scooped”
  19. Summary
      - There is a reproducibility crisis in weather/climate/ocean research
      - This can be solved by adding a brief computation section to papers which points to supplementary materials:
        - Software description
        - Code (version controlled, public)
        - Log files
      - Journals could adopt this framework as a formal minimum standard
  20. Questions?
      https://drclimate.wordpress.com/orientation-guide/