Slide 1

Slide 1 text

How to write a reproducible paper
Damien Irving
University of Melbourne

Slide 2

Slide 2 text

Irving D, Simmonds I (2015). A novel approach to diagnosing Southern Hemisphere planetary wave activity and its influence on regional climate variability. Journal of Climate. 28, 9041-57. doi:10.1175/JCLI-D-15-0287.1

Irving D (in press). A minimum standard for publishing computational results in the weather and climate sciences. Bulletin of the American Meteorological Society. doi:10.1175/BAMS-D-15-00010.1

Slide 3

Slide 3 text

The reproducibility crisis
- Our field has rapidly transitioned to a computational science
- Conventions around communicating our methods have hardly changed
  - Have you ever seen a paper provide (ancillary) code/software details?
- It’s impossible to replicate the results presented in journal papers today

Slide 4

Slide 4 text

The crisis response
- Funding agencies + journals [1]
  - Some progress on dataset disclosure
    - Funders like NSF, ARC have policies
    - Most weather/climate journals have policies
    - Not consistently enforced
  - Weak or non-existent code requirements
- It’s not all their fault
  - No examples to base new standards on
  - I set about addressing this deficiency…

[1] Stodden et al. 2013. PLoS ONE, 8, e67111

Slide 5

Slide 5 text

A plan for change
1. Consult the literature
   - Why don’t people publish their code?
   - Best practices for scientific computing
2. Devise and implement an approach
   - Irving & Simmonds (2015)
3. Lobby journals
   - Propose a communication standard (BAMS)
   - Contact decision makers
4. Help scientists improve their skills
   - Software Carpentry

Slide 6

Slide 6 text

1. The literature
- Barriers to overcome [1]
  - Perceived lack of time
  - Low computational competency
  -> minimise time and complexity
- Computational best practice [2]
  - Write scripts
  - Modularise, don’t copy/paste -> code library
  - Use version control

[1] Stodden (2010). doi:10.2139/ssrn.1550193
[2] Wilson et al. 2014. PLoS Biol, 12, e1001745

Slide 7

Slide 7 text

2. The approach
- Add a computation section that contains:
  - Brief overview of software packages
    - Academic credit for software authors
  - Link to collection of supplementary materials:
    - Software description, code, log files
    - Host with journal, institution or Figshare, Zenodo

http://dx.doi.org/10.6084/m9.figshare.1385387

Slide 8

Slide 8 text

Software Description
- Name, version number, release date, institution and DOI or URL
  - i.e. sufficient detail to recreate environment
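Part of a software description can be generated rather than typed by hand. The sketch below is only an illustration, not the method used in the paper: it relies on Python's standard `importlib.metadata` module to look up installed package versions, and the function name `describe_software` and the example package list are hypothetical. Release date, institution and DOI or URL are not available this way and would still need to be added manually.

```python
import platform
from importlib import metadata


def describe_software(package_names):
    """Return one 'name version' line per package, plus the Python version.

    Hypothetical helper: it covers only the name and version number fields
    of a software description.
    """
    lines = [f"Python {platform.python_version()}"]
    for name in package_names:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages instead of failing part-way through.
            version = "not installed"
        lines.append(f"{name} {version}")
    return lines


if __name__ == "__main__":
    # Assumed package list; replace with the packages your analysis imports.
    for line in describe_software(["numpy", "xarray"]):
        print(line)
```

Redirecting the printed lines to a text file gives a starting point for the software description in the supplementary materials.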

Slide 9

Slide 9 text

Code
- [desirable] Link to version controlled repository at an external hosting service
  - Allows for revision history, pull requests
  - Your everyday repository is fine
    - github.com/DamienIrving/climate-analysis
- [compulsory] Latest version of code
  - With software description and log files

Slide 10

Slide 10 text

Log files
- Step-by-step account, download to result
- My suggestion: the NCO / CDO approach
  - Can generate timestamps with any language
  - Features: Simple, read/writeable by anyone, easy to regenerate (no manual editing)
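NCO and CDO record each command they run, with a timestamp, in the "history" attribute of the output file. A script in any language can keep a plain-text log in the same spirit. The following Python sketch is one possible way to do it, under stated assumptions: the function names, the timestamp format and the newest-entry-first ordering are illustrative choices, not a description of the NCO or CDO internals.

```python
import datetime
import sys


def new_log_entry(argv=None, now=None):
    """Build one log line: a timestamp followed by the command that was run."""
    argv = sys.argv if argv is None else argv
    now = datetime.datetime.now() if now is None else now
    # Assumed timestamp format (weekday, month, day, time, year).
    timestamp = now.strftime("%a %b %d %H:%M:%S %Y")
    return f"{timestamp}: {sys.executable} {' '.join(argv)}"


def update_log(previous_log, entry):
    """Put the newest entry first, keeping the earlier steps beneath it."""
    return entry if not previous_log else f"{entry}\n{previous_log}"


def write_log(path, log_text):
    """Write the complete log to a plain-text file."""
    with open(path, "w") as log_file:
        log_file.write(log_text + "\n")
```

A script would typically read the log that came with its input file, call `update_log` with a fresh entry, and write the result next to its output, so the log for any figure lists every step from download to result without manual editing.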

Slide 11

Slide 11 text

3. Lobby decision makers
- Proposed minimum standard:
  - Authors must include brief computation section which cites software and points to supplementary materials:
    - Software description
    - Code (suggest public, version controlled)
    - Log files
  - Authors not obliged to provide assistance
  - Reviewers only need to check availability
  - Editorial discretion re code privacy

Slide 12

Slide 12 text

- Next steps
  - AMS Board on Data Stewardship
  - Will you volunteer to try the approach for your next paper?

https://drclimate.wordpress.com/2015/11/05/a-call-for-reproducible-research-volunteers/

Slide 13

Slide 13 text

4. Helping scientists
- Software Carpentry
  - AMOS Conference 2013-15
  - Upcoming training: http://go.unimelb.edu.au/7cra
- Content (two days):
  - Unix Shell
  - Programming (in Python)
  - Version control
  - Workflow automation (with Make)
  - Data management: damienirving.github.io/capstone-oceanography/

Slide 14

Slide 14 text

1-3 February, 2016
resbaz.com

Slide 15

Slide 15 text

Aim higher!
- Minimum standard is reproducible, but not very comprehensible
- Ideas:
  - README files
  - Write packages
    - e.g. eofs, windspharm, SkewT
  - VisTrails / CWSLab workflow tool
    - http://cwslab.nci.org.au/
  - Docker / RunMyCode.org
  - Write the new workflow management tool?

Slide 16

Slide 16 text

https://github.com/CWSL/cwsl-mas/wiki/Tutorial

Slide 17

Slide 17 text

Why aim for reproducibility?
- To literally reproduce results / catch people out
  - Software evolves so quickly
  - Most don’t have access to suitable hardware
- To build on each other’s ideas faster
  - The risk of people doing nothing with your work is much greater than the risk of being “scooped”

Slide 18

Slide 18 text

Summary
- There is a reproducibility crisis in weather/climate/ocean research
- This can be solved by adding a brief computation section to papers which points to supplementary materials:
  - Software description
  - Code repository (public, version controlled)
  - Log files
- Journals could adopt this framework as a formal minimum standard

Slide 19

Slide 19 text

Questions?
https://drclimate.wordpress.com/orientation-guide/