reproducibility as the engine of science: tools for reproducible research

Lindsey Heagy
February 11, 2020

Presented at the workshop: "Scientific publication beyond the text: Sharing research objects"

https://library.stanford.edu/events/scientific-publication-beyond-text-sharing-research-objects

Thanks to Rowan Cockett, Chris Holdgraf, and Fernando Pérez for helping shape these ideas and this presentation.


Transcript

  1. 2.

    hello (a bit about me)
    • geophysical inversions
    • open-source software
    • open research & education
    • geoscience + data science +
  2. 3.

    questions in the geosciences
    • Observations / Data
    • Theory & Ideas
    • Simulations, Computation
    After Hamman, 2018
    EMAG2: Earth Magnetic Anomaly Grid (2-arc-minute resolution). Image credit: Dom Fournier (toolkit.geosci.xyz)
  3. 5.

    evolving research outputs & audiences
    Variety of “consumers”:
    • peers
    • students
    • decision makers & the public
    Drives diversity in outputs:
    • journal publications
    • web apps
    • educational resources
    • ...
  4. 9.

    interactive, exploratory computing
    a community of people and an ecosystem of open tools and standards for interactive computing
  5. 12.

    JupyterLab: a grand unified theory of Jupyter
    Huge Team Effort! C. Colbert, S. Corlay, A. Darian, B. Granger, J. Grout, P. Ivanov, I. Rose, S. Silvester, C. Willing, J. Zosa-Forde …
  6. 16.

    JupyterLab is extensible: FlyBrainLab
    An Interactive Computing Platform for the Fly Brain
    BIONET Group, Columbia University, http://www.bionet.ee.columbia.edu
    Aurel A. Lazar (PI), Tingkai Liu, Mehmet K. Turkcan, Chung-Heng Yeh, Yiyin Zhou
    http://fruitflybrain.org
  7. 17.
  8. 21.
  9. 25.

    JupyterHub distributions
    • The Littlest JupyterHub: tljh.jupyter.org
    • JupyterHub on Kubernetes: z2jh.jupyter.org
    A pre-configured JupyterHub setup with sensible defaults and lots of documentation, fit for many use-cases ☁
  10. 26.

    Zero to JupyterHub for Kubernetes: z2jh.jupyter.org
    • Scalable in both users and in resources
    • Uses Docker for environment management
    • Agnostic to the provider and hardware configuration
  11. 30.

    Harnessing the power of cloud computing to study the whole Earth interactively
    • Interactivity
    • Distributed computing
    • Data models / numerics
  12. 32.

    Jupyter meets the Earth: an NSF grant (2M / 3Y)!
    Fernando Pérez, Joe Hamman, Laurel Larsen, Kevin Paul, Lindsey Heagy, Chris Holdgraf, Yuvi Panda
    Research use-cases:
    • Climate data analysis
    • Hydrology
    • Geophysics
    Tech developments:
    • Data discovery
    • Interactivity
    • Cloud/HPC infrastructure
    For more: http://bit.ly/jupytearth
  13. 34.

    the science more than the paper
    “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.”
    -- Buckheit and Donoho (paraphrasing Claerbout), WaveLab and Reproducible Research, 1995
  14. 35.

    the science more than the paper
    “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.” (and a place to run the code?)
    -- Buckheit and Donoho (paraphrasing Claerbout), WaveLab and Reproducible Research, 1995
  15. 39.
  16. 40.

    New development: publishing executable books
    QuantEcon IAB Jupyter Book PDF HTML ... execution and text content sync citations, cross-refs, rich metadata
  17. 47.

    Groundwater in Myanmar
    • Bring DC resistivity equipment to Mon state
    • Train local stakeholders
    • Provide open-source software and educational resources
  18. 48.

    Reaching new audiences
    Diverse research outputs:
    • Papers
    • Notebooks
    • Apps
    • Web-based textbooks
    “Consumers” of science:
    • Scientists
    • Students
    • Public
  19. 53.

    Blurring the line between scientists and audience?
    • Open tools are
      ◦ accessible
      ◦ explorable
      ◦ extensible
  20. 54.

    An open ecosystem supports the engine of science
    • Open tools are a starting point for…
      ◦ reproducibility of work
      ◦ collaboration at the level of computation
      ◦ extension of ideas
    • And provide a trajectory for “consumers” to become creators
  21. 56.