Jupyter meets the Earth: from geophysical inversions to open, collaborative geoscience

Lindsey Heagy
November 01, 2019

Presented at the Women in Data Science (WiDS) @ Stanford Earth event: https://earth.stanford.edu/events/women-data-science-stanford-earth#gs.f6efqj

Abstract
-----------

Today’s most critical questions in the geosciences, from climate studies to the management of natural resources, require that we integrate knowledge and methods across domains. With the widespread availability of large-scale computational resources, and the unprecedented quality and quantity of scientific data being collected today, we have opportunities to ask in-depth questions and perform analyses that would have been impossible only years ago. These cross-cutting questions also require that we bridge across technical and social barriers that exist between disciplines. Furthermore, these topics involve a complex spectrum of stakeholders, from individual researchers to policy makers and the general public. The adoption of open-source tools, such as those in the Python ecosystem, is one mechanism for fostering communities and making progress on these challenges.

Within geophysics, I have been a part of one of the teams leading a transition towards open-science practices. In 2013, we started the SimPEG project, an open-source software package that integrates multiple methods (e.g. gravity, magnetics, electromagnetics, fluid flow) into a common framework. SimPEG offers both an architecture that fosters technical interoperability of algorithms, and a community approach that encourages multidisciplinary collaboration. Growing these collaborations to include geologists, hydrologists, geochemists, engineers and others motivates the need for educational resources that provide context for how geophysical techniques fit into the broader goal of answering a question about the subsurface. SimPEG is the research foundation atop which we’ve built GeoSci.xyz, a collection of open educational resources for the geosciences. SimPEG and GeoSci.xyz exist within a broad ecosystem of open tools that are now transforming the practices of research, education and scientific communication. We use (and contribute to) Project Jupyter, and we now participate in the Pangeo initiative. Pangeo helps geoscientists explore petabyte-scale datasets and simulations interactively on modern computational environments (from HPC centers to the cloud).

In this talk I will outline my own trajectory in our efforts to develop an open, collaborative, and reproducible model that we think is needed for geoscience to tackle the scientific and social challenges that lie ahead.

Transcript

  1. Jupyter meets the Earth: from geophysical inversions to open, collaborative geoscience. Lindsey Heagy, UC Berkeley, @lindsey_jh
  2. hello (a bit about me): geophysical inversions; open-source software; open research & education; geoscience + data science +
  3. what drives progress in geoscience? Observations / Data; Theory & Ideas; Simulations, Computation (after Hamman, 2018). Figure: EMAG2: Earth Magnetic Anomaly Grid (2-arc-minute resolution). Image credit: Dom Fournier (toolkit.geosci.xyz)
  4. what drives progress in geoscience?

  5. a better approach: CORE science* (Collaborative, Open, Reproducible, Extensible). *With a nod to the FAIR principles of open data
  6. imaging the subsurface: some important problems

  7. forward and inverse problems in geophysics

  8. forward and inverse problems in geophysics. Numerical simulations: predict data. Optimization: estimate a model
  9. tools by and for researchers (https://simpeg.xyz)
     • Modular, multi-physics
       ◦ Gravity
       ◦ Magnetics
       ◦ Direct current resistivity
       ◦ Induced Polarization
       ◦ Electromagnetics
         ▪ Frequency Domain
         ▪ Time Domain
       ◦ Fluid Flow
         ▪ Richards Equation
     Figure: 3D Airborne Time Domain EM
  10. Interactive exploration at scale?

  11. Harnessing the power of cloud computing to study the whole Earth interactively: Interactivity; Distributed computing; Data models / numerics
  12. Jupyter meets the Earth: an NSF grant (2M / 3Y)!
      Team: Fernando Pérez, Joe Hamman, Laurel Larsen, Kevin Paul, Lindsey Heagy, Chris Holdgraf, Yuvi Panda
      Research use-cases
        • Climate data analysis
        • Hydrology
        • Geophysics
      Tech developments
        • Data discovery
        • Interactivity
        • Cloud/HPC infrastructure
  13. None
  14. http://em.geosci.xyz/apps.html

  15. GeoSci.xyz https://geosci.xyz EOSC 350 26 locations worldwide

  16. the science more than the paper: "An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures." -- Buckheit and Donoho (paraphrasing Claerbout), WaveLab and Reproducible Research, 1995
  17. the science more than the paper: "An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures." (and a place to run the code?) -- Buckheit and Donoho (paraphrasing Claerbout), WaveLab and Reproducible Research, 1995
  18. mybinder.org shareable, interactive, reproducible environments from your public git repository

  19. http://bit.ly/black-holes-woop Black holes! LIGO, Sept 14, 2015

  20. enabling new science
      • Integrating multiple geophysical data types for richer geological models
      • Physics + machine learning in electromagnetics
  21. Astic, T. gravity magnetics invert joint inversion T. Astic

  22. physics + machine learning: ML to estimate a source term for the correction; simulating EM with steel cased wells
  23. A few avenues for involvement
      • SimPEG: Methods
        ◦ Seismic
        ◦ Fluid flow
      • Research areas
        ◦ Data integration (geologic, hydrologic, …)
        ◦ Role of ML in inversions
        ◦ ML connections with physical models
        ◦ Question-driven assessments of uncertainty
        ◦ ...
      • Communities
        ◦ SimPEG
        ◦ Pangeo
        ◦ Jupyter
  24. Challenges: contrasts in cultures and incentives
      Aspect | Open Source | Academia
      Credit | Distributed | PI & hierarchy
      Output/artifacts | Continuous & Project-specific | Discrete papers
      Collaborators | Fluid: professionals, volunteers, … | Structured, funding-dependent
      Governance/decision making | Open, community based | Top-down, PI
      Authorship | Fluid, roles can evolve, no clear “first/senior” author | Need to say more?
      Peer review | Continuous, open, pervasive, friendly | The opposite
      Value metric | Utility, need, impact | “Novel and transformative”
  25. Thank you! @lheagy lheagy@berkeley.edu @lindsey_jh
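
Slides 7 and 8 describe the forward problem (a numerical simulation predicts data from a model) and the inverse problem (an optimization estimates a model from data). The sketch below is one minimal way to illustrate that pairing, assuming a small linear problem d = G m and a Tikhonov-regularized least-squares estimate. It does not use SimPEG and is not the method from the talk; the names `forward`, `invert`, `G`, `m_true` and `beta` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def forward(G, m):
    """Forward problem: predict data from a model, d = G m."""
    return G @ m


def invert(G, d_obs, beta=1e-3):
    """Inverse problem: estimate a model by minimizing
    ||G m - d_obs||^2 + beta * ||m||^2 (Tikhonov regularization).

    The minimizer solves the normal equations
    (G^T G + beta I) m = G^T d_obs.
    """
    n_params = G.shape[1]
    A = G.T @ G + beta * np.eye(n_params)
    return np.linalg.solve(A, G.T @ d_obs)


# Synthetic example: a random linear operator stands in for the physics.
rng = np.random.default_rng(0)
G = rng.normal(size=(20, 5))                  # hypothetical sensitivity matrix
m_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])  # hypothetical "true" model

# Simulate noisy observations, then recover a model estimate from them.
d_obs = forward(G, m_true) + 0.01 * rng.normal(size=20)
m_est = invert(G, d_obs, beta=1e-3)

print("true model:     ", m_true)
print("estimated model:", np.round(m_est, 3))
```

Real geophysical problems are usually nonlinear and underdetermined, so the choice of regularization matters far more than it does in this overdetermined toy case.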