Lindsey Heagy
November 01, 2019

Jupyter meets the Earth: from geophysical inversions to open, collaborative geoscience

Presented at the Women in Data Science (WiDS) @ Stanford Earth event: https://earth.stanford.edu/events/women-data-science-stanford-earth#gs.f6efqj

Abstract
-----------

Today’s most critical questions in the geosciences, from climate studies to the management of natural resources, require that we integrate knowledge and methods across domains. With the widespread availability of large-scale computational resources, and the unprecedented quality and quantity of scientific data being collected today, we have opportunities to ask in-depth questions and perform analyses that would have been impossible only a few years ago. These cross-cutting questions also require that we bridge the technical and social barriers that exist between disciplines. Furthermore, these topics involve a complex spectrum of stakeholders, from individual researchers to policy makers and the general public. The adoption of open-source tools, such as those in the Python ecosystem, is one mechanism for fostering communities and making progress on these challenges.

Within geophysics, I have been a part of one of the teams leading a transition towards open-science practices. In 2013, we started the SimPEG project, an open-source software package that integrates multiple methods (e.g. gravity, magnetics, electromagnetics, fluid flow) into a common framework. SimPEG offers both an architecture that fosters technical interoperability of algorithms, and a community approach that encourages multidisciplinary collaboration. Growing these collaborations to include geologists, hydrologists, geochemists, engineers and others motivates the need for educational resources that provide context for how geophysical techniques fit into the broader goal of answering a question about the subsurface. SimPEG is the research foundation atop which we’ve built GeoSci.xyz, a collection of open educational resources for the geosciences. SimPEG and GeoSci.xyz exist within a broad ecosystem of open tools that are now transforming the practices of research, education and scientific communication. We use (and contribute to) Project Jupyter, and we now participate in the Pangeo initiative. Pangeo helps geoscientists explore petabyte-scale datasets and simulations interactively on modern computational environments (from HPC centers to the cloud).

In this talk I will outline my own trajectory in our efforts to develop an open, collaborative, and reproducible model that we think is needed for geoscience to tackle the scientific and social challenges that lie ahead.

Transcript

  1. Jupyter meets the Earth: from geophysical inversions to open, collaborative geoscience
    Lindsey Heagy, UC Berkeley, @lindsey_jh
  2. hello (a bit about me)
    geophysical inversions
    open-source software
    open research & education
    geoscience + data science +
  3. what drives progress in geoscience?
    Theory & Ideas
    Observations / Data
    Simulations, Computation
    After Hamman, 2018
    EMAG2: Earth Magnetic Anomaly Grid (2-arc-minute resolution). Image credit: Dom Fournier (toolkit.geosci.xyz)
  4. tools by and for researchers
    • Modular, multi-physics
      ◦ Gravity
      ◦ Magnetics
      ◦ Direct current resistivity
      ◦ Induced Polarization
      ◦ Electromagnetics
        ▪ Frequency Domain
        ▪ Time Domain
      ◦ Fluid Flow
        ▪ Richards Equation
    https://simpeg.xyz
    3D Airborne Time Domain EM
  5. Harnessing the power of cloud computing to study the whole Earth interactively
    Interactivity
    Distributed computing
    Data models / numerics
  6. Jupyter meets the Earth: an NSF grant (2M / 3Y)!
    Fernando Pérez, Joe Hamman, Laurel Larsen, Kevin Paul, Lindsey Heagy, Chris Holdgraf, Yuvi Panda
    Research use-cases
      • Climate data analysis
      • Hydrology
      • Geophysics
    Tech developments
      • Data discovery
      • Interactivity
      • Cloud/HPC infrastructure
  7. the science more than the paper
    "An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures."
    -- Buckheit and Donoho (paraphrasing Claerbout), WaveLab and Reproducible Research, 1995
  8. the science more than the paper
    "An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures."
    (and a place to run the code?)
    -- Buckheit and Donoho (paraphrasing Claerbout), WaveLab and Reproducible Research, 1995
  9. enabling new science
    • Integrating multiple geophysical data types for richer geological models
    • Physics + machine learning in electromagnetics
  10. physics + machine learning
    simulating EM with steel cased wells
    ML to estimate a source term for the correction
  11. A few avenues for involvement
    • SimPEG: Methods
      ◦ Seismic
      ◦ Fluid flow
    • Research areas
      ◦ Data integration (geologic, hydrologic, …)
      ◦ Role of ML in inversions
      ◦ ML connections with physical models
      ◦ Question-driven assessments of uncertainty
      ◦ ...
    • Communities
      ◦ SimPEG
      ◦ Pangeo
      ◦ Jupyter
  12. Challenges: contrasts in cultures and incentives

                                | Open Source                                                     | Academia
    Credit                      | Distributed                                                     | PI & hierarchy
    Output/artifacts            | Continuous & Project-specific                                   | Discrete papers
    Collaborators               | Fluid: professionals, volunteers, …                             | Structured, funding-dependent
    Governance/decision making  | Open, community based                                           | Top-down, PI
    Authorship                  | Fluid, roles can evolve, no clear “first/senior” author         | Need to say more?
    Peer review                 | Continuous, open, pervasive, friendly                           | The opposite
    Value metric                | Utility, need, impact                                           | “Novel and transformative”