Heiland Lecture at Colorado School of Mines

Lindsey Heagy
November 20, 2019

Jupyter meets the Earth: from geophysical inversions to open, collaborative geoscience

Today’s most critical questions in the geosciences, from climate studies to the management of natural resources, require that we integrate knowledge and methods across domains. With the widespread availability of large-scale computational resources, and the unprecedented quality and quantity of scientific data being collected today, we have opportunities to ask in-depth questions and perform analyses that would have been impossible only a few years ago. These cross-cutting questions also require that we bridge the technical and social barriers that exist between disciplines. Furthermore, these topics involve a complex spectrum of stakeholders, from individual researchers to policy makers and the general public. The adoption of open-source tools, such as those in the Python ecosystem, is one mechanism for fostering communities and making progress on these challenges.

Within geophysics, I have been a part of one of the teams leading a transition towards open-science practices. In 2013, we started the SimPEG project, an open-source software package that integrates multiple methods (e.g., gravity, magnetics, electromagnetics, fluid flow) into a common framework. SimPEG offers both an architecture that fosters technical interoperability of algorithms and a community approach that encourages multidisciplinary collaboration. Growing these collaborations to include geologists, hydrologists, geochemists, engineers and others motivates the need for educational resources that provide context for how geophysical techniques fit into the broader goal of answering a question about the subsurface. SimPEG is the research foundation atop which we’ve built GeoSci.xyz, a collection of open educational resources for the geosciences. SimPEG and GeoSci.xyz exist within a broad ecosystem of open tools that are now transforming the practices of research, education and scientific communication. We use (and contribute to) Project Jupyter, and we now participate in the Pangeo initiative. Pangeo helps geoscientists explore petabyte-scale datasets and simulations interactively on modern computational environments (from HPC centers to the cloud).

In this talk I will outline my own trajectory in our efforts to develop an open, collaborative, and reproducible model that we think is needed for geoscience to tackle the scientific and social challenges that lie ahead.

Transcript

  1. Jupyter meets the Earth: from geophysical inversions to open, collaborative geoscience. Lindsey Heagy, UC Berkeley, @lindsey_jh
  2. hello (a bit about me): geophysical inversions; open-source software; open research & education; geoscience + data science +
  3. what drives progress in geoscience? Observations / Data; Theory & Ideas; Simulations, Computation (after Hamman, 2018). Figure: EMAG2: Earth Magnetic Anomaly Grid (2-arc-minute resolution). Image credit: Dom Fournier (toolkit.geosci.xyz)
  4. what drives progress in geoscience?

  5. imaging the subsurface: some important problems

  6. forward and inverse problems in geophysics

  7. forward and inverse problems in geophysics. Numerical simulations: predict data. Optimization: estimate a model.
  8. tools by and for researchers (https://simpeg.xyz)
      • Modular, multi-physics
        ◦ Gravity
        ◦ Magnetics
        ◦ Direct current resistivity
        ◦ Induced Polarization
        ◦ Electromagnetics
          ▪ Frequency Domain
          ▪ Time Domain
        ◦ Fluid Flow
          ▪ Richards Equation
      Figure: 3D Airborne Time Domain EM
  9. simulations: create a mesh

  10. simulations: discretize & solve. DC resistivity discrete equations A

  11. inversions: Data Misfit, Regularization, Inverse Problem

  12. a better approach: CORE science* (Collaborative, Open, Reproducible, Extensible). * With a nod to the FAIR principles of open data
  13. Collaborative?

  14. Collaborative scientific community
      • Contributors, users: Academic & industry
      • Applications: mining, groundwater, tectonic studies, …
  15. Open?

  16. Dimensions of openness
      • Open source code
      • Open (FAIR) data
      • Open access publications & artifacts
      • Open standards: interoperability (even with proprietary tools)
      • Open community: all welcome!
      • …
  17. Reproducible? the foundation of collaboration

  18. the science more than the paper: "An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures." -- Buckheit and Donoho (paraphrasing Claerbout), WaveLab and Reproducible Research, 1995
  19. the science more than the paper: "An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures." (and a place to run the code?) -- Buckheit and Donoho (paraphrasing Claerbout), WaveLab and Reproducible Research, 1995
  20. mybinder.org shareable, interactive, reproducible environments from your public git repository

  21. http://bit.ly/black-holes-woop Black holes! LIGO, Sept 14, 2015

  22. We have access to all the same tools

  23. Extensible?

  24. inversion gravity magnetics density mag. susceptibility invert

  25. inversion gravity magnetics invert

  26. inversion gravity magnetics invert T. Astic

  27. an iterative process

  28. testing & refactoring. confidence: mathematical properties, analytic solutions, code comparisons, ?
  29. an open, modular ecosystem: remix for diverse use-cases (research & education)
  30. Impacts in research and education

  31. http://em.geosci.xyz/apps.html interactive geophysics

  32. http://em.geosci.xyz/apps.html interactive geophysics

  33. http://em.geosci.xyz/apps.html

  34. GeoSci.xyz https://geosci.xyz EOSC 350 26 locations worldwide

  35. Berkeley Data Science Education, Fall 2018. Data 100: ~800 students; Data 8: ~1,300 students
  36. https://blog.jupyter.org/teaching-and-learning-with-jupyter-c1d965f7b93a Teaching and Learning with Jupyter

  37. Earth Data Science: Open Education
      • An Introduction to Earth and Environmental Data Science (https://earth-env-data-science.github.io/intro): R. Abernathey, Columbia/Lamont Oceanography, Pangeo co-Lead
      • PyEarth: A Python Introduction to Earth Science (JupyterBook): N. Swanson-Hysell, Berkeley EPS
  38. enabling new science
      • Integrating multiple geophysical data types for richer geological models
      • Physics + machine learning in electromagnetics
  39. large-scale magnetic vector inversions. Satellite data: 2-arc minute EMAG v2. Divide and conquer: Global problem, Tiled forward. OcTree Meshes! D. Fournier, J. Capriotti, B. Sullivan
  40. joint inversion of QUEST data: Susceptibility model, Density model. Jae Deok Kim, Jiajia Sun
  41. electromagnetics with steel cased wells

  42. modeling electromagnetics on cylindrical meshes (Heagy & Oldenburg, 2018)
      • Finite volume discretization
        ◦ cylindrically symmetric
        ◦ 3D cylindrical meshes
      • DC, FDEM, TDEM
  43. modeling electromagnetics on cylindrical meshes: validating the physics
      • Kaufman (1990): Charges, currents, electric fields
  44. time-domain EM response

  45. physics + machine learning: ML to estimate a source term for the correction. True solution; Error. At DC: can replace well with solid cylinder
  46. scaling computation & communities

  47. Interactive exploration at scale?

  48. Harnessing the power of cloud computing to study the whole Earth interactively: Interactivity; Distributed computing; Data models / numerics
  49. Jupyter meets the Earth: an NSF grant (2M / 3Y)!
      Fernando Pérez, Joe Hamman, Laurel Larsen, Kevin Paul, Lindsey Heagy, Chris Holdgraf, Yuvi Panda
      Research use-cases
      • Climate data analysis
      • Hydrology
      • Geophysics
      Tech developments
      • Data discovery
      • Interactivity
      • Cloud/HPC infrastructure
      For more: http://bit.ly/jupytearth
  50. Next-generation LIDAR satellite (https://icesat-2.gsfc.nasa.gov). A. Arendt, J. Scheik, L. Heagy, M. Siegfried, F. Pérez
  51. Open, collaborative geoscience
      • Data volumes & computational needs:
      • Bridging across technical & disciplinary lines:
        ◦ Interoperability of tools
        ◦ Resources for communicating ideas
        ◦ Interplay between research & education
      • Challenges are bigger than an individual: need an open ecosystem of tools and collaborative communities.
      • Problems of major societal impact: close the scientist / public / policymaker gap.
  52. Thank you! @lheagy lheagy@berkeley.edu @lindsey_jh Slides: http://bit.ly/csm-heagy-2019
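
Slides 7 and 11 describe the forward problem (a simulation predicts data from a model) and the inverse problem (an optimization balances data misfit against regularization). The following is a minimal sketch of that idea, not SimPEG's API and not the method shown in the talk. It assumes a linear forward operator G and a simple Tikhonov (smallest-model) regularization; the names G, m_true, beta, forward and invert are hypothetical and chosen only for illustration.

```python
import numpy as np


def forward(G, m):
    """Forward problem: predict data from a model (a linear operator G is assumed)."""
    return G @ m


def invert(G, d_obs, beta):
    """Inverse problem: minimize ||G m - d_obs||^2 + beta * ||m||^2.

    For a linear G the minimizer solves the normal equations
    (G^T G + beta I) m = G^T d_obs.
    """
    n_params = G.shape[1]
    lhs = G.T @ G + beta * np.eye(n_params)
    rhs = G.T @ d_obs
    return np.linalg.solve(lhs, rhs)


# Synthetic example: 20 noise-free data, 5 model parameters (all values hypothetical).
rng = np.random.default_rng(0)
G = rng.normal(size=(20, 5))
m_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
d_obs = forward(G, m_true)

m_weak = invert(G, d_obs, beta=1e-8)    # weak regularization: fits the data closely
m_strong = invert(G, d_obs, beta=10.0)  # strong regularization: shrinks the model
```

The trade-off parameter beta controls how strongly the regularization pulls the recovered model away from the one that best fits the data. Practical geophysical inversions are typically nonlinear and use more elaborate regularization, so this sketch only illustrates the structure of the objective.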