
Earth Science Data in the 2020s: Rising Tide or Catastrophic Flood?

Presentation given at the National Academy of Sciences for a symposium honoring Michael Freilich's career.

https://web.cvent.com/event/95da1782-9572-4bcd-ba66-63b9ed60db8d/summary

Ryan Abernathey

January 21, 2020

Transcript

  1. Earth Science Data in the 2020s: Rising Tide or Catastrophic Flood? Ryan Abernathey, Columbia / LDEO
  2. What Science do we want to do with Satellite Data?
  3. What Science do we want to do with Satellite Data? Take the mean!
  4. What Science do we want to do with Satellite Data? Analyze spatiotemporal variability
  5. What Science do we want to do with Satellite Data? Machine learning! Credit: Berkeley Lab
  6. What Science do we want to do with Satellite Data? Data Assimilation. Lahoz, W. A., & Schneider, P. (2014). Data assimilation: making sense of Earth Observation. Frontiers in Environmental Science, 2, 16. https://doi.org/10.3389/fenvs.2014.00016
  7. Never Mind … How? Let’s “bring the compute to the data”!
  8-10. Use a “Platform”: Database
  11. The Trouble with “Platforms”
    • Scientists’ creativity nearly always exceeds pre-baked capabilities.
    • What if you want to access data that isn’t included?
    • Who pays? Are platform priorities aligned with scientific community?
  12. Federated Cloud Architecture
  13. Federated Cloud Architecture
    • Analysis Ready Data
    • Cloud Optimized Formats
  14. Federated Cloud Architecture
    • Analysis Ready Data
    • Cloud Optimized Formats
    • Scalable Parallel Computing Frameworks
  15. What is Pangeo? “A community platform for Big Data geoscience”
    • Open Community
    • Open Source Software
    • Open Source Infrastructure
  16. Pangeo Community: http://pangeo.io
  17. Pangeo Architecture
    • Jupyter for interactive access to remote systems.
    • Cloud / HPC.
    • Xarray provides data structures and an intuitive interface for interacting with datasets.
    • Parallel computing system allows users to deploy clusters of compute nodes for data processing. Dask tells the nodes what to do.
    • Distributed storage: “Analysis Ready Data” stored on globally-available distributed storage.
  18. Pangeo Software Ecosystem: aospy, SciPy. Credit: Stephan Hoyer, Jake Vanderplas (SciPy 2015)
  19. Grass-Roots Adoption
  20. Future Challenges
    • How do we support / sustain open-source foundational software tools? (No agency or lab “owns” these, but they are critical infrastructure.)
    • Who provides cloud-style computing to the science community?
    • How do we avoid data silos? (I want both NASA + NOAA data in the same place.)
    • How do we train (and retrain) scientists to feel comfortable with new tools and cloud-native workflows?
    Pangeo has the potential to transform Earth-System Science. But it’s not clear how to scale it.
  21. Join the Community! http://pangeo.io
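
Slides 3 and 17 mention taking the mean of satellite data and a parallel computing system in which Dask tells compute nodes what to do. As a rough illustration of that chunk-then-combine idea only, here is a minimal Python sketch. It uses the standard library as a stand-in and is not Dask or Xarray code; the function names, chunk size, worker count, and sample values are all hypothetical and do not come from the talk.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, chunk_size):
    """Split a flat sequence into consecutive chunks of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def partial_sum_and_count(chunk):
    """Work done independently on each chunk: return (sum, number of samples)."""
    return sum(chunk), len(chunk)


def chunked_mean(values, chunk_size=4, max_workers=2):
    """Compute the mean by reducing each chunk separately, then combining."""
    chunks = split_into_chunks(values, chunk_size)
    # Each worker handles one chunk at a time (a stand-in for compute nodes).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_sum_and_count, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    if count == 0:
        raise ValueError("cannot take the mean of an empty dataset")
    return total / count


# Hypothetical values standing in for one satellite variable at ten time steps.
samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(chunked_mean(samples, chunk_size=3))
```

Only the per-chunk (sum, count) pairs are combined at the end, so no single worker needs the whole dataset in memory. That is the property that makes this style of reduction attractive when the data lives in chunked, distributed storage.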