Earth Science Data in the 2020s: Rising Tide or Catastrophic Flood?

Presentation given at the National Academy of Sciences for a symposium honoring Michael Freilich's career.

https://web.cvent.com/event/95da1782-9572-4bcd-ba66-63b9ed60db8d/summary

Ryan Abernathey

January 21, 2020

Transcript

  1. Earth Science Data in the 2020s: Rising Tide or Catastrophic Flood? Ryan Abernathey, Columbia / LDEO
  2. Credit: NASA SVS / PODAAC

  3. Credit: NASA SVS / PODAAC

  4. https://earthdata.nasa.gov/eosdis/cloud-evolution [figure labels: SWOT, NISAR]

  5. What Science do we want to do with Satellite Data?
  6. What Science do we want to do with Satellite Data? Take the mean!
  7. What Science do we want to do with Satellite Data? Analyze spatiotemporal variability.
  8. What Science do we want to do with Satellite Data? Machine learning! (Credit: Berkeley Lab)
  9. What Science do we want to do with Satellite Data? Data Assimilation
    Lahoz, W. A., & Schneider, P. (2014). Data assimilation: making sense of Earth Observation. Frontiers in Environmental Science, 2, 16. https://doi.org/10.3389/fenvs.2014.00016
  10. How?

  11. Download

  12. Download: MB

  13. Download: GB

  14. Download: TB

  15. Download: PB

  16. Never Mind… How? Let’s “bring the compute to the data”!
  17. Use a “Platform” [figure label: Database]
  18. Use a “Platform” [figure label: Database]
  19. Use a “Platform” [figure label: Database]
  20. The Trouble with “Platforms”
    • Scientists’ creativity nearly always exceeds pre-baked capabilities.
    • What if you want to access data that isn’t included?
    • Who pays? Are platform priorities aligned with the scientific community?
  21. Federated Cloud Architecture
  22. Federated Cloud Architecture: Analysis Ready Data, Cloud Optimized Formats
  23. Federated Cloud Architecture: Analysis Ready Data, Cloud Optimized Formats, Scalable Parallel Computing Frameworks
  24. What is Pangeo? “A community platform for Big Data geoscience”
    • Open Community
    • Open Source Software
    • Open Source Infrastructure
  25. Pangeo Community: http://pangeo.io
  26. Pangeo Architecture
    • Jupyter for interactive access to remote systems (Cloud / HPC).
    • Xarray provides data structures and an intuitive interface for interacting with datasets.
    • A parallel computing system allows users to deploy clusters of compute nodes for data processing. Dask tells the nodes what to do.
    • Distributed storage: “Analysis Ready Data” stored on globally-available distributed storage.
  27. Pangeo Software Ecosystem [figure labels include: aospy, SciPy] (Credit: Stephan Hoyer, Jake Vanderplas, SciPy 2015)
  28. Grass-Roots Adoption
  29. Future Challenges
    • How do we support / sustain open-source foundational software tools? (No agency or lab “owns” these, but they are critical infrastructure.)
    • Who provides cloud-style computing to the science community?
    • How do we avoid data silos? (I want both NASA + NOAA data in the same place.)
    • How do we train (and retrain) scientists to feel comfortable with new tools and cloud-native workflows?
    Pangeo has the potential to transform Earth-System Science. But it’s not clear how to scale it.
  30. Join the Community! http://pangeo.io
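Slides 11 to 16 argue that the download model breaks down as data volumes climb from megabytes to petabytes. A back-of-the-envelope sketch makes that concrete. It assumes a hypothetical sustained 1 Gbit/s link with no protocol overhead; that bandwidth is an illustrative assumption, not a figure from the talk, and the helper names are hypothetical.

```python
# Assumption: a hypothetical sustained 1 Gbit/s link; real transfer rates vary widely.
LINK_BITS_PER_SECOND = 1e9

# Decimal (SI) sizes matching the MB / GB / TB / PB progression on the slides.
SIZES_BYTES = {
    "MB": 1e6,
    "GB": 1e9,
    "TB": 1e12,
    "PB": 1e15,
}


def download_seconds(n_bytes: float, bits_per_second: float = LINK_BITS_PER_SECOND) -> float:
    """Idealized transfer time: bytes * 8 bits per byte / link rate (no overhead, no retries)."""
    return n_bytes * 8 / bits_per_second


def human(seconds: float) -> str:
    """Format a duration using the largest unit (days, hours, minutes) it reaches."""
    for unit, size in (("days", 86400), ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.3f} seconds"


if __name__ == "__main__":
    for label, n in SIZES_BYTES.items():
        print(f"1 {label}: {human(download_seconds(n))}")
```

Under that assumption, 1 MB arrives in milliseconds and 1 TB in about 2.2 hours, but 1 PB needs roughly 92.6 days of uninterrupted transfer, which is the motivation for slide 16's “bring the compute to the data.”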
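Slides 6 and 7 name two basic workloads (taking a mean, analyzing variability), and slide 26 describes the Pangeo pattern in which Xarray exposes the data while Dask splits the work across compute nodes. The sketch below imitates that chunked map-reduce idea in plain NumPy so it runs without a cluster. It illustrates the pattern only; it is not Dask's implementation, and the synthetic dataset and helper functions are hypothetical.

```python
import numpy as np


def make_synthetic_field(n_time=120, n_lat=18, n_lon=36, seed=0):
    """Hypothetical stand-in for a gridded satellite product with dims (time, lat, lon)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_time)[:, None, None]
    seasonal = np.sin(2 * np.pi * t / 12)  # assumes monthly steps: a 12-step seasonal cycle
    noise = rng.normal(scale=0.5, size=(n_time, n_lat, n_lon))
    return 15.0 + seasonal + noise


def split_time_chunks(data, chunk_size):
    """Split along the time axis, like an array chunked only in time."""
    return [data[i:i + chunk_size] for i in range(0, data.shape[0], chunk_size)]


def chunked_time_mean(chunks):
    """Map: each chunk reports (sum over time, count) independently. Reduce: combine, divide once."""
    partials = [(c.sum(axis=0), c.shape[0]) for c in chunks]
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total / count


def chunked_time_std(chunks, mean):
    """Second pass: per-chunk sums of squared anomalies, combined into a std-dev map (ddof=0)."""
    sq = sum(((c - mean) ** 2).sum(axis=0) for c in chunks)
    count = sum(c.shape[0] for c in chunks)
    return np.sqrt(sq / count)


if __name__ == "__main__":
    data = make_synthetic_field()
    chunks = split_time_chunks(data, chunk_size=25)  # 120 steps -> 25, 25, 25, 25, 20
    mean_map = chunked_time_mean(chunks)
    std_map = chunked_time_std(chunks, mean_map)
    # The chunked results should agree with single-pass NumPy reductions.
    print(np.allclose(mean_map, data.mean(axis=0)), np.allclose(std_map, data.std(axis=0)))
```

In an actual Pangeo workflow the same computation would usually be a single call on a Dask-backed Xarray object, for example `ds.mean(dim="time")` on a dataset `ds` (a hypothetical name here), with Dask scheduling the per-chunk partial sums across cluster nodes. The sketch only shows why that works: each chunk's partial sum and count can be computed independently, wherever that chunk is stored.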