Ocean Cloud (OceanShot NAS Plenary Talk)

Plenary talk presented at the February 3-4, 2021 launch meeting in response to the U.S. National Committee for the Ocean Decade call for disruptive advances in ocean science.

https://www.nationalacademies.org/our-work/us-national-committee-on-ocean-science-for-sustainable-development-2021-2030/ocean-shot-directory

Ryan Abernathey

June 21, 2021

Transcript

  1. Ocean Cloud
    Transforming oceanography with a new
    approach to data and computing
    Ryan Abernathey

  2. Physical Oceanographer
    Ph.D. from MIT, 2012
    Associate Prof. at Columbia / LDEO
    https://ocean-transport.github.io/
    Co-founder of Pangeo
    Open Source Developer
    Open Science Advocate

  3. This Talk
    Problem:
    Ocean data are huge and complex! 🤯
    This limits scientific inquiry and restricts participation. 😔
    Solution:
    OceanCloud: a new approach to infrastructure based on
    cloud computing, open data, and open-source software. 😎

  4. Credit: NASA's Goddard Space Flight Center

  6. Credit: NASA's Goddard Space Flight Center
    https://earthdata.nasa.gov/eosdis/cloud-evolution
    SWOT
    NISAR

  7. Credit: NASA's Goddard Space Flight Center

  9. “Brb, let me just go download the data to my laptop…”

  10. PB (petabytes)
    “Brb, let me just go download the data to my laptop…”
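
The point of this slide can be made concrete with a little arithmetic. The sketch below estimates an idealized transfer time for one petabyte; the dataset size and the three link speeds are illustrative assumptions, not figures from the talk, and the helper name `download_days` is hypothetical.

```python
# Back-of-the-envelope estimate: how long would a petabyte-scale download take?
# The dataset size and link speeds are illustrative assumptions,
# not figures taken from the talk.

SECONDS_PER_DAY = 86_400


def download_days(size_bytes: float, link_bits_per_s: float) -> float:
    """Ideal transfer time in days, ignoring protocol overhead and disk limits."""
    return size_bytes * 8 / link_bits_per_s / SECONDS_PER_DAY


ONE_PETABYTE = 1e15  # bytes (decimal petabyte)

# Hypothetical sustained link speeds, in bits per second.
links = {
    "home broadband (100 Mbit/s)": 100e6,
    "campus network (1 Gbit/s)": 1e9,
    "research backbone (10 Gbit/s)": 10e9,
}

for name, speed in links.items():
    print(f"{name}: {download_days(ONE_PETABYTE, speed):,.0f} days")
```

Under these assumptions the three links need roughly 926, 93, and 9 days respectively, before accounting for local storage, which is why downloading to a laptop stops being a workable plan at this scale.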

  11. Privileged Institutions Create “Data Fortresses*”
    *Coined by Chelle Gentemann
    Image credit: Moahim, CC BY-SA 4.0, via Wikimedia Commons

  13. Privileged Institutions Create “Data Fortresses*”
    Data
    *Coined by Chelle Gentemann
    Image credit: Moahim, CC BY-SA 4.0, via Wikimedia Commons

  15. Privileged Institutions Create “Data Fortresses*”
    Data
    ❌ Results not reproducible outside fortress
    ❌ Barrier to collaboration
    ❌ Inefficient / duplicative
    ❌ Can’t scale to future data needs
    ❌ Limits inclusion and knowledge transfer
    *Coined by Chelle Gentemann
    Image credit: Moahim, CC BY-SA 4.0, via Wikimedia Commons

  16. A Community Platform for Big-Data Geoscience
    http://pangeo.io
    • Grass-roots collaboration between scientists and
    software developers around open-source tools
    for solving real problems
    • Foundational support from NSF EarthCube
    • International partners, industry connections

  18. Pangeo Builds with Open Development
    Scientific users / use cases
    • Define science questions
    • Use software / infrastructure
    • Identify bugs / bottlenecks
    • Provide feedback to developers
    Open-source software libraries
    • Contribute widely to the open-source
    scientific Python ecosystem
    • Maintain / extend existing libraries,
    start new ones reluctantly
    • Solve integration challenges
    HPC and cloud infrastructure
    • Deploy interactive analysis environments
    • Curate analysis-ready datasets
    • Platform agnostic
    Agile development 👩💻

  19. Three Pillars of Cloud Data Environments
    “Analysis Ready Data”
    Cleaned, curated open-access datasets available via a
    high-performance, globally available storage system.
    “Elastic Scaling”
    Automatically provision many computers on demand to
    accelerate big data processing.
    “Data Proximate Computing”
    Bring analysis to the data. Web-based access provides
    “one-click to compute” access.
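
The three pillars can be sketched in miniature. The toy below is an assumption-laden illustration using only the Python standard library; it does not reproduce any Pangeo API, and the names `chunk_stats`, `workers_for`, and `parallel_mean` are hypothetical. Synthetic in-memory chunks stand in for an analysis-ready dataset, a simple rule picks the worker count from the workload (a stand-in for elastic scaling), and each worker reduces its chunk to a small partial result so that only summaries move, which is the idea behind data-proximate computing.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for an "analysis-ready" dataset: already cleaned and split into
# independent chunks. In a real cloud deployment each chunk would be an object
# in cloud storage; here the chunks are synthetic in-memory lists (an assumption).
chunks = [[float(i + j) for j in range(1000)] for i in range(8)]


def chunk_stats(chunk):
    """Reduce one chunk to (sum, count) so only small results travel back."""
    return sum(chunk), len(chunk)


def workers_for(n_chunks, max_workers=4):
    """Toy 'elastic scaling' rule: one worker per chunk, up to a cap."""
    return max(1, min(n_chunks, max_workers))


def parallel_mean(chunks):
    """Compute the global mean by combining per-chunk partial results."""
    with ThreadPoolExecutor(max_workers=workers_for(len(chunks))) as pool:
        partials = list(pool.map(chunk_stats, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


print(parallel_mean(chunks))
```

Threads on one machine are only a proxy here. In a cloud setting the same pattern would run on separately provisioned machines located next to the storage, and the worker cap would be a budget decision rather than a fixed constant.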

  21. Ocean-Cloud Will Be a “Data Watering Hole*”
    *Coined by Fernando Perez

  23. Ocean-Cloud Will Be a “Data Watering Hole*”
    👩💻👨💻👩💻 Group A: Air-Sea Interaction
    👩💻👨💻👩💻 Group B: Seasonal Forecasting
    Research
    Education
    Industry
    *Coined by Fernando Perez

  24. Ocean-Cloud Will Be a “Data Watering Hole*”
    👩💻👨💻👩💻 Group A: Air-Sea Interaction
    👩💻👨💻👩💻 Group B: Seasonal Forecasting
    Research
    Education
    Industry
    ✅ Faster science, more discoveries
    ✅ Inherently reproducible
    ✅ Allows seamless global collaboration
    ✅ Unleashes creativity
    ✅ Cost effective
    ✅ Accessible to all
    ✅ Connects with industry
    *Coined by Fernando Perez

  25. How Can We Achieve This?
    • Many agencies (e.g. NASA, NOAA) are already moving data distribution to
    the cloud…
    • …but the missing link is accessible cloud computing environments for all.
    Pangeo and its partners can help with this.
    • We must avoid building new fortresses in the cloud and ensure
    interoperability from the start!
    • A National Oceanographic Partnership Program (NOPP) could provide
    funding, help facilitate inter-agency collaboration, and support user
    adoption.

  26. Learn More
    http://pangeo.io
    https://github.com/pangeo-data/
    https://medium.com/pangeo
    @pangeo_data
