Slide 1

Slide 1 text

OceanCloud
Transforming oceanography with a new approach to data and computing
Ryan Abernathey

Slide 2

Slide 2 text

Ryan Abernathey
• Physical Oceanographer
• Ph.D. from MIT, 2012
• Associate Prof. at Columbia / LDEO
• https://ocean-transport.github.io/
• Co-founder of Pangeo
• Open Source Developer
• Open Science Advocate

Slide 3

Slide 3 text

This Talk
Problem: Ocean data are huge and complex! 🤯 This limits scientific inquiry and restricts participation. 😔
Solution: OceanCloud, a new approach to infrastructure based on cloud computing, open data, and open-source software. 😎

Slide 4

Slide 4 text

Credit: NASA's Goddard Space Flight Center

Slide 6

Slide 6 text

Credit: NASA's Goddard Space Flight Center
https://earthdata.nasa.gov/eosdis/cloud-evolution
SWOT NISAR

Slide 7

Slide 7 text

Credit: NASA's Goddard Space Flight Center

Slide 9

Slide 9 text

“Brb, let me just go download the data to my laptop…”

Slide 10

Slide 10 text

PB
“Brb, let me just go download the data to my laptop…”

Slide 11

Slide 11 text

Privileged Institutions Create “Data Fortresses*”
Image credit: Moahim, CC BY-SA 4.0, via Wikimedia Commons
*Coined by Chelle Gentemann

Slide 13

Slide 13 text

Privileged Institutions Create “Data Fortresses*”
Data
Image credit: Moahim, CC BY-SA 4.0, via Wikimedia Commons
*Coined by Chelle Gentemann

Slide 15

Slide 15 text

Privileged Institutions Create “Data Fortresses*”
Data
❌ Results not reproducible outside fortress
❌ Barrier to collaboration
❌ Inefficient / duplicative
❌ Can’t scale to future data needs
❌ Limits inclusion and knowledge transfer
Image credit: Moahim, CC BY-SA 4.0, via Wikimedia Commons
*Coined by Chelle Gentemann

Slide 16

Slide 16 text

A Community Platform for Big-Data Geoscience
• Grass-roots collaboration between scientists and software developers around open-source tools for solving real problems
• Foundational support from NSF EarthCube
• International partners, industry connections
http://pangeo.io

Slide 18

Slide 18 text

Pangeo Builds with Open Development
Scientific users / use cases
• Define science questions
• Use software / infrastructure
• Identify bugs / bottlenecks
• Provide feedback to developers
Open-source software libraries
• Contribute widely to the open-source scientific Python ecosystem
• Maintain / extend existing libraries, start new ones reluctantly
• Solve integration challenges
HPC and cloud infrastructure
• Deploy interactive analysis environments
• Curate analysis-ready datasets
• Platform agnostic
Agile development 👩💻

Slide 19

Slide 19 text

Three Pillars of Cloud Data Environments
“Analysis Ready Data”: Cleaned, curated open-access datasets available via a high-performance, globally available storage system.
“Elastic Scaling”: Automatically provision many computers on demand to accelerate big data processing.
“Data Proximate Computing”: Bring analysis to the data. Web-based access provides “on-click to compute” access.
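The three pillars can be illustrated with a toy sketch. This is not Pangeo's actual software stack: it uses only the Python standard library, and every name in it (`open_chunks`, `partial_sum`, `chunked_mean`) is hypothetical. Lazily described chunks stand in for analysis-ready data, a worker pool whose size can be changed stands in for elastic scaling, and reducing each chunk to a small summary before combining stands in for data-proximate computing. Threads here only mimic the many machines a cloud scheduler would provision.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Tuple


def open_chunks(n_values: int, chunk_size: int) -> Iterator[range]:
    """Hypothetical stand-in for a chunked, analysis-ready dataset.

    Each chunk is described lazily as a range object, so no values are
    read until a worker actually processes the chunk.
    """
    for start in range(0, n_values, chunk_size):
        yield range(start, min(start + chunk_size, n_values))


def partial_sum(chunk: range) -> Tuple[int, int]:
    """Reduce one chunk to a small (sum, count) pair.

    Only this summary travels back to the caller, not the chunk itself,
    which is the idea behind bringing the analysis to the data.
    """
    return sum(chunk), len(chunk)


def chunked_mean(n_values: int, chunk_size: int, workers: int) -> float:
    """Map partial_sum over all chunks with a worker pool, then combine.

    Changing `workers` changes how many chunks are in flight at once
    without changing the result.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials: List[Tuple[int, int]] = list(
            pool.map(partial_sum, open_chunks(n_values, chunk_size))
        )
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


if __name__ == "__main__":
    # Mean of the integers 0 .. 999_999, computed chunk by chunk.
    print(chunked_mean(n_values=1_000_000, chunk_size=100_000, workers=4))
```

In a real deployment the chunks would be objects in cloud storage and the workers would be separate machines, but the map-then-combine shape of the computation is the same.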

Slide 21

Slide 21 text

Ocean-Cloud Will Be a “Data Watering Hole*”
*Coined by Fernando Perez

Slide 22

Slide 22 text

Ocean-Cloud Will Be a “Data Watering Hole*”
*Coined by Fernando Perez

Slide 23

Slide 23 text

Ocean-Cloud Will Be a “Data Watering Hole*”
👩💻👨💻👩💻 Group A: Air-Sea Interaction
👩💻👨💻👩💻 Group B: Seasonal Forecasting
Research, Education, Industry
*Coined by Fernando Perez

Slide 24

Slide 24 text

Ocean-Cloud Will Be a “Data Watering Hole*”
👩💻👨💻👩💻 Group A: Air-Sea Interaction
👩💻👨💻👩💻 Group B: Seasonal Forecasting
Research, Education, Industry
✅ Faster science, more discoveries
✅ Inherently reproducible
✅ Allows seamless global collaboration
✅ Unleashes creativity
✅ Cost effective
✅ Accessible to all
✅ Connects with industry
*Coined by Fernando Perez

Slide 25

Slide 25 text

How Can We Achieve This?
• Many agencies (e.g. NASA, NOAA) are already moving data distribution to the cloud…
• …but the missing link is accessible cloud computing environments for all. Pangeo and its partners can help with this.
• We must avoid building new fortresses in the cloud and ensure interoperability from the start!
• A National Oceanographic Partnership Program (NOPP) could provide funding, help facilitate inter-agency collaboration, and support user adoption.

Slide 26

Slide 26 text

Learn More
http://pangeo.io
https://github.com/pangeo-data/
https://medium.com/pangeo
@pangeo_data