Slide 1

Slide 1 text

Catalog-driven, Reproducible Workflows for Ocean Science
Rich Signell, USGS, Woods Hole, MA, USA
Filipe Fernandes, Centro Universidade Monte Serrat, Santos, Brazil
Woods Hole Coastal & Marine Science Center Meeting, 2015-10-06

Slide 2

Slide 2 text

The Fourth Paradigm: eScience
1. Thousand years ago: science was empirical, describing natural phenomena
2. Last few hundred years: + theoretical branch, using models and generalizations
3. Last few decades: + computational branch, simulating complex phenomena
4. Today: + data exploration branch (eScience)
   • Data captured by instruments or simulations
   • Processed by software
   • Information/knowledge stored in computer
   • Scientist analyzes database / files using data management and statistics
Ref: Slide from Turing Award winner Jim Gray’s presentation to the NRC, Jan 11, 2007 (his last presentation; he was lost at sea Jan 28)

Slide 3

Slide 3 text

The 4th Network Layer: Data
• “We need an end-to-end, layer-by-layer, designed information technology … that are composed of no more than a stack of protocols”
• “We need open standards… and above all, we need to teach scientists to work in this new layer of data”
From the essay “I have seen the Paradigm Shift, and It Is Us”, by John Wilbanks, in the book “The Fourth Paradigm”
[Diagram: network layer stack, top to bottom: Data, Web, TCP/IP, Ethernet]

Slide 4

Slide 4 text

US Integrated Ocean Observing System (IOOS®)
IOOS® Plan defines:
• Global Component
• Coastal Component
   – 17 Federal Agencies
   – 11 Regional Associations

Slide 5

Slide 5 text

IOOS Core Principles
• Adopt open standards & practices
• Avoid customer-specific stovepipes
• Standardized access services implemented at data providers
[Diagram: Customer, Web access service, Data Provider (Observations, Models)]

Slide 6

Slide 6 text

Ocean grids are often not regularly spaced!
• Stretched surface- and terrain-following vertical coordinates
• Curvilinear orthogonal horizontal coordinates
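To illustrate why terrain-following levels are not regularly spaced in depth, here is a minimal sketch using the plain CF `ocean_sigma_coordinate` formula, z = eta + sigma * (depth + eta). This is an assumption made for illustration: the stretched coordinates used by ocean models add a stretching function on top of it, and the function name and sample depths below are hypothetical.

```python
def sigma_to_z(sigma_levels, bottom_depth, surface_elevation=0.0):
    """Convert sigma levels (0 at the surface, -1 at the bottom) to heights z.

    Uses the CF ocean_sigma_coordinate formula:
        z = eta + sigma * (depth + eta)
    with z in metres, positive upward.
    """
    return [
        surface_elevation + s * (bottom_depth + surface_elevation)
        for s in sigma_levels
    ]


# The same five sigma levels applied to a shallow and a deep water column.
sigma_levels = [0.0, -0.25, -0.5, -0.75, -1.0]
shallow = sigma_to_z(sigma_levels, bottom_depth=10.0)
deep = sigma_to_z(sigma_levels, bottom_depth=100.0)

print(shallow)
print(deep)
```

The same level index sits at a different depth in every water column, which is why generic tools need the CF metadata to locate the data vertically.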

Slide 7

Slide 7 text

Unstructured (e.g. triangular) grid
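A triangular grid is typically stored as node coordinates plus a connectivity table listing the node indices of each face. The sketch below assumes that layout (UGRID calls the table `face_node_connectivity`); the tiny two-triangle mesh and the helper name are made up for illustration.

```python
# Hypothetical mesh: a unit square split into two triangles.
node_x = [0.0, 1.0, 1.0, 0.0]
node_y = [0.0, 0.0, 1.0, 1.0]

# Each row lists the (zero-based) node indices of one triangular face.
face_node_connectivity = [
    [0, 1, 2],
    [0, 2, 3],
]


def triangle_area(face, xs, ys):
    """Return the area of one triangle from its three node indices."""
    i, j, k = face
    cross = (xs[j] - xs[i]) * (ys[k] - ys[i]) - (xs[k] - xs[i]) * (ys[j] - ys[i])
    return 0.5 * abs(cross)


areas = [triangle_area(face, node_x, node_y) for face in face_node_connectivity]
print(areas)
```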

Slide 8

Slide 8 text

NetCDF Climate and Forecast (CF) Conventions + UGRID + SGRID
Groups using CF:
• GO-ESSP: Global Organization for Earth System Science Portal
• IOOS: Integrated Ocean Observing System
• ESMF: Earth System Modeling Framework
• OGC: Open Geospatial Consortium (GALEON: WCS profile)
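One practical payoff of CF is that variables can be found by their `standard_name` attribute instead of by model-specific variable names. The sketch below uses a plain dictionary as a stand-in for a dataset's variable attributes; the variable names and the helper function are hypothetical, while the standard names are taken from the CF standard name table.

```python
# Stand-in for the variable attributes of one dataset (hypothetical names).
variables = {
    "temp": {"standard_name": "sea_water_potential_temperature", "units": "degC"},
    "water_temp": {"standard_name": "sea_water_temperature", "units": "degC"},
    "zeta": {"standard_name": "sea_surface_height_above_geoid", "units": "m"},
    "mask": {"long_name": "land/sea mask"},  # no standard_name at all
}


def find_by_standard_name(variables, wanted):
    """Return the sorted names of variables whose standard_name is in `wanted`."""
    return sorted(
        name
        for name, attrs in variables.items()
        if attrs.get("standard_name") in wanted
    )


# Accept either flavor of temperature, whatever the provider called the variable.
temperature_names = {
    "sea_water_temperature",
    "sea_water_potential_temperature",
}
matches = find_by_standard_name(variables, temperature_names)
print(matches)
```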

Slide 9

Slide 9 text

Time Series, Trajectories
Meteorology and Wave Buoy in the Gulf of Maine. Image courtesy of NOAA.
Ocean Glider. Photo by Dave Fratantoni, Woods Hole Oceanographic Institution.

Slide 10

Slide 10 text

OGC Sensor Observation Service (SOS)
• Provides standard access to sensor data
   – GetCapabilities: provides the means to access SOS service metadata
   – DescribeSensor: retrieves detailed information about the sensors and processes generating those measurements
   – GetObservation: provides access to sensor observations and measurement data via a spatio-temporal query that can be filtered by phenomena
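As a rough illustration of a GetObservation query, the sketch below assembles a request URL in the key-value-pair style commonly used with SOS 1.0. The endpoint, offering name, and `text/csv` response format are hypothetical placeholders, and individual servers may expect different parameter values, so treat this as a sketch under those assumptions.

```python
from urllib.parse import urlencode


def build_get_observation_url(endpoint, offering, observed_property,
                              start, end, response_format="text/csv"):
    """Assemble a key-value-pair GetObservation request URL.

    `start` and `end` are ISO 8601 strings; together they form the
    temporal part of the query.
    """
    params = {
        "service": "SOS",
        "version": "1.0.0",
        "request": "GetObservation",
        "offering": offering,
        "observedProperty": observed_property,
        "eventTime": f"{start}/{end}",
        "responseFormat": response_format,
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and offering, for illustration only.
url = build_get_observation_url(
    "https://sos.example.org/sos",
    offering="hypothetical_buoy_01",
    observed_property="sea_water_temperature",
    start="2015-08-14T00:00:00Z",
    end="2015-08-15T12:00:00Z",
)
print(url)
```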

Slide 11

Slide 11 text

IOOS Recommended Web Services and Data Encodings
Data Type | Web Service | Encoding
In-situ data (buoys, piers, towed sensors) | OGC Sensor Observation Service (SOS) | XML or CSV
Gridded data (model outputs, satellite) | OPeNDAP with Climate and Forecast Conventions | Binary DAP using Climate and Forecast (CF) conventions
Images of data | OGC Web Map Service (WMS) | GeoTIFF, PNG etc., possibly with standardized styles

Slide 12

Slide 12 text

IOOS Data Infrastructure Diagram
[Diagram; labels grouped as far as they can be recovered]
Nonstandard Model Output Data Files (via NcML): ROMS, ADCIRC, HYCOM, SELFE, NCOM, FVCOM
Nonstandard Data Files (via NcML): Observed data (buoy, gauge, ADCP, glider)
Standardized (CF-1.6, UGRID-0.9) Virtual Datasets: Grid, Ugrid, TimeSeries, Profile, Trajectory, TimeSeriesProfile
Servers: THREDDS Data Server (Common Data Model), ERDDAP
Web Services: OPeNDAP+CF, WCS, NetCDF Subset, WMS, ncISO, SOS
Catalog Services: Geoportal Server, GeoNetwork, GI-CAT, CKAN-pyCSW
Library or Broker: NetCDF-Java, NetCDF4-Python
Clients: Matlab, Panoply, IDV, ArcGIS, Python
Web Portals

Slide 13

Slide 13 text

WMS-driven Model Viewing Portal

Slide 14

Slide 14 text

Interoperable access in Matlab (nctoolbox)

Slide 15

Slide 15 text

Interoperable Access in Python (Iris)

Slide 16

Slide 16 text

Catalog Search

Slide 17

Slide 17 text

Catalog Search
Catalog services can be federated via OGC CSW (Catalog Service for the Web)
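A catalog query typically combines a text match, a bounding box, and a time window. The toy sketch below applies that kind of filter to an in-memory list instead of a real CSW endpoint, so it only illustrates the logic a catalog service performs; every record, bounding box, date range, and URL in it is a hypothetical placeholder.

```python
from datetime import datetime

# Toy catalog records: bbox is (min_lon, min_lat, max_lon, max_lat).
CATALOG = [
    {"title": "NECOFS Massbay Forecast",
     "bbox": (-71.5, 41.5, -69.5, 43.0),
     "start": datetime(2015, 8, 12), "end": datetime(2015, 8, 18),
     "opendap": "https://example.org/thredds/dodsC/necofs_massbay"},
    {"title": "Hypothetical Gulf of Mexico forecast",
     "bbox": (-98.0, 18.0, -80.0, 31.0),
     "start": datetime(2015, 8, 12), "end": datetime(2015, 8, 18),
     "opendap": "https://example.org/thredds/dodsC/gom"},
    {"title": "Hypothetical Massbay forecast archive",
     "bbox": (-71.5, 41.5, -69.5, 43.0),
     "start": datetime(2014, 8, 12), "end": datetime(2014, 8, 18),
     "opendap": "https://example.org/thredds/dodsC/massbay_2014"},
]


def bboxes_intersect(a, b):
    """True if two (min_lon, min_lat, max_lon, max_lat) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(catalog, keyword, bbox, start, end):
    """Return records matching the keyword, region, and time window."""
    return [
        rec for rec in catalog
        if keyword.lower() in rec["title"].lower()
        and bboxes_intersect(rec["bbox"], bbox)
        and rec["start"] <= end and start <= rec["end"]
    ]


# Forecasts covering a box around Boston Harbor on the morning of the swim.
hits = search(
    CATALOG,
    keyword="forecast",
    bbox=(-71.1, 42.2, -70.7, 42.5),
    start=datetime(2015, 8, 15, 0), end=datetime(2015, 8, 15, 12),
)
titles = [rec["title"] for rec in hits]
print(titles)
```

In a real workflow the same three constraints would be sent to a CSW endpoint, and the returned records would supply the data service URLs to open.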

Slide 18

Slide 18 text

IOOS System Test

Slide 19

Slide 19 text

2015 Boston Light Swim, Aug 15, 7:00am
Since 1907, 8 miles, no wet suit
How cold will the water be?

Slide 20

Slide 20 text

NECOFS Massbay Forecast

Slide 21

Slide 21 text

Reproducible IPython/Jupyter Notebook

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

Final Result

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

Reproducible in Minutes for Free

Slide 29

Slide 29 text

163 Python packages on IOOS channel!

Slide 30

Slide 30 text

Benefits of Standards-Based, Catalog-Driven, Reproducible Workflows
• Find the real problems
   – Easy problems that can be fixed in minutes to days
   – Harder problems to guide future work
• Fixes for specific workflows benefit everyone
• Build success stories
• Create reproducible workflows that others can learn from, expand on, or transform
• Standardized workflows help develop the 4th network layer for data

Slide 31

Slide 31 text

[ rsignell-usgs | ocefpaf ] & github