Catalog-driven, Reproducible Workflows for Ocean Science
Rich Signell, USGS, Woods Hole, MA, USA
Filipe Fernandes, Centro Universitário Monte Serrat, Santos, Brazil
Woods Hole Coastal & Marine Science Center Meeting
2015-10-06
Slide 2
The Fourth Paradigm: eScience
1. A thousand years ago: science was empirical
describing natural phenomena
2. Last few hundred years: + theoretical branch
using models, generalizations
3. Last few decades: + computational branch
simulating complex phenomena
4. Today: + data exploration branch (eScience)
• Data captured by instruments or simulations
• Processed by software
• Information/knowledge stored in computer
• Scientist analyzes database / files using data management and statistics
Ref: Slide from Turing Award winner Jim Gray’s presentation to the NRC, Jan 11, 2007 (his last presentation; he was lost at sea Jan 28, 2007)
Slide 3
The 4th Network Layer: Data
• “We need an end-to-end, layer-by-layer, designed information technology … that are composed of no more than a stack of protocols”
• “We need open standards… and above all, we need to teach scientists to work in this new layer of data”
From the essay: “I have seen the Paradigm Shift, and It Is Us”, by John Wilbanks, in the book “The Fourth Paradigm”
Layer stack (diagram, top to bottom): Data, Web, TCP/IP, Ethernet
Slide 4
US Integrated Ocean Observing System (IOOS®)
IOOS® Plan defines:
• Global Component
• Coastal Component
17 Federal Agencies
11 Regional Associations
Slide 5
IOOS Core Principles
• Adopt open standards & practices
• Avoid customer-specific stovepipes
• Standardized access services implemented at data providers
Diagram: Data Provider (Observations, Models) → Web access service → Customer
Slide 6
Ocean grids are often not regularly spaced!
Stretched, surface- and terrain-following vertical coordinates
Curvilinear orthogonal horizontal coordinates
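To see why terrain-following grids are irregular, consider the simplest such coordinate, the CF `ocean_sigma_coordinate`, defined by z = eta + sigma * (depth + eta), with sigma running from 0 at the free surface to -1 at the sea floor. The sketch below is a minimal illustration, not a model's actual grid code: the helper names and sample depths are hypothetical, and models such as ROMS use the more elaborate stretched s-coordinate forms.

```python
def uniform_sigma(n_levels):
    """Return n_levels evenly spaced sigma values from -1 (bottom) to 0 (surface)."""
    return [-1.0 + k / (n_levels - 1) for k in range(n_levels)]


def sigma_to_z(sigma_levels, depth, eta=0.0):
    """Convert CF ocean_sigma_coordinate levels to heights z (m, positive up).

    depth: water depth below the reference level (m, positive down)
    eta: sea-surface elevation (m)
    """
    # CF formula: z = eta + sigma * (depth + eta)
    return [eta + s * (depth + eta) for s in sigma_levels]


sigma = uniform_sigma(5)
# Hypothetical example: the same five levels over a shallow and a deep column.
for h in (10.0, 100.0):
    print(h, sigma_to_z(sigma, depth=h, eta=0.5))
```

Because each level is a fixed fraction of the local water depth, the vertical spacing changes from one horizontal grid point to the next.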
Slide 7
Unstructured (e.g. triangular) grid
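Unstructured grids like this are described by the UGRID conventions: arrays of node coordinates plus a face-node connectivity table listing, for each triangle, the indices of its corner nodes. The sketch below mirrors that layout with plain Python lists. The two-triangle mesh and the `face_centroids` helper are hypothetical, and the `mesh_topology` dict only stands in for the attributes that a netCDF file would carry on its mesh variable.

```python
# Minimal UGRID-style description of a 2-D triangular mesh (made-up example:
# a unit square split into two triangles).
mesh_topology = {
    "cf_role": "mesh_topology",
    "topology_dimension": 2,
    "node_coordinates": "mesh_node_x mesh_node_y",
    "face_node_connectivity": "mesh_face_nodes",
}
mesh_node_x = [0.0, 1.0, 1.0, 0.0]
mesh_node_y = [0.0, 0.0, 1.0, 1.0]
# Each row holds the zero-based node indices of one triangular face.
mesh_face_nodes = [
    [0, 1, 2],
    [0, 2, 3],
]


def face_centroids(node_x, node_y, face_nodes):
    """Return the (x, y) centroid of each face by averaging its corner nodes."""
    centroids = []
    for nodes in face_nodes:
        cx = sum(node_x[n] for n in nodes) / len(nodes)
        cy = sum(node_y[n] for n in nodes) / len(nodes)
        centroids.append((cx, cy))
    return centroids


print(face_centroids(mesh_node_x, mesh_node_y, mesh_face_nodes))
```

Some models store one-based connectivity instead. UGRID signals this with a `start_index` attribute on the connectivity variable.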
Slide 8
NetCDF Climate and Forecast (CF) Conventions + UGRID + SGRID
Groups using CF:
• GO-ESSP: Global Organization for Earth System Science Portal
• IOOS: Integrated Ocean Observing System
• ESMF: Earth System Modeling Framework
• OGC: Open Geospatial Consortium (GALEON: WCS profile)
Slide 9
Time Series, Trajectories
Meteorology and Wave Buoy in the Gulf of Maine. Image courtesy of NOAA.
Ocean Glider. Photo by Dave Fratantoni, Woods Hole Oceanographic Institution
Slide 10
OGC Sensor Observation Service (SOS)
• Provides standard access to sensor data
– GetCapabilities: provides the means to access SOS service metadata
– DescribeSensor: retrieves detailed information about the sensors and processes generating those measurements
– GetObservation: provides access to sensor observations and measurement data via a spatio-temporal query that can be filtered by phenomena
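Each of these operations maps onto a plain HTTP request. The following minimal sketch builds the three requests as key-value-pair (KVP) URLs. It assumes a hypothetical endpoint and a server that accepts SOS 1.0.0 KVP requests. The station URN and time window are illustrative only. Accepted values such as response formats differ between servers, so check GetCapabilities first.

```python
from urllib.parse import urlencode

# Hypothetical SOS endpoint; replace with a real server URL.
SOS_ENDPOINT = "https://example.org/sos"


def sos_url(request, **params):
    """Build an SOS 1.0.0 key-value-pair (KVP) request URL."""
    query = {"service": "SOS", "request": request}
    if request != "GetCapabilities":
        # GetCapabilities negotiates the version; other requests state it.
        query["version"] = "1.0.0"
    query.update(params)
    return SOS_ENDPOINT + "?" + urlencode(query)


STATION = "urn:ioos:station:example:buoy1"  # hypothetical station identifier

capabilities = sos_url("GetCapabilities")
sensor = sos_url(
    "DescribeSensor",
    procedure=STATION,
    outputFormat='text/xml;subtype="sensorML/1.0.1"',
)
observation = sos_url(
    "GetObservation",
    offering=STATION,
    observedProperty="sea_water_temperature",  # phenomenon filter
    responseFormat="text/csv",
    eventTime="2015-08-14T00:00:00Z/2015-08-15T12:00:00Z",  # time window
)
for url in (capabilities, sensor, observation):
    print(url)
```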
Slide 11
IOOS Recommended Web Services and Data Encodings
Data Type | Web Service | Encoding
In-situ data (buoys, piers, towed sensors) | OGC Sensor Observation Service (SOS) | XML or CSV
Gridded data (model outputs, satellite) | OPeNDAP with Climate and Forecast Conventions | Binary DAP using Climate and Forecast (CF) conventions
Images of data | OGC Web Map Service (WMS) | GeoTIFF, PNG, etc., possibly with standardized styles
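With OPeNDAP, gridded data are subset on the server. The client appends a DAP2 constraint expression of the form `var[start:stride:stop]` to the dataset URL, and only that hyperslab is sent back. The minimal sketch below builds such a constraint. It assumes a hypothetical THREDDS dataset URL and a variable `temp` with dimensions (time, y, x). The `dap_constraint` helper is illustrative.

```python
def dap_constraint(variable, *slices):
    """Build a DAP2 hyperslab projection such as 'temp[0:1:0][0:2:100]'.

    Each slice is (start, stride, stop); DAP2 stop indices are inclusive.
    """
    return variable + "".join(f"[{a}:{s}:{b}]" for a, s, b in slices)


# Hypothetical THREDDS OPeNDAP dataset URL.
BASE = "https://example.org/thredds/dodsC/model/forecast.nc"

# First time step, every second point in y and x of a (time, y, x) variable.
projection = dap_constraint("temp", (0, 1, 0), (0, 2, 100), (0, 2, 200))
# ".ascii" asks for a text response; ".dods" would return binary DAP.
url = f"{BASE}.ascii?{projection}"
print(url)
```

Some HTTP clients require the square brackets to be percent-encoded. Libraries such as NetCDF4-Python issue equivalent requests behind the scenes when a remote variable is sliced, so users rarely write these by hand.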
Slide 12
IOOS Data Infrastructure Diagram
Diagram (components shown):
• Data sources: nonstandard model output data files (ROMS, ADCIRC, HYCOM, SELFE, NCOM, FVCOM) and observed data (buoy, gauge, ADCP, glider) in nonstandard data files
• Standardization: NcML virtual datasets (CF-1.6, UGRID-0.9) and the Common Data Model
• Feature types: Grid, Ugrid, TimeSeries, Profile, Trajectory, TimeSeriesProfile
• Data servers: THREDDS Data Server, ERDDAP
• Web services: OPeNDAP+CF, WCS, WMS, NetCDF Subset, SOS, ncISO
• Catalog services: Geoportal Server, GeoNetwork, GI-CAT, CKAN-pyCSW
• Clients (via library or broker: NetCDF-Java, NetCDF4-Python): Matlab, Python, Panoply, IDV, ArcGIS, web portals
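The NcML step lets a server present a nonstandard file as a CF-compliant virtual dataset without rewriting the file. An NcML document points at the original file and overlays or corrects its metadata. The sketch below uses the standard library to generate such a wrapper. It is minimal and assumes a hypothetical file name, `model_output.nc`, and that only attributes need adding. NcML can also rename variables and aggregate files, which this sketch does not cover.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
ET.register_namespace("", NCML_NS)  # write NcML elements without a prefix


def ncml_wrapper(location, global_attrs, variable_attrs):
    """Return NcML text that overlays attributes on an existing netCDF file.

    global_attrs: {name: value} added at the dataset level.
    variable_attrs: {variable_name: {name: value}} added to each variable.
    """
    root = ET.Element(f"{{{NCML_NS}}}netcdf", location=location)
    for name, value in global_attrs.items():
        ET.SubElement(root, f"{{{NCML_NS}}}attribute", name=name, value=value)
    for var_name, attrs in variable_attrs.items():
        var = ET.SubElement(root, f"{{{NCML_NS}}}variable", name=var_name)
        for name, value in attrs.items():
            ET.SubElement(var, f"{{{NCML_NS}}}attribute", name=name, value=value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical nonstandard file whose "temp" variable lacks CF metadata.
ncml = ncml_wrapper(
    "model_output.nc",
    {"Conventions": "CF-1.6"},
    {"temp": {"standard_name": "sea_water_potential_temperature",
              "units": "degree_Celsius"}},
)
print(ncml)
```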
Slide 13
WMS-driven Model Viewing Portal
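A WMS portal mostly issues GetMap requests. The server renders a model field for a given bounding box and time, and the portal displays the resulting image. The minimal sketch below builds such a request. It assumes a hypothetical THREDDS WMS endpoint and a layer named `temp`; the server's GetCapabilities response lists the real layer names and styles. The bounding box roughly covers Massachusetts Bay and is illustrative only.

```python
from urllib.parse import urlencode

# Hypothetical WMS endpoint; real layers come from GetCapabilities.
WMS_ENDPOINT = "https://example.org/thredds/wms/model/forecast.nc"


def getmap_url(layer, bbox, width=512, height=512, time=None, style=""):
    """Build a WMS 1.3.0 GetMap URL for a lon/lat bounding box.

    bbox is (min_lon, min_lat, max_lon, max_lat). CRS:84 keeps lon/lat
    axis order, whereas EPSG:4326 in WMS 1.3.0 expects lat/lon order.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": style,
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "TRANSPARENT": "TRUE",
    }
    if time is not None:
        params["TIME"] = time  # ISO 8601 time for time-varying layers
    return WMS_ENDPOINT + "?" + urlencode(params)


# 11:00 UTC on 2015-08-15 is 7:00 am EDT.
print(getmap_url("temp", (-71.0, 42.0, -70.0, 42.6), time="2015-08-15T11:00:00Z"))
```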
Slide 14
Interoperable Access in Matlab (nctoolbox)
Slide 15
Interoperable Access in Python (Iris)
Slide 16
Catalog Search
Slide 17
Catalog Search
Catalog services can be federated via OGC CSW (Catalog Service for the Web)
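Every CSW catalog answers the same standard requests, so one query can be sent unchanged to several catalogs. This client-side fan-out is the simplest form of federation. The minimal sketch below builds CSW 2.0.2 GetRecords requests with a CQL full-text filter. It assumes hypothetical endpoint URLs and servers that accept GetRecords via KVP with CQL text constraints; support for this varies between catalog products. In Python, OWSLib's `CatalogueServiceWeb` class can build and parse these requests instead.

```python
from urllib.parse import urlencode

# Hypothetical CSW endpoints (e.g. pycsw, GeoNetwork, or Geoportal Server).
CATALOGS = [
    "https://catalog-a.example.org/csw",
    "https://catalog-b.example.org/csw",
]


def getrecords_url(endpoint, search_text, max_records=10):
    """Build a CSW 2.0.2 GetRecords KVP URL with a CQL full-text filter."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "full",
        "resultType": "results",
        "maxRecords": max_records,
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        # Match the text anywhere in each record's metadata.
        "constraint": f"csw:AnyText LIKE '%{search_text}%'",
    }
    return endpoint + "?" + urlencode(params)


# The same standard request is sent to every catalog in the list.
urls = [getrecords_url(c, "sea_water_temperature") for c in CATALOGS]
for u in urls:
    print(u)
```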
Slide 18
IOOS System Test
Slide 19
2015 Boston Light Swim, Aug 15, 7:00am
since 1907, 8 miles, no wet suit
How cold will the water be?
Slide 20
NECOFS Massbay Forecast
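To answer "how cold will the water be?" from an unstructured forecast such as this one, you first have to locate the model points near the swim route. The minimal brute-force sketch below finds the nearest mesh node by great-circle (haversine) distance. It assumes the node longitudes and latitudes have already been read into lists. The node coordinates are made up and the Boston Light position is approximate, and neither is taken from the model. For a full-size mesh, a spatial index such as a k-d tree would replace the linear scan.

```python
import math


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points (degrees)."""
    r = 6371.0  # mean Earth radius, km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_node(node_lon, node_lat, lon, lat):
    """Return (index, distance_km) of the mesh node closest to (lon, lat)."""
    distances = [haversine_km(x, y, lon, lat) for x, y in zip(node_lon, node_lat)]
    i = min(range(len(distances)), key=distances.__getitem__)
    return i, distances[i]


# Made-up node coordinates; real ones would come from the model's lon/lat variables.
node_lon = [-70.95, -70.89, -70.80]
node_lat = [42.30, 42.33, 42.40]
# Approximate position of Boston Light (illustrative).
idx, dist = nearest_node(node_lon, node_lat, -70.890, 42.328)
print(idx, round(dist, 3))
```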
Slide 21
Reproducible IPython/Jupyter Notebook
Slide 22
No content
Slide 23
No content
Slide 24
No content
Slide 25
Final Result
Slide 26
No content
Slide 27
No content
Slide 28
Reproducible in Minutes for Free
Slide 29
163 Python packages on IOOS channel!
Slide 30
Benefits of Standards-Based,
Catalog-Driven, Reproducible Workflows
• Find the real problems
– Easy problems that can be fixed in minutes to days
– Harder problems to guide future work
• Fixes for specific workflows benefit everyone
• Build success stories
• Create reproducible workflows that others can learn from, expand on, or transform
• Standardized workflows help develop the 4th network layer for data