Catalog-driven workflows using CSW
Rich Signell, USGS, Woods Hole, MA, USA
Filipe Fernandes, SECOORA, Brazil
Kyle Wilcox, Axiom Data Science, Wickford, RI, USA
ESIP Winter Meeting, Washington, DC
2016-01-08
Slide 2
Slide 2 text
The 4th Network Layer: Data
• “We need an end-to-end, layer-by-layer,
designed information technology … that are
composed of no more than a stack of protocols”
• “We need open standards… and above all, we
need to teach scientists to work in this new layer
of data”
From the essay “I have seen the Paradigm Shift, and It Is Us”,
by John Wilbanks, in the book “The Fourth Paradigm”
Network layers (top to bottom): Data, Web, TCP/IP, Ethernet
Slide 3
Slide 3 text
US Integrated Ocean Observing System (IOOS®)
• Global Component
• Coastal Component
17 Federal Agencies
11 Regional Associations
Slide 4
Slide 4 text
IOOS Core Principles
• Adopt open standards & practices
• Avoid customer-specific stovepipes
• Standardized access services implemented at
data providers
Customer
Web access service
Data Provider
Observations
Models
Slide 5
Slide 5 text
Numerical Model Output
Slide 6
Slide 6 text
Time Series, Trajectories
Meteorology and Wave Buoy in the Gulf
of Maine. Image courtesy of NOAA.
Ocean Glider. Photo by Dave Fratantoni,
Woods Hole Oceanographic Institution
Slide 7
Slide 7 text
IOOS Data Infrastructure Diagram
[Diagram; labels recovered from the figure]
Nonstandard model output data files: ROMS, ADCIRC, HYCOM, SELFE, NCOM, FVCOM
Observed data (buoy, gauge, ADCP, glider): nonstandard data files
NcML: standardized (CF-1.6, SGRID-0.1, UGRID-0.9) virtual datasets
Common Data Model
Data types: Grid, TimeSeries, Profile, Trajectory, TimeSeriesProfile, Sgrid, Ugrid, Rectilinear
Web services: THREDDS Data Server, ERDDAP, OPeNDAP, NetCDF Subset, WMS, WCS, SOS, ncISO
Catalog services: Geoportal Server, GeoNetwork, CKAN, pycsw
Clients: Matlab, Panoply, IDV, ArcGIS, Python, EDC
Client libraries: NetCDF-Java library or broker, NetCDF4-Python
Web Portals
Slide 8
Slide 8 text
Catalog Search
Slide 9
Slide 9 text
Interoperable Access in Python (Iris)
Slide 10
Slide 10 text
IOOS System Test
Slide 11
Slide 11 text
2015 Boston Light Swim
2015 Aug 15, 7:00 am start
8 mile swim
No wet suit
How cold will the water be?
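In a catalog-driven workflow, a question like this becomes a search constrained by variable, region, and time. The sketch below shows one way to express those constraints in plain Python. The bounding box, the one-day margin, and the record layout (dicts with `title`, `variables`, `bbox`, and `time` keys) are assumptions made for illustration and do not come from the slides; only the start time is taken from the slide, and `sea_water_temperature` is the CF standard name for the quantity of interest.

```python
from datetime import datetime, timedelta

# Swim start from the slide; the time zone is not stated there.
START = datetime(2015, 8, 15, 7, 0)
# Assumptions for illustration: a one-day margin and a rough
# (min_lon, min_lat, max_lon, max_lat) box around Boston Harbor.
WINDOW = (START - timedelta(days=1), START + timedelta(days=1))
BBOX = (-71.2, 42.2, -70.6, 42.5)
NAME = "sea_water_temperature"  # CF standard name


def boxes_overlap(a, b):
    """True if two (min_lon, min_lat, max_lon, max_lat) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def spans_overlap(a, b):
    """True if two (start, end) time spans intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def candidates(records, bbox=BBOX, window=WINDOW, name=NAME):
    """Return titles of records matching variable, region, and time."""
    return [
        rec["title"]
        for rec in records
        if name in rec["variables"]
        and boxes_overlap(rec["bbox"], bbox)
        and spans_overlap(rec["time"], window)
    ]
```

A real search would send equivalent constraints to a CSW endpoint rather than filter records locally; the local version only makes the three conditions explicit.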
Slide 12
Slide 12 text
NECOFS Massbay Forecast
Slide 13
Slide 13 text
Reproducible Jupyter Notebook
Go to https://github.com/ocefpaf/boston_light_swim and click “launch binder” to run it in the cloud
Slide 14
Slide 14 text
No content
Slide 15
Slide 15 text
No content
Slide 16
Slide 16 text
No content
Slide 17
Slide 17 text
Final Result
Slide 18
Slide 18 text
18
Slide 19
Slide 19 text
19
Slide 20
Slide 20 text
pycsw
Slide 21
Slide 21 text
Workflow for the USGS CMG Portal
Slide 22
Slide 22 text
Workflow (3/3)
Axiom Data Science
– Runs a CSW search (in a cron job) on the
modeling groups' pycsw services, filtering for
datasets that contain a project called
“CMG_Portal”
– Datasets that have valid WMS services are
added to the portal
See for details of the workflow
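The two steps above can be sketched with the standard library alone. This is not the portal's actual code: it builds a CSW 2.0.2 GetRecords request body that matches “CMG_Portal” anywhere in a record's text (via the `csw:AnyText` queryable), then keeps only the returned records that advertise a WMS reference. The function names `build_getrecords` and `with_wms`, and the record layout (dicts with `title` and a `references` list of `scheme`/`url` pairs), are hypothetical. Checking that a WMS endpoint is actually valid, for example by requesting its capabilities document, is left out.

```python
import xml.etree.ElementTree as ET

CSW = "http://www.opengis.net/cat/csw/2.0.2"
OGC = "http://www.opengis.net/ogc"


def build_getrecords(project="CMG_Portal", max_records=100):
    """Return a CSW 2.0.2 GetRecords request body as an XML string."""
    ET.register_namespace("csw", CSW)
    ET.register_namespace("ogc", OGC)
    root = ET.Element(f"{{{CSW}}}GetRecords", {
        "service": "CSW", "version": "2.0.2",
        "resultType": "results", "maxRecords": str(max_records),
    })
    query = ET.SubElement(root, f"{{{CSW}}}Query", {"typeNames": "csw:Record"})
    ET.SubElement(query, f"{{{CSW}}}ElementSetName").text = "full"
    constraint = ET.SubElement(query, f"{{{CSW}}}Constraint", {"version": "1.1.0"})
    flt = ET.SubElement(constraint, f"{{{OGC}}}Filter")
    # Wildcard text match on the full-text queryable.
    like = ET.SubElement(flt, f"{{{OGC}}}PropertyIsLike",
                         {"wildCard": "*", "singleChar": "?", "escapeChar": "\\"})
    ET.SubElement(like, f"{{{OGC}}}PropertyName").text = "csw:AnyText"
    ET.SubElement(like, f"{{{OGC}}}Literal").text = f"*{project}*"
    return ET.tostring(root, encoding="unicode")


def with_wms(records):
    """Keep records that list at least one reference with a WMS scheme."""
    keep = []
    for rec in records:
        urls = [ref["url"] for ref in rec.get("references", [])
                if "wms" in ref.get("scheme", "").lower()]
        if urls:
            keep.append({"title": rec["title"], "wms": urls})
    return keep
```

The request body would be POSTed to each pycsw endpoint, and the parsed response passed to `with_wms`. A client library such as OWSLib can take over both the request and the parsing.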
Slide 23
Slide 23 text
23
Slide 24
Slide 24 text
WMS-driven Model Viewing Portal
Slide 25
Slide 25 text
25
Slide 26
Slide 26 text
Interoperable access in Matlab (nctoolbox)
Slide 27
Slide 27 text
27
Slide 28
Slide 28 text
28
Slide 29
Slide 29 text
Catalog-driven dynamic portals
Slide 30
Slide 30 text
30
Slide 31
Slide 31 text
Benefits of catalog-driven applications
• Dynamically adapt to new or changing data
• Find the machine-to-machine issues
– Easy problems that can be fixed in minutes to days
– Harder problems to guide future work
• Fixes for your workflow benefit everyone
• Build success stories
• Create reproducible workflows that others can learn from,
expand on, or transform
• Standardized workflows help develop the 4th network layer
for data