Rich Signell
January 14, 2016

Catalog-driven, Reproducible Workflows for Ocean Science

Presentation at the Hazards-SEES Kick-off meeting, MIT

Transcript

  1. Catalog-driven, Reproducible Workflows for Ocean Science
    Rich Signell, USGS, Woods Hole, MA, USA
    Filipe Fernandes, SECOORA, Salvador, Brazil
    Kyle Wilcox, Axiom Data Science, Wickford, RI
    Hazards-SEES Kick-off Meeting, MIT, 2016-01-14
  2. The 4th Network Layer: Data
    • “We need an end-to-end, layer-by-layer, designed information technology … that are composed of no more than a stack of protocols”
    • “We need open standards… and above all, we need to teach scientists to work in this new layer of data”
    From the essay “I have seen the Paradigm Shift, and It Is Us”, by John Wilbanks, in the book “The Fourth Paradigm”
    [Diagram: network layer stack labeled Data, Web, TCP/IP, Ethernet]
  3. US Integrated Ocean Observing System (IOOS®)
    IOOS® Plan defines:
    • Global Component
    • Coastal Component
      – 17 Federal Agencies
      – 11 Regional Associations
  4. IOOS Core Principles
    • Adopt open standards & practices
    • Avoid customer-specific stovepipes
    • Standardized access services implemented at data providers
    [Diagram labels: Customer, Web access service, Data Provider, Observations, Models]
  5. Ocean grids are often not regularly spaced!
    • Stretched surface and terrain-following vertical coordinates
    • Curvilinear orthogonal horizontal coordinates
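The idea behind terrain-following vertical coordinates can be illustrated with a minimal sketch. It assumes the simple, unstretched sigma form given in the CF conventions for `ocean_sigma_coordinate`, z = eta + sigma * (depth + eta), rather than the stretched variant named on the slide. The function name and the sample depths are hypothetical and chosen only for illustration.

```python
def sigma_to_z(sigma_levels, bottom_depth, eta=0.0):
    """Convert sigma levels (-1 at the bottom, 0 at the surface) to heights z in metres.

    Assumes the unstretched CF ocean_sigma_coordinate form:
        z = eta + sigma * (bottom_depth + eta)
    where bottom_depth is positive down and eta is the sea-surface height.
    """
    return [eta + s * (bottom_depth + eta) for s in sigma_levels]


# The same five sigma levels are used at every horizontal grid point.
sigma_levels = [-1.0, -0.75, -0.5, -0.25, 0.0]

# Hypothetical water columns: a 10 m shoal and a 100 m basin with a 0.5 m surge.
shallow = sigma_to_z(sigma_levels, bottom_depth=10.0)
deep = sigma_to_z(sigma_levels, bottom_depth=100.0, eta=0.5)

print(shallow)  # layers 2.5 m apart
print(deep)     # layers about 25 m apart
```

Because the levels follow the bottom and the free surface, the layer thickness varies from point to point, which is one reason such output cannot be treated as a regularly spaced grid.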
  6. NetCDF Climate and Forecast (CF) Conventions + UGRID + SGRID
    Groups using CF:
    • GO-ESSP: Global Organization for Earth System Science Portal
    • IOOS: Integrated Ocean Observing System
    • ESMF: Earth System Modeling Framework
    • OGC: Open Geospatial Consortium (GALEON: WCS profile)
  7. Time Series, Trajectories
    • Meteorology and Wave Buoy in the Gulf of Maine. Image courtesy of NOAA.
    • Ocean Glider. Photo by Dave Fratantoni, Woods Hole Oceanographic Institution.
  8. IOOS Data Infrastructure Diagram
    [Diagram; labels recovered from the slide, grouped by apparent role:]
    • Nonstandard Model Output Data Files: ROMS, ADCIRC, HYCOM, SELFE, NCOM, FVCOM (each with NcML)
    • Observed data (buoy, gauge, ADCP, glider): Nonstandard Data Files, NcML
    • Common Data Model; Standardized (CF-1.6, UGRID-0.9) Virtual Datasets: Grid, Ugrid, TimeSeries, Profile, Trajectory, TimeSeriesProfile
    • THREDDS Data Server; ERDDAP
    • Web Services: OPeNDAP+CF, WCS, NetCDF Subset, WMS, ncISO, SOS
    • Catalog Services: Geoportal Server, GeoNetwork, GI-CAT, CKAN-pyCSW
    • Clients: Matlab, Panoply, IDV, ArcGIS, NetCDF4-Python, Python; NetCDF-Java Library or Broker
    • Web Portals
  9. 2015 Boston Light Swim
    • Aug 15, 7:00 am
    • Held since 1907; 8 miles, no wet suit
    • How cold will the water be?
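One way a catalog-driven workflow could begin to answer a question like this is by filtering catalog records on variable, location, and time before touching any data. The sketch below is not taken from the talk: the in-memory `RECORDS` list stands in for a real catalog service, every title and URL in it is hypothetical, and the Boston Harbor bounding box and search window are rough assumptions.

```python
from datetime import datetime

# Hypothetical catalog records; a real workflow would query a catalog service instead.
RECORDS = [
    {"title": "Coastal model A (hypothetical)",
     "standard_names": {"sea_water_temperature", "sea_water_salinity"},
     "bbox": (-71.5, 41.5, -69.5, 43.5),  # (min_lon, min_lat, max_lon, max_lat)
     "start": datetime(2015, 1, 1), "end": datetime(2015, 12, 31),
     "url": "https://example.org/thredds/dodsC/model_a"},
    {"title": "Wave buoy B (hypothetical)",
     "standard_names": {"sea_surface_wave_significant_height"},
     "bbox": (-70.9, 42.3, -70.9, 42.3),
     "start": datetime(2015, 1, 1), "end": datetime(2015, 12, 31),
     "url": "https://example.org/thredds/dodsC/buoy_b"},
    {"title": "Gulf of Mexico model C (hypothetical)",
     "standard_names": {"sea_water_temperature"},
     "bbox": (-98.0, 18.0, -80.0, 31.0),
     "start": datetime(2015, 1, 1), "end": datetime(2015, 12, 31),
     "url": "https://example.org/thredds/dodsC/model_c"},
    {"title": "Glider D, 2014 mission (hypothetical)",
     "standard_names": {"sea_water_temperature"},
     "bbox": (-71.0, 42.0, -70.0, 43.0),
     "start": datetime(2014, 6, 1), "end": datetime(2014, 7, 15),
     "url": "https://example.org/thredds/dodsC/glider_d"},
]


def bbox_intersects(a, b):
    """Return True if two (min_lon, min_lat, max_lon, max_lat) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(records, standard_name, bbox, start, end):
    """Keep records with the CF standard name that overlap the box and time window."""
    return [r for r in records
            if standard_name in r["standard_names"]
            and bbox_intersects(r["bbox"], bbox)
            and r["start"] <= end and r["end"] >= start]


# Assumed search: water temperature near Boston Harbor around the swim date.
boston_harbor = (-71.1, 42.2, -70.8, 42.45)
hits = search(RECORDS, "sea_water_temperature", boston_harbor,
              datetime(2015, 8, 12), datetime(2015, 8, 18))

for record in hits:
    print(record["title"], record["url"])
```

Searching on a CF standard name rather than a dataset-specific variable name is what lets the same query apply to model output and observations alike; the matching records would then be opened through their data-access endpoints.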
  10. Benefits of Standards-Based, Catalog-Driven, Reproducible Workflows
    • Find the real problems
      – Easy problems that can be fixed in minutes to days
      – Harder problems to guide future work
    • Fixes for specific workflows benefit everyone
    • Build success stories
    • Create reproducible workflows that others can learn from, expand on, or transform
    • Standardized workflows help develop the 4th network layer for data
  11. Questions
    1. What do you see as being the most important items for a successful Hazards project?
    2. What are you most looking forward to achieving personally and as a team?
    3. What are the main things that you need from other team members?
    4. What are the biggest challenges for your role in the Hazards project?
    • Work with team members to achieve standardized data and services and to use community tools. Standards for Lagrangian scientific feature types, CZML, TerriaJS