Let's love DATA.GOV

Rich Signell
September 02, 2015

A case for conducting system tests on DATA.GOV in the form of reproducible, end-to-end Python notebooks. Presentation to USGEO on 2015-09-02

Transcript

  1. Let’s love DATA.GOV: a case for system tests

    Rich Signell, USGS, Woods Hole, MA, USA
    Derrick Snowden, IOOS
    Filipe Fernandes, SECOORA
    Kyle Wilcox, Axiom Data Science
    Eoin Howlett, Kelly Knee, RPS/ASA
    Tom Kralidis, Meteorological Service of Canada
    Anna Milan, Dave Neufeld, Yuanjie Li, NOAA NCEI
    Ted Habermann (HDF Group)
    Unidata Program Center
    British Met Office
    NICTA Australia
    …
    USGEO Meeting, 9/2/2015
  2. Data: The 4th Network Layer

    • “We need an end-to-end, layer-by-layer, designed information technology … that are composed of no more than a stack of protocols”
    • “We need open standards… and above all, we need to teach scientists to work in this new layer of data”
    From the essay “I have seen the Paradigm Shift, and It Is Us”, by John Wilbanks, in the book “The Fourth Paradigm”
    Network layers (top to bottom): Data, Web, TCP/IP, Ethernet
  3. US Integrated Ocean Observing System (IOOS®)

    IOOS® Plan defines:
    • Global Component
    • Coastal Component
      – 17 Federal Agencies
      – 11 Regional Associations
  4. IOOS Core Principles

    • Adopt open standards & practices
    • Avoid customer-specific stovepipes
    • Standardized access services implemented at data providers
    Diagram labels: Customer; Web access service; Data Provider; Observations; Models
  5. IOOS Recommended Web Services and Data Encodings

    Data Type | Web Service | Encoding
    In-situ data (buoys, piers, towed sensors) | OGC Sensor Observation Service (SOS) | XML or CSV
    Gridded data (model outputs, satellite) | OPeNDAP with Climate and Forecast Conventions | Binary DAP using Climate and Forecast (CF) conventions
    Images of data | OGC Web Map Service (WMS) | GeoTIFF, PNG etc., possibly with standardized styles
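
For the gridded-data row, the request a client sends to an OPeNDAP server can be sketched as a URL with a DAP2 constraint expression that selects an index hyperslab of one variable. This is a minimal illustration using only the standard library; the endpoint, the variable name `temp`, and the helper `dap_subset_url` are hypothetical and do not come from the presentation.

```python
def dap_subset_url(base_url, variable, slices, response="ascii"):
    """Build an OPeNDAP (DAP2) URL requesting a hyperslab of one variable.

    `slices` holds one (start, stride, stop) index triple per dimension.
    DAP2 indices are zero-based and the stop index is inclusive.
    """
    hyperslab = "".join(
        f"[{start}:{stride}:{stop}]" for start, stride, stop in slices
    )
    return f"{base_url}.{response}?{variable}{hyperslab}"


# Hypothetical endpoint and variable name, for illustration only.
BASE = "http://example.org/thredds/dodsC/hypothetical/forecast.nc"

# Assumed dimension order: time, depth, y, x.
url = dap_subset_url(
    BASE, "temp", [(0, 1, 23), (0, 1, 0), (100, 1, 120), (200, 1, 220)]
)
print(url)
```

Libraries such as Iris or netCDF4-Python normally build these requests behind the scenes, so a notebook rarely writes them by hand; spelling one out is mainly useful when debugging a failing endpoint.
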
  6. IOOS Data Infrastructure Diagram

    [Diagram; labels grouped by the headings that appear in it]
    Nonstandard Model Output Data Files: ROMS, ADCIRC, HYCOM, SELFE, NCOM, FVCOM (each with NcML)
    Nonstandard Data Files: observed data (buoy, gauge, ADCP, glider), with NcML
    Standardized (CF-1.6, UGRID-0.9) Virtual Datasets via the Common Data Model: Grid, Ugrid, TimeSeries, Profile, Trajectory, TimeSeriesProfile
    Web Services: THREDDS Data Server (OPeNDAP+CF, WCS, NetCDF Subset, WMS, ncISO), ERDDAP, SOS
    Catalog Services: CSW (Geoportal Server, GeoNetwork, GI-CAT, CKAN-pyCSW)
    Clients: Matlab, Panoply, IDV, ArcGIS, Godiva2, Python (NetCDF4-Python), NetCDF-Java library or broker
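
The NcML step in this diagram wraps a nonstandard file in a small XML document that adds the missing CF metadata without touching the original data. A rough sketch of generating such a wrapper is shown here; the file name, variable name, and attribute values are made up for illustration, and the helper `cf_ncml` is hypothetical rather than part of any library.

```python
import xml.etree.ElementTree as ET

# NcML 2.2 namespace used by NetCDF-Java and the THREDDS Data Server.
NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def cf_ncml(location, variable_attrs, conventions="CF-1.6"):
    """Return NcML text that adds CF attributes to an existing dataset.

    `variable_attrs` maps a variable name to a dict of attribute name/value
    pairs (all values are written as strings).
    """
    ET.register_namespace("", NCML_NS)
    root = ET.Element(f"{{{NCML_NS}}}netcdf", {"location": location})
    # Global attribute declaring the conventions the virtual dataset follows.
    ET.SubElement(
        root,
        f"{{{NCML_NS}}}attribute",
        {"name": "Conventions", "value": conventions},
    )
    for var_name, attrs in variable_attrs.items():
        var = ET.SubElement(root, f"{{{NCML_NS}}}variable", {"name": var_name})
        for att_name, att_value in attrs.items():
            ET.SubElement(
                var,
                f"{{{NCML_NS}}}attribute",
                {"name": att_name, "value": att_value},
            )
    return ET.tostring(root, encoding="unicode")


# Hypothetical model output file and variable, for illustration only.
ncml = cf_ncml(
    "hypothetical_model_output.nc",
    {"temp": {"standard_name": "sea_water_temperature", "units": "degree_Celsius"}},
)
print(ncml)
```

In practice these wrappers are often written by hand once per model; generating them is only worthwhile when many similar files need the same fix.
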
  8. 2015 Boston Light Swim, Aug 15, 7:00am

    Since 1907, 8 miles, no wet suit
    How cold will the water be?
  9. Benefits of System Test Notebooks

    • Find the real problems
      – Easy problems that can be fixed in minutes to days
      – Harder problems to guide future work
    • Fixes for specific workflows benefit everyone
    • Create reproducible workflows that others can learn from, expand on, or transform
    • Build success stories
    • Make scientific discoveries by accident
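
One way to picture how a system test "finds the real problems" is a runner that executes the workflow stage by stage and reports which layer broke. The sketch below is an assumption about how such a test could be organized, not the structure of the actual notebooks; the stage functions are stubs, and the endpoint and the outage are simulated.

```python
def run_system_test(stages):
    """Run named stages in order, feeding each result to the next stage.

    Returns a list of (stage_name, status, detail) tuples and stops at the
    first failure, so the report points at the layer that broke.
    """
    report, payload = [], None
    for name, stage in stages:
        try:
            payload = stage(payload)
        except Exception as exc:
            report.append((name, "FAIL", f"{type(exc).__name__}: {exc}"))
            break
        report.append((name, "ok", ""))
    return report


# Stub stages standing in for real catalog and data-service calls.
def search(_):
    # A real test would query a CSW catalog here.
    return ["http://example.org/hypothetical/dataset"]


def access(urls):
    # Simulated outage: a real test would open the endpoint.
    raise ConnectionError(f"endpoint did not respond: {urls[0]}")


def compare(series):
    # Never reached in this run because the access stage fails first.
    return series


report = run_system_test([("search", search), ("access", access), ("compare", compare)])
for name, status, detail in report:
    print(f"{name:8s} {status:5s} {detail}")
```

Here the catalog search succeeds and the data access fails, which is the kind of result that tells a data provider exactly what to fix.
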
  13. Client Software Stack

    • Environment
      – IPython Notebooks, Anaconda, Binstar, Wakari, GitHub
    • Search
      – CSW using OWSLib
    • Access
      – OPeNDAP+CF using Iris and Pyugrid
      – Sensor Observation Service (SOS) using OWSLib and PyOOS
    • Analysis and Plotting
      – SciPy, Pandas, Matplotlib, Cartopy, Vincent, Folium
  14. Summary

    • Standards, web services and catalogs allow us to serve data in a unified way
    • Python gives us a free scientific access, analysis and visualization environment
    • IPython/Jupyter notebooks give us documented workflows and a browser interface
    • Anaconda and anaconda.org let anyone easily reproduce our workflows
    • Result: more efficient and effective access to ocean data, and anyone can assess ocean model skill
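
As a small illustration of what "assess ocean model skill" can mean once model and observed series are in hand, the sketch below computes mean bias and root-mean-square error. These two metrics are an assumption chosen for simplicity, not necessarily the ones used in the presentation, and the temperature values are made up.

```python
import math


def skill(model, observed):
    """Return (mean bias, RMSE) of model values against matched observations."""
    if not model or len(model) != len(observed):
        raise ValueError("model and observed must be non-empty and equal length")
    errors = [m - o for m, o in zip(model, observed)]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return bias, rmse


# Made-up water temperatures in degrees Celsius, already matched in time.
observed = [18.0, 18.5, 19.0, 18.0]
model = [19.0, 19.5, 19.0, 20.0]

bias, rmse = skill(model, observed)
print(f"bias = {bias:.2f} C, RMSE = {rmse:.2f} C")
```

A positive bias means the model runs warm relative to the observations; in a real notebook the two series would first need to be interpolated onto common times.
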