Challenges and needs of reproducible workflows of Open Weather and Climate Data

In its role as operator of the two Copernicus services on Climate Change (C3S) and Atmosphere Monitoring (CAMS), ECMWF also offers a range of open environmental datasets on climate, air quality, fire and floods.

Through Copernicus, a wealth of data is being made freely and openly available, and a new range of users, not necessarily ‘expert’ users, is interested in exploiting it.
This makes the reproducibility of workflows particularly important. A full, free and open data policy is vital for reproducible workflows and an important prerequisite for them. Reproducibility, however, has to be reflected in all aspects of the data processing chain. The biggest challenge at present is limited data ‘accessibility’, where ‘accessibility’ means more than just improving data access. Accessibility is closely linked to reproducibility and requires improvements and developments along the entire data processing chain, including example workflows and reproducible training materials, data standards and interoperability, and the development or improvement of suitable open-source software tools.

The presentation will go through each step of some example workflows for open meteorological and climate data and will discuss the reproducibility and ‘accessibility’ challenges, as well as the future needs that must be addressed to make open meteorological and climate data fully accessible and reproducible.

Julia Wagemann

October 14, 2019

Transcript

  1. Challenges and Needs of REPRODUCIBLE WORKFLOWS of Open Big Weather and Climate Data
    Julia Wagemann
    PhD candidate at University of Marburg
    Visiting Scientist at ECMWF
    @JuliaWagemann
    #repwork19
    Reading, 14 Oct 2019

  2. Basic NEEDS for reproducible workflows:
    ❖ Open data

  3. Basic NEEDS for reproducible workflows:
    ❖ Open data
    ❖ Open-source software / FOSS4G

  4. Basic NEEDS for reproducible workflows:
    ❖ Open data
    ❖ Open-source software / FOSS4G

  5. Basic NEEDS for reproducible workflows:
    ❖ Open data
    ❖ Open-source software / FOSS4G

  6. Islands of Open Big Earth Data

  7. Islands of Open Big Earth Data
    Meteorological / climate community
    Earth Observation community

  8. Islands of Open Big Earth Data
    Meteorological / climate community
    ● Copernicus Climate Data Store
    ● GRIB, NetCDF
    Earth Observation community
    ● Google Earth Engine
    ● GeoTiff, JPEG2000

  9. Principal components of a Big Earth Data workflow
    Access → Processing → Visualization

  10. Reproducibility challenge - Example

  11. Reproducibility challenge - DATA ACCESSIBILITY
    ● different data are accessible via different data access systems
    ● it is still about downloading data
    ● community-specific data formats (GRIB, NetCDF, GeoTiff)
    ● data structure and complexity (analyses vs forecast, multiple dimensions)
    Access
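
The format heterogeneity named on this slide is visible right down to the first bytes of a file. The sketch below, using only the Python standard library, guesses which of the formats mentioned in this talk a file uses from its leading "magic" bytes. The function name `sniff_format` and the signature table are illustrative assumptions; the check is a heuristic that inspects signatures only and does not validate the file.

```python
from pathlib import Path

# Leading byte signatures ("magic numbers") for the formats named on the slides.
# GeoTIFF is a TIFF variant, so a TIFF match does not prove GeoTIFF: the
# georeferencing tags would have to be read with a proper library to confirm.
SIGNATURES = [
    (b"GRIB", "GRIB"),                              # GRIB edition 1 or 2 message
    (b"CDF\x01", "NetCDF-3 (classic)"),
    (b"CDF\x02", "NetCDF-3 (64-bit offset)"),
    (b"CDF\x05", "NetCDF-3 (CDF-5)"),
    (b"\x89HDF\r\n\x1a\n", "NetCDF-4 / HDF5"),      # assumes no HDF5 user block
    (b"II*\x00", "TIFF / GeoTIFF (little-endian)"),
    (b"MM\x00*", "TIFF / GeoTIFF (big-endian)"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000 (JP2 container)"),
]


def sniff_format(path: Path) -> str:
    """Guess a file's format from its first bytes; return "unknown" if nothing matches."""
    with open(path, "rb") as f:
        head = f.read(16)  # long enough for every signature above
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "unknown"
```

Files wrapped in other containers, such as GRIB messages inside a WMO bulletin or HDF5 files with a user block, would need a more thorough search than this prefix check.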

  12. ERA5 surface pressure

  13. Reproducibility challenge - DATA ACCESSIBILITY - Question
    If a process involves a data download that might take several months...
    Do we call it reproducible?
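
A back-of-the-envelope calculation shows how quickly "several months" is reached. The sketch below estimates wall-clock download time from data volume, sustained throughput and an optional queueing delay. The function name and all figures in the example call are hypothetical round numbers chosen for illustration, not measurements of any ECMWF or Copernicus service.

```python
SECONDS_PER_DAY = 86_400


def download_days(volume_tb: float, throughput_mb_s: float, queue_days: float = 0.0) -> float:
    """Estimate the days needed to download `volume_tb` terabytes.

    Uses decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes) and assumes a
    constant sustained throughput plus a fixed waiting time in a request queue.
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = volume_tb * 1e12 / (throughput_mb_s * 1e6)
    return queue_days + seconds / SECONDS_PER_DAY


# Hypothetical example: 100 TB at a sustained 10 MB/s, no queueing.
print(f"{download_days(100, 10):.1f} days")  # prints: 115.7 days
```

Under these assumed numbers the transfer alone takes close to four months, before any processing starts.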

  14. One great example of a reproducible workflow...
    ● Processing
    cdsapi, xarray, cfgrib, xESMF, cartopy
    https://colab.research.google.com/drive/1wWHz_SMCHNuos5fxWRUJTcB6wqkTJQCR
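
The tools listed on this slide map onto the workflow components named earlier: cdsapi for access, xarray with the cfgrib engine for reading GRIB, xESMF for regridding and cartopy for the map. The sketch below chains them for a single ERA5 surface-pressure field. It is not a reproduction of the linked notebook; the request keys, the date, the 1° target grid and the output file names are assumptions, and running `run_workflow()` requires a CDS account with a configured `~/.cdsapirc` plus the listed packages.

```python
"""Sketch of an access -> processing -> visualization chain (assumed parameters)."""


def build_request(year: str, month: str, day: str, hour: str) -> dict:
    # Assumed request layout for the CDS dataset "reanalysis-era5-single-levels".
    # Key names have changed between CDS versions; compare with the
    # "Show API request" output on the CDS website before use.
    return {
        "product_type": "reanalysis",
        "variable": "surface_pressure",
        "year": year,
        "month": month,
        "day": day,
        "time": hour,
        "format": "grib",
    }


def run_workflow(target: str = "era5_sp.grib") -> None:
    # Heavy dependencies are imported here so the request can be inspected
    # without them installed.
    import cdsapi                    # Access (needs CDS credentials)
    import xarray as xr              # Processing (GRIB decoding via cfgrib)
    import xesmf as xe               # Processing (regridding)
    import matplotlib.pyplot as plt  # Visualization
    import cartopy.crs as ccrs       # Visualization (map projection)

    # 1. Access: retrieve one ERA5 surface-pressure field as GRIB.
    cdsapi.Client().retrieve(
        "reanalysis-era5-single-levels",
        build_request("2019", "10", "14", "12:00"),
        target,
    )

    # 2. Processing: decode GRIB with cfgrib, then regrid to a 1-degree grid.
    ds = xr.open_dataset(target, engine="cfgrib")
    ds = ds.rename({"latitude": "lat", "longitude": "lon"})
    grid_out = xe.util.grid_global(1.0, 1.0)
    regridder = xe.Regridder(ds, grid_out, "bilinear", periodic=True)
    sp_hpa = regridder(ds["sp"]) / 100.0  # Pa -> hPa

    # 3. Visualization: plot on a global map with coastlines.
    ax = plt.axes(projection=ccrs.PlateCarree())
    sp_hpa.plot.pcolormesh(ax=ax, x="lon", y="lat", transform=ccrs.PlateCarree(),
                           cbar_kwargs={"label": "surface pressure [hPa]"})
    ax.coastlines()
    plt.savefig("era5_sp.png", dpi=150)
```

Keeping the request in its own function means the exact query can be printed, versioned and shared even by readers who cannot run the download.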

  15. REPRODUCIBILITY does not happen by accident either.
    “Interoperability does not happen by accident.” (Cliff Kottman)

  16. Reproducibility is ‘going the extra mile’

  17. Additional NEEDS for reproducible workflows:
    ❖ Open data
    ❖ Open-source software / FOSS4G

  18. Additional NEEDS for reproducible workflows:
    ❖ Open data
    ❖ Open-source software / FOSS4G
    Reproducibility as personal code of conduct
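
Treating reproducibility as a personal code of conduct can start with small habits, such as recording exactly which request, input files and software versions produced a result. The sketch below builds and stores such a provenance record with the standard library only; the function names and the chosen fields are illustrative assumptions, not a community standard.

```python
from __future__ import annotations

import hashlib
import json
import platform
from datetime import datetime, timezone
from importlib import metadata
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large GRIB/NetCDF files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def package_version(name: str) -> str:
    """Installed version of a distribution, or "not installed"."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def provenance_record(request: dict, inputs: list[Path], packages: list[str]) -> dict:
    """Collect what another person would need to rerun and check a result."""
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "request": request,
        "inputs_sha256": {str(p): sha256_of(p) for p in inputs},
        "packages": {name: package_version(name) for name in packages},
    }


def write_provenance(record: dict, path: Path) -> None:
    """Store the record as human-readable JSON next to the results."""
    path.write_text(json.dumps(record, indent=2, sort_keys=True))
```

A checksum of each input file lets someone who re-downloads the data months later confirm that they are working with identical bytes.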

  19. Additional NEEDS for reproducible workflows:
    ❖ Open data
    ❖ Open-source software / FOSS4G
    Prioritise Interoperability of data systems
    Reproducibility as personal code of conduct
    API
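
One concrete face of interoperability between data systems is that the same spatial request is spelled differently by different APIs: the CDS API orders its `area` as north, west, south, east, while a STAC API item search (a catalogue standard common for Earth Observation data) expects `bbox` as west, south, east, north. The sketch below assumes a hypothetical system-neutral `Request` and two small adapter classes, `CdsStyleSource` and `StacStyleSource`, that translate it. It only builds query dictionaries, contacts no service, and the field mappings are simplified assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Request:
    """Hypothetical system-neutral description of the data a workflow needs."""
    collection: str                           # dataset / collection identifier
    date: str                                 # ISO 8601 date, e.g. "2019-10-14"
    bbox: tuple[float, float, float, float]   # (west, south, east, north) in degrees


class DataSource(Protocol):
    def to_query(self, req: Request) -> dict: ...


class CdsStyleSource:
    """CDS-style query: 'area' is ordered north, west, south, east."""

    def to_query(self, req: Request) -> dict:
        west, south, east, north = req.bbox
        year, month, day = req.date.split("-")
        # With cdsapi the dataset name is a separate argument to retrieve();
        # it is kept in the dict here only to make the comparison easy.
        return {"dataset": req.collection, "year": year, "month": month,
                "day": day, "area": [north, west, south, east]}


class StacStyleSource:
    """STAC API item-search query: 'bbox' is ordered west, south, east, north."""

    def to_query(self, req: Request) -> dict:
        # A whole-day interval in RFC 3339 form.
        interval = f"{req.date}T00:00:00Z/{req.date}T23:59:59Z"
        return {"collections": [req.collection], "datetime": interval,
                "bbox": list(req.bbox)}


def build_queries(req: Request, sources: dict[str, DataSource]) -> dict[str, dict]:
    """Translate one neutral request into each registered system's own query."""
    return {name: src.to_query(req) for name, src in sources.items()}


req = Request("example-collection", "2019-10-14", (-10.0, 35.0, 30.0, 60.0))
queries = build_queries(req, {"cds": CdsStyleSource(), "stac": StacStyleSource()})
# queries["cds"]["area"]  -> [60.0, -10.0, 35.0, 30.0]
# queries["stac"]["bbox"] -> [-10.0, 35.0, 30.0, 60.0]
```

Keeping the workflow's own request neutral and pushing each system's conventions into a thin adapter means a new data system only needs one new class, not changes throughout the workflow.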

  20. “Problems cannot be solved by the same level of thinking that created them!” (Albert Einstein)

  21. Thank you!
    Questions?
    Julia Wagemann
    PhD candidate at University of Marburg
    Visiting Scientist at ECMWF
    @JuliaWagemann
    #repwork19
    Reading, 14 Oct 2019