Slide 1

Slide 1 text

Challenges and Needs of REPRODUCIBLE WORKFLOWS of Open Big Weather and Climate Data
Julia Wagemann
PhD candidate at University of Marburg, Visiting Scientist at ECMWF
@JuliaWagemann #repwork19
Reading, 14 Oct 2019

Slide 2

Slide 2 text

Basic NEEDS for reproducible workflows: ❖ Open data

Slide 3

Slide 3 text

Basic NEEDS for reproducible workflows: ❖ Open data ❖ Open-source software / FOSS4G

Slide 4

Slide 4 text

Basic NEEDS for reproducible workflows: ❖ Open data ❖ Open-source software / FOSS4G

Slide 5

Slide 5 text

Basic NEEDS for reproducible workflows: ❖ Open data ❖ Open-source software / FOSS4G

Slide 6

Slide 6 text

Islands of Open Big Earth Data

Slide 7

Slide 7 text

Islands of Open Big Earth Data
Meteorological / climate community
Earth Observation community

Slide 8

Slide 8 text

Islands of Open Big Earth Data
Meteorological / climate community
● Copernicus Climate Data Store
● GRIB, NetCDF
Earth Observation community
● Google Earth Engine
● GeoTiff, JPEG2000

Slide 9

Slide 9 text

Principal components of a Big Earth Data workflow: Access, Processing, Visualization

Slide 10

Slide 10 text

Reproducibility challenge - Example

Slide 11

Slide 11 text

Reproducibility challenge - DATA ACCESSIBILITY (workflow component: Access)
● Different data are accessible via different data access systems
● It is still about downloading data
● Community-specific data formats (GRIB, NetCDF, GeoTiff)
● Data structure and complexity (analyses vs. forecasts, multiple dimensions)
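To make the access step concrete, here is a minimal sketch of how a request to one such access system, the Copernicus Climate Data Store, could be written down as code instead of being clicked together in a web form. The dataset name, the request keys and the helper names build_era5_request and download are assumptions for illustration (key names may differ between CDS versions); cdsapi is a third-party package that needs a personal API key.

```python
def build_era5_request(year, month, days, variable="surface_pressure"):
    """Build a request dictionary for one month of hourly ERA5 data.

    The key names are an assumption based on the CDS API as commonly
    used around 2019; check the current CDS documentation before use.
    """
    return {
        "product_type": "reanalysis",
        "variable": variable,
        "year": f"{year:04d}",
        "month": f"{month:02d}",
        "day": [f"{d:02d}" for d in days],
        "time": [f"{h:02d}:00" for h in range(24)],  # all 24 hours
        "format": "netcdf",
    }


def download(request, target, dataset="reanalysis-era5-single-levels"):
    """Send the request to the CDS (hypothetical helper, needs credentials)."""
    import cdsapi  # imported lazily so the request can be built without it

    client = cdsapi.Client()
    client.retrieve(dataset, request, target)
    return target


if __name__ == "__main__":
    request = build_era5_request(2019, 10, days=[14])
    print(request)
    # download(request, "era5_sp_20191014.nc")  # uncomment with a valid API key
```

Keeping the request in a plain dictionary means it can be versioned together with the analysis code, which is one small way of making the access step repeatable.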

Slide 12

Slide 12 text

ERA5 surface pressure

Slide 13

Slide 13 text

Reproducibility challenge - DATA ACCESSIBILITY - Question: If a process involves a data download that might take several months, do we call it reproducible?
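A rough back-of-envelope estimate shows how download volumes of this kind can arise. The sketch below assumes a regular 0.25° global grid of 721 x 1440 points, hourly fields, unpacked 4-byte values and a sustained network throughput chosen by the user; all of these are illustrative assumptions, and queueing time on the data server is not included.

```python
def field_bytes(n_lat=721, n_lon=1440, bytes_per_value=4):
    """Size of one global 2D field (assumed grid and value size)."""
    return n_lat * n_lon * bytes_per_value


def archive_bytes(years, fields_per_day=24, n_levels=1, n_variables=1):
    """Total volume for the given period, ignoring leap days."""
    n_fields = years * 365 * fields_per_day * n_levels * n_variables
    return n_fields * field_bytes()


def download_days(total_bytes, megabits_per_second):
    """Pure transfer time in days at a sustained throughput."""
    bytes_per_second = megabits_per_second * 1_000_000 / 8
    return total_bytes / bytes_per_second / 86_400


if __name__ == "__main__":
    # One surface variable vs. several variables on many pressure levels
    # (the level and variable counts are hypothetical inputs).
    for label, kwargs in [
        ("1 surface variable", {}),
        ("5 variables on 37 levels", {"n_levels": 37, "n_variables": 5}),
    ]:
        total = archive_bytes(40, **kwargs)
        days = download_days(total, megabits_per_second=100)
        print(f"{label}: {total / 1e12:.1f} TB, about {days:.0f} days at 100 Mbit/s")
```

Under these assumptions a single surface variable is a matter of days, while multi-level, multi-variable requests quickly reach the scale of months, before any server-side waiting time is counted.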

Slide 14

Slide 14 text

One great example of a reproducible workflow... (workflow component: Processing)
● Tools used: cdsapi, xarray, cfgrib, xESMF, cartopy
● https://colab.research.google.com/drive/1wWHz_SMCHNuos5fxWRUJTcB6wqkTJQCR
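The processing and visualization steps of such a workflow could look roughly like the sketch below. It is not the code from the linked notebook; it only illustrates how xarray and cartopy from the list above might be combined, assuming a local file with a surface pressure variable named "sp" in pascal and a time dimension named "time". The function names and defaults are hypothetical.

```python
def to_hpa(pressure_pa):
    """Convert pascal to hectopascal (works on floats and xarray objects)."""
    return pressure_pa / 100.0


def time_mean_pressure(path, variable="sp", time_dim="time", engine=None):
    """Open a local file and return the time-mean field in hPa.

    Pass engine="cfgrib" for GRIB input (requires the cfgrib package).
    The variable and dimension names are assumptions about the file.
    """
    import xarray as xr  # third-party, imported lazily

    with xr.open_dataset(path, engine=engine) as ds:
        return to_hpa(ds[variable].mean(dim=time_dim)).load()


def plot_global(field, out_png):
    """Plot a 2D lat/lon field on a global map and save it as PNG."""
    import matplotlib
    matplotlib.use("Agg")  # render without a display
    import matplotlib.pyplot as plt
    import cartopy.crs as ccrs

    fig = plt.figure(figsize=(10, 5))
    ax = fig.add_subplot(1, 1, 1, projection=ccrs.PlateCarree())
    field.plot(ax=ax, transform=ccrs.PlateCarree(), cbar_kwargs={"label": "hPa"})
    ax.coastlines()
    fig.savefig(out_png, dpi=150)
    plt.close(fig)
    return out_png


if __name__ == "__main__":
    # "era5_sp.nc" is a placeholder file name.
    mean_field = time_mean_pressure("era5_sp.nc")
    plot_global(mean_field, "era5_sp_mean.png")
```

Regridding with xESMF is left out to keep the sketch short; in a shared notebook the package versions would also need to be recorded for the result to stay reproducible.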

Slide 15

Slide 15 text

“Interoperability does not happen by accident.” (Cliff Kottman)
REPRODUCIBILITY does not happen by accident either.

Slide 16

Slide 16 text

Reproducibility is 'going the extra mile'

Slide 17

Slide 17 text

Additional NEEDS for reproducible workflows: ❖ Open data ❖ Open-source software / FOSS4G

Slide 18

Slide 18 text

Additional NEEDS for reproducible workflows: ❖ Open data ❖ Open-source software / FOSS4G ❖ Reproducibility as personal code of conduct

Slide 19

Slide 19 text

Additional NEEDS for reproducible workflows: ❖ Open data ❖ Open-source software / FOSS4G ❖ Prioritise interoperability of data systems (API) ❖ Reproducibility as personal code of conduct

Slide 20

Slide 20 text

“Problems cannot be solved by the same level of thinking that created them!” (Albert Einstein)

Slide 21

Slide 21 text

Thank you! Questions?
Julia Wagemann
PhD candidate at University of Marburg, Visiting Scientist at ECMWF
@JuliaWagemann #repwork19
Reading, 14 Oct 2019