Generating reproducible workflows for the publication of open and FAIR data

Peter Desmet
December 12, 2019

Talk at "Research Data Management & Data Stewardship: much more than a FAIRytale", Brussels, Belgium, December 12, 2019.

Transcript

  1. 12 December 2019, Brussels
    Peter Desmet & Lien Reyserhove
    Generating repeatable workflows
    for the publication of open and FAIR data

  2. Research data management

  3. Open data publication

  4. Research software development

  5. We want to study
    (invasive) alien species
    to inform/guide environmental policy

  6. Reporting on invasive alien species

  7. Alien species in Belgium
    •  What species?
    •  Where are they?
    •  How are they getting here?
    •  What is their impact?
    •  Future distributions?
    •  Future impact?

  8. What species
    are alien in Belgium?

  9. Let’s check the
    Alien species checklist
    for Belgium

  10. We don’t have one!
    And certainly not one that is
    verified, open and FAIR

  11. We do have
    A number of authoritative checklists
    with a more specialized scope

  12. Data for alien plants

  13. Data for alien plants

  14. Open & FAIR?
    Checklist | Open | Findable | Accessible | Interoperable | Reusable

  15. Data for alien molluscs

  16. Open & FAIR?
    Checklist | Open | Findable | Accessible | Interoperable | Reusable

  17. How to go from …
    Checklist | Open | Findable | Accessible | Interoperable | Reusable

  18. … to open & FAIR data?
    Checklist | Open | Findable | Accessible | Interoperable | Reusable

  19. … to open & FAIR data?
    Checklist | Open | Findable | Accessible | Interoperable | Reusable
    Unified

  20. TrIAS data publication
    workflow

  21. Workflow
    1.  Data management → Tidy data

  22. Authors can manage their own data

  23. Tidy data (Wickham 2014)

  24. Tidy data (Wickham 2014)
    Each row is an observation

  25. Tidy data (Wickham 2014)
    Each column is a variable

  26. Tidy data (Wickham 2014)
    Each table is an observational unit
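
The tidy data rules on the slides above (each row an observation, each column a variable) can be illustrated with a minimal sketch. It is written in Python for illustration only; the talk itself uses R Markdown. The table contents, the column names and the helper `to_tidy` are all hypothetical.

```python
# Hypothetical wide table: the "year" variable is spread across column
# headers, so one row holds several observations (not tidy).
wide_rows = [
    {"species": "Species A", "2017": 3, "2018": 5},
    {"species": "Species B", "2017": 0, "2018": 2},
]


def to_tidy(rows, id_column, variable_name, value_name):
    """Return one row per (id, variable) pair.

    After reshaping, each row is a single observation and each column
    (id, variable, value) is a single variable.
    """
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_column:
                continue  # the identifier is repeated on every output row
            tidy.append(
                {id_column: row[id_column], variable_name: key, value_name: value}
            )
    return tidy


tidy_rows = to_tidy(wide_rows, "species", "year", "count")
for row in tidy_rows:
    print(row)
```

Each output row now describes one species in one year, which is the shape the later standardization step can map column by column.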

  27. Set up a repository

  28. Template structure

  29. Upload raw data

  30. Workflow
    1.  Data management → Tidy data
    2.  Standardization → Interoperable

  31. Reproducible data transformation

  32. Literate programming (R Markdown)

  33. Generate standardized data
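
A scripted transformation from raw to standardized data might look like the sketch below. It is a Python illustration, not the R Markdown script used in the talk. It assumes Darwin Core as the target standard, which the transcript does not name; the raw column names, the sample rows and the mapping `COLUMN_MAP` are hypothetical.

```python
import csv
import io

# Hypothetical raw checklist export; columns and rows are invented.
RAW_CSV = """Taxon,Rank,Kingdom
Species A,species,Plantae
Species B ,Species,Animalia
"""

# Assumed mapping from raw column names to Darwin Core term names.
COLUMN_MAP = {
    "Taxon": "scientificName",
    "Rank": "taxonRank",
    "Kingdom": "kingdom",
}


def standardize(raw_text):
    """Rename columns and clean values; same input always gives same output."""
    records = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        record = {COLUMN_MAP[name]: value.strip() for name, value in row.items()}
        record["taxonRank"] = record["taxonRank"].lower()  # consistent casing
        records.append(record)
    return records


def to_csv(records):
    """Serialize the standardized records back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


standardized = standardize(RAW_CSV)
print(to_csv(standardized))
```

Because every cleaning step lives in the script rather than in manual edits, rerunning it on updated raw data regenerates the standardized file without further intervention.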

  34. Collaborative

  35. Workflow
    1.  Data management → Tidy data
    2.  Standardization → Interoperable
    3.  Documentation → Understandable

  36. Documenting with metadata

  37. Bringing it all together

  38. Bringing it all together

  39. Workflow
    1.  Data management → Tidy data
    2.  Standardization → Interoperable
    3.  Documentation → Understandable
    4.  Publication → Open

  40. Publishing data

  41. Published data

  42. Published data

  43. Workflow
    1.  Data management → Tidy data
    2.  Standardization → Interoperable
    3.  Documentation → Understandable
    4.  Publication → Open
    5.  Registration → FAIR

  44. Global Biodiversity Information Facility

  45. Registering a dataset with GBIF

  46. Dataset on GBIF

  47. FAIR metadata

  48. Alien molluscs

  49. FAIR datasets
    Checklist | Open | Findable | Accessible | Interoperable | Reusable

  50. Going even further

  51. Mission
    Imagine a future where dynamically, from year to year, we can
    track the progression of alien species (AS), identify emerging
    species, assess their current and future risk and timely inform
    policy in a seamless data-driven workflow. One that is built on
    open science and open data infrastructures. By using
    international biodiversity standards and facilities, we would
    ensure interoperability, repeatability and sustainability. This
    would make the process adaptable to future requirements in an
    evolving IAS policy landscape both locally and internationally.

  52. Creating a unified checklist
    Checklist | Open | Findable | Accessible | Interoperable | Reusable
    Unified

  53. Multiple checklists on GBIF

  54. Using GBIF as an infrastructure

  55. Repeatable process

  56. Documented process

  57. FAIR unified checklist
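
Combining several source checklists into one unified list could be sketched as follows. This is a Python illustration under stated assumptions, not the project's actual procedure. It assumes every record already carries a shared identifier, here the hypothetical field `taxon_key`; how that key is obtained is out of scope. The records and the helper `unify` are invented.

```python
# Hypothetical records from two source checklists. `taxon_key` stands in
# for an identifier shared across checklists (an assumption of this sketch).
checklist_a = [
    {"taxon_key": 1, "name": "Species A", "source": "checklist_a"},
    {"taxon_key": 2, "name": "Species B", "source": "checklist_a"},
]
checklist_b = [
    {"taxon_key": 1, "name": "Species A", "source": "checklist_b"},
]


def unify(*checklists):
    """Merge records that share a taxon_key and keep track of their sources."""
    merged = {}
    for checklist in checklists:
        for record in checklist:
            entry = merged.setdefault(
                record["taxon_key"],
                {
                    "taxon_key": record["taxon_key"],
                    "name": record["name"],
                    "sources": [],
                },
            )
            if record["source"] not in entry["sources"]:
                entry["sources"].append(record["source"])
    # Sort by key so repeated runs produce identical output.
    return sorted(merged.values(), key=lambda entry: entry["taxon_key"])


unified = unify(checklist_a, checklist_b)
for entry in unified:
    print(entry)
```

Keeping the list of sources on each unified entry preserves provenance, so any taxon in the unified checklist can be traced back to the checklists that reported it.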

  58. We now have an
    Alien species checklist
    for Belgium

  59. Going even further

  60. Checklist-based indicators

  61. Open occurrence data

  62. Occurrence-based indicators

  63. Reproducible, open & FAIR

  64. trias-project.be

  65. Thank you!
    Peter Desmet & Lien Reyserhove (2019) Generating reproducible workflows for
    the publication of open and FAIR data. Presentation. http://bit.ly/trias-open-fair
    @trias_project Tracking Invasive Alien Species (TrIAS)
    trias-project.be
    @oscibio Open science lab for biodiversity
    oscibio.inbo.be
    @peterdesmet
