How to manage and publish biodiversity data

Talk at the BiodivScen Data Management Workshop in Helsinki, Finland - May 14, 2019.

Peter Desmet

May 14, 2019

Transcript

  1. How to manage and publish biodiversity data
    BiodivScen Data Management Workshop
    14 May 2019 - Helsinki
    Peter Desmet

  2. Open science lab for biodiversity
    At the Research Institute for Nature and Forest (INBO) in Belgium.
    We offer technical support to researchers in the projects we collaborate on.
    Our support focuses mainly on open data publication and research software development.
    @oscibio

  3. Managing biodiversity data

  4. You’re not alone
    Managing data is hard, but a lot already exists.
    Please don’t reinvent the wheel if you don’t
    have to.

  5. Platforms
    Look for existing platforms
    before creating your own

  6. Biological collection databases
    Many systems exist to manage biological collection
    information - iDigBio.org

  7. iNaturalist
    Citizen science infrastructure for recording (photo)
    observations: smartphone app, species image
    recognition, community validation - inaturalist.org

  8. iNaturalist
    Customizable and open: create projects, add custom
    fields to observations, API, species image recognition
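
As a rough illustration of the API mentioned on this slide, the sketch below only builds a query URL for the public iNaturalist observations endpoint. The function name `inat_observations_url` and the project slug are made up for the example. The parameter names (`taxon_name`, `project_id`, `quality_grade`, `per_page`) are assumed from the iNaturalist API v1 documentation and should be checked there before use.

```python
from urllib.parse import urlencode

# Public iNaturalist API v1 endpoint for observations (assumed base URL).
INAT_API = "https://api.inaturalist.org/v1/observations"


def inat_observations_url(taxon_name, project_id=None, per_page=30):
    """Build a query URL for research-grade observations of a taxon."""
    params = {
        "taxon_name": taxon_name,
        "quality_grade": "research",  # community-validated records only
        "per_page": per_page,
    }
    if project_id is not None:
        # Restrict the query to one iNaturalist project (slug or numeric id).
        params["project_id"] = project_id
    return f"{INAT_API}?{urlencode(params)}"


# "my-project" is a hypothetical project slug, not a real project.
url = inat_observations_url("Vespa velutina", project_id="my-project")
print(url)
```

Opening the URL in a browser or with any HTTP client should return JSON, which makes it easy to inspect the response before writing more code around it.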

  9. Movebank
    Manage and analyse tracking data, has its own
    repository - www.movebank.org

  10. No platform yet?
    Developing and maintaining one is expensive:
    collaborate with others if you do!

  11. Data standards

  12. Biodiversity Information Standards
    TDWG - www.tdwg.org

  13. Darwin Core
    TDWG-maintained standard to express biodiversity
    information: glossary of terms - dwc.tdwg.org

  14. Darwin Core Archive
    Most popular way to package biodiversity data: data
    as CSV files (core + extensions), metadata as XML
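
To make the packaging idea concrete, here is a minimal sketch of a Darwin Core Archive built with the Python standard library: one core CSV file plus a `meta.xml` descriptor that maps each column to a Darwin Core term. The single record is invented for illustration, the helper names `build_meta_xml` and `build_archive` are hypothetical, and a real archive would normally also carry dataset metadata (typically an `eml.xml` file), which this sketch leaves out.

```python
import csv
import io
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"
FIELDS = ["occurrenceID", "basisOfRecord", "scientificName",
          "eventDate", "decimalLatitude", "decimalLongitude"]


def build_meta_xml(fields, filename="occurrence.csv"):
    """Describe the core file: one <field> per column, column 0 is the id."""
    field_lines = [
        f'    <field index="{i}" term="{DWC}{name}"/>'
        for i, name in enumerate(fields)
    ]
    return "\n".join([
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">',
        '  <core encoding="UTF-8" fieldsTerminatedBy="," linesTerminatedBy="\\n"',
        '        fieldsEnclosedBy="&quot;" ignoreHeaderLines="1"',
        f'        rowType="{DWC}Occurrence">',
        f'    <files><location>{filename}</location></files>',
        '    <id index="0"/>',
        *field_lines,
        '  </core>',
        '</archive>',
    ])


def build_archive(records):
    """Return the bytes of a zip holding occurrence.csv and meta.xml."""
    table = io.StringIO()
    # Use "\n" line endings so the file matches what meta.xml declares.
    writer = csv.DictWriter(table, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.csv", table.getvalue())
        archive.writestr("meta.xml", build_meta_xml(FIELDS))
    return buffer.getvalue()


# Invented example record, not real data.
records = [{
    "occurrenceID": "example:occ:1",
    "basisOfRecord": "HumanObservation",
    "scientificName": "Vespa velutina",
    "eventDate": "2019-05-14",
    "decimalLatitude": 60.17,
    "decimalLongitude": 24.94,
}]
archive_bytes = build_archive(records)
```

In practice the Integrated Publishing Toolkit generates these files for you, so writing them by hand is mainly useful for understanding what the archive contains.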

  15. Vocabularies
    Used to standardize information in a field: requires
    community input

  16. Open Geospatial Consortium
    Not biodiversity oriented, but makes data widely
    compatible - www.opengeospatial.org/standards

  17. Licenses
    Don’t create your own license!

  18. Creative Commons
    Standardized licenses to grant or clarify copyright
    permissions for creative works -
    creativecommons.org

  19. CC0 for scientific data
    Creative Commons Zero is the most appropriate
    license for scientific (biodiversity) data

  20. Don’t use a license to get credit
    Getting credit for your data is a community and
    technical issue

  21. Publishing biodiversity data

  22. Find a repository
    To archive your data,
    following the FAIR principles

  24. Zenodo
    Generic research repository: free, easy to use, close
    to unlimited size, has API, managed by CERN

  25. FAIRsharing
    To find mostly domain-specific repositories
    (“databases”) and standards - fairsharing.org

  26. Integrated Publishing Toolkit (IPT)
    Easiest and most interoperable way to publish
    species occurrences and checklists

  27. Publishing data
    to the largest
    biodiversity information infrastructure

  28. Global Biodiversity Information Facility
    GBIF - www.gbif.org

  29. 1.3 billion occurrence records
    Species recorded at a specific place and time

  30. Occurrence data
    Human observations: citizen science, monitoring
    Machine observations: GPS tracking, camera traps
    Specimens: preserved, fossil or living collections
    Sampling events: sample with associated measurements

  31. … and species data
    Taxonomic checklist: synonymy, classification
    Regional checklist: species distribution
    Thematic checklist: species properties (e.g. invasive)

  32. How to publish data to GBIF
    Request endorsement to become a data publisher
    Standardize your data into Darwin Core
    Document your data with standardized metadata
    Choose a license: CC0, CC-BY, CC-BY-NC
    Register your dataset to make it discoverable
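
The standardization and license steps on this slide can be sketched as a small mapping function. This is only an illustration under stated assumptions: the source column names (`species`, `date`, `lat`, `lon`), the dataset identifier and the function name `to_darwin_core` are hypothetical, and `basisOfRecord` is hard-coded to `HumanObservation` purely for the example. The Darwin Core term names and the Creative Commons license URLs are real.

```python
# Hypothetical local column names mapped to Darwin Core terms.
COLUMN_TO_TERM = {
    "species": "scientificName",
    "date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}

# The three license choices named on the slide, as Creative Commons URLs.
LICENSES = {
    "CC0": "http://creativecommons.org/publicdomain/zero/1.0/legalcode",
    "CC-BY": "http://creativecommons.org/licenses/by/4.0/legalcode",
    "CC-BY-NC": "http://creativecommons.org/licenses/by-nc/4.0/legalcode",
}


def to_darwin_core(row, dataset_id, row_number, license_code="CC0"):
    """Rename local columns to Darwin Core terms and add required fields."""
    if license_code not in LICENSES:
        raise ValueError(f"Unsupported license: {license_code}")
    record = {term: row[column] for column, term in COLUMN_TO_TERM.items()}
    # A stable identifier per record; this scheme is only an example.
    record["occurrenceID"] = f"{dataset_id}:{row_number}"
    # Assumption for the example: every row is a human observation.
    record["basisOfRecord"] = "HumanObservation"
    record["license"] = LICENSES[license_code]
    return record


# Invented input row, not real data.
row = {"species": "Vespa velutina", "date": "2019-05-14",
       "lat": "60.17", "lon": "24.94"}
record = to_darwin_core(row, dataset_id="example-dataset", row_number=1)
print(record)
```

Rejecting anything outside the three listed licenses mirrors the earlier advice not to create your own license.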

  33. How to publish data to GBIF
    Make use of the Integrated Publishing Toolkit (IPT).
    Ask your national node about existing data hosting centres.

  34. How to publish data to GBIF
    Make use of one of the platforms that already publish
    data to GBIF

  35. GBIF services
    What you get in return

  36. Discoverability
    Registered datasets get a DOI, are findable through
    GBIF website and API, and citations are tracked

  38. Data search
    Darwin Core standardization allows cross-dataset
    search through website and API
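
As a hedged sketch of what a cross-dataset search through the API can look like, the code below builds a query URL for the GBIF occurrence search endpoint and, separately, fetches the total record count. The function names are made up for this example. The endpoint and the `scientificName`, `country` and `limit` parameters are assumed from the public GBIF API v1 documentation. `count_occurrences` needs network access and is not called here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GBIF_API = "https://api.gbif.org/v1"


def occurrence_search_url(scientific_name, country=None, limit=20):
    """Build an occurrence search URL that spans all published datasets."""
    params = {"scientificName": scientific_name, "limit": limit}
    if country:
        # Two-letter ISO country code, e.g. "BE" for Belgium.
        params["country"] = country
    return f"{GBIF_API}/occurrence/search?{urlencode(params)}"


def count_occurrences(url):
    """Fetch the query (needs network access) and return the total count."""
    with urlopen(url) as response:
        return json.load(response)["count"]


url = occurrence_search_url("Vespa velutina", country="BE", limit=5)
print(url)
```

Because every dataset uses the same Darwin Core terms, one query like this can cover records from many publishers at once.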

  39. Backbone taxonomy
    All data gets matched to a backbone taxonomy →
    unique ID, higher classification, synonymy resolution
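
The matching step can be illustrated with the GBIF species match endpoint. The sketch below builds the request URL and reduces a response to the three things named on the slide: an ID, the higher classification, and synonym resolution. The helper names are hypothetical, the response field names (`usageKey`, `acceptedUsageKey`, `status`, `matchType`, rank names) are assumed from the GBIF API v1 documentation, and the sample response uses invented keys rather than real backbone identifiers.

```python
from urllib.parse import urlencode

GBIF_API = "https://api.gbif.org/v1"
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def species_match_url(name):
    """Build a URL that matches a name against the backbone taxonomy."""
    return f"{GBIF_API}/species/match?{urlencode({'name': name})}"


def summarize_match(match):
    """Reduce a match response to key, synonym flag and classification."""
    if match.get("matchType", "NONE") == "NONE":
        return None  # the name could not be matched
    return {
        # For synonyms, prefer the key of the accepted taxon.
        "taxon_key": match.get("acceptedUsageKey", match["usageKey"]),
        "is_synonym": match.get("status") == "SYNONYM",
        "classification": [match[rank] for rank in RANKS if rank in match],
    }


# Invented response for illustration; the keys 111 and 222 are not real.
sample = {
    "usageKey": 222,
    "acceptedUsageKey": 111,
    "status": "SYNONYM",
    "matchType": "EXACT",
    "kingdom": "Animalia",
    "family": "Vespidae",
}
summary = summarize_match(sample)
print(species_match_url("Vespa velutina"))
print(summary)
```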

  40. Reproducible downloads
    Any query can be downloaded and gets a DOI, and
    there are clear citation guidelines

  41. Community
    Distributed, active community supporting data
    publishers and users

  42. Infrastructure that can be built upon
    E.g. Tracking Invasive Alien Species (TrIAS) uses
    GBIF as a starting point - doi.org/10.15468/xoidmd

  43. You’re not alone
    A lot already exists. Please don’t reinvent the
    wheel if you don’t have to.

  44. Thank you!
    @peterdesmet
    Desmet P (2019) How to manage and publish
    biodiversity data http://bit.ly/biodivscen-talk
