Slide 1

Slide 1 text

How to manage and publish biodiversity data
BiodivScen Data Management Workshop
14 May 2019 - Helsinki
Peter Desmet

Slide 2

Slide 2 text

Open science lab for biodiversity
At the Research Institute for Nature and Forest (INBO) in Belgium. We offer technical support to researchers in the projects we collaborate on. Our support focuses mainly on open data publication and research software development.
@oscibio

Slide 3

Slide 3 text

Managing biodiversity data

Slide 4

Slide 4 text

You’re not alone
Managing data is hard, but a lot already exists. Please don’t reinvent the wheel if you don’t have to.

Slide 5

Slide 5 text

Platforms
Look for existing platforms before creating your own

Slide 6

Slide 6 text

Biological collection databases
Many systems exist to manage biological collection information - iDigBio.org

Slide 7

Slide 7 text

iNaturalist
Citizen science infrastructure for recording (photo) observations: smartphone app, species image recognition, community validation - inaturalist.org

Slide 8

Slide 8 text

iNaturalist
Customizable and open: create projects, add custom fields to observations, API, species image recognition

Slide 9

Slide 9 text

Movebank
Manage and analyse tracking data, with its own repository - www.movebank.org

Slide 10

Slide 10 text

No platform yet?
Developing and maintaining a platform is expensive: if you do build one, collaborate with others!

Slide 11

Slide 11 text

Data standards

Slide 12

Slide 12 text

Biodiversity Information Standards
TDWG - www.tdwg.org

Slide 13

Slide 13 text

Darwin Core
TDWG-maintained standard to express biodiversity information: glossary of terms - dwc.tdwg.org
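In practice, using the glossary often comes down to renaming your own columns to Darwin Core terms. A minimal sketch, assuming hypothetical local column names (`id`, `species`, `date`, ...); only the term names on the right-hand side come from Darwin Core:

```python
# Hypothetical local column names (left) mapped to Darwin Core terms (right).
COLUMN_TO_DWC = {
    "id": "occurrenceID",
    "species": "scientificName",
    "date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "observer": "recordedBy",
}


def to_darwin_core(record):
    """Rename mapped columns to Darwin Core terms and drop unmapped columns."""
    return {COLUMN_TO_DWC[key]: value
            for key, value in record.items()
            if key in COLUMN_TO_DWC}


# Illustrative record, not real data.
local_record = {
    "id": "obs-001",
    "species": "Vanessa atalanta",
    "date": "2019-05-14",
    "lat": 60.17,
    "lon": 24.94,
    "notes": "sunny",
}
dwc_record = to_darwin_core(local_record)
```

Unmapped columns such as `notes` are dropped here for simplicity; in a real mapping you would look for a suitable term rather than discard the information.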

Slide 14

Slide 14 text

Darwin Core Archive
Most popular way to package biodiversity data: data as CSV files (core + extensions), metadata as XML
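To make the packaging concrete, here is a rough sketch of a core-only archive built with the Python standard library: one CSV data file, a `meta.xml` descriptor that maps columns to Darwin Core terms, and a metadata file. The file names, the sample row and the `eml.xml` placeholder are assumptions for illustration; tools such as the IPT generate all of this for you.

```python
import csv
import io
import zipfile

DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"
# Column order of the core file; each name is a Darwin Core term.
FIELDS = ["occurrenceID", "basisOfRecord", "scientificName", "eventDate"]


def build_core_csv(rows):
    """Write the core data file as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(FIELDS)
    for row in rows:
        writer.writerow([row[name] for name in FIELDS])
    return buffer.getvalue()


def build_meta_xml():
    """Describe the core file: which column holds which Darwin Core term."""
    field_lines = "\n".join(
        f'    <field index="{i}" term="{DWC_TERMS}{name}"/>'
        for i, name in enumerate(FIELDS)
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">\n'
        '  <core encoding="UTF-8" fieldsTerminatedBy="," linesTerminatedBy="\\n"\n'
        '        fieldsEnclosedBy="&quot;" ignoreHeaderLines="1"\n'
        f'        rowType="{DWC_TERMS}Occurrence">\n'
        '    <files><location>occurrence.csv</location></files>\n'
        '    <id index="0"/>\n'
        f'{field_lines}\n'
        '  </core>\n'
        '</archive>\n'
    )


def build_archive(rows, eml_xml):
    """Zip data, descriptor and metadata into one archive, returned as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.csv", build_core_csv(rows))
        archive.writestr("meta.xml", build_meta_xml())
        archive.writestr("eml.xml", eml_xml)
    return buffer.getvalue()


# Illustrative row, not real data.
rows = [{
    "occurrenceID": "obs-001",
    "basisOfRecord": "HumanObservation",
    "scientificName": "Vanessa atalanta",
    "eventDate": "2019-05-14",
}]
# Placeholder only: a real archive needs metadata that follows the EML schema.
archive_bytes = build_archive(rows, "<eml>placeholder</eml>")
```

Extensions would be added as further CSV files, each with its own `<extension>` element in `meta.xml`; they are left out to keep the sketch short.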

Slide 15

Slide 15 text

Vocabularies
Used to standardize information in a field: requires community input
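As an illustration of what a vocabulary does for a single field, the sketch below standardizes free-text values for the Darwin Core term `basisOfRecord`. The controlled values are Darwin Core class names; the free-text variants in `SYNONYMS` are made up and would in reality be agreed on by the community that uses the vocabulary.

```python
# Controlled values for basisOfRecord (a subset, for illustration).
BASIS_OF_RECORD = {
    "HumanObservation",
    "MachineObservation",
    "PreservedSpecimen",
    "FossilSpecimen",
    "LivingSpecimen",
}

# Hypothetical free-text variants as they might appear in a source dataset.
SYNONYMS = {
    "observation": "HumanObservation",
    "field observation": "HumanObservation",
    "gps": "MachineObservation",
    "camera trap": "MachineObservation",
    "specimen": "PreservedSpecimen",
    "fossil": "FossilSpecimen",
}


def standardize_basis_of_record(value):
    """Return the controlled value, or None if the input is not recognized."""
    cleaned = value.strip()
    if cleaned in BASIS_OF_RECORD:
        return cleaned
    return SYNONYMS.get(cleaned.lower())
```

Returning `None` for unknown input, instead of guessing, keeps unrecognized values visible so they can be reviewed and added to the mapping.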

Slide 16

Slide 16 text

Open Geospatial Consortium
Not biodiversity oriented, but makes data widely compatible - www.opengeospatial.org/standards

Slide 17

Slide 17 text

Licenses
Don’t create your own license!

Slide 18

Slide 18 text

Creative Commons
Standardized licenses to grant or clarify copyright permissions for creative works - creativecommons.org

Slide 19

Slide 19 text

CC0 for scientific data
Creative Commons Zero is the most appropriate license for scientific (biodiversity) data

Slide 20

Slide 20 text

Don’t use a license to get credit
Getting credit for your data is a community and technical issue

Slide 21

Slide 21 text

Publishing biodiversity data

Slide 22

Slide 22 text

Find a repository
To archive your data, following the FAIR principles

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

Zenodo
Generic research repository: free, easy to use, close to unlimited size, has API, managed by CERN

Slide 25

Slide 25 text

FAIRsharing
A registry for finding (mostly domain-specific) repositories (“databases”) and standards - fairsharing.org

Slide 26

Slide 26 text

Integrated Publishing Toolkit (IPT)
Easiest and most interoperable way to publish species occurrences and checklists

Slide 27

Slide 27 text

Publishing data to the largest biodiversity information infrastructure

Slide 28

Slide 28 text

Global Biodiversity Information Facility
GBIF - www.gbif.org

Slide 29

Slide 29 text

1.3 billion occurrence records
Species recorded at a specific place and time

Slide 30

Slide 30 text

Occurrence data
Human observations: citizen science, monitoring
Machine observations: GPS tracking, camera traps
Specimens: preserved, fossil or living collections
Sampling events: sample with associated measurements

Slide 31

Slide 31 text

… and species data
Taxonomic checklist: synonymy, classification
Regional checklist: species distribution
Thematic checklist: species properties (e.g. invasive)

Slide 32

Slide 32 text

How to publish data to GBIF
Request endorsement to become a data publisher
Standardize your data into Darwin Core
Document your data with standardized metadata
Choose a license: CC0, CC-BY, CC-BY-NC
Register your dataset to make it discoverable
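Some of these steps can be checked automatically before you register a dataset. The sketch below is one possible pre-publication check, not a GBIF tool: the license options come from the slide, while the list of required terms and the function name are assumptions made for illustration.

```python
# License options named on the slide.
ALLOWED_LICENSES = {"CC0", "CC-BY", "CC-BY-NC"}
# Assumption: a minimal set of Darwin Core terms every occurrence should have.
REQUIRED_TERMS = ["occurrenceID", "basisOfRecord", "scientificName", "eventDate"]


def check_dataset(records, license_code, has_metadata):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    if license_code not in ALLOWED_LICENSES:
        problems.append(
            f"license {license_code!r} is not one of {sorted(ALLOWED_LICENSES)}"
        )
    if not has_metadata:
        problems.append("dataset metadata is missing")
    for number, record in enumerate(records, start=1):
        missing = [term for term in REQUIRED_TERMS if not record.get(term)]
        if missing:
            problems.append(f"record {number} lacks {', '.join(missing)}")
    return problems


# Illustrative records, not real data.
complete = {
    "occurrenceID": "obs-001",
    "basisOfRecord": "HumanObservation",
    "scientificName": "Vanessa atalanta",
    "eventDate": "2019-05-14",
}
incomplete = {"occurrenceID": "obs-002", "scientificName": "Vanessa atalanta"}
problems = check_dataset([complete, incomplete], "CC-BY-SA", has_metadata=True)
```

Endorsement and registration are organisational steps and are not covered by a check like this.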

Slide 33

Slide 33 text

How to publish data to GBIF
Make use of the Integrated Publishing Toolkit (IPT). Ask your national node about existing data hosting centres.

Slide 34

Slide 34 text

How to publish data to GBIF
Make use of one of the platforms that already publish data to GBIF

Slide 35

Slide 35 text

GBIF services
What you get in return

Slide 36

Slide 36 text

Discoverability
Registered datasets get a DOI, are findable through GBIF website and API, and citations are tracked

Slide 38

Slide 38 text

Data search
Darwin Core standardization allows cross-dataset search through website and API
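A cross-dataset search through the API could look roughly like the sketch below, which uses the public GBIF occurrence search endpoint with Darwin Core-style parameters. The helper names are hypothetical, the query values are only an example, and the response handling assumes the JSON contains `count` and `results` keys.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GBIF_API = "https://api.gbif.org/v1"


def occurrence_search_url(**params):
    """Build an occurrence search URL from query parameters."""
    return f"{GBIF_API}/occurrence/search?{urlencode(params)}"


def search_occurrences(**params):
    """Run the search (needs network access) and return the parsed JSON."""
    with urlopen(occurrence_search_url(**params), timeout=30) as response:
        return json.load(response)


url = occurrence_search_url(
    scientificName="Vanessa atalanta", country="FI", limit=5
)
```

Calling `search_occurrences(scientificName="Vanessa atalanta", country="FI", limit=5)` would then return matching records regardless of which dataset they were published in.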

Slide 39

Slide 39 text

Backbone taxonomy
All data gets matched to a backbone taxonomy → unique ID, higher classification, synonymy resolution
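The same matching is available for your own names through the GBIF species match endpoint. The sketch below builds the request URL and reduces a response to the three things listed above. It assumes the response carries `usageKey`, `synonym`, the higher ranks and, for synonyms, `acceptedUsageKey`; the example response is made up to show the shape and is not real GBIF output.

```python
from urllib.parse import urlencode

RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]


def species_match_url(name):
    """Build the URL that matches a scientific name to the backbone."""
    return "https://api.gbif.org/v1/species/match?" + urlencode({"name": name})


def summarize_match(match):
    """Extract unique ID, synonym flag and higher classification from a match."""
    return {
        # For synonyms, prefer the key of the accepted taxon.
        "taxonKey": match.get("acceptedUsageKey", match.get("usageKey")),
        "isSynonym": match.get("synonym", False),
        "classification": [match[rank] for rank in RANKS if rank in match],
    }


# Made-up response fragment with placeholder keys, for illustration only.
example_match = {
    "usageKey": 2,
    "acceptedUsageKey": 1,
    "synonym": True,
    "kingdom": "Animalia",
    "family": "Nymphalidae",
}
summary = summarize_match(example_match)
```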

Slide 40

Slide 40 text

Reproducible downloads
Any query can be downloaded, gets a DOI and there are clear citation guidelines
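Downloads can also be requested programmatically by posting a query description to the GBIF download endpoint with your GBIF account credentials. The sketch below only builds the request body; the user name, email address and taxon key are placeholders, and sending the request is left out.

```python
import json

# Endpoint that accepts the request body via an authenticated POST.
DOWNLOAD_ENDPOINT = "https://api.gbif.org/v1/occurrence/download/request"


def build_download_request(username, email, taxon_key, country):
    """Describe a download: all occurrences of one taxon in one country."""
    return {
        "creator": username,
        "notificationAddresses": [email],
        "sendNotification": True,
        "format": "SIMPLE_CSV",
        "predicate": {
            "type": "and",
            "predicates": [
                {"type": "equals", "key": "TAXON_KEY", "value": str(taxon_key)},
                {"type": "equals", "key": "COUNTRY", "value": country},
            ],
        },
    }


# Placeholder values: replace with your own account and a real taxon key.
request_body = json.dumps(
    build_download_request("my_gbif_user", "me@example.org", 1234567, "FI"),
    indent=2,
)
```

Keeping the request body in a script alongside the DOI of the resulting download documents exactly which query produced the data you cite.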

Slide 41

Slide 41 text

Community
Distributed, active community supporting data publishers and users

Slide 42

Slide 42 text

Infrastructure that can be built upon
E.g. Tracking Invasive Alien Species (TrIAS) uses GBIF as a starting point - doi.org/10.15468/xoidmd

Slide 43

Slide 43 text

You’re not alone
A lot already exists. Please don’t reinvent the wheel if you don’t have to.

Slide 44

Slide 44 text

Thank you!
@peterdesmet
Desmet P (2019) How to manage and publish biodiversity data. http://bit.ly/biodivscen-talk