How to manage and
publish biodiversity data
BiodivScen Data Management Workshop
14 May 2019 - Helsinki
Peter Desmet
Slide 2
Slide 2 text
Open science lab for biodiversity
At the Research Institute for Nature and Forest
(INBO) in Belgium.
We offer technical support to researchers in the
projects we collaborate in.
Our support is mainly focused on open data
publication and research software development.
@oscibio
Slide 3
Slide 3 text
Managing biodiversity data
Slide 4
Slide 4 text
You’re not alone
Managing data is hard, but a lot already exists.
Please don’t reinvent the wheel if you don’t
have to.
Slide 5
Slide 5 text
Platforms
Look for existing platforms
before creating your own
Slide 6
Slide 6 text
Many systems exist to manage biological collection
information - iDigBio.org
Biological collection databases
Slide 7
Slide 7 text
Citizen science infrastructure for recording (photo)
observations: smartphone app, species image
recognition, community validation - inaturalist.org
iNaturalist
Slide 8
Slide 8 text
Customizable and open: create projects, add custom
fields to observations, API, species image recognition
iNaturalist
Slide 9
Slide 9 text
Manage and analyse tracking data, has own
repository - www.movebank.org
Movebank
Slide 10
Slide 10 text
It is expensive to develop and maintain one:
collaborate with others if you do so!
No platform yet?
Slide 11
Slide 11 text
Data standards
Slide 12
Slide 12 text
TDWG - www.tdwg.org
Biodiversity Information Standards
Slide 13
Slide 13 text
TDWG-maintained standard to express biodiversity
information: a glossary of terms - dwc.tdwg.org
Darwin Core
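As a minimal sketch of what using the Darwin Core glossary can look like in practice, the snippet below renames the columns of a hypothetical field sheet to Darwin Core terms. The source column names (`species`, `date`, `lat`, `lon`, `notes`) and the sample record are assumptions made up for illustration; the target names are terms listed at dwc.tdwg.org.

```python
# Hypothetical mapping from local column names to Darwin Core terms.
COLUMN_TO_DWC = {
    "species": "scientificName",
    "date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}


def to_darwin_core(record):
    """Rename the keys of one record to Darwin Core terms, dropping unmapped keys."""
    return {
        COLUMN_TO_DWC[key]: value
        for key, value in record.items()
        if key in COLUMN_TO_DWC
    }


# Made-up example record, not real data.
raw = {
    "species": "Vanessa atalanta",
    "date": "2019-05-14",
    "lat": 60.17,
    "lon": 24.94,
    "notes": "sunny",
}
print(to_darwin_core(raw))
```

Columns without a Darwin Core equivalent (here `notes`) are dropped in this sketch; in a real dataset they might instead be mapped to a more general term or kept in an extension.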
Slide 14
Slide 14 text
Most popular way to package biodiversity data: data
as CSV files (core + extensions), metadata as XML
Darwin Core Archive
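To make the packaging concrete, here is a rough sketch that builds a very small Darwin Core Archive in memory: one core CSV file plus the `meta.xml` descriptor that maps each column to a term. It is an illustration under assumptions, not a complete archive: the sample row is made up, there are no extensions, and the dataset metadata file (usually EML) is left out.

```python
import csv
import io
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"
FIELDS = ["occurrenceID", "basisOfRecord", "scientificName", "eventDate"]


def build_meta_xml(fields):
    """Describe occurrence.csv: which column holds which Darwin Core term."""
    field_lines = "\n".join(
        f'    <field index="{i}" term="{DWC}{name}"/>'
        for i, name in enumerate(fields)
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">\n'
        '  <core encoding="UTF-8" fieldsTerminatedBy="," linesTerminatedBy="\\n"\n'
        '        fieldsEnclosedBy="&quot;" ignoreHeaderLines="1"\n'
        f'        rowType="{DWC}Occurrence">\n'
        '    <files><location>occurrence.csv</location></files>\n'
        '    <id index="0"/>\n'
        f'{field_lines}\n'
        '  </core>\n'
        '</archive>\n'
    )


def build_archive(rows):
    """Return the bytes of a zip file holding occurrence.csv and meta.xml."""
    table = io.StringIO()
    writer = csv.writer(table, lineterminator="\n")
    writer.writerow(FIELDS)  # header row, skipped by readers via ignoreHeaderLines
    writer.writerows(rows)

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.csv", table.getvalue())
        archive.writestr("meta.xml", build_meta_xml(FIELDS))
    return buffer.getvalue()


# Made-up example row, not real data.
rows = [["obs-1", "HumanObservation", "Vanessa atalanta", "2019-05-14"]]
archive_bytes = build_archive(rows)
print(len(archive_bytes), "bytes")
```

In practice a tool such as the IPT generates these files for you; the sketch only shows what ends up inside the zip.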
Slide 15
Slide 15 text
Used to standardize information in a field: requires
community input
Vocabularies
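A small sketch of how a vocabulary can be used to check a field: the values found in a dataset are compared against an agreed list and anything else is reported. The vocabulary below is only a subset of the values Darwin Core recommends for `basisOfRecord`, and the observed values are made up for illustration.

```python
# Subset of the values Darwin Core recommends for basisOfRecord.
BASIS_OF_RECORD = {
    "HumanObservation",
    "MachineObservation",
    "PreservedSpecimen",
    "FossilSpecimen",
    "LivingSpecimen",
}


def invalid_values(values, vocabulary):
    """Return the distinct values that are not part of the vocabulary, sorted."""
    return sorted(set(values) - vocabulary)


# Made-up field values as they might appear in an uncleaned dataset.
observed = ["HumanObservation", "human observation", "PreservedSpecimen", "specimen"]
print(invalid_values(observed, BASIS_OF_RECORD))
```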
Slide 16
Slide 16 text
Not biodiversity oriented, but makes data widely
compatible - www.opengeospatial.org/standards
Open Geospatial Consortium
Slide 17
Slide 17 text
Licenses
Don’t create your own license!
Slide 18
Slide 18 text
Standardized licenses to grant or clarify copyright
permissions for creative works -
creativecommons.org
Creative Commons
Slide 19
Slide 19 text
Creative Commons Zero is the most appropriate
license for scientific (biodiversity) data
CC0 for scientific data
Slide 20
Slide 20 text
Getting credit for your data is a community and
technical issue
Don’t use a license to get credit
Slide 21
Slide 21 text
Publishing biodiversity data
Slide 22
Slide 22 text
Find a repository
To archive your data,
following the FAIR principles
Slide 24
Slide 24 text
Generic research repository: free, easy to use, close
to unlimited size, has API, managed by CERN
Zenodo
Slide 25
Slide 25 text
To find mostly domain-specific repositories
(“databases”) and standards - fairsharing.org
FAIRsharing
Slide 26
Slide 26 text
Easiest and most interoperable way to publish
species occurrences and checklists
Integrated Publishing Toolkit (IPT)
Slide 27
Slide 27 text
Publishing data
to the largest
biodiversity information infrastructure
Slide 28
Slide 28 text
Global Biodiversity Information Facility
GBIF - www.gbif.org
Slide 29
Slide 29 text
Species recorded at a specific place and time
1.3 billion occurrence records
Slide 30
Slide 30 text
Human observations: citizen science, monitoring
Machine observations: GPS tracking, camera traps
Specimens: preserved, fossil or living collections
Sampling events: sample with associated
measurements
Occurrence data
Slide 31
Slide 31 text
Taxonomic checklist: synonymy, classification
Regional checklist: species distribution
Thematic checklist: species properties (e.g.
invasive)
… and species data
Slide 32
Slide 32 text
Request endorsement to become a data publisher
Standardize your data into Darwin Core
Document your data with standardized metadata
Choose a license: CC0, CC-BY, CC-BY-NC
Register your dataset to make it discoverable
How to publish data to GBIF
Slide 33
Slide 33 text
Make use of the Integrated Publishing Toolkit (IPT).
Ask your national node about existing data hosting centres.
How to publish data to GBIF
Slide 34
Slide 34 text
Make use of one of the platforms that already publish
data to GBIF
How to publish data to GBIF
Slide 35
Slide 35 text
GBIF services
What you get in return
Slide 36
Slide 36 text
Registered datasets get a DOI, are findable through
GBIF website and API, and citations are tracked
Discoverability
Slide 38
Slide 38 text
Data search
Darwin Core standardization allows cross-dataset
search through website and API
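A minimal sketch of a cross-dataset search through the API: it only builds the request URL for the GBIF occurrence search endpoint and does not send it. The filter values (species, country, year) are arbitrary examples, and the helper name `occurrence_search_url` is made up for this illustration.

```python
from urllib.parse import urlencode

GBIF_API = "https://api.gbif.org/v1"


def occurrence_search_url(**filters):
    """Build an occurrence search URL from keyword filters."""
    return f"{GBIF_API}/occurrence/search?{urlencode(filters)}"


url = occurrence_search_url(
    scientificName="Vanessa atalanta",
    country="FI",
    year=2019,
    limit=5,
)
print(url)
```

Opening that URL (for example with `urllib.request.urlopen`) should return JSON containing a total `count` and a page of `results`, each described with Darwin Core terms regardless of the dataset it came from.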
Slide 39
Slide 39 text
Backbone taxonomy
All data gets matched to a backbone taxonomy →
unique ID, higher classification, synonymy resolution
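The sketch below illustrates synonymy resolution against the backbone. It builds a URL for the GBIF species match endpoint and picks the accepted taxon key from a response. The response dictionaries are hypothetical, trimmed examples with placeholder keys, and the field names (`usageKey`, `acceptedUsageKey`, `matchType`) are assumed from the GBIF API and worth checking against its documentation.

```python
from urllib.parse import urlencode


def species_match_url(name):
    """Build the URL that matches a scientific name to the backbone."""
    return "https://api.gbif.org/v1/species/match?" + urlencode({"name": name})


def accepted_key(match):
    """Return the key of the accepted taxon, or None if nothing matched."""
    if match.get("matchType") == "NONE":
        return None
    # Synonyms carry acceptedUsageKey; accepted names only carry usageKey.
    return match.get("acceptedUsageKey", match.get("usageKey"))


# Hypothetical, trimmed responses with placeholder keys (not real GBIF identifiers).
synonym = {"usageKey": 111, "acceptedUsageKey": 222, "status": "SYNONYM", "matchType": "EXACT"}
accepted = {"usageKey": 222, "status": "ACCEPTED", "matchType": "EXACT"}
unmatched = {"matchType": "NONE"}

print(species_match_url("Vanessa atalanta"))
print([accepted_key(m) for m in (synonym, accepted, unmatched)])
```

Records published under a synonym and under the accepted name end up with the same key, which is what makes cross-dataset queries per species possible.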
Slide 40
Slide 40 text
Reproducible downloads
Any query can be downloaded, gets a DOI and there
are clear citation guidelines
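As a rough sketch, a download can also be requested through the API by posting a query description. The snippet below only builds that JSON body and does not send it. The username, email address, and taxon key are placeholders, the helper name is made up, and the body layout follows the GBIF download API as I understand it, so treat it as an assumption to verify.

```python
import json


def download_request(creator, email, taxon_key, country):
    """Build the JSON body describing an occurrence download query."""
    return {
        "creator": creator,
        "notificationAddresses": [email],
        "sendNotification": True,
        "format": "SIMPLE_CSV",
        "predicate": {
            "type": "and",
            "predicates": [
                {"type": "equals", "key": "TAXON_KEY", "value": str(taxon_key)},
                {"type": "equals", "key": "COUNTRY", "value": country},
            ],
        },
    }


# Placeholder account details and taxon key, for illustration only.
body = download_request("your_gbif_username", "you@example.org", 12345, "FI")
print(json.dumps(body, indent=2))
```

The body would be sent with an authenticated POST to `https://api.gbif.org/v1/occurrence/download/request`. Because the query itself is stored with the download, the resulting DOI points to exactly the data that was used.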
Slide 41
Slide 41 text
Distributed, active community supporting data
publishers and users
Community
Slide 42
Slide 42 text
E.g. Tracking Invasive Alien Species (TrIAS) uses
GBIF as a starting point - doi.org/10.15468/xoidmd
Infrastructure that can be built upon
Slide 43
Slide 43 text
You’re not alone
A lot already exists. Please don’t reinvent the
wheel if you don’t have to.
Slide 44
Slide 44 text
Thank you!
@peterdesmet
Desmet P (2019) How to manage and publish
biodiversity data http://bit.ly/biodivscen-talk