Defining
Dataset specifications
to communicate data quality
Peter Desmet, Stijn Van Hoey, Dimitri Brosens
Slide 2
Slide 2 text
Darwin Core
offers a lot of (necessary) freedom
Slide 3
Slide 3 text
But how do you express more rigorous
requirements?
Slide 4
Slide 4 text
We need
documentation
Slide 5
Slide 5 text
No content
Slide 6
Slide 6 text
No content
Slide 7
Slide 7 text
Does my dataset
comply?
Slide 8
Slide 8 text
We need
machine-readable
documentation
Slide 9
Slide 9 text
YAML
Human & machine-readable
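As a rough illustration of what a machine-readable specification could look like once it is loaded, the sketch below uses a plain Python dict standing in for a parsed YAML file. The term names (`sex`, `individualCount`) are Darwin Core terms; the rule keywords (`allowed`, `required`, `min`) and the `validate_record` function are hypothetical and are not necessarily the syntax used by data-validator.

```python
# Hypothetical specification, shown as the dict a YAML parser might return.
# Equivalent YAML (illustrative only):
#   sex:
#     allowed: [male, female]
#   individualCount:
#     required: true
#     min: 1
SPEC = {
    "sex": {"allowed": ["male", "female"]},
    "individualCount": {"required": True, "min": 1},
}


def validate_record(record, spec):
    """Return a list of human-readable problems for one record (dict of strings)."""
    problems = []
    for term, rules in spec.items():
        value = record.get(term, "")
        if value == "":
            if rules.get("required"):
                problems.append(f"{term}: value is required")
            continue  # empty optional values are not checked further
        if "allowed" in rules and value not in rules["allowed"]:
            problems.append(f"{term}: '{value}' not in {rules['allowed']}")
        if "min" in rules:
            try:
                number = float(value)
            except ValueError:
                problems.append(f"{term}: '{value}' is not a number")
            else:
                if number < rules["min"]:
                    problems.append(f"{term}: {value} is below minimum {rules['min']}")
    return problems
```

The same rules stay readable for people (as YAML) and checkable by a program (as a dict), which is the point of the slide.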
Slide 10
Slide 10 text
No content
Slide 11
Slide 11 text
Demo
Slide 12
Slide 12 text
Dataset
Slide 13
Slide 13 text
Run data-validator
Slide 14
Slide 14 text
Report
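A report of this kind could, for example, group non-compliant values and list the rows where they occur. The sketch below is only an assumption about what such a report might contain: the rule (`ALLOWED_SEX`), the tiny dataset, and the `build_report` function are all made up for illustration and do not reproduce data-validator's actual output.

```python
import csv
import io
from collections import Counter

# Hypothetical rule: sex must be one of these values (not taken from the slides).
ALLOWED_SEX = {"male", "female"}

# Tiny made-up dataset, for illustration only.
DATASET = """occurrenceID,sex
1,male
2,F
3,female
4,F
5,unknown
"""


def build_report(csv_text):
    """Count each non-compliant value of `sex` and record the rows where it occurs."""
    counts = Counter()
    rows = {}
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, record in enumerate(reader, start=1):
        value = record["sex"]
        if value not in ALLOWED_SEX:
            counts[value] += 1
            rows.setdefault(value, []).append(row_number)
    return counts, rows


counts, rows = build_report(DATASET)
for value, count in counts.most_common():
    print(f"sex: '{value}' not allowed ({count} records, rows {rows[value]})")
```

Grouping by offending value rather than listing every failing row keeps the report short and points the publisher at the fix that resolves the most records.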
Slide 15
Slide 15 text
Improved dataset
Slide 16
Slide 16 text
Rerun data-validator
Slide 17
Slide 17 text
Specifications for
datasets
Slide 18
Slide 18 text
Specifications for
data publishers
Slide 19
Slide 19 text
Specifications for
data users
Slide 20
Slide 20 text
Specifications for
communities
Slide 21
Slide 21 text
Integration in
data publication workflows
Slide 22
Slide 22 text
No content
Slide 23
Slide 23 text
Proof of concept
github.com/inbo/data-validator
Examples used in this presentation: bit.ly/2h352c8
Slide 24
Slide 24 text
Thanks!
@peterdesmet
@stijnvanhoey
@dimibro
bit.ly/2h0cDLU
Desmet P, Van Hoey S & Brosens D (2016) Defining dataset specifications to
communicate data quality. http://bit.ly/2h0cDLU