Slide 1

Slide 1 text

Camtrap DP
Using Frictionless Standards for a camera trapping data exchange format
Frictionless community call, 18 November 2021
Peter Desmet (ORCID 0000-0002-8442-8025)

Slide 2

Slide 2 text

Introduction

Slide 3

Slide 3 text

Hi 👋

- Open science lab for biodiversity
- Based at INBO in Belgium
- We support researchers
  - Open data publication
  - Research software development

oscibio.inbo.be

Slide 4

Slide 4 text

Biodiversity information

- Information about the occurrence of species
  - What
  - When
  - Where
- Sources
  - Citizen scientists
  - Collections
  - Monitoring programmes
  - Sensors

inaturalist.org

Slide 5

Slide 5 text

GBIF

- Global Biodiversity Information Facility (GBIF)
- Open access to biodiversity data
- International
- Close to 2 billion records
- Common data format
- Cross-dataset search and API
- Creative Commons licensed
- Citation tracking

gbif.org

Slide 6

Slide 6 text

Darwin Core

- Maintained by Biodiversity Information Standards (TDWG)
- Common set of terms to facilitate exchange of biodiversity information
- Darwin Core Archive: package format for biodiversity information

dwc.tdwg.org

Slide 7

Slide 7 text

Camtrap DP

Slide 8

Slide 8 text

Camera trapping

- Wildlife monitoring technique
- Non-invasive
- Well-established
- Enables study of animal abundance, distribution, behaviour
- Data-intensive: lots of images or videos

Slide 9

Slide 9 text

Data are well-managed, not shared

- Data management platforms
  - Upload and manage data
  - Annotate with species identifications (often using image recognition)
- Limited data exchange between platforms
- Limited data publication from platforms

Platforms: Agouti, Wildlife Insights, TRAPPER, eMammal

Slide 10

Slide 10 text

Darwin Core (Archive)?

- Does not capture full scope
  - Project setup
  - Camera setup
  - Blank, vehicle, unknown sequences of images
- Star schema too limited to capture all relationships
- Camera trap researchers do not recognize data model

Platforms: Agouti, Wildlife Insights, TRAPPER, eMammal

Slide 11

Slide 11 text

Camtrap DP

- “Camera Trap Data Package”
- Designed to capture all essential data and metadata of a single camera trap study
- Model to exchange camera trapping data
- Format to exchange camera trapping data

Slide 12

Slide 12 text

Camtrap DP model

- Metadata about project
- Deployments: start/end date, location, camera info
- Media: file path/url, timestamp, sequence
- Observations: blank, or animal of certain species, count, sex, ...

Diagram: Project / Study; img 1: grey heron; img 2: grey heron; img 3: blank; seq 1: moorhen; seq 1: coot
gbif.org/occurrence/3045046810
gbif.org/occurrence/3045043163

Slide 13

Slide 13 text

Camtrap DP format

- Metadata as datapackage.json
  - Project metadata
  - Package structure
- Deployments as csv
- Media as csv
- Observations as csv

Diagram: datapackage.json, deployments.csv, media.csv and observations.csv, linked through deploymentID, sequenceID and mediaID
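The file layout on this slide can be sketched in Python as follows. This is a minimal illustration, not an official Camtrap DP writer: the sample rows, the `observationID` column and the assignment of linking columns to tables are assumptions made for the example.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical sample rows. Only the file names and the linking column names
# (deploymentID, sequenceID, mediaID) come from the slide; which table holds
# which key, and the observationID column, are assumptions.
TABLES = {
    "deployments": [{"deploymentID": "dep1"}],
    "media": [{"mediaID": "med1", "deploymentID": "dep1", "sequenceID": "seq1"}],
    "observations": [
        {"observationID": "obs1", "deploymentID": "dep1", "sequenceID": "seq1"}
    ],
}


def write_package(directory: Path) -> Path:
    """Write datapackage.json plus one CSV file per table into `directory`."""
    resources = []
    for name, rows in TABLES.items():
        csv_path = directory / f"{name}.csv"
        with csv_path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
        resources.append(
            {"name": name, "path": csv_path.name, "profile": "tabular-data-resource"}
        )
    descriptor_path = directory / "datapackage.json"
    descriptor_path.write_text(
        json.dumps({"resources": resources}, indent=2), encoding="utf-8"
    )
    return descriptor_path


with tempfile.TemporaryDirectory() as tmp:
    descriptor_file = write_package(Path(tmp))
    package = json.loads(descriptor_file.read_text(encoding="utf-8"))
    written_files = sorted(path.name for path in Path(tmp).iterdir())

print(written_files)
print([resource["path"] for resource in package["resources"]])
```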

Slide 14

Slide 14 text

Using Frictionless Standards

- Developed by Frictionless Data
- Set of open specifications (JSON schemas) that can be combined
  - Data Package for datasets
  - Data Resource for data files
  - Table Schema for table fields
- Simple, machine-usable & extensible

specs.frictionlessdata.io

Slide 15

Slide 15 text

Custom Data Package profile

- References and extends Data Package
- Additional requirements for existing properties
  - Require contributors, created
  - Require 3 specific resources
- New properties
  - Organization
  - Project
  - Spatial, temporal, taxonomic scope
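The additional requirements listed here can be illustrated with a small hand-written check. The actual profile is a JSON Schema; the sketch below only mimics two of its rules in plain Python, and the function name, the resource names taken from the later slides and the sample descriptor are assumptions for illustration.

```python
# Requirements named on the slide: `contributors` and `created` must be present,
# and three specific resources must be included.
REQUIRED_PROPERTIES = ("contributors", "created")
REQUIRED_RESOURCES = ("deployments", "media", "observations")


def missing_requirements(descriptor: dict) -> list:
    """Return human-readable problems; an empty list means all checks pass."""
    problems = [
        f"missing property: {name}"
        for name in REQUIRED_PROPERTIES
        if name not in descriptor
    ]
    present = {resource.get("name") for resource in descriptor.get("resources", [])}
    problems += [
        f"missing resource: {name}"
        for name in REQUIRED_RESOURCES
        if name not in present
    ]
    return problems


# Hypothetical, deliberately incomplete descriptor.
incomplete = {
    "contributors": [{"title": "Jane Doe"}],
    "resources": [{"name": "deployments"}, {"name": "media"}],
}
print(missing_requirements(incomplete))
```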

Slide 16

Slide 16 text

Three Table Schemas

- One for each of the 3 resources
- Relationships between resources
- Fields
  - Name
  - Definition
  - Type + format
  - Required
  - Controlled vocabularies (as enum)
  - Related definition
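As a rough illustration of how a Table Schema field expresses "required" and a controlled vocabulary, the sketch below checks values against a field's `required` and `enum` constraints. The field descriptor and its vocabulary values are hypothetical, not copied from the Camtrap DP schemas.

```python
# Hypothetical field descriptor; the vocabulary values are illustrative only.
SEX_FIELD = {
    "name": "sex",
    "type": "string",
    "constraints": {"required": False, "enum": ["female", "male", "unknown"]},
}


def check_value(field: dict, value: str) -> bool:
    """Return True if `value` satisfies the field's required and enum constraints."""
    constraints = field.get("constraints", {})
    if value == "":
        # An empty CSV cell counts as missing: allowed only if not required.
        return not constraints.get("required", False)
    allowed = constraints.get("enum")
    return allowed is None or value in allowed


print([check_value(SEX_FIELD, value) for value in ["female", "", "juvenile"]])
```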

Slide 17

Slide 17 text

How should datasets reference Camtrap DP?

- Link to Camtrap DP profile
  - 👍 Includes version
  - 👍 Validates against Camtrap DP
- Verbose inclusion of Table Schemas
  - 👍 Allows omitting & reordering fields
  - ❌ Does not validate against Camtrap DP!

{
  "id": "https://doi.org/10.5281/zenodo.4893244",
  "profile": "https://raw.githubusercontent.com/tdwg/camtrap-dp/0.1.6/camtrap-dp-profile.json",
  "contributors": {},
  "project": {},
  "spatial": {},
  "temporal": {},
  "taxonomic": [],
  "resources": [
    {
      "name": "deployments",
      "path": "deployments.csv",
      "profile": "tabular-data-resource",
      "schema": {
        "title": "Deployments",
        "description": "Table with camera trap deployments. Includes `deployment_id`, start, end, ...",
        "fields": [
          {
            "name": "deployment_id",
            "type": "string",
            "format": "default",
            "description": "Unique identifier (within a project) of the deployment.",
            "example": "dep1",
            "constraints": {
              "required": true,
              "unique": true
            }
          },
          ...
        ]
      }
    },
    ...
  ]
}

Slide 18

Slide 18 text

How should datasets reference Camtrap DP?

- Link to Camtrap DP profile
  - 👍 Includes version
  - 👍 Validates against Camtrap DP
- Link to Table Schemas
  - 👍 Validates against Camtrap DP
  - ❌ Requires all fields to be included in csv
  - 😕 Publisher needs to reference 4 schemas

{
  "id": "https://doi.org/10.5281/zenodo.4893244",
  "profile": "https://raw.githubusercontent.com/tdwg/camtrap-dp/0.1.6/camtrap-dp-profile.json",
  "contributors": {},
  "project": {},
  "spatial": {},
  "temporal": {},
  "taxonomic": [],
  "resources": [
    {
      "name": "deployments",
      "path": "deployments.csv",
      "profile": "tabular-data-resource",
      "schema": "https://raw.githubusercontent.com/tdwg/camtrap-dp/0.1.6/deployments-table-schema.json"
    },
    {
      "name": "media",
      "path": "media.csv",
      "profile": "tabular-data-resource",
      "schema": "https://raw.githubusercontent.com/tdwg/camtrap-dp/0.1.6/media-table-schema.json"
    },
    {
      "name": "observations",
      "path": "observations.csv",
      "profile": "tabular-data-resource",
      "schema": "https://raw.githubusercontent.com/tdwg/camtrap-dp/0.1.6/observations-table-schema.json"
    }
  ]
}
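The two referencing options differ only in what the `schema` property of a resource holds: an embedded object (verbose inclusion) or a string pointing to the hosted Table Schema (link). A minimal Python sketch to tell them apart, where the function name and the inline `media_id` field are hypothetical:

```python
def schema_reference_style(resource: dict) -> str:
    """Classify how a resource references its Table Schema."""
    schema = resource.get("schema")
    if isinstance(schema, str):
        return "link"  # URL or path to a hosted Table Schema
    if isinstance(schema, dict):
        return "inline"  # verbose inclusion of the Table Schema
    return "none"


descriptor = {
    "resources": [
        {
            "name": "deployments",
            "schema": "https://raw.githubusercontent.com/tdwg/camtrap-dp/0.1.6/deployments-table-schema.json",
        },
        {
            "name": "media",
            # Hypothetical inline schema with a single field.
            "schema": {"fields": [{"name": "media_id", "type": "string"}]},
        },
    ]
}

styles = {
    resource["name"]: schema_reference_style(resource)
    for resource in descriptor["resources"]
}
print(styles)
```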

Slide 19

Slide 19 text

Published dataset

- First Camtrap DP dataset published on Zenodo
- datapackage.json
  - Not zipped with data: allows downloading this file without downloading the data
  - Links to Camtrap DP: no verbose schemas

doi.org/10.5281/zenodo.4893244

Slide 20

Slide 20 text

Development

Slide 21

Slide 21 text

Camtrap DP development

- Open and versioned on GitHub
  - Hosts (versions) of Camtrap DP profile and Table Schemas
  - Includes example dataset for automated testing
- Collaborative
  - Community interacts through issues
  - Any change requires Pull Request review

github.com/tdwg/camtrap-dp

Slide 22

Slide 22 text

Website

- Camtrap DP profile and Table Schemas as human-readable documentation
- Generated automatically from source files
  - Petridish Jekyll theme
  - Specific Jekyll layouts
- Hosted using GitHub pages

tdwg.github.io/camtrap-dp

Slide 23

Slide 23 text

Website

- Camtrap DP profile and Table Schemas as human-readable documentation
- Generated automatically from source files:
  - Petridish Jekyll theme
  - Specific Jekyll layouts
- Hosted using GitHub pages

ices-tools-dev.github.io/esas

Slide 24

Slide 24 text

Software

- Python: frictionless-py to validate 👌
  - Metadata
  - Structure
  - Fields
  - Controlled vocabularies
  - Relationships
- R: more commonly used in biodiversity science

frictionless validate datapackage.json
# -----
# valid: deployments.csv
# -----

# -----
# valid: media.csv
# -----

# -----
# valid: observations.csv
# -----

github.com/frictionlessdata/frictionless-py
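To make the "Relationships" check concrete, the sketch below shows the idea behind a foreign key test between two tables. It is a standard-library illustration under assumed sample data, not how frictionless-py implements validation; the helper names and rows are hypothetical.

```python
import csv
import io

# Hypothetical sample data: the second media row points to a deployment
# that does not exist in the deployments table.
DEPLOYMENTS_CSV = "deploymentID\ndep1\ndep2\n"
MEDIA_CSV = "mediaID,deploymentID\nmed1,dep1\nmed2,dep3\n"


def column(text: str, name: str) -> list:
    """Return all values of column `name` from CSV text."""
    return [row[name] for row in csv.DictReader(io.StringIO(text))]


def orphaned_keys(child_values: list, parent_values: list) -> list:
    """Return child key values that have no match in the parent table."""
    parents = set(parent_values)
    return sorted({value for value in child_values if value not in parents})


orphans = orphaned_keys(
    column(MEDIA_CSV, "deploymentID"),
    column(DEPLOYMENTS_CSV, "deploymentID"),
)
print(orphans)
```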

Slide 25

Slide 25 text

datapackage R package

- Not {datapackage.r}
- Plays nice with Tidyverse
- Read functionality
  - read_package(): load profile
  - read_resource(): load into data frame
- Write functionality: to be developed

# devtools::install_github("inbo/datapackage")
library(datapackage)
pkg <- read_package(
  "https://zenodo.org/record/4893244/files/datapackage.json"
)
#> Please make sure you have the right to access data from this Data Package for your proposed use.
#> Follow applicable norms or requirements to credit the dataset and its authors.
#> For more information, see https://doi.org/10.5281/zenodo.4893244

pkg$resource_names
#> [1] "deployments" "multimedia" "observations"

read_resource(pkg, "deployments")
#> # A tibble: 505 × 16
#>    deploymentID          locationID         locationName
#>
#>  1 005eaf17-3197-425a-b… 81e247d0-edc9-452… B_ML_val 04_Roes…
#>  2 00a2c20d-f038-490c-9… e254a13c-26e8-483… B_HS_val 2_proce…
#>  3 00b0ecf3-a098-4e91-9… 9541cd66-93ee-42e… B_DM_val 2_Aloam
#>  4 00ce371b-a2b5-4712-b… a934bb70-90d5-440… B_HS_val 6_keers…
#>  5 0162ecfb-dc2a-4bc3-a… 91d9abdd-da56-49a… B_ML_val 05_mole…
#>  6 01d9f82e-b1e4-4d95-8… ce943ced-1bcf-414… B_DM_val 4_'t WAD
#>  7 01dd1933-9738-4859-9… 9541cd66-93ee-42e… B_DM_val 2_Aloam
#>  8 01e48853-9ece-4d19-a… 2b477cf0-513d-4bb… B_ML_val 10_Sint…
#>  9 01e889b8-a8ae-4f84-8… d2f5034c-3699-4e6… D_MICA 328
#> 10 0432546b-9c28-495f-b… 56128022-6061-4b5… B_ML_val 06_Oost…
#> # … with 495 more rows, and 13 more variables: longitude,
#> #   latitude, start, end, setupBy,
#> #   cameraID, cameraModel, cameraInterval,
#> #   cameraHeight, baitUse, featureType,
#> #   tags, comments

github.com/inbo/datapackage

Slide 26

Slide 26 text

Next steps

Slide 27

Slide 27 text

Frictionless Standards for other biodiversity data

- GPS tracking data for birds: github.com/inbo/bird-tracking
- Acoustic telemetry data for fish: github.com/inbo/etn-occurrences
- Biological signals in weather radar data: enram.github.io/vpts-dp

zenodo.org/search?q=keywords:%22frictionless data%22

Slide 28

Slide 28 text

Transformation to Darwin Core

- Frictionless Biodiversity data currently not harvested by GBIF
- Lossy Darwin Core transformation
  - Technically possible
  - Requires community review
  - Requires (R) software to do it

doi.org/10.15468/5tb6ze
bit.ly/dwc-for-biologging-camtrap-dp
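What "lossy" means here can be sketched as follows: fields with a Darwin Core counterpart are renamed, the rest are dropped. The mapping below is an assumption made for illustration and not the community-reviewed transformation; the source field names and the sample record are hypothetical, while the target names are existing Darwin Core terms.

```python
# Assumed mapping from Camtrap DP-style field names to Darwin Core terms.
# This is an illustration only, not an agreed or official mapping.
FIELD_TO_DWC = {
    "observationID": "occurrenceID",
    "scientificName": "scientificName",
    "count": "individualCount",
    "sex": "sex",
}


def to_darwin_core(observation: dict) -> tuple:
    """Return (Darwin Core record, sorted names of fields lost in the mapping)."""
    record = {
        FIELD_TO_DWC[name]: value
        for name, value in observation.items()
        if name in FIELD_TO_DWC
    }
    dropped = sorted(name for name in observation if name not in FIELD_TO_DWC)
    return record, dropped


# Hypothetical observation of a grey heron.
observation = {
    "observationID": "obs1",
    "scientificName": "Ardea cinerea",
    "count": "1",
    "sex": "unknown",
    "sequenceID": "seq1",
    "mediaID": "med1",
}
record, dropped = to_darwin_core(observation)
print(record)
print(dropped)
```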

Slide 29

Slide 29 text

Software

- Support Camtrap DP as export format in data management platforms
- Expand datapackage R package

Slide 30

Slide 30 text

Suggestions to better support community formats

- Allow omitting fields in data (enable schema_sync by default)
- Better documentation on how to extend Data Package
- Support referencing a single endpoint for community format
- Support linking to external vocab (in enum)

github.com/frictionlessdata/project/discussions/636
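The first suggestion can be illustrated with a sketch of what syncing a schema to the data means: fields are matched to the CSV header by name, so columns may be omitted or reordered. This is a simplified illustration and not frictionless-py's implementation; the field types, the sample data and the choice to treat unknown columns as type "any" are assumptions.

```python
import csv
import io

# Field names taken from the deployments table shown earlier; the types are
# assumptions for this example.
SCHEMA = {
    "deploymentID": "string",
    "locationName": "string",
    "longitude": "number",
    "latitude": "number",
}

# Hypothetical data file that omits two fields and reorders the others.
DATA = "latitude,deploymentID\n51.2,dep1\n"


def sync_schema(schema: dict, header: list) -> list:
    """Return (name, type) pairs in header order, matched to the schema by name."""
    return [(name, schema.get(name, "any")) for name in header]


header = next(csv.reader(io.StringIO(DATA)))
synced = sync_schema(SCHEMA, header)
print(synced)
```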

Slide 31

Slide 31 text

Camtrap DP summary

- Capture essential data and metadata of a camera trap study
- Data exchange model and format
- Uses Frictionless Standards
  - Data Package (extended)
  - Data Resource
  - Table Schema (extended)
- Open, versioned and collaborative
- Suggestions for Frictionless Standards

Slide 32

Slide 32 text

Thank you

tdwg.github.io/camtrap-dp

Desmet P (2021) Camtrap DP: Using Frictionless Standards for a camera trapping data exchange format. Presentation at the Frictionless community call. https://bit.ly/camtrap-dp-frictionless-2021