Peter Desmet
November 18, 2021

Camtrap DP: Using Frictionless Standards for a camera trapping data exchange format

Talk at the Frictionless community call - November 18, 2021.

Recording: https://youtu.be/Pi_kbQ_KYiM


Transcript

  1. Camtrap DP: Using Frictionless Standards for a camera trapping data exchange format
      Frictionless community call, 18 November 2021
      Peter Desmet (ORCID 0000-0002-8442-8025)
  2. Hi 👋 (oscibio.inbo.be)
      - Open science lab for biodiversity
      - Based at INBO in Belgium
      - We support researchers:
        - Open data publication
        - Research software development
  3. Biodiversity information (inaturalist.org)
      - Information about the occurrence of species:
        - What
        - When
        - Where
      - Sources:
        - Citizen scientists
        - Collections
        - Monitoring programmes
        - Sensors
  4. GBIF (gbif.org)
      - Global Biodiversity Information Facility (GBIF)
      - Open access to biodiversity data
      - International
      - Close to 2 billion records
      - Common data format
      - Cross-dataset search and API
      - Creative Commons licensed
      - Citation tracking
  5. Darwin Core (dwc.tdwg.org)
      - Maintained by Biodiversity Information Standards (TDWG)
      - Common set of terms to facilitate the exchange of biodiversity information
      - Darwin Core Archive: package format for biodiversity information
  6. Camera trapping
      - Wildlife monitoring technique
      - Non-invasive
      - Well-established
      - Enables the study of animal abundance, distribution and behaviour
      - Data-intensive: lots of images or videos
  7. Data are well-managed, not shared
      - Data management platforms (Agouti, Wildlife Insights, TRAPPER, eMammal):
        - Upload and manage data
        - Annotate with species identifications (often using image recognition)
      - Limited data exchange between platforms
      - Limited data publication from platforms
  8. Darwin Core (Archive)?
      - Platforms: Agouti, Wildlife Insights, TRAPPER, eMammal
      - Does not capture the full scope:
        - Project setup
        - Camera setup
        - Blank, vehicle and unknown sequences of images
      - Star schema too limited to capture all relationships
      - Camera trap researchers do not recognize the data model
  9. Camtrap DP
      - “Camera Trap Data Package”
      - Designed to capture all essential data and metadata of a single camera trap study
      - Model to exchange camera trapping data
      - Format to exchange camera trapping data
  10. Camtrap DP model
      - Project / Study: metadata about the project
      - Deployments: start/end date, location, camera info
      - Media: file path/URL, timestamp, sequence
      - Observations: blank, or an animal of a certain species, with count, sex, ...
      - Figure: img 1 grey heron, img 2 grey heron, img 3 blank; seq 1 moorhen, seq 1 coot (gbif.org/occurrence/3045046810, gbif.org/occurrence/3045043163)
  11. Camtrap DP format
      - Metadata as datapackage.json:
        - Project metadata
        - Package structure
      - Deployments as CSV (deployments.csv)
      - Media as CSV (media.csv)
      - Observations as CSV (observations.csv)
      - Diagram: the three CSV files are linked through deploymentID, mediaID and sequenceID
  12. Using Frictionless Standards (specs.frictionlessdata.io)
      - Developed by Frictionless Data
      - Set of open specifications (JSON Schemas) that can be combined:
        - Data Package for datasets
        - Data Resource for data files
        - Table Schema for table fields
      - Simple, machine-usable and extensible
  13. Custom Data Package profile
      - References and extends Data Package
      - Additional requirements for existing properties:
        - Requires contributors and created
        - Requires 3 specific resources
      - New properties:
        - Organization
        - Project
        - Spatial, temporal and taxonomic scope
  14. Three Table Schemas
      - One for each of the 3 resources
      - Relationships between resources
      - Fields, each with:
        - Name
        - Definition
        - Type + format
        - Required
        - Controlled vocabularies (as enum)
        - Related definition
  15. How should datasets reference Camtrap DP?
      - Link to Camtrap DP profile
        - 👍 Includes version
        - 👍 Validates against Camtrap DP
      - Verbose inclusion of Table Schemas
        - 👍 Allows omitting and reordering fields
        - ❌ Does not validate against Camtrap DP!

      ```json
      {
        "id": "https://doi.org/10.5281/zenodo.4893244",
        "profile": "https://raw.githubusercontent.com/tdwg/camtrap-dp/0.1.6/camtrap-dp-profile.json",
        "contributors": {},
        "project": {},
        "spatial": {},
        "temporal": {},
        "taxonomic": [],
        "resources": [
          {
            "name": "deployments",
            "path": "deployments.csv",
            "profile": "tabular-data-resource",
            "schema": {
              "title": "Deployments",
              "description": "Table with camera trap deployments. Includes `deployment_id`, start, end, ...",
              "fields": [
                {
                  "name": "deployment_id",
                  "type": "string",
                  "format": "default",
                  "description": "Unique identifier (within a project) of the deployment.",
                  "example": "dep1",
                  "constraints": {
                    "required": true,
                    "unique": true
                  }
                },
                ...
              ]
            }
          },
          ...
        ]
      }
      ```
  16. How should datasets reference Camtrap DP?
      - Link to Camtrap DP profile
        - 👍 Includes version
        - 👍 Validates against Camtrap DP
      - Link to Table Schemas
        - 👍 Validates against Camtrap DP
        - ❌ Requires all fields to be included in the CSV
        - 😕 Publisher needs to reference 4 schemas

      ```json
      {
        "id": "https://doi.org/10.5281/zenodo.4893244",
        "profile": "https://raw.githubusercontent.com/tdwg/camtrap-dp/0.1.6/camtrap-dp-profile.json",
        "contributors": {},
        "project": {},
        "spatial": {},
        "temporal": {},
        "taxonomic": [],
        "resources": [
          {
            "name": "deployments",
            "path": "deployments.csv",
            "profile": "tabular-data-resource",
            "schema": "https://raw.githubusercontent.com/tdwg/camtrap-dp/0.1.6/deployments-table-schema.json"
          },
          {
            "name": "media",
            "path": "media.csv",
            "profile": "tabular-data-resource",
            "schema": "https://raw.githubusercontent.com/tdwg/camtrap-dp/0.1.6/media-table-schema.json"
          },
          {
            "name": "observations",
            "path": "observations.csv",
            "profile": "tabular-data-resource",
            "schema": "https://raw.githubusercontent.com/tdwg/camtrap-dp/0.1.6/observations-table-schema.json"
          }
        ]
      }
      ```
  17. Published dataset (doi.org/10.5281/zenodo.4893244)
      - First Camtrap DP dataset published on Zenodo
      - datapackage.json:
        - Not zipped with the data: this file can be downloaded without downloading the data
        - Links to Camtrap DP: no verbose schemas
  18. Camtrap DP development (github.com/tdwg/camtrap-dp)
      - Open and versioned on GitHub
      - Hosts (versions of) the Camtrap DP profile and Table Schemas
      - Includes an example dataset for automated testing
      - Collaborative:
        - Community interacts through issues
        - Any change requires Pull Request review
  19. Website (tdwg.github.io/camtrap-dp)
      - Camtrap DP profile and Table Schemas as human-readable documentation
      - Generated automatically from source files:
        - Petridish Jekyll theme
        - Specific Jekyll layouts
      - Hosted using GitHub Pages
  20. Website (ices-tools-dev.github.io/esas)
      - Camtrap DP profile and Table Schemas as human-readable documentation
      - Generated automatically from source files:
        - Petridish Jekyll theme
        - Specific Jekyll layouts
      - Hosted using GitHub Pages
  21. Software (github.com/frictionlessdata/frictionless-py)
      - Python: frictionless-py to validate 👌
        - Metadata
        - Structure
        - Fields
        - Controlled vocabularies
        - Relationships
      - R: more commonly used in biodiversity science

      ```shell
      frictionless validate datapackage.json
      # -----
      # valid: deployments.csv
      # -----
      # -----
      # valid: media.csv
      # -----
      # -----
      # valid: observations.csv
      # -----
      ```
  22. datapackage R package (github.com/inbo/datapackage)
      - Not {datapackage.r}
      - Plays nicely with the tidyverse
      - Read functionality:
        - read_package(): load the package metadata (datapackage.json)
        - read_resource(): load a resource into a data frame
      - Write functionality: to be developed

      ```r
      # devtools::install_github("inbo/datapackage")
      library(datapackage)
      pkg <- read_package(
        "https://zenodo.org/record/4893244/files/datapackage.json"
      )
      #> Please make sure you have the right to access data from this Data Package for your proposed use.
      #> Follow applicable norms or requirements to credit the dataset and its authors.
      #> For more information, see https://doi.org/10.5281/zenodo.4893244
      pkg$resource_names
      #> [1] "deployments" "multimedia" "observations"
      read_resource(pkg, "deployments")
      #> # A tibble: 505 × 16
      #>    deploymentID          locationID         locationName
      #>    <chr>                 <chr>              <chr>
      #>  1 005eaf17-3197-425a-b… 81e247d0-edc9-452… B_ML_val 04_Roes…
      #>  2 00a2c20d-f038-490c-9… e254a13c-26e8-483… B_HS_val 2_proce…
      #>  3 00b0ecf3-a098-4e91-9… 9541cd66-93ee-42e… B_DM_val 2_Aloam
      #>  4 00ce371b-a2b5-4712-b… a934bb70-90d5-440… B_HS_val 6_keers…
      #>  5 0162ecfb-dc2a-4bc3-a… 91d9abdd-da56-49a… B_ML_val 05_mole…
      #>  6 01d9f82e-b1e4-4d95-8… ce943ced-1bcf-414… B_DM_val 4_'t WAD
      #>  7 01dd1933-9738-4859-9… 9541cd66-93ee-42e… B_DM_val 2_Aloam
      #>  8 01e48853-9ece-4d19-a… 2b477cf0-513d-4bb… B_ML_val 10_Sint…
      #>  9 01e889b8-a8ae-4f84-8… d2f5034c-3699-4e6… D_MICA 328
      #> 10 0432546b-9c28-495f-b… 56128022-6061-4b5… B_ML_val 06_Oost…
      #> # … with 495 more rows, and 13 more variables: longitude <dbl>,
      #> #   latitude <dbl>, start <dttm>, end <dttm>, setupBy <chr>,
      #> #   cameraID <chr>, cameraModel <chr>, cameraInterval <dbl>,
      #> #   cameraHeight <dbl>, baitUse <fct>, featureType <fct>,
      #> #   tags <chr>, comments <chr>
      ```
  23. Frictionless Standards for other biodiversity data (zenodo.org/search?q=keywords:%22frictionless%20data%22)
      - GPS tracking data for birds: github.com/inbo/bird-tracking
      - Acoustic telemetry data for fish: github.com/inbo/etn-occurrences
      - Biological signals in weather radar data: enram.github.io/vpts-dp
  24. Transformation to Darwin Core (doi.org/10.15468/5tb6ze, bit.ly/dwc-for-biologging-camtrap-dp)
      - Frictionless biodiversity data are currently not harvested by GBIF
      - Lossy Darwin Core transformation:
        - Technically possible
        - Requires community review
        - Requires (R) software to do it
  25. Software
      - Support Camtrap DP as an export format in data management platforms
      - Expand the datapackage R package
  26. Suggestions to better support community formats (github.com/frictionlessdata/project/discussions/636)
      - Allow omitting fields in data (enable schema_sync by default)
      - Better documentation on how to extend Data Package
      - Support referencing a single endpoint for a community format
      - Support linking to external vocabularies (in enum)
  27. Camtrap DP summary
      - Captures essential data and metadata of a camera trap study
      - Data exchange model and format
      - Uses Frictionless Standards:
        - Data Package (extended)
        - Data Resource
        - Table Schema (extended)
      - Open, versioned and collaborative
      - Suggestions for Frictionless Standards
  28. Thank you (tdwg.github.io/camtrap-dp)
      Desmet P (2021) Camtrap DP: Using Frictionless Standards for a camera trapping data exchange format. Presentation at the Frictionless community call. https://bit.ly/camtrap-dp-frictionless-2021
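Slides 10 and 11 describe three tables that are linked by identifiers. The Python sketch below (standard library only) illustrates what checking those links involves: every key referenced in one table must exist in the table it points to. The column names and the exact set of relationships are assumptions based on the identifiers shown on slide 11 (deploymentID, sequenceID, mediaID). The authoritative definitions are the Camtrap DP Table Schemas, and frictionless-py does not implement its checks this way.

```python
import csv
import io

# Hypothetical, heavily trimmed table contents; real Camtrap DP tables have many more fields.
DEPLOYMENTS = """deploymentID,start,end
dep1,2021-03-01T00:00:00Z,2021-03-15T00:00:00Z
"""
MEDIA = """mediaID,deploymentID,sequenceID,timestamp,filePath
img1,dep1,seq1,2021-03-02T10:00:00Z,media/img1.jpg
img2,dep1,seq1,2021-03-02T10:00:01Z,media/img2.jpg
"""
OBSERVATIONS = """observationID,deploymentID,sequenceID,observationType,scientificName,count
obs1,dep1,seq1,animal,Ardea cinerea,1
obs2,dep2,seq9,blank,,
"""

# Assumed relationships: (child table, child field, parent table, parent field).
RELATIONSHIPS = [
    ("media", "deploymentID", "deployments", "deploymentID"),
    ("observations", "deploymentID", "deployments", "deploymentID"),
    ("observations", "sequenceID", "media", "sequenceID"),
]


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def check_relationships(tables):
    """Return one message per key value that has no match in its parent table."""
    errors = []
    for child, child_field, parent, parent_field in RELATIONSHIPS:
        parent_keys = {row[parent_field] for row in tables[parent]}
        for row in tables[child]:
            value = row[child_field]
            # Empty values are treated as "no reference" and skipped.
            if value and value not in parent_keys:
                errors.append(
                    f"{child}.{child_field}={value!r} not found in {parent}.{parent_field}"
                )
    return errors


tables = {
    "deployments": read_rows(DEPLOYMENTS),
    "media": read_rows(MEDIA),
    "observations": read_rows(OBSERVATIONS),
}
for message in check_relationships(tables):
    print(message)
```

In this made-up data, observation obs2 points to a deployment and a sequence that do not exist, so it produces two messages.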
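Slide 13 lists the extra requirements that the Camtrap DP profile adds to a Data Package. The real profile is a JSON Schema. The hand-written check below only mimics the requirements named on the slide: contributors and created must be present, and three specific resources must be included. The resource names deployments, media and observations come from slide 16 and are an assumption for any particular Camtrap DP version.

```python
REQUIRED_PROPERTIES = ["contributors", "created"]
# Assumed resource names (taken from slide 16); check the profile version you target.
REQUIRED_RESOURCES = {"deployments", "media", "observations"}


def check_profile(descriptor):
    """Return messages for profile requirements the descriptor does not meet."""
    errors = [
        f"missing property: {name}"
        for name in REQUIRED_PROPERTIES
        if name not in descriptor
    ]
    present = {resource.get("name") for resource in descriptor.get("resources", [])}
    for name in sorted(REQUIRED_RESOURCES - present):
        errors.append(f"missing resource: {name}")
    return errors


descriptor = {
    "contributors": [{"title": "Example Contributor"}],  # hypothetical contributor
    "resources": [{"name": "deployments"}, {"name": "media"}],
}
print(check_profile(descriptor))
```

A JSON Schema validator applied to the published profile would report these issues in its own format. The sketch only shows which conditions the profile adds on top of a plain Data Package.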
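Slide 14 lists what each Table Schema field carries: a type, a required flag and, for controlled vocabularies, an enum. Below is a minimal sketch of how those constraints could be applied to a single CSV row. The field names and the observationType vocabulary are illustrative assumptions, not the published Camtrap DP definitions. Only a small subset of Table Schema types is handled.

```python
# Illustrative field definitions in Table Schema style; names and vocabulary are assumptions.
FIELDS = [
    {"name": "observationID", "type": "string", "constraints": {"required": True}},
    {
        "name": "observationType",
        "type": "string",
        "constraints": {
            "required": True,
            "enum": ["animal", "human", "vehicle", "blank", "unknown"],
        },
    },
    {"name": "count", "type": "integer"},
]


def check_value(field, raw):
    """Check one raw CSV string against a field's type and constraints."""
    constraints = field.get("constraints", {})
    # Table Schema treats "" as a missing value by default.
    if raw == "":
        return ["required value missing"] if constraints.get("required") else []
    errors = []
    if field["type"] == "integer":
        try:
            int(raw)
        except ValueError:
            errors.append(f"{raw!r} is not an integer")
    enum = constraints.get("enum")
    # Simplification: enum values are compared with the raw string, not a cast value.
    if enum is not None and raw not in enum:
        errors.append(f"{raw!r} not in {enum}")
    return errors


def check_row(fields, row):
    """Map each failing field name to its list of problems."""
    problems = {}
    for field in fields:
        errors = check_value(field, row.get(field["name"], ""))
        if errors:
            problems[field["name"]] = errors
    return problems


print(check_row(FIELDS, {"observationID": "obs1", "observationType": "bird", "count": "two"}))
```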
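Slide 16 shows the linked-schema approach: the profile and the three Table Schemas are referenced by URL, and all four URLs are pinned to the same version. The sketch below builds such a descriptor from a single version string, which makes explicit the 4 schemas a publisher has to reference. The URL pattern is copied from the slide. The helper name make_descriptor is hypothetical, and the metadata properties (contributors, project, spatial, temporal, taxonomic) are omitted for brevity.

```python
import json

# URL pattern copied from slide 16; only the version part varies.
BASE_URL = "https://raw.githubusercontent.com/tdwg/camtrap-dp/{version}/"


def make_descriptor(dataset_id, version="0.1.6"):
    """Build a minimal descriptor that links to the profile and Table Schemas by URL."""
    base = BASE_URL.format(version=version)
    resources = [
        {
            "name": name,
            "path": f"{name}.csv",
            "profile": "tabular-data-resource",
            "schema": f"{base}{name}-table-schema.json",
        }
        for name in ("deployments", "media", "observations")
    ]
    return {
        "id": dataset_id,
        "profile": f"{base}camtrap-dp-profile.json",
        "resources": resources,
    }


print(json.dumps(make_descriptor("https://doi.org/10.5281/zenodo.4893244"), indent=2))
```

Generating the URLs from one version string keeps the profile and the schemas consistent. The suggestion on slide 26 to support "a single endpoint for a community format" would remove the need for this step.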
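Slides 17 and 22 rely on the descriptor being readable on its own: the package metadata is loaded first, and a resource's data only when it is requested. The datapackage R package does this with read_package() and read_resource(). The Python sketch below imitates that two-step pattern with the standard library. The function names are hypothetical, and the sketch only handles a resource whose path is a single local file (Data Resource paths may also be URLs or lists of files).

```python
import csv
import json
import tempfile
from pathlib import Path


def load_descriptor(package_dir):
    """Read only datapackage.json; no data files are opened."""
    path = Path(package_dir) / "datapackage.json"
    return json.loads(path.read_text(encoding="utf-8"))


def resource_names(descriptor):
    """List the resource names declared in the descriptor."""
    return [resource["name"] for resource in descriptor["resources"]]


def read_resource_rows(package_dir, descriptor, name):
    """Load one resource on demand, assuming its path is a single local CSV file."""
    resource = next(r for r in descriptor["resources"] if r["name"] == name)
    with open(Path(package_dir) / resource["path"], newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Demo with a throwaway package directory and hypothetical content.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "datapackage.json").write_text(
        json.dumps({"resources": [{"name": "deployments", "path": "deployments.csv"}]}),
        encoding="utf-8",
    )
    Path(tmp, "deployments.csv").write_text(
        "deploymentID,latitude,longitude\ndep1,51.1,4.5\n", encoding="utf-8"
    )
    pkg = load_descriptor(tmp)
    names = resource_names(pkg)
    rows = read_resource_rows(tmp, pkg, "deployments")

print(names)  # ['deployments']
print(rows)   # [{'deploymentID': 'dep1', 'latitude': '51.1', 'longitude': '4.5'}]
```

Values come back as strings. Unlike read_resource() in R, this sketch does not use the Table Schema to convert types.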
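Slide 24 describes the transformation to Darwin Core as lossy. The hypothetical mapping below shows one reason why. Only observations of animals become Darwin Core occurrences, so blank or vehicle sequences, media and camera setup details are dropped. The Camtrap DP column names and the choice of Darwin Core terms are assumptions made for illustration. This is not the community-reviewed mapping referenced on the slide.

```python
def to_dwc_occurrences(observations, deployments):
    """Map animal observations to Darwin Core occurrence records (hypothetical mapping)."""
    deployments_by_id = {d["deploymentID"]: d for d in deployments}
    occurrences = []
    for obs in observations:
        # Lossy step: non-animal observations (blank, vehicle, ...) are not carried over.
        if obs.get("observationType") != "animal":
            continue
        deployment = deployments_by_id[obs["deploymentID"]]
        occurrences.append({
            "occurrenceID": obs["observationID"],
            "basisOfRecord": "MachineObservation",
            "eventDate": obs["timestamp"],
            "scientificName": obs["scientificName"],
            "individualCount": obs["count"],
            "sex": obs.get("sex", ""),
            # Location is taken from the deployment the observation belongs to.
            "decimalLatitude": deployment["latitude"],
            "decimalLongitude": deployment["longitude"],
        })
    return occurrences


# Hypothetical input rows.
deployments = [{"deploymentID": "dep1", "latitude": "51.1", "longitude": "4.5"}]
observations = [
    {"observationID": "obs1", "deploymentID": "dep1", "timestamp": "2021-03-02T10:00:00Z",
     "observationType": "animal", "scientificName": "Ardea cinerea", "count": "1", "sex": ""},
    {"observationID": "obs2", "deploymentID": "dep1", "timestamp": "2021-03-02T11:00:00Z",
     "observationType": "blank", "scientificName": "", "count": "", "sex": ""},
]
occurrences = to_dwc_occurrences(observations, deployments)
print(occurrences)
```

Only one of the two observations survives, which illustrates why the slide says the transformation needs community review before GBIF can rely on it.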
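Slide 26 suggests letting data omit fields by matching the CSV header to the schema by field name (the idea behind enabling schema_sync by default). The sketch below illustrates that matching in plain Python. It is not frictionless-py's implementation. It keeps the schema order for the columns that are present, reports required fields that are missing and flags columns the schema does not define. The schema fields are hypothetical.

```python
# Hypothetical schema fields (only names and the required flag matter here).
DEPLOYMENT_FIELDS = [
    {"name": "deploymentID", "constraints": {"required": True}},
    {"name": "start"},
    {"name": "end"},
    {"name": "comments"},
]


def sync_header(fields, header):
    """Match CSV columns to schema fields by name instead of by position."""
    names = [field["name"] for field in fields]
    # Schema order is kept, restricted to the fields that occur in the data.
    present = [name for name in names if name in header]
    missing_required = [
        field["name"]
        for field in fields
        if field.get("constraints", {}).get("required") and field["name"] not in header
    ]
    unknown = [column for column in header if column not in names]
    return present, missing_required, unknown


print(sync_header(DEPLOYMENT_FIELDS, ["end", "deploymentID", "start"]))
print(sync_header(DEPLOYMENT_FIELDS, ["start", "notes"]))
```

With name-based matching, the first header validates even though columns are reordered and the optional comments column is absent. Position-based matching against the full schema would reject it, which is the problem described on slide 16.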