Open Science workshop: Create and use GBIF occurrence cubes

Hands-on workshop to learn what GBIF species occurrence cubes are, how to download them and how to use them. This workshop and the species occurrence cubes are an output of the Horizon Europe project B-Cubed. License: CC-BY.

Damiano Oldoni

March 31, 2026

Transcript

  1. Biodiversity Building Blocks for policy

    Open Science workshop: Create and use GBIF occurrence cubes
    Damiano Oldoni, Ward Langeraert & Jasmijn Hillaert, INBO
    INBO/31-03-2026/Brussels
  2. Overview

    • What are species occurrence cubes?
    • GBIF SQL download API
    • Hands-on: download species occurrence cubes
      ◦ GBIF web interface
      ◦ rgbif
    • Break with 🎂
    • Hands-on: use species occurrence cubes
  3. Species Occurrence Cubes

    An occurrence cube is a tab-separated csv file containing species occurrence measures (e.g. a count) summarised by taxonomic, temporal and/or spatial dimensions (e.g. a given year, a specific taxonomic rank, etc.). The service was officially launched by GBIF in March 2025.
    • Aggregated GBIF occurrence data
    • You choose the grouping variables
    • Data are delivered as a GBIF Download
    • Same delivery method as for occurrences
    • Findable: DOI
    • Accessible: GBIF infrastructure
    • Interoperable: tab-separated csv file
    • Reproducible: metadata (with query) available
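
Because a cube is delivered as a plain tab-separated file, it can be read with any standard CSV parser. The following is a minimal sketch in Python, assuming the column names shown in the example table of this deck; the inline sample data and the `read_cube` helper are illustrative, not part of any GBIF tooling.

```python
import csv
import io

# Illustrative cube content; a real cube arrives as a tab-separated file
# inside a GBIF download. Column names follow the example table in this deck.
CUBE_TSV = (
    "year\teea_cell_code\tspeciesKey\tn\tmin_coord_uncertainty\n"
    "2014\t1kmE3886N3121\t2889173\t51\t10\n"
    "2014\t1kmE3886N3122\t2889173\t109\t10\n"
    "2018\t1kmE4047N3067\t2889088\t1\t2828\n"
)


def read_cube(handle):
    """Parse a tab-separated cube into a list of dicts with numeric measures."""
    rows = []
    for record in csv.DictReader(handle, delimiter="\t"):
        record["year"] = int(record["year"])
        record["n"] = int(record["n"])
        record["min_coord_uncertainty"] = float(record["min_coord_uncertainty"])
        rows.append(record)
    return rows


cube = read_cube(io.StringIO(CUBE_TSV))
total_occurrences = sum(row["n"] for row in cube)
```

For a file on disk, `read_cube(open(path, newline="", encoding="utf-8"))` would work the same way.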
  4. Species occurrence cube = aggregate GBIF occurrences

    A typical cube aggregates occurrences
    • taxonomically, e.g. species
    • spatially, e.g. EEA grid 1x1km
    • temporally, e.g. year
    Presented at TDWG2020 (see slides, abstract). Preprint (PDF) used in B-Cubed project proposal. Used for calculating emerging trends indicators.
    Oldoni D, Groom Q, Desmet P (2020) https://speakerdeck.com/damianooldoni/occurrence-cubes
  5. Aggregate

    Number of occurrences of a specific taxon in a specific cell and in a specific time interval

    year  eea_cell_code  speciesKey  n    min_coord_uncertainty
    2014  1kmE3886N3121  2889173     51   10
    2014  1kmE3886N3122  2889173     109  10
    ...   ...            ...         ...  ...
    2018  1kmE4047N3067  2889088     1    2828

    Derived from Oldoni D, Groom Q, Desmet P (2020) https://speakerdeck.com/damianooldoni/occurrence-cubes
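
The aggregation itself is a count per unique combination of dimensions. A minimal sketch, assuming each occurrence has already been reduced to a (year, cell, taxon) triple; the records below are invented for illustration.

```python
from collections import Counter

# Invented occurrence records, already reduced to the three cube dimensions:
# (year, eea_cell_code, speciesKey).
occurrences = [
    (2014, "1kmE3886N3121", 2889173),
    (2014, "1kmE3886N3121", 2889173),
    (2014, "1kmE3886N3122", 2889173),
    (2018, "1kmE4047N3067", 2889088),
]

# One cube row per unique (year, cell, taxon) combination;
# the count plays the role of the column n.
cube_counts = Counter(occurrences)
```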
  6. Aggregate

    Extend the cube. Examples:
    - extra measures: what’s the minimum coordinate uncertainty among all the occurrences for that specific year/cell/species?

    year  eea_cell_code  speciesKey  n    min_coord_uncertainty
    2014  1kmE3886N3121  2889173     51   10
    2014  1kmE3886N3122  2889173     109  10
    ...   ...            ...         ...  ...
    2018  1kmE4047N3067  2889088     1    2828

    Derived from Oldoni D, Groom Q, Desmet P (2020) https://speakerdeck.com/damianooldoni/occurrence-cubes
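
An extra measure is one more aggregate function over the same groups. The sketch below runs the pattern on an in-memory SQLite table rather than on GBIF; the table layout and values are assumptions made for illustration, and only the `COUNT`/`MIN`/`GROUP BY` pattern is the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-in for the occurrence table; cell codes are assumed precomputed.
conn.execute(
    "CREATE TABLE occurrence (year INTEGER, eea_cell_code TEXT, "
    "speciesKey INTEGER, coordinateUncertaintyInMeters REAL)"
)
conn.executemany(
    "INSERT INTO occurrence VALUES (?, ?, ?, ?)",
    [
        (2014, "1kmE3886N3121", 2889173, 10.0),
        (2014, "1kmE3886N3121", 2889173, 250.0),
        (2018, "1kmE4047N3067", 2889088, 2828.0),
    ],
)

# COUNT(*) gives n; MIN(...) adds the extra measure for each group.
rows = conn.execute(
    """
    SELECT year, eea_cell_code, speciesKey,
           COUNT(*) AS n,
           MIN(coordinateUncertaintyInMeters) AS min_coord_uncertainty
    FROM occurrence
    GROUP BY year, eea_cell_code, speciesKey
    ORDER BY year, eea_cell_code, speciesKey
    """
).fetchall()
```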
  7. Aggregate

    Extend the cube. Examples:
    - extra measures: what’s the number of occurrences for the same year/cell combination at class level?

    year  eea_cell_code  speciesKey  n    classKey  n_class
    2014  1kmE3886N3121  2889173     51   220       4890
    2014  1kmE3886N3122  2889173     109  220       2901
    ...   ...            ...         ...  ...       ...
    2018  1kmE4047N3067  2889088     1    220       510

    Derived from Oldoni D, Groom Q, Desmet P (2020) https://speakerdeck.com/damianooldoni/occurrence-cubes
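
The class-level count is the same tally taken over a coarser key (year, cell, class) and attached to every species-level row. A minimal sketch with invented records; this shows the idea only and is not how GBIF computes it.

```python
from collections import Counter

# Invented records: (year, eea_cell_code, speciesKey, classKey).
occurrences = [
    (2014, "1kmE3886N3121", 2889173, 220),
    (2014, "1kmE3886N3121", 2889173, 220),
    (2014, "1kmE3886N3121", 2889088, 220),
    (2014, "1kmE3886N3122", 2889173, 220),
]

# Species-level count: group by all four columns.
n_species = Counter(occurrences)
# Class-level count: same year and cell, but pooled over every species in the class.
n_class = Counter((year, cell, class_key) for year, cell, _, class_key in occurrences)

cube = [
    {
        "year": year,
        "eea_cell_code": cell,
        "speciesKey": species_key,
        "n": n,
        "classKey": class_key,
        "n_class": n_class[(year, cell, class_key)],
    }
    for (year, cell, species_key, class_key), n in sorted(n_species.items())
]
```

Comparing `n` with `n_class` gives a rough sense of how much of the recording effort in a cell went to the species of interest.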
  8. Aggregate

    Extend the cube. Examples:
    - extra grouping variable: group also by dataset

    year  eea_cell_code  speciesKey  datasetKey                            n
    2014  1kmE3886N3121  2889173     7f5e4129-0717-428e-876a-464fbd5d9a47  51
    2014  1kmE3886N3122  2889173     271c444f-f8d8-4986-b748-e7367755c0c1  109
    ...   ...            ...         ...                                   ...
    2018  1kmE4047N3067  2889088     7f5e4129-0717-428e-876a-464fbd5d9a47  1

    Derived from Oldoni D, Groom Q, Desmet P (2020) https://speakerdeck.com/damianooldoni/occurrence-cubes
  9. GBIF SQL download API

    The Occurrence SQL Download API allows users:
    • to query GBIF occurrences using SQL (Structured Query Language)
    • to select the columns of interest*
    • to generate summary views of GBIF data*
    *not possible with the “standard” Predicate Download API
    See GBIF documentation.
  10. GBIF SQL download API

    Select the columns of interest = return a flat occurrence table
    Let’s SELECT columns FROM occurrence WHERE conditions
    Let’s select the unique datasets and publishers with occurrences recorded in Belgium this year:
        SELECT DISTINCT datasetKey, publishingOrgKey
        FROM occurrence
        WHERE countryCode = 'BE' AND "year" = 2026
    Result: https://doi.org/10.15468/dl.3z4fr2
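
To try the `SELECT DISTINCT` pattern without submitting a GBIF download, the same query can be run against a small local table. This is a sketch on SQLite with invented dataset and publisher keys; the GBIF occurrence table has many more columns and its SQL dialect may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy occurrence table with only the four columns the query touches.
conn.execute(
    'CREATE TABLE occurrence (datasetKey TEXT, publishingOrgKey TEXT, '
    'countryCode TEXT, "year" INTEGER)'
)
conn.executemany(
    "INSERT INTO occurrence VALUES (?, ?, ?, ?)",
    [
        ("dataset-a", "org-1", "BE", 2026),
        ("dataset-a", "org-1", "BE", 2026),  # duplicate pair, collapsed by DISTINCT
        ("dataset-b", "org-1", "BE", 2026),
        ("dataset-c", "org-2", "NL", 2026),  # other country, filtered out
        ("dataset-a", "org-1", "BE", 2025),  # other year, filtered out
    ],
)

distinct_pairs = conn.execute(
    """
    SELECT DISTINCT datasetKey, publishingOrgKey
    FROM occurrence
    WHERE countryCode = 'BE' AND "year" = 2026
    ORDER BY datasetKey
    """
).fetchall()
```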
  11. GBIF SQL download API

    Select the columns of interest = return a flat occurrence table
    Let’s SELECT columns FROM occurrence WHERE conditions
    Real world example. Biologging data from a network:
    - All organisms for a single year
    - occurrenceID
    - organismID
    - Taxonomic information
    - Spatial information
    - eventID, parentEventID
    From workshop: Hip to be cubed: using the new GBIF SQL Download API (Part 1). Huybrechts P, Breugelmans L, Trekels M, Rodrigues A, Blissett M. https://bit.ly/4lSM75n
  12. GBIF SQL download API

        SELECT occurrenceid, organismid, scientificname, taxonkey, eventdate,
               decimallatitude, decimallongitude, eventid, parenteventid,
               datasetkey, publisher
        FROM occurrence
        WHERE GBIF_STRINGARRAYCONTAINS(occurrence.networkkey, 'ab013f3a-3c00-42cb-9fdb-cb5f4ba20a4b', FALSE)
          AND occurrence."year" = 2020
          AND occurrence.occurrencestatus = 'PRESENT'
          AND occurrence.basisofrecord = 'MACHINE_OBSERVATION'
    From workshop: Hip to be cubed: using the new GBIF SQL Download API (Part 1). Huybrechts P, Breugelmans L, Trekels M, Rodrigues A, Blissett M. https://bit.ly/4lSM75n
  13. GBIF SQL download API

    From workshop: Hip to be cubed: using the new GBIF SQL Download API (Part 1). Huybrechts P, Breugelmans L, Trekels M, Rodrigues A, Blissett M. https://bit.ly/4lSM75n
  14. GBIF SQL download API

    Select the columns of interest AND aggregate
    Let’s SELECT columns FROM occurrence WHERE conditions GROUP BY variables
    Let’s count the number of occurrences recorded this year in Belgium for each dataset and publisher:
        SELECT datasetKey, publishingOrgKey, COUNT(*)
        FROM occurrence
        WHERE countryCode = 'BE' AND "year" = 2026
        GROUP BY datasetKey, publishingOrgKey
    Result: https://doi.org/10.15468/dl.czvvdp
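
The step from a flat table to an aggregate is the `GROUP BY` clause: every selected column that is not an aggregate must also appear in it. A sketch on a toy SQLite table with invented keys, offered to show the pattern rather than the GBIF service itself; the alias `n` is added here for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE occurrence (datasetKey TEXT, publishingOrgKey TEXT, '
    'countryCode TEXT, "year" INTEGER)'
)
conn.executemany(
    "INSERT INTO occurrence VALUES (?, ?, ?, ?)",
    [
        ("dataset-a", "org-1", "BE", 2026),
        ("dataset-a", "org-1", "BE", 2026),
        ("dataset-b", "org-1", "BE", 2026),
        ("dataset-c", "org-2", "NL", 2026),  # excluded by the WHERE clause
    ],
)

# One output row per (datasetKey, publishingOrgKey) group, with its record count.
counts = conn.execute(
    """
    SELECT datasetKey, publishingOrgKey, COUNT(*) AS n
    FROM occurrence
    WHERE countryCode = 'BE' AND "year" = 2026
    GROUP BY datasetKey, publishingOrgKey
    ORDER BY datasetKey
    """
).fetchall()
```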
  15. GBIF SQL download API

    Select the columns of interest AND aggregate
    Let’s SELECT columns FROM occurrence WHERE conditions GROUP BY dimensions
    Let’s count the number of occurrences recorded this month in Flanders for each species and day. Only presences (no absences).
        SELECT species, speciesKey, eventDate, COUNT(*) AS n
        FROM occurrence
        WHERE countryCode = 'BE'
          AND level1gid = 'BEL.2_1'
          AND "year" = 2026
          AND "month" = 3
          AND occurrenceStatus = 'PRESENT'
        GROUP BY species, speciesKey, eventDate
    Result: https://doi.org/10.15468/dl.cerftr
  16. GBIF SQL download API

    Select the columns of interest AND aggregate
    Result of the query on the previous slide: we have just created our first species occurrence cube 🫨
    A cube with two dimensions:
    - taxonomic
    - temporal
    Ok, we created a square ⃞
  17. Download species occurrence cubes

    Let’s download a species occurrence cube from GBIF using the web interface.
  18. Download species occurrence cubes

    • Go to the GBIF occurrence search
    • occurrence_status=present (already selected by default)
    • year=2010,2025
    • country=BE
    • Show “All filters” to select Flanders region: gadm_gid=BEL.2_1
    • taxon_key=6 (scientificName: Plantae)
    • coordinate_uncertainty_in_meters=0,1000 (fairly precisely georeferenced data)
    • URL occurrence search
    • Download
  19. Download species occurrence cubes

    Let’s choose the dimensions
  20. Download species occurrence cubes

    • Taxonomic dimension: Species
    • Temporal dimension: Year
    • Spatial dimension: EEA reference grid (Europe only)
    • Spatial resolution: 1km
    • Randomize points within uncertainty circle: yes
  21. Download species occurrence cubes

    Randomize points within uncertainty circle: why?
    Directly assigning centroid coordinates to a grid can lead to huge spatial bias.
    Oldoni D, Groom Q, Desmet P (2020) https://speakerdeck.com/damianooldoni/occurrence-cubes
  23. Download species occurrence cubes

    Randomize points within uncertainty circle: why?
    Directly assigning centroid coordinates to a grid can lead to huge spatial bias.
    How to assign occurrences to grids? How to apply randomization?
    Via special grid functions, e.g. GBIF_EEARGCode:
        STRING GBIF_EEARGCode(INTEGER gridSize, DOUBLE latitude, DOUBLE longitude, DOUBLE coordinateUncertaintyInMeters)
    Set coordinateUncertaintyInMeters to 0 to disable randomization.
    Oldoni D, Groom Q, Desmet P (2020) https://speakerdeck.com/damianooldoni/occurrence-cubes
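
The idea behind the randomization can be sketched in a few lines. This is not GBIF's implementation of GBIF_EEARGCode: it assumes coordinates are already projected in metres (GBIF's function takes latitude and longitude), and the helper names `random_point_in_circle`, `grid_code` and `assign_cell` are hypothetical. The cell label only imitates the style of the codes shown in this deck.

```python
import math
import random


def random_point_in_circle(x, y, radius_m, rng):
    """Draw a point uniformly from the disc of radius radius_m centred on (x, y)."""
    if radius_m <= 0:
        return x, y  # zero uncertainty: no randomization
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # The square root keeps the draw uniform over the area
    # instead of clustering points near the centre.
    distance = radius_m * math.sqrt(rng.random())
    return x + distance * math.cos(angle), y + distance * math.sin(angle)


def grid_code(x, y, cell_size_m=1000):
    """Label the square cell containing (x, y), e.g. 1kmE3886N3121."""
    return f"{cell_size_m // 1000}kmE{int(x // cell_size_m)}N{int(y // cell_size_m)}"


def assign_cell(x, y, uncertainty_m, rng, cell_size_m=1000):
    """Randomize the point within its uncertainty circle, then look up its cell."""
    px, py = random_point_in_circle(x, y, uncertainty_m, rng)
    return grid_code(px, py, cell_size_m)


rng = random.Random(42)
# With zero uncertainty the point always lands in its own cell.
exact = assign_cell(3886500.0, 3121500.0, 0, rng)
# With a large uncertainty, repeated draws spread over several neighbouring cells.
spread = {assign_cell(3886500.0, 3121500.0, 2828, rng) for _ in range(500)}
```

Records whose coordinates are really a centroid with a large uncertainty then stop piling up in one cell, which is the spatial bias the slide warns about.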
  24. Download species occurrence cubes

    Let’s choose the measures
  25. Download species occurrence cubes

    • Occurrence count at higher taxonomic level: from Kingdom up to Genus. Useful to assess sampling bias.
    • Include minimum coordinate uncertainty: Yes. Useful to assess the spatial precision of the data.
    • Include minimum temporal uncertainty: Yes. Useful to assess the temporal precision of the data.
  26. Download species occurrence cubes

    Let’s apply some filters for data quality
  27. Download species occurrence cubes

    Make sure the following filters are checked ✅
    • Remove records with geospatial issues
    • Remove records not confidently matched to a taxon
    • Remove records at country centroids
    • Remove records of fossils and living specimens, e.g. those from botanical and zoological gardens
    Can we download now? NO: let’s Edit as SQL first. Why? Because “for complex queries and aggregations, the SQL editor provides more freedom.”
    Goal: remove unvalidated records, based on identificationVerificationStatus.
  28. Download species occurrence cubes

    Start editing
  29. Download species occurrence cubes

    Remove unvalidated records, based on identificationVerificationStatus
    How to filter in SQL? Add a condition to the WHERE clause of the SQL query:
        WHERE ...
          AND (
            LOWER(identificationVerificationStatus) NOT IN (
              'unverified', 'unvalidated', 'not validated', 'under validation',
              'not able to validate',
              'control could not be conclusive due to insufficient knowledge',
              'uncertain', 'unconfirmed', 'unconfirmed - not reviewed',
              'validation requested'
            )
            OR identificationVerificationStatus IS NULL
          )
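
The `OR ... IS NULL` branch matters: in SQL, `NULL NOT IN (...)` evaluates to NULL rather than true, so without it every record lacking a verification status would be dropped. The same logic, written as a small Python predicate for clarity (the function name `is_kept` is hypothetical; the status list is the one from the query above):

```python
# Lower-cased status values treated as "not validated", copied from the SQL condition.
UNVERIFIED_STATUSES = frozenset(
    {
        "unverified",
        "unvalidated",
        "not validated",
        "under validation",
        "not able to validate",
        "control could not be conclusive due to insufficient knowledge",
        "uncertain",
        "unconfirmed",
        "unconfirmed - not reviewed",
        "validation requested",
    }
)


def is_kept(identification_verification_status):
    """Mirror the SQL condition: keep missing statuses and any status not excluded."""
    if identification_verification_status is None:
        return True  # the explicit IS NULL branch
    return identification_verification_status.lower() not in UNVERIFIED_STATUSES


statuses = [None, "Unverified", "verified", "Validation requested"]
kept = [status for status in statuses if is_kept(status)]
```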
  31. Download species occurrence cubes

    Stop editing and DOWNLOAD
    GBIF works… We take a piece of cake 🎂
  32. Download species occurrence cubes

    GBIF download still processing? No worries! Have a look at this download and its SQL query.
  33. Download species occurrence cubes

    Let’s download a species occurrence cube from GBIF using the dedicated SQL web interface.
    Are you a SQL expert, or do you have a good template to reuse? Just start writing SQL directly in the GBIF SQL interface!
  34. Download species occurrence cubes

    Let’s download a species occurrence cube from GBIF using rgbif.
    You can use {rgbif} to interface with the GBIF SQL download API:
    • Use the function occ_download_sql().
    • Have a look at the “GBIF SQL Downloads” vignette.
        q <- "this is my SQL query, so it won't work"
        occ_download_sql(q)
    Do you want to know more about creating/importing cubes with rgbif? Have a look at the presentation from slide 66 (pdf, Google Slides): “The b3verse: an R package suite to process cubes and calculate indicators”, Langeraert W, Dove S, Hillaert J, 2026.
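
Outside R, the same download can in principle be requested over HTTP. The sketch below only builds the request and does not send it. The endpoint URL, the `SQL_TSV_ZIP` format value and the JSON field names reflect my understanding of the GBIF SQL download API and should be treated as assumptions to check against the GBIF documentation; the helper name, the credentials and the e-mail address are placeholders.

```python
import base64
import json
import urllib.request

# Assumed endpoint; verify against the GBIF API documentation before use.
GBIF_DOWNLOAD_URL = "https://api.gbif.org/v1/occurrence/download/request"


def build_sql_download_request(sql, username, password, email):
    """Build (but do not send) a POST request asking GBIF for an SQL download."""
    payload = {
        "sendNotification": True,
        "notificationAddresses": [email],
        "format": "SQL_TSV_ZIP",  # assumed format name for SQL downloads
        "sql": sql,
    }
    # GBIF downloads require an account; HTTP basic authentication is assumed here.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        GBIF_DOWNLOAD_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


query = (
    "SELECT datasetKey, publishingOrgKey, COUNT(*) FROM occurrence "
    "WHERE countryCode = 'BE' AND \"year\" = 2026 "
    "GROUP BY datasetKey, publishingOrgKey"
)
request = build_sql_download_request(query, "my_user", "my_password", "me@example.org")
# urllib.request.urlopen(request) would submit it; omitted so the sketch runs offline.
```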
  35. Use species occurrence cubes

    What to do with species occurrence cubes?
    • #occs/year, #cells/year (measured occupancy)
    • other indicators, see {b3gbi}
    Code to run (Rmd).
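
The two simple indicators can be computed directly from cube rows: sum `n` per year for the number of occurrences, and count distinct cells per year for the measured occupancy. A minimal sketch on the example rows from this deck; the function name `yearly_indicators` is hypothetical, and the workshop itself uses R for this step.

```python
from collections import defaultdict

# Cube rows taken from the example table earlier in the deck.
cube = [
    {"year": 2014, "eea_cell_code": "1kmE3886N3121", "speciesKey": 2889173, "n": 51},
    {"year": 2014, "eea_cell_code": "1kmE3886N3122", "speciesKey": 2889173, "n": 109},
    {"year": 2018, "eea_cell_code": "1kmE4047N3067", "speciesKey": 2889088, "n": 1},
]


def yearly_indicators(rows, species_key):
    """Return occurrences per year and occupied cells per year for one species."""
    occurrences_per_year = defaultdict(int)
    cells_per_year = defaultdict(set)
    for row in rows:
        if row["speciesKey"] != species_key:
            continue
        occurrences_per_year[row["year"]] += row["n"]
        cells_per_year[row["year"]].add(row["eea_cell_code"])
    return {
        year: {"n_occs": occurrences_per_year[year], "n_cells": len(cells_per_year[year])}
        for year in sorted(occurrences_per_year)
    }


indicators = yearly_indicators(cube, 2889173)
```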
  36. General indicators

    From: “The b3verse: an R package suite to process cubes and calculate indicators”, Langeraert W, Dove S, Hillaert J, 2026. Slide 93.
    Centaurea cyanus © Kai-Philipp Schablewski CC-BY-NC
  37. General indicators

    From: “The b3verse: an R package suite to process cubes and calculate indicators”, Langeraert W, Dove S, Hillaert J, 2026. Slide 95.
    Ophrys apifera © mauro_fioretto CC-BY-NC
  38. Thank you!

    Damiano Oldoni, Ward Langeraert & Jasmijn Hillaert
    Google Slides, PDF
    B-Cubed Newsletter, @b-cubed.eu, @BCubedProject, b-cubed.eu
    This project receives funding from the European Union’s Horizon Europe Research and Innovation Programme (ID No 101059592). Views and opinions expressed are those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission. Neither the EU nor the EC can be held responsible for them.