Van Daele, Tim Adriaens, Peter Desmet, Quentin Groom
Research Institute Nature and Forest (INBO), Belgium
Biodiversity data cubes: spatial aggregation and uncertainty
Open Earth Monitor – Global workshop 2023/10/06 - Bolzano
rapid, reliable and repeatable biodiversity monitoring data which decision makers can use to evaluate policy.
ABOUT
Such information, from local to global level and within relevant timescales, calls for an improved integration of data on biodiversity from different sources.
B-Cubed is standardising access to biodiversity data, empowering policymakers to address the impacts of biodiversity change.
Challenges, Opportunities, Aim
rapid biodiversity data at a low cost, B-Cubed is packaging known methods together into standardised workflows. They can be run by anyone for any region and can be updated according to advances in data, methods and models.
WORKFLOWS
• Repeatable workflows to create data cubes
• Automated workflows to calculate indicators from biodiversity data cubes
• Deep-learning to discover long-term spatiotemporal dependencies in species distribution models
Exemplar workflows, Deep learning, Automated workflows
• Address the ongoing biodiversity crisis
• Essential Biodiversity Variables (EBVs): a global system of harmonized observations, Pereira et al. (2013)
• Aggregated “data cubes” to build EBVs of species distribution and abundance at a global scale, Kissling et al. (2018)
• Repeatable? Scalable? Automated?
of the occurrence of a species (or other taxon) at a particular place on a specified date
• Occurrences are events in a 3-dimensional space
  • Taxonomic (what)
  • Temporal (when)
  • Spatial (where)
cubes
• Aggregate occurrences to partition the 3-dimensional space:
  • Taxonomic (e.g. at species level)
  • Temporal (e.g. at year level)
  • Spatial (e.g. at 1x1 km level, EEA reference grid)
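The partitioning above can be sketched as a function that maps one occurrence to its three cube coordinates. This is a minimal sketch, not the project's implementation: it assumes the coordinates are already projected to ETRS89-LAEA (EPSG:3035) in metres, that a 1 km cell code is built from the kilometre index of the cell's lower-left corner (consistent with codes such as `1kmE3886N3121` used elsewhere in the slides), and the function names and the sample occurrence are hypothetical.

```python
from datetime import date

CELL_SIZE_M = 1000  # 1 km grid, matching the "1x1 km" example


def eea_cell_code(easting_m: float, northing_m: float) -> str:
    """Build a 1 km cell code from EPSG:3035 coordinates in metres (assumed convention)."""
    col = int(easting_m // CELL_SIZE_M)   # kilometre index of the cell's western edge
    row = int(northing_m // CELL_SIZE_M)  # kilometre index of the cell's southern edge
    return f"1kmE{col}N{row}"


def cube_key(species_key: int, event_date: date, easting_m: float, northing_m: float):
    """Map one occurrence to its (temporal, spatial, taxonomic) cube coordinates."""
    return (event_date.year, eea_cell_code(easting_m, northing_m), species_key)


# Hypothetical occurrence: taxon key, date and projected coordinates are illustrative
key = cube_key(2889173, date(2014, 5, 17), 3886420.0, 3121730.0)
print(key)  # (2014, '1kmE3886N3121', 2889173)
```

Coarser temporal or taxonomic levels would only change how the first and last elements of the key are derived.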
cubes: step 4
• Aggregate: number of occurrences of a specific taxon in a specific cell and in a specific time interval

year  eea_cell_code  speciesKey  n    min_coord_uncertainty
2014  1kmE3886N3121  2889173     51   10
2014  1kmE3886N3122  2889173     109  10
...   ...            ...         ...  ...
2018  1kmE4047N3067  2889173     1    2828
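As a rough illustration of this aggregation step, the sketch below groups already-gridded occurrences by (year, cell, taxon) and keeps the count and the smallest coordinate uncertainty per group. The input records are hypothetical and the `aggregate` helper is an assumption for illustration; only the output column names follow the table.

```python
from collections import defaultdict

# Hypothetical gridded occurrences:
# (year, eea_cell_code, speciesKey, coordinate uncertainty in metres)
occurrences = [
    (2014, "1kmE3886N3121", 2889173, 10),
    (2014, "1kmE3886N3121", 2889173, 250),
    (2014, "1kmE3886N3122", 2889173, 10),
    (2018, "1kmE4047N3067", 2889173, 2828),
]


def aggregate(records):
    """Count occurrences per cube cell and track the minimum coordinate uncertainty."""
    cube = defaultdict(lambda: {"n": 0, "min_coord_uncertainty": float("inf")})
    for year, cell, species_key, uncertainty in records:
        entry = cube[(year, cell, species_key)]
        entry["n"] += 1
        entry["min_coord_uncertainty"] = min(entry["min_coord_uncertainty"], uncertainty)
    return dict(cube)


cube = aggregate(occurrences)
for (year, cell, species_key), stats in sorted(cube.items()):
    print(year, cell, species_key, stats["n"], stats["min_coord_uncertainty"])
```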
occurrence cube: visualization purposes
• Random assignment step generates different cubes from same occurrences
• Random assignment means that we cannot blindly create a map from the cube
• Add map of minimum coordinate uncertainty of the grid cells
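To see why the same occurrences can yield different cubes, here is a minimal sketch of one possible random assignment. It assumes, as a labelled assumption rather than the documented procedure, that a point is drawn uniformly from the circle of coordinate uncertainty around the recorded position and that the occurrence is assigned to the 1 km cell containing that point. Function names and the sample occurrence are hypothetical.

```python
import math
import random

CELL_SIZE_M = 1000  # 1 km grid


def random_point_in_circle(x, y, radius, rng):
    """Draw a point uniformly from the disc of the given radius centred on (x, y)."""
    r = radius * math.sqrt(rng.random())  # sqrt keeps the density uniform over the area
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return x + r * math.cos(theta), y + r * math.sin(theta)


def assign_cell(x, y, radius, rng):
    """Assign an occurrence to the grid cell containing a randomly drawn point."""
    px, py = random_point_in_circle(x, y, radius, rng)
    return f"1kmE{int(px // CELL_SIZE_M)}N{int(py // CELL_SIZE_M)}"


# Hypothetical occurrence close to a cell border, with 2828 m coordinate uncertainty
x, y, radius = 4047950.0, 3067050.0, 2828.0
cells = {assign_cell(x, y, radius, random.Random(seed)) for seed in range(200)}
print(len(cells), "different cells over 200 independent draws")
```

A single occurrence with a large uncertainty can land in many cells, which is why one cube alone should not be mapped as if cell membership were certain.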
occurrence cube: data quality filtering
• How to deal with the intrinsic spatial uncertainty?
• Solution 1: make cubes with precise enough data only (data quality step)
• Solution 2: remove cells with “high” min_coord_uncertainty
• Downside: enough data left? (Van Eupen, 2021)
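Both solutions amount to a threshold filter, applied either to occurrences before aggregation or to cube cells afterwards. The sketch below is illustrative only: the 1000 m threshold, the helper names and the occurrence records are assumptions, and the cube rows are the three visible in the step 4 table. It also reports how much data survives, which is the downside raised above.

```python
MAX_UNCERTAINTY_M = 1000  # hypothetical threshold for "precise enough" / "high"


def filter_occurrences(records, max_uncertainty):
    """Solution 1: keep only occurrences precise enough before building the cube."""
    return [r for r in records if r["uncertainty"] <= max_uncertainty]


def filter_cells(cube_rows, max_uncertainty):
    """Solution 2: drop cube cells whose min_coord_uncertainty is too high."""
    return [row for row in cube_rows if row["min_coord_uncertainty"] <= max_uncertainty]


# Hypothetical occurrences (uncertainty in metres)
occurrences = [
    {"cell": "1kmE3886N3121", "uncertainty": 10},
    {"cell": "1kmE3886N3121", "uncertainty": 5000},
    {"cell": "1kmE4047N3067", "uncertainty": 2828},
]

# The three rows visible in the step 4 table
cube_rows = [
    {"year": 2014, "cell": "1kmE3886N3121", "n": 51, "min_coord_uncertainty": 10},
    {"year": 2014, "cell": "1kmE3886N3122", "n": 109, "min_coord_uncertainty": 10},
    {"year": 2018, "cell": "1kmE4047N3067", "n": 1, "min_coord_uncertainty": 2828},
]

kept_occurrences = filter_occurrences(occurrences, MAX_UNCERTAINTY_M)
kept_cells = filter_cells(cube_rows, MAX_UNCERTAINTY_M)
print(f"occurrences kept: {len(kept_occurrences)}/{len(occurrences)}")
print(f"cells kept: {len(kept_cells)}/{len(cube_rows)}")
```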
occurrence cube: stability of statistics
• Random assignment step generates different cubes from same occurrences
• How stable are summary statistics such as the observed occupancy, i.e. the number of grid cells occupied by a species?
• What is the minimum number of cubes needed to robustly infer the average observed occupancy and its uncertainty?
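One way to explore these questions is to generate many cubes from the same occurrences and watch the mean observed occupancy and its standard error as the number of cubes grows. The sketch below uses synthetic occurrences and assumes a uniform draw within the uncertainty circle; the data, seeds and function names are all hypothetical, and it is not the analysis behind the slides.

```python
import math
import random
import statistics

CELL_SIZE_M = 1000  # 1 km grid


def random_cell(x, y, radius, rng):
    """Grid cell of a point drawn uniformly within the uncertainty circle (assumed scheme)."""
    r = radius * math.sqrt(rng.random())
    theta = rng.uniform(0.0, 2.0 * math.pi)
    px, py = x + r * math.cos(theta), y + r * math.sin(theta)
    return int(px // CELL_SIZE_M), int(py // CELL_SIZE_M)


def observed_occupancy(occurrences, rng):
    """Number of distinct cells occupied in one randomly generated cube."""
    return len({random_cell(x, y, radius, rng) for x, y, radius in occurrences})


# Synthetic occurrences of one species in a 10 x 10 km area, mixed uncertainties
setup = random.Random(42)
occurrences = [
    (setup.uniform(0, 10_000), setup.uniform(0, 10_000), setup.choice([10, 100, 1000, 2828]))
    for _ in range(200)
]

rng = random.Random(0)
occupancies = [observed_occupancy(occurrences, rng) for _ in range(100)]

for k in (5, 10, 25, 50, 100):
    sample = occupancies[:k]
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(k)  # standard error of the mean
    print(f"{k:>3} cubes: mean occupancy {mean:.1f}, standard error {sem:.2f}")
```

The number of cubes at which the standard error falls below a chosen tolerance gives a practical answer to the "minimum number of cubes" question for a given dataset.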
on now
• Further study of convergence of observed occupancy on real data and other synthetic data
• Preliminary studies: real data seem to converge fast
• Random assignment using a different distribution: normal distribution for data acquired with GPS technology, although not strictly a Gaussian process (Specht2020)
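A normal-distribution variant of the random assignment could look like the sketch below. The key assumption, which the slides do not state, is how the coordinate uncertainty maps to the standard deviation: here the uncertainty is read as the radius containing 95% of the positions, and for a circular bivariate normal the radial distance follows a Rayleigh distribution, so sigma = radius / sqrt(-2 ln(1 - 0.95)). The function name and the sample position are hypothetical.

```python
import math
import random

# ASSUMPTION: the coordinate uncertainty is the radius enclosing 95% of positions
CONFIDENCE = 0.95
# Rayleigh quantile: P(R <= r) = 1 - exp(-r^2 / (2 sigma^2))
RADIUS_TO_SIGMA = math.sqrt(-2.0 * math.log(1.0 - CONFIDENCE))  # about 2.448


def random_point_normal(x, y, uncertainty_m, rng):
    """Draw a point from a circular bivariate normal centred on (x, y)."""
    sigma = uncertainty_m / RADIUS_TO_SIGMA
    return rng.gauss(x, sigma), rng.gauss(y, sigma)


# Hypothetical GPS fix with 30 m uncertainty
x, y, uncertainty = 3886420.0, 3121730.0, 30.0
rng = random.Random(0)
points = [random_point_normal(x, y, uncertainty, rng) for _ in range(10_000)]
inside = sum(math.hypot(px - x, py - y) <= uncertainty for px, py in points)
print(f"fraction within the uncertainty radius: {inside / len(points):.3f}")
```

Unlike a uniform draw, this puts most of the probability mass near the recorded position while still allowing occasional points beyond the stated radius.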
funding from the European Union’s Horizon Europe Research and Innovation Programme (ID No 101059592). Views and opinions expressed are those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission. Neither the EU nor the EC can be held responsible for them.

Damiano Oldoni
Open science lab for biodiversity (oscibio)
Research Institute Nature and Forest (INBO)

Photo by Viridiflavus - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=4956453

b-cubed.eu @BCubedProject B-Cubed Project