
20 years of Open Biodiversity Data in Belgium. What's Next?

Keynote presentation at Open Belgium 2021.

André Heughebaert

March 04, 2021

Transcript

  1. 20 YEARS OF OPEN BIODIVERSITY DATA IN BELGIUM. WHAT'S NEXT?

    André Heughebaert - Belgian Biodiversity Platform. Keynote presentation 1 / 58
  2. 1. Introduction 2. How we made it? 3. Where are

    we today? 4. What's next? 5. Further readings 2 / 58
  3. Biodiversity "biological variety and variability of life on Earth. Biodiversity

    is typically a measure of variation at the genetic, species, and ecosystem level. Biodiversity is not distributed evenly on Earth, and is richer in the tropics." 4 / 58
  4. Occurrence "the fact of something existing or being found in

    a place or under a particular set of conditions." 5 / 58
  5. Evidence "the available body of facts or information indicating whether

    a belief or proposition is true or valid." 6 / 58
  6. 1996: HTTP 1.0 specifications. 1999: the OECD Megascience Forum

    recommended the establishment of an organization such as GBIF. 2001: establishment of the Global Biodiversity Information Facility. 2004: OECD's Declaration on Access to Research Data from Public Funding. 7 / 58
  7. "enable users to navigate and put to use vast quantities

    of biodiversity information, advancing scientific research ... serving the economic and quality-of-life interests of society, and providing a basis from which our knowledge of the natural world can grow rapidly and in a manner that avoids duplication of effort and expenditure." (OECD 1999) 8 / 58
  8. 2009: DarwinCore standard ratified by TDWG (biodiversity informatics). 2012: Global Biodiversity Informatics Conference (GBIC).

    2014: GBIF Governing Board adopts CC-0, CC-BY & CC-BY-NC licences. 2015: IPT v2.2 introduces the use of Digital Object Identifiers (DOIs) for datasets. 2021: Open Belgium: Let's make Belgian knowledge open, usable, useful and used. 9 / 58
  9. CODATA Twenty year review of GBIF Conclusions: "First of all,

    our findings show that GBIF is the most comprehensive, openly available, application-agnostic (most unbiased), easiest-to-use, and modern access point to known digital occurrence data." 11 / 58
  10. Technical Barriers

    Poor Internet availability and limited bandwidth; heterogeneous character encoding; no common data format; HTTP protocol limitations. 12 / 58
  11. Educational Barriers

    Scientists are rewarded for their publications in journals; science data is disregarded as challenging and of lesser importance; most scientists lack basic IT skills (e.g. programming, GIS or SQL). 13 / 58
  12. Cultural Barriers

    "My little precious" syndrome; no time to explain/describe my data; no incentives to publish my raw data; no one will ever consider re-using my data; artisanal digitization efforts of collections. 14 / 58
  13. Overcoming the barriers

    Lower the technical threshold; capacity building amongst all continents; invest in people skills; advocate Open Data & Science; opportunistic approach (first come, first served); offer IT support and dedicated portals; tell success stories. 16 / 58
  14. Some of our Belgian success stories (2007-2019) (2010) (2012) (2014)

    (2018) (2017-2020) African Mammalia Gracillariidae Formicidae Atlas IFBL Catalogue of Belgian Lepidoptera TrIAS 17 / 58
  15. Open...Usable...Useful...Used

    Open: "freely available or accessible; unrestricted." Distributed network: Data Publishers hosting IPTs (Integrated Publishing Toolkit); National Nodes for technical support; centralized Registry at the Secretariat; final users discover, explore and download data. 19 / 58
  16. Open...Usable...Useful...Used

    Usable: "able or fit to be used." Describing entities and attributes: map proprietary databases into DarwinCore concepts; DwC terms and closed vocabulary standardized by the community; datasets packed in DwC archive files. 21 / 58
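
As an illustration of mapping a proprietary database into DarwinCore concepts, here is a minimal Python sketch. The left-hand column names (`specimen_id`, `species_name`, and so on) are hypothetical and stand in for whatever a local collection database uses; the right-hand names are Darwin Core terms. The `basisOfRecord` set is only a subset of the controlled values, included to show what a closed-vocabulary check could look like.

```python
# Hypothetical local column names (left) mapped to Darwin Core terms (right).
COLUMN_TO_DWC = {
    "specimen_id": "occurrenceID",
    "species_name": "scientificName",
    "collected_on": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "record_type": "basisOfRecord",
}

# Subset of the controlled vocabulary for basisOfRecord (illustrative only).
BASIS_OF_RECORD_SUBSET = {
    "PreservedSpecimen",
    "FossilSpecimen",
    "LivingSpecimen",
    "HumanObservation",
    "MachineObservation",
}


def to_darwin_core(row):
    """Rename mapped columns to DwC terms and drop unmapped ones."""
    record = {dwc: row[col] for col, dwc in COLUMN_TO_DWC.items() if col in row}
    basis = record.get("basisOfRecord")
    if basis is not None and basis not in BASIS_OF_RECORD_SUBSET:
        raise ValueError(f"basisOfRecord not in controlled vocabulary: {basis!r}")
    return record
```

Columns without a mapping (for example an internal note field) are simply left out of the published record, and a value outside the vocabulary is rejected rather than silently passed on.
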
  17. Open...Usable...Useful...Used

    Useful: "able to be used for a practical purpose or in several ways." DwC star schema data model; EML (Ecological Metadata Language); taxonomic, geographical and time coverages; CC licenses CC-0, CC-BY, (CC-BY-NC); authors' preferred citation; authors' emails for user feedback. 24 / 58
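
The star schema can be pictured as one core table (for example occurrences) surrounded by extension tables whose rows point back to a core record. The sketch below shows that join in plain Python under simplifying assumptions: rows are dictionaries, the core identifier is `occurrenceID`, and extension rows carry a `coreid` field. The function name and the sample fields are hypothetical, not part of any GBIF tool.

```python
from collections import defaultdict


def attach_extensions(core_rows, extension_rows, core_id="occurrenceID", ext_key="coreid"):
    """Group extension rows under the core record they point to."""
    by_core = defaultdict(list)
    for ext in extension_rows:
        by_core[ext[ext_key]].append(ext)
    # Each core row gets a list of its extension rows (empty if none exist).
    return [{**row, "extensions": by_core.get(row[core_id], [])} for row in core_rows]


# Hypothetical sample data: two occurrences, one of which has two images.
core = [
    {"occurrenceID": "occ-1", "scientificName": "Calopteryx xanthostoma"},
    {"occurrenceID": "occ-2", "scientificName": "Formicidae"},
]
images = [
    {"coreid": "occ-1", "identifier": "image-a.jpg"},
    {"coreid": "occ-1", "identifier": "image-b.jpg"},
]
joined = attach_extensions(core, images)
```

Every extension row belongs to exactly one core record, while a core record can have zero or many extension rows, which is what gives the model its star shape.
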
  18. Open...Usable...Useful...ReUsed

    Reuse: "use again or more than once." Various ways to discover, explore and download: global data portal GBIF.org; thematic and/or national data portals; JSON webservices; libraries (rgbif, pygbif). 26 / 58
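
For readers who want to call the JSON webservices directly rather than through a library, a minimal sketch using only the Python standard library might look like this. It assumes the public GBIF API base URL `https://api.gbif.org/v1` and the occurrence search filters `taxonKey`, `country`, `hasCoordinate` and `limit`; the helper names `occurrence_search_url` and `fetch_json` are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GBIF_API = "https://api.gbif.org/v1"  # assumed public API base URL


def occurrence_search_url(**filters):
    """Build an occurrence search URL from keyword filters."""
    # Booleans are sent as lowercase "true"/"false" in the query string.
    params = {k: (str(v).lower() if isinstance(v, bool) else v) for k, v in filters.items()}
    return f"{GBIF_API}/occurrence/search?{urlencode(params)}"


def fetch_json(url, timeout=30):
    """Download a URL and decode the JSON body (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Georeferenced records of Calopteryx xanthostoma (taxonKey 5052020) in Belgium.
url = occurrence_search_url(taxonKey=5052020, country="BE", hasCoordinate=True, limit=5)
```

Calling `fetch_json(url)` should return a paged response whose records sit under a `results` key; the request itself is left out of the example so that it runs without a network connection.
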
  19. Open...Usable...Useful...ReUsed in Python

        # Install first with: pip install pygbif
        from pygbif import occurrences as occ

        # Search, fetch and count occurrence records
        occ.search(taxonKey=3329049)
        occ.get(key=252408386)
        occ.count(isGeoreferenced=True)

        # Request downloads (these calls need a GBIF account)
        occ.download('basisOfRecord = LITERATURE')
        occ.download('taxonKey = 3119195')
        occ.download('decimalLatitude > 50')

        # List, inspect and retrieve existing downloads
        occ.download_list(user="sckott", limit=5)
        occ.download_meta(key="0000099-140929101555934")
        occ.download_get("0000066-140928181241064")

    27 / 58
  20. Open...Usable...Useful...ReUsed with R

        library(rgbif)
        gbif_download = occ_download(
          type="and",
          pred("taxonKey", 5052020), # Calopteryx xanthostoma
          pred("hasGeospatialIssue", FALSE),
          pred("hasCoordinate", TRUE),
          pred_gte("year", 1900),
          pred_not(pred("basisOfRecord", "FOSSIL_SPECIMEN")),
          pred_not(pred("basisOfRecord", "LIVING_SPECIMEN")),
          pred_not(pred("establishmentMeans","MANAGED")),
          pred_not(pred("establishmentMeans","INTRODUCED")),
          pred_not(pred("establishmentMeans","INVASIVE")),
          pred_not(pred("establishmentMeans","NATURALISED")),
          pred_or(
            pred_lt("coordinateUncertaintyInMeters",10000),
            pred_not(pred_notnull("coordinateUncertaintyInMeters")) # keep nu
          )
        )

    28 / 58
  21. Open...Usable...Useful...ReUsed

    Citation mechanism: each dataset gets a Digital Object Identifier; each download gets its own DOI; trace usage back to original datasets; allow literature usage tracking. 29 / 58
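
To make "trace usage back to original datasets" concrete, the sketch below counts how many records in a download come from each source dataset and turns those counts into shares. It assumes each record is a dictionary with a `datasetKey` field; the function names and the sample keys (`dataset-a`, `dataset-b`) are placeholders for this example, not real identifiers.

```python
from collections import Counter


def records_per_dataset(records):
    """Count the records contributed by each source dataset."""
    return Counter(record["datasetKey"] for record in records)


def credit_shares(records):
    """Return each dataset's fraction of the records in a download."""
    counts = records_per_dataset(records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: n / total for key, n in counts.items()}


# Hypothetical download made of three records from two datasets.
download = [
    {"datasetKey": "dataset-a", "scientificName": "Calopteryx xanthostoma"},
    {"datasetKey": "dataset-a", "scientificName": "Calopteryx xanthostoma"},
    {"datasetKey": "dataset-b", "scientificName": "Calopteryx xanthostoma"},
]
counts = records_per_dataset(download)
shares = credit_shares(download)
```

A paper citing the DOI of this download could then be credited to both datasets, weighted by how many records each one contributed.
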
  22. What's next? (Data)

    Fill data gaps, e.g. non-Linnaean taxonomy; automated harvesting of academic data; prepare the infrastructure for the data deluge; modern packaging, e.g. Frictionless Data or W3C linked data; stricter, yet more flexible standards. 46 / 58
  23. What's next? (Information)

    AI/ML interpreted/enhanced (meta)data; allow user annotations; link species occurrences with related entities: people, institutions, literature, land use, ecology, legislation... 47 / 58
  24. What's next? (Knowledge)

    Out of the biodiversity silo, e.g. through EOSC; address the big societal challenges (Health, Pandemics, Food security, Climate change...): UN Sustainable Development Goals. 48 / 58
  25. What's next? (globally)

    Truly global coverage; be more inclusive: non-Linnaean taxonomy, Indigenous & Traditional Knowledge; be part of the bigger puzzle: Alliance for Biodiversity Knowledge; adapt funding scheme and governance. 50 / 58
  26. What's next? (Belgium)

    Data coverage: reduce the North/South fracture; be more inclusive: non-Linnaean taxonomy, non-traditional data sources; be part of the bigger puzzle in the EU: DiSSCo, LifeWatch, EOSC... 51 / 58
  27. 5. FURTHER READINGS

    Data integration enables global biodiversity synthesis; Global Biodiversity Informatics Outlook: Delivering biodiversity knowledge in the information age; Twenty-Year Review of GBIF (CODATA, Paris, 2020); GBIF Science Review 2020. 53 / 58
  28. Heberling JM, Miller JT, Noesgaard D, Weingart SB and Schigel

    D. Data integration enables global biodiversity synthesis. Proceedings of the National Academy of Sciences. Available at: https://doi.org/10.1073/pnas.2018093118. 54 / 58
  29. Hobern D, Apostolico A, Arnaud E, Bello JC, Canhos D,

    Dubois G, et al. Global Biodiversity Informatics Outlook: Delivering biodiversity knowledge in the information age. Copenhagen: Global Biodiversity Information Facility. Available at: https://doi.org/10.15468/6jxa-yb44. 55 / 58
  30. GBIF Science Review 2020 GBIF Secretariat. (2021). GBIF Science Review

    2020. https://doi.org/10.35035/bezp-jj23 57 / 58
  31. THANK YOU

    Time for your questions - André Heughebaert, Belgian Biodiversity Platform 58 / 58