Crunching Data In GeoServer with Discrete Global Grid Systems (DGGS)

Discrete Global Grid Systems are a way to tessellate the entire planet into zones sharing similar characteristics, with multiple resolutions to address different precision needs, allowing integration of data coming from different data sources, and on demand analysis of data.

Come to this presentation for an introduction to DGGS concepts, to learn when they are a good fit for a specific problem, and to get an update on their implementation in GeoServer.

Simone Giannecchini

October 04, 2021

Transcript

  1. GeoSolutions • Offices in Italy & US, worldwide clients • 30+ collaborators, 25+ engineers • Our products • Our offer: Enterprise Support Services Deployment Subscription Professional Training Customized Solutions GeoNode
  2. Affiliations • We strongly support Open Source, it is in our core • We actively participate in OGC working groups and get funded to advance new open standards • We support standards critical to GEOINT
  3. DGGS • Discrete Global Grid System (DGGS) • Earth partitioning (no overlap). Each partition is called a “zone”. Each zone has a unique identifier. • Zones should have the same area (but not all implementations do) • Partitioning has no arbitrary limits (poles, dateline) • Multi-resolution, zones have parent/child relationships
  4. DGGS libraries • A DGGS library implements the geometric structure of a particular DGGS. At a minimum: • Conversion between zone identifiers and their polygonal geometry • From point/polygon to zones • Given a zone, return its parent, children, and neighbors • DGGS and libraries used in the GeoServer implementation: rHealPix, Uber’s H3
  5. rHealPix • Zones are equal area • Parents contain exactly 9 children • The union of the children forms the parent • A cell has 4 neighbors, diagonal ones are not considered close • Zone identifiers are easy to reason about • P is the parent of P1 • P1 is the parent of P12 • Only a Python-based implementation is available
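
The identifier pattern quoted on slide 5 (P, then P1, then P12) suggests that parent and child lookups reduce to plain string operations. The following is a minimal sketch under that assumption; the function names are hypothetical and are not the rHealPix library API, and the digit range 0-8 is assumed from the "exactly 9 children" statement.

```python
from typing import List, Optional

# Assumption: each child appends one digit in 0-8 to its parent's identifier,
# which is consistent with "P is the parent of P1" and "exactly 9 children".
CHILD_DIGITS = "012345678"


def resolution(zone_id: str) -> int:
    """Resolution implied by the identifier length: 'P' is 0, 'P1' is 1."""
    return len(zone_id) - 1


def parent(zone_id: str) -> Optional[str]:
    """Drop the last digit; resolution-0 zones have no parent."""
    if len(zone_id) <= 1:
        return None
    return zone_id[:-1]


def children(zone_id: str) -> List[str]:
    """Append each of the nine child digits to the identifier."""
    return [zone_id + digit for digit in CHILD_DIGITS]


if __name__ == "__main__":
    print(parent("P12"))       # parent of the slide's example zone
    print(children("P"))       # the nine resolution-1 zones under P
```

Because the hierarchy is encoded in the string itself, an ancestor test is just a prefix check, which is what makes these identifiers easy to reason about.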
  6. H3 • Hexagon-based system, with a few pentagons in the mix • Each zone has 6 or 7 children • The union of the children does not match the parent exactly. Not equal area. • Zone neighbors are all at the same distance, center to center • Zone identifiers are hard to reason about • 817c3ffffffffff is a child of 807dfffffffffff • Excellent implementation, bindings in many languages
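
H3 identifiers look opaque because the hierarchy is packed into bit fields of a 64-bit integer rather than into readable characters. The sketch below decodes the two identifiers from slide 6 using the published H3 index layout (4 resolution bits, 7 base-cell bits, then fifteen 3-bit digits, with unused digits set to 7). The function names are hypothetical and this is an illustration only; real code should use the H3 library bindings.

```python
# Bit positions follow the documented H3 cell index layout.
RES_SHIFT = 52          # 4 bits holding the resolution
RES_MASK = 0xF
BASE_CELL_SHIFT = 45    # 7 bits holding the resolution-0 base cell
BASE_CELL_MASK = 0x7F
DIGIT_BITS = 3          # one 3-bit digit per resolution level
MAX_RES = 15


def h3_resolution(h3_hex: str) -> int:
    return (int(h3_hex, 16) >> RES_SHIFT) & RES_MASK


def h3_base_cell(h3_hex: str) -> int:
    return (int(h3_hex, 16) >> BASE_CELL_SHIFT) & BASE_CELL_MASK


def h3_parent(h3_hex: str) -> str:
    """Immediate parent: mark the finest digit unused and lower the resolution."""
    value = int(h3_hex, 16)
    res = (value >> RES_SHIFT) & RES_MASK
    if res == 0:
        raise ValueError("resolution-0 cells have no parent")
    # Set the digit of the current resolution to 7 (binary 111, "unused").
    value |= 0b111 << ((MAX_RES - res) * DIGIT_BITS)
    # Replace the resolution field with res - 1.
    value = (value & ~(RES_MASK << RES_SHIFT)) | ((res - 1) << RES_SHIFT)
    return format(value, "x")


if __name__ == "__main__":
    child = "817c3ffffffffff"
    print(h3_resolution(child), h3_base_cell(child), h3_parent(child))
```

Applied to the slide's example, `h3_parent("817c3ffffffffff")` yields `807dfffffffffff`, which confirms the relationship stated on the slide but is far less obvious to a human reader than P1 and P12.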
  7. DGGS “geometry” datastore • First step, encapsulate the DGGS libraries behind a common interface • Then build a GeoServer data store reporting the zone structure and attributes → WMS, WFS! • Difficulties binding to the rHealPix Python implementation: • Used JEP to call into the Python interpreter • Performance and scalability are limited
  8. WFS download • WFS download allows other software to display and manipulate DGGS zones • In this example the WFS generated a shapefile, which was then displayed in QGIS
  9. ClickHouse DataStore • DGGS zone counts can grow very large (hundreds of trillions to cover the entire planet at maximum resolution) • DGGS is especially interesting for analysis • ClickHouse DGGS datastore • OLAP database • Tables partitioned by default, can spread partitions over nodes • Runs queries using all cores and all nodes
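
To make the "hundreds of trillions" figure on slide 9 concrete, the zone count per resolution can be computed directly. This sketch assumes six resolution-0 zones with nine children each for rHealPix, and uses the cell-count formula from the H3 documentation (122 base cells, 12 of them pentagons). The function names are hypothetical.

```python
def rhealpix_zone_count(resolution: int) -> int:
    # Assumption: 6 resolution-0 zones, each zone split into 9 children.
    return 6 * 9 ** resolution


def h3_zone_count(resolution: int) -> int:
    # Closed form from the H3 documentation; accounts for the 12 pentagons.
    return 2 + 120 * 7 ** resolution


if __name__ == "__main__":
    for res in (0, 5, 11, 15):
        print(res, rhealpix_zone_count(res), h3_zone_count(res))
```

At H3's finest resolution (15) the formula gives 569,707,381,193,162 zones, which matches the order of magnitude quoted on the slide and explains why a distributed OLAP store is attractive.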
  10. Sampling Sentinel 2 • Australian Capital Territory • Sampled Sentinel 2 at resolution 11 • Stored results in ClickHouse OLAP database
  11. Multi-resolution database • Computed NDVI, NDWI, NDBI • Zone parents computed by aggregation (fast) • Multi-resolution representation • Each resolution stored in ClickHouse
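
The indices and the parent aggregation on slide 11 can be sketched in a few lines. The talk does not state which index definitions or which aggregate were used, so this example assumes the common normalized-difference formulas (NDWI in its green/NIR form) and a simple mean over child zones, with rHealPix-style identifiers where the parent is the identifier minus its last digit. All names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict


def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b), returning 0.0 when both bands are zero."""
    total = a + b
    return (a - b) / total if total else 0.0


def ndvi(nir: float, red: float) -> float:
    return normalized_difference(nir, red)


def ndwi(green: float, nir: float) -> float:
    return normalized_difference(green, nir)


def ndbi(swir: float, nir: float) -> float:
    return normalized_difference(swir, nir)


def aggregate_to_parents(values: Dict[str, float]) -> Dict[str, float]:
    """Average child values per parent (assumed aggregate: mean)."""
    groups = defaultdict(list)
    for zone_id, value in values.items():
        groups[zone_id[:-1]].append(value)
    return {parent_id: mean(vals) for parent_id, vals in groups.items()}


if __name__ == "__main__":
    fine = {"P10": ndvi(3.0, 1.0), "P11": ndvi(1.0, 1.0), "P20": ndvi(1.0, 3.0)}
    print(aggregate_to_parents(fine))
```

Repeating `aggregate_to_parents` on its own output builds each coarser resolution from the one below it, which is why the parents are cheap to compute once the finest level has been sampled.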
  12. DGGS API • Data retrieval API reminiscent of OGC API Features, with features unique to DGGS https://tb16.geo-solutions.it/geoserver/ogc/dggs/api?f=text%2Fhtml
  13. Zones access • /collections/{collectionId}/zones • Retrieve DGGS zones • “resolution” parameter mandatory • Spatial filtering • By “bbox”, like in OGC API Features (CRS84) • By “geom”, CRS84 polygon • By “zones”, array of zone identifiers (most efficient) https://observablehq.com/@mxfh/iterative-h3-polyfill
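
A request to the zones resource described on slide 13 could be assembled as follows. The path and the parameter names (`resolution`, `bbox`, `geom`, `zones`) come from the slide, and the base URL is taken from the API link on slide 12 (the Testbed-16 server may no longer be online). The collection id, the comma-separated encoding of `bbox` and `zones`, and the helper name are assumptions for illustration. The sketch only builds the URL and does not contact the server.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

# Base path derived from the API link on slide 12.
BASE_URL = "https://tb16.geo-solutions.it/geoserver/ogc/dggs"


def zones_url(
    collection_id: str,
    resolution: int,
    bbox: Optional[Sequence[float]] = None,
    geom: Optional[str] = None,
    zones: Optional[Sequence[str]] = None,
) -> str:
    """Build a zones request URL; 'resolution' is the only mandatory parameter."""
    params = {"resolution": resolution}
    if bbox is not None:
        # Assumption: minx,miny,maxx,maxy in CRS84, as in OGC API Features.
        params["bbox"] = ",".join(str(v) for v in bbox)
    if geom is not None:
        # Assumption: the polygon is passed as text; the encoding is not on the slide.
        params["geom"] = geom
    if zones is not None:
        # Assumption: zone identifiers are sent as a comma-separated list.
        params["zones"] = ",".join(zones)
    return f"{BASE_URL}/collections/{collection_id}/zones?{urlencode(params)}"


if __name__ == "__main__":
    # "my-collection" is a hypothetical collection id.
    print(zones_url("my-collection", 2, zones=["N66", "N67"]))
```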
  14. Neighboring zones • /collections/{collectionId}/neighbors • Retrieve neighbors of a given zone (by id) • Specify an integer search radius • Neighbors of “N66” with a search radius of 2 • Neighbors of “8075fffffffffff” with a search radius of 2
  15. Parents and children • /collections/{collectionId}/parents (up to r=0) • /collections/{collectionId}/children (to specified r) • Children of “N66” at r=4 • Children of “8075fffffffffff” at r=2
  16. Access by point and polygon • /collections/{collectionId}/point (location and target resolution) • /collections/{collectionId}/polygon (target resolution and compaction)
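
The "compaction" option on slide 16 is not explained further in the deck. In DGGS libraries the term usually means replacing a complete set of sibling zones with their parent, so a polygon is covered by fewer, mixed-resolution zones. The sketch below illustrates that idea on rHealPix-style identifiers (parent is the identifier minus its last digit, nine children per parent); it is an assumption about what the endpoint does, not its implementation.

```python
from collections import defaultdict
from typing import Iterable, Set

CHILDREN_PER_PARENT = 9  # rHealPix: every parent has exactly 9 children


def compact(zones: Iterable[str]) -> Set[str]:
    """Repeatedly replace complete sets of 9 siblings with their parent."""
    result = set(zones)
    changed = True
    while changed:
        changed = False
        siblings = defaultdict(set)
        for zone_id in result:
            if len(zone_id) > 1:  # resolution-0 zones have no parent
                siblings[zone_id[:-1]].add(zone_id)
        for parent_id, kids in siblings.items():
            if len(kids) == CHILDREN_PER_PARENT:
                result -= kids
                result.add(parent_id)
                changed = True
    return result


if __name__ == "__main__":
    covering = {"P1" + d for d in "012345678"} | {"P20", "P21"}
    print(sorted(compact(covering)))
```

The loop runs until no further merge is possible, so a fully covered grandparent collapses in two passes.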
  17. Data Access and Processing API • Another group in TB16 worked on DAPA • Allows access to data and quick aggregates: • min/max/avg/stddev • by time, area, both • DGGS extra parameter: resolution • Spatial filter, polygon or list of DGGS zones • Turned into ClickHouse queries • ClickHouse query computation uses all cores/nodes • Especially fast if the spatial filter is expressed as a set of DGGS zones
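
The translation from a DAPA aggregate request to a ClickHouse query is not shown in the deck. As a rough illustration of why a zone-list filter is cheap, the sketch below generates a query where the spatial filter becomes a plain `IN` list on the zone identifier column. The table name, column names, and helper name are hypothetical; `min`, `max`, `avg`, and `stddevPop` are standard ClickHouse aggregate functions.

```python
import re
from typing import Sequence

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_ZONE_ID = re.compile(r"^[A-Za-z0-9]+$")

# DAPA aggregate name -> ClickHouse aggregate function.
AGGREGATES = {"min": "min", "max": "max", "avg": "avg", "stddev": "stddevPop"}


def aggregate_query(
    table: str, column: str, zones: Sequence[str], zone_column: str = "zone_id"
) -> str:
    """Build a SELECT computing min/max/avg/stddev over a set of zones."""
    # Validate everything interpolated into the SQL text.
    for name in (table, column, zone_column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"invalid identifier: {name!r}")
    for zone_id in zones:
        if not _ZONE_ID.match(zone_id):
            raise ValueError(f"invalid zone id: {zone_id!r}")
    select = ", ".join(
        f"{function}({column}) AS {column}_{alias}"
        for alias, function in AGGREGATES.items()
    )
    zone_list = ", ".join(f"'{zone_id}'" for zone_id in zones)
    return f"SELECT {select} FROM {table} WHERE {zone_column} IN ({zone_list})"


if __name__ == "__main__":
    # "sentinel2_r11" and "ndvi" are hypothetical table and column names.
    print(aggregate_query("sentinel2_r11", "ndvi", ["N66", "N67"]))
```

With a zone list the database only compares identifiers, whereas a polygon filter first has to be converted into zones at the requested resolution.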
  18. On demand accuracy • The “resolution” parameter controls the trade-off between speed and accuracy • Analytics clients, such as Jupyter notebooks, can use a low resolution during development, then switch to a higher resolution to get final results
  19. Want to know more? • If you’d like to get more details about this activity, please look up and read: • OGC Testbed-16, DGGS and DGGS API ER • OGC Testbed-16, Data Access and Processing Engineering Report https://www.ogc.org/docs/er
  20. Want to try? • Source code under the OGC API umbrella: • Part of the “ogcapi” community module