Serving large GeoPackage dataset in GeoServer: the OS MasterMap and ZoomStack use case

GeoPackage is becoming a pervasive tool to share data among systems. But how well does it transfer meta-information, and how well does it handle large datasets? The presentation will introduce the work GeoSolutions performed during OGC Testbed 16, to answer those questions. In addition to the above, we’ll discuss handling large raster GeoPackages with GeoServer.

Simone Giannecchini

October 04, 2021


  1. GeoSolutions • Offices in Italy & US, Worldwide clients •

    30+ collaborators, 25+ Engineers • Our products • Our Offer Enterprise Support Services Deployment Subscription Professional Training Customized Solutions GeoNode
  2. Affiliations We strongly support Open Source, it is in our

    core We actively participate in OGC working groups and get funded to advance new open standards We support standards critical to GEOINT
  3. OGC Testbed 16 • OGC experiments with new directions and

    standard improvements in yearly Testbeds • GeoSolutions participated in the Testbed 16 GeoPackage thread. Objectives: • Improve contents discoverability through metadata profiles • Improve support for large vector datasets
  4. Contribution • All code changes contributed to GeoTools and GeoServer:

    ◦ Core changes in the GeoTools geopkg module ◦ Experimental extensions in the GeoServer WPS GeoPackage module
  5. Exporting GeoPackage • The “gs-gpkg” community module contains a process that can:

    ◦ Export multiple layers ◦ Both raster and vector ◦ Control contents by filtering ◦ Add indexes
  6. Exporting meta-information • TB16 added ability to export: • Linked

    Metadata • Dataset provenance • Styles, common operational picture • GeoPackage Extensions used: • Metadata (core) • Portrayal (experimental!) • Semantic annotations (experimental!)
  7. Semantic annotations (SA) • A “semantic annotation” tags a table

    or a row in the table, giving it extra meaning • E.g., can say “this metadata entry is actually a WFS request” • If two items in the database share the same annotation, that also forms an association between them
  8. Provenance, data source • An OWS context document is included,

    with a WFS request to the original server • Annotated as “Dataset provenance” using SA
  9. Provenance, generator • The original WPS request is also included

    • OWS context + SA "type":"Feature", "id": "http://www.geoserver.org/wps/geopkg/execute/request/48cd66fe-03c5-412a-8f43-5a95e7312d27", "properties":{ "title":"WPS", "updated":"2020-09-29T06:45:44Z", "offerings":[ { "code":"http://www.opengis.net/spec/owc-geojson/1.0/req/wps", "operations":[ { "code":"GetCapabilities", "method":"GET", "type":"application/xml", "href":"http://localhost:8081/geoserver/wps?service=WPS&version=1.0&request=GetCapabilities" }, { "code":"DescribeProcess", "method":"GET", "type":"application/xml", "href":"http://localhost:8081/geoserver/wps?service=WPS&version=1.0&request=DescribeProcess&identifier=gs%3AGeoPackage Process" }, { "code":"Execute", "method":"POST", "type":"application/xml", "href":"http://localhost:8081/geoserver/wps", "request":{ "type":"application/xml", "content" : "<?xml version=\"1.0\" encoding=\"UTF-8\"?><wps:Execute xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" xmlns:ows=\"http://www.opengis.net/ows/1.1\" xmlns:wps=\"http://www.opengis.net/wps/1.0.0\" xmlns:xlink=\"http://www.w3.org/1999/xlink\" service=\"WPS\" version=\"1.0
  10. Styles: multiple encodings • A style can be provided in

    multiple languages • E.g., Mapbox GL styles and SLD • Dedicated table for “stylesheets”
  11. Styles: symbols stored too • Styles might be using symbols

    (e.g., External graphic pointing to “osmmsymbols/culvertSymbol.svg”), those can be embedded too
  12. Multi-layer maps: COP • GeoServer layer groups provide a reference

    to build a basemap • Multiple layer groups can contain the exported layers (e.g., day mode, night mode) • Groups are exported as Common Operational Picture
  13. Multi-layer maps: COP • OWS contexts in metadata list layers

    with styles, providing map composition and stacking order • SA tags them as COP … { "type":"Feature", "id":"http://www.opengis.net/spec/owc-json/1.0/req/gpkg/osmm:topographicarea", "properties":{ "title":"topographicarea", "updated":"2020-09-29T15:37:05Z", "active":true, "offerings":[ { "code":"http://www.opengis.net/spec/owc-json/1.0/req/gpkg/1.2/opt/features", "operations":[ { "code":"GPKG", "method":"SELECT", "type":"SQL Record Set", "href":"test.gpkg", "request": { "type":"SQL Record Set", "content":"SELECT * FROM topographicarea;" } } ], "styles":[ { "name":"osmm:topographicarea-light", "title":"TopographicArea", "abstract":"OS MasterMap Topography Layer. Ordnance Survey. (c) Crown copyright and database rights 2017.", "default":true, "content":{ "type":"SQL Record Set", "content":"SELECT stylesheet, format FROM gpkgext_stylesheets WHERE style_id = (SELECT id FROM gpkgext_styles where style = 'osmm:topographicarea-light');" } } ] } ] } }
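
The style lookup embedded in the context document above can be tried against a toy database. This is a minimal sketch: the two table layouts are reduced to the columns the slide's query touches, and the format strings and stylesheet bodies are placeholders, not values taken from the real export.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced, assumed layouts: only the columns used by the query on the slide.
CREATE TABLE gpkgext_styles (id INTEGER PRIMARY KEY, style TEXT NOT NULL);
CREATE TABLE gpkgext_stylesheets (
    id INTEGER PRIMARY KEY,
    style_id INTEGER NOT NULL REFERENCES gpkgext_styles(id),
    format TEXT NOT NULL,
    stylesheet TEXT NOT NULL
);
""")

# One style, encoded twice (placeholder formats and bodies).
conn.execute("INSERT INTO gpkgext_styles (id, style) VALUES (1, 'osmm:topographicarea-light')")
conn.executemany(
    "INSERT INTO gpkgext_stylesheets (style_id, format, stylesheet) VALUES (?, ?, ?)",
    [(1, "sld", "<StyledLayerDescriptor/>"), (1, "mbstyle", "{}")],
)

# The query string is copied from the OWS context shown on the slide.
style_query = (
    "SELECT stylesheet, format FROM gpkgext_stylesheets WHERE style_id = "
    "(SELECT id FROM gpkgext_styles where style = 'osmm:topographicarea-light');"
)
encodings = sorted(fmt for _sheet, fmt in conn.execute(style_query))
print(encodings)
```

A client would pick whichever encoding it understands from the returned rows.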
  14. MasterMap Topography • Large vector dataset by Ordnance Survey (UK)

    • 50GB worth of compressed GML files, 300GB as imported in PostGIS • Several attributes have multiplicity greater than 1: arrays in PostgreSQL (e.g., history of changes) • Several attributes are enumerated • Meant to be displayed at 1:4000 and above (not multiscale)
  15. MasterMap Topography • Multiple styles available on GitHub • Resulting

    GeoPackage is very large (250GB): • How do we make it smaller? • How do we read it faster?
  16. E-n-u-m-e-r-a-t-e-d • Use the core “Schema Extension” to represent enumerated

    attributes. • Pack strings down into simple integers
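
As an illustration of packing strings into integers, here is a minimal sketch using Python's `sqlite3`. The two metadata tables are a simplified subset of the Schema extension tables, and the convention that `value` holds the integer code while `description` holds the original string is an assumption made for this example. The `theme` column and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified subset of the Schema extension tables (the real ones have more columns).
CREATE TABLE gpkg_data_columns (
    table_name TEXT NOT NULL,
    column_name TEXT NOT NULL,
    constraint_name TEXT,
    PRIMARY KEY (table_name, column_name)
);
CREATE TABLE gpkg_data_column_constraints (
    constraint_name TEXT NOT NULL,
    constraint_type TEXT NOT NULL,  -- 'enum' for enumerations
    value TEXT,                     -- assumed: the integer code stored in the feature table
    description TEXT                -- assumed: the original string
);
CREATE TABLE topographicarea (fid INTEGER PRIMARY KEY, theme INTEGER);
""")

themes = ["Buildings", "Roads Tracks And Paths", "Water"]  # hypothetical values
codes = {name: code for code, name in enumerate(themes)}

conn.execute("INSERT INTO gpkg_data_columns VALUES ('topographicarea', 'theme', 'theme_enum')")
conn.executemany(
    "INSERT INTO gpkg_data_column_constraints VALUES ('theme_enum', 'enum', ?, ?)",
    [(str(code), name) for name, code in codes.items()],
)

# Feature rows store the small integer instead of the repeated string.
features = [(1, "Buildings"), (2, "Water"), (3, "Buildings")]
conn.executemany(
    "INSERT INTO topographicarea VALUES (?, ?)",
    [(fid, codes[name]) for fid, name in features],
)

stored = [row[0] for row in conn.execute("SELECT theme FROM topographicarea ORDER BY fid")]

# Decoding joins back through the constraint metadata.
decoded = conn.execute("""
    SELECT f.fid, c.description
    FROM topographicarea AS f
    JOIN gpkg_data_columns AS dc
      ON dc.table_name = 'topographicarea' AND dc.column_name = 'theme'
    JOIN gpkg_data_column_constraints AS c
      ON c.constraint_name = dc.constraint_name AND c.value = CAST(f.theme AS TEXT)
    ORDER BY f.fid
""").fetchall()
```

The saving comes from each row carrying a one-byte integer rather than a repeated string of ten or more bytes.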
  17. Arrays (experimental) • Enumerated arrays are saved as JSON1 strings,

    optionally enumerated • Rich set of functions to query and work with them • Also, arrays of enumerated values
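
A minimal sketch of querying array attributes stored as JSON strings, assuming a SQLite build with the JSON1 functions available (the case for most current Python distributions). The `change_history` column and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE topographicarea (fid INTEGER PRIMARY KEY, change_history TEXT)")
conn.executemany(
    "INSERT INTO topographicarea VALUES (?, ?)",
    [
        (1, '["New", "Modified"]'),
        (2, '["New"]'),
        (3, '["New", "Reclassified", "Modified"]'),
    ],
)

# json_each expands every array into one row per element, so elements can be filtered.
modified = [
    row[0]
    for row in conn.execute("""
        SELECT DISTINCT f.fid
        FROM topographicarea AS f, json_each(f.change_history) AS j
        WHERE j.value = 'Modified'
        ORDER BY f.fid
    """)
]

# json_array_length reads the array size without expanding it.
lengths = dict(conn.execute("SELECT fid, json_array_length(change_history) FROM topographicarea"))
```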
  18. Actual size reduction • SQLite is more compact than PostgreSQL

    by way of record storage • Enumerations size reduction: almost 40GB less than the same GeoPackage without enumerations

    Database                  Size (GB)
    PostgreSQL                300
    GeoPackage, no enums      245
    GeoPackage, with enums    206
  19. Improve data extraction • Extracting data from this large GeoPackage

    is slow • Visualization at 1:4000, only a tiny part of the whole is needed • Optimize physical structure to improve data access: GeoHash sorting • Identified in the GeoPackage extraction process as “sort on the geometry” <features name="boundaryline" identifier="boundaryline"> <description>boundaryline</description> <srs>EPSG:27700</srs> <featuretype>osmm:boundaryline</featuretype> <sort xmlns:fes="http://www.opengis.net/fes/2.0"> <fes:SortProperty> <fes:ValueReference>wkb_geometry</fes:ValueReference> </fes:SortProperty> </sort>
  20. Improve data extraction • Random order vs GeoHash order •

    More effective on spinning disks, but useful on SSDs as well • Records more packed: more efficient transfer and better file system caching https://postgis.net/workshops/postgis-intro/clusterindex.html
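
To make the ordering idea concrete, here is a minimal sketch of a standard GeoHash encoder used as a sort key. It assumes coordinates are already in longitude/latitude (the MasterMap data is in EPSG:27700, so a reprojection step would be needed first) and says nothing about how the GeoServer process computes its own sort. The sample points are hypothetical.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=11):
    """Encode a lat/lon pair as a GeoHash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)

# Hypothetical features: (fid, lat, lon), arriving in an order unrelated to location.
features = [
    (1, 51.5074, -0.1278),  # London
    (2, 55.9533, -3.1883),  # Edinburgh
    (3, 51.5080, -0.1290),  # London, close to fid 1
    (4, 55.9540, -3.1890),  # Edinburgh, close to fid 2
]

# Sorting by GeoHash puts spatial neighbours next to each other.
ordered = [fid for fid, lat, lon in sorted(features, key=lambda f: geohash(f[1], f[2]))]
```

Writing rows to the file in this order means a small map window tends to touch a few contiguous pages rather than pages scattered across the whole file.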
  21. Prove it! GeoHash benchmark • WMS benchmark, GeoHash sorted GPKG

    as GeoServer data source. • Data stored on SSD • Thousands of unique requests • Hot (data cached by OS) vs cold (OS caches dropped) benchmarks • Always faster, more so if there are no caches • On mobile client, speed-ups go up to 50x
  22. Another take: index GeoPackage • Another possible option: split the

    large GeoPackage into parts • Index GeoPackage linking to parts • Index GeoPackage contains all metadata • Parts only data • In this example, split along the 100km UK grid
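
A minimal sketch of assigning features to parts along the 100km grid, assuming EPSG:27700 eastings and northings in metres. The two-letter square is computed with the usual National Grid lettering scheme. The `part_XX.gpkg` file names and the sample features are hypothetical, and how the index GeoPackage actually links to its parts is not shown here.

```python
from collections import defaultdict

def grid_square(easting, northing):
    """Two-letter OS National Grid 100 km square for EPSG:27700 coordinates."""
    e100k = int(easting // 100_000)
    n100k = int(northing // 100_000)
    if not (0 <= e100k <= 6 and 0 <= n100k <= 12):
        raise ValueError("coordinates outside the National Grid")
    # Letter indices on the 5x5 lettering grids (rows count down from the north).
    l1 = (19 - n100k) - (19 - n100k) % 5 + (e100k + 10) // 5
    l2 = (19 - n100k) * 5 % 25 + e100k % 5
    # The letter I is not used, so skip over it.
    if l1 > 7:
        l1 += 1
    if l2 > 7:
        l2 += 1
    return chr(ord("A") + l1) + chr(ord("A") + l2)

# Hypothetical features: (fid, easting, northing)
features = [
    (1, 530_000, 180_000),  # central London
    (2, 325_000, 673_000),  # Edinburgh
    (3, 531_500, 181_200),  # central London again
]

parts = defaultdict(list)
for fid, easting, northing in features:
    parts[f"part_{grid_square(easting, northing)}.gpkg"].append(fid)
```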
  23. Zoomstack • Distributed by Ordnance Survey • Free GeoPackage, 10GB

    worth of data • Many layers • Deeply multi-scale dataset
  24. Generalized tables extension • When zoomed out, only a small

    part of the table data is used • If the dataset is large enough, indexes alone are not enough • Clone the tables with just the data they need • Widely used approach when rendering OSM from PostGIS
  25. Reducing record count • Each table contains generalized geometries •

    More importantly, it contains fewer records • More speed-up: • The larger the table • The wider the gap in record count between base table and generalized table • We measured between 20% and 300x speed-up factors • But also, no speed-up if improperly set up, or bad match for the approach

    Table name       Record count
    waterlines       2,431,848
    waterlines_g1    164,208
    waterlines_g2    12,752
    waterlines_g3    2,481
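
One way a renderer could choose among these tables is sketched below. It assumes each generalized table is tagged with a generalization distance in dataset units and that the renderer knows the ground size of one pixel. The distances are hypothetical and the selection rule is an illustration, not the rule used by the extension.

```python
# (table name, generalization distance); the distances are hypothetical.
levels = [
    ("waterlines", 0.0),
    ("waterlines_g1", 10.0),
    ("waterlines_g2", 50.0),
    ("waterlines_g3", 250.0),
]

def pick_table(levels, pixel_size):
    """Return the most generalized table whose distance does not exceed pixel_size."""
    best = None
    for name, distance in sorted(levels, key=lambda level: level[1]):
        if distance <= pixel_size:
            best = name
    if best is None:
        raise ValueError("no table is suitable for this pixel size")
    return best

print(pick_table(levels, 60.0))
```

When zoomed in, the pixel size drops below the first generalization distance and the base table is used, which is one reason a badly chosen set of distances can yield no speed-up at all.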
  26. S2 cloudless • Sentinel 2 cloudless by EOX • One

    raster layer, 541GB • Pyramid with 13 zoom levels • Around 180 million records
  27. S2 cloudless • Access to tiles is indexed and pretty

    fast • Used to be pretty slow: we eliminated some aggregation queries that made the code easier to read but were deadly slow on such large tables
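
A minimal sketch of why single-tile access stays fast: GeoPackage tile tables carry a unique constraint on zoom level, column and row, so a lookup by those three values is an index search rather than a scan. The table name `s2cloudless` and the tile contents are hypothetical, and the slide does not say which aggregation queries were removed, so none are reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE s2cloudless (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        zoom_level INTEGER NOT NULL,
        tile_column INTEGER NOT NULL,
        tile_row INTEGER NOT NULL,
        tile_data BLOB NOT NULL,
        UNIQUE (zoom_level, tile_column, tile_row)
    )
""")
# A tiny three-level pyramid with placeholder tile bytes.
conn.executemany(
    "INSERT INTO s2cloudless (zoom_level, tile_column, tile_row, tile_data) VALUES (?, ?, ?, ?)",
    [(z, c, r, b"fake") for z in range(3) for c in range(2 ** z) for r in range(2 ** z)],
)

lookup = (
    "SELECT tile_data FROM s2cloudless "
    "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?"
)
# The last column of EXPLAIN QUERY PLAN output describes the access strategy.
plan = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + lookup, (2, 1, 3)))
tile = conn.execute(lookup, (2, 1, 3)).fetchone()[0]
print(plan)
```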
  28. Reading faster • GeoPackage is a SQLite database • Optimizations

    to make reading faster: • Read-only mode (both vector and raster) • Oddly enough, memory mapping did not seem to help
  29. Writing faster • Optimizations to make writing faster: • Set-up

    for exclusive access • Disable journal (it’s a one-time write: it either finishes or the result is thrown away) • Prepared statements and batches of insert statements • Again, memory mapping did not seem to help
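
The write-side settings map onto standard SQLite pragmas. The sketch below uses Python's `sqlite3` on a throwaway file. The table, row contents and batch size are hypothetical, and it illustrates the pragmas rather than the GeoTools implementation.

```python
import sqlite3
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "export.gpkg"
conn = sqlite3.connect(str(path))

# Exclusive access: the lock is taken once and kept for the connection's lifetime.
conn.execute("PRAGMA locking_mode = EXCLUSIVE")
# No rollback journal: acceptable only because a failed export is simply discarded.
journal_mode = conn.execute("PRAGMA journal_mode = OFF").fetchone()[0]

conn.execute("CREATE TABLE waterlines (fid INTEGER PRIMARY KEY, name TEXT)")

insert = "INSERT INTO waterlines (name) VALUES (?)"  # one statement, reused for every row
rows = [(f"line {i}",) for i in range(10_000)]
BATCH = 1_000  # hypothetical batch size

for start in range(0, len(rows), BATCH):
    with conn:  # one transaction per batch
        conn.executemany(insert, rows[start:start + BATCH])

count = conn.execute("SELECT COUNT(*) FROM waterlines").fetchone()[0]
conn.close()
```

With the journal off, a crash mid-write can leave the file corrupt, which matches the slide's point that the result is either complete or thrown away.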
  30. Thanks for watching! • Want more details? • Check out

    the engineering report. OGC Testbed-16, GeoPackage Engineering Report