Serving large GeoPackage dataset in GeoServer: the OS MasterMap and ZoomStack use case

GeoPackage is becoming a pervasive tool to share data among systems. But how well does it transfer meta-information, and how well does it handle large datasets? The presentation will introduce the work GeoSolutions performed during OGC Testbed 16, to answer those questions. In addition to the above, we’ll discuss handling large raster GeoPackages with GeoServer.


Simone Giannecchini

October 04, 2021


  1. Andrea Aime, Simone Giannecchini (GeoSolutions) • Serving large GeoPackage datasets in

    GeoServer: the OS MasterMap and ZoomStack use case
  2. GeoSolutions • Offices in Italy & US, Worldwide clients •

    30+ collaborators, 25+ Engineers • Our products • Our Offer Enterprise Support Services Deployment Subscription Professional Training Customized Solutions GeoNode
  3. Affiliations • We strongly support Open Source, it is in our

    core • We actively participate in OGC working groups and get funded to advance new open standards • We support standards critical to GEOINT
  4. Introduction

  5. OGC Testbed 16 • OGC experiments with new directions and

    standard improvements in yearly Testbeds • GeoSolutions participated in the Testbed 16 GeoPackage thread. Objectives: • Improve contents discoverability through metadata profiles • Improve support for large vector datasets
  6. Deploy architecture WPS WMS gs:GeoPackage GetMap

  7. Contribution • All code changes contributed to GeoTools and GeoServer:

    ◦ Core changes in the GeoTools geopkg module ◦ Experimental extensions in the GeoServer WPS GeoPackage module
  8. Contents discovery improvements

  9. Exporting GeoPackage • The “gs-gpkg” community module contains a process that can:

    • Export multiple layers • Both raster and vector • Control contents by filtering • Add indexes
  10. Exporting meta-information • TB16 added ability to export: • Linked

    Metadata • Dataset provenance • Styles, common operational picture • GeoPackage Extensions used: • Metadata (core) • Portrayal (experimental!) • Semantic annotations (experimental!)
  11. Embedding metadata • Linked metadata is downloaded and embedded for

    offline usage
  12. Embedding metadata • Implementation uses the gpkg_metadata and gpkg_metadata_reference tables
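A minimal sketch of how a downloaded metadata document could be embedded and linked to a layer through the two tables named above. The column layout follows the GeoPackage Metadata extension as commonly documented, simplified here; the helper `embed_metadata`, the sample document and the standard URI are illustrative assumptions, not the GeoServer implementation.

```python
import sqlite3

# In-memory stand-in for a GeoPackage file (assumption: simplified table definitions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gpkg_metadata (
  id INTEGER PRIMARY KEY,
  md_scope TEXT NOT NULL DEFAULT 'dataset',
  md_standard_uri TEXT NOT NULL,
  mime_type TEXT NOT NULL DEFAULT 'text/xml',
  metadata TEXT NOT NULL DEFAULT ''
);
CREATE TABLE gpkg_metadata_reference (
  reference_scope TEXT NOT NULL,
  table_name TEXT,
  column_name TEXT,
  row_id_value INTEGER,
  timestamp DATETIME NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ','now')),
  md_file_id INTEGER NOT NULL REFERENCES gpkg_metadata(id),
  md_parent_id INTEGER REFERENCES gpkg_metadata(id)
);
""")


def embed_metadata(conn, table_name, document, standard_uri, mime_type="text/xml"):
    """Store a metadata document and reference it from a layer table (hypothetical helper)."""
    cur = conn.execute(
        "INSERT INTO gpkg_metadata (md_scope, md_standard_uri, mime_type, metadata) "
        "VALUES ('dataset', ?, ?, ?)",
        (standard_uri, mime_type, document),
    )
    md_id = cur.lastrowid
    conn.execute(
        "INSERT INTO gpkg_metadata_reference (reference_scope, table_name, md_file_id) "
        "VALUES ('table', ?, ?)",
        (table_name, md_id),
    )
    return md_id


# Usage: the document would normally be the content downloaded from the metadata link.
md_id = embed_metadata(
    conn,
    "topographicarea",
    "<metadata>downloaded document</metadata>",
    "http://www.isotc211.org/2005/gmd",
)
linked = conn.execute(
    "SELECT r.table_name, m.metadata FROM gpkg_metadata_reference r "
    "JOIN gpkg_metadata m ON m.id = r.md_file_id"
).fetchall()
```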

  13. Semantic annotations (SA) • A “semantic annotation” tags a table

    or a row in the table, giving it extra meaning • E.g., can say “this metadata entry is actually a WFS request” • If two items in the database share the same annotation, that also forms an association between them
  14. Provenance, data source • An OWS context document is included,

    with a WFS request to the original server • Annotated as “Dataset provenance” using SA
  15. Provenance, generator • The original WPS request is also included

    • OWS context + SA

        "type":"Feature",
        "id": "",
        "properties":{
          "title":"WPS",
          "updated":"2020-09-29T06:45:44Z",
          "offerings":[ {
            "code":"",
            "operations":[
              { "code":"GetCapabilities",
                "method":"GET",
                "type":"application/xml",
                "href":"http://localhost:8081/geoserver/wps?service=WPS&version=1.0&request=GetCapabilities" },
              { "code":"DescribeProcess",
                "method":"GET",
                "type":"application/xml",
                "href":"http://localhost:8081/geoserver/wps?service=WPS&version=1.0&request=DescribeProcess&identifier=gs%3AGeoPackage Process" },
              { "code":"Execute",
                "method":"POST",
                "type":"application/xml",
                "href":"http://localhost:8081/geoserver/wps",
                "request":{
                  "type":"application/xml",
                  "content" : "<?xml version=\"1.0\" encoding=\"UTF-8\"?><wps:Execute xmlns:xs=\"\" xmlns:ows=\"\" xmlns:wps=\"\" xmlns:xlink=\"\" service=\"WPS\" version=\"1.0
  16. Styles: portrayal extension • Adds new tables to contain styles

    and symbols (experimental!)
  17. Styles: multiple encodings • A style can be provided in

    multiple languages • E.g., Mapbox GL styles and SLD • Dedicated table for “stylesheets”
  18. Styles: symbols stored too • Styles might be using symbols

    (e.g., External graphic pointing to “osmmsymbols/culvertSymbol.svg”), those can be embedded too
  19. Styles: layer association • Association? SA to the rescue

  20. Multi-layer maps: COP • GeoServer layer groups provide a reference

    to build a basemap • Multiple layer groups can contain the exported layers (e.g., day mode, night mode) • Groups are exported as Common Operational Picture
  21. Multi-layer maps: COP • OWS contexts in metadata list layers

    with styles, providing map composition and stacking order • SA tags them as COP

        …
        {
          "type":"Feature",
          "id":" m:topographicarea",
          "properties":{
            "title":"topographicarea",
            "updated":"2020-09-29T15:37:05Z",
            "active":true,
            "offerings":[ {
              "code":" .2/opt/features",
              "operations":[ {
                "code":"GPKG",
                "method":"SELECT",
                "type":"SQL Record Set",
                "href":"test.gpkg",
                "request": { "type":"SQL Record Set", "content":"SELECT * FROM topographicarea;" }
              } ],
              "styles":[ {
                "name":"osmm:topographicarea-light",
                "title":"TopographicArea",
                "abstract":"OS MasterMap Topography Layer. Ordnance Survey. (c) Crown copyright and database rights 2017.",
                "default":true,
                "content":{
                  "type":"SQL Record Set",
                  "content":"SELECT stylesheet, format FROM gpkgext_stylesheets WHERE style_id = (SELECT id FROM gpkgext_styles where style = 'osmm:topographicarea-light');"
                }
              } ]
            } ]
          }
        }
  22. Large GeoPackages The MasterMap case

  23. MasterMap Topography • Large vector dataset by Ordnance Survey (UK)

    • 50GB worth of compressed GML files, 300GB as imported in PostGIS • Several attributes have multiplicity greater than 1: arrays in PostgreSQL (e.g., history of changes) • Several attributes are enumerated • Meant to be displayed at 1:4000 and above (not multiscale)
  24. MasterMap Topography • Multiple styles available on GitHub • Resulting

    GeoPackage is very large (250GB): • How do we make it smaller? • How do we read it faster?
  25. E-n-u-m-e-r-a-t-e-d • Use the core “Schema Extension” to represent enumerated

    attributes. • Pack strings down into simple integers
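A rough sketch of the string-to-integer packing idea, using a cut-down version of the Schema extension's `gpkg_data_column_constraints` table. How GeoServer actually lays out codes and labels is not stated in the slides; here the integer code goes in `value` and the original string in `description`, which is an assumption, as are the `theme` column and the `encode_enum` helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Subset of the Schema extension constraint table (simplified for the sketch).
CREATE TABLE gpkg_data_column_constraints (
  constraint_name TEXT NOT NULL,
  constraint_type TEXT NOT NULL,
  value TEXT,
  description TEXT
);
-- Hypothetical feature table: the enumerated attribute is stored as an integer.
CREATE TABLE topographicarea (fid INTEGER PRIMARY KEY, theme INTEGER);
""")


def encode_enum(conn, constraint_name, labels):
    """Register each distinct label as an 'enum' constraint and return label -> code."""
    mapping = {label: code for code, label in enumerate(sorted(set(labels)))}
    conn.executemany(
        "INSERT INTO gpkg_data_column_constraints "
        "(constraint_name, constraint_type, value, description) VALUES (?, 'enum', ?, ?)",
        [(constraint_name, str(code), label) for label, code in mapping.items()],
    )
    return mapping


themes = ["Land", "Water", "Land", "Buildings"]
codes = encode_enum(conn, "theme_enum", themes)
conn.executemany(
    "INSERT INTO topographicarea (theme) VALUES (?)",
    [(codes[t],) for t in themes],
)

# Decoding joins the compact integer back to its label.
decoded = [
    row[0]
    for row in conn.execute(
        "SELECT c.description FROM topographicarea t "
        "JOIN gpkg_data_column_constraints c "
        "ON c.constraint_name = 'theme_enum' AND c.value = CAST(t.theme AS TEXT) "
        "ORDER BY t.fid"
    )
]
```

The size saving comes from storing a small integer per row instead of repeating the full string.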
  26. Arrays (experimental) • Enumerated arrays are saved as JSON1 strings,

    optionally enumerated • Rich set of functions to query and work against them • Also, arrays of enumerated values
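To illustrate querying arrays stored as JSON strings, here is a small sketch using SQLite's JSON functions (`json_each`, `json_array_length`). It assumes the SQLite build bundled with Python includes JSON support, which is the usual case; the `change_history` column and its values are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE topographicarea (fid INTEGER PRIMARY KEY, change_history TEXT)")
conn.executemany(
    "INSERT INTO topographicarea (fid, change_history) VALUES (?, ?)",
    [
        (1, json.dumps(["New", "Modified"])),
        (2, json.dumps(["New"])),
        (3, json.dumps([])),
    ],
)

# Find features whose array contains a given value by expanding it with json_each.
modified = [
    row[0]
    for row in conn.execute(
        "SELECT t.fid FROM topographicarea AS t, json_each(t.change_history) AS j "
        "WHERE j.value = 'Modified' ORDER BY t.fid"
    )
]

# Array length per feature, computed inside the database.
lengths = dict(
    conn.execute("SELECT fid, json_array_length(change_history) FROM topographicarea")
)
```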
  27. Actual size reduction • SQLite is more compact than PostgreSQL

    by way of record storage • Enumerations size reduction: almost 40GB less vs same GeoPackage without enumerations

        Database                 Size (GB)
        PostgreSQL               300
        GeoPackage, no enums     245
        GeoPackage, with enums   206
  28. Improve data extraction • Extracting data from this large GeoPackage

    is slow • Visualization at 1:4000, only a tiny part of the whole is needed • Optimize physical structure to improve data access: GeoHash sorting • Identified in the GeoPackage extraction process as “sort on the geometry”

        <features name="boundaryline" identifier="boundaryline">
          <description>boundaryline</description>
          <srs>EPSG:27700</srs>
          <featuretype>osmm:boundaryline</featuretype>
          <sort xmlns:fes="">
            <fes:SortProperty>
              <fes:ValueReference>wkb_geometry</fes:ValueReference>
            </fes:SortProperty>
          </sort>
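The idea behind GeoHash sorting can be sketched as follows: compute a geohash per feature and write records in that order, so features that are close on the map tend to end up close on disk. This is a plain textbook geohash over longitude/latitude points, not the GeoTools code; how the real implementation derives the hash from EPSG:27700 geometries is not covered here, and the sample features are made up.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat, lon, precision=8):
    """Standard geohash: interleave longitude/latitude bisection bits, 5 bits per character."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # the first bit always refers to longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits <<= 1
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def sort_by_geohash(features):
    """Return features in geohash order (the order they would be written to the table)."""
    return sorted(features, key=lambda f: geohash(f["lat"], f["lon"]))


# Hypothetical sample: two nearby points in London with a distant one in between.
features = [
    {"name": "london_a", "lat": 51.5007, "lon": -0.1246},
    {"name": "edinburgh", "lat": 55.9533, "lon": -3.1883},
    {"name": "london_b", "lat": 51.5033, "lon": -0.1195},
]
ordered = [f["name"] for f in sort_by_geohash(features)]
```

After sorting, the two London points sit next to each other, which is the property that makes reads for a small map window touch fewer pages.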
  29. Improve data extraction • Random order vs GeoHash order •

    More effective on spinning disks, but useful on SSDs as well • Records more packed: more efficient transfer and better file system caching
  30. Prove it! GeoHash benchmark • WMS benchmark, GeoHash sorted GPKG

    as GeoServer data source • Data stored on SSD • Thousands of unique requests • Hot (data cached by the OS) vs cold (OS caches dropped) benchmarks • Always faster, more so if there are no caches • On mobile clients, speed-ups go up to 50x
  31. Another take: index GeoPackage • Another possible option, split the

    large GeoPackage into parts • Index GeoPackage linking to parts • Index GeoPackage contains all metadata • Parts contain only data • In this example, split along the 100km UK grid
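As a loose illustration of splitting along a 100km grid, the sketch below assigns each feature to a part file based on the grid cell of a representative point in EPSG:27700 coordinates. The part file naming, the `build_index` helper and the use of a single point per feature are all assumptions made for the example; the actual index GeoPackage layout is described in the engineering report.

```python
from collections import defaultdict

TILE_SIZE_M = 100_000  # 100km grid cells, coordinates in metres


def grid_cell(easting, northing):
    """Column/row of the 100km cell containing the point."""
    return int(easting // TILE_SIZE_M), int(northing // TILE_SIZE_M)


def build_index(features):
    """Map hypothetical part file names to the feature ids they would hold."""
    parts = defaultdict(list)
    for f in features:
        col, row = grid_cell(f["easting"], f["northing"])
        parts[f"part_{col}_{row}.gpkg"].append(f["fid"])
    return dict(parts)


# Made-up features; coordinates are representative points only.
features = [
    {"fid": 1, "easting": 530_000, "northing": 180_000},
    {"fid": 2, "easting": 325_000, "northing": 673_000},
    {"fid": 3, "easting": 531_000, "northing": 181_000},
]
index = build_index(features)
```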
  32. Index GeoPackage

  33. Large GeoPackages The ZoomStack case

  34. ZoomStack • Distributed by Ordnance Survey • Free GeoPackage, 10GB

    worth of data • Many layers • Deeply multi-scale dataset
  35. Generalized tables extension • When zoomed out, only a small

    part of the table data is used • If the dataset is large enough, indexes alone are not enough • Clone the tables with just the data they need • Widely used approach when rendering OSM from PostGIS
  36. Reducing record count • Each table contains generalized geometries •

    More importantly, it contains fewer records • Larger speed-ups: • The larger the table • The wider the record count difference between base table and generalized table • We measured speed-ups between 20% and 300x • But also no speed-up if improperly set up, or if the data is a bad match for the approach

        Table name       Record count
        waterlines       2,431,848
        waterlines_g1    164,208
        waterlines_g2    12,752
        waterlines_g3    2,481
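A small sketch of the selection logic a renderer might apply: given the generalization distance of each table, pick the most generalized one that is still detailed enough for the requested resolution. The table names come from the slide; the distances and the `pick_table` helper are invented for illustration and do not reflect how the extension stores them.

```python
# (table name, generalization distance in map units); distances are hypothetical.
GENERALIZED = [
    ("waterlines", 0.0),
    ("waterlines_g1", 10.0),
    ("waterlines_g2", 50.0),
    ("waterlines_g3", 200.0),
]


def pick_table(tables, resolution):
    """Most generalized table whose distance does not exceed the requested resolution."""
    candidates = [(distance, name) for name, distance in tables if distance <= resolution]
    return max(candidates)[1]


zoomed_in = pick_table(GENERALIZED, 1.0)
mid_scale = pick_table(GENERALIZED, 60.0)
zoomed_out = pick_table(GENERALIZED, 1000.0)
```

With the record counts above, a zoomed-out request answered from `waterlines_g3` scans a few thousand rows instead of millions.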
  37. Large GeoPackages Raster tiles

  38. S2 cloudless • Sentinel 2 cloudless by EOX • One

    raster layer, 541GB • Pyramid with 13 zoom levels • Around 180 million records
  39. S2 cloudless • Access to tiles is indexed and pretty

    fast • It used to be pretty slow: we eliminated some aggregation queries that made the code easier to read but were deadly slow on such large tables
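For context, a sketch of why direct tile access is fast: a GeoPackage tile table carries a unique constraint on (zoom_level, tile_column, tile_row), so a single-tile lookup is an index search rather than a scan. The table name `s2cloudless` and the tile payloads are placeholders; which aggregation queries were removed from GeoServer is not shown in the slides and is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Tile pyramid user data table layout, as in the GeoPackage tiles schema.
CREATE TABLE s2cloudless (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  zoom_level INTEGER NOT NULL,
  tile_column INTEGER NOT NULL,
  tile_row INTEGER NOT NULL,
  tile_data BLOB NOT NULL,
  UNIQUE (zoom_level, tile_column, tile_row)
);
""")
conn.executemany(
    "INSERT INTO s2cloudless (zoom_level, tile_column, tile_row, tile_data) VALUES (?, ?, ?, ?)",
    [(1, c, r, f"tile-1-{c}-{r}".encode()) for c in range(2) for r in range(2)],
)

TILE_SQL = (
    "SELECT tile_data FROM s2cloudless "
    "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?"
)


def read_tile(conn, zoom, column, row):
    """Fetch one tile by its full key; returns None when the tile is missing."""
    found = conn.execute(TILE_SQL, (zoom, column, row)).fetchone()
    return found[0] if found else None


tile = read_tile(conn, 1, 1, 0)
missing = read_tile(conn, 5, 0, 0)

# The query plan shows an index search on the unique key.
plan = conn.execute("EXPLAIN QUERY PLAN " + TILE_SQL, (1, 1, 0)).fetchall()
uses_index = any("USING INDEX" in step[3] for step in plan)
```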
  40. Other old-school optimizations

  41. Reading faster • GeoPackage is a SQLite database • Optimizations

    to make reading faster: • Read-only mode (both vector and raster) • Oddly enough memory mapping did not seem to help
  42. Writing faster • Optimizations to make writing faster: • Set-up

    for exclusive access • Disable journal (it’s a one time write, either finishes or the result is thrown away) • Prepared statements and batches of insert statements • Again, memory mapping did not seem to help
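The write-side settings listed above map onto standard SQLite pragmas. This is a minimal sketch with Python's `sqlite3` module, assuming a one-shot export to a fresh file; the table, the batch size of 1000 and the `batches` helper are arbitrary choices for the example, not values from the presentation.

```python
import os
import sqlite3
import tempfile
from itertools import islice

path = os.path.join(tempfile.mkdtemp(), "export.gpkg")
conn = sqlite3.connect(path)

# Exclusive access and no rollback journal: acceptable only because a failed
# export is simply thrown away and rerun.
conn.execute("PRAGMA locking_mode = EXCLUSIVE")
journal_mode = conn.execute("PRAGMA journal_mode = OFF").fetchone()[0]

conn.execute("CREATE TABLE features (fid INTEGER PRIMARY KEY, name TEXT)")


def batches(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


rows = ((i, f"feature-{i}") for i in range(2500))
for chunk in batches(rows, 1000):
    # executemany reuses one prepared statement for the whole batch.
    conn.executemany("INSERT INTO features (fid, name) VALUES (?, ?)", chunk)
conn.commit()

count = conn.execute("SELECT count(*) FROM features").fetchone()[0]
conn.close()
```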
  43. Thanks for watching! • Want more details? • Check out

    the engineering report. OGC Testbed-16, GeoPackage Engineering Report
  44. The End Questions?