Slide 1

Scaling GeoServer in the cloud: clustering state of the art
Andrea Aime, Simone Giannecchini (GeoSolutions)

Slide 2

GeoSolutions
• Offices in Italy & US, global clients/team
• 30+ collaborators, 25+ engineers
• Our products: GeoNode and more
• Our offer: Enterprise Support Services, Deployment, Subscription, Professional Training, Customized Solutions

Slide 3

Affiliations
• We strongly support Open Source; it is in our core
• We actively participate in OGC working groups and get funded to advance new open standards
• We support standards critical to GEOINT

Slide 4

Introduction

Slide 5

GeoServer can run in the cloud
• Not cloud native
• But ready to deploy in the cloud
  • Databases
  • Blob storage support
  • COG
• GeoServer Cloud: Kubernetes, microservices and more cloud-readiness facilities → see the dedicated presentation

Slide 6

GeoServer clustering
• OGC services are (mostly!) stateless
• Exception(s):
  • WPS async requests
  • Importer async requests
• Do we need a clustering plugin?
  • Most of the time, no

Slide 7

Static configuration case

Slide 8

The backoffice/production model
• Backoffice environment (aka TEST or STG)
  • Set up new layers
  • Design their styles for optimal output and performance
  • Set up metadata and descriptions correctly
  • Test everything carefully before going live

Slide 9

The backoffice/production model
• Production environment
  • Static, shared configuration
  • For the few async requests, shared state in an external database
  • Auto-scale as you see fit

Slide 10

Putting it all together
• Version-control the data directory (check it out when ready): git, svn, whatever
• Rolling reload of production
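
A rolling reload can be scripted against GeoServer's REST `reload` endpoint, one node at a time, after the new data directory has been checked out. A minimal sketch; the node hostnames and credentials are placeholders, and the `post` hook exists only so the loop can be dry-run without live servers:

```python
import base64
import urllib.request

def rolling_reload(nodes, user, password, post=None):
    """POST /geoserver/rest/reload on each node in turn, so the cluster
    picks up the freshly checked-out configuration without downtime."""
    results = []
    for node in nodes:
        url = f"{node}/geoserver/rest/reload"
        if post is None:
            req = urllib.request.Request(url, data=b"", method="POST")
            token = base64.b64encode(f"{user}:{password}".encode()).decode()
            req.add_header("Authorization", f"Basic {token}")
            with urllib.request.urlopen(req) as resp:
                results.append((url, resp.status))
        else:  # injectable stub for dry runs / testing
            results.append((url, post(url)))
    return results

# dry run against hypothetical hosts, using a stub instead of the network
calls = rolling_reload(
    ["http://gs-node1:8080", "http://gs-node2:8080"],
    "admin", "geoserver",
    post=lambda url: 200,
)
```

In a real deployment you would also take each node out of the load balancer before reloading it and wait for it to come back healthy.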

Slide 11

But pay attention to
• Keep data separate from configuration
• Keep logs separate; each node uses a different file
• With many layers, a rolling restart can take some time
(diagram: DATA, LOGS, TILE CACHES, CONFIGURATION and ENV PARAMS kept as separate concerns)

Slide 12

Tile cache deployment, opt 1
(diagram: several GS/GWC nodes on a shared file system, plus a dedicated GS/GWC node for seeding)
• Most relaxed layout
• Shared filesystem can have issues with heavy concurrent writers
• Can work if the cluster mostly reads
• Separate, dedicated machines for focused seeding (temporary docker images)

Slide 13

Tile cache deployment, opt 2
(diagram: load balancer in front of GS/GWC nodes, each with its own filesystem and memory cache)
• Layout useful for short-lived caches and fragile network filesystems
• Duplicates work to get better stability
• Common case: weather forecasts

Slide 14

Tile cache deployment, opt 3
(diagram: load balancer in front of GS nodes, with a standalone GWC on a dedicated filesystem)
• Layout useful for few layers (GWC config is XML files)
• Optimal hardware usage
• Double configuration effort (automate using REST)
• Want to have GWC read configuration directly from GeoServer instead? Good idea, funding wanted!
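
The "automate using REST" step can be sketched against the GWC REST layer endpoint. This is only an illustration: the host name is a placeholder, the XML body is a minimal example rather than a full layer configuration, and the exact endpoint path can differ between integrated and standalone GWC deployments:

```python
import urllib.request

def gwc_layer_xml(name, gridsets=("EPSG:4326", "EPSG:900913"),
                  formats=("image/png",)):
    """Build a minimal GeoServerLayer config body for the GWC REST API."""
    grids = "\n".join(
        f"    <gridSubset><gridSetName>{g}</gridSetName></gridSubset>"
        for g in gridsets)
    fmts = "\n".join(f"    <string>{f}</string>" for f in formats)
    return (f"<GeoServerLayer>\n  <enabled>true</enabled>\n"
            f"  <name>{name}</name>\n"
            f"  <mimeFormats>\n{fmts}\n  </mimeFormats>\n"
            f"  <gridSubsets>\n{grids}\n  </gridSubsets>\n"
            f"</GeoServerLayer>")

def put_layer(base, name, body, send=False):
    """PUT the layer config; with send=False just return the target URL."""
    req = urllib.request.Request(
        f"{base}/gwc/rest/layers/{name}.xml",
        data=body.encode(), method="PUT",
        headers={"Content-Type": "application/xml"})
    if send:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    return req.full_url  # dry run

xml = gwc_layer_xml("ne:countries")
url = put_layer("http://gwc-host:8080/geoserver", "ne:countries", xml)
```

Looping this over the layer list is what keeps the "double configuration effort" manageable.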

Slide 15

No no, I need to change config in production all the time!

Slide 16

Do you really, though?
• In our experience, most of the time you do not! (well, not at a high rate)
• You can use static configuration with dynamic data loading, filtering and styling

Slide 17

I receive new data continuously!
• Fine, but why set up a new layer for each data batch?
• If the structure is regular:
  • Use dimensions (time, elevation, custom ones)
  • Use client-side filtering
• Much better option to keep moving time windows (e.g., last 3 months of data)
(diagram: granules T1 … Tn sliding along the time axis)

Slide 18

Work on the mosaic index
• Just record new entries and remove older ones in the database
• No need to touch the configuration
(diagram: granule Tn+1 comes IN, T1 goes OUT)
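
The moving-window maintenance above is plain database work. A toy sketch of the idea, using an in-memory SQLite table as a stand-in for the real mosaic index (which would typically live in PostGIS, with geometry and time columns matching the mosaic configuration); table and column names are illustrative:

```python
import sqlite3
from datetime import datetime, timedelta

# toy stand-in for the real mosaic index table
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE granules (location TEXT, ingestion TEXT)")

def ingest(path, ts, window=timedelta(days=90)):
    """Register the new granule and drop everything older than the window.
    GeoServer reads the index at request time: no config change, no reload."""
    db.execute("INSERT INTO granules VALUES (?, ?)",
               (path, ts.isoformat()))
    db.execute("DELETE FROM granules WHERE ingestion < ?",
               ((ts - window).isoformat(),))
    db.commit()

t0 = datetime(2023, 1, 1)
ingest("t1.tif", t0)
ingest("t2.tif", t0 + timedelta(days=30))
ingest("tn1.tif", t0 + timedelta(days=120))  # pushes t1.tif out of the window

rows = [r[0] for r in
        db.execute("SELECT location FROM granules ORDER BY ingestion")]
```

ISO-8601 timestamps are used so that plain string comparison orders dates correctly.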

Slide 19

“Mosaics” everywhere
• Storage options:
  • Image mosaic store (STAC index too)
  • (Partitioned) database tables
  • Vector mosaic store (external storage)

Slide 20

Different uses → different views
• You “just” need to filter data
• The GeoFence plugin can filter layers per user, with alphanumeric/spatial criteria

Slide 21

Client limitations? Trick them!
• Clients that cannot deal with dimensions, or vendor parameters (e.g. CQL_FILTER)
• Use the “parameter extractor” community module:
  /geoserver/tiger/wms/H11?SERVICE=WMS…
  → /geoserver/tiger/wms?SERVICE=WMS&CQL_FILTER=CFCC=’H11’&...
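
The rewrite that the parameter extractor performs can be mimicked in a few lines, to make the idea concrete: an extra path segment the client can produce is turned into a vendor parameter it cannot. This is only a model of the behavior, not the module's actual code; the rule pattern is a made-up example:

```python
import re
from urllib.parse import quote

# hypothetical rule: a trailing /<code> path segment becomes a CQL filter
RULES = {
    r"/wms/(?P<cfcc>[A-Z]\d+)$":
        lambda m: ("CQL_FILTER", f"CFCC='{m.group('cfcc')}'"),
}

def expand(path, query):
    """Turn an extra path segment into a vendor parameter, the way the
    parameter-extractor module does for dimension-challenged clients."""
    for pattern, rule in RULES.items():
        m = re.search(pattern, path)
        if m:
            key, value = rule(m)
            path = path[:m.start()] + "/wms"      # strip the extra segment
            query = f"{query}&{key}={quote(value)}"
            return path, query
    return path, query

path, query = expand("/geoserver/tiger/wms/H11", "SERVICE=WMS&REQUEST=GetMap")
```

The client keeps issuing plain WMS requests against the "magic" URL; the filtering happens server side.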

Slide 22

Client allows changing the style!
• You allow users to change the style of the maps?
• Just use &sld and &sld_body in your requests
• Or parametric styles! The “env” function is your friend!
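
On the request side, parametric styling just means passing the `env` vendor parameter, a semicolon-separated list of name:value pairs that the style then reads with `env('name', default)`. A small sketch of building such a request; the layer name and variable names are invented for the example:

```python
from urllib.parse import urlencode

def getmap_url(base, layer, env_values, **params):
    """Build a GetMap request that passes style variables through the
    `env` vendor parameter."""
    query = {
        "SERVICE": "WMS", "VERSION": "1.3.0", "REQUEST": "GetMap",
        "LAYERS": layer,
        # env is a semicolon-separated list of name:value pairs
        "env": ";".join(f"{k}:{v}" for k, v in env_values.items()),
        **params,
    }
    return f"{base}?{urlencode(query)}"

url = getmap_url("http://localhost:8080/geoserver/wms", "ne:countries",
                 {"thickness": 2, "pop_min": 1000000})
```

One published style can thus serve many differently rendered maps, with no configuration change.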

Slide 23

But if you really need to change configuration all the time, then…

Slide 24

Typical use case
• Case A:
  • The application allows users to upload their custom data
  • They are responsible for its configuration
  • Hopefully many small data sets
• Case B:
  • Hmm… wait, haven’t really met another case yet! (maybe tell me during the Q&A at the end?)

Slide 25

Clustering community modules: JMS clustering and JDBC clustering
• Not enough traction to have a dedicated maintainer
• Few deployments use either of them

Slide 26

JMS config
• Loads the data directory from XML files
• Sends JMS messages to distribute changes

Slide 27

JDBC config
• Copies the catalog into a database
• Loads configuration on demand
• Caching (too slow otherwise, many queries)
• The hz-cluster sibling module sends drop-cache messages

Slide 28

Some testing

Slide 29

Testing WMS requests
• Using the ne-styles repository:
  • Natural Earth data
  • CSS styling
  • Political map

Slide 30

Configuration cases
• Data volumes:
  • Case A: ne-styles as is (25 layers)
  • Case B: “ne” workspace duplicated up to 40,000 layers
• Clustering:
  • Vanilla/JMS config
  • JDBCConfig
• Builds: 2.24.x nightly, June 24th 2023

Slide 31

Load testing results
• Mostly unaffected by the number of layers
• JDBCConfig between 10% and 20% slower
• A couple of years ago JDBCConfig was 50% slower with lots of layers; it has improved!

Slide 32

Startup times with 40k layers
• JDBCConfig has constant startup time, as it does not load the config up front → 13 seconds
• Vanilla/JMS is proportional to the number of layers → 56 seconds (would take longer on a completely cold disk)
• An experimental new XML config loader in GeoServer Cloud could do better than this

Slide 33

Administrative GUI access, 40k layers
• Access to the home page as admin:
  • JDBCConfig: 72 seconds
  • Vanilla/JMS: 1 second
• Access to the layers page as admin:
  • JDBCConfig: 300 seconds!
  • Vanilla/JMS: 2 seconds

Slide 34

The future: eat your cake and have it too!

Slide 35

Reality check - 1
• GeoServer code is built to have the full catalog in memory
• The code expects it to be quick to:
  • Get any configuration object, possibly multiple times per request
  • Get the full list of anything (layers, workspaces, stores, styles, …)
• This makes it hard to build any solution based on external storage

Slide 36

Reality check - 2
• Only a small share of deployments actually need dynamic configuration changes in production
• Core developers are already busy with GIS
• Configuration data structures change over time (core modules) → we don’t want a clustering plugin that needs to be constantly aligned with core changes

Slide 37

Idea: Hazelcast distributed memory
• Use the same serialization as the GeoServer configuration for messaging (maintained across configuration changes)
• Declare distributed Maps, let HZ do the message passing for us
• Hazelcast is a distribution library providing, among other things:
  • Distributed data structures, with a “near cache”
  • Distributed locks
  • Messaging
  • Integration with various clouds
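
To make the near-cache idea concrete, here is a toy model of the pattern, not the proposed plugin and not Hazelcast itself: a shared dict stands in for the distributed map, each node keeps a local near cache, and writes broadcast invalidations to the peers (work Hazelcast would do for us):

```python
class NearCachedMap:
    """Toy model: distributed map + per-node near cache + invalidation."""

    def __init__(self, distributed, peers):
        self.distributed = distributed  # shared backing store (fake "cluster")
        self.near = {}                  # this node's local cache
        self.peers = peers              # all nodes listening for invalidations
        peers.append(self)

    def get(self, key):
        if key not in self.near:        # miss: one "remote" read, then cached
            self.near[key] = self.distributed.get(key)
        return self.near[key]

    def put(self, key, value):
        self.distributed[key] = value
        for node in self.peers:         # HZ would handle this messaging
            node.near.pop(key, None)

shared, peers = {}, []
gs1 = NearCachedMap(shared, peers)
gs2 = NearCachedMap(shared, peers)

gs1.put("layer:roads", {"title": "Roads"})
first = gs2.get("layer:roads")               # remote read, now near-cached
gs1.put("layer:roads", {"title": "Roads v2"})  # invalidates gs2's near cache
second = gs2.get("layer:roads")              # re-fetches the new value
```

Reads stay at in-memory speed on every node, which is what the "full catalog in memory" assumption in the GeoServer code base requires.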

Slide 38

Idea: Hazelcast distributed memory
(diagram: GS1, GS2 … GSn nodes, each with its own near cache, backed by a Hazelcast distributed catalog)

Slide 39

Conclusions

Slide 40

Let’s summarize
• Most of the time: use a static data directory, share it, load balance, auto-scale, live happily
• When you really need to change configuration at runtime: JMS clustering or JDBCConfig, with some limitations
• Moving forward, we’re going to develop a new plugin that hopefully matches low maintenance with good performance

Slide 41

The End
Questions?
andrea.aime@geosolutionsgroup.com
info@geosolutionsgroup.com