GeoSolutions presentation for the "Modern Cloud Geospatial Architecture Survey" panel.
We talk about the intersection of modern cloud architecture with the geospatial world, with a special focus on GeoServer and MapStore.
⚫ Core products ⚫ We offer Enterprise Support Services: Deployment Subscription, Professional Training, Customized Solutions ⚫ GeoNode 9th of June 2021- FedGeo
We actively participate in OGC working groups and get funded to advance new open standards. We support standards critical to GEOINT.
and GeoNetwork ⚫ Vast experience with Raster Serving ⚫ Designed and developed JAI-Ext ⚫ Designed and developed ImageIO-Ext ⚫ Designed and developed most raster code in GeoTools/GeoServer ⚫ Vast experience with Vector Data Serving ⚫ WFS, WMS, Vector Tiles with OGV ⚫ Extensive experience with Spatial DBMS ⚫ Oracle, SQL Server, PostGIS, MongoDB, etc. ⚫ Extensive experience with creating WebGIS applications ⚫ OpenLayers, Leaflet, Cesium, MapboxGL ⚫ Ext-JS, jQuery, Bootstrap, Angular, React, Redux ⚫ Extensive experience with OGC Protocols ⚫ Extensive experience in Performance and Scalability (Big Data and Cloud) ⚫ Unparalleled multi-industry experience
(let’s be honest) ⚫ Good definition here: “…model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”
self-service, Broad network access, Resource pooling, Rapid elasticity, Measured Service) ⚫ Service Models → PaaS, IaaS, SaaS ⚫ Deployment Models → Public, private, hybrid ⚫ Cloud-Native ⚫ SW developed specifically for the cloud ⚫ Microservices ⚫ Most existing software is not cloud-native ⚫ Cloud-Ready ⚫ SW adapted to work in the cloud ⚫ Monoliths rather than Microservices ⚫ Everybody claims to be cloud-ready
not Cloud Computing ⚫ A dedicated server with Kubernetes, containers, etc. is not Cloud Computing (?) ⚫ Most of the time Cloud == Public Cloud ⚫ The promise of increased agility is real ⚫ The promise of cost savings is not-so-real ⚫ Cloud provider lock-in is the new vendor lock-in ⚫ SaaS is (sometimes) becoming the new license trap
size and footprint of applications ⚫ Distributed systems with loose coupling ⚫ Well-defined (REST) interfaces for communication ⚫ Containers to simplify deployment ⚫ Automation ⚫ Minimize human intervention to reduce risks ⚫ Ensure repeatability ⚫ Monitoring, Metering, Logging ⚫ Know what happens before it happens ⚫ Take proactive actions ⚫ Use of Managed Services ⚫ Use what the Cloud provider… provides (within reason)! ⚫ Object Storage (COG, FlatGeobuf, Zarr) ⚫ Event-driven processing (AWS Lambda, Azure Functions)
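The event-driven bullet above can be sketched in a few lines. This is a hedged, minimal stand-in for an AWS Lambda-style handler reacting to an "object created" notification on Object Storage: the event shape mimics (but simplifies) the S3 notification format, and `ingest_cog` is a hypothetical placeholder for real work such as registering a new COG with a serving layer.

```python
import posixpath

def ingest_cog(bucket: str, key: str) -> str:
    """Placeholder: register the new COG with the serving layer."""
    return f"registered s3://{bucket}/{key}"

def handler(event: dict, context=None) -> list:
    """React to new objects; only GeoTIFFs are ingested, others are skipped."""
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Only ingest files that look like GeoTIFFs
        if posixpath.splitext(key)[1].lower() in (".tif", ".tiff"):
            results.append(ingest_cog(bucket, key))
    return results

# Simplified example event with one COG and one unrelated file:
event = {"Records": [
    {"s3": {"bucket": {"name": "eo-data"}, "object": {"key": "scenes/a.tif"}}},
    {"s3": {"bucket": {"name": "eo-data"}, "object": {"key": "scenes/a.json"}}},
]}
print(handler(event))
```

The point is the shape of the pattern: no polling loop, no server to keep alive, just code that runs when data lands.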
by adding more resources without code refactoring (within reason…) ⚫ Horizontally rather than vertically ⚫ Agility ⚫ Faster time to market ⚫ I am surely missing something ...
exploit Object Storage, Events and so on ⚫ Hard to automate and observe ⚫ Simply put, it is up most of the time with no intervention ⚫ Hard to scale and evolve ⚫ Horizontally rather than vertically ⚫ Many more… “Legacy” Cloud (Geo) Architecture
data, openly available, is bigger than ever ⚫ Sentinel data, Landsat, Point Clouds, Buildings, OSM, … ⚫ Directly in the cloud, ready to be exploited ⚫ Object Storage ⚫ Traditional file systems do not scale, not even in cost! ⚫ Cloud formats to the rescue → COG, Zarr, FlatGeobuf, etc. ⚫ Discovery & Access ⚫ Clients may want to access data directly → COG in QGIS or the browser ⚫ STAC and OpenSearch to the rescue! ⚫ Exposing large catalogues in a discovery-friendly manner ⚫ DRI & ARD ⚫ Application Ready Data is key to lower exploitation barriers → once again STAC + COG is a good example ⚫ Decision Ready Information to drive informed decisions → the struggle is to extract the right information to drive decisions Data & Cloud Geospatial Architecture
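STAC + COG in practice: a client finds a scene via a STAC Item and range-reads the COG asset directly from Object Storage. The sketch below pulls the COG href out of a hypothetical, heavily trimmed STAC Item (real Items carry many more fields); the media-type convention `profile=cloud-optimized` is the one used by the STAC community to flag COGs.

```python
import json

# A minimal, hypothetical STAC Item; only the fields used below are shown.
item_json = """
{
  "type": "Feature",
  "id": "S2A_example_scene",
  "properties": {"datetime": "2021-06-01T10:00:00Z"},
  "assets": {
    "visual": {
      "href": "https://example-bucket.s3.amazonaws.com/S2A_example_scene/TCI.tif",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized"
    }
  }
}
"""

def cog_assets(item: dict) -> dict:
    """Return asset name -> href for assets advertised as Cloud Optimized GeoTIFF."""
    return {
        name: asset["href"]
        for name, asset in item.get("assets", {}).items()
        if "profile=cloud-optimized" in asset.get("type", "")
    }

item = json.loads(item_json)
print(cog_assets(item))
```

A desktop client like QGIS (or a browser library) can then open that href with HTTP Range requests and never download the full file.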
moving to REST ⚫ Interoperability is still key ⚫ Beware of the various lock-ins! ⚫ Discovery & Access ⚫ Did I already mention STAC, OpenSearch and COG? ⚫ Analytics VS data access ⚫ We need services to answer simple questions in a simple way, interactively, over huge datasets (Petabyte-size) ⚫ Notebooks, DAPA, H3, DGGS, GEE, MS AI for Earth, etc. ⚫ Serverless paradigm & automation ⚫ Cloud platforms allow us to reduce boilerplate code ⚫ Cloud platforms allow us to break hard dependencies ⚫ Cloud platforms allow us to concentrate on business-critical code ⚫ I am surely missing something important … Services & Cloud Geo Architecture
is common (e.g. Vector Tiles) ⚫ Analytics in the browser is becoming common (did I mention COG?) ⚫ WebGL is the key technology ⚫ Stories, Dashboards, anything but maps! ⚫ Dashboards or nothing! ⚫ Exploring new ways to visualize and make sense of geo data ⚫ ESRI paved the way ⚫ 3D & AR ⚫ Making sense of 3D capabilities in the browser (beyond layers on a globe) ⚫ Gaming engines opening up new possibilities (see below) ⚫ Digital Twins ⚫ Smart Cities on steroids ⚫ A pinch of everything (IoT, Big Data, 3D, Lidar, BIM) with a 3D client ⚫ Many more… UI & Cloud Geo Architecture
serve large amounts of data ⚫ Users want to access (visualize, analyze/process, download) such data ⚫ Fine-grained access permissions ⚫ Possible Scenarios ⚫ Publishing of EO time series ⚫ Publishing of Drone data time series ⚫ Publishing of Sensor Time Series ⚫ Publishing of MetOc or Atmospheric Models ⚫ Publishing of positions for moving objects ⚫ Publishing of related products ⚫ Any combination of the above… Data-as-a-service Platform
Data is added along one or more dimensions (TIME, ELEVATION, FlightUUID, etc.) ⚫ Data is (sometimes) removed as it falls out of a window of validity ⚫ Data is continuously added, rarely modified, occasionally removed ⚫ Data is the key element! ⚫ Ingestion time shall be minimized ⚫ Speed of serving is key ⚫ Data shall be ready-to-use Data-as-a-service Platform
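The "window of validity" idea above is simple to state precisely: granules indexed along the TIME dimension expire once they fall outside a retention window. The sketch below is illustrative (the function name and record shape are not from any real API):

```python
from datetime import datetime, timedelta

def expired(granule_times, now, window_days):
    """Return the granule timestamps that fell out of the retention window."""
    cutoff = now - timedelta(days=window_days)
    return [t for t in granule_times if t < cutoff]

# Hypothetical catalogue keeping roughly one year of data:
now = datetime(2021, 6, 9)
granules = [datetime(2021, 6, 8), datetime(2021, 5, 1), datetime(2020, 6, 1)]
print(expired(granules, now, window_days=365))  # only the 2020 granule expires
```

In a real platform this check drives automated cleanup, so the catalogue stays inside its validity window with no human intervention.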
+ MetOc data ⚫ Continuous ingestion of vector and raster data (100+ time series) ⚫ 1-year catalogue of data → aiming at 20 years ⚫ Private cloud with business continuity ⚫ Multistage environment with full automation (GitLab Pipelines) ⚫ Infrastructure and Data Configuration as code
global Geological data model ⚫ AWS deployment using managed Kubernetes cluster ⚫ Multistage environment with full automation (Jenkins) ⚫ for software, data and configuration ⚫ Monthly configuration and data release with QA cycle ⚫ GeoServer Cluster + Tile Caching on S3 ⚫ MapStore frontend with basic 3D capabilities
virtual machines → moving to Azure as we speak ⚫ Ship positions updated continuously → ingestion & enrichment system based on Kafka, 2500 positions/sec ⚫ Passive GeoServer Cluster with dedicated Backoffice instance → No tile caching! ⚫ Machines, Software and Configuration deployments are managed by a configuration management system (Puppet) and Pipelines (Jenkins) ⚫ Multistage environment (TEST, PREPROD, QA, PROD) ⚫ Infrequent configuration changes → Datadir versioned in a GitLab repository ⚫ OGC WMS & WFS to disseminate to third-party clients → interoperability is key ⚫ Currently relying on Oracle Exadata → moving to Azure PostgreSQL
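A very rough stand-in for the ingestion & enrichment flow above: raw AIS-style positions are queued, and a consumer enriches each record before it is stored. The real system uses Kafka topics; here a stdlib queue plays that role, and the record shape and the MMSI-to-name lookup are purely illustrative.

```python
import queue

# Hypothetical registry used to enrich raw positions with a ship name.
ship_registry = {"123456789": "MV Example"}

def enrich(record: dict) -> dict:
    """Attach the ship name to a raw position record (enrichment step)."""
    record["ship_name"] = ship_registry.get(record["mmsi"], "unknown")
    return record

# Producer side: raw positions land on the queue (Kafka topic stand-in).
positions = queue.Queue()
positions.put({"mmsi": "123456789", "lat": 43.1, "lon": 10.3})
positions.put({"mmsi": "000000000", "lat": 44.0, "lon": 9.8})

# Consumer side: drain, enrich, hand off to storage/serving.
enriched = []
while not positions.empty():
    enriched.append(enrich(positions.get()))

print([r["ship_name"] for r in enriched])  # ['MV Example', 'unknown']
```

Decoupling producer and consumer this way is what lets the real system absorb bursts while sustaining thousands of positions per second.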
here ⚫ Covers input data preparation ⚫ Covers Styling Optimization ⚫ Covers JVM Options tuning ⚫ Covers Configuration for robustness (resource limits and control flow) ⚫ Covers the basic info for tile caching ⚫ GeoServer in production presentations ⚫ Our Training material ⚫ Advanced GeoServer Configuration ⚫ Enterprise Set-up Recommendations ⚫ WE WON’T COVER THIS AGAIN → it is a precondition for what we talk about here
here ⚫ GeoServer characteristics, points of strength, limitations and things to know in general to drive the design ⚫ How to plan for scalability and high availability, focusing on our specific scenario ⚫ How to design a multi-environment setup to account for QA over software, data as well as configuration ⚫ Monitoring, Metering, Logging and Troubleshooting a GeoServer webinar ⚫ Planned for 29th of June ⚫ Registration here (it is free!)
GeoServer stores configuration in files ⚫ No automatic way to pick up config changes from files ⚫ Data can live in it, but we do not recommend it in enterprise setups ⚫ Manually messing with the configuration files is dangerous ⚫ Memory-bound configuration ⚫ GeoServer loads data configuration in memory at startup (the configuration, not the data itself) ⚫ GeoServer exposes GUI and REST endpoints to reload config when needed ⚫ Configuration reloading does not break OGC services ⚫ Configuration reloading blocks GUI and REST API
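GeoServer's REST API includes a reload endpoint (`POST /rest/reload`) that re-reads the configuration from the data directory. The sketch below only builds the request; the host and credentials are placeholders. In a passive cluster you would fire it at every production instance after promoting a new configuration.

```python
import base64
import urllib.request

def reload_request(base_url: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST to GeoServer's config reload endpoint."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{base_url}/rest/reload",
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )

# Placeholder host and default-style credentials, for illustration only:
req = reload_request("http://localhost:8080/geoserver", "admin", "geoserver")
print(req.get_method(), req.full_url)
# to actually send it: urllib.request.urlopen(req)
```

Keep in mind the caveat from the bullets above: while the reload runs, OGC services keep answering, but the GUI and REST API block.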
internal configuration not thread-safe → can handle high-volume parallel reads (e.g. GetMaps) but shall serialize writes (e.g. REST API POST calls) ⚫ Access to GUI and REST API on a single instance is serialized + GUI does not like load balancers ⚫ OGC Requests can go in parallel (actually MUST) ⚫ Make sure you move expensive operations outside configuration changes (large file uploads, importer tasks, etc.) ⚫ Default Java Opts & Config ⚫ Heap Memory must be tuned ⚫ JNDI & Connection Pools must be properly configured ⚫ Resource Limits must be properly configured ⚫ Control Flow must be installed and properly configured
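The control-flow module is configured through a `controlflow.properties` file in the data directory. The snippet renders a minimal example; the specific limits below are illustrative starting points taken from typical examples, not recommendations for your hardware, and must be tuned against your CPU count and workload.

```python
# Illustrative control-flow settings (key names follow the module's
# documented scheme; the numeric values are examples only).
controlflow = {
    "timeout": 60,          # drop requests queued for more than 60 seconds
    "ow.global": 100,       # max OGC requests running in parallel, globally
    "ow.wms.getmap": 10,    # max concurrent WMS GetMap requests
    "user": 6,              # max concurrent requests per user
}

# Render the Java-properties style file content:
properties = "\n".join(f"{k}={v}" for k, v in controlflow.items())
print(properties)
```

The rendered text is what would live in `controlflow.properties`; the module then queues or rejects excess requests instead of letting them exhaust the heap.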
With properly configured data and styles the bottleneck is usually the CPU, not the memory ⚫ Our reference dimensioning is 4 CPUs, 2 to 4 GB of HEAP ⚫ Do you have 1M+ layers? If no, 4GB is enough ⚫ Do you generate large PDF prints or PNG maps? If no, 4GB is enough ⚫ Do you have 8 or more CPUs? If no, 4GB is enough ⚫ GeoServer is slow ⚫ Are you expecting GeoServer to serve a 1TB striped BigTIFF with no overviews? ⚫ Are you trying to visualize 10M points from a corporate Oracle table? ⚫ Did you optimize the standard configuration? ⚫ Are you running PROD with the prototype cross-platform binary?
a single GeoServer instance with 2 CPUs, no caching, and you expect it to handle 200 req/sec? ⚫ Serving a large number of layers ⚫ Large usually means 50k or more ⚫ Startup/Reload times can grow (e.g. Oracle tables) ⚫ Heap Memory usage might grow ⚫ GetCapabilities documents become slow and hard to parse for clients (e.g. bloated 100MB+ files) ⚫ Partitioning with Virtual Services can help ⚫ Sharding on different instances can help
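Virtual services scope the capabilities document to a single workspace, so clients stop downloading one bloated global document: GeoServer serves workspace-scoped OGC endpoints under `{base}/{workspace}/wms`. The helper below just builds such a URL; the host and the `topp` workspace name are placeholders.

```python
from urllib.parse import urlencode

def virtual_capabilities_url(base_url: str, workspace: str) -> str:
    """Build a workspace-scoped (virtual service) WMS GetCapabilities URL."""
    params = urlencode({"service": "WMS", "version": "1.3.0",
                        "request": "GetCapabilities"})
    return f"{base_url}/{workspace}/wms?{params}"

url = virtual_capabilities_url("http://localhost:8080/geoserver", "topp")
print(url)
```

The returned document only lists the layers of that workspace, which keeps it small and fast to parse even on instances hosting tens of thousands of layers overall.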
implementations available ⚫ GeoSolutions one here ⚫ Official image coming soon ⚫ Advantages ⚫ We did some of the work for you ⚫ Flexible, portable and repeatable ⚫ Orchestrators can help ⚫ Disadvantages ⚫ Requires some prior knowledge ⚫ Debugging can be a headache
⚫ Requirements and Code ⚫ Configuration ⚫ Data? Not recommended ⚫ Monitor your containers ⚫ Centralized logging ⚫ Parametrize log and audit file paths ⚫ Sharing of files and directories is not implicit ⚫ Logging to stdout ⚫ File permissions ⚫ Watch your user IDs and GeoServer user permissions ⚫ Users on the host system are not the same as the ones in the container
Disk Quota ⚫ Control Flow ⚫ GeoFence ⚫ Security ⚫ DNS can help too ⚫ Parameterized Configuration ⚫ Database URLs ⚫ Usernames and Passwords ⚫ Backup & Restore Plugin ⚫ Ports changes between environments for you ⚫ No restart, possibility to dry-run ⚫ Possibility to filter per layer or workspace ⚫ Experimental but getting more mature
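The parameterized-configuration idea works like this: keep one configuration template and substitute per-environment values (database URL, credentials) at promotion time, so the same data directory moves unchanged from TEST to PROD. GeoServer has its own environment-parametrization mechanism; the stand-alone sketch below uses `string.Template` purely to illustrate the workflow, and all values are placeholders.

```python
from string import Template

# One template, shared by all environments:
datastore_template = Template(
    "host=$DB_HOST\nport=$DB_PORT\nuser=$DB_USER\npasswd=$DB_PASS\n"
)

# Per-environment values, e.g. injected by the CI pipeline (placeholders):
prod_env = {"DB_HOST": "prod-db.internal", "DB_PORT": "5432",
            "DB_USER": "geoserver", "DB_PASS": "change-me"}

rendered = datastore_template.substitute(prod_env)
print(rendered)
```

Because only the substituted values differ between environments, the promotion step becomes a pure automation task with no hand-editing of datastore files.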
the end users ⚫ Test code changes ⚫ Intranet vs Internet facing services ⚫ Allow multiple teams to work in parallel ⚫ How? ⚫ Automate migration between environments ⚫ Make your data directory portable ☺ ⚫ Use containers ⚫ Use backup and restore
out – Horizontal Scalability ⚫ Having more similar nodes in parallel ⚫ Natural fit for elastic computing environments ⚫ Autoscaling ⚫ Scaling up – Vertical Scalability ⚫ More HW resources to a single machine ⚫ Natural fit for legacy static environments ⚫ GeoServer can cope with both ⚫ Scaling up to 64 cores has been proven in the past ⚫ Scaling up requires fine tuning to be CPU bound rather than I/O bound as we seek CPU utilization ⚫ Scaling out has been done in K8s, AWS, Azure, GCP, etc. ⚫ Multiple strategies for scaling out
powerful HW ⚫ A single fine-tuned GeoServer will give you scalability but not availability ⚫ No autoscaling, configured for largest expected/handled load ⚫ HW is a hard bottleneck ⚫ Scaling out – Horizontal Scalability ⚫ Many smaller GeoServer instances working in parallel ⚫ Sharding and grouping by data/functionality is an option ⚫ Superior Scalability, Superior Availability ⚫ If autoscaling is allowed, no need to configure for the worst-case scenario ⚫ Mixed Approach ⚫ Multiple larger compute instances with multiple GeoServer instances → common in legacy virtualized environments
instances ignore each other ⚫ Active Clustering → GS instances talk to each other ⚫ Active Clustering ⚫ Config changes propagate between instances ⚫ Requires specific extensions (JMS Clustering, Hazelcast, Stratus, GeoServer Cloud) ⚫ More moving parts, more maintenance work! ⚫ Use it wisely ⚫ Passive Clustering ⚫ No special plugins ⚫ Config changes do not propagate → reload is required ⚫ No additional moving parts ⚫ Can cover 90% of use cases
administration ⚫ Changes via GUI or via REST Interface ⚫ Can do Active/Passive ⚫ Production instances are for data serving ⚫ No config changes ⚫ Can scale horizontally! ⚫ Data is centralized and shared between instances ⚫ Configuration promotion requires reload ⚫ With some tricks it can cover most use cases
in the data directory ⚫ GS loads its config in memory at startup ⚫ GS does not automatically pick up config changes from the data directory (needs explicit config reload via GUI or REST) ⚫ GeoServer GUI does not work well behind a randomizing load balancer ⚫ GS startup/reload times can be long with 10k+ layers ⚫ GS continuously writes to log files ⚫ GS TileCache can work in clustering
thinking about active clustering plugins make sure you need such a layout! ⚫ In 95% of cases Passive Clustering with Backoffice-Production is enough! ⚫ We will focus on Active Clustering in a future webinar
was born when cloud meant this → ⚫ It is somewhat a monolith (not really, read on…) ⚫ We can’t depend on any cloud provider ⚫ We still use the file system here and there ⚫ GeoServer is cloud-ready ⚫ It is highly modular, its footprint can be reduced a lot ⚫ It is known to run in AWS, Azure, GCP, OpenShift, IBM Cloud, etc. ⚫ It is known to run in K8s, Rancher, etc. ⚫ It can autoscale (CPU is the resource to look at) ⚫ It can use Object Storage (Tile Cache, COG, etc.) ⚫ Prefers compute-intensive instances ⚫ Likes Containers ⚫ Likes Automation! (Azure Pipelines, Jenkins, etc.)
takeaways ⚫ Data keeps growing, the cloud can help ⚫ Open Data has won ⚫ Interoperability is key (more than ever) ⚫ Automation is crucial (Azure Pipelines, Jenkins, etc.) ⚫ Leverage Cloud-Native Formats (did I mention COG?) ⚫ Beware of the cloud provider lock-in ⚫ SaaS is the new stovepipe/license trap