Deploying and operating GeoServer: a DevOps perspective (FOSS4G 2022 Edition)

Simone Giannecchini
August 31, 2022

Cloud computing is revolutionizing the way companies develop, deploy and operate software, and GeoSpatial software is no exception. With the benefits of cloud-based deployments ranging from cost savings to simplified management, flexibility, lower downtime and scalability in dynamic environments, it is easy to understand why more and more companies are migrating their on-premise systems to the cloud. Cloud-based setups, however, have their own set of hurdles and challenges.
The migration itself can be challenging, and monitoring, debugging and scaling applications are very different from what you may be used to.
In this presentation we will share the lessons we have learned at GeoSolutions in tackling these problems, along with some common patterns for migrating on-premise GeoServer clusters to the cloud. We'll share tips on how to:

migrate your existing GeoServer cluster to the cloud following best practices
gain insights into your GeoServer cluster using centralized logging and the Monitoring extension
avoid common bottlenecks to best set up a distributed, scalable GeoServer cluster
work with containers and container orchestrators like Kubernetes


  1. GeoSolutions

    • Offices in Italy & US, Global Clients/Team
    • 40+ collaborators, 30+ Engineers
    • Our products: GeoNode
    • Our offer: Enterprise Support Services, Deployment Subscription, Professional Training, Customized Solutions
  2. Affiliations

    • We strongly support Open Source; it is in our core
    • We actively participate in OGC working groups and get funded to advance new open standards
    • We support standards critical to GEOINT
  3. What is GeoServer?

    • GeoSpatial enterprise gateway
    • Manage and disseminate raster and vector data
    • Standards compliant
      • OGC WCS 1.0, 1.1.1 (RI), 2.0
      • OGC WFS 1.0, 1.1 (RI), 2.0
      • OGC WMS 1.1.1, 1.3.0
      • OGC WPS 1.0.0
      • OGC CSW 2.0.1 (ebRIM)
    • Google Earth/Maps support
      • KML, GeoSearch, etc.
  4. What is GeoServer?

    [Diagram: GeoServer data sources and outputs]
    • Sources: vector files (Shapefile), DBMS (PostGIS, Oracle, H2, DB2, SQL Server, GeoPackage, MySQL, SpatiaLite, Elastic, MongoDB), servers (WFS, WMS), raster files (GeoTIFF, ArcGrid, Img+World, Mosaic, MrSID, JPEG 2000, ECW, Pyramid, Oracle GeoRaster, PostGIS Raster)
    • Outputs: raw vector data via WFS (Shapefile, GML2, GML3, GeoRSS, GeoJSON, CSV/XLS, GeoPackage), styled maps via WMS (PNG, GIF, JPEG, TIFF, GeoTIFF, SVG, PDF, KML/KMZ), raw raster data via WCS (GeoTIFF, ArcGrid, GTopo30, Img+World), tiles via WMTS and TMS (KML superoverlays, Google Maps tiles, OGC tiles, OSGeo tiles), plus KML, WPS, CSW and ESRI REST
  5. App. Execution Technologies

    • How things changed over time
    • Bare Metal (non virtualized): buy or rent the HW and install everything from scratch. Single tenant
    • Virtual Machines: spin up a virtual server along with others running on an existing, virtualized server. Looks like a dedicated machine
    • Containers: run a single (?) application alongside others on an existing environment
  6. Bare Metal (no virt.)

    • Pros
      • Fast (no virt. layer)
      • Single tenant -> Secure
      • No “noisy neighbour” effect -> more stability
    • Cons
      • Limited flexibility & scalability
      • Billing, re-installs
      • Dimensioning is hard -> over-provision
      • More work for Operations
  7. Virtual Machines

    • Pros
      • Flexibility (Start, Stop, Snapshot, Template)
      • Fragmentation and resource usage
      • Reduces costs
    • Cons
      • Virtualization layer overhead
      • Noisy neighbour effect
      • Single failure impacts many services
  8. What is a Container

    • What is a container then?
    • Type of virtualization that happens at the operating system level
    • Applications can run in isolated user spaces called “Containers”
    • Implemented at the kernel level; multiple containers share the same kernel
  9. Containers

    • Pros
      • Application is bundled with dependencies
      • Shared Kernel (lower resource usage)
      • Easy installation and migration
      • High density with isolation
      • Startup time
    • Cons
      • Steep learning curve
      • Existing apps must be “containerized” first
      • Shared Kernel (security?)
  10. What is Kubernetes

    • Platform to manage containers and services
    • Originally developed by Google based on their experience with Borg
    • Manages a set of nodes to provide you with a platform to run the containers
    • Helps you manage and scale your applications
  11. Why is it relevant?

    • Traditional deployments
      • No resource boundaries → some applications starve for resources
      • Can’t easily reallocate resources after the initial setup
    • Virtual Machines
      • Multiple VMs on the same server → better resource utilization
      • Better isolation
      • Each VM has a copy of the OS
  12. Why is it relevant?

    • Containers
      • Shared kernel with isolated userspace
      • Each container has its own filesystem, a share of CPU cores
      • Decoupled from the underlying infrastructure → portable across distributions and cloud providers
      • …
  13. Why is it relevant?

    • …
      • Fast image creation and easy rollback compared to VMs → good fit for frequent deployments and CI/CD
      • Separation of concerns between Devs and Ops
      • Consistency across development in multiple environments
  14. Why Kubernetes?

    • Manage the containers that run your applications in production with no downtime
    • Takes care of running your application containers on a distributed system
    • Handles scaling of the application and of the cluster nodes, and failover
  15. Why Kubernetes?

    • Also
      • Service discovery and load balancing
      • Storage abstraction (mount storage of choice)
      • Auto rollouts and rollbacks. You describe the desired state for your containers
      • Self-healing: restarts failing containers, probes, ...
      • Secrets management. Externalize sensitive and env. specific info
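
The "desired state" and "self-healing" bullets above can be illustrated with a toy reconciliation loop. This is only a sketch in plain Python, not Kubernetes code: the `Cluster` class, the `reconcile` function and the `geoserver-N` naming are hypothetical names made up for the example, and a real orchestrator does far more (scheduling, probes, rollouts).

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for a cluster: maps a container name to its health flag."""
    containers: dict = field(default_factory=dict)


def reconcile(cluster: Cluster, desired_replicas: int, prefix: str = "geoserver") -> list:
    """Move the toy cluster towards the desired state; return the actions taken."""
    actions = []
    # Self-healing: restart containers that are marked unhealthy.
    for name, healthy in list(cluster.containers.items()):
        if not healthy:
            cluster.containers[name] = True
            actions.append(f"restart {name}")
    # Scale up: start replicas until the desired count is reached.
    index = 0
    while len(cluster.containers) < desired_replicas:
        name = f"{prefix}-{index}"
        if name not in cluster.containers:
            cluster.containers[name] = True
            actions.append(f"start {name}")
        index += 1
    # Scale down: stop surplus replicas, highest names first.
    for name in sorted(cluster.containers, reverse=True):
        if len(cluster.containers) <= desired_replicas:
            break
        del cluster.containers[name]
        actions.append(f"stop {name}")
    return actions
```

Calling `reconcile` repeatedly with the same target is a no-op once the cluster matches it, which is the point of describing a desired state rather than issuing one-off commands.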
  16. How does it compare to..

    • There are other orchestrators and tools available to manage containers
    • Docker Compose
      • Allows you to define services as collections, but that is pretty much it
    • Docker Swarm
      • Lets you work on a distributed environment
      • Services definition and commands are somewhat similar to Compose
      • Not as sophisticated (and complex!) as K8s
  17. Containerize GeoServer

    • Need a docker image
    • Use the ones available!
      • GeoSolutions here, sources here
      • Official Image available, check blogpost here
    • Ready to use
    • Based on Tomcat images
    • Configurable
  18. Helm

    • Package Manager for Kubernetes
    • Use “recipes” available online
    • Deployment unit is called a Chart
    • Charts can easily be published, versioned and shared
    • Provide templating for your Manifests
    • GeoSolutions Chart implementation available for preview: https://github.com/geosolutions-it/charts
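
To make the "templating for your Manifests" idea concrete, here is a minimal sketch of default values being merged with overrides and substituted into a manifest fragment. It uses Python's `string.Template` purely as a stand-in: Helm itself uses Go templates and a `values.yaml` file, and the manifest fragment, `DEFAULTS` and `render` below are hypothetical and not taken from the GeoSolutions chart.

```python
from string import Template

# Hypothetical manifest fragment; Helm uses Go template syntax, not "$name".
MANIFEST = Template(
    "kind: Deployment\n"
    "metadata:\n"
    "  name: $name\n"
    "spec:\n"
    "  replicas: $replicas\n"
)

# Stand-in for a chart's default values.
DEFAULTS = {"name": "geoserver", "replicas": 1}


def render(overrides=None):
    """Merge user overrides over the defaults and fill in the template."""
    values = {**DEFAULTS, **(overrides or {})}
    return MANIFEST.substitute(values)
```

For example, `render({"replicas": 3})` yields the same manifest with only the replica count changed, which is how one chart can serve several environments.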
  19. Resources

    • What is Kubernetes
    • Borg: The Predecessor to Kubernetes
    • Containerization
    • What is a Container
    • Docker Compose
    • Docker Swarm
    • Rancher
  20. Best Practices …

    • Start simple
      • Set up a local environment (Minikube)
      • Use available Images and Charts, don’t reinvent the wheel
      • Use managed K8s
    • Plan ahead
      • Design your infrastructure before moving on with deployment
      • Clustering strategies, Storage Classes, Volumes sizing
      • Implement a logging and a monitoring solution
  21. … Best Practices

    • Choose the right nodes
      • No need for tons of RAM
      • Compute-optimized nodes perform better (~4 CPUs/instance)
  22. Choose Storage - Block Storage

    • Local or Block Storage
      • Local
      • Non Shared
      • Can fail
      • Low latency
      • Good fit for temporarily storing logs and audits
      • Cached Tiles?
  23. Choose Storage - File Share

    • File Share
      • File Storage ~ NFS
      • Shared (locally)
      • Scales well
      • Typical use case is GeoServer datadirs
      • Spatial Data
      • Cached tiles?
  24. Choose Storage - Blob Storage

    • Blob Storage
      • Max scalability
      • Elastic
      • Cheap
      • High Latency
      • Shared
      • Robust
      • Cached Tiles?
  25. EO Data Dissemination

    • Earth Observation
    • Meteorological and Oceanographic data
    • Continuous data ingestion flordata
    • TBs of data growing over time. Raster and Vector
  26. Monitoring and Logging

    • Can be tricky in containerized environments!
    • Dynamic and distributed
      • Instances are spinning up and down
      • Distributed across multiple nodes
    • Hard to identify problems
    • What can we do?
  27. Centralize and Aggregate

    • Single central location
    • Easy to navigate and filter
    • New nodes can spawn but also go away
    • You’ll need “shippers” to collect and send out logs to the central service
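
The role of a "shipper" can be sketched as follows: read raw log lines on a node, tag each one with where it came from, and hand batches to the central service. This is an illustrative sketch only. The field names (`pod`, `node`, `message`), the batch size and both functions are assumptions for the example, and in practice an off-the-shelf log shipper would normally be used rather than hand-written code.

```python
import json
from typing import Iterable, Iterator


def to_record(line: str, pod: str, node: str) -> str:
    """Wrap one raw log line in a JSON envelope that names its origin."""
    record = {"pod": pod, "node": node, "message": line.rstrip("\n")}
    return json.dumps(record, sort_keys=True)


def ship(lines: Iterable[str], pod: str, node: str, batch_size: int = 2) -> Iterator[list]:
    """Yield batches of JSON records; a real shipper would send each batch out."""
    batch = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        batch.append(to_record(line, pod, node))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, possibly smaller, batch
```

Tagging every record with its pod and node is what keeps logs attributable after the instance that produced them has gone away.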
  28. Auditing

    • GeoServer can produce audit files for you
      • Monitoring extension
      • Tracks requests made to GeoServer
    • Collect and ingest them to create pretty Dashboards
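
As a rough idea of what "collect and ingest" could feed into a dashboard, the sketch below aggregates request counts and mean response times per service. It assumes the audit entries have already been parsed into dictionaries; the keys `service` and `total_time_ms` are hypothetical and do not reflect the actual audit file format written by the Monitoring extension.

```python
from collections import defaultdict


def summarize(records):
    """Return {service: {"requests": n, "mean_ms": average}} from parsed audit records."""
    totals = defaultdict(lambda: {"requests": 0, "time_ms": 0.0})
    for record in records:
        entry = totals[record["service"]]
        entry["requests"] += 1
        entry["time_ms"] += record["total_time_ms"]
    return {
        service: {
            "requests": entry["requests"],
            "mean_ms": entry["time_ms"] / entry["requests"],
        }
        for service, entry in totals.items()
    }
```

In a real deployment this kind of aggregation would more likely be done by the central logging stack itself; the function only shows the shape of the computation.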
  29. Pointers

    • GeoSolutions Website https://geosolutionsgroup.com/
    • GeoServer on Kubernetes https://www.geosolutionsgroup.com/blog/devops-k8s/
    • A DevOps perspective on GeoServer https://www.geosolutionsgroup.com/blog/devops-geoserver-monitoring-metering/
    • Cloud Optimized GeoTiffs https://www.cogeo.org/ https://docs.geoserver.org/latest/en/user/community/cog/index.html
    • Helm https://helm.sh/