A DevOps perspective on GeoServer: Monitoring, Metering, Logging and Troubleshooting

In this presentation, our Lead DevOps Alessandro Parma, together with our Director Simone Giannecchini, will walk us through the key points for making a GeoServer deployment observable, giving you confidence that you have the tools to anticipate problems and are prepared to provide quick fixes.

Simone Giannecchini

June 29, 2021

  1. A DevOps perspective on GeoServer: Monitoring, Metering, Logging and Troubleshooting

    Alessandro Parma Simone Giannecchini Luis E. Bermudez GeoSolutions
  2. Contents

    ⚫ About us ⚫ What is GeoServer? ⚫ Where to start ⚫ GeoServer key facts ⚫ Analyzing your data & scenario ⚫ Common Mistakes ⚫ Real World Use Cases ⚫ Conclusions & Next Steps 29th of June 2021 - Online
  3. GeoSolutions

    ⚫ Founded in 2006, offices in Italy & US ⚫ Our core products ⚫ Our offer Enterprise Support Services Deployment Subscription Professional Training Customized Solutions GeoNode
  4. Trusted by more than 200 clients

    • UN FAO (CIOK, FIGIS, NRL, FORESTRY, ESTG), UN WFP, World Bank, DLR, EUMETSAT, JRC, ARPAT, NATO CMRE, UNESCO, IGAD, UNEP, etc. • BAYER, BASF, DigitalGlobe, MDA, TOPCON, SwissRE, e-GEOS, Halliburton, etc.
  5. Industries

    Smart Cities, Space, MetOcean, Defense, Natural Resources, OpenData, Utilities, Research, Emergency Response, Government
  6. Associations

    We strongly support Open Source, it is in our core. We actively participate in OGC working groups and get funded to advance new open standards. We support standards critical to GEOINT.
  7. Our Distinctive Traits

    ⚫ Lead Developers of GeoNode, GeoServer, MapStore and GeoNetwork ⚫ Vast experience with Raster Serving ⚫ Designed and developed JAI-Ext ⚫ Designed and developed ImageIO-Ext ⚫ Designed and developed most raster code in GeoTools/GeoServer ⚫ Vast experience with Vector Data Serving ⚫ WFS, WMS, Vector Tiles with OGV ⚫ Extensive experience with Spatial DBMS ⚫ Oracle, SQL Server, PostGIS, MongoDB, etc. ⚫ Extensive experience with creating WebGIS applications ⚫ OpenLayers, Leaflet, Cesium, MapboxGL ⚫ Ext-JS, JQuery, Bootstrap, Angular, React, Redux ⚫ Extensive experience with OGC Protocols ⚫ Extensive experience in Performance and Scalability (Big Data and Cloud) ⚫ Unparalleled multi-industry experience
  8. Team – Key Members

    30+ Staff Members, 25+ Software Engineers ⚫ Andrea Aime: GeoServer Project Steering Committee, GeoTools PMC, JAI-Tools Lead, ImageIO-Ext committer ⚫ Simone Giannecchini: Founder, GeoServer PSC, GeoTools PMC, ImageIO-Ext Architect, JAI-Tools founder ⚫ Luis E. Bermudez: US CEO since 2020. 2010-2020 OGC Exec, Director of the Innovation and Compliance Programs. ⚫ Giovanni Allegri: Senior Project Manager, QGIS, GeoNode ⚫ Alessio Fabiani: Founder, GeoServer PSC, GeoTools Committer, MapStore Committer ⚫ Emanuele Tajariol: GeoServer Committer, GeoTools Committer, GeoNetwork PSC ⚫ Mauro Bartolomeoli: GeoServer Committer, GeoTools Committer, GeoBatch Committer, MapStore Architect ⚫ Lorenzo Natali: MapStore2 Technical Lead
  9. We are hiring!

    ⚫ If you are fond of Open Source ⚫ If you are fond of GeoServer|GeoNode|MapStore|QGIS ⚫ If you like an international environment (both clients and colleagues) ⚫ If you like a challenging position ⚫ Send us your resume → multiple positions: ⚫ DevOps Engineer ⚫ Senior Frontend SW Engineer ⚫ Java SW Engineer ⚫ Support Engineer ⚫ Python SW Engineer
  10. What is GeoServer?

    ⚫ GeoSpatial enterprise gateway • Java Enterprise • Management and Dissemination of raster and vector data ⚫ Standards compliant • OGC WCS 1.0, 1.1.1 (RI), 2.0 • OGC WFS 1.0, 1.1 (RI), 2.0 • OGC WMS 1.1.1, 1.3.0 • OGC WPS 1.0.0 • OGC CSW 2.0.1 (ebRIM) ⚫ Google Earth/Maps support • KML, GeoSearch, etc.
  11. What is GeoServer?

    [Architecture diagram] Data sources: DBMS (PostGIS, Oracle, H2, DB2, SQL Server, GeoPackage, MySql, Spatialite, Elastic, MongoDB); vector files (Shapefile); raster files (GeoTIFF, ArcGrid, Img+world, Mosaic, MrSID, JPEG 2000, ECW, Pyramid, Oracle GeoRaster, PostGis Raster, NetCDF); servers (WFS, WMS). Outputs: WFS, raw vector data (Shapefile, GML2, GML3, GeoRSS, GeoJSON, CSV/XLS, GeoPackage); WMS, styled maps (PNG, GIF, JPEG, TIFF, GeoTIFF, SVG, PDF, KML/KMZ); WCS, raw raster data (GeoTIFF, ArcGrid, GTopo30, Img+World); WMTS, TMS (KML superoverlays, Google maps tiles, OGC tiles, OSGEO tiles); KML; WPS; CSW; ESRI REST.
  12. Terminology

    ⚫ Logging ⚫ Ability to collect, aggregate, parse, enrich, store and index logging events across the entire infrastructure ⚫ Application Logs → GeoServer Log, PostgreSQL Log… ⚫ Infrastructure Logs → Syslog, Access log… ⚫ Metering (Metrics Collection) ⚫ Ability to collect, enrich, store and index measurements of resource usage (HW or SW), health and availability that can be observed and collected across the entire infrastructure, without impacting the infrastructure itself! ⚫ HW Metrics → CPU Usage, Memory Usage, Disk Usage… ⚫ SW Metrics → Response Time, Throughput, Error Rate… ⚫ Instrumentation refers to deep metering of applications, performed only in certain conditions because it may impact the infrastructure itself
  13. Terminology

    ⚫ Alerting ⚫ Ability to trigger proper and timely actions whenever certain metrics hit certain thresholds or certain conditions are met ⚫ Conditions → something important happened (bad|not bad) ⚫ Actions → notifications, memory dump, restarts, banning, … ⚫ Active Element → raise human attention, replace human intervention… ⚫ Built on top of Metrics! ⚫ Monitoring ⚫ … the art and science of ensuring that an application both remains available and responds to user requests within an acceptable amount of time ⚫ The process of aggregating, visualizing and analyzing metrics and logs to improve application and infrastructure awareness and performance and enable automatic event response ⚫ Metering + Logging + Alerting → Monitoring
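The link between metrics, conditions and actions on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular alerting tool works; the metric names, thresholds and action labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """A condition on the current metrics plus the action it should trigger."""
    name: str
    condition: Callable[[Dict[str, float]], bool]
    action: str  # hypothetical action label, e.g. "notify" or "restart"


def evaluate(rules: List[Rule], metrics: Dict[str, float]) -> List[str]:
    """Return 'rule -> action' for every rule whose condition currently holds."""
    return [f"{r.name} -> {r.action}" for r in rules if r.condition(metrics)]


# Hypothetical thresholds, chosen only for illustration.
RULES = [
    Rule("disk low", lambda m: m["disk_free_pct"] < 10, "notify"),
    Rule("error rate high", lambda m: m["error_rate"] > 0.05, "notify"),
    Rule("response time high", lambda m: m["p95_response_s"] > 5, "notify"),
]

if __name__ == "__main__":
    sample = {"disk_free_pct": 7.5, "error_rate": 0.01, "p95_response_s": 6.2}
    for fired in evaluate(RULES, sample):
        print(fired)
```

Keeping the rule list short and the thresholds deliberate is one way to limit the alert fatigue mentioned later in the deck.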
  14. Terminology

    ⚫ Observability is key ⚫ Looking at components from the outside may not suffice ⚫ E.g. GeoServer audit log ⚫ Microservices help (to a certain degree) ⚫ Observer Effect → monitoring should have a negligible impact ⚫ Beware of Alert Fatigue ⚫ too many alerts ⚫ too many notifications ⚫ Mixing the correct metrics with logs will help us quickly derive decision-ready information and keep our system healthy and performant
  15. Resources

    ⚫ An Introduction to Infrastructure and Application Monitoring ⚫ Monitoring demystified: A guide for logging, tracing, metrics ⚫ Monitoring, Metering and Logging ⚫ Log Management ⚫ Gathering Metrics from Your Infrastructure and Applications ⚫ Metering, monitoring, and logging ⚫ Measuring DevOps ⚫ The Four Key Metrics of DevOps ⚫ The Complete Guide to Metrics, Monitoring & Alerting ⚫ Logging vs Monitoring: How are They Different & Why You Need Both ⚫ Log Management: What DevOps Teams Need to Know
  16. GeoServer strengths & limitations

    ⚫ GeoServer Data Directory ⚫ Where GeoServer stores configuration in files ⚫ No automatic way to pick up config changes from files ⚫ Data can live in it, but we do not recommend it in enterprise setups ⚫ Manually messing with the configuration files is dangerous ⚫ Memory-bound configuration ⚫ GeoServer loads the data configuration in memory at startup (the configuration, not the data itself) ⚫ GeoServer exposes GUI and REST endpoints to reload the configuration when needed ⚫ Configuration reloading does not break OGC services ⚫ Configuration reloading blocks the GUI and REST API
  17. GeoServer strengths & limitations

    ⚫ Global Configuration Locks ⚫ GeoServer internal configuration is not thread-safe → it can handle high-volume parallel reads (e.g. GetMaps) but must serialize writes (e.g. REST API POST calls) ⚫ Access to GUI and REST API on a single instance is serialized + GUI does not like load balancers ⚫ OGC Requests can go in parallel (actually MUST) ⚫ Make sure you move expensive operations outside configuration changes (large file uploads, importer tasks, etc.) ⚫ Default Java Opts & Config ⚫ Heap Memory must be tuned ⚫ JNDI & Connection Pools must be properly configured ⚫ Resource Limits must be properly configured ⚫ Control Flow must be installed and properly configured
  18. False Myths

    ⚫ GeoServer needs a lot of memory ⚫ With properly configured data and styles the bottleneck is usually the CPU, not the memory ⚫ Our reference dimensioning is 4 CPUs, 2 to 4 GB of heap ⚫ Do you have 1M+ layers? If no, 4GB is enough ⚫ Do you generate large PDF prints of PNG maps? If no, 4GB is enough ⚫ Do you have 8 or more CPUs? If no, 4GB is enough ⚫ GeoServer is slow ⚫ Are you expecting GeoServer to serve a 1TB striped BigTIFF with no overviews? ⚫ Are you trying to visualize 10M points from a corporate Oracle table? ⚫ Did you optimize the standard configuration? ⚫ Are you running PROD with the prototype cross-platform binary?
  19. False Myths

    ⚫ GeoServer is slow ⚫ You have deployed a single GeoServer instance with 2 CPUs and no caching, and you expect it to handle 200 req/sec? ⚫ Example → GeoSolutions Maps Server ⚫ WMS, WMTS ⚫ Cloudless Sentinel 2 ⚫ OpenStreetMap → multiple styles
  20. False Myths

    ⚫ Serving a large number of layers ⚫ Large usually means 50k or more ⚫ Startup times / reload times can grow (e.g. Oracle tables) ⚫ Heap Memory usage might grow ⚫ GetCapabilities documents become slow and hard to parse for clients (e.g. bloated 100MB+ files) ⚫ Partitioning with Virtual Services can help ⚫ Sharding on different instances can help
  21. Additional Resources

    ⚫ GeoServer in production webinar ⚫ Available here ⚫ Covers input data preparation ⚫ Covers Styling Optimization ⚫ Covers JVM Options tuning ⚫ Covers Configuration for robustness (resource limits and control flow) ⚫ Covers the basic info for tile caching ⚫ GeoServer Deployment Planning webinar ⚫ Available here ⚫ Covers Guidelines for proper deployment ⚫ Covers Scaling and Clustering ⚫ And more…
  22. Additional Resources

    ⚫ GeoServer in production presentations ⚫ FOSS4G 2016 ⚫ FOSS4G 2018 ⚫ Our Training material ⚫ Advanced GeoServer Configuration ⚫ Enterprise Set-up Recommendations
  23. Basics of GeoServer Logging

    ⚫ Logging ⚫ Collect and aggregate standard logs ⚫ Collect and aggregate audit logs ⚫ Collect and aggregate access logs ⚫ Metrics ⚫ Response time and throughput → aggregate, per layer, per service ⚫ Uptime ⚫ CPU & Memory usage → GeoServer is CPU intensive ⚫ Disk Space Usage → Running out of disk is a common issue ⚫ Errors & exceptions rate ⚫ Alerts ⚫ OOM errors | Service Down ⚫ Response time high | Error rate high ⚫ Disk low | Memory high | CPU high
  24. Application Container Logs

    ⚫ Apache Tomcat Example ⚫ catalina.out → Application lifecycle, startup issues ⚫ localhost.log → Host related information, in some cases webapp initialization errors ⚫ Manager logs → Tomcat Manager application ⚫ Access logs → All requests, one per line with timestamp and HTTP status codes
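Because access logs hold one request per line with its HTTP status, an error rate can be derived from them directly. The sketch below assumes Tomcat's "common" access log pattern (`%h %l %u %t "%r" %s %b`); a customized pattern would need a different regular expression. The sample lines are made up.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Tomcat's "common" access log pattern is: %h %l %u %t "%r" %s %b
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)$'
)


def summarize(lines: Iterable[str]) -> Tuple[Counter, float]:
    """Count responses per HTTP status and return the share of 5xx responses."""
    statuses = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # lines that do not follow the pattern are skipped
            statuses[int(match.group("status"))] += 1
    total = sum(statuses.values())
    errors = sum(n for status, n in statuses.items() if status >= 500)
    return statuses, (errors / total if total else 0.0)


# Made-up sample lines, for illustration only.
SAMPLE = [
    '192.0.2.10 - - [29/Jun/2021:10:15:32 +0000] "GET /geoserver/wms?request=GetMap HTTP/1.1" 200 34512',
    '192.0.2.11 - - [29/Jun/2021:10:15:33 +0000] "GET /geoserver/wfs?request=GetFeature HTTP/1.1" 500 812',
    '192.0.2.10 - - [29/Jun/2021:10:15:34 +0000] "GET /geoserver/wms?request=GetMap HTTP/1.1" 200 29870',
    '192.0.2.12 - - [29/Jun/2021:10:15:35 +0000] "GET /geoserver/missing HTTP/1.1" 404 -',
]

if __name__ == "__main__":
    counts, error_rate = summarize(SAMPLE)
    print(dict(counts), error_rate)
```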
  25. GeoServer Logs

    ⚫ Application Logs ⚫ Information is logged by GeoServer (GeoTools, GeoWebCache) itself ⚫ Can be quite verbose
  26. GeoServer Log Profiles

    ⚫ Log Profiles ⚫ Defined in GeoServer Datadir ⚫ Are rotated and pruned by GeoServer at each startup
  27. GeoServer Logs

    ⚫ Log files location ⚫ Avoid log file name clashing → Override default location ⚫ Avoid I/O waiting by using reasonably fast storage and setting the appropriate log level ⚫ Log Levels ⚫ Logging to stdout ⚫ QUIET / PRODUCTION → For Prod environments ☺ ⚫ GEOTOOLS_DEVELOPER → Data access level issues (database access, file system access, files corrupted / unavailable) ⚫ VERBOSE → Last resort, extremely verbose ⚫ Performance Hit ⚫ Logging exceptions can be time consuming
  28. GeoServer Audit Logging

    ⚫ Monitor Extension ⚫ Install in the WEB-INF/lib directory ⚫ Tracks requests made against a GeoServer instance ⚫ Can be stored in memory, on disk (audit files) or in a database ⚫ Configuration → Avoid name clashing → Move out of the data directory (GEOSERVER_AUDIT_PATH)
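One way to avoid name clashes when several instances share storage is to give each instance its own log and audit directory through JVM system properties. The sketch below builds such options in Python. `GEOSERVER_AUDIT_PATH` comes from the slide, `GEOSERVER_LOG_LOCATION` is the property GeoServer documents for overriding the log file location, and the base directory and per-instance layout are assumptions made for illustration.

```python
import socket
from pathlib import PurePosixPath


def logging_java_opts(instance: str, base_dir: str = "/var/log/geoserver") -> str:
    """Build JVM options giving one GeoServer instance its own log and audit paths.

    base_dir and the <base_dir>/<instance>/ layout are hypothetical choices.
    """
    root = PurePosixPath(base_dir) / instance
    return " ".join([
        f"-DGEOSERVER_LOG_LOCATION={root / 'geoserver.log'}",
        f"-DGEOSERVER_AUDIT_PATH={root / 'audits'}",
    ])


if __name__ == "__main__":
    # Using the host name as the instance name keeps paths unique per machine.
    print(logging_java_opts(socket.gethostname()))
```

The resulting string would typically be appended to the Java options of the application container; how that is done depends on the deployment.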
  29. GeoServer Metering

    ⚫ Measure performance and uptime of your services ⚫ Many tools and services available out there for Uptime ⚫ GeoHealthCheck is a specialized QoS checker ⚫ Open Source ⚫ Understands OGC services (GetCap, GetMap, GetFeature, …) ⚫ Measures the performance of the service “from the outside” ⚫ Easy to set up with few requirements ⚫ GeoServer Audits can also be used for passive metering ⚫ Fine-grained information about all requests hitting GeoServer ⚫ No uptime information ⚫ Measure the performance of the individual instance (needs to be aggregated) “from the inside” ⚫ A little harder to set up, and hardware requirements are significantly higher ⚫ Get started here
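Measuring "from the outside" boils down to issuing a request the way a client would and recording whether it succeeded and how long it took. The following is a minimal sketch of such a probe using only the Python standard library; it is not how GeoHealthCheck is implemented, and the URL in the usage example is a placeholder.

```python
import time
import urllib.request
from typing import Callable, Tuple


def probe(url: str, opener: Callable = urllib.request.urlopen,
          timeout: float = 10.0) -> Tuple[bool, float]:
    """Fetch url once and return (is_up, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        with opener(url, timeout=timeout) as response:
            response.read()  # include the body transfer in the timing
            up = 200 <= response.status < 400
    except OSError:  # URLError, HTTPError and socket timeouts are OSError subclasses
        up = False
    return up, time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder URL: replace with a GetCapabilities request on your own instance.
    print(probe("http://localhost:8080/geoserver/ows?service=WMS&request=GetCapabilities"))
```

Note that OGC services can report an error inside a response that still carries HTTP status 200, so a real checker would also inspect the body. The `opener` parameter exists only so the probe can be exercised without a network.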
  30. Logs Shipping

    ⚫ Centralize logs ⚫ Collect and display all logs in a single location ⚫ Troubleshoot issues in a timely fashion by having all the logs at hand ⚫ Search capability ⚫ Set up your own service or rely on third parties ⚫ Analyze logs ⚫ Ability to filter out the noise and look at relevant information ⚫ Enrich the logs ⚫ Shippers can add metadata and tags to the events to help filtering and drill down ⚫ On what machine is the application running ⚫ What OS version ⚫ …
  31. Analytics

    ⚫ From audit files to Analytics ⚫ Elastic Stack ⚫ Elasticsearch - Search Engine ⚫ Logstash - Ingest Pipeline ⚫ Kibana - Charts, Visualizations, Dashboards ⚫ Beats – Log Shipping ⚫ Open Source Projects ⚫ Shipping, Parsing and Enriching the Events ⚫ Visualizations and Dashboards ⚫ GeoIP
  32. GeoServer Analytics – Filtering

    ⚫ Events drill-down and debugging ⚫ Filter on ⚫ Error condition ⚫ Event time ⚫ Layer Name ⚫ Response time (AVG, Max, …) ⚫ IP of the requester ⚫ Location of the requester ⚫ Replay the request to reproduce the error ⚫ …
  33. GeoServer Analytics - Performance

    ⚫ Analyzing the performance of your cluster ⚫ Global Response Time ⚫ Identify slow layers ⚫ Cache Usage ⚫ Popular Layers ⚫ Bandwidth Usage over time ⚫ …
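Identifying slow and popular layers is, at its core, a group-by over request events. In the deck this is done with dashboards on top of the audit data; the Python sketch below only illustrates the aggregation itself. The event field names (`layer`, `response_ms`) are hypothetical and do not reflect the actual audit file schema.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def slowest_layers(events: Iterable[dict], top: int = 3) -> List[Tuple[str, float, int]]:
    """Return (layer, mean response time in ms, request count), slowest first."""
    times: Dict[str, List[float]] = defaultdict(list)
    for event in events:
        times[event["layer"]].append(event["response_ms"])
    ranked = sorted(
        ((layer, mean(values), len(values)) for layer, values in times.items()),
        key=lambda row: row[1],
        reverse=True,
    )
    return ranked[:top]


if __name__ == "__main__":
    # Made-up events, for illustration only.
    events = [
        {"layer": "roads", "response_ms": 100},
        {"layer": "dem", "response_ms": 1200},
        {"layer": "roads", "response_ms": 300},
        {"layer": "poi", "response_ms": 50},
    ]
    for row in slowest_layers(events):
        print(row)
```

Sorting the same rows by request count instead of mean response time gives the "popular layers" view.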
  34. Troubleshooting GeoServer

    ⚫ Troubleshooting ⚫ Despite all efforts bad things will happen → prepare for the worst! ⚫ Install the needed JDK tools → jstack, jmap ⚫ Quick access to application container and GeoServer logs ⚫ Temporarily raise the log level (careful, can be verbose) ⚫ Have a test environment to reproduce issues ⚫ Scale down your cluster if necessary ⚫ Direct access to the UI to review the configuration, make quick adjustments, change log level, isolate the node, …
  35. Troubleshooting GeoServer

    ⚫ Inspecting Logs ⚫ GeoServer logs ⚫ Configuration issues, Data access issues, …
  36. Troubleshooting GeoServer

    ⚫ Application Container logs ⚫ Application lifecycle info ⚫ Out of Memory errors ⚫ Startup issues
  37. Troubleshooting GeoServer

    ⚫ Startup Issues ⚫ Check Application Container logs ⚫ catalina.out and localhost.log ⚫ Did Tomcat start or not? Maybe the issue is not directly related to GeoServer ⚫ Check GeoServer logs ⚫ Catalog loading issues ⚫ Startup is taking a long time ⚫ Datadir is not accessible or broken
  38. Troubleshooting GeoServer

    ⚫ Connectivity Issues ⚫ Databases ⚫ Check that the remote service is up and running ⚫ Check the credentials ⚫ Manually try to connect from the same machine as GeoServer / from inside the GeoServer container using telnet or a DBMS-specific client like psql ⚫ Network or Shared Storage ⚫ Try to access the storage yourself ⚫ Is it readable / writable by GeoServer? ⚫ External / Cascaded Services ⚫ Are these slow / unreliable?
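When telnet or a DBMS client is not installed on the machine or inside the container, the same first-level check (can a TCP connection be opened at all?) can be done with the Python standard library. This is a sketch of that single check; it says nothing about credentials or database health, and the host and port in the usage example are placeholders.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure or timeout
        return False


if __name__ == "__main__":
    # Placeholder target: 5432 is the default PostgreSQL port.
    print(can_connect("localhost", 5432))
```

Run it from the same machine or container as GeoServer, since firewalls and network policies may differ between hosts.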
  39. Troubleshooting GeoServer

    ⚫ Machine issues ⚫ Disk space ⚫ Run “df -h” and check the file systems ⚫ Memory ⚫ Use “free -m” to check available memory on the system at this point in time ⚫ Look for signs of memory exhaustion in system logs and using the “dmesg” command. If the machine runs out of memory, the kernel OOM killer is likely to kill GeoServer ⚫ CPU ⚫ Use “top” or “htop” tools to monitor CPU usage ⚫ Inodes ⚫ Run “df -i” and check the number of inodes available in the file system
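The disk space and inode checks above can also be scripted, for example to feed a metric or an alert. The sketch below reads roughly the figures that "df -h" and "df -i" report for one file system; it is Unix-only because it relies on `os.statvfs`, and the default path is just an example.

```python
import os
import shutil


def disk_report(path: str = "/") -> dict:
    """Return free space and inode counts for the file system holding path (Unix only)."""
    usage = shutil.disk_usage(path)  # total, used and free space in bytes
    stats = os.statvfs(path)         # f_files / f_favail are inode counts
    return {
        "free_pct": 100.0 * usage.free / usage.total if usage.total else 0.0,
        "inodes_total": stats.f_files,
        "inodes_free": stats.f_favail,
    }


if __name__ == "__main__":
    print(disk_report("/"))
```

Some file systems report zero total inodes because they allocate them dynamically, so an inode alert should treat that case separately.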
  40. Troubleshooting GeoServer

    ⚫ jstack / jmap usage examples ⚫ Other resources ⚫ GeoServer Troubleshooting Documentation
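The slide's own jstack / jmap examples did not survive in this transcript. As a stand-in, the sketch below shows two commonly used invocations wrapped in Python helpers: a thread dump with lock information and a binary heap dump of live objects. The helper names and the output file name are hypothetical; the tool options themselves are standard JDK ones.

```python
import subprocess
from typing import List


def thread_dump_cmd(pid: int) -> List[str]:
    """jstack -l prints all thread stacks plus lock information."""
    return ["jstack", "-l", str(pid)]


def heap_dump_cmd(pid: int, out_file: str) -> List[str]:
    """jmap -dump writes a binary heap dump; 'live' keeps only reachable objects."""
    return ["jmap", f"-dump:live,format=b,file={out_file}", str(pid)]


def run(cmd: List[str]) -> str:
    """Run a JDK tool and return its stdout; raises if the tool fails or is missing."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    # 12345 is a placeholder for the PID of the JVM running GeoServer.
    print(" ".join(thread_dump_cmd(12345)))
    print(" ".join(heap_dump_cmd(12345, "geoserver-heap.hprof")))
```

Both tools generally need to run as the same user as the target JVM, and a live heap dump pauses the application while it is written, so it is best taken on a node that has been isolated from traffic.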
  41. Common Mistakes

    ⚫ Using the GS binary in prod? ⚫ Use the WAR, use an Application Server of your choice ⚫ Not enough HW resources ⚫ Your deployment has fewer cores than your laptop? ⚫ Data not optimized ⚫ Serving a 1TB GeoTIFF with no overviews? ⚫ Styling not optimized ⚫ Are you sure you need this much data at all zoom levels? ⚫ GeoServer not optimized ⚫ Did you tweak the Java opts? ⚫ Wrong Expectations ⚫ Speedy rendering of 10M points with 1 core? ⚫ Serving maps nationwide with a single instance on a VPS?
  42. Common Mistakes

    ⚫ Too many layers ⚫ Use ImageMosaic for time series data ⚫ Use Parametric SQL Views ⚫ Shard if there is no other way around ⚫ No Test / QA Environment ⚫ No possibility to experiment, everything happens in PROD ⚫ No monitoring, metering or logging ⚫ Do you like driving blindfolded? ⚫ No Caching ⚫ Tile Caching and HTTP Caching are crucial when possible ⚫ Choosing Memory Optimized Instances ⚫ The first bottleneck you hit with a properly configured GeoServer is the CPU!
  43. Common Mistakes

    ⚫ Logging badly or too much ⚫ Wrong logging level (e.g. verbose) ⚫ Unnecessary logging (log to files with containers, log to stdout without containers) ⚫ Having multiple instances write to the same log file ⚫ Unsupported JDK version ⚫ JDK 8 and JDK 11 for GeoServer 2.15+ ⚫ Corrupted / non-readable Datadir ⚫ Machine level issues ⚫ Not enough disk space ⚫ Not enough RAM ⚫ Networking issues
  44. Monitoring Infrastructure

    ⚫ Logs, Audits and Metrics are shipped or scraped by Filebeat and Prometheus ⚫ Enriched and parsed by Logstash ⚫ Centralized in Elasticsearch / Prometheus ⚫ Accessed via Kibana and Grafana ⚫ Alerting managed by Alertmanager, ElastAlert and GeoHealthCheck
  45. Conclusions

    ⚫ Proper monitoring is crucial for production deployment ⚫ Keep it in mind for your next deployment / think about how to integrate it in your existing system ⚫ There are many tools and technologies available ⚫ Use the GeoServer Monitor extension and aggregated audit files to get in-depth metrics and insights ⚫ Set up alerts to find out when something is not working as expected before your clients do