Supporting precision farming with GeoServer: past experiences and way forward

This presentation condenses 10 years of GeoSolutions experience in ingesting, managing and disseminating data at scale in the cloud with GeoServer for the precision farming industry, covering items like:
- Proper optimizations and organization of raster data
- Proper optimizations and organization of vector data
- Modeling data for performance & scalability in GeoServer and PostGIS
- Deployment guidelines for performance and scaling GeoServer
- Styling to create NDVI and other visualizations on the fly

At the end of the presentation, attendees will be able to properly design and plan a GeoServer deployment to serve precision farming data at scale.

Simone Giannecchini

August 29, 2022

Transcript

  1. GeoSolutions Enterprise Support Services Deployment Subscription Professional Training Customized Solutions

    GeoNode • Offices in Italy & US, Global Clients/Team • 40+ collaborators, 30+ Engineers • Our products • Our Offer
  2. Affiliations We strongly support Open Source, it is in our

    core We actively participate in OGC working groups and get funded to advance new open standards We support standards critical to GEOINT
  3. GeoServer • GeoSpatial enterprise gateway − Modular & extensible −

    Management & dissemination of raster and vector data • Cloud/ Big Data friendly − Cluster deployments in AWS, Azure, On Premise − Powering petabytes-size data (EO, MetOcean, etc.) − Serving up to 10k+ requests per second • Standards compliant − OGC WCS 1.0, 1.1.1 (RI), 2.0.1 − OGC WFS 1.0, 1.1 (RI), 2.0 − OGC WMS 1.1.1, 1.3 − OGC WPS 1.0.0 − OGC CSW 2.0.2 • Google Earth/Maps support • License is GPL v2.0 More Information
  4. GeoServer GeoServer WFS WMS PostGIS Oracle H2 DB2 SQL Server

    GeoPackage MySQL SpatiaLite Elastic MongoDB Shapefile WFS WMS WMTS PNG, GIF JPEG TIFF, GeoTIFF SVG, PDF KML/KMZ Shapefile GML2 GML3 GeoRSS GeoJSON CSV/XLS GeoPackage Raw vector data Servers Styled maps DBMS Vector files WCS GeoTIFF WMS ArcGrid Img+world Mosaic MrSID JPEG 2000 ECW, Pyramid, NetCDF,... Raster files Raw raster data GeoTIFF ArcGrid GTopo30 Img+World WMTS, TMS KML superoverlays Google maps tiles OGC tiles OSGEO tiles KML WPS CSW ESRI REST
  5. Key Concepts • Supporting farming decision makers • with consuming

    various types of data • EO Data • Drone Imagery • Field Sensors (e.g. Meteo Stations) • Vehicles Positions & Sensors • Meteorological Models • More (not related to our work) • from different deployments • (Private|Public) Cloud • On Prem
  6. Key Concepts • via Heterogeneous types of platforms • All-in-one

    DAAS solution to farmers • EO focused platform • IOT focused platform for vehicle data • IOT focused platform for field sensors • Various combinations of the above • set up by entities with different backgrounds & objectives • Large Corp. focusing solely on Precision Farming • Subsidiaries|Branches of large Corp. focusing solely on Precision Farming • Startups focusing solely on Precision Farming
  7. Key Concepts • Challenging environments to support • Lots of

    different data • Continuously ingested • & processed into new data, alerts, charts, etc.. • We are going to address some challenges & Scenarios • Serving EO Data • Serving Drone Data • Serving IOT Like Data • Deploying & Operating GeoServer
  8. Serving EO Data • Typical EO scenario • Multispectral data

    (Sentinel, Landsat, Planet, etc…) • Hyperspectral data (Hyperion, EnMap, etc..) • Rarely SAR data • Focus on • RGB data → mostly pure visualization • Indexes → mostly band algebra like NDVI and friends • Deep time series for comparison and analysis • Continuous ingestion is crucial
  9. Serving EO Data Hyperion 4 years MSAVI Sentinel 2 Time

    Series Multiple Indexes with Sentinel, Landsat and Planet Data
  10. Typical Mistakes and how to avoid them • Poor Data Preprocessing|Optimization

    • GeoServer is slow at serving index XXX out of my uncompressed, untiled, multiband GeoTIFF, why? • Rule #1 → preprocess|optimize data at ingestion • Rule #2 → preprocess|optimize data at ingestion (not a typo, just reinforcing) • Hint #1 → GDAL is your friend|Swiss Army knife • Hint #2 → Automate! Airflow, Lambda, etc.. • Suggestions • GeoTIFF is usually the best format (COG later on) and it is free! • Compression, Tiling and Overviews are crucial → spend some CPU cycles early to get performance later on • Do not fear lossy compression when possible • Do not be afraid of sacrificing space for performance if needed
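    The compression, tiling and overview advice above can be sketched as a small command builder. This is a minimal illustration, not the presenters' pipeline: the file names, the 512 block size and the DEFLATE choice are assumptions, and the commands are only assembled here, not executed.

```python
from typing import List


def optimize_commands(src: str, dst: str, block: int = 512) -> List[List[str]]:
    """Build GDAL command lines that tile, compress and add overviews.

    Nothing is executed here. On a machine with GDAL installed, each
    command can be passed to subprocess.run(cmd, check=True).
    """
    translate = [
        "gdal_translate",
        "-co", "TILED=YES",               # inner tiling instead of strips
        "-co", f"BLOCKXSIZE={block}",     # tile width (assumed value)
        "-co", f"BLOCKYSIZE={block}",     # tile height (assumed value)
        "-co", "COMPRESS=DEFLATE",        # lossless; JPEG is a lossy option for RGB
        src,
        dst,
    ]
    # Overviews are added to the optimized output, at halving resolutions.
    overviews = ["gdaladdo", "-r", "average", dst, "2", "4", "8", "16"]
    return [translate, overviews]


if __name__ == "__main__":
    for cmd in optimize_commands("scene.tif", "scene_optimized.tif"):
        print(" ".join(cmd))
```

    Building the commands as lists keeps them easy to hand to an automation tool (Airflow task, Lambda function) without shell quoting issues.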
  11. Typical Mistakes and how to avoid them • Resources on

    data preprocessing • Our training material • Best Practices for Optimizing Performance with GeoServer (deck, video) • A DevOps perspective on GeoServer: Deployment planning guidelines, (video, deck, FOSS4G2021) • Precision Farming with GeoServer, GeoSolutions Recommendations-v01.01 (beware, long document!) • GeoServer on steroids presentations from FOSS4G • A ton of GDAL related material you can find on the web!
  12. Typical Mistakes and how to avoid them • Poor data organization in GeoServer

    • My GeoServer does not scale as I expected, despite proper data preprocessing. It takes minutes to (re)start, why? • Fact #1 → if you have thousands of layers, you need to reorganize data quickly • Fact #2 → if you have millions of layers, you are doomed! • Suggestions • Conceptually organize EO data by common sensor, data type, etc.. • Map this to multidimensional ImageMosaic → break the 1 scene 1 layer circle • Use REST API to automate data management → no need to reload configuration, can use simple passive clustering • 99% of the time all you need are a handful (<100) of multidimensional ImageMosaic layers!
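    As a hedged sketch of the REST automation mentioned above, the snippet below prepares (but does not send) a request that registers an already optimized file as a new granule of an existing ImageMosaic store, using GeoServer's `external.imagemosaic` REST endpoint. The base URL, workspace, store name and file path are hypothetical, and authentication is left out.

```python
import urllib.request


def add_granule_request(base_url: str, workspace: str, store: str,
                        granule_path: str) -> urllib.request.Request:
    """Prepare a POST that adds one granule to an ImageMosaic store.

    The request is only built; send it with urllib.request.urlopen(req)
    after adding credentials for your deployment.
    """
    url = (f"{base_url}/rest/workspaces/{workspace}"
           f"/coveragestores/{store}/external.imagemosaic")
    return urllib.request.Request(
        url,
        data=granule_path.encode("utf-8"),       # location of the file, as seen by GeoServer
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


if __name__ == "__main__":
    req = add_granule_request(
        "http://localhost:8080/geoserver",       # hypothetical instance
        "farm", "s2_ndvi",                       # hypothetical workspace and store
        "file:///data/s2/ndvi_20220801.tif",     # hypothetical granule
    )
    print(req.get_method(), req.full_url)
```

    One store and one layer keep growing this way, so the number of layers stays constant while scenes are ingested.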
  13. Typical Mistakes and how to avoid them • Resources

    • A DevOps perspective on GeoServer: Deployment planning guidelines, (video, deck) • Precision Farming with GeoServer, GeoSolutions Recommendations-v01.01 (beware, long document!)
  14. Typical Mistakes and how to avoid them • Poor caching

    • My GeoServer does not scale as I expected, despite proper data preprocessing and organization, why? • Rule number 1 → preprocess|optimize data at ingestion (not a typo, just reinforcing) • Rule number 2 → introduce tile caching • Rule number 3 → introduce HTTP caching (if possible) • Suggestions • Once data is optimized and all, focus on caching, not before! • Tile Caching comes at a cost, make sure you can reuse tiles multiple times otherwise avoid caching • HTTP Caching is super important but also tricky, evaluate carefully the impact before adopting • Tile|HTTP Caching works nicely with URLs whose content does not change over time
  15. Typical Mistakes and how to avoid them • Resources

    • A DevOps perspective on GeoServer: Deployment planning guidelines, (video, deck) • Precision Farming with GeoServer, GeoSolutions Recommendations-v01.01 (beware, long document!)
  16. More Goodies: COG • Cloud Optimized GeoTIFF • EFS costs

    are killing us, what can we do? • COG is (usually) the answer • GeoServer supports COG as single GeoTiff or ImageMosaic • We support S3, Google Cloud Storage and (soon) Azure • Missing bits • Local caching • Support for Binary Masks • Before you ask • Object Storage gives you better scalability at a fraction of the price → you might sacrifice some raw speed!
  17. More Goodies: COG • Resources • Modern Cloud Geospatial Architectures

    Survey • Best Practices for Optimizing Performance with GeoServer (deck, video) • Serving earth observation data with GeoServer: COG, STAC, OpenSearch and more • GeoServer docs • Many other resources on the web
  18. More Goodies: ImageMosaic & Time Series • About data •

    Data is added to or removed from GS • Data is rarely updated • E.g. data from EO, MetOc, Drones, IOT Sensors • Refrain from creating millions of layers, it won’t scale → organize your data properly • Use ImageMosaic for rasters if possible • Use TIME and other dimensions • Use CQL_Filter • Keep # of layers low if not constant • Manage data via REST API → no configuration reload • Use single tables for vector if possible • Use TIME and other dimensions • Use CQL_Filter or Parametric SQL View • Keep # of layers low if not constant as you ingest data
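    To illustrate how a single layer can serve many acquisitions, here is a minimal sketch that builds a WMS GetMap URL selecting one time step with `TIME` and narrowing granules with `CQL_FILTER`. The layer name, the `sensor` attribute and the bounding box are assumptions made up for the example.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def getmap_url(base_url, layer, time, cql_filter, bbox, width=512, height=512):
    """Build a WMS 1.1.1 GetMap URL with TIME and CQL_FILTER parameters."""
    params = {
        "service": "WMS",
        "version": "1.1.1",
        "request": "GetMap",
        "layers": layer,
        "styles": "",
        "srs": "EPSG:4326",
        "bbox": ",".join(str(v) for v in bbox),   # minx,miny,maxx,maxy
        "width": width,
        "height": height,
        "format": "image/png",
        "time": time,                             # dimension value, ISO 8601
        "cql_filter": cql_filter,                 # extra attribute filter
    }
    return f"{base_url}/wms?{urlencode(params)}"


if __name__ == "__main__":
    url = getmap_url(
        "http://localhost:8080/geoserver",        # hypothetical instance
        "farm:ndvi",                              # hypothetical mosaic layer
        "2022-08-01T00:00:00Z",
        "sensor='S2'",                            # hypothetical attribute
        (10, 43, 11, 44),
    )
    print(parse_qs(urlparse(url).query)["time"])
```

    Because the time and filter live in the request rather than in the layer list, ingesting new data does not add layers.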
  19. More Goodies: ImageMosaic & Time Series • Resources • Best

    Practices for Optimizing Performance with GeoServer (deck, video) • Serving earth observation data with GeoServer: COG, STAC, OpenSearch and more • Our training material • Precision Farming with GeoServer, GeoSolutions Recommendations-v01.01 (beware, long document!) • GeoServer on steroids presentations from FOSS4G
  20. More Goodies: Map Algebra with Jiffle • Map algebra for

    GeoServer • Simple expressions, but also decisions, loops, arrays, variables • Reduce time to publishing • Compute products on the fly • Add new products without writing code • Docs • Use it in • Straight WPS calculation • As part of styles • Large computed downloads (also WPS)
  21. More Goodies: Map Algebra with Jiffle • Normalized Difference Vegetation

    Index (Hyperion)

    <Transformation>
      <ogc:Function name="ras:Jiffle">
        <ogc:Function name="parameter">
          <ogc:Literal>coverage</ogc:Literal>
        </ogc:Function>
        <ogc:Function name="parameter">
          <ogc:Literal>script</ogc:Literal>
          <ogc:Literal>
            nir = src[46];
            vir = src[31];
            dest = (nir - vir) / (nir + vir);
          </ogc:Literal>
        </ogc:Function>
      </ogc:Function>
    </Transformation>
    <Rule>
      <RasterSymbolizer>
        <Opacity>1.0</Opacity>
        <ColorMap>
          <ColorMapEntry color="#000000" quantity="-1"/>
          ….
          <ColorMapEntry color="#00ff00" quantity="1"/>
        </ColorMap>
      </RasterSymbolizer>
    </Rule>
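    The band algebra in the Jiffle script can be checked by hand with a few lines of Python. This is only a per-pixel illustration of the same formula, not how GeoServer evaluates it; returning NaN when both bands are zero is a choice made for this sketch.

```python
import math


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index for one pixel.

    Mirrors dest = (nir - vir) / (nir + vir) from the script above.
    Returns NaN when the denominator is zero (an assumption of this sketch).
    """
    denominator = nir + red
    if denominator == 0:
        return math.nan
    return (nir - red) / denominator


if __name__ == "__main__":
    # Healthy vegetation reflects strongly in NIR and weakly in red.
    print(round(ndvi(0.5, 0.1), 4))
    # Equal reflectance in both bands gives a neutral index.
    print(ndvi(0.2, 0.2))
```

    Values fall between -1 and 1, which is why the color map in the style runs from quantity -1 to quantity 1.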
  22. Serving Drone Data • Typical Drone scenario • Multispectral data

    • Hyperspectral data • Downstream Products → vector and raster • Focus on • RGB data → mostly pure visualization • Downstream Products → analysis & decision making • Time series for comparison and analysis • Ad-hoc or Periodic ingestion • Heavy automated processing • Data size can be challenging
  23. Serving Drone Data • Most (all?) the previous recommendations are

    valid • Optimize data and product before serving • Use COG • Use Tile Caching • ImageMosaic is still crucial • Organize conceptually your data: same data type, same CRS, etc.. • Use ImageMosaic with dimensions: acquisition time, flightUUID, anything • Decouple # of layers from # of acquisitions • Caching is crucial
  24. Serving Drone Data • What about vector products? • Many

    products are vector datasets • We need to be able to relate them to raw raster data and raster products • SQL Views & Table Partitioning to the rescue • Create a single table in PostGIS • add pivot attributes to filter on later, like acquisition time, flightUUID, anything • use SQL views in GeoServer to decouple # of layers from # of acquisitions • ingest directly in the database, no changes to the GeoServer configuration!
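    A hedged sketch of the parametric SQL view idea: one table, one layer, and a `%flight%` placeholder that GeoServer substitutes from the `viewparams` request parameter. The table and column names are hypothetical; the helper only formats the `viewparams` value, escaping commas and semicolons with a backslash as the GeoServer documentation asks.

```python
# SQL view definition as it would be entered in GeoServer (hypothetical schema).
SQL_VIEW = (
    "SELECT id, geom, acquired_at, flight_uuid "
    "FROM drone_products "
    "WHERE flight_uuid = '%flight%'"
)


def viewparams(**params: str) -> str:
    """Format key/value pairs as a viewparams string: k1:v1;k2:v2."""
    def escape(value: str) -> str:
        # Commas and semicolons are separators, so they must be escaped.
        return value.replace(",", "\\,").replace(";", "\\;")

    return ";".join(f"{key}:{escape(value)}" for key, value in params.items())


if __name__ == "__main__":
    print(SQL_VIEW)
    # Appended to a WMS or WFS request as &viewparams=...
    print(viewparams(flight="3f2b9c1e"))
```

    New acquisitions are plain INSERTs into the table; the layer definition does not change. In a real setup the parameter should also get a validation regular expression in the SQL view configuration to guard against SQL injection.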
  25. Serving Drone Data • Mosaicking datastore is under development! •

    Ability to create a seamless vector store in GeoServer • which can index products stored separately (flatgeobuf?) • and works similarly to ImageMosaic to support dimensions like acquisition time, flightUUID, anything • to decouple # of layers from # of acquisitions • Simplify ingestion & reduce cost • This solution does not fit all use cases • Secondary store might not support advanced filtering
  26. Serving IOT Data • Typical Scenario • Vehicle telemetries •

    Vehicle sensors data • Seeds planting information • Huge amounts of points & trucks • Focus on • RGB data → mostly pure visualization • Continuous ingestion
  27. Serving IOT Data • SQL Views & Table Partitioning to

    the rescue • Create a single table in PostGIS • add pivot attributes to filter on later, like acquisition time, flightUUID, anything • use SQL views in GeoServer to decouple # of layers from # of acquisitions • ingest directly in the database, no changes to the GeoServer configuration! • Potential issues • Operating a large PostGIS cluster • Cost (AWS, Azure)
  28. Serving IOT Data • ElasticSearch to the rescue • Manage

    billions of points, shard data and scale queries linearly • Add extra hardware as needed, automatic resharding • Very fast aggregation of point to build summary views (GeoHash gridding, Tile gridding) • Potential issues • Operating ES • Cost (AWS, Azure) • Mosaicking Datastore?
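    The GeoHash gridding mentioned above corresponds to Elasticsearch's `geohash_grid` aggregation. Below is a minimal sketch of such a query body; the field name `location` and the bounding box are assumptions, and the dictionary is only built, not sent to a cluster.

```python
import json


def geohash_grid_query(field, precision, top_left, bottom_right):
    """Build a search body that buckets points into geohash cells.

    top_left and bottom_right are (lat, lon) tuples limiting the area.
    """
    if not 1 <= precision <= 12:
        raise ValueError("geohash precision must be between 1 and 12")
    return {
        "size": 0,  # only the aggregation is wanted, not the raw points
        "query": {
            "geo_bounding_box": {
                field: {
                    "top_left": {"lat": top_left[0], "lon": top_left[1]},
                    "bottom_right": {"lat": bottom_right[0], "lon": bottom_right[1]},
                }
            }
        },
        "aggs": {
            "grid": {"geohash_grid": {"field": field, "precision": precision}}
        },
    }


if __name__ == "__main__":
    body = geohash_grid_query("location", 5, (45.0, 9.0), (44.0, 10.0))
    print(json.dumps(body, indent=2))
```

    Each returned bucket carries a cell key and a document count, which is enough to draw a summary view without moving individual points.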
  29. Serving IOT Data • Rasterization to speed up serving? •

    Rasterize common visualizations • Use vector data visualization for other • Caching is crucial
  30. Serving IOT Data • SLD Service • Generate classified styles

    based on data • Methods: equalInterval, uniqueInterval, quantile, jenks, equalArea, standardDeviation • Provide custom colors • Clip by standard deviation • Works both on raster and vector • Work on large rasters by statistical sampling MapStore driving SLDService
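    To make two of the listed classification methods concrete, here is a small illustration of equal-interval and quantile breaks in plain Python. It shows the idea behind the methods only; it is not SLDService's implementation, and the sample yields are made up.

```python
import statistics
from typing import List, Sequence


def equal_interval_breaks(values: Sequence[float], classes: int) -> List[float]:
    """Class boundaries that split [min, max] into equally wide classes."""
    low, high = min(values), max(values)
    step = (high - low) / classes
    return [low + i * step for i in range(classes + 1)]


def quantile_breaks(values: Sequence[float], classes: int) -> List[float]:
    """Inner cut points so each class holds roughly the same number of values."""
    return statistics.quantiles(values, n=classes)


if __name__ == "__main__":
    yields = [0, 2, 3, 3, 4, 7, 10]            # hypothetical yield samples
    print(equal_interval_breaks(yields, 5))    # 6 boundaries for 5 classes
    print(quantile_breaks(yields, 4))          # 3 cut points for 4 classes
```

    Equal intervals are easy to read in a legend, while quantiles adapt to skewed data such as sensor readings with a few extreme values.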
  31. Deploying & Operating GeoServer → Resources • A DevOps perspective

    on GeoServer material • A DevOps perspective on GeoServer: Deployment planning guidelines, (video, deck, FOSS4G21) • GeoServer on Kubernetes: Set up and operate a GeoServer Cluster in K8s, (video, deck) • A DevOps perspective on GeoServer: Monitoring, Metering, Logging and Troubleshooting, (video, deck) • Our Training material • Advanced GeoServer Configuration • Enterprise Set-up Recommendations • More material • Best Practices for Optimizing Performance with GeoServer (deck, video) • Precision Farming with GeoServer, GeoSolutions Recommendations-v01.01 (beware, long document!) • GeoServer on steroids presentations from FOSS4G
  32. Cloud & GeoServer • GeoServer is not cloud-native • It

    was born when cloud meant this 🡪 • We can’t depend on any cloud provider • GeoServer is cloud-ready • It is known to run in AWS, Azure, GCP, OpenShift, IBM Cloud, etc.. • It is known to run in K8s, Rancher • It can autoscale (CPU is the resource to look at) • It can use Object Storage (Tile Cache, COG, etc..) • Prefers compute intensive instances • Likes Containers • Likes Automation! (Azure Pipelines, Jenkins, etc..)
  33. GeoServer Facts & Myths • GeoServer Data Directory • Where

    GeoServer stores configuration in files • No automatic way to pick up config changes from files • Data can live in it, but we do not recommend it in enterprise set ups • Manually messing with the configuration files is dangerous • Memory-bound configuration • GeoServer loads data configuration in memory at startup (configuration not data itself) • GeoServer exposes GUI and REST endpoints to reload config when needed • Configuration reloading does not break OGC services • Configuration reloading blocks GUI and REST API
  34. GeoServer Facts & Myths • GeoServer Needs a lot of

    memory • With properly configured data and styles the bottleneck is usually the CPU, not the memory • Our reference dimensioning is 4CPU, 2 to 4 GB of HEAP • Do you have 1M+ layers? If no, 2GB is enough • Do you generate large PDF prints of PNG maps? If no, 2GB is enough • Do you have 8 or more CPUs? If no, 2GB is enough • GeoServer is slow • Are you serving a 1TB striped BigTIFF with no overviews? Are you visualizing 10M points from an Oracle table? • You have deployed a single GeoServer instance with 2 CPUs, no caching and you expect it to handle 200 req/sec? • 99% of performance issues are either data not optimized or simply trying to render too much data
  35. GeoServer Facts & Myths • Serving a large number of

    layers • Large usually means 50k or more • Start up times / Reload times can grow (e.g. DBMS tables) • Heap Memory usage might grow • GetCapabilities documents become slow and hard to parse for clients (e.g. bloated 100MB+ files) • Partitioning with Virtual Services can help • Sharding on different instances can help
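    Partitioning with virtual services means putting the workspace name in the service path, so a client only receives that workspace's layers in the capabilities document. A minimal sketch, with a hypothetical base URL and workspace name:

```python
from typing import Optional


def capabilities_url(base_url: str, workspace: Optional[str] = None,
                     service: str = "WMS") -> str:
    """Build a GetCapabilities URL, global or scoped to one workspace."""
    prefix = f"{base_url}/{workspace}" if workspace else base_url
    return f"{prefix}/{service.lower()}?service={service}&request=GetCapabilities"


if __name__ == "__main__":
    base = "http://localhost:8080/geoserver"     # hypothetical instance
    print(capabilities_url(base))                # every layer on the server
    print(capabilities_url(base, "farm_a"))      # only layers in workspace farm_a
```

    Giving each customer or farm its own workspace keeps every capabilities document small even when the server as a whole holds many layers.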
  36. Clustering • Scalability + High Availability • Scaling GeoServer •

    Scaling up to 64 cores has been proven in the past • Scaling out has been done in K8s, AWS, Azure, GCP, etc… • Our Recommendations • Scale out instances with 4CPU + 8GB Ram • Scale up instances if really needed • Autoscaling based on CPU usage is expected under load • Autoscaling based on RAM might be the effect of an issue or a bug • Clustering Paradigms • Passive Clustering → GS instances ignore each other • Active Clustering → GS instances talk to each other
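    A back-of-the-envelope way to size a scale-out cluster is sketched below. The per-instance throughput is not a figure from this deck: it is a number you would measure yourself with a load test on one 4CPU instance, and the headroom fraction is likewise an assumption.

```python
import math


def instances_needed(target_rps: float, rps_per_instance: float,
                     headroom: float = 0.3) -> int:
    """Estimate how many identical instances are needed for a target load.

    rps_per_instance: measured throughput of one instance (your own benchmark).
    headroom: fraction of capacity kept free for spikes (assumed default).
    """
    if rps_per_instance <= 0 or not 0 <= headroom < 1:
        raise ValueError("rps_per_instance must be > 0 and headroom in [0, 1)")
    usable = rps_per_instance * (1 - headroom)
    return max(1, math.ceil(target_rps / usable))


if __name__ == "__main__":
    # Hypothetical numbers: 200 req/sec target, 50 req/sec measured per instance.
    print(instances_needed(200, 50, headroom=0.0))
    print(instances_needed(200, 50, headroom=0.5))
```

    The result is a starting point for the autoscaler's minimum and maximum, to be corrected with CPU metrics from real traffic.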
  37. Preferred Clustering Layout Backoffice - Production • Backoffice instance is

    for administration • Changes via GUI or via REST Interface • Can do Active/Passive • Production instances are for data serving • No config changes • Can scale horizontally (autoscaling) • Data is centralized and shared between instances • Configuration changes require reload • It can cover most use cases
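    In a backoffice-production layout, a configuration change made on the backoffice has to be followed by a reload on each production instance. The sketch below prepares one `POST /rest/reload` request per instance; the host names are hypothetical, credentials are omitted, and nothing is sent.

```python
import urllib.request
from typing import Iterable, List


def reload_requests(instance_urls: Iterable[str]) -> List[urllib.request.Request]:
    """Prepare a configuration reload request for every production instance."""
    return [
        urllib.request.Request(f"{url.rstrip('/')}/rest/reload", method="POST")
        for url in instance_urls
    ]


if __name__ == "__main__":
    production = [
        "http://gs-prod-1:8080/geoserver",   # hypothetical instances
        "http://gs-prod-2:8080/geoserver",
    ]
    for req in reload_requests(production):
        # urllib.request.urlopen(req) would trigger the reload once
        # authentication has been added.
        print(req.get_method(), req.full_url)
```

    Reloading instances one at a time behind the load balancer keeps the others serving requests while each one refreshes its configuration.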
  38. Clustering - Takeaways • GS stores its config in files

    in the data directory • GS loads its config in memory at startup • GS does not automatically pick up config changes from the data directory • GS startup/reload times can be long with 10k+ layers • GS continuously writes to log files • GS TileCache can work in clustering • In 99% of cases clustering with backoffice-production is the way to go!
  39. Final Checklist ✔ Study/Analyze your data ✔ Study/Analyze your users/scenario

    ✔ Study/Analyze the deployment environment ✔ Study/Analyze GeoServer strengths and limitations with respect to the above (we can help here) ✔ Prepare a deployment plan ✔ Repeat → perfection comes from practice! • Do your homework or suffer forever!
  40. Conclusions • Key element to focus on • data optimization

    • data modeling • GeoServer configuration • Favor parametric stores in GeoServer • ImageMosaic • SQL Views • After data optimization → plan caching • Favor simpler deployment → easier to scale • Automate as much as possible