
Supporting precision farming with GeoServer: past experiences and way forward


This presentation will condense 10 years of GeoSolutions experience in ingesting, managing and disseminating data at scale in the cloud with GeoServer for the precision farming industry, covering items like:
- Proper optimizations and organization of raster data
- Proper optimizations and organization of vector data
- Modeling data for performance & scalability in GeoServer and PostGIS
- Deployment guidelines for performance and scaling GeoServer
- Styling to create NDVI and other visualizations on the fly

At the end of the presentation, attendees will be able to properly design and plan a GeoServer deployment to serve precision farming data at scale.

Simone Giannecchini

August 29, 2022



Transcript

  1. Simone Giannecchini
    Andrea Aime
    GeoSolutions
    Supporting precision farming
    with GeoServer
    Past experiences and way forward


  2. GeoSolutions
    Enterprise Support
    Services
    Deployment
    Subscription
    Professional
    Training
    Customized
    Solutions
    GeoNode
    • Offices in Italy & US, Global Clients/Team
    • 40+ collaborators, 30+ Engineers
    • Our products
    • Our Offer


  3. Affiliations
    We strongly support Open
    Source, it is in our core
    We actively participate in
    OGC working groups and
    get funded to advance new
    open standards
    We support standards
    critical to GEOINT


  4. Setting the context


  5. GeoServer

    GeoSpatial enterprise gateway
    − Modular & extensible
    − Management & dissemination of raster and vector data

    Cloud/ Big Data friendly
    − Cluster deployments in AWS, Azure, On Premise
    − Powering petabytes-size data (EO, MetOcean, etc.)
    − Serving up to 10k+ requests per second

    Standards compliant
    − OGC WCS 1.0, 1.1.1 (RI), 2.0.1
    − OGC WFS 1.0, 1.1 (RI), 2.0
    − OGC WMS 1.1.1, 1.3
    − OGC WPS 1.0.0
    − OGC CSW 2.0.2

    Google Earth/Maps support

    License is GPL v2.0
    More Information


  6. GeoServer
    [Diagram: data sources GeoServer reads from, and the services and formats it publishes]
    Sources
    • DBMS: PostGIS, Oracle, H2, DB2, SQL Server, GeoPackage, MySql, Spatialite, Elastic, MongoDB
    • Vector files: Shapefile
    • Servers: WFS, WMS, WMTS
    • Raster files: GeoTIFF, ArcGrid, Img+world, Mosaic, MrSID, JPEG 2000, ECW, Pyramid, NetCDF, ...
    Outputs
    • Raw vector data (WFS): Shapefile, GML2, GML3, GeoRSS, GeoJSON, CSV/XLS, GeoPackage
    • Styled maps (WMS): PNG, GIF, JPEG, TIFF, GeoTIFF, SVG, PDF, KML/KMZ
    • Raw raster data (WCS): GeoTIFF, ArcGrid, GTopo30, Img+World
    • Tiles (WMTS, TMS): KML superoverlays, Google maps tiles, OGC tiles, OSGEO tiles
    • Other: KML, WPS, CSW, ESRI, REST


  7. Key Concepts
    • Supporting farming decision makers
    • with consuming various types of data
    • EO Data
    • Drone Imagery
    • Field Sensors (e.g. Meteo Stations)
    • Vehicles Positions & Sensors
    • Meteorological Models
    • More (not related to our work)
    • from different deployments
    • (Private|Public) Cloud
    • On Prem


  8. Key Concepts
    • via Heterogeneous types of platforms
    • All-in-one DAAS solution to farmers
    • EO focused platform
    • IOT Focused platform for vehicle data
    • IOT Focused platform for field sensors
    • Various combinations of the above
    • set up by entities with different backgrounds & objectives
    • Large Corp. focusing solely on Precision Farming
    • Subsidiaries|Branches of large Corp. focusing solely
    on Precision Farming
    • Startups focusing solely on Precision Farming


  9. Key Concepts
    • Challenging environments to support
    • Lots of different data
    • Continuously ingested
    • & processed into new data, alerts, charts, etc..
    • We are going to address some challenges &
    Scenarios
    • Serving EO Data
    • Serving Drone Data
    • Serving IOT Like Data
    • Deploying & Operating GeoServer


  10. Serving EO Data


  11. Serving EO Data
    • Typical EO scenario
    • Multispectral data (Sentinel, Landsat, Planet, etc…)
    • Hyperspectral data (Hyperion, EnMap, etc..)
    • Rarely SAR data
    • Focus on
    • RGB data → mostly pure visualization
    • Indexes → mostly band algebra like NDVI and friends
    • Deep time series for comparison and analysis
    • Continuous ingestion is crucial


  12. Serving EO Data
    • Hyperion
    • 4 years MSAVI Sentinel 2 Time Series
    • Multiple Indexes with Sentinel, Landsat and Planet Data


  13. Typical Mistakes and how to avoid them
    • Poor Data Preprocessing|Optimization
    • GeoServer is slow at serving index XXX out of my
    uncompressed, untiled, multiband GeoTIFF, why?
    • Rule #1 → preprocess|optimize data at ingestion
    • Rule #2 → preprocess|optimize data at ingestion (not a
    typo, just reinforcing)
    • Hint #1 → GDAL is your friend|Swiss Army knife
    • Hint #2 → Automate! Airflow, Lambda, etc.
    • Suggestions
    • GeoTIFF is usually the best format (COG later on) and it is free!
    • Compression, Tiling and Overviews are crucial → spend some
    CPU cycles early to get performance later on
    • Do not fear lossy compression when possible
    • Do not be afraid of sacrificing space for performance if needed
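
The preprocessing advice on this slide (tiling, compression, overviews) can be sketched as a pair of GDAL command lines assembled from Python. This is a minimal sketch, not the presenters' pipeline: the file names, the 512-pixel block size, the overview levels and the choice between DEFLATE and JPEG are all assumptions made for illustration. The function only builds the commands, so it can be dropped into an Airflow task or a Lambda handler that actually runs them.

```python
import shlex


def build_optimize_commands(src, dst, lossy=False):
    """Return the GDAL command lines that tile, compress and add overviews.

    Nothing is executed here; pass each list to subprocess.run() to run it.
    """
    # JPEG is lossy and compact (suited to 8-bit RGB imagery); DEFLATE is lossless.
    compression = "JPEG" if lossy else "DEFLATE"
    translate = [
        "gdal_translate",
        "-co", "TILED=YES",
        "-co", "BLOCKXSIZE=512",   # assumed block size
        "-co", "BLOCKYSIZE=512",
        "-co", f"COMPRESS={compression}",
        src, dst,
    ]
    # Overviews let the server read a downsampled copy when zoomed out.
    overviews = ["gdaladdo", "-r", "average", dst, "2", "4", "8", "16"]
    return [translate, overviews]


if __name__ == "__main__":
    for cmd in build_optimize_commands("scene.tif", "scene_optimized.tif"):
        print(shlex.join(cmd))
```

Keeping command construction separate from execution makes the ingestion step easy to test without GDAL installed.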


  14. Typical Mistakes and how to avoid them
    • Resources on data preprocessing
    • Our training material
    • Best Practices for Optimizing Performance with
    GeoServer (deck, video)
    • A DevOps perspective on GeoServer: Deployment
    planning guidelines, (video, deck, FOSS4G2021)
    • Precision Farming with GeoServer, GeoSolutions
    Recommendations-v01.01 (beware, long document!)
    • GeoServer on steroids presentations from FOSS4G
    • A ton of GDAL related material you can find on the
    web!


  15. Typical Mistakes and how to avoid them
    • Poor data organization in GeoServer
    • My GeoServer does not scale as I expected, despite
    proper data preprocessing. It takes minutes to (re)start,
    why?
    • Fact #1 → if you have thousands of layers, you need to
    reorganize data quickly
    • Fact #2 → if you have millions of layers, you are doomed!
    • Suggestions
    • Conceptually organize EO data by common sensor, data type, etc.
    • Map this to multidimensional ImageMosaic → break the "1 scene,
    1 layer" cycle
    • Use REST API to automate data management → no need to reload
    configuration, can use simple passive clustering
    • 99% of the time all you need are a handful (<100)
    of multidimensional ImageMosaic layers!
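
As an illustration of managing an ImageMosaic through the REST API instead of creating one layer per scene, the sketch below builds the request that registers a new granule in an existing mosaic store. It relies on GeoServer's `external.imagemosaic` REST endpoint; the workspace, store name, file path and credentials are hypothetical, and the request is only constructed here, not sent.

```python
import base64
import urllib.request


def build_harvest_request(base_url, workspace, store, granule_path, user, password):
    """Build (but do not send) a request that adds one granule to a mosaic."""
    url = (
        f"{base_url.rstrip('/')}/rest/workspaces/{workspace}"
        f"/coveragestores/{store}/external.imagemosaic"
    )
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        # The granule is assumed to already sit on storage GeoServer can read.
        data=f"file://{granule_path}".encode(),
        method="POST",
        headers={"Content-Type": "text/plain", "Authorization": f"Basic {token}"},
    )


if __name__ == "__main__":
    req = build_harvest_request(
        "http://localhost:8080/geoserver", "farm", "ndvi",
        "/data/ndvi/2022-08-29.tif", "admin", "secret",
    )
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would submit it to a running GeoServer.
```

Each new acquisition becomes one such call, while the number of published layers stays the same.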


  16. Typical Mistakes and how to avoid them
    • Resources
    • A DevOps perspective on GeoServer: Deployment
    planning guidelines, (video, deck)
    • Precision Farming with GeoServer, GeoSolutions
    Recommendations-v01.01 (beware, long document!)


  17. Typical Mistakes and how to avoid them
    • Poor caching
    • My GeoServer does not scale as I expected, despite
    proper data preprocessing and organization, why?
    • Rule number 1 → preprocess|optimize data at
    ingestion (not a typo, just reinforcing)
    • Rule number 2 → introduce tile caching
    • Rule number 3 → introduce HTTP caching (if possible)
    • Suggestions
    • Once data is optimized and all, focus on caching, not before!
    • Tile Caching comes at a cost, make sure you can reuse tiles
    multiple times otherwise avoid caching
    • HTTP Caching is super important but also tricky, evaluate carefully
    the impact before adopting
    • Tile|HTTP Caching works nicely with URLs whose content
    does not change over time
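
The last point, that caching works best for URLs whose content never changes, can be made concrete with a small policy function. This is a hypothetical sketch, not a GeoServer feature: it assumes that a request without an explicit `TIME` value, or with a relative one such as `current`, may resolve to different data after the next ingestion, and it therefore only marks time-pinned requests as long-lived. The one-day `max_age` default is also an assumption.

```python
def cache_control_for(params, max_age=86400):
    """Pick a Cache-Control value for a WMS GetMap request (hypothetical policy)."""
    # WMS parameter names are case-insensitive, so normalize them first.
    normalized = {key.upper(): value for key, value in params.items()}
    time_value = str(normalized.get("TIME", "")).strip().lower()
    # No TIME, or a relative TIME, can point at new data after the next
    # ingestion, so such responses are not safe to cache for long.
    if not time_value or "current" in time_value or "present" in time_value:
        return "no-cache"
    return f"public, max-age={max_age}"


if __name__ == "__main__":
    print(cache_control_for({"layers": "ndvi", "time": "2022-08-29T00:00:00Z"}))
    print(cache_control_for({"layers": "ndvi"}))
```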


  18. Typical Mistakes and how to avoid them
    • Resources
    • A DevOps perspective on GeoServer: Deployment
    planning guidelines, (video, deck)
    • Precision Farming with GeoServer, GeoSolutions
    Recommendations-v01.01 (beware, long document!)


  19. More Goodies: COG
    • Cloud Optimized GeoTIFF
    • EFS costs are killing us, what can we do?
    • COG is (usually) the answer
    • GeoServer supports COG as single GeoTiff or
    ImageMosaic
    • We support S3, Google Cloud Storage and (soon)
    Azure
    • Missing bits
    • Local caching
    • Support for Binary Masks
    • Before you ask
    • Object Storage gives you better scalability at a fraction
    of the price → you might sacrifice some raw speed!


  20. More Goodies: COG
    • Resources
    • Modern Cloud Geospatial Architectures Survey
    • Best Practices for Optimizing Performance with
    GeoServer (deck, video)
    • Serving earth observation data with GeoServer: COG,
    STAC, OpenSearch and more
    • GeoServer docs
    • Many other resources on the web


  21. More Goodies: ImageMosaic & Time Series

    About data

    Data is added to or removed from GS

    Data is rarely updated

    E.g. data from EO, MetOc, Drones, IOT Sensors

    Refrain from creating millions of layers, it won’t scale →
    organize your data properly

    Use ImageMosaic for rasters if possible

    Use TIME and other dimensions

    Use CQL_Filter

    Keep # of layers low if not constant

    Manage data via REST API → no configuration reload

    Use single tables for vector if possible

    Use TIME and other dimensions

    Use CQL_Filter or Parametric SQL View

    Keep # of layers low if not constant as you ingest data
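
To show how a single layer can serve many acquisitions, the sketch below builds a WMS GetMap URL that selects one date with `TIME` and one subset with GeoServer's `CQL_FILTER` vendor parameter. The layer name, bounding box, attribute name and base URL are hypothetical; WMS 1.1.1 is used so that the bounding box is in longitude/latitude order.

```python
from urllib.parse import urlencode


def getmap_url(base_url, layer, bbox, time, cql_filter=None, size=(512, 512)):
    """Build a WMS 1.1.1 GetMap URL for one time slice of a layer."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(value) for value in bbox),  # minx,miny,maxx,maxy
        "WIDTH": size[0],
        "HEIGHT": size[1],
        "FORMAT": "image/png",
        "TIME": time,  # picks the granule(s) for this acquisition date
    }
    if cql_filter:
        params["CQL_FILTER"] = cql_filter  # GeoServer vendor parameter
    return f"{base_url.rstrip('/')}/wms?{urlencode(params)}"


if __name__ == "__main__":
    print(getmap_url(
        "http://localhost:8080/geoserver", "farm:ndvi",
        (10.0, 45.0, 10.1, 45.1), "2022-08-29", "field_id = 'F42'",
    ))
```

New acquisitions only change the values a client may pass for `TIME`; the layer list stays constant.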


  22. More Goodies: ImageMosaic & Time Series
    • Resources
    • Best Practices for Optimizing Performance with
    GeoServer (deck, video)
    • Serving earth observation data with GeoServer: COG,
    STAC, OpenSearch and more
    • Our training material
    • Precision Farming with GeoServer, GeoSolutions
    Recommendations-v01.01 (beware, long document!)
    • GeoServer on steroids presentations from FOSS4G


  23. More Goodies: Map Algebra with Jiffle
    • Map algebra for GeoServer
    • Simple expressions, but also decisions, loops, arrays,
    variables
    • Reduce time to publishing
    • Compute products on the fly
    • Add new products without writing code
    • Docs
    • Use it in
    • Straight WPS calculation
    • As part of styles
    • Large computed downloads (also WPS)


  24. More Goodies: Map Algebra with Jiffle
    • Normalized Difference Vegetation Index
    (Hyperion)
    [Style excerpt: markup not preserved in this transcript]
    coverage
    script
    nir = src[46];
    vir = src[31];
    dest = (nir - vir) / (nir + vir);
    1.0
    ….
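
For readers who want to check the band algebra outside GeoServer, here is the same NDVI formula as a plain Python sketch. It is not Jiffle and not part of the deck; the zero-sum guard (returning 0.0 when both bands are zero) is an assumption added so the function never divides by zero.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    if total == 0:
        # Assumption: treat empty pixels as 0.0 rather than dividing by zero.
        return 0.0
    return (nir - red) / total


def ndvi_band(nir_band, red_band):
    """Apply ndvi() pixel by pixel to two equally sized band sequences."""
    return [ndvi(n, r) for n, r in zip(nir_band, red_band)]


if __name__ == "__main__":
    print(ndvi_band([0.8, 0.5, 0.0], [0.2, 0.5, 0.0]))
```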





  25. Serving Drone Data


  26. Serving Drone Data
    • Typical Drone scenario
    • Multispectral data
    • Hyperspectral data
    • Downstream Products → vector and raster
    • Focus on
    • RGB data → mostly pure visualization
    • Downstream Products → analysis & decision making
    • Time series for comparison and analysis
    • Ad-hoc or Periodic ingestion
    • Heavy automated processing
    • Data size can be challenging


  27. Serving Drone Data
    • Most (all?) the previous recommendations are
    valid
    • Optimize data and product before serving
    • Use COG
    • Use Tile Caching
    • ImageMosaic is still crucial
    • Organize conceptually your data: same data type,
    same CRS, etc..
    • Use ImageMosaic with dimensions: acquisition time,
    flightUUID, anything
    • Decouple # of layers from # of acquisitions
    • Caching is crucial


  28. Serving Drone Data
    • What about vector products?
    • Many products are vector datasets
    • We need to be able to relate them to raw raster data
    and raster products
    • SQL Views & Table Partitioning to the rescue
    • Create a single table in PostGIS
    • add pivot attributes to filter on later, like acquisition
    time, flightUUID, anything
    • use SQL views in GeoServer to decouple # of layers
    from # of acquisitions
    • ingest directly in the database, no changes to the
    GeoServer configuration!
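
A parametric SQL view is one way to get this decoupling: the view is configured once and each request supplies the pivot values through the `viewparams` request parameter. The sketch below shows a possible view definition and a helper that formats `viewparams`; the table and column names are hypothetical, and only semicolons and commas are escaped, which is what the GeoServer documentation asks for.

```python
# Hypothetical SQL view body, as it would be entered in the GeoServer layer
# configuration; %flight% is replaced by the value sent in viewparams.
SQL_VIEW = "SELECT * FROM drone_products WHERE flight_uuid = '%flight%'"


def build_viewparams(values):
    """Format a dict as a viewparams string: name:value;name:value."""
    def escape(value):
        # Semicolons and commas separate entries, so they must be escaped.
        return str(value).replace(";", "\\;").replace(",", "\\,")

    return ";".join(f"{name}:{escape(value)}" for name, value in values.items())


if __name__ == "__main__":
    print(SQL_VIEW)
    print("viewparams=" + build_viewparams({"flight": "f-001"}))
```

In a real deployment each view parameter should also get a validation regular expression in the layer configuration, since the value is substituted into SQL.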


  29. Serving Drone Data
    • Mosaicking datastore is under development!
    • Ability to create a seamless vector store in GeoServer
    • which can index products stored separately
    (flatgeobuf?)
    • and works similarly to ImageMosaic to support
    dimensions like acquisition time, flightUUID, anything
    • to decouple # of layers from # of acquisitions
    • Simplify ingestion & reduce cost
    • This solution does not fit all use cases
    • Secondary store might not support advanced filtering


  30. Serving Drone Data


  31. Serving IOT Like Data


  32. Serving IOT Data
    • Typical Scenario
    • Vehicle telemetry
    • Vehicle sensor data
    • Seed planting information
    • Huge amounts of points & tracks
    • Focus on
    • RGB data → mostly pure visualization
    • Continuous ingestion


  33. Serving IOT Data
    • SQL Views & Table Partitioning to the rescue
    • Create a single table in PostGIS
    • add pivot attributes to filter on later, like acquisition
    time, flightUUID, anything
    • use SQL views in GeoServer to decouple # of layers
    from # of acquisitions
    • ingest directly in the database, no changes to the
    GeoServer configuration!
    • Potential issues
    • Operating a large Postgis cluster
    • Cost (AWS, Azure)
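
The single-table-plus-partitioning idea can be sketched with PostgreSQL declarative partitioning. The code below generates the DDL for monthly range partitions; the `telemetry` table, its columns and the monthly granularity are assumptions made for illustration, not the schema used by the presenters.

```python
from datetime import date

# Hypothetical parent table: one table for all telemetry, partitioned by time.
PARENT_DDL = (
    "CREATE TABLE telemetry ("
    "vehicle_id text, acquired_at timestamptz NOT NULL, "
    "geom geometry(Point, 4326)"
    ") PARTITION BY RANGE (acquired_at);"
)


def monthly_partition_ddl(parent, year, month):
    """Return the DDL for the partition holding one calendar month."""
    start = date(year, month, 1)
    # The upper bound is exclusive: the first day of the following month.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    name = f"{parent}_{start:%Y_%m}"
    return (
        f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF {parent} "
        f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
    )


if __name__ == "__main__":
    print(PARENT_DDL)
    print(monthly_partition_ddl("telemetry", 2022, 8))
```

GeoServer keeps publishing the parent table (or a SQL view on it), so adding a partition needs no configuration change.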


  34. Serving IOT Data
    • ElasticSearch to the rescue
    • Manage billions of points, shard data and scale queries
    linearly
    • Add extra hardware as needed, automatic resharding
    • Very fast aggregation of points to build summary views
    (GeoHash gridding, Tile gridding)
    • Potential issues
    • Operating ES
    • Cost (AWS, Azure)
    • Mosaicking Datastore?
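
The GeoHash gridding mentioned on this slide corresponds to Elasticsearch's `geohash_grid` bucket aggregation. The sketch below builds such a query body as a Python dict; the `location` field name, the precision and the bounding box are hypothetical, and sending the query to a cluster is left out.

```python
import json


def geohash_grid_query(field, precision, bbox):
    """Build a search body that counts points per geohash cell inside bbox."""
    west, south, east, north = bbox
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {
            "geo_bounding_box": {
                field: {
                    "top_left": {"lat": north, "lon": west},
                    "bottom_right": {"lat": south, "lon": east},
                }
            }
        },
        "aggs": {
            "cells": {"geohash_grid": {"field": field, "precision": precision}}
        },
    }


if __name__ == "__main__":
    body = geohash_grid_query("location", 6, (10.0, 45.0, 10.1, 45.1))
    print(json.dumps(body, indent=2))
```

Each returned bucket is one grid cell with a document count, which is what a summary view renders instead of the raw points.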


  35. Serving IOT Data
    • Rasterization to speed up serving?
    • Rasterize common visualizations
    • Use vector data visualization for other
    • Caching is crucial


  36. Serving IOT Data
    • SLD Service
    • Generate classified
    styles based on data
    • Methods: equalInterval,
    uniqueInterval, quantile,
    jenks, equalArea,
    standardDeviation
    • Provide custom colors
    • Clip by standard deviation
    • Works both on raster and
    vector
    • Works on large rasters by
    statistical sampling
    [Figure: MapStore driving SLDService]
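
A classification request to the SLD Service is an ordinary REST call, so it is easy to script. The sketch below builds the `classify.xml` URL for a layer attribute using the methods listed on this slide; the layer name, attribute, interval count and colour ramp are hypothetical values.

```python
from urllib.parse import urlencode

# Classification methods listed on the slide.
METHODS = {
    "equalInterval", "uniqueInterval", "quantile",
    "jenks", "equalArea", "standardDeviation",
}


def classify_url(base_url, layer, attribute, method="quantile", intervals=5, ramp="red"):
    """Build the SLD Service URL that returns classified style rules."""
    if method not in METHODS:
        raise ValueError(f"unknown classification method: {method}")
    query = urlencode({
        "attribute": attribute,
        "method": method,
        "intervals": intervals,
        "ramp": ramp,
    })
    return f"{base_url.rstrip('/')}/rest/sldservice/{layer}/classify.xml?{query}"


if __name__ == "__main__":
    print(classify_url("http://localhost:8080/geoserver", "farm:yield", "yield_t_ha", "jenks", 4, "blue"))
```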


  37. Deploying & Operating GeoServer


  38. Deploying & Operating GeoServer → Resources
    • A DevOps perspective on GeoServer material
    • A DevOps perspective on GeoServer: Deployment
    planning guidelines, (video, deck, FOSS4G21)
    • GeoServer on Kubernetes: Set up and operate a
    GeoServer Cluster in K8s, (video, deck)
    • A DevOps perspective on GeoServer: Monitoring,
    Metering, Logging and Troubleshooting, (video, deck)
    • Our Training material
    • Advanced GeoServer Configuration
    • Enterprise Set-up Recommendations
    • More material
    • Best Practices for Optimizing Performance with
    GeoServer (deck, video)
    • Precision Farming with GeoServer, GeoSolutions
    Recommendations-v01.01 (beware, long document!)
    • GeoServer on steroids presentations from FOSS4G


  39. Cloud & GeoServer

    GeoServer is not cloud-native

    It was born when cloud meant this 🡪

    We can’t depend on any cloud provider

    GeoServer is cloud-ready

    It is known to run in AWS, Azure, GCP, OpenShift, IBM
    Cloud, etc..

    It is known to run in K8s, Rancher

    It can autoscale (CPU is the resource to look at)

    It can use Object Storage (Tile Cache, COG, etc..)

    Prefers compute intensive instances

    Likes Containers

    Likes Automation! (Azure Pipelines, Jenkins, etc..)


  40. GeoServer Facts & Myths

    GeoServer Data Directory

    Where GeoServer stores configuration in files

    No automatic way to pick up config changes from files

    Data can live in it, but we do not recommend it in
    enterprise set ups

    Manually messing with the configuration files is
    dangerous

    Memory-bound configuration

    GeoServer loads data configuration in memory at
    startup (configuration not data itself)

    GeoServer exposes GUI and REST endpoints to reload
    config when needed

    Configuration reloading does not break OGC services

    Configuration reloading blocks GUI and REST API


  41. GeoServer Facts & Myths

    GeoServer Needs a lot of memory

    With properly configured data and styles the bottleneck
    is usually the CPU not the memory

    Our reference dimensioning is 4CPU, 2 to 4 GB of HEAP

    Do you have 1M+ layers? If no, 2GB is enough

    Do you generate large PDF prints of PNG maps? If no,
    2GB is enough

    Do you have 8 or more CPUs? If no, 2GB is enough

    GeoServer is slow

    Are you serving a 1TB striped Bigtiff with no overviews?
    Are you visualizing 10M points from an Oracle table?

    You have deployed a single GeoServer instance with 2
    CPUs, no caching and you expect it to handle 200
    req/sec?

    99% of performance issues are either data not optimized
    or simply trying to render too much data


  42. GeoServer Facts & Myths

    Serving a large number of layers

    Large usually means 50k or more

    Start up times / Reload times can grow (e.g. DBMS
    tables)

    Heap Memory usage might grow

    GetCapabilities documents become slow and hard to
    parse for clients (e.g. bloated 100MB+ files)

    Partitioning with Virtual Services can help

    Sharding on different instances can help


  43. Clustering

    Scalability + High Availability

    Scaling GeoServer

    Scaling up to 64 cores has been proven in the past

    Scaling out has been done in K8s, AWS, Azure, GCP,
    etc…

    Our Recommendations

    Scale out instances with 4CPU + 8GB Ram

    Scale up instances if really needed

    Autoscaling based on CPU usage is expected under
    load

    Autoscaling based on RAM might be the effect of an
    issue or a bug

    Clustering Paradigms

    Passive Clustering 🡪 GS instances ignore each other

    Active Clustering 🡪 GS instances talk to each other
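
CPU-based autoscaling can be reasoned about with the replica formula used by the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current replicas × current metric / target metric). The sketch below applies it to GeoServer instances; the 60% CPU target and the replica bounds are assumptions, not recommendations from the deck.

```python
import math


def desired_replicas(current_replicas, current_cpu_pct, target_cpu_pct,
                     min_replicas=2, max_replicas=10):
    """Replica count after one autoscaling step, clamped to the allowed range."""
    wanted = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # Four instances averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
```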


  44. Preferred Clustering Layout
    Backoffice - Production

    Backoffice instance is for administration

    Changes via GUI or via REST Interface

    Can do Active/Passive

    Production instances are for data serving

    No config changes

    Can scale horizontally (autoscaling)

    Data is centralized and shared between instances

    Configuration changes require reload

    It can cover most use cases
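
In a backoffice-production layout, a configuration change made on the backoffice instance has to be followed by a reload on every production instance. The sketch below prepares one request per instance against GeoServer's `/rest/reload` endpoint; the host names are hypothetical, authentication is omitted for brevity, and the requests are built but not sent.

```python
import urllib.request


def build_reload_requests(instance_urls):
    """Return one POST request per production instance, targeting /rest/reload."""
    return [
        urllib.request.Request(f"{url.rstrip('/')}/rest/reload", data=b"", method="POST")
        for url in instance_urls
    ]


if __name__ == "__main__":
    instances = [
        "http://gs-prod-1:8080/geoserver",
        "http://gs-prod-2:8080/geoserver",
    ]
    for req in build_reload_requests(instances):
        print(req.get_method(), req.full_url)
        # urllib.request.urlopen(req) would trigger the reload on that instance.
```

Reloading one instance at a time keeps the others serving requests while the configuration is refreshed.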


  45. Clustering - Takeaways
    ● GS stores its config in files in the data directory
    ● GS loads its config in memory at startup
    ● GS does not automatically pick up config changes
    from the data directory
    ● GS startup/reload times can be long with 10k+
    layers
    ● GS continuously writes to log files
    ● GS TileCache can work in clustering
    ● In 99% of cases clustering with
    backoffice-production is the way to go!


  46. Final Checklist
    ✔ Study/Analyze your data
    ✔ Study/Analyze your users/scenario
    ✔ Study/Analyze the deployment
    environment
    ✔ Study/Analyze GeoServer strengths and
    limitations with respect to the above (we can help
    here)
    ✔ Prepare a deployment plan
    ✔ Repeat 🡪 perfection comes from
    practice!
    • Do your homework or suffer forever!


  47. Conclusions


  48. Conclusions
    • Key elements to focus on
    • data optimization
    • data modeling
    • GeoServer configuration
    • Favor parametric stores in GeoServer
    • ImageMosaic
    • SQL Views
    • After data optimization → plan caching
    • Favor simpler deployment → easier to scale
    • Automate as much as possible

