Geospatial Data Processing using Rust

FOSS4G Japan 2025

Keitaroh Kobayashi

October 10, 2025

Transcript

  1. INTRODUCTION
    • What is Rust?
    • Safety and Performance
    • Multithreaded Processing
    • Geospatial Processing
    • Examples

    Keita Kobayashi, Founder, KotobaMedia. Born in Tokyo, raised in the US, living in Yakushima. First GIS app: a tool to tell me whether I need to run or walk to the bus stop for university. Development and cloud operations experience with PHP, Ruby on Rails, Python, and JavaScript / TypeScript. Now doing GIS data processing and product development, 90%+ in Rust.
    Example code available: https://keita.blog/foss4g2025jp
  2. WHAT IS RUST?
    • “Safety” / “Memory Safe”
    • “Fast” / “Efficient” / “Low-level systems language”
    • “Usability” / “Modern” / “Expressive”
  3. SAFETY = PERFORMANCE
    • Compiler-enforced safety means fearless concurrency
    • Shared-memory hazards (data races) are ruled out at compile time in safe Rust: while holding a reference, you know it’s valid and won’t change under your feet. Only one mutable reference can be held at a time.
    • Implemented via types
      • Arc<T> for shared data, Mutex<T> to guard mutable data across threads
      • Shared mutex? Arc<Mutex<T>>
      • RwLock<T>: one writer or multiple readers at a time
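The types named on this slide can be shown in a few lines. This is a minimal sketch using only the standard library; the function names `count_layers` and `rwlock_demo` and the layer names are made up for illustration and do not come from the talk's example code.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

/// Counts layer names across several threads.
/// `Arc` shares ownership of the map; `Mutex` guards every mutation.
pub fn count_layers(batches: Vec<Vec<&'static str>>) -> HashMap<&'static str, usize> {
    let counts: Arc<Mutex<HashMap<&'static str, usize>>> =
        Arc::new(Mutex::new(HashMap::new()));

    let handles: Vec<_> = batches
        .into_iter()
        .map(|batch| {
            let counts = Arc::clone(&counts);
            thread::spawn(move || {
                for layer in batch {
                    // The lock is released when `guard` goes out of scope.
                    let mut guard = counts.lock().unwrap();
                    *guard.entry(layer).or_insert(0) += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let result = counts.lock().unwrap().clone();
    result
}

/// `RwLock` allows many concurrent readers, but only one writer at a time.
/// Returns the length seen by the readers and the length after one write.
pub fn rwlock_demo() -> (usize, usize) {
    let zooms = Arc::new(RwLock::new(vec![0u8, 1, 2]));

    let readers: Vec<_> = (0..4)
        .map(|_| {
            let zooms = Arc::clone(&zooms);
            thread::spawn(move || {
                let len = zooms.read().unwrap().len();
                len
            })
        })
        .collect();

    let before = readers
        .into_iter()
        .map(|handle| handle.join().unwrap())
        .min()
        .unwrap();

    zooms.write().unwrap().push(3);
    let after = zooms.read().unwrap().len();
    (before, after)
}
```

Removing the `Mutex` and mutating the map directly from the spawned threads would be rejected by the compiler, which is the point the slide makes.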
  4. ECOSYSTEM – “CRATES”
    • CLI essentials
      • Argument parsing – clap, progress bars – indicatif
    • Async / concurrency
      • Cooperative async runtime – tokio, easy threaded parallel iteration – rayon, cross-thread communication – crossbeam
    • Data parsing
      • Serde and family – serde_json, toml, messagepack, serde_qs (query string)
    • GeoRust – GIS functions
      • Geo, GEOS / PROJ / GDAL (bindings), Rstar (R*-tree index)
      • Formats: GeoJSON, GPX, GeoTIFF, KML, OSM, Shapefile, TileJSON, GTFS, WKT, WKB
      • More in GeoZero (a little hard to work with)
  5. LEARNING CURVE
    • Pretty high
    • Before doing Rust full-time, I had been doing TypeScript. From there, it felt relatively natural.
      • No escape hatch to JavaScript, though!
    • Dabbled in C & C++ before, but manual memory management and other hazards turned me off.
    • Erlang / Elixir was attractive, but Rust’s expressive type system and performance won me over.
      • Erlang’s “everything is messages” is a good thing to learn – it applies to Rust as well, or to any language when you do parallel processing.
  6. MULTITHREADED PROCESSING
    • Async vs Sync
      • Async – network I/O. Sync – CPU work, file I/O.
      • Async is cooperative multitasking – similar to async/await in Node.js.
      • Running blocking tasks in async code can prevent other async code from running!
      • Synchronous code makes up the majority of heavy processing.
      • Rayon is usually a good choice.
    • Async ↔ Sync
      • Message passing
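Message passing between two parts of a program can be sketched with the standard library alone. The example below assumes `std::sync::mpsc` channels and a single worker thread; the `Job` enum and `total_area` function are hypothetical names. Real code bridging async and sync would more likely use tokio or crossbeam channels.

```rust
use std::sync::mpsc;
use std::thread;

/// Messages understood by the worker (illustrative names only).
pub enum Job {
    Area { width: f64, height: f64 },
    Shutdown,
}

/// Spawns one worker that shares no memory with the caller and
/// communicates only through channels.
pub fn total_area(rects: Vec<(f64, f64)>) -> f64 {
    let (job_tx, job_rx) = mpsc::channel::<Job>();
    let (result_tx, result_rx) = mpsc::channel::<f64>();

    let worker = thread::spawn(move || {
        // `recv` blocks until a message arrives or every sender is dropped.
        while let Ok(job) = job_rx.recv() {
            match job {
                Job::Area { width, height } => result_tx.send(width * height).unwrap(),
                Job::Shutdown => break,
            }
        }
        // `result_tx` is dropped here, which closes the result channel.
    });

    for (width, height) in rects {
        job_tx.send(Job::Area { width, height }).unwrap();
    }
    job_tx.send(Job::Shutdown).unwrap();
    worker.join().unwrap();

    // The channel is closed, so this iterator ends after the last result.
    result_rx.iter().sum()
}
```

Because the worker owns its receiver and the caller owns its sender, no lock is needed anywhere in this sketch.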
  7. GEOSPATIAL PROCESSING
    • Vector Processing
      • Use Geo – Rust equivalent of TurfJS
      • When you need more, you can always use GEOS bindings
        • Lose WASM compatibility
        • Static builds with the “static” feature, but beware of GEOS LGPL requirements
      • Use rstar for indexing
    • Raster Processing
      • Use gdal bindings
      • Some native crates, but not that many
      • Want to try doing some transforms on the GPU using wgpu sometime
    Just use Geo
  8. MOJXML-RS
    Converts the Ministry of Justice “Chizu XML” format
    https://github.com/KotobaMedia/mojxml-rs
    [Diagram: Input XML Files (Zipped) → Unzip → Parse → Output; stage labels: Parallel, Serial]
  9. MOJXML-RS – Parse
    https://github.com/KotobaMedia/mojxml-rs
    • Uses roxmltree (“read-only XML tree”) to parse the document
    • Creates a struct for properties
    • Parses the topology parameters, transforms to WGS84, returns a geo::Polygon
    • Returns the common properties for all features in the file, plus individual features with their properties and geometries
  12. MOJXML-RS – Output
    https://github.com/KotobaMedia/mojxml-rs
    • Currently supports FlatGeobuf, GeoParquet, ND-GeoJSON outputs
    • Uses a trait for output format polymorphism
    • FlatGeobuf (FGB), GeoParquet, and GeoJSON each have their own output adapter, but the interface is the same
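A trait-based output layer along these lines might look like the sketch below. It is not the mojxml-rs code: the `FeatureWriter` trait, the `Feature` struct, and both adapters are hypothetical, and a plain CSV writer stands in for a binary format such as FlatGeobuf so that the example needs no external crates.

```rust
use std::io::{self, Write};

/// A parsed feature, reduced to an id and a point for this sketch.
pub struct Feature {
    pub id: u32,
    pub lon: f64,
    pub lat: f64,
}

/// Hypothetical interface shared by every output adapter.
pub trait FeatureWriter {
    fn write_feature(&mut self, feature: &Feature) -> io::Result<()>;
    fn finish(&mut self) -> io::Result<()>;
}

/// Newline-delimited GeoJSON: one Feature object per line.
pub struct NdGeoJsonWriter<W: Write> {
    pub out: W,
}

impl<W: Write> FeatureWriter for NdGeoJsonWriter<W> {
    fn write_feature(&mut self, f: &Feature) -> io::Result<()> {
        writeln!(
            self.out,
            r#"{{"type":"Feature","properties":{{"id":{}}},"geometry":{{"type":"Point","coordinates":[{},{}]}}}}"#,
            f.id, f.lon, f.lat
        )
    }

    fn finish(&mut self) -> io::Result<()> {
        self.out.flush()
    }
}

/// CSV stands in for a second format; only the adapter body differs.
pub struct CsvWriter<W: Write> {
    pub out: W,
}

impl<W: Write> FeatureWriter for CsvWriter<W> {
    fn write_feature(&mut self, f: &Feature) -> io::Result<()> {
        writeln!(self.out, "{},{},{}", f.id, f.lon, f.lat)
    }

    fn finish(&mut self) -> io::Result<()> {
        self.out.flush()
    }
}

/// The pipeline only sees the trait, never the concrete format.
pub fn write_all(writer: &mut dyn FeatureWriter, features: &[Feature]) -> io::Result<()> {
    for feature in features {
        writer.write_feature(feature)?;
    }
    writer.finish()
}
```

Supporting another format then means adding one more `impl FeatureWriter`, with no change to `write_all` or to the parsing stage.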
  15. MVT-WRANGLER
    Performs post-processing on Mapbox Vector Tiles in PMTiles archives
    https://github.com/KotobaMedia/mvt-wrangler
    [Diagram: Input PMTiles → Process → Output, with Filter File → Compile, Index feeding in; labels: Each tile processed in parallel, Serial, Re-ordered]
  18. MVT-WRANGLER – Filter File → Compile, Index
    • GeoJSON with filter definitions. Very simple DSL for doing small patches
    • Compiles the filters to an internal representation – regex, etc. Looks very similar to MapLibre filter syntax
    • Uses the rstar library to create an index of the filter polygons
    • For each tile, we query the index for filter polygons that intersect the tile
    Example filter annotations:
    • Removes any feature in the “pois” layer
    • Removes features in the “buildings” layer where the tag “kind” is equal to
    • Removes any tags in any layer (“*”) where the key starts with “pgf:name:”, or if the key starts with “name”, the name regex capture doesn’t match
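The compile-then-query idea can be sketched without any dependencies. Everything here is an assumption made for illustration: the `Rule` enum is not the real mvt-wrangler representation, filters are reduced to bounding boxes instead of polygons, and a linear scan stands in for the rstar R*-tree query.

```rust
/// Axis-aligned bounding box in lon/lat degrees.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct BBox {
    pub min_x: f64,
    pub min_y: f64,
    pub max_x: f64,
    pub max_y: f64,
}

impl BBox {
    pub fn intersects(&self, other: &BBox) -> bool {
        self.min_x <= other.max_x
            && other.min_x <= self.max_x
            && self.min_y <= other.max_y
            && other.min_y <= self.max_y
    }
}

/// Hypothetical compiled form of one filter rule.
#[derive(Debug, PartialEq)]
pub enum Rule {
    /// Drop every feature in the named layer.
    DropLayerFeatures { layer: String },
    /// Drop tags whose key starts with the prefix, in any layer.
    DropTagsWithPrefix { prefix: String },
}

pub struct Filter {
    pub area: BBox,
    pub rule: Rule,
}

pub struct FilterIndex {
    filters: Vec<Filter>,
}

impl FilterIndex {
    pub fn new(filters: Vec<Filter>) -> Self {
        FilterIndex { filters }
    }

    /// Linear scan over all filters. A spatial index answers the same
    /// question without visiting every filter.
    pub fn query(&self, tile: &BBox) -> Vec<&Rule> {
        self.filters
            .iter()
            .filter(|f| f.area.intersects(tile))
            .map(|f| &f.rule)
            .collect()
    }
}

/// Applies the rules to one feature. Returns `None` if the feature is
/// dropped, otherwise the tags that survive.
pub fn apply(
    rules: &[&Rule],
    layer: &str,
    mut tags: Vec<(String, String)>,
) -> Option<Vec<(String, String)>> {
    for rule in rules {
        match rule {
            Rule::DropLayerFeatures { layer: target } if target.as_str() == layer => {
                return None
            }
            Rule::DropTagsWithPrefix { prefix } => {
                tags.retain(|(key, _)| !key.starts_with(prefix.as_str()))
            }
            _ => {}
        }
    }
    Some(tags)
}
```

Querying once per tile keeps the per-feature work limited to the handful of rules whose area actually touches that tile.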
  19. MVT-WRANGLER – Input PMTiles
    https://github.com/KotobaMedia/mvt-wrangler
    • Create a Vec (array) of all referenced tiles
    • Iterate over them, and send them to the “tile coordinate channel”
      • We add an index here for reordering on the output end
    • The tile fetching task takes coordinates off this channel, fetches the requested tile, and sends the tile data to the “tile processing channel”
  20. MVT-WRANGLER – Process
    https://github.com/KotobaMedia/mvt-wrangler
    • Pipes the “tile processing channel” to Rayon
      • Use into_iter() to create an iterator, then par_bridge() to parallelize it.
    • Transforms the tile
    • Compresses the tile
      • Compression is CPU-heavy, but pmtiles requires a single-threaded, in-order write. For high-performance pmtiles output, always compress in parallel!
    • Sends the compressed tile data to the “output channel”
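The shape of this stage, a channel feeding parallel workers that forward results to another channel, can be sketched with the standard library. Note the substitution: a small hand-rolled thread pool replaces Rayon's `par_bridge()` so the example has no dependencies, and `process` is a placeholder for the real transform and compression work. The names `run_stage` and `process_all` are hypothetical.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

/// (sequence index, tile bytes). The index lets the writer restore order.
pub type Indexed = (usize, Vec<u8>);

/// Placeholder for "transform + compress": reverses the bytes.
fn process(tile: Vec<u8>) -> Vec<u8> {
    tile.into_iter().rev().collect()
}

/// Drains `input` on `workers` threads and forwards results to `output`.
/// Results may arrive in any order.
pub fn run_stage(input: Receiver<Indexed>, output: Sender<Indexed>, workers: usize) {
    let input = Arc::new(Mutex::new(input));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let input = Arc::clone(&input);
            let output = output.clone();
            thread::spawn(move || loop {
                // Hold the lock only while receiving, not while processing.
                let next = input.lock().unwrap().recv();
                match next {
                    Ok((index, tile)) => output.send((index, process(tile))).unwrap(),
                    Err(_) => break, // every sender is gone: no more tiles
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
}

/// Feeds all tiles through the stage and collects whatever comes out.
pub fn process_all(tiles: Vec<Vec<u8>>, workers: usize) -> Vec<Indexed> {
    let (in_tx, in_rx) = mpsc::channel();
    let (out_tx, out_rx) = mpsc::channel();

    for (index, tile) in tiles.into_iter().enumerate() {
        in_tx.send((index, tile)).unwrap();
    }
    drop(in_tx); // closing the input lets the workers finish

    run_stage(in_rx, out_tx, workers);
    out_rx.into_iter().collect()
}
```

With Rayon, the body of `run_stage` would reduce to the iterator chain the slide describes; the indices travel with the tiles either way because neither approach preserves order.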
  22. MVT-WRANGLER – Output
    https://github.com/KotobaMedia/mvt-wrangler
    • Takes tiles off the “output channel”
    • Re-orders them
      • Uses a BTreeMap (ordered map) to buffer output tiles in order
    • Sends them to the pmtiles-rs library to output them
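A reorder buffer built on `BTreeMap` can be quite small. The sketch below assumes each tile carries the sequence index attached at the input stage, starting from 0 with no gaps; the `Reorder` type is a hypothetical name, not the one used in mvt-wrangler.

```rust
use std::collections::BTreeMap;

/// Buffers out-of-order items and releases them in index order.
pub struct Reorder<T> {
    next: usize,
    pending: BTreeMap<usize, T>,
}

impl<T> Reorder<T> {
    pub fn new() -> Self {
        Reorder {
            next: 0,
            pending: BTreeMap::new(),
        }
    }

    /// Accepts one item and returns every item that is now ready to be
    /// written, in order. Returns an empty Vec while a gap remains.
    pub fn push(&mut self, index: usize, item: T) -> Vec<T> {
        self.pending.insert(index, item);

        let mut ready = Vec::new();
        // Release the run of consecutive indices starting at `next`.
        while let Some(item) = self.pending.remove(&self.next) {
            ready.push(item);
            self.next += 1;
        }
        ready
    }
}
```

The single writer thread would call `push` for each tile taken off the output channel and write whatever comes back, so memory use is bounded by how far ahead the fastest worker gets.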
  23. APPENDIX: GEO WKT MACRO
    Rust uses macros for metaprogramming – the geo crate has a wkt! macro for compile-time WKT parsing
  24. APPENDIX: MOJXML-RS WEBASSEMBLY
    Use wasm-bindgen and wasm-pack to compile and connect your Rust code to JavaScript. Generates TS definitions.
    • https://wasm-bindgen.github.io/wasm-bindgen/
    • https://kotobamedia.github.io/mojxml-rs/
    • https://drager.github.io/wasm-pack/
  25. WRAP-UP
    • Memory safety translates to easy parallelism
    • Expressive type system leads to good ergonomics without compromising performance
    • GIS ecosystem is growing fast
    • “Escape hatches” exist – bindings to C libraries or even running subprocesses
    Try the example code ↓
    https://keita.blog/foss4g2025jp