Location-aware Documents

Using Elasticsearch for StadtKatalog.org and the Seestadt.bot

Philipp Naderer

April 18, 2018

  1. Location-aware Documents Philipp Naderer-Puiu

  2. www.StadtKatalog.org docs.StadtKatalog.org m.me/SeestadtBot t.me/SeestadtBot

  3. Geo Data Basics
    Disclaimer: I am a web developer, not a GIS expert!
  4. How do we orient around the globe?
    • Latitude (German: Breitengrad): [-90, +90], Vienna is around 48.2°
    • Longitude (German: Längengrad): [-180, +180], Vienna is around 16.5°
    And you need a reference system! How are coordinates projected on the globe?
    • WGS 84: EPSG:4326
    • WGS 84 Pseudo-Mercator: EPSG:3857
    • … but also many others!
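To make the difference between the two reference systems on this slide concrete, here is a minimal Python sketch of the usual spherical formulas that convert a WGS 84 coordinate in degrees (EPSG:4326) to Pseudo-Mercator metres (EPSG:3857) and back. The function names are made up for this example; for real work a projection library is the safer choice.

```python
import math

# WGS 84 semi-major axis in metres; EPSG:3857 treats the earth as a sphere of this radius.
EARTH_RADIUS_M = 6378137.0


def wgs84_to_web_mercator(lat_deg, lon_deg):
    """Project an EPSG:4326 coordinate (degrees) to EPSG:3857 (metres)."""
    if not -90.0 < lat_deg < 90.0:
        raise ValueError("latitude must be strictly between -90 and +90 degrees")
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def web_mercator_to_wgs84(x, y):
    """Inverse projection: EPSG:3857 metres back to (lat, lon) in degrees."""
    lon_deg = math.degrees(x / EARTH_RADIUS_M)
    lat_deg = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lat_deg, lon_deg


if __name__ == "__main__":
    # Roughly Vienna, using the rounded values from the slide.
    print(wgs84_to_web_mercator(48.2, 16.5))
```

The same point has very different numbers in each system, which is why every stored coordinate needs to say which reference system it uses.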
  5. Why different reference systems?
    You can buy this 3D relief globe in the SpaceStore: https://spacestore.co/products/false-colour-relief-earth-globe
    WGS 84 vs. Local Reference System
    WGS 84 and ETRS89 are drifting away from each other! Two points in ETRS89 will keep their distance to each other over a longer time.
  6. How to store geo data
    A very short introduction, you find all the details in the docs
  7. What geo datatypes are available?
    geo_point – latitude & longitude
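As a rough illustration of geo_point, the sketch below builds an Elasticsearch 6 style mapping, a document, and a geo_distance filter as plain Python dictionaries. The field names (`name`, `location`), the `_doc` type name, and the coordinates are assumptions made for this example, not taken from the StadtKatalog index.

```python
import json


def geo_point(lat, lon):
    """Return a geo_point in object notation after a basic range check."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude out of range")
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude out of range")
    # Object notation is explicit; the array notation would be [lon, lat].
    return {"lat": lat, "lon": lon}


# Index body with one geo_point field (hypothetical field names).
INDEX_BODY = {
    "mappings": {
        "_doc": {
            "properties": {
                "name": {"type": "text"},
                "location": {"type": "geo_point"},
            }
        }
    }
}

# A document that fits the mapping above.
entry = {"name": "Example shop", "location": geo_point(48.2, 16.5)}

# Filter for all entries within 500 m of a position.
nearby_query = {
    "query": {
        "bool": {
            "filter": {
                "geo_distance": {
                    "distance": "500m",
                    "location": geo_point(48.2, 16.5),
                }
            }
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(nearby_query, indent=2))
```

Sticking to the object notation avoids the common mix-up between `lat, lon` order in strings and `lon, lat` order in arrays.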

  8. What geo datatypes are available?
    geo_shape – a shape mapped on a globe in WGS 84
    • Point & Multi-Point, LineString & Multi-LineString, Polygon & Multi-Polygon (with support for holes)
    • Quite a lot of parameters for the mapping available, will be reduced to points_only in Elasticsearch 7
    • Mostly used in queries to retrieve points inside a shape
    • But you might store shapes as bounding boxes
      ◦ If you store polygons from a GeoJSON object
      ◦ A shopping center must have a geo_point to map it, but also a polygon to query all shops inside
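The shopping center case could look roughly like the following sketch: one document carries both a geo_point (to place a marker) and a geo_shape polygon (its outline), and a geo_shape query with an inline shape finds everything inside that outline. All field names, the `_doc` type name, and the coordinates are invented for illustration, and the query assumes the shops store their position in a geo_shape field called `position`.

```python
# Made-up outline of a shopping center. Coordinates are [lon, lat];
# the ring is closed (first == last point) and runs counterclockwise.
CENTER_OUTLINE = {
    "type": "polygon",
    "coordinates": [[
        [16.500, 48.220],
        [16.504, 48.220],
        [16.504, 48.222],
        [16.500, 48.222],
        [16.500, 48.220],
    ]],
}

# Hypothetical mapping: a point for the map marker, shapes for spatial queries.
MAPPING = {
    "mappings": {
        "_doc": {
            "properties": {
                "name": {"type": "text"},
                "location": {"type": "geo_point"},
                "outline": {"type": "geo_shape"},
                "position": {"type": "geo_shape"},
            }
        }
    }
}

center_doc = {
    "name": "Example shopping center",
    "location": {"lat": 48.221, "lon": 16.502},
    "outline": CENTER_OUTLINE,
}

shop_doc = {
    "name": "Example shop",
    "position": {"type": "point", "coordinates": [16.502, 48.221]},
}


def shops_inside(shape):
    """Build a geo_shape filter for documents whose `position` lies within `shape`."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_shape": {
                        "position": {"shape": shape, "relation": "within"}
                    }
                }
            }
        }
    }


query = shops_inside(center_doc["outline"])
```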
  9. Update to Elasticsearch 6 (and asap use 7)
    • 5.x: the underlying Lucene index can handle numeric datatypes
      ◦ Lucene 6.0 introduced geo-spatial data structures
      ◦ Indexing
        ▪ <= 2.x: term-based encoding of points
        ▪ >= 5.x: a far more efficient BKD tree
      ◦ “The Evolution of Numeric Range Filters in Apache Lucene” https://www.elastic.co/blog/apache-lucene-numeric-filters
      ◦ “Numeric and Date Ranges in Elasticsearch: Just Another Brick in the Wall” https://www.elastic.co/blog/numeric-and-date-ranges-in-elasticsearch-just-another-brick-in-the-wall
    • 7.x will further optimize geo shape indexing
      ◦ Look at the Elastic{ON} talk “The State of Geo in Elasticsearch” https://www.elastic.co/elasticon/conf/2018/sf/the-state-of-geo-in-elasticsearch
  10. Kibana Visualizations
    Great way to debug your stuff!

  11. Kibana Visualizations
    • Coordinate Maps
      ◦ Plot points on a map
      ◦ Alternative to Google Fusion Tables
    • Region Maps
      ◦ Map data into regions
      ◦ “How many users do I have all over Europe?”
    • Elastic Maps Service in the background
      ◦ Basic world map
      ◦ Only a small set of shape layers, but at least one with ISO country codes
    • Use your own services
      ◦ WMS (not WMTS) maps
      ◦ GeoJSON / TopoJSON for shape layers
  12. Coordinate Map Example

  13. Region Map Example

  14. Practical Part I:

  15. First Iteration: JSON Files on Github

  16. But enough to run a website and the Seestadt Bot

    Btw. everything is Open Data under the Open Database License (ODbL)
  17. Next Generation: StadtKatalog.org

  18. Next Generation: StadtKatalog.org

  19. Internal Architecture
    • Postgres with PostGIS: User Management, Raw Entries, Entry Versioning, Permissions
    • Elasticsearch 6.2: Entries, Addresses
  20. Lessons Learned – Use Geo-fencing
    • Geo-fences are a great tool to limit visibility of geo-based data
      ◦ Seestadt-Admins should only see streets in the Seestadt geo-fence
      ◦ Users reporting incorrect data via the bot should only see suggestions from their neighborhood
    • Defined as geo_shape polygon
      ◦ You can even use holes (Vienna vs. Lower Austria)
      ◦ If the Seestadt grows, just increase the geo-fence to the new areas
    • Stick to one single definition standard
      ◦ Counterclockwise oriented definition of the polygon
      ◦ Closed polygon whose first and last point must match
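The "one single definition standard" rule can be enforced before a geo-fence is stored. The sketch below is one possible way to do it, not the StadtKatalog implementation: it closes an outer ring if needed and uses the shoelace formula to detect and fix a clockwise ring. It treats lon/lat as planar x/y, which is good enough to decide orientation for city-sized fences. The function names are invented for this example.

```python
def signed_area(ring):
    """Shoelace formula on [lon, lat] pairs; positive means counterclockwise."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def normalize_outer_ring(points):
    """Return a closed, counterclockwise copy of an outer polygon ring."""
    ring = [list(p) for p in points]
    if not ring:
        raise ValueError("empty ring")
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # close the ring: first and last point must match
    if len(ring) < 4:
        raise ValueError("a polygon ring needs at least three distinct points")
    area = signed_area(ring)
    if area == 0:
        raise ValueError("degenerate ring with zero area")
    if area < 0:
        ring.reverse()  # clockwise input, flip it to counterclockwise
    return ring


def as_geo_shape(points):
    """Wrap a normalized ring as a polygon body for a geo_shape field."""
    return {"type": "polygon", "coordinates": [normalize_outer_ring(points)]}
```

For example, `as_geo_shape([[0, 0], [0, 1], [1, 1], [1, 0]])` takes an open, clockwise square and returns a closed, counterclockwise polygon.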
  21. Lessons Learned – Avoid Boxes as Geo-fence

  22. Lessons Learned – Use Open Data Address Services
    • Enforce valid and standardized addresses
      ◦ Währinger Straße – Währingerstrasse – Währingerstraße
      ◦ Autocomplete all address inputs
    • Addresses are managed by municipalities (Gemeinden)
      ◦ Open Data: “Adressen Standorte Wien” https://www.data.gv.at/katalog/dataset/1d5c2411-9719-4c8f-b99d-57a5f4a4ae41
      ◦ Public Sector Information: BEV “Österreichisches Adressregister” http://www.bev.gv.at/portal/page?_pageid=713,2170374&_dad=portal&_schema=PORTAL
    • Enables you to geo-code existing data
      ◦ Used in the StadtKatalog crawler to import Spar / Libro / dm
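The three spellings of the same street show why free-text addresses need to be matched against an official register. As a small sketch of one possible matching step (an assumption for illustration, not how the StadtKatalog crawler works), a comparison key can fold case, turn ß into ss, and drop spaces and hyphens. The function name `street_key` is made up.

```python
import re
import unicodedata


def street_key(name):
    """Build a comparison key so spelling variants of a street name collide."""
    # NFC makes sure umlauts are single code points before comparing.
    text = unicodedata.normalize("NFC", name)
    # casefold() lowercases and also maps the German sharp s to "ss".
    text = text.casefold()
    # Ignore spaces and hyphens, which vary between spellings.
    return re.sub(r"[\s\-]+", "", text)


variants = ["Währinger Straße", "Währingerstrasse", "Währingerstraße"]
keys = {street_key(v) for v in variants}
```

The key is only meant for lookup; the name shown to users should still come from the official address data.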
  23. Lessons Learned – Addresses are complicated …
    • An address has exactly one
      ◦ Street name
      ◦ ONR – Orientierungsnummer (orientation number)
        ▪ Simple number 1 or a range 1–7
        ▪ “Stiegen” (staircases) are not consistent and can be defined by the owner
          • Can be assigned clockwise or counterclockwise
          • A / B / C
          • 1 / 2 / 3
          • A2 / A3 / C1 / C2
    • But … Praterstern Bahnhof
      ◦ Did you know that all shops in the station “Praterstern” have no ONR?
      ◦ Their address is just “Praterstern” or “Bahnhof Praterstern”
  24. Lessons Learned – Streets can reach out of a geo-fence

  25. Lessons Learned – Context Suggester
    Suggesters can only filter based on geohashes, not on geo shapes …
    … but geo-fences are shapes, not hashes / boxes
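To see why a geohash is a box rather than a shape, it helps to look at how one is computed. The following is a minimal Python sketch of the standard geohash encoding (this is the general algorithm, not Elasticsearch's internal code): the world is halved alternately by longitude and latitude, and every five decisions become one base32 character. A shorter hash is therefore a larger rectangle that contains all longer hashes starting with the same prefix.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=6):
    """Encode a WGS 84 position as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)
```

A geo-fence polygon rarely lines up with these rectangles, so a filter based on geohash cells will either include positions outside the fence or miss positions inside it.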
  26. Lessons Learned – Context Suggester

  27. Lessons Learned – Locations are relative …
    • Precision Errors
    • Conversion Errors between Reference Systems
    • Different Maps, Different Positions for Streets
      ◦ Google Maps
      ◦ OpenStreetMap
      ◦ Basemap.at
  28. Practical Part II: GTFS

  29. What is GTFS?
    • It’s not an API
    • GTFS Static vs. GTFS Realtime
      ◦ … but Wiener Linien provide you a realtime API
    • Standardized format to describe public transport
      ◦ CSV-based
      ◦ UTF-8 with or without BOM
      ◦ Raw data, you have to process everything …
      ◦ Well documented
    • Used by Google for Google Maps
    • There exist open source parsers and APIs
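The "UTF-8 with or without BOM" point matters in practice: if the byte order mark is not stripped, the first column name silently becomes `\ufeffstop_id` and lookups fail. A small sketch of a tolerant reader, using Python's `utf-8-sig` codec, could look like this. The function name and the sample row are made up for the example.

```python
import csv
import io


def read_gtfs_table(raw_bytes):
    """Parse one GTFS CSV file (e.g. stops.txt) into a list of dicts.

    The "utf-8-sig" codec drops a leading BOM if present and otherwise
    behaves like plain UTF-8, so both variants give the same result.
    """
    text = raw_bytes.decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Sample data for illustration only.
SAMPLE = b"stop_id,stop_name,stop_lat,stop_lon\r\nS1,Seestadt,48.2263,16.5086\r\n"
rows_plain = read_gtfs_table(SAMPLE)
rows_bom = read_gtfs_table(b"\xef\xbb\xbf" + SAMPLE)
```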
  30. How can we use GTFS in Elasticsearch?
    1. Parse all stops
    2. For each stop:
      a. Find all trips that run via this stop
      b. For each trip:
        i. Find which route the trip belongs to
      c. Look up which service times a trip has (Monday-only, weekdays or weekend?)
    3. Index the denormalized stop times for all stops
    At query time:
    1. Look for current departure times for the stop
    2. Check if there is no service exception for the current departure time
      a. Holidays might have special service times and some trips will not run
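The steps above can be sketched in Python as follows. This is a simplified illustration, not the code behind the prototype: the tables are assumed to be already parsed into lists of dicts keyed by the standard GTFS column names (`stops.txt`, `stop_times.txt`, `trips.txt`, `routes.txt`, `calendar.txt`, `calendar_dates.txt`), the output document layout is invented, and `start_date` / `end_date` of a service are ignored for brevity.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def denormalize_stop_times(stops, stop_times, trips, routes, calendar):
    """Join the GTFS tables into one flat document per stop time."""
    trips_by_id = {t["trip_id"]: t for t in trips}
    routes_by_id = {r["route_id"]: r for r in routes}
    calendar_by_service = {c["service_id"]: c for c in calendar}
    times_by_stop = defaultdict(list)
    for stop_time in stop_times:
        times_by_stop[stop_time["stop_id"]].append(stop_time)

    docs = []
    for stop in stops:
        for stop_time in times_by_stop.get(stop["stop_id"], []):
            trip = trips_by_id[stop_time["trip_id"]]
            route = routes_by_id[trip["route_id"]]
            service = calendar_by_service[trip["service_id"]]
            docs.append({
                "stop_id": stop["stop_id"],
                "stop_name": stop["stop_name"],
                # Shaped like a geo_point so the stop can be found by location.
                "location": {"lat": float(stop["stop_lat"]),
                             "lon": float(stop["stop_lon"])},
                "route": route["route_short_name"],
                "departure_time": stop_time["departure_time"],
                "service_id": trip["service_id"],
                "service_days": [d for d in WEEKDAYS if service[d] == "1"],
            })
    return docs


def runs_on(doc, day: date, calendar_dates):
    """Check whether a denormalized departure is valid on a given day."""
    stamp = day.strftime("%Y%m%d")
    for exception in calendar_dates:
        if exception["service_id"] == doc["service_id"] and exception["date"] == stamp:
            # exception_type 1 adds the service on that date, 2 removes it.
            return exception["exception_type"] == "1"
    return WEEKDAYS[day.weekday()] in doc["service_days"]
```

Keeping the exception check at query time means holiday changes do not require re-indexing every stop time, at the cost of one extra lookup per departure.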
  31. All stops have a location

  32. Wiener Linien Naming Convention …

  33. Bad News
    Works as a prototype, but not suitable for real users …
  35. Merci. Philipp Naderer-Puiu [email protected]