Location-aware Documents

Using Elasticsearch for StadtKatalog.org and the Seestadt.bot

Philipp Naderer

April 18, 2018

Transcript

  1. How do we orient around the globe?

    • Latitude (German: Breitengrad): [-90, +90]; Vienna is around 48.2°
    • Longitude (German: Längengrad): [-180, +180]; Vienna is around 16.5°
    • And you need a reference system! How are coordinates projected on the globe?
      ◦ WGS 84: EPSG:4326
      ◦ WGS 84 Pseudo-Mercator: EPSG:3857
      ◦ … but also many others!
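
To make the difference between the two reference systems on this slide concrete, here is a minimal sketch of the spherical Mercator formula that maps EPSG:4326 degrees to EPSG:3857 metres. The function name and the latitude cut-off constant are illustrative choices, not part of the talk; a real project would normally rely on a projection library instead.

```python
import math

EARTH_RADIUS_M = 6378137.0   # WGS 84 semi-major axis, used as sphere radius by EPSG:3857
MAX_LATITUDE = 85.051129     # Pseudo-Mercator is only defined up to roughly this latitude


def wgs84_to_web_mercator(lat_deg, lon_deg):
    """Project WGS 84 degrees (EPSG:4326) to Pseudo-Mercator metres (EPSG:3857)."""
    if abs(lat_deg) > MAX_LATITUDE:
        raise ValueError("latitude outside the Pseudo-Mercator range")
    if abs(lon_deg) > 180.0:
        raise ValueError("longitude outside [-180, +180]")
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


if __name__ == "__main__":
    # Approximate coordinates for Vienna as given on the slide.
    print(wgs84_to_web_mercator(48.2, 16.5))
```

The same point therefore has very different numeric coordinates depending on the reference system, which is why every stored coordinate needs its EPSG code alongside it.
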
  2. Why different reference systems?

    • You can buy this 3D relief globe in the SpaceStore: https://spacestore.co/products/false-colour-relief-earth-globe
    • WGS 84 vs. Local Reference System
    • WGS 84 and ETRS89 are drifting away from each other! ETRS89 is tied to the stable part of the Eurasian plate, so two points in ETRS89 keep their distance to each other over a longer time.
  3. How to store geo data

    A very short introduction; you will find all the details in the docs.
  4. What geo datatypes are available?

    geo_shape: a shape mapped on a globe in WGS 84
    • Supported shapes
      ◦ Point & Multi-Point
      ◦ LineString & Multi-LineString
      ◦ Polygon & Multi-Polygon (with support for holes)
    • Quite a lot of parameters for the mapping available, will be reduced to points_only in Elasticsearch 7
    • Mostly used in queries to retrieve points inside a shape
    • But you might store shapes as bounding boxes
      ◦ If you store polygons from a GeoJSON object
      ◦ A shopping center must have a geo_point to map it, but also a polygon to query all shops inside
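
As a rough illustration of the shopping-center case above, the sketch below builds the `properties` part of a mapping and a matching document as plain Python dictionaries. The field names `name`, `location` and `outline` and all coordinate values are invented for this example; only the type names `geo_point` and `geo_shape` come from the slide.

```python
import json

# Hypothetical field names; only the "type" values are Elasticsearch datatypes.
properties = {
    "name": {"type": "text"},
    "location": {"type": "geo_point"},   # a single point to place the entry on a map
    "outline": {"type": "geo_shape"},    # a polygon to query everything inside
}

# Made-up example document. Shape coordinates use GeoJSON order: [lon, lat].
shopping_center = {
    "name": "Example Shopping Center",
    "location": {"lat": 48.2, "lon": 16.5},
    "outline": {
        "type": "polygon",
        "coordinates": [[
            [16.49, 48.19],
            [16.51, 48.19],
            [16.51, 48.21],
            [16.49, 48.21],
            [16.49, 48.19],   # closed ring: last point repeats the first
        ]],
    },
}

if __name__ == "__main__":
    print(json.dumps({"properties": properties}, indent=2))
    print(json.dumps(shopping_center, indent=2))
```
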
  5. Update to Elasticsearch 6 (and asap use 7)

    • 5.x: the underlying Lucene index can handle numeric datatypes
      ◦ Lucene 6.0 introduced geo-spatial data structures
      ◦ Indexing
        ▪ <= 2.x: term-based encoding of points
        ▪ >= 5.x: a far more efficient BKD tree
      ◦ “The Evolution of Numeric Range Filters in Apache Lucene” https://www.elastic.co/blog/apache-lucene-numeric-filters
      ◦ “Numeric and Date Ranges in Elasticsearch: Just Another Brick in the Wall” https://www.elastic.co/blog/numeric-and-date-ranges-in-elasticsearch-just-another-brick-in-the-wall
    • 7.x will further optimize geo shape indexing
      ◦ Look at the Elastic{ON} talk “The State of Geo in Elasticsearch” https://www.elastic.co/elasticon/conf/2018/sf/the-state-of-geo-in-elasticsearch
  6. Kibana Visualizations

    • Coordinate Maps
      ◦ Plot points on a map
      ◦ Alternative to Google Fusion Tables
    • Region Maps
      ◦ Map data into regions
      ◦ “How many users do I have all over Europe?”
    • Elastic Map Service in the background
      ◦ Basic world map
      ◦ Only a small set of shape layers, but at least one with ISO country codes
    • Use your own services
      ◦ WMS (not WMTS) maps
      ◦ GeoJSON / TopoJSON for shape layers
  7. But enough to run a website and the Seestadt Bot

    By the way, everything is Open Data under the Open Database License (ODbL).
  8. Internal Architecture

    • Postgres with PostGIS: User Management, Raw Entries, Entry Versioning, Permissions
    • Elasticsearch 6.2: Entries, Addresses
  9. Lessons Learned – Use Geo-fencing

    • Geo-fences are a great tool to limit visibility of geo-based data
      ◦ Seestadt admins should only see streets in the Seestadt geo-fence
      ◦ Users reporting incorrect data via the bot should only see suggestions from their neighborhood
    • Defined as geo_shape polygon
      ◦ You can even use holes (Vienna vs. Lower Austria)
      ◦ If the Seestadt grows, just extend the geo-fence to the new areas
    • Stick to one single definition standard
      ◦ Counterclockwise oriented definition of the polygon
      ◦ Closed polygon whose first and last point must match
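
The two rules for a consistent definition standard (counterclockwise orientation, closed ring) can be checked before a geo-fence is stored. The following is a minimal sketch with hypothetical helper names; it treats (lon, lat) pairs as planar coordinates, which is an assumption that holds well enough for neighborhood-sized fences that do not cross the antimeridian.

```python
def signed_area(ring):
    """Shoelace formula; the result is positive for counterclockwise rings."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def is_valid_fence(ring):
    """True if the ring is closed, has at least a triangle, and runs counterclockwise."""
    return len(ring) >= 4 and ring[0] == ring[-1] and signed_area(ring) > 0


def normalize_fence(ring):
    """Return a closed, counterclockwise copy of the ring."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])      # close the polygon
    if signed_area(ring) < 0:
        ring.reverse()            # flip a clockwise ring
    return ring


if __name__ == "__main__":
    # Made-up, open, clockwise ring in (lon, lat) order.
    raw = [(16.49, 48.21), (16.51, 48.21), (16.51, 48.19), (16.49, 48.19)]
    fence = normalize_fence(raw)
    print(is_valid_fence(raw), is_valid_fence(fence))
```

Normalizing on write means every consumer of the fence can rely on one orientation instead of guessing it per polygon.
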
  10. Lessons Learned – Use Open Data Address Services

    • Enforce valid and standardized addresses
      ◦ Währinger Straße – Währingerstrasse – Währingerstraße
      ◦ Autocomplete all address inputs
    • Addresses are managed by municipalities (Gemeinden)
      ◦ Open Data: „Adressen Standorte Wien“ (address locations for Vienna) https://www.data.gv.at/katalog/dataset/1d5c2411-9719-4c8f-b99d-57a5f4a4ae41
      ◦ Public Sector Information: BEV “Österreichisches Adressregister” (Austrian address register) http://www.bev.gv.at/portal/page?_pageid=713,2170374&_dad=portal&_schema=PORTAL
    • Enables you to geo-code existing data
      ◦ Used in the StadtKatalog crawler to import Spar / Libro / dm
  11. Lessons Learned – Addresses are complicated …

    • An address has exactly one
      ◦ Street Name
      ◦ ONR – Orientierungsnummer (orientation number, i.e. the house number)
        ▪ Simple number 1 or a range 1–7
        ▪ „Stiegen“ (staircases) are not consistent and can be defined by the owner
          • Can be assigned clockwise or counterclockwise
          • A / B / C
          • 1 / 2 / 3
          • A2 / A3 / C1 / C2
    • But … Praterstern Bahnhof
      ◦ Did you know that all shops in the station “Praterstern” have no ONR?
      ◦ Their address is just “Praterstern” or “Bahnhof Praterstern”
  12. Lessons Learned – Context Suggester

    • Suggesters can only filter based on geohashes, not on geo shapes …
    • … but geo-fences are shapes, not hashes / boxes
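
To show why a geohash is a box rather than a shape, here is a small sketch of the standard geohash encoding: longitude and latitude intervals are halved alternately, and every five bits become one base-32 character. Each extra character only shrinks a rectangular cell, so a polygon fence can at best be approximated by a set of cells. The function name and default precision are illustrative.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=6):
    """Encode a WGS 84 point as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:          # five bits form one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


if __name__ == "__main__":
    # Approximate Vienna coordinates from the first slide.
    print(geohash_encode(48.2, 16.5, precision=5))
    print(geohash_encode(48.2, 16.5, precision=7))
```

A shorter hash is always a prefix of a longer hash for the same point, which is what makes prefix-based filtering on geohashes possible.
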
  13. Lessons Learned – Locations are relative …

    • Precision Errors
    • Conversion Errors between Reference Systems
    • Different Maps, Different Positions for Streets
      ◦ Google Maps
      ◦ OpenStreetMap
      ◦ Basemap.at
  14. What is GTFS?

    • It’s not an API
    • GTFS Static vs. GTFS Realtime
      ◦ … but Wiener Linien provides a realtime API
    • Standardized format to describe public transport
      ◦ CSV-based
      ◦ UTF-8 with or without BOM
      ◦ Raw data, you have to process everything …
      ◦ Well documented
    • Used by Google for Google Maps
    • Open source parsers and APIs exist
  15. How can we use GTFS in Elasticsearch?

    At index time:
    1. Parse all stops
    2. For each stop:
       a. Select all trips that run via this stop
       b. For each trip:
          i. Find which route the trip belongs to
       c. Look up which service times the trip has (Monday-only, weekdays or weekend?)
    3. Index the denormalized stop time for all stops

    At query time:
    1. Look for current departure times for the stop
    2. Check that there is no service exception for the current departure time
       a. Holidays might have special service times and some trips will not run
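
The indexing steps above can be sketched with the standard library alone. The column names follow the GTFS Static files `stops.txt`, `stop_times.txt`, `trips.txt`, `routes.txt` and `calendar.txt`; the rows themselves, the document layout and the function names are invented for illustration, and service exceptions from `calendar_dates.txt` are deliberately left out.

```python
import csv
import io

# Tiny made-up GTFS excerpts; a real feed ships these as separate .txt files.
STOPS = """stop_id,stop_name,stop_lat,stop_lon
S1,Seestadt,48.226,16.508
"""
STOP_TIMES = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:00:00,08:00:00,S1,1
T2,09:30:00,09:30:00,S1,1
"""
TRIPS = """route_id,service_id,trip_id
U2,WEEKDAY,T1
U2,WEEKEND,T2
"""
ROUTES = """route_id,route_short_name
U2,U2
"""
CALENDAR = """service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
WEEKDAY,1,1,1,1,1,0,0,20180101,20181231
WEEKEND,0,0,0,0,0,1,1,20180101,20181231
"""

DAYS = ("monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday")


def rows(text):
    """Parse one CSV table into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def denormalize():
    """Build one flat document per stop time, ready to be indexed."""
    trips = {t["trip_id"]: t for t in rows(TRIPS)}
    routes = {r["route_id"]: r for r in rows(ROUTES)}
    calendar = {c["service_id"]: c for c in rows(CALENDAR)}
    stop_times = rows(STOP_TIMES)
    docs = []
    for stop in rows(STOPS):                               # 1. parse all stops
        for st in stop_times:
            if st["stop_id"] != stop["stop_id"]:           # 2a. trips via this stop
                continue
            trip = trips[st["trip_id"]]
            route = routes[trip["route_id"]]               # 2b. route of the trip
            service = calendar[trip["service_id"]]         # 2c. service times
            docs.append({                                  # 3. denormalized stop time
                "stop_id": stop["stop_id"],
                "stop_name": stop["stop_name"],
                "location": {"lat": float(stop["stop_lat"]), "lon": float(stop["stop_lon"])},
                "route": route["route_short_name"],
                "departure_time": st["departure_time"],
                "service_days": [d for d in DAYS if service[d] == "1"],
            })
    return docs


if __name__ == "__main__":
    for doc in denormalize():
        print(doc)
```

Because each document already carries stop, route and service days, the query side only has to filter by stop and time and then apply the exception check.
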