Day1-1410-Challenges in geonames and address extraction

sotm2017
September 01, 2017

Transcript

  1. Challenges in geonames
    and address extraction
    Prof. Stefan Keller

    Geometa Lab, HSR University of Applied Sciences
    Rapperswil (Switzerland)

  2. Agenda
    • Motivation

    • Geonames

    • Addresses

    • Issues

  3. Geoname search:
    Where is Aizu-Wakamatsu?

    Address geocoding:
    Tokyo Central Post Office
    5-3, Yaesu 1-Chome
    Chuo-ku, Tokyo 100-8994

  4. Search components
    • Data

    • Data pre-processing software

    • Search engine software
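
The three components can be illustrated with a toy pipeline. This is a minimal sketch, not the software discussed in the talk: the record layout, the normalization rules (NFKC, case-folding, dropping spaces and hyphens) and the names `normalize`, `build_index` and `search` are hypothetical.

```python
import unicodedata

# Data: a hypothetical geoname record (layout loosely modelled on slide 9).
RECORDS = [
    {"name": "会津若松市", "alternative_names": ["Aizuwakamatsu"], "osm_id": "4174424"},
]


def normalize(text: str) -> str:
    """Pre-processing: NFKC-normalize, case-fold, drop spaces and hyphens."""
    text = unicodedata.normalize("NFKC", text).casefold()
    return "".join(ch for ch in text if ch not in " -")


def build_index(records):
    """Pre-processing: map each normalized name variant to its records."""
    index = {}
    for rec in records:
        for name in [rec["name"], *rec.get("alternative_names", [])]:
            index.setdefault(normalize(name), []).append(rec)
    return index


def search(index, query):
    """Search engine: exact lookup of the normalized query."""
    return index.get(normalize(query), [])


if __name__ == "__main__":
    index = build_index(RECORDS)
    print([rec["osm_id"] for rec in search(index, "Aizu-Wakamatsu")])  # ['4174424']
```

Normalizing both the indexed names and the query the same way is what lets "Aizu-Wakamatsu" find the record stored as "Aizuwakamatsu".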

  5. Geonames with
    containment hierarchy

  6. Geoname example: Aizu

  7. Nominatim.osm.org

  8. OSMNames.org

  9. Enriched Geonames Data
    "name": "会津若松市",
    "alternative_names": "Aizuwakamatsu, Айдзувакамацу",
    "street": "",
    "county": "Fukushima",
    "city": "",
    "state": "Fukushima",
    "country": "Japan",
    "boundingbox": [139.8389,37.3229,140.1133,37.5831],
    "osm_id": "4174424",
    "type": "administrative",
    "importance": 0.4,

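A record like the one on slide 9 already carries its place in the hierarchy, so a human-readable label and a map viewport can be derived from it. A minimal sketch, assuming the boundingbox order is [west, south, east, north] (inferred from the values, not documented here) and using hypothetical helpers `display_name` and `bbox_center`; the label skips empty and repeated hierarchy levels such as the duplicated "Fukushima".

```python
import json

# Excerpt of the slide 9 record, as valid JSON.
RECORD = json.loads("""
{
  "name": "会津若松市",
  "street": "", "city": "", "county": "Fukushima",
  "state": "Fukushima", "country": "Japan",
  "boundingbox": [139.8389, 37.3229, 140.1133, 37.5831]
}
""")

# Hierarchy fields from most specific to most general.
HIERARCHY = ("name", "street", "city", "county", "state", "country")


def display_name(record):
    """Join the hierarchy levels, skipping empty and repeated values."""
    parts = []
    for key in HIERARCHY:
        value = record.get(key, "")
        if value and value not in parts:
            parts.append(value)
    return ", ".join(parts)


def bbox_center(bbox):
    """Centre of a [west, south, east, north] box (assumed order)."""
    west, south, east, north = bbox
    return ((west + east) / 2, (south + north) / 2)


print(display_name(RECORD))  # 会津若松市, Fukushima, Japan
```
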
  10. The power of hierarchy!
    1. country, national level (possibly the mainland only!)

    2. state, subnational level

    3. city

    4. county/town

    5. village / suburb / neighborhood
    All administrative divisions are polygons (well: almost…)
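
Because administrative divisions are polygons, the hierarchy of any point can be derived by testing which polygons contain it. A minimal sketch using the even-odd ray-casting test on simple polygons; real OSM boundaries are multipolygons, possibly with holes, which this ignores, and the area names, ranks and square coordinates are made up for illustration.

```python
def point_in_polygon(lon, lat, polygon):
    """Even-odd ray casting: count edge crossings of a ray towards +lon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the ray's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def containment_hierarchy(lon, lat, areas):
    """Names of all areas containing the point, ordered by rank (1 = country)."""
    hits = [a for a in areas if point_in_polygon(lon, lat, a[2])]
    return [name for name, _rank, _ring in sorted(hits, key=lambda a: a[1])]


# Hypothetical areas: (name, rank as in the list above, outer ring).
AREAS = [
    ("City", 3, [(1, 1), (2, 1), (2, 2), (1, 2)]),
    ("Country", 1, [(0, 0), (10, 0), (10, 10), (0, 10)]),
    ("State A", 2, [(0, 0), (5, 0), (5, 5), (0, 5)]),
    ("State B", 2, [(5, 5), (10, 5), (10, 10), (5, 10)]),
]

print(containment_hierarchy(1.5, 1.5, AREAS))  # ['Country', 'State A', 'City']
```

For many points a spatial index (or a database such as PostGIS) would replace the linear scan over all areas.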

  11. What’s a name anyway?
    • Toponymy: endonym, exonym. Example:

    • name = 会津若松城

    • name:ja = 会津若松城 (Aizu-Wakamatsu-jō)

    • name:en = Aizu-Wakamatsu Castle

    • name:de = Burg Aizu-Wakamatsu

    • alt_name:en = Tsuruga Castle

    • alt_name:ja = 鶴ヶ城 (Tsuru-ga-jō)
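
For search, every name variant should be indexed, while display needs a single label in the user's language. A minimal sketch with hypothetical helpers `names_for_search` and `label`; it assumes the OSM conventions of ISO 639-1 language suffixes (`name:ja`, `name:en`, ...) and of `;` separating multiple values.

```python
def names_for_search(tags):
    """All distinct values of name, name:<lang>, alt_name and alt_name:<lang>."""
    keys = sorted(k for k in tags
                  if k == "name" or k.startswith(("name:", "alt_name")))
    names = []
    for key in keys:
        for value in tags[key].split(";"):  # ';' separates multiple values
            value = value.strip()
            if value and value not in names:
                names.append(value)
    return names


def label(tags, lang):
    """Prefer name:<lang> (exonym); fall back to the local name (endonym)."""
    return tags.get(f"name:{lang}", tags.get("name", ""))


CASTLE = {
    "name": "会津若松城",
    "name:ja": "会津若松城",
    "name:en": "Aizu-Wakamatsu Castle",
    "name:de": "Burg Aizu-Wakamatsu",
    "alt_name:en": "Tsuruga Castle",
    "alt_name:ja": "鶴ヶ城",
}

print(label(CASTLE, "de"))  # Burg Aizu-Wakamatsu
```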

  12. Issues in tagging geonames
    • Tag name: "Name is the name only", yet names are often misused
    to describe all kinds of things

    • Ranking of geonames!

    • Tag admin_level: there is no unified tagging in OSM yet for parts
    of towns and villages

    • Issues in assigning the hierarchy of
    city/town/village/suburb/neighborhood

    • How to deal with objects of larger areal extent? These are
    currently often captured as a single node: which node to choose?
    What is the extent? What is the bounding box?

  13. Addresses

  14. (Postal Building) Addresses
    • Given: the list of street geonames processed as before,
    including their hierarchy

    • Select all OSM objects (node, way, relation) with key
    "addr:housenumber" (Karlsruhe Schema)

    • Goal: generate a list of addresses, each pointing to a street
    (osm_id)
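
The selection step can be sketched over elements shaped like Overpass API JSON output (`type`, `id`, `tags`). This is an illustrative sketch: the function name and output record layout are hypothetical, and objects are keyed by (type, id) because OSM ids are only unique per element type. Matching the street name to a street osm_id is left to the next step.

```python
def select_address_objects(elements):
    """Keep nodes, ways and relations tagged with addr:housenumber."""
    selected = []
    for element in elements:
        tags = element.get("tags", {})
        if "addr:housenumber" not in tags:
            continue
        selected.append({
            "osm": (element["type"], element["id"]),
            "housenumber": tags["addr:housenumber"],
            # Street or place name, still to be resolved to a street osm_id.
            "street": tags.get("addr:street") or tags.get("addr:place"),
        })
    return selected


# Hypothetical input elements.
ELEMENTS = [
    {"type": "node", "id": 1,
     "tags": {"addr:housenumber": "12", "addr:street": "Oberseestrasse", "shop": "bakery"}},
    {"type": "way", "id": 1,
     "tags": {"building": "yes", "addr:housenumber": "10", "addr:place": "Hinterdorf"}},
    {"type": "node", "id": 2, "tags": {"amenity": "bench"}},
    {"type": "node", "id": 3},
]

print(select_address_objects(ELEMENTS))
```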

  15. Karlsruhe Schema
    Addresses can be tagged with key addr:housenumber and
    other addr keys on…

    • a node, standalone or carrying other tags (e.g. shops)

    • a node on a building outline with tag entrance=yes

    • a polygon with key building

    • a polygon representing the perimeter of a site

    • a relation with type=associatedStreet

    • an invisible line (way) with key addr:interpolation
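
A pre-processor has to recognize which of these variants it is looking at. A rough sketch of such a classifier; the function name and category labels are hypothetical. It decides from element type and tags only, so it cannot verify that an entrance node actually lies on a building outline (that needs geometry), and it treats any non-building way with addr:housenumber as a site perimeter.

```python
def address_carrier(element):
    """Classify how an OSM element carries address information (tags only)."""
    kind = element["type"]
    tags = element.get("tags", {})
    if kind == "relation" and tags.get("type") == "associatedStreet":
        return "associatedStreet relation"
    if kind == "way" and "addr:interpolation" in tags:
        return "interpolation line"
    if "addr:housenumber" not in tags:
        return None  # no house number on this element itself
    if kind == "node":
        return "entrance node" if tags.get("entrance") == "yes" else "node"
    if kind == "way" and "building" in tags:
        return "building polygon"
    if kind == "way":
        return "site polygon"
    return "other"


print(address_carrier({"type": "way", "id": 3,
                       "tags": {"building": "yes", "addr:housenumber": "3"}}))  # building polygon
```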

  16. Options to relate a house
    number to a street
    1. Is the house number part of a relation?

    2. Does addr:street or addr:place exist and match a street directly?

    3. If addr:street/addr:place does not match: apply a fuzzy
    string/text search.

    4. If there is no addr:street/addr:place: apply a street proximity
    search.
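
The four options form a cascade that stops at the first success. A minimal sketch under stated assumptions: the input shapes, the helper names `distance_m` and `match_street`, Python's `difflib` as the fuzzy matcher with a 0.8 cutoff, and nearest-vertex distance (an equirectangular approximation) as the proximity measure are illustrative choices, not the implementation from the talk. Following the slide, proximity is only tried when no addr:street/addr:place is present; a name that matches nothing is reported as unmatched.

```python
import difflib
import math


def distance_m(a, b):
    """Approximate distance in metres between (lon, lat) points (equirectangular)."""
    mean_lat = math.radians((a[1] + b[1]) / 2)
    dx = math.radians(b[0] - a[0]) * math.cos(mean_lat)
    dy = math.radians(b[1] - a[1])
    return 6_371_000 * math.hypot(dx, dy)


def match_street(address, streets, relation_members, cutoff=0.8):
    """Return (street osm_id or None, option used), trying options 1-4 in order.

    address:          {"osm": (type, id), "coord": (lon, lat), optional addr:street/addr:place}
    streets:          [{"osm_id": ..., "name": ..., "coords": [(lon, lat), ...]}, ...]
    relation_members: {(type, id): street osm_id}, e.g. built from associatedStreet relations
    """
    # 1. House number is a member of a relation that names its street.
    if address["osm"] in relation_members:
        return relation_members[address["osm"]], "relation"

    wanted = address.get("addr:street") or address.get("addr:place")
    by_name = {street["name"]: street["osm_id"] for street in streets}
    if wanted:
        # 2. Direct match of addr:street / addr:place against street names.
        if wanted in by_name:
            return by_name[wanted], "exact"
        # 3. Fuzzy match: best difflib similarity ratio at or above the cutoff.
        close = difflib.get_close_matches(wanted, list(by_name), n=1, cutoff=cutoff)
        if close:
            return by_name[close[0]], "fuzzy"
        return None, "unmatched"

    # 4. No street/place given: pick the street with the nearest vertex.
    if not streets:
        return None, "unmatched"
    nearest = min(
        streets,
        key=lambda street: min(distance_m(address["coord"], c) for c in street["coords"]),
    )
    return nearest["osm_id"], "proximity"


# Hypothetical streets near Rapperswil, for illustration only.
STREETS = [
    {"osm_id": 11, "name": "Bahnhofstrasse", "coords": [(8.8170, 47.2270), (8.8190, 47.2270)]},
    {"osm_id": 12, "name": "Oberseestrasse", "coords": [(8.8150, 47.2230), (8.8170, 47.2230)]},
]

print(match_street({"osm": ("node", 3), "coord": (8.8180, 47.2268),
                    "addr:street": "Bahnhofstrase"}, STREETS, {}))  # (11, 'fuzzy')
```

Two simplifications are worth noting: `by_name` keeps only one way per street name, whereas real streets are split into many ways, and measuring to vertices rather than segments can make a street with long, sparsely noded segments look farther away than it is.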

  17. Not covered here
    • Ways with addr:interpolation tag

    • Nodes with associated_street tag

    • POIs without addr:* that inherit addr:* from buildings carrying addr:* tags

    • Zip codes

    • …

  18. Issues in tagging addresses
    • Misused name tags which introduce ambiguity

    • Sharing/deduplicating addr tags among objects inside a
    building polygon

    • addr nodes with entrance=yes sitting on top of a building
    (way): the addr has to be passed to the building (and the building
    would have to pass it on to everything inside)

    • Special treatment of nodes whose tag holds several values
    separated by semicolons, e.g. addr:housenumber=1;3;5
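
The semicolon case can be handled by splitting the value and emitting one address per house number. A minimal sketch with hypothetical helper names; it only handles `;` lists, not ranges such as `1-5`, whose meaning varies (in the Tokyo example on slide 3, "5-3" is not a range).

```python
def split_housenumbers(value):
    """Split a ';'-separated addr:housenumber value, ignoring blank parts."""
    return [part.strip() for part in value.split(";") if part.strip()]


def expand_addresses(tags):
    """One tag dict per house number; all other tags are copied unchanged."""
    numbers = split_housenumbers(tags.get("addr:housenumber", ""))
    return [{**tags, "addr:housenumber": number} for number in numbers]


TAGS = {"addr:housenumber": "1;3;5", "addr:street": "Oberseestrasse"}
for address in expand_addresses(TAGS):
    print(address["addr:street"], address["addr:housenumber"])
```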

  19. Take-home message
    • Addresses are an important asset of OSM

    • Geonames too

    • Wishlist:

    • A more consistent assignment of admin levels is needed

    • More consistent name content

    • Discussion on how to map geonames of larger areal extent

    • …