Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Why Geocoding is Hard and how You and OSM can Help (with speaker notes)

Why Geocoding is Hard and how You and OSM can Help (with speaker notes)

Julian Simioni

October 06, 2018
Tweet

More Decks by Julian Simioni

Other Decks in Technology

Transcript

  1. Why Geocoding is Hard, and
    how you and OSM can help
    Julian Simioni, Cleared for Takeoff, @juliansimioni

    View full-size slide

  2. github.com/pelias/pelias/
    Hi I’m Julian, and I’m a core maintainer of the Pelias geocoder. Pelias is an open source, data-agnostic geocoder that we originally started building at Mapzen. Now you
    might have heard that Mapzen shut down last year, which is sad, but an amazing thing happened after that, where pretty much all the projects Mapzen started have
    continued on in some way. Score one for open source!

    View full-size slide

  3. Personally, I co-founded Cleared for Takeoff, where we do consulting around geocoding and run geocode.earth, a hosted geocoding service.

    We’ve been able to keep working on and improving Pelias through our work. I want to talk a little bit about some of the problems we’ve solved, some of the problems we
    haven’t solved, and most importantly some of the ways OSM is uniquely suited to helping with geocoding in general, not just Pelias.

    View full-size slide

  4. https://www.nasa.gov/feature/top-15-earth-images-of-2015
    The world is a little bit different everywhere. That’s pretty much the core problem of geocoding. From space, the differences humans create are mostly invisible.
    Sometimes you can see them though. This is a photo from the international space station of the India/Pakistani border, one of those rare examples.

    Geocoders basically have to know about all those differences, at least when it comes to how people refer to places. Here’s a great story to illustrate one of the challenges
    we face.

    View full-size slide

  5. https://www.flickr.com/photos/thomashawk/15691203924/
    [here is where I tell a funny story about a relative from Italy who visited the US and asked “Who is Saint Main? He must be very famous because I see so many streets
    named after him”]

    View full-size slide

  6. Well, we do have streets named after saints, here’s one near where I live in NYC. But, at least here in the US, “st” is an abbreviation for Street as well. But not in Italy.

    View full-size slide

  7. To make things even trickier, the same abbreviation might expand to something different in different places. In Germany, “st” expands to Sankt. Same meaning, same
    abbreviation, different spelling.

    So what do we need? Do we need a massive machine learning project to deduce the right behavior in every country? Probably not. We can get pretty far with a list of
    different abbreviations in different places.

    View full-size slide

  8. https://wiki.openstreetmap.org/wiki/Name_finder:Abbreviations
    OSM already has this, and in true community spirit it’s an editable wiki!

    View full-size slide

  9. Now, I think no one has been excited by a wiki as a novel solution to a problem in about 10 years, but I think that’s actually a good thing. This _works_, it’s simple, and it’s
    pretty easy for anyone to edit.

    View full-size slide

  10. There’s a lot of sections, with lots of abbreviations in lots of different languages.

    View full-size slide

  11. Including, of course, the language of our neighbors across the Detroit river: Canadian.

    Anyways, it goes without saying that there’s probably more abbreviations, or even entire languages and countries, to add. So this is a great place to contribute to OSM
    that goes beyond editing the map. These wiki pages are used directly by Nominatim, which is really cool and ingenious, and we look at them to help make Pelias better.
    We should probably use them directly too.

    View full-size slide

  12. Beyond points, beyond
    polygons
    So, there are lots of open datasets out there. Many of them are just point data, like OpenAddresses. This can be really useful. Some datasets have polygon data. This can
    be useful too. But the _most_ useful of all are datasets that utilize both. OSM is one of them.

    All the examples I’m about to show comes from our work with the TriMet transit agency in Portland, Oregon (hence the cheesy image). Madeline Steele from their team
    was supposed to present here, and couldn’t make it, so I’ll do my best to show off some of the awesome work they’ve done, and the work we’ve done together.

    View full-size slide

  13. Here’s the Brookwood Library in Hillsboro, Oregon. This is a screenshot of a route from Trimet’s trip planner. You might be wondering why it’s telling you to walk around to
    the BACK of the library.

    The answer is that for routing, you need a specific point to route to. Baring any other data, the best single point usually ends up being the centroid of the shape of the
    building, or something roughly equivalent.

    The centroid, oddly enough, was closest to the path going to the back of the building, so that explains the routing.

    But the solution here was already in OSM: entrance tags! Entrance tags perfectly solve the problem by specifying a point to go along with the shape of the building, for
    exactly where the door is.

    View full-size slide

  14. After adding an entrance tag to that building, TriMet’s routing results are perfect.

    Adding entrance tags to buildings of almost any size is a super useful way to add data that’s missing in quite a few places, even those that are pretty well mapped.

    View full-size slide

  15. Here’s a challenge we’re working on for the future. Portland International Airport has no entrance tag. Worse, no point on the boundary of the way is remotely acceptable
    as an entrance tag. The way we were calculating the centroid, it ended up on a runway. Giving driving directions to an active runway is very bad.

    I looked at a lot of other airports and they have this problem too. Some airports are relations, and they _still_ don’t solve this problem. I think a relation with a way and a
    collection of points for entranceways might work. For some airports we will have to do this for each terminal. Or maybe we need a new tag!

    View full-size slide

  16. Actual video of OSM tag format debate
    Now some of you might cringe at the thought of coming up with consensus around a new tag, but really, the fact that we _can_ go through that process is one of OSMs
    strengths. No one knows how to build the perfect map, and OSM lets us all figure it out together. We probably need to make the process less painful, and we DEFINITELY
    need to have more voices involved in the discussion, but overall, it’s pretty awesome.

    View full-size slide

  17. A Tale of Two Cities
    Okay, lets look at one last problem I believe OSM can solve. Take a look at the two map screenshots above. They look pretty similar, right? Actually they’re from two very
    different cities on opposite sides of the planet. Anyone want to take a guess?

    View full-size slide

  18. Here’s the satellite view of the exact same place to give you some hints.

    Wow, they look a lot different in this view.

    View full-size slide

  19. Johannesburg Detroit
    So while there’s about the same amount of data in OSM for both places, the level of completeness is vastly different.

    View full-size slide

  20. cityofdetroit.github.io/demo-tracker
    How did this happen? For the last several years, the city of Detroit has been undergoing a massive project to demolish abandoned buildings causing “blight”. They’ve
    done a great job! I think they’ve demolished over 14 thousand buildings, and while it’s not great that they had to do that, it’s better than the alternative.

    But that leads to something interesting.

    View full-size slide

  21. A Tale of Two Cities
    So we’re left with two places that both LOOK incomplete, but one is actually pretty complete. Unfortunately, geocoders have long had to deal with the idea of missing
    data. Both Pelias and Nominatim have excellent address interpolation engines, for example. How do we tell the interpolation engines that in Detroit, there’s nothing to
    interpolate, but in Johannesburg there is?

    View full-size slide

  22. https://www.whosonfirst.org/blog/2017/10/24/whosonfirst-sotmus-2017
    You might have seen Aaron Cope’s talk last year, if you weren’t completely distracted by the beautiful mountains out the window, where he talked about Who’s on First.
    By the way, we use Who’s on First heavily in Pelias and it’s been an essential source of data.

    Aaron talks about the idea of _managing absense_. By that he meant managing lack of data, but there’s another type of absence, when we know something isn’t there.
    We’re lucky to be in a position to even start talking about this, but eventually we should think of something. After all, some day, the map, just like this talk, will be
    completely done, right? :)

    View full-size slide

  23. Thank You!
    twitter.com/juliansimioni
    Thank you, and of course, here’s a picture of my cat.

    View full-size slide