$30 off During Our Annual Pro Sale. View Details »

A Knapsack of Geotools: More Than Just Google Maps

A Knapsack of Geotools: More Than Just Google Maps

Where? As location becomes increasingly important, and as more and more data is geotagged, this may be the most important question your app needs to answer. How do you determine what city and country your users are coming from? Figure out which neighborhood a place is in? Keep a location history for a physical object? Group people together based on proximity? One of these days you'll need to reach into your knapsack of geo-tools to solve problems like these and this talk aims to make you ready. We'll cover using location-aware storage like MongoDB and ElasticSearch, GeoIP, reverse geocoding, third-party location web services, geo-hashing, and more.

Andrei Zmievski

March 01, 2012
Tweet

More Decks by Andrei Zmievski

Other Decks in Technology

Transcript

  1. A KNAPSACK OF GEOTOOLS
    more than just Google Maps
    Andrei Zmievski • ConFoo • Mar 1, 2012
    Thursday, March 1, 12

    View Slide

  2. MOI
    ❖ Andrei Z.
    ❖ Architect at AppDynamics
    ❖ PHP core dev, Smarty, PHP-GTK, Unicode project
    ❖ Coding, beer, brewing, photography, travel
    ❖ @a
    Thursday, March 1, 12

    View Slide

  3. INTRO
    ❖ Location is more important than ever
    ❖ So is basic understanding of geo-related principles
    ❖ Survey of tools, services, and technologies
    ❖ Not an exhaustive reference
    Thursday, March 1, 12

    View Slide

  4. REPRESENTING LOCATION
    Thursday, March 1, 12

    View Slide

  5. SPHERICAL COORDINATES
    ❖ Latitude (ϕ, phi, lat.)
    ❖ Longitude (λ, lambda, long.)
    Thursday, March 1, 12

    View Slide

  6. LATITUDE/LONGITUDE
    “webbing” = conjugate graticule
    Thursday, March 1, 12

    View Slide

  7. LATITUDE/LONGITUDE
    ❖ Lines of equal latitude are called parallels
    ‣ 0º parallel = equator
    ‣ 23º26’N parallel = Tropic of Cancer
    ‣ 23º26’S parallel = Tropic of Capricorn
    ❖ Lines of equal latitude are called meridians
    ‣ 0° meridian = Prime Meridian
    ‣ Antipodes of Prime Meridian = 180°W and 180°E
    Thursday, March 1, 12

    View Slide

  8. LATITUDE/LONGITUDE
    ❖ Measured in DMS or decimal degrees
    ❖ Latitude: 37° 45' 35” = 37.76°
    ❖ Longitude: -122° 28' 11 = -122.47°
    ❖ Positive latitude is N(orth), negative is S(outh)
    ❖ Positive longitudes are E(ast) of 0°, negative are W(est)
    Thursday, March 1, 12

    View Slide

  9. LATITUDE/LONGITUDE
    ❖ Latitude + longitude is enough to specify any location
    on the planet
    ❖ Does not consider height/depth
    Thursday, March 1, 12

    View Slide

  10. LATITUDE/LONGITUDE
    ❖ Earth != sphere
    ❖ Earth = oblate spheroid
    ❖ 6,378 km (equatorial) <= Radius => 6,357 km (polar)
    Thursday, March 1, 12

    View Slide

  11. LATITUDE/LONGITUDE
    ❖ 1” of latitude = 30.7 m
    ❖ 1” of longitude varies with latitude
    latitude town degree minute second
    60° Saint-Petersburg 55.65 km 0.93 km 15.42 m
    51.47° Greenwich 69.29 km 1.15 km 19.24 m
    45° Bordeaux 78.70 km 1.31 km 21.86 m
    30° New Orleans 96.39 km 1.61 km 26.77 m
    0° Quito 111.3 km 1.86 km 30.92 m
    (On the GRS80 or WGS84 spheroid at sea level)
    Thursday, March 1, 12

    View Slide

  12. LATITUDE/LONGITUDE
    ❖ Computer representations:
    ‣ “37.253,-122.0139”
    ‣ [37.253,-122.0139]
    ‣ {lat: 37.253, long:-122.0139}
    ❖ Historically, lat,long
    ❖ GeoJSON specifies long,lat because in geometry x-axis
    comes first
    Thursday, March 1, 12

    View Slide

  13. GOTCHAS
    ❖ International Date Line discontinuity
    ❖ Latitudes smoothly wrap at the poles: from < 90° over
    the pole and back to < 90°, then down to -90°, over the
    pole, and back to > -90°
    ❖ Longitude has a discontinuity: 180°E (+180°) turns into
    180°W (-180°W)
    ❖ This should be taken into account when making
    bounding box and other calculations
    Thursday, March 1, 12

    View Slide

  14. DISTANCE
    ❖ A bit more complicated than the normal Pythagorean
    planar distance formula
    ❖ All angle measurements are in radians, not degrees
    rad = deg

    180
    Thursday, March 1, 12

    View Slide

  15. HAVERSINE
    ❖ Calculates the great-circle (orthodrome) distance
    ❖ Well-conditioned even for small distances
    a
    =
    sin
    2

    lat
    2

    +
    cos
    (
    lat1)
    cos
    (
    lat2)
    sin
    2

    long
    2

    c
    = 2
    atan
    2(
    p
    a,
    p
    1
    a
    )
    d
    =
    R
    ·
    c
    Thursday, March 1, 12

    View Slide

  16. SPHERICAL LAW OF COSINES
    ❖ These days IEEE 754 64-bit floating-point numbers
    provide 15 significant figures of precision
    ❖ Enough to use a simple trigonometry law
    ❖ Gives well-conditioned results down to 1 meter
    d
    =
    acos
    (
    sin
    (
    lat1)
    sin
    (
    lat2) +
    cos
    (
    lat1)
    cos
    (
    lat2)
    cos
    (
    long
    )) ·
    R
    Thursday, March 1, 12

    View Slide

  17. VINCENTY FORMULA
    ❖ Does not assume spherical Earth
    ❖ Precision = 0.5 mm!
    ❖ Iterative calculation
    ❖ Explanation and JS implementation:
    ❖ http://www.movable-type.co.uk/scripts/latlong-
    vincenty.html
    Thursday, March 1, 12

    View Slide

  18. DISTANCE
    ❖ Lots of other goodies here:
    ❖ http://www.movable-type.co.uk/scripts/latlong.html
    Thursday, March 1, 12

    View Slide

  19. MIDPOINT
    ❖ Averaging is approximate at distances < 400 km
    ❖ Geographic midpoint method
    Thursday, March 1, 12

    View Slide

  20. 1.Convert lat/long to radians
    2.Convert lat/long to Cartesian coordinates
    MIDPOINT
    lat1 =
    lat1

    180
    lon1 =
    lon1

    180
    X1 =
    cos
    (
    lat1)
    cos
    (
    lon1)
    Y1 =
    cos
    (
    lat1)
    sin
    (
    lon1)
    Z1 =
    sin
    (
    lat1)
    Thursday, March 1, 12

    View Slide

  21. 3.Compute weighted average
    MIDPOINT
    x
    =
    1
    n
    n
    X
    1
    Xn = X1 +
    X2 +
    . . .
    +
    Xn
    n
    y
    =
    1
    n
    n
    X
    1
    Yn = Y1 +
    Y2 +
    . . .
    +
    Yn
    n
    z
    =
    1
    n
    n
    X
    1
    Zn = Z1 +
    Z2 +
    . . .
    +
    Zn
    n
    Thursday, March 1, 12

    View Slide

  22. 4.Convert Cartesian back into latitude/longitude
    5.Convert back to degrees
    MIDPOINT
    lon
    =
    atan
    2(
    y, x
    )
    hyp
    =
    p
    x
    2 +
    y
    2
    lat
    =
    atan
    2(
    z, hyp
    )
    lat
    =
    lat
    180

    lon
    =
    lon
    180

    Thursday, March 1, 12

    View Slide

  23. ❖ At small distances (and closer to equator), good enough
    ‣ i.e. 2 km east + 5 km north = 5.38 km from origin
    ‣ but at 100x distances the error increases
    ❖ Allows for simpler calculations
    ❖ But not correct, depends on projection
    EUCLIDIAN GEOMETRY
    Thursday, March 1, 12

    View Slide

  24. MANY OTHERS
    ❖ Universal Transverse Mercator (UTM)
    ❖ Military Grid Reference System (MGRS)
    ❖ World Geographic Reference System (GEOREF)
    Thursday, March 1, 12

    View Slide

  25. GEOREF SYSTEM
    Thursday, March 1, 12

    View Slide

  26. GEOHASH
    Thursday, March 1, 12

    View Slide

  27. GEOHASH
    ❖ Hierarchical spatial data structure which subdivides
    space into grid buckets
    ❖ Uses clever interleaved encoding scheme
    ❖ 9q8yyy is 37.78, -122.4
    ❖ 9q8yyyd3b11 is 37.78504 -122.39559
    Thursday, March 1, 12

    View Slide

  28. ADVANTAGES
    ❖ arbitrary precision, gradual degradation
    ❖ efficient encoding
    geohash  length lat  bits long  bits km  error
    1 2 3 ±2500
    2 5 5 ±630
    3 7 8 ±78
    4 10 10 ±20
    5 12 13 ±2.4
    6 15 15 ±0.61
    7 17 18 ±0.076
    8 20 20 ±0.019
    Thursday, March 1, 12

    View Slide

  29. ADVANTAGES
    ❖ easily shareable
    ❖ denotes an area of arbitrary size
    ❖ can be used for a simple version of clustering
    Thursday, March 1, 12

    View Slide

  30. LIMITATIONS
    ❖ hard to determine adjacency
    ❖ points close to each other can be in different cells
    ❖ not good for distance calculations
    ❖ projection-based model: given prefix length describes a
    much different region size near the equator than near
    the pole
    Thursday, March 1, 12

    View Slide

  31. TEXTUAL
    Thursday, March 1, 12

    View Slide

  32. TEXTUAL
    ❖ Point of interest
    ❖ Address
    ❖ Zip code
    ❖ Neighborhood
    ❖ City/country name
    Thursday, March 1, 12

    View Slide

  33. GEOLOCATION METHODS
    Thursday, March 1, 12

    View Slide

  34. OLD TECH
    ❖ Traditional maps
    ❖ Points of interest
    ❖ Celestial navigation, aka astronavigation
    Thursday, March 1, 12

    View Slide

  35. NEW TECH
    ❖ IP address
    ❖ GPS/GLONASS satellites (< 10 m)
    ❖ Cell tower ID (200 m - 32 km accuracy)
    ❖ WiFi base stations (< 100 m)
    ❖ Ping times to well known servers
    ‣ http://www.slac.stanford.edu/comp/net/wan-mon/tulip/
    Thursday, March 1, 12

    View Slide

  36. IMPLEMENTATIONS
    ❖ GPS devices
    ‣ accurate, but slower to acquire signal
    ‣ especially in cities, with poor signal conditions
    ❖ Mobile (iPhone/Android/other)
    ‣ mostly uses A-GPS
    ‣ use network to ask an assistance server for GPS
    satellite data
    Thursday, March 1, 12

    View Slide

  37. IMPLEMENTATIONS
    ❖ Browser-based
    ‣ API uses Location Information Servers
    ‣ Common sources of location information: IP address,
    Wi-Fi/Bluetooth MAC address, GPS, GSM/CDMA ID, etc
    ‣ Well-supported by modern browsers
    Thursday, March 1, 12

    View Slide

  38. BROWSER GEOLOCATION
    if  (navigator.geolocation)  {
      navigator.geolocation.getCurrentPosition(  
        function  (position)  {  /*  do  something  */  },
        function  (error)  //  error  callback
        {
          switch(error.code)  
          {
            case  error.TIMEOUT:
              break;
            case  error.POSITION_UNAVAILABLE:
              break;
            case  error.PERMISSION_DENIED:
              break;
            case  error.UNKNOWN_ERROR:
              break;
          }
        });
      }
    }
    else  //  no  geolocation
    Thursday, March 1, 12

    View Slide

  39. GEOIP
    Thursday, March 1, 12

    View Slide

  40. Thursday, March 1, 12

    View Slide

  41. THIS
    IS
    TERRIBLE
    Thursday, March 1, 12

    View Slide

  42. MAXMIND DB
    ❖ Most comprehensive IP positioning DB
    ‣ GeoLite City: free, decent coverage, less accurate,
    monthly updates
    ‣ GeoIP City: $370 initial, $90/month, weekly updates,
    more coverage, better accuracy
    Thursday, March 1, 12

    View Slide

  43. COVERAGE
    ❖ GeoLite City — over 99.5% on a country level and 79%
    on a city level for the US within a 25 mile radius
    country correct incorrect
    United  States 78% 17%
    Canada 81% 18%
    Kazakhstan 84% 18%
    Thursday, March 1, 12

    View Slide

  44. MAXMIND DB
    ❖ GeoIP C library
    ❖ Used by PHP, Python, etc
    ‣ http://pecl.php.net/package/geoip (older)
    ‣ https://github.com/Zakay/geoip (newer)
    ‣ http://www.maxmind.com/app/python (older)
    ‣ https://github.com/appliedsec/pygeoip (newer)
    Thursday, March 1, 12

    View Slide

  45. GEOIP IN PHP
    print_r(geoip_record_by_name('67.160.202.223'));
    Array
    (
    [continent_code] => NA
    [country_code] => US
    [country_code3] => USA
    [country_name] => United States
    [region] => CA
    [city] => San Francisco
    [postal_code] =>
    [latitude] => 37.764499664307
    [longitude] => -122.42939758301
    [dma_code] => 807
    [area_code] => 415
    )
    Thursday, March 1, 12

    View Slide

  46. GEOIP IN PHP
    print_r(geoip_record_by_name('84.92.229.1'));
    Array
    (
    [continent_code] => EU
    [country_code] => GB
    [country_code3] => GBR
    [country_name] => United Kingdom
    [region] => H9
    [city] => London
    [postal_code] =>
    [latitude] => 51.500198364258
    [longitude] => -0.1262000054121
    [dma_code] => 0
    [area_code] => 0
    )
    Thursday, March 1, 12

    View Slide

  47. GEOIP IN PHP
    print_r(geoip_record_by_name('124.28.8.8'));
    Array
    (
    [continent_code] => AS
    [country_code] => KR
    [country_code3] => KOR
    [country_name] => Korea, Republic of
    [region] => 13
    [city] => Bucheon
    [postal_code] =>
    [latitude] => 37.498901367188
    [longitude] => 126.78309631348
    [dma_code] => 0
    [area_code] => 0
    )
    Thursday, March 1, 12

    View Slide

  48. GEOIP IN PHP
    print_r(geoip_record_by_name('28.8.8.8'));
    Array
    (
    [continent_code] => NA
    [country_code] => US
    [country_code3] => USA
    [country_name] => United States
    [region] =>
    [city] =>
    [postal_code] =>
    [latitude] => 38
    [longitude] => -97
    [dma_code] => 0
    [area_code] => 0
    )
    Thursday, March 1, 12

    View Slide

  49. MIDDLE EARTH!
    Thursday, March 1, 12

    View Slide

  50. OTHER SERVICES
    ❖ http://ipinfodb.com/ (uses IP2Location lite DB)
    ❖ http://www.hostip.info/use.html (crowd-sourced)
    Thursday, March 1, 12

    View Slide

  51. COMPARISON
    ❖ http://ipinfodb.com/ip_locator.php?ip=84.92.229.1
    ‣ Country : UK
    ‣ State/Province : ENGLAND
    ‣ City : SHEFFIELD
    ‣ Zip or postal code : -
    ‣ Latitude : 53.383055
    ‣ Longitude : -1.464795
    Thursday, March 1, 12

    View Slide

  52. COMPARISON
    ❖ http://api.hostip.info/get_html.php?ip=84.92.229.1
    ‣ Country: UNITED KINGDOM (GB)
    ‣ City: (Unknown city)

    Thursday, March 1, 12

    View Slide

  53. ALSO
    ❖ One PHP library to rule them all
    ❖ http://geocoder-php.org/
    ❖ Unified interface for a variety of IP-Based geocoding
    providers
    ❖ And more
    Thursday, March 1, 12

    View Slide

  54. LESSONS
    ❖ Expect failures and word/work around them
    ❖ Don’t assume REMOTE_ADDR is correct
    ‣ Might be a proxy
    ‣ Check X-Forwarded-For header
    Thursday, March 1, 12

    View Slide

  55. GEOCODING AND MORE
    Thursday, March 1, 12

    View Slide

  56. CONVERSIONS
    ❖ Geocoding: address ➔ latitude & longitude
    ❖ Reverse geocoding: latitude & longitude ➔ address
    ❖ Not always reversible, e.g. “Farallon Islands”
    ❖ Disparity of results:
    ‣ http://maps.googleapis.com/maps/api/geocode/json?
    latlng=37.759947,-122.46866&sensor=true
    ‣ http://maps.googleapis.com/maps/api/geocode/json?
    latlng=45.499148,-73.566488&sensor=true
    Thursday, March 1, 12

    View Slide

  57. SERVICES/LIBRARIES
    ❖ JS: Google/Bing/MapQuest APIs
    ❖ PHP: http://geocoder-php.org/
    ❖ Python: http://code.google.com/p/geopy/
    ❖ Ruby: http://highearthorbit.com/geocommons-open-
    sourced-geocoder/
    ❖ Check ToS before launching publicly
    Thursday, March 1, 12

    View Slide

  58. LOCATION EXTRACTION
    ❖ Location is not always given in a structured format
    ❖ Blog posts, news clippings, status updates may contain
    embedded mentions of locations
    ❖ Goal is to extract these with as much precision as
    possible
    Thursday, March 1, 12

    View Slide

  59. LOCATION EXTRACTION
    ❖ Yahoo! PlaceMaker does this
    ❖ Disambiguates found locations and returns unique IDs
    (WOEIDs)
    ‣ "New York", "New York City", "NYC", and "the Big
    Apple" are all variant names for WOEID 2459115
    ❖ These can be used with Yahoo GeoPlanet
    Thursday, March 1, 12

    View Slide

  60. STORING AND SEARCHING
    Thursday, March 1, 12

    View Slide

  61. MONGO DB
    ❖ 2D geospatial indexes
    ❖ Location is an object or array at least 2 elements,
    which are interpreted as coordinates
    ❖ Implementation via geohashes on top of B-trees
    ❖ Supports multiple locations per document
    ❖ Uses GeoJSON spec - [long,lat]
    Thursday, March 1, 12

    View Slide

  62. MONGO DB
    ❖ Creating
    ❖ Querying
    db.places.ensureIndex({loc:  "2d"})
    db.places.save({loc:  [-­‐91.23,  28.25]})
    db.places.find({loc:  {$near:  [-­‐91,28],  $maxDistance:  5}})
    db.runCommand({geoNear:  "places",  near:  [-­‐91,28],  num:10})
    Thursday, March 1, 12

    View Slide

  63. MONGO DB
    ❖ Supports Euclidian (default) and spherical models
    var  earthRadius  =  6378;  /*  km  */
    var  range  =  30  /  earthRadius;  /*  to  radians  */
    db.places.find({loc:  {$nearSphere:  [20,50],  $maxDistance:  range}})
    distances  =  db.runCommand({geoNear:  "places",  near:  [20,  50],  
           spherical:  true,  maxDistance:  range}).results;
    pointDistance  =  distances[0].dis  *  earthRadius  //  back  to  km
    Thursday, March 1, 12

    View Slide

  64. MONGO DB
    ❖ Bounded queries: circle, bounding box, polygon (>= 1.9)
    ❖ Note that bounding box is specified via bottom left/top
    right corners
    box  =  [[-­‐73.99756,  40.73083],  [-­‐73.988135,  40.741404]]
    db.places.find({loc:  {"$within":  {"$box"  :  box}}})
    Thursday, March 1, 12

    View Slide

  65. MONGO DB
    ❖ Limitations
    ‣ only 1 geospatial index per collection
    ‣ some query limitations with multi-location docs
    ‣ somewhat awkward and inconsistent query syntax
    ‣ sharding on geo keys doesn’t quite work yet
    Thursday, March 1, 12

    View Slide

  66. MYSQL
    ❖ Based on the OpenGIS model
    ❖ Supports points, lines, polygons
    ❖ Implemented using R-trees
    Thursday, March 1, 12

    View Slide

  67. MYSQL
    ❖ Creating
    ❖ Inserting data
    CREATE  TABLE  geom  (g  GEOMETRY  NOT  NULL,  SPATIAL  INDEX(g))  
    ENGINE=MyISAM;
    INSERT  INTO  geom  VALUES  (GeomFromText('POINT(1  1)'))
    or
    //  not  OpenGIS  standard
    INSERT  INTO  geom  VALUES(Point(1,1));
    Thursday, March 1, 12

    View Slide

  68. MYSQL
    ❖ Querying
    ❖ Only MyISAM tables support SPATIAL indices
    ❖ Sphinx search engine supports geo-distance search as
    well
    SELECT  MBRContains(
      GeomFromText(  'POLYGON((0  0,0  3,3  3,3  0,0  0))'  ),
      coord
    )  from  geom;
    Thursday, March 1, 12

    View Slide

  69. ELASTICSEARCH
    ❖ Mapping type geo_point
    ❖ Implemented via geohashes
    ❖ Takes string, array, or object, or geohash
    representations on insert
    ❖ Uses GeoJSON ordering
    Thursday, March 1, 12

    View Slide

  70. ELASTICSEARCH
    ❖ Mapping specification
    {
           "mydoc"  :  {
                   "properties"  :  {
                           "location"  :  {
                                   "type"  :  "geo_point"
                           }
                   }
           }
    }
    Thursday, March 1, 12

    View Slide

  71. ELASTICSEARCH
    ❖ Inserting
    curl  -­‐XPUT  localhost:9200/myindex/mydoc/1  -­‐d'{
         "location"  :  "-­‐83,47"
    }'
    curl  -­‐XPUT  localhost:9200/myindex/mydoc/2  -­‐d'{
         "location"  :  {"lat":  47,  "lon":  -­‐83}
    }'
    Thursday, March 1, 12

    View Slide

  72. ELASTICSEARCH
    ❖ Can query by distance, bounding box, or polygon
    ❖ Does not support rich OpenGIS models, lines
    ❖ Bounding box is specified via top left/bottom right
    corners
    Thursday, March 1, 12

    View Slide

  73. ELASTICSEARCH
    {
           "filtered"  :  {
                   "query"  :  {
                           "match_all"  :  {}
                   },
                   "filter"  :  {
                           "geo_distance"  :  {
                                   "distance"  :  "200km",
                                   "pin.location"  :  {
                                           "lat"  :  40,
                                           "lon"  :  -­‐70
    }}}}}
    Thursday, March 1, 12

    View Slide

  74. ELASTICSEARCH
    ❖ geo + text = power
    ‣ Find all businesses with ‘pizza delivery’ in the description
    within 5 miles from my location and segment the results
    1, 2, and 5 miles
    Thursday, March 1, 12

    View Slide

  75. POSTGRES
    ❖ PostGIS = spatial extensions to Postgres
    ❖ Follows OpenGIS standard
    ❖ Backend spatial database for geographic information
    systems (GIS)
    Thursday, March 1, 12

    View Slide

  76. MORE…
    Thursday, March 1, 12

    View Slide

  77. DATASETS
    ❖ http://freegisdata.rtwilson.com/
    ❖ Neighborhoods
    ‣ http://www.zillow.com/howto/api/neighborhood-boundaries.htm
    ❖ Streets
    ‣ http://www.census.gov/geo/www/tiger/
    ‣ OpenStreetMap
    ❖ Other
    ‣ Flickr shapefiles
    ‣ Yahoo GeoPlanet data
    Thursday, March 1, 12

    View Slide

  78. APIS/SERVICES
    ❖ SimpleGeo -> UrbanAirship
    ❖ Web Maps Studio
    ❖ GeoAPI
    ❖ POIs: Google Places, Foursquare, Factual
    Thursday, March 1, 12

    View Slide

  79. APIS/SERVICES
    ❖ http://www.earthtools.org/webservices.htm
    ❖ http://www.geonames.org/export/web-services.html
    ❖ http://www.infochimps.com/products/geo
    Thursday, March 1, 12

    View Slide

  80. MAPPING
    ❖ Mapbox, CartoDB, TileMill, OSM, Mapnik, Leaflet,
    Polymaps
    ❖ Route planner:
    ‣ http://graphserver.github.com/graphserver/
    Thursday, March 1, 12

    View Slide

  81. MERCI!
    http://joind.in/6031
    Thursday, March 1, 12

    View Slide