Andrei Zmievski
March 01, 2012
1.1k

# A Knapsack of Geotools: More Than Just Google Maps

Where? As location becomes increasingly important, and as more and more data is geotagged, this may be the most important question your app needs to answer. How do you determine what city and country your users are coming from? Figure out which neighborhood a place is in? Keep a location history for a physical object? Group people together based on proximity? One of these days you'll need to reach into your knapsack of geo-tools to solve problems like these and this talk aims to make you ready. We'll cover using location-aware storage like MongoDB and ElasticSearch, GeoIP, reverse geocoding, third-party location web services, geo-hashing, and more.

March 01, 2012

## Transcript

1. A KNAPSACK OF GEOTOOLS
Andrei Zmievski • ConFoo • Mar 1, 2012
Thursday, March 1, 12

2. MOI
❖ Andrei Z.
❖ Architect at AppDynamics
❖ PHP core dev, Smarty, PHP-GTK, Unicode project
❖ Coding, beer, brewing, photography, travel
❖ @a
Thursday, March 1, 12

3. INTRO
❖ Location is more important than ever
❖ So is basic understanding of geo-related principles
❖ Survey of tools, services, and technologies
❖ Not an exhaustive reference
Thursday, March 1, 12

4. REPRESENTING LOCATION
Thursday, March 1, 12

5. SPHERICAL COORDINATES
❖ Latitude (ϕ, phi, lat.)
❖ Longitude (λ, lambda, long.)
Thursday, March 1, 12

6. LATITUDE/LONGITUDE
“webbing” = conjugate graticule
Thursday, March 1, 12

7. LATITUDE/LONGITUDE
❖ Lines of equal latitude are called parallels
‣ 0º parallel = equator
‣ 23º26’N parallel = Tropic of Cancer
‣ 23º26’S parallel = Tropic of Capricorn
❖ Lines of equal latitude are called meridians
‣ 0° meridian = Prime Meridian
‣ Antipodes of Prime Meridian = 180°W and 180°E
Thursday, March 1, 12

8. LATITUDE/LONGITUDE
❖ Measured in DMS or decimal degrees
❖ Latitude: 37° 45' 35” = 37.76°
❖ Longitude: -122° 28' 11 = -122.47°
❖ Positive latitude is N(orth), negative is S(outh)
❖ Positive longitudes are E(ast) of 0°, negative are W(est)
Thursday, March 1, 12

9. LATITUDE/LONGITUDE
❖ Latitude + longitude is enough to specify any location
on the planet
❖ Does not consider height/depth
Thursday, March 1, 12

10. LATITUDE/LONGITUDE
❖ Earth != sphere
❖ Earth = oblate spheroid
❖ 6,378 km (equatorial) <= Radius => 6,357 km (polar)
Thursday, March 1, 12

11. LATITUDE/LONGITUDE
❖ 1” of latitude = 30.7 m
❖ 1” of longitude varies with latitude
latitude town degree minute second
60° Saint-Petersburg 55.65 km 0.93 km 15.42 m
51.47° Greenwich 69.29 km 1.15 km 19.24 m
45° Bordeaux 78.70 km 1.31 km 21.86 m
30° New Orleans 96.39 km 1.61 km 26.77 m
0° Quito 111.3 km 1.86 km 30.92 m
(On the GRS80 or WGS84 spheroid at sea level)
Thursday, March 1, 12

12. LATITUDE/LONGITUDE
❖ Computer representations:
‣ “37.253,-122.0139”
‣ [37.253,-122.0139]
‣ {lat: 37.253, long:-122.0139}
❖ Historically, lat,long
❖ GeoJSON speciﬁes long,lat because in geometry x-axis
comes ﬁrst
Thursday, March 1, 12

13. GOTCHAS
❖ International Date Line discontinuity
❖ Latitudes smoothly wrap at the poles: from < 90° over
the pole and back to < 90°, then down to -90°, over the
pole, and back to > -90°
❖ Longitude has a discontinuity: 180°E (+180°) turns into
180°W (-180°W)
❖ This should be taken into account when making
bounding box and other calculations
Thursday, March 1, 12

14. DISTANCE
❖ A bit more complicated than the normal Pythagorean
planar distance formula
❖ All angle measurements are in radians, not degrees

180
Thursday, March 1, 12

15. HAVERSINE
❖ Calculates the great-circle (orthodrome) distance
❖ Well-conditioned even for small distances
a
=
sin
2

lat
2

+
cos
(
lat1)
cos
(
lat2)
sin
2

long
2

c
= 2
atan
2(
p
a,
p
1
a
)
d
=
R
·
c
Thursday, March 1, 12

16. SPHERICAL LAW OF COSINES
❖ These days IEEE 754 64-bit ﬂoating-point numbers
provide 15 signiﬁcant ﬁgures of precision
❖ Enough to use a simple trigonometry law
❖ Gives well-conditioned results down to 1 meter
d
=
acos
(
sin
(
lat1)
sin
(
lat2) +
cos
(
lat1)
cos
(
lat2)
cos
(
long
)) ·
R
Thursday, March 1, 12

17. VINCENTY FORMULA
❖ Does not assume spherical Earth
❖ Precision = 0.5 mm!
❖ Iterative calculation
❖ Explanation and JS implementation:
❖ http://www.movable-type.co.uk/scripts/latlong-
vincenty.html
Thursday, March 1, 12

18. DISTANCE
❖ Lots of other goodies here:
❖ http://www.movable-type.co.uk/scripts/latlong.html
Thursday, March 1, 12

19. MIDPOINT
❖ Averaging is approximate at distances < 400 km
❖ Geographic midpoint method
Thursday, March 1, 12

2.Convert lat/long to Cartesian coordinates
MIDPOINT
lat1 =
lat1

180
lon1 =
lon1

180
X1 =
cos
(
lat1)
cos
(
lon1)
Y1 =
cos
(
lat1)
sin
(
lon1)
Z1 =
sin
(
lat1)
Thursday, March 1, 12

21. 3.Compute weighted average
MIDPOINT
x
=
1
n
n
X
1
Xn = X1 +
X2 +
. . .
+
Xn
n
y
=
1
n
n
X
1
Yn = Y1 +
Y2 +
. . .
+
Yn
n
z
=
1
n
n
X
1
Zn = Z1 +
Z2 +
. . .
+
Zn
n
Thursday, March 1, 12

22. 4.Convert Cartesian back into latitude/longitude
5.Convert back to degrees
MIDPOINT
lon
=
atan
2(
y, x
)
hyp
=
p
x
2 +
y
2
lat
=
atan
2(
z, hyp
)
lat
=
lat
180

lon
=
lon
180

Thursday, March 1, 12

23. ❖ At small distances (and closer to equator), good enough
‣ i.e. 2 km east + 5 km north = 5.38 km from origin
‣ but at 100x distances the error increases
❖ Allows for simpler calculations
❖ But not correct, depends on projection
EUCLIDIAN GEOMETRY
Thursday, March 1, 12

24. MANY OTHERS
❖ Universal Transverse Mercator (UTM)
❖ Military Grid Reference System (MGRS)
❖ World Geographic Reference System (GEOREF)
Thursday, March 1, 12

25. GEOREF SYSTEM
Thursday, March 1, 12

26. GEOHASH
Thursday, March 1, 12

27. GEOHASH
❖ Hierarchical spatial data structure which subdivides
space into grid buckets
❖ Uses clever interleaved encoding scheme
❖ 9q8yyy is 37.78, -122.4
❖ 9q8yyyd3b11 is 37.78504 -122.39559
Thursday, March 1, 12

❖ eﬃcient encoding
geohash  length lat  bits long  bits km  error
1 2 3 ±2500
2 5 5 ±630
3 7 8 ±78
4 10 10 ±20
5 12 13 ±2.4
6 15 15 ±0.61
7 17 18 ±0.076
8 20 20 ±0.019
Thursday, March 1, 12

❖ easily shareable
❖ denotes an area of arbitrary size
❖ can be used for a simple version of clustering
Thursday, March 1, 12

30. LIMITATIONS
❖ points close to each other can be in diﬀerent cells
❖ not good for distance calculations
❖ projection-based model: given preﬁx length describes a
much diﬀerent region size near the equator than near
the pole
Thursday, March 1, 12

31. TEXTUAL
Thursday, March 1, 12

32. TEXTUAL
❖ Point of interest
❖ Zip code
❖ Neighborhood
❖ City/country name
Thursday, March 1, 12

33. GEOLOCATION METHODS
Thursday, March 1, 12

34. OLD TECH
❖ Points of interest
Thursday, March 1, 12

35. NEW TECH
❖ GPS/GLONASS satellites (< 10 m)
❖ Cell tower ID (200 m - 32 km accuracy)
❖ WiFi base stations (< 100 m)
❖ Ping times to well known servers
‣ http://www.slac.stanford.edu/comp/net/wan-mon/tulip/
Thursday, March 1, 12

36. IMPLEMENTATIONS
❖ GPS devices
‣ accurate, but slower to acquire signal
‣ especially in cities, with poor signal conditions
❖ Mobile (iPhone/Android/other)
‣ mostly uses A-GPS
‣ use network to ask an assistance server for GPS
satellite data
Thursday, March 1, 12

37. IMPLEMENTATIONS
❖ Browser-based
‣ API uses Location Information Servers
‣ Common sources of location information: IP address,
Wi-Fi/Bluetooth MAC address, GPS, GSM/CDMA ID, etc
‣ Well-supported by modern browsers
Thursday, March 1, 12

38. BROWSER GEOLOCATION
if  (navigator.geolocation)  {
navigator.geolocation.getCurrentPosition(
function  (position)  {  /*  do  something  */  },
function  (error)  //  error  callback
{
switch(error.code)
{
case  error.TIMEOUT:
break;
case  error.POSITION_UNAVAILABLE:
break;
case  error.PERMISSION_DENIED:
break;
case  error.UNKNOWN_ERROR:
break;
}
});
}
}
else  //  no  geolocation
Thursday, March 1, 12

39. GEOIP
Thursday, March 1, 12

40. Thursday, March 1, 12

41. THIS
IS
TERRIBLE
Thursday, March 1, 12

42. MAXMIND DB
❖ Most comprehensive IP positioning DB
‣ GeoLite City: free, decent coverage, less accurate,
‣ GeoIP City: \$370 initial, \$90/month, weekly updates,
more coverage, better accuracy
Thursday, March 1, 12

43. COVERAGE
❖ GeoLite City — over 99.5% on a country level and 79%
on a city level for the US within a 25 mile radius
country correct incorrect
United  States 78% 17%
Kazakhstan 84% 18%
Thursday, March 1, 12

44. MAXMIND DB
❖ GeoIP C library
❖ Used by PHP, Python, etc
‣ http://pecl.php.net/package/geoip (older)
‣ http://www.maxmind.com/app/python (older)
Thursday, March 1, 12

45. GEOIP IN PHP
print_r(geoip_record_by_name('67.160.202.223'));
Array
(
[continent_code] => NA
[country_code] => US
[country_code3] => USA
[country_name] => United States
[region] => CA
[city] => San Francisco
[postal_code] =>
[latitude] => 37.764499664307
[longitude] => -122.42939758301
[dma_code] => 807
[area_code] => 415
)
Thursday, March 1, 12

46. GEOIP IN PHP
print_r(geoip_record_by_name('84.92.229.1'));
Array
(
[continent_code] => EU
[country_code] => GB
[country_code3] => GBR
[country_name] => United Kingdom
[region] => H9
[city] => London
[postal_code] =>
[latitude] => 51.500198364258
[longitude] => -0.1262000054121
[dma_code] => 0
[area_code] => 0
)
Thursday, March 1, 12

47. GEOIP IN PHP
print_r(geoip_record_by_name('124.28.8.8'));
Array
(
[continent_code] => AS
[country_code] => KR
[country_code3] => KOR
[country_name] => Korea, Republic of
[region] => 13
[city] => Bucheon
[postal_code] =>
[latitude] => 37.498901367188
[longitude] => 126.78309631348
[dma_code] => 0
[area_code] => 0
)
Thursday, March 1, 12

48. GEOIP IN PHP
print_r(geoip_record_by_name('28.8.8.8'));
Array
(
[continent_code] => NA
[country_code] => US
[country_code3] => USA
[country_name] => United States
[region] =>
[city] =>
[postal_code] =>
[latitude] => 38
[longitude] => -97
[dma_code] => 0
[area_code] => 0
)
Thursday, March 1, 12

49. MIDDLE EARTH!
Thursday, March 1, 12

50. OTHER SERVICES
❖ http://ipinfodb.com/ (uses IP2Location lite DB)
❖ http://www.hostip.info/use.html (crowd-sourced)
Thursday, March 1, 12

51. COMPARISON
❖ http://ipinfodb.com/ip_locator.php?ip=84.92.229.1
‣ Country : UK
‣ State/Province : ENGLAND
‣ City : SHEFFIELD
‣ Zip or postal code : -
‣ Latitude : 53.383055
‣ Longitude : -1.464795
Thursday, March 1, 12

52. COMPARISON
❖ http://api.hostip.info/get_html.php?ip=84.92.229.1
‣ Country: UNITED KINGDOM (GB)
‣ City: (Unknown city)

Thursday, March 1, 12

53. ALSO
❖ One PHP library to rule them all
❖ http://geocoder-php.org/
❖ Uniﬁed interface for a variety of IP-Based geocoding
providers
❖ And more
Thursday, March 1, 12

54. LESSONS
❖ Expect failures and word/work around them
❖ Don’t assume REMOTE_ADDR is correct
‣ Might be a proxy
Thursday, March 1, 12

55. GEOCODING AND MORE
Thursday, March 1, 12

56. CONVERSIONS
❖ Geocoding: address ➔ latitude & longitude
❖ Reverse geocoding: latitude & longitude ➔ address
❖ Not always reversible, e.g. “Farallon Islands”
❖ Disparity of results:
latlng=37.759947,-122.46866&sensor=true
latlng=45.499148,-73.566488&sensor=true
Thursday, March 1, 12

57. SERVICES/LIBRARIES
❖ PHP: http://geocoder-php.org/
❖ Ruby: http://highearthorbit.com/geocommons-open-
sourced-geocoder/
❖ Check ToS before launching publicly
Thursday, March 1, 12

58. LOCATION EXTRACTION
❖ Location is not always given in a structured format
❖ Blog posts, news clippings, status updates may contain
embedded mentions of locations
❖ Goal is to extract these with as much precision as
possible
Thursday, March 1, 12

59. LOCATION EXTRACTION
❖ Yahoo! PlaceMaker does this
❖ Disambiguates found locations and returns unique IDs
(WOEIDs)
‣ "New York", "New York City", "NYC", and "the Big
Apple" are all variant names for WOEID 2459115
❖ These can be used with Yahoo GeoPlanet
Thursday, March 1, 12

60. STORING AND SEARCHING
Thursday, March 1, 12

61. MONGO DB
❖ 2D geospatial indexes
❖ Location is an object or array at least 2 elements,
which are interpreted as coordinates
❖ Implementation via geohashes on top of B-trees
❖ Supports multiple locations per document
❖ Uses GeoJSON spec - [long,lat]
Thursday, March 1, 12

62. MONGO DB
❖ Creating
❖ Querying
db.places.ensureIndex({loc:  "2d"})
db.places.save({loc:  [-­‐91.23,  28.25]})
db.places.find({loc:  {\$near:  [-­‐91,28],  \$maxDistance:  5}})
db.runCommand({geoNear:  "places",  near:  [-­‐91,28],  num:10})
Thursday, March 1, 12

63. MONGO DB
❖ Supports Euclidian (default) and spherical models
var  earthRadius  =  6378;  /*  km  */
db.places.find({loc:  {\$nearSphere:  [20,50],  \$maxDistance:  range}})
distances  =  db.runCommand({geoNear:  "places",  near:  [20,  50],
spherical:  true,  maxDistance:  range}).results;
pointDistance  =  distances[0].dis  *  earthRadius  //  back  to  km
Thursday, March 1, 12

64. MONGO DB
❖ Bounded queries: circle, bounding box, polygon (>= 1.9)
❖ Note that bounding box is speciﬁed via bottom left/top
right corners
box  =  [[-­‐73.99756,  40.73083],  [-­‐73.988135,  40.741404]]
db.places.find({loc:  {"\$within":  {"\$box"  :  box}}})
Thursday, March 1, 12

65. MONGO DB
❖ Limitations
‣ only 1 geospatial index per collection
‣ some query limitations with multi-location docs
‣ somewhat awkward and inconsistent query syntax
‣ sharding on geo keys doesn’t quite work yet
Thursday, March 1, 12

66. MYSQL
❖ Based on the OpenGIS model
❖ Supports points, lines, polygons
❖ Implemented using R-trees
Thursday, March 1, 12

67. MYSQL
❖ Creating
❖ Inserting data
CREATE  TABLE  geom  (g  GEOMETRY  NOT  NULL,  SPATIAL  INDEX(g))
ENGINE=MyISAM;
INSERT  INTO  geom  VALUES  (GeomFromText('POINT(1  1)'))
or
//  not  OpenGIS  standard
INSERT  INTO  geom  VALUES(Point(1,1));
Thursday, March 1, 12

68. MYSQL
❖ Querying
❖ Only MyISAM tables support SPATIAL indices
❖ Sphinx search engine supports geo-distance search as
well
SELECT  MBRContains(
GeomFromText(  'POLYGON((0  0,0  3,3  3,3  0,0  0))'  ),
coord
)  from  geom;
Thursday, March 1, 12

69. ELASTICSEARCH
❖ Mapping type geo_point
❖ Implemented via geohashes
❖ Takes string, array, or object, or geohash
representations on insert
❖ Uses GeoJSON ordering
Thursday, March 1, 12

70. ELASTICSEARCH
❖ Mapping speciﬁcation
{
"mydoc"  :  {
"properties"  :  {
"location"  :  {
"type"  :  "geo_point"
}
}
}
}
Thursday, March 1, 12

71. ELASTICSEARCH
❖ Inserting
curl  -­‐XPUT  localhost:9200/myindex/mydoc/1  -­‐d'{
"location"  :  "-­‐83,47"
}'
curl  -­‐XPUT  localhost:9200/myindex/mydoc/2  -­‐d'{
"location"  :  {"lat":  47,  "lon":  -­‐83}
}'
Thursday, March 1, 12

72. ELASTICSEARCH
❖ Can query by distance, bounding box, or polygon
❖ Does not support rich OpenGIS models, lines
❖ Bounding box is speciﬁed via top left/bottom right
corners
Thursday, March 1, 12

73. ELASTICSEARCH
{
"filtered"  :  {
"query"  :  {
"match_all"  :  {}
},
"filter"  :  {
"geo_distance"  :  {
"distance"  :  "200km",
"pin.location"  :  {
"lat"  :  40,
"lon"  :  -­‐70
}}}}}
Thursday, March 1, 12

74. ELASTICSEARCH
❖ geo + text = power
‣ Find all businesses with ‘pizza delivery’ in the description
within 5 miles from my location and segment the results
1, 2, and 5 miles
Thursday, March 1, 12

75. POSTGRES
❖ PostGIS = spatial extensions to Postgres
❖ Follows OpenGIS standard
❖ Backend spatial database for geographic information
systems (GIS)
Thursday, March 1, 12

76. MORE…
Thursday, March 1, 12

77. DATASETS
❖ http://freegisdata.rtwilson.com/
❖ Neighborhoods
‣ http://www.zillow.com/howto/api/neighborhood-boundaries.htm
❖ Streets
‣ http://www.census.gov/geo/www/tiger/
‣ OpenStreetMap
❖ Other
‣ Flickr shapeﬁles
‣ Yahoo GeoPlanet data
Thursday, March 1, 12

78. APIS/SERVICES
❖ SimpleGeo -> UrbanAirship
❖ Web Maps Studio
❖ GeoAPI
❖ POIs: Google Places, Foursquare, Factual
Thursday, March 1, 12

79. APIS/SERVICES
❖ http://www.earthtools.org/webservices.htm
❖ http://www.geonames.org/export/web-services.html
❖ http://www.infochimps.com/products/geo
Thursday, March 1, 12

80. MAPPING
❖ Mapbox, CartoDB, TileMill, OSM, Mapnik, Leaﬂet,
Polymaps
❖ Route planner:
‣ http://graphserver.github.com/graphserver/
Thursday, March 1, 12

81. MERCI!
http://joind.in/6031
Thursday, March 1, 12