A KNAPSACK OF GEOTOOLS
more
than
just
Google
Maps
Andrei
Zmievski
•
DPC
•
June
9,
2012
Slide 2
Slide 2 text
ME
❖ Andrei Z.
❖ Architect at AppDynamics
❖ PHP core dev, Smarty, PHP-GTK, Unicode project
❖ Coding, beer, brewing, photography, travel
❖ @a
Slide 3
Slide 3 text
GEOGRAPHY
Slide 4
Slide 4 text
INTRO
❖ Location is more important than ever
❖ So is basic understanding of geo-related principles
❖ Survey of tools, services, and technologies
❖ Not an exhaustive reference
LATITUDE/LONGITUDE
❖ Lines of equal latitude are called parallels
‣ 0º parallel = equator
‣ 23º26’N parallel = Tropic of Cancer
‣ 23º26’S parallel = Tropic of Capricorn
❖ Lines of equal latitude are called meridians
‣ 0° meridian = Prime Meridian
‣ Antipodes of Prime Meridian = 180°W and 180°E
Slide 9
Slide 9 text
LATITUDE/LONGITUDE
❖ Measured in DMS or decimal degrees
❖ Latitude: 37° 45' 35” = 37.76°
❖ Longitude: -122° 28' 11 = -122.47°
❖ Positive latitude is N(orth), negative is S(outh)
❖ Positive longitudes are E(ast) of 0°, negative are W(est)
Slide 10
Slide 10 text
LATITUDE/LONGITUDE
❖ Latitude + longitude is enough to specify any location
on the planet
❖ Does not consider height/depth
Slide 11
Slide 11 text
LATITUDE/LONGITUDE
❖ Earth != sphere
❖ Earth = oblate spheroid
❖ 6,378 km (equatorial) <= Radius => 6,357 km (polar)
Slide 12
Slide 12 text
LATITUDE/LONGITUDE
❖ 1” of latitude = 30.7 m
❖ 1” of longitude varies with latitude
latitude town degree minute second
60° Saint-Petersburg 55.65 km 0.93 km 15.42 m
51.47° Greenwich 69.29 km 1.15 km 19.24 m
45° Bordeaux 78.70 km 1.31 km 21.86 m
30° New Orleans 96.39 km 1.61 km 26.77 m
0° Quito 111.3 km 1.86 km 30.92 m
(On the GRS80 or WGS84 spheroid at sea level)
Slide 13
Slide 13 text
LATITUDE/LONGITUDE
❖ Computer representations:
‣ “37.253,-122.0139”
‣ [37.253,-122.0139]
‣ {lat: 37.253, long:-122.0139}
❖ Historically, lat,long
❖ GeoJSON specifies long,lat because in geometry x-axis
comes first
Slide 14
Slide 14 text
GOTCHAS
❖ International Date Line discontinuity
❖ Latitudes smoothly wrap at the poles: from < 90° over
the pole and back to < 90°, then down to -90°, over the
pole, and back to > -90°
❖ Longitude has a discontinuity: 180°E (+180°) turns into
180°W (-180°W)
❖ This should be taken into account when making
bounding box and other calculations
Slide 15
Slide 15 text
DISTANCE
❖ A bit more complicated than the normal Pythagorean
planar distance formula
❖ All angle measurements are in radians, not degrees
rad = deg
⇡
180
Slide 16
Slide 16 text
HAVERSINE
❖ Calculates the great-circle (orthodrome) distance
❖ Well-conditioned even for small distances
a
=
sin
2
✓
lat
2
◆
+
cos
(
lat1)
cos
(
lat2)
sin
2
✓
long
2
◆
c
= 2
atan
2(
p
a,
p
1
a
)
d
=
R
·
c
Slide 17
Slide 17 text
SPHERICAL LAW OF COSINES
❖ These days IEEE 754 64-bit floating-point numbers
provide 15 significant figures of precision
❖ Enough to use a simple trigonometry law
❖ Gives well-conditioned results down to 1 meter
d
=
acos
(
sin
(
lat1)
sin
(
lat2) +
cos
(
lat1)
cos
(
lat2)
cos
(
long
)) ·
R
Slide 18
Slide 18 text
VINCENTY FORMULA
❖ Does not assume spherical Earth
❖ Precision = 0.5 mm!
❖ Iterative calculation
❖ Explanation and JS implementation:
❖ http://www.movable-type.co.uk/scripts/latlong-
vincenty.html
Slide 19
Slide 19 text
MIDPOINT
❖ Averaging is approximate at distances < 400 km
❖ Geographic midpoint method
Slide 20
Slide 20 text
1.Convert lat/long to radians
2.Convert lat/long to Cartesian coordinates
MIDPOINT
lat1 =
lat1
⇡
180
lon1 =
lon1
⇡
180
X1 =
cos
(
lat1)
cos
(
lon1)
Y1 =
cos
(
lat1)
sin
(
lon1)
Z1 =
sin
(
lat1)
Slide 21
Slide 21 text
3.Compute weighted average
MIDPOINT
x
=
1
n
n
X
1
Xn = X1 +
X2 +
. . .
+
Xn
n
y
=
1
n
n
X
1
Yn = Y1 +
Y2 +
. . .
+
Yn
n
z
=
1
n
n
X
1
Zn = Z1 +
Z2 +
. . .
+
Zn
n
Slide 22
Slide 22 text
4.Convert Cartesian back into latitude/longitude
5.Convert back to degrees
MIDPOINT
lon
=
atan
2(
y, x
)
hyp
=
p
x
2 +
y
2
lat
=
atan
2(
z, hyp
)
lat
=
lat
180
⇡
lon
=
lon
180
⇡
Slide 23
Slide 23 text
❖ At small distances (and closer to equator), good enough
‣ i.e. 2 km east + 5 km north = 5.38 km from origin
‣ but at 100x distances the error increases
❖ Allows for simpler calculations
❖ But not correct, depends on projection
EUCLIDIAN GEOMETRY
Slide 24
Slide 24 text
MANY OTHERS
❖ Universal Transverse Mercator (UTM)
❖ Military Grid Reference System (MGRS)
❖ World Geographic Reference System (GEOREF)
Slide 25
Slide 25 text
GEOREF SYSTEM
Slide 26
Slide 26 text
GEOHASH
Slide 27
Slide 27 text
GEOHASH
❖ Hierarchical spatial data structure which subdivides
space into grid buckets
❖ Uses clever interleaved encoding scheme
❖ 9q8yyy is 37.78,-122.4
❖ 9q8yyyd3b11 is 37.78504,-122.39559
ADVANTAGES
❖ easily shareable
❖ denotes an area of arbitrary size
❖ can be used for a simple version of clustering
Slide 30
Slide 30 text
LIMITATIONS
❖ hard to determine adjacency
❖ points close to each other can be in different cells
❖ not good for distance calculations
❖ projection-based model: given prefix length describes a
much different region size near the equator than near
the pole
Slide 31
Slide 31 text
TEXTUAL
Slide 32
Slide 32 text
TEXTUAL
❖ Point of interest
❖ Address
❖ Zip code
❖ Neighborhood
❖ City/country name
Slide 33
Slide 33 text
GEOLOCATION METHODS
Slide 34
Slide 34 text
OLD TECH
❖ Traditional maps
❖ Points of interest
❖ Celestial navigation, aka astronavigation
Slide 35
Slide 35 text
NEW TECH
❖ IP address
❖ GPS/GLONASS satellites (< 10 m)
❖ Cell tower ID (200 m - 32 km accuracy)
❖ WiFi base stations (< 100 m)
❖ Ping times to well known servers
‣ http://www.slac.stanford.edu/comp/net/wan-mon/tulip/
Slide 36
Slide 36 text
IMPLEMENTATIONS
❖ GPS devices
‣ accurate, but slower to acquire signal
‣ especially in cities, with poor signal conditions
❖ Mobile (iPhone/Android/other)
‣ mostly uses A-GPS
‣ use network to ask an assistance server for GPS
satellite data
Slide 37
Slide 37 text
IMPLEMENTATIONS
❖ Browser-based
‣ API uses Location Information Servers
‣ Common sources of location information: IP address,
Wi-Fi/Bluetooth MAC address, GPS, GSM/CDMA ID, etc
‣ Well-supported by modern browsers
Slide 38
Slide 38 text
BROWSER GEOLOCATION
if (navigator.geolocation) {
navigator.geolocation.getCurrentPosition(
function (position) { /* do something */ },
function (error) // error callback
{
switch(error.code)
{
case error.TIMEOUT:
break;
case error.POSITION_UNAVAILABLE:
break;
case error.PERMISSION_DENIED:
break;
case error.UNKNOWN_ERROR:
break;
}
});
}
}
else // no geolocation
Slide 39
Slide 39 text
GEOIP
Slide 40
Slide 40 text
No content
Slide 41
Slide 41 text
THIS
IS
TERRIBLE
Slide 42
Slide 42 text
MAXMIND DB
❖ Most comprehensive IP positioning DB
‣ GeoLite City: free, decent coverage, less accurate,
monthly updates
‣ GeoIP City: $370 initial, $90/month, weekly updates,
more coverage, better accuracy
Slide 43
Slide 43 text
COVERAGE
❖ GeoLite City — over 99.5% on a country level and 79%
on a city level for the US within a 25 mile radius
country correct incorrect
United
States 78% 17%
Canada 81% 18%
Kazakhstan 84% 18%
Slide 44
Slide 44 text
MAXMIND DB
❖ GeoIP C library
❖ Used by PHP, Python, etc
‣ http://pecl.php.net/package/geoip (older)
‣ https://github.com/Zakay/geoip (newer)
‣ http://www.maxmind.com/app/python (older)
‣ https://github.com/appliedsec/pygeoip (newer)
Slide 45
Slide 45 text
GEOIP IN PHP
print_r(geoip_record_by_name('67.160.202.223'));
Array
(
[continent_code] => NA
[country_code] => US
[country_code3] => USA
[country_name] => United States
[region] => CA
[city] => San Francisco
[postal_code] =>
[latitude] => 37.764499664307
[longitude] => -122.42939758301
[dma_code] => 807
[area_code] => 415
)
Slide 46
Slide 46 text
GEOIP IN PHP
print_r(geoip_record_by_name('84.92.229.1'));
Array
(
[continent_code] => EU
[country_code] => GB
[country_code3] => GBR
[country_name] => United Kingdom
[region] => H9
[city] => London
[postal_code] =>
[latitude] => 51.500198364258
[longitude] => -0.1262000054121
[dma_code] => 0
[area_code] => 0
)
GEOIP IN PHP
print_r(geoip_record_by_name('28.8.8.8'));
Array
(
[continent_code] => NA
[country_code] => US
[country_code3] => USA
[country_name] => United States
[region] =>
[city] =>
[postal_code] =>
[latitude] => 38
[longitude] => -97
[dma_code] => 0
[area_code] => 0
)
Slide 49
Slide 49 text
MIDDLE EARTH!
Slide 50
Slide 50 text
OTHER SERVICES
❖ http://ipinfodb.com/ (uses IP2Location lite DB)
❖ http://www.hostip.info/use.html (crowd-sourced)
Slide 51
Slide 51 text
COMPARISON
❖ http://ipinfodb.com/ip_locator.php?ip=84.92.229.1
‣ Country : UK
‣ State/Province : ENGLAND
‣ City : SHEFFIELD
‣ Zip or postal code : -
‣ Latitude : 53.383055
‣ Longitude : -1.464795
LOCATION EXTRACTION
❖ Location is not always given in a structured format
❖ Blog posts, news clippings, status updates may contain
embedded mentions of locations
❖ Goal is to extract these with as much precision as
possible
Slide 59
Slide 59 text
LOCATION EXTRACTION
❖ Yahoo! PlaceMaker does this
❖ Disambiguates found locations and returns unique IDs
(WOEIDs)
‣ "New York", "New York City", "NYC", and "the Big
Apple" are all variant names for WOEID 2459115
❖ These can be used with Yahoo GeoPlanet
Slide 60
Slide 60 text
STORING AND SEARCHING
Slide 61
Slide 61 text
MONGO DB
❖ 2D geospatial indexes
❖ Location is an object or array at least 2 elements,
which are interpreted as coordinates
❖ Implementation via geohashes on top of B-trees
❖ Supports multiple locations per document
❖ Uses GeoJSON spec - [long,lat]
MONGO DB
❖ Supports Euclidian (default) and spherical models
var earthRadius = 6378; /* km */
var range = 30 / earthRadius; /* to radians */
db.places.find({loc: {$nearSphere: [20,50], $maxDistance: range}})
distances = db.runCommand({geoNear: "places", near: [20, 50],
spherical: true, maxDistance: range}).results;
pointDistance = distances[0].dis * earthRadius // back to km
Slide 64
Slide 64 text
MONGO DB
❖ Bounded queries: circle, bounding box, polygon (>= 1.9)
❖ Note that bounding box is specified via bottom left/top
right corners
box = [[-‐73.99756, 40.73083], [-‐73.988135, 40.741404]]
db.places.find({loc: {"$within": {"$box" : box}}})
Slide 65
Slide 65 text
MONGO DB
❖ Limitations
‣ only 1 geospatial index per collection
‣ some query limitations with multi-location docs
‣ somewhat awkward and inconsistent query syntax
‣ sharding on geo keys doesn’t quite work yet
Slide 66
Slide 66 text
MYSQL
❖ Based on the OpenGIS model
❖ Supports points, lines, polygons
❖ Implemented using R-trees
Slide 67
Slide 67 text
MYSQL
❖ Creating
❖ Inserting data
CREATE TABLE geom (g GEOMETRY NOT NULL, SPATIAL INDEX(g))
ENGINE=MyISAM;
INSERT INTO geom VALUES (GeomFromText('POINT(1 1)'))
or
// not OpenGIS standard
INSERT INTO geom VALUES(Point(1,1));
Slide 68
Slide 68 text
MYSQL
❖ Querying
❖ Only MyISAM tables support SPATIAL indices
❖ Sphinx search engine supports geo-distance search as
well
SELECT MBRContains(
GeomFromText( 'POLYGON((0 0,0 3,3 3,3 0,0 0))' ),
coord
) from geom;
Slide 69
Slide 69 text
ELASTICSEARCH
❖ Mapping type geo_point
❖ Implemented via geohashes
❖ Takes string, array, or object, or geohash
representations on insert
❖ Uses GeoJSON ordering
ELASTICSEARCH
❖ Can query by distance, bounding box, or polygon
❖ Does not support rich OpenGIS models, lines
❖ Bounding box is specified via top left/bottom right
corners
ELASTICSEARCH
❖ geo + text = power
‣ Find all businesses with ‘pizza delivery’ in the description
within 5 miles from my location and segment the results
by 1, 2, and 5 miles
Slide 75
Slide 75 text
POSTGRES
❖ PostGIS = spatial extensions to Postgres
❖ Follows OpenGIS standard
❖ Backend spatial database for geographic information
systems (GIS)