Slide 1

Slide 1 text

A KNAPSACK OF GEOTOOLS more than just Google Maps Andrei Zmievski • ConFoo • Mar 1, 2012 Thursday, March 1, 12

Slide 2

Slide 2 text

MOI ❖ Andrei Z. ❖ Architect at AppDynamics ❖ PHP core dev, Smarty, PHP-GTK, Unicode project ❖ Coding, beer, brewing, photography, travel ❖ @a Thursday, March 1, 12

Slide 3

Slide 3 text

INTRO ❖ Location is more important than ever ❖ So is basic understanding of geo-related principles ❖ Survey of tools, services, and technologies ❖ Not an exhaustive reference Thursday, March 1, 12

Slide 4

Slide 4 text

REPRESENTING LOCATION Thursday, March 1, 12

Slide 5

Slide 5 text

SPHERICAL COORDINATES ❖ Latitude (ϕ, phi, lat.) ❖ Longitude (λ, lambda, long.) Thursday, March 1, 12

Slide 6

Slide 6 text

LATITUDE/LONGITUDE “webbing” = conjugate graticule Thursday, March 1, 12

Slide 7

Slide 7 text

LATITUDE/LONGITUDE ❖ Lines of equal latitude are called parallels ‣ 0º parallel = equator ‣ 23º26’N parallel = Tropic of Cancer ‣ 23º26’S parallel = Tropic of Capricorn ❖ Lines of equal latitude are called meridians ‣ 0° meridian = Prime Meridian ‣ Antipodes of Prime Meridian = 180°W and 180°E Thursday, March 1, 12

Slide 8

Slide 8 text

LATITUDE/LONGITUDE ❖ Measured in DMS or decimal degrees ❖ Latitude: 37° 45' 35” = 37.76° ❖ Longitude: -122° 28' 11 = -122.47° ❖ Positive latitude is N(orth), negative is S(outh) ❖ Positive longitudes are E(ast) of 0°, negative are W(est) Thursday, March 1, 12

Slide 9

Slide 9 text

LATITUDE/LONGITUDE ❖ Latitude + longitude is enough to specify any location on the planet ❖ Does not consider height/depth Thursday, March 1, 12

Slide 10

Slide 10 text

LATITUDE/LONGITUDE ❖ Earth != sphere ❖ Earth = oblate spheroid ❖ 6,378 km (equatorial) <= Radius => 6,357 km (polar) Thursday, March 1, 12

Slide 11

Slide 11 text

LATITUDE/LONGITUDE ❖ 1” of latitude = 30.7 m ❖ 1” of longitude varies with latitude latitude town degree minute second 60° Saint-Petersburg 55.65 km 0.93 km 15.42 m 51.47° Greenwich 69.29 km 1.15 km 19.24 m 45° Bordeaux 78.70 km 1.31 km 21.86 m 30° New Orleans 96.39 km 1.61 km 26.77 m 0° Quito 111.3 km 1.86 km 30.92 m (On the GRS80 or WGS84 spheroid at sea level) Thursday, March 1, 12

Slide 12

Slide 12 text

LATITUDE/LONGITUDE ❖ Computer representations: ‣ “37.253,-122.0139” ‣ [37.253,-122.0139] ‣ {lat: 37.253, long:-122.0139} ❖ Historically, lat,long ❖ GeoJSON specifies long,lat because in geometry x-axis comes first Thursday, March 1, 12

Slide 13

Slide 13 text

GOTCHAS ❖ International Date Line discontinuity ❖ Latitudes smoothly wrap at the poles: from < 90° over the pole and back to < 90°, then down to -90°, over the pole, and back to > -90° ❖ Longitude has a discontinuity: 180°E (+180°) turns into 180°W (-180°W) ❖ This should be taken into account when making bounding box and other calculations Thursday, March 1, 12

Slide 14

Slide 14 text

DISTANCE ❖ A bit more complicated than the normal Pythagorean planar distance formula ❖ All angle measurements are in radians, not degrees rad = deg ⇡ 180 Thursday, March 1, 12

Slide 15

Slide 15 text

HAVERSINE ❖ Calculates the great-circle (orthodrome) distance ❖ Well-conditioned even for small distances a = sin 2 ✓ lat 2 ◆ + cos ( lat1) cos ( lat2) sin 2 ✓ long 2 ◆ c = 2 atan 2( p a, p 1 a ) d = R · c Thursday, March 1, 12

Slide 16

Slide 16 text

SPHERICAL LAW OF COSINES ❖ These days IEEE 754 64-bit floating-point numbers provide 15 significant figures of precision ❖ Enough to use a simple trigonometry law ❖ Gives well-conditioned results down to 1 meter d = acos ( sin ( lat1) sin ( lat2) + cos ( lat1) cos ( lat2) cos ( long )) · R Thursday, March 1, 12

Slide 17

Slide 17 text

VINCENTY FORMULA ❖ Does not assume spherical Earth ❖ Precision = 0.5 mm! ❖ Iterative calculation ❖ Explanation and JS implementation: ❖ http://www.movable-type.co.uk/scripts/latlong- vincenty.html Thursday, March 1, 12

Slide 18

Slide 18 text

DISTANCE ❖ Lots of other goodies here: ❖ http://www.movable-type.co.uk/scripts/latlong.html Thursday, March 1, 12

Slide 19

Slide 19 text

MIDPOINT ❖ Averaging is approximate at distances < 400 km ❖ Geographic midpoint method Thursday, March 1, 12

Slide 20

Slide 20 text

1.Convert lat/long to radians 2.Convert lat/long to Cartesian coordinates MIDPOINT lat1 = lat1 ⇡ 180 lon1 = lon1 ⇡ 180 X1 = cos ( lat1) cos ( lon1) Y1 = cos ( lat1) sin ( lon1) Z1 = sin ( lat1) Thursday, March 1, 12

Slide 21

Slide 21 text

3.Compute weighted average MIDPOINT x = 1 n n X 1 Xn = X1 + X2 + . . . + Xn n y = 1 n n X 1 Yn = Y1 + Y2 + . . . + Yn n z = 1 n n X 1 Zn = Z1 + Z2 + . . . + Zn n Thursday, March 1, 12

Slide 22

Slide 22 text

4.Convert Cartesian back into latitude/longitude 5.Convert back to degrees MIDPOINT lon = atan 2( y, x ) hyp = p x 2 + y 2 lat = atan 2( z, hyp ) lat = lat 180 ⇡ lon = lon 180 ⇡ Thursday, March 1, 12

Slide 23

Slide 23 text

❖ At small distances (and closer to equator), good enough ‣ i.e. 2 km east + 5 km north = 5.38 km from origin ‣ but at 100x distances the error increases ❖ Allows for simpler calculations ❖ But not correct, depends on projection EUCLIDIAN GEOMETRY Thursday, March 1, 12

Slide 24

Slide 24 text

MANY OTHERS ❖ Universal Transverse Mercator (UTM) ❖ Military Grid Reference System (MGRS) ❖ World Geographic Reference System (GEOREF) Thursday, March 1, 12

Slide 25

Slide 25 text

GEOREF SYSTEM Thursday, March 1, 12

Slide 26

Slide 26 text

GEOHASH Thursday, March 1, 12

Slide 27

Slide 27 text

GEOHASH ❖ Hierarchical spatial data structure which subdivides space into grid buckets ❖ Uses clever interleaved encoding scheme ❖ 9q8yyy is 37.78, -122.4 ❖ 9q8yyyd3b11 is 37.78504 -122.39559 Thursday, March 1, 12

Slide 28

Slide 28 text

ADVANTAGES ❖ arbitrary precision, gradual degradation ❖ efficient encoding geohash  length lat  bits long  bits km  error 1 2 3 ±2500 2 5 5 ±630 3 7 8 ±78 4 10 10 ±20 5 12 13 ±2.4 6 15 15 ±0.61 7 17 18 ±0.076 8 20 20 ±0.019 Thursday, March 1, 12

Slide 29

Slide 29 text

ADVANTAGES ❖ easily shareable ❖ denotes an area of arbitrary size ❖ can be used for a simple version of clustering Thursday, March 1, 12

Slide 30

Slide 30 text

LIMITATIONS ❖ hard to determine adjacency ❖ points close to each other can be in different cells ❖ not good for distance calculations ❖ projection-based model: given prefix length describes a much different region size near the equator than near the pole Thursday, March 1, 12

Slide 31

Slide 31 text

TEXTUAL Thursday, March 1, 12

Slide 32

Slide 32 text

TEXTUAL ❖ Point of interest ❖ Address ❖ Zip code ❖ Neighborhood ❖ City/country name Thursday, March 1, 12

Slide 33

Slide 33 text

GEOLOCATION METHODS Thursday, March 1, 12

Slide 34

Slide 34 text

OLD TECH ❖ Traditional maps ❖ Points of interest ❖ Celestial navigation, aka astronavigation Thursday, March 1, 12

Slide 35

Slide 35 text

NEW TECH ❖ IP address ❖ GPS/GLONASS satellites (< 10 m) ❖ Cell tower ID (200 m - 32 km accuracy) ❖ WiFi base stations (< 100 m) ❖ Ping times to well known servers ‣ http://www.slac.stanford.edu/comp/net/wan-mon/tulip/ Thursday, March 1, 12

Slide 36

Slide 36 text

IMPLEMENTATIONS ❖ GPS devices ‣ accurate, but slower to acquire signal ‣ especially in cities, with poor signal conditions ❖ Mobile (iPhone/Android/other) ‣ mostly uses A-GPS ‣ use network to ask an assistance server for GPS satellite data Thursday, March 1, 12

Slide 37

Slide 37 text

IMPLEMENTATIONS ❖ Browser-based ‣ API uses Location Information Servers ‣ Common sources of location information: IP address, Wi-Fi/Bluetooth MAC address, GPS, GSM/CDMA ID, etc ‣ Well-supported by modern browsers Thursday, March 1, 12

Slide 38

Slide 38 text

BROWSER GEOLOCATION if  (navigator.geolocation)  {   navigator.geolocation.getCurrentPosition(       function  (position)  {  /*  do  something  */  },     function  (error)  //  error  callback     {       switch(error.code)         {         case  error.TIMEOUT:           break;         case  error.POSITION_UNAVAILABLE:           break;         case  error.PERMISSION_DENIED:           break;         case  error.UNKNOWN_ERROR:           break;       }     });   } } else  //  no  geolocation Thursday, March 1, 12

Slide 39

Slide 39 text

GEOIP Thursday, March 1, 12

Slide 40

Slide 40 text

Thursday, March 1, 12

Slide 41

Slide 41 text

THIS IS TERRIBLE Thursday, March 1, 12

Slide 42

Slide 42 text

MAXMIND DB ❖ Most comprehensive IP positioning DB ‣ GeoLite City: free, decent coverage, less accurate, monthly updates ‣ GeoIP City: $370 initial, $90/month, weekly updates, more coverage, better accuracy Thursday, March 1, 12

Slide 43

Slide 43 text

COVERAGE ❖ GeoLite City — over 99.5% on a country level and 79% on a city level for the US within a 25 mile radius country correct incorrect United  States 78% 17% Canada 81% 18% Kazakhstan 84% 18% Thursday, March 1, 12

Slide 44

Slide 44 text

MAXMIND DB ❖ GeoIP C library ❖ Used by PHP, Python, etc ‣ http://pecl.php.net/package/geoip (older) ‣ https://github.com/Zakay/geoip (newer) ‣ http://www.maxmind.com/app/python (older) ‣ https://github.com/appliedsec/pygeoip (newer) Thursday, March 1, 12

Slide 45

Slide 45 text

GEOIP IN PHP print_r(geoip_record_by_name('67.160.202.223')); Array ( [continent_code] => NA [country_code] => US [country_code3] => USA [country_name] => United States [region] => CA [city] => San Francisco [postal_code] => [latitude] => 37.764499664307 [longitude] => -122.42939758301 [dma_code] => 807 [area_code] => 415 ) Thursday, March 1, 12

Slide 46

Slide 46 text

GEOIP IN PHP print_r(geoip_record_by_name('84.92.229.1')); Array ( [continent_code] => EU [country_code] => GB [country_code3] => GBR [country_name] => United Kingdom [region] => H9 [city] => London [postal_code] => [latitude] => 51.500198364258 [longitude] => -0.1262000054121 [dma_code] => 0 [area_code] => 0 ) Thursday, March 1, 12

Slide 47

Slide 47 text

GEOIP IN PHP print_r(geoip_record_by_name('124.28.8.8')); Array ( [continent_code] => AS [country_code] => KR [country_code3] => KOR [country_name] => Korea, Republic of [region] => 13 [city] => Bucheon [postal_code] => [latitude] => 37.498901367188 [longitude] => 126.78309631348 [dma_code] => 0 [area_code] => 0 ) Thursday, March 1, 12

Slide 48

Slide 48 text

GEOIP IN PHP print_r(geoip_record_by_name('28.8.8.8')); Array ( [continent_code] => NA [country_code] => US [country_code3] => USA [country_name] => United States [region] => [city] => [postal_code] => [latitude] => 38 [longitude] => -97 [dma_code] => 0 [area_code] => 0 ) Thursday, March 1, 12

Slide 49

Slide 49 text

MIDDLE EARTH! Thursday, March 1, 12

Slide 50

Slide 50 text

OTHER SERVICES ❖ http://ipinfodb.com/ (uses IP2Location lite DB) ❖ http://www.hostip.info/use.html (crowd-sourced) Thursday, March 1, 12

Slide 51

Slide 51 text

COMPARISON ❖ http://ipinfodb.com/ip_locator.php?ip=84.92.229.1 ‣ Country : UK ‣ State/Province : ENGLAND ‣ City : SHEFFIELD ‣ Zip or postal code : - ‣ Latitude : 53.383055 ‣ Longitude : -1.464795 Thursday, March 1, 12

Slide 52

Slide 52 text

COMPARISON ❖ http://api.hostip.info/get_html.php?ip=84.92.229.1 ‣ Country: UNITED KINGDOM (GB) ‣ City: (Unknown city) ☹ Thursday, March 1, 12

Slide 53

Slide 53 text

ALSO ❖ One PHP library to rule them all ❖ http://geocoder-php.org/ ❖ Unified interface for a variety of IP-Based geocoding providers ❖ And more Thursday, March 1, 12

Slide 54

Slide 54 text

LESSONS ❖ Expect failures and word/work around them ❖ Don’t assume REMOTE_ADDR is correct ‣ Might be a proxy ‣ Check X-Forwarded-For header Thursday, March 1, 12

Slide 55

Slide 55 text

GEOCODING AND MORE Thursday, March 1, 12

Slide 56

Slide 56 text

CONVERSIONS ❖ Geocoding: address ➔ latitude & longitude ❖ Reverse geocoding: latitude & longitude ➔ address ❖ Not always reversible, e.g. “Farallon Islands” ❖ Disparity of results: ‣ http://maps.googleapis.com/maps/api/geocode/json? latlng=37.759947,-122.46866&sensor=true ‣ http://maps.googleapis.com/maps/api/geocode/json? latlng=45.499148,-73.566488&sensor=true Thursday, March 1, 12

Slide 57

Slide 57 text

SERVICES/LIBRARIES ❖ JS: Google/Bing/MapQuest APIs ❖ PHP: http://geocoder-php.org/ ❖ Python: http://code.google.com/p/geopy/ ❖ Ruby: http://highearthorbit.com/geocommons-open- sourced-geocoder/ ❖ Check ToS before launching publicly Thursday, March 1, 12

Slide 58

Slide 58 text

LOCATION EXTRACTION ❖ Location is not always given in a structured format ❖ Blog posts, news clippings, status updates may contain embedded mentions of locations ❖ Goal is to extract these with as much precision as possible Thursday, March 1, 12

Slide 59

Slide 59 text

LOCATION EXTRACTION ❖ Yahoo! PlaceMaker does this ❖ Disambiguates found locations and returns unique IDs (WOEIDs) ‣ "New York", "New York City", "NYC", and "the Big Apple" are all variant names for WOEID 2459115 ❖ These can be used with Yahoo GeoPlanet Thursday, March 1, 12

Slide 60

Slide 60 text

STORING AND SEARCHING Thursday, March 1, 12

Slide 61

Slide 61 text

MONGO DB ❖ 2D geospatial indexes ❖ Location is an object or array at least 2 elements, which are interpreted as coordinates ❖ Implementation via geohashes on top of B-trees ❖ Supports multiple locations per document ❖ Uses GeoJSON spec - [long,lat] Thursday, March 1, 12

Slide 62

Slide 62 text

MONGO DB ❖ Creating ❖ Querying db.places.ensureIndex({loc:  "2d"}) db.places.save({loc:  [-­‐91.23,  28.25]}) db.places.find({loc:  {$near:  [-­‐91,28],  $maxDistance:  5}}) db.runCommand({geoNear:  "places",  near:  [-­‐91,28],  num:10}) Thursday, March 1, 12

Slide 63

Slide 63 text

MONGO DB ❖ Supports Euclidian (default) and spherical models var  earthRadius  =  6378;  /*  km  */ var  range  =  30  /  earthRadius;  /*  to  radians  */ db.places.find({loc:  {$nearSphere:  [20,50],  $maxDistance:  range}}) distances  =  db.runCommand({geoNear:  "places",  near:  [20,  50],          spherical:  true,  maxDistance:  range}).results; pointDistance  =  distances[0].dis  *  earthRadius  //  back  to  km Thursday, March 1, 12

Slide 64

Slide 64 text

MONGO DB ❖ Bounded queries: circle, bounding box, polygon (>= 1.9) ❖ Note that bounding box is specified via bottom left/top right corners box  =  [[-­‐73.99756,  40.73083],  [-­‐73.988135,  40.741404]] db.places.find({loc:  {"$within":  {"$box"  :  box}}}) Thursday, March 1, 12

Slide 65

Slide 65 text

MONGO DB ❖ Limitations ‣ only 1 geospatial index per collection ‣ some query limitations with multi-location docs ‣ somewhat awkward and inconsistent query syntax ‣ sharding on geo keys doesn’t quite work yet Thursday, March 1, 12

Slide 66

Slide 66 text

MYSQL ❖ Based on the OpenGIS model ❖ Supports points, lines, polygons ❖ Implemented using R-trees Thursday, March 1, 12

Slide 67

Slide 67 text

MYSQL ❖ Creating ❖ Inserting data CREATE  TABLE  geom  (g  GEOMETRY  NOT  NULL,  SPATIAL  INDEX(g))   ENGINE=MyISAM; INSERT  INTO  geom  VALUES  (GeomFromText('POINT(1  1)')) or //  not  OpenGIS  standard INSERT  INTO  geom  VALUES(Point(1,1)); Thursday, March 1, 12

Slide 68

Slide 68 text

MYSQL ❖ Querying ❖ Only MyISAM tables support SPATIAL indices ❖ Sphinx search engine supports geo-distance search as well SELECT  MBRContains(   GeomFromText(  'POLYGON((0  0,0  3,3  3,3  0,0  0))'  ),   coord )  from  geom; Thursday, March 1, 12

Slide 69

Slide 69 text

ELASTICSEARCH ❖ Mapping type geo_point ❖ Implemented via geohashes ❖ Takes string, array, or object, or geohash representations on insert ❖ Uses GeoJSON ordering Thursday, March 1, 12

Slide 70

Slide 70 text

ELASTICSEARCH ❖ Mapping specification {        "mydoc"  :  {                "properties"  :  {                        "location"  :  {                                "type"  :  "geo_point"                        }                }        } } Thursday, March 1, 12

Slide 71

Slide 71 text

ELASTICSEARCH ❖ Inserting curl  -­‐XPUT  localhost:9200/myindex/mydoc/1  -­‐d'{      "location"  :  "-­‐83,47" }' curl  -­‐XPUT  localhost:9200/myindex/mydoc/2  -­‐d'{      "location"  :  {"lat":  47,  "lon":  -­‐83} }' Thursday, March 1, 12

Slide 72

Slide 72 text

ELASTICSEARCH ❖ Can query by distance, bounding box, or polygon ❖ Does not support rich OpenGIS models, lines ❖ Bounding box is specified via top left/bottom right corners Thursday, March 1, 12

Slide 73

Slide 73 text

ELASTICSEARCH {        "filtered"  :  {                "query"  :  {                        "match_all"  :  {}                },                "filter"  :  {                        "geo_distance"  :  {                                "distance"  :  "200km",                                "pin.location"  :  {                                        "lat"  :  40,                                        "lon"  :  -­‐70 }}}}} Thursday, March 1, 12

Slide 74

Slide 74 text

ELASTICSEARCH ❖ geo + text = power ‣ Find all businesses with ‘pizza delivery’ in the description within 5 miles from my location and segment the results 1, 2, and 5 miles Thursday, March 1, 12

Slide 75

Slide 75 text

POSTGRES ❖ PostGIS = spatial extensions to Postgres ❖ Follows OpenGIS standard ❖ Backend spatial database for geographic information systems (GIS) Thursday, March 1, 12

Slide 76

Slide 76 text

MORE… Thursday, March 1, 12

Slide 77

Slide 77 text

DATASETS ❖ http://freegisdata.rtwilson.com/ ❖ Neighborhoods ‣ http://www.zillow.com/howto/api/neighborhood-boundaries.htm ❖ Streets ‣ http://www.census.gov/geo/www/tiger/ ‣ OpenStreetMap ❖ Other ‣ Flickr shapefiles ‣ Yahoo GeoPlanet data Thursday, March 1, 12

Slide 78

Slide 78 text

APIS/SERVICES ❖ SimpleGeo -> UrbanAirship ❖ Web Maps Studio ❖ GeoAPI ❖ POIs: Google Places, Foursquare, Factual Thursday, March 1, 12

Slide 79

Slide 79 text

APIS/SERVICES ❖ http://www.earthtools.org/webservices.htm ❖ http://www.geonames.org/export/web-services.html ❖ http://www.infochimps.com/products/geo Thursday, March 1, 12

Slide 80

Slide 80 text

MAPPING ❖ Mapbox, CartoDB, TileMill, OSM, Mapnik, Leaflet, Polymaps ❖ Route planner: ‣ http://graphserver.github.com/graphserver/ Thursday, March 1, 12

Slide 81

Slide 81 text

MERCI! http://joind.in/6031 Thursday, March 1, 12