Slide 1

Slide 1 text

A KNAPSACK OF GEOTOOLS more  than  just  Google  Maps Andrei  Zmievski  •  DPC  •  June  9,  2012

Slide 2

Slide 2 text

ME ❖ Andrei Z. ❖ Architect at AppDynamics ❖ PHP core dev, Smarty, PHP-GTK, Unicode project ❖ Coding, beer, brewing, photography, travel ❖ @a

Slide 3

Slide 3 text

GEOGRAPHY

Slide 4

Slide 4 text

INTRO ❖ Location is more important than ever ❖ So is basic understanding of geo-related principles ❖ Survey of tools, services, and technologies ❖ Not an exhaustive reference

Slide 5

Slide 5 text

REPRESENTING LOCATION

Slide 6

Slide 6 text

SPHERICAL COORDINATES ❖ Latitude (ϕ, phi, lat.) ❖ Longitude (λ, lambda, long.)

Slide 7

Slide 7 text

LATITUDE/LONGITUDE “webbing” = conjugate graticule

Slide 8

Slide 8 text

LATITUDE/LONGITUDE ❖ Lines of equal latitude are called parallels ‣ 0º parallel = equator ‣ 23º26’N parallel = Tropic of Cancer ‣ 23º26’S parallel = Tropic of Capricorn ❖ Lines of equal latitude are called meridians ‣ 0° meridian = Prime Meridian ‣ Antipodes of Prime Meridian = 180°W and 180°E

Slide 9

Slide 9 text

LATITUDE/LONGITUDE ❖ Measured in DMS or decimal degrees ❖ Latitude: 37° 45' 35” = 37.76° ❖ Longitude: -122° 28' 11 = -122.47° ❖ Positive latitude is N(orth), negative is S(outh) ❖ Positive longitudes are E(ast) of 0°, negative are W(est)

Slide 10

Slide 10 text

LATITUDE/LONGITUDE ❖ Latitude + longitude is enough to specify any location on the planet ❖ Does not consider height/depth

Slide 11

Slide 11 text

LATITUDE/LONGITUDE ❖ Earth != sphere ❖ Earth = oblate spheroid ❖ 6,378 km (equatorial) <= Radius => 6,357 km (polar)

Slide 12

Slide 12 text

LATITUDE/LONGITUDE ❖ 1” of latitude = 30.7 m ❖ 1” of longitude varies with latitude latitude town degree minute second 60° Saint-Petersburg 55.65 km 0.93 km 15.42 m 51.47° Greenwich 69.29 km 1.15 km 19.24 m 45° Bordeaux 78.70 km 1.31 km 21.86 m 30° New Orleans 96.39 km 1.61 km 26.77 m 0° Quito 111.3 km 1.86 km 30.92 m (On the GRS80 or WGS84 spheroid at sea level)

Slide 13

Slide 13 text

LATITUDE/LONGITUDE ❖ Computer representations: ‣ “37.253,-122.0139” ‣ [37.253,-122.0139] ‣ {lat: 37.253, long:-122.0139} ❖ Historically, lat,long ❖ GeoJSON specifies long,lat because in geometry x-axis comes first

Slide 14

Slide 14 text

GOTCHAS ❖ International Date Line discontinuity ❖ Latitudes smoothly wrap at the poles: from < 90° over the pole and back to < 90°, then down to -90°, over the pole, and back to > -90° ❖ Longitude has a discontinuity: 180°E (+180°) turns into 180°W (-180°W) ❖ This should be taken into account when making bounding box and other calculations

Slide 15

Slide 15 text

DISTANCE ❖ A bit more complicated than the normal Pythagorean planar distance formula ❖ All angle measurements are in radians, not degrees rad = deg ⇡ 180

Slide 16

Slide 16 text

HAVERSINE ❖ Calculates the great-circle (orthodrome) distance ❖ Well-conditioned even for small distances a = sin 2 ✓ lat 2 ◆ + cos ( lat1) cos ( lat2) sin 2 ✓ long 2 ◆ c = 2 atan 2( p a, p 1 a ) d = R · c

Slide 17

Slide 17 text

SPHERICAL LAW OF COSINES ❖ These days IEEE 754 64-bit floating-point numbers provide 15 significant figures of precision ❖ Enough to use a simple trigonometry law ❖ Gives well-conditioned results down to 1 meter d = acos ( sin ( lat1) sin ( lat2) + cos ( lat1) cos ( lat2) cos ( long )) · R

Slide 18

Slide 18 text

VINCENTY FORMULA ❖ Does not assume spherical Earth ❖ Precision = 0.5 mm! ❖ Iterative calculation ❖ Explanation and JS implementation: ❖ http://www.movable-type.co.uk/scripts/latlong- vincenty.html

Slide 19

Slide 19 text

MIDPOINT ❖ Averaging is approximate at distances < 400 km ❖ Geographic midpoint method

Slide 20

Slide 20 text

1.Convert lat/long to radians 2.Convert lat/long to Cartesian coordinates MIDPOINT lat1 = lat1 ⇡ 180 lon1 = lon1 ⇡ 180 X1 = cos ( lat1) cos ( lon1) Y1 = cos ( lat1) sin ( lon1) Z1 = sin ( lat1)

Slide 21

Slide 21 text

3.Compute weighted average MIDPOINT x = 1 n n X 1 Xn = X1 + X2 + . . . + Xn n y = 1 n n X 1 Yn = Y1 + Y2 + . . . + Yn n z = 1 n n X 1 Zn = Z1 + Z2 + . . . + Zn n

Slide 22

Slide 22 text

4.Convert Cartesian back into latitude/longitude 5.Convert back to degrees MIDPOINT lon = atan 2( y, x ) hyp = p x 2 + y 2 lat = atan 2( z, hyp ) lat = lat 180 ⇡ lon = lon 180 ⇡

Slide 23

Slide 23 text

❖ At small distances (and closer to equator), good enough ‣ i.e. 2 km east + 5 km north = 5.38 km from origin ‣ but at 100x distances the error increases ❖ Allows for simpler calculations ❖ But not correct, depends on projection EUCLIDIAN GEOMETRY

Slide 24

Slide 24 text

MANY OTHERS ❖ Universal Transverse Mercator (UTM) ❖ Military Grid Reference System (MGRS) ❖ World Geographic Reference System (GEOREF)

Slide 25

Slide 25 text

GEOREF SYSTEM

Slide 26

Slide 26 text

GEOHASH

Slide 27

Slide 27 text

GEOHASH ❖ Hierarchical spatial data structure which subdivides space into grid buckets ❖ Uses clever interleaved encoding scheme ❖ 9q8yyy is 37.78,-122.4 ❖ 9q8yyyd3b11 is 37.78504,-122.39559

Slide 28

Slide 28 text

ADVANTAGES ❖ arbitrary precision, gradual degradation ❖ efficient encoding geohash  length lat  bits long  bits km  error 1 2 3 ±2500 2 5 5 ±630 3 7 8 ±78 4 10 10 ±20 5 12 13 ±2.4 6 15 15 ±0.61 7 17 18 ±0.076 8 20 20 ±0.019

Slide 29

Slide 29 text

ADVANTAGES ❖ easily shareable ❖ denotes an area of arbitrary size ❖ can be used for a simple version of clustering

Slide 30

Slide 30 text

LIMITATIONS ❖ hard to determine adjacency ❖ points close to each other can be in different cells ❖ not good for distance calculations ❖ projection-based model: given prefix length describes a much different region size near the equator than near the pole

Slide 31

Slide 31 text

TEXTUAL

Slide 32

Slide 32 text

TEXTUAL ❖ Point of interest ❖ Address ❖ Zip code ❖ Neighborhood ❖ City/country name

Slide 33

Slide 33 text

GEOLOCATION METHODS

Slide 34

Slide 34 text

OLD TECH ❖ Traditional maps ❖ Points of interest ❖ Celestial navigation, aka astronavigation

Slide 35

Slide 35 text

NEW TECH ❖ IP address ❖ GPS/GLONASS satellites (< 10 m) ❖ Cell tower ID (200 m - 32 km accuracy) ❖ WiFi base stations (< 100 m) ❖ Ping times to well known servers ‣ http://www.slac.stanford.edu/comp/net/wan-mon/tulip/

Slide 36

Slide 36 text

IMPLEMENTATIONS ❖ GPS devices ‣ accurate, but slower to acquire signal ‣ especially in cities, with poor signal conditions ❖ Mobile (iPhone/Android/other) ‣ mostly uses A-GPS ‣ use network to ask an assistance server for GPS satellite data

Slide 37

Slide 37 text

IMPLEMENTATIONS ❖ Browser-based ‣ API uses Location Information Servers ‣ Common sources of location information: IP address, Wi-Fi/Bluetooth MAC address, GPS, GSM/CDMA ID, etc ‣ Well-supported by modern browsers

Slide 38

Slide 38 text

BROWSER GEOLOCATION if  (navigator.geolocation)  {   navigator.geolocation.getCurrentPosition(       function  (position)  {  /*  do  something  */  },     function  (error)  //  error  callback     {       switch(error.code)         {         case  error.TIMEOUT:           break;         case  error.POSITION_UNAVAILABLE:           break;         case  error.PERMISSION_DENIED:           break;         case  error.UNKNOWN_ERROR:           break;       }     });   } } else  //  no  geolocation

Slide 39

Slide 39 text

GEOIP

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

THIS IS TERRIBLE

Slide 42

Slide 42 text

MAXMIND DB ❖ Most comprehensive IP positioning DB ‣ GeoLite City: free, decent coverage, less accurate, monthly updates ‣ GeoIP City: $370 initial, $90/month, weekly updates, more coverage, better accuracy

Slide 43

Slide 43 text

COVERAGE ❖ GeoLite City — over 99.5% on a country level and 79% on a city level for the US within a 25 mile radius country correct incorrect United  States 78% 17% Canada 81% 18% Kazakhstan 84% 18%

Slide 44

Slide 44 text

MAXMIND DB ❖ GeoIP C library ❖ Used by PHP, Python, etc ‣ http://pecl.php.net/package/geoip (older) ‣ https://github.com/Zakay/geoip (newer) ‣ http://www.maxmind.com/app/python (older) ‣ https://github.com/appliedsec/pygeoip (newer)

Slide 45

Slide 45 text

GEOIP IN PHP print_r(geoip_record_by_name('67.160.202.223')); Array ( [continent_code] => NA [country_code] => US [country_code3] => USA [country_name] => United States [region] => CA [city] => San Francisco [postal_code] => [latitude] => 37.764499664307 [longitude] => -122.42939758301 [dma_code] => 807 [area_code] => 415 )

Slide 46

Slide 46 text

GEOIP IN PHP print_r(geoip_record_by_name('84.92.229.1')); Array ( [continent_code] => EU [country_code] => GB [country_code3] => GBR [country_name] => United Kingdom [region] => H9 [city] => London [postal_code] => [latitude] => 51.500198364258 [longitude] => -0.1262000054121 [dma_code] => 0 [area_code] => 0 )

Slide 47

Slide 47 text

GEOIP IN PHP print_r(geoip_record_by_name('124.28.8.8')); Array ( [continent_code] => AS [country_code] => KR [country_code3] => KOR [country_name] => Korea, Republic of [region] => 13 [city] => Bucheon [postal_code] => [latitude] => 37.498901367188 [longitude] => 126.78309631348 [dma_code] => 0 [area_code] => 0 )

Slide 48

Slide 48 text

GEOIP IN PHP print_r(geoip_record_by_name('28.8.8.8')); Array ( [continent_code] => NA [country_code] => US [country_code3] => USA [country_name] => United States [region] => [city] => [postal_code] => [latitude] => 38 [longitude] => -97 [dma_code] => 0 [area_code] => 0 )

Slide 49

Slide 49 text

MIDDLE EARTH!

Slide 50

Slide 50 text

OTHER SERVICES ❖ http://ipinfodb.com/ (uses IP2Location lite DB) ❖ http://www.hostip.info/use.html (crowd-sourced)

Slide 51

Slide 51 text

COMPARISON ❖ http://ipinfodb.com/ip_locator.php?ip=84.92.229.1 ‣ Country : UK ‣ State/Province : ENGLAND ‣ City : SHEFFIELD ‣ Zip or postal code : - ‣ Latitude : 53.383055 ‣ Longitude : -1.464795

Slide 52

Slide 52 text

COMPARISON ❖ http://api.hostip.info/get_html.php?ip=84.92.229.1 ‣ Country: UNITED KINGDOM (GB) ‣ City: (Unknown city) ☹

Slide 53

Slide 53 text

ALSO ❖ One PHP library to rule them all ❖ http://geocoder-php.org/ ❖ Unified interface for a variety of IP-Based geocoding providers ❖ And more

Slide 54

Slide 54 text

LESSONS ❖ Expect failures and word/work around them ❖ Don’t assume REMOTE_ADDR is correct ‣ Might be a proxy ‣ Check X-Forwarded-For header

Slide 55

Slide 55 text

GEOCODING AND MORE

Slide 56

Slide 56 text

CONVERSIONS ❖ Geocoding: address ➔ latitude & longitude ❖ Reverse geocoding: latitude & longitude ➔ address ❖ Not always reversible, e.g. “Farallon Islands” ❖ Disparity of results: ‣ http://maps.googleapis.com/maps/api/geocode/json? latlng=37.759947,-122.46866&sensor=true ‣ http://maps.googleapis.com/maps/api/geocode/json? latlng=45.499148,-73.566488&sensor=true

Slide 57

Slide 57 text

SERVICES/LIBRARIES ❖ JS: Google/Bing/MapQuest APIs ❖ PHP: http://geocoder-php.org/ ❖ Python: http://code.google.com/p/geopy/ ❖ Ruby: http://highearthorbit.com/geocommons-open- sourced-geocoder/ ❖ Check ToS before launching publicly

Slide 58

Slide 58 text

LOCATION EXTRACTION ❖ Location is not always given in a structured format ❖ Blog posts, news clippings, status updates may contain embedded mentions of locations ❖ Goal is to extract these with as much precision as possible

Slide 59

Slide 59 text

LOCATION EXTRACTION ❖ Yahoo! PlaceMaker does this ❖ Disambiguates found locations and returns unique IDs (WOEIDs) ‣ "New York", "New York City", "NYC", and "the Big Apple" are all variant names for WOEID 2459115 ❖ These can be used with Yahoo GeoPlanet

Slide 60

Slide 60 text

STORING AND SEARCHING

Slide 61

Slide 61 text

MONGO DB ❖ 2D geospatial indexes ❖ Location is an object or array at least 2 elements, which are interpreted as coordinates ❖ Implementation via geohashes on top of B-trees ❖ Supports multiple locations per document ❖ Uses GeoJSON spec - [long,lat]

Slide 62

Slide 62 text

MONGO DB ❖ Creating ❖ Querying db.places.ensureIndex({loc:  "2d"}) db.places.save({loc:  [-­‐91.23,  28.25]}) db.places.find({loc:  {$near:  [-­‐91,28],  $maxDistance:  5}}) db.runCommand({geoNear:  "places",  near:  [-­‐91,28],  num:10})

Slide 63

Slide 63 text

MONGO DB ❖ Supports Euclidian (default) and spherical models var  earthRadius  =  6378;  /*  km  */ var  range  =  30  /  earthRadius;  /*  to  radians  */ db.places.find({loc:  {$nearSphere:  [20,50],  $maxDistance:  range}}) distances  =  db.runCommand({geoNear:  "places",  near:  [20,  50],          spherical:  true,  maxDistance:  range}).results; pointDistance  =  distances[0].dis  *  earthRadius  //  back  to  km

Slide 64

Slide 64 text

MONGO DB ❖ Bounded queries: circle, bounding box, polygon (>= 1.9) ❖ Note that bounding box is specified via bottom left/top right corners box  =  [[-­‐73.99756,  40.73083],  [-­‐73.988135,  40.741404]] db.places.find({loc:  {"$within":  {"$box"  :  box}}})

Slide 65

Slide 65 text

MONGO DB ❖ Limitations ‣ only 1 geospatial index per collection ‣ some query limitations with multi-location docs ‣ somewhat awkward and inconsistent query syntax ‣ sharding on geo keys doesn’t quite work yet

Slide 66

Slide 66 text

MYSQL ❖ Based on the OpenGIS model ❖ Supports points, lines, polygons ❖ Implemented using R-trees

Slide 67

Slide 67 text

MYSQL ❖ Creating ❖ Inserting data CREATE  TABLE  geom  (g  GEOMETRY  NOT  NULL,  SPATIAL  INDEX(g))   ENGINE=MyISAM; INSERT  INTO  geom  VALUES  (GeomFromText('POINT(1  1)')) or //  not  OpenGIS  standard INSERT  INTO  geom  VALUES(Point(1,1));

Slide 68

Slide 68 text

MYSQL ❖ Querying ❖ Only MyISAM tables support SPATIAL indices ❖ Sphinx search engine supports geo-distance search as well SELECT  MBRContains(   GeomFromText(  'POLYGON((0  0,0  3,3  3,3  0,0  0))'  ),   coord )  from  geom;

Slide 69

Slide 69 text

ELASTICSEARCH ❖ Mapping type geo_point ❖ Implemented via geohashes ❖ Takes string, array, or object, or geohash representations on insert ❖ Uses GeoJSON ordering

Slide 70

Slide 70 text

ELASTICSEARCH ❖ Mapping specification {        "mydoc"  :  {                "properties"  :  {                        "location"  :  {                                "type"  :  "geo_point"                        }                }        } }

Slide 71

Slide 71 text

ELASTICSEARCH ❖ Inserting curl  -­‐XPUT  localhost:9200/myindex/mydoc/1  -­‐d'{      "location"  :  "-­‐83,47" }' curl  -­‐XPUT  localhost:9200/myindex/mydoc/2  -­‐d'{      "location"  :  {"lat":  47,  "lon":  -­‐83} }'

Slide 72

Slide 72 text

ELASTICSEARCH ❖ Can query by distance, bounding box, or polygon ❖ Does not support rich OpenGIS models, lines ❖ Bounding box is specified via top left/bottom right corners

Slide 73

Slide 73 text

ELASTICSEARCH {        "filtered"  :  {                "query"  :  {                        "match_all"  :  {}                },                "filter"  :  {                        "geo_distance"  :  {                                "distance"  :  "200km",                                "pin.location"  :  {                                        "lat"  :  40,                                        "lon"  :  -­‐70 }}}}}

Slide 74

Slide 74 text

ELASTICSEARCH ❖ geo + text = power ‣ Find all businesses with ‘pizza delivery’ in the description within 5 miles from my location and segment the results by 1, 2, and 5 miles

Slide 75

Slide 75 text

POSTGRES ❖ PostGIS = spatial extensions to Postgres ❖ Follows OpenGIS standard ❖ Backend spatial database for geographic information systems (GIS)

Slide 76

Slide 76 text

MORE…

Slide 77

Slide 77 text

DATASETS ❖ http://freegisdata.rtwilson.com/ ❖ Neighborhoods ‣ http://www.zillow.com/howto/api/neighborhood-boundaries.htm ❖ Streets ‣ http://www.census.gov/geo/www/tiger/ ‣ OpenStreetMap ❖ Other ‣ Flickr shapefiles ‣ Yahoo GeoPlanet data

Slide 78

Slide 78 text

APIS/SERVICES ❖ SimpleGeo -> UrbanAirship ❖ Web Maps Studio ❖ GeoAPI ❖ POIs: Google Places, Foursquare, Factual

Slide 79

Slide 79 text

APIS/SERVICES ❖ http://www.earthtools.org/webservices.htm ❖ http://www.geonames.org/export/web-services.html ❖ http://www.infochimps.com/products/geo

Slide 80

Slide 80 text

MAPPING ❖ Mapbox, CartoDB, TileMill, OSM, Mapnik, Leaflet, Polymaps ❖ Route planner: ‣ http://graphserver.github.com/graphserver/

Slide 81

Slide 81 text

DANK U WEL! joind.in/6285