
Geospatial Indexing with MongoDB

mongodb
June 29, 2011

An introduction to geospatial querying support in MongoDB, along with a tour of some of the new features introduced in 1.9.

Transcript

  1. Next 45 minutes • Intro to geospatial indexing in MongoDB

    • More advanced Geo queries >> 1.7 ◦ Spherical Queries >> 1.9 ◦ Polygon Queries ◦ Multi-location documents • Tips / FAQs / gotchas / kitchen sink throughout
  2. What's it for? "Find all other players near me" "Find

    all downtown businesses" • Uses standard collections • BSON documents have "location" objects: db.players.insert({ user : "gmoney$", lvl : 42, pos : [ 30.1, 40.2 ] }) db.biz.insert({ name : "Leonardo's Pizza", loc : [ -32.5, 66.7 ] })
  3. Other locations May also use: BSON : { x :

    30.1, y : 40.2 } BSON : { long : 30.1, lat : 40.2 } BSON : { a : 30.1, b : 40.2 } Java : new BasicDBObject().append( "x", 30.1 ).append( "y", 40.2 ); JavaScript : ... just use arrays
  4. Indexing Non-geo index: coll.ensureIndex({ pos : 1 }) // fast

    exact & range Geo index: coll.ensureIndex({ pos : "2d" }) // fast nearby & shape ( other fields can be added too! )
  5. Find using Geo indexes > db.ex.drop() > for( var i

    = 0; i < 100; i++) ... db.ex.insert({ pos : [ i % 10, ... Math.floor( i / 10 ) ] }) > db.ex.ensureIndex({ pos : "2d" }) > db.ex.find({ pos : { $near : [5, 5] } }, ... { _id : 0 }).limit(5) { "pos" : [ 5, 5 ] } { "pos" : [ 5, 4 ] } { "pos" : [ 4, 5 ] } { "pos" : [ 5, 6 ] } { "pos" : [ 6, 5 ] }
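To make the ordering concrete, here is a plain JavaScript sketch (not MongoDB code) that rebuilds the same 100-point grid in memory and sorts it by flat Euclidean distance from [5, 5]. The helper names `buildGrid` and `nearestFlat` are illustrative assumptions; a real 2d index exists precisely to avoid this kind of full scan.

```javascript
// Rebuild the slide's example data: pos = [ i % 10, floor(i / 10) ] for i in 0..99.
function buildGrid() {
  const points = [];
  for (let i = 0; i < 100; i++) {
    points.push([i % 10, Math.floor(i / 10)]);
  }
  return points;
}

// Brute-force stand-in for $near: sort by flat distance, keep `limit` results.
function nearestFlat(points, center, limit) {
  const dist = (p) => Math.hypot(p[0] - center[0], p[1] - center[1]);
  return points
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => dist(a) - dist(b))
    .slice(0, limit);
}

const nearest = nearestFlat(buildGrid(), [5, 5], 5);
console.log(nearest);
```

The first result is [5, 5] itself, followed by its four direct neighbours at distance 1. The relative order of equidistant points is not significant.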
  6. [Figure: 10 × 10 grid of points]
  7. Caveats • Two-dimensional • One 2d index per collection •

    Locations specified as sub-doc or array • Ordering in points must be consistent ... and be careful of your hashes • Spherical queries have additional caveats ... we'll get there
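The ordering caveat can be illustrated without a database. The sketch below assumes, as the slide implies, that a location sub-document is read by position (first value, then second) and that key names are ignored. `toPair` is a hypothetical helper for illustration, not a MongoDB API.

```javascript
// Read a location the way the caveat describes: by position, not by key name.
function toPair(loc) {
  const values = Array.isArray(loc) ? loc : Object.values(loc);
  return [values[0], values[1]];
}

const a = toPair({ long: 30.1, lat: 40.2 }); // [30.1, 40.2]
const b = toPair({ lat: 40.2, long: 30.1 }); // [40.2, 30.1], axes swapped
const c = toPair([30.1, 40.2]);              // arrays carry their order explicitly

console.log(a, b, c);
```

Two sub-documents with the same keys but a different key order end up as different points under this assumption, which is one reason to prefer arrays.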
  8. Query types Queries with distance returned: db.runCommand({ geoNear : ...

    }) Queries ordered by distance: position : [ $near, $nearSphere ] Queries within bounds, no ordering: position : { $within : [ $center, $box, $polygon, $centerSphere ] } Whenever possible, go for unordered queries.
  9. > db.ex.find({ pos : { $within : { $center :

    [ [ 5, 5 ], 2 ] } } }, { _id : 0 }) { "pos" : [ 5, 5 ] } { "pos" : [ 5, 4 ] } { "pos" : [ 5, 3 ] } { "pos" : [ 4, 5 ] } { "pos" : [ 3, 5 ] } { "pos" : [ 4, 4 ] } { "pos" : [ 4, 6 ] } { "pos" : [ 5, 6 ] } { "pos" : [ 5, 7 ] } { "pos" : [ 6, 4 ] } { "pos" : [ 6, 5 ] } { "pos" : [ 7, 5 ] } // order not preserved { "pos" : [ 6, 6 ] }
  10. [Figure: 10 × 10 grid of points]
  11. > db.ex.find({ pos : { $within : { $box :

    [[5, 5], [6, 6]] } } }, { _id : 0 }) { "pos" : [ 5, 5 ] } { "pos" : [ 5, 6 ] } { "pos" : [ 6, 5 ] } { "pos" : [ 6, 6 ] } // $within is inclusive
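The $center and $box results above can be reproduced with a small in-memory sketch. This is plain JavaScript over the same grid, not MongoDB code; `withinCenter` and `withinBox` are assumed names, and both treat the boundary as inside to mirror the "inclusive" note on the slide.

```javascript
// Same 10 x 10 example grid as the slides: pos = [ i % 10, floor(i / 10) ].
const grid = Array.from({ length: 100 }, (_, i) => [i % 10, Math.floor(i / 10)]);

// Flat-plane stand-in for $within / $center; the boundary counts as inside.
function withinCenter(points, center, radius) {
  return points.filter(
    (p) => Math.hypot(p[0] - center[0], p[1] - center[1]) <= radius
  );
}

// Flat-plane stand-in for $within / $box, given lower-left and upper-right corners.
function withinBox(points, lowerLeft, upperRight) {
  return points.filter(
    (p) =>
      p[0] >= lowerLeft[0] && p[0] <= upperRight[0] &&
      p[1] >= lowerLeft[1] && p[1] <= upperRight[1]
  );
}

const inCircle = withinCenter(grid, [5, 5], 2);
const inBox = withinBox(grid, [5, 5], [6, 6]);
console.log(inCircle.length, inBox.length); // 13 4
```

Results come back in scan order rather than distance order, which matches the "no ordering" behaviour of bounds queries.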
  12. [Figure: 10 × 10 grid of points]
  13. > db.ex.find({ pos : { $within : { $polygon :

    [[3, 4], [5, 7], [7, 3]] } } }, { _id : 0 }) // Polygons are inclusive and may be concave // Implicitly closed { "pos" : [ 5, 4 ] } { "pos" : [ 5, 5 ] } { "pos" : [ 4, 5 ] } { "pos" : [ 4, 4 ] } { "pos" : [ 3, 4 ] } { "pos" : [ 5, 6 ] } { "pos" : [ 5, 7 ] } { "pos" : [ 6, 4 ] } { "pos" : [ 7, 3 ] } { "pos" : [ 6, 5 ] }
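As a rough illustration of an inclusive, implicitly closed polygon test, the sketch below uses a standard ray-casting check plus an explicit on-edge check. It is an assumption about how such a test can be written, not MongoDB's actual implementation; `onSegment` and `inPolygon` are illustrative names.

```javascript
// True when point p lies exactly on the segment a-b (exact for integer input).
function onSegment(p, a, b) {
  const cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]);
  if (cross !== 0) return false;
  return (
    Math.min(a[0], b[0]) <= p[0] && p[0] <= Math.max(a[0], b[0]) &&
    Math.min(a[1], b[1]) <= p[1] && p[1] <= Math.max(a[1], b[1])
  );
}

// Ray casting; the last vertex is joined back to the first (implicit closing).
function inPolygon(p, poly) {
  let inside = false;
  for (let i = 0, j = poly.length - 1; i < poly.length; j = i++) {
    const a = poly[i];
    const b = poly[j];
    if (onSegment(p, a, b)) return true; // boundary is inclusive
    const straddles = (a[1] > p[1]) !== (b[1] > p[1]);
    if (straddles && p[0] < ((b[0] - a[0]) * (p[1] - a[1])) / (b[1] - a[1]) + a[0]) {
      inside = !inside;
    }
  }
  return inside;
}

const triangle = [[3, 4], [5, 7], [7, 3]];
const gridPoints = Array.from({ length: 100 }, (_, i) => [i % 10, Math.floor(i / 10)]);
const hits = gridPoints.filter((p) => inPolygon(p, triangle));
console.log(hits.length); // 10
```

The ten matches include the three vertices themselves, which is what "inclusive" means here. Ray casting also works for concave polygons.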
  14. [Figure: 10 × 10 grid of points]
  15. Query types Queries with distance returned: db.runCommand({ geoNear : ...,

    spherical : true }) Queries ordered by distance: position : [ $near, $nearSphere ] Queries within bounds, no ordering: position : { $within : [ $center, $box, $polygon, $centerSphere ] }
  16. > db.zips.findOne({ zip : "95054" }) { "_id" : ObjectId("4da8d46aa981973d8ef5cf01"), "city"

    : "SANTA CLARA", "zip" : "95054", "loc" : [ -121.95394, // LONGITUDE 37.394673 // LATITUDE ], "pop" : 10370, "state" : "CA" }
  17. Spherical Step 0 Latitude and longitude are angles... 1° longitude

    != 1° latitude != X miles or km also worth remembering... x² + y² != z² and finally... $maxDistance : 1 // × ( Earth radius ) $maxDistance ≈ 6371 km
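The claim that one degree of longitude is not one degree of latitude can be checked with a few lines of arithmetic. The sketch below assumes a perfectly spherical Earth with a mean radius of 6371 km; the function names are illustrative.

```javascript
const EARTH_RADIUS_KM = 6371; // mean radius; an approximation

// Length of one degree of latitude (north-south), constant on a sphere.
function kmPerDegreeLat() {
  return EARTH_RADIUS_KM * (Math.PI / 180);
}

// Length of one degree of longitude (east-west) shrinks with cos(latitude).
function kmPerDegreeLng(latDegrees) {
  return kmPerDegreeLat() * Math.cos(latDegrees * (Math.PI / 180));
}

console.log(kmPerDegreeLat().toFixed(1));      // about 111.2
console.log(kmPerDegreeLng(37.39).toFixed(1)); // at roughly Santa Clara's latitude
console.log(kmPerDegreeLng(89).toFixed(1));    // close to the pole
```

At the equator the two lengths agree; at 60° latitude a degree of longitude covers about half the distance, which is why flat x/y arithmetic on degrees drifts away from real distances.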
  18. > var mult = 3963 * (Math.PI / 180) //

    Scale to miles on Earth (very rough approximation) > db.runCommand({'geoNear': 'stops', 'near': [-122.419088, 37.75689], distanceMultiplier: mult }) { "ns" : "bart.stops", "near" : "0100100111000100111110110100100100110100110101001110", "results" : [ { "dis" : 0.31465508413025545, "obj" : { "_id" : ObjectId("4daa1892ff76e9c85c180f30"), "zone_id" : 45, "stop_name" : "24th St. Mission BART", "stop_geo" : [ -122.418292, // LONGITUDE 37.752411 // LATITUDE ] } } } ...
  19. > var mult = 3963 // Scale to miles on

    Earth (better) > db.runCommand({'geoNear': 'stops', 'near': [-122.419088, 37.75689], distanceMultiplier: mult, spherical : true }) { "ns" : "bart.stops", "near" : "0100100111000100111110110100100100110100110101001110", "results" : [ { "dis" : 0.3128890255156881 , "obj" : { "_id" : ObjectId("4daa1892ff76e9c85c180f30"), "zone_id" : 45, "stop_name" : "24th St. Mission BART", "stop_geo" : [ -122.418292, 37.752411 ] } } } ...
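The gap between the two `dis` values can be approximated outside the database. The sketch below compares a flat calculation (degrees treated as plane coordinates) with a haversine great-circle calculation for the same two points. The haversine formula is a common choice and is an assumption here; MongoDB's internal formula may differ in the last digits.

```javascript
const EARTH_RADIUS_MILES = 3963; // same multiplier as the slides
const toRad = (deg) => deg * (Math.PI / 180);

// Flat model: treat degrees as plane coordinates, then scale degrees to miles.
function flatMiles(a, b) {
  const degrees = Math.hypot(a[0] - b[0], a[1] - b[1]);
  return toRad(degrees) * EARTH_RADIUS_MILES;
}

// Spherical model: haversine great-circle distance; points are [lng, lat].
function sphericalMiles(a, b) {
  const dLat = toRad(b[1] - a[1]);
  const dLng = toRad(b[0] - a[0]);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a[1])) * Math.cos(toRad(b[1])) * Math.sin(dLng / 2) ** 2;
  const radians = 2 * Math.asin(Math.sqrt(h));
  return radians * EARTH_RADIUS_MILES;
}

const query = [-122.419088, 37.75689];
const stop = [-122.418292, 37.752411]; // 24th St. Mission BART
console.log(flatMiles(query, stop).toFixed(3));
console.log(sphericalMiles(query, stop).toFixed(3));
```

Both results land close to the values reported on the slides (roughly 0.315 miles flat, 0.313 miles spherical). The difference is small at this range and grows with distance and latitude.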
  20. Why radians? Usually used for Earth, but avoids hard-coding Could

    create a spherical "scrabble world" or "mmorpg world" with a globe of any size
  21. Square queries in a curved world What does it mean

    in MongoDB to do something like : > db.ex.insert({ coord : [ -122.418292, 37.752411 ] }) > var dist = 50 /* miles */ / 69 /* mi per degree */ > db.ex.ensureIndex({ coord : "2d" }) > db.ex.find({ coord : { $within : { $box : [[-123 - dist, 37 - dist], [-122 + dist, 38 + dist]] } } }) { "_id" : ObjectId("4dd847f06332a7d659189366"), "coord" : [ -122.418292, 37.752411 ] }
  22. > db.ex.insert({ coord : [ -122.418292, 37.752411 ] }) >

    var dist = 50 /* miles */ / 69 /* mi per degree */ > db.ex.ensureIndex({ coord : "2d" }) > db.ex.find({ coord : { $within : { $center : [[-123, 37], dist ] } } }) { "_id" : ObjectId("4dd847f06332a7d659189366"), "coord" : [ -122.418292, 37.752411 ] }
  23. > db.ex.insert({ coord : [ -122.418292, 37.752411 ] }) >

    db.ex.ensureIndex({ coord : "2d" }) > db.ex.find({ coord : { $within : { $polygon : [ [-125, 30], [-122, 40], [-120, 30] ] } } }) { "_id" : ObjectId("4dd847f06332a7d659189366"), "coord" : [ -122.418292, 37.752411 ] }
  24. Basically... • Non-spherical queries assume flat earth - use with

    care • Longitude and latitude are angles • Distances in spherical queries are entered and returned in radians • No wrapping at the poles or the date line: such queries are detected and an error is thrown
  25. > // Appalachian trail > db.ex.insert({ user : "greg", lastSeenAt

    : [ [ -84.19871, 34.61768 ], [ -84.19268, 34.62946 ], [ -84.13631, 34.66603 ] ] }) > db.ex.insert({ user : "bigfoot", lastSeenAt : [ [ -83.93743, 34.74002 ], [ -84.19268, 34.62946 ] ] }) > db.ex.ensureIndex({ lastSeenAt : "2d" })
  26. Multi-location $within > // Find all who passed near a

    station > db.ex.find({ lastSeenAt : { $within : { $centerSphere : [ [-83.8245, 34.77624], 30 / 3963 /* radians */ ] } } }, { _id : 0, lastSeenAt : 0 }) { "user" : "bigfoot" } { "user" : "greg" }
  27. Multi-location (geo)$near > // Find all who passed near a

    station > db.ex.find({ lastSeenAt : { $nearSphere : [-83.8245, 34.77624], $maxDistance : 30 / 3963 /* radians */ } }, { _id : 0, lastSeenAt : 0 }) { "user" : "bigfoot" } { "user" : "greg" } { "user" : "bigfoot" } // same distance { "user" : "greg" } // same distance { "user" : "greg" }
  28. To remember... • Ordered queries ( $near, $nearSphere, geoNear )

    return 1 doc per location { "user" : "bigfoot" } { "user" : "greg" } { "user" : "bigfoot" } { "user" : "greg" } { "user" : "greg" } • Unordered queries ( $within ) return unique documents { "user" : "bigfoot" } { "user" : "greg" }
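The difference between the two result shapes can be mimicked in memory with the same two documents. This is a sketch under assumptions: haversine distances, a 3963-mile radius, and plain arrays instead of a collection; none of the helper names come from MongoDB.

```javascript
const RADIUS_MILES = 3963;
const rad = (deg) => deg * (Math.PI / 180);

// Haversine distance in radians between two [lng, lat] points.
function sphereRadians(a, b) {
  const h =
    Math.sin(rad(b[1] - a[1]) / 2) ** 2 +
    Math.cos(rad(a[1])) * Math.cos(rad(b[1])) * Math.sin(rad(b[0] - a[0]) / 2) ** 2;
  return 2 * Math.asin(Math.sqrt(h));
}

const docs = [
  { user: "greg", lastSeenAt: [[-84.19871, 34.61768], [-84.19268, 34.62946], [-84.13631, 34.66603]] },
  { user: "bigfoot", lastSeenAt: [[-83.93743, 34.74002], [-84.19268, 34.62946]] },
];
const station = [-83.8245, 34.77624];
const maxRadians = 30 / RADIUS_MILES;

// Ordered ($nearSphere-like): one entry per matching location, sorted by distance.
const ordered = docs
  .flatMap((d) => d.lastSeenAt.map((loc) => ({ user: d.user, dis: sphereRadians(station, loc) })))
  .filter((hit) => hit.dis <= maxRadians)
  .sort((x, y) => x.dis - y.dis);

// Unordered ($within-like): a document matches once if any location is in range.
const unordered = docs.filter((d) =>
  d.lastSeenAt.some((loc) => sphereRadians(station, loc) <= maxRadians)
);

console.log(ordered.map((hit) => hit.user)); // five entries, one per location
console.log(unordered.map((d) => d.user));   // two entries, one per document
```

If unique documents are needed from an ordered query, the caller has to de-duplicate the results (for example by `_id`).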
  29. ... also ... • Multi-locations are not paths, but can

    approximate paths depending on segment length. • Documents with many locations perform better with $within queries, as do single-location documents
  30. Other embeddings > db.employees.findOne() { "employee" : "johndoe", "reg_addrs" :

    [ { "name" : "Home", "coord" : [ -84.19268, 34.62946 ] }, { "name" : "Office", "coord" : [ -86.01932, 37.88322 ] } ] } > db.employees.ensureIndex({ "reg_addrs.coord" : "2d" })
  31. Last 40 minutes... • Intro to geospatial indexing in MongoDB

    • More advanced Geo queries >> 1.7 ◦ Spherical Queries >> 1.9 ◦ Polygon Queries ◦ Multi-location documents
  32. Add a Geo index, find via index > db.stops.ensureIndex({'stop_geo': '2d'})

    > // Find the BART station closest to the Foreign Cinema > db.stops.find({ 'stop_geo': {'$near': [-122.419088, 37.75689]}}).limit(1) { "_id" : ObjectId("4daa1892ff76e9c85c180f30"), "stop_lat" : 37.752411, "zone_id" : 45, "stop_lon" : -122.418292, "stop_url" : "http://www.bart.gov/stations/24TH/", "stop_id" : "24TH", "stop_name" : "24th St. Mission BART", "stop_geo" : [ -122.418292, 37.75241 ] }
  33. Bounds queries • Used to find items $within a shape

    • Use $box for rectangles: ◦ Must specify the lower-left and upper-right corners > box = [[-73.99756, 40.73083], [-73.988135, 40.741404]] > db.places.find({"loc" : {"$within" : {"$box" : box}}}) • Use $center for circles: > var center = [50, 50] > var radius = 10 > db.places.find({"loc" : {"$within" : {"$center" : [center, radius]}}})
  34. Bounds Queries (Find within a shape) > // Number of

    people living within (roughly) 100 miles > // of the Empire State Building. > // Second argument to $center is the radius. > var q = {'$within': {'$center': [[-73.985656, 40.748433], 100 / (3963 * (Math.PI / 180))]}} > var sum = 0 > var cur = db.zips.find({'loc': q}) > for( var i = 0; i < cur.length(); i++ ){ ... sum += cur[i].pop; ... } 23266107
  35. Spherical queries • New in 1.7.0 (officially released in 1.8.0)

    • Calculate accurate spherical distances • $nearSphere or $centerSphere ◦ $boxSphere doesn't really make sense :-) • For geoNear add 'spherical: true' option • There are some caveats...
  36. Spherical queries • Points must be stored in decimal degrees

    • Must be specified in longitude / latitude order • All distances returned in radians ◦ Multiply by Earth's radius to get distance in useful units ▪ e.g. ~6371 km or ~3963 miles ◦ Divide by Earth's radius for maxDistance • Doesn't handle wrapping at poles or the transition from -180° to +180° longitude
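The multiply/divide rule for radians fits in two small helpers. This is a sketch with assumed names (`toRadians`, `fromRadians`) and the approximate radii quoted on the slide; neither helper is part of MongoDB.

```javascript
const EARTH_RADIUS = { km: 6371, miles: 3963 }; // approximate mean radii

// Surface distance -> radians, e.g. for $maxDistance in spherical queries.
function toRadians(distance, unit) {
  return distance / EARTH_RADIUS[unit];
}

// Radians returned by a spherical query -> distance in the chosen unit.
function fromRadians(radians, unit) {
  return radians * EARTH_RADIUS[unit];
}

const maxDistance = toRadians(100, "miles"); // same as 100 / 3963
console.log(maxDistance);
console.log(fromRadians(1, "km")); // 6371: one radian spans one Earth radius
```

Using a different radius in the same two helpers would adapt the conversion to a globe of any size.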
  37. Spherical example with geoNear > db.runCommand( { 'geoNear': 'stops','near': [-122.419088,

    37.75689], distanceMultiplier: 3963, spherical: true}) // distanceMultiplier is just Earth's radius since distance is in radians { "ns" : "bart.stops", "near" : "0100100111000100111110110100100100110100110101001110", "results" : [ { "dis" : 0.31284408378898926, // 0.31465508413025545 in non-spherical example "obj" : { "_id" : ObjectId("4daa1892ff76e9c85c180f30"), "stop_lat" : 37.752411, "zone_id" : 45, "stop_lon" : -122.418292, "stop_url" : "http://www.bart.gov/stations/24TH/", "stop_id" : "24TH", "stop_name" : "24th St. Mission BART", "stop_geo" : [ -122.418292, 37.752411 ] } },
  38. Another spherical example... > // Number of people living within 100

    miles of the > // Empire State Building (again). > var sum = 0 > var maxdist = 100 / 3963 > // limit(100) is default if not specified... > var cur = db.zips.find({'loc': {'$nearSphere': [-73.985656, 40.748433], $maxDistance: maxdist}}).limit(60000) > for(var i = 0; i < cur.length(); i++){ ... sum += cur[i].pop; ... } 26452212 > // Was 23266107 in our previous non-spherical attempt.
  39. Coming in MongoDB 1.9 • Multi-location Documents > db.places.insert({ addresses

    : [ { name : "Home", loc : [55.5, 42.3] }, { name : "Work", loc : [32.3, 44.2] } ] }) > db.places.ensureIndex({ "addresses.loc" : "2d" }) • Map/Reduce and Group with Geo > var map_func = function(){ emit(this.state, this.pop); } > var reduce_func = function(key, values){ return Array.sum(values); } > var q = {'loc': {'$near': [-73.985656, 40.748433]}} > db.runCommand({mapreduce: 'zips', map: map_func, reduce: reduce_func, out: {inline : 1}, query: q}) • Simple polygon searches We’re hiring!