Geospatial Indexing with MongoDB

mongodb
June 29, 2011

An introduction to geospatial query support in MongoDB, along with a tour of some of the new features introduced in 1.9.

Transcript

  1. Next 45 minutes

    • Intro to geospatial indexing in MongoDB
    • More advanced Geo queries
      ◦ Spherical Queries (>> 1.7)
      ◦ Polygon Queries (>> 1.9)
      ◦ Multi-location documents (>> 1.9)
    • Tips / FAQs / gotchas / kitchen sink throughout
  2. What's it for?

    "Find all other players near me"
    "Find all downtown businesses"
    • Uses standard collections
    • BSON documents have "location" objects:
        db.players.insert({ user : "gmoney$", lvl : 42, pos : [ 30.1, 40.2 ] })
        db.biz.insert({ name : "Leonardo's Pizza", loc : [ -32.5, 66.7 ] })
  3. Other locations

    May also use:
        BSON : { x : 30.1, y : 40.2 }
        BSON : { long : 30.1, lat : 40.2 }
        BSON : { a : 30.1, b : 40.2 }
        Java : new BasicDBObject().append( "x", 30.1 ).append( "y", 40.2 );
        Javascript : ... just use arrays
  4. Indexing

    Non-geo index:
        coll.ensureIndex({ pos : 1 }) // fast exact & range
    Geo index:
        coll.ensureIndex({ pos : "2d" }) // fast nearby & shape
    ( other fields can be added too! )
  5. Find using Geo indexes

    > db.ex.drop()
    > for( var i = 0; i < 100; i++)
    ...     db.ex.insert({ pos : [ i % 10,
    ...         Math.floor( i / 10 ) ] })
    > db.ex.ensureIndex({ pos : "2d" })
    > db.ex.find({ pos : { $near : [5, 5] } },
    ...     { _id : 0 }).limit(5)
    { "pos" : [ 5, 5 ] }
    { "pos" : [ 5, 4 ] }
    { "pos" : [ 4, 5 ] }
    { "pos" : [ 5, 6 ] }
    { "pos" : [ 6, 5 ] }
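To make the $near result above easier to reason about, here is a minimal plain-JavaScript sketch that reproduces it by brute force, with no MongoDB involved. The helper name nearestPoints is hypothetical, and sorting every point by distance is only a stand-in for what the 2d index does far more efficiently.

```javascript
// Brute-force stand-in for a $near query on the 10 x 10 grid.
// nearestPoints is a hypothetical helper, not a MongoDB API.
function nearestPoints(points, target, limit) {
  const dist = (p) => Math.hypot(p[0] - target[0], p[1] - target[1]);
  // Copy before sorting so the caller's array is left untouched.
  return points
    .slice()
    .sort((a, b) => dist(a) - dist(b))
    .slice(0, limit);
}

// Same data as the shell loop: pos = [ i % 10, Math.floor(i / 10) ].
const grid = [];
for (let i = 0; i < 100; i++) {
  grid.push([i % 10, Math.floor(i / 10)]);
}

const nearest = nearestPoints(grid, [5, 5], 5);
console.log(nearest);
```

The first result is the query point itself and the other four lie at distance 1; the order among those four ties is not significant.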
  6. [Figure: 10 × 10 grid of points]
  7. Caveats

    • Two-dimensional
    • One 2d index per collection
    • Locations specified as sub-doc or array
    • Ordering in points must be consistent
      ... and be careful of your hashes
    • Spherical queries have additional caveats
      ... we'll get there
  8. Query types

    Queries with distance returned:
        db.runCommand({ geoNear : ... })
    Queries ordered by distance:
        position : [ $near, $nearSphere ]
    Queries within bounds, no ordering:
        position : { $within : [ $center, $box, $polygon, $centerSphere ] }
    Whenever possible, go for unordered queries.
  9. > db.ex.find({ pos : { $within : { $center : [ [ 5, 5 ], 2 ] } } }, { _id : 0 })

    { "pos" : [ 5, 5 ] }
    { "pos" : [ 5, 4 ] }
    { "pos" : [ 5, 3 ] }
    { "pos" : [ 4, 5 ] }
    { "pos" : [ 3, 5 ] }
    { "pos" : [ 4, 4 ] }
    { "pos" : [ 4, 6 ] }
    { "pos" : [ 5, 6 ] }
    { "pos" : [ 5, 7 ] }
    { "pos" : [ 6, 4 ] }
    { "pos" : [ 6, 5 ] }
    { "pos" : [ 7, 5 ] } // order not preserved
    { "pos" : [ 6, 6 ] }
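The $center result above includes points exactly 2 units away, such as [ 5, 3 ] and [ 7, 5 ], so the circle is inclusive. A small plain-JavaScript sketch of that test follows; withinCenter is a hypothetical helper used only for illustration.

```javascript
// Inclusive circle test on a flat plane: keep points whose distance
// from the center is <= radius. withinCenter is a hypothetical helper.
function withinCenter(points, center, radius) {
  return points.filter(
    (p) => Math.hypot(p[0] - center[0], p[1] - center[1]) <= radius
  );
}

// Rebuild the 10 x 10 grid used in the shell examples.
const grid = [];
for (let i = 0; i < 100; i++) {
  grid.push([i % 10, Math.floor(i / 10)]);
}

const inCircle = withinCenter(grid, [5, 5], 2);
console.log(inCircle.length); // 13, the same number of documents as above
```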
  10. [Figure: 10 × 10 grid of points]
  11. > db.ex.find({ pos : { $within : { $box : [[5, 5], [6, 6]] } } }, { _id : 0 })

    { "pos" : [ 5, 5 ] }
    { "pos" : [ 5, 6 ] }
    { "pos" : [ 6, 5 ] }
    { "pos" : [ 6, 6 ] }
    // $within is inclusive
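The inclusive $box behaviour can be sketched the same way. This is an illustration in plain JavaScript under the assumption that the box is given as lower-left and upper-right corners; withinBox is a hypothetical helper, not a MongoDB API.

```javascript
// Inclusive rectangle test: both corners and all four edges match.
// withinBox is a hypothetical helper for illustration only.
function withinBox(points, lowerLeft, upperRight) {
  return points.filter(
    (p) =>
      p[0] >= lowerLeft[0] && p[0] <= upperRight[0] &&
      p[1] >= lowerLeft[1] && p[1] <= upperRight[1]
  );
}

const grid = [];
for (let i = 0; i < 100; i++) {
  grid.push([i % 10, Math.floor(i / 10)]);
}

const inBox = withinBox(grid, [5, 5], [6, 6]);
console.log(inBox.length); // 4: the box corners themselves are matched
```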
  12. [Figure: 10 × 10 grid of points]
  13. > db.ex.find({ pos : { $within : { $polygon : [[3, 4], [5, 7], [7, 3]] } } }, { _id : 0 })

    // Polygons are inclusive and may be concave
    // Implicitly closed
    { "pos" : [ 5, 4 ] }
    { "pos" : [ 5, 5 ] }
    { "pos" : [ 4, 5 ] }
    { "pos" : [ 4, 4 ] }
    { "pos" : [ 3, 4 ] }
    { "pos" : [ 5, 6 ] }
    { "pos" : [ 5, 7 ] }
    { "pos" : [ 6, 4 ] }
    { "pos" : [ 7, 3 ] }
    { "pos" : [ 6, 5 ] }
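An inclusive, implicitly closed polygon test can be sketched with ray casting plus an explicit boundary check. This is a generic textbook technique written in plain JavaScript, not necessarily how MongoDB implements $polygon; the helper names are hypothetical, and the exact zero comparison in onSegment is only safe for integer coordinates like this grid.

```javascript
// True when p lies on the segment a-b, which makes edges and vertices inclusive.
function onSegment(p, a, b) {
  const cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]);
  if (cross !== 0) return false; // exact test: fine for integer coordinates
  return (
    Math.min(a[0], b[0]) <= p[0] && p[0] <= Math.max(a[0], b[0]) &&
    Math.min(a[1], b[1]) <= p[1] && p[1] <= Math.max(a[1], b[1])
  );
}

// Ray casting: count how many edges a ray going right from p crosses.
// The j index wraps around, so the polygon is implicitly closed.
function withinPolygon(p, polygon) {
  let inside = false;
  for (let i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
    const a = polygon[i];
    const b = polygon[j];
    if (onSegment(p, a, b)) return true;
    const straddles = (a[1] > p[1]) !== (b[1] > p[1]);
    if (straddles) {
      const xAtY = a[0] + ((p[1] - a[1]) * (b[0] - a[0])) / (b[1] - a[1]);
      if (p[0] < xAtY) inside = !inside;
    }
  }
  return inside;
}

const grid = [];
for (let i = 0; i < 100; i++) {
  grid.push([i % 10, Math.floor(i / 10)]);
}

const triangle = [[3, 4], [5, 7], [7, 3]];
const inPolygon = grid.filter((p) => withinPolygon(p, triangle));
console.log(inPolygon.length); // 10, the same number of documents as above
```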
  14. [Figure: 10 × 10 grid of points]
  15. Query types

    Queries with distance returned:
        db.runCommand({ geoNear : ..., spherical : true })
    Queries ordered by distance:
        position : [ $near, $nearSphere ]
    Queries within bounds, no ordering:
        position : { $within : [ $center, $box, $polygon, $centerSphere ] }
  16. > db.zips.findOne({ zip : "95054" })

    {
        "_id" : ObjectId("4da8d46aa981973d8ef5cf01"),
        "city" : "SANTA CLARA",
        "zip" : "95054",
        "loc" : [
            -121.95394, // LONGITUDE
            37.394673 // LATITUDE
        ],
        "pop" : 10370,
        "state" : "CA"
    }
  17. Spherical Step 0

    Latitude and longitude are angles...
        1° longitude != 1° latitude != X miles or km
    also worth remembering...
        x² + y² != z²
    and finally...
        $maxDistance : 1 // × ( Earth radius )
        $maxDistance ≈ 6371 km
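Since a $maxDistance of 1 means one Earth radius, converting between everyday units and radians is just a division or multiplication by that radius. A minimal sketch, assuming the rough radii quoted elsewhere in this deck (3963 miles, 6371 km); the function names are hypothetical.

```javascript
// Rough Earth radii as used in this deck; both are approximations.
const EARTH_RADIUS_MILES = 3963;
const EARTH_RADIUS_KM = 6371;

// Distance on the surface -> angle in radians (for $maxDistance).
function milesToRadians(miles) {
  return miles / EARTH_RADIUS_MILES;
}
function kmToRadians(km) {
  return km / EARTH_RADIUS_KM;
}

// Angle in radians -> distance on the surface (for returned distances).
function radiansToMiles(radians) {
  return radians * EARTH_RADIUS_MILES;
}

console.log(milesToRadians(100)); // about 0.0252 radians
console.log(radiansToMiles(1));   // 3963: one radian is one Earth radius
```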
  18. > var mult = 3963 * (Math.PI / 180) // Scale to miles on Earth (very rough approximation)

    > db.runCommand({'geoNear': 'stops', 'near': [-122.419088, 37.75689],
    ...     distanceMultiplier: mult })
    {
        "ns" : "bart.stops",
        "near" : "0100100111000100111110110100100100110100110101001110",
        "results" : [
            {
                "dis" : 0.31465508413025545,
                "obj" : {
                    "_id" : ObjectId("4daa1892ff76e9c85c180f30"),
                    "zone_id" : 45,
                    "stop_name" : "24th St. Mission BART",
                    "stop_geo" : [
                        -122.418292, // LONGITUDE
                        37.752411 // LATITUDE
                    ]
                }
            }
            ...
  19. > var mult = 3963 // Scale to miles on Earth (better)

    > db.runCommand({'geoNear': 'stops', 'near': [-122.419088, 37.75689],
    ...     distanceMultiplier: mult, spherical : true })
    {
        "ns" : "bart.stops",
        "near" : "0100100111000100111110110100100100110100110101001110",
        "results" : [
            {
                "dis" : 0.3128890255156881,
                "obj" : {
                    "_id" : ObjectId("4daa1892ff76e9c85c180f30"),
                    "zone_id" : 45,
                    "stop_name" : "24th St. Mission BART",
                    "stop_geo" : [ -122.418292, 37.752411 ]
                }
            }
            ...
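The gap between the flat and spherical distances for this pair of points can be reproduced outside the database. The sketch below uses the haversine formula for the spherical case; that is a standard great-circle formula chosen here for illustration, and the deck does not say which formula MongoDB uses internally. Function names are hypothetical.

```javascript
const EARTH_RADIUS_MILES = 3963;
const toRad = (deg) => (deg * Math.PI) / 180;

// Flat: treat [lon, lat] degrees as planar units, then scale degrees to miles.
function flatMiles(a, b) {
  const degrees = Math.hypot(a[0] - b[0], a[1] - b[1]);
  return degrees * EARTH_RADIUS_MILES * (Math.PI / 180);
}

// Spherical: haversine great-circle distance between two [lon, lat] points.
function sphericalMiles(a, b) {
  const dLat = toRad(b[1] - a[1]);
  const dLon = toRad(b[0] - a[0]);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a[1])) * Math.cos(toRad(b[1])) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(h));
}

const queryPoint = [-122.419088, 37.75689];
const station = [-122.418292, 37.752411]; // 24th St. Mission BART

const flat = flatMiles(queryPoint, station);
const spherical = sphericalMiles(queryPoint, station);
console.log(flat, spherical);
```

The flat figure comes out slightly larger because it counts a degree of longitude as being as long as a degree of latitude, which is only true at the equator.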
  20. Why radians?

    • Usually used for Earth, but radians avoid hard-coding a radius
    • Could create a spherical "scrabble world" or "mmorpg world" with a globe of any size
  21. Square queries in a curved world

    What does it mean in MongoDB to do something like:
    > db.ex.insert({ coord : [ -122.418292, 37.752411 ] })
    > var dist = 50 /* miles */ / 69 /* mi per degree */
    > db.ex.ensureIndex({ coord : "2d" })
    > db.ex.find({ coord : { $within : { $box :
    ...     [[-123 - dist, 37 - dist], [-122 + dist, 38 + dist]] } } })
    { "_id" : ObjectId("4dd847f06332a7d659189366"), "coord" : [ -122.418292, 37.752411 ] }
  22. > db.ex.insert({ coord : [ -122.418292, 37.752411 ] })

    > var dist = 50 /* miles */ / 69 /* mi per degree */
    > db.ex.ensureIndex({ coord : "2d" })
    > db.ex.find({ coord : { $within : { $center : [[-123, 37], dist ] } } })
    { "_id" : ObjectId("4dd847f06332a7d659189366"), "coord" : [ -122.418292, 37.752411 ] }
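The "50 miles / 69 miles per degree" conversion only holds north to south. A short sketch of how far the same number of degrees reaches east to west at latitude 37° follows, assuming a spherical Earth of radius 3963 miles; milesPerDegreeLongitude is a hypothetical helper.

```javascript
const EARTH_RADIUS_MILES = 3963;
// One degree of latitude is about 69 miles everywhere on a sphere.
const MILES_PER_DEGREE_LAT = EARTH_RADIUS_MILES * (Math.PI / 180);

// One degree of longitude shrinks with the cosine of the latitude.
function milesPerDegreeLongitude(latDegrees) {
  return MILES_PER_DEGREE_LAT * Math.cos((latDegrees * Math.PI) / 180);
}

const dist = 50 / 69; // degrees, as in the shell examples
const northSouthMiles = dist * MILES_PER_DEGREE_LAT;
const eastWestMiles = dist * milesPerDegreeLongitude(37);

console.log(northSouthMiles); // about 50 miles
console.log(eastWestMiles);   // about 40 miles at latitude 37
```

So a flat "50 mile" box or circle at this latitude is really about 50 miles tall and only about 40 miles wide on each side of its center.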
  23. > db.ex.insert({ coord : [ -122.418292, 37.752411 ] })

    > db.ex.ensureIndex({ coord : "2d" })
    > db.ex.find({ coord : { $within : { $polygon :
    ...     [ [-125, 30], [-122, 40], [-120, 30] ] } } })
    { "_id" : ObjectId("4dd847f06332a7d659189366"), "coord" : [ -122.418292, 37.752411 ] }
  24. Basically...

    • Non-spherical queries assume a flat earth: use with care
    • Longitude and latitude are angles
    • Distances in spherical queries are entered and returned in radians
    • No wrapping at the poles or the date line: it is detected and an error is thrown
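The deck does not say how the wrapping check is done, so the following is only one possible way to test whether a spherical query circle would reach a pole or cross the ±180° meridian. It uses the standard bounding-longitude formula for a spherical cap; capWraps is a hypothetical name and this is not MongoDB's code.

```javascript
const toRad = (deg) => (deg * Math.PI) / 180;

// center is [lon, lat] in degrees, radiusRadians is the angular radius.
// Returns true if the circle touches a pole or crosses the +/-180 meridian.
function capWraps(center, radiusRadians) {
  const lon = toRad(center[0]);
  const lat = toRad(center[1]);
  // The circle reaches a pole when its latitude range leaves [-90, 90] degrees.
  if (lat + radiusRadians >= Math.PI / 2 || lat - radiusRadians <= -Math.PI / 2) {
    return true;
  }
  // Half-width of the circle in longitude; wider than the radius away from the equator.
  const halfWidth = Math.asin(Math.sin(radiusRadians) / Math.cos(lat));
  return lon + halfWidth > Math.PI || lon - halfWidth < -Math.PI;
}

console.log(capWraps([-83.8245, 34.77624], 30 / 3963)); // false
console.log(capWraps([179.9, 0], 100 / 3963));          // true: crosses the date line
console.log(capWraps([0, 89.9], 100 / 3963));           // true: reaches the pole
```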
  25. > // Appalachian trail

    > db.ex.insert({ user : "greg", lastSeenAt : [
    ...     [ -84.19871, 34.61768 ],
    ...     [ -84.19268, 34.62946 ],
    ...     [ -84.13631, 34.66603 ] ] })
    > db.ex.insert({ user : "bigfoot", lastSeenAt : [
    ...     [ -83.93743, 34.74002 ],
    ...     [ -84.19268, 34.62946 ] ] })
    > db.ex.ensureIndex({ lastSeenAt : "2d" })
  26. Multi-location $within

    > // Find all who passed near a station
    > db.ex.find({ lastSeenAt : { $within : { $centerSphere :
    ...     [ [-83.8245, 34.77624], 30 / 3963 /* radians */ ] } } },
    ...     { _id : 0, lastSeenAt : 0 })
    { "user" : "bigfoot" }
    { "user" : "greg" }
  27. Multi-location (geo)$near

    > // Find all who passed near a station
    > db.ex.find({ lastSeenAt : { $nearSphere : [-83.8245, 34.77624],
    ...     $maxDistance : 30 / 3963 /* radians */ } },
    ...     { _id : 0, lastSeenAt : 0 })
    { "user" : "bigfoot" }
    { "user" : "greg" }
    { "user" : "bigfoot" } // same distance
    { "user" : "greg" } // same distance
    { "user" : "greg" }
  28. To remember...

    • Ordered queries ( $near, $nearSphere, geoNear ) return 1 doc per location
        { "user" : "bigfoot" }
        { "user" : "greg" }
        { "user" : "bigfoot" }
        { "user" : "greg" }
        { "user" : "greg" }
    • Unordered queries ( $within ) return unique documents
        { "user" : "bigfoot" }
        { "user" : "greg" }
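The difference between one hit per location and one hit per document can be imitated in plain JavaScript with the two Appalachian trail documents. This is an illustrative sketch, not MongoDB's implementation: distances use the haversine formula, and radiansBetween is a hypothetical helper.

```javascript
const toRad = (deg) => (deg * Math.PI) / 180;

// Haversine angular distance in radians between two [lon, lat] points.
function radiansBetween(a, b) {
  const dLat = toRad(b[1] - a[1]);
  const dLon = toRad(b[0] - a[0]);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a[1])) * Math.cos(toRad(b[1])) * Math.sin(dLon / 2) ** 2;
  return 2 * Math.asin(Math.sqrt(h));
}

const docs = [
  { user: "greg", lastSeenAt: [[-84.19871, 34.61768], [-84.19268, 34.62946], [-84.13631, 34.66603]] },
  { user: "bigfoot", lastSeenAt: [[-83.93743, 34.74002], [-84.19268, 34.62946]] },
];
const station = [-83.8245, 34.77624];
const maxDistance = 30 / 3963; // 30 miles expressed in radians

// Ordered, $nearSphere-style: every matching location yields its own result.
const ordered = [];
for (const doc of docs) {
  for (const loc of doc.lastSeenAt) {
    const dis = radiansBetween(loc, station);
    if (dis <= maxDistance) ordered.push({ user: doc.user, dis });
  }
}
ordered.sort((a, b) => a.dis - b.dis);

// Unordered, $within-style: a document matches once if any location matches.
const unordered = docs
  .filter((doc) => doc.lastSeenAt.some((loc) => radiansBetween(loc, station) <= maxDistance))
  .map((doc) => doc.user);

console.log(ordered.map((r) => r.user)); // five entries, one per location
console.log(unordered);                  // two entries, one per document
```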
  29. ... also ...

    • Multi-locations are not paths, but can approximate paths depending on segment length.
    • Multi-location docs with many locations perform better with $within queries, as do single-location docs.
  30. Other embeddings

    > db.employees.findOne()
    {
        "employee" : "johndoe",
        "reg_addrs" : [
            { "name" : "Home", "coord" : [ -84.19268, 34.62946 ] },
            { "name" : "Office", "coord" : [ -86.01932, 37.88322 ] }
        ]
    }
    > db.employees.ensureIndex({ "reg_addrs.coord" : "2d" })
  31. Last 40 minutes...

    • Intro to geospatial indexing in MongoDB
    • More advanced Geo queries
      ◦ Spherical Queries (>> 1.7)
      ◦ Polygon Queries (>> 1.9)
      ◦ Multi-location documents (>> 1.9)
  32. Add a Geo index, find via index

    > db.stops.ensureIndex({'stop_geo': '2d'})
    > // Find the BART station closest to the Foreign Cinema
    > db.stops.find({ 'stop_geo': {'$near': [-122.419088, 37.75689]}}).limit(1)
    {
        "_id" : ObjectId("4daa1892ff76e9c85c180f30"),
        "stop_lat" : 37.752411,
        "zone_id" : 45,
        "stop_lon" : -122.418292,
        "stop_url" : "http://www.bart.gov/stations/24TH/",
        "stop_id" : "24TH",
        "stop_name" : "24th St. Mission BART",
        "stop_geo" : [ -122.418292, 37.75241 ]
    }
  33. Bounds queries

    • Used to find items $within a shape
    • Use $box for rectangles:
      ◦ Must specify the lower-left and upper-right corners
        > box = [[-73.99756, 40.73083], [-73.988135, 40.741404]]
        > db.places.find({"loc" : {"$within" : {"$box" : box}}})
    • Use $center for circles:
        > var center = [50, 50]
        > var radius = 10
        > db.places.find({"loc" : {"$within" : {"$center" : [center, radius]}}})
  34. Bounds Queries (Find within a shape)

    > // Number of people living within (roughly) 100 miles
    > // of the Empire State Building.
    > // Second argument to $center is the radius.
    > var q = {'$within': {'$center': [[-73.985656, 40.748433],
    ...     100 / (3963 * (Math.PI / 180))]}}
    > var sum = 0
    > var cur = db.zips.find({'loc': q})
    > for( var i = 0; i < cur.length(); i++ ){
    ...     sum += cur[i].pop;
    ... }
    23266107
  35. Spherical queries

    • New in 1.7.0 (officially released in 1.8.0)
    • Calculate accurate spherical distances
    • $nearSphere or $centerSphere
      ◦ $boxSphere doesn't really make sense :-)
    • For geoNear add 'spherical: true' option
    • There are some caveats...
  36. Spherical queries

    • Points must be stored in decimal degrees
    • Must be specified in longitude / latitude order
    • All distances returned in radians
      ◦ Multiply by Earth's radius to get distance in useful units
        ▪ e.g. ~6371 km or ~3963 miles
      ◦ Divide by Earth's radius for maxDistance
    • Doesn't handle wrapping at poles or the transition from -180° to +180° longitude
  37. Spherical example with geoNear

    > db.runCommand( { 'geoNear': 'stops', 'near': [-122.419088, 37.75689],
    ...     distanceMultiplier: 3963, spherical: true})
    // distanceMultiplier is just Earth's radius since distance is in radians
    {
        "ns" : "bart.stops",
        "near" : "0100100111000100111110110100100100110100110101001110",
        "results" : [
            {
                "dis" : 0.31284408378898926, // 0.31465508413025545 in non-spherical example
                "obj" : {
                    "_id" : ObjectId("4daa1892ff76e9c85c180f30"),
                    "stop_lat" : 37.752411,
                    "zone_id" : 45,
                    "stop_lon" : -122.418292,
                    "stop_url" : "http://www.bart.gov/stations/24TH/",
                    "stop_id" : "24TH",
                    "stop_name" : "24th St. Mission BART",
                    "stop_geo" : [ -122.418292, 37.752411 ]
                }
            },
  38. Another spherical example...

    > // Number of people living within 100 miles of the
    > // Empire State Building (again).
    > var sum = 0
    > var maxdist = 100 / 3963
    > // limit(100) is default if not specified...
    > var cur = db.zips.find({'loc': {'$nearSphere': [-73.985656, 40.748433],
    ...     $maxDistance: maxdist}}).limit(60000)
    > for( var i = 0; i < cur.length(); i++ ){
    ...     sum += cur[i].pop;
    ... }
    26452212
    > // Was 23266107 in our previous non-spherical attempt.
  39. Coming in MongoDB 1.9

    • Multi-location Documents
        > db.places.insert({ addresses : [
        ...     { name : "Home", loc : [55.5, 42.3] },
        ...     { name : "Work", loc : [32.3, 44.2] } ] })
        > db.places.ensureIndex({ "addresses.loc" : "2d" })
    • Map/Reduce and Group with Geo
        > var map_func = function(){ emit(this.state, this.pop); }
        > var reduce_func = function(key, values){ return Array.sum(values); }
        > var q = {'loc': {'$near': [-73.985656, 40.748433]}}
        > db.runCommand({mapreduce: 'zips', map: map_func, reduce: reduce_func,
        ...     out: {inline : 1}, query: q})
    • Simple polygon searches

    We're Hiring!
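To show what the map and reduce functions on slide 39 compute, here is a tiny in-memory imitation in plain JavaScript. It is a sketch only: runMapReduce is a hypothetical stand-in for the mapreduce command, emit is exposed as a temporary global the way the map function expects, a plain reduce replaces the shell's Array.sum helper, and every sample document except Santa Clara is made up.

```javascript
// Hypothetical in-memory imitation of a map/reduce run over already-filtered docs.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  // The map function calls a global emit(key, value), as in the shell.
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc); // "this" is the current document
  delete globalThis.emit;

  const out = {};
  for (const [key, values] of groups) out[key] = reduce(key, values);
  return out;
}

const mapFunc = function () { emit(this.state, this.pop); };
const reduceFunc = function (key, values) {
  return values.reduce((total, v) => total + v, 0);
};

// Sample documents: Santa Clara comes from slide 16, the others are invented.
const sampleZips = [
  { city: "SANTA CLARA", state: "CA", pop: 10370 },
  { city: "EXAMPLE A", state: "CA", pop: 5000 },  // hypothetical
  { city: "EXAMPLE B", state: "NY", pop: 20000 }, // hypothetical
];

const popByState = runMapReduce(sampleZips, mapFunc, reduceFunc);
console.log(popByState);
```

In the real command the query option restricts which documents reach the map function; here that step is assumed to have already happened.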