
Mapping Flatland - Grant Goodale, Massively Fun

mongodb
January 06, 2012


MongoSeattle 2011


Transcript

  1. Mapping Flatland: Storing and Querying Location Data with MongoDB

    Grant Goodale (@ggoodale), MongoSEA 2011 (#mongosea), December 1, 2011. Friday, January 6, 2012
  2. Why MongoDB?

    Easy to set up and run. Fast. Reasonably robust node.js driver (much better now). Geospatial indexing and querying.
  3. Structuring your data

    tile = { _id : BSON::ObjectId(...), position : [0,0], letter : "A", wildcard : false }
  4. Structuring your data

    tile = { _id : BSON::ObjectId(...), position : {x: 0, y: 0}, letter : "A", wildcard : false }
  5. Watch your language

    > db['tiles'].insert({ :position => {:y => 50, :x => 20}, :letter => "A", :wildcard => false })
    => BSON::ObjectId('4dd06d037a70183256000004')
    > db['tiles'].find_one
    => {"_id"=>BSON::ObjectId('4dd06d037a70183256000004'), "letter"=>"A", "position"=>{"x"=>20, "y"=>50}, "wildcard"=>false}
  6. Be safe! Use array notation; guaranteed ordering = WIN

    C++: BSONObjBuilder
    Ruby: use 1.9.x, or OrderedHash in 1.8.x
    Python: use OrderedDict (introduced in 2.7) and SON (in the bson package)
    JavaScript: Did I mention arrays?
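A minimal plain-Ruby sketch of the ordering problem, assuming (as MongoDB documents for legacy coordinate pairs) that a 2d index takes the first two values of the location in stored order and ignores the key names. indexed_pair and to_xy are hypothetical helpers, not driver APIs. On Ruby 1.9+, where hashes keep insertion order, a position written as {:y => 50, :x => 20} would be indexed as (50, 20).

```ruby
# Hypothetical helper: mimics a 2d index reading a location by taking
# the first two values in storage order, ignoring the key names.
def indexed_pair(position)
  values = position.is_a?(Hash) ? position.values : position
  values.first(2)
end

# Hypothetical helper: normalize a hash location to [x, y] array notation
# so the stored order no longer depends on how the hash was written.
def to_xy(position)
  return position.first(2) if position.is_a?(Array)
  x = position.fetch(:x) { position.fetch("x") }
  y = position.fetch(:y) { position.fetch("y") }
  [x, y]
end

p indexed_pair({:y => 50, :x => 20})         # [50, 20]: x and y swapped
p indexed_pair(to_xy({:y => 50, :x => 20}))  # [20, 50]
```

Converting to [x, y] once, before the document reaches the driver, is the array-notation advice in executable form.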
  7. Creating the index

    > db['tiles'].create_index([["position", Mongo::GEO2D]])
    => "position_2d"
    > db['tiles'].index_information
    => {"_id_"=>{"name"=>"_id_", "ns"=>"test.test_tiles", "key"=>{"_id"=>1}}, "position_2d"=>{"key"=>{"position"=>"2d"}, "ns"=>"test.test_tiles", "name"=>"position_2d"}}

    Defaults: min: -180, max: 180, bits: 26
  8. Creating the index

    > db['tiles'].create_index([["position", Mongo::GEO2D]], :min => -500, :max => 500, :bits => 32)
    => "position_2d"
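To see what min, max and bits control, here is a simplified sketch of the geohash idea behind a 2d index: each coordinate is scaled into one of 2^bits cells between min and max, and the x and y bits are interleaved. This illustrates the general technique under that assumption, not MongoDB's exact implementation; geohash_bits and cell_size are hypothetical names. A 26-bit setting yields a 52-character hash.

```ruby
# Simplified geohash sketch (not MongoDB's exact code): scale each coordinate
# into an integer cell index with `bits` bits, then interleave the bits, x first.
def geohash_bits(x, y, min: -180.0, max: 180.0, bits: 26)
  unless [x, y].all? { |v| v >= min && v < max }
    raise ArgumentError, "point #{[x, y].inspect} outside [#{min}, #{max})"
  end
  cells = 2**bits
  # Map a coordinate to its cell index, clamped against float rounding at max.
  to_cell = lambda do |v|
    [((v - min) * cells / (max - min).to_f).floor, cells - 1].min
  end
  xi = to_cell.call(x)
  yi = to_cell.call(y)
  (bits - 1).downto(0).map { |i| "#{(xi >> i) & 1}#{(yi >> i) & 1}" }.join
end

# Width of one cell along each axis at full precision.
def cell_size(min: -180.0, max: 180.0, bits: 26)
  (max - min) / 2.0**bits
end

p geohash_bits(90, -90, bits: 2)                  # "1011"
p cell_size                                       # default grid resolution
p cell_size(min: -500, max: 500, bits: 32)        # the slide's custom settings
```

Points outside [min, max) are rejected here, roughly mirroring the server refusing out-of-range points, which is why a world larger than ±180 needs custom :min and :max.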
  9. More index fun

    Only one Geo2D index per collection (SERVER-2331), but it can be a compound index:
    > db['tiles'].create_index([["position", Mongo::GEO2D], ["letter", Mongo::ASCENDING]])
    => "position_2d_letter_1"
    Queries are prefix-matched on indexes, so put the Geo2D field first (or use hinting).
  10. New 2.0 feature: Geo2D indices across an array field!

    > db['words'].insert({"word" => "QI", "tiles" => [{"letter" => "Q", "position" => [1,1]}, {"letter" => "I", "position" => [2,1]}]})
    => BSON::ObjectId('4dd074927a70183256000006')
    > db['words'].create_index([["tiles.position", Mongo::GEO2D]])
    => "tiles.position_2d"
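With a 2d index on tiles.position, a region query matches a word when any one of its tiles falls inside the region. A plain-Ruby sketch of that multikey behavior, using a box as the region and assuming edges count as inside; in_box? and words_touching_box are hypothetical helpers, and the "ZA" word is invented sample data.

```ruby
# Multikey matching: with a 2d index on "tiles.position", a word matches a
# region query when ANY one of its tiles lies inside the region.
def in_box?(point, box)
  (x1, y1), (x2, y2) = box
  point[0].between?(x1, x2) && point[1].between?(y1, y2)
end

def words_touching_box(words, box)
  words.select do |word|
    word["tiles"].any? { |tile| in_box?(tile["position"], box) }
  end
end

# "QI" is the document from the slide; "ZA" is a hypothetical second word.
words = [
  {"word" => "QI", "tiles" => [{"letter" => "Q", "position" => [1, 1]},
                               {"letter" => "I", "position" => [2, 1]}]},
  {"word" => "ZA", "tiles" => [{"letter" => "Z", "position" => [10, 10]},
                               {"letter" => "A", "position" => [11, 10]}]}
]

# Only the I tile of "QI" is inside this box, which is enough to match.
p words_touching_box(words, [[1.5, 0], [3, 3]]).map { |w| w["word"] }  # ["QI"]
```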
  11. Problems we don’t have

    Projection issues. Great-circle distance calculation. Polar coordinate systems. Pirates. (Image: http://www.flickr.com/photos/jmd41280/4501410061/)
  12. Querying real location data

    Search by proximity: $near. Uses native units (degrees for [-180, 180]). Use $maxDistance to bound the query.
    > db['tiles'].find(:position => {"$near" => [10,10]}).to_a
    => [{"_id"=>BSON::ObjectId('4dd084ca7a70183256000007'), "letter"=>"A", "position"=>[12,9]}]
    > db['tiles'].find(:position => {"$near" => [10,10], "$maxDistance" => 1}).to_a
    => []
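A brute-force model of what $near computes on a flat 2d index: plain Euclidean distance in the index's native units, sorted nearest first and optionally cut off by $maxDistance. near is a hypothetical helper for illustration; the server walks the geohash index rather than scanning every document. It reproduces the slide: [12,9] is about 2.24 units from [10,10], so a $maxDistance of 1 returns nothing.

```ruby
# Brute-force model of $near / $maxDistance in native (flat) units:
# pair each document with its Euclidean distance, filter, sort nearest first.
def near(docs, center, max_distance: nil)
  with_dist = docs.map do |doc|
    dx = doc["position"][0] - center[0]
    dy = doc["position"][1] - center[1]
    [doc, Math.sqrt(dx * dx + dy * dy)]
  end
  with_dist = with_dist.select { |_, d| d <= max_distance } if max_distance
  with_dist.sort_by { |_, d| d }
end

tiles = [{"letter" => "A", "position" => [12, 9]}]
p near(tiles, [10, 10]).map { |doc, d| [doc["position"], d.round(3)] }  # [[[12, 9], 2.236]]
p near(tiles, [10, 10], max_distance: 1)                                 # []
```

Returning the distance alongside each document is what the geoNear command adds over $near (its "dis" field).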
  13. Querying real location data

    Need the distance to the center as well? Use the geoNear command. It also includes fun stats.
    > db.command('geoNear' => 'tiles', 'near' => [1830, 2002], :maxDistance => 10)
    => {"ns"=>"test.tiles", "near"=>"1100000000000011000110001100101010100010000010111111", "results"=>[{"dis"=>3.999471664428711, "obj"=>{"_id"=>BSON::ObjectId('4dd0b0957a701852bc02bf67'), "position"=>{"x"=>1830, "y"=>2006}, "letter"=>"A"}}], "stats"=>{"time"=>0, "btreelocs"=>3, "nscanned"=>2, "objectsLoaded"=>1, "avgDistance"=>3.999471664428711, "maxDistance"=>3.999471664428711}, "ok"=>1.0}
  14. Querying real location data

    Region queries: $within. Example: $box (rectangle, given as lower-left and upper-right corners):
    > db['tiles'].find(:position => {"$within" => {"$box" => [[5,5], [30,30]]}}).to_a
    => [{"_id"=>BSON::ObjectId('4dd084ca7a70183256000007'), "letter"=>"A", "position"=>[12,9]}]
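A plain-Ruby model of the $box test, assuming points exactly on an edge count as inside; within_box? is a hypothetical helper, and the server answers the same question from the geohash index rather than point by point.

```ruby
# Model of {"$within" => {"$box" => [lower_left, upper_right]}} in native
# units. Assumption: points exactly on an edge count as inside.
def within_box?(point, box)
  (x1, y1), (x2, y2) = box
  x, y = point
  x.between?(x1, x2) && y.between?(y1, y2)
end

p within_box?([5, 5], [[0, 0], [10, 10]])   # true
p within_box?([12, 9], [[0, 0], [10, 10]])  # false: x is outside
```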
  15. Querying real location data

    Alternately: $center (circle, given as center and radius):
    > db['tiles'].find(:position => {"$within" => {"$center" => [[10,10], 5]}}).to_a
    => [{"_id"=>BSON::ObjectId('4dd084ca7a70183256000007'), "letter"=>"A", "position"=>[12,9]}]
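The $center test is a flat distance comparison in native units. A minimal sketch (within_center? is a hypothetical helper) that reproduces the slide's match: [12,9] is sqrt(5) units from [10,10], well inside a radius of 5.

```ruby
# Model of {"$within" => {"$center" => [center, radius]}}: flat Euclidean
# distance in native units, compared against the radius.
def within_center?(point, circle)
  (cx, cy), radius = circle
  Math.hypot(point[0] - cx, point[1] - cy) <= radius
end

p within_center?([12, 9], [[10, 10], 5])  # true: distance is sqrt(5), about 2.24
p within_center?([12, 9], [[10, 10], 2])  # false
```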
  16. Querying real location data

    New in 2.0: $polygon!
    > db['tiles'].find(:position => {"$within" => {"$polygon" => [[5,5], [5,25], [25,5]]}}).to_a
    => [{"_id"=>BSON::ObjectId('4dd084ca7a70183256000007'), "letter"=>"A", "position"=>[12,9]}]
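Point-in-polygon can be modeled with the standard even-odd (ray-casting) rule: cast a horizontal ray from the point and count how many polygon edges it crosses. This is a textbook sketch, not necessarily the algorithm the server uses, and points exactly on the boundary are not handled specially; within_polygon? is a hypothetical name.

```ruby
# Even-odd (ray casting) point-in-polygon test. Vertices are [x, y] pairs;
# the polygon is implicitly closed (last vertex connects back to the first).
def within_polygon?(point, vertices)
  x, y = point
  inside = false
  vertices.each_with_index do |(x1, y1), i|
    x2, y2 = vertices[i - 1] # previous vertex; i = 0 wraps to the last one
    # Does this edge straddle the horizontal line through the point?
    if (y1 > y) != (y2 > y)
      # x coordinate where the edge crosses that line
      cross_x = x1 + (y - y1) * (x2 - x1).to_f / (y2 - y1)
      inside = !inside if x < cross_x # crossing lies to the right of the point
    end
  end
  inside
end

square = [[0, 0], [0, 10], [10, 10], [10, 0]]
p within_polygon?([5, 5], square)   # true
p within_polygon?([15, 5], square)  # false
```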
  17. Querying real location data

    Spherical equivalents: $nearSphere and $centerSphere. These use radians, not native units, and position must be in [long, lat] order!
    > earthRadius = 6378 # km
    => 6378
    > db['restaurants'].find(:position => {"$nearSphere" => [-122.03, 36.97], "$maxDistance" => 25.0/earthRadius}).to_a
    => [{"_id"=>BSON::ObjectId('4dd084ca7a70183256000007'), "name"=>"Crow’s Nest", "position"=>[-122.0, 36.96]}]
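A sketch of the unit conversion on this slide, plus a great-circle distance (computed here with the haversine formula) to sanity-check the result. It assumes the slide's 6378 km Earth radius; km_to_radians and haversine_radians are hypothetical helpers, and the server's own spherical math may differ in detail. Note the [long, lat] order throughout.

```ruby
EARTH_RADIUS_KM = 6378.0 # equatorial radius, as on the slide

# $nearSphere / $centerSphere distances are in radians: divide km by the radius.
def km_to_radians(km)
  km / EARTH_RADIUS_KM
end

# Great-circle angle in radians between two [long, lat] points given in degrees.
def haversine_radians(a, b)
  lon1, lat1 = a.map { |d| d * Math::PI / 180 }
  lon2, lat2 = b.map { |d| d * Math::PI / 180 }
  h = Math.sin((lat2 - lat1) / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin((lon2 - lon1) / 2)**2
  2 * Math.asin(Math.sqrt(h))
end

query = [-122.03, 36.97]
crows_nest = [-122.0, 36.96]
angle = haversine_radians(query, crows_nest)
p((angle * EARTH_RADIUS_KM).round(1)) # about 2.9 (km)
p angle <= km_to_radians(25.0)        # true: within the 25 km bound
```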
  18. MapReduce

    MapReduce queries can use Geo2D indices when querying data. Great for regional analytics: ‘What events did user x trigger within this region?’ ‘Which users visited this region in the last 24 hours?’
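The MapReduce job itself runs as JavaScript on the server, with the region condition in its query so the Geo2D index can narrow the input. As a plain-Ruby sketch of the same shape on hypothetical in-memory events (Event, in_box? and visitors are illustrative names, not driver APIs): filter by region and time, map each event to [user, 1], then reduce by summing per user.

```ruby
# Hypothetical in-memory events: user id, [x, y] position, timestamp.
Event = Struct.new(:user, :position, :at)

def in_box?(point, box)
  (x1, y1), (x2, y2) = box
  point[0].between?(x1, x2) && point[1].between?(y1, y2)
end

# "Which users visited this region in the last 24 hours?"
def visitors(events, box, now)
  since = now - 24 * 60 * 60
  recent = events.select { |e| e.at >= since && in_box?(e.position, box) } # query
  emitted = recent.map { |e| [e.user, 1] }                                 # map
  emitted.group_by(&:first).transform_values { |pairs| pairs.sum { |_, n| n } } # reduce
end

now = Time.now
events = [Event.new("ann", [1, 1], now - 60), Event.new("bob", [50, 50], now - 60)]
p visitors(events, [[0, 0], [10, 10]], now) # {"ann"=>1}
```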
  19. Alternate Geometries

    Really “Regular Plane Tilings by Regular Polygons”. Or “Edge-to-Edge Tilings by Congruent Polygons”. Really.
  20. Gotchas

    Non-uniform distances between adjacent ranks. Example: $within => [-1,-1], [1,1]
    [Figure: hex grid with cells labeled by (column,row): row -1 from -1,-1 to 2,-1; row 0 from -2,0 to 2,0; row 1 from -1,1 to 2,1]
  22. Gotchas

    Non-uniform distances between adjacent ranks. Example: $within => [-1,-1], [1,1] (same hex-grid figure as slide 20). Oops.
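To see the gotcha concretely, assume the tiles use "odd-r" offset coordinates, with odd rows shifted right by half a cell (the slides do not name the layout, so this is an assumption). A square box from [-1,-1] to [1,1] around [0,0] covers nine cells, but only seven are the center and its six hex neighbors. hex_neighbors and box_cells are hypothetical helpers.

```ruby
# Assumed layout: "odd-r" offset coordinates, [column, row], where odd rows
# are shifted right by half a cell. Neighbor offsets depend on row parity.
EVEN_ROW_NEIGHBORS = [[1, 0], [0, -1], [-1, -1], [-1, 0], [-1, 1], [0, 1]].freeze
ODD_ROW_NEIGHBORS  = [[1, 0], [1, -1], [0, -1], [-1, 0], [0, 1], [1, 1]].freeze

def hex_neighbors(col, row)
  offsets = row.even? ? EVEN_ROW_NEIGHBORS : ODD_ROW_NEIGHBORS
  offsets.map { |dc, dr| [col + dc, row + dr] }
end

# Every integer cell a square box query would return.
def box_cells(lower_left, upper_right)
  (lower_left[1]..upper_right[1]).flat_map do |row|
    (lower_left[0]..upper_right[0]).map { |col| [col, row] }
  end
end

box = box_cells([-1, -1], [1, 1])
ring = hex_neighbors(0, 0) + [[0, 0]]
p box.size   # 9
p box - ring # [[1, -1], [1, 1]]: inside the box, but not adjacent
```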
  23. Gotchas

    The query engine assumes a regular square grid (possibly mapped onto a sphere using a standard sinusoidal projection). If you’re using non-square region units, expect to perform secondary processing on the results.
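Secondary processing can be as simple as re-checking each returned cell with a true hex distance. A sketch, again assuming "odd-r" offset coordinates, that converts to cube coordinates (a standard hex-grid technique) where distance is the largest per-axis difference; offset_to_cube, hex_distance and refine are hypothetical names.

```ruby
# Assumed layout: "odd-r" offset coordinates [column, row] (odd rows shifted
# right). Convert to cube coordinates, where hex distance is easy to compute.
def offset_to_cube(col, row)
  q = col - (row - (row & 1)) / 2
  [q, row, -q - row]
end

def hex_distance(a, b)
  offset_to_cube(*a).zip(offset_to_cube(*b)).map { |i, j| (i - j).abs }.max
end

# Secondary processing: keep only the cells truly within `radius` steps.
def refine(candidates, center, radius)
  candidates.select { |cell| hex_distance(cell, center) <= radius }
end

# Candidates as a square box query from [-1,-1] to [1,1] would return them.
candidates = [-1, 0, 1].product([-1, 0, 1]).map { |row, col| [col, row] }
p refine(candidates, [0, 0], 1).size # 7: the center plus its six neighbors
```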
  24. Again: we’re weird.

    Big index, but no need for it all to be in memory. Large numbers of tiny documents. Large regions of the world where activity approaches 0 as density approaches 1. Single-box scaling limit is determined by the number of active sections of the world at a time.
  25. Our setup

    Master/slave (nowadays: use a replica set). Slaves used for backup and map image generation. Next stop (at some point): geoSharding.
  26. Sharding

    Yes, you can shard on a geo-indexed field. Not recommended due to query performance (SERVER-1982); vote it up if you care (and you should). You can’t use $near in queries, only the geoNear command, and therefore runCommand() (SERVER-1981).