Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Hash-Based Sharding in MongoDB 2.4 (MongoDB SF 2013)

Hash-Based Sharding in MongoDB 2.4 (MongoDB SF 2013)

Presentation on the hash-based sharding features of MongoDB given at MongoDB SF 2013.

Brandon Black

May 10, 2013
Tweet

More Decks by Brandon Black

Other Decks in Programming

Transcript

  1. Agenda •  Mechanics of Sharding –  Key space –  Chunks

    –  Balancing •  Request Routing •  Hashed Shard Keys –  Why use hashed shard keys –  How to enable hashed shard keys –  Limitations
  2. Sharded Cluster Node 1 Secondary Config Server Node 1 Secondary

    Config Server Node 1 Secondary Config Server Shard Shard Shard Mongos App Server Mongos App Server Mongos App Server
  3. What Is A Shard Key? •  Shard key is used

    to partition your collection •  Shard key must exist in every document •  Shard key is immutable •  Shard key values are immutable •  Shard key must be indexed •  Shard key is used to route requests to shards
  4. The Key Space {x: 10}! -∞ +∞ Key Space {x:

    -5}! {x: -9}!{x: 7}!{x: 6}!{x: 0}!
  5. -∞ +∞ Key Space Inserting Data {x: 0}! {x: 6}!

    {x: 7}! {x: -5}! {x: 10}! {x: -9}!
  6. -∞ +∞ Key Space Inserting Data {x: 0}! {x: 6}!

    {x: 7}! {x: -5}! {x: 10}! {x: -9}!
  7. Chunk Range and Size minKey maxKey minKey maxKey minKey maxKey

    {x: -20} {x: 13} {x: 25} {x: 100,000} 64MB -∞ +∞ Key Space {x: 0}! {x: 6}! {x: 7}! {x: -5}! {x: 10}! {x: -9}!
  8. Inserting Further Data -∞ +∞ Key Space {x: 0}! {x:

    6}! {x: 7}! {x: -5}! {x: 10}! {x: -9}! {x: 9}! {x: -7}! {x: 3}!
  9. Chunk Splitting minKey maxKey minKey 13 14 maxKey -∞ +∞

    Key Space {x: 0}! {x: 6}! {x: 7}! {x: -5}! {x: 10}! {x: -9}! 0 0 •  A chunk is split once it exceeds the maximum size •  There is no split point if all documents have the same shard key •  Chunk split is a logical operation (no data is moved) •  If split creates too large of a discrepancy of chunk count across cluster a balancing round starts
  10. Data Distribution Node 1 Secondary Config Server Shard 1 Mongos

    Mongos Mongos Shard 2 Mongod •  MinKey to 0 lives on Shard1 •  0 to MaxKey lives on Shard2 •  Mongos routes queries appropriately
  11. Mongos Routes Data Node 1 Secondary Config Server Shard 1

    Mongos Mongos Mongos Shard 2 Mongod minKey à 0 0 à maxKey db.test.insert({ x: -1000 })!
  12. Mongos Routes Data Node 1 Secondary Config Server Shard 1

    Mongos Mongos Mongos Shard 2 Mongod minKey à 0 0 à maxKey db.test.insert({ x: -1000 })!
  13. Unbalanced Shards Node 1 Secondary Config Server Shard 1 Mongos

    Mongos Mongos Shard 2 Mongod minKey à 0 0 à maxKey
  14. Balancing Node 1 Secondary Config Server Shard 1 Mongos Mongos

    Mongos Shard 2 Mongod •  Migration threshold •  Number of chunks less than 20, migration threshold of 2 •  21-80, migration threshold 4 •  >80, migration threshold 8
  15. Moving the chunk Node 1 Secondary Config Server Mongos Shard

    1 Shard 2 Mongod •  One chunk of data is copied from Shard 1 to Shard 2
  16. Committing Migration •  Once everyone agrees the data has moved,

    that chunk gets deleted from Shard 1. Node 1 Secondary Config Server Mongos Shard 1 Shard 2 Mongod
  17. Cleanup •  Other mongos' have to find out about new

    configuration Node 1 Secondary Config Server Shard 1 Shard 2 Mongod Mongos Mongos Mongos
  18. Effects of Migrations •  Expensive •  Can take a long

    time •  Competes for limited resources Node 1 Secondary Config Server Shard 1 Shard 2 Mongod Mongos Mongos Mongos
  19. Picking A Shard Key •  Cardinality •  Optimize routing • 

    Minimize (unnecessary) traffic •  Allow best scaling
  20. What About ObjectId? ObjectId("51597ca8e28587b86528edfd”)! •  Used for _id •  12

    byte value •  Generated by the driver if not specified •  Theoretically globally unique
  21. // enabling sharding on test database mongos> sh.enableSharding("test") { "ok"

    : 1 } // sharding the test collection mongos> sh.shardCollection("test.test",{_id:1}) { "collectionsharded" : "test.test", "ok" : 1 } // create a loop inserting data mongos> for (x=0; x<10000; x++) { ... db.test.insert({value:x}) ... } Sharding on ObjectId
  22. shards: { "_id" : "shard0000", "host" : "localhost:30000" } {

    "_id" : "shard0001", "host" : "localhost:30001" } databases: { "_id" : "test", "partitioned" : true, "primary" : "shard0001" } test.test shard key: { "_id" : 1 } chunks: shard0001 3 { "_id" : { "$minKey" : 1 } } -->> { "_id" : ObjectId(”...") } on : shard0001 { "t" : 1000, "i" : 1 } { "_id" : ObjectId(”...”) } -->> { "_id" : { "$maxKey" : 1 } } on : shard0001 { "t" : 1000, "i" : 2 } ObjectId Chunk Distribution
  23. ObjectId Results In A “Hot Shard” Node 1 Secondary Config

    Server Shard 1 Mongos Mongos Mongos Shard 2 Mongod minKey à 0 0 à maxKey
  24. Hashed Shard Key Eliminates “Hot Shard” Node 1 Secondary Config

    Server Shard 1 Mongos Mongos Mongos Shard 2 Mongod minKey à 0 0 à maxKey
  25. Under the Hood •  Create a hashed index used for

    sharding •  Uses the first 64-bits of md5 hash of field •  Hash both data and BSON type •  Represented as a NumberLong in the shell
  26. // hash on 1 as an integer > db.runCommand({_hashBSONElement:1}) {

    "key" : 1, "seed" : 0, "out" : NumberLong("5902408780260971510"), "ok" : 1 } // hash on “1” as a string > db.runCommand({_hashBSONElement:"1"}) { "key" : "1", "seed" : 0, "out" : NumberLong("-2448670538483119681"), "ok" : 1 } Hash on both data and BSON type
  27. // enabling sharding on test database mongos> sh.enableSharding("test") { "ok"

    : 1 } // shard by hashed _id field mongos> sh.shardCollection("test.hash”,{_id:"hashed"}) { "collectionsharded" : "test.hash", "ok" : 1 } Sharding on Hashed ObjectId
  28. databases: { "_id" : "test", "partitioned" : true, "primary" :

    "shard0001" } test.hash shard key: { "_id" : "hashed" } chunks: shard0000 2 shard0001 2 { "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-4611686018427387902") } on : shard0000 { "t" : 2000, "i" : 2 } { "_id" : NumberLong("-4611686018427387902") } --> { "_id" : NumberLong(0) } on : shard0000 { "t" : 2000, "i" : 3 } { "_id" : NumberLong(0) } -->> { "_id" : NumberLong("4611686018427387902") } on : shard0001 { "t" : 2000, "i" : 4 } { "_id" : NumberLong("4611686018427387902") } -->> { "_id" : { "$maxKey" : 1 } } on : shard0001 { "t" : 2000, "i" : 5 } Pre-Splitting the Data
  29. // create a loop inserting data mongos> for (x=0; x<10000;

    x++) { ... db.hash.insert({value:x}) ... } Inserting Into Hashed Shard Key Collection
  30. test.hash shard key: { "_id" : "hashed" } chunks: shard0000

    4 shard0001 4 {"_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-7374407069602479355") } on : shard0000 { "t" : 2000, "i" : 8} {"_id" : NumberLong("-7374407069602479355") } -->> { "_id" : NumberLong("-4611686018427387902") } on : shard0000 { "t" : 2000, "i" : 9} {"_id" : NumberLong("-4611686018427387902") } -->> { "_id" : NumberLong("-2456929743513174890") } on : shard0000 { "t" : 2000, "i" : 6} {"_id" : NumberLong("-2456929743513174890") } -->> { "_id" : NumberLong(0) } on : shard0000 { "t" : 2000, "i" : 7} { "_id" : NumberLong(0) } -->> { "_id" : NumberLong("1483539935376971743") } on : shard0001 { "t" : 2000, "i" : 12} Even Distribution of Chunks
  31. Hash Keys Are Great for Equality Queries •  Equality queries

    directed to a specific shard •  Will use the index •  Most efficient query possible
  32. mongos> db.hash.find({x:1}).explain() { "cursor" : "BtreeCursor x_hashed", "n" : 1,

    "nscanned" : 1, "nscannedObjects" : 1, "millisShardTotal" : 0, "numQueries" : 1, "numShards" : 1, "indexBounds" : { "x" : [ [ NumberLong("5902408780260971510"), NumberLong("5902408780260971510") ] ] }, "millis" : 0 } Explain Plan of an Equality Query
  33. Not So Good for a Range Query •  Range queries

    scatter gather •  Don’t use the index •  Inefficient query
  34. mongos> db.hash.find({x:{$gt:1, $lt:99}}).explain() { "cursor" : "BasicCursor", "n" : 97,

    "nChunkSkips" : 0, "nYields" : 0, "nscanned" : 1000, "nscannedAllPlans" : 1000, "nscannedObjects" : 1000, "nscannedObjectsAllPlans" : 1000, "millisShardTotal" : 0, "millisShardAvg" : 0, "numQueries" : 2, "numShards" : 2, "millis" : 3 } Explain Plan of a Range Query
  35. Limitations •  Cannot use a compound key •  Key cannot

    have an array value •  Incompatible with tag aware sharding –  Tags would be assigned the value of the hash, not the value of the underlying key •  Key with poor cardinality is going to give a hash with poor cardinality –  Floating point numbers are squashed. E.g. 100.4 will be hashed as 100
  36. Summary •  There are 3 different approaches for sharding • 

    Hash shard keys give great distribution •  Hash shard keys are good for equality •  Pick the right shard key for your application