
How and When to Scale MongoDB with Sharding

mongodb
April 20, 2012

MongoDB Stockholm - How and When to Scale MongoDB with Sharding - Spencer Brody, Software Engineer, 10gen

Sharding allows you to distribute load across multiple servers and keep your data balanced across those servers. This session will review MongoDB’s support for auto-sharding, including an architectural overview, usage patterns, and a few example use cases from real-world sharded deployments.

Transcript

  1. Architecture
    [Diagram: clients connect to one or more mongos processes; each mongos routes to the shards, which are made up of mongod processes, and reads from the config servers, which are also mongod processes.]
  2. mongos
    •  Shard router
    •  Acts just like a mongod
    •  1 or as many as you want
    •  Can run on app servers
    •  Caches metadata from config servers
  3. Config Server
    •  3 of them
    •  Changes use two-phase commit
    •  If any are down, metadata goes read-only
    •  System is online as long as 1 of the 3 is up
  4. Keys
    { name: "Jared", email: "[email protected]" }
    { name: "Scott", email: "[email protected]" }
    { name: "Dan", email: "[email protected]" }

    > db.runCommand( { shardcollection: "test.users", key: { email: 1 } } )
  5. Chunks
    Min Key             Max Key             Shard
    -∞                  [email protected]   1
    [email protected]   [email protected]   1
    [email protected]   [email protected]   1
    [email protected]   +∞                  1

    •  Stored in the config servers
    •  Cached in mongos
    •  Used to route requests and keep cluster balanced
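
The routing role of the chunk table can be sketched in a few lines of plain JavaScript. This is only an illustration of a range lookup, not how mongos is implemented: the `chunks` array, the split points, the shard numbers, the `example.com` addresses, and the `findShard` helper are all made up for the example, and each chunk is assumed to cover the half-open range from `min` (inclusive) to `max` (exclusive).

```javascript
// Hypothetical chunk table. null stands in for -infinity / +infinity.
// Split points and shard numbers are invented for this sketch.
const chunks = [
  { min: null, max: "g", shard: 1 },
  { min: "g", max: "n", shard: 2 },
  { min: "n", max: "t", shard: 3 },
  { min: "t", max: null, shard: 4 },
];

// Return the shard whose chunk range [min, max) contains the key.
function findShard(chunkTable, key) {
  for (const chunk of chunkTable) {
    const aboveMin = chunk.min === null || key >= chunk.min;
    const belowMax = chunk.max === null || key < chunk.max;
    if (aboveMin && belowMax) return chunk.shard;
  }
  throw new Error("no chunk covers key " + key);
}

console.log(findShard(chunks, "dan@example.com"));   // 1
console.log(findShard(chunks, "jared@example.com")); // 2
console.log(findShard(chunks, "scott@example.com")); // 3
```

Because the ranges are contiguous and cover everything from -∞ to +∞, every possible shard key value falls in exactly one chunk.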
  6. Balancing
    [Diagram: 48 numbered chunks spread evenly across four shards: chunks 1-12 on Shard 1, 13-24 on Shard 2, 25-36 on Shard 3, and 37-48 on Shard 4. The balancer runs in mongos, next to the three config servers. Callout: "Chunks!"]
  7. Balancing
    [Diagram: Shard 1 holds chunks 1-12, while Shard 2 holds only 21-24, Shard 3 only 33-36, and Shard 4 only 45-48. Callout: "Imbalance"]
  8. Balancing
    [Diagram: same layout as the previous slide. The balancer in mongos issues: "Move chunk 1 to Shard 2"]
  9. Balancing
    [Diagram: chunk 1 no longer appears among Shard 1's chunks; it is drawn separately, apparently in transit to Shard 2.]
  10. Balancing
    [Diagram: same four shards. Callout from the balancer: "Chunk 1 now lives on Shard 2"]
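
The balancing slides can be approximated with a small simulation. This is a sketch under stated assumptions, not the real balancer: `pickMigration`, `balance`, and the `threshold` of 2 are hypothetical, and the policy (move one chunk at a time from the shard with the most chunks to the shard with the fewest) is a simplification. The starting layout is the imbalanced one from the slides.

```javascript
// Build an inclusive range of chunk ids, e.g. range(1, 3) -> [1, 2, 3].
const range = (a, b) => Array.from({ length: b - a + 1 }, (_, i) => a + i);

// Imbalanced layout from the slides: Shard 1 has 12 chunks, the others 4 each.
const cluster = {
  "Shard 1": range(1, 12),
  "Shard 2": range(21, 24),
  "Shard 3": range(33, 36),
  "Shard 4": range(45, 48),
};

// Pick one migration, or null if the cluster is balanced enough.
// threshold is a made-up knob: the minimum chunk-count gap that triggers a move.
function pickMigration(shards, threshold) {
  const names = Object.keys(shards);
  let most = names[0];
  let least = names[0];
  for (const name of names) {
    if (shards[name].length > shards[most].length) most = name;
    if (shards[name].length < shards[least].length) least = name;
  }
  if (shards[most].length - shards[least].length < threshold) return null;
  return { chunk: shards[most][0], from: most, to: least };
}

// Apply migrations one at a time until no more are needed.
function balance(shards, threshold) {
  const moves = [];
  let move;
  while ((move = pickMigration(shards, threshold)) !== null) {
    shards[move.from].shift();
    shards[move.to].push(move.chunk);
    moves.push(move);
  }
  return moves;
}

const moves = balance(cluster, 2);
console.log(moves[0]);     // { chunk: 1, from: 'Shard 1', to: 'Shard 2' }
console.log(moves.length); // 6
```

The first move matches the slides (chunk 1 goes to Shard 2), and after six moves each shard holds six chunks.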
  11. Queries
    •  By shard key: routed
         db.users.find({ email: "[email protected]" })
    •  Sorted by shard key: routed in order
         db.users.find().sort({ email: -1 })
    •  Find by non-shard key: scatter gather
         db.users.find({ state: "CA" })
    •  Sorted by non-shard key: distributed merge sort
         db.users.find().sort({ state: 1 })
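
The four query cases above reduce to two questions: does the filter include the shard key, and if not, which field is the query sorted on? The sketch below encodes just that decision. `classifyQuery` is a hypothetical helper, not part of MongoDB, and it deliberately ignores cases the slide does not cover (range operators, compound shard keys, a filter combined with a sort).

```javascript
// Classify a query the way the slide does, given the collection's shard key.
// filter and sort are plain objects shaped like find() and sort() arguments.
function classifyQuery(shardKey, filter, sort) {
  const filtersOnKey = Object.prototype.hasOwnProperty.call(filter, shardKey);
  if (filtersOnKey) return "routed";

  const sortFields = Object.keys(sort || {});
  if (sortFields.length === 0) return "scatter gather";

  return sortFields[0] === shardKey
    ? "routed in order"
    : "distributed merge sort";
}

console.log(classifyQuery("email", { email: "jared@example.com" }, null)); // routed
console.log(classifyQuery("email", {}, { email: -1 }));                    // routed in order
console.log(classifyQuery("email", { state: "CA" }, null));                // scatter gather
console.log(classifyQuery("email", {}, { state: 1 }));                     // distributed merge sort
```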
  12. Writes
    •  Inserts: requires shard key
         db.users.insert({ name: "Jared", email: "[email protected]" })
    •  Removes
         Routed:    db.users.remove({ email: "[email protected]" })
         Scattered: db.users.remove({ name: "Jared" })
    •  Updates
         Routed:    db.users.update({ email: "[email protected]" }, { $set: { state: "CA" } })
         Scattered: db.users.update({ state: "FZ" }, { $set: { state: "CA" } }, false, true)
  13. Routed Request
    [Diagram: mongos in front of Shard 1, Shard 2, and Shard 3, with arrows numbered to match the steps below.]
    1.  Query arrives at mongos
    2.  mongos routes query to a single shard
    3.  Shard returns results of query
    4.  Results returned to client
  14. Scatter Gather
    [Diagram: mongos in front of Shard 1, Shard 2, and Shard 3, with arrows numbered to match the steps below.]
    1.  Query arrives at mongos
    2.  mongos broadcasts query to all shards
    3.  Each shard returns results for query
    4.  Results combined and returned to client
  15. Distributed Merge Sort
    [Diagram: mongos in front of Shard 1, Shard 2, and Shard 3, with arrows numbered to match the steps below.]
    1.  Query arrives at mongos
    2.  mongos broadcasts query to all shards
    3.  Each shard locally sorts results
    4.  Results returned to mongos
    5.  mongos merge sorts individual results
    6.  Combined sorted result returned to client
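
Steps 3 and 5 of the distributed merge sort can be sketched as follows. This is an illustration in plain JavaScript, not mongos code: the shards are modelled as in-memory arrays of documents, and `mergeSorted` and `distributedSort` are hypothetical names.

```javascript
// Step 5: merge several already-sorted arrays into one sorted array
// by repeatedly taking the smallest head element.
function mergeSorted(shardResults, compare) {
  const positions = shardResults.map(() => 0);
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < shardResults.length; i++) {
      if (positions[i] >= shardResults[i].length) continue; // shard exhausted
      if (
        best === -1 ||
        compare(shardResults[i][positions[i]], shardResults[best][positions[best]]) < 0
      ) {
        best = i;
      }
    }
    if (best === -1) return merged; // every shard exhausted
    merged.push(shardResults[best][positions[best]]);
    positions[best] += 1;
  }
}

// Steps 2-6: each "shard" sorts its own documents, then the router merges.
function distributedSort(shards, field) {
  const compare = (a, b) => (a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0);
  const perShard = shards.map((docs) => [...docs].sort(compare)); // step 3
  return mergeSorted(perShard, compare);                          // step 5
}

const sorted = distributedSort(
  [
    [{ state: "NY" }, { state: "CA" }],
    [{ state: "TX" }, { state: "AZ" }],
    [{ state: "FL" }],
  ],
  "state"
);
console.log(sorted.map((doc) => doc.state)); // [ 'AZ', 'CA', 'FL', 'NY', 'TX' ]
```

The router only ever compares the current head of each shard's result, so it never has to re-sort the full combined result set.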
  16. User Profiles
    { name: "Jared", email: "[email protected]", addresses: [ { state: "CA" } ] }

    •  Shard by email
    •  Lookup by email hits 1 node
    •  Index on { "addresses.state": 1 }
  17. Activity Stream
    { user_id: "[email protected]", event_id: "Logged in", data: "…" }

    •  Shard by user_id
    •  Looking up a stream hits 1 node
    •  Writing is evenly distributed
    •  Index on { "event_id": 1 } for deletes
  18. Photos
    { photo_id: ???, data: BinData(…) }

    •  What’s the right key?
       –  Auto Increment?
       –  MD5( data )
       –  Now() + MD5(data)
       –  Month() + MD5(data)
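
One way to compare the first two candidates is to count where new inserts would land. The sketch below assumes a hypothetical cluster of four shards, each owning one fixed, contiguous quarter of the key space (split on the first hex digit of the key), and uses Node's built-in `crypto` module for MD5. The helper names are invented, and real chunks split and move over time, so this only illustrates the general tendency: sequential keys cluster in one range, hashed keys spread out.

```javascript
const crypto = require("crypto");

// Hypothetical layout: 4 shards, each owning a quarter of the hex key space.
const SHARD_COUNT = 4;

// Map a hex key to a shard index (0..3) using its first hex digit.
function shardForHexKey(hexKey) {
  const firstDigit = parseInt(hexKey[0], 16); // 0..15
  return Math.floor(firstDigit / (16 / SHARD_COUNT));
}

// Candidate 1: auto-increment id, zero-padded so keys compare as ranges.
function autoIncrementKey(n) {
  return n.toString(16).padStart(32, "0");
}

// Candidate 2: MD5 of the photo data, as a 32-character hex string.
function md5Key(data) {
  return crypto.createHash("md5").update(data).digest("hex");
}

// Count how many keys land on each shard.
function countByShard(keys) {
  const counts = new Array(SHARD_COUNT).fill(0);
  for (const key of keys) counts[shardForHexKey(key)] += 1;
  return counts;
}

const ids = Array.from({ length: 1000 }, (_, i) => i + 1);
const autoCounts = countByShard(ids.map(autoIncrementKey));
const md5Counts = countByShard(ids.map((n) => md5Key("photo-bytes-" + n)));

console.log("auto increment:", autoCounts); // [ 1000, 0, 0, 0 ]
console.log("md5:", md5Counts);             // spread across all four shards
```

With sequential ids, every insert lands in the same range, so one shard takes all the write load. Hashing spreads writes across shards, at the cost of losing any meaningful key order. The two prefixed variants on the slide, Now() + MD5(data) and Month() + MD5(data), add a time component in front of the hash.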