mongodb
June 14, 2012

MongoDB Paris 2012: How and When to Scale MongoDB with Sharding

Sharding allows you to distribute load across multiple servers and keep your data balanced across those servers. This session will review MongoDB’s support for auto-sharding, including an architectural overview, usage patterns, as well as a few example use cases of real world sharded deployments.

Transcript

  1. Write scaling - add Shards

    Diagram (labels: write, read): shard1 with node_a1, node_b1, node_c1; shard2 with node_a2, node_b2, node_c2.
  2. Write scaling - add Shards

    Diagram (labels: write, read): shard1 with node_a1, node_b1, node_c1; shard2 with node_a2, node_b2, node_c2; shard3 with node_a3, node_b3, node_c3.
  3. MongoDB Sharding

    • Automatic partitioning and management
    • Range based
    • Convert to sharded system with no downtime
    • Fully consistent
  4. Range Based Partitioning

    > db.posts.save( {age:40} )

    Diagram: the range -∞ to +∞ splits into -∞ to 40 and 41 to +∞.

    • Data is inserted
    • Ranges are split into more "chunks"
  5. How MongoDB Sharding works

    > db.posts.save( {age:40} )
    > db.posts.save( {age:50} )

    Diagram: -∞ to +∞ splits into -∞ to 40 and 41 to +∞; 41 to +∞ then splits into 41 to 50 and 51 to +∞.

    • More data is inserted
    • Ranges are split into more "chunks"
  6. How MongoDB Sharding works

    > db.posts.save( {age:40} )
    > db.posts.save( {age:50} )
    > db.posts.save( {age:60} )

    Diagram: -∞ to +∞ splits into -∞ to 40 and 41 to +∞; 41 to +∞ splits into 41 to 50 and 51 to +∞; 51 to +∞ splits into 51 to 60 and 61 to +∞.
  7. How MongoDB Sharding works

    > db.posts.save( {age:40} )
    > db.posts.save( {age:50} )
    > db.posts.save( {age:60} )

    Diagram: the chunks -∞ to 40, 41 to 50, 51 to 60 and 61 to +∞ all live on shard1.
  8. How MongoDB Sharding works

    > db.runCommand( { addshard : "shard2" } );
    > db.runCommand( { addshard : "shard3" } );

    Diagram: the chunks -∞ to 40, 41 to 50, 51 to 60 and 61 to +∞ are spread across shard1, shard2 and shard3.
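The splitting shown on the slides above can be sketched as a toy model. This is not MongoDB's actual splitter: the class name `ChunkedRange`, the `max_docs_per_chunk` threshold and the median split point are all assumptions made for illustration, and chunks are modelled as half-open ranges rather than the 40/41 boundaries drawn on the slides.

```python
import bisect
import math


class ChunkedRange:
    """Toy model of range-based chunks over a single shard key."""

    def __init__(self, max_docs_per_chunk=2):
        # Start with one chunk covering the whole key space: [-inf, +inf)
        self.bounds = [-math.inf, math.inf]  # sorted chunk boundaries
        self.docs = [[]]  # docs[i] holds the sorted keys in [bounds[i], bounds[i+1])
        self.max_docs = max_docs_per_chunk  # hypothetical split threshold

    def chunk_index(self, key):
        # Index of the chunk whose half-open range contains key
        return bisect.bisect_right(self.bounds, key) - 1

    def insert(self, key):
        i = self.chunk_index(key)
        bisect.insort(self.docs[i], key)
        if len(self.docs[i]) > self.max_docs:
            self._split(i)

    def _split(self, i):
        keys = self.docs[i]
        mid = keys[len(keys) // 2]  # split point: the median key
        if mid == keys[0]:
            return  # median equals the smallest key, so skip the split
        left = [k for k in keys if k < mid]
        right = [k for k in keys if k >= mid]
        self.bounds.insert(i + 1, mid)
        self.docs[i:i + 1] = [left, right]

    def ranges(self):
        return list(zip(self.bounds[:-1], self.bounds[1:]))


chunks = ChunkedRange(max_docs_per_chunk=2)
for age in (40, 50, 60):
    chunks.insert(age)
print(chunks.ranges())  # two chunks after the third insert: [-inf, 50) and [50, +inf)
```

A real cluster splits on chunk size rather than on a document count, so treat the threshold here purely as a stand-in.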
  9. mongos

    • Shard Router
    • Acts just like a MongoD
    • 1 or as many as you want
    • Can run on App Servers
    • Caches meta-data from config servers
  10. Config Server

    • 3 of them
    • Changes use 2 phase commit
    • If any are down, meta data goes read only
    • System is online as long as 1/3 is up
  11. Keys

    { name: "Jared", email: "[email protected]" }
    { name: "Scott", email: "[email protected]" }
    { name: "Dan", email: "[email protected]" }

    > db.runCommand( { shardcollection: "test.users", key: { email: 1 } } )
  12. Chunks

    Min Key            Max Key            Shard
    -∞                 [email protected]    1
    [email protected]    [email protected]    1
    [email protected]    [email protected]    1
    [email protected]    +∞                 1

    • Stored in the config servers
    • Cached in MongoS
    • Used to route requests and keep cluster balanced
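How a cached chunk table can be used to route a request is sketched below, assuming a sorted list of chunk lower bounds and a binary search. The table contents, the boundary values "g" and "p", and the function name `route` are hypothetical; the email addresses on the slide are not recoverable, so made-up ones are used.

```python
import bisect

# Hypothetical chunk table: (min_key, shard). Each chunk covers
# [min_key, next chunk's min_key); the last chunk runs to +inf.
CHUNKS = [
    ("", "shard1"),   # the empty string stands in for -inf (it sorts first)
    ("g", "shard2"),
    ("p", "shard3"),
]
MIN_KEYS = [min_key for min_key, _ in CHUNKS]


def route(email):
    """Return the shard whose chunk range contains this shard key value."""
    i = bisect.bisect_right(MIN_KEYS, email) - 1
    return CHUNKS[i][1]


print(route("jared@example.com"))  # falls in ["g", "p"), so shard2
```

Because the table is sorted by lower bound, each lookup is a binary search rather than a scan over all chunks.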
  13. Balancing

    Diagram: mongos with the balancer and three config servers. Chunks 1-12 are on Shard 1, 13-24 on Shard 2, 25-36 on Shard 3 and 37-48 on Shard 4. Label: "Chunks!"
  14. Balancing

    Diagram: mongos with the balancer and three config servers. Shard 1 holds chunks 1-12, Shard 2 holds 21-24, Shard 3 holds 33-36 and Shard 4 holds 45-48. Label: "Imbalance"
  15. Balancing

    Diagram: same layout as the previous slide. The balancer decides: "Move chunk 1 to Shard 2".
  16. Balancing

    Diagram: chunk 1 has left Shard 1 and is being moved to Shard 2.
  17. Balancing

    Diagram: same layout as the previous slides. Label: "Chunk 1 now lives on Shard 2".
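The balancing round on the slides above can be approximated with a greedy loop that moves one chunk at a time from the fullest shard to the emptiest. This is only a sketch: the function name `balance` and the `threshold` parameter are assumptions, and the real balancer's thresholds and migration protocol are not described in the deck.

```python
def balance(shards, threshold=2):
    """Move chunks from the fullest to the emptiest shard until the
    difference in chunk counts drops below `threshold`.

    `shards` maps a shard name to its list of chunk ids and is modified
    in place. Returns the list of (chunk, donor, receiver) moves.
    """
    if threshold < 2:
        # With a threshold of 1 a single chunk could bounce back and forth forever
        raise ValueError("threshold must be at least 2")
    moves = []
    while True:
        donor = max(shards, key=lambda s: len(shards[s]))
        receiver = min(shards, key=lambda s: len(shards[s]))
        if len(shards[donor]) - len(shards[receiver]) < threshold:
            return moves
        chunk = shards[donor].pop(0)
        shards[receiver].append(chunk)
        moves.append((chunk, donor, receiver))


# Chunk layout taken from the "Imbalance" slide
shards = {
    "Shard 1": list(range(1, 13)),
    "Shard 2": [21, 22, 23, 24],
    "Shard 3": [33, 34, 35, 36],
    "Shard 4": [45, 46, 47, 48],
}
moves = balance(shards)
print(moves[0])  # (1, 'Shard 1', 'Shard 2'), matching "Move chunk 1 to Shard 2"
```

With this layout the loop stops after six moves, leaving six chunks on every shard.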
  18. Routed Request

    Diagram: mongos in front of Shard 1, Shard 2 and Shard 3.

    1. Query arrives at MongoS
    2. MongoS routes query to a single shard
    3. Shard returns results of query
    4. Results returned to client
  19. Scatter Gather

    Diagram: mongos in front of Shard 1, Shard 2 and Shard 3.

    1. Query arrives at MongoS
    2. MongoS broadcasts query to all shards
    3. Each shard returns results for query
    4. Results combined and returned to client
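The difference between a routed request and a scatter gather comes down to whether the query contains the shard key. A minimal sketch follows, assuming in-memory lists as stand-ins for shards; `SHARDS`, `owning_shard` and `find` are hypothetical names, not part of any MongoDB API.

```python
SHARD_KEY = "email"

# Hypothetical in-memory "shards", each holding a few documents
SHARDS = {
    "shard1": [{"email": "ann@example.com", "name": "Ann"}],
    "shard2": [{"email": "jo@example.com", "name": "Jo"}],
    "shard3": [{"email": "sam@example.com", "name": "Sam"}],
}


def owning_shard(email):
    # Stand-in for the chunk-table lookup, with made-up range boundaries
    if email < "g":
        return "shard1"
    if email < "p":
        return "shard2"
    return "shard3"


def matches(doc, query):
    return all(doc.get(field) == value for field, value in query.items())


def find(query):
    """Return (shards contacted, matching documents)."""
    if SHARD_KEY in query:
        targets = [owning_shard(query[SHARD_KEY])]  # routed request
    else:
        targets = list(SHARDS)  # scatter gather: ask every shard
    results = []
    for name in targets:
        results.extend(doc for doc in SHARDS[name] if matches(doc, query))
    return targets, results


print(find({"email": "jo@example.com"})[0])  # ['shard2']
print(find({"name": "Sam"})[0])              # ['shard1', 'shard2', 'shard3']
```

Both queries return one document, but the second one costs a round trip to every shard.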
  20. Distributed Merge Sort

    Diagram: mongos in front of Shard 1, Shard 2 and Shard 3.

    1. Query arrives at MongoS
    2. MongoS broadcasts query to all shards
    3. Each shard locally sorts results
    4. Results returned to mongos
    5. MongoS merge sorts individual results
    6. Combined sorted result returned to client
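The six steps above amount to a k-way merge of streams that are already sorted. A small sketch using Python's `heapq.merge` is shown here; the function name `sharded_sorted_query` and the sample documents are assumptions for illustration.

```python
import heapq


def sharded_sorted_query(shards, sort_key, limit=None):
    """Simulate a sorted query over several shards.

    `shards` is a list of document lists, one list per shard.
    """
    # Steps 2-4: every shard sorts its own results and returns them
    per_shard = [sorted(docs, key=lambda d: d[sort_key]) for docs in shards]
    # Step 5: the router merges the already-sorted streams
    merged = list(heapq.merge(*per_shard, key=lambda d: d[sort_key]))
    # Step 6: the combined, sorted result goes back to the client
    return merged if limit is None else merged[:limit]


shards = [
    [{"age": 60}, {"age": 40}],
    [{"age": 50}],
    [{"age": 45}, {"age": 70}],
]
print([d["age"] for d in sharded_sorted_query(shards, "age")])
# [40, 45, 50, 60, 70]
```

The router never re-sorts the full result set; it only compares the head of each shard's stream.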
  21. Choosing a shard key

    • How does your application query the data?
    • Most common queries
    • Value of the key is important
    • Random distribution of values
    • Cardinality
    • Not incremental
    • Could be compound {a:1,b:1} or concatenated 'a+b'
  22. Incremental: Right Balanced Access

    • Only have to keep small portion in RAM
    • Right shard "hot"

    Key examples:
    • Time Based
    • ObjectId
    • Auto Increment
  23. Random distribution

    • Have to keep entire index in RAM
    • All shards "warm"

    Key example:
    • Hash
  24. Segmented access

    • Have to keep entire index in RAM
    • Some shards "warm"

    Key example:
    • Month + Hash
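The contrast between an incremental key and a hashed key can be seen by counting where new writes land. The sketch below assumes four shards, made-up range boundaries, and a simple "hash modulo number of shards" placement; that placement is a simplification for illustration and is not how MongoDB assigns chunks.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4
# Hypothetical range boundaries: shard 0 holds keys < 250, shard 1 keys < 500, ...
BOUNDARIES = [250, 500, 750]


def shard_by_range(key):
    """Range partitioning on the raw (incremental) key."""
    for shard, upper in enumerate(BOUNDARIES):
        if key < upper:
            return shard
    return len(BOUNDARIES)  # the last shard holds everything up to +inf


def shard_by_hash(key):
    """Placement by a hash of the key (simplified to hash mod shard count)."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


new_keys = range(1000, 1100)  # 100 new auto-increment ids
by_range = Counter(shard_by_range(k) for k in new_keys)
by_hash = Counter(shard_by_hash(k) for k in new_keys)

print(dict(by_range))           # {3: 100}: every write hits the right-most shard
print(sorted(by_hash.items()))  # writes spread over all shards
```

The incremental key keeps only the newest range busy, which is the "right shard hot" case; the hashed key touches every shard, which is why the whole index has to stay in RAM.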
  25. Impact on Schema Design

    { _id : "alvin",
      display: "jonnyeight",
      addresses: [ { state : "CA", country: "USA" },
                   { country: "UK" } ] }

    • Shard on { _id : 1 }
    • Lookup by _id hits 1 node
    • Index on { "addresses.country" : 1 }
  26. Multiple Identities - Example

    • User can have multiple identities
    • twitter name
    • email address
    • facebook name etc.
    • What is the best sharding key & schema design?
  27. Multiple Identities - Solution 1

    { _id: "alvin",
      display: "jonnyeight",
      fb: "alvin.richards",    // facebook
      li: "alvin.j.richards",  // linkedin
      addresses : [ { state : "CA", country: "USA" },
                    { country: "UK" } ] }

    • Shard on { _id: 1 }
    • Lookup by _id hits 1 node
    • Lookup by li or fb is scatter gather
    • Cannot create a unique index on li or fb
  28. Multiple Identities - Solution 2

    identities
    { type: "_id", val: "alvin", info: "1200-42" }
    { type: "fb", val: "alvin.richards", info: "1200-42" }
    { type: "li", val: "alvin.j.richards", info: "1200-42" }

    info
    { _id: "1200-42",
      addresses : [ { state : "CA", country: "USA" },
                    { country: "UK" } ] }

    • Shard identities on { type : 1, val : 1 }
    • Lookup by type & val hits 1 node
    • Can create unique index on type & val
    • Shard info on { _id: 1 }
    • Lookup info on _id hits one node
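The two-collection design in Solution 2 turns every identity lookup into two targeted reads. A sketch with plain dictionaries standing in for the `identities` and `info` collections is given below; the helper names `add_identity` and `lookup` are hypothetical, and the dictionary key plays the role of the unique index on type and val.

```python
# Stand-ins for the two collections on the slide
identities = {}  # (type, val) -> info id; dict keys model the unique index
info = {}        # _id -> user document


def add_identity(id_type, val, info_id):
    key = (id_type, val)
    if key in identities:
        # A unique index on {type, val} would reject this insert
        raise ValueError(f"duplicate identity {key}")
    identities[key] = info_id


def lookup(id_type, val):
    """Resolve an identity to the user document with two keyed lookups."""
    info_id = identities.get((id_type, val))  # first lookup: identities by type and val
    if info_id is None:
        return None
    return info.get(info_id)                  # second lookup: info by _id


info["1200-42"] = {
    "_id": "1200-42",
    "addresses": [{"state": "CA", "country": "USA"}, {"country": "UK"}],
}
add_identity("_id", "alvin", "1200-42")
add_identity("fb", "alvin.richards", "1200-42")
add_identity("li", "alvin.j.richards", "1200-42")

print(lookup("fb", "alvin.richards")["_id"])  # 1200-42
```

The cost is an extra round trip per lookup; in exchange, no identity type needs a scatter gather and uniqueness can be enforced.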
  29. When to shard?

    • When you are running out of hardware resources
    • Need to scale RAM or Disk IO?
    • Throughput or data size?
    • Shard only if you need to
    • Use Monitoring Tools
    • Mongostat, db.serverStatus(), iostat
    • MMS - http://mms.10gen.com/
    • Working Set and Indexes in RAM
    • page faults and BTree index misses
  30. Data Set larger than RAM?

    Diagram (labels: write, read): shard1 holds the ranges A-M, N-P and R-Z.

    • 300 GB Data
    • 96 GB Mem
    • 3:1 Data/Mem
  31. Cache everything in RAM

    Diagram (labels: write, read): shard1 holds A-M, shard2 holds N-P, shard3 holds R-Z.

    • 300 GB Data
    • 96 GB Mem
    • 1:1 Data/Mem
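The ratios on the two slides above follow from simple arithmetic. The sketch below assumes that the 96 GB figure is RAM per shard and that data spreads evenly across shards; neither is stated explicitly in the deck, and the function name is hypothetical.

```python
def data_to_mem_ratio(data_gb, mem_per_shard_gb, num_shards):
    """Data held per shard divided by RAM per shard, assuming an even spread."""
    return (data_gb / num_shards) / mem_per_shard_gb


for num_shards in (1, 3):
    ratio = data_to_mem_ratio(300, 96, num_shards)
    print(f"{num_shards} shard(s): {ratio:.2f}:1 data/mem")
```

One shard gives a ratio a little above 3:1, and three shards bring it down to roughly 1:1, which is the point at which each shard can hold its share of the data in RAM.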
  32. Summary

    • Shard to horizontally scale your application
    • Choose Shard Keys wisely
    • Sharding may affect your schema design
    • Shard when you need to:
      • Listen to the metrics
      • Monitor and watch the trends
      • Shard early