
MongoDB Paris - How & When to Shard - June 2012

mongodb
June 22, 2012
Transcript

  1. How & When to Shard

  2. Solution Architect, based in London. http://www.10gen.com/, @dmroberts, daniel.roberts@10gen.com

  3. Agenda
      • Architecture
      • How it Works
      • Choosing a shard key
      • When to Shard

  4. [Image: cutting grass with scissors. Source: http://community.qlikview.com/cfs-filesystemfile.ashx/__key/CommunityServer.Blogs.Components.WeblogFiles/theqlikviewblog/Cutting-Grass-with-Scissors-_2D00_-2.jpg]

  5. [Image: a pack of harvesters. Source: http://www.bitquill.net/blog/wp-content/uploads/2008/07/pack_of_harvesters.jpg]

  6. MongoDB Scaling: Single Node (diagram: writes and reads both go to node_a1)

  7. Read scaling: add Replicas (diagram: writes and reads over the replica set node_a1, node_b1)

  8. Read scaling: add Replicas (diagram: writes and reads over the replica set node_a1, node_b1, node_c1)

  9. Write scaling: Sharding (diagram: writes and reads go to shard1, the replica set node_a1, node_b1, node_c1)

  10. Write scaling: add Shards (diagram: shard1 = node_a1, node_b1, node_c1; shard2 = node_a2, node_b2, node_c2)

  11. Write scaling: add Shards (diagram: shard1 = node_a1, node_b1, node_c1; shard2 = node_a2, node_b2, node_c2; shard3 = node_a3, node_b3, node_c3)

  12. MongoDB Sharding
      • Automatic partitioning and management
      • Range based
      • Convert to a sharded system with no downtime
      • Fully consistent

  13. Range Based Partitioning
      > db.posts.save( {age:40} )
      Diagram: the single range -∞..+∞ is split into -∞..40 and 41..+∞.
      • Data is inserted
      • Ranges are split into more “chunks”

  14. How MongoDB Sharding works
      > db.posts.save( {age:40} )
      > db.posts.save( {age:50} )
      Diagram: the ranges are now -∞..40, 41..50 and 51..+∞.
      • More data is inserted
      • Ranges are split into more “chunks”

  15. How MongoDB Sharding works
      > db.posts.save( {age:40} )
      > db.posts.save( {age:50} )
      > db.posts.save( {age:60} )
      Diagram: the ranges are now -∞..40, 41..50, 51..60 and 61..+∞.

  16. How MongoDB Sharding works
      > db.posts.save( {age:40} )
      > db.posts.save( {age:50} )
      > db.posts.save( {age:60} )
      Diagram: all four ranges, -∞..40, 41..50, 51..60 and 61..+∞, live on shard1.

  17. How MongoDB Sharding works
      > db.runCommand( { addshard : "shard2" } );
      > db.runCommand( { addshard : "shard3" } );
      Diagram: the ranges -∞..40, 41..50, 51..60 and 61..+∞ are spread across shard1, shard2 and shard3.

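
The split behaviour on slides 13 to 17 can be sketched as a small in-memory model in JavaScript. This is a hypothetical illustration, not MongoDB's implementation: `MAX_DOCS_PER_CHUNK` stands in for the real size-based split threshold, and splitting at the median key (rather than at the boundaries drawn on the slides) is an assumption made for simplicity.

```javascript
// Hypothetical in-memory model of range-based chunks (not MongoDB's code).
// Each chunk owns the half-open key range [min, max) on the shard key.
const MAX_DOCS_PER_CHUNK = 2; // illustrative; real splits are driven by size

function createCollection() {
  // A newly sharded collection starts as one chunk covering -∞ to +∞.
  return [{ min: -Infinity, max: Infinity, shard: "shard1", keys: [] }];
}

function findChunk(chunks, key) {
  return chunks.find((c) => key >= c.min && key < c.max);
}

function insert(chunks, key) {
  const chunk = findChunk(chunks, key);
  chunk.keys.push(key);
  chunk.keys.sort((a, b) => a - b);
  if (chunk.keys.length <= MAX_DOCS_PER_CHUNK) return;

  // Split at the median key: the old chunk keeps [min, mid), a new one gets [mid, max).
  const mid = chunk.keys[Math.floor(chunk.keys.length / 2)];
  // If no key is below the median, there is no split point (low cardinality).
  if (chunk.keys[0] === mid) return;
  const right = {
    min: mid,
    max: chunk.max,
    shard: chunk.shard, // stays on the same shard until the balancer moves it
    keys: chunk.keys.filter((k) => k >= mid),
  };
  chunk.max = mid;
  chunk.keys = chunk.keys.filter((k) => k < mid);
  chunks.splice(chunks.indexOf(chunk) + 1, 0, right);
}

// Usage, mirroring the db.posts.save() calls on the slides.
const chunks = createCollection();
[40, 50, 60].forEach((age) => insert(chunks, age));
// Two chunks result: [-Infinity, 50) and [50, Infinity), both on shard1.
console.log(chunks.map((c) => `[${c.min}, ${c.max}) on ${c.shard}`));
```

The low-cardinality guard is why slide 41 lists cardinality as a shard-key concern: a chunk whose documents all share one key value cannot be split.
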
  18. SHARDING ARCHITECTURE

  19. Architecture

  20. mongos
      • Shard Router
      • Acts just like a mongod
      • Run 1 or as many as you want
      • Can run on app servers
      • Caches metadata from the config servers

  21. Config Server
      • 3 of them
      • Changes use two-phase commit
      • If any are down, metadata goes read-only
      • System stays online as long as 1 of the 3 is up

  22. HOW IT WORKS

  23. Keys
      { name: "Jared", email: "jsr@abc.com" }
      { name: "Scott", email: "scott@gmail.com" }
      { name: "Dan", email: "dan@yahoo.com" }
      > db.runCommand( { shardcollection: "test.users", key: { email: 1 } } )

  24. Chunks (diagram: one range, -∞ to +∞)

  25. Chunks (diagram: -∞ to +∞, containing dan@yahoo.com, jsr@abc.com, scott@gmail.com)

  26. Chunks (diagram: as on slide 25) Split!

  27. Chunks (diagram: as on slide 25, split in two) Split! This is a chunk. This is a chunk.

  28. Chunks (diagram: as on slide 25)

  29. Chunks (diagram: as on slide 25)

  30. Chunks (diagram: as on slide 25) Split!

  31. Chunks

      | Min Key | Max Key | Shard |
      |---|---|---|
      | -∞ | adam@yahoo.com | 1 |
      | adam@yahoo.com | jared@abc.com | 1 |
      | jared@abc.com | scott@gmail.com | 1 |
      | scott@gmail.com | +∞ | 1 |

      • Stored in the config servers
      • Cached in mongos
      • Used to route requests and keep the cluster balanced

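Routing a shard-key lookup against a chunk table like the one on slide 31 amounts to finding the chunk whose range contains the key. The JavaScript sketch below does this with a binary search. It is hypothetical: `null` stands in for MinKey/MaxKey, and the chunks are spread over four invented shard names for illustration (on the slide every chunk still lives on shard 1).

```javascript
// Chunk ranges from slide 31 for a collection sharded on { email: 1 }.
// null stands for MinKey (-∞) and MaxKey (+∞); shard names are hypothetical.
// The table is sorted by min, as a router's cached copy would be.
const chunkTable = [
  { min: null, max: "adam@yahoo.com", shard: "shard1" },
  { min: "adam@yahoo.com", max: "jared@abc.com", shard: "shard2" },
  { min: "jared@abc.com", max: "scott@gmail.com", shard: "shard3" },
  { min: "scott@gmail.com", max: null, shard: "shard4" },
];

// Binary search for the last chunk whose min is <= key. Because the ranges
// are contiguous and half-open [min, max), that chunk contains the key.
// Plain JavaScript string comparison is used, which is fine for ASCII keys.
function routeByShardKey(table, key) {
  let lo = 0; // chunk 0 starts at MinKey, so it always qualifies
  let hi = table.length - 1;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2); // mid >= 1, so table[mid].min is a string
    if (table[mid].min <= key) lo = mid;
    else hi = mid - 1;
  }
  return table[lo].shard;
}

console.log(routeByShardKey(chunkTable, "dan@yahoo.com")); // shard2
```

A lookup by the shard key therefore touches the cached table only, and the query goes to exactly one shard.
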
  32. Balancing (diagram: mongos running the balancer, three config servers, and Shard 1 to Shard 4, each holding numbered chunks) Chunks!

  33. Balancing (diagram: the chunks are no longer evenly spread across Shard 1 to Shard 4) Imbalance

  34. Balancing (diagram: the balancer on mongos issues "Move chunk 1 to Shard 2")

  35. Balancing (diagram: chunk 1 is being migrated)

  36. Balancing: chunk 1 now lives on Shard 2

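The balancing round on slides 32 to 36 can be modelled as repeatedly moving one chunk from the shard with the most chunks to the shard with the fewest. This JavaScript sketch is simplified and hypothetical: the chunk layout is invented, and `threshold` is an illustrative stand-in for the balancer's real migration threshold.

```javascript
// Hypothetical balancer sketch: move one chunk at a time from the most loaded
// shard to the least loaded one until the counts differ by at most `threshold`.
function balance(shards, threshold = 2) {
  // With threshold 0, a one-chunk gap would bounce back and forth forever.
  if (threshold < 1) throw new RangeError("threshold must be at least 1");
  const moves = [];
  for (;;) {
    // Object.keys keeps insertion order and sort is stable, so ties are
    // broken by shard order.
    const names = Object.keys(shards).sort(
      (a, b) => shards[a].length - shards[b].length
    );
    const least = names[0];
    const most = names[names.length - 1];
    if (shards[most].length - shards[least].length <= threshold) return moves;
    const chunk = shards[most].shift(); // migrate the lowest-numbered chunk
    shards[least].push(chunk);
    moves.push(`move chunk ${chunk} from ${most} to ${least}`);
  }
}

// Invented layout: shard1 has accumulated most of the chunks.
const shards = {
  shard1: [1, 2, 3, 4, 5, 6, 7, 8],
  shard2: [9, 10],
  shard3: [11, 12],
  shard4: [13],
};
const moves = balance(shards);
moves.forEach((m) => console.log(m));
```

Each move shrinks the gap between the fullest and emptiest shard, so the loop ends once the counts are within the threshold; a real migration also copies the chunk's documents and updates the config servers.
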
  37. ROUTING

  38. Routed Request (diagram: mongos, Shard 1, Shard 2, Shard 3)
      1. Query arrives at mongos
      2. mongos routes query to a single shard
      3. Shard returns results of query
      4. Results returned to client

  39. Scatter Gather (diagram: mongos, Shard 1, Shard 2, Shard 3)
      1. Query arrives at mongos
      2. mongos broadcasts query to all shards
      3. Each shard returns results for query
      4. Results combined and returned to client

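Whether mongos can route a query to one shard (slide 38) or must scatter it to all of them (slide 39) depends on whether the query pins down the shard key. A minimal, hypothetical JavaScript sketch, assuming a collection sharded on `age` with an invented three-chunk layout. For simplicity it treats anything other than an exact numeric match on the shard key as scatter-gather, although a real router can also narrow range queries on the shard key to the shards owning the matching chunks.

```javascript
// Hypothetical chunk layout for a collection sharded on { age: 1 }.
// Each chunk owns [min, max); null stands for MinKey / MaxKey.
const chunks = [
  { min: null, max: 30, shard: "shard1" },
  { min: 30, max: 60, shard: "shard2" },
  { min: 60, max: null, shard: "shard3" },
];

function contains(chunk, value) {
  return (chunk.min === null || value >= chunk.min) &&
    (chunk.max === null || value < chunk.max);
}

// Returns the list of shards a query must be sent to.
function targetShards(query, shardKey) {
  const value = query[shardKey];
  if (typeof value === "number") {
    // Exact match on the shard key: one chunk, hence one shard (routed request).
    return [chunks.find((c) => contains(c, value)).shard];
  }
  // No exact shard-key value: ask every shard that owns a chunk (scatter-gather).
  return [...new Set(chunks.map((c) => c.shard))];
}

console.log(targetShards({ age: 40 }, "age")); // routed
console.log(targetShards({ name: "Dan" }, "age")); // scatter-gather
```
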
  40. Distributed Merge Sort (diagram: mongos, Shard 1, Shard 2, Shard 3)
      1. Query arrives at mongos
      2. mongos broadcasts query to all shards
      3. Each shard locally sorts results
      4. Results returned to mongos
      5. mongos merge sorts individual results
      6. Combined sorted result returned to client

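Step 5 on slide 40, where mongos merges results that each shard has already sorted, is a k-way merge. A minimal JavaScript sketch, assuming each shard's results arrive as an in-memory array sorted ascending on `field` (real shards stream results through cursors). It scans the head of every shard's list at each step, which is fine for a handful of shards; a heap would be the usual choice for many.

```javascript
// k-way merge of per-shard results that are already sorted ascending on `field`.
function mergeSorted(shardResults, field) {
  const cursors = shardResults.map(() => 0); // next unread index per shard
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < shardResults.length; i++) {
      if (cursors[i] >= shardResults[i].length) continue; // shard exhausted
      const candidate = shardResults[i][cursors[i]];
      if (best === -1 || candidate[field] < shardResults[best][cursors[best]][field]) {
        best = i;
      }
    }
    if (best === -1) return merged; // every shard exhausted
    merged.push(shardResults[best][cursors[best]]);
    cursors[best] += 1;
  }
}

const merged = mergeSorted(
  [
    [{ age: 10 }, { age: 40 }], // shard1, sorted locally
    [{ age: 20 }, { age: 50 }], // shard2
    [{ age: 30 }], // shard3
  ],
  "age"
);
console.log(merged.map((d) => d.age));
```

Because each input is already sorted, the router never re-sorts the full result set; it only compares the current head of each shard's stream.
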
  41. Choosing a shard key
      • How does your application query the data?
      • Most common queries
      • Value of the key is important
      • Random distribution of values
      • Cardinality
      • Not incremental
      • Could be compound {a:1, b:1} or concatenated 'a+b'

  42. Right Balanced Access
      • Only have to keep a small portion in RAM
      • Right shard "hot"
      • Incremental keys: time based, ObjectId, auto increment

  43. Random distribution
      • Have to keep entire index in RAM
      • All shards "warm"
      • Hash

  44. Segmented access
      • Have to keep entire index in RAM
      • Some shards "warm"
      • Month + Hash

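The three access patterns on slides 42 to 44 can be compared by counting which shard receives a batch of new inserts. The JavaScript sketch below is hypothetical throughout: four shards each own a quarter of the key space, the hash is 32-bit FNV-1a (chosen only because it is short, not because MongoDB uses it), and for month + hash it assumes each month's chunks sit on just two shards.

```javascript
// Hypothetical comparison of where 1000 new inserts land under three shard-key
// choices (slides 42 to 44). Four shards each own a quarter of the key space.
const SHARDS = 4;

// 32-bit FNV-1a, used here only as a short illustrative hash.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Incremental key (time based, ObjectId, auto increment): existing data spans
// 0..999, shard i owns [250 * i, 250 * (i + 1)), and the last chunk runs to +∞,
// so every key above the current maximum lands on the last shard.
function shardForIncremental(key) {
  return Math.min(Math.floor(key / 250), SHARDS - 1);
}

// Hashed key: the 32-bit hash space is cut into SHARDS equal ranges.
function shardForHash(key) {
  return Math.floor((fnv1a(String(key)) / 2 ** 32) * SHARDS);
}

// Month + hash: assume each month's chunks sit on two shards only (needs an even
// SHARDS), so the current month's writes warm two shards and leave the rest cold.
function shardForMonthHash(month, key) {
  const first = (month * 2) % SHARDS;
  return first + (fnv1a(String(key)) % 2);
}

function countWrites(shardOf, keys) {
  const counts = new Array(SHARDS).fill(0);
  for (const k of keys) counts[shardOf(k)] += 1;
  return counts;
}

// 1000 new documents whose keys are all above the existing maximum of 999.
const newKeys = Array.from({ length: 1000 }, (_, i) => 1000 + i);
console.log("incremental", countWrites(shardForIncremental, newKeys));
console.log("hash       ", countWrites(shardForHash, newKeys));
// Month 5 is arbitrary; it maps to shards 2 and 3 under this layout.
console.log("month+hash ", countWrites((k) => shardForMonthHash(5, k), newKeys));
```

The incremental key sends every write to one "hot" shard but keeps only the recent part of the index in use; the hash spreads writes over all shards at the cost of touching the whole index; month + hash sits in between.
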
  45. Impact on Schema Design
      { _id: "alvin", display: "jonnyeight", addresses: [ { state: "CA", country: "USA" }, { country: "UK" } ] }
      • Shard on { _id: 1 }
      • Lookup by _id hits 1 node
      • Index on { "addresses.country": 1 }

  46. Multiple Identities - Example
      • User can have multiple identities
      • Twitter name
      • Email address
      • Facebook name, etc.
      What is the best sharding key & schema design?

  47. Multiple Identities - Solution 1
      { _id: "alvin", display: "jonnyeight", fb: "alvin.richards" /* facebook */, li: "alvin.j.richards" /* linkedin */, addresses: [ { state: "CA", country: "USA" }, { country: "UK" } ] }
      • Shard on { _id: 1 }
      • Lookup by _id hits 1 node
      • Lookup by li or fb is scatter gather
      • Cannot create a unique index on li or fb

  48. Multiple Identities - Solution 2
      identities:
      { type: "_id", val: "alvin", info: "1200-42" }
      { type: "fb", val: "alvin.richards", info: "1200-42" }
      { type: "li", val: "alvin.j.richards", info: "1200-42" }
      info:
      { _id: "1200-42", addresses: [ { state: "CA", country: "USA" }, { country: "UK" } ] }
      • Shard identities on { type: 1, val: 1 }
      • Lookup by type & val hits 1 node
      • Can create a unique index on type & val
      • Shard info on { _id: 1 }
      • Lookup of info on _id hits 1 node

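Solution 2 turns a lookup by any identity into two single-shard lookups. A hypothetical in-memory JavaScript sketch, with two `Map`s standing in for the `identities` and `info` collections; `addIdentity` and `findInfo` are invented names for illustration, not a driver API.

```javascript
// In-memory sketch of "Solution 2" from slide 48 (hypothetical, not a driver API).
// identities is sharded on { type: 1, val: 1 } and info on { _id: 1 }; a Map keyed
// by each collection's shard key stands in for a lookup that hits a single shard.
const identities = new Map(); // JSON [type, val] -> info _id
const info = new Map(); // _id -> info document

function addIdentity(type, val, infoId) {
  const key = JSON.stringify([type, val]);
  // Uniqueness on { type, val } is enforceable because it matches the shard key.
  if (identities.has(key)) throw new Error(`duplicate identity ${key}`);
  identities.set(key, infoId);
}

// Two targeted lookups instead of one scatter-gather query.
function findInfo(type, val) {
  const infoId = identities.get(JSON.stringify([type, val])); // hits one shard
  return infoId === undefined ? null : info.get(infoId) ?? null; // hits one shard
}

info.set("1200-42", {
  _id: "1200-42",
  addresses: [{ state: "CA", country: "USA" }, { country: "UK" }],
});
addIdentity("_id", "alvin", "1200-42");
addIdentity("fb", "alvin.richards", "1200-42");
addIdentity("li", "alvin.j.richards", "1200-42");

console.log(findInfo("fb", "alvin.richards"));
```
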
  49. When to shard?
      • When you are running out of hardware resources
      • Need to scale RAM or disk IO?
      • Throughput or data size?
      • Shard only if you need to
      • Use monitoring tools: mongostat, db.serverStatus(), iostat
      • MMS - http://mms.10gen.com/
      • Keep the working set and indexes in RAM
      • Watch page faults and B-tree index misses

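  50. Data Set larger than RAM? (diagram: writes and reads go to a single shard, shard1, holding ranges A-M, N-P and R-Z; 300 GB data, 96 GB memory, 3:1 data/memory)

  51. Cache everything in RAM (diagram: writes and reads go to shard1 holding A-M, shard2 holding N-P, and shard3 holding R-Z; 300 GB data, 96 GB memory, 1:1 data/memory)
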
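The data-to-memory ratios on slides 50 and 51 follow from simple division, assuming the 96 GB on slide 51 is the RAM of each of the three shards (the slide does not state this explicitly):

```latex
\frac{300\ \text{GB data}}{96\ \text{GB RAM}} \approx 3.1 \;(\approx 3{:}1)
\qquad\text{vs.}\qquad
\frac{300\ \text{GB data}}{3 \times 96\ \text{GB RAM}} = \frac{300}{288} \approx 1.04 \;(\approx 1{:}1)
```
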
  52. Summary
      • Shard to horizontally scale your application
      • Choose shard keys wisely
      • Sharding may affect your schema design
      • Shard when you need to
      • Listen to the metrics
      • Monitor and watch the trends
      • Shard early
