
Introduction to Sharding (MongoSV 2012)

An introduction to sharding with MongoDB given at MongoSV '12.

Brandon Black

December 04, 2012

Transcript

  1. Sharding: Brandon Black (@brandonmblack), Software Engineer, 10gen. #mongosv

  2. Agenda
    •  Scaling Data
    •  MongoDB's Approach
    •  Architecture
    •  Configuration
    •  Mechanics
  3. Scaling Data

  4. Examining Growth
    •  More Users
      –  1995: 0.4% of the world’s population
      –  Today: 30% of the world is online (~2.2B)
      –  Emerging Markets & Mobile
    •  More Data
      –  Facebook’s data set is around 100 petabytes
      –  4 billion photos taken in the last year (4x a decade ago)
  5. Read/Write Throughput Exceeds I/O

  6. Working Set Exceeds Physical Memory

  7. Vertical Scalability (Scale Up)

  8. Horizontal Scalability (Scale Out)

  9. Data Store Scalability
    •  Custom Hardware
      –  Oracle
    •  Custom Software
      –  Facebook + MySQL
      –  Google
  10. Data Store Scalability Today
    •  MongoDB Auto-Sharding
    •  A data store that is
      –  Publicly available
      –  Free, open source (https://github.com/mongodb/mongo)
      –  Horizontally scalable
      –  Application independent
  11. MongoDB's Approach to Sharding

  12. Partitioning
    •  User defines shard key
    •  Shard key defines range of data
    •  Key space is like points on a line
    •  Range is a segment of that line
    [Diagram: key space drawn as a line from -∞ to +∞]
  13. Data Distribution
    •  Initially 1 chunk
    •  Default max chunk size: 64 MB
    •  MongoDB automatically splits & migrates chunks when the max size is reached
    [Diagram: config server, three mongos routers, Shard 1 and Shard 2]
  14. Routing and Balancing
    •  Queries routed to specific shards
    •  MongoDB balances cluster
    •  MongoDB migrates data to new nodes
    [Diagram: mongos in front of three shards, steps 1 to 4]
  15. MongoDB Auto-Sharding
    •  Minimal effort required
      –  Same interface as single mongod
    •  Two steps
      –  Enable Sharding for a database
      –  Shard collection within database
  16. Architecture

  17. What is a Shard?
    •  Shard is a node of the cluster
    •  Shard can be a single mongod or a replica set
    [Diagram: a shard as either a replica set (primary plus two secondaries) or a single mongod]
  18. Config Server
    •  Config Server
      –  Stores cluster chunk ranges and locations
      –  Can have only 1 or 3 (production must have 3)
      –  Not a replica set
    [Diagram: metadata storage as either one config server or three config servers]
  19. Routing and Managing Data
    •  Mongos
      –  Acts as a router / balancer
      –  No local data (persists to config database)
      –  Can have 1 or many
    [Diagram: app servers connecting through one mongos or through several]
  20. Sharding infrastructure
    [Diagram: three config servers, three shards, and three app servers each paired with a mongos]
  21. Configuration

  22. Example Cluster
    •  Don’t use this setup in production!
      -  Only one Config server (No Fault Tolerance)
      -  Shard not in a replica set (Low Availability)
      -  Only one mongos and shard (No Performance Improvement)
      -  Useful for development or demonstrating configuration mechanics
    [Diagram: config server, mongos, mongod]
  23. Starting the Configuration Server
    •  mongod --configsvr
    •  Starts a configuration server on the default port (27019)
    [Diagram: config server]
  24. Start the mongos Router
    •  mongos --configdb <hostname>:27019
    •  For 3 configuration servers:
       mongos --configdb <host1>:<port1>,<host2>:<port2>,<host3>:<port3>
    •  This is always how to start a new mongos, even if the cluster is already running
    [Diagram: config server, mongos]
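The `--configdb` value is just a comma-separated list of `host:port` pairs. As a rough sketch, a small helper could assemble it and enforce the "1 or 3" rule from the Config Server slide. The helper name `configdbArg` and the `example.net` hostnames are made up for illustration and are not part of MongoDB.

```javascript
// Build the value passed to `mongos --configdb` from a list of config servers.
// configdbArg and the hostnames below are illustrative assumptions.
function configdbArg(servers) {
  // The slides allow exactly 1 (development) or 3 (production) config servers.
  if (servers.length !== 1 && servers.length !== 3) {
    throw new Error("expected 1 or 3 config servers, got " + servers.length);
  }
  return servers.map((s) => s.host + ":" + s.port).join(",");
}

const production = [
  { host: "cfg1.example.net", port: 27019 },
  { host: "cfg2.example.net", port: 27019 },
  { host: "cfg3.example.net", port: 27019 },
];

console.log("mongos --configdb " + configdbArg(production));
```

Every mongos in the cluster would be started with the same string, which is why generating it in one place can be convenient.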
  25. Start the shard database
    •  mongod --shardsvr
    •  Starts a mongod with the default shard port (27018)
    •  Shard is not yet connected to the rest of the cluster
    •  Shard may have already been running in production
    [Diagram: config server, mongos, shard (mongod)]
  26. Add the Shard
    •  On mongos:
      -  sh.addShard('<host>:27018')
    •  Adding a replica set:
      -  sh.addShard('<rsname>/<seedlist>')
    [Diagram: config server, mongos, shard (mongod)]
  27. Verify that the shard was added
    •  db.runCommand({ listshards: 1 })
       { "shards" : [ { "_id" : "shard0000", "host" : "<hostname>:27018" } ], "ok" : 1 }
    [Diagram: config server, mongos, shard (mongod)]
  28. Enabling Sharding
    •  Enable sharding on a database
       sh.enableSharding("<dbname>")
    •  Shard a collection with the given key
       sh.shardCollection("<dbname>.people", {"country": 1})
    •  Use a compound shard key to prevent duplicates
       sh.shardCollection("<dbname>.cars", {"year": 1, "uniqueid": 1})
  29. Mechanics

  30. Partitioning
    •  Remember it's based on ranges
    [Diagram: key space drawn as a line from -∞ to +∞]
  31. Chunk is a section of the entire range
    [Diagram: key space from minKey to maxKey with chunk boundaries at {x: -20}, {x: 13}, {x: 25}, {x: 100,000}; chunk size 64MB]
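To make the "section of a range" idea concrete, here is a minimal sketch of looking up the chunk that owns a key. It assumes a single numeric shard key `x`, reuses the boundary values from the slide, and uses `-Infinity` and `Infinity` as stand-ins for minKey and maxKey. The shard names and the `findChunk` helper are hypothetical.

```javascript
// Chunk ranges over a single numeric shard key x.
// -Infinity and Infinity stand in for MongoDB's minKey and maxKey.
// Shard assignments are invented for the example.
const chunks = [
  { min: -Infinity, max: -20, shard: "shard0000" },
  { min: -20, max: 13, shard: "shard0001" },
  { min: 13, max: 25, shard: "shard0000" },
  { min: 25, max: 100000, shard: "shard0001" },
  { min: 100000, max: Infinity, shard: "shard0000" },
];

// A chunk owns keys in [min, max): lower bound inclusive, upper bound exclusive.
function findChunk(chunks, x) {
  return chunks.find((c) => x >= c.min && x < c.max);
}

console.log(findChunk(chunks, 13));  // the chunk covering [13, 25)
console.log(findChunk(chunks, -50)); // the chunk covering [minKey, -20)
```

Because the ranges tile the whole key space without gaps or overlap, every possible key value belongs to exactly one chunk.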
  32. Chunk splitting
    •  A chunk is split once it exceeds the maximum size
    •  There is no split point if all documents have the same shard key
    •  Chunk split is a logical operation (no data is moved)
    [Diagram: one chunk (minKey to maxKey) split into two chunks around keys 13 and 14]
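The splitting rules above can be sketched in a few lines. This is a simplified model, not MongoDB's actual split algorithm: it counts documents instead of bytes (`maxDocs` is an assumed stand-in for the 64 MB limit), tracks only shard key values, and picks the median key as the split point.

```javascript
// Split a chunk at a middle shard key value, if one exists.
// maxDocs is an assumed stand-in for the real limit, which is measured in bytes.
function splitChunk(chunk, maxDocs) {
  if (chunk.keys.length <= maxDocs) return [chunk]; // under the limit: nothing to do
  const sorted = [...chunk.keys].sort((a, b) => a - b);
  let splitPoint = sorted[Math.floor(sorted.length / 2)];
  if (splitPoint === sorted[0]) {
    // Median equals the smallest key: fall back to the first larger key, if any.
    splitPoint = sorted.find((k) => k > sorted[0]);
  }
  // Every document has the same shard key, so there is no split point.
  if (splitPoint === undefined) return [chunk];
  // Only the range metadata changes; in a real cluster no documents move on a split.
  return [
    { min: chunk.min, max: splitPoint, keys: sorted.filter((k) => k < splitPoint) },
    { min: splitPoint, max: chunk.max, keys: sorted.filter((k) => k >= splitPoint) },
  ];
}

const full = { min: -Infinity, max: Infinity, keys: [1, 5, 9, 13, 14, 20] };
console.log(splitChunk(full, 4));

const stuck = { min: -Infinity, max: Infinity, keys: [13, 13, 13, 13, 13] };
console.log(splitChunk(stuck, 4).length); // still one chunk: nothing to split on
```

The second call shows why a low-cardinality shard key is a problem: a chunk whose documents all share one key value can grow past the limit and never be split.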
  33. Balancing
    •  Balancer is running on mongos
    •  Once the difference in chunks between the most dense shard and the least dense shard is above the migration threshold, a balancing round starts
    [Diagram: config server, three mongos routers, Shard 1 and Shard 2]
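The balancing trigger can be expressed as a small decision function. The sketch below follows the slide's wording ("above the migration threshold") and takes the threshold as a parameter, since the slide does not give a number. The function name `planMigration` and the chunk counts are illustrative assumptions.

```javascript
// Decide whether a balancing round should start, and between which shards.
// chunkCounts maps shard name -> number of chunks on that shard.
// threshold is a parameter because the slide does not state its value.
function planMigration(chunkCounts, threshold) {
  const shards = Object.keys(chunkCounts);
  if (shards.length < 2) return null; // nothing to balance against
  let most = shards[0];
  let least = shards[0];
  for (const s of shards) {
    if (chunkCounts[s] > chunkCounts[most]) most = s;
    if (chunkCounts[s] < chunkCounts[least]) least = s;
  }
  // Following the slide: a round starts only when the difference is above the threshold.
  if (chunkCounts[most] - chunkCounts[least] <= threshold) return null;
  return { from: most, to: least };
}

console.log(planMigration({ shard1: 10, shard2: 2 }, 4)); // move a chunk from shard1 to shard2
console.log(planMigration({ shard1: 5, shard2: 4 }, 4));  // null: already balanced enough
```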
  34. Acquiring the Balancer Lock
    •  The balancer on mongos takes out a "balancer lock"
    •  To see the status of these locks:
       use config
       db.locks.find({ _id: "balancer" })
    [Diagram: config server, three mongos routers, Shard 1 and Shard 2]
  35. Moving the chunk
    •  The mongos sends a moveChunk command to source shard
    •  The source shard then notifies destination shard
    •  Destination shard starts pulling documents from source shard
    [Diagram: config server, mongos, Shard 1 and Shard 2]
  36. Committing Migration
    •  When complete, destination shard updates config server
      -  Provides new locations of the chunks
    [Diagram: config server, mongos, Shard 1 and Shard 2]
  37. Cleanup
    •  Source shard deletes moved data
      -  Must wait for open cursors to either close or time out
      -  NoTimeout cursors may prevent the release of the lock
    •  The mongos releases the balancer lock after old chunks are deleted
    [Diagram: config server, three mongos routers, Shard 1 and Shard 2]
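The move, commit, and cleanup slides describe a copy-then-switch-then-delete sequence. A toy in-memory model of that order might look like the following. It is only a sketch: the `config` and `shards` objects and the `moveChunk` function here are simplified stand-ins, and locking, cursors, and failure handling are left out entirely.

```javascript
// A toy cluster: config metadata maps chunk id -> owning shard,
// and each shard holds the documents of the chunks it stores.
// All names and data here are invented for illustration.
const config = { chunkOwner: { c1: "shard1", c2: "shard1" } };
const shards = {
  shard1: { c1: [{ x: 1 }, { x: 2 }], c2: [{ x: 30 }] },
  shard2: {},
};

function moveChunk(chunkId, from, to) {
  // 1. Move: the destination pulls (copies) the documents from the source.
  shards[to][chunkId] = shards[from][chunkId].map((doc) => ({ ...doc }));
  // 2. Commit: the config metadata is updated with the chunk's new location.
  config.chunkOwner[chunkId] = to;
  // 3. Cleanup: the source deletes its copy of the moved data.
  //    (A real shard first waits for open cursors to close or time out.)
  delete shards[from][chunkId];
}

moveChunk("c2", "shard1", "shard2");
console.log(config.chunkOwner);
```

The ordering matters: the source keeps its copy until the metadata points at the destination, so the data is never without an owner during the migration.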
  38. Routing Requests

  39. Cluster Request Routing
    •  Targeted Queries
    •  Scatter Gather Queries
    •  Scatter Gather Queries with Sort
  40. Cluster Request Routing: Targeted Query
    [Diagram: mongos in front of three shards]
  41. Routable request received (step 1)
  42. Request routed to appropriate shard (step 2)
  43. Shard returns results (step 3)
  44. Mongos returns results to client (step 4)
  45. Cluster Request Routing: Non-Targeted Query
    [Diagram: mongos in front of three shards]
  46. Non-Targeted Request Received (step 1)
  47. Request sent to all shards (step 2)
  48. Shards return results to mongos (step 3)
  49. Mongos returns results to client (step 4)
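The difference between the targeted and non-targeted flows comes down to whether the query contains the shard key. A minimal sketch of that decision is shown here, assuming a numeric shard key and equality matches only. The `chunkTable` contents and the `targetShards` helper are hypothetical.

```javascript
// Chunk metadata as a mongos might cache it (values are invented).
const chunkTable = [
  { min: -Infinity, max: 100, shard: "shard1" },
  { min: 100, max: 200, shard: "shard2" },
  { min: 200, max: Infinity, shard: "shard3" },
];

// Return the list of shards a query has to visit.
// shardKey is the name of the shard key field.
function targetShards(query, shardKey, chunkTable) {
  const value = query[shardKey];
  if (typeof value === "number") {
    // Targeted: an equality match on the shard key maps to exactly one chunk.
    const chunk = chunkTable.find((c) => value >= c.min && value < c.max);
    return [chunk.shard];
  }
  // Non-targeted: without the shard key, every shard has to be asked.
  return [...new Set(chunkTable.map((c) => c.shard))];
}

console.log(targetShards({ x: 150 }, "x", chunkTable));        // one shard
console.log(targetShards({ name: "Ada" }, "x", chunkTable));   // all shards
```

This is the reason for the later advice to choose a shard key that appears in common queries: those queries touch one shard instead of all of them.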
  50. Cluster Request Routing: Non-Targeted Query with Sort
    [Diagram: mongos in front of three shards]
  51. Non-Targeted request with sort received (step 1)
  52. Request sent to all shards (step 2)
  53. Query and sort performed locally (step 3)
  54. Shards return results to mongos (step 4)
  55. Mongos merges sorted results (step 5)
  56. Mongos returns results to client (step 6)
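Because each shard sorts its own results first, the merge step only has to interleave lists that are already in order. The sketch below shows one way to do that with a simple k-way merge. It is an illustration of the idea rather than mongos's actual code, and the `mergeSorted` helper and sample documents are assumptions.

```javascript
// Merge per-shard result lists that are each already sorted ascending by `field`.
function mergeSorted(shardResults, field) {
  const positions = shardResults.map(() => 0); // next unread index per shard
  const merged = [];
  for (;;) {
    let best = -1; // index of the shard whose next document is smallest
    for (let i = 0; i < shardResults.length; i++) {
      if (positions[i] >= shardResults[i].length) continue; // shard exhausted
      if (
        best === -1 ||
        shardResults[i][positions[i]][field] < shardResults[best][positions[best]][field]
      ) {
        best = i;
      }
    }
    if (best === -1) return merged; // every shard is exhausted
    merged.push(shardResults[best][positions[best]]);
    positions[best] += 1;
  }
}

// Sorted results as three shards might return them (sample data).
const perShard = [
  [{ age: 21 }, { age: 40 }],
  [{ age: 18 }, { age: 33 }, { age: 52 }],
  [{ age: 25 }],
];

// Ages come out in order: 18, 21, 25, 33, 40, 52
console.log(mergeSorted(perShard, "age").map((d) => d.age));
```

The merge never re-sorts anything: it only compares the head of each shard's list, so the work on mongos stays small even when the result set is large.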
  57. Shard Key

  58. Shard Key
    •  Shard key is immutable
    •  Shard key values are immutable
    •  Shard key must be indexed
    •  Shard key limited to 512 bytes in size
    •  Shard key used to route queries
      –  Choose a field commonly used in queries
    •  Only shard key can be unique across shards
      –  `_id` field is only unique within individual shard
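The last point is easy to miss, so here is a small model of it. Each shard checks `_id` uniqueness only against its own documents, so two documents with the same `_id` but different shard key values can both be accepted. The `Shard` class, the two-shard layout, and the split at "M" are invented for the example, with `country` assumed as the shard key.

```javascript
// Each shard enforces uniqueness of _id only among its own documents.
class Shard {
  constructor(name) {
    this.name = name;
    this.docsById = new Map();
  }
  insert(doc) {
    if (this.docsById.has(doc._id)) {
      throw new Error("duplicate _id " + doc._id + " on " + this.name);
    }
    this.docsById.set(doc._id, doc);
  }
}

// Hypothetical two-shard cluster, split on the shard key `country` at "M".
const shardA = new Shard("shardA");
const shardB = new Shard("shardB");

function shardFor(doc) {
  return doc.country < "M" ? shardA : shardB;
}

function insert(doc) {
  shardFor(doc).insert(doc);
}

insert({ _id: 1, country: "DE" }); // stored on shardA
insert({ _id: 1, country: "US" }); // stored on shardB: same _id, no error
console.log(shardA.docsById.has(1) && shardB.docsById.has(1)); // true
```

When `_id` is not the shard key, keeping it unique across the whole cluster is therefore up to the application, for example by relying on generated ObjectId values.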
  59. Shard Key Considerations
    •  Cardinality
    •  Write distribution
    •  Query isolation
    •  Reliability
    •  Index Locality
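Write distribution is the consideration that is easiest to show with numbers. The sketch below counts how many inserts land in each chunk for two kinds of shard key values: a steadily increasing counter and values spread across the key space. The chunk boundaries, the sample keys, and the `writesPerChunk` helper are all made up for illustration.

```javascript
// Count how many inserts land in each chunk for a sequence of shard key values.
// boundaries [b0, b1, ...] define the chunks (-inf, b0), [b0, b1), ..., [bLast, +inf).
function writesPerChunk(boundaries, keys) {
  const counts = new Array(boundaries.length + 1).fill(0);
  for (const k of keys) {
    let i = 0;
    while (i < boundaries.length && k >= boundaries[i]) i++;
    counts[i] += 1;
  }
  return counts;
}

const boundaries = [250, 500, 750]; // four chunks

// An ever-increasing key, such as a counter that is already past the last boundary.
const increasing = Array.from({ length: 8 }, (_, n) => 1000 + n);
// Keys spread over the key space (a fixed list, so the result is repeatable).
const spread = [12, 910, 430, 288, 655, 777, 101, 540];

console.log(writesPerChunk(boundaries, increasing)); // every write hits the last chunk
console.log(writesPerChunk(boundaries, spread));     // writes are shared across chunks
```

With the increasing key, one chunk (and so one shard) absorbs all the write load no matter how many shards exist, which is the kind of hot spot a good shard key avoids.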
  60. Conclusion

  61. Conclusion
    •  Sharding Enables Scaling
    •  MongoDB’s Auto-Sharding
      –  Easy to Install
      –  Consistent
      –  Free and Open Source
    •  What’s next?
      –  Advanced Sharding Talk w/ Bernie Hackett (B5, 1:45PM)
      –  MongoDB User Group
      –  Sharding Best Practices Webinar (December 20th)
    •  Resources
      http://www.10gen.com/presentations
      http://github.com/brandonblack/presentations
  62. Thank You: Brandon Black (@brandonmblack), Software Engineer, 10gen. #mongosv