Practical Scaling and Sharding - Eliot Horowitz, 10gen

mongodb
October 05, 2011

MongoBoston 2011

With MongoDB, you can distribute load across multiple servers using auto-sharding. This session will introduce MongoDB's auto-sharding concepts and... how to use them. We'll discuss choosing a shard key, basic architecture concepts, and common usage patterns. We'll close out the session with a few example use cases, including real-world sharded deployments.

Transcript

  1. Scaling by Optimization
    • Schema Design
    • Index Design
    • Hardware Configuration
    Wednesday, October 5, 2011
  2. Horizontal Scaling
    • Vertical scaling is limited
    • Hard to scale vertically in the cloud
    • Can scale wider than higher
  3. Replica Sets
    • One master at any time
    • Programmer determines if read hits master or a slave
    • Easy to set up to scale reads
  4. db.people.find( { state : "NY" } ).addOption( DBQuery.Option.slaveOk )
    • routed to a secondary automatically
    • will use master if no secondary is available
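
The routing rule on this slide can be sketched in plain JavaScript. This is only an illustration of the described behavior, not driver code: `pickReadTarget`, the member objects, and the host names are hypothetical, and real drivers apply their own selection policy among secondaries.

```javascript
// Illustrative sketch of slaveOk read routing; not actual driver code.
// pickReadTarget, the member shape, and the host names are hypothetical.
function pickReadTarget(members, slaveOk) {
  const primary = members.find((m) => m.role === "primary" && m.healthy);
  if (!slaveOk) {
    return primary || null; // without slaveOk, reads go to the master only
  }
  const secondaries = members.filter(
    (m) => m.role === "secondary" && m.healthy
  );
  if (secondaries.length > 0) {
    return secondaries[0]; // any available secondary will do for this sketch
  }
  return primary || null; // fall back to the master if no secondary is available
}

const replicaSet = [
  { host: "db1", role: "primary", healthy: true },
  { host: "db2", role: "secondary", healthy: true },
  { host: "db3", role: "secondary", healthy: false },
];
```

Calling `pickReadTarget(replicaSet, true)` selects the healthy secondary, while passing `false` keeps the read on the master.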
  5. Not Enough
    • Writes don’t scale
    • Reads are out of date on slaves
    • RAM/Data Size doesn’t scale
  6. Why Shard?
    • Distribute write load
    • Keep working set in RAM
    • Consistent reads
    • Preserve functionality
  7. Sharding Design Goals
    • Scale linearly
    • Increase capacity with no downtime
    • Transparent to the application
    • Low administration to add capacity
  8. Sharding and Documents
    • Rich documents reduce need for joins
    • No joins makes sharding solvable
  9. Basics
    • Choose how you partition data
    • Convert from single replica set to sharding with no downtime
    • Full feature set
    • Fully consistent by default
  10. Architecture
    [Diagram: clients connect to mongos processes; the mongos processes sit in front of the shards (groups of mongod processes) and the config servers (mongod processes)]
  11. Typical Basic Setup
    [Diagram: a primary data center and a secondary data center; shards S1, S2 and S3, each with three members (two with p=1, one with p=0); three config servers; four mongos processes]
  12. Range Based
    • collection is broken into chunks by range
    • chunks default to 64 MB or 100,000 objects
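
Range-based chunking can be pictured as a sorted table of key ranges, each owned by one shard. The sketch below is a simplified model, not MongoDB's internal implementation: the chunk table, the shard names, and the helpers `findChunk`, `needsSplit` and `splitChunk` are all hypothetical, and the thresholds are simply the defaults quoted on the slide.

```javascript
// Simplified model of range-based chunks; all names here are hypothetical.
// Each chunk covers [min, max); null means unbounded on that side.
const chunks = [
  { min: null, max: "H", shard: "shard0" },
  { min: "H", max: "P", shard: "shard1" },
  { min: "P", max: null, shard: "shard2" },
];

// Defaults quoted on the slide.
const MAX_CHUNK_BYTES = 64 * 1024 * 1024;
const MAX_CHUNK_DOCS = 100000;

// Find the chunk whose range contains the given shard key value.
function findChunk(chunkTable, key) {
  return chunkTable.find(
    (c) => (c.min === null || key >= c.min) && (c.max === null || key < c.max)
  );
}

// Assumption: a chunk is a split candidate once it exceeds either default.
function needsSplit(stats) {
  return stats.bytes > MAX_CHUNK_BYTES || stats.docs > MAX_CHUNK_DOCS;
}

// Split one chunk into two at splitKey; both halves stay on the same shard.
function splitChunk(chunkTable, index, splitKey) {
  const c = chunkTable[index];
  const left = { min: c.min, max: splitKey, shard: c.shard };
  const right = { min: splitKey, max: c.max, shard: c.shard };
  return [
    ...chunkTable.slice(0, index),
    left,
    right,
    ...chunkTable.slice(index + 1),
  ];
}
```

With this table, a document keyed "NY" falls in the chunk covering "H" up to "P"; after splitting that chunk at "M", the same key falls in the new right-hand chunk.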
  13. Choosing a Shard Key
    • Shard key determines how data is partitioned
    • Hard to change
    • Most important performance decision
  14. Use Case: Photos
    { photo_id : ???? , data : <binary> }
    What’s the right key?
    • auto increment
    • MD5( data )
    • month() + MD5( data )
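
To make the three candidates concrete, the sketch below generates each kind of key in Node.js using the built-in `crypto` module. The helper names and the "YYYY-MM-" prefix format are assumptions made for illustration; the slide does not say how the month and the hash are combined. As a general observation, an auto-increment key always grows, so new inserts land at the end of the key range, while an MD5 key spreads inserts across the whole range, and a month prefix groups documents from the same period together.

```javascript
const crypto = require("crypto");

// Hex MD5 digest of the photo bytes.
function md5Hex(data) {
  return crypto.createHash("md5").update(data).digest("hex");
}

// Candidate 1: auto increment (an in-process counter, for illustration only).
let counter = 0;
function autoIncrementKey() {
  counter += 1;
  return counter;
}

// Candidate 2: MD5( data ).
function md5Key(data) {
  return md5Hex(data);
}

// Candidate 3: month() + MD5( data ).
// The "YYYY-MM-" prefix format is an assumption, not taken from the slide.
function monthMd5Key(data, date) {
  const month = date.toISOString().slice(0, 7); // e.g. "2011-10"
  return month + "-" + md5Hex(data);
}
```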
  15. Initial Loading
    • System starts with 1 chunk
    • Writes will hit 1 shard and then move
    • Pre-splitting for initial bulk loading can dramatically improve bulk load time
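
One way to pre-split is to compute evenly spaced split points over the expected key space before loading. The sketch below assumes a hexadecimal shard key such as an MD5 digest; the namespace `photos.images`, the field `photo_id`, and both helper functions are hypothetical. It only builds command documents shaped like the `split` admin command with a `middle` point; in the mongo shell each one would be passed to `db.adminCommand(...)` through a mongos.

```javascript
// Evenly spaced split points over a hex key space (e.g. an MD5-based key).
// width is the number of hex digits used for each split point.
function hexSplitPoints(numChunks, width = 2) {
  const space = Math.pow(16, width);
  const points = [];
  for (let i = 1; i < numChunks; i++) {
    const value = Math.floor((i * space) / numChunks);
    points.push(value.toString(16).padStart(width, "0"));
  }
  return points;
}

// Build one `split` command document per split point.
// The namespace and key field passed in are hypothetical examples.
function splitCommands(namespace, keyField, points) {
  return points.map((p) => ({ split: namespace, middle: { [keyField]: p } }));
}

const commands = splitCommands("photos.images", "photo_id", hexSplitPoints(4));
```

Four chunks need three split points, so `commands` holds three documents. The resulting empty chunks still have to be spread across shards before the load starts, which this sketch does not cover.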
  16. Administering a Cluster
    • Do not wait too long to add capacity
    • Need capacity for normal workload + cost of moving data
    • Stay < 70% operational capacity
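
The 70% guideline turns into simple arithmetic. The sketch below assumes that load and per-shard capacity are measured in the same unit (for example operations per second or GB of working set); the function names and the sample numbers are hypothetical.

```javascript
// Fraction of total capacity in use across the cluster.
function utilization(totalLoad, perShardCapacity, shardCount) {
  return totalLoad / (perShardCapacity * shardCount);
}

// Smallest shard count that keeps utilization at or below the target.
function shardsNeeded(totalLoad, perShardCapacity, targetUtilization = 0.7) {
  return Math.ceil(totalLoad / (perShardCapacity * targetUtilization));
}
```

For a load of 1000 units on shards rated at 400 units each, three shards would run at about 83% and four at 62.5%, so `shardsNeeded(1000, 400)` asks for four. The remaining headroom is what absorbs the cost of moving data when a shard is added.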
  17. Hardware Considerations
    • Understand working set and make sure it can fit in RAM
    • Choose appropriately sized boxes for shards
    • Too small and admin/overhead goes up
    • Too large, and you can’t add capacity smoothly
  18. Download MongoDB
    http://www.mongodb.org
    and let us know what you think
    @eliothorowitz @mongodb
    10gen is hiring! http://www.10gen.com/jobs
  19. Use Case: User Profiles
    { email : "[email protected]" , addresses : [ { state : "NY" } ] }
    • Shard by email
    • Lookup by email hits 1 node
    • Index on { "addresses.state" : 1 }
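
The difference between a lookup by email and a lookup by state can be sketched as a routing decision. This is a simplified model of what a router does, not mongos code: the chunk ranges, shard names, sample address, and `targetShards` are hypothetical. A query that includes the shard key can be sent to the one shard owning that range; a query without it has to go to every shard, where the index on "addresses.state" keeps the per-shard work cheap.

```javascript
// Hypothetical chunk ranges on the shard key "email": [min, max), null = unbounded.
const emailChunks = [
  { min: null, max: "h", shard: "shard0" },
  { min: "h", max: "p", shard: "shard1" },
  { min: "p", max: null, shard: "shard2" },
];

// Return the list of shards a query must visit.
function targetShards(chunkTable, query, shardKey) {
  const allShards = [...new Set(chunkTable.map((c) => c.shard))];
  if (!(shardKey in query)) {
    return allShards; // no shard key in the query: ask every shard
  }
  const key = query[shardKey];
  const chunk = chunkTable.find(
    (c) => (c.min === null || key >= c.min) && (c.max === null || key < c.max)
  );
  return [chunk.shard]; // shard key present: one shard owns this range
}
```

Here `targetShards(emailChunks, { email: "kim@example.com" }, "email")` returns a single shard, while a query on `"addresses.state"` alone returns all three.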
  20. Use Case: Activity Stream
    { user_id : XXX, event_id : YYY , data : ZZZ }
    • Shard by user_id
    • Looking up an activity stream hits 1 node
    • Writing events is distributed
    • Index on { "event_id" : 1 } for deletes