How and When to Scale MongoDB with Sharding - Kyle Banker, 10gen

mongodb
February 13, 2012

MongoDB Boulder 2012

Sharding allows you to distribute load across multiple servers and keep your data balanced across those servers. This session will review MongoDB’s support for auto-sharding, including an architectural overview, usage patterns, and a few example use cases of real-world sharded deployments.

Transcript

  1. Horizontal Scaling • Vertical scaling is limited • Hard to scale vertically in the cloud • Can scale wider than higher
  2. Replica Sets • One master at any time • Programmer determines if read hits master or a slave • Easy to scale reads
  3. Replica Sets - Read Scaling • slaveOK = 4 • db.people.find( { state : "NY" } ).addOption( slaveOK ) • routed to a secondary automatically • will use master if no secondary is available
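The routing rule on the read-scaling slide can be sketched in plain JavaScript. This is a toy model under stated assumptions, not driver code: `pickReadTarget` and the member objects are hypothetical names, and a real driver would also weigh member health and latency.

```javascript
// Toy model of slaveOK read routing (hypothetical helper, not a driver API).
// members: [{ host, role: "primary" | "secondary", up: boolean }]
function pickReadTarget(members, slaveOk) {
  const live = members.filter((m) => m.up);
  const primary = live.find((m) => m.role === "primary");
  if (!slaveOk) {
    return primary || null; // without slaveOK, reads go to the master only
  }
  const secondary = live.find((m) => m.role === "secondary");
  // With slaveOK set, prefer a secondary and fall back to the master.
  return secondary || primary || null;
}

// Hypothetical three-member replica set with one secondary down.
const replicaSet = [
  { host: "a:27017", role: "primary", up: true },
  { host: "b:27017", role: "secondary", up: true },
  { host: "c:27017", role: "secondary", up: false },
];

console.log(pickReadTarget(replicaSet, true).host);  // b:27017
console.log(pickReadTarget(replicaSet, false).host); // a:27017
```

The fallback branch mirrors the slide's last bullet: if every secondary is unavailable, a slaveOK read is still served by the master.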
  4. Not Enough • Writes don’t scale • Depending on write load, reads might not even scale • Reads are out of date on secondaries • “Eventually consistent” • RAM/Data Size doesn’t scale
  5. Why Shard? • Distribute write load • Keep working set in RAM • Consistent reads • Preserve functionality
  6. Sharding Design Goals • Scale linearly • Increase capacity with no downtime • Transparent to the application • Low administration to add capacity • Simplify by avoiding joins and transactions • BigTable / PNUTS inspired
  7. Basics • Choose how you partition data • Convert from single replica set to sharding with no downtime • Full feature set • Fully consistent by default
  8. Architecture [diagram: clients connect to mongos routers (mongos ... mongos); shards are groups of mongod processes; config servers are mongod processes]
  9. Typical Basic Setup [diagram: a primary and a secondary data center hosting members of shards S1, S2, and S3, each labeled p=1 or p=0, plus config servers and mongos routers]
  10. Range Based • collection is broken into chunks by range • chunks default to 64 MB or 100,000 objects
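Range-based partitioning as described on this slide can be illustrated with a small sketch. It assumes numeric shard keys and three made-up shards; `findChunk`, `shouldSplit`, and `splitChunk` are hypothetical helpers, and only the 64 MB default comes from the slide.

```javascript
// Toy model of range-based partitioning (names and shards are illustrative).
// Each chunk covers [min, max) of the shard key and lives on one shard.
const chunks = [
  { min: -Infinity, max: 100, shard: "shard0" },
  { min: 100, max: 200, shard: "shard1" },
  { min: 200, max: Infinity, shard: "shard2" },
];

// Default size threshold from the slide: 64 MB.
const MAX_CHUNK_BYTES = 64 * 1024 * 1024;

function findChunk(chunks, key) {
  return chunks.find((c) => key >= c.min && key < c.max);
}

function shouldSplit(chunkBytes, maxBytes = MAX_CHUNK_BYTES) {
  return chunkBytes > maxBytes;
}

// Split a chunk at a key inside its range. Both halves stay on the same
// shard until a balancer (not modeled here) moves one of them.
function splitChunk(chunks, chunk, splitKey) {
  if (!(splitKey > chunk.min && splitKey < chunk.max)) {
    throw new RangeError("split key must fall strictly inside the chunk");
  }
  const i = chunks.indexOf(chunk);
  const left = { min: chunk.min, max: splitKey, shard: chunk.shard };
  const right = { min: splitKey, max: chunk.max, shard: chunk.shard };
  return [...chunks.slice(0, i), left, right, ...chunks.slice(i + 1)];
}

console.log(findChunk(chunks, 150).shard); // shard1
const afterSplit = splitChunk(chunks, findChunk(chunks, 150), 150);
console.log(afterSplit.length); // 4
```

Because every key maps to exactly one range, a lookup by shard key only needs to visit the shard that owns that range.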
  11. Choosing a Shard Key • Shard key determines how data is partitioned • Immutable • Most important performance decision
  12. Use Case: Photos { photo_id : ???? , data : <binary> } What’s the right key? • auto increment • MD5( data ) • month() + MD5( data ) • user_id + MD5( data )
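One way to compare the four candidate keys is to check whether consecutive inserts produce ascending key values. With range-based chunks, an always-ascending key sends every new insert to the chunk holding the highest range. The sketch below runs under Node.js and uses its built-in `crypto` module; the field names (`seq`, `userId`, `month`), the `:` separator, and the sample photos are assumptions made for illustration.

```javascript
const crypto = require("crypto");

function md5Hex(data) {
  return crypto.createHash("md5").update(data).digest("hex");
}

// Candidate photo_id builders from the slide. Field layout is an assumption.
const candidates = {
  autoIncrement: (photo) => String(photo.seq).padStart(10, "0"),
  md5: (photo) => md5Hex(photo.data),
  monthPlusMd5: (photo) => photo.month + ":" + md5Hex(photo.data),
  userPlusMd5: (photo) => photo.userId + ":" + md5Hex(photo.data),
};

// True when each key is >= the previous one, i.e. inserts always land
// at the top of the key range.
function isAscending(keys) {
  return keys.every((k, i) => i === 0 || keys[i - 1] <= k);
}

// Hypothetical sample: 50 photos from 5 users, all uploaded in one month.
const photos = Array.from({ length: 50 }, (_, i) => ({
  seq: i + 1,
  userId: "user" + (i % 5),
  month: "2012-02",
  data: "photo-bytes-" + i,
}));

for (const [name, makeKey] of Object.entries(candidates)) {
  const keys = photos.map(makeKey);
  console.log(name, "ascending inserts:", isAscending(keys));
}
```

In this sample only the auto-increment key is ascending. The month prefix spreads writes within the current month's range, and the user prefix keeps each user's photos adjacent in the key space.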
  13. Initial Loading • System starts with 1 chunk • Writes will hit 1 shard and then move • Pre-splitting for initial bulk loading can dramatically improve bulk load time
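Pre-splitting means choosing chunk boundaries before the bulk load starts. The sketch below computes evenly spaced boundaries and wraps them in `split` admin command documents (the `middle` form). It assumes the shard key is a uniformly distributed hex string such as an MD5; the namespace `photos.images` and the key name `photo_id` are hypothetical, and the commands are only built here, not sent to a cluster.

```javascript
// Evenly spaced split points over a two-hex-digit key prefix ("00".."ff").
// Assumes the shard key is a uniformly distributed hex string such as an MD5.
function hexSplitPoints(chunkCount) {
  if (!Number.isInteger(chunkCount) || chunkCount < 2 || chunkCount > 256) {
    throw new RangeError("chunkCount must be an integer from 2 to 256");
  }
  const points = [];
  for (let i = 1; i < chunkCount; i++) {
    const boundary = Math.floor((i * 256) / chunkCount);
    points.push(boundary.toString(16).padStart(2, "0"));
  }
  return points;
}

// Build `split` admin command documents. Namespace and key name are hypothetical.
function splitCommands(namespace, keyField, points) {
  return points.map((p) => ({ split: namespace, middle: { [keyField]: p } }));
}

const splitPoints = hexSplitPoints(4);
console.log(splitPoints); // [ '40', '80', 'c0' ]
console.log(splitCommands("photos.images", "photo_id", splitPoints)[0]);
```

Each generated document would be issued as an admin command through a mongos before loading, so the first writes spread across shards instead of all hitting the single initial chunk.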
  14. Administering a Cluster • Do not wait too long to add capacity • Need capacity for normal workload + cost of moving data • Stay < 70% operational capacity
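The 70% guideline can be turned into a simple check. This is a minimal sketch: `shardsNeedingHeadroom` is a hypothetical helper, and the `used` and `capacity` figures are made-up numbers in whatever unit you track (for example GB or operations per second).

```javascript
// Flag shards at or above a utilization ceiling (0.7 mirrors the slide's "< 70%").
function shardsNeedingHeadroom(shards, ceiling = 0.7) {
  return shards
    .filter((s) => s.used / s.capacity >= ceiling)
    .map((s) => s.name);
}

// Hypothetical cluster snapshot.
const cluster = [
  { name: "shard0", used: 50, capacity: 100 },
  { name: "shard1", used: 72, capacity: 100 },
  { name: "shard2", used: 70, capacity: 100 },
];

console.log(shardsNeedingHeadroom(cluster)); // [ 'shard1', 'shard2' ]
```

The ceiling leaves room for the extra load of moving data while new capacity is being brought online.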
  15. Hardware Considerations • Understand working set and make sure it can fit in RAM • Choose appropriately sized boxes for shards • Too small and admin/overhead goes up • Too large, and you can’t add capacity smoothly
  16. Use Case: User Profiles { email : "[email protected]" , addresses : [ { state : "NY" } ] } • Shard by email • Lookup by email hits 1 node • Index on { "addresses.state" : 1 }
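The difference between a lookup by shard key and a lookup by another field can be shown with a toy router. Everything here is illustrative: the chunk boundaries, shard names, the address `kim@example.com`, and `shardsForQuery` are assumptions, and the model only targets equality matches on the shard key.

```javascript
// Toy router: chunks are ranges over the shard key (email), one shard each.
const emailChunks = [
  { min: "", max: "h", shard: "shard0" },
  { min: "h", max: "p", shard: "shard1" },
  { min: "p", max: "\uffff", shard: "shard2" },
];

// Return the shards a query must visit. In this toy model only an equality
// match on the shard key is targeted; anything else goes to every shard.
function shardsForQuery(chunks, shardKey, query) {
  const value = query[shardKey];
  if (typeof value === "string") {
    const hit = chunks.find((c) => value >= c.min && value < c.max);
    return hit ? [hit.shard] : [];
  }
  return [...new Set(chunks.map((c) => c.shard))];
}

console.log(shardsForQuery(emailChunks, "email", { email: "kim@example.com" }));
// [ 'shard1' ]
console.log(shardsForQuery(emailChunks, "email", { "addresses.state": "NY" }));
// [ 'shard0', 'shard1', 'shard2' ]
```

A query on `addresses.state` carries no shard key, so it visits every shard, and each shard can then answer from its local index on that field.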
  17. Use Case: Activity Stream { user_id : XXX, event_id : YYY , data : ZZZ } • Shard by user_id • Looking up an activity stream hits 1 node • Writing is distributed • Index on { "event_id" : 1 } for deletes
  18. Download MongoDB • http://www.mongodb.org and let us know what you think • @hwaet @mongodb • 10gen is hiring! http://www.10gen.com/jobs