MongoLA 2011 - Scaling with MongoDB

erh

January 18, 2011

Transcript

  1. Horizontal Scaling
     • Vertical scaling is limited
     • Hard to scale vertically in the cloud
     • Can scale wider than higher

  2. Indexes
     • Index common queries
     • Avoid duplicates: with a compound index on (A, B), a separate index on (A) isn't needed
     • Right-balanced indexes keep the working set small
     • Understand Covered Indexes (example below)

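     A minimal mongo shell sketch (ensureIndex was the index-creation
     helper of this era; collection and field names are illustrative):

        // A compound index on (a, b) also serves queries on a alone,
        // so a separate index on (a) is redundant.
        db.users.ensureIndex({ a: 1, b: 1 })

        db.users.find({ a: 42 })                    // served by { a: 1, b: 1 }
        db.users.find({ a: 42 }, { _id: 0, b: 1 })  // covered: answered from the index alone
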
  3. RAM Requirements
     • Understand your working set
     • What percentage of your data has to fit in RAM?
     • How do you figure this out? (one approach below)

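     One rough way to start, in the mongo shell: compare data plus index
     size against available RAM. This is an upper bound on the working
     set; the truly hot subset is usually smaller.

        // If dataSize + indexSize fit in RAM, the working set does too.
        var s = db.stats()
        print("data MB:  " + (s.dataSize  / 1024 / 1024))
        print("index MB: " + (s.indexSize / 1024 / 1024))
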
  4. Hardware
     • Disk performance
     • How many drives?
     • What about EC2?
     • Network performance

  5. Read Scaling
     • One master at any time
     • The programmer determines whether a read hits the master or a slave (shell example below)
     • Pro: easy to set up, can scale reads very well
     • Con: reads from a slave can be inconsistent
     • Writes don't scale

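     A minimal shell sketch of opting one connection into slave reads
     (the query values are illustrative):

        // Opt this connection into reads from a slave; results may be stale.
        db.getMongo().setSlaveOk()
        db.users.find({ state: "NY" })   // may now be served by a slave
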
  6. One Master, Many Slaves
     • Custom master/slave setup (startup sketch below)
     • Have as many slaves as you want
     • Can put them local to application servers
     • Good for 90%+ read-heavy applications (e.g. Wikipedia)

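     A sketch of the startup flags for such a master/slave pair (host
     names and data paths are illustrative):

        # the single master
        mongod --master --dbpath /data/master

        # a slave, possibly local to an application server
        mongod --slave --source master.example.com:27017 --dbpath /data/slave
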
  7. Replica Sets
     • High Availability Cluster (setup sketch below)
     • One master at any time, up to 6 slaves
     • A slave is automatically promoted to master on failure
     • Drivers support automatic routing of reads to slaves if the programmer allows it
     • Good for applications that need high write availability but are mostly reads (e.g. a commenting system)

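     A minimal shell sketch of initiating a 3-member set (the set name
     and hosts are illustrative):

        // Initiate a replica set; one member is elected master (PRIMARY),
        // the rest become slaves (SECONDARY).
        rs.initiate({
          _id: "rs0",
          members: [
            { _id: 0, host: "db1.example.com:27017" },
            { _id: 1, host: "db2.example.com:27017" },
            { _id: 2, host: "db3.example.com:27017" }
          ]
        })
        rs.status()   // confirm the election succeeded
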
  8. Sharding
     • Many masters, even more slaves
     • Can scale reads and writes in two dimensions
     • Add slaves for inconsistent read scaling and redundancy
     • Add shards for write and data-size scaling (shell commands below)

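     The basic commands, run through a mongos (database, collection, and
     key are illustrative):

        // Turn on sharding for a database, then shard one collection on a key.
        db.adminCommand({ enableSharding: "mydb" })
        db.adminCommand({ shardCollection: "mydb.users", key: { email: 1 } })
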
  9. Architecture
     [Diagram: clients connect through one or more mongos routers to the
     shards (each shard made up of several mongod processes), with three
     mongod config servers holding the cluster metadata]

  10. Common Setup
      • Typical setup is 3 shards with 3 servers per shard: 3 masters, 6 slaves
      • Often one massive sharded collection and a dozen non-sharded ones
      • Can add sharding later to an existing replica set with no downtime (see below)
      • Can have sharded and non-sharded collections side by side

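     Adding a replica set as a shard looks roughly like this, run against
     a mongos (set and host names are illustrative):

        // Each shard is a replica set; repeat for rs1, rs2, ...
        db.adminCommand({ addShard: "rs0/db1.example.com:27017" })
        db.printShardingStatus()   // verify shards, databases, and chunks
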
  11. Choosing a Shard Key
      • The shard key determines how data is partitioned
      • Hard to change
      • Most important performance decision

  12. Range Based
      • A collection is broken into chunks by range (see the table and split example below)
      • Chunks default to 200 MB or 100,000 objects

      MIN   MAX   LOCATION
      A     F     shard1
      F     M     shard1
      M     R     shard2
      R     Z     shard3

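     Chunk boundaries like the ones above can be inspected, or created by
     hand, from the shell (namespace and split point are illustrative):

        // Split the chunk containing { email: "m" } at that value,
        // producing ranges like [MIN, "m") and ["m", MAX).
        db.adminCommand({ split: "mydb.users", middle: { email: "m" } })
        db.printShardingStatus()   // lists each chunk's min, max, and shard
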
  13. Use Case: User Profiles

      { email : "[email protected]",
        addresses : [ { state : "NY" } ] }

      • Shard by email (sketch below)
      • Lookup by email hits 1 node
      • Index on { "addresses.state" : 1 }

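     A sketch of this setup in the shell (the database name and example
     values are illustrative; a pre-existing collection needs an index on
     the shard key before it can be sharded):

        db.adminCommand({ shardCollection: "mydb.users", key: { email: 1 } })
        db.users.ensureIndex({ "addresses.state": 1 })

        db.users.find({ email: "[email protected]" })  // shard key: routed to 1 node
        db.users.find({ "addresses.state": "NY" })      // no shard key: asks every shard
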
  14. Use Case: Activity Stream

      { user_id : XXX, event_id : YYY, data : ZZZ }

      • Shard by user_id (sketch below)
      • Looking up an activity stream hits 1 node
      • Writing events is distributed across shards
      • Index on { "event_id" : 1 } for deletes

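     A sketch in the shell (names and values are illustrative):

        db.adminCommand({ shardCollection: "mydb.events", key: { user_id: 1 } })
        db.events.ensureIndex({ event_id: 1 })   // supports deletes by event_id

        db.events.find({ user_id: 42 })          // one user's stream: 1 node
        db.events.remove({ event_id: 1234 })     // served by the event_id index
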
  15. Use Case: Photos

      { photo_id : ???? , data : <binary> }

      What's the right key? (candidates computed below)
      • auto increment
      • MD5( data )
      • now() + MD5(data)
      • month() + MD5(data)

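     The candidate keys can be computed in the shell; hex_md5() is a shell
     built-in, and the photo bytes are stubbed out as a string here:

        var data = "<photo bytes>"                                       // illustrative stub
        var byHash  = hex_md5(data)                                      // MD5(data)
        var byTime  = new Date().getTime() + "_" + hex_md5(data)         // now() + MD5(data)
        var byMonth = (new Date().getMonth() + 1) + "_" + hex_md5(data)  // month() + MD5(data)
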
  16. Use Case: Logging

      { machine : "app.foo.com",
        app : "apache",
        when : "2010-12-02:11:33:14",
        data : XXX }

      Possible shard keys (compound example below):
      • { machine : 1 }
      • { when : 1 }
      • { machine : 1 , app : 1 }
      • { app : 1 }

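     For instance, sharding on the compound candidate (the database name
     is illustrative):

        // { machine: 1, app: 1 } keeps one machine's logs together while
        // still allowing splits between apps on a busy machine.
        db.adminCommand({ shardCollection: "mydb.logs", key: { machine: 1, app: 1 } })
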
  17. Past Releases
      • First release - February 2009
      • v1.0 - August 2009
      • v1.2 - December 2009 - Map/Reduce, lots of small things
      • v1.4 - March 2010 - Concurrency/Geo
      • v1.6 - August 2010 - Sharding/Replica Sets

  18. Short List
      • Better Aggregation
      • Full Text Search
      • TTL timeout collections
      • Concurrency
      • Compaction

  19. Download MongoDB at http://www.mongodb.org and let us know what you think.
      @eliothorowitz  @mongodb
      10gen is hiring! http://www.10gen.com/jobs