How and When to Scale MongoDB with Sharding - Kyle Banker, 10gen

How and When to Scale MongoDB with Sharding
Kyle Banker, @hwaet
MongoBoulder, February 1, 2012

Scaling by Optimization
• Schema Design
• Index Design
• Hardware Configuration

Horizontal Scaling
• Vertical scaling is limited
• Hard to scale vertically in the cloud
• Can scale wider than higher

Replica Sets
• One master at any time
• Programmer determines if read hits master or a slave
• Easy to scale reads

Replica Sets - Read Scaling
slaveOK = 4
db.people.find( { state : "NY" } ).addOption( slaveOK )
• Routed to a secondary automatically
• Will use master if no secondary is available
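
The routing rule on this slide can be sketched in plain JavaScript. This is a toy model, not driver code: `pickReadTarget` and the member objects are hypothetical names introduced only for illustration.

```javascript
// Toy model of slaveOK read routing; not the driver's actual implementation.
// Members are hypothetical objects of the form { host, role, up }.
function pickReadTarget(members, slaveOk) {
  const alive = members.filter((m) => m.up);
  const master = alive.find((m) => m.role === "master");
  // Without slaveOK, reads always go to the master.
  if (!slaveOk) return master || null;
  const secondary = alive.find((m) => m.role === "secondary");
  // With slaveOK, prefer a secondary; fall back to the master if none is up.
  return secondary || master || null;
}

const members = [
  { host: "a:27017", role: "master", up: true },
  { host: "b:27017", role: "secondary", up: true },
];
```

Calling `pickReadTarget(members, true)` selects the secondary while it is up, and the master otherwise.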

Not Enough
• Writes don’t scale
• Depending on write load, reads might not even scale
• Reads are out of date on secondaries
• “Eventually consistent”
• RAM/Data Size doesn’t scale

Why Shard?
• Distribute write load
• Keep working set in RAM
• Consistent reads
• Preserve functionality

Sharding Design Goals
• Scale linearly
• Increase capacity with no downtime
• Transparent to the application
• Low administration to add capacity
• Simplify by avoiding joins and transactions
• BigTable / PNUTS inspired

Sharding and Documents
• Rich documents reduce need for joins

Basics
• Choose how you partition data
• Convert from single replica set to sharding with no downtime
• Full feature set
• Fully consistent by default

Architecture
[Diagram: clients connect to mongos routers; the mongos processes sit in front of the shards (groups of mongod processes) and the config servers (mongod).]

Typical Basic Setup
[Diagram: a primary and a secondary data center hosting three shards (S1, S2, S3), each with two members at p=1 and one at p=0, plus config servers and four mongos routers.]

Range Based
• Collection is broken into chunks by range
• Chunks default to 64 MB or 100,000 objects
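
Range-based partitioning can be pictured with a small in-memory model. This is a sketch only: the chunk bounds, shard names, and the helpers `findChunk` and `splitChunk` are made up for illustration and are not MongoDB internals.

```javascript
// Toy model of range-based chunks. Each chunk covers shard-key values in [min, max).
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 5000, shard: "shard1" },
  { min: 5000, max: Infinity, shard: "shard2" },
];

// Find the chunk whose range contains the given shard-key value.
function findChunk(chunkList, key) {
  return chunkList.find((c) => key >= c.min && key < c.max);
}

// Split the chunk that strictly contains `key` into two chunks at that key.
// Both halves stay on the same shard until a migration moves one of them.
function splitChunk(chunkList, key) {
  const i = chunkList.findIndex((c) => key > c.min && key < c.max);
  if (i === -1) return chunkList; // key is already a boundary
  const c = chunkList[i];
  return [
    ...chunkList.slice(0, i),
    { min: c.min, max: key, shard: c.shard },
    { min: key, max: c.max, shard: c.shard },
    ...chunkList.slice(i + 1),
  ];
}
```

In this model a chunk that grows past the size limit would be split with `splitChunk`, after which either half can move to another shard.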

Choosing a Shard Key
• Shard key determines how data is partitioned
• Immutable
• Most important performance decision

Use Case: Photos
{ photo_id : ???? , data : <binary> }
What’s the right key?
• auto increment
• MD5( data )
• month() + MD5( data )
• user_id + MD5( data )
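
To make the four candidates concrete, the sketch below builds each one for a single photo using Node's built-in `crypto` module. The input fields `seq`, `userId`, and `uploadedAt`, and the way the parts are concatenated, are assumptions made for illustration; the slide does not specify them.

```javascript
const crypto = require("crypto");

// Hex MD5 digest of a string or Buffer.
function md5Hex(data) {
  return crypto.createHash("md5").update(data).digest("hex");
}

// Build the four candidate photo_id values from the slide for one photo.
// `photo.seq`, `photo.userId`, and `photo.uploadedAt` are hypothetical inputs.
function candidateKeys(photo) {
  const hash = md5Hex(photo.data);
  const month = String(photo.uploadedAt.getUTCMonth() + 1).padStart(2, "0");
  return {
    autoIncrement: photo.seq,           // strictly increasing counter
    md5: hash,                          // content hash only
    monthMd5: month + hash,             // month prefix, then content hash
    userMd5: photo.userId + ":" + hash, // owner prefix, then content hash
  };
}
```

The slide leaves the question open. As a general observation rather than the speaker's stated answer: a strictly increasing key sends every insert to the highest range, while a hash component spreads inserts across ranges.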

Initial Loading
• System starts with 1 chunk
• Writes will hit 1 shard and then move
• Pre-splitting for initial bulk loading can dramatically improve bulk load time
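
Pre-splitting needs a list of split points chosen before the load begins. The sketch below computes evenly spaced points for a shard key whose values begin with hex digits, such as an MD5 string. The helper name `hexSplitPoints` and the four-digit prefix width are assumptions for illustration.

```javascript
// Evenly spaced split points over the 4-hex-digit prefix space 0000..ffff.
// `n` is the number of chunks wanted, so n - 1 points are returned.
function hexSplitPoints(n) {
  const points = [];
  for (let i = 1; i < n; i++) {
    const v = Math.floor((i * 0x10000) / n);
    points.push(v.toString(16).padStart(4, "0"));
  }
  return points;
}

console.log(hexSplitPoints(4)); // [ '4000', '8000', 'c000' ]
```

In the mongo shell, each point would typically be passed to the `split` admin command as the `middle` value for the shard key; check the documentation for your server version before relying on that.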

Administering a Cluster
• Do not wait too long to add capacity
• Need capacity for normal workload + cost of moving data
• Stay < 70% operational capacity
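
The 70% guideline can be turned into a simple check. The sketch below is an assumption-laden illustration: the function names are hypothetical, and "load" and "capacity" stand for whatever unit you measure (operations per second, disk, RAM).

```javascript
// Rule of thumb from the slide: stay below 70% of operational capacity.
const MAX_UTILIZATION_PCT = 70;

// True when a shard has reached or passed the threshold and more capacity is due.
function overThreshold(used, capacity) {
  return used * 100 >= capacity * MAX_UTILIZATION_PCT;
}

// Smallest number of equal shards that carries `totalLoad` within the threshold,
// leaving the remaining headroom for chunk migrations.
function shardsNeeded(totalLoad, capacityPerShard) {
  return Math.ceil((totalLoad * 100) / (capacityPerShard * MAX_UTILIZATION_PCT));
}
```

For example, a total load of 2,100 units on shards that each handle 1,000 units works out to three shards, since each shard should carry no more than 700.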

Hardware Considerations
• Understand working set and make sure it can fit in RAM
• Choose appropriately sized boxes for shards
• Too small and admin/overhead goes up
• Too large, and you can’t add capacity smoothly
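
The box-size trade-off can be seen with some rough arithmetic. The sketch below uses illustrative numbers only (a 256 GB working set and three hypothetical RAM sizes); `sizingOptions` is a made-up helper, not a sizing tool.

```javascript
// For each candidate box size, estimate how many shards are needed to hold the
// working set in RAM, and how big a step adding one more box represents.
function sizingOptions(workingSetGb, boxRamGbChoices) {
  return boxRamGbChoices.map((ramGb) => {
    const shards = Math.ceil(workingSetGb / ramGb);
    return {
      ramGb,
      shards, // more shards means more machines to administer
      growthStepPct: Math.round(100 / shards), // capacity gained by one extra box
    };
  });
}

const options = sizingOptions(256, [16, 64, 128]);
```

With these numbers, 16 GB boxes need 16 shards and grow in steps of about 6%, while 128 GB boxes need only 2 shards but grow in steps of 50%.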

Use Case: User Profiles
{ email : "[email protected]" , addresses : [ { state : "NY" } ] }
• Shard by email
• Lookup by email hits 1 node
• Index on { "addresses.state" : 1 }
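
Why a lookup by email hits one node while a lookup by state does not can be shown with a toy router. Everything here is hypothetical: the shard names, the `locateByEmail` range rule, and the example address are placeholders for illustration.

```javascript
const shards = ["shard0", "shard1"];

// Hypothetical range rule: emails sorting before "n" live on shard0.
const locateByEmail = (email) => (email < "n" ? "shard0" : "shard1");

// A query that includes the shard key can be sent to a single shard.
// Any other query is sent to every shard, where a local index can still help.
function shardsForQuery(query, shardKeyField, allShards, locate) {
  if (Object.prototype.hasOwnProperty.call(query, shardKeyField)) {
    return [locate(query[shardKeyField])];
  }
  return allShards.slice();
}

const byEmail = shardsForQuery({ email: "alice@example.com" }, "email", shards, locateByEmail);
const byState = shardsForQuery({ "addresses.state": "NY" }, "email", shards, locateByEmail);
```

Here `byEmail` names one shard and `byState` names both, which is why the state query relies on the `addresses.state` index being present on each shard.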

Use Case: Activity Stream
{ user_id : XXX, event_id : YYY , data : ZZZ }
• Shard by user_id
• Looking up an activity stream hits 1 node
• Writing is distributed
• Index on { "event_id" : 1 } for deletes
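
A small simulation shows both properties at once: writes from different users spread across shards, while each user's stream stays on one shard. The `user_id` ranges, shard names, and sample events are invented for illustration.

```javascript
// Hypothetical chunk ranges on user_id; each covers [min, max).
const ranges = [
  { min: 0, max: 100, shard: "shard0" },
  { min: 100, max: 200, shard: "shard1" },
  { min: 200, max: Infinity, shard: "shard2" },
];

const shardFor = (userId) =>
  ranges.find((r) => userId >= r.min && userId < r.max).shard;

// Group events by the shard that owns each event's user_id.
function placeEvents(events) {
  const byShard = {};
  for (const e of events) {
    const s = shardFor(e.user_id);
    (byShard[s] = byShard[s] || []).push(e);
  }
  return byShard;
}

const events = [
  { user_id: 7, event_id: 1, data: "login" },
  { user_id: 150, event_id: 2, data: "post" },
  { user_id: 7, event_id: 3, data: "like" },
  { user_id: 250, event_id: 4, data: "post" },
];
const placed = placeEvents(events);
```

In this run the four writes land on three shards, and both events for user 7 sit together on `shard0`.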

Download MongoDB
http://www.mongodb.org
and let us know what you think: @hwaet @mongodb
10gen is hiring! http://www.10gen.com/jobs
