Scaling with MongoDB

rick446
February 29, 2012

MongoDB's architecture features built-in support for horizontal scalability through auto-sharding and for high availability through replica sets. Auto-sharding lets users easily distribute data across many nodes, and replica sets enable automatic failover and recovery of database nodes within or across data centers. This session provides an introduction to scaling with MongoDB by one of MongoDB's early adopters.


Transcript

1. Now a consultant, but formerly…
   - Software engineer at SourceForge, early adopter of MongoDB (version 0.8)
   - Wrote the SQLAlchemy book (I love SQL when it's used well)
   - Mainly write Python now, but have done C++, C#, Java, Javascript, VHDL, Verilog, …
2. You can do it with an RDBMS as long as you…
   - Don't use joins
   - Don't use transactions
   - Use read-only slaves
   - Use memcached
   - Denormalize your data
   - Use custom sharding/partitioning
   - Do a lot of vertical scaling
     (we're going to need a bigger box)
3. - Use documents to improve locality
   - Optimize your indexes
   - Be aware of your working set
   - Scaling your disks
   - Replication for fault-tolerance and read scaling
   - Sharding for read and write scaling
4. Relational (SQL)        MongoDB
   Database                Database
   Table                   Collection
   Index                   Index (B-tree, range-based)
   Row                     Document (think JSON)
   Column                  Field (dynamic typing; primitive types + arrays, documents)
5. Embed comment data in the blog post document (update sketch below):

   {
     title: "Slides for Scaling with MongoDB",
     author: "Rick Copeland",
     date: ISODate("2012-02-29T19:30:00Z"),
     text: "My slides are available on speakerdeck.com",
     comments: [
       { author: "anonymous",
         date: ISODate("2012-02-29T19:30:01Z"),
         text: "Frist psot!" },
       { author: "mark",
         date: ISODate("2012-02-29T19:45:23Z"),
         text: "Nice slides" } ] }
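   Keeping comments embedded means a new comment is a single in-place update rather than an insert into a separate table; a minimal mongo-shell sketch (the posts collection name and the comment values are illustrative):

       // append a comment to the embedded array in one atomic update
       db.posts.update(
         { title: "Slides for Scaling with MongoDB" },   // locate the post
         { $push: { comments: {
             author: "rick",
             date: new Date(),
             text: "Thanks for the feedback" } } }
       )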
6. [Diagram: values 1 2 3 4 5 6 7, "Find where x equals 7": looked at 7 objects]
7. [Diagram: values 7 6 5 4 3 2 1, "Find where x equals 7": looked at 3 objects]
8. Working set = sizeof(frequently used data) + sizeof(frequently used indexes)
   - Right-aligned indexes reduce working set size (index sketch below)
   - Working set should fit in available RAM for best performance
   - Page faults are the biggest cause of performance loss in MongoDB
9. > db.foo.stats()
   {
     "ns" : "test.foo",
     "count" : 1338330,
     "size" : 46915928,                  // data size
     "avgObjSize" : 35.05557523181876,   // average doc size
     "storageSize" : 86092032,           // size on disk (or RAM!)
     "numExtents" : 12,
     "nindexes" : 2,
     "lastExtentSize" : 20872960,
     "paddingFactor" : 1,
     "flags" : 0,
     "totalIndexSize" : 99860480,        // size of all indexes
     "indexSizes" : {
       "_id_" : 55877632,                // size of each index
       "x_1" : 43982848 },
     "ok" : 1
   }
10. [Diagram: three disks at ~200 seeks/second each: faster, but less reliable]
11. [Diagram: three disks at ~400 seeks/second each: faster and more reliable ($$$ though)]
12. [Diagram: Primary (read/write) with two Secondaries (read only)]
    - Old and busted: master/slave replication
    - The new hotness: replica sets with automatic failover
13. - Primary handles all writes
    - Application optionally sends reads to slaves (sketch below)
    - Heartbeat manages automatic failover
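   In the shell of this era, reading from a secondary has to be opted into explicitly; a minimal sketch (the collection and query are illustrative):

       rs.slaveOk()                                 // allow this connection to read from a secondary
       db.posts.find({ author: "Rick Copeland" })   // may return slightly stale data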
14. - A special collection (the oplog) records operations idempotently
    - Secondaries read from the primary's oplog and replay the operations locally
    - Space is preallocated and fixed for the oplog
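   The oplog is an ordinary capped collection in the local database, so it can be inspected from the shell; a minimal sketch:

       use local                                            // the oplog lives in the 'local' database
       db.oplog.rs.find().sort({ $natural: -1 }).limit(1)   // most recent operation, shaped like the
                                                            // entry on the next slide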
15. {
      "ts" : Timestamp(1317653790000, 2),
      "h" : -6022751846629753359,
      "op" : "i",                                       // insert
      "ns" : "confoo.People",                           // collection name
      "o" : {                                           // object to insert
        "_id" : ObjectId("4e89cd1e0364241932324269"),
        "first" : "Rick",
        "last" : "Copeland"
      }
    }
16. - Use the heartbeat signal to detect failure
    - When the primary can't be reached, elect a new one
    - The replica that's most up-to-date is chosen
    - If there is skew, changes not on the new primary are saved to a .bson file for manual reconciliation
    - Application can require data to be replicated to a majority to ensure this doesn't happen (sketch below)
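   With the getLastError-style write concern of this era, the application can block until a write has been acknowledged by a majority of the replica set; a minimal shell sketch (the collection and timeout are illustrative):

       db.people.insert({ first: "Rick", last: "Copeland" })
       db.runCommand({ getLastError: 1, w: "majority", wtimeout: 5000 })
       // reports an error if a majority has not acknowledged the write within 5 seconds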
17. - Priority
       - Slower nodes with lower priority
       - Backup or read-only nodes to never be primary
    - slaveDelay
       - Fat-finger protection
    - Data center awareness and tagging (configuration sketch below)
       - Application can ensure complex replication guarantees
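   These options live in the replica set configuration document; a minimal shell sketch (the member indexes, delay, and tag values are illustrative):

       cfg = rs.conf()
       cfg.members[2].priority   = 0            // never eligible to become primary
       cfg.members[2].hidden     = true         // invisible to normal client reads
       cfg.members[2].slaveDelay = 3600         // stay an hour behind: fat-finger protection
       cfg.members[1].tags       = { dc: "ny" } // data center awareness
       rs.reconfig(cfg)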
18. - Reads scale nicely
       - As long as the working set fits in RAM
       - … and you don't mind eventual consistency
    - Sharding to the rescue!
       - Automatically partitioned data sets
       - Scale writes and reads
       - Automatic load balancing between the shards
19. [Diagram: four shards covering key ranges 0..10, 10..20, 20..30, 30..40, each a replica set of one primary and two secondaries; two MongoS routers in front; three config servers (Config 1, Config 2, Config 3)]
20. - Sharding is per-collection and range-based
    - The shard key is the highest-impact (and hardest-to-change) decision you make (sketch below)
       - Random keys: good for writes, bad for reads
       - Right-aligned index: bad for writes
       - Small # of discrete keys: very bad
       - Ideal: balance writes, make reads routable by mongos
       - Optimal shard key selection is hard
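   The shard key is fixed when the collection is sharded; a minimal shell sketch (the blog database, posts collection, and the compound key are illustrative of a key that spreads writes while keeping reads routable):

       db.adminCommand({ enableSharding: "blog" })
       db.adminCommand({ shardCollection: "blog.posts",
                         key: { author: 1, date: 1 } })   // compound key: spreads writes across
                                                          // authors, keeps one author's posts together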
21. [Diagram: three shards (RS1, RS2, RS3) plus the config servers spread across a primary and a secondary data center; each replica set has two priority-1 members in the primary data center and one priority-0 member in the secondary data center; config servers Config 1, Config 2, Config 3]
22. - Writes and reads both scale (with a good choice of shard key)
    - Reads scale while remaining strongly consistent
    - Partitioning ensures you get more usable RAM
    - Pitfall: don't wait too long to add capacity