Slide 1

Rick Copeland (@rick446), Arborian Consulting, LLC

Slide 2

Now a consultant, but formerly…
- Software engineer at SourceForge, early adopter of MongoDB (version 0.8)
- Wrote the SQLAlchemy book (I love SQL when it's used well)
- Mainly write Python now, but have done C++, C#, Java, Javascript, VHDL, Verilog, …

Slide 3

You can do it with an RDBMS as long as you…
- Don't use joins
- Don't use transactions
- Use read-only slaves
- Use memcached
- Denormalize your data
- Use custom sharding/partitioning
- Do a lot of vertical scaling
  ▪ (we're going to need a bigger box)

Slide 4

No content

Slide 5

No content

Slide 6

- Use documents to improve locality
- Optimize your indexes
- Be aware of your working set
- Scaling your disks
- Replication for fault-tolerance and read scaling
- Sharding for read and write scaling

Slide 7

Relational (SQL)   MongoDB
Database           Database
Table              Collection
Index              Index (B-tree, range-based)
Row                Document (think JSON: primitive types plus arrays and embedded documents)
Column             Field (dynamically typed)
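
To make the mapping concrete, here is a minimal mongo shell sketch (the "blog" database, "posts" collection, and all field names are hypothetical):

    // pick a database (shell helper; created lazily on first write)
    use blog
    db.posts.insert({                     // collection + document, no schema
        title: "Hello",                   // string field
        views: 42,                        // number field
        tags: ["mongodb", "scaling"]      // array field: no join table needed
    })
    db.posts.ensureIndex({views: 1})      // B-tree index, just like SQL
    db.posts.find({views: {$gt: 10}})     // range query served by the B-tree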

Slide 8

    {
      title: "Slides for Scaling with MongoDB",
      author: "Rick Copeland",
      date: ISODate("2012-02-29T19:30:00Z"),
      text: "My slides are available on speakerdeck.com",
      comments: [
        { author: "anonymous",
          date: ISODate("2012-02-29T19:30:01Z"),
          text: "Frist psot!" },
        { author: "mark",
          date: ISODate("2012-02-29T19:45:23Z"),
          text: "Nice slides" } ] }

Embed comment data in the blog post document.
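
With the comments embedded, reading a post plus all of its comments is a single document fetch, and adding a comment is a single in-place update. A sketch in the mongo shell (the new comment's values are hypothetical):

    // one seek, one read: the post and every comment come back together
    db.posts.findOne({title: "Slides for Scaling with MongoDB"})

    // append a comment atomically with $push: no second collection, no join
    db.posts.update(
        {title: "Slides for Scaling with MongoDB"},
        {$push: {comments: {author: "alice",
                            date: new Date(),
                            text: "Thanks!"}}})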

Slide 9

Seek = 5+ ms
Read = really, really fast

Slide 10

(Diagram: Post, Author, Comment)

Slide 11

(Diagram: Post, Author, and five Comments)

Slide 12

Find where x equals 7, without an index: a linear scan over 1, 2, 3, 4, 5, 6, 7 looks at all 7 objects.

Slide 13

Find where x equals 7, with a B-tree index: the search examines only 3 objects.
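
The difference is visible in the shell with explain(); a sketch assuming a collection db.foo with a numeric field x:

    // no index: a BasicCursor scans the collection, nscanned is large
    db.foo.find({x: 7}).explain()

    // build a B-tree index on x; the same query now walks the tree
    db.foo.ensureIndex({x: 1})
    db.foo.find({x: 7}).explain()   // BtreeCursor x_1, nscanned is small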

Slide 14

With random index access, the entire index must fit in RAM.

Slide 15

With a right-aligned index (activity concentrated at one end of the key range), only a small portion needs to be in RAM.

Slide 16

Working set = sizeof(frequently used data) + sizeof(frequently used indexes)

- Right-aligned indexes reduce working set size
- Working set should fit in available RAM for best performance
- Page faults are the biggest cause of performance loss in MongoDB
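
MongoDB of this era has no command that reports the working set directly, but summing data and index sizes gives an upper bound. A shell sketch (it assumes everything in the current database is frequently used, so it overstates the true working set):

    var total = 0;
    db.getCollectionNames().forEach(function (name) {
        var s = db[name].stats();
        total += s.size + (s.totalIndexSize || 0);   // data + indexes, in bytes
    });
    print("working set upper bound: " + total + " bytes");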

Slide 17

    > db.foo.stats()
    {
      "ns" : "test.foo",
      "count" : 1338330,
      "size" : 46915928,                  // data size
      "avgObjSize" : 35.05557523181876,   // average doc size
      "storageSize" : 86092032,           // size on disk (or RAM!)
      "numExtents" : 12,
      "nindexes" : 2,
      "lastExtentSize" : 20872960,
      "paddingFactor" : 1,
      "flags" : 0,
      "totalIndexSize" : 99860480,        // size of all indexes
      "indexSizes" : {                    // size of each index
        "_id_" : 55877632,
        "x_1" : 43982848 },
      "ok" : 1
    }

Slide 18

One disk: ~200 seeks / second.

Slide 19

Three disks at ~200 seeks / second each: faster, but less reliable.

Slide 20

Three mirrored disk pairs at ~400 seeks / second each: faster and more reliable ($$$ though).

Slide 21

(Diagram: application reads and writes go to the Primary; reads can also go to each Secondary)

- Old and busted: master/slave replication
- The new hotness: replica sets with automatic failover
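
Standing up a replica set takes one command. A minimal sketch with three hypothetical hosts:

    rs.initiate({
        _id: "rs0",
        members: [{_id: 0, host: "db1.example.com:27017"},
                  {_id: 1, host: "db2.example.com:27017"},
                  {_id: 2, host: "db3.example.com:27017"}]})
    rs.status()   // expect one PRIMARY and two SECONDARY members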

Slide 22

- Primary handles all writes
- Application optionally sends reads to slaves
- Heartbeat manages automatic failover
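
Reads go to the primary unless the application opts in. In the mongo shell the opt-in looks like this (the posts query reuses the hypothetical collection from slide 8):

    db.getMongo().setSlaveOk()   // this connection may read from secondaries
    db.posts.find({author: "Rick Copeland"})   // may be slightly stale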

Slide 23

- A special collection (the oplog) records operations idempotently
- Secondaries read from the primary's oplog and replay operations locally
- Space is preallocated and fixed for the oplog
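
The oplog is an ordinary capped collection in the local database, so you can inspect it from the shell (slide 24 shows one entry):

    use local
    db.oplog.rs.find().sort({$natural: -1}).limit(1)   // most recent operation
    db.printReplicationInfo()   // oplog size and the time window it covers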

Slide 24

    {
      "ts" : Timestamp(1317653790000, 2),
      "h" : -6022751846629753359,
      "op" : "i",                  // insert
      "ns" : "confoo.People",      // collection name
      "o" : {                      // object to insert
        "_id" : ObjectId("4e89cd1e0364241932324269"),
        "first" : "Rick",
        "last" : "Copeland"
      }
    }

Slide 25

- Use the heartbeat signal to detect failure
- When the primary can't be reached, elect a new one
- The replica that's most up-to-date is chosen
- If there is skew, changes not on the new primary are saved to a .bson file for manual reconciliation
- The application can require data to be replicated to a majority to ensure this doesn't happen (sketched below)
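
In this era the majority guarantee is requested through getLastError after a write. A sketch (the inserted document is hypothetical):

    db.posts.insert({title: "a post we cannot afford to lose"})
    // block until a majority of the replica set has the write
    db.runCommand({getlasterror: 1, w: "majority", wtimeout: 5000})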

Slide 26

- Priority
  ▪ Slower nodes get lower priority
  ▪ Backup or read-only nodes can be set to never become primary
- slaveDelay
  ▪ Fat-finger protection
- Data center awareness and tagging
  ▪ The application can ensure complex replication guarantees (a configuration sketch follows)
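
A sketch of wiring these options into an existing set's configuration (the member indexes, tag values, and one-hour delay are all hypothetical):

    var cfg = rs.conf();
    cfg.members[2].priority = 0;        // never eligible to become primary
    cfg.members[2].hidden = true;       // invisible to normal client reads
    cfg.members[2].slaveDelay = 3600;   // replay the oplog an hour behind:
                                        // fat-finger protection
    cfg.members[0].tags = {dc: "east"}; // data center awareness
    cfg.members[1].tags = {dc: "west"};
    rs.reconfig(cfg);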

Slide 27

- Reads scale nicely
  ▪ As long as the working set fits in RAM
  ▪ … and you don't mind eventual consistency
- Sharding to the rescue!
  ▪ Automatically partitioned data sets
  ▪ Scale writes and reads
  ▪ Automatic load balancing between the shards

Slide 28

(Diagram: two MongoS routers in front of four shards holding key ranges 0..10, 10..20, 20..30, and 30..40; each shard is a replica set with one Primary and two Secondaries; the cluster configuration lives on Config 1, Config 2, and Config 3)

Slide 29

- Sharding is per-collection and range-based
- The highest-impact choice you make (and the hardest to change) is the shard key
  ▪ Random keys: good for writes, bad for reads
  ▪ Right-aligned index: bad for writes
  ▪ Small # of discrete keys: very bad
  ▪ Ideal: balance writes, make reads routable by mongos (see the sketch after this list)
  ▪ Optimal shard key selection is hard
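
A sketch of enabling sharding from the mongo shell (the database, collection, and compound shard key are hypothetical; the key aims to spread writes across authors while keeping one author's posts routable by mongos):

    sh.enableSharding("blog")
    sh.shardCollection("blog.posts", {author: 1, _id: 1})
    sh.status()   // chunk ranges and the shard each one lives on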

Slide 30

(Diagram: replica sets RS1, RS2, RS3 and the config servers spread across two data centers; each shard keeps two Priority 1 members in the primary data center and one Priority 0 member in the secondary data center, alongside config servers Config 1, Config 2, and Config 3)

Slide 31

- Writes and reads both scale (with a good choice of shard key)
- Reads scale while remaining strongly consistent
- Partitioning ensures you get more usable RAM
- Pitfall: don't wait too long to add capacity (adding a shard is sketched below)
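
Adding capacity is one command per new shard. A sketch with a hypothetical replica set:

    sh.addShard("rs4/db10.example.com:27017")
    db.printShardingStatus()   // the balancer migrates chunks to the new shard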

Slide 32

Rick Copeland (@rick446), Arborian Consulting, LLC