Slide 1

Rick Copeland (@rick446), Arborian Consulting, LLC

Slide 2

Now a consultant, but formerly…
- Software engineer at SourceForge, early adopter of MongoDB (version 0.8)
- Wrote the SQLAlchemy book (I love SQL when it's used well)
- Mainly write Python now, but have done C++, C#, Java, Javascript, VHDL, Verilog, …

Slide 3

You can do it with an RDBMS as long as you…
- Don't use joins
- Don't use transactions
- Use read-only slaves
- Use memcached
- Denormalize your data
- Use custom sharding/partitioning
- Do a lot of vertical scaling
  ▪ (we're going to need a bigger box)

Slide 4

No content

Slide 5

No content

Slide 6

- Use documents to improve locality
- Optimize your indexes
- Be aware of your working set
- Scaling your disks
- Replication for fault-tolerance and read scaling
- Sharding for read and write scaling

Slide 7

Relational (SQL)   MongoDB
Database           Database
Table              Collection
Index              Index (B-tree, range-based)
Row                Document (think JSON: primitive types plus arrays and embedded documents)
Column             Field (dynamically typed)
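
To make the mapping concrete, here is a minimal mongo shell sketch (the "blog" database, "posts" collection, and all field names are hypothetical):

    // pick a database (shell helper; created lazily on first write)
    use blog
    db.posts.insert({                     // collection + document, no schema
        title: "Hello",                   // string field
        views: 42,                        // number field
        tags: ["mongodb", "scaling"]      // array field: no join table needed
    })
    db.posts.ensureIndex({views: 1})      // B-tree index, just like SQL
    db.posts.find({views: {$gt: 10}})     // range query served by the B-tree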

Slide 8

    {
      title: "Slides for Scaling with MongoDB",
      author: "Rick Copeland",
      date: ISODate("2012-02-29T19:30:00Z"),
      text: "My slides are available on speakerdeck.com",
      comments: [
        { author: "anonymous",
          date: ISODate("2012-02-29T19:30:01Z"),
          text: "Frist psot!" },
        { author: "mark",
          date: ISODate("2012-02-29T19:45:23Z"),
          text: "Nice slides" } ] }

Embed comment data in the blog post document.
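
With the comments embedded, reading a post plus all of its comments is a single document fetch, and adding a comment is a single in-place update. A sketch in the mongo shell (the new comment's values are hypothetical):

    // one seek, one read: the post and every comment come back together
    db.posts.findOne({title: "Slides for Scaling with MongoDB"})

    // append a comment atomically with $push: no second collection, no join
    db.posts.update(
        {title: "Slides for Scaling with MongoDB"},
        {$push: {comments: {author: "alice",
                            date: new Date(),
                            text: "Thanks!"}}})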

Slide 9

Seek = 5+ ms
Read = really, really fast

Slide 10

(Diagram: Post, Author, Comment)

Slide 11

(Diagram: Post, Author, and five Comments)

Slide 12

Find where x equals 7, without an index: a linear scan over 1, 2, 3, 4, 5, 6, 7 looks at all 7 objects.

Slide 13

Find where x equals 7, with a B-tree index: the search examines only 3 objects.
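
The difference is visible in the shell with explain(); a sketch assuming a collection db.foo with a numeric field x:

    // no index: a BasicCursor scans the collection, nscanned is large
    db.foo.find({x: 7}).explain()

    // build a B-tree index on x; the same query now walks the tree
    db.foo.ensureIndex({x: 1})
    db.foo.find({x: 7}).explain()   // BtreeCursor x_1, nscanned is small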

Slide 14

With random index access, the entire index must fit in RAM.

Slide 15

With a right-aligned index (activity concentrated at one end of the key range), only a small portion needs to be in RAM.

Slide 16

Working set = sizeof(frequently used data) + sizeof(frequently used indexes)

- Right-aligned indexes reduce working set size
- Working set should fit in available RAM for best performance
- Page faults are the biggest cause of performance loss in MongoDB
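
MongoDB of this era has no command that reports the working set directly, but summing data and index sizes gives an upper bound. A shell sketch (it assumes everything in the current database is frequently used, so it overstates the true working set):

    var total = 0;
    db.getCollectionNames().forEach(function (name) {
        var s = db[name].stats();
        total += s.size + (s.totalIndexSize || 0);   // data + indexes, in bytes
    });
    print("working set upper bound: " + total + " bytes");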

Slide 17

    > db.foo.stats()
    {
      "ns" : "test.foo",
      "count" : 1338330,
      "size" : 46915928,                  // data size
      "avgObjSize" : 35.05557523181876,   // average doc size
      "storageSize" : 86092032,           // size on disk (or RAM!)
      "numExtents" : 12,
      "nindexes" : 2,
      "lastExtentSize" : 20872960,
      "paddingFactor" : 1,
      "flags" : 0,
      "totalIndexSize" : 99860480,        // size of all indexes
      "indexSizes" : {                    // size of each index
        "_id_" : 55877632,
        "x_1" : 43982848 },
      "ok" : 1
    }

Slide 18

One disk: ~200 seeks / second.

Slide 19

Three disks at ~200 seeks / second each: faster, but less reliable.

Slide 20

Three mirrored disk pairs at ~400 seeks / second each: faster and more reliable ($$$ though).

Slide 21

(Diagram: application reads and writes go to the Primary; reads can also go to each Secondary)

- Old and busted: master/slave replication
- The new hotness: replica sets with automatic failover
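
Standing up a replica set takes one command. A minimal sketch with three hypothetical hosts:

    rs.initiate({
        _id: "rs0",
        members: [{_id: 0, host: "db1.example.com:27017"},
                  {_id: 1, host: "db2.example.com:27017"},
                  {_id: 2, host: "db3.example.com:27017"}]})
    rs.status()   // expect one PRIMARY and two SECONDARY members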

Slide 22

- Primary handles all writes
- Application optionally sends reads to slaves
- Heartbeat manages automatic failover
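
Reads go to the primary unless the application opts in. In the mongo shell the opt-in looks like this (the posts query reuses the hypothetical collection from slide 8):

    db.getMongo().setSlaveOk()   // this connection may read from secondaries
    db.posts.find({author: "Rick Copeland"})   // may be slightly stale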

Slide 23

- A special collection (the oplog) records operations idempotently
- Secondaries read from the primary's oplog and replay operations locally
- Space is preallocated and fixed for the oplog
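
The oplog is an ordinary capped collection in the local database, so you can inspect it from the shell (slide 24 shows one entry):

    use local
    db.oplog.rs.find().sort({$natural: -1}).limit(1)   // most recent operation
    db.printReplicationInfo()   // oplog size and the time window it covers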

Slide 24

    {
      "ts" : Timestamp(1317653790000, 2),
      "h" : -6022751846629753359,
      "op" : "i",                  // insert
      "ns" : "confoo.People",      // collection name
      "o" : {                      // object to insert
        "_id" : ObjectId("4e89cd1e0364241932324269"),
        "first" : "Rick",
        "last" : "Copeland"
      }
    }

Slide 25

- Use the heartbeat signal to detect failure
- When the primary can't be reached, elect a new one
- The replica that's most up-to-date is chosen
- If there is skew, changes not on the new primary are saved to a .bson file for manual reconciliation
- The application can require data to be replicated to a majority to ensure this doesn't happen (sketched below)
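
In this era the majority guarantee is requested through getLastError after a write. A sketch (the inserted document is hypothetical):

    db.posts.insert({title: "a post we cannot afford to lose"})
    // block until a majority of the replica set has the write
    db.runCommand({getlasterror: 1, w: "majority", wtimeout: 5000})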

Slide 26

- Priority
  ▪ Slower nodes get lower priority
  ▪ Backup or read-only nodes can be set to never become primary
- slaveDelay
  ▪ Fat-finger protection
- Data center awareness and tagging
  ▪ The application can ensure complex replication guarantees (a configuration sketch follows)
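
A sketch of wiring these options into an existing set's configuration (the member indexes, tag values, and one-hour delay are all hypothetical):

    var cfg = rs.conf();
    cfg.members[2].priority = 0;        // never eligible to become primary
    cfg.members[2].hidden = true;       // invisible to normal client reads
    cfg.members[2].slaveDelay = 3600;   // replay the oplog an hour behind:
                                        // fat-finger protection
    cfg.members[0].tags = {dc: "east"}; // data center awareness
    cfg.members[1].tags = {dc: "west"};
    rs.reconfig(cfg);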

Slide 27

- Reads scale nicely
  ▪ As long as the working set fits in RAM
  ▪ … and you don't mind eventual consistency
- Sharding to the rescue!
  ▪ Automatically partitioned data sets
  ▪ Scale writes and reads
  ▪ Automatic load balancing between the shards

Slide 28

(Diagram: two MongoS routers in front of four shards holding key ranges 0..10, 10..20, 20..30, and 30..40; each shard is a replica set with one Primary and two Secondaries; the cluster configuration lives on Config 1, Config 2, and Config 3)

Slide 29

- Sharding is per-collection and range-based
- The highest-impact choice you make (and the hardest to change) is the shard key
  ▪ Random keys: good for writes, bad for reads
  ▪ Right-aligned index: bad for writes
  ▪ Small # of discrete keys: very bad
  ▪ Ideal: balance writes, make reads routable by mongos (see the sketch after this list)
  ▪ Optimal shard key selection is hard
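
A sketch of enabling sharding from the mongo shell (the database, collection, and compound shard key are hypothetical; the key aims to spread writes across authors while keeping one author's posts routable by mongos):

    sh.enableSharding("blog")
    sh.shardCollection("blog.posts", {author: 1, _id: 1})
    sh.status()   // chunk ranges and the shard each one lives on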

Slide 30

(Diagram: replica sets RS1, RS2, RS3 and the config servers spread across two data centers; each shard keeps two Priority 1 members in the primary data center and one Priority 0 member in the secondary data center, alongside config servers Config 1, Config 2, and Config 3)

Slide 31

- Writes and reads both scale (with a good choice of shard key)
- Reads scale while remaining strongly consistent
- Partitioning ensures you get more usable RAM
- Pitfall: don't wait too long to add capacity (adding a shard is sketched below)
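
Adding capacity is one command per new shard. A sketch with a hypothetical replica set:

    sh.addShard("rs4/db10.example.com:27017")
    db.printShardingStatus()   // the balancer migrates chunks to the new shard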

Slide 32

Rick Copeland (@rick446), Arborian Consulting, LLC