Scaling with MongoDB

rick446
February 29, 2012

MongoDB's architecture features built-in support for horizontal scalability through auto-sharding and for high availability through replica sets. Auto-sharding lets users easily distribute data across many nodes, and replica sets enable automatic failover and recovery of database nodes within or across data centers. This session provides an introduction to scaling with MongoDB by one of MongoDB's early adopters.


Transcript

1. Now a consultant, but formerly…
   - Software engineer at SourceForge, early adopter of MongoDB (version 0.8)
   - Wrote the SQLAlchemy book (I love SQL when it's used well)
   - Mainly write Python now, but have done C++, C#, Java, Javascript, VHDL, Verilog, …
2. You can do it with an RDBMS as long as you…
   - Don't use joins
   - Don't use transactions
   - Use read-only slaves
   - Use memcached
   - Denormalize your data
   - Use custom sharding/partitioning
   - Do a lot of vertical scaling
     (we're going to need a bigger box)
3. - Use documents to improve locality
   - Optimize your indexes
   - Be aware of your working set
   - Scaling your disks
   - Replication for fault-tolerance and read scaling
   - Sharding for read and write scaling
4. Relational (SQL)        MongoDB
   Database                Database
   Table                   Collection
   Index                   Index (B-tree, range-based)
   Row                     Document (think JSON)
   Column                  Field (dynamic typing; primitive types + arrays, documents)
5. Embed comment data in the blog post document (update sketch below):

   {
     title: "Slides for Scaling with MongoDB",
     author: "Rick Copeland",
     date: ISODate("2012-02-29T19:30:00Z"),
     text: "My slides are available on speakerdeck.com",
     comments: [
       { author: "anonymous",
         date: ISODate("2012-02-29T19:30:01Z"),
         text: "Frist psot!" },
       { author: "mark",
         date: ISODate("2012-02-29T19:45:23Z"),
         text: "Nice slides" } ] }
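   Keeping comments embedded means a new comment is a single in-place update rather than an insert into a separate table; a minimal mongo-shell sketch (the posts collection name and the comment values are illustrative):

       // append a comment to the embedded array in one atomic update
       db.posts.update(
         { title: "Slides for Scaling with MongoDB" },   // locate the post
         { $push: { comments: {
             author: "rick",
             date: new Date(),
             text: "Thanks for the feedback" } } }
       )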
6. [Diagram: values 1 2 3 4 5 6 7, "Find where x equals 7": looked at 7 objects]
7. [Diagram: values 7 6 5 4 3 2 1, "Find where x equals 7": looked at 3 objects]
8. Working set = sizeof(frequently used data) + sizeof(frequently used indexes)
   - Right-aligned indexes reduce working set size (index sketch below)
   - Working set should fit in available RAM for best performance
   - Page faults are the biggest cause of performance loss in MongoDB
9. > db.foo.stats()
   {
     "ns" : "test.foo",
     "count" : 1338330,
     "size" : 46915928,                  // data size
     "avgObjSize" : 35.05557523181876,   // average doc size
     "storageSize" : 86092032,           // size on disk (or RAM!)
     "numExtents" : 12,
     "nindexes" : 2,
     "lastExtentSize" : 20872960,
     "paddingFactor" : 1,
     "flags" : 0,
     "totalIndexSize" : 99860480,        // size of all indexes
     "indexSizes" : {
       "_id_" : 55877632,                // size of each index
       "x_1" : 43982848 },
     "ok" : 1
   }
10. [Diagram: three disks at ~200 seeks/second each: faster, but less reliable]
11. [Diagram: three disks at ~400 seeks/second each: faster and more reliable ($$$ though)]
12. [Diagram: Primary (read/write) with two Secondaries (read only)]
    - Old and busted: master/slave replication
    - The new hotness: replica sets with automatic failover
13. - Primary handles all writes
    - Application optionally sends reads to slaves (sketch below)
    - Heartbeat manages automatic failover
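   In the shell of this era, reading from a secondary has to be opted into explicitly; a minimal sketch (the collection and query are illustrative):

       rs.slaveOk()                                 // allow this connection to read from a secondary
       db.posts.find({ author: "Rick Copeland" })   // may return slightly stale data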
14. - A special collection (the oplog) records operations idempotently
    - Secondaries read from the primary's oplog and replay the operations locally
    - Space is preallocated and fixed for the oplog
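   The oplog is an ordinary capped collection in the local database, so it can be inspected from the shell; a minimal sketch:

       use local                                            // the oplog lives in the 'local' database
       db.oplog.rs.find().sort({ $natural: -1 }).limit(1)   // most recent operation, shaped like the
                                                            // entry on the next slide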
15. {
      "ts" : Timestamp(1317653790000, 2),
      "h" : -6022751846629753359,
      "op" : "i",                                       // insert
      "ns" : "confoo.People",                           // collection name
      "o" : {                                           // object to insert
        "_id" : ObjectId("4e89cd1e0364241932324269"),
        "first" : "Rick",
        "last" : "Copeland"
      }
    }
16. - Use the heartbeat signal to detect failure
    - When the primary can't be reached, elect a new one
    - The replica that's most up-to-date is chosen
    - If there is skew, changes not on the new primary are saved to a .bson file for manual reconciliation
    - Application can require data to be replicated to a majority to ensure this doesn't happen (sketch below)
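   With the getLastError-style write concern of this era, the application can block until a write has been acknowledged by a majority of the replica set; a minimal shell sketch (the collection and timeout are illustrative):

       db.people.insert({ first: "Rick", last: "Copeland" })
       db.runCommand({ getLastError: 1, w: "majority", wtimeout: 5000 })
       // reports an error if a majority has not acknowledged the write within 5 seconds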
17. - Priority
       - Slower nodes with lower priority
       - Backup or read-only nodes to never be primary
    - slaveDelay
       - Fat-finger protection
    - Data center awareness and tagging (configuration sketch below)
       - Application can ensure complex replication guarantees
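   These options live in the replica set configuration document; a minimal shell sketch (the member indexes, delay, and tag values are illustrative):

       cfg = rs.conf()
       cfg.members[2].priority   = 0            // never eligible to become primary
       cfg.members[2].hidden     = true         // invisible to normal client reads
       cfg.members[2].slaveDelay = 3600         // stay an hour behind: fat-finger protection
       cfg.members[1].tags       = { dc: "ny" } // data center awareness
       rs.reconfig(cfg)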
18. - Reads scale nicely
       - As long as the working set fits in RAM
       - … and you don't mind eventual consistency
    - Sharding to the rescue!
       - Automatically partitioned data sets
       - Scale writes and reads
       - Automatic load balancing between the shards
19. [Diagram: four shards covering key ranges 0..10, 10..20, 20..30, 30..40, each a replica set of one primary and two secondaries; two MongoS routers in front; three config servers (Config 1, Config 2, Config 3)]
20. - Sharding is per-collection and range-based
    - The shard key is the highest-impact (and hardest-to-change) decision you make (sketch below)
       - Random keys: good for writes, bad for reads
       - Right-aligned index: bad for writes
       - Small # of discrete keys: very bad
       - Ideal: balance writes, make reads routable by mongos
       - Optimal shard key selection is hard
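   The shard key is fixed when the collection is sharded; a minimal shell sketch (the blog database, posts collection, and the compound key are illustrative of a key that spreads writes while keeping reads routable):

       db.adminCommand({ enableSharding: "blog" })
       db.adminCommand({ shardCollection: "blog.posts",
                         key: { author: 1, date: 1 } })   // compound key: spreads writes across
                                                          // authors, keeps one author's posts together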
21. [Diagram: three shards (RS1, RS2, RS3) plus the config servers spread across a primary and a secondary data center; each replica set has two priority-1 members in the primary data center and one priority-0 member in the secondary data center; config servers Config 1, Config 2, Config 3]
22. - Writes and reads both scale (with a good choice of shard key)
    - Reads scale while remaining strongly consistent
    - Partitioning ensures you get more usable RAM
    - Pitfall: don't wait too long to add capacity