Scaling MongoDB | Sergey Gavruk

Sergey Gavruk
Meetup #7

Minsk MongoDB User Group

October 04, 2012

Transcript

  1. Scaling
     • Vertical
     • Horizontal
     • By optimization
       – Optimize your queries, schema, indexes
       – Tune your file system
       – Choose the right disks
  2. Shared-nothing architecture
     • Michael Stonebraker
     • First implementation in 1983
     • Google calls this “sharding”
  3. Sharding goals
     • App doesn’t know about clusters
     • Cluster should always be available for reads and writes
     • Cluster should grow easily
  4. Sharding features
     • Range-based data partitioning
     • Automatic data volume distribution
     • Transparent query routing
  5. [Diagram] Chunk ranges: ["a", "g"), ["g", "m"), ["m", "s"), ["s", "z"), ["d", "g"). Size labels: 100 GB, 500 GB, 100 GB, 100 GB, 100 GB, 400 GB, 200 GB, 100 GB.
  6. [Diagram] Chunk ranges: ["a", "d") 300, ["g", "k") 300, ["m", "s"), ["s", "z"), ["d", "g") 100, ["k", "m") 100. Size labels: 400 GB, 400 GB, 100 GB, 100 GB.
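The half-open ranges in the two diagrams above can be read as a lookup table from shard key value to chunk. Below is a minimal sketch in plain JavaScript, assuming the six ranges shown after the split; the `chunks` table and the `findChunk` helper are hypothetical names used only for illustration, not part of MongoDB.

```javascript
// Chunk ranges from the diagram, as half-open intervals [min, max).
// Hypothetical in-memory table; a real cluster keeps this metadata on the config servers.
const chunks = [
  { min: "a", max: "d" },
  { min: "d", max: "g" },
  { min: "g", max: "k" },
  { min: "k", max: "m" },
  { min: "m", max: "s" },
  { min: "s", max: "z" },
];

// Return the chunk whose range contains the key, or null if none does.
function findChunk(key) {
  return chunks.find((c) => key >= c.min && key < c.max) || null;
}

console.log(findChunk("dog")); // { min: 'd', max: 'g' }
console.log(findChunk("g"));   // { min: 'g', max: 'k' }
```

Because the upper bound is exclusive, a key equal to a boundary such as "g" belongs to the chunk that starts there, so every key maps to at most one chunk.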
  7. Type ordering, smaller to bigger: null, numbers, strings, objects, arrays, binary data, ObjectIds, booleans, dates, regular expressions
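The ordering on the slide above matters when shard key values have mixed types, since it decides which range a value lands in. Below is a rough sketch in plain JavaScript that ranks values by type only; `TYPE_ORDER`, `typeName`, and `compareByType` are hypothetical helpers, and binary data and ObjectIds are listed for rank but cannot be produced by the classifier.

```javascript
// Type order as listed on the slide, smallest first.
const TYPE_ORDER = ["null", "number", "string", "object", "array",
  "binary", "objectid", "boolean", "date", "regex"];

// Hypothetical classifier covering only the types plain JavaScript can express.
function typeName(v) {
  if (v === null) return "null";
  if (typeof v === "number") return "number";
  if (typeof v === "string") return "string";
  if (typeof v === "boolean") return "boolean";
  if (Array.isArray(v)) return "array";
  if (v instanceof Date) return "date";
  if (v instanceof RegExp) return "regex";
  return "object";
}

// Compare two values by type rank only; values of the same type compare as equal here.
function compareByType(a, b) {
  return TYPE_ORDER.indexOf(typeName(a)) - TYPE_ORDER.indexOf(typeName(b));
}

const mixed = [true, "abc", null, [1], 42, new Date(0), { a: 1 }];
mixed.sort(compareByType);
console.log(mixed.map(typeName).join(", "));
// null, number, string, object, array, boolean, date
```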
  8. Balancing
     [Diagram] mongos, balancer, three config servers, Shard 1, Shard 2
  9. Balancing
     [Diagram] mongos, balancer, three config servers, Shard 1, Shard 2. Label: “Move chunk X to shard 2”
  10. Balancing
      Number of chunks   Migration threshold
      < 20               2
      21-80              4
      80+                8
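The threshold table above can be sketched as a small decision function. This is an illustration in plain JavaScript, not the balancer's actual code: `migrationThreshold` and `needsBalancing` are hypothetical names, the treatment of the boundary values 20 and 80 is an assumption (the slide's bands leave 20 unassigned and list 80 twice), and "number of chunks" is assumed to mean the total across shards.

```javascript
// Threshold bands from the slide. Boundary handling (20 and 80) is an assumption.
function migrationThreshold(totalChunks) {
  if (totalChunks < 20) return 2;
  if (totalChunks < 80) return 4;
  return 8;
}

// Assumption: a balancing round starts when the chunk-count gap between the
// most and least loaded shards reaches the threshold.
function needsBalancing(chunkCounts) {
  const total = chunkCounts.reduce((sum, n) => sum + n, 0);
  const gap = Math.max(...chunkCounts) - Math.min(...chunkCounts);
  return gap >= migrationThreshold(total);
}

console.log(needsBalancing([10, 5]));  // true
console.log(needsBalancing([30, 28])); // false
```

With 15 chunks in total the threshold is 2, so a gap of 5 triggers a migration; with 58 chunks the threshold is 4, so a gap of 2 does not.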
  11. Balancing schedule
      db.settings.update(
        { _id : "balancer" },
        { $set : { activeWindow : { start : "23:00", stop : "6:00" } } },
        true
      )
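The window on the slide above starts at 23:00 and stops at 6:00, so it wraps past midnight. The sketch below, in plain JavaScript, shows one way to test whether a given time falls inside such a window; `toMinutes` and `inActiveWindow` are hypothetical helpers, and treating the stop time as exclusive is an assumption rather than documented balancer behavior.

```javascript
// Convert "H:MM" or "HH:MM" to minutes since midnight.
function toMinutes(hhmm) {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
}

// True when `now` falls inside [start, stop). Handles windows that wrap past midnight.
function inActiveWindow(now, start, stop) {
  const t = toMinutes(now);
  const s = toMinutes(start);
  const e = toMinutes(stop);
  return s <= e ? t >= s && t < e : t >= s || t < e;
}

console.log(inActiveWindow("02:30", "23:00", "6:00")); // true
console.log(inActiveWindow("12:00", "23:00", "6:00")); // false
```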
  12. [Diagram] mongos, Shard 1, Shard 2, Shard 3. Label: “Request without shard key”
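The diagram above concerns a request that does not include the shard key: the router cannot pick one shard, so the request goes to all of them. A minimal sketch of that decision in plain JavaScript follows; the `chunkMap` table, its ranges, the shard names, and the `targetShards` helper are all hypothetical, and only equality matches on the shard key are handled.

```javascript
// Hypothetical chunk-to-shard map keyed on user_id ranges.
const chunkMap = [
  { min: "a", max: "m", shard: "shard1" },
  { min: "m", max: "s", shard: "shard2" },
  { min: "s", max: "z", shard: "shard3" },
];

// Return the list of shards a query must visit.
function targetShards(query, shardKey) {
  const all = [...new Set(chunkMap.map((c) => c.shard))];
  if (!(shardKey in query)) return all; // no shard key: ask every shard
  const v = query[shardKey];
  const hit = chunkMap.find((c) => v >= c.min && v < c.max);
  return hit ? [hit.shard] : [];
}

console.log(targetShards({ user_id: "nina" }, "user_id"));
// [ 'shard2' ]
console.log(targetShards({ tweet_text: "Hello world" }, "user_id"));
// [ 'shard1', 'shard2', 'shard3' ]
```

The second call touches every shard, which is why queries that omit the shard key get more expensive as the cluster grows.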
  13. Consider a sharded cluster if:
     • Data exceeds the storage capacity of a single node
     • Size of the working set will soon exceed your RAM
     • There is a large volume of writes
  14. Restrictions
     • You cannot update a shard key
     • You must use a shard key for a single update
     • There must be an index on the shard key
  15. Ideal shard key
     • Is easily divisible
     • Will distribute write operations across the cluster
     • Will make it possible for the mongos to return most query operations directly from a single specific mongod instance
  16. Choosing a shard key
      {
        _id: "1",
        user_id: "2345652221",
        date_time: "2012-10-04",
        tweet_text: "Hello world"
      }
      Reliability
  17. Choosing a shard key
      Low-cardinality key
      {
        Continent: "Europe",
        Name: "Tom",
        …
      }
      Zip code?
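One way to see the problem with a low-cardinality key: documents that share a shard key value cannot be separated into different chunks, so the number of distinct values caps how finely the data can be split. The sketch below, in plain JavaScript, counts distinct values for a candidate field; the sample documents and the `cardinality` helper are hypothetical, and only the field names come from the slide.

```javascript
// Hypothetical sample documents; only the field names come from the slide.
const docs = [
  { Continent: "Europe", Name: "Tom" },
  { Continent: "Europe", Name: "Anna" },
  { Continent: "Asia", Name: "Li" },
  { Continent: "Europe", Name: "Tom" },
];

// Count the distinct values of a candidate shard key field.
function cardinality(documents, field) {
  return new Set(documents.map((d) => d[field])).size;
}

console.log(cardinality(docs, "Continent")); // 2
console.log(cardinality(docs, "Name"));      // 3
```

A field like Continent can never have more than a handful of distinct values regardless of data volume, which is the concern the slide raises.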