
Sharding and Scaling Your Database

Neha
May 01, 2013


Determining a data storage solution as your web application scales can be the most difficult part of web development, and takes time away from developing application features. MongoDB, Redis, Postgres, Riak, Cassandra, Voldemort, NoSQL, MySQL, NewSQL — the options are overwhelming, and all claim to be elastic, fault-tolerant, durable, and give great performance for both reads and writes. This talk will describe these different storage solutions and explain what is really important when choosing a datastore — your application data schema and feature requirements. You’ll learn how to think about scaling, consistency, sharding, and fault-tolerance, and most importantly, you’ll be able to start building your web application with confidence knowing that you can scale when the time comes.

Presented at Future Insights Live, Las Vegas, NV


Transcript

  1. What to Think About When Building Your Application
     Neha Narula (@neha), May 1, 2013
  2. In This Talk
     What to think about when choosing a datastore for a web application
  3. Every so often…
     Friends ask me for advice when they are building a new application
  4. I am going to be iterating on my app.
     I don’t have any customers yet, and my current dataset is tiny.
     But…
  5. “Can you tell me which NoSQL database I should use?
     And how to shard it?”
  6. What to think about when choosing a datastore for a web application
     Hint: Not Scaling
  7. SQL
     • SQL: Query language
     • Features: ACID
     SELECT description FROM conferences WHERE name = 'Future Insights Live';
  8. NoSQL
     • Key/Value store
     • Document stores
     • Simple, less restrictive
     GET filive_description
  9. Sharding
     • Split data between multiple servers
     • One way of getting better performance
     sh.shardCollection("records.conferences", { "name": 1 })
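A minimal sketch of "split data between multiple servers", assuming simple hash partitioning over a fixed list of hypothetical server names. This is a swapped-in scheme for illustration: the shardCollection call above declares a ranged shard key on name, whereas hashing spreads keys evenly but gives up range locality. SHA-256 from hashlib is used because Python's built-in hash() of strings is randomized per process.

```python
import hashlib

SERVERS = ["db0", "db1", "db2"]  # hypothetical shard servers

def shard_for(key: str, servers=SERVERS) -> str:
    # Stable hash: the same key maps to the same server in every process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]

def partition(records: dict, servers=SERVERS) -> dict:
    # Place each record on exactly one server according to its key.
    shards = {s: {} for s in servers}
    for key, value in records.items():
        shards[shard_for(key, servers)][key] = value
    return shards

records = {f"conference-{i}": {"id": i} for i in range(30)}
shards = partition(records)
for server, data in shards.items():
    print(server, len(data))
```

One design caveat: with `hash % len(servers)`, adding a server remaps most keys, which is why real systems use ranges, consistent hashing, or many small chunks that can be moved.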
  10. First five minutes – what’s it like?
      Idea credit: Adam Marcus, The NoSQL Ecosystem, HPTS 2011, via Justin Sheehy from Basho
  11. Developing
      • Multiple people working on the same code
      • Testing new features => new access patterns
      • New person comes along…
  12. Questions
      • What is your mix of reads and writes?
      • How much data do you have?
      • Do you need transactions?
      • What are your access patterns?
      • What will grow and change?
      • What do you already know?
  13. Reads and Writes
      • Write optimized vs. read optimized
      • MongoDB used to have a global write lock
        – Now a lock per database (version 2.2)
      • No concurrent writes within a database!
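To make the locking point concrete, here is a hypothetical sketch (not MongoDB's actual code) contrasting a single global write lock with one lock per database, using Python's threading.Lock. With a global lock every write is serialized; with per-database locks, writes to different databases can proceed independently, while writes within one database still take turns.

```python
import threading
from collections import defaultdict

class GlobalLockStore:
    """Hypothetical store: every write, to any database, takes the same lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self.data = defaultdict(dict)

    def lock_for(self, db: str) -> threading.Lock:
        return self._lock

    def write(self, db: str, key, value) -> None:
        with self.lock_for(db):
            self.data[db][key] = value

class PerDatabaseLockStore(GlobalLockStore):
    """Hypothetical store: one lock per database name."""
    def __init__(self):
        super().__init__()
        self._locks = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()  # protects creation of new locks

    def lock_for(self, db: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks[db]

g, p = GlobalLockStore(), PerDatabaseLockStore()
print(g.lock_for("users") is g.lock_for("orders"))  # True: one lock for everything
print(p.lock_for("users") is p.lock_for("orders"))  # False: separate locks
print(p.lock_for("users") is p.lock_for("users"))   # True: same database, same lock

# Ten concurrent writers split across two databases.
threads = [threading.Thread(target=p.write, args=(f"db{i % 2}", i, i)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(p.data["db0"]), len(p.data["db1"]))  # 5 5
```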
  14. Size of Data
      • Does it fit in memory?
      • Do you have another persistent store?
      • Worst case, Redis needs 2X the memory of your data!
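A back-of-the-envelope sketch of the "does it fit in memory?" question. The 2x factor is the slide's worst case; it most likely refers to Redis forking a child process to snapshot to disk, where copy-on-write can duplicate every page that is modified during the snapshot. The function names and the example sizes are hypothetical.

```python
GiB = 1024 ** 3

def peak_memory_bytes(dataset_bytes: int, worst_case_factor: float = 2.0) -> float:
    # Assumed worst case: while a snapshot child is running, every page gets
    # rewritten, so old and new copies of the data are resident together.
    return dataset_bytes * worst_case_factor

def fits_in_memory(dataset_bytes: int, ram_bytes: int, worst_case_factor: float = 2.0) -> bool:
    return peak_memory_bytes(dataset_bytes, worst_case_factor) <= ram_bytes

print(fits_in_memory(6 * GiB, 16 * GiB))   # True: 12 GiB peak
print(fits_in_memory(10 * GiB, 16 * GiB))  # False: 20 GiB peak
```

A 10 GiB dataset looks like it fits on a 16 GiB machine until you budget for the snapshot.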
  15. • Performance: Latency tolerance
      • Durability: Data loss tolerance
      • Consistency: Weird result tolerance
      • Availability: Downtime tolerance
  16. Performance
      • Relational databases are considered slow
      • But they are doing a lot of work!
      • Sometimes you need that work, sometimes you don’t.
  17. Flexibility vs. Performance
      • We might want to pay the overhead for query flexibility
      • In a primary key datastore, we can only query by primary key, but those queries are generally faster
      • SQL gives us flexibility to change our queries, but might be slow
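A minimal sketch of this tradeoff, using a plain Python dict as a hypothetical primary-key store: a lookup by key is a single hash probe, but any other question (here, "which conferences are in a given city?") needs a full scan, or a secondary index that you build and keep up to date yourself. In SQL you would just change the WHERE clause. The records are hypothetical.

```python
from collections import defaultdict

# Hypothetical primary-key store: key -> record.
conferences = {
    "filive": {"name": "Future Insights Live", "city": "Las Vegas"},
    "conf-b": {"name": "Conference B", "city": "Boston"},
    "conf-c": {"name": "Conference C", "city": "Las Vegas"},
}

def get(key):
    # Fast path: one hash lookup on the primary key.
    return conferences.get(key)

def scan_by_city(city):
    # No index on "city": every record must be examined, O(n).
    return sorted(k for k, rec in conferences.items() if rec["city"] == city)

def build_city_index(store):
    # Hand-built secondary index; the application must update it on every write.
    index = defaultdict(set)
    for key, rec in store.items():
        index[rec["city"]].add(key)
    return index

city_index = build_city_index(conferences)
print(get("filive")["city"])          # Las Vegas
print(scan_by_city("Las Vegas"))      # ['conf-c', 'filive']
print(sorted(city_index["Las Vegas"]))  # ['conf-c', 'filive']
```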
  18. Safeness vs. Performance
      • Being safe is slow
      • Transactions
      • Writing to disk
  19. • Performance: Latency tolerance
      • Durability: Data loss tolerance
      • Consistency: Weird result tolerance
      • Availability: Downtime tolerance
  20. Durability
      1. Client: write
      2. Server: flush to disk
      3. Server: send done
      4. Server CRASH
      5. Server: recover
      6. Client: see the write
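The six durability steps can be sketched with ordinary files, assuming a single-node, append-only log. The key detail is step 2: data handed to write() may still sit in Python's buffer or the OS page cache, so the sketch calls flush() and then os.fsync() before acknowledging. The "crash" is simulated by dropping all in-memory state and reading the file back, which is an assumption: it cannot reproduce a real power loss.

```python
import os
import tempfile

def durable_append(path: str, record: str) -> str:
    # 1. Client: write
    with open(path, "a", encoding="utf-8") as f:
        f.write(record + "\n")
        # 2. Server: flush to disk (Python buffer -> OS, then OS -> storage device)
        f.flush()
        os.fsync(f.fileno())
    # 3. Server: send "done" only after fsync has returned
    return "done"

def recover(path: str) -> list:
    # 5. Server: recover by replaying the log from disk
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

log_path = os.path.join(tempfile.mkdtemp(), "datastore.log")
ack = durable_append(log_path, "ship shoes to neha")
# 4. Server CRASH: simulated here by keeping no in-memory copy of the write
# 6. Client: see the write
print(ack, recover(log_path))  # done ['ship shoes to neha']
```

Skipping the fsync makes step 3 faster, which is exactly the "being safe is slow" tradeoff.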
  21. No Durability
      [Diagram: App and Datastore. Action: Buy shoes. Writes: Ship Neha shoes; Charge $100.]
  22. Bad Defaults
      • By default, MongoDB does not fsync() before returning to the client on a write
        – Need j:true
      • By default, MySQL used MyISAM instead of InnoDB (InnoDB became the default storage engine in MySQL 5.5)
  23. • Performance: Latency tolerance
      • Durability: Data loss tolerance
      • Consistency: Weird result tolerance
      • Availability: Downtime tolerance
  24. Consistency
      • Application business logic
      • Ask yourself: How bad can things get? What’s OK to show the user?
      • Compromise for performance
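One common source of "weird results" is reading from an asynchronously replicated copy. The hypothetical sketch below models a primary that acknowledges writes immediately and a replica that applies them later; until replication catches up, a read from the replica returns stale data. Whether that is acceptable is exactly the business-logic question on the slide.

```python
from collections import deque

class Primary:
    def __init__(self):
        self.data = {}
        self.outbox = deque()  # writes not yet shipped to the replica

    def write(self, key, value):
        self.data[key] = value            # acknowledged immediately
        self.outbox.append((key, value))  # replicated later

class Replica:
    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)

def replicate(primary: Primary, replica: Replica) -> None:
    # Apply pending writes in order (asynchronous replication, run "later").
    while primary.outbox:
        key, value = primary.outbox.popleft()
        replica.data[key] = value

primary, replica = Primary(), Replica()
primary.write("cart:neha", ["shoes"])
print(primary.data["cart:neha"], replica.read("cart:neha"))  # ['shoes'] None
replicate(primary, replica)
print(replica.read("cart:neha"))  # ['shoes']
```

Reading from the primary avoids the stale result at the cost of sending all reads to one server; reading from replicas scales reads but shows users slightly old data.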
  25. • Performance: Latency tolerance
      • Durability: Data loss tolerance
      • Consistency: Weird result tolerance
      • Availability: Downtime tolerance
  26. Specialization
      • You know your
        – query access patterns and traffic
        – consistency requirements
      • Design specialized lookups
        – Transactional datastore for consistent data
        – Memcached for static, mostly unchanging content
        – Redis for a data processing pipeline
      • Know what tradeoffs to make
  27. Ways to Scale
      • Reads
        – Cache
        – Replicate
        – Shard
      • Writes
        – Shard data amongst multiple servers
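For the first read-scaling technique, a cache, a common pattern is cache-aside: check the cache, and on a miss read from the database and populate the cache. The sketch below uses plain dicts as hypothetical stand-ins for memcached and the database, with a simple TTL; a real deployment would use an actual memcached or Redis client.

```python
import time

class CacheAside:
    def __init__(self, db: dict, ttl_seconds: float = 60.0, clock=time.monotonic):
        self.db = db          # hypothetical backing store
        self.cache = {}       # key -> (value, expires_at)
        self.ttl = ttl_seconds
        self.clock = clock
        self.db_reads = 0     # how often we had to go to the database

    def get(self, key):
        entry = self.cache.get(key)
        now = self.clock()
        if entry is not None and entry[1] > now:
            return entry[0]                 # cache hit
        self.db_reads += 1
        value = self.db.get(key)            # cache miss: read the database
        if value is not None:
            self.cache[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key):
        # Call after writing to the database so readers don't keep stale data.
        self.cache.pop(key, None)

db = {"page:about": "<h1>About</h1>"}
c = CacheAside(db)
c.get("page:about")
c.get("page:about")
print(c.db_reads)  # 1: the second read was served from the cache
db["page:about"] = "<h1>About us</h1>"
c.invalidate("page:about")
print(c.get("page:about"), c.db_reads)  # <h1>About us</h1> 2
```

This works best for the "static, mostly unchanging content" the previous slide assigns to Memcached; frequently updated data makes invalidation the hard part.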
  28. Post Page
      SQL:   SELECT * FROM comments WHERE post_id = 100;
      Redis: ZRANGE comments:100 0 -1
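The Redis version reads one precomputed sorted set per post instead of filtering a table. The sketch below emulates the index semantics of `ZRANGE key start stop` in pure Python so the call can be read precisely: the stop index is inclusive and negative indices count from the end, so `0 -1` returns every member, ordered by score. The in-memory class and the comment data are hypothetical; against a real server you would use a Redis client.

```python
import bisect

class SortedSets:
    """Hypothetical in-memory stand-in for Redis sorted sets."""
    def __init__(self):
        self._sets = {}  # key -> list of (score, member), kept sorted

    def zadd(self, key, score, member):
        entries = self._sets.setdefault(key, [])
        # Replace any existing entry for this member, then insert in score order
        # (equal scores fall back to member order).
        entries[:] = [e for e in entries if e[1] != member]
        bisect.insort(entries, (score, member))

    def zrange(self, key, start, stop):
        members = [m for _, m in self._sets.get(key, [])]
        n = len(members)
        if start < 0:
            start = max(n + start, 0)
        if stop < 0:
            stop = n + stop
        stop = min(stop, n - 1)  # stop is inclusive and clamped to the last index
        if start > stop:
            return []
        return members[start:stop + 1]

r = SortedSets()
# Score = hypothetical comment timestamp, member = comment id.
r.zadd("comments:100", 1367366400, "c1")
r.zadd("comments:100", 1367370000, "c3")
r.zadd("comments:100", 1367368200, "c2")
print(r.zrange("comments:100", 0, -1))   # ['c1', 'c2', 'c3']
print(r.zrange("comments:100", -2, -1))  # ['c2', 'c3']
```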
  29. Example Sharded Database
      [Diagram: Webservers connected to three databases, each holding one range of the comments table by post_id: 0-99, 100-199, 200-299.]
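A minimal sketch of how a webserver might route a query in this layout, assuming the three post_id ranges from the diagram (0-99, 100-199, 200-299) and hypothetical shard names. `bisect.bisect_right` finds the range whose lower bound is the largest one not exceeding the post_id.

```python
import bisect

# (lower bound, shard) pairs sorted by lower bound; each range ends where the next begins.
RANGES = [(0, "shard-a"), (100, "shard-b"), (200, "shard-c")]
MAX_POST_ID = 299  # assumption: the diagram's last range ends at 299
LOWER_BOUNDS = [lo for lo, _ in RANGES]

def shard_for_post(post_id: int) -> str:
    if not 0 <= post_id <= MAX_POST_ID:
        raise KeyError(f"no shard holds post_id {post_id}")
    i = bisect.bisect_right(LOWER_BOUNDS, post_id) - 1
    return RANGES[i][1]

print(shard_for_post(100))  # shard-b
print(shard_for_post(52))   # shard-a
print(shard_for_post(289))  # shard-c
```

Range routing keeps neighboring post_ids together, so a query for one post's comments touches exactly one shard.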
  30. Query Goes to One Shard
      [Diagram: three MySQL shards (post_id 0-99, 100-199, 200-299); the query for comments on post 100 goes only to the 100-199 shard.]
  31. Many Concurrent Queries
      [Diagram: three MySQL shards (post_id 0-99, 100-199, 200-299) serving concurrent queries for comments on posts 100, 52, and 289, each answered by its own shard.]
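Because each of these queries touches exactly one shard, they can run in parallel without coordinating with each other. The sketch below fans three lookups out with `concurrent.futures.ThreadPoolExecutor`; the shards are hypothetical in-memory dicts standing in for MySQL servers, keyed by the post_id range each one owns.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory shards, keyed by the post_id range each one owns.
SHARDS = {
    range(0, 100): {52: ["nice talk"]},
    range(100, 200): {100: ["first!", "great slides"]},
    range(200, 300): {289: ["thanks"]},
}

def comments_on_post(post_id: int) -> list:
    # Route to the single shard that owns this post_id, then query only that shard.
    for id_range, shard in SHARDS.items():
        if post_id in id_range:
            return shard.get(post_id, [])
    raise KeyError(f"no shard holds post_id {post_id}")

post_ids = [100, 52, 289]
with ThreadPoolExecutor(max_workers=3) as pool:
    # Each query lands on a different shard, so they don't contend for one server.
    results = dict(zip(post_ids, pool.map(comments_on_post, post_ids)))

print(results[100])  # ['first!', 'great slides']
```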
  32. Complex Queries
      SQL:
        SELECT * FROM friends, statuses
        WHERE friends.f1 = 'neha' AND statuses.user = friends.f2;
      NoSQL:
        GET neha_friends
        for each friend in friends: GET friend_status
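The two versions above can be compared side by side. In the sketch below, sqlite3 plays the relational database and a dict plays the key/value store (both assumptions, with hypothetical users and statuses). The SQL join is a single request; the application-side join issues one GET for the friend list plus one GET per friend, which is where the "several GET queries" per join come from.

```python
import sqlite3

# Relational version: one declarative JOIN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friends (f1 TEXT, f2 TEXT);
    CREATE TABLE statuses (user TEXT, status TEXT);
    INSERT INTO friends VALUES ('neha', 'alice'), ('neha', 'bob');
    INSERT INTO statuses VALUES ('alice', 'hello'), ('bob', 'at work'), ('carol', 'not a friend');
""")
sql_rows = conn.execute(
    "SELECT friends.f2, statuses.status FROM friends, statuses "
    "WHERE friends.f1 = ? AND statuses.user = friends.f2 ORDER BY friends.f2",
    ("neha",),
).fetchall()

# Key/value version: the application performs the join itself.
kv = {
    "neha_friends": ["alice", "bob"],
    "alice_status": "hello",
    "bob_status": "at work",
    "carol_status": "not a friend",
}

def app_join(user: str):
    gets = 1
    friends = kv.get(f"{user}_friends", [])  # GET neha_friends
    rows = []
    for friend in friends:                    # for each friend in friends
        gets += 1
        rows.append((friend, kv[f"{friend}_status"]))  # GET friend_status
    return sorted(rows), gets

kv_rows, get_count = app_join("neha")
print(sql_rows == kv_rows, get_count)  # True 3
```

With N friends the key/value version makes N + 1 round trips; if those friends live on different shards, so do the GETs.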
  33. Concurrent Complex Queries
      [Diagram: MySQL shards handling several concurrent JOIN queries.]
  34. Concurrent NoSQL App Joins
      [Diagram: several concurrent application-side joins, each issuing several GET queries.]
  35. Lessons Learned
      • Don’t worry at the beginning: Use what you know
      • Know your application, make the right tradeoffs
      • Not about SQL vs. NoSQL systems – about simple vs. complex application queries