
Sharding and Scaling Your Database

May 01, 2013


Choosing a data storage solution as your web application scales can be the most difficult part of web development, and it takes time away from developing application features. MongoDB, Redis, Postgres, Riak, Cassandra, Voldemort, NoSQL, MySQL, NewSQL: the options are overwhelming, and all claim to be elastic, fault-tolerant, and durable, and to give great performance for both reads and writes. This talk will describe these different storage solutions and explain what is really important when choosing a datastore: your application data schema and feature requirements. You’ll learn how to think about scaling, consistency, sharding, and fault-tolerance, and most importantly, you’ll be able to start building your web application with confidence, knowing that you can scale when the time comes.

Presented at Future Insights Live, Las Vegas, NV





  1. Sharding and Scaling Your Database
      Neha Narula, @neha, May 1, 2013
  2. What to Think About When Building Your Application
      Neha Narula, @neha, May 1, 2013
  3. @neha: Froogle, Blobstore, Native Client
  4. In This Talk
      What to think about when choosing a datastore for a web application
  5. Every so often…
      Friends ask me for advice when they are building a new application
  6. Friend  

  7. “Hi Neha! I am making a new application.
  8. I have heard MySQL sucks and I should use
  9. I am going to be iterating on my app
      I don’t have any customers yet
      And my current dataset is tiny
      But…
  10. Can you tell me which NoSQL database I should use
      And how to shard it?”
  11. Me  

  12. http://knowyourmeme.com/memes/facepalm

  13. None
  14. None
  15. Hype  

  16. What to think about when choosing a datastore for a web application
      Hint: Not Scaling
  17. Outline
      SQL, NoSQL, and sharding
      Stages of development
      Sharding myths
  18. Outline
      SQL, NoSQL, and sharding
      Stages of development
      Sharding myths
  19. SQL
      • SQL: Query language
      • Features: ACID
      SELECT description FROM conferences WHERE name = 'Future Insights Live';
  20. SQL/NoSQL  

  21. NoSQL
      • Key/Value store
      • Document stores
      • Simple, less restrictive
      GET "filive_description"
  22. Sharding
      • Split data between multiple servers
      • One way of getting better performance
      sh.shardCollection("records.conferences", {"name": 1})
  23. Example Sharded Database [diagram: several Database servers]

  24. Outline
      SQL, NoSQL, and sharding
      Stages of development
      Sharding myths
  25. Prototype, Launch, Scale

  26. Prototype  

  27. First five minutes: what’s it like?
      Idea credit: Adam Marcus, The NoSQL Ecosystem, HPTS 2011, via Justin Sheehy from Basho
  28. First Five Minutes: Redis
      http://simonwillison.net/2009/Oct/22/redis/
  29. First Five Minutes: MySQL
  30. Developing
      • Multiple people working on the same code
      • Testing new features => new access patterns
      • New person comes along…
  31. Redis: Time to go get lunch

  32. MySQL  

  33. Launch  

  34. Questions
      • What is your mix of reads and writes?
      • How much data do you have?
      • Do you need transactions?
      • What are your access patterns?
      • What will grow and change?
      • What do you already know?
  35. Reads and Writes
      • Write optimized vs. read optimized
      • MongoDB used to have a global write lock
        – Now a lock per database (version 2.2)
      • No concurrent writes!
  36. Size of Data
      • Does it fit in memory?
      • Do you have another persistent store?
      • Worst case, Redis needs 2X the memory of your data!
  37. • Performance: Latency tolerance
      • Durability: Data loss tolerance
      • Consistency: Weird result tolerance
      • Availability: Downtime tolerance
  38. Performance
      • Relational databases are considered slow
      • But they are doing a lot of work!
      • Sometimes you need that work, sometimes you don’t.
  39. Flexibility vs. Performance
      • We might want to pay the overhead for query flexibility
      • In a primary key datastore, we can only ask queries on primary key, but they are generally faster
      • SQL gives us flexibility to change our queries, but might be slow
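The tradeoff on slide 39 can be sketched in a few lines. This is a minimal illustration, not a benchmark: the dict stands in for a primary key datastore, SQLite stands in for a relational database, and the `city` column, the second row, and the description text are made up for the example.

```python
import sqlite3

# Primary key datastore (a dict as a stand-in): the only question it can
# answer is "what is stored under this exact key?"
kv_store = {
    "filive_description": "Web design and development conference",  # made-up text
}


def kv_get(key):
    """Look up a value by primary key; returns None if the key is absent."""
    return kv_store.get(key)


# Relational store: the same kind of data, but any column can be queried.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE conferences (name TEXT PRIMARY KEY, city TEXT, description TEXT)"
)
db.executemany(
    "INSERT INTO conferences VALUES (?, ?, ?)",
    [
        ("Future Insights Live", "Las Vegas", "Web design and development conference"),
        ("Hypothetical Conf", "Boston", "Made-up row for illustration"),
    ],
)


def conferences_in(city):
    """A query on a non-key column, which the key/value store cannot express."""
    rows = db.execute(
        "SELECT name FROM conferences WHERE city = ? ORDER BY name", (city,)
    )
    return [name for (name,) in rows]
```

Asking the key/value store for "all conferences in Las Vegas" would require either scanning every key in application code or maintaining a second key such as a hypothetical `city:Las Vegas` list by hand. That is the flexibility the relational overhead pays for.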
  40. Safeness vs. Performance
      • Being safe is slow
      • Transactions
      • Writing to disk
  41. • Performance: Latency tolerance
      • Durability: Data loss tolerance
      • Consistency: Weird result tolerance
      • Availability: Downtime tolerance
  42. Durability
      • Really persistent datastores
      • No data loss or corruption
  43. Durability
      1. Client: write
      2. Server: flush to disk
      3. Server: send done
      4. Server CRASH
      5. Server recover
      6. Client: see the write
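The sequence on slide 43 can be sketched with an append-only log file. This is a simplified illustration under stated assumptions: `durable_append` and `recover` are hypothetical helper names, the "server" is a plain function, and the crash in step 4 is not simulated. The point is only that the acknowledgement is returned after `os.fsync`, so an acknowledged write is expected to be readable after a restart.

```python
import os
import tempfile


def durable_append(path, record):
    """Steps 1-3: accept a write, flush it to disk, then acknowledge."""
    with open(path, "ab") as f:
        f.write(record + b"\n")
        f.flush()             # move data from Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to push the data to the disk
    return "done"             # only now tell the client the write succeeded


def recover(path):
    """Step 5: after a restart, re-read every record that reached the log."""
    if not os.path.exists(path):
        return []
    with open(path, "rb") as f:
        return f.read().splitlines()


log_path = os.path.join(tempfile.mkdtemp(), "writes.log")
ack = durable_append(log_path, b"charge neha $100")
recovered = recover(log_path)  # step 6: the client can see its write
```

Returning "done" before the `fsync` call would be faster, but a crash between the acknowledgement and the flush could lose a write the client believes succeeded.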
  44. No Durability [diagram labels: App, Datastore, Buy shoes, Ship Neha Shoes, Charge $100, Write]
  45. Bad Defaults
      By default, MongoDB does not fsync() before returning to client on write
        – Need j:true
      By default, MySQL uses MyISAM instead of InnoDB
  46. Deletion
      • Know your laws!
      • Purge backups and copies
  47. • Performance: Latency tolerance
      • Durability: Data loss tolerance
      • Consistency: Weird result tolerance
      • Availability: Downtime tolerance
  48. How Messed Up Can Things Get?
  49. Consistency
      • Application business logic
      • Ask yourself: How bad can things get? What’s ok to show the user?
      • Compromise for performance
  50. • Performance: Latency tolerance
      • Durability: Data loss tolerance
      • Consistency: Weird result tolerance
      • Availability: Downtime tolerance
  51. Availability
      • Expect failures
      • What happens when a datacenter goes down?
  52. Scale  

  53. Specialization
      • You know your
        – query access patterns and traffic
        – consistency requirements
      • Design specialized lookups
        – Transactional datastore for consistent data
        – Memcached for static, mostly unchanging content
        – Redis for a data processing pipeline
      • Know what tradeoffs to make
  54. Ways to Scale
      • Reads
        – Cache
        – Replicate
        – Shard
      • Writes
        – Shard data amongst multiple servers
  55. Cache [diagram labels: Database, Cache, App]
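One common way to wire up the App, Cache, and Database boxes on slide 55 is a read-through ("cache-aside") lookup. The sketch below is an assumption about what the diagram implies, not the speaker's implementation: `CachedReader` is a hypothetical class, and plain dicts stand in for both Memcached and the database.

```python
class CachedReader:
    """Serve reads from an in-memory cache, falling back to the database."""

    def __init__(self, database):
        self.database = database  # stand-in for the real datastore (a dict)
        self.cache = {}           # stand-in for Memcached
        self.db_reads = 0         # counts how often the database is hit

    def get(self, key):
        if key in self.cache:     # cache hit: the database is not touched
            return self.cache[key]
        self.db_reads += 1        # cache miss: read from the database
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        self.database[key] = value
        self.cache.pop(key, None)  # invalidate so the next read is fresh


reader = CachedReader({"post:100": "Hello"})
first = reader.get("post:100")   # miss: goes to the database
second = reader.get("post:100")  # hit: served from the cache
```

Invalidating on write keeps this single-process sketch consistent. With several app servers and a shared cache, stale reads become possible, which is where the "weird result tolerance" question from the earlier slides comes in.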

  56. Replicate [diagram labels: Database, App, Database, App]

  57. Outline
      SQL, NoSQL, and sharding
      Stages of development
      Sharding myths
  58. Lots of Folklore
  59. Myth
      NoSQL scales better than a relational database
  60. Tell These Guys

  61. Post Page
      SELECT * FROM comments WHERE post_id = 100
      zrange comments:100 0 -1
  62. Example Sharded Database [diagram: Webservers in front of Database servers; comments table sharded by post_id into 0-99, 100-199, 200-299]
  63. Query Goes to One Shard [diagram: MySQL shards 0-99, 100-199, 200-299; query: Comments on post 100]
  64. Many Concurrent Queries [diagram: MySQL shards 0-99, 100-199, 200-299; queries: Comments on post 100, Comments on post 52, Comments on post 289]
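The routing shown on slides 62 to 64 can be sketched as a range lookup on `post_id`. The ranges come from the slides; the shard names and the `shard_for` helper are hypothetical, and a real deployment would map each name to a database connection.

```python
import bisect

# Lower bound of each post_id range from the slides: 0-99, 100-199, 200-299.
SHARD_STARTS = [0, 100, 200]
SHARD_NAMES = ["shard-0-99", "shard-100-199", "shard-200-299"]  # made-up names
MAX_POST_ID = 299


def shard_for(post_id):
    """Return the name of the single shard that holds this post's comments."""
    if not 0 <= post_id <= MAX_POST_ID:
        raise ValueError(f"post_id {post_id} is outside the sharded range")
    # bisect_right finds how many range starts are <= post_id.
    index = bisect.bisect_right(SHARD_STARTS, post_id) - 1
    return SHARD_NAMES[index]


# The three concurrent queries from slide 64 each land on a different shard.
routes = {post_id: shard_for(post_id) for post_id in (100, 52, 289)}
```

Because every comment for a post lives on one shard, each "comments on post N" query touches exactly one server, so adding shards adds capacity for this access pattern.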
  65. Complex Queries
      SELECT * FROM friends, statuses WHERE friends.f1 = 'neha' AND statuses.user = friends.f2
      GET neha_friends
      for each friend in friends: GET friend_status
  66. Complex Queries [diagram: a JOIN across MySQL servers]
  67. Concurrent Complex Queries [diagram: three JOIN queries across MySQL servers]
  68. NoSQL App JOIN [diagram: several GET queries against NoSQL key/value stores]
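Slides 65 to 68 contrast a database-side JOIN with a join done in the application. The sketch below runs both over the same made-up data: SQLite stands in for MySQL, a dict stands in for the key/value store, and the friend names and statuses are invented for illustration.

```python
import sqlite3

# Relational version: one JOIN query, executed by the database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE friends (f1 TEXT, f2 TEXT)")
db.execute("CREATE TABLE statuses (user TEXT, status TEXT)")
db.executemany(
    "INSERT INTO friends VALUES (?, ?)", [("neha", "alice"), ("neha", "bob")]
)
db.executemany(
    "INSERT INTO statuses VALUES (?, ?)",
    [("alice", "at lunch"), ("bob", "shipping"), ("carol", "away")],
)


def sql_friend_statuses(user):
    rows = db.execute(
        "SELECT statuses.user, statuses.status FROM friends, statuses "
        "WHERE friends.f1 = ? AND statuses.user = friends.f2 "
        "ORDER BY statuses.user",
        (user,),
    )
    return list(rows)


# Key/value version: the application performs the join with several GETs.
kv = {
    "neha_friends": ["alice", "bob"],
    "alice_status": "at lunch",
    "bob_status": "shipping",
    "carol_status": "away",
}


def kv_friend_statuses(user):
    friends = kv.get(user + "_friends", [])  # GET neha_friends
    # One GET per friend: the round trips grow with the size of the friend list.
    return [(friend, kv.get(friend + "_status")) for friend in sorted(friends)]
```

Both functions return the same pairs. The work does not disappear in the key/value version; it moves into the application as one lookup per friend, which is the point of the "simple vs. complex application queries" lesson.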
  69. Concurrent NoSQL App Joins [diagram: three sets of several GET queries]
  70. Lessons Learned
      • Don’t worry at the beginning: Use what you know
      • Know your application, make the right tradeoffs
      • Not about SQL vs. NoSQL systems – about simple vs. complex application queries
  71. Thanks!
      @neha
      narula@mit.edu