Sharding and Scaling Your Database

Neha
May 01, 2013

Determining a data storage solution as your web application scales can be the most difficult part of web development, and it takes time away from developing application features. MongoDB, Redis, Postgres, Riak, Cassandra, Voldemort, NoSQL, MySQL, NewSQL: the options are overwhelming, and all claim to be elastic, fault-tolerant, and durable, and to give great performance for both reads and writes. This talk will describe these different storage solutions and explain what is really important when choosing a datastore: your application data schema and feature requirements. You’ll learn how to think about scaling, consistency, sharding, and fault-tolerance, and most importantly, you’ll be able to start building your web application with confidence, knowing that you can scale when the time comes.

Presented at Future Insights Live, Las Vegas, NV

Transcript

  1. Sharding and Scaling
    Your Database
    Neha Narula
    @neha
    May 1, 2013

  2. What to Think About When
    Building Your Application
    Neha Narula
    @neha
    May 1, 2013

  3. @neha
    Froogle
    Blobstore
    Native Client

  4. In This Talk
    What to think about when choosing a datastore for a web application

  5. Every so often…
    Friends ask me for advice when they are building a new application

  6. Friend

  7. “Hi Neha! I am making a new application.

  8. I have heard MySQL sucks and I should use NoSQL.

  9. I am going to be iterating on my app
    I don’t have any customers yet
    And my current dataset is tiny
    But…

  10. Can you tell me which NoSQL database I should use
    And how to shard it?”

  11. Me

  12. http://knowyourmeme.com/memes/facepalm

  15. Hype

  16. What to think about when choosing a datastore for a web application
    Hint: Not Scaling

  17. Outline
    SQL, NoSQL, and sharding
    Stages of development
    Sharding myths

  18. Outline
    SQL, NoSQL, and sharding
    Stages of development
    Sharding myths

  19. SQL
    • SQL: Query language
    • Features: ACID

    SELECT description FROM conferences
    WHERE name = 'Future Insights Live';
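
A minimal Python sketch of the query on the slide above, using the standard-library sqlite3 module as a stand-in for a relational database. The table layout, the sample row, and the `describe` helper are assumptions made for illustration; the slide only gives the SELECT.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conferences (name TEXT PRIMARY KEY, description TEXT)")
conn.execute(
    "INSERT INTO conferences VALUES (?, ?)",
    ("Future Insights Live", "A web conference in Las Vegas"),  # made-up sample row
)


def describe(name):
    """Run the slide's SELECT with a bound parameter instead of an inline literal."""
    row = conn.execute(
        "SELECT description FROM conferences WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


print(describe("Future Insights Live"))
```

Binding the name as a parameter is a design choice of the sketch; it keeps user input out of the SQL text.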

  20. SQL/NoSQL

  21. NoSQL
    • Key/Value store
    • Document stores
    • Simple, less restrictive

    GET "filive_description"
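
For contrast with the SQL slide, here is a rough Python sketch of the key/value model: the only question you can ask is "what is stored under this exact key?". The `KeyValueStore` class is a hypothetical in-memory stand-in, not the API of any particular NoSQL product, and the stored value is made up.

```python
class KeyValueStore:
    """Toy key/value store: lookups by exact key only, no query language."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        # Returns None for a missing key, loosely like a GET on an absent key.
        return self._data.get(key)


store = KeyValueStore()
store.set("filive_description", "A web conference in Las Vegas")  # made-up value
print(store.get("filive_description"))
```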

  22. Sharding
    • Split data between multiple servers
    • One way of getting better performance

    sh.shardCollection("records.conferences", {"name": 1})

  23. Example Sharded Database
    [Diagram: several servers, each labeled "Database"]
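
One way to picture "split data between multiple servers" is a routing function that maps each key to one server. The sketch below uses hash-based routing as a simplification; it is not a description of how MongoDB places data for the shard key shown above. The server names in `SHARDS` are hypothetical.

```python
import hashlib

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical server names


def shard_for(key, shards=SHARDS):
    """Map a key to one shard using a hash that is stable across processes."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


# Every lookup for the same key is routed to the same server.
print(shard_for("Future Insights Live"))
```

The sketch uses hashlib rather than Python's built-in `hash()` because string hashing is randomized per process, which would send the same key to different servers after a restart.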

  24. Outline
    SQL, NoSQL, and sharding
    Stages of development
    Sharding myths

  25. Prototype, Launch, Scale

  26. Prototype

  27. First five minutes – what’s it like?
    Idea credit: Adam Marcus, The NoSQL Ecosystem, HPTS 2011
    via Justin Sheehy from Basho

  28. First Five Minutes: Redis
    http://simonwillison.net/2009/Oct/22/redis/

  29. First Five Minutes: MySQL

  30. Developing
    • Multiple people working on the same code
    • Testing new features => new access patterns
    • New person comes along…

  31. Redis
    Time to go get lunch

  32. MySQL

  33. Launch

  34. Questions
    • What is your mix of reads and writes?
    • How much data do you have?
    • Do you need transactions?
    • What are your access patterns?
    • What will grow and change?
    • What do you already know?

  35. Reads and Writes
    • Write optimized vs. Read optimized
    • MongoDB used to have a global write lock
      – Now a lock per database (version 2.2)
    • No concurrent writes!

  36. Size of Data
    • Does it fit in memory?
    • Do you have another persistent store?
    • Worst case, Redis needs 2X the memory of your data!
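
The "does it fit in memory?" question reduces to simple arithmetic once you pick a worst-case factor. A rough sketch: the function name and parameters are illustrative, and the default factor of 2.0 is just the worst case quoted on the slide.

```python
def fits_in_memory(data_gb, ram_gb, worst_case_factor=2.0):
    """Return True if the dataset still fits when it needs worst_case_factor times its size."""
    return data_gb * worst_case_factor <= ram_gb


# A 10 GB dataset on a 32 GB machine leaves headroom; a 20 GB dataset does not.
print(fits_in_memory(10, 32), fits_in_memory(20, 32))
```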

  37. • Performance
      Latency tolerance
    • Durability
      Data loss tolerance
    • Consistency
      Weird result tolerance
    • Availability
      Downtime tolerance

  38. Performance
    • Relational databases are considered slow
    • But they are doing a lot of work!
    • Sometimes you need that work, sometimes you don’t.

  39. Flexibility vs. Performance
    • We might want to pay the overhead for query flexibility
    • In a primary key datastore, we can only ask queries on primary key, but they are generally faster
    • SQL gives us flexibility to change our queries, but might be slow

  40. Safeness vs. Performance
    • Being safe is slow
    • Transactions
    • Writing to disk

  41. • Performance
      Latency tolerance
    • Durability
      Data loss tolerance
    • Consistency
      Weird result tolerance
    • Availability
      Downtime tolerance

  42. Durability
    • Really persistent datastores
    • No data loss or corruption

  43. Durability
    1. Client: write
    2. Server: flush to disk
    3. Server: send done
    4. Server: CRASH
    5. Server: recover
    6. Client: see the write
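
The six steps above can be sketched in Python with an append-only file. This is an illustration under stated assumptions, not how any particular datastore is implemented: `durable_write` and `recover` are hypothetical helpers, and the crash is only simulated by re-reading the file from scratch.

```python
import os
import tempfile


def durable_write(path, record):
    """Steps 1-3: accept the write, force it to disk, and only then report done."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(record + "\n")
        f.flush()             # push Python's buffer to the operating system
        os.fsync(f.fileno())  # ask the operating system to push its cache to disk
    return "done"


def recover(path):
    """Steps 5-6: after a restart, rebuild state from what reached the disk."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()


log_path = os.path.join(tempfile.mkdtemp(), "writes.log")
durable_write(log_path, "buy shoes")
print(recover(log_path))
```

Returning "done" before the `os.fsync` call would be faster, but a crash at step 4 could then lose a write the client believes succeeded.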

  44. No Durability
    [Diagram: App and Datastore; labels: "Buy shoes", "Ship Neha Shoes", "Charge $100", "Write"]

  45. Bad Defaults

    By default, MongoDB does not fsync() before returning to client on write
      – Need j:true

    By default, MySQL before version 5.5 uses MyISAM instead of InnoDB

  46. Deletion
    • Know your laws!
    • Purge backups and copies

  47. • Performance
      Latency tolerance
    • Durability
      Data loss tolerance
    • Consistency
      Weird result tolerance
    • Availability
      Downtime tolerance

  48. How Messed Up Can Things Get?

  49. Consistency
    • Application business logic
    • Ask yourself: How bad can things get? What’s ok to show the user?
    • Compromise for performance

  50. • Performance
      Latency tolerance
    • Durability
      Data loss tolerance
    • Consistency
      Weird result tolerance
    • Availability
      Downtime tolerance

  51. Availability
    • Expect failures
    • What happens when a datacenter goes down?

  52. Scale

  53. Specialization
    • You know your
      – query access patterns and traffic
      – consistency requirements
    • Design specialized lookups
      – Transactional datastore for consistent data
      – Memcached for static, mostly unchanging content
      – Redis for a data processing pipeline
    • Know what tradeoffs to make

  54. Ways to Scale
    • Reads
      – Cache
      – Replicate
      – Shard
    • Writes
      – Shard data amongst multiple servers

  55. Cache
    [Diagram: App, Cache, Database]
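
A common way to put a cache between the app and the database is the cache-aside pattern: check the cache first, fall back to the database on a miss, and invalidate on write. The sketch below is one possible reading of the slide, not a transcription of it. `CachedReader` is a hypothetical class, and plain dicts stand in for both the cache and the database.

```python
class CachedReader:
    """Cache-aside reads over a dict that stands in for the database."""

    def __init__(self, database):
        self.database = database
        self.cache = {}
        self.db_reads = 0  # counts how often the database is actually hit

    def get(self, key):
        if key in self.cache:
            return self.cache[key]      # cache hit: the database is not touched
        self.db_reads += 1
        value = self.database.get(key)  # cache miss: read from the database
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        self.database[key] = value
        self.cache.pop(key, None)       # invalidate so the next read is fresh


reader = CachedReader({"post:100": "Hello"})
reader.get("post:100")
reader.get("post:100")
print(reader.db_reads)  # the second read was served from the cache
```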

  56. Replicate
    [Diagram: two Database boxes and two App boxes]
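
Replication scales reads by keeping several copies of the data and spreading read traffic over them. A minimal sketch, assuming one primary that takes writes and copies them synchronously to every replica. Real systems often replicate asynchronously, which is where stale reads come from. `ReplicatedStore` and its method names are hypothetical.

```python
import itertools


class ReplicatedStore:
    """Writes go to the primary and are copied to replicas; reads rotate over replicas."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        self.primary[key] = value
        for replica in self.replicas:  # synchronous copy: a simplification
            replica[key] = value

    def read(self, key):
        # Round-robin so that no single replica serves every read.
        return self.replicas[next(self._next_replica)].get(key)


store = ReplicatedStore(replica_count=2)
store.write("post:100", "Hello")
print(store.read("post:100"), store.read("post:100"))
```

Note that every write still lands on every copy, so replication alone does not add write capacity.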

  57. Outline
    SQL, NoSQL, and sharding
    Stages of development
    Sharding myths

  58. Lots of Folklore

  59. Myth
    NoSQL scales better than a relational database

  60. Tell These Guys

  61. Post Page
    SELECT *
    FROM comments
    WHERE post_id = 100

    zrange comments:100 0 -1

  62. Example Sharded Database
    [Diagram: Webservers in front of three Database servers; the comments table is split by post_id into ranges 0-99, 100-199, and 200-299]

  63. Query Goes to One Shard
    [Diagram: MySQL servers holding post_id ranges 0-99, 100-199, and 200-299; one query, "Comments on post 100"]

  64. Many Concurrent Queries
    [Diagram: MySQL servers holding post_id ranges 0-99, 100-199, and 200-299; three queries: "Comments on post 100", "Comments on post 52", "Comments on post 289"]
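
The range layout in these diagrams can be sketched as a lookup from post_id to shard. The ranges come from the slides; the shard names and the `shard_for_post` helper are hypothetical.

```python
import bisect

# Exclusive upper bound of each shard's post_id range: 0-99, 100-199, 200-299.
RANGE_ENDS = [100, 200, 300]
SHARD_NAMES = ["mysql-0-99", "mysql-100-199", "mysql-200-299"]  # hypothetical names


def shard_for_post(post_id):
    """Return the single shard that holds all comments for this post."""
    if not 0 <= post_id < RANGE_ENDS[-1]:
        raise ValueError(f"post_id {post_id} is outside every shard's range")
    return SHARD_NAMES[bisect.bisect_right(RANGE_ENDS, post_id)]


# The three queries from the diagram each land on a different shard.
for post_id in (100, 52, 289):
    print(post_id, shard_for_post(post_id))
```

Because each query touches exactly one shard, the three queries can run at the same time on three different servers.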

  65. Complex Queries
    SELECT *
    FROM friends, statuses
    WHERE friends.f1 = 'neha'
    AND statuses.user = friends.f2

    GET neha_friends
    for each friend in friends
        GET friend_status
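
Both halves of the slide above can be run side by side in Python: the SQL join through the standard-library sqlite3 module, and the key/value version as a loop of individual GETs. The sample rows, the key naming scheme (`<user>_friends`, `<user>_status`), and the function names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friends (f1 TEXT, f2 TEXT);
    CREATE TABLE statuses (user TEXT, status TEXT);
    INSERT INTO friends VALUES ('neha', 'alice'), ('neha', 'bob');
    INSERT INTO statuses VALUES
        ('alice', 'at lunch'), ('bob', 'coding'), ('carol', 'away');
""")


def statuses_via_sql(user):
    """One JOIN query: the database matches friends to statuses."""
    rows = conn.execute(
        "SELECT statuses.user, statuses.status FROM friends, statuses "
        "WHERE friends.f1 = ? AND statuses.user = friends.f2",
        (user,),
    ).fetchall()
    return dict(rows)


# A dict standing in for a key/value store holding the same data.
kv = {
    "neha_friends": ["alice", "bob"],
    "alice_status": "at lunch",
    "bob_status": "coding",
    "carol_status": "away",
}


def statuses_via_gets(user):
    """App-side join: one GET for the friend list, then one GET per friend."""
    friends = kv.get(user + "_friends", [])
    return {friend: kv.get(friend + "_status") for friend in friends}


print(statuses_via_sql("neha") == statuses_via_gets("neha"))
```

The answer is the same either way. What differs is where the work happens: inside the database for the join, or as several small lookups issued by the application.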

  66. Complex Queries
    [Diagram: several MySQL servers and one "JOIN Query"]

  67. Concurrent Complex Queries
    [Diagram: several MySQL servers and three "JOIN Query" labels]

  68. NoSQL App JOIN
    [Diagram: "Several GET queries" sent to NoSQL key/value stores]

  69. Concurrent NoSQL App Joins
    [Diagram: three sets of "Several GET queries"]

  70. Lessons Learned
    • Don’t worry at the beginning: Use what you know
    • Know your application, make the right tradeoffs
    • Not about SQL vs. NoSQL systems – about simple vs. complex application queries

  71. Thanks!
    @neha
    [email protected]