Slide 1

Slide 1 text

Sharding  and  Scaling   Your  Database   Neha  Narula   @neha   May  1,  2013  

Slide 2

Slide 2 text

What to Think About When Building Your Application Neha Narula @neha May 1, 2013

Slide 3

Slide 3 text

@neha Froogle Blobstore Native Client

Slide 4

Slide 4 text

In This Talk
What to think about when choosing a datastore for a web application

Slide 5

Slide 5 text

Every so often…
Friends ask me for advice when they are building a new application

Slide 6

Slide 6 text

Friend  

Slide 7

Slide 7 text

“Hi Neha! I am making a new application.

Slide 8

Slide 8 text

I  have  heard  MySQL  sucks  and  I   should  use  NoSQL.      

Slide 9

Slide 9 text

I am going to be iterating on my app
I don’t have any customers yet
And my current dataset is tiny
But…

Slide 10

Slide 10 text

 Can  you  tell  me  which  NoSQL   database  I  should  use   And  how  to  shard  it?”  

Slide 11

Slide 11 text

Me  

Slide 12

Slide 12 text

http://knowyourmeme.com/memes/facepalm

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

Hype  

Slide 16

Slide 16 text

What to think about when choosing a datastore for a web application
Hint: Not Scaling

Slide 17

Slide 17 text

Outline   SQL,  NoSQL,  and  sharding   Stages  of  development   Sharding  myths  

Slide 18

Slide 18 text

Outline   SQL,  NoSQL,  and  sharding   Stages  of  development   Sharding  myths  

Slide 19

Slide 19 text

SQL
• SQL: Query language
• Features: ACID
SELECT description FROM conferences WHERE name = 'Future Insights Live';
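
A minimal sketch of the SQL slide, using Python's built-in sqlite3 module as a stand-in for any relational database. The table layout, the placeholder description, and the "Hypothetical Conf" row are assumptions for illustration; only the SELECT comes from the slide. The failed transaction shows the atomicity part of ACID: nothing from it is left behind.

```python
import sqlite3

# In-memory SQLite database standing in for a relational store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conferences (name TEXT PRIMARY KEY, description TEXT)")

# A committed transaction: the row is visible afterwards.
with conn:
    conn.execute(
        "INSERT INTO conferences VALUES (?, ?)",
        ("Future Insights Live", "placeholder description"),
    )

# A failed transaction: the context manager rolls the insert back.
try:
    with conn:
        conn.execute(
            "INSERT INTO conferences VALUES (?, ?)",
            ("Hypothetical Conf", "never committed"),
        )
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass


def describe(name):
    """Run the query from the slide and return the description, or None."""
    row = conn.execute(
        "SELECT description FROM conferences WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

Calling describe("Future Insights Live") returns the stored text, while the rolled-back row cannot be found.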

Slide 20

Slide 20 text

SQL/NoSQL  

Slide 21

Slide 21 text

NoSQL
• Key/Value store
• Document stores
• Simple, less restrictive
GET "filive_description"
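
A rough sketch of the key/value model, assuming a toy in-memory store rather than any real NoSQL client. The class name, the second key, and the stored values are made up for illustration. The point is that the store only answers exact-key lookups; anything else is the application's job.

```python
class KeyValueStore:
    """Toy in-memory key/value store (illustrative, not a real client)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        # The only supported lookup: exact match on the key.
        return self._data.get(key)


store = KeyValueStore()
store.set("filive_description", "placeholder description")
store.set("filive_city", "placeholder city")

# There is no query language, so finding related keys means the
# application scans them itself (here: by a naming-convention prefix).
filive_keys = sorted(k for k in store._data if k.startswith("filive_"))
```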

Slide 22

Slide 22 text

Sharding
• Split data between multiple servers
• One way of getting better performance
sh.shardCollection("records.conferences", {"name": 1})
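
One way to picture "split data between multiple servers" is a routing function that maps each key to a server. The MongoDB command on the slide declares name as the shard key; the sketch below swaps in simple hash-based routing, which is not necessarily what MongoDB does internally. The server names are hypothetical.

```python
import hashlib

# Hypothetical server names; a real deployment would hold connections here.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_for(key, shards=SHARDS):
    """Map a key to one shard using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then wrap it to the shard count.
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]
```

The same key always lands on the same shard, so a lookup by name touches one server. Changing the number of shards remaps most keys, which is one reason real systems use more elaborate schemes.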

Slide 23

Slide 23 text

Database   Example  Sharded   Database   Database   Database   Database  

Slide 24

Slide 24 text

Outline   SQL,  NoSQL,  and  sharding   Stages  of  development   Sharding  myths  

Slide 25

Slide 25 text

Prototype   Launch   Scale  

Slide 26

Slide 26 text

Prototype  

Slide 27

Slide 27 text

First five minutes – what’s it like?
Idea credit: Adam Marcus, The NoSQL Ecosystem, HPTS 2011, via Justin Sheehy from Basho

Slide 28

Slide 28 text

First Five Minutes: Redis
http://simonwillison.net/2009/Oct/22/redis/

Slide 29

Slide 29 text

First  Five  Minutes:  MySQL  

Slide 30

Slide 30 text

Developing
• Multiple people working on the same code
• Testing new features => new access patterns
• New person comes along…

Slide 31

Slide 31 text

Redis   Time  to  go  get   lunch  

Slide 32

Slide 32 text

MySQL  

Slide 33

Slide 33 text

Launch  

Slide 34

Slide 34 text

Questions
• What is your mix of reads and writes?
• How much data do you have?
• Do you need transactions?
• What are your access patterns?
• What will grow and change?
• What do you already know?

Slide 35

Slide 35 text

Reads and Writes
• Write optimized vs. read optimized
• MongoDB used to have a global write lock
  – Now a lock per database (version 2.2)
• No concurrent writes!

Slide 36

Slide 36 text

Size of Data
• Does it fit in memory?
• Do you have another persistent store?
• Worst case, Redis needs 2X the memory of your data!

Slide 37

Slide 37 text

• Performance: latency tolerance
• Durability: data loss tolerance
• Consistency: weird result tolerance
• Availability: downtime tolerance

Slide 38

Slide 38 text

Performance
• Relational databases are considered slow
• But they are doing a lot of work!
• Sometimes you need that work, sometimes you don’t.

Slide 39

Slide 39 text

Flexibility vs. Performance
• We might want to pay the overhead for query flexibility
• In a primary key datastore, we can only query on the primary key, but those queries are generally faster
• SQL gives us flexibility to change our queries, but might be slow

Slide 40

Slide 40 text

Safeness vs. Performance
• Being safe is slow
• Transactions
• Writing to disk

Slide 41

Slide 41 text

• Performance: latency tolerance
• Durability: data loss tolerance
• Consistency: weird result tolerance
• Availability: downtime tolerance

Slide 42

Slide 42 text

Durability
• Really persistent datastores
• No data loss or corruption

Slide 43

Slide 43 text

Durability
1. Client: write
2. Server: flush to disk
3. Server: send done
4. Server: CRASH
5. Server: recover
6. Client: see the write
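
The six steps above can be sketched with an append-only log file, assuming a single server process and a hypothetical newline-delimited record format. The function names are illustrative. The key point is the ordering: the server acknowledges only after os.fsync returns, so a crash after the acknowledgement cannot lose the write.

```python
import os


def durable_append(path, record):
    """Steps 1-3: accept a write, force it to disk, then acknowledge."""
    with open(path, "ab") as f:
        f.write(record + b"\n")   # step 1: the client's write arrives
        f.flush()                 # move Python's buffer into the OS
        os.fsync(f.fileno())      # step 2: ask the OS to flush to disk
    return "done"                 # step 3: only now tell the client


def recover(path):
    """Steps 5-6: after a restart, read back every acknowledged record."""
    if not os.path.exists(path):
        return []
    with open(path, "rb") as f:
        return f.read().splitlines()
```

Skipping the fsync makes writes faster, but then "done" may be sent for data that exists only in memory.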

Slide 44

Slide 44 text

No  Durability   App   Datastore   Buy   shoes   Ship   Neha   Shoes   Charge   $100   Write  

Slide 45

Slide 45 text

Bad Defaults
By default, MongoDB does not fsync() before returning to the client on a write
  – Need j:true
By default, MySQL before version 5.5 uses MyISAM instead of InnoDB

Slide 46

Slide 46 text

Deletion
• Know your laws!
• Purge backups and copies

Slide 47

Slide 47 text

• Performance: latency tolerance
• Durability: data loss tolerance
• Consistency: weird result tolerance
• Availability: downtime tolerance

Slide 48

Slide 48 text

How  Messed  Up  Can  Things  Get?  

Slide 49

Slide 49 text

Consistency
• Application business logic
• Ask yourself: How bad can things get? What’s ok to show the user?
• Compromise for performance

Slide 50

Slide 50 text

• Performance: latency tolerance
• Durability: data loss tolerance
• Consistency: weird result tolerance
• Availability: downtime tolerance

Slide 51

Slide 51 text

Availability
• Expect failures
• What happens when a datacenter goes down?

Slide 52

Slide 52 text

Scale  

Slide 53

Slide 53 text

Specialization
• You know your
  – query access patterns and traffic
  – consistency requirements
• Design specialized lookups
  – Transactional datastore for consistent data
  – Memcached for static, mostly unchanging content
  – Redis for a data processing pipeline
• Know what tradeoffs to make

Slide 54

Slide 54 text

Ways to Scale
• Reads
  – Cache
  – Replicate
  – Shard
• Writes
  – Shard data amongst multiple servers

Slide 55

Slide 55 text

Cache   Database   Cache   App  
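
A small sketch of the cache diagram, assuming a cache-aside pattern in which the app checks the cache first and falls back to the database on a miss. Plain dicts stand in for both the cache (for example Memcached) and the database; the class name and the db_reads counter are illustrative.

```python
class CachedReader:
    """App-side read path: cache first, database on a miss."""

    def __init__(self, database):
        self.database = database  # stand-in for the real database
        self.cache = {}           # stand-in for a cache server
        self.db_reads = 0         # counts how often the database is hit

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value  # populate the cache for next time
        return value

    def put(self, key, value):
        self.database[key] = value
        self.cache.pop(key, None)    # invalidate so readers see the update
```

Repeated reads of the same key reach the database once; a write drops the cached copy so the next read refills it.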

Slide 56

Slide 56 text

Replicate   Database   App   Database   App  
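
A sketch of the replication diagram, assuming one primary that takes writes and several replicas that serve reads in round-robin order. Dicts stand in for database servers, and replication here is synchronous for simplicity. Real systems often replicate asynchronously, which is where stale reads come from.

```python
import itertools


class ReplicatedStore:
    """Writes go to the primary; reads are spread across replicas."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        self.primary[key] = value
        # Simplification: copy to every replica before returning.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Round-robin over replicas to spread the read load.
        index = next(self._next_replica)
        return self.replicas[index].get(key)
```

Adding replicas raises read capacity, but every write still has to reach every copy, so this does not scale writes.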

Slide 57

Slide 57 text

Outline   SQL,  NoSQL,  and  sharding   Stages  of  development   Sharding  myths  

Slide 58

Slide 58 text

Lots  of  Folklore  

Slide 59

Slide 59 text

Myth
NoSQL scales better than a relational database

Slide 60

Slide 60 text

Tell  These  Guys  

Slide 61

Slide 61 text

Post Page
SELECT * FROM comments WHERE post_id = 100
zrange comments:100 0 -1

Slide 62

Slide 62 text

Example Sharded Database
Webservers
comments table, split by post_id across three databases: 0-99, 100-199, 200-299

Slide 63

Slide 63 text

Query Goes to One Shard
MySQL shards by post_id: 0-99, 100-199, 200-299
Comments on post 100

Slide 64

Slide 64 text

Many Concurrent Queries
MySQL shards by post_id: 0-99, 100-199, 200-299
Comments on post 100
Comments on post 52
Comments on post 289
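
The range layout in these diagrams can be sketched as a lookup over the upper bound of each range. The shard names are hypothetical; the ranges 0-99, 100-199, and 200-299 come from the slides.

```python
import bisect

# Inclusive upper bound of each post_id range, in ascending order.
UPPER_BOUNDS = [99, 199, 299]
SHARD_NAMES = ["mysql-a", "mysql-b", "mysql-c"]  # hypothetical names


def shard_for_post(post_id):
    """Return the shard holding comments for the given post_id."""
    if post_id < 0 or post_id > UPPER_BOUNDS[-1]:
        raise ValueError("post_id outside the sharded range")
    # First range whose upper bound is >= post_id.
    return SHARD_NAMES[bisect.bisect_left(UPPER_BOUNDS, post_id)]
```

Queries for posts 100, 52, and 289 each resolve to a different shard, so they can run at the same time without competing for one server.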

Slide 65

Slide 65 text

Complex Queries
SELECT * FROM friends, statuses WHERE friends.f1 = 'neha' AND statuses.user = friends.f2
GET neha_friends
for each friend in friends
GET friend_status
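
A side-by-side sketch of the two approaches on this slide, assuming sqlite3 as the relational store and a plain dict as the key/value store. The sample friends and statuses, and the key naming scheme, are made up for illustration. Both functions produce the same rows; the difference is whether the database or the application does the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friends (f1 TEXT, f2 TEXT);
CREATE TABLE statuses (user TEXT, status TEXT);
INSERT INTO friends VALUES ('neha', 'alice'), ('neha', 'bob');
INSERT INTO statuses VALUES
    ('alice', 'at a conference'),
    ('bob', 'writing slides'),
    ('carol', 'offline');
""")

# Key/value stand-in holding the same data under hypothetical key names.
kv = {
    "neha_friends": ["alice", "bob"],
    "alice_status": "at a conference",
    "bob_status": "writing slides",
    "carol_status": "offline",
}


def sql_join(name):
    """One query; the database performs the join."""
    return conn.execute(
        "SELECT statuses.user, statuses.status FROM friends, statuses "
        "WHERE friends.f1 = ? AND statuses.user = friends.f2 "
        "ORDER BY statuses.user",
        (name,),
    ).fetchall()


def app_join(name):
    """One GET for the friend list, then one GET per friend."""
    friends = kv.get(name + "_friends", [])
    return [(friend, kv[friend + "_status"]) for friend in sorted(friends)]
```

The application-side version issues one request per friend, so the work grows with the size of the friend list no matter which kind of store is underneath.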

Slide 66

Slide 66 text

MySQL MySQL MySQL Complex  Queries   MySQL MySQL MySQL JOIN  Query  

Slide 67

Slide 67 text

MySQL MySQL MySQL Concurrent  Complex   Queries   MySQL MySQL MySQL JOIN  Query   JOIN  Query   JOIN  Query  

Slide 68

Slide 68 text

NoSQL  App  JOIN   Several  GET   queries   NoSQL  key/ value  stores  

Slide 69

Slide 69 text

Concurrent  NoSQL   App  Joins   Several  GET   queries   Several  GET   queries   Several  GET   queries  

Slide 70

Slide 70 text

Lessons Learned
• Don’t worry at the beginning: Use what you know
• Know your application, make the right tradeoffs
• Not about SQL vs. NoSQL systems – about simple vs. complex application queries

Slide 71

Slide 71 text

Thanks!   @neha   [email protected]