
Camille Fournier on The Chubby lock service for loosely-coupled distributed systems


Distributed consensus is often discussed in terms of algorithms: Paxos, ZAB, Raft, etc. But while the algorithms may be more or less mind-bending, for me the more interesting aspect of distributed consensus is creating systems that support it for the general use case. This paper, on Google's Chubby lock service, is the story of what happens when a system stops being a polite theory and starts getting real-world use.

To anyone who has worked in depth as a distributed systems engineer, Chubby is a beautiful paper. It is not a paper about algorithms and their limits, or a toy fringe system created by grad students to test a hypothesis. It describes the real tradeoffs that real systems engineers make when designing something to solve a large set of problems well enough. It shows the key insights the authors had about how such a system might be used, and their awareness of what it should do well and what it should not try to do well. It details how Chubby was designed, then goes further to describe how it ended up being used when released into the wild, and the surprises and consequences of those design decisions.


October 20, 2014




  1. Chubby: a Mike Burrows Joint. Camille Fournier, CTO, Rent the Runway, for Papers! We! Love! @skamille
  2. What is Chubby? Chubby is a self-described lock service.
     • Allow clients to synchronize their activities and agree on basic information about their environment
     • Help developers deal with coarse-grained sync, in particular leader election
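The coarse-grained sync Chubby targets, leader election in particular, follows a simple pattern: candidates race for an advisory lock, and the winner writes its identity into the lock file so the others can discover the primary. Here is a minimal in-memory sketch of that pattern; the `AdvisoryLock` class and `elect` helper are illustrative stand-ins, not Chubby's actual API:

```python
import threading

class AdvisoryLock:
    """In-memory stand-in for a Chubby lock file: an advisory lock plus a
    small blob of contents (the elected primary's identity). Illustrative
    only -- not Chubby's actual API."""
    def __init__(self):
        self._lock = threading.Lock()
        self.contents = b""

    def try_acquire(self, client_id):
        # Non-blocking acquire, in the spirit of Chubby's TryAcquire().
        return self._lock.acquire(blocking=False)

def elect(lock, candidates):
    """Each candidate races for the lock; the winner writes its identity
    into the file so everyone else can discover the primary."""
    leader = None
    for candidate in candidates:
        if lock.try_acquire(candidate):
            lock.contents = candidate.encode()  # advertise the primary
            leader = candidate
    return leader

lock = AdvisoryLock()
winner = elect(lock, ["replica-a", "replica-b", "replica-c"])
# winner == "replica-a": the first to try wins; later candidates fail to
# acquire and would instead read lock.contents to find the primary.
```

This is exactly the "small files" decision paying off: the lock alone tells you *that* there is a primary, while the file contents tell you *who* it is.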
  3. How does it work? Solves distributed consensus using asynchronous communication. At its core: Paxos.
  4. Interesting not due to deep fundamental algorithms, but due to how you take those fundamental concepts and create a system usable by many apps, devs, etc.
  5. System Structure

  6. System Structure. Definitions:
     Chubby Cell
     • a small set of servers (typically 5) known as replicas
     Master
     • the replica that handles all writes and reads
  7. Data Model: Files, Directories, Handles. Exports a file system interface:
     /ls/foo/wombat/pouch
     ls: the Chubby common prefix, stands for “lock service”
     foo: the name of the Chubby cell (resolves to one or more servers via DNS lookup)
     /wombat/pouch: interpreted within the named cell
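The naming scheme above can be sketched as a small parser. This is a hand-written illustration of the slide's description, not code from Chubby itself:

```python
def parse_chubby_name(name):
    """Split a Chubby name like /ls/foo/wombat/pouch into the cell name
    and the path interpreted within that cell. Hand-written illustration
    of the naming scheme described on the slide."""
    parts = name.split("/")           # ["", "ls", cell, ...rest]
    if len(parts) < 3 or parts[0] != "" or parts[1] != "ls":
        raise ValueError("Chubby names start with the /ls/ prefix")
    cell = parts[2]                   # resolved via DNS to the cell's replicas
    path = "/" + "/".join(parts[3:])  # e.g. "/wombat/pouch"
    return cell, path

cell, path = parse_chubby_name("/ls/foo/wombat/pouch")
# cell == "foo", path == "/wombat/pouch"
```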
  8. Namespace details. Only files and directories (collectively, nodes). Each node has only one name (no links). Nodes may be permanent or ephemeral.
     • Ephemeral nodes are deleted if no client has them open
     ACLs are inherited from the parent on creation.
  9. More miscellanea. Per-node metadata for bookkeeping:
     instance number
     content generation number
     lock generation number
     ACL generation number
     Handles
     • Obtained when a node is opened. A sequence number tells the master which generation created it; mode info tells who has the handle open.
  10. Locks. Advisory rather than mandatory.
     Potential lock problems in distributed systems:
     A holds a lock L, issues request W, then fails
     B acquires L (because A failed), performs actions
     W arrives (out-of-order) after B’s actions
     Solution #1: backward compatible
     The lock server will prevent other clients from getting the lock if a lock becomes inaccessible or the holder has failed
     “Draining the queue” of unprocessed events before someone else can acquire the lock
     Solution #2: sequencer
     A lock holder can obtain a sequencer from Chubby
     It attaches the sequencer to any requests that it sends to other servers
     The other servers can verify the sequencer information
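Solution #2 can be sketched in a few lines: the sequencer carries the lock's generation number, and a downstream server rejects any request whose sequencer is older than the newest one it has seen, which defeats exactly the out-of-order W scenario above. The `Sequencer` and `FileServer` classes here are illustrative; the paper describes a sequencer as naming the lock, its mode, and the generation number, but does not pin down a wire format:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sequencer:
    """Illustrative sequencer: names the lock and carries its generation
    number (mode omitted; the real byte-string format is unspecified)."""
    lock_name: str
    generation: int

class FileServer:
    """A downstream server that remembers the newest lock generation it
    has seen and rejects requests carrying an older (stale) sequencer."""
    def __init__(self):
        self.latest = {}

    def handle(self, request, seq):
        if seq.generation < self.latest.get(seq.lock_name, 0):
            return "REJECTED: stale lock holder"
        self.latest[seq.lock_name] = seq.generation
        return "OK: " + request

srv = FileServer()
a = Sequencer("/ls/cell/wombat", generation=1)  # A held the lock first
b = Sequencer("/ls/cell/wombat", generation=2)  # B acquired after A failed
ok = srv.handle("B's write", b)               # accepted
stale = srv.handle("A's delayed write W", a)  # rejected: out-of-order W
```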
  11. Something confusing... “The validity of a sequencer can be checked against the server’s Chubby cache or, if the server does not wish to maintain a session with Chubby, the most recent sequencer that the server has observed” Wha?
  12. Remember: Locks are advisory. All we guarantee is that locks conflict only with other attempts to acquire the same lock. They do NOT make locked objects inaccessible to clients not holding their locks.
  13. Events. When you create a handle, you can subscribe to events!
     File modified
     Child node changed
     Master failover
     Handle invalid
     Delivered after the corresponding action has taken place.
  14. Caching. Clients cache file data and node meta-data via an in-memory write-through cache.
     When a node is changed, the modification is blocked while the master invalidates the data in all caches.
     During invalidation, the master treats the node as uncachable.
     The caching protocol invalidates cached data on a change; it never updates it.
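The invalidate-never-update rule can be shown with a toy model of a single node on the master. This is illustrative only; real Chubby delivers invalidations over the KeepAlive channel and falls back to lease timeouts:

```python
class CachedNode:
    """Toy model of a Chubby node on the master: a write conceptually
    blocks until every caching client has been invalidated, and new
    contents are never pushed to caches, only dropped from them."""
    def __init__(self, value):
        self.value = value
        self.cachers = set()      # clients currently caching this node

    def read(self, client):
        self.cachers.add(client)  # a read populates that client's cache
        return self.value

    def write(self, new_value):
        invalidated = sorted(self.cachers)
        self.cachers.clear()      # invalidate, never update
        self.value = new_value    # commit only after caches are flushed
        return invalidated

node = CachedNode("v1")
node.read("client-1")
node.read("client-2")
flushed = node.write("v2")  # both clients are invalidated before v2 lands
```

The design choice this illustrates: invalidation-only keeps the protocol simple (a client either has the current value or nothing) at the cost of a re-read after every change.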
  15. Sessions and KeepAlives. Session: the relationship between a Chubby client and a Chubby cell, maintained by KeepAlives.
     Created on first contact with the Chubby master.
     Ended when terminated, or left idle with no open handles and no calls.
  16. KeepAlives. Not quite heartbeats...
     A special RPC, handled by blocking the response until the client’s lease is close to expiring, then allowing it to return to the client with the new lease.
     The client initiates a new KeepAlive immediately upon receiving the previous reply.
     Also used to transmit events! This ensures that clients can’t maintain a session without acknowledging cache invalidations.
     Handling behavior during what might be a “disconnect” from the master is done via a grace period.
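The grace-period behavior can be sketched as a pure function of timestamps: the session is healthy while the local lease holds, then sits in “jeopardy” for a grace period (the paper's default is 45 seconds) while the client hopes a new master will answer, and only after that is it truly expired. A sketch of the client-side state machine, not real client code:

```python
def session_state(now, lease_expiry, grace_period=45.0):
    """Client-side session state: 'healthy' while the local lease holds,
    'jeopardy' during the grace period after it lapses (new API calls
    are blocked but the session is kept), 'expired' only after that.
    Times are in seconds; the 45 s default is the paper's."""
    if now < lease_expiry:
        return "healthy"
    if now < lease_expiry + grace_period:
        return "jeopardy"  # block new work but keep the session alive
    return "expired"

session_state(10.0, lease_expiry=12.0)  # "healthy"
session_state(30.0, lease_expiry=12.0)  # "jeopardy": within grace period
session_state(60.0, lease_expiry=12.0)  # "expired": 60 > 12 + 45
```

This is exactly why master fail-over (next slide) doesn't have to expire every client: a fail-over that completes within the grace period is just a long jeopardy.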
  17. Master Fail-over

  18. Basically... We don’t want to expire all our clients when the master fails over, because re-establishing sessions and redoing all the things the clients do on reconnect is a pain in the ass. So the client can’t do NEW work, but it doesn’t close its session, either.
  19. “Readers will be unsurprised to learn that the fail-over code, which is exercised far less often than other parts of the system, has been a rich source of interesting bugs.” Indeed.
  20. Now on to the interesting part: Design Rationale

  21. Two Key Design Decisions
     1. A lock service, as opposed to a library or service for consensus
     2. Serves small files, to permit using the service to share data such as advertisements and config
  22. Why not libPaxos? A client Paxos library would depend on no other services..., and would provide a standard framework for programmers, assuming their services can be implemented as state machines.
  23. Hell is Other Programmers

  24. Service Advantages: Part 1. Devs don’t plan for HA.
     Code needs to be specially structured for use with consensus protocols.
     A service enables code to have correct distributed locking without having to rewrite the whole damn thing.
  25. Service Advantages 2, Electric Boogaloo. When you are electing a primary or partitioning data dynamically, you often need to advertise what the state is.
     Supporting the storage and fetching of small quantities of data is useful!
     You can do it with DNS, but DNS TTLs are kind of a pain in the ass.
  26. Service Advantages III. Programmers understand lock-based interfaces.
     Sort of.
     Not really.
     But hey, a familiar interface makes them use something that works vs some hack that they threw together!
  27. Service 4dvantages. Distributed consensus algos use quorums to make decisions, which means they have to have replicas, which means HA.
     Having HA in the service means the client can make safe decisions even when it does not have its own majority!
  28. None
  29. Coarse-grained Locking. Lock-acquisition rate is only weakly related to the transaction rate of client apps.
     Locks are acquired rarely.
     Lower load on the system.
  30. Coarse-grained Locking. Coarse-grained locks tend to protect things that require costly recovery procedures.
     Coarse-grained locks should survive system failure.
     If you want fine-grained locking, implement your own lock service, using Chubby to coordinate the assignment of blocks of lock groups to lock servers.
  31. Learnings. As a product manager might call them...

  32. How did people use this? Naming!
     Most traffic is session KeepAlives.
     Some reads (from cache misses), few writes.
  33. Outages and Data Loss. The network (maintenance, issues) causes outages.
     Database software errors and operator error cause data loss.
  34. Performance sensitivity. Clients rarely care about latency to Chubby, provided sessions don’t drop.
     Extremely sensitive to the performance of the local Chubby cache.
     The server overloads above 90,000 sessions, or due to client spam.
     Scaling depends on reducing communication.
  35. If ya like it then you shoulda put a proxy in front of it. Java compatibility? PROXY! (ok, not exactly, but close enough)
     Name service? PROXY!
     Proxy:
     • a trusted process that passes requests from other clients to a Chubby cell
     A layer of indirection: allows different langs, different constraints, more load per cell.
  36. Most people want a Name Service. DNS is hard to scale via TTLs.
     • 3000 servers communicating with each other with a 60s TTL requires 150K lookups per second
     Chubby can handle more, but name resolution doesn’t need Chubby-level preciseness.
     • Add a proxy designed for name lookups!
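The 150K figure is easy to check: with N servers each needing every other server's name kept fresh within one TTL, the load grows with N²:

```python
# Check the slide's arithmetic: 3000 servers, each keeping every other
# server's address fresh under a 60 s TTL, is ~ N^2 / TTL lookups/sec.
servers = 3000
ttl_seconds = 60
lookups_per_second = servers * servers // ttl_seconds
print(lookups_per_second)  # 150000
```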
  37. Did I mention the problem with other programmers? They will write loops that constantly retry failed commands.
     They will try to use this as a data storage system.
     They think that a lock server makes for good pub/sub messaging.
  38. More difficulties with developers. They rarely consider availability.
     They don’t think about failure probabilities.
     They don’t understand distributed systems.
     They blindly follow APIs and don’t read the documentation carefully.
     They write bugs.
     They don’t predict the future very well.
  39. Mitigating the impacts of developers. Review all their code.
     Review the way they want to use the system.
     Entirely control the client and make bad behavior painful.
     Aggressively cache.
  40. In conclusion... A centralized service: useful for many reasons.
     Creating shared core architecture is hard.
     Developers can and will fsck everything up.
     Having fundamental insights, and making decisions up front about what you are and are not building, helps you to create something great.
  41. fin @skamille