Deploying 10,000 Nodes Simultaneously

How Rackspace is scaling deployments to 10k+ nodes. This talk includes challenges and successes.

Paul Voccio

May 05, 2013

Transcript

  1. RACKSPACE® HOSTING | WWW.RACKSPACE.COM

        Writing code is hard. If you cannot deploy it, it does not matter.
  2. Things I’m going to tell you

        • Deploying stuff at scale
        • Where we’ve been
        • Where we’re going
        011011110110100000100000011010000110000101101001
  3. What does “At Scale” Mean? Learning to Scale

        • Clustered DBs
        • More app nodes
        • Couple of web servers
        • Redundant LBs
        [Diagram: tiers of LB, Web, App, and DB nodes]
  4. What does “At Scale” Mean? Learning to Scale OpenStack

        • Hundreds of HVs
        • Thousands of HVs
        • Tens of thousands of HVs
        • Hundreds of thousands of HVs
        [Diagram: a Global Cloud containing Regions, each Region containing Cells, each Cell containing HVs]
  5. Deploy Strategy

        • Could use LB & node rotation
        • Some stacks handle this well
        • Challenge for infrastructure
        [Diagram: LB in front of Web01-A, Web02-A, Web01-B, Web02-B]
        0110010001100101011100100111000001101100011011110111100100100001
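The "LB & node rotation" idea on this slide can be sketched as follows. This is a minimal illustration, not the tooling Rackspace used: the `drain`, `deploy`, `is_healthy`, and `restore` callbacks are hypothetical stand-ins for whatever load balancer and deploy commands a given stack provides, and the A/B pool split is read off the slide's diagram.

```python
from typing import Callable, Dict, List


def rolling_deploy(
    pools: Dict[str, List[str]],
    drain: Callable[[str], None],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    restore: Callable[[str], None],
) -> List[str]:
    """Upgrade one pool at a time so the other pool keeps serving traffic.

    Returns the nodes that were upgraded and put back in rotation.
    Raises RuntimeError at the first node that fails its health check;
    that node (and any not-yet-restored node in its pool) stays drained.
    """
    upgraded: List[str] = []
    for pool_name in sorted(pools):
        nodes = pools[pool_name]
        # Take the whole pool out of rotation before touching it.
        for node in nodes:
            drain(node)
        # Upgrade each node and only re-enable it once it checks out.
        for node in nodes:
            deploy(node)
            if not is_healthy(node):
                raise RuntimeError(f"{node} failed its health check after deploy")
            restore(node)
            upgraded.append(node)
    return upgraded


if __name__ == "__main__":
    # Hypothetical callbacks that only record what would happen.
    events = []
    pools = {"A": ["Web01-A", "Web02-A"], "B": ["Web01-B", "Web02-B"]}
    done = rolling_deploy(
        pools,
        drain=lambda n: events.append(("drain", n)),
        deploy=lambda n: events.append(("deploy", n)),
        is_healthy=lambda n: True,
        restore=lambda n: events.append(("restore", n)),
    )
    print(done)
```

Draining a full pool at once halves serving capacity during the deploy; a stack with more pools or spare headroom could rotate smaller groups instead.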
  6. “Agile” deploy process

        • Debs + freight + apache
        • Configuration with Puppet w/ PuppetMasters
        • Bash + ssh will take you a long way.
        011010000110000100101100001000000110000101100111011010010110110001100101
  7. Deploy Strategies: No downtime deploys

        • Loadbalancing Puppet didn’t help
        • Networks already saturated
        [Diagram: LB in front of Puppet1, Puppet2, Puppet3, Puppet4]
  8. Considerations: Things that broke…

        • Network devices got in the way
        • Timeouts, connection limits, link saturation…
  9. Deal with Breakage

        • Is 99.9% system online good or bad?
          – 10,000 × 0.001 = 10 down nodes
          – Constantly fixing
        • Build your system to deal with breakage
          – Autohealing will help you stay sane
        011000100111010101110011011101000110010101100100001011100010111000101110
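The arithmetic on this slide generalizes to any fleet size. The sketch below reproduces the 10,000-node figure and, under the added assumption that node failures are independent (which the slide does not state), estimates how likely it is that at least one node is down at any moment. The function names are illustrative.

```python
def expected_down_nodes(total_nodes: int, availability: float) -> float:
    """Average number of nodes down at any moment."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return total_nodes * (1.0 - availability)


def chance_any_node_down(total_nodes: int, availability: float) -> float:
    """Probability that at least one node is down.

    Assumes failures are independent, which real fleets only approximate.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return 1.0 - availability ** total_nodes


if __name__ == "__main__":
    for fleet in (100, 1_000, 10_000):
        down = expected_down_nodes(fleet, 0.999)
        any_down = chance_any_node_down(fleet, 0.999)
        print(f"{fleet:>6} nodes: {down:.1f} down on average, "
              f"{any_down:.2%} chance something is down")
```

At 10,000 nodes something is almost always broken, which is the case for building autohealing in rather than fixing nodes by hand.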
  10. Improving the Deploy Mechanism

        • Easy enough with 100 nodes
        • Now scale to 1,000… then 10,000
        • OS independent
        • Build for 10x more
        01101000011001010110110001110000001000000110110101100101
  11. Improving the Deploy Mechanism: Deploying from OpenStack Trunk

        • Package: Switched from Debian packages to virtual environments
        • Distribute: Torrent for package, pssh for fact files, and mcollective for actions
        • Execute: From centralized puppet master to decentralized masterless puppet
  12. Packages away…

        Package: Switched from Debian packages to virtual environments
        • OS independent
        • Easier on developers
        • Portable
        01101001001111000011001101110101
  13. Pushing bits

        Distribute: Torrent for package, pssh for fact files, and mcollective for actions
        • Torrents save bandwidth
        • Pssh for quick interactions
        • Mcollective for lots of actions
        011001110110111101100111011011110110011101101111
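A rough way to see why torrents save bandwidth is to compare how much data the origin host must upload in each model. This is an idealized sketch: the 0.5 GB package size and the `seed_copies` parameter are made-up illustration values, and real swarms add protocol overhead and uneven peer behavior.

```python
from typing import Optional


def origin_upload_gb(nodes: int, package_gb: float,
                     seed_copies: Optional[float] = None) -> float:
    """Data the origin host must upload to reach every node.

    seed_copies=None models direct download: every node pulls a full copy
    from the origin. A number models a swarm in which the origin uploads
    only that many full copies and peers exchange the rest (idealized).
    """
    if nodes < 0 or package_gb < 0:
        raise ValueError("nodes and package_gb must be non-negative")
    if seed_copies is None:
        return nodes * package_gb
    # The origin never needs to send more copies than there are nodes.
    return min(seed_copies, nodes) * package_gb


if __name__ == "__main__":
    nodes, package_gb = 10_000, 0.5  # package size is a made-up figure
    print("direct:", origin_upload_gb(nodes, package_gb), "GB")
    print("swarm: ", origin_upload_gb(nodes, package_gb, seed_copies=2), "GB")
```

In the direct model the origin's uplink grows linearly with the fleet; in the swarm model it stays roughly constant because the nodes' own links carry the rest of the traffic.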
  14. Configure and Execute

        Execute: From centralized puppet master to decentralized masterless puppet
        • Still need configuration management
        • Configurations reside on the nodes
        • Less infrastructure to maintain
        0111001101100001011101100110010100100000011001100110010101110010011100100110100101110011
  15. Metrics

        01101000011101000111010001110000011001010110010101100101
  16. Everyone should be able to deploy

        01100100011011110110111001110100011000100110111101100111011000010111001001110100
  17. Don’t make it hard

        • Make it easy for anyone to deploy
        • Automated build tools do this well
        • (we use Jenkins)
        01101101011001010110111101110111
  18. Still a ways to go…

        • We build Infrastructure as a Service
        • Built like an application
          – Transactional
          – “Sticky” data
        • Deploy it like a website
        0110110101101111011100100110010100111111
  19. End Results?

        • Reduced deploy time from hours to minutes
        • Most time spent testing
        • Deployment tools are part of the product, not an afterthought
        011100110110110001100101011001010111000000100001
  20. Fin

        [email protected]
        director, infrastructure engineering
        @paulvx