Scaling puppet

Pascal Hahn
September 27, 2012

Scaling a multi-tenant puppet-system from a technical and organizational side

Transcript

  1. Pascal Hahn, [email protected] / [email protected], Head of Automated Deployment, Nokia Location and Commerce

     Scaling a multi-tenant puppet-system from a technical and organizational side
  2. Who am I

     - Head of Thor team (Nokia’s Automated Deployment)
     - At Nokia for about a year and a half
     - Started as Principal Engineer, got sucked into Management
     - Before Nokia, Site Reliability Engineer at Google, also developing Automation (no puppet)
  3. What does Nokia do with puppet?

     - Deploy wide range of services, mostly focused around location
     - Multiple environments and datacenters around the world (labs, production, testing, development, …)
     - Strict versioning and isolation using environments + ENC
     - Deployment as code, lots of automated testing to make sure the code works
     - Custom functions – for example: encrypt confidential data
     - RESTful API to manage yum repositories globally
     - RESTful API to change puppet configuration
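The "environments + ENC" bullet refers to Puppet's external node classifier interface: an executable that receives a node name and prints a YAML document with `environment`, `classes` and `parameters` keys. The following is only a minimal sketch of the classification step, not Thor's actual ENC; the inventory, the host names and the `team_release` environment naming are assumptions made for illustration.

```python
"""Sketch: classify a node into a versioned Puppet environment (ENC-style)."""

# Hypothetical inventory: node name -> (tenant team, pinned release, classes).
INVENTORY = {
    "web01.example.com": ("maps", "1_4_2", ["apache", "maps::frontend"]),
    "db01.example.com": ("places", "2_0_0", ["places::database"]),
}


def classify(node):
    """Return the ENC YAML document for a node, or None if it is unknown."""
    entry = INVENTORY.get(node)
    if entry is None:
        return None
    team, release, classes = entry
    # One environment per team and release keeps tenants and versions isolated.
    lines = ["---", "environment: %s_%s" % (team, release), "classes:"]
    lines += ["  - %s" % name for name in classes]
    lines += ["parameters:", "  team: %s" % team]
    return "\n".join(lines) + "\n"
```

A wrapper script would call `classify` with the node name Puppet passes as its first argument and print the result; how unknown nodes are handled is a policy choice the slides do not describe.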
  4. Time-lapse

     - Services deployed to Legacy system in various ways, massive pain
     - Developed Puppet-system V1, became unmaintainable quickly
     - Lots of thinking
     - Developed Puppet-system V2 (= Thor)
     - Launched in 06/2011
  5. If you build it they will come

     - ~120 different internal customer-teams (~90 in June)
     - Customers all over the globe
     - Machine fleet exploded
     - Went from underdog-solution to de-facto standard
  6. Catalog compiles are expensive

     - 1 primary puppet-master per cell
     - Listens on 8140 and load-balances compiles
     - >= 3 puppetmasters total
     - Cert signing on primary master
     - Slaves sync certs from master
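The split on this slide (certificate signing on the primary, catalog compiles spread over several masters) can be sketched as a routing decision on the request path. This is an illustration under stated assumptions, not the setup used in Thor: it assumes the Puppet HTTP URL layout of that era, `/{environment}/{indirection}/{key}`, the host names are hypothetical, and round robin is chosen here only because the slide does not name a balancing policy.

```python
from itertools import cycle


class MasterRouter:
    """Route agent requests inside one cell (host names are hypothetical)."""

    def __init__(self, primary, compilers):
        self.primary = primary
        self._compilers = cycle(compilers)  # simple round robin over the pool

    def pick_backend(self, path):
        # Assumed URL layout: /{environment}/{indirection}/{key}
        parts = path.strip("/").split("/")
        indirection = parts[1] if len(parts) > 1 else ""
        if indirection.startswith("certificate"):
            return self.primary  # signing stays on the primary master
        return next(self._compilers)  # compiles are spread over the pool


router = MasterRouter(
    primary="puppet1.cell1.example.com",
    compilers=["puppet2.cell1.example.com", "puppet3.cell1.example.com"],
)
```

Calling `router.pick_backend("/production/catalog/web01.example.com")` repeatedly alternates between the two compile masters, while any `certificate*` request goes to the primary.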
  7. Fixed-size `scaling-units` / cells

     - 8 x 16 hypervisors maximum
     - Standardized hardware
     - Standardized network-setup
     - Repeatable and testable setup
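With fixed-size cells, capacity planning reduces to simple arithmetic. The sketch below assumes that "8 x 16" means 8 groups of 16 hypervisors, so 128 hypervisors per cell at most; what the two factors stand for (for example racks and hosts per rack) is not stated on the slide, and the names used here are illustrative.

```python
# Assumed reading of "8 x 16": 8 groups of 16 hypervisors per cell.
GROUPS_PER_CELL = 8
HYPERVISORS_PER_GROUP = 16
CELL_CAPACITY = GROUPS_PER_CELL * HYPERVISORS_PER_GROUP  # 128 hypervisors


def cells_needed(hypervisors):
    """Number of fixed-size cells required to host the given hypervisors."""
    if hypervisors < 0:
        raise ValueError("hypervisor count must not be negative")
    # Ceiling division using integers only.
    return -(-hypervisors // CELL_CAPACITY)
```

Under this reading, a fleet that grows past 128 hypervisors is served by adding a whole new cell rather than by enlarging an existing one.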
  8. Build Pipeline

     - 2 jenkins jobs per project, per release
     - Manual work for new projects and releases
     - Results in lots of clicking in Jenkins and confused users
  9. Basic integration comes for free

     - Default set of nagios alerts, on every host
     - Machine metrics in trending system (Graphite + collectd)
     - Default logs in Splunk
  10. Standard building-blocks

      - Apache module fully integrated with shared services
      - PHP and Tomcat base projects
      - Modules are security-team approved
  11. Hiring

      - Hiring puppet or ruby (systems) specialists is hard
      - Hire smart, generalist system engineers
  12. 1 Area == 1 Developer

      - Subject-matter-experts are good
      - As long as they share knowledge
      - Whole team needs to know all pieces
  13. Support is taking a lot of time

      - High quality documentation
      - Users-help-users
      - IRC is great!
      - Onsite, face-2-face trainings
  14. Next steps

      - Open-source core components, build pipeline and pm manager 1.
      - Revisit testing methodology
      - Introduce permission model in backend / replace legacy backend
      - Wrap up AWS support
      - Hire more engineers (Nokia is hiring: http://devops.nokia.com)