CERN: Accelerating Science with Puppet

Puppet Labs
October 25, 2012

"Accelerating Science with Puppet" by Tim Bell of CERN, at PuppetConf 2012.

Follow along with the video here: http://bit.ly/TYwftk

Watch PuppetConf videos you missed out on here: http://www.puppetlabs.com/community/videos/puppetconf

Abstract: This talk will review CERN's objectives and how the computing infrastructure is evolving to address the challenges at scale using community-supported software such as Puppet and OpenStack.

Speaker Bio: Tim Bell is responsible for the CERN IT Operating System and Infrastructure Group, which supports Windows, Mac and Linux across the site along with virtualisation, email and web services. These systems are used by over 10,000 scientists researching fundamental physics, finding out what the Universe is made of and how it works. Prior to working at CERN, Tim worked for Deutsche Bank, managing private banking infrastructure in Europe, and for IBM, where he was a Unix kernel developer and deployed large-scale technical computing solutions.

Learn more about Puppet:
www.puppetlabs.com

Be the first to know about PuppetConf 2013:
http://info.puppetlabs.com/puppetconf2013-notification.html

Transcript

  1. Accelerating Science with Puppet
    Tim Bell
    [email protected]
    @noggin143
    PuppetConf San Francisco
    28th September 2012
    PuppetConf 2012   Tim Bell, CERN

  2. What is CERN?
    • Conseil Européen pour la Recherche Nucléaire – aka European Laboratory for Particle Physics
    • Between Geneva and the Jura mountains, straddling the Swiss-French border
    • Founded in 1954 with an international treaty
    • Our business is fundamental physics: what is the universe made of and how does it work

  3. Answering fundamental questions…
    • How to explain particles have mass?
      We have theories and accumulating experimental evidence. Getting close…
    • What is 96% of the universe made of?
      We can only see 4% of its estimated mass!
    • Why isn’t there anti-matter in the universe?
      Nature should be symmetric…
    • What was the state of matter just after the « Big Bang »?
      Travelling back to the earliest instants of the universe would help…

  4. Community collaboration on an international scale

  5. The Large Hadron Collider

  7. LHC construction

  8. The Large Hadron Collider (LHC) tunnel

  10. Superconducting magnets – October 2008
    A faulty connection between two superconducting magnets led to the release of a large amount of helium into the LHC tunnel and forced the machine to shut down for repairs for one year

  11. Accumulating events in 2009-2011

  13. Heavy Ion Collisions

  15. Tier-0 (CERN):
    • Data recording
    • Initial data reconstruction
    • Data distribution
    Tier-1 (11 centres):
    • Permanent storage
    • Re-processing
    • Analysis
    Tier-2 (~200 centres):
    • Simulation
    • End-user analysis
    • Data is recorded at CERN and Tier-1s and analysed in the Worldwide LHC Computing Grid
    • In a normal day, the grid provides 100,000 CPU days executing 1 million jobs
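
The daily grid figures on slide 15 imply an average job size. A minimal Python sketch of that arithmetic, assuming the slide's round numbers and an even spread of work across jobs (the function and constant names are illustrative, not part of any CERN tooling):

```python
# Round figures quoted on the slide for a normal day on the grid.
CPU_DAYS_PER_DAY = 100_000
JOBS_PER_DAY = 1_000_000


def average_cpu_hours_per_job(cpu_days: float, jobs: int) -> float:
    """Mean CPU time per job in hours, assuming work is spread evenly."""
    return cpu_days * 24 / jobs


def average_busy_cores(cpu_days_per_day: float) -> float:
    """Delivering N CPU-days per calendar day keeps about N cores busy on average."""
    return cpu_days_per_day


if __name__ == "__main__":
    hours = average_cpu_hours_per_job(CPU_DAYS_PER_DAY, JOBS_PER_DAY)
    print(f"{hours:.1f} CPU hours per job on average")
    print(f"about {average_busy_cores(CPU_DAYS_PER_DAY):,.0f} cores busy on average")
```

Real job lengths vary widely, so this is only the mean implied by the two totals.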

  16. Data Centre by Numbers
    • Hardware installation & retirement: ~7,000 hardware movements/year; ~1,800 disk failures/year
    • Processor types: Xeon 5150 2%, Xeon 5160 10%, Xeon E5335 7%, Xeon E5345 14%, Xeon E5405 6%, Xeon E5410 16%, Xeon L5420 8%, Xeon L5520 33%, Xeon 3GHz 4%
    • Disk vendors: Fujitsu 3%, Hitachi 23%, HP 0%, Maxtor 0%, Seagate 15%, Western Digital 59%, Other 0%
    High Speed Routers (640 Mbps → 2.4 Tbps)   24
    Ethernet Switches                          350
    10 Gbps ports                              2,000
    Switching Capacity                         4.8 Tbps
    1 Gbps ports                               16,939
    10 Gbps ports                              558
    Racks                                      828
    Servers                                    11,728
    Processors                                 15,694
    Cores                                      64,238
    HEPSpec06                                  482,507
    Disks                                      64,109
    Raw disk capacity (TiB)                    63,289
    Memory modules                             56,014
    Memory capacity (TiB)                      158
    RAID controllers                           3,749
    Tape Drives                                160
    Tape Cartridges                            45,000
    Tape slots                                 56,000
    Tape Capacity (TiB)                        73,000
    IT Power Consumption                       2,456 kW
    Total Power Consumption                    3,890 kW
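
Two ratios follow directly from the slide 16 figures: the annual disk failure rate, and the ratio of total to IT power (the usual definition of power usage effectiveness, PUE). A rough Python sketch, assuming the slide's approximate counts; the names are illustrative only:

```python
# Figures from the slide; the failure count is approximate ("~1,800").
DISKS = 64_109
DISK_FAILURES_PER_YEAR = 1_800
IT_POWER_KW = 2_456
TOTAL_POWER_KW = 3_890


def annual_failure_rate(failures_per_year: int, population: int) -> float:
    """Fraction of the installed disks that fail in a year."""
    return failures_per_year / population


def power_usage_effectiveness(total_kw: float, it_kw: float) -> float:
    """PUE: total facility power divided by power delivered to IT equipment."""
    return total_kw / it_kw


if __name__ == "__main__":
    afr = annual_failure_rate(DISK_FAILURES_PER_YEAR, DISKS)
    print(f"Disk failure rate: {afr:.1%} per year")
    print(f"Disk failures per day: {DISK_FAILURES_PER_YEAR / 365:.1f}")
    pue = power_usage_effectiveness(TOTAL_POWER_KW, IT_POWER_KW)
    print(f"PUE: {pue:.2f}")
```

At roughly five disk replacements a day, hardware interventions are routine work rather than exceptions, which is part of why automation matters at this scale.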

  17. Our Challenges - Data storage
    • 25PB/year to record
    • >20 years retention
    • 6GB/s average
    • 25GB/s peaks

  19. 45,000 tapes holding 73PB of physics data

  20. New data centre to expand capacity
    • Data centre in Geneva reaches limit of electrical capacity at 3.5MW
    • New centre chosen in Budapest, Hungary
    • Additional 2.7MW of usable power
    • Hands off facility
    • Deploying from 2013

  21. Time to change strategy
    • Rationale
      – Need to manage twice as many servers as today
      – No increase in staff numbers
      – Tools becoming increasingly brittle and will not scale as-is
    • Approach
      – We are no longer a special case for compute
      – Adopt an open source tool chain model
      – Strong engineering skills allow rapid adoption of new technologies
        • Evaluate solutions in the problem domain
        • Identify functional gaps and challenge them
      – Contribute new function back to the community

  22. Building Blocks
    • Bamboo
    • Koji, Mock
    • AIMS/PXE
    • Foreman
    • Yum repo
    • Pulp
    • Puppet-DB
    • mcollective, yum
    • JIRA
    • Lemon / Hadoop
    • git
    • OpenStack Nova
    • Hardware database
    • Puppet
    • Active Directory / LDAP

  23. Training and Support
    • Buy the book rather than guru mentoring
    • Newcomers are rapidly productive (and often know more than us)
    • Community and Enterprise support means we’re not on our own

  24. Staff Motivation
    • Skills valuable outside of CERN when an engineer’s contract ends

  25. Prepare the move to the clouds
    • Improve operational efficiency
      – Machine reception and testing
      – Hardware interventions with long running programs
      – Multiple operating system demand
    • Improve resource efficiency
      – Exploit idle resources, especially waiting for tape I/O
      – Highly variable load such as interactive or build machines
    • Improve responsiveness
      – Self-Service
      – Coffee break response time

  26. Service Model
    • Pets are given names like pussinboots.cern.ch
    • They are unique, lovingly hand raised and cared for
    • When they get ill, you nurse them back to health
    • Cattle are given numbers like vm0042.cern.ch
    • They are almost identical to other cattle
    • When they get ill, you get another one
    • Future application architectures tend towards Cattle but Pets with configuration management are also viable
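
The cattle half of the service model on slide 26 can be sketched in a few lines of Python. This is a toy illustration only: the `CattlePool` class and its methods are hypothetical and stand in for whatever a real provisioning system does; only the `vm0042.cern.ch` naming pattern comes from the slide.

```python
from itertools import count


class CattlePool:
    """Hypothetical pool of interchangeable, numbered machines."""

    def __init__(self, prefix: str = "vm", domain: str = "cern.ch") -> None:
        self.prefix = prefix
        self.domain = domain
        self._ids = count(1)          # next number to hand out
        self.members: set[str] = set()

    def provision(self) -> str:
        """Create a new machine with the next number, e.g. vm0042.cern.ch."""
        name = f"{self.prefix}{next(self._ids):04d}.{self.domain}"
        self.members.add(name)
        return name

    def replace(self, sick: str) -> str:
        """Cattle are not nursed back to health: drop the sick one, get another."""
        self.members.discard(sick)
        return self.provision()


if __name__ == "__main__":
    pool = CattlePool()
    first = pool.provision()
    print("provisioned", first)
    print("replaced with", pool.replace(first))
```

A pet, by contrast, would keep its name and be repaired in place, which is where configuration management earns its keep.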

  27. OpenStack
    • Open source cloud run by an independent foundation with over 6,000 members from 850 organisations
    • Started in 2010 but maturing rapidly with public cloud services from Rackspace, HP and Ubuntu
    Platinum Members

  28. Many OpenStack Components to Configure
    • NOVA: Compute, Scheduler, Network, Volume
    • GLANCE: Registry, Image
    • KEYSTONE
    • HORIZON

  29. When communities combine…
    • OpenStack’s many components and options make configuration complex out of the box
    • Puppet forge module from PuppetLabs (Thanks, Dan Bode)
    • The Foreman adds OpenStack provisioning for user kiosk

  30. Scaling up with Puppet and OpenStack
    • Use [email protected] based on BOINC for simulating magnetics guiding particles around the LHC
    • Naturally, there is a puppet module puppet-boinc
    • 1000 VMs spun up to stress test the hypervisors with Puppet, Foreman and OpenStack

  31. Next Steps
    • Expand tool chain
      – Mcollective
      – Puppet-DB
    • Deploy at scale in production
      – Move towards 15,000 hypervisors over next two years
      – Estimate 100,000-300,000 virtual machines
    • Work with labs on common solutions for scientific computing
      – Batch system configurations
      – Grids
      – Publishing to http://github.com/cernops
    • Investigate desktop and device management
      – Linux desktops
      – Macs
      – KVMs, PDUs
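
The production targets on slide 31 imply a range for how many virtual machines each hypervisor would host. A small Python sketch of that estimate, assuming the slide's figures and an even spread of VMs across hypervisors (names are illustrative):

```python
# Targets quoted on the slide.
HYPERVISORS = 15_000
VM_ESTIMATE_LOW = 100_000
VM_ESTIMATE_HIGH = 300_000


def vms_per_hypervisor(vms: int, hypervisors: int) -> float:
    """Average number of VMs per hypervisor, assuming an even spread."""
    return vms / hypervisors


if __name__ == "__main__":
    low = vms_per_hypervisor(VM_ESTIMATE_LOW, HYPERVISORS)
    high = vms_per_hypervisor(VM_ESTIMATE_HIGH, HYPERVISORS)
    print(f"Between {low:.1f} and {high:.1f} VMs per hypervisor on average")
```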

  32. Final Thoughts
    • A small project to share documents at CERN in the ‘90s created the massive phenomenon that is today’s world wide web
    • Open Source
    • Vibrant community and eco-system
    • Working with the Puppet and OpenStack communities has shown the power of collaboration
    • We have built a toolchain in one year with part time resources
    • Running 15,000 servers and up to 300,000 VMs is scary but achievable
    • Looking forward to further contributions as we move to large scale deployment

  33. For more details, see Ben Jones’ talk at 15:50 today
    Configuration Management at CERN – From Homegrown to Industry Standard
    Tim Bell

  34. References
    CERN: http://public.web.cern.ch/public/
    Scientific Linux: http://www.scientificlinux.org/
    Worldwide LHC Computing Grid: http://lcg.web.cern.ch/lcg/, http://rtm.hep.ph.ic.ac.uk/
    Jobs: http://cern.ch/jobs
    Detailed Report on Agile Infrastructure: http://cern.ch/go/N8wp

  35. Backup Slides

  36. CERN’s tools
    • The world’s most powerful accelerator: LHC
      – A 27 km long tunnel filled with high-tech instruments
      – Equipped with thousands of superconducting magnets
      – Accelerates particles to energies never before obtained
      – Produces particle collisions creating microscopic “big bangs”
    • Very large sophisticated detectors
      – Four experiments each the size of a cathedral
      – Hundred million measurement channels each
      – Data acquisition systems treating Petabytes per second
    • Top level computing to distribute and analyse the data
      – A Computing Grid linking ~200 computer centres around the globe
      – Sufficient computing power and storage to handle 25 Petabytes per year, making them available to thousands of physicists for analysis

  37. Our Infrastructure
    • Hardware is generally based on commodity, white-box servers
      – Open tendering process based on SpecInt/CHF, CHF/Watt and GB/CHF
      – Compute nodes typically dual processor, 2GB per core
      – Bulk storage on 24x2TB disk storage-in-a-box with a RAID card
    • Vast majority of servers run Scientific Linux, developed by Fermilab and CERN, based on Red Hat Enterprise Linux
      – Focus is on stability in view of the number of centres on the WLCG

  38. New architecture data flows

  39. OpenStack
    Gold Members