Building Large Scale Services - LISA 2013

Jennifer Davis

November 08, 2013

Transcript

  1. Define Core Principles (11/11/13, slide 8)
     § Common
       › Collaboration across teams, companies, and industry; define standards
       › Incident, Problem, Change, Config, Release management
     § Distinct
       › Specifics to an application or service
       › Availability, Service, Business Continuity, Capacity
  2. Kill the Myths
     § Stupid User
     § System Admin == Operator
  3. Failing Gracefully
     [Word cloud of SKILLS: Puppet, Ruby, Perl, NoSQL, operability, security, MySQL, Unix, TCP/IP, bash, Chef]
  4. Kill the Myths
     § Stupid User
     § System Admin == Operator
     § Words have a common, universal, implicit meaning
  5. Team
     § People working towards a common goal.
     § Different roles.
     § Different views.
     § Same objectives.
  6. Team
     Suggestion: Don’t talk about the “devs” request; talk about Elaine’s request.
  7. Team
     Suggestion: Don’t talk about the “devs” request; talk about Elaine’s request.
     Suggestion: Verify that your team has the same vision.
  8. Understand the vision.
     § Are there other options, open source or not, within the company?
     § Are there other options outside the company?
     § Is EVERYONE on the same page about what the service is?
  9. Vision Statement
     § Clear statement about the problem that the service is solving.
       › Direction
       › Identity management
       › Team cohesion
     New product? Be part of creating that vision!
  10. Sherpa’s Vision
     “… distributed, replicated, eventually consistent key-value store that had a focus on scalability …”
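
The deck does not describe how Sherpa is built. As a rough illustration of what “replicated” and “eventually consistent” mean, here is a toy sketch, assuming last-writer-wins conflict resolution and a full-state anti-entropy exchange between two in-memory replicas. The `Replica` class and its methods are hypothetical, not Sherpa’s design.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple

# Version = (timestamp, writer name); comparing tuples breaks timestamp ties
# deterministically, so every replica picks the same winner.
Version = Tuple[int, str]


@dataclass
class Replica:
    """One copy of a toy key-value store (hypothetical, not Sherpa's design)."""
    name: str
    data: Dict[str, Tuple[Version, Any]] = field(default_factory=dict)

    def _apply(self, key: str, version: Version, value: Any) -> None:
        # Last-writer-wins: keep an entry only if its version is newer.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def put(self, key: str, value: Any, ts: int) -> None:
        # A local write is accepted immediately, without asking other replicas.
        self._apply(key, (ts, self.name), value)

    def get(self, key: str) -> Optional[Any]:
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other: "Replica") -> None:
        # Anti-entropy: pull every entry from a peer, keeping the newest version.
        for key, (version, value) in other.data.items():
            self._apply(key, version, value)


a, b = Replica("a"), Replica("b")
a.put("user:1", "v1", ts=1)
b.put("user:1", "v2", ts=2)
print(a.get("user:1"), b.get("user:1"))  # v1 v2  (replicas disagree)

a.merge_from(b)
b.merge_from(a)
print(a.get("user:1"), b.get("user:1"))  # v2 v2  (replicas converged)
```

Last-writer-wins is the simplest policy and silently drops the losing concurrent write; production stores often use richer versioning.
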
  11. My Job
     § Examine software
     § Define risk
     § Communicate cost of risks
     § Mitigate risks
     § Identify events
     § Manage events
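
“Communicate cost of risks” can be made concrete with a small risk register. The sketch below assumes the common likelihood × impact heuristic for expected annual cost; the deck does not prescribe a formula, and the risks and numbers are invented placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Risk:
    description: str
    probability_per_year: float  # estimated chance of occurring in a year, 0..1
    impact_cost: float           # estimated cost if it does occur

    @property
    def expected_cost(self) -> float:
        # Expected annual cost = likelihood x impact (an assumed heuristic).
        return self.probability_per_year * self.impact_cost


def rank(risks: List[Risk]) -> List[Risk]:
    """Order risks from most to least expensive, to decide what to mitigate first."""
    return sorted(risks, key=lambda r: r.expected_cost, reverse=True)


# Hypothetical entries for illustration only.
register = [
    Risk("disk fills on storage node", 0.5, 10_000),
    Risk("single colo outage", 0.05, 400_000),
    Risk("config push breaks routing", 0.2, 50_000),
]

for risk in rank(register):
    print(f"{risk.expected_cost:>10,.0f}  {risk.description}")
```

A single number per risk is crude, but it gives a shared vocabulary when explaining to others why one mitigation comes before another.
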
  12. Change is inevitable
     § Products pivot based on needs.
     § Requirements change and evolve.
     § Know core issues.
  13. Know Core Issues
     § Limit the scope of focus.
     § Focus on the biggest priorities.
  14. Know Core Issues
     § Limit the scope of focus.
     § Focus on the biggest priorities.
       › Understand Development Methodology: Waterfall, Scrum, ?
  15. Know Core Issues
     § Limit the scope of focus.
     § Focus on the biggest priorities.
       › Understand Development Methodology: Waterfall, Scrum, ?
       › Identify the key “time” elements.
  16. Know Core Issues
     § Limit the scope of focus.
     § Focus on the biggest priorities.
       › Understand Development Methodology: Waterfall, Scrum, ?
       › Identify the key “time” elements.
       › Talk to them. Identify their key terms: “Enhancements”, “Defects”
  17. Know Core Issues
     § Limit the scope of focus.
     § Focus on the biggest priorities.
       › Understand Development Methodology: Waterfall, Scrum, ?
       › Identify the key “time” elements.
       › Talk to them. Identify their key terms: “Enhancements”, “Defects”
       › Establish the “Top” list.
  18. Create checklists
     § Not because people are dumb.
     § Not only because of automation.
     § When things break, you know what needs focus.
     § During normal maintenance, you can identify what is “not OK”.
       › Audit checklists for deployment through the staging environment.
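
A checklist that can be run, not only read, makes “not OK” visible during normal maintenance and in staging. A minimal sketch, assuming each item is a description paired with a function returning True or False; `run_checklist` and the sample checks are hypothetical stand-ins for real probes.

```python
from typing import Callable, List, Tuple

# A checklist is an ordered list of (description, check) pairs.
Check = Tuple[str, Callable[[], bool]]


def run_checklist(items: List[Check]) -> List[str]:
    """Run every check, print its status, and return the descriptions that are not OK."""
    failures = []
    for description, check in items:
        try:
            ok = bool(check())
        except Exception:
            # A check that crashes is itself "not OK"; keep going so it can't hide the rest.
            ok = False
        print(f"[{'OK' if ok else 'NOT OK'}] {description}")
        if not ok:
            failures.append(description)
    return failures


# Hypothetical checks; real ones would probe files, ports, health endpoints, etc.
staging_checklist: List[Check] = [
    ("config file present", lambda: True),
    ("service answers health check", lambda: False),
    ("disk usage below 80%", lambda: 0.42 < 0.80),
]

not_ok = run_checklist(staging_checklist)
```

Because every item always runs, the same list works as a deployment gate in staging and as a quick triage tool when something breaks.
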
  19. Know Outputs
     § Identify components.
     § Well-defined protocols between components.
     § Expected inputs.
     § Expected outputs.
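
One way to pin down expected inputs and outputs is to give each message between components an explicit type that validates itself. A sketch under stated assumptions: the `PutRequest`/`PutResponse` messages, the 1 KiB value limit, and the capacity rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict

MAX_VALUE_BYTES = 1024  # hypothetical protocol limit


@dataclass(frozen=True)
class PutRequest:
    """Expected input: rejected at construction time if it breaks the protocol."""
    key: str
    value: bytes

    def __post_init__(self) -> None:
        if not self.key:
            raise ValueError("key must be non-empty")
        if len(self.value) > MAX_VALUE_BYTES:
            raise ValueError(f"value exceeds {MAX_VALUE_BYTES} bytes")


@dataclass(frozen=True)
class PutResponse:
    """Expected output: every request yields exactly one of these."""
    key: str
    stored: bool
    error: str = ""


def handle_put(req: PutRequest, store: Dict[str, bytes], capacity: int) -> PutResponse:
    # Failure is part of the documented output, not an unexpected exception.
    if req.key not in store and len(store) >= capacity:
        return PutResponse(req.key, stored=False, error="store full")
    store[req.key] = req.value
    return PutResponse(req.key, stored=True)


store: Dict[str, bytes] = {}
print(handle_put(PutRequest("a", b"1"), store, capacity=1))
print(handle_put(PutRequest("b", b"2"), store, capacity=1))
```

Bad input fails loudly at the boundary, and the “store full” case is an ordinary, expected output that the caller can plan for.
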
  20. Know State Transitions Explicitly.
     § When a component is installed but not ready
  21. Know State Transitions Explicitly.
     § When a component is installed but not ready
     § When the colo is going away
     § Go through What If scenarios.
       › Document them.
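
Writing the states and the legal transitions down as data is one way to make them explicit, so that a “what if” scenario becomes a path through the table. The state names below (including `DRAINING` for “the colo is going away”) and the `Component` class are assumptions for illustration, not taken from the deck.

```python
from enum import Enum, auto


class State(Enum):
    INSTALLED = auto()       # software is on the host but not ready for traffic
    READY = auto()           # passing health checks, taking traffic
    DRAINING = auto()        # e.g. the colo is going away: stop taking new traffic
    DECOMMISSIONED = auto()  # removed from service


# Every legal transition is listed explicitly; anything else is treated as a bug.
TRANSITIONS = {
    State.INSTALLED: {State.READY, State.DECOMMISSIONED},
    State.READY: {State.DRAINING},
    State.DRAINING: {State.READY, State.DECOMMISSIONED},
    State.DECOMMISSIONED: set(),
}


class Component:
    """Hypothetical managed component that refuses undocumented transitions."""

    def __init__(self, name: str):
        self.name = name
        self.state = State.INSTALLED

    def transition(self, new_state: State) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.name}: illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state


node = Component("kv-node-1")
node.transition(State.READY)
node.transition(State.DRAINING)        # colo is going away
node.transition(State.DECOMMISSIONED)
print(node.state.name)  # DECOMMISSIONED
```

Calling `Component("x").transition(State.DRAINING)` raises `ValueError`: an installed-but-not-ready node was never taking traffic, so the table says there is nothing to drain.
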
  22. Know choke points explicitly.
     § Memory
     § Disk
     § Bandwidth
     Now and in 6 months. JIT?
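
For “now and in 6 months”, a simple projection helps. The sketch below assumes usage grows roughly linearly and fits a least-squares line to monthly samples; the function names and the disk figures are hypothetical, and seasonal or bursty growth would need a better model. `months_until_full` also bears on the JIT question: procurement has to start at least one lead time before that point.

```python
from typing import List, Optional, Tuple


def linear_fit(samples: List[float]) -> Tuple[float, float]:
    """Least-squares line through (month_index, usage); returns (slope, intercept)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two monthly samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project(samples: List[float], months_ahead: int) -> float:
    """Usage the trend line predicts `months_ahead` months after the last sample."""
    slope, intercept = linear_fit(samples)
    return intercept + slope * (len(samples) - 1 + months_ahead)


def months_until_full(samples: List[float], capacity: float) -> Optional[float]:
    """Months after the last sample until the trend reaches capacity (None if not growing)."""
    slope, intercept = linear_fit(samples)
    if slope <= 0:
        return None
    return (capacity - intercept) / slope - (len(samples) - 1)


disk_gb = [400, 450, 500, 550]  # hypothetical monthly disk usage
print(project(disk_gb, 6))              # 850.0
print(months_until_full(disk_gb, 800))  # 5.0
```

With an 800 GB volume, the 6-month projection already overshoots, and the trend hits the limit about 5 months out.
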
  23. Failure will happen.
     § There are no zero-failure systems.
     § “Give me the brain” documentation, so that anyone can be the brain.
     § Repeatable/reliable failure handling.
     § Run fire drills. Really.
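
“Repeatable/reliable failure handling” and “run fire drills” fit together: if the runbook is executable, a drill can break something on purpose and check that the runbook restores service. A toy sketch, assuming a two-node service with a single primary; `Cluster`, `failover_runbook`, and `fire_drill` are hypothetical names.

```python
class Cluster:
    """Toy two-node service used only to rehearse a failover (hypothetical)."""

    def __init__(self) -> None:
        self.nodes = {"node-a": True, "node-b": True}  # name -> healthy?
        self.primary = "node-a"

    def serving(self) -> bool:
        return self.nodes[self.primary]


def failover_runbook(cluster: Cluster) -> None:
    """The documented, repeatable fix: promote the first healthy non-primary node."""
    if cluster.serving():
        return  # nothing to do; running the runbook twice is harmless
    for name, healthy in cluster.nodes.items():
        if healthy and name != cluster.primary:
            cluster.primary = name
            return
    raise RuntimeError("no healthy node to promote")


def fire_drill(cluster: Cluster, victim: str) -> bool:
    """Break something on purpose, follow the runbook, confirm recovery."""
    cluster.nodes[victim] = False  # inject the failure
    failover_runbook(cluster)
    return cluster.serving()


cluster = Cluster()
print(fire_drill(cluster, "node-a"), cluster.primary)  # True node-b
```

Running a second drill against `node-b` raises `RuntimeError`: no healthy node is left, which is exactly the kind of gap a drill should surface before a real incident does.
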
  24. System Administration is Gardening.
     § No guarantee of resources.
     § The only guarantee is change.
  25. System Administration is Gardening.
     § Nurture relationships.
       › Be authentic.
       › Be trusting and trustworthy.
       › Have integrity.
  26. [Chart: # of Support Engineers by month, Jan to Oct]
  27. [Chart: # of Support Engineers by month, Jan to Oct]
  28. Documentation is not the cure.
     § Documentation doesn’t guarantee understanding.
       › Operations Sandbox Environment
     § Don’t spend time at the end documenting.
  29. Acknowledgements
     • http://www.flickr.com/photos/levork
     • http://www.flickr.com/photos/puggles
     • http://www.flickr.com/photos/byteorder
     • http://www.flickr.com/photos/egoant
     • http://www.flickr.com/photos/happymonkey
     • Kyle Latino
     • Greg Connor