
What should wake you up at night?

bob
October 18, 2017

On-call can be painful, but it doesn't have to be.
Learn how modern practices can make on-call better for everyone.

bob has been doing on-call for over 15 years and has opinions about what should
wake you up and who should be woken up. He intends to share them with you.



  1. bob walker Head of Web Operations Government Digital Service @rjw1

  2. GDS My name should be lowercased, e.g. bob walker http://randomness.org.uk/branding/

  3. GDS My pronouns are he/him or they/them.

  4. A long time ago in a company far, far away...

  5. GDS

  6. GDS Copyright : Alex Howarth

  7. What should wake you up at night?

  8. nothing

  9. Exit pursued by bears

  10. No wait! I suppose I should explain that.

  11. Ways to not get woken up

  12. Shifts

  13. GDS 4 shifts of 8 hours So you can have

  14. GDS Just say no! “Long-term night shift work is associated with an increased risk of certain cancers, as well as metabolic problems, heart disease, ulcers, gastrointestinal problems and obesity. ... People who work night shifts or rotating shifts also often don't sleep enough, and long-term sleep deprivation is known to be bad for health.” - https://sleepfoundation.org/shift-work/content/living-coping-shift-work-disorder

  15. Follow the Sun

  16. GDS

  17. GDS If you are big enough this is a good

  18. Self healing systems

  19. GDS https://www.flickr.com/photos/johnclare/7124089493/

  20. GDS Automatic restarts: • daemontools • monit • God • SMF • systemd
  21. GDS Clusters: • mongodb • elasticsearch

  22. GDS Auto scaling groups

  23. GDS Schedulers: • Mesos • Kubernetes • ECS

  24. What if you can’t do all these?

  25. Be selective!

  26. GDS What 5 things would you monitor and alert on first if you had no monitoring?
  27. GDS • disk

  28. GDS • disk • CPU

  29. GDS • disk • CPU • Memory

  30. GDS • disk • CPU • Memory • A couple of other things which don’t matter to the user
  31. GDS https://www.flickr.com/photos/benterrett/17936132731

  32. Monitor your user journey

  33. GDS • Availability • Response times • Error rates

  34. GDS Legal reasons

  35. GDS Life or death situations

  36. Who?

  37. GDS Which team should get called? • Ops team • Dev team • DevOps team
  38. GDS Trick question!

  39. GDS DevOps is a culture, not a team or job

  40. Everyone!

  41. “Always two there are” - Yoda

  42. GDS We’re hiring - Web Operations Engineers - Technical Architects - Developers https://gds.blog.gov.uk/jobs/
  43. GDS No Questions! Come talk to me later. I have

  44. Thanks! bob walker @rjw1
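
Slides 32 and 33 suggest alerting on the user journey (availability, response times, error rates) rather than on disk, CPU or memory. A minimal sketch of that paging decision might look like the following. The `JourneyStats` type, the `reasons_to_page` function and both thresholds are hypothetical names and values chosen for illustration; they do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class JourneyStats:
    """Measurements for one user journey over a recent window (hypothetical shape)."""
    total_requests: int
    failed_requests: int         # requests that returned an error to the user
    p95_response_seconds: float  # 95th percentile response time
    probe_succeeded: bool        # did a synthetic end-to-end check pass?


# Assumed thresholds; choose values that match what your users would notice.
MAX_ERROR_RATE = 0.05
MAX_P95_SECONDS = 2.0


def reasons_to_page(stats: JourneyStats) -> list[str]:
    """Return user-facing reasons to wake someone; an empty list means let them sleep."""
    reasons = []
    if not stats.probe_succeeded:
        reasons.append("availability: journey probe failed")
    if stats.total_requests > 0:
        error_rate = stats.failed_requests / stats.total_requests
        if error_rate > MAX_ERROR_RATE:
            reasons.append(f"error rate: {error_rate:.1%}")
    if stats.p95_response_seconds > MAX_P95_SECONDS:
        reasons.append(f"response time: p95 {stats.p95_response_seconds:.1f}s")
    return reasons
```

Host-level signals such as disk or CPU are deliberately absent from the check: in this sketch they would go to a dashboard or a daytime ticket queue, and only symptoms a user can see produce a page.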