How GOV.UK does on call

bob
May 19, 2016

A talk on how GOV.UK does on call, given at the first #humanops meetup: http://www.meetup.com/HumanOps-London/events/229460050/

Transcript

  1. bob walker
    Head of Web Operations && \
    GOV.UK Infrastructure Product Owner
    Government Digital Service
    @rjw1

  2. How GOV.UK does on call.

  3. Who am I?

  4. GDS
    Over 17 years as a Sys Admin

  5. GDS
    Over 3 years on GOV.UK

  6. GDS
    Head of Web Operations community at GDS

  7. Is GOV.UK
    Scale?

  8. https://www.gov.uk/performance/site-activity GDS
    ~75 applications
    ~2000 req/s at edge at peak
    ~150 req/s at origin at peak
    ~13M Unique Users per week
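
The edge and origin figures on this slide imply how much traffic the edge absorbs. The sketch below works that out, assuming (the talk does not say this) that the two peaks coincide and that every origin request arrives via the edge. The function name `edge_offload` is made up for illustration.

```python
# Approximate peak figures quoted on the slide.
EDGE_PEAK_RPS = 2000
ORIGIN_PEAK_RPS = 150


def edge_offload(edge_rps: float, origin_rps: float) -> float:
    """Return the fraction of edge requests that never reach origin.

    Assumes both rates describe the same moment in time, which the
    slide does not confirm.
    """
    if edge_rps <= 0:
        raise ValueError("edge_rps must be positive")
    return 1 - origin_rps / edge_rps


if __name__ == "__main__":
    ratio = edge_offload(EDGE_PEAK_RPS, ORIGIN_PEAK_RPS)
    print(f"Roughly {ratio:.1%} of peak requests are served at the edge")
```

Under those assumptions, about 92.5% of peak requests would be answered at the edge, leaving origin to handle roughly one request in thirteen.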

  9. Is GOV.UK
    Important?

  10. How GOV.UK does on call.

  11. Second Line

  12. GDS
    2 technical members of staff dealing with:
    ● alerts
    ● incidents
    ● requests

  13. Second Line Charter

  14. GDS
    1. 2nd line is a learning experience, you should aim to learn more about how GOV.UK operates during the week
    2. Work together, be inclusive and don’t leave anyone out or alone
    3. It starts at 9:30, be there
    4. Feel free to go to essential meetings, but tell people ahead of time and see item 2
    5. Make small improvements to make it better for the next team (e.g. documentation, automation)
    6. There’s no such thing as a stupid question (but check the Ops Manual first)
    7. Help others and be patient - not everyone knows the same things
    8. Leave the desks clean and tidy when you finish your stint

  15. GDS
    28 people

  16. On Call

  17. GDS
    Currently 9 people

  18. GDS
    6 alerts will call people

  19. GDS
    VMs going down will call people

  20. GDS
    They probably shouldn’t

  21. GDS
    Mini incident reviews when we’re called

  22. GDS
    PagerDuty Drill

  23. Escalations

  24. GDS
    3 people

  25. GDS
    On-call escalation
    Major Technical Issues
    Emergency Publishing

  26. Incident Reviews

  27. Make things open: it makes things better

  28. Tools

  29. GDS
    ● Icinga
    ● graphite
    ● collectd
    ● PagerDuty
    ● Pingdom

  30. What could be better?

  31. https://gds.blog.gov.uk/jobs/ GDS
    We’re hiring!

  32. Thanks!
    bob walker
    @rjw1
