
SRE Essentials

Jason Hand
December 03, 2018

The basics of Site Reliability Engineering (SRE).
First delivered at the Slush pre-event (Microsoft Flux, 12/3).

Transcript

  1. Tip #1 (@jasonhand | #microsoftxslush)
     • Observability = Reality
     • Set Service Level Expectations
     • Measure & report against specific expectations
  2. Service Level Indicator (SLI), represented as a ratio or …% proportion:
     • # of successful HTTP calls / # of HTTP calls
     • # of operations that completed in < 10ms / # of operations
     • # of “full quality responses” / # of responses
     • # of records processed / # of records
     ratio × 100 = % proportion
  3. Service Level Objective (SLO), example:
     • SLI: HTTP requests (as reported by the load balancer)
     • Target: 95%
     • Window: 30 days (example)
  4. Allowed downtime per period (year = 365 days, quarter = 90 days, month = 30 days):

     Availability  Year          Quarter        Month          Week           Day            Hour
     99%           3.65 days     21.6 hours     7.2 hours      1.68 hours     14.4 minutes   36 seconds
     99.9%         8.76 hours    2.16 hours     43.2 minutes   10.1 minutes   1.44 minutes   3.6 seconds
     99.99%        52.6 minutes  12.96 minutes  4.32 minutes   60.5 seconds   8.64 seconds   0.36 seconds
     99.999%       5.26 minutes  1.30 minutes   25.9 seconds   6.05 seconds   0.86 seconds   0.04 seconds

     How many 9's are appropriate?
  5. Tip #2
     “Organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations.” (Conway's Law, Melvin Conway)
  6. Game Days
     Using knowledge and structured plans to routinely rehearse incident response, expanding and improving the current baseline.
     Prepare - Rehearse - Respond
  7. What keeps you up at night?
     [Diagram: You, SLI, SLO, Error Budgets, reliability, "critically"]
  8. At Microsoft, we're aspiring to have a living, learning culture with a growth mindset that allows us to learn from ourselves and our customers. These are the key attributes of the new culture at Microsoft, and I feel great about how it seems to be resonating and how it's seen as empowering.
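Slide 2 defines an SLI as the ratio of good events to total events, multiplied by 100 to give a percentage. Below is a minimal Python sketch of that arithmetic; the function name `sli_percent` and the example counts are hypothetical, chosen only for illustration.

```python
def sli_percent(good_events: int, total_events: int) -> float:
    """Return an SLI as a percentage: good events / total events x 100."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 <= good_events <= total_events:
        raise ValueError("good_events must lie between 0 and total_events")
    # Multiply first so the only rounding happens in the final division.
    return good_events * 100 / total_events


# Hypothetical (good, total) counts for the four example SLIs on slide 2.
examples = {
    "successful HTTP calls": (9_950, 10_000),
    "operations completed in < 10ms": (4_800, 5_000),
    "full quality responses": (990, 1_000),
    "records processed": (250_000, 250_000),
}

for name, (good, total) in examples.items():
    print(f"{name}: {sli_percent(good, total):.2f}%")
```

The same function serves every SLI on the slide, because each one is just a count of good events over a count of all events.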
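Slide 3's example SLO pairs an SLI (HTTP requests as reported by the load balancer) with a 95% target over a 30-day window. The sketch below checks an SLO of that shape against a list of request records. The `RequestSample` record, the function name `meets_slo`, and the rule that a request is "good" when its status code is below 500 are all assumptions for illustration; real load-balancer logs and your own definition of success will differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional, Tuple


@dataclass
class RequestSample:
    """One request as it might appear in load-balancer logs (hypothetical shape)."""
    timestamp: datetime
    status: int  # HTTP status code


def meets_slo(
    samples: Iterable[RequestSample],
    target_pct: float = 95.0,
    window: timedelta = timedelta(days=30),
    now: Optional[datetime] = None,
) -> Tuple[Optional[float], bool]:
    """Return (SLI %, whether the SLO is met) over the trailing window.

    Assumption: a request is 'good' when its status code is below 500.
    With no requests in the window there is nothing to measure, so the
    SLI is None and the SLO is treated as met.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    in_window = [s for s in samples if now - window <= s.timestamp <= now]
    if not in_window:
        return None, True
    good = sum(1 for s in in_window if s.status < 500)
    sli = good * 100 / len(in_window)
    return sli, sli >= target_pct


# Usage: 100 hourly requests, every tenth one failing with a 503.
now = datetime(2018, 12, 3, tzinfo=timezone.utc)
samples = [
    RequestSample(now - timedelta(hours=h), 503 if h % 10 == 0 else 200)
    for h in range(100)
]
print(meets_slo(samples, now=now))  # (90.0, False)
```

Only requests inside the trailing window count, so an outage older than 30 days stops affecting the result once it ages out.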
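The downtime figures on slide 4 follow from one formula: allowed downtime = period length × (100 − availability %) / 100. The slide's own numbers imply a 365-day year, a 90-day quarter and a 30-day month. The Python sketch below recomputes the table under those assumptions; the names `PERIODS`, `TARGETS` and `allowed_downtime` are illustrative. Computed directly, 99.9% allows 2.16 hours per quarter and 43.2 minutes per month, and 99.999% allows about 0.86 seconds per day.

```python
from datetime import timedelta

# Period lengths implied by the slide's numbers (quarter = 90 days, month = 30 days).
PERIODS = {
    "year": timedelta(days=365),
    "quarter": timedelta(days=90),
    "month": timedelta(days=30),
    "week": timedelta(weeks=1),
    "day": timedelta(days=1),
    "hour": timedelta(hours=1),
}

TARGETS = (99.0, 99.9, 99.99, 99.999)


def allowed_downtime(availability_pct: float, period: timedelta) -> timedelta:
    """Downtime permitted in `period` at `availability_pct` percent availability."""
    unavailable_fraction = (100 - availability_pct) / 100
    # timedelta(seconds=...) rounds to the nearest microsecond.
    return timedelta(seconds=period.total_seconds() * unavailable_fraction)


for target in TARGETS:
    cells = ", ".join(
        f"{name} {allowed_downtime(target, length)}" for name, length in PERIODS.items()
    )
    print(f"{target}%: {cells}")
```

Each extra 9 divides the allowance by ten, which is why the question of how many 9's are appropriate matters: the cost of meeting the target grows as the allowance shrinks toward seconds.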
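Slide 7 names error budgets alongside SLIs and SLOs. An error budget is the unreliability an SLO leaves room for: with the 95% request-based SLO from slide 3, up to 5% of requests in the window may fail. The sketch below reports how much of that budget has been spent. The `ErrorBudget` record, the function name `error_budget`, and the request counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    allowed_failures: float  # failures the SLO tolerates in the window
    failed: int              # failures actually observed
    remaining: float         # negative once the SLO has been missed
    consumed_pct: float      # share of the budget already spent


def error_budget(total: int, failed: int, slo_pct: float) -> ErrorBudget:
    """Budget for a request-based SLO: total x (100 - slo_pct) / 100 failures."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= failed <= total:
        raise ValueError("failed must lie between 0 and total")
    allowed = total * (100 - slo_pct) / 100
    if allowed > 0:
        consumed = failed * 100 / allowed
    else:
        # A 100% SLO has no budget: any failure overspends it.
        consumed = 0.0 if failed == 0 else float("inf")
    return ErrorBudget(allowed, failed, allowed - failed, consumed)


# Usage: 1,000,000 requests in the 30-day window, 20,000 failed, 95% SLO.
print(error_budget(total=1_000_000, failed=20_000, slo_pct=95.0))
# ErrorBudget(allowed_failures=50000.0, failed=20000, remaining=30000.0, consumed_pct=40.0)
```

Counting the budget in requests matches the request-based SLI on slide 3; a time-based SLO would express the same idea as the allowed downtime per period shown on slide 4.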