10-SRE, DevOps, Google and You by Björn Rabenstein

DevOps Gathering

March 13, 2019

Transcript

  1. SRE? DevOps? Google and you? DevOps Gathering, Bochum – 2019-03-13

    Björn “Beorn” Rabenstein, Production Engineer, SoundCloud Ltd.
  2. What is SRE? How does it relate to DevOps? But you are not Google, so how does it apply to you?
  3. Scale: ~1B users vs. ~100M users
    Scale: ~10k engineers vs. ~100 engineers
    Scale: engineers > services vs. engineers < services
    Culture: it’s complicated vs. it’s complicated
  4. SRE is what happens when you ask a software engineer to design an operations team.
    Ben Treynor, Google Inc.
  5. Everything you always wanted to know about SRE but were afraid to ask
    • Symptom-based alerting
    • Operational underload
    • DevOps vs. SRE
    • Dickerson’s hierarchy of service reliability
    • Error budgets
    • On-call
    • ProdEng at SoundCloud
    • Less than 50% ops work
    • Postmortems
  6. Dickerson’s hierarchy of service reliability
    Site Reliability Engineering – How Google Runs Production Systems, B. Beyer et al. (ed.), O’Reilly 2016, p. 104
  7. “True” DevOps is if there are no separate dev and ops teams anymore, and not even designated dev or ops roles within a team.
    Björn Rabenstein & Matthias Rampke, SoundCloud Ltd.
  8. DevOps is a set of practices intended to reduce the time between committing a change to a system and the change being placed into normal production, while ensuring high quality.
    Len Bass, Ingo Weber, Liming Zhu: DevOps: A Software Architect's Perspective.

    DevOps is if you have CI/CD and run containers.
    J. Random Manager: During the last strategy meeting.
  9. CALMS (John Willis, Damon Edwards, Jez Humble)
    • Culture
    • Automation
    • Lean (management or continuous improvement)
    • Metrics
    • Sharing

    [The first SRE book] explicitly references Culture, Automation, Metrics, and Sharing alongside anecdotes about Google’s journey to continuously improve.
    Andrew Clay Shafer: The Site Reliability Workbook.
  10. Ultimately, I know DevOps when I see it and I see SRE at Google, in theory and practice, as one of the most advanced implementations.
    Andrew Clay Shafer: Foreword II.
  11. The principles from the first SRE book align so well with what I always imagined DevOps to be, and the practices are insightful, even when they aren’t 100% applicable outside of Google.
    Andrew Clay Shafer: Foreword II.
  12. • Minimal size of an on-call rotation:
      ◦ 6 if following the sun.
      ◦ 8 otherwise.
    • Minimal size of a dedicated SRE team: 8.
    • Feasible percentage of all engineers in SRE: 5%? 10%?
    • Number of SRE teams SoundCloud could afford: 1.
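
The arithmetic behind slide 12 can be sketched as follows. This is only an illustration, not something shown in the talk: the function name `affordable_sre_teams` is made up here, and the headcount of roughly 100 engineers is an assumption taken from the smaller organization on slide 3.

```python
def affordable_sre_teams(total_engineers: int, sre_percent: int, min_team_size: int = 8) -> int:
    """Count the full-size SRE teams that fit into the SRE share of all engineers."""
    # Engineers that could work in SRE at the given percentage (rounded down).
    sre_headcount = total_engineers * sre_percent // 100
    # Only teams that reach the minimal size count as a team.
    return sre_headcount // min_team_size


# Assumed headcount of ~100 engineers, minimal team size of 8 (slide 12).
for percent in (5, 10):
    print(percent, affordable_sre_teams(100, percent))
```

Under these assumptions, a 5% share does not even fill one team of 8, and a 10% share fills exactly one, which is consistent with the slide's answer of 1.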
  13. DevOps is the result of 5 principles:
    DevOps is ...
    … if every person uses the same tool for the same job
    … codified knowledge - everybody contributing their part to common automation
    … if all people have the same privileges in their tooling
    … if human error is equally possible for Dev and Ops
    … replacing people interfaces by automated decisions and processes
    … the result
    Schlomo Schapiro
  14. In Site Reliability Engineering, we did not make it sufficiently clear that product development teams in Google own their service by default. SRE is neither available nor warranted for the bulk of services, although SRE principles still inform how services are managed throughout Google.
    Chapter 1: How SRE relates to DevOps
  15. I have been waiting for this book ever since I left Google’s enchanted castle. It is the gospel I am preaching to my peers at work.
    Beorn’s praise for the 1st SRE book

    Finally, this volume and its predecessor are not intended to be gospel. Please don’t treat them that way. Even after all these years, we’re still finding conditions and cases that cause us to tweak (or in some cases, replace) previously firmly held beliefs.
    Preface of the Site Reliability Workbook