Slide 1

Slide 1 text

How To assess SRE Status Quo through Maturity Models

Slide 2

Slide 2 text

Yury Niño Roa. Cloud Infrastructure Engineer, SRE Volunteer & Chaos Engineer Advocate. Love networking, DevOps, software, reading, writing and teaching. @yurynino - www.yurynino.dev

Slide 3

Slide 3 text

How To assess SRE Status Quo through a Maturity Model

Slide 4

Slide 4 text

How To assess Site Reliability Engineering Status Quo through a Maturity Model

Slide 5

Slide 5 text

Agenda
01 Revisiting SRE with a Case Study
02 Maturity Models in the spectrum
03 What about a Maturity Model in Google

Slide 6

Slide 6 text

Waze: Were we an SRE Mature Organization?

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

The Messaging Queue: Replacing a System While Maintaining Reliability
● Waze is a community-based navigation app acquired by Google in 2013.
● Its growth introduced many challenges.
● Waze’s startup ethos led them to meet these challenges with a grassroots technical response.
● As Waze grew, their message queueing system regressed badly, leading to increasingly frequent and severe outages.

Slide 9

Slide 9 text

● Because SRE was also responsible for building the software, this operational load had a noticeable impact.
● Engineers convinced leadership to reevaluate priorities and dedicate some engineering time to reliability work.
● SREs and Software Engineers formed a strategic vision of a future where a new custom-built solution was implemented.

Slide 10

Slide 10 text

Lessons Learned
Assessing where you are and designing a strategic vision is the first step.
Theories and maturity models are not a waste of time!
Dedicating time to reliability work and prioritizing it was key to facing the early challenges.

Slide 11

Slide 11 text

Because the SRE journey doesn’t happen all at once, having a master plan is invaluable.

Slide 12

Slide 12 text

Based on these considerations for assessing a product delivery organization on various aspects of production operations, it is possible to create a maturity model for SRE.

Slide 13

Slide 13 text

● Where is my Organization?
● Where is my Team?
● Where is the Tech?
● Where is Culture?
● Where is Process?

Slide 14

Slide 14 text

BCG SRE Maturity Model

Slide 15

Slide 15 text

SRE Maturity Model by BCG Platinion https://bcgplatinion.com/

Slide 16

Slide 16 text

Vladyslav’s SRE Maturity Model

Slide 17

Slide 17 text

SRE Maturity Model by Vladyslav Ukis https://www.amazon.com/Establishing-Foundations-Step-Step-Organizations/dp/0137424604
Levels: Advanced, Beginner, Regressive
Dimensions: Organization, People, Tech, Culture, Process
Diagram labels: Collaboration, Toleration, Alerting, SRE Role, Haphazard, Automation, Monitoring, Crush Bureaucracy, Ownership, Failures, Data Driven

Slide 18

Slide 18 text

SRE Maturity Model by Vladyslav Ukis https://www.amazon.com/Establishing-Foundations-Step-Step-Organizations/dp/0137424604
Levels: Advanced, Beginner, Regressive
Dimensions: Organization, People, Tech, Culture, Process

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

No free lunch! Maturity models have disadvantages!

Slide 21

Slide 21 text

What does Google say about SRE MMs?

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

Reliability Maturity Model at Google Operational Processes Risk Management Productivity System Complexity People Leadership Organization Attributes Visibility and Culture

Slide 25

Slide 25 text

Reliability Maturity Model
By Vartika Agarwal, Tracy Ferrell https://www.usenix.org/publications/loginonline/reliability-maturity-model
1. Absent: Reliability is a secondary consideration.
2. Reactive: Responses to known reliability issues/risks are tied to recent outages.
3. Proactive: Potential reliability risks are identified and addressed.
4. Strategic: Classes of risks are managed and architecturally addressed.
5. Visionary: Organization has reached the highest order of reliability.
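
The five levels on this slide can be turned into a small self-assessment aid. The Python sketch below is only an illustration: the level names and their order come from the slide, while the function names, the dimension names in the example, and the rule that the weakest dimension caps the overall level are assumptions made here, not part of the published model.

```python
from enum import IntEnum
from typing import List, Mapping


class Level(IntEnum):
    """The five levels named on the slide, ordered from lowest to highest."""
    ABSENT = 1
    REACTIVE = 2
    PROACTIVE = 3
    STRATEGIC = 4
    VISIONARY = 5


def overall_level(scores: Mapping[str, Level]) -> Level:
    """Hypothetical aggregation rule: the weakest dimension caps the overall level."""
    if not scores:
        raise ValueError("at least one dimension score is required")
    return min(scores.values())


def weakest_dimensions(scores: Mapping[str, Level]) -> List[str]:
    """Return the dimensions sitting at the lowest level, i.e. where to invest first."""
    lowest = overall_level(scores)
    return sorted(name for name, level in scores.items() if level == lowest)


# Illustrative self-assessment; the dimension names and scores are made up.
example = {
    "Risk Management": Level.REACTIVE,
    "Productivity": Level.PROACTIVE,
    "System Complexity": Level.STRATEGIC,
}

if __name__ == "__main__":
    print(overall_level(example).name)
    print(weakest_dimensions(example))
```

Taking the minimum is a deliberately conservative choice; an organization could just as well report each dimension separately, which avoids hiding progress behind a single number.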

Slide 26

Slide 26 text

Thank you so much! @yurynino www.yurynino.com