Google’s Site Reliability Engineering books lay out the principles and practices of SRE, and the workbooks provide great practical examples of implementing these practices. However, anyone who has tried to roll out such practices across an organisation will no doubt have run into some hurdles.
In this talk we will dig into how RVU got started on its SRE journey, from what prompted the initial discussions to how we rolled out new tooling to automate away some of the pains of adoption. We’ll cover the interfaces we built to engage with teams and the possibilities we see for the future of our SRE automation journey.
- Data-driven conversations and visibility are the key ingredients in winning over teams and ultimately operating systems more reliably
- Rolling out SRE practices is difficult, so find ways to make adoption simpler for everyone involved
- Automation is for everyone and should reach far beyond just deployments and infrastructure