This talk was given at SREcon EMEA 22, in Amsterdam: https://www.usenix.org/conference/srecon22emea/presentation/sinjakli
Service Level Objectives (SLOs) are a familiar topic in SRE circles. They provide a framework for measuring and thinking about the reliability of a service in terms of a percentage of successful operations, such as HTTP requests.
That key strength of SLOs - viewing reliability as a percentage game - can also be a weakness. Within that framing, there are certain solutions we're likely to overlook.
This talk explores another lens for reliability - one that's complementary to SLOs: structuring software in a way that rules out entire classes of problem.
We'll explore this idea via three worked examples, and finish with some concrete take-aways, including how to spot problems that fit this shape.