As we construct larger or more complex systems, failure and change are
ever-present. We need to accept and even embrace these tensions to
build software that works and keeps working. This is a talk on building
and operating reliable systems. We will look at how systems fail,
particularly in the face of complexity or scale, and build up a set of
principles and practices that will help us implement, understand and
verify reliable systems.