Operating distributed systems is hard, not only because of the inherent complexity that comes with the number of components and their distribution, but also because of the unpredictability of their failure modes: they are full of unknown unknowns. We are left with an imperative to build systems that we can debug armed with evidence instead of conjecture.
Observability is the practice of understanding the internal state of a system via knowledge of its external outputs. In this talk, we will discuss observability practices, benefits, and opportunities. We’ll also explore observability as a part of the development process.