Observability from the Panopticon: Measuring What Matters

If a microservice falls down in the middle of a server farm, does my pager make a sound? Hopefully, the answer is “yes!” But all too often, services become partially degraded in ways that are difficult to predict, and therefore difficult to monitor proactively. How can we develop confidence that the services we build are instrumented for observability in the right places, the parts which actually matter, so that we're alerted quickly to problems that arise and have enough information to resolve them?

We'll look at a framework for modeling interdependent systems so we can understand how to identify the areas of our code that need to be instrumented. By isolating these key components, we'll ensure that we are writing software designed for resiliency.

Aditya Mukerjee

October 04, 2019


  1. Observability measures how well internal states of a system can be inferred from knowledge of its external outputs. @chimeracoder
  2. 1. What should I observe or monitor?
     2. How do I measure and monitor those things?
     3. What do we enable using this framework for observability?
  3. Disclaimer: Surveillance of people has different ethical properties from surveillance of software microservices! (Don’t build tech for human rights abusers!)
  4. In the panopticon, nobody knows if they’re being watched, so everyone behaves as if they’re always being watched.
  5. Panopticon-style observability: We can’t observe all actions… but if we choose the right subset of actions to observe, we can have high confidence that everything is doing its job.
  6. Let’s Create an API
     • Return a list of all Twitter followers
     • Record a copy to the database
     • Distributed!
     [Diagram: API, API, API, DB]
  7. Service-Level Agreement: What we promise our clients
     Service-Level Indicators: Data used to evaluate the SLA
     Service-Level Objective: What we target internally
  8. Service Indicators
     • Rate: Number of requests received
     • Errors: Number of responses written, broken down by HTTP status
     • Duration: Distribution of response latency
  9. Define an SLA for every behavior your clients rely on… then apply this recursively, for behaviors you rely on.
  10. Define and measure your service indicator metrics, based on the externally-observable behaviors your users will notice.
  11. We can’t observe everything. But if we choose and observe the right indicators, that’s enough.
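
The three service indicators from the slides (rate, errors, duration) can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the `ServiceIndicators` class, its method names, the choice to count 5xx statuses as errors, and the thresholds passed to `meets_slo` are all assumptions made for the example. A real service would more likely export these values to a metrics system than hold them in memory.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ServiceIndicators:
    """Hypothetical in-memory record of rate, errors, and duration."""

    requests: int = 0  # Rate: number of requests received
    responses_by_status: Counter = field(default_factory=Counter)  # Errors
    durations_ms: list = field(default_factory=list)  # Duration samples

    def observe(self, status: int, duration_ms: float) -> None:
        """Record one completed request."""
        self.requests += 1
        self.responses_by_status[status] += 1
        self.durations_ms.append(duration_ms)

    def error_ratio(self) -> float:
        """Fraction of requests answered with a 5xx status (an assumption)."""
        if self.requests == 0:
            return 0.0
        errors = sum(
            count
            for status, count in self.responses_by_status.items()
            if status >= 500
        )
        return errors / self.requests

    def percentile(self, p: float) -> float:
        """Nearest-rank percentile of the recorded latencies."""
        if not self.durations_ms:
            return 0.0
        ordered = sorted(self.durations_ms)
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    def meets_slo(self, max_error_ratio: float, p99_ms: float) -> bool:
        """Compare the indicators against an illustrative internal objective."""
        return (
            self.error_ratio() <= max_error_ratio
            and self.percentile(99) <= p99_ms
        )


indicators = ServiceIndicators()
for status, ms in [(200, 40.0), (200, 55.0), (500, 900.0), (200, 48.0)]:
    indicators.observe(status, ms)

print(indicators.requests)       # 4
print(indicators.error_ratio())  # 0.25
print(indicators.meets_slo(max_error_ratio=0.01, p99_ms=500.0))  # False
```

Everything recorded here is visible from outside the service (a request arrived, a status was written, a response took some time), which matches the slides' advice to base indicators on externally-observable behavior.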