Observability from the Panopticon: Measuring What Matters

If a microservice falls down in the middle of a server farm, does my pager make a sound? Hopefully, the answer is “yes!” But all too often, services become partially degraded in ways that are difficult to predict, and therefore difficult to monitor proactively. How can we develop confidence that the services we build are instrumented for observability in the right places (the parts that actually matter), so that we're alerted quickly when problems arise and have enough information to resolve them?

We'll look at a framework for modeling interdependent systems so we can identify the areas of our code that need to be instrumented. By isolating these key components, we ensure that we're writing software designed for resiliency.


Aditya Mukerjee

October 04, 2019

Transcript

  1. Observability from the Panopticon: Measuring What Matters. Aditya Mukerjee, Observability Engineer at Stripe. Asbury Agile, October 2019

  2. Observability measures how well internal states of a system can be inferred from knowledge of its external outputs @chimeracoder

  3. 1. What should I observe or monitor? 2. How do I measure and monitor those things? 3. What do we enable using this framework for observability? @chimeracoder

  4. @chimeracoder

  5. Disclaimer: Surveillance of people has different ethical properties from surveillance of software microservices! @chimeracoder

  6. Disclaimer: Surveillance of people has different ethical properties from surveillance of software microservices! @chimeracoder (Don’t build tech for human rights abusers!)

  7. In the panopticon, nobody knows if they’re being watched, so everyone behaves as if they’re always being watched @chimeracoder

  8. Panopticon-style observability @chimeracoder We can’t observe all actions… …but if we choose the right subset of actions to observe, we can have high confidence that everything is doing its job

  9. Let’s Create an API •Return a list of all Twitter followers •Record a copy to the database •Distributed! @chimeracoder [diagram: three API instances and a DB] (see the handler sketch after the transcript)

  10. 99.99% of requests return HTTP 200 in <300ms @chimeracoder Is this API healthy?

  11. Service-Level Agreement: What we promise our clients @chimeracoder Service-Level Indicators: Data used to evaluate the SLA

  12. Service-Level Agreement: What we promise our clients @chimeracoder Service-Level Indicators: Data used to evaluate the SLA Service-Level Objective: What we target internally (see the worked SLA/SLO/SLI example after the transcript)

  13. Service Indicators •Rate: Number of requests received •Errors: Number of responses written, broken down by HTTP status •Duration: Distribution of response latency @chimeracoder (see the instrumentation sketch after the transcript)

  14. Every monitor involves a service-level indicator* @chimeracoder *for sufficiently broad definitions of “service”

  15. @chimeracoder Define an SLA for every behavior your clients rely on… …then apply this recursively, for behaviors you rely on

  16. Define and measure your service indicator metrics, based on the externally-observable behaviors your users will notice @chimeracoder

  17. What does the panopticon approach enable? @chimeracoder

  18. @chimeracoder Panopticon observability helps us set development priorities

  19. Error budgets are bidirectional @chimeracoder Panopticon observability helps us set development priorities (see the error-budget calculation after the transcript)

  20. Panopticon observability helps us recover from failures @chimeracoder

  21. Panopticon observability helps us understand how our systems actually work @chimeracoder

  22. Yes, there is such a thing as “too much reliability” @chimeracoder

  23. @chimeracoder

  24. We can’t observe everything @chimeracoder But if we choose and observe the right indicators, that’s enough

  25. Thank you! Aditya Mukerjee @chimeracoder
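
Code sketches

The example API on slide 9 returns a user's Twitter followers and records a copy to the database. As a rough illustration of where that behavior lives (and therefore where instrumentation belongs), here is a minimal Go sketch of the handler. The FollowerSource and FollowerStore interfaces, the Handler type, and the "user" query parameter are hypothetical stand-ins, not code from the talk.

```go
package followers

import (
	"encoding/json"
	"log"
	"net/http"
)

// FollowerSource and FollowerStore are hypothetical stand-ins for the Twitter
// client and the database described on the slide.
type FollowerSource interface {
	ListFollowers(user string) ([]string, error)
}

type FollowerStore interface {
	SaveFollowers(user string, followers []string) error
}

// Handler returns the list of followers and records a copy to the database,
// mirroring the two behaviors the example API promises its clients.
type Handler struct {
	Source FollowerSource
	Store  FollowerStore
}

func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	user := r.URL.Query().Get("user")

	// Fetch the follower list from the upstream source.
	followers, err := h.Source.ListFollowers(user)
	if err != nil {
		http.Error(w, "upstream error", http.StatusBadGateway)
		return
	}

	// Record a copy to the database.
	if err := h.Store.SaveFollowers(user, followers); err != nil {
		http.Error(w, "storage error", http.StatusInternalServerError)
		return
	}

	// Return the list to the client.
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(followers); err != nil {
		log.Printf("encoding response: %v", err)
	}
}
```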
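Slides 11 and 12 distinguish the SLA (what we promise our clients), the SLO (what we target internally), and the SLI (the data used to evaluate them), and slide 10 supplies a concrete indicator: 99.99% of requests return HTTP 200 in <300ms. A small worked example in Go of how the three fit together; the 99.9% SLA figure, the package, and the function name are assumptions made for illustration, not values from the talk.

```go
package slo

// Hypothetical targets illustrating the distinction between the SLA, the SLO,
// and the SLI for the example API.
const (
	// SLA: what we promise our clients (contractual, typically looser).
	slaSuccessTarget = 0.999

	// SLO: what we target internally (tighter than the SLA).
	sloSuccessTarget = 0.9999
)

// MeetsSLO compares the service-level indicator (the measured fraction of
// requests that returned HTTP 200 in under 300ms over some window) against
// the internal objective.
func MeetsSLO(totalRequests, fastSuccessfulRequests int) bool {
	if totalRequests == 0 {
		return true // no traffic means no violation
	}
	sli := float64(fastSuccessfulRequests) / float64(totalRequests)
	return sli >= sloSuccessTarget
}
```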
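Slide 13 names the three service indicators: rate, errors, and duration. A minimal sketch, assuming a plain net/http service, of middleware that captures all three for every request. It only logs the values, on the assumption that a real service would forward them to its metrics pipeline; the package and identifier names are invented for the example.

```go
package redmetrics

import (
	"log"
	"net/http"
	"time"
)

// statusRecorder captures the status code written by the wrapped handler so
// that errors can be broken down by HTTP status.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// Instrument wraps a handler and records the three indicators for every
// request: rate (one event per request), errors (the status code), and
// duration (response latency). Here the values are only logged.
func Instrument(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}

		next.ServeHTTP(rec, r)

		log.Printf("request path=%s status=%d duration=%s",
			r.URL.Path, rec.status, time.Since(start))
	})
}
```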
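Slide 19's point that error budgets are bidirectional rests on simple arithmetic: the complement of the SLO, multiplied by the measurement window, is how much "bad" time you are allowed to spend. An unspent budget is a signal that you can afford more risk; a blown budget is a signal to slow down. A small sketch of that calculation (the package and function name are illustrative):

```go
package errorbudget

import "time"

// Budget returns how much "bad" time an objective leaves over a given window.
func Budget(objective float64, window time.Duration) time.Duration {
	return time.Duration((1 - objective) * float64(window))
}
```

For example, Budget(0.9999, 30*24*time.Hour) comes out to roughly 4 minutes and 19 seconds of allowable unavailability per 30-day window.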