Observability

Making Observability Tangible with the help of test automation

Agenda (slides and references are available online) • Why I find the topic interesting • O11y 101 (a brief intro and some ideas) • A playground to experiment

https://unsplash.com/photos/time-lapse-photo-of-lighted-ferris-wheel-at-park-during-nighttime-WnADr2BG174

(Reenactment) A long time ago...

https://unsplash.com/photos/yellow-and-black-bird-on-brown-wooden-post-q77K0zIDTmI

https://unsplash.com/photos/photography-of-theater-chairs-e_RpjNyMgEM

https://web.archive.org/web/20110128133821/https://www.netflix.com/

Please hold for a brief interruption https://unsplash.com/photos/white-and-red-plastic-packs-9FDI-_E29fk

The o11y building blocks O`bservabilit | wc -l`y • Metrics • Traces • Logs

Metrics • High rate of events, with low cardinality at a reasonable cost • Gauges and counters (and distributions/histograms from those) • Think CPU load, temperature, durations, number of requests
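The three metric kinds named on this slide can be sketched in a few lines of plain Python. This is a minimal in-memory illustration, not the API of any particular metrics library; the class names, the metric names (`requests_total`, `cpu_load`, `request_seconds`) and the bucket bounds are all assumptions made for the example.

```python
from bisect import bisect_left


class Counter:
    """Monotonically increasing value, e.g. number of requests."""

    def __init__(self):
        self.value = 0

    def inc(self, amount=1):
        if amount < 0:
            raise ValueError("counters only go up")
        self.value += amount


class Gauge:
    """Value that can go up and down, e.g. CPU load or temperature."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class Histogram:
    """Counts observations per bucket, e.g. request durations in seconds."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One extra bucket catches everything above the largest bound.
        self.counts = [0] * (len(self.upper_bounds) + 1)

    def observe(self, value):
        # bisect_left picks the first bucket whose upper bound is >= value.
        self.counts[bisect_left(self.upper_bounds, value)] += 1


# Hypothetical metrics for a web service.
requests_total = Counter()
cpu_load = Gauge()
request_seconds = Histogram([0.1, 0.5, 1.0])

for duration in [0.05, 0.2, 0.3, 2.0]:
    requests_total.inc()
    request_seconds.observe(duration)
cpu_load.set(0.75)
```

Each metric stores only a handful of numbers regardless of how many events arrive, which is what keeps a high event rate affordable as long as cardinality stays low.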

Traces • Medium rate of events, supplement metrics • Durations across systems • Think waterfall graphs and profilers (which some APM tools now also include in trace data)
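To make "durations across systems" and "waterfall graphs" concrete, here is a small sketch that models spans as plain records and renders them as text bars. The `Span` fields, the service names, and the timings are invented for illustration and do not follow any specific tracing library's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    name: str
    service: str
    start_ms: int
    end_ms: int
    parent: Optional[str] = None  # name of the parent span, if any

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


def waterfall(spans, width=40):
    """Render spans as text bars, ordered by start time."""
    trace_start = min(s.start_ms for s in spans)
    trace_end = max(s.end_ms for s in spans)
    # Guard against a zero-length trace to avoid dividing by zero.
    scale = width / max(1, trace_end - trace_start)
    lines = []
    for s in sorted(spans, key=lambda s: s.start_ms):
        label = f"{s.service}/{s.name}"
        offset = int((s.start_ms - trace_start) * scale)
        length = max(1, int(s.duration_ms * scale))
        lines.append(f"{label:<24}{' ' * offset}{'#' * length} {s.duration_ms}ms")
    return lines


# Hypothetical trace: one request crossing three services.
spans = [
    Span("GET /checkout", "web", 0, 200),
    Span("charge", "payments", 20, 150, parent="GET /checkout"),
    Span("query", "db", 30, 80, parent="charge"),
]

for line in waterfall(spans):
    print(line)
```

The indentation of each bar shows when a span started relative to the request, and its length shows where the time went.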

Logs • Lower rate of events for comparable cost to metrics but with higher cardinality of related data • Ideally structured, ideally one line per service per request
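A structured log line with "one line per service per request" could look like the sketch below. The helper name `request_log_line`, the field names, and the sample values are assumptions for the example; only the standard library is used.

```python
import json
import time


def request_log_line(service, method, path, status, duration_ms, **fields):
    """Build one JSON line describing a single handled request."""
    event = {
        "ts": time.time(),
        "service": service,
        "method": method,
        "path": path,
        "status": status,
        "duration_ms": duration_ms,
    }
    # High-cardinality context (user ids, error text, ...) rides along here.
    event.update(fields)
    return json.dumps(event, sort_keys=True)


# Hypothetical request handled by a "checkout" service.
line = request_log_line(
    "checkout", "POST", "/orders", 500, 87,
    user_id="u-123", error="payment timeout",
)
print(line)
```

Because every field is a key in one JSON object, questions such as "which users saw 500s on /orders" become a filter on the log stream rather than a text search.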

Lower level questions What questions to ask? • which of my hosts have high CPU load? • which service is giving a lot of 500 error responses? • where is the time spent on this queue processing an event?

Slightly broader questions What questions to ask? • when will I run out of my error budget for my SLO? • how many items are we selling compared to the same time last month? • how many instances can we turn off?
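The error budget question can be answered with simple arithmetic once request counts are available as metrics. The sketch below uses a linear projection and assumes constant traffic for the rest of the SLO window; the function name, parameters, and sample numbers are hypothetical.

```python
def days_until_budget_exhausted(slo_target, window_days, days_elapsed,
                                total_requests, failed_requests):
    """Linear projection of when the error budget runs out.

    Returns the number of days left from now, or None if nothing has failed.
    A result larger than the remaining window means the budget should hold.
    """
    if failed_requests == 0:
        return None
    # Assumption: traffic stays at the average rate seen so far.
    expected_requests = total_requests / days_elapsed * window_days
    budget = (1 - slo_target) * expected_requests
    burn_per_day = failed_requests / days_elapsed
    return budget / burn_per_day - days_elapsed


# Hypothetical numbers: 99.9% SLO over 30 days, 10 days into the window.
days_left = days_until_budget_exhausted(
    slo_target=0.999, window_days=30, days_elapsed=10,
    total_requests=1_000_000, failed_requests=2_000,
)
print(f"budget exhausted in roughly {days_left:.1f} days")
```

A real burn-rate alert would usually look at several time windows rather than a single average, so treat this as a starting point.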

Higher level questions What questions to ask? • how much of our daily sales are from repeat customers? • how many streams do our customers play on average per month over time? • how much are we paying per request handled?

https://unsplash.com/photos/a-sign-that-is-on-the-side-of-a-hill-jCfDzOQ2-C8

Considerations A sample project • Needs to be actively maintained • Needs to run locally • Needs to mostly maintain itself

Daniel Temme
