Observability - Speaker Deck

Slide 1

Slide 1 text

Making Observability Tangible with the help of test automation

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

Agenda
• Why I find the topic interesting
• O11y 101 (a brief intro and some ideas)
• A playground to experiment
(slides and references are available online)

Slide 4

Slide 4 text

https://unsplash.com/photos/time-lapse-photo-of-lighted-ferris-wheel-at-park-during-nighttime-WnADr2BG174

Slide 5

Slide 5 text

(Reenactment) A long time ago...

Slide 6

Slide 6 text

https://unsplash.com/photos/yellow-and-black-bird-on-brown-wooden-post-q77K0zIDTmI

Slide 7

Slide 7 text

https://unsplash.com/photos/photography-of-theater-chairs-e_RpjNyMgEM

Slide 8

Slide 8 text

https://web.archive.org/web/20110128133821/https://www.netflix.com/

Slide 9

Slide 9 text

Please hold for a brief interruption
https://unsplash.com/photos/white-and-red-plastic-packs-9FDI-_E29fk

Slide 10

Slide 10 text

The o11y building blocks
O`bservabilit | wc -l`y
• Metrics
• Traces
• Logs

Slide 11

Slide 11 text

Metrics
• High rate of events, with low cardinality at a reasonable cost
• Gauges and counters (and distributions/histograms from those)
• Think CPU load, temperature, durations, number of requests
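To make the three metric shapes concrete, here is a minimal sketch in plain Python. The class names (`CounterMetric`, `GaugeMetric`, `HistogramMetric`) and the bucket boundaries are assumptions made for illustration; they are not taken from the talk or from any particular metrics library.

```python
from bisect import bisect_left


class CounterMetric:
    """Hypothetical counter: a value that only ever goes up (e.g. number of requests)."""

    def __init__(self):
        self.value = 0

    def inc(self, amount=1):
        if amount < 0:
            raise ValueError("counters only go up")
        self.value += amount


class GaugeMetric:
    """Hypothetical gauge: a value that can go up and down (e.g. CPU load, temperature)."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


class HistogramMetric:
    """Hypothetical histogram: one counter per bucket, with inclusive upper bounds."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One extra bucket at the end catches everything above the last bound.
        self.bucket_counts = [0] * (len(self.upper_bounds) + 1)

    def observe(self, value):
        self.bucket_counts[bisect_left(self.upper_bounds, value)] += 1


requests_total = CounterMetric()
cpu_load = GaugeMetric()
request_seconds = HistogramMetric([0.1, 0.5, 1.0])  # assumed bucket bounds

for duration in [0.05, 0.2, 0.3, 2.0]:
    requests_total.inc()
    request_seconds.observe(duration)
cpu_load.set(0.72)

print(requests_total.value, cpu_load.value, request_seconds.bucket_counts)
```

Note that the histogram stores only a handful of counters no matter how many events arrive, which is one way to read "high rate of events, with low cardinality at a reasonable cost".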

Slide 12

Slide 12 text

Traces
• Medium rate of events, supplement metrics
• Durations across systems
• Think waterfall graphs and profilers (which some APM tools now also include in trace data)
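A trace can be thought of as a tree of timed spans that share one trace id. The sketch below is a toy, in-process version under that assumption; the `span` helper, the field names and the operation names are hypothetical, and a real tracing library would also propagate the ids across service boundaries.

```python
import time
import uuid
from contextlib import contextmanager

finished_spans = []  # completed spans, in the order they finished
_active = []         # stack of spans that are currently open


@contextmanager
def span(name):
    """Hypothetical helper: time a block and record it as a child of the open span."""
    parent = _active[-1] if _active else None
    record = {
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent["span_id"] if parent else None,
        "name": name,
        "start": time.perf_counter(),
    }
    _active.append(record)
    try:
        yield record
    finally:
        _active.pop()
        record["duration"] = time.perf_counter() - record["start"]
        finished_spans.append(record)


with span("handle_request") as root:
    with span("query_database"):
        time.sleep(0.01)
    with span("render_response"):
        time.sleep(0.005)

# Print a crude waterfall: children are indented under their parent.
for s in sorted(finished_spans, key=lambda s: s["start"]):
    indent = "  " if s["parent_id"] else ""
    print(f"{indent}{s['name']}: {s['duration'] * 1000:.1f} ms")
```

Sorting by start time and indenting by parent is enough to get the waterfall view mentioned above.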

Slide 13

Slide 13 text

Logs
• Lower rate of events for comparable cost to metrics but with higher cardinality of related data
• Ideally structured, ideally one line per service per request
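One possible reading of "structured, one line per service per request" is a single JSON object per request that carries all the high-cardinality context. The sketch below assumes that reading; the `log_request` helper and the field names (`request_id`, `customer_tier` and so on) are made up for illustration.

```python
import io
import json


def log_request(stream, service, **fields):
    """Hypothetical helper: write one JSON line describing one handled request."""
    event = {"service": service, **fields}
    stream.write(json.dumps(event, sort_keys=True) + "\n")


out = io.StringIO()  # stands in for stdout or a log file
log_request(
    out,
    "checkout",
    request_id="req-123",
    status=500,
    duration_ms=182,
    customer_tier="gold",
)
line = out.getvalue()
print(line, end="")
```

Because every field is a key in the same line, the log can later be filtered or grouped by any of them without parsing free text.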

Slide 14

Slide 14 text

Lower level questions
What questions to ask?
• Which of my hosts have high CPU load?
• Which service is giving a lot of 500 error responses?
• Where is the time spent on this queue processing an event?
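As a small illustration of the second question, the sketch below counts 500 responses per service over a handful of made-up samples. The service names and the data are assumptions; in practice the counts would come from a metrics or logging backend.

```python
from collections import Counter

# Made-up (service, HTTP status) samples for illustration only.
responses = [
    ("checkout", 200), ("checkout", 500), ("search", 200),
    ("checkout", 500), ("search", 500), ("cart", 200),
]

totals = Counter(service for service, _ in responses)
errors = Counter(service for service, status in responses if status == 500)

# Share of responses per service that were 500s.
error_rates = {service: errors[service] / totals[service] for service in totals}
worst = max(error_rates, key=error_rates.get)

print(worst, round(error_rates[worst], 2))
```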

Slide 15

Slide 15 text

Slightly broader questions
What questions to ask?
• When will I run out of my error budget for my SLO?
• How many items are we selling compared to the same time last month?
• How many instances can we turn off?
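The error budget question can be approximated with simple arithmetic if one assumes that traffic and the failure rate stay constant for the rest of the SLO window. The function name, its parameters and the numbers below are hypothetical; this is a linear projection, not how any particular SLO tool calculates it.

```python
def budget_exhausted_on_day(slo_target, window_days, days_elapsed,
                            total_requests, failed_requests):
    """Hypothetical linear projection of the day in the window when the budget hits zero.

    Returns None when there is nothing to project from.
    """
    if days_elapsed <= 0 or total_requests == 0 or failed_requests == 0:
        return None
    # Assume the request rate seen so far continues for the whole window.
    expected_requests = total_requests / days_elapsed * window_days
    # The error budget is the number of failures the SLO allows in the window.
    budget = (1 - slo_target) * expected_requests
    burn_per_day = failed_requests / days_elapsed
    return budget / burn_per_day


# Made-up numbers: 99% SLO, 30-day window, 10 days in, 2% of requests failing.
day = budget_exhausted_on_day(
    slo_target=0.99,
    window_days=30,
    days_elapsed=10,
    total_requests=1_000_000,
    failed_requests=20_000,
)
print(f"budget runs out around day {day:.1f} of the window")
```

With these numbers the service fails at twice the allowed rate, so the budget lasts for roughly half of the window.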

Slide 16

Slide 16 text

Higher level questions
What questions to ask?
• How much of our daily sales are from repeat customers?
• How many streams do our customers play on average per month over time?
• How much are we paying per request handled?

Slide 17

Slide 17 text

https://unsplash.com/photos/a-sign-that-is-on-the-side-of-a-hill-jCfDzOQ2-C8

Slide 18

Slide 18 text

Considerations
A sample project
• Needs to be actively maintained
• Needs to run locally
• Needs to mostly maintain itself