Making Observability Tangible
with the help of test automation
Slide 3
Slide 3 text
(slides and references are available online)
Agenda
• Why I find the topic interesting
• O11y 101 (a brief intro and some ideas)
• A playground to experiment
https://web.archive.org/web/20110128133821/https://www.netflix.com/
Slide 9
Slide 9 text
Please hold for a brief interruption
https://unsplash.com/photos/white-and-red-plastic-packs-9FDI-_E29fk
Slide 10
Slide 10 text
The o11y building blocks
O`bservabilit | wc -l`y
• Metrics
• Traces
• Logs
Slide 11
Slide 11 text
Metrics
• High rate of events, with low cardinality at a reasonable cost
• Gauges and counters (and distributions/histograms from those)
• Think CPU load, temperature, durations, number of requests
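
To make the three metric shapes concrete, here is a minimal sketch of an in-memory registry with a counter, a gauge, and a bucketed histogram. The class name `Metrics`, the metric names, and the bucket bounds are all made up for illustration; a real metrics library would add labels, thread safety, and export.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical bucket upper bounds in seconds; real systems choose their own.
BUCKETS = [0.1, 0.5, 1.0, float("inf")]


class Metrics:
    """Toy registry, for illustration only (no labels, not thread-safe)."""

    def __init__(self):
        self.counters = Counter()  # totals that only ever go up
        self.gauges = {}           # last observed value, can go up or down
        self.histograms = {}       # metric name -> count per bucket

    def inc(self, name, amount=1):
        self.counters[name] += amount

    def set_gauge(self, name, value):
        self.gauges[name] = value

    def observe(self, name, value):
        counts = self.histograms.setdefault(name, [0] * len(BUCKETS))
        # Index of the first bucket whose upper bound is >= value.
        counts[bisect_left(BUCKETS, value)] += 1


m = Metrics()
for duration in (0.05, 0.3, 0.3, 2.0):
    m.inc("http_requests_total")
    m.observe("http_request_duration_seconds", duration)
m.set_gauge("cpu_load", 0.72)

print(m.counters, m.gauges, m.histograms)
```

Each observation only bumps a number that already exists, which is why a high event rate stays cheap as long as the set of metric names (the cardinality) stays small.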
Slide 12
Slide 12 text
Traces
• Medium rate of events, supplement metrics
• Durations across systems
• Think waterfall graphs and profilers (which some APM tools now also include in trace data)
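
As a rough sketch of what sits behind a waterfall graph: each unit of work is recorded as a span with a shared trace id, a parent id, and a duration. The `span` helper, the `finished_spans` list, and the span names below are hypothetical and only stand in for what a tracing SDK and its exporter would do.

```python
import time
import uuid
from contextlib import contextmanager

finished_spans = []  # stand-in for an exporter
_stack = []          # currently open spans, innermost last (not thread-safe)


@contextmanager
def span(name):
    parent = _stack[-1] if _stack else None
    record = {
        # Child spans inherit the trace id; a root span starts a new trace.
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent["span_id"] if parent else None,
        "name": name,
    }
    _stack.append(record)
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["duration_s"] = time.perf_counter() - start
        _stack.pop()
        finished_spans.append(record)


with span("handle_request") as root:
    with span("query_database"):
        time.sleep(0.01)
    with span("render_response"):
        time.sleep(0.005)

for s in finished_spans:
    print(s["name"], s["parent_id"], round(s["duration_s"], 4))
```

Across systems the same idea applies, except that the trace id and parent span id travel with the request (for example in a header) instead of living in a local stack.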
Slide 13
Slide 13 text
Logs
• Lower rate of events for comparable cost to metrics but with higher cardinality of related data
• Ideally structured, ideally one line per service per request
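
One way to read "structured, one line per service per request" is a single JSON object per request that carries all the high-cardinality context. The sketch below assumes made-up field names (`request_id`, `user_id`, `duration_ms`) and a hypothetical helper `request_log_line`; it is not taken from any particular logging library.

```python
import json
import time


def request_log_line(service, **fields):
    """Serialize one request's context as a single JSON line."""
    event = {"ts": time.time(), "service": service, **fields}
    return json.dumps(event, sort_keys=True)


line = request_log_line(
    "checkout",
    request_id="req-123",  # high-cardinality values are fine in logs
    user_id="u-42",
    status=500,
    duration_ms=87,
    error="payment gateway timeout",
)
print(line)
```

Because every field is a key rather than free text, the line can be filtered and aggregated later without regular expressions.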
Slide 14
Slide 14 text
Lower level questions
What questions to ask?
• which of my hosts have high CPU load?
• which service is giving a lot of 500 error responses?
• where is the time spent when this queue processes an event?
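
A question like the one about 500 responses boils down to a small aggregation over request data. The sketch below uses made-up service names and status codes, and counts only literal 500 responses; in practice this would be a query in a metrics or log backend rather than a loop.

```python
from collections import Counter

# Made-up request records: (service, HTTP status).
requests = [
    ("cart", 200), ("cart", 500), ("cart", 500),
    ("search", 200), ("search", 200),
    ("payments", 200), ("payments", 500),
]

total = Counter()
errors = Counter()
for service, status in requests:
    total[service] += 1
    if status == 500:
        errors[service] += 1

# Share of requests per service that ended in a 500.
error_rate = {service: errors[service] / total[service] for service in total}
worst = max(error_rate, key=error_rate.get)
print(worst, error_rate)
```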
Slide 15
Slide 15 text
Slightly broader questions
What questions to ask?
• when will I run out of my error budget for my SLO?
• how many items are we selling compared to the same time last month?
• how many instances can we turn off?
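
The error budget question can be answered with simple arithmetic once the SLO and the observed failure rate are known. The sketch below is one possible projection, assuming traffic and failure rate stay constant for the rest of the window; the function name and the example numbers are invented for illustration.

```python
def days_until_budget_exhausted(slo, window_days, days_elapsed,
                                total_requests, failed_requests):
    """Project the days left before the error budget runs out.

    Assumes the traffic level and failure rate seen so far stay constant.
    Returns None if nothing has failed; a negative result means the
    budget is already spent.
    """
    error_rate = failed_requests / total_requests
    allowed_rate = 1.0 - slo                # e.g. 0.001 for a 99.9% SLO
    burn_rate = error_rate / allowed_rate   # 1.0 lasts exactly the window
    if burn_rate == 0:
        return None
    exhaustion_day = window_days / burn_rate
    return exhaustion_day - days_elapsed


# Example with invented numbers: 99.9% SLO over 30 days, 10 days in,
# 2,000 failures out of 1,000,000 requests (a burn rate of about 2).
remaining = days_until_budget_exhausted(0.999, 30, 10, 1_000_000, 2_000)
print(f"about {remaining:.1f} days of budget left")
```

A burn rate of 2 means the budget lasts half the window, so ten days in there are roughly five days left at the current pace.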
Slide 16
Slide 16 text
Higher level questions
What questions to ask?
• how much of our daily sales are from repeat customers?
• how many streams do our customers play on average per month over time?
• how much are we paying per request handled?