Daniel Temme
May 06, 2024


Talk from DevConf 2024


  1. Agenda
     • Why I find the topic interesting
     • O11y 101 (a brief intro and some ideas)
     • A playground to experiment
     (slides and references are available online)
  2. The o11y building blocks
     O`bservabilit | wc -l`y
     • Metrics
     • Traces
     • Logs
  3. Metrics
     • High rate of events, with low cardinality at a reasonable cost
     • Gauges and counters (and distributions/histograms from those)
     • Think CPU load, temperature, durations, number of requests
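
To make the three metric kinds concrete, here is a minimal sketch in plain Python. The class names `Counter`, `Gauge` and `Histogram` and the bucket bounds are assumptions for illustration, not the API of any particular metrics library; the point is only that a histogram is a set of counters, one per bucket.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Counter:
    """Value that only goes up, e.g. number of requests."""
    value: int = 0

    def inc(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("counters only go up")
        self.value += amount


@dataclass
class Gauge:
    """Point-in-time value that can go up or down, e.g. CPU load."""
    value: float = 0.0

    def set(self, value: float) -> None:
        self.value = value


@dataclass
class Histogram:
    """Distribution built from one counter per bucket, e.g. durations."""
    bounds: tuple  # inclusive upper bounds, ascending (hypothetical values)
    counts: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # One extra bucket catches everything above the last bound.
        self.counts = [0] * (len(self.bounds) + 1)

    def observe(self, value: float) -> None:
        self.counts[bisect_left(self.bounds, value)] += 1


requests = Counter()
cpu_load = Gauge()
durations = Histogram(bounds=(0.1, 0.5, 1.0))

for seconds in (0.05, 0.3, 0.3, 2.0):
    requests.inc()
    durations.observe(seconds)
cpu_load.set(0.72)
```

Each observation only bumps an integer, which is why metrics stay cheap at a high event rate as long as the number of distinct series (the cardinality) stays low.
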
  4. Traces
     • Medium rate of events, supplement metrics
     • Durations across systems
     • Think waterfall graphs and profilers (which some APM tools now also include in trace data)
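
As a rough illustration of what trace data looks like, the sketch below records nested spans that share a trace id and point at their parent. The `span` helper, the field names and the `finished_spans` list are hypothetical stand-ins, not a real tracing SDK; in a real setup the ids would travel between services, for example in a request header.

```python
import time
import uuid
from contextlib import contextmanager

finished_spans = []  # stand-in for an exporter (assumption)


@contextmanager
def span(name, trace_id=None, parent_id=None):
    """Time a block of work and record it as one span."""
    record = {
        "trace_id": trace_id or uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent_id,
        "name": name,
        "start": time.perf_counter(),
    }
    try:
        yield record
    finally:
        record["duration"] = time.perf_counter() - record["start"]
        finished_spans.append(record)


with span("handle_request") as root:
    with span("query_db", root["trace_id"], root["span_id"]):
        time.sleep(0.01)
    with span("render", root["trace_id"], root["span_id"]):
        time.sleep(0.005)
```

Sorting the recorded spans by `start` and drawing each one as a bar of length `duration` gives the waterfall graph the slide refers to.
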
  5. Logs
     • Lower rate of events for comparable cost to metrics but with higher cardinality of related data
     • Ideally structured, ideally one line per service per request
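
One way to read "structured, one line per service per request" is a single JSON object that is filled in while the request is handled and written once at the end. The sketch below assumes that reading; the service name, field names and the `work` callback are made up for illustration.

```python
import json
import time


def handle_request(method, path, user_id, work):
    """Run `work` and emit exactly one structured log line for the request."""
    event = {"service": "checkout", "method": method, "path": path, "user_id": user_id}
    start = time.perf_counter()
    try:
        work(event)  # the handler adds whatever context it learns along the way
        event["status"] = 200
    except Exception as exc:
        event["status"] = 500
        event["error"] = repr(exc)
    finally:
        event["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
    line = json.dumps(event, sort_keys=True)
    print(line)
    return line


def add_to_cart(event):
    event["cart_items"] = 3  # hypothetical high-cardinality context


line = handle_request("POST", "/cart", "user-123", add_to_cart)
```

Fields such as `user_id` would be too high in cardinality for a metric label, but they are fine here because each log line is stored as its own event.
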
  6. What questions to ask? Lower level questions
     • which of my hosts have high CPU load?
     • which service is giving a lot of 500 error responses?
     • where is the time spent on this queue processing an event?
  7. What questions to ask? Slightly broader questions
     • when will I run out of my error budget for my SLO?
     • how many items are we selling compared to the same time last month?
     • how many instances can we turn off?
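
The error budget question can be answered with simple arithmetic once request counts are available as metrics. The sketch below is one possible estimate, assuming a request-based SLO and assuming traffic and error rate stay at their observed averages for the rest of the window; the function name and the numbers in the usage example are hypothetical.

```python
def hours_until_budget_exhausted(slo_target, window_hours, elapsed_hours,
                                 total_requests, failed_requests):
    """Estimate the hours left before the error budget is used up.

    Returns None when no failures (or no traffic) have been observed,
    and 0.0 when the budget is already spent.
    """
    if total_requests == 0 or failed_requests == 0:
        return None
    allowed_error_rate = 1.0 - slo_target
    observed_error_rate = failed_requests / total_requests
    # A burn rate of 1 spends the budget exactly over the whole window.
    burn_rate = observed_error_rate / allowed_error_rate
    return max(0.0, window_hours / burn_rate - elapsed_hours)


# 99.9% SLO over 30 days, 10 days in, 0.2% of requests failed so far.
remaining = hours_until_budget_exhausted(
    slo_target=0.999,
    window_hours=720,
    elapsed_hours=240,
    total_requests=2_400_000,
    failed_requests=4_800,
)
```

With these numbers the budget burns at about twice the sustainable rate, so `remaining` comes out at roughly 120 hours, well before the 30-day window ends.
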
  8. What questions to ask? Higher level questions
     • how much of our daily sales are from repeat customers?
     • how many streams do our customers play on average per month over time?
     • how much are we paying per request handled?
  9. A sample project: Considerations
     • Needs to be actively maintained
     • Needs to run locally
     • Needs to mostly maintain itself