
A practical introduction to observability

Good observability is essential for modern software. It gives us confidence that our systems are working properly, and it lets us debug issues efficiently. In this talk, we’ll explore what you need to know to start applying good observability to your projects, along with the most common pitfalls to be aware of. We’ll start with the tools and basic concepts of monitoring and go over the three most common mistakes people make with it. Then we’ll see how to set up automatic alerts that detect issues, and touch on the principles behind good alerts. Finally, we’ll see how to build a logging system and how to use it to debug issues efficiently.

Nikolay Stoitsev

August 14, 2021

Transcript

  1. Cardinality
     • search.success, app_version=1, type=Patient
     • search.success, app_version=1, type=Exam
     • search.success, app_version=2, type=Patient
     • search.success, app_version=2, type=Exam
  2. Metrics are not accurate
     • DB engine optimizes for faster operations
     • When performing some operations for a different time resolution
     • When archiving metrics for long-term storage
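The slide does not say how the accuracy is lost. One common mechanism, assumed here purely for illustration, is averaging raw samples into a coarser time resolution. The sketch below uses made-up latency values and a hypothetical `downsample_mean` helper.

```python
def downsample_mean(samples, factor):
    """Average consecutive groups of `factor` samples into one point each."""
    groups = [samples[i:i + factor] for i in range(0, len(samples), factor)]
    return [sum(group) / len(group) for group in groups]

# Hypothetical per-second latency in milliseconds: 59 normal samples and one spike.
raw = [100] * 59 + [6000]

# One point per minute, as an archive with a coarser resolution might store it.
per_minute = downsample_mean(raw, 60)
```

The six-second spike is visible in `raw` but the single per-minute point stays below 200 ms, so a dashboard reading the downsampled data would never show it.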
  3. Use percentiles
     • p90 represents the worst experience 90% of the time
     • Can measure p90, p95, p99
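As a rough illustration of what p90, p95 and p99 mean, here is a small nearest-rank percentile in Python. The nearest-rank definition is one of several in use and is an assumption here; monitoring systems often estimate percentiles from histograms instead. The latency values are made up.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

# Hypothetical request latencies: 1 ms, 2 ms, ..., 100 ms.
latencies_ms = list(range(1, 101))

p90 = percentile(latencies_ms, 90)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

With this data, 90% of requests completed in `p90` milliseconds or less, so `p90` is the worst latency seen by the fastest 90% of requests.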
  4. Symptom-based monitoring
     • Number of 5xx HTTP response codes
     • Response time
     • Email sending is not working
     • Users can’t log in
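A symptom-based check on 5xx responses could look roughly like the sketch below. The 5% threshold and the function names are assumptions chosen for illustration; the talk does not prescribe a value.

```python
def error_rate(status_codes):
    """Fraction of responses whose HTTP status code is 5xx."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return errors / len(status_codes)

def should_alert(status_codes, threshold=0.05):
    """True when the 5xx fraction exceeds the (hypothetical) threshold."""
    return error_rate(status_codes) > threshold
```

Alerting on a rate rather than a raw count keeps the check meaningful as traffic grows or shrinks.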
  5. Cause-based monitoring
     • Free disk space on database server
     • Memory utilisation
     • Free file descriptors
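For the free disk space example, a cause-based check can be sketched with Python's standard `shutil.disk_usage`. The 10% minimum and the function names are illustrative assumptions.

```python
import shutil

def free_disk_fraction(path="/"):
    """Fraction of the filesystem containing `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total

def disk_space_low(path="/", min_free=0.10):
    """True when free space falls below the (hypothetical) minimum fraction."""
    return free_disk_fraction(path) < min_free
```

A check like this says nothing about whether users are affected, which is the distinction between cause-based and symptom-based monitoring.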
  6. Picking alerts to start with
     • Diagram labels: Front-end, Load Balancer, Back-end, DB
     • Count rate of successful log-in
     • Count request success rate
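Counting the rate of successful log-ins and comparing it with a baseline might be sketched as follows. The 60-second window, the 50% ratio and both function names are assumptions made for illustration.

```python
def successes_per_minute(timestamps, window_start, window_seconds=60):
    """Successful log-ins per minute within [window_start, window_start + window_seconds)."""
    window_end = window_start + window_seconds
    count = sum(1 for t in timestamps if window_start <= t < window_end)
    return count * 60 / window_seconds

def login_rate_dropped(current_rate, baseline_rate, min_ratio=0.5):
    """True when the current rate is below a (hypothetical) share of the baseline."""
    if baseline_rate <= 0:
        return False  # no baseline yet, so nothing to compare against
    return current_rate < baseline_rate * min_ratio
```

For example, with log-in timestamps `[0, 10, 20, 59, 60, 61]` (seconds), the first minute contains four successful log-ins.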
  7. Logging system
     • Diagram labels: Application (×3), Log Collector (×3), Log Aggregation, Database, Dashboard
     • Tools named: Elasticsearch, Loki
  8. Finding logs
     Can search by:
     • content of log message: message : *notification*
     • all logs from a service: kubernetes.labels.app/name.keyword : "api-gateway"
     • many more thanks to flexible query schema
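To make the two searches concrete, the sketch below applies the same filters to a few in-memory records. It only mimics the idea: a real search engine analyzes and indexes text differently. The records, the `mailer` service name and the helper functions are hypothetical; the field names and the `api-gateway` value come from the slide.

```python
from fnmatch import fnmatchcase

# Hypothetical log records using the field names shown on the slide.
LOGS = [
    {"message": "sent notification to user", "kubernetes.labels.app/name": "api-gateway"},
    {"message": "search completed", "kubernetes.labels.app/name": "api-gateway"},
    {"message": "notification queue is empty", "kubernetes.labels.app/name": "mailer"},
]

def by_message(logs, pattern):
    """Keep records whose message matches a shell-style wildcard pattern."""
    return [log for log in logs if fnmatchcase(log["message"], pattern)]

def by_service(logs, name):
    """Keep records emitted by the given service."""
    return [log for log in logs if log.get("kubernetes.labels.app/name") == name]

notifications = by_message(LOGS, "*notification*")
gateway_logs = by_service(LOGS, "api-gateway")
```

Filters compose, so `by_service(by_message(LOGS, "*notification*"), "api-gateway")` narrows the result to notification messages from one service.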
  9. Structured logging
     • Append useful key=value pairs
     • Can group (aggregate) by the keys
     • Can sort by aggregations
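The three points on the structured logging slide can be sketched in a few lines of Python: append key=value pairs, group by a key, and sort by the aggregate. The helper names and sample messages are assumptions, and the parser is deliberately naive (values must not contain spaces).

```python
from collections import Counter

def format_kv(message, **fields):
    """Append key=value pairs (sorted by key) to a log message."""
    pairs = " ".join(f"{key}={value}" for key, value in sorted(fields.items()))
    return f"{message} {pairs}" if pairs else message

def parse_kv(line):
    """Extract key=value pairs from a log line; values must not contain spaces."""
    return dict(token.split("=", 1) for token in line.split() if "=" in token)

def count_by(lines, key):
    """Group lines by the value of `key` and sort by count, highest first."""
    counts = Counter(parse_kv(line).get(key) for line in lines)
    counts.pop(None, None)  # ignore lines that do not carry the key
    return counts.most_common()

lines = [
    format_kv("search failed", type="Patient", app_version=2),
    format_kv("search failed", type="Exam", app_version=2),
    format_kv("search failed", type="Patient", app_version=1),
]
```

Here `count_by(lines, "type")` groups the failures by `type`, which is the kind of aggregation a log dashboard performs on structured fields.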
  10. Too many logs
      • Diagram labels: Application (×3), Log Scraper (×3), Log Aggregation, Real Time Search Engine, Dashboard
  11. Too many logs
      • Diagram labels: Application (×3), Log Scraper (×3), Log Aggregation, Real Time Search Engine, Dashboard
      • Reduce log retention period
  12. Too many logs
      • Diagram labels: Application (×3), Log Scraper (×3), Log Aggregation, Real Time Search Engine, Dashboard, Cold Storage, Query UI
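Combining a shorter retention period with cold storage amounts to splitting logs by age. The sketch below assumes each record carries a `timestamp` field and uses a hypothetical seven-day retention; neither detail comes from the talk.

```python
from datetime import datetime, timedelta, timezone

def split_by_retention(records, now, retention):
    """Split records into (hot, cold) around the retention cutoff.

    hot:  recent enough to stay in the real-time search engine
    cold: older than the retention period, candidates for cold storage
    """
    cutoff = now - retention
    hot = [record for record in records if record["timestamp"] >= cutoff]
    cold = [record for record in records if record["timestamp"] < cutoff]
    return hot, cold

# Hypothetical records: one from yesterday, one from ten days ago.
now = datetime(2021, 8, 14, tzinfo=timezone.utc)
records = [
    {"message": "recent", "timestamp": now - timedelta(days=1)},
    {"message": "old", "timestamp": now - timedelta(days=10)},
]
hot, cold = split_by_retention(records, now, retention=timedelta(days=7))
```

The `cold` records would be written to cheaper storage and queried through a separate, slower interface, while the search engine only indexes the `hot` ones.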
  13. End-to-end summary
      1. Configure automated alerts
      2. Use metrics and tracing to pinpoint the problem
      3. Use structured logging to find the root cause of the problem easily
  14. End-to-end summary
      1. Configure automated alerts
      2. Use metrics and tracing to pinpoint the problem
      3. Use structured logging to find the root cause of the problem easily
      4. Fix problems and make sure all metrics are always back to normal
  15. Thank you! Q&A
      Nikolay Stoitsev, Engineering Manager at Halo DX
      Photo by Pixabay, Şahin Sezer Dinçer, Andrea Piacquadio, Ian Beckley from Pexels