Generating SLO rules for Prometheus

We want to use SLOs in Prometheus, but many of the alerting and recording rules they need are highly repetitive, so we use Jsonnet to generate the Prometheus rule files.

Matthias Loibl

November 07, 2019

Transcript

  1. SLO libsonnet
    Matthias Loibl [@metalmatze]
    Software Engineer, Red Hat

  2. SLOs
    Service Level Objectives

  3. SLIs
    Service Level Indicators

  4. SLIs
    rate(http_requests_total{code=~"5.."}[5m])
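
`rate()` in the SLI above turns the ever-increasing `http_requests_total` counter into a per-second rate over the last 5 minutes. As a rough illustration only, the Python sketch below computes such a rate from raw `(timestamp, value)` samples. The `simple_rate` helper is hypothetical, and unlike Prometheus it does not extrapolate to the window boundaries; it only compensates for counter resets.

```python
from typing import List, Tuple


def simple_rate(samples: List[Tuple[float, float]]) -> float:
    """Per-second increase of a counter across `samples`.

    samples: (unix_timestamp_seconds, counter_value) pairs, sorted by time.
    Simplified stand-in for PromQL rate(): no extrapolation to the window
    edges, only counter-reset handling.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset (e.g. the process restarted),
        # so the value after the reset is all that was added since then.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# 60 requests counted over 300 seconds -> 0.2 requests per second.
print(simple_rate([(0, 0), (150, 30), (300, 60)]))
```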

  5. SLOs
    rate(http_requests_total{code=~"5.."}[5m]) > 0.01
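
An SLO turns the SLI into a pass/fail threshold. If the SLI is expressed as the ratio of failing to total requests, as the recording rules later in the deck do, a threshold of 0.01 corresponds to a 99% objective, i.e. an error budget of `1 - 0.99` (the same value passed as `errorBudget` on slide 11). A minimal Python sketch; `error_budget` and `violates_slo` are hypothetical helper names.

```python
def error_budget(objective: float) -> float:
    """Fraction of requests allowed to fail, e.g. 0.99 -> roughly 0.01."""
    return 1.0 - objective


def violates_slo(error_rate: float, total_rate: float, objective: float) -> bool:
    """True if failing requests exceed the error budget's share of traffic."""
    if total_rate == 0:
        # No traffic: nothing to judge. Treating this as "no violation" is a
        # policy choice of this sketch.
        return False
    return error_rate / total_rate > error_budget(objective)


print(violates_slo(error_rate=2.0, total_rate=100.0, objective=0.99))  # True
print(violates_slo(error_rate=0.5, total_rate=100.0, objective=0.99))  # False
```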

  6. Multi-Burn-Rate Alerts

  7. alert: ErrorBudgetBurn
    expr: |
      (
        status_class_5xx:http_requests_total:ratio_rate1h{job="prometheus"} > (14.4*0.001000)
      and
        status_class_5xx:http_requests_total:ratio_rate5m{job="prometheus"} > (14.4*0.001000)
      )
      or
      (
        status_class_5xx:http_requests_total:ratio_rate6h{job="prometheus"} > (6*0.001000)
      and
        status_class_5xx:http_requests_total:ratio_rate30m{job="prometheus"} > (6*0.001000)
      )
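
The two clauses above follow the multiwindow, multi-burn-rate pattern from Google's SRE Workbook. Assuming a 30-day SLO period, a burn rate of 14.4 sustained for 1 hour consumes 2% of the error budget, and a burn rate of 6 sustained for 6 hours consumes 5%; the short windows (5m, 30m) let the alert reset soon after the errors stop. `0.001000` is the error budget of a 99.9% objective. The Python sketch below derives those factors and renders the same expression from parameters; `burn_rate_factor`, `WindowPair` and `error_budget_burn_expr` are hypothetical names, not slo-libsonnet's API.

```python
from dataclasses import dataclass
from typing import List


def burn_rate_factor(budget_fraction: float, slo_period_hours: float,
                     window_hours: float) -> float:
    """Burn rate that consumes `budget_fraction` of the budget within the window."""
    return budget_fraction * slo_period_hours / window_hours


@dataclass
class WindowPair:
    long: str      # long window suffix, e.g. "1h"
    short: str     # short window suffix, e.g. "5m"
    factor: float  # burn-rate multiplier, e.g. 14.4


def error_budget_burn_expr(selector: str, error_budget: float,
                           pairs: List[WindowPair]) -> str:
    """Render an ErrorBudgetBurn-style PromQL expression from window pairs."""
    metric = "status_class_5xx:http_requests_total:ratio_rate"
    clauses = []
    for p in pairs:
        threshold = "(%g*%f)" % (p.factor, error_budget)  # e.g. (14.4*0.001000)
        # Both windows must exceed the threshold for a pair to fire.
        clauses.append(
            "(\n"
            f"  {metric}{p.long}{{{selector}}} > {threshold}\n"
            "and\n"
            f"  {metric}{p.short}{{{selector}}} > {threshold}\n"
            ")"
        )
    # Any single pair firing is enough to alert.
    return "\nor\n".join(clauses)


period = 30 * 24  # assumed 30-day SLO period, in hours
print(round(burn_rate_factor(0.02, period, 1), 2))  # 2% of budget in 1h -> 14.4
print(round(burn_rate_factor(0.05, period, 6), 2))  # 5% of budget in 6h -> 6.0
print(error_budget_burn_expr('job="prometheus"', 1 - 0.999,
                             [WindowPair("1h", "5m", 14.4), WindowPair("6h", "30m", 6)]))
```

Keeping the factors as parameters rather than hard-coding them is what makes generation worthwhile: changing the objective or the windows regenerates every threshold consistently.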

  8. record: status_class_5xx:http_requests_total:ratio_rate5m
    expr: |
      sum(status_class:http_requests_total:rate5m{job="prometheus",status_class="5xx"})
      /
      sum(status_class:http_requests_total:rate5m{job="prometheus"})
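
This recording rule divides the 5xx request rate by the rate of all requests, giving the error ratio that the alert compares against `14.4*0.001000` and `6*0.001000`. A Python sketch of the same arithmetic on a hypothetical dictionary of per-class request rates:

```python
import math
from typing import Dict


def ratio_5xx(rates_by_class: Dict[str, float]) -> float:
    """Share of 5xx responses among all responses, from per-class request rates."""
    total = sum(rates_by_class.values())
    if total == 0:
        # In PromQL, 0 / 0 evaluates to NaN; mirror that here.
        return math.nan
    # Simplification: a missing "5xx" entry counts as 0 here, whereas in PromQL
    # sum() over no matching series returns no result at all.
    return rates_by_class.get("5xx", 0.0) / total


print(ratio_5xx({"2xx": 95.0, "4xx": 3.0, "5xx": 2.0}))  # 0.02
```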

  9. record: status_class:http_requests_total:rate5m
    expr: |
      sum by (status_class) (
        label_replace(
          rate(http_requests_total{job="prometheus"}[5m]),
          "status_class", "${1}xx", "code", "([0-9]).."
        )
      )
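
`label_replace` copies the first digit of the `code` label into a new `status_class` label (`503` becomes `5xx`), and `sum by (status_class)` then adds up the rates per class. The Python sketch below mimics those two steps on a hypothetical mapping from status code to request rate. It relies on two documented PromQL behaviours: the `label_replace` regex is anchored at both ends, and series that do not match are returned unchanged.

```python
import re
from collections import defaultdict
from typing import Dict

# Same pattern as on the slide; PromQL anchors label_replace regexes at both
# ends, which re.fullmatch reproduces.
CODE_PATTERN = re.compile(r"([0-9])..")


def sum_by_status_class(rates_by_code: Dict[str, float]) -> Dict[str, float]:
    """Mimic sum by (status_class) (label_replace(..., "code", "([0-9]).."))).

    Keys of the result are status_class values such as "2xx"; "" stands for
    series that ended up without a status_class label.
    """
    totals: Dict[str, float] = defaultdict(float)
    for code, rate in rates_by_code.items():
        match = CODE_PATTERN.fullmatch(code)
        # "${1}xx": first captured digit followed by "xx". Non-matching series
        # are returned unchanged by label_replace, i.e. without status_class.
        status_class = f"{match.group(1)}xx" if match else ""
        totals[status_class] += rate
    return dict(totals)


print(sum_by_status_class({"200": 1.0, "201": 0.5, "503": 0.2}))
# {'2xx': 1.5, '5xx': 0.2}
```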

  10. SLO-libsonnet
    github.com/metalmatze/slo-libsonnet

  11. local slo = import '../slo-libsonnet/slo.libsonnet';
    {
      local errorburnrate = slo.errorburn({
        metric: 'http_requests_total',
        selectors: ['job="prometheus"'],
        errorBudget: 1 - 0.99,
      }),
      rules:
        errorburnrate.recordingrules +
        errorburnrate.alerts,
    }
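
Slide 11 leaves the repetition to slo-libsonnet's `errorburn`. The underlying idea, stamping out one recording rule per window from a single template, can be sketched in Python as well. The window list (taken from the windows the alert on slide 7 reads) and the group name `slo-http-requests` are assumptions, not slo-libsonnet's actual output, and each window would also need its own `status_class:http_requests_total:rate<window>` input rule following slide 9.

```python
import json
from typing import Dict, List

# Assumption: record the windows that the ErrorBudgetBurn alert on slide 7
# reads. slo-libsonnet itself may generate a different set.
WINDOWS = ["5m", "30m", "1h", "6h"]


def ratio_recording_rules(selector: str, windows: List[str]) -> List[Dict[str, str]]:
    """One 5xx-ratio recording rule per window, following the slide 8 template.

    Each window also needs its own status_class:http_requests_total:rate<window>
    input rule (slide 9 shows the 5m one), which this sketch does not generate.
    """
    rules = []
    for w in windows:
        src = f"status_class:http_requests_total:rate{w}"
        rules.append({
            "record": f"status_class_5xx:http_requests_total:ratio_rate{w}",
            "expr": (
                f'sum({src}{{{selector},status_class="5xx"}})\n'
                "/\n"
                f"sum({src}{{{selector}}})"
            ),
        })
    return rules


# Prometheus rule files list rules under "groups"; the group name is hypothetical.
rule_file = {
    "groups": [{
        "name": "slo-http-requests",
        "rules": ratio_recording_rules('job="prometheus"', WINDOWS),
    }]
}
print(json.dumps(rule_file, indent=2))
```

Printing JSON is only for inspection here; serializing the same dictionary with a YAML library would give a conventional rule file.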
    }

  12. HTML 43.4%
    Dart (no JS) 23.0%
    Go 21.3%
    Jsonnet 7.1%

  13. Feedback!
    ...and help :D

  14. Thanks
    to Björn Rabenstein & Aditya Konarde

  15. promtools.matthiasloibl.com
    github.com/metalmatze/slo-libsonnet
    github.com/metalmatze/slo-libsonnet-web
