Metric-Driven Decision Making with Custom Prometheus Exporter

Cloud Native Days Spring 2021 Online
https://event.cloudnativedays.jp/cndo2021/talks/681

Takeshi Kondo

March 02, 2021

Transcript

  1. Metric-Driven Decision Making
    with Custom Prometheus Exporter
    Takeshi Kondo / @chaspy
    2021/03/12
    Cloud Native Days Spring 2021 Online

  2. Who am I
    chaspy chaspy_
    Lead Software Engineer

    Site Reliability at Quipper
    Takeshi Kondo

  3. Metric-Driven Decision Making
    Decision making based on metrics

  4. Sound familiar?
    • Lately, doesn't ○○ feel like it's [more frequent | less frequent | faster | slower]?
    • Maybe
    • Probably
    • Surely
    • But I don't really know

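  5. All kinds of events in day-to-day work
    • Deploys fail a lot, and we keep rerunning them
    • That server often stops responding, and we keep restarting it
    • There are security violations; notifications arrive, but nobody looks at them
    • Tickets are piling up, and they have been for a long time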

  6. Sound familiar?
    • Lately, doesn't ○○ feel like it's [more frequent | less frequent | faster | slower]?
    How much is it, actually?
    How much is acceptable?
    At what point do we take action?

  7. Metric-Driven Decision Making
    Decision making based on metrics

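  8. Today's goal
    • For those who "somehow" sense a problem and want to solve it quantitatively:
      • Learn the benefits of metric-driven problem solving
      • Learn how to write a custom Prometheus exporter
      • Pick up hints for solving problems close at hand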

  9. Today's agenda
    • What is a Prometheus exporter?
    • What are the Prometheus format and OpenMetrics?
    • How to write a custom Prometheus exporter
    • Case studies
    • Summary

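  10. Prometheus Exporter
    • Prometheus: an open-source, general-purpose monitoring system
    • Prometheus Exporter:
      • One of the components of Prometheus
      • The piece that exposes metrics to the Prometheus server (the piece that collects metrics)
      • Many kinds exist, both official and third-party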
    EXPORTERS AND INTEGRATIONS: https://prometheus.io/docs/instrumenting/exporters/

  11. Prometheus Architecture
    https://prometheus.io/docs/introduction/overview/#architecture

  12. Prometheus Architecture
    https://prometheus.io/docs/introduction/overview/#architecture

  13. What are the Prometheus format and OpenMetrics?
    https://openmetrics.io

  14. OpenMetrics
    • Extended from Prometheus exposition format 0.0.4
    • Aims to standardize metrics
    • OpenMetrics specifies today's de-facto standard for transmitting cloud-native
    metrics at scale, with support for both text representation and Protocol Buffers
    and brings it into IETF. It supports both pull and push-based data collection.
    • Metrics: a snapshot of the current state of a set of data (unlike logs or events)
    • Metrics are exposed in OpenMetrics format in response to HTTP GET /metrics
    • And more
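
To make the text representation concrete, here is a minimal Go sketch that renders one sample line in the classic Prometheus text format (the 0.0.4 format that OpenMetrics extends). The metric name `demo_issue_count`, the label values, and the helper `formatSample` are hypothetical names chosen for illustration; none of them come from the talk.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// labelEscaper escapes the characters that must be escaped inside label
// values in the text format: backslash, double quote, and newline.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// formatSample renders one sample line of the form: name{k="v",...} value
// (hypothetical helper, not part of any library).
func formatSample(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // sorted keys keep the output deterministic

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, k+`="`+labelEscaper.Replace(labels[k])+`"`)
	}

	line := name
	if len(pairs) > 0 {
		line += "{" + strings.Join(pairs, ",") + "}"
	}
	return line + " " + strconv.FormatFloat(value, 'g', -1, 64)
}

func main() {
	// HELP and TYPE are comment lines that describe the metric family.
	fmt.Println("# HELP demo_issue_count Number of open issues (hypothetical metric).")
	fmt.Println("# TYPE demo_issue_count gauge")
	fmt.Println(formatSample("demo_issue_count",
		map[string]string{"repo": "example/repo", "label": "SRE"}, 1))
}
```

In practice a client library produces this output for you; the sketch only shows what ends up on the wire.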

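  15. Writing a custom Prometheus Exporter
    • Client libraries are available
    • Requirements
      • Expose http://hostname:8080/metrics (any port will do)
      • Return metric values in the prescribed format
    • Why I wrote my own
      • There were problems at work that I wanted to solve
      • No existing exporter met the requirements
      • Practice writing Go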
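
The two requirements above can be sketched with nothing but the Go standard library. This is an illustrative sketch, not the author's implementation: a real exporter would normally use a Prometheus client library, and the metric name `demo_open_items`, the helper `renderMetrics`, and the hard-coded value 3 are assumptions made for the example.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// renderMetrics returns a single gauge in the Prometheus text format.
// demo_open_items is a hypothetical metric name.
func renderMetrics(openItems int) string {
	return "# HELP demo_open_items Number of open items (hypothetical metric).\n" +
		"# TYPE demo_open_items gauge\n" +
		fmt.Sprintf("demo_open_items %d\n", openItems)
}

func main() {
	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
		// A real exporter would fetch the current value from its data
		// source here; 3 is a placeholder.
		fmt.Fprint(w, renderMetrics(3))
	})
	// Port 8080 matches the slide; any port will do.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

With the program running, `curl -s localhost:8080/metrics` returns the three lines produced by `renderMetrics`.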

  16. Runtime environment
    Kubernetes Integrations Autodiscovery https://docs.datadoghq.com/agent/kubernetes/integrations/?tab=kubernetes
    API that provides the source data
    xxx-exporter (Deployment)
    datadog-agent (DaemonSet)
    Get http://host/metrics
    Get the data
    Send custom metrics
    Parse and expose the metric

  17. Case studies
    • GitHub
      • Issue
      • Pull Request
    • CircleCI
      • Insights API
    • AWS
      • RDS Engine Version
      • RDS Max Connections

  18. Template: title
    • Problem:
    • How to solve:
    • Result:

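  19. GitHub Issue
    • Problem:
      • Open postmortem issues were piling up
    • How to solve:
      • Wrote an exporter that exports the number of issues, with their labels attached as tags
    • Result:
      • The current count became visible
      • We can now make a decision such as "if there are N or more, let's go close them"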
    https://github.com/chaspy/github-issue-prometheus-exporter

  20. GitHub Issue
    $ curl -s localhost:8080/metrics | grep github_issue_prometheus_exporter_issue_count
    github_issue_prometheus_exporter_issue_count{author="chaspy",label="SRE",number="27193",repo="quipper/quipper"} 1
    https://github.com/quipper/quipper/issues?q=is:open+label:Postmortem+label:SRE
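
In the sample output above, each open issue appears as its own series with value 1, so summing the series per label gives the number of open issues for that label. The following Go sketch shows that aggregation together with the "N or more" decision rule from the previous slide. The `Issue` type, the helper names, and the sample data are hypothetical; in the author's setup this kind of aggregation would more likely happen in the monitoring backend than in code.

```go
package main

import "fmt"

// Issue is a hypothetical, simplified view of an open GitHub issue.
type Issue struct {
	Number int
	Labels []string
}

// countByLabel counts open issues per label. It mirrors summing the
// per-issue series (each with value 1) grouped by the label tag.
func countByLabel(issues []Issue) map[string]int {
	counts := make(map[string]int)
	for _, issue := range issues {
		for _, label := range issue.Labels {
			counts[label]++
		}
	}
	return counts
}

// needsAction reports whether the count for label has reached n.
func needsAction(counts map[string]int, label string, n int) bool {
	return counts[label] >= n
}

func main() {
	// Made-up sample data for illustration only.
	issues := []Issue{
		{Number: 1, Labels: []string{"Postmortem", "SRE"}},
		{Number: 2, Labels: []string{"Postmortem"}},
		{Number: 3, Labels: []string{"SRE"}},
	}
	counts := countByLabel(issues)
	fmt.Println(counts["Postmortem"], needsAction(counts, "Postmortem", 2))
}
```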

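  21. GitHub Pull Request
    • Problem:
      • Renovate PRs were piling up
    • How to solve:
      • Wrote an exporter that exports the number of PRs, with their labels attached as tags
    • Result:
      • The current count became visible, along with the number of repositories
      • We can now make a decision such as "if there are N or more, let's go close them"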
    https://github.com/chaspy/github-pr-prometheus-exporter

  22. GitHub Pull Request
    $ curl -s localhost:8080/metrics | grep github_pr_prometheus_exporter_pull_request_count
    github_pr_prometheus_exporter_pull_request_count{author="renovate[bot]",label="renovate:ingress-nginx,renovate:ingress-nginx/3.20.1",number="1739",repo="quipper/kubernetes-clusters",reviewer="chaspy"} 1
    https://github.com/search?q=org:quipper+is:open+author:app/renovate

  23. CircleCI Insights
    • Problem:
      • CI for a huge monorepo is slow, and flaky-test problems are hard to analyze
    • How to solve:
      • Built a Prometheus exporter for the CircleCI Insights API
    • Result:
      • The results became visible, so we can start improving from the bottlenecks
    https://github.com/chaspy/circleci-insights-prometheus-exporter

  24. CircleCI Insights

  25. AWS RDS Engine Version
    • Problem:
      • It is hard to notice EOL information for RDS engine versions
    • How to solve:
      • Wrote a Prometheus exporter for RDS engine version EOL information
    • Result:
      • We can now fire an alert when an RDS instance we run approaches EOL
    https://github.com/chaspy/aws-rds-engine-version-prometheus-exporter

  26. AWS RDS Engine Version

  27. AWS RDS Max Connections
    • Problem:
      • MaxConnections is not available as a metric, so alerts could not be set on it
      • There was an anomaly alert on the connection count, but it produced false positives
    • How to solve:
      • Wrote a Prometheus exporter for Max Connections
    • Result:
      • Alerts could be configured using Max Connections and the current number of connections
    https://github.com/chaspy/aws-rds-maxcon-prometheus-exporter
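
Once both values are available as metrics, an alert condition can be expressed as a ratio instead of an anomaly heuristic. The Go sketch below shows the arithmetic under assumed numbers; the 0.8 threshold, the sample values, and the helper `utilization` are hypothetical, and in practice the comparison would be written as a monitor in the monitoring system rather than in Go.

```go
package main

import "fmt"

// utilization returns current/limit as a ratio.
// ok is false when limit is not positive, so callers can skip the check.
func utilization(current, limit float64) (ratio float64, ok bool) {
	if limit <= 0 {
		return 0, false
	}
	return current / limit, true
}

func main() {
	const threshold = 0.8 // hypothetical: alert at 80% of max connections

	// Made-up sample values: 450 current connections out of a maximum of 500.
	ratio, ok := utilization(450, 500)
	fmt.Println(ok && ratio >= threshold)
}
```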

  28. AWS RDS Max Connections

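  29. Summary (1)
    • Prometheus exporters are easy to write yourself because client libraries exist
    • Benefits of treating things as metrics
      • You can filter by tag
      • You can set arbitrary thresholds
      • Visualization lets you see long-term trends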

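  30. Summary (2)
    • Metrics over events
      • Sending every event of interest to Slack notifications and the like is common, but it is an anti-pattern
      • It does nothing but eat up attention
      • It is better to convert the events of interest into state and export that as a gauge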
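
Converting events into state can be as simple as folding the event stream into the set of items that are currently open and exporting its size as a gauge. The Go sketch below illustrates the idea; the `Event` type, its `Kind` values, and the helper `openCount` are hypothetical.

```go
package main

import "fmt"

// Event is a hypothetical notification that an item was opened or closed.
type Event struct {
	ID   int
	Kind string // "opened" or "closed"
}

// openCount folds a stream of events into state and returns the number of
// items currently open. That number is what would be exported as a gauge.
func openCount(events []Event) int {
	open := make(map[int]bool)
	for _, e := range events {
		switch e.Kind {
		case "opened":
			open[e.ID] = true
		case "closed":
			delete(open, e.ID)
		}
	}
	return len(open)
}

func main() {
	events := []Event{
		{ID: 1, Kind: "opened"},
		{ID: 2, Kind: "opened"},
		{ID: 1, Kind: "closed"},
	}
	fmt.Println("open items:", openCount(events))
}
```

Instead of three notifications to read, there is one number to watch, and a threshold can be set on it.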

  31. Metric-Driven Decision Making
    Decision making based on metrics

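  32. Make sound decisions based on the facts that metrics give you
    • Lately, doesn't ○○ feel like it's [more frequent | less frequent | faster | slower]?
    How much is it, actually?
    How much is acceptable?
    At what point do we take action?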

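  33. Make sound decisions based on the facts that metrics give you
    • Lately, doesn't ○○ feel like it's [more frequent | less frequent | faster | slower]?
    In fact, n have come in recently
    Check daily or weekly, and take action once it exceeds n
    Set a threshold and configure an alert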

  34. Metric-Driven Decision Making
    Reduce cognitive cost, and make sound decisions
    based on quantitative information

  35. Thank you!
    chaspy chaspy_
    Lead Software Engineer

    Site Reliability at Quipper
    Takeshi Kondo
