Slide 1

Slide 1 text

Metric-Driven Decision Making with Custom Prometheus Exporter Takeshi Kondo / @chaspy 2021/03/12 Cloud Native Days Spring 2021 Online

Slide 2

Slide 2 text

Who am I chaspy chaspy_ Lead Software Engineer Site Reliability at Quipper Takeshi Kondo

Slide 3

Slide 3 text

Metric-Driven Decision Making: decision making based on metrics

Slide 4

Slide 4 text

Sound familiar?
• Lately, ○○ feels [more frequent | less frequent | faster | slower]?
• Maybe
• Probably
• Surely
• Not that I actually know

Slide 5

Slide 5 text

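All kinds of things in day-to-day work
• Deploys fail a lot, and we keep rerunning them
• That server stops responding a lot, and we keep restarting it
• Security policies are being violated; notifications arrive, but nobody looks at them
• Tickets are piling up, and have been for a long time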

Slide 6

Slide 6 text

Sound familiar?
• Lately, ○○ feels [more frequent | less frequent | faster | slower]?
How much is it, actually?
How much is acceptable?
At what point do we take action?

Slide 7

Slide 7 text

Metric-Driven Decision Making: decision making based on metrics

Slide 8

Slide 8 text

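Today's goal
• For people who vaguely sense a problem and want to solve it quantitatively to:
• Learn the benefits of metric-driven problem solving
• Learn how to write a custom Prometheus exporter
• Pick up hints for solving problems close to home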

Slide 9

Slide 9 text

Today's topics
• What is a Prometheus exporter?
• What are the Prometheus format and OpenMetrics?
• How to write your own Prometheus exporter
• Case studies
• Summary

Slide 10

Slide 10 text

Prometheus Exporter
• Prometheus: a comprehensive monitoring OSS
• Prometheus Exporter:
  • One of the components of Prometheus
  • The thing that exposes metrics to the Prometheus server (the thing that collects metrics)
  • Many kinds exist, both official and third-party
EXPORTERS AND INTEGRATIONS: https://prometheus.io/docs/instrumenting/exporters/

Slide 11

Slide 11 text

Prometheus Architecture https://prometheus.io/docs/introduction/overview/#architecture

Slide 12

Slide 12 text

Prometheus Architecture https://prometheus.io/docs/introduction/overview/#architecture

Slide 13

Slide 13 text

What are the Prometheus format and OpenMetrics? https://openmetrics.io

Slide 14

Slide 14 text

OpenMetrics
• Extended from Prometheus exposition format 0.0.4
• Aims to standardize metrics
• "OpenMetrics specifies today's de-facto standard for transmitting cloud-native metrics at scale, with support for both text representation and Protocol Buffers and brings it into IETF. It supports both pull and push-based data collection."
• What metrics are: a snapshot of the current state of a set of data (unlike logs or events)
• Metrics in OpenMetrics format are exposed in response to HTTP GET /metrics
• And various other things
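To make the format concrete, here is a minimal Go sketch of how one gauge could be rendered in the Prometheus text exposition format that OpenMetrics extends. It uses only the standard library; the function names and the metric name `example_open_items` are hypothetical, and in practice a client library does this work. OpenMetrics itself adds further rules on top of this (for example, a terminating `# EOF` line), which the sketch does not cover.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// labelEscaper escapes the three characters that must be escaped inside
// label values in the text format: backslash, double quote, and newline.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// formatSample renders one sample line: name{label="value",...} value
// Labels are sorted by name so that the output is deterministic.
func formatSample(name string, labels map[string]string, value float64) string {
	names := make([]string, 0, len(labels))
	for n := range labels {
		names = append(names, n)
	}
	sort.Strings(names)

	pairs := make([]string, 0, len(names))
	for _, n := range names {
		pairs = append(pairs, fmt.Sprintf(`%s="%s"`, n, labelEscaper.Replace(labels[n])))
	}
	if len(pairs) == 0 {
		return fmt.Sprintf("%s %g", name, value)
	}
	return fmt.Sprintf("%s{%s} %g", name, strings.Join(pairs, ","), value)
}

// formatGaugeFamily prefixes the samples with the HELP and TYPE comment lines.
func formatGaugeFamily(name, help string, samples []string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n", name, help)
	fmt.Fprintf(&b, "# TYPE %s gauge\n", name)
	for _, s := range samples {
		b.WriteString(s)
		b.WriteString("\n")
	}
	return b.String()
}
```

Each sample is a snapshot value, which is why a gauge (a value that can go up or down) is the natural type here.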

Slide 15

Slide 15 text

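Writing your own Prometheus Exporter
• Client libraries exist
• Requirements
  • Expose http://hostname:8080/metrics (any port will do)
  • Return metric values in the prescribed format
• Why I wrote my own
  • There were problems at work that I wanted to solve
  • No existing exporter met the requirements
  • Practice writing Go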
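The two requirements above can be sketched in Go with the standard library alone. This is an illustration under stated assumptions, not the implementation used in the exporters from this talk, which would normally rely on a client library: the metric name `example_open_items`, the `fetch` callback, and all function names are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
)

// renderMetrics returns the response body for GET /metrics: a single gauge
// in the Prometheus text format. The metric name is hypothetical.
func renderMetrics(openItems float64) string {
	return "# HELP example_open_items Number of open items.\n" +
		"# TYPE example_open_items gauge\n" +
		fmt.Sprintf("example_open_items %g\n", openItems)
}

// newMetricsHandler returns a handler that calls fetch on every scrape, so
// each response reflects the current value.
func newMetricsHandler(fetch func() float64) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
		fmt.Fprint(w, renderMetrics(fetch()))
	})
}

// newMux wires the handler to the conventional /metrics path.
func newMux(fetch func() float64) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/metrics", newMetricsHandler(fetch))
	return mux
}
```

A `main` function would then pass the mux to `http.ListenAndServe(":8080", ...)`, with `fetch` calling whatever API provides the source data.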

Slide 16

Slide 16 text

Runtime environment
Kubernetes Integrations Autodiscovery: https://docs.datadoghq.com/agent/kubernetes/integrations/?tab=kubernetes
Diagram components: API that provides the source data; xxx-exporter (Deployment); datadog-agent (DaemonSet)
Diagram labels: Get http://host/metrics; Get the data; Send custom metrics; Parse and expose the metric

Slide 17

Slide 17 text

Case studies
• GitHub
  • Issue
  • Pull Request
• CircleCI
  • Insights API
• AWS
  • RDS Engine Version
  • RDS Max Connections

Slide 18

Slide 18 text

Template: title
• Problem:
• How to solve:
• Result:

Slide 19

Slide 19 text

GitHub Issue
• Problem:
  • Open postmortem issues had been piling up
• How to solve:
  • Wrote an exporter that exports the number of issues, with labels attached as tags
• Result:
  • The current count became visible
  • We can now make the decision "if there are N or more, let's go close them"
https://github.com/chaspy/github-issue-prometheus-exporter

Slide 20

Slide 20 text

GitHub Issue
$ curl -s localhost:8080/metrics | grep github_issue_prometheus_exporter_issue_count
github_issue_prometheus_exporter_issue_count{author="chaspy",label="SRE",number="27193",repo="quipper/quipper"} 1
https://github.com/quipper/quipper/issues?q=is:open+label:Postmortem+label:SRE
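The sample above carries the issue number as a tag and has the value 1, so the per-label count is the sum of such series. The following Go sketch illustrates that arithmetic and the "N or more" decision rule. The `Issue` type and both function names are hypothetical; in a real setup this aggregation and threshold would more likely live in the monitoring backend's query and alert configuration than in application code.

```go
package main

import "sort"

// Issue is a hypothetical, simplified view of an open GitHub issue.
type Issue struct {
	Number int
	Repo   string
	Labels []string
}

// countByLabel returns how many open issues carry each label.
// An issue with several labels is counted once under each of them.
func countByLabel(issues []Issue) map[string]int {
	counts := make(map[string]int)
	for _, is := range issues {
		for _, l := range is.Labels {
			counts[l]++
		}
	}
	return counts
}

// labelsAtOrAbove lists, in sorted order, the labels whose count is n or more.
func labelsAtOrAbove(counts map[string]int, n int) []string {
	var out []string
	for l, c := range counts {
		if c >= n {
			out = append(out, l)
		}
	}
	sort.Strings(out)
	return out
}
```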

Slide 21

Slide 21 text

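GitHub Pull Request
• Problem:
  • Renovate PRs had been piling up
• How to solve:
  • Wrote an exporter that exports the number of PRs, with labels attached as tags
• Result:
  • The current count became visible, along with the number of repositories
  • We can now make the decision "if there are N or more, let's go close them"
https://github.com/chaspy/github-pr-prometheus-exporter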

Slide 22

Slide 22 text

GitHub Pull Request
$ curl -s localhost:8080/metrics | grep github_pr_prometheus_exporter_pull_request_count
github_pr_prometheus_exporter_pull_request_count{author="renovate[bot]",label="renovate:ingress-nginx,renovate:ingress-nginx/3.20.1",number="1739",repo="quipper/kubernetes-clusters",reviewer="chaspy"} 1
https://github.com/search?q=org:quipper+is:open+author:app/renovate

Slide 23

Slide 23 text

CircleCI Insights
• Problem:
  • CI for a huge monorepo is slow, and flaky-test problems are hard to analyze
• How to solve:
  • Built a Prometheus exporter for the CircleCI Insights API
• Result:
  • The results became visible, so we could start improving from the bottlenecks
https://github.com/chaspy/circleci-insights-prometheus-exporter

Slide 24

Slide 24 text

CircleCI Insights

Slide 25

Slide 25 text

AWS RDS Engine Version
• Problem:
  • It is hard to notice EOL information for RDS engine versions
• How to solve:
  • Wrote a Prometheus exporter for RDS engine version EOL information
• Result:
  • We can now fire an alert when an RDS instance we own approaches EOL
https://github.com/chaspy/aws-rds-engine-version-prometheus-exporter

Slide 26

Slide 26 text

AWS RDS Engine Version

Slide 27

Slide 27 text

AWS RDS Max Connections
• Problem:
  • MaxConnections is not available as a metric, so no alert can be set on it
  • There was an anomaly alert on the number of connections, but it produced false positives
• How to solve:
  • Wrote a Prometheus exporter for Max Connections
• Result:
  • We could set up an alert using Max Connections and the current number of connections
https://github.com/chaspy/aws-rds-maxcon-prometheus-exporter
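With both values available as metrics, the alert condition reduces to a ratio against a fixed threshold instead of anomaly detection. A minimal Go sketch of that arithmetic follows; the function names and the idea of alerting on a usage ratio are assumptions for illustration, and the real rule would normally be written in the monitoring system's alert configuration.

```go
package main

import "errors"

// connectionUsage returns current connections as a fraction of the maximum.
func connectionUsage(current, maxConns float64) (float64, error) {
	if maxConns <= 0 {
		return 0, errors.New("max connections must be positive")
	}
	return current / maxConns, nil
}

// shouldAlert reports whether usage has reached the threshold ratio.
func shouldAlert(usage, threshold float64) bool {
	return usage >= threshold
}
```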

Slide 28

Slide 28 text

AWS RDS Max Connections

Slide 29

Slide 29 text

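Summary (1)
• Prometheus exporters are easy to write yourself because client libraries exist
• Benefits of treating things as metrics
  • You can filter by tag
  • You can set arbitrary thresholds
  • Visualization lets you grasp long-term trends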

Slide 30

Slide 30 text

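Summary (2)
• Metrics over events
  • Sending events of interest to Slack notifications and the like is common, but it is an anti-pattern
  • It does nothing but eat up cognitive capacity
  • It is better to convert the events of interest into state and export that state as a Gauge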
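Converting events into state can be sketched as a fold over the event stream: instead of notifying on each event, keep track of what is currently true and export that number. The Go example below assumes a hypothetical `Event` type with "opened" and "closed" kinds; it is an illustration of the idea, not code from the exporters in this talk.

```go
package main

// Event is a hypothetical notification: an item was opened or closed.
type Event struct {
	ItemID int
	Kind   string // "opened" or "closed"
}

// openItems folds a stream of events into the current state: the number of
// items that are open right now. That number is what a gauge would export.
func openItems(events []Event) int {
	open := make(map[int]bool)
	for _, e := range events {
		switch e.Kind {
		case "opened":
			open[e.ItemID] = true
		case "closed":
			delete(open, e.ItemID)
		}
	}
	return len(open)
}
```

In many cases the state can also be read directly from the source API on each scrape, which avoids replaying events at all.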

Slide 31

Slide 31 text

Metric-Driven Decision Making: decision making based on metrics

Slide 32

Slide 32 text

Let's make sound decisions based on metrics as facts
• Lately, ○○ feels [more frequent | less frequent | faster | slower]?
How much is it, actually?
How much is acceptable?
At what point do we take action?

Slide 33

Slide 33 text

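Let's make sound decisions based on metrics as facts
• Lately, ○○ feels [more frequent | less frequent | faster | slower]?
In fact, n have come in recently
Let's check daily or weekly and take action once it exceeds n
Let's set a threshold and configure an alert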

Slide 34

Slide 34 text

Metric-Driven Decision Making: let's reduce cognitive cost and make sound decisions based on quantitative information

Slide 35

Slide 35 text

Thank you! chaspy chaspy_ Lead Software Engineer Site Reliability at Quipper Takeshi Kondo