Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
ユーザ目線でのPrometheus #mackerel_ug /monitoring-prom...
Search
Manabu Matsuzaki
October 27, 2017
Technology
1
3.6k
ユーザ目線でのPrometheus #mackerel_ug /monitoring-prometheus
モニタリング勉強会の発表資料です
Manabu Matsuzaki
October 27, 2017
Tweet
Share
More Decks by Manabu Matsuzaki
See All by Manabu Matsuzaki
Spring BootユーザのためのArmeria入門 #jsug / Introduce to Armeria for Spring users
matsumana
0
2.8k
Canary Release with Argo Rollouts #ふくばねてす / canary-release-with-argo-rollouts
matsumana
1
1.2k
Getting started Central Dogma with Golang #fukuokago #umedago / getting-started-central-dogma-with-golang
matsumana
0
900
Micrometer入門 #javaq / introduce-to-micrometer
matsumana
1
2.9k
ArmeriaとCentral Dogmaから学ぶ、マイクロサービスに必要な機能 #edayfuk / lean-from-armeria-and-central-dogma
matsumana
0
4.3k
SREcon19 Americas 参加レポート #srefukuoka / srecon19-americas-report
matsumana
0
880
SRE入門 & チームで取り組んでいるSRE #srefukuoka / introduce-to-sre
matsumana
0
1.3k
Introduce to Armeria and Central Dogma #GWD_Nulab / introduce-to-armeria-and-central-dogma
matsumana
0
560
Connector/JでMaster/Slave Replication構成のMySQLに接続する #mysql_casual_fukuoka /connector-j-master-slave-replication
matsumana
0
1.5k
Other Decks in Technology
See All in Technology
The Madness of Multiple Gemini CLIs Developing Simultaneously with Jujutsu
gunta
1
2.6k
TROCCO今昔
gtnao
0
210
OpenTelemetry の Log を使いこなそう
biwashi
5
1k
Bliki (ja), and the Cathedral, and the Bazaar
koic
8
1.4k
Ktor + Google Cloud Tasks/PubSub におけるOTel Messaging計装の実践
sansantech
PRO
1
300
大規模組織にAIエージェントを迅速に導入するためのセキュリティの勘所 / AI agents for large-scale organizations
i35_267
6
240
生成AIによる情報システムへのインパクト
taka_aki
1
160
Amazon CloudWatchのメトリクスインターバルについて / Metrics interval matters
ymotongpoo
3
220
OTel 公式ドキュメント翻訳 PJ から始めるコミュニティ活動/Community activities starting with the OTel official document translation project
msksgm
0
260
Snowflake のアーキテクチャは本当に筋がよかったのか / Data Engineering Study #30
indigo13love
0
260
公開初日に個人環境で試した Gemini CLI 体験記など / Gemini CLI実験レポート
you
PRO
3
360
MCP とマネージド PaaS で実現する大規模 AI アプリケーションの高速開発
nahokoxxx
1
1.5k
Featured
See All Featured
Fantastic passwords and where to find them - at NoRuKo
philnash
51
3.3k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
31
1.3k
Building a Scalable Design System with Sketch
lauravandoore
462
33k
Faster Mobile Websites
deanohume
308
31k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
507
140k
Building Adaptive Systems
keathley
43
2.7k
Why Our Code Smells
bkeepers
PRO
337
57k
A designer walks into a library…
pauljervisheath
207
24k
Reflections from 52 weeks, 52 projects
jeffersonlam
351
21k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
3.9k
How To Stay Up To Date on Web Technology
chriscoyier
790
250k
The Art of Programming - Codeland 2020
erikaheidi
54
13k
Transcript
ϢʔβઢͰͷPrometheus ϞχλϦϯάษڧձ 2017/10/27 @matsumana
ࣗݾհ • ໊લɿ দ࡚ ֶ • ॴଐɿ LINE Fukuokaגࣜձࣾ ։ൃࣨ
• Twitterɿ @matsumana
ΞδΣϯμ • ࣮ࡍʹͲ͏͍ͬͨϝτϦΫεΛऔ͍ͬͯΔͷ͔ • exporterΛ࡞Δ࣌ͷϕετϓϥΫςΟε • Ξϥʔτϧʔϧͷॻ͖ํͷྫ • ·ͱΊ ※࣌ؒͷ߹্ɺPrometheusೖతͳࣄ͠·ͤΜ
લఏ • ݱࡏ༻͍ͯ͠Δόʔδϣϯ • Prometheus 1.8.0
ݱࡏ͍ͬͯΔexporterͱɺऔಘ͍ͯ͠ΔϝτϦΫε ʢҰ෦ͷΈൈਮʣ • node_exporter • loading • CPU༻ • ϝϞϦ༻
• εϫοϓ • slab • context switch
ݱࡏ͍ͬͯΔexporterͱɺऔಘ͍ͯ͠ΔϝτϦΫε ʢҰ෦ͷΈൈਮʣ • node_exporter • σΟεΫI/O • σΟεΫ༻ • inode
• TCPଓ • ESTABLISHED • TIME_WAIT • ωοτϫʔΫ τϥϑΟοΫ
ݱࡏ͍ͬͯΔexporterͱɺऔಘ͍ͯ͠ΔϝτϦΫε ʢҰ෦ͷΈൈਮʣ • apache_exporter • mod_statusϞδϡʔϧͰऔΕΔϝτϦΫε • nginx_exporter • ngx_http_stub_status_moduleϞδϡʔϧͰऔΕΔϝτϦΫε
ݱࡏ͍ͬͯΔexporterͱɺऔಘ͍ͯ͠ΔϝτϦΫε ʢҰ෦ͷΈൈਮʣ • plack_exporter • Plack::Middleware::ServerStatus::LiteͰऔΕΔϝτϦΫε
ݱࡏ͍ͬͯΔexporterͱɺऔಘ͍ͯ͠ΔϝτϦΫε ʢҰ෦ͷΈൈਮʣ • jmx_exporter • JVMͷڞ௨తͳϝτϦΫε • Heap • Metaspace
• GC • Thread • Code cache
ݱࡏ͍ͬͯΔexporterͱɺऔಘ͍ͯ͠ΔϝτϦΫε ʢҰ෦ͷΈൈਮʣ • jmx_exporter • ϛυϧΣΞ/ϥΠϒϥϦݻ༗ͷϝτϦΫε • Tomcat • Jetty
• HikariCP • Caffeine • etc…
ݱࡏ͍ͬͯΔexporterͱɺऔಘ͍ͯ͠ΔϝτϦΫε ʢҰ෦ͷΈൈਮʣ • memcached_exporter • ΩϟογϡαΠζ • Ωϟογϡ • Ωϟογϡώοτ
• operation(get/set) • Eviction
ݱࡏ͍ͬͯΔexporterͱɺऔಘ͍ͯ͠ΔϝτϦΫε ʢҰ෦ͷΈൈਮʣ • mysqld_exporter • QPS • Slow query •
Replicaton delay • binlog • connection • InnoDBؔ࿈
ݱࡏ͍ͬͯΔexporterͱɺऔಘ͍ͯ͠ΔϝτϦΫε ʢҰ෦ͷΈൈਮʣ • redis_exporter • ༻ϝϞϦ • Key • Eviction
• Slow query
ݱࡏ͍ͬͯΔexporterͱɺऔಘ͍ͯ͠ΔϝτϦΫε ʢҰ෦ͷΈൈਮʣ • elasticsearch_exporter • Cluster health • Document(primary shardͷΈ)
• DocumentαΠζ(primary shardͷΈ) • Thread pool queue • Thread pool rejected • Circuit Breaker tripped
ݱࡏ͍ͬͯΔexporterͱɺऔಘ͍ͯ͠ΔϝτϦΫε ʢҰ෦ͷΈൈਮʣ • elasticsearch_exporter • Fielddata cache size • Query
cache size • Query cache hit,miss,total • Request cache size
ݱࡏ͍ͬͯΔexporterͱɺऔಘ͍ͯ͠ΔϝτϦΫε ʢҰ෦ͷΈൈਮʣ • fluent-plugin-prometheus • ֤BufferedOutput pluginͷϝτϦΫε • buffer_queue_length •
buffer_total_bytes • retry_count
ݱࡏ͍ͬͯΔexporterͱɺऔಘ͍ͯ͠ΔϝτϦΫε ʢҰ෦ͷΈൈਮʣ • fluentd_exporter • fluentdϓϩηε୯ҐͷCPU༻ • fluentdϓϩηε୯ҐͷϝϞϦ༻ • td-agent_exporter
• fluentdϓϩηε୯ҐͷCPU༻ • fluentdϓϩηε୯ҐͷϝϞϦ༻ • fluent-agent-lite_exporter • fluentdϓϩηε୯ҐͷCPU༻ • fluentdϓϩηε୯ҐͷϝϞϦ༻
exporterΛ࡞Δ࣌ͷϕετϓϥΫςΟε • https://prometheus.io/docs/instrumenting/exporters/ • https://github.com/prometheus/prometheus/wiki/Default- port-allocations • άάΔ • ͕ࣗඞཁͳϝτϦΫε͕export͞Ε͍ͯͳΕPR͢Δ
·ͣɺଞͷਓ͕࡞ͬͨͷ͕ແ͍͔୳ͯ͠ΈΔ
• ΦϑΟγϟϧͷΨΠυϥΠϯ https://prometheus.io/docs/instrumenting/writing_exporters/ exporterΛ࡞Δ࣌ͷϕετϓϥΫςΟε ࣗͰ࡞Δ߹
• ΦϑΟγϟϧͷclient library https://prometheus.io/docs/instrumenting/clientlibs/ • ΦϑΟγϟϧgolang, Java, Python, RubyͷΈ •
Third partyʹPHPͳͲ͋Δ exporterΛ࡞Δ࣌ͷϕετϓϥΫςΟε ࣗͰ࡞Δ߹
• exportergoͰॻ͘ͷ͕ྑͦ͞͏͚ͩͲҙ͋Δ • Linux Kernel 2.6.23ະຬgolangࣗମ͕αϙʔτͯ͠ͳ͍ CentOS5ܥͩͱgolangͷexporterਖ਼͘͠ಈ͔ͳ͍͔ https://github.com/golang/go/wiki/MinimumRequirements exporterΛ࡞Δ࣌ͷϕετϓϥΫςΟε ࣗͰ࡞Δ߹
exporterΛ࡞Δ࣌ͷϕετϓϥΫςΟε ΦϑΟγϟϧΨΠυϥΠϯʹهࡌ͞Ε͍ͯΔͷΛҰ෦͝հ • ଞͷexporterͰΘΕͯͳ͍portΛ͏ɻ࡞ͬͨΒهࡌ͠Α͏ https://github.com/prometheus/prometheus/wiki/Default-port- allocations • ϝτϦΫε໊ɺϥϕϧ໊ͷϕετϓϥΫςΟεʹै͏ https://prometheus.io/docs/practices/naming/ •
exporterͰϝτϦΫεΛܭࢉ͠ͳ͍ɻPromQLͰܭࢉ͢Δ • ྫʣΩϟογϡώοτͳͲ
exporterΛ࡞Δ࣌ͷϕετϓϥΫςΟε • exporterͷόʔδϣϯ൪߸ΛϝτϦΫεͱͯ͠export͢Δ • ࢹରͷαʔόexporterͷ͕૿͑ΔͱɺͲͷόʔδϣ ϯ͕σϓϩΠ͞Ε͍ͯΔ͔Θ͔Βͳ͘ͳΔ • ͪͳΈʹɺPrometheusຊମҎԼͷΑ͏ͳϝτϦΫεΛ export͍ͯ͠Δ prometheus_build_info{branch=“master",goversion="go1.8.3",revision="3afb3fffa3a29
c3de865e1172fb740442e9d0133",version="1.7.1"} 1 ΦϑΟγϟϧΨΠυϥΠϯʹهࡌ͞Ε͍ͯΔͷΛҰ෦͝հ
exporterΛ࡞Δ࣌ͷϕετϓϥΫςΟε ΦϑΟγϟϧΨΠυϥΠϯʹهࡌ͞Ε͍ͯΔͷΛҰ෦͝հ • ̋̋̋_up ϝτϦΫεΛexport͢Δ • ྫ͑ɺMySQLͷ߹mysql_upͱ͍͏ϝτϦΫε • ࢹର͔ΒϝτϦΫεऔಘޭͳΒ1ɺࣦഊͳΒ0Λexport •
ཧ༝ޙड़
Ξϥʔτϧʔϧͷॻ͖ํͷྫ • ྫ1) exporter͕མ͍ͪͯͳ͍͔Ͳ͏͔ࢹ͢Δ • ྫ2) exporertͷࢹର͕མ͍ͪͯͳ͍͔Ͳ͏͔ࢹ͢Δ
• ྫ3) dailyόον͕ਖ਼ৗʹಈ͍͍ͯΔ͔Ͳ͏͔ࢹ͢Δ
Ξϥʔτϧʔϧͷॻ͖ํͷྫ • Prometheusαʔό͕ࣗɺશͯͷexporterʹ͍ͭͯ upϝτϦΫεΛه͍ͯ͠Δ https://prometheus.io/docs/concepts/jobs_instances/ ྫʣnode_exporterͷ߹ up{instance="host00:9100", job="node"} 1
ྫ1ʣexporter͕མ͍ͪͯͳ͍͔Ͳ͏͔ࢹ͢Δ
Ξϥʔτϧʔϧͷॻ͖ํͷྫ ALERT ExporterDown IF up == 0 FOR 1m LABELS
{severity="major"} ANNOTATIONS { description="{{ $labels.instance }} of job {{ $labels.job }} has been down” } ྫ1ʣexporter͕མ͍ͪͯͳ͍͔Ͳ͏͔ࢹ͢Δ • ͳͷͰɺҎԼͷ༷ͳΞϥʔτϧʔϧΛ1ͭొ͢Ε શͯͷexporterʹରԠͰ͖Δ
Ξϥʔτϧʔϧͷॻ͖ํͷྫ ྫ̎ʣexporertͷࢹର͕མ͍ͪͯͳ͍͔Ͳ͏͔ࢹ͢Δ • ҎԼͷ༷ͳΞϥʔτϧʔϧΛ1ͭొ͢Εɺ ̋̋̋_upϝτϦΫεΛ࣋ͭશͯͷexporterʹରԠͰ͖Δ ALERT ExporterTargetDown IF {__name__=~".*_up"} ==
0 FOR 1m LABELS {severity="major"} ANNOTATIONS { summary="Exporter target ( {{ $labels.job }} ) down" }
Ξϥʔτϧʔϧͷॻ͖ํͷྫ • node_exporterͷTextfile CollectorػೳΛͬͯϝτϦΫεΛexport͢Δ (pushgatewayΛͬͯϝτϦΫεexportग़དྷΔ) • ͜Μͳײ͡ͷshellΛjobεέδϡʔϥͰ࣮ߦ͢ΔΑ͏ʹͯ͠ɺ ྫ̏ʣdailyόον͕ਖ਼ৗʹಈ͍͍ͯΔ͔Ͳ͏͔ࢹ͢Δ #!/bin/bash #
్தͰΤϥʔ͕ൃੜͨ͠Βऴྃ͢Δ set -e UNIXTIME=$(date +%s) RESULT=/path/to/node_exporter_textfile_directory/my_batch_job.prom # execute job /path/to/my_batch_job.sh # write metrics echo my_batch_job_completion_time{cron="my_batch_job", period="daily"} $UNIXTIME > $RESULT.tmp mv $RESULT.tmp $RESULT
Ξϥʔτϧʔϧͷॻ͖ํͷྫ • ҎԼͷ༷ͳΞϥʔτϧʔϧΛొ͢ΕݕՄೳ ྫ̏ʣdailyόον͕ਖ਼ৗʹಈ͍͍ͯΔ͔Ͳ͏͔ࢹ͢Δ ALERT DailyCronLate IF time() - {period="daily"}
> 60 * 60 * 24 FOR 5m LABELS {severity="warning"} ANNOTATIONS { summary="Daily cronjob for {{ $labels.cron }} has not run in 24 hours” }
·ͱΊ • ͍ͬͯΔexporterͱɺऔಘग़དྷΔϝτϦΫεͷҰ෦Λ͝հ ͠·ͨ͠ • exporterΛ࡞Δ߹ͷҙ(golang)ʹ͍ͭͯ͝հ͠·ͨ͠ • exporterΛ࡞Δ߹ͷϕετϓϥΫςΟεΛ͍͔ͭ͘ൈਮͯ͠ ͝հ͠·ͨ͠ •
ΞϥʔτϧʔϧͷྫΛ͍͔ͭ͘͝հ͠·ͨ͠
Thank you :)