KubeCon___CNCon_でみた最近のPrometheus.pdf

Slide 1

Slide 1 text

Recent Prometheus as Seen at KubeCon + CNCon
KubeCon+CNCon NA 2018 Recap!! @Cybozu
@yosshi_

Slide 2

Slide 2 text

Self-introduction
- Shota Yoshimura (@yosshi_)
- NTT Communications
- Data science team
- Infrastructure engineer / data engineering
- Kubernetes, Kafka, etc.
- Community activities: "Cloud Native Developers JP"

Slide 3

Slide 3 text

What is Prometheus?
Reference: <https://prometheus.io/docs/introduction/overview/>
- Reportedly modeled on "Borgmon", the monitoring tool used at Google
- Impression: it is good at handling metrics

Slide 4

Slide 4 text

Architecture
Reference: <https://prometheus.io/docs/introduction/overview/>

Slide 5

Slide 5 text

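Things I was wondering about before attending KubeCon
- Availability / scalability
- Redundant configuration
- Maintaining performance as the number of monitored targets grows
- Long-term retention of logs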

Slide 6

Slide 6 text

What Prometheus alone seems able to do: reliability
- Want to eliminate SPOFs
Diagram labels: LB, Act-Act, health check at the LB, Prometheus, Grafana, Target
Comment: Why not just run two of them?
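
The active-active idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a production load balancer. The replica addresses are hypothetical, and the `probe` parameter is an illustrative hook so the selection logic can be exercised without a network. The only Prometheus-specific detail is the `/-/healthy` management endpoint.

```python
from typing import Callable, Iterable, List
from urllib.request import urlopen


def http_probe(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the Prometheus at base_url answers its health endpoint."""
    try:
        with urlopen(f"{base_url}/-/healthy", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, timeout, DNS failure, HTTP error status, etc.
        return False


def healthy_replicas(replicas: Iterable[str],
                     probe: Callable[[str], bool] = http_probe) -> List[str]:
    """Keep only the replicas that pass the health check, preserving order."""
    return [url for url in replicas if probe(url)]


# Hypothetical replica addresses; both are assumed to scrape the same targets.
REPLICAS = ["http://prometheus-1:9090", "http://prometheus-2:9090"]
```

A load balancer in front of the two instances would route queries (for example from Grafana) only to the entries returned by `healthy_replicas(REPLICAS)`.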

Slide 7

Slide 7 text

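What Prometheus alone seems able to do: scalability
- Do not want alerting response to degrade as monitored targets increase
Diagram labels: Target 1, Prometheus 1, Grafana 1, Target 2, Prometheus 2, Grafana 2
Comment: Why not just build two separate stacks?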

Slide 8

Slide 8 text

What Prometheus alone seems able to do: long-term storage
- Wanted for capacity planning
Comment: How about writing the data out to external storage?
Reference: <https://prometheus.io/docs/prometheus/latest/storage/#remote-storage-integrations>

Slide 9

Slide 9 text

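What Prometheus alone seems able to do: long-term storage
Comment: Would a multi-tier setup also do the job?
Diagram labels:
- Target
- Prometheus for monitoring (retention: 14 days)
- Grafana for monitoring
- Prometheus for long-term storage (retention: 1 year)
- Grafana for long-term storage
- Collect only specific metrics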
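
To see why the long-term tier collects only specific metrics, a rough sizing helps. The Prometheus storage documentation gives the rule of thumb "needed disk space = retention time in seconds x ingested samples per second x bytes per sample", with roughly 1 to 2 bytes per sample on average. The sketch below applies it to the two retention periods on the slide. The ingestion rates and the 2-byte figure are assumptions chosen for illustration, not measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def needed_disk_bytes(retention_days: int,
                      samples_per_second: float,
                      bytes_per_sample: float = 2.0) -> float:
    """Rough disk estimate: retention seconds * ingest rate * bytes per sample."""
    return retention_days * SECONDS_PER_DAY * samples_per_second * bytes_per_sample


# Assumed rates: the monitoring tier ingests everything, while the
# long-term tier keeps only a selected tenth of the samples.
monitoring = needed_disk_bytes(retention_days=14, samples_per_second=100_000)
long_term = needed_disk_bytes(retention_days=365, samples_per_second=10_000)

print(f"monitoring tier: {monitoring / 1e9:.1f} GB")
print(f"long-term tier:  {long_term / 1e9:.1f} GB")
```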

Slide 10

Slide 10 text

Takeaways from KubeCon
- Cortex - Infinitely Scalable Prometheus
  - URL (https://sched.co/GrXL)
- Adopting Prometheus the Hard Way
  - URL (https://sched.co/GrXX)
- Large Scale Automated Storage with Kubernetes
  - URL (https://sched.co/Gsxn)
- Intro: Prometheus
  - URL (https://sched.co/GrXX)

Slide 11

Slide 11 text

Friends of Prometheus
Thanos / Cortex / M3
Reference: <https://kccna18.sched.com/event/GrXX>

Slide 12

Slide 12 text

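The setup first aimed for with Prometheus
Reference: <https://improbable.io/games/blog/thanos-prometheus-at-scale>
- Deploy one Prometheus per target
- Use federation to get a Prometheus that provides a view across all of them
Comment: It did not work. The configuration is complex and queries get slow.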
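
For readers unfamiliar with federation: a higher-level Prometheus scrapes the `/federate` endpoint of each lower-level Prometheus and passes one or more `match[]` selectors to choose which series to pull. The sketch below only builds such a URL. The host name and the selectors are hypothetical, and in a real setup the same values would normally be written into the scrape configuration of the global Prometheus rather than assembled by hand.

```python
from typing import Sequence
from urllib.parse import urlencode


def federate_url(base_url: str, selectors: Sequence[str]) -> str:
    """Build the /federate URL that a global Prometheus would scrape."""
    # Each selector becomes one repeated match[] query parameter.
    query = urlencode([("match[]", s) for s in selectors])
    return f"{base_url}/federate?{query}"


# Hypothetical per-target Prometheus and series selectors.
url = federate_url("http://prometheus-target-1:9090",
                   ['{job="node"}', '{__name__=~"job:.*"}'])
print(url)
```

Every additional per-target Prometheus adds another endpoint like this one for the global instance to scrape, which fits the complexity complaint on the slide.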

Slide 13

Slide 13 text

What is Thanos
Reference: <https://improbable.io/games/blog/thanos-prometheus-at-scale>
- Uses sidecars to make multiple Prometheus instances look like a single one
Comment: This resembles the relationship between Vitess and MySQL.

Slide 14

Slide 14 text

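Distributed processing in Thanos
- The Querier acts as the "god" component and distributes the processing
Comment: This is the familiar "god" & "Sidecar" pattern. Scaling an existing product with "god" & "Sidecar" is common in the Kubernetes world.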
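
The fan-out idea can be illustrated with a toy model. This is not how Thanos is implemented (the real Querier talks to sidecars over gRPC and its deduplication is more involved). It only shows the shape of "ask every store, then merge". All type names, the in-memory `Store` dictionaries, and the `replica` label name are assumptions made for this sketch.

```python
from typing import Dict, FrozenSet, List, Tuple

Labels = FrozenSet[Tuple[str, str]]
Sample = Tuple[int, float]            # (timestamp, value)
Store = Dict[Labels, List[Sample]]    # what one sidecar exposes, in this toy model


def fan_out_query(stores: List[Store], replica_label: str = "replica") -> Store:
    """Ask every store, then merge series that differ only by the replica label."""
    merged: Dict[Labels, Dict[int, float]] = {}
    for store in stores:
        for labels, samples in store.items():
            # Drop the replica label so redundant pairs collapse into one series.
            key = frozenset(kv for kv in labels if kv[0] != replica_label)
            by_ts = merged.setdefault(key, {})
            for ts, value in samples:
                by_ts.setdefault(ts, value)  # the first store seen wins per timestamp
    return {k: sorted(v.items()) for k, v in merged.items()}


# Two hypothetical sidecars holding overlapping data for the same series.
a = {frozenset({("__name__", "up"), ("job", "node"), ("replica", "a")}): [(1, 1.0), (2, 1.0)]}
b = {frozenset({("__name__", "up"), ("job", "node"), ("replica", "b")}): [(2, 1.0), (3, 0.0)]}
result = fan_out_query([a, b])
```

Here `result` holds a single `up{job="node"}` series covering timestamps 1 to 3, which is the "many Prometheus instances look like one" effect.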

Slide 15

Slide 15 text

Long-term storage in Thanos
- The Sidecar writes the data to S3 or GCS

Slide 16

Slide 16 text

What is Cortex?
Logo URL: <https://github.com/cortexproject/cortex>
- CNCF sandbox project
- Features
  - Horizontally scalable
  - Highly Available
  - Long-term storage
  - Multi-tenant
Reference: <https://sched.co/GrXL>

Slide 17

Slide 17 text

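Architecture
Reference: <https://github.com/cortexproject/cortex/blob/master/docs/architecture.md>
Diagram label: Prometheus Remote write API
Comment: It does not try to scale Prometheus itself. It looks more like a query execution engine attached to the external storage that Prometheus writes to.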

Slide 18

Slide 18 text

What is M3
Reference: <https://github.com/m3db/m3>
- Developed by Uber
- Distributed time-series DB

Slide 19

Slide 19 text

M3 usage at Uber
Reference: <https://sched.co/Gsxn>

Slide 20

Slide 20 text

M3 usage at Uber
Reference: <https://sched.co/Gsxn>

Slide 21

Slide 21 text

Architecture
Reference: <https://eng.uber.com/m3/>
Comment: This one is also a sidecar to Prometheus.

Slide 22

Slide 22 text

Architecture
Reference: <https://eng.uber.com/m3/>
- Aggregates and compresses the data according to rules stored in etcd, then writes it
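
As a rough picture of rule-driven aggregation, the sketch below averages raw samples into fixed time windows. It is a toy model under stated assumptions, not M3's actual rule format or aggregation code: the `RULES` table, the metric name, and the choice of averaging are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (unix timestamp in seconds, value)


def downsample(samples: List[Sample], resolution_s: int) -> List[Sample]:
    """Average samples into fixed windows of resolution_s seconds."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in samples:
        # Align each timestamp to the start of its window.
        buckets[ts - ts % resolution_s].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


# Hypothetical rule table; in the setup on the slide such rules live in etcd.
RULES = {"cpu_usage": 60}  # metric name -> resolution in seconds

raw = [(0, 1.0), (10, 3.0), (60, 5.0), (70, 7.0), (110, 9.0)]
print(downsample(raw, RULES["cpu_usage"]))  # [(0, 2.0), (60, 7.0)]
```

Five raw samples shrink to two stored points, which is where the space saving for long retention periods comes from.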

Slide 23

Slide 23 text

[Reference] Keynotes that drew a lot of excitement
Speakers: Julia Evans (Stripe), Melanie Cebula (Airbnb), Matt Schallert (Uber), Celina Ward (Uber)
Sessions: Day 2 morning (https://sched.co/GsxY), Day 1 evening (https://sched.co/GsxA), Day 3 morning (https://sched.co/Gsxn), Day 1 morning (https://www.cncf.io/phippy/)
Phippy (Simple PHP app) & Friends

Slide 24

Slide 24 text

[Reference] Where to get the Phippy & Friends images
Reference: <https://github.com/cncf/artwork/tree/master/other/phippy-and-friends>