Prometheusの今後.pdf

Slide 1

Slide 1 text

The Future of Prometheus as Seen at KubeCon + CNCon
Cloud Native Meetup Tokyo #6 KubeCon + CNCon Recap @CyberAgent
@yosshi_

Slide 2

Slide 2 text

About me
- Shota Yoshimura (吉村翔太, @yosshi_)
- NTT Communications
- Data science team
- Infrastructure engineer / data engineering
- Kubernetes, Kafka, Hadoop, etc.
- Community activity: "Cloud Native Developers JP"

Slide 3

Slide 3 text

What is Prometheus?
Reference: <https://prometheus.io/docs/introduction/overview/>
Reportedly modeled on Borgmon, the monitoring tool used at Google.
Impression: it is good at handling metrics.

Slide 4

Slide 4 text

Architecture
Reference: <https://prometheus.io/docs/introduction/overview/>

Slide 5

Slide 5 text

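Challenges with Prometheus
- No built-in long-term storage
- Extending the retention period degrades performance
- As a time-series DB, it is not well suited to computing statistics
- No redundant configuration by default
- On its own, the only option is something like running two active Prometheus servers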

Slide 6

Slide 6 text

Remote storage integrations
- Third-party storage can be used
Reference: <https://prometheus.io/docs/prometheus/latest/storage/#remote-storage-integrations>
Long-term storage is handled on this side.

Slide 7

Slide 7 text

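Remote Endpoints and Storage
- List of currently supported integrations (as of January 2019)
Reference: <https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage>
Figure callout: made by Uber
Note: with remote read, PromQL is still evaluated by Prometheus itself, so do not expect too much from it.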

Slide 8

Slide 8 text

Friends of Prometheus: Thanos / Cortex / M3
Reference: <https://kccna18.sched.com/event/GrXX>
- Thanos: Improbable Worlds Limited
- Cortex: Weaveworks, Grafana Labs
- M3: Uber

Slide 9

Slide 9 text

What is Thanos?
Reference: <https://www.slideshare.net/BartomiejPotka/thanos-global-durable-prometheus-monitoring>
- Goals
  - Make it possible to query metrics across multiple Prometheus instances
  - Retain metric data without limit
  - Achieve both without sacrificing response time

Slide 10

Slide 10 text

Querier
Reference: <https://improbable.io/games/blog/thanos-prometheus-at-scale>
1. The Querier fetches time-series data from Prometheus via the Sidecar.
2. It evaluates the PromQL query, removing duplicates along the way.
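The deduplication step can be pictured with a small Python sketch. It assumes that redundant Prometheus servers scrape the same targets and differ only in one label, here given the hypothetical name `replica`. This is a simplified illustration of the idea, not the algorithm Thanos actually uses.

```python
from collections import defaultdict


def deduplicate(series_list, replica_label="replica"):
    """Merge series that differ only by the replica label.

    series_list: list of (labels_dict, [(timestamp, value), ...]).
    Returns a dict keyed by the label set without the replica label
    (as a sorted tuple), with samples sorted by timestamp.
    """
    merged = defaultdict(dict)
    for labels, samples in series_list:
        key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
        for ts, value in samples:
            # Keep the first value seen for a timestamp; later replicas only fill gaps.
            merged[key].setdefault(ts, value)
    return {key: sorted(points.items()) for key, points in merged.items()}


# Two replicas of the same series, each missing one sample.
replica_a = ({"job": "api", "replica": "a"}, [(10, 1.0), (20, 2.0)])
replica_b = ({"job": "api", "replica": "b"}, [(20, 2.0), (30, 3.0)])
result = deduplicate([replica_a, replica_b])
```

Here `result` holds a single series for `job="api"` with the samples at timestamps 10, 20, and 30, so a gap in one replica is covered by the other.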

Slide 11

Slide 11 text

Sidecar -> Object Storage
Reference: <https://improbable.io/games/blog/thanos-prometheus-at-scale>
- Once Prometheus has written data to disk, the Sidecar uploads it to object storage.

Slide 12

Slide 12 text

Store
Reference: <https://improbable.io/games/blog/thanos-prometheus-at-scale>
- Going through the Store lets the Querier treat data in object storage the same way as data it reaches through a Sidecar.

Slide 13

Slide 13 text

Compactor
Reference: <https://improbable.io/games/blog/thanos-prometheus-at-scale>
- The component that performs compaction and downsampling
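As a rough illustration of what downsampling means, the Python sketch below groups raw samples into fixed windows and keeps a few aggregates per window. The window size of 300 seconds and the choice of aggregates are assumptions made for this example, not a description of the Compactor's actual on-disk format.

```python
def downsample(samples, window):
    """Group (timestamp, value) samples into fixed windows and keep aggregates."""
    buckets = {}
    for ts, value in samples:
        start = ts - ts % window  # start of the window this sample falls into
        agg = buckets.get(start)
        if agg is None:
            buckets[start] = {"count": 1, "sum": value, "min": value, "max": value}
        else:
            agg["count"] += 1
            agg["sum"] += value
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
    return dict(sorted(buckets.items()))


# Five raw samples collapse into two 5-minute windows.
raw = [(0, 1.0), (60, 3.0), (120, 2.0), (300, 5.0), (360, 7.0)]
five_minute = downsample(raw, window=300)
```

Keeping count, sum, min, and max means that averages and extremes over long ranges can still be computed from far fewer points than the raw data.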

Slide 14

Slide 14 text

Ruler
Reference: <https://improbable.io/games/blog/thanos-prometheus-at-scale>
- For alerting
  - When Prometheus itself has failed
  - When you want rule-based alerts that span multiple Prometheus instances

Slide 15

Slide 15 text

How to add Thanos to Prometheus
Reference: <https://improbable.io/games/blog/thanos-prometheus-at-scale>
1. Add a Thanos Sidecar to each Prometheus server
2. Deploy Thanos Queriers
At this point, multiple Prometheus instances can be queried together.

Slide 16

Slide 16 text

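How to add Thanos to Prometheus
Reference: <https://improbable.io/games/blog/thanos-prometheus-at-scale>
1. GCS
2. Deploy Thanos Queriers
3. Create a bucket (S3 or similar) and set it as the Sidecar's upload destination
4. Deploy the Store Gateway (data in the bucket also becomes queryable)
5. Deploy the Compactor (compaction and downsampling are carried out)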

Slide 17

Slide 17 text

Object storage support status
Reference: <https://github.com/improbable-eng/thanos/blob/master/docs/storage.md>
- As of January 2019, only GCS is marked Stable
Since I run on-premises, I will probably wait and see until S3 becomes Stable.

Slide 18

Slide 18 text

What is “Grafana Loki”?

Slide 19

Slide 19 text

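Name origins: a recap of the names
- Prometheus: a god in Greek mythology who stole fire from the heavens and gave it to humans. The name breaks down into "pro" (before, ahead) + "mētheus" (one who thinks), meaning "one with foresight" or "one who deliberates".
- Loki: a mischievous god in Norse mythology. A trickster figure who brings calamity into the world of the gods and then works to resolve it. The name means "one who closes" or "one who ends". One theory holds that he was originally a deification of fire.
- Thanos: a character, a supervillain, in comics published by Marvel Comics. ??
- Cortex: the outer part of an organ when the organ is structurally divided into layers, as in the adrenal cortex or the cerebral cortex. ??
- M3: ???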

Slide 20

Slide 20 text

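What is "Grafana Loki"?
Reference: <https://www.slideshare.net/BartomiejPotka/thanos-global-durable-prometheus-monitoring>
- Overview
  - A tool for collecting and visualizing logs
- Background
  - Metrics alone reveal only half of the full picture of an incident
  - Minimize the cost of switching between metrics and logs
- Approach
  - Store data as time series organized by labels, modeled on Prometheus
  - Achieve scalability, modeled on Cortex
- Non-goal
  - Retrieving statistics such as "top 10 users with highest 99th percentile latency"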
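The label-based approach can be sketched in Python as follows. The class `LogStore` and its methods are hypothetical names invented for this example. The point is only that streams are looked up by their label set and a time range, while the log lines themselves are not indexed.

```python
class LogStore:
    """Toy log store: one stream per label set, lines kept as opaque text."""

    def __init__(self):
        self.streams = {}  # frozenset of (label, value) pairs -> list of (timestamp, line)

    def push(self, labels, timestamp, line):
        key = frozenset(labels.items())
        self.streams.setdefault(key, []).append((timestamp, line))

    def query(self, matchers, start, end):
        """Return entries from streams whose labels include all matchers, within [start, end)."""
        wanted = set(matchers.items())
        results = []
        for key, entries in self.streams.items():
            if wanted <= key:  # only labels are matched; line contents are never inspected
                results.extend((ts, line) for ts, line in entries if start <= ts < end)
        return sorted(results)


store = LogStore()
store.push({"app": "api", "env": "prod"}, 1, "GET /health 200")
store.push({"app": "api", "env": "prod"}, 2, "GET /users 500")
store.push({"app": "db", "env": "prod"}, 2, "checkpoint complete")
hits = store.query({"app": "api"}, start=0, end=10)
```

Because the same labels can be attached to metrics and logs, a label selector used for a metrics query can be reused to pull the matching log streams. A store built this way has no efficient path for ranking or aggregating over line contents, which fits the stated non-goal.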

Slide 21

Slide 21 text

What does logging look like on Kubernetes?
Reference: <https://kubernetes.io/docs/tasks/debug-application-cluster/logging-elasticsearch-kibana/>

Slide 22

Slide 22 text

Demo
Reference: <https://grafana.com/video/loki_intro.mp4>

Slide 23

Slide 23 text

Architecture
Reference: <https://prometheus.io/docs/introduction/overview/>
Figure labels: deployed as a DaemonSet; deployed as a Deployment
Grafana v6.0 (planned for February 2019) will officially support Loki.

Slide 24

Slide 24 text

Architecture
Reference: <https://grafana.com/blog/2018/12/12/loki-prometheus-inspired-open-source-logging-for-cloud-natives/>
Figure legend: write path; read path

Slide 25

Slide 25 text

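Distributor
Reference: <https://grafana.com/blog/2018/12/12/loki-prometheus-inspired-open-source-logging-for-cloud-natives/>
First point of receipt for logs sent by Promtail.
Decides which Ingester to use from the hash of the log's labels.
Assigns each stream to n Ingesters for redundancy (default 3).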
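A minimal Python sketch of this assignment is shown below. The function name `assign_ingesters`, the use of SHA-256, and the plain modulo placement are assumptions made for illustration. A real hash ring is more elaborate, for example so that adding an Ingester moves only a small share of the streams.

```python
import hashlib


def assign_ingesters(labels, ingesters, replication_factor=3):
    """Pick up to `replication_factor` distinct ingesters for a label set."""
    # Sort the labels so the same label set always hashes to the same value.
    canonical = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(ingesters)
    count = min(replication_factor, len(ingesters))
    # Take consecutive ingesters starting from the hashed position.
    return [ingesters[(start + i) % len(ingesters)] for i in range(count)]


ingesters = [f"ingester-{i}" for i in range(5)]
chosen = assign_ingesters({"app": "api", "env": "prod"}, ingesters)
```

Every line of a given stream lands on the same set of Ingesters, so each one can append to a single chunk per stream, and losing one Ingester still leaves other copies.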

Slide 26

Slide 26 text

Ingester (1/2)
Reference: <https://grafana.com/blog/2018/12/12/loki-prometheus-inspired-open-source-logging-for-cloud-natives/>
Appends incoming logs while compressing them.

Slide 27

Slide 27 text

Ingester (2/2)
Reference: <https://grafana.com/blog/2018/12/12/loki-prometheus-inspired-open-source-logging-for-cloud-natives/>
Figure labels: Cassandra, Bigtable, DynamoDB; object storage
Once full, the data is written out.
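The two Ingester slides together describe an append-and-flush cycle, sketched below in Python. The class name `Ingester`, the 32-byte threshold, the use of zlib, and the list that stands in for object storage are all assumptions for the example. The actual chunk format and size limits are not given in the slides.

```python
import zlib


class Ingester:
    """Toy ingester: compress lines as they arrive, flush the chunk when it is full."""

    def __init__(self, max_chunk_bytes, store):
        self.max_chunk_bytes = max_chunk_bytes
        self.store = store  # a list standing in for object storage
        self._new_chunk()

    def _new_chunk(self):
        self._compressor = zlib.compressobj()
        self._buffer = bytearray()
        self._raw_bytes = 0

    def append(self, line):
        raw = line.encode("utf-8") + b"\n"
        # Compress incrementally instead of holding the raw text in memory.
        self._buffer += self._compressor.compress(raw)
        self._raw_bytes += len(raw)
        if self._raw_bytes >= self.max_chunk_bytes:
            self.flush()

    def flush(self):
        if self._raw_bytes == 0:
            return
        self._buffer += self._compressor.flush()  # finish the zlib stream
        self.store.append(bytes(self._buffer))
        self._new_chunk()


store = []
ingester = Ingester(max_chunk_bytes=32, store=store)
for i in range(5):
    ingester.append(f"line {i} of the log")
ingester.flush()  # write out whatever is still buffered
first_chunk = zlib.decompress(store[0]).decode("utf-8")
```

Each stored chunk is a complete compressed stream, so it can be read back on its own without any state from the Ingester.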

Slide 28

Slide 28 text

Querier
Reference: <https://grafana.com/blog/2018/12/12/loki-prometheus-inspired-open-source-logging-for-cloud-natives/>
Figure labels: Cassandra, Bigtable, DynamoDB; object storage; data still held in the Ingester
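Since the Querier reads both from storage and from data still sitting in the Ingesters, its results have to be merged. The Python sketch below assumes both sources return entries already sorted by timestamp and that replicated entries are exact duplicates. The function name `merge_entries` is hypothetical.

```python
import heapq


def merge_entries(stored, in_memory):
    """Merge two sorted lists of (timestamp, line), dropping exact duplicates."""
    merged = []
    for entry in heapq.merge(stored, in_memory):
        # Identical entries end up adjacent after the merge, so comparing
        # against the last kept entry is enough to drop them.
        if not merged or merged[-1] != entry:
            merged.append(entry)
    return merged


stored = [(1, "started"), (2, "request received")]
in_memory = [(2, "request received"), (3, "request served")]
combined = merge_entries(stored, in_memory)
```

With these inputs, `combined` contains the three distinct entries in timestamp order.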

Slide 29

Slide 29 text

Loki is still alpha
"Loki is very much alpha software and should not be used in production environments."
Reference: <https://grafana.com/blog/2018/12/12/loki-prometheus-inspired-open-source-logging-for-cloud-natives/>
- Grafana v6.0 is due in February 2019, with official Loki support planned

Slide 30

Slide 30 text

Where monitoring (incident investigation) is headed

Slide 31

Slide 31 text

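The future of long-term storage
- A time-series DB is good at analysis over a specified time range
- For long-term storage and for gathering statistics from it, use a separate product alongside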

Slide 32

Slide 32 text

Takeaways from KubeCon (1/2)
- Cortex - Infinitely Scalable Prometheus
  - URL: https://sched.co/GrXL
- Adopting Prometheus the Hard Way
  - URL: https://sched.co/GrXX
- Large Scale Automated Storage with Kubernetes
  - URL: https://sched.co/Gsxn
- Intro: Prometheus
  - URL: https://sched.co/GrXX

Slide 33

Slide 33 text

Takeaways from KubeCon (2/2)
- Deep Dive: Prometheus
  - URL: https://sched.co/H8nM
- Loki
  - URL: https://sched.co/GrXC

Slide 34

Slide 34 text

[Reference] Keynotes that drew a big response
Speakers: Julia Evans (Stripe), Melanie Cebula (Airbnb), Matt Schallert (Uber), Celina Ward (Uber)
Sessions: Day 2 morning (https://sched.co/GsxY), Day 1 evening (https://sched.co/GsxA), Day 3 morning (https://sched.co/Gsxn), Day 1 morning (https://www.cncf.io/phippy/)
Phippy (Simple PHP app) & Friends