Trying Horizontal Data Partitioning in Prometheus / Let's split Prometheus data

by watawuwu

Slide 1

Trying Data Partitioning in Prometheus
Prometheus Meetup Tokyo #3

Slide 2

Profile
name: Wataru Matsui
org: zlab.co.jp
twitter: @watawuwu

Slide 3

Agenda
● Motivation
● Non-goal
● How to respond to increasing memory and storage usage
● How to scale out
● Configuration
● Browse and Alerts
● Issue

Slide 4

Motivation

Slide 5

Address growing data

Slide 6

Non-goal

Slide 7

× High availability
○ Data redundancy
× Long-term storage

Slide 8

How to respond to increase in memory and storage usage

Slide 9

● Reduce data retention
● Lengthen the scrape interval
● Drop unnecessary metrics
● Scale up
● Scale out
● Remote Write/Storage

Slide 10

How to scale out without remote storage

Slide 11

Prometheus is easy to scale out
[Figure: several Pods scraped by two Prometheus instances]

Slide 12

Configuration

Slide 13

A. Per scrape rule
● For popular settings in Kubernetes
  ○ Container metrics (cAdvisor)
  ○ Node metrics
  ○ Application metrics
[Figure: cAdvisor / Node / Application]

Slide 14

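A. Per scrape rule
● Application metrics can easily be split into multiple scrape rules
[Figure: App A / App B]

- job_name: 'app-xxx'
  kubernetes_sd_configs:
    - role: endpoints
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_annotation_app_xxx_scrape]
      # Assumes the annotation value is "true". Without a regex, keep uses the
      # default (.*), which also matches an empty value and keeps every target.
      regex: "true"
      action: keep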
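Why the `regex` matters for `keep`: as far as I know, Prometheus joins the `source_labels` values with `;`, treats a missing label as the empty string, and matches the result against a regex that is anchored at both ends and defaults to `(.*)`. The Python sketch below models one `keep` step under those assumptions (Python's `re` stands in for Prometheus's RE2; they agree on these simple patterns). The helper `relabel_keep` and the target addresses are hypothetical.

```python
import re

# Hypothetical model of a single `action: keep` relabel step.
# Assumptions: values are joined with `separator` (default ";"), a missing
# label counts as "", and the regex is fully anchored with default "(.*)".
def relabel_keep(labels, source_labels, regex="(.*)", separator=";"):
    value = separator.join(labels.get(name, "") for name in source_labels)
    # fullmatch mirrors Prometheus anchoring the regex at both ends
    return re.fullmatch(regex, value) is not None


annotation = "__meta_kubernetes_service_annotation_app_xxx_scrape"
targets = [
    {"__address__": "10.0.0.1:8080", annotation: "true"},  # annotated service
    {"__address__": "10.0.0.2:8080"},                      # not annotated
]

# Default regex "(.*)" also matches "", so every target is kept
kept_default = [t for t in targets if relabel_keep(t, [annotation])]
# With regex "true", only the annotated service survives
kept_true = [t for t in targets if relabel_keep(t, [annotation], regex="true")]

print(len(kept_default), len(kept_true))  # prints "2 1"
```

Giving each application its own annotation and job in this way lets one Prometheus keep App A's endpoints while another keeps App B's.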

Slide 15

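B. Per metric (not per time series)
● Same scrape target, but each Prometheus decides which metrics to ingest by metric name

metric_relabel_configs:
  - source_labels: [__name__]
    action: drop
    # The regex is anchored: 'container_fs' would drop only a metric named exactly
    # container_fs, so 'container_fs_.*' is needed to drop the container_fs_* family.
    regex: 'container_fs_.*'

[Figure: cAdvisor / cAdvisor]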
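`metric_relabel_configs` runs after the scrape, so every Prometheus still fetches the full cAdvisor response; the split only reduces what each one ingests and stores. The sketch below models a `drop` step on `__name__`, assuming the fully anchored matching Prometheus applies to relabel regexes; the helper `should_drop` is hypothetical and the metric names are sample cAdvisor names.

```python
import re

# Hypothetical model of one `action: drop` step in metric_relabel_configs.
# Prometheus anchors relabel regexes (^(?:...)$), so re.fullmatch is used.
def should_drop(metric_name, regex):
    return re.fullmatch(regex, metric_name) is not None


scraped = [
    "container_fs_usage_bytes",
    "container_fs_reads_total",
    "container_cpu_usage_seconds_total",
]

exact = [m for m in scraped if not should_drop(m, "container_fs")]
prefix = [m for m in scraped if not should_drop(m, "container_fs_.*")]

print(exact)   # all three remain: no metric is named exactly "container_fs"
print(prefix)  # ['container_cpu_usage_seconds_total']
```

A second Prometheus scraping the same target would use the complementary `keep` rule with the same regex, so that between them every metric is stored exactly once.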

Slide 16

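C. Per label hash
● Assign targets to shards by a hash of a label (here __address__)

# ${shard_total} and ${shard_num} are placeholders filled in per Prometheus
# instance before the config is loaded; Prometheus does not expand them itself.
- source_labels: [__address__]
  modulus: ${shard_total}
  target_label: __tmp_hash
  action: hashmod
- source_labels: [__tmp_hash]
  regex: ${shard_num}
  action: keep

[Figure: two cAdvisor targets (addr: 10.26.80.18, addr: 10.26.80.19) and two Prometheus shards (shard_num: 0, shard_num: 1), shard_total: 2]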
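To see which shard ends up owning a target, the sketch below reproduces the two relabel steps in Python. It assumes `hashmod` works the way I understand Prometheus's relabel code does: MD5 of the joined source label values, the last 8 bytes read as a big-endian unsigned 64-bit integer, then modulo `modulus`, with the result written into `__tmp_hash` as a string. The helper names `hashmod` and `kept_by` are hypothetical.

```python
import hashlib

# Sketch of the `hashmod` action (assumed behaviour, see lead-in):
# MD5 -> last 8 bytes as big-endian uint64 -> modulo, stored as a string.
def hashmod(value, modulus):
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return str(int.from_bytes(digest[8:], "big") % modulus)


def kept_by(shard_num, address, shard_total):
    # The `keep` step on the Prometheus rendered with this shard_num:
    # the target survives only if __tmp_hash equals shard_num.
    return hashmod(address, shard_total) == str(shard_num)


shard_total = 2
# Addresses from the slide; real __address__ values usually include a port
addresses = ["10.26.80.18", "10.26.80.19"]

for addr in addresses:
    owners = [n for n in range(shard_total) if kept_by(n, addr, shard_total)]
    print(addr, "-> kept by shard", owners)
```

Each target is kept by exactly one shard, because `__tmp_hash` takes exactly one value in `0 .. shard_total-1`. The balance is only statistical, though: with just two targets, nothing guarantees they land on different shards.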

Slide 17

Browse and Alerts

Slide 18

A. Aggregate using the Remote Read API

remote_read:
  - url: http://prometheus-01:9090/api/v1/read
    read_recent: true
  - url: http://prometheus-02:9090/api/v1/read
    read_recent: true
  - url: http://prometheus-03:9090/api/v1/read
    read_recent: true
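`read_recent: true` is needed here because, by default, Prometheus skips remote reads for time ranges its own local storage should fully cover. The aggregating Prometheus holds none of these series locally, so without it recent data would be missing. Once the shards respond, series with identical label sets are merged and samples at the same timestamp are deduplicated. The Python sketch below is a deliberately simplified, hypothetical model of that merge (not Prometheus's actual code); `merge_shard_results`, `lbls`, and the sample data are made up for illustration.

```python
from collections import defaultdict

# Simplified, hypothetical model of merging remote_read results:
# series with identical label sets are combined, and duplicate
# timestamps keep the first value seen.
def merge_shard_results(*shard_results):
    merged = defaultdict(dict)  # label set -> {timestamp: value}
    for result in shard_results:
        for labels, samples in result.items():
            for ts, value in samples:
                merged[labels].setdefault(ts, value)
    return {labels: sorted(points.items()) for labels, points in merged.items()}


def lbls(**kv):
    # hashable, order-independent label set
    return frozenset(kv.items())


shard0 = {lbls(__name__="up", instance="10.26.80.18"): [(1000, 1.0), (2000, 1.0)]}
shard1 = {lbls(__name__="up", instance="10.26.80.19"): [(1000, 0.0)]}
# a redundant copy of shard 0's series (the "data redundancy" case)
shard0_copy = {lbls(__name__="up", instance="10.26.80.18"): [(2000, 1.0), (3000, 1.0)]}

result = merge_shard_results(shard0, shard1, shard0_copy)
print(len(result))  # prints 2: one series per instance
```

Series split across shards by hash never overlap, so for them the merge is a plain union. Deduplication only matters when the same series is stored redundantly with exactly the same labels.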

Slide 19

B. Aggregate using Thanos
[Figure: Thanos Querier and Ruler on top of three Sidecars, one per Prometheus]

Slide 20

Issue

Slide 21

× No autoscaling
× Redundancy makes the setup complicated
× No resharding or rebalancing
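One way to see why hashmod sharding resists resharding: changing `shard_total` changes the modulus, so most targets move to a different shard, and the history already stored for a moved target stays behind in the old shard's local TSDB. The Python sketch below counts moved targets when going from 2 to 3 shards. It assumes the same MD5-based `hashmod` model as in the per-label-hash section and uses made-up target addresses.

```python
import hashlib

# Assumed model of Prometheus's hashmod: MD5 -> last 8 bytes -> modulo.
def hashmod(value, modulus):
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[8:], "big") % modulus


def moved_fraction(addresses, old_total, new_total):
    # Share of targets whose shard number changes after resharding
    moved = sum(1 for a in addresses if hashmod(a, old_total) != hashmod(a, new_total))
    return moved / len(addresses)


# Made-up targets 10.26.0.0:9100 ... 10.26.3.231:9100
addresses = [f"10.26.{i // 256}.{i % 256}:9100" for i in range(1000)]
print(f"{moved_fraction(addresses, 2, 3):.0%} of targets change shard")
```

If the hash is uniform, a target stays put only when `h % 2 == h % 3`, which holds for 2 of the 6 residues of `h % 6`. So roughly two thirds of the targets are expected to move.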

Slide 22

Thanks!