Slide 1

Slide 1 text

Prometheus Introduction R&D Sharing

Slide 2

Slide 2 text

What is Prometheus?
• Time series database
• Pull-based
• A multi-dimensional data model
• Alert system (with Alertmanager)
• Metrics collection & storage
• Metrics, not logging, not tracing
• Dashboarding / graphing / trending
• Focus on OS monitoring
• A flexible query language (PromQL) to leverage this dimensionality

Slide 3

Slide 3 text

Components
The Prometheus ecosystem consists of multiple components, many of which are optional:
• the main Prometheus server, which scrapes and stores time series data
• client libraries for instrumenting application code
• a push gateway for supporting short-lived jobs
• special-purpose exporters for services like HAProxy, StatsD, Graphite, etc.
• an Alertmanager to handle alerts
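
To make the pull model concrete, here is a minimal sketch of what an instrumented application exposes for the Prometheus server to scrape. It uses only the Python standard library instead of an official client library; `REQUESTS_TOTAL`, `render_metrics`, `MetricsHandler`, `serve`, and port 8000 are hypothetical names chosen for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter state; a real application would normally
# use an official Prometheus client library instead of a plain dict.
REQUESTS_TOTAL = {("200", "/status"): 8}


def render_metrics(samples):
    """Render counter samples in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total number of HTTP requests made.",
        "# TYPE http_requests_total counter",
    ]
    for (code, path), value in sorted(samples.items()):
        lines.append(f'http_requests_total{{code="{code}",path="{path}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current metric values on GET /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS_TOTAL).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrapes on the given (assumed) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint; the server side then only needs the target's address, since it fetches `/metrics` on its own schedule.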

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Short-Lived Job

Slide 6

Slide 6 text

Short-Lived Job / Long-Lived Job

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

Data Model (Label)
# HELP http_requests_total Total number of HTTP requests made.
# TYPE http_requests_total counter
http_requests_total{code="200",path="/status"} 8
Metric name: http_requests_total
Labels: code="200", path="/status"
Value: 8
Metric name regexp: [a-zA-Z_:][a-zA-Z0-9_:]*
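
The naming rule and the sample line on this slide can be checked with a short sketch. The helper names `is_valid_metric_name` and `parse_sample` are hypothetical, and the parser is deliberately simplified: it assumes no escaped quotes inside label values and no trailing timestamp.

```python
import re

# Metric name rule quoted on the slide.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
# Simplified sample line: name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$")
# Label pairs of the form name="value" (no escaped quotes handled).
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def is_valid_metric_name(name):
    """Return True if the whole string matches the metric name rule."""
    return METRIC_NAME_RE.fullmatch(name) is not None


def parse_sample(line):
    """Split one exposition line into (metric name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    name, raw_labels, value = match.groups()
    labels = dict(LABEL_RE.findall(raw_labels or ""))
    return name, labels, float(value)


name, labels, value = parse_sample('http_requests_total{code="200",path="/status"} 8')
print(name, labels, value)
```

Each distinct combination of metric name and label values identifies its own time series, which is what makes the data model multi-dimensional.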

Slide 12

Slide 12 text

Metric types
• Counter: A counter is a cumulative metric that represents a single monotonically increasing counter.
• Gauge: A gauge is a metric that represents a single numerical value that can arbitrarily go up and down.
• Histogram: A histogram samples observations and counts them in configurable buckets.
• Summary: Similar to a histogram, a summary samples observations; it also calculates configurable quantiles over a sliding time window.
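
The difference between the first three types can be sketched in a few lines. This is an illustration of the semantics only, not the API of any client library; the class and method names are assumptions, and summaries (client-side quantiles) are left out.

```python
import bisect


class Counter:
    """Cumulative value that may only increase."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """Single value that can go up and down arbitrarily."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount


class Histogram:
    """Counts observations in buckets with configurable upper bounds."""

    def __init__(self, upper_bounds):
        self.bounds = sorted(upper_bounds)
        # One slot per bound plus a final slot for the implicit +Inf bucket.
        self.bucket_counts = [0] * (len(self.bounds) + 1)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        # bisect_left finds the first bound that is >= value ("le" semantics).
        index = bisect.bisect_left(self.bounds, value)
        self.bucket_counts[index] += 1
        self.sum += value
        self.count += 1

    def cumulative_buckets(self):
        """Return (upper bound, cumulative count) pairs, ending with +Inf."""
        result, running = [], 0
        for bound, count in zip(self.bounds + [float("inf")], self.bucket_counts):
            running += count
            result.append((bound, running))
        return result
```

Buckets are reported cumulatively, so the count for the +Inf bucket always equals the total number of observations.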

Slide 13

Slide 13 text

Jobs and instances
• job: The configured job name that the target belongs to.
• instance: The <host>:<port> part of the target's URL that was scraped.
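
As a rough illustration of how these two labels relate to a scrape target, the sketch below derives them from a job name and a target URL. The function name `target_labels` and the example job and address are hypothetical; the sketch assumes the URL carries an explicit port.

```python
from urllib.parse import urlsplit


def target_labels(job_name, scrape_url):
    """Build the job and instance labels attached to every scraped series."""
    parts = urlsplit(scrape_url)
    if parts.hostname is None or parts.port is None:
        raise ValueError("this sketch expects an explicit host and port")
    return {"job": job_name, "instance": f"{parts.hostname}:{parts.port}"}


print(target_labels("api-server", "http://1.2.3.4:5670/metrics"))
```

Several instances usually share one job name, so the job label groups replicas of the same service while the instance label tells them apart.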

Slide 14

Slide 14 text

Third-party exporters
There are a number of libraries and servers which help in exporting existing metrics from third-party systems as Prometheus metrics.
https://prometheus.io/docs/instrumenting/exporters/

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

Current ratio of HTTP errors per path, across all service instances?
sum by(path) (rate(http_requests_total{status="500"}[5m])) / sum by(path) (rate(http_requests_total[5m]))
{path="/status"} 0.0039
{path="/"} 0.0011
{path="/api/v1/topics/:topic"} 0.087
{path="/api/v1/topics"} 0.0342
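
The arithmetic behind this query can be sketched as follows. `simple_rate` and `error_ratio_by_path` are hypothetical helpers, and the rate is a simplification: it handles counter resets but divides by the time between the first and last sample, whereas PromQL's `rate()` also extrapolates to the edges of the window.

```python
def simple_rate(samples):
    """Per-second increase of a counter.

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        # A drop in value is treated as a counter reset (restart from zero).
        increase += current if current < previous else current - previous
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


def error_ratio_by_path(error_samples, total_samples):
    """Divide error rate by total rate for every path present on both sides."""
    ratios = {}
    for path in error_samples.keys() & total_samples.keys():
        total_rate = simple_rate(total_samples[path])
        if total_rate > 0:
            ratios[path] = simple_rate(error_samples[path]) / total_rate
    return ratios


# Made-up sample data covering a 5-minute window.
errors = {"/status": [(0, 0.0), (300, 150.0)]}
totals = {"/status": [(0, 0.0), (300, 1200.0)], "/": [(0, 0.0), (300, 600.0)]}
print(error_ratio_by_path(errors, totals))
```

As with the binary operator in the query, a path appears in the result only when both sides have a series for it.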

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

Prometheus UI

Slide 19

Slide 19 text

Grafana Dashboard
http://172.22.145.40:3000/?orgId=1

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

Is any disk about to fill up within 4 hours?
ALERT DiskWillFillIn4Hours
  IF predict_linear(node_filesystem_free[1h], 4*3600) < 0
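
The idea behind `predict_linear` is a least-squares line fitted through the samples in the range and evaluated some seconds into the future. The sketch below reimplements that idea under stated assumptions: the Python function names are hypothetical, and it treats the timestamp of the last sample as "now", which may differ from how the server picks its evaluation time.

```python
def predict_linear(samples, seconds_ahead):
    """Fit a least-squares line through (timestamp, value) samples and
    return the predicted value seconds_ahead after the last sample."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        raise ValueError("all samples share the same timestamp")
    slope = covariance / variance
    intercept = mean_v - slope * mean_t
    now = samples[-1][0]  # assumption: evaluate relative to the last sample
    return intercept + slope * (now + seconds_ahead)


def disk_will_fill(free_bytes_samples, horizon_seconds=4 * 3600):
    """Mirror the alert condition: predicted free space drops below zero."""
    return predict_linear(free_bytes_samples, horizon_seconds) < 0


# Made-up free-space samples over one hour, shrinking by 300 bytes per 30 min.
shrinking = [(0, 1000.0), (1800, 700.0), (3600, 400.0)]
print(disk_will_fill(shrinking))
```

Alerting on the trend rather than on a fixed threshold gives a warning that scales with how fast the disk is actually filling.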

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

Alertmanager
Alertmanager handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email, PagerDuty, or OpsGenie.
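
The three steps named here (deduplicate, group, route) can be sketched conceptually. This is not Alertmanager's implementation or configuration format; the alert dictionaries, function names, and receiver names are assumptions made for illustration.

```python
from collections import defaultdict


def deduplicate(alerts):
    """Keep one alert per identical label set."""
    unique = {}
    for alert in alerts:
        key = frozenset(alert["labels"].items())
        unique.setdefault(key, alert)
    return list(unique.values())


def group_alerts(alerts, group_by):
    """Bundle alerts that share the same values for the group_by labels."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


def route(labels, routes, default_receiver):
    """Return the receiver of the first route whose matchers all match."""
    for matchers, receiver in routes:
        if all(labels.get(name) == value for name, value in matchers.items()):
            return receiver
    return default_receiver


# Made-up alerts: the first two are identical and collapse into one.
alerts = [
    {"labels": {"alertname": "DiskWillFillIn4Hours", "instance": "a:9100", "severity": "page"}},
    {"labels": {"alertname": "DiskWillFillIn4Hours", "instance": "a:9100", "severity": "page"}},
    {"labels": {"alertname": "DiskWillFillIn4Hours", "instance": "b:9100", "severity": "page"}},
]
groups = group_alerts(deduplicate(alerts), ["alertname"])
receiver = route({"severity": "page"}, [({"severity": "page"}, "pagerduty")], "email")
print(len(groups), receiver)
```

Grouping means an outage that affects many instances produces one notification per group instead of one per instance.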

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

Storage (Local)
Prometheus includes a local on-disk time series database.

Slide 26

Slide 26 text

Storage (Remote)
Prometheus also optionally integrates with remote storage systems (v1.8).

Slide 27

Slide 27 text

Remote Storage HA

Slide 28

Slide 28 text

Management
• Prometheus Operator: Manages Prometheus on top of Kubernetes.
• Promgen: Web UI and configuration generator for Prometheus and Alertmanager.

Slide 29

Slide 29 text

What Prometheus does NOT do
• Raw log / event collection
• Durable long-term storage (remote storage can be used for this)
• Automatic horizontal scaling

Slide 30

Slide 30 text

Thank you for your attention!! Q & A