Prometheus Introduction

Kyle Bai

June 20, 2018

Transcript

  1. What is Prometheus?

    • Timeseries Database
    • Pull based
    • A multi-dimensional data model
    • Alert System (with AlertManager)
    • Metrics Collection & Storage
    • Metrics, not logging, not tracing
    • Dashboarding / Graphing / Trending
    • Focus on OS monitoring
    • A flexible query language (PromQL) to leverage this dimensionality
  2. Components

    The Prometheus ecosystem consists of multiple components, many of which are optional:
    • the main Prometheus server, which scrapes and stores time series data
    • client libraries for instrumenting application code
    • a push gateway for supporting short-lived jobs
    • special-purpose exporters for services like HAProxy, StatsD, Graphite, etc.
    • an alertmanager to handle alerts
  3. Data Model (Label)

    # HELP http_requests_total Total number of HTTP requests made.
    # TYPE http_requests_total counter
    http_requests_total{code="200",path="/status"} 8

    • Metric name: http_requests_total
    • Labels: code="200", path="/status"
    • Value: 8
    • Metric name regexp: [a-zA-Z_:][a-zA-Z0-9_:]*
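To make the data model concrete, here is a minimal Python sketch that validates a metric name against the regexp from the slide and renders one sample line. The helper names `is_valid_metric_name` and `format_sample` are hypothetical, and the sketch assumes label values need no escaping, which a real client library would have to handle.

```python
import re

# Metric-name pattern quoted on the slide.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def is_valid_metric_name(name):
    """Return True if the whole name matches the metric-name pattern."""
    return METRIC_NAME_RE.fullmatch(name) is not None


def format_sample(name, labels, value):
    """Render one sample as name{label="value",...} value.

    Hypothetical helper: label values are assumed to contain no quotes,
    backslashes, or newlines, so no escaping is performed.
    """
    if not is_valid_metric_name(name):
        raise ValueError(f"invalid metric name: {name!r}")
    label_part = ",".join(f'{key}="{val}"' for key, val in labels.items())
    return f"{name}{{{label_part}}} {value:g}"


print(format_sample("http_requests_total", {"code": "200", "path": "/status"}, 8))
# prints: http_requests_total{code="200",path="/status"} 8
```

Each distinct combination of label values identifies its own time series, which is why the label set is rendered as part of the sample's identity rather than as part of the value.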
  4. Metric types

    • Counter: A counter is a cumulative metric that represents a single monotonically increasing counter.
    • Gauge: A gauge is a metric that represents a single numerical value that can arbitrarily go up and down.
    • Histogram: A histogram samples observations and counts them in configurable buckets.
    • Summary: Similar to a histogram, a summary samples observations.
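The histogram type is the least obvious of the four, so a toy version may help. The sketch below is not a client library; the `Histogram` class and its method names are assumptions made for illustration. It counts observations into configurable buckets and reports them cumulatively, with each bucket counting every observation less than or equal to its upper bound.

```python
import bisect


class Histogram:
    """Toy histogram with cumulative "less than or equal" buckets."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One counter per finite bucket plus a final +Inf bucket.
        self.bucket_counts = [0] * (len(self.upper_bounds) + 1)
        self.total = 0.0
        self.count = 0

    def observe(self, value):
        # bisect_left finds the first bucket whose upper bound is >= value.
        index = bisect.bisect_left(self.upper_bounds, value)
        self.bucket_counts[index] += 1
        self.total += value
        self.count += 1

    def cumulative(self):
        """Return (upper_bound, cumulative_count) pairs, ending with +Inf."""
        running = 0
        result = []
        bounds = self.upper_bounds + [float("inf")]
        for bound, bucket_count in zip(bounds, self.bucket_counts):
            running += bucket_count
            result.append((bound, running))
        return result


latency = Histogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.3, 0.3, 2.0):
    latency.observe(seconds)
print(latency.cumulative())
# prints: [(0.1, 1), (0.5, 3), (1.0, 3), (inf, 4)]
```

A counter, by contrast, would only ever expose `count`-style values that go up, and a gauge would expose a single number that is overwritten on each update.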
  5. Jobs and instances

    • Job: The configured job name that the target belongs to.
    • Instance: The <host>:<port> part of the target's URL that was scraped.
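As a small illustration of the two definitions above, the following Python sketch derives the `job` and `instance` labels for a scraped target. The function name `target_labels`, the job name, and the URL are made up for the example, and the sketch assumes the URL carries an explicit port and no user credentials.

```python
from urllib.parse import urlsplit


def target_labels(job_name, target_url):
    """Return the job and instance labels for one scrape target.

    Assumption: the URL has the form scheme://host:port/path, so the
    network location is exactly the <host>:<port> part.
    """
    parts = urlsplit(target_url)
    return {"job": job_name, "instance": parts.netloc}


print(target_labels("node", "http://10.0.0.5:9100/metrics"))
# prints: {'job': 'node', 'instance': '10.0.0.5:9100'}
```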
  6. Third-party exporters

    There are a number of libraries and servers which help in exporting existing metrics from third-party systems as Prometheus metrics.
    https://prometheus.io/docs/instrumenting/exporters/
  7. Current percentage of HTTP errors across all service instances?

    sum by(path) (rate(http_requests_total{status="500"}[5m]))
      / sum by(path) (rate(http_requests_total[5m]))

    {path="/status"}               0.0039
    {path="/"}                     0.0011
    {path="/api/v1/topics/:topic"} 0.087
    {path="/api/v1/topics"}        0.0342
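The error-ratio query on this slide can be approximated in plain Python to show what each step does. This is a simplified sketch, not how Prometheus evaluates the query: `simple_rate` ignores counter resets and the window extrapolation that `rate()` performs, and the function names and sample data are invented for illustration.

```python
from collections import defaultdict


def simple_rate(samples):
    """Per-second increase between the first and last (timestamp, value) sample.

    Simplification: no counter-reset handling and no extrapolation.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def error_ratio_by_path(series):
    """series is a list of (labels, samples) pairs.

    Sums the rates per path (dropping every other label, like "sum by(path)"),
    then divides the status="500" sum by the overall sum. Paths with no
    error series are omitted, as they would have no match in the division.
    """
    errors = defaultdict(float)
    total = defaultdict(float)
    for labels, samples in series:
        per_second = simple_rate(samples)
        total[labels["path"]] += per_second
        if labels["status"] == "500":
            errors[labels["path"]] += per_second
    return {path: errors[path] / total[path] for path in errors if total[path] > 0}


# Made-up samples covering a 5-minute (300 s) window on two instances.
series = [
    ({"path": "/status", "status": "200", "instance": "a"}, [(0, 100), (300, 400)]),
    ({"path": "/status", "status": "500", "instance": "a"}, [(0, 0), (300, 3)]),
    ({"path": "/status", "status": "200", "instance": "b"}, [(0, 50), (300, 347)]),
    ({"path": "/", "status": "200", "instance": "a"}, [(0, 10), (300, 70)]),
]
print(error_ratio_by_path(series))
```

With this data only `/status` appears in the result, with a ratio of roughly 0.005, because `/` has no `status="500"` series. The values are fractions of requests, so multiply by 100 to read them as percentages.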
  8. Is any disk about to run full within 4 hours?

    ALERT DiskWillFillIn4Hours
      IF predict_linear(node_filesystem_free[1h], 4*3600) < 0
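The rule above uses the pre-2.0 `ALERT ... IF` rule syntax; Prometheus 2.0 and later define rules in YAML files instead. The idea behind `predict_linear` is a least-squares line fitted to the samples in the range and extended into the future. The Python sketch below illustrates that idea with made-up data; the function signature and the explicit `now` argument are assumptions for the example, not the PromQL API.

```python
def predict_linear(samples, now, horizon_seconds):
    """Fit a least-squares line to (timestamp, value) samples and return the
    predicted value horizon_seconds after `now`.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    # Shift timestamps so that x = 0 corresponds to `now`.
    xs = [t - now for t, _ in samples]
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    variance = sum((x - mean_x) ** 2 for x in xs)
    if variance == 0:
        raise ValueError("all samples share the same timestamp")
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = covariance / variance
    intercept = mean_y - slope * mean_x  # fitted value at `now`
    return intercept + slope * horizon_seconds


# Made-up data: free bytes shrinking by 10 per second over the last hour.
samples = [(t, 40000 - 10 * t) for t in range(0, 3601, 600)]
predicted = predict_linear(samples, now=3600, horizon_seconds=4 * 3600)
print(predicted < 0)
# prints: True
```

Here the fitted line starts at 4000 free bytes at `now` and loses 10 bytes per second, so the prediction four hours ahead is far below zero and the alert condition holds.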
  9. AlertManager

    It takes care of deduplicating, grouping, and routing alerts to the correct receiver integration such as email, PagerDuty, or OpsGenie.
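The three responsibilities named on this slide can be pictured with a toy model. The Python sketch below is not Alertmanager's implementation or configuration format; `group_alerts`, `pick_receiver`, the route table, and the alert data are all hypothetical and only show deduplication by label set, grouping by chosen labels, and first-match routing.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Drop alerts with an identical label set, then group the rest by the
    values of the labels listed in group_by.
    """
    seen = set()
    groups = defaultdict(list)
    for labels in alerts:
        fingerprint = frozenset(labels.items())
        if fingerprint in seen:
            continue  # exact duplicate of an alert already handled
        seen.add(fingerprint)
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


def pick_receiver(labels, routes, default_receiver):
    """Return the receiver of the first route whose matchers all equal the
    alert's labels, or the default receiver if none match.
    """
    for matchers, receiver in routes:
        if all(labels.get(name) == value for name, value in matchers.items()):
            return receiver
    return default_receiver


alerts = [
    {"alertname": "DiskWillFillIn4Hours", "instance": "a:9100", "severity": "page"},
    {"alertname": "DiskWillFillIn4Hours", "instance": "a:9100", "severity": "page"},
    {"alertname": "DiskWillFillIn4Hours", "instance": "b:9100", "severity": "page"},
    {"alertname": "HighErrorRate", "instance": "a:9100", "severity": "info"},
]
routes = [({"severity": "page"}, "pagerduty")]

for key, members in group_alerts(alerts, ["alertname"]).items():
    print(key, len(members), pick_receiver(members[0], routes, "email"))
# prints: ('DiskWillFillIn4Hours',) 2 pagerduty
# prints: ('HighErrorRate',) 1 email
```

Grouping means one notification can cover both affected instances instead of sending one message per alert.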
  10. Management

    • Prometheus Operator: Manages Prometheus on top of Kubernetes.
    • Promgen: Web UI and configuration generator for Prometheus and Alertmanager.
  11. What Prometheus does NOT do

    • Raw log / Event collection
    • Durable long-term storage (can use remote storage)
    • Automatic horizontal scaling