Prometheus Introduction

Kyle Bai

June 20, 2018

Transcript

  1. What is Prometheus?

    • Timeseries database
    • Pull based
    • A multi-dimensional data model
    • Alert system (with Alertmanager)
    • Metrics collection & storage
    • Metrics, not logging, not tracing
    • Dashboarding / graphing / trending
    • Focus on OS monitoring
    • A flexible query language (PromQL) to leverage this dimensionality
  2. Components

    The Prometheus ecosystem consists of multiple components, many of which are optional:
    • the main Prometheus server, which scrapes and stores time series data
    • client libraries for instrumenting application code
    • a push gateway for supporting short-lived jobs
    • special-purpose exporters for services like HAProxy, StatsD, Graphite, etc.
    • an Alertmanager to handle alerts
  3. Data Model (Label)

    # HELP http_requests_total Total number of HTTP requests made.
    # TYPE http_requests_total counter
    http_requests_total{code="200",path="/status"} 8

    • Metric name: http_requests_total
    • Labels: code="200", path="/status"
    • Value: 8
    • Metric name regexp: [a-zA-Z_:][a-zA-Z0-9_:]*
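To make the pieces of the sample line concrete, here is a minimal Python sketch that splits it into metric name, labels, and value, and checks the name against the regexp quoted on the slide. The function name `parse_sample` and the simplified patterns are assumptions made for illustration; this is not the official parser, and it ignores escaped quotes in label values, timestamps, and the `# HELP` / `# TYPE` comment lines.

```python
import re

# Metric-name pattern quoted on the slide.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
# Simplified sample line: name, optional {labels}, whitespace, value (assumption).
SAMPLE_RE = re.compile(r"^(?P<name>[^{\s]+)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$")
# Simplified label pair: key="value" without escaped quotes (assumption).
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line):
    """Split one sample line into (metric name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    name = match.group("name")
    if METRIC_NAME_RE.fullmatch(name) is None:
        raise ValueError(f"invalid metric name: {name!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return name, labels, float(match.group("value"))


name, labels, value = parse_sample('http_requests_total{code="200",path="/status"} 8')
print(name, labels, value)
```

Each distinct combination of metric name and label values identifies its own time series, which is what the "multi-dimensional" data model refers to.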
  4. Metric types

    • Counter: A counter is a cumulative metric that represents a single monotonically increasing counter.
    • Gauge: A gauge is a metric that represents a single numerical value that can arbitrarily go up and down.
    • Histogram: A histogram samples observations and counts them in configurable buckets.
    • Summary: Similar to a histogram, a summary samples observations.
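The difference between the first three metric types can be sketched in a few lines of plain Python. The classes below are toy stand-ins written for this illustration, not the API of any Prometheus client library; the summary type is left out. The histogram reports cumulative bucket counts, which is how Prometheus exposes buckets under the `le` ("less than or equal") label.

```python
import bisect


class Counter:
    """Cumulative value that can only increase."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """Single numerical value that can go up and down."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = float(value)


class Histogram:
    """Counts observations in configurable buckets."""

    def __init__(self, upper_bounds):
        # A final +Inf bucket catches every observation.
        self.upper_bounds = sorted(upper_bounds) + [float("inf")]
        self.bucket_counts = [0] * len(self.upper_bounds)
        self.count = 0
        self.sum = 0.0

    def observe(self, value):
        # Index of the first bucket whose upper bound is >= value.
        index = bisect.bisect_left(self.upper_bounds, value)
        self.bucket_counts[index] += 1
        self.count += 1
        self.sum += value

    def cumulative_buckets(self):
        """Return (upper bound, cumulative count) pairs."""
        running = 0
        result = []
        for bound, bucket_count in zip(self.upper_bounds, self.bucket_counts):
            running += bucket_count
            result.append((bound, running))
        return result


latency = Histogram([0.1, 0.5, 1.0])
for seconds in (0.05, 0.3, 0.5, 2.0):
    latency.observe(seconds)
print(latency.cumulative_buckets())
```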
  5. Jobs and instances

    • Job: The configured job name that the target belongs to.
    • Instance: The <host>:<port> part of the target's URL that was scraped.
  6. Third-party exporters

    There are a number of libraries and servers which help in exporting existing metrics from third-party systems as Prometheus metrics.
    https://prometheus.io/docs/instrumenting/exporters/
  7. Current percentage of HTTP errors across all service instances?

    sum by(path) (rate(http_requests_total{status="500"}[5m]))
      / sum by(path) (rate(http_requests_total[5m]))

    {path="/status"} 0.0039
    {path="/"} 0.0011
    {path="/api/v1/topics/:topic"} 0.087
    {path="/api/v1/topics"} 0.0342
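The arithmetic behind the error-ratio query can be sketched in Python. The input numbers, the instance names, and the function name `error_ratio_by_path` are invented for illustration; `rate()` is approximated here as "increase divided by window length", whereas the real function also handles counter resets and extrapolation.

```python
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # the [5m] range in the query

# Hypothetical counter increases over the window, keyed by (instance, path, status).
increases = {
    ("a:8080", "/status", "200"): 990,
    ("a:8080", "/status", "500"): 10,
    ("b:8080", "/status", "200"): 980,
    ("b:8080", "/status", "500"): 20,
    ("a:8080", "/", "200"): 500,
}


def error_ratio_by_path(increases, window_seconds=WINDOW_SECONDS):
    """Mimic sum by(path)(rate(errors)) / sum by(path)(rate(all))."""
    errors = defaultdict(float)
    total = defaultdict(float)
    for (_instance, path, status), increase in increases.items():
        per_second = increase / window_seconds  # simplified rate()
        total[path] += per_second               # sum by(path) over all statuses
        if status == "500":
            errors[path] += per_second          # sum by(path) over status="500"
    # Like PromQL vector matching, only paths present on both sides are returned.
    return {path: errors[path] / total[path] for path in errors if total[path] > 0}


print(error_ratio_by_path(increases))
```

In this made-up data, `/status` has 30 errors out of 2000 requests across both instances, so its ratio is 0.015, while `/` has no `status="500"` series and therefore does not appear in the result.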
  8. Is any disk about to run full within 4 hours?

    ALERT DiskWillFillIn4Hours
      IF predict_linear(node_filesystem_free[1h], 4*3600) < 0
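The idea behind `predict_linear` is a simple linear regression over the samples in the range, extrapolated a number of seconds into the future. The Python sketch below illustrates that idea under simplifying assumptions: it extrapolates from the last sample's timestamp rather than from the rule evaluation time, and the sample data (a disk losing 1 GB of free space every 10 minutes) is made up.

```python
def predict_linear(samples, seconds_ahead):
    """Least-squares line through (timestamp, value) samples, extrapolated.

    Returns the predicted value seconds_ahead after the last sample.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        raise ValueError("samples must have distinct timestamps")
    slope = covariance / variance
    intercept = mean_v - slope * mean_t
    last_t = samples[-1][0]
    return intercept + slope * (last_t + seconds_ahead)


# One hour of hypothetical free-space samples: 20 GB, shrinking 1 GB per 600 s.
samples = [(t, 20_000_000_000 - 1_000_000_000 * (t // 600)) for t in range(0, 3601, 600)]
predicted_free = predict_linear(samples, 4 * 3600)
print(predicted_free < 0)  # the alert condition from the slide
```

With this data the free space falls from 20 GB to 14 GB within the hour, so the fitted line crosses zero well inside the next four hours and the condition `< 0` holds.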
  9. Alertmanager

    Alertmanager takes care of deduplicating, grouping, and routing alerts to the correct receiver integration such as email, PagerDuty, or OpsGenie.
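As a loose illustration of the three responsibilities (deduplicating, grouping, routing), the Python sketch below processes a list of alerts represented as label dictionaries. The function `dispatch`, the route format, and the receiver names are assumptions invented for this example; the real Alertmanager uses a configurable routing tree, timers, silences, and inhibition rules that are not modelled here.

```python
from collections import defaultdict


def dispatch(alerts, routes, default_receiver, group_by):
    """Deduplicate alerts, pick a receiver per alert, and group the results.

    alerts: list of label dicts.
    routes: list of (matchers dict, receiver name); the first match wins.
    Returns {(receiver, group key): [alerts]}.
    """
    seen = set()
    groups = defaultdict(list)
    for labels in alerts:
        identity = frozenset(labels.items())  # same label set == same alert
        if identity in seen:
            continue  # deduplicate
        seen.add(identity)
        receiver = default_receiver
        for matchers, candidate in routes:
            if all(labels.get(key) == value for key, value in matchers.items()):
                receiver = candidate  # route
                break
        group_key = tuple(labels.get(name, "") for name in group_by)
        groups[(receiver, group_key)].append(labels)  # group
    return dict(groups)


disk_db1 = {"alertname": "DiskWillFillIn4Hours", "instance": "db1:9100", "severity": "page"}
disk_db2 = {"alertname": "DiskWillFillIn4Hours", "instance": "db2:9100", "severity": "page"}
errors_web1 = {"alertname": "HighErrorRate", "instance": "web1:8080", "severity": "warning"}

notifications = dispatch(
    alerts=[disk_db1, dict(disk_db1), disk_db2, errors_web1],
    routes=[({"severity": "page"}, "pagerduty")],
    default_receiver="email",
    group_by=["alertname"],
)
for (receiver, key), members in notifications.items():
    print(receiver, key, len(members))
```

Here the duplicate `db1` alert is dropped, both disk alerts end up in one PagerDuty notification group, and the warning falls through to the default email receiver.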
  10. Management

    • Prometheus Operator: Manages Prometheus on top of Kubernetes.
    • Promgen: Web UI and configuration generator for Prometheus and Alertmanager.
  11. What Prometheus does NOT do

    • Raw log / event collection
    • Durable long-term storage (can use remote storage)
    • Automatic horizontal scaling