Prometheus - A Whirlwind Tour

A presentation on Prometheus at OSCON 2017.

Cindy Sridharan

May 10, 2017

Transcript

  1. Prometheus A Whirlwind Tour Cindy Sridharan OSCON 2017 Austin, Texas

  2. @copyconstruct @copyconstruct @copyconstruct

  3. The Future?

  4. None
  5. None
  6. None
  7. OBSERVABILITY > TESTING

  8. Things testing cannot detect

  9. elasticity of the production environment

  10. unpredictability of inputs

  11. the vagaries of upstream and downstream dependencies

  12. Cloud native architectures need best in class observability

  13. None
  14. We cannot understand software unless we observe it

  15. Debugging must be viewed as the process by which systems are understood and improved, not merely as the process by which bugs are made to go away! - Bryan Cantrill

  16. OBSERVABILITY must also be viewed as the process by which systems are understood and improved, not merely as the process by which bugs are made to go away!

  17. OBSERVABILITY cannot be an afterthought

  18. Instrumentation should be a requirement for a PR to be merged

  19. OBSERVABILITY needs to be a part of system design and development

  20. But … what even is “observability” ?

  21. There are three pillars that make up a modern Observability stack

  22. Logging Tracing Metrics

  23. All three are examples of whitebox “monitoring”

  24. WHITEBOX: Observability data gathered from the internals of the target system. Capable of providing warning about a problem before it occurs. BLACKBOX: Observes external functionality as seen by an end user of the system. Helps detect when a problem is ongoing and contributing to external symptoms.
  25. None
  26. Blackbox methods test your Service Level Objectives

  27. None
  28. Whitebox methods monitor your Service Level Agreements

  29. None
  30. Different systems have different blackbox monitoring and whitebox instrumentation requirements, given their agreed-upon SLO and SLA

  31. Where does Prometheus fit in here?

  32. None
  33. None
  34. Prometheus

  35. Whitebox monitoring toolkit and a TSDB for metrics

  36. Monitoring Toolkit

  37. Client Instrumentation, Metrics Ingestion, Metrics Processing and Storage, Querying and Visualization, Analysis, Alerting

  38. Client instrumentation

  39. What even is a “metric”?

  40. A set of numbers that give information about a particular process or activity

  41. Metrics are usually measured over intervals of time — in other words, a time series
  42. None
  43. What metrics to collect?

  44. The Four Golden Signals Proposed by the SRE book

  45. Latency, Traffic, Errors, Saturation (proposed by the SRE book)

  46. USE method by Brendan Gregg

  47. Utilization: average time the resource is busy servicing work. Saturation: degree to which the resource has extra work which it can't service, often queued. Errors: count of error events. - Brendan Gregg

  48. RED method by Tom Wilkie

  49. How busy is my service? Request rate. Are there any errors in my service? Error rate. What is the latency in my service? Duration of requests. - Tom Wilkie
  50. None
  51. Prometheus has stateful client libraries in all major languages

  52. Server is agnostic to the type of metric

  53. The Prometheus client libraries support four types of metrics

  54. Counters, Gauges, Histograms, Summaries
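
These four types map directly onto the official client libraries. A minimal sketch using the Python client (prometheus_client); the metric names here are illustrative, not from the talk:

    from prometheus_client import Counter, Gauge, Histogram, Summary

    # Counter: a value that only ever goes up (e.g. requests served)
    REQUESTS = Counter('app_requests_total', 'Total requests handled')

    # Gauge: a value that can go up and down (e.g. current queue depth)
    QUEUE_DEPTH = Gauge('app_queue_depth', 'Current queue depth')

    # Histogram: observations counted into buckets (e.g. request latency)
    LATENCY = Histogram('app_request_latency_seconds', 'Request latency in seconds')

    # Summary: observations tracked as a running count and sum
    PAYLOAD = Summary('app_payload_bytes', 'Payload size of handled requests')

    REQUESTS.inc()
    QUEUE_DEPTH.set(42)
    LATENCY.observe(0.23)
    PAYLOAD.observe(512)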

  55. “Target” discovery happens via service discovery

  56. None
  57. Metrics ingestion

  58. None
  59. Pull over HTTP

  60. Does Pull scale?

  61. Prometheus isn’t an event-based system, nor is it like Nagios, which spawns a subprocess for each check while “pulling”

  62. Pull lowers risk of DDoSing your monitoring system

  63. Pull-based systems monitor whether a service is down (if a scrape fails) as a part of gathering metrics
  64. None
  65. None
  66. With statsd-type systems, the application sends a UDP message for every event it observes

  67. Monitoring traffic increases proportionally to user traffic, or whatever traffic is generating monitoring data

  68. Prometheus clients aggregate metrics in memory, which the Prometheus server scrapes at regular intervals
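
A rough sketch of that model with the Python client: values are kept in process memory and exposed over HTTP for the server to pull (the port and metric name are assumptions for illustration):

    import random
    import time

    from prometheus_client import Counter, start_http_server

    EVENTS = Counter('app_events_total', 'Events observed by this process')

    # Expose the in-process registry at http://localhost:8000/metrics;
    # the Prometheus server scrapes that endpoint on its own schedule.
    start_http_server(8000)

    while True:
        EVENTS.inc()                 # aggregation happens in process memory
        time.sleep(random.random())  # no per-event network traffic, unlike statsd
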
  69. None
  70. If you want to push, there’s a PUSHGATEWAY for short-lived jobs
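
For a short-lived job, the flow with the Python client might look roughly like this (the gateway address and job name are made-up placeholders):

    from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

    registry = CollectorRegistry()
    last_success = Gauge('batch_job_last_success_unixtime',
                         'Last time the batch job finished successfully',
                         registry=registry)
    last_success.set_to_current_time()

    # The short-lived job pushes its metrics once before exiting; the
    # Prometheus server then scrapes the Pushgateway like any other target.
    push_to_gateway('pushgateway.example.com:9091', job='nightly_batch',
                    registry=registry)
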
  71. EXPORTERS

  72. Exporters expose existing metrics from third-party systems as Prometheus metrics.

  73. JMX, SNMP, HAProxy, MySQL, Blackbox, cAdvisor, Node (system metrics)
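
When no off-the-shelf exporter exists, one common approach is a small custom collector that translates a third-party system's stats into Prometheus metric families on each scrape. A hedged sketch with the Python client; the queue-depth source and metric name are invented for illustration:

    import time

    from prometheus_client import start_http_server
    from prometheus_client.core import GaugeMetricFamily, REGISTRY

    class QueueExporter:
        """Translates stats from an external system into Prometheus metrics."""

        def collect(self):
            depth = 123  # in reality: query the third-party system here
            yield GaugeMetricFamily('thirdparty_queue_depth',
                                    'Queue depth reported by the external system',
                                    value=depth)

    REGISTRY.register(QueueExporter())
    start_http_server(9101)  # becomes a scrape target for the Prometheus server

    while True:
        time.sleep(60)  # keep the process alive so it can be scraped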

  74. STORAGE

  75. Single node, no clustering

  76. For HA, run 2 identical Prometheus servers

  77. None
  78. In Prometheus, a time series has an ID and a sample
  79. None
  80. An ID is a combination of both the metric name and its associated labels

  81. A sample is a combination of a millisecond-precision timestamp and a float64 value
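
Purely as a mental model (not a depiction of Prometheus internals), the ID and samples of one series could be pictured like this in Python terms:

    # The series ID: the metric name plus its label set.
    series_id = {
        '__name__': 'api_http_requests_total',
        'method': 'post',
        'endpoint': '/authenticate',
    }

    # Samples: (millisecond-precision timestamp, float64 value) pairs.
    samples = [
        (1494374400000, 1027.0),
        (1494374415000, 1043.0),
    ]
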
  82. Requirements of *any* TSDB? Effective queries and effective writes

  83. Write optimized. Requires parallel queries and aggregation for diverse query patterns during read time
  84. None
  85. None
  86. None
  87. None
  88. Write pattern is horizontal: a TSDB ingests potentially several time series from every target at specific intervals of time
  89. None
  90. None
  91. None
  92. None
  93. Reads are random: we read not entire rows or columns but sparse matrices

  94. Read optimized: write data in such a way that it is closely aligned for reads
  95. None
  96. None
  97. The time series are stored in a one-file-per-time-series format on disk
  98. None
  99. Incoming time series are stored in chunks in memory. Chunks are flushed to disk when they are full
  100. None
  101. Incomplete chunks are checkpointed to disk so as to be able to recover after a crash
  102. None
  103. All data required to evaluate a PromQL expression needs to be in memory. This data is also cached aggressively for future queries.
  104. None
  105. None
  106. None
  107. None
  108. Prometheus supports two types of rules, which may be configured and then evaluated at regular intervals: Recording rules and Alerting rules.

  109. The same chunk eviction policy applies while evaluating Alerting and Recording rules

  110. RECORDING RULES: Recording rules allow you to precompute frequently needed or computationally expensive expressions and save their result as a new set of time series

  111. RECORDING RULES: Querying the precomputed result will then often be much faster than executing the original expression every time it is needed

  112. RECORDING RULES: Come in handy while creating dashboards, where the same expression is evaluated every time a dashboard is refreshed

  113. ALERTING RULES: Allow defining alert conditions based on PromQL expressions and sending notifications about firing alerts to an external service.

  114. Drawbacks of V2 storage

  115. Single file per time series

  116. High resource utilization because of time-series churn

  117. Checkpointing to disk can take longer than acceptable

  118. Deletion of stale time-series is prohibitively expensive

  119. SQOF, a.k.a. Single Query of Failure

  120. None
  121. None
  122. None
  123. None
  124. None
  125. None
  126. None
  127. FEDERATION

  128. Federation allows a Prometheus server to scrape selected time series from another Prometheus server
  129. None
  130. CROSS-SERVICE FEDERATION

  131. A Prometheus server of one service is configured to scrape selected data from another service's Prometheus server to enable alerting and queries against both datasets within a single server
  132. None
  133. HIERARCHICAL FEDERATION

  134. The federation topology resembles a tree, with higher-level Prometheus servers collecting aggregated time series data from a larger number of subordinated servers
  135. None
  136. REMOTE STORAGE

  137. None
  138. None
  139. None
  140. Weave Cortex (DynamoDB + S3), Chronix (Solr), Vulcan (Kafka + Cassandra)

  141. VISUALIZATION

  142. None
  143. ANALYSIS

  144. PromQL is one of the defining features of Prometheus

  145. Labels > Hierarchy

  146. stats.timers.accounts.ios.http.post.authenticate.response_time.upper_95

  147. { resource=accounts, method=post, protocol=http, user_agent=ios, endpoint=/authenticate, name=response_time, }

  148. Better exploration because of dimensional queries
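
On the instrumentation side, those dimensions are just labels declared on the metric. A small sketch with the Python client; the metric and label names mirror the example above and are illustrative:

    from prometheus_client import Counter

    HTTP_REQUESTS = Counter(
        'api_http_requests_total', 'HTTP requests handled',
        ['resource', 'method', 'protocol', 'user_agent', 'endpoint'])

    # One metric name, many dimensions; no per-combination metric names needed.
    HTTP_REQUESTS.labels(resource='accounts', method='post', protocol='http',
                         user_agent='ios', endpoint='/authenticate').inc()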

  149. PromQL: rate(api_http_requests_total[5m])   SQL: SELECT job, instance, method, status, path, rate(value, 5m) FROM api_http_requests_total
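
The same kind of expression can also be run programmatically against the server's HTTP query API; a rough sketch using Python and the requests library (the server address is an assumption):

    import requests

    resp = requests.get('http://localhost:9090/api/v1/query',
                        params={'query': 'rate(api_http_requests_total[5m])'})

    # Each result carries the label set ('metric') and the latest value.
    for series in resp.json()['data']['result']:
        print(series['metric'], series['value'])
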
  150. ALERTING

  151. No automatic anomaly detection

  152. ALERT <alert name> IF <expression> [ FOR <duration> ] [ LABELS <label set> ] [ ANNOTATIONS <label set> ]
  153. None
  154. ALERT ConsulRaftPeersLow IF consul_raft_peers < 5 FOR 1m LABELS {severity="page", team="infra"} ANNOTATIONS {description="consul raft peer count low: {{$value}}", summary="consul raft peer count low: {{$value}}"}

  155. ALERT QueueCritical IF sum(broker_q{svc_pref="prod"}) > 5000 FOR 10m LABELS {severity="page", team="product"} ANNOTATIONS {description="service: {{$labels.service}} instance: {{$labels.instance}} queue length: {{$value}} for too long", summary="service: {{$labels.service}} instance: {{$labels.instance}} queue length: {{$value}} for too long"}

  156. ALERTMANAGER

  157. Deduplication, Grouping, Routing, and Suppression of Alerts

  158. None
  159. CASE STUDY

  160. None
  161. None
  162. None
  163. 24 employees, 8 engineers

  164. Requirements for a monitoring system?

  165. Ease of Use

  166. Ease of Operation

  167. Cost Effective!

  168. None
  169. None
  170. Cost Effective “at scale”

  171. Scale?

  172. imgix

  173. imgix

  174. imgix: our last outage, when we were both shedding load and serving up errors
  175. None
  176. CONCLUSION

  177. None
  178. None
  179. Our stack is C, Lua, Go, Python

  180. Fantastic official Go and Python clients

  181. Custom LuaJIT client for counters, gauges and histograms

  182. None
  183. None
  184. Single statically linked Go binary

  185. No clustering. No dependency on Zookeeper et al.

  186. ~2 years of Prometheus use in production

  187. None
  188. The only “cost” has been SSD upgrades on boxes

  189. None
  190. Let’s not answer that last question!

  191. Thank You! @copyconstruct