Boston 2013 - Graphite Workshop - Michael Leinartas

Graphite Workshop
Michael Leinartas

Who am I? (@mleinart)

RTFC (read the freakin code)

Graphite Time Series Data

Time Series (as Graphite knows it)
• Divided into evenly-spaced “buckets”
• (end - start) / step == len(values)
• Can be consolidated into another time series with larger buckets
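The invariant and the consolidation step above can be sketched in a few lines of Python. This is an illustration only, not Graphite's code: `consolidate` is a hypothetical helper, and `end` is assumed to be exclusive (one step past the last bucket) so that the length invariant holds.

```python
def consolidate(values, factor, func=None):
    """Merge every `factor` adjacent buckets into one larger bucket."""
    if func is None:
        # Default to averaging the known (non-None) values.
        func = lambda known: sum(known) / len(known)
    out = []
    for i in range(0, len(values), factor):
        known = [v for v in values[i:i + factor] if v is not None]
        out.append(func(known) if known else None)
    return out


start, step = 1364460000, 60
values = [2.0, 1.2, 3.6, 4.7]
end = start + step * len(values)  # assumption: end is exclusive

# The evenly-spaced invariant from the slide.
assert (end - start) // step == len(values)

# 60s buckets consolidated into 120s buckets by averaging.
coarse = consolidate(values, 2)
coarse_step = step * 2
```

After consolidation the series still satisfies the same invariant with the larger step.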

Time Series (another way)
Pairs of (<timestamp>, <point>)
1364460000 2.0
1364460060 1.2
1364460120 3.6
1364460180 4.7
start=1364460000, end=1364460180, step=60

Whisper
• Graphite’s disk format for time series data
• Round-robin
• Fixed size
• Newer points overwrite older ones
• No concept of type: Everything’s a float (or None)
• Can contain multiple “Archives” with different precisions
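A round-robin, fixed-size store can be sketched as a ring of slots indexed by timestamp. This is a simplified in-memory illustration, not Whisper's on-disk layout: the class `RoundRobinArchive` and its slot-indexing scheme are assumptions made for the example.

```python
class RoundRobinArchive:
    """Fixed number of slots; a newer point overwrites the older one in its slot."""

    def __init__(self, step, points):
        self.step = step
        self.slots = [None] * points  # size never changes after creation

    def _interval(self, timestamp):
        # Align the timestamp down to the start of its bucket.
        return timestamp - (timestamp % self.step)

    def update(self, timestamp, value):
        interval = self._interval(timestamp)
        index = (interval // self.step) % len(self.slots)
        self.slots[index] = (interval, float(value))  # everything is a float

    def fetch(self, timestamp):
        interval = self._interval(timestamp)
        slot = self.slots[(interval // self.step) % len(self.slots)]
        # A slot holding a different interval means the point was overwritten.
        if slot is not None and slot[0] == interval:
            return slot[1]
        return None


archive = RoundRobinArchive(step=60, points=3)
for ts, value in [(1364460000, 2.0), (1364460060, 1.2),
                  (1364460120, 3.6), (1364460180, 4.7)]:
    archive.update(ts, value)
```

With only three slots, the fourth write lands in the slot of the first one, so the oldest point is no longer retrievable.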

Whisper Archives
• Defined by precision and retention
• e.g. minutely data for a year
• Data exists from “now” until max retention
• Composed of (<timestamp>, <value>) pairs

Retentions: 10s:60d, 60s:180d, 300s:360d
Archive 0: 10s precision, NOW back to 60 days ago
Archive 1: 60s precision, NOW back to 180 days ago
Archive 2: 300s precision, NOW back to 360 days ago

Everything written to Whisper happens in the update() operation.
When a point is written to a Whisper file, every archive is updated*
* well, sometimes

xFilesFactor
• Idea and name come from RRD
• Default value: 0.5
The ratio of datapoints that must be present for a rollup to occur
[Diagram of datapoints: 1 2 2 4 2 1 1 _ _ 3 _ _ 2 2 2 2 2 2 _ _ _ _ _ _]
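The xFilesFactor check can be sketched as follows. This is a minimal illustration assuming average aggregation; `rollup` is a hypothetical helper, and reading the slide's datapoints as four groups of six (with `_` meaning a missing point) is an assumption.

```python
def rollup(points, x_files_factor=0.5):
    """Average the known points, but only if enough of them are present."""
    known = [p for p in points if p is not None]
    if not known:
        return None
    # Ratio of datapoints present in the higher-precision bucket group.
    if len(known) / len(points) < x_files_factor:
        return None
    return sum(known) / len(known)


full = rollup([1, 2, 2, 4, 2, 1])                 # 6 of 6 present
sparse = rollup([1, None, None, 3, None, None])   # 2 of 6 present
flat = rollup([2, 2, 2, 2, 2, 2])                 # 6 of 6 present
empty = rollup([None] * 6)                        # 0 of 6 present
```

With the default of 0.5 the sparse group is not rolled up; lowering the factor (for example to 0.3) would let it through.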

More archives => More I/O per write
...So use them wisely

When choosing storage schemas, consider:
• How long you can wait between graph updates
• How long your data is useful for

And balance that with:
• How much disk space you have
• At what point lower precisions stop being useful

If you can afford the space, stick with a single retention
1 year of minutely data: 6 MB per metric
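The 6 MB figure can be checked with some arithmetic. The sketch below assumes 12 bytes per stored point (a timestamp plus a double), a 16-byte file header, and 12 bytes of header per archive; these sizes reflect my understanding of the Whisper format and should be treated as assumptions, and `whisper_file_size` is a hypothetical helper.

```python
METADATA_SIZE = 16      # assumed file header size in bytes
ARCHIVE_INFO_SIZE = 12  # assumed per-archive header size in bytes
POINT_SIZE = 12         # assumed size of one (timestamp, value) pair

DAY = 86400


def whisper_file_size(archives):
    """archives: list of (seconds_per_point, retention_seconds) tuples."""
    total_points = sum(retention // precision for precision, retention in archives)
    return (METADATA_SIZE
            + ARCHIVE_INFO_SIZE * len(archives)
            + POINT_SIZE * total_points)


# One archive: minutely data for a year.
one_year_minutely = whisper_file_size([(60, 365 * DAY)])

# Three archives: 10s:60d, 60s:180d, 300s:360d.
three_archives = whisper_file_size([(10, 60 * DAY), (60, 180 * DAY), (300, 360 * DAY)])
```

Under these assumptions the single minutely archive comes to roughly 6.3 million bytes, while the three-archive schema needs about 10.6 million bytes per metric.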

If you choose to use multiple archives:
• Don’t go overboard. Avoid this: [schema example shown on slide]
• Keep xFilesFactor at 0.5 or higher to avoid excess I/O unless the data is expected to be sparse

Also note: Whisper will only return data from a single archive during a fetch (remember: evenly spaced data).
Whisper will choose the highest-precision archive that covers the time period
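The archive choice during a fetch can be sketched like this. It is an illustration of the rule stated above, not Whisper's code: `select_archive` is a hypothetical helper, and falling back to the lowest-precision archive when nothing covers the range is an assumption.

```python
DAY = 86400

# (seconds_per_point, retention_seconds), highest precision first.
ARCHIVES = [(10, 60 * DAY), (60, 180 * DAY), (300, 360 * DAY)]


def select_archive(archives, now, from_time):
    """Return the highest-precision archive whose retention covers from_time."""
    span = now - from_time
    for archive in archives:
        if archive[1] >= span:
            return archive
    # Assumption: nothing covers the request, so use the longest archive.
    return archives[-1]


now = 1364460000
last_hour = select_archive(ARCHIVES, now, now - 3600)
last_90_days = select_archive(ARCHIVES, now, now - 90 * DAY)
last_200_days = select_archive(ARCHIVES, now, now - 200 * DAY)
```

A request reaching back 90 days gets 60s data for the entire range, including the most recent hour, because only one archive is used per fetch.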

Consolidation/Aggregation
Why would we throw away data on purpose?
• To coerce our data into buckets
• To save on storage space
• To fit a lot of data onto a graph

Averaging
Why is averaging a sane default?
• Fine for trending
• Works well with most data types
• Can calculate aggregate sum if number of samples known

Aggregation types
• Average
• Sum
• Min
• Max
• Latest
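The five aggregation types can be compared on a single bucket of points. This is a sketch with a hypothetical `aggregate` helper; the method name "last" for the slide's "Latest" follows my understanding of Whisper's naming and is an assumption here.

```python
AGGREGATIONS = {
    "average": lambda values: sum(values) / len(values),
    "sum": sum,
    "min": min,
    "max": max,
    "last": lambda values: values[-1],  # "Latest" on the slide
}


def aggregate(method, points):
    """Apply one aggregation method to the known (non-None) points."""
    known = [p for p in points if p is not None]
    return AGGREGATIONS[method](known) if known else None


bucket = [1, 2, 2, 4, 2, 1]
results = {name: aggregate(name, bucket) for name in AGGREGATIONS}
```

The same six points yield five different rolled-up values, which is why the method has to match the kind of data being stored.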

Average
• Latency measurements
• Gauges
• Rates
• Min/max/percentile histograms (e.g. from statsd)

Sum
• Raw counts
• Derived counters

Min/Max
• Min/max histograms (e.g. from statsd)

Latest
• Raw counters (e.g. interface packet count)

storage-aggregation.conf
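As a rough sketch of how such a file maps metric names to an aggregation method, the Python below parses a sample configuration and returns the first matching rule. The section names and patterns are hypothetical examples, not the slide's contents; first-match-wins ordering, unanchored regex search, and the fallback of 0.5 with average are assumptions about Graphite's behaviour.

```python
import configparser
import re

# Hypothetical sample in the storage-aggregation.conf layout.
SAMPLE_CONF = r"""
[min]
pattern = \.lower$
xFilesFactor = 0.1
aggregationMethod = min

[max]
pattern = \.upper$
xFilesFactor = 0.1
aggregationMethod = max

[sum]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum

[default_average]
pattern = .*
xFilesFactor = 0.5
aggregationMethod = average
"""


def load_rules(text):
    """Parse the config text into an ordered list of (regex, xff, method)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    rules = []
    for name in parser.sections():
        section = parser[name]
        rules.append((re.compile(section["pattern"]),
                      float(section["xFilesFactor"]),
                      section["aggregationMethod"]))
    return rules


def lookup(rules, metric):
    """Return (xFilesFactor, aggregationMethod) from the first matching rule."""
    for pattern, xff, method in rules:
        if pattern.search(metric):
            return xff, method
    return 0.5, "average"  # assumed fallback when nothing matches


rules = load_rules(SAMPLE_CONF)
```

Because the first match wins in this sketch, the catch-all section has to come last.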

Per-second rates
Some tools store counts as per-second rates: statsd, collectd (StoreRates true)
• Store these as averages
• Multiply by precision at display time for per-bucket rate
• e.g. for minutely stats, use scale(<metric>, 60)
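The arithmetic behind storing rates as averages can be sketched as follows. The helper `per_bucket_counts` is hypothetical and mirrors what scale(<metric>, 60) does for minutely data; the sample rates are made up for illustration.

```python
def per_bucket_counts(rates_per_second, precision_seconds):
    """Convert per-second rates into per-bucket counts, keeping gaps as None."""
    return [None if rate is None else rate * precision_seconds
            for rate in rates_per_second]


# Five minutely buckets holding per-second rates.
minutely_rates = [1.0, 2.0, 3.0, 4.0, 5.0]
minutely_counts = per_bucket_counts(minutely_rates, 60)
total_events = sum(minutely_counts)

# Rolling up to one 300s bucket by averaging keeps the rate meaningful:
# average rate times the larger precision gives back the same total.
five_minute_rate = sum(minutely_rates) / len(minutely_rates)
five_minute_count = five_minute_rate * 300
```

Averaging preserves the total when all points are present, whereas summing per-second rates during rollup would inflate the stored rate by the number of points merged.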

Render-time Consolidation
Why? There are only so many pixels

Render-time Consolidation
But look at this spike to 17.5k
Why isn’t it still 17.5k when zoomed out?

consolidateBy()
It’s a count so maybe we consolidate by sum
Perhaps max?
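Why the spike shrinks, and what a different consolidation function changes, can be sketched with made-up numbers. `consolidate_for_render` is a hypothetical stand-in for what happens when there are more points than pixels; it is not Graphite's renderer.

```python
import math


def consolidate_for_render(values, max_points, func):
    """Squeeze `values` into at most `max_points` by merging adjacent points."""
    if len(values) <= max_points:
        return list(values)
    per_point = math.ceil(len(values) / max_points)
    out = []
    for i in range(0, len(values), per_point):
        known = [v for v in values[i:i + per_point] if v is not None]
        out.append(func(known) if known else None)
    return out


def average(values):
    return sum(values) / len(values)


# Twelve points with one spike to 17.5k, drawn into three "pixels".
counts = [100.0] * 5 + [17500.0] + [100.0] * 6

by_average = consolidate_for_render(counts, 3, average)
by_max = consolidate_for_render(counts, 3, max)
by_sum = consolidate_for_render(counts, 3, sum)
```

Averaging dilutes the spike across its neighbours, max keeps its height, and sum keeps the total count per drawn point. In a render target the equivalent choice is made with, for example, consolidateBy(<metric>, 'max').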

summarize()
Or instead, control your granularity directly
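The idea behind summarize() can be sketched as grouping points into fixed intervals chosen by the user rather than by the graph width. `summarize_points` is a hypothetical helper, not Graphite's implementation; aligning buckets to multiples of the interval and defaulting to sum are assumptions based on my understanding of the function.

```python
def summarize_points(points, interval, func=sum):
    """Group (timestamp, value) pairs into buckets of `interval` seconds."""
    buckets = {}
    for timestamp, value in points:
        if value is None:
            continue
        # Align each point to the start of its interval.
        bucket_start = timestamp - (timestamp % interval)
        buckets.setdefault(bucket_start, []).append(value)
    return [(start, func(values)) for start, values in sorted(buckets.items())]


points = [(1364460000, 2.0), (1364460060, 1.2),
          (1364460120, 3.6), (1364460180, 4.7)]

two_minute_sums = summarize_points(points, 120)
two_minute_maxes = summarize_points(points, 120, max)
```

Unlike render-time consolidation, the bucket size here stays the same no matter how far the graph is zoomed out.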
