Time Series
(as Graphite knows it)
• Divided into evenly-spaced “buckets”
• (end - start) / step == len(values)
• Can be consolidated into another time series with larger buckets
Slide 6
Time Series
(another way)
Pairs of (timestamp, value):
1364460000 2.0
1364460060 1.2
1364460120 3.6
1364460180 4.7
start=1364460000,
end=1364460180,
step=60
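Both forms describe the same data. A minimal Python sketch of the correspondence (illustrative, not Graphite’s actual internals; end is treated as exclusive here so the bucket invariant from the previous slide holds):

    start, step = 1364460000, 60
    values = [2.0, 1.2, 3.6, 4.7]
    pairs = [(start + i * step, v) for i, v in enumerate(values)]
    # pairs == [(1364460000, 2.0), (1364460060, 1.2),
    #           (1364460120, 3.6), (1364460180, 4.7)]
    end = start + step * len(values)   # exclusive end: (end - start) / step == len(values)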
Slide 7
Whisper
• Graphite’s disk format for time series data
• Round-robin
• Fixed size
• Newer points overwrite older ones
• No concept of type: Everything’s a float (or None)
• Can contain multiple “Archives” with different precisions
Slide 8
Whisper Archives
• Defined by precision and retention
• e.g. minutely data for a year
• Data exists from “now” until max retention
• Composed of (timestamp, value) pairs
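As a sketch, the slide’s example (“minutely data for a year”) can be created with the whisper Python module; the file path here is illustrative:

    import whisper

    # one archive: 60-second precision, 525,600 points (about one year)
    whisper.create('/opt/graphite/storage/whisper/app/requests.wsp',
                   [(60, 60 * 24 * 365)])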
Slide 9
[Diagram: retentions 10s:60d,60s:180d,300s:360d. Archive 0 (10s precision) spans from NOW back 60 days; Archive 1 (60s precision) back 180 days; Archive 2 (300s precision) back 360 days]
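In carbon’s storage-schemas.conf, the layout pictured above is written as a single retention string (the section name and pattern are illustrative):

    [servers]
    pattern = ^servers\.
    retentions = 10s:60d,60s:180d,300s:360d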
Slide 10
Everything written to Whisper happens in the update() operation.
When a point is written to a Whisper file, every archive is updated*
* well, sometimes
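A sketch of a single write via the whisper module (the path and value are illustrative):

    import time
    import whisper

    # writes into the highest-precision archive, then propagates rollups
    # to the lower-precision archives (when xFilesFactor is satisfied)
    whisper.update('/opt/graphite/storage/whisper/app/requests.wsp',
                   42.0, int(time.time()))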
Slide 11
xFilesFactor
• Idea and name come from RRD
• Default value: 0.5
The minimum ratio of datapoints that must be present for a rollup to occur
1 2 2 4 2 1  =>  2   (6 of 6 points present; average = 2)
1 _ _ 3 _ _  =>  _   (2 of 6 = 0.33 < 0.5; no rollup, stored as None)
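A minimal sketch of that rollup decision (simplified, not Whisper’s actual code; average is assumed as the aggregation function):

    def rollup(points, xff=0.5):
        # consolidate one lower-precision bucket from higher-precision points;
        # skip the rollup (store None) when too few points are known
        known = [p for p in points if p is not None]
        if len(known) / len(points) < xff:
            return None
        return sum(known) / len(known)   # average

    rollup([1, 2, 2, 4, 2, 1])               # 2.0
    rollup([1, None, None, 3, None, None])   # None (2/6 < 0.5)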
Slide 12
More archives => More I/O per write
...So use them wisely
Slide 13
When choosing storage schemas,
consider:
• How long you can wait between graph updates
• How long your data is useful for
Slide 14
And balance that with:
• How much disk space you have
• At what point lower precisions stop being useful
Slide 15
If you can afford the space, stick with a single retention
1 year of minutely data: ~6MB per metric
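That figure follows from Whisper’s 12-byte points (a 4-byte timestamp plus an 8-byte double):

    points_per_year = 60 * 24 * 365           # 525,600 minutely buckets
    bytes_per_point = 12                      # 4-byte timestamp + 8-byte float
    print(points_per_year * bytes_per_point)  # 6,307,200 bytes, about 6MB
                                              # (plus small file headers)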
Slide 16
If you choose to use multiple archives:
• Don’t go overboard. Avoid this:
• Keep xFilesFactor at 0.5 or higher to avoid excess I/O, unless the data is expected to be sparse
Slide 17
Also note:
Whisper will only return data from a single archive during a fetch (remember: evenly spaced data).
Whisper will choose the highest-precision archive that covers the requested time period.
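A sketch of a fetch (path and time range illustrative); the returned step reveals which archive answered:

    import time
    import whisper

    # ask for the last hour; Whisper answers from the highest-precision
    # archive that covers the whole requested range
    (start, end, step), values = whisper.fetch(
        '/opt/graphite/storage/whisper/app/requests.wsp',
        int(time.time()) - 3600)
    print(step)  # e.g. 10 if a 10s archive covers the range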
Slide 18
Consolidation/Aggregation
Why would we throw away data on purpose?
• To coerce our data into buckets
• To save on storage space
• To fit a lot of data onto a graph
Slide 19
Averaging
Why is averaging a sane default?
• Fine for trending
• Works well with most data types
• Can calculate aggregate sum if number of samples known
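For example, an averaged bucket’s original sum is recoverable when the sample count is known:

    average, samples = 2.0, 6
    total = average * samples   # 12.0, the sum the bucket originally represented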
Slide 20
Aggregation types
• Average
• Sum
• Min
• Max
• Latest
Slide 21
Average
• Latency measurements
• Gauges
• Rates
• Min/max/percentile histograms (e.g. from statsd)
Slide 22
Sum
• Raw counts
• Derived counters
Slide 23
Min/Max
• Min/max histograms (e.g. from statsd)
Slide 24
Latest
• Raw counters (e.g. interface packet count)
Slide 25
storage-aggregation.conf
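These per-metric choices live in storage-aggregation.conf; a sketch along the lines of the stock example file (the patterns are illustrative):

    [min]
    pattern = \.min$
    xFilesFactor = 0.1
    aggregationMethod = min

    [count]
    pattern = \.count$
    xFilesFactor = 0
    aggregationMethod = sum

    [default_average]
    pattern = .*
    xFilesFactor = 0.5
    aggregationMethod = average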
Slide 26
Per-second rates
• Store these as averages
• Multiply by precision at display time for a per-bucket rate
• e.g. for minutely stats, use scale(<series>, 60)
Some tools store counts as per-second rates:
statsd, collectd (StoreRates true)
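A sketch of that render-time scaling with a hypothetical metric name:

    # per-second average * 60-second buckets = count per minutely bucket
    target=scale(app.requests.per_sec, 60)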
Slide 27
Render-time Consolidation
Why? There are only so many pixels
[graphs: the same series rendered zoomed in vs. zoomed out]
Slide 28
Render-time Consolidation
But look at this spike to 17.5k
Why isn’t it still 17.5k when zoomed out?
Slide 29
It’s a count, so maybe we consolidate by sum
Perhaps max?
consolidateBy()
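A sketch with a hypothetical metric name ('sum' would work the same way):

    # keep the spike visible when zoomed out: consolidate by max per drawn point
    target=consolidateBy(app.requests.count, 'max')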
Slide 30
Or instead, control your granularity directly
summarize()
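A sketch of fixed-granularity summarization (metric name hypothetical):

    # explicit 1-hour buckets aggregated by sum, independent of graph width
    target=summarize(app.requests.count, '1hour', 'sum')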