Slide 1

Graphite Workshop
Michael Leinartas

Slide 2

Who am I? (@mleinart)

Slide 3

RTFC (read the freakin code)

Slide 4

Graphite Time Series Data

Slide 5

Time Series (as Graphite knows it)
• Divided into evenly-spaced “buckets”
• (end - start) / step == len(values)
• Can be consolidated into another time series with larger buckets

Slide 6

Time Series (another way)
Pairs of (timestamp, value):
1364460000  2.0
1364460060  1.2
1364460120  3.6
1364460180  4.7
start=1364460000, end=1364460180, step=60
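The bucket invariant from the previous slide can be sketched in a few lines of Python. This is illustrative code, not Graphite's API; `end` is treated here as exclusive (one step past the last point) so the arithmetic works out.

```python
# Illustrative sketch of Graphite's evenly-spaced time-series model.
start, step = 1364460000, 60
values = [2.0, 1.2, 3.6, 4.7]
end = start + step * len(values)  # exclusive end: 1364460240

# The bucket invariant: the window and step fix the number of values.
assert (end - start) // step == len(values)

# Reconstruct the (timestamp, value) pairs from just three numbers.
points = [(start + i * step, v) for i, v in enumerate(values)]
```

Storing only `start`, `end`, and `step` alongside a flat list of values is what makes the format so compact: the timestamps are implicit.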

Slide 7

Whisper
• Graphite’s disk format for time series data
• Round-robin
• Fixed size
• Newer points overwrite older ones
• No concept of type: everything’s a float (or None)
• Can contain multiple “Archives” with different precisions

Slide 8

Whisper Archives
• Defined by precision and retention, e.g. minutely data for a year
• Data exists from “now” back to the maximum retention
• Composed of (timestamp, value) pairs

Slide 9

The retention 10s:60d, 60s:180d, 300s:360d gives three archives:
Archive 0: 10s precision, from now back to 60 days ago
Archive 1: 60s precision, from now back to 180 days ago
Archive 2: 300s precision, from now back to 360 days ago
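Each archive needs one slot per point it retains, and these (secondsPerPoint, points) tuples are the shape of archive list that Whisper's create call takes. A sketch of the sizing arithmetic, with an illustrative helper name:

```python
# Archive sizing for the retention "10s:60d, 60s:180d, 300s:360d".
# Each archive stores (retention_seconds / seconds_per_point) slots.
DAY = 86400

def archive_points(seconds_per_point: int, retention_days: int) -> int:
    """Number of (timestamp, value) slots an archive needs."""
    return retention_days * DAY // seconds_per_point

archives = [
    (10,  archive_points(10, 60)),    # Archive 0: 518,400 points
    (60,  archive_points(60, 180)),   # Archive 1: 259,200 points
    (300, archive_points(300, 360)),  # Archive 2: 103,680 points
]
```

Note how the coarser archives are far smaller despite covering much longer windows: that trade is the whole point of multiple archives.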

Slide 10

Everything written to Whisper happens in the update() operation.
When a point is written to a Whisper file, every archive is updated*
*well, sometimes

Slide 11

xFilesFactor
• Idea and name come from RRD
• Default value: 0.5
• The ratio of datapoints that must be present for a rollup to occur
Example (xFilesFactor = 0.5, six points per rollup bucket):
1 2 2 4 2 1 → enough points present, rolls up
1 _ _ 3 _ _ → too many gaps, no rollup
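The rule can be stated as a few lines of Python. This is a minimal sketch of the idea, not Whisper's actual implementation, and the function name is illustrative:

```python
# Sketch of the xFilesFactor rule: a rollup value is produced only when
# the ratio of known (non-None) points meets the threshold.
def rollup(points, xff=0.5, aggregate=lambda xs: sum(xs) / len(xs)):
    """Return the consolidated value, or None if too many points are missing."""
    known = [p for p in points if p is not None]
    if len(known) / len(points) < xff:
        return None  # not enough data: propagate a gap instead of a value
    return aggregate(known)

rollup([1, 2, 2, 4, 2, 1])              # 6/6 known -> average of 2.0
rollup([1, None, None, 3, None, None])  # 2/6 known -> None, below 0.5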

Slide 12

More archives => more I/O per write
...so use them wisely

Slide 13

When choosing storage schemas, consider: • How long you can wait between graph updates • How long your data is useful for

Slide 14

And balance that with: • How much disk space you have • At what point lower precisions stop being useful

Slide 15

If you can afford the space, stick with a single retention.
1 year of minutely data: ~6 MB per metric
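The 6 MB figure is easy to check: Whisper stores each point as a 12-byte record (a 4-byte timestamp and an 8-byte double), plus a small file header we ignore here.

```python
# Back-of-envelope check for "1 year of minutely data ~ 6 MB per metric".
POINT_SIZE = 12  # bytes: 4-byte timestamp + 8-byte double per point

points_per_year = 365 * 24 * 60           # 525,600 minutely points
size_bytes = points_per_year * POINT_SIZE
size_mb = size_bytes / 1024 / 1024        # just over 6 MB
```

Scaling this by metric count is a quick way to size disks: 100,000 such metrics is roughly 600 GB.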

Slide 16

If you choose to use multiple archives:
• Don’t go overboard
• Keep xFilesFactor at 0.5 or higher to avoid excess I/O, unless the data is expected to be sparse

Slide 17

Also note: Whisper will only return data from a single archive during a fetch (remember: evenly-spaced data). Whisper chooses the highest-precision archive that covers the requested time period.
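That selection rule can be sketched directly: walk the archives from finest to coarsest and take the first one whose retention reaches back to the requested start. Illustrative code, not Whisper's implementation.

```python
# Sketch of fetch-time archive choice: the highest-precision archive
# whose retention covers the start of the requested window wins.
def choose_archive(archives, from_time, now):
    """archives: (seconds_per_point, retention_seconds) tuples, ordered
    from highest to lowest precision, as in a Whisper file."""
    for seconds_per_point, retention in archives:
        if from_time >= now - retention:
            return seconds_per_point  # first archive reaching back far enough
    return None  # request predates the longest retention

DAY = 86400
archives = [(10, 60 * DAY), (60, 180 * DAY), (300, 360 * DAY)]
now = 1364460000
choose_archive(archives, now - 3600, now)       # last hour  -> 10s archive
choose_archive(archives, now - 100 * DAY, now)  # 100 days   -> 60s archive
```

This is why widening a graph's time range can suddenly drop its precision: the fetch silently falls through to a coarser archive.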

Slide 18

Consolidation/Aggregation
Why would we throw away data on purpose?
• To coerce our data into buckets
• To save on storage space
• To fit a lot of data onto a graph

Slide 19

Averaging
Why is averaging a sane default?
• Fine for trending
• Works well with most data types
• Can calculate the aggregate sum if the number of samples is known

Slide 20

Aggregation types
• Average
• Sum
• Min
• Max
• Latest
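All five are plain reductions over the known points in a bucket. A sketch (these mirror Whisper's aggregation-method names, where "latest" is configured as "last"):

```python
# The five aggregation types as reductions over a bucket of known points.
AGGREGATES = {
    'average': lambda xs: sum(xs) / len(xs),
    'sum': sum,
    'min': min,
    'max': max,
    'last': lambda xs: xs[-1],  # "latest": keep the newest point in the bucket
}

bucket = [1.0, 2.0, 2.0, 4.0]
results = {name: fn(bucket) for name, fn in AGGREGATES.items()}
```

The following slides are about picking the right entry from this table per metric type.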

Slide 21

Average
• Latency measurements
• Gauges
• Rates
• Min/max/percentile histograms (e.g. from statsd)

Slide 22

Sum
• Raw counts
• Derived counters

Slide 23

Min/Max
• Min/max histograms (e.g. from statsd)

Slide 24

Latest
• Raw counters (e.g. interface packet count)

Slide 25

storage-aggregation.conf
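A hedged example of what such a file might contain, mapping the per-type advice above onto metric-name patterns. The section names and regex patterns here are illustrative, not a recommended standard; first matching pattern wins, so the catch-all goes last.

```ini
# Illustrative storage-aggregation.conf; patterns are examples only.
[min]
pattern = \.min$
xFilesFactor = 0.1
aggregationMethod = min

[max]
pattern = \.max$
xFilesFactor = 0.1
aggregationMethod = max

[count]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum

[default_average]
pattern = .*
xFilesFactor = 0.5
aggregationMethod = average
```

Note the lower xFilesFactor on the sparse-ish min/max/count metrics, matching the earlier advice that 0.5+ is for dense data.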

Slide 26

Per-second rates
Some tools store counts as per-second rates: statsd, collectd (StoreRates true)
• Store these as averages
• Multiply by the precision at display time for a per-bucket rate
• e.g. for minutely stats, use scale(<series>, 60)
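The arithmetic behind that scale() call is just rate times bucket width. A pure-Python sketch with illustrative values:

```python
# Converting a stored per-second rate back to a per-bucket count at
# display time: multiply by the bucket width in seconds, which is what
# scale(<series>, 60) does for minutely data.
step = 60  # seconds per bucket (the archive's precision)
per_second_rates = [2.0, 1.5, 3.0]   # as stored by e.g. statsd or collectd
per_bucket_counts = [r * step for r in per_second_rates]
```

Averaging at rollup time keeps the per-second semantics intact across archives, which is why the multiplication stays correct at any zoom level.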

Slide 27

Render-time Consolidation
Why? There are only so many pixels.

Slide 28

Render-time Consolidation
But look at this spike to 17.5k. Why isn’t it still 17.5k when zoomed out?

Slide 29

consolidateBy()
It’s a count, so maybe we consolidate by sum. Perhaps max?

Slide 30

summarize()
Or instead, control your granularity directly.
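What summarize() does can be sketched as regrouping points into coarser fixed-width buckets and aggregating each one. Illustrative code under that assumption, not Graphite's implementation:

```python
# Sketch of summarize(series, "3min", "sum") over minutely data:
# regroup into fixed-width buckets and aggregate each bucket.
def summarize(points, step, bucket_width, aggregate=sum):
    """points: values at `step`-second spacing; bucket_width in seconds."""
    per_bucket = bucket_width // step
    return [
        aggregate(points[i:i + per_bucket])
        for i in range(0, len(points), per_bucket)
    ]

summarize([1, 2, 3, 4, 5, 6], step=60, bucket_width=180)  # -> [6, 15]
```

Unlike render-time consolidation, the bucket width here is explicit, so the graph's granularity no longer depends on how many pixels happen to be available.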
