Slide 1

Slide 1 text

Everything you never knew you wanted to ask about Time Series Data
Presented by: Brad Lhotsky

Slide 2

Slide 2 text

You gotta know where I’ve been to know where we’re going.

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

http://oss.oetiker.ch/rrdtool/

Slide 5

Slide 5 text

Reverse Polish Notation!!!

Slide 6

Slide 6 text

Where are we now?

Slide 7

Slide 7 text

Open Source Solutions
• Graphite
• OpenTSDB
• InfluxDB

Hosted Solutions
• Circonus
• Librato
• Datadog
• New Relic

Slide 8

Slide 8 text

What do you use?

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

InfluxDB
• Pros
  • Easy to send metrics
  • Support for “Metrics 2.0”
  • SQL-ish interface to the data
• Cons
  • Read scalability is lacking
  • Still quite young, good things to come here!

Slide 11

Slide 11 text

OpenTSDB
• Pros
  • Easy to send metrics
  • Support for “Metrics 2.0”
  • HBase backend
  • No “roll up”: all points stored for eternity!
• Cons
  • Read scalability is lacking
  • HBase backend?

Slide 12

Slide 12 text

YMMV, and that’s O.K.

Slide 13

Slide 13 text

Why do you use Graphite?

Slide 14

Slide 14 text

It’s Open Source.

Slide 15

Slide 15 text

It’s scalable.

Slide 16

Slide 16 text

It’s easy.

Slide 17

Slide 17 text

It’s composable.

Slide 18

Slide 18 text

It’s FUN!

Slide 19

Slide 19 text

What is Time Series Data?

Slide 20

Slide 20 text

Time Series Data
Measurements taken at fixed, regular intervals. It is impossible to have two values for the same metric at the same point in time. Graphite's rule is that the last value written for an interval wins.
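The "last value wins" rule can be sketched in a few lines. This is a minimal illustration, not Graphite's storage code: the `store` helper, the dictionary used as storage, and the 60-second step are all assumptions made for the example.

```python
def store(points, timestamp, value, step=60):
    """Write value into the slot for its interval (hypothetical helper).

    The timestamp is rounded down to the start of its interval, so two
    writes that land in the same interval share one slot and the later
    write replaces the earlier one.
    """
    slot = timestamp - (timestamp % step)
    points[slot] = value
    return slot


series = {}
store(series, 1000, 1.0)  # falls in the interval starting at 960
store(series, 1010, 2.0)  # same interval, replaces 1.0
store(series, 1025, 3.0)  # next interval, starting at 1020
```

After these three writes the series holds two points, not three, which is why sending faster than the configured interval discards data.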

Slide 21

Slide 21 text

Gauge v. Counter
Graphite does not care; it just stores a value at a point in time. It's up to you to store what you want and understand how to retrieve it.

Slide 22

Slide 22 text

Gauge v. Counter
Gauges usually fit within a fixed range, but only represent state at the time of reading, meaning you can miss spikes between readings. Counters allow a more complete history, but can overflow. Use nonNegativeDerivative() to view the changes between points.
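To make the counter case concrete, here is a simplified sketch of what a non-negative derivative does to a counter series. It is an illustration of the idea rather than Graphite's implementation: the function name is hypothetical, and it ignores the optional maximum-value handling that the real nonNegativeDerivative() offers for wrapped counters.

```python
def non_negative_derivative(values):
    """Return the change between consecutive points of a counter.

    A drop in the counter (a reset or an overflow) yields None instead
    of a negative number, as does any gap in the data.
    """
    result = []
    prev = None
    for value in values:
        if prev is None or value is None:
            result.append(None)
        elif value >= prev:
            result.append(value - prev)
        else:
            result.append(None)  # counter reset or overflow
        prev = value
    return result


deltas = non_negative_derivative([10, 15, 15, 3, 8])
```

The first point has no predecessor, and the drop from 15 to 3 is treated as a reset, so both come back as None.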

Slide 23

Slide 23 text

How does Graphite work?

Slide 24

Slide 24 text

Metrics
• Dot-separated namespaces:
  • sys.datacenter.zone.host.class.metric
• Created automatically the first time it's updated
• All storage pre-allocated
• Multiple storage engines
  • Whisper (Flat Files)
  • Ceres
  • Cyanite (based on Cassandra)

Slide 25

Slide 25 text

Queries / API
• Ask for a metric
  • sys.datacenter.zone.host.class.metric
• Ask for all the metrics
  • sys.datacenter.zone.*.class.metric
• Ask for a combination, mutation, or selection
  • sumSeries(sys.datacenter.zone.*.class.metric)
• Returns PNG, SVG, JSON, CSV, …
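These queries are sent to graphite-web's render endpoint as URL parameters. The sketch below only builds such a URL; the `render_url` helper and the host name `graphite.example.com` are assumptions for illustration, while `target`, `format`, and `from` are parameters of the render API.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def render_url(base, target, fmt="json", since="-1h"):
    """Build a render URL for one target (hypothetical helper)."""
    query = urlencode({"target": target, "format": fmt, "from": since})
    return f"{base}/render?{query}"


# graphite.example.com is a placeholder host, not a real server.
url = render_url(
    "http://graphite.example.com",
    "sumSeries(sys.datacenter.zone.*.class.metric)",
)

# Decoding the query string recovers the original target expression.
params = parse_qs(urlparse(url).query)
```

Changing `fmt` to `png`, `svg`, or `csv` would request the same data in another of the output formats listed above.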

Slide 26

Slide 26 text

Components
• carbon
  • Route and store metrics
• whisper
  • Storage file format and utilities
• graphite-web
  • User-facing interface to Graphite

Slide 27

Slide 27 text

How does Graphite scale?

Slide 28

Slide 28 text

With Knowledge, Well.
• Cluster using relays
• Use SSDs for fast writes
• Use redundancy because SSDs fail
• Read Jason Dixon's book

Slide 29

Slide 29 text

With Help, WebScale 2.0!!
• https://github.com/grobian/carbon-c-relay
  • Pass and route metrics to storage
• https://github.com/dgryski/carbonzipper
  • Map/Reduce metric queries
• https://github.com/dgryski/carbonapi
  • Intelligent caching layer for JSON/CSV/Raw outputs

Slide 30

Slide 30 text

Would you deploy Graphite today?

Slide 31

Slide 31 text

Yes.

Slide 32

Slide 32 text

How does Graphite store data?

Slide 33

Slide 33 text

Whisper Files
Every metric is a flat file on disk that's pre-allocated at creation time to a fixed size. Size is based on the defined retention periods, which we'll discuss shortly.

Slide 34

Slide 34 text

Whisper Files
The dots in the metric names are directory separators on the file system.
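The mapping from metric name to file can be sketched as below. The `metric_to_path` helper is hypothetical, and the storage root shown is an assumption (a common default install location); the `.wsp` extension is the one Whisper files use.

```python
import posixpath


def metric_to_path(metric, root="/opt/graphite/storage/whisper"):
    """Map a dotted metric name to a Whisper file path (hypothetical helper).

    Each dot becomes a directory separator and the last component
    becomes the file name. The default root is an assumed location.
    """
    return posixpath.join(root, *metric.split(".")) + ".wsp"


path = metric_to_path("sys.datacenter.zone.host.class.metric")
```

A practical consequence is that wildcard queries correspond to directory listings, so deep or very wide namespaces translate directly into file system work.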

Slide 35

Slide 35 text

Data Compression
Time series databases allow for prolonged storage. It's common for metrics to remain for two or more years. To cut storage costs, data is aggregated into coarser intervals as it ages.

Slide 36

Slide 36 text

What do I need to know about roll up?

Slide 37

Slide 37 text

Storage: Schema
Backend configuration for allocating on-disk storage for metrics. Can only be set at metric creation. Define retentions.

  [mysql]
  pattern = ^mysql\.
  retentions = 10s:2d,60s:14d,30m:2y

  [default]
  pattern = .*
  retentions = 60s:14d,30m:2y
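Because storage is pre-allocated, a retention string determines the file size up front. The sketch below estimates it, assuming Whisper's on-disk layout is a 16-byte header, 12 bytes of metadata per archive, and 12 bytes per stored point, and that a year counts as 365 days; treat those constants as assumptions to verify against your Whisper version. The helper names are hypothetical.

```python
# Seconds per unit suffix; a year is assumed to be 365 days.
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}

HEADER_BYTES = 16   # assumed size of the file header
ARCHIVE_BYTES = 12  # assumed metadata size per archive
POINT_BYTES = 12    # assumed size of one (timestamp, value) point


def to_seconds(spec):
    """Convert a spec such as '10s' or '2y' to seconds."""
    return int(spec[:-1]) * UNITS[spec[-1]]


def points_per_archive(retentions):
    """Return the number of points in each 'step:duration' archive."""
    counts = []
    for archive in retentions.split(","):
        step, duration = archive.strip().split(":")
        counts.append(to_seconds(duration) // to_seconds(step))
    return counts


def whisper_size(retentions):
    """Estimate the pre-allocated file size in bytes."""
    counts = points_per_archive(retentions)
    return HEADER_BYTES + ARCHIVE_BYTES * len(counts) + POINT_BYTES * sum(counts)


mysql_size = whisper_size("10s:2d,60s:14d,30m:2y")
default_size = whisper_size("60s:14d,30m:2y")
```

Under these assumptions the two schemas above come to roughly 850 KB and 650 KB per metric, which is worth multiplying by your metric count before choosing retentions.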

Slide 38

Slide 38 text

Storage: AGGREGATIONS
Configuration for handling how metrics traverse the retention boundaries.

  [default]
  pattern = .*
  xFilesFactor = 0.5
  aggregationMethod = average

Slide 39

Slide 39 text

xFilesFactor
Float between 0 and 1 representing the fraction of non-null points required to roll up to a non-null value.

  [default]
  pattern = .*
  xFilesFactor = 0.5
  aggregationMethod = average

Slide 40

Slide 40 text

Aggregators
Functions available to turn multiple values into a single value for retention roll ups.
‣ average - average of all values
‣ min - minimum of set
‣ max - maximum of set
‣ sum - sum of set
‣ last - take the last value
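How xFilesFactor and the aggregation method interact can be sketched as follows. This is an illustration of the rule described above, not Whisper's code; the `roll_up` function and its defaults are assumptions chosen to match the [default] section.

```python
def roll_up(points, x_files_factor=0.5, method="average"):
    """Collapse one window of points into a single rolled-up value.

    Returns None when the fraction of non-null points is below
    x_files_factor, or when there are no non-null points at all.
    """
    known = [p for p in points if p is not None]
    if not known or len(known) / len(points) < x_files_factor:
        return None
    if method == "average":
        return sum(known) / len(known)
    if method == "min":
        return min(known)
    if method == "max":
        return max(known)
    if method == "sum":
        return sum(known)
    if method == "last":
        return known[-1]
    raise ValueError(f"unknown aggregation method: {method}")


dense = roll_up([1, 2, None, None, 3])           # 3 of 5 points known
sparse = roll_up([1, None, None, None, None])    # 1 of 5 points known
alert = roll_up([1, None, None, None, None], x_files_factor=0, method="sum")
```

With the default factor of 0.5 the sparse window rolls up to nothing, while a factor of 0 with `sum` keeps the single event, which is the behaviour you want for sparse data such as alerts.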

Slide 41

Slide 41 text

Storage: AGGREGATIONS

  [alerts]
  pattern = ^alerts\.
  xFilesFactor = 0
  aggregationMethod = sum

Slide 42

Slide 42 text

Rolling Data
We lose resolution as data rolls up.
1 Minute / 5 Minutes / 25 Minutes

Slide 43

Slide 43 text

How do I get data in?

Slide 44

Slide 44 text

Sending Data
Getting your data into Graphite is as simple as sending the metric string to the relevant carbon host and port!

  echo "metric.name.as.dotted.path value epoch" | \
    nc graphite 2003
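The same plaintext line can be sent from code with nothing but a TCP socket. This is a minimal sketch: the helper names are hypothetical, and the host `graphite` and port 2003 are taken from the shell example above, so adjust them for your carbon instance.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext line: 'path value epoch' plus a newline."""
    epoch = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {epoch}\n"


def send_metric(path, value, host="graphite", port=2003, timestamp=None):
    """Open a TCP connection to carbon and send a single metric line."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Formatting only; call send_metric(...) once a carbon host is reachable.
example_line = format_metric("metric.name.as.dotted.path", 42, 1400000000)
```

Opening a connection per metric is fine for a demonstration; a long-running sender would normally keep the connection open and write many lines to it.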

Slide 45

Slide 45 text

There are a lot of libraries that encapsulate most to all of this incredibly complicated task for every web-scale programming language.

Slide 46

Slide 46 text

Pretty pictures, please?

Slide 47

Slide 47 text

No content

Slide 48

Slide 48 text

http://obfuscurity.com/2012/04/Unhelpful-Graphite-Tip-2

Slide 49

Slide 49 text

Interface
Autocomplete can't be disabled; use the Esc key to close it.

Slide 50

Slide 50 text

Metrics: Wildcards
security.logging.indexer.*.total

Slide 51

Slide 51 text

Metrics: Character Classes
security.logging.indexer.logproc-[12]01.total

Slide 52

Slide 52 text

Metrics: Word Groups
security.logging.indexer.logproc-{202,102}.total

Slide 53

Slide 53 text

Metrics: Aliases
aliasByNode(security.logging.indexer.*.total,3)

Slide 54

Slide 54 text

Combining Metrics
sumSeries(security.logging.indexer.*.total)

Slide 55

Slide 55 text

Combining & Grouping Metrics
sumSeriesWithWildcards(security.logging.indexer.*.*,3)

Slide 56

Slide 56 text

Combining Metrics
averageSeries(security.logging.indexer.*.total)

Slide 57

Slide 57 text

Combining & Grouping Metrics
averageSeriesWithWildcards(security.logging.indexer.*.{total,ignore},3)

Slide 58

Slide 58 text

Combining Metrics
averageSeriesWithWildcards(security.logging.indexer.*.*,3)

Slide 59

Slide 59 text

Combining Metrics
nPercentile(security.logging.indexer.*.total,95)

Slide 60

Slide 60 text

Combining Metrics
percentileOfSeries(security.logging.indexer.*.total,95)

Slide 61

Slide 61 text

Selecting Metrics
mostDeviant(2,security.logging.indexer.*.total)

Slide 62

Slide 62 text

Selecting Metrics
highestCurrent(security.logging.indexer.*.total,2)

Slide 63

Slide 63 text

Selecting Metrics
highestAverage(security.logging.indexer.*.total,2)

Slide 64

Slide 64 text

Transforming Metrics
general.es.logsearch-208.jvm.gc.collectors.old.collection_ms
general.es.logsearch-201.indices.indexing.index_ms

Slide 65

Slide 65 text

Side Bar: Y-Minimum as 0
general.es.logsearch-208.jvm.gc.collectors.old.collection_ms

Slide 66

Slide 66 text

Transforming Metrics
nonNegativeDerivative(general.es.logsearch-208.jvm.gc.collectors.old.collection_ms)
nonNegativeDerivative(general.es.logsearch-201.indices.indexing.index_ms)

Slide 67

Slide 67 text

Selecting Data
removeAbovePercentile(
  nonNegativeDerivative(
    general.es.logsearch-201.indices.indexing.index_ms
  ), 95)

Slide 68

Slide 68 text

Per Second
scaleToSeconds(
  removeAbovePercentile(
    nonNegativeDerivative(
      general.es.logsearch-201.indices.indexing.index_ms
    ), 95), 1)

Slide 69

Slide 69 text

Transforming Metrics
nonNegativeDerivative(general.es.logsearch-208.jvm.gc.collectors.old.collection_ms)

Slide 70

Slide 70 text

Transforming Metrics
offset(
  nonNegativeDerivative(
    general.es.logsearch-208.jvm.gc.collectors.old.collection_ms
  ), -30)

Slide 71

Slide 71 text

Transforming Metrics
drawAsInfinite(
  offset(
    nonNegativeDerivative(
      general.es.logsearch-208.jvm.gc.collectors.old.collection_ms
    ), -30)
)

Slide 72

Slide 72 text

Comparing Metrics
alias(sumSeries(security.logging.indexer.*.total),"Today")
alias(
  timeShift(
    sumSeries(security.logging.indexer.*.total),
  "7d"),
"Last Week")

Slide 73

Slide 73 text

Comparing Metrics
color(constantLine(0),"red")
diffSeries(
  sumSeries(security.logging.indexer.*.total),
  timeShift(sumSeries(security.logging.indexer.*.total),"7d")
)

Slide 74

Slide 74 text

Advanced Tricks
alias(alpha(color(areaBetween(
  holtWintersConfidenceBands(
    maxSeries(general.es.logsearch-20*.jvm.mem.heap_used_bytes)
  )
),"gray"),0.1),"Hot Winter Confidence Bands")
color(alias(
  maxSeries(general.es.logsearch-20*.jvm.mem.heap_used_bytes),
"Max Heap Size"),"red")

Slide 75

Slide 75 text

Advanced Tricks
# Same as last slide & set Y-Minimum to 0
color(alias(
  secondYAxis(
    maxSeries(general.es.logsearch-20*.indices.docs.count)
  ),
"Max Docs per Node"),"green")

Slide 76

Slide 76 text

Advanced Tricks
drawAsInfinite(
  removeBelowValue(
    offset(
      nonNegativeDerivative(
        general.es.logsearch-*.jvm.gc.collectors.old.collection_ms
      ), -250),
  0)
)

Slide 77

Slide 77 text

Do you even Dashboard?

Slide 78

Slide 78 text

Grafana
http://grafana.org/

Slide 79

Slide 79 text

GraphExplorer
https://vimeo.github.io/graph-explorer/

Slide 80

Slide 80 text

Cubism
https://square.github.io/cubism/

Slide 81

Slide 81 text

Rubics Cubism
https://github.com/reyjrar/rubics-cubism

Slide 82

Slide 82 text

Thank you!
https://twitter.com/reyjrar
https://github.com/reyjrar
https://speakerdeck.com/reyjrar