Everything You Never Knew You Wanted to Ask about Time Series Databases

This talk covers the Graphite ecosystem. It is intended to serve as an introduction to core concepts. I also highlight some gotchas I see people run into when they start using time series databases.

Brad Lhotsky

October 17, 2015


  1. Everything you never knew you wanted to ask about Time Series Data

    Presented by: Brad Lhotsky
  2. InfluxDB

    • Pros
      • Easy to send metrics
      • Support for “Metrics 2.0”
      • SQL-ish interface to the data
    • Cons
      • Read scalability is lacking
      • Still quite young, good things to come here!
  3. OpenTSDB

    • Pros
      • Easy to send metrics
      • Support for “Metrics 2.0”
      • HBase backend
      • No “roll up”: all points stored for eternity!
    • Cons
      • Read scalability is lacking
      • HBase backend?
  4. Time Series Data

    Measurements are taken at fixed, regular intervals, so a metric cannot have two values at the same point in time. Graphite’s rule is that the last value written for an interval wins.
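The "last value for an interval wins" rule can be sketched in a few lines of Python. This is an illustration of the idea only, not Graphite's code; the `store` function, the sample writes, and the 10-second step are all made up for the example.

```python
def store(points, step):
    """Bucket (timestamp, value) pairs into fixed intervals; the last write wins."""
    slots = {}
    for timestamp, value in points:
        # Align the timestamp to the start of the interval it falls in.
        slot = timestamp - (timestamp % step)
        # A later write to the same slot silently replaces the earlier one.
        slots[slot] = value
    return slots


# Hypothetical writes: three land in the same 10-second interval.
writes = [(1000, 1.0), (1004, 2.0), (1009, 3.0), (1010, 4.0)]
stored = store(writes, step=10)
print(stored)
```

Sending faster than the finest retention interval therefore loses data rather than adding detail.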
  5. Gauge v. Counter

    Graphite does not care; it just stores a value at a point in time. It’s up to you to store what you want and understand how to retrieve it.
  6. Gauge v. Counter

    Gauges usually fit within a fixed range, but only represent state at the time of reading, meaning you can miss spikes. Counters allow more complete history, but can overflow. Use nonNegativeDerivative() to view the changes between points.
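To show what viewing "the changes between points" means for a counter, here is a simplified Python sketch of the idea behind nonNegativeDerivative(). It is not Graphite's implementation: the function name is mine, and it skips the optional wrap-around handling (the maxValue argument) that the real function offers, simply emitting no value when the counter goes backwards.

```python
def non_negative_derivative(values):
    """Return per-interval deltas of a counter, dropping negative deltas."""
    if not values:
        return []
    result = [None]  # the first point has nothing to be compared against
    for prev, cur in zip(values, values[1:]):
        if prev is None or cur is None or cur < prev:
            # Missing data or a counter reset/overflow: emit no value.
            result.append(None)
        else:
            result.append(cur - prev)
    return result


# Hypothetical counter readings; the drop from 15 to 3 is a reset.
deltas = non_negative_derivative([10, 15, 15, 3, 8])
print(deltas)
```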
  7. Metrics

    • Dot separated namespaces:
      • sys.datacenter.zone.host.class.metric
    • Created automatically the first time it’s updated
    • All storage pre-allocated
    • Multiple storage engines
      • Whisper (Flat Files)
      • Ceres
      • Cyanite (based on Cassandra)
  8. Queries / API

    • Ask for a metric
      • sys.datacenter.zone.host.class.metric
    • Ask for all the metrics
      • sys.datacenter.zone.*.class.metric
    • Ask for a combination, mutation, or selection
      • sumSeries(sys.datacenter.zone.*.class.metric)
    • Returns PNG, SVG, JSON, CSV, …
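As a rough sketch of what such a query looks like over HTTP, the Python below builds a URL for graphite-web's render endpoint using its `target`, `format`, and `from` parameters. The host name `graphite.example.com`, the helper name `render_url`, and the one-hour window are assumptions for the example.

```python
from urllib.parse import urlencode


def render_url(base, target, fmt="json", since="-1h"):
    """Build a render URL for one target expression."""
    # urlencode escapes the parentheses and wildcard in the target.
    query = urlencode({"target": target, "format": fmt, "from": since})
    return f"{base}/render?{query}"


# Hypothetical host; the target is the sumSeries() query from the slide.
url = render_url(
    "http://graphite.example.com",
    "sumSeries(sys.datacenter.zone.*.class.metric)",
)
print(url)
```

Switching `fmt` to `csv`, `png`, or `svg` asks for the same data in a different output format.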
  9. Components

    • carbon
      • Route and store metrics
    • whisper
      • Storage file format and utilities
    • graphite-web
      • User-facing interface to Graphite
  10. With Knowledge, Well.

    • Cluster using relays
    • Use SSDs for fast writes
    • Use redundancy because SSDs fail
    • Read Jason Dixon’s book
  11. With Help, WebScale 2.0!!

    • https://github.com/grobian/carbon-c-relay
      • Pass and route metrics to storage
    • https://github.com/dgryski/carbonzipper
      • Map/Reduce metric queries
    • https://github.com/dgryski/carbonapi
      • Intelligent caching layer for JSON/CSV/Raw outputs
  12. Whisper Files

    Every metric is a flat file on disk that’s pre-allocated at creation time to a fixed size. Size is based on the defined retention periods, which we’ll discuss shortly.
  13. Data Compression

    Time series databases allow for prolonged storage. It’s common for metrics to remain for two or more years. To cut costs, aggregations are performed as the data ages.
  14. Storage: Schema

    Backend configuration for allocating on-disk storage for metrics. Can only be set at metric creation. Define retentions.

        [mysql]
        pattern = ^mysql\.
        retentions = 10s:2d,60s:14d,30m:2y

        [default]
        pattern = .*
        retentions = 60s:14d,30m:2y
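Because the file is pre-allocated, a retention string translates directly into a point count and a file size. The Python sketch below does that arithmetic for the two schemas above. It assumes the Whisper layout of a 16-byte header, 12 bytes of metadata per archive, and 12 bytes per stored point, and treats a year as 365 days; take those constants as my assumptions rather than something the slides state. The function names are hypothetical, and every field is expected to carry a unit suffix.

```python
# Seconds per unit suffix; a year is assumed to be 365 days.
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def seconds(spec):
    """Convert a spec such as '10s' or '2y' to seconds."""
    return int(spec[:-1]) * UNITS[spec[-1]]


def archive_points(retentions):
    """Return the number of points in each archive of a retentions string."""
    points = []
    for archive in retentions.split(","):
        precision, duration = archive.split(":")
        points.append(seconds(duration) // seconds(precision))
    return points


def whisper_size(retentions):
    """Estimate the pre-allocated file size in bytes (assumed layout)."""
    points = archive_points(retentions)
    return 16 + 12 * len(points) + 12 * sum(points)


mysql_points = archive_points("10s:2d,60s:14d,30m:2y")
mysql_bytes = whisper_size("10s:2d,60s:14d,30m:2y")
default_bytes = whisper_size("60s:14d,30m:2y")
print(mysql_points, mysql_bytes, default_bytes)
```

Under those assumptions each mysql metric costs a little under 1 MB of disk, whether or not it ever receives a second data point.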
  15. Storage: AGGREGATIONS

    Configuration for handling how metrics traverse the retention boundaries.

        [default]
        pattern = .*
        xFilesFactor = 0.5
        aggregationMethod = average
  16. xFilesFactor

    Float between 0 and 1 representing the fraction of non-null points required to roll up to a non-null value.

        [default]
        pattern = .*
        xFilesFactor = 0.5
        aggregationMethod = average
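A small Python sketch of how xFilesFactor gates a roll-up, using the average method from the config above. The `roll_up` name and the sample windows are invented for illustration; the windows have six points because six 10-second points make up one 60-second point.

```python
def roll_up(points, x_files_factor=0.5):
    """Average a window of points, or return None if too many are missing."""
    known = [p for p in points if p is not None]
    if not known:
        return None
    # Fraction of the window that actually holds data.
    if len(known) / len(points) < x_files_factor:
        return None
    return sum(known) / len(known)


# Two of six points present: 0.33 is below 0.5, so the roll-up is null.
sparse = roll_up([1.0, None, 3.0, None, None, None])
# Three of six points present: 0.5 meets the threshold, so we get a value.
dense = roll_up([1.0, 2.0, 3.0, None, None, None])
print(sparse, dense)
```

For metrics that report only occasionally, a lower xFilesFactor keeps sparse data from vanishing at the first retention boundary.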
  17. Aggregators

    Functions available to turn multiple values into a single value for retention roll ups.

    ‣ average - average of all values
    ‣ min - minimum of set
    ‣ max - maximum of set
    ‣ sum - sum of set
    ‣ last - take the last value
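To make the difference between the five methods concrete, this Python sketch applies each one to the same hypothetical window of three points. The `AGGREGATORS` table is my own illustration, not code from Whisper.

```python
# One function per aggregation method named on the slide.
AGGREGATORS = {
    "average": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
    "sum": sum,
    "last": lambda values: values[-1],
}

# Hypothetical high-resolution points being rolled into one coarse point.
window = [2.0, 8.0, 5.0]
results = {name: fn(window) for name, fn in AGGREGATORS.items()}
print(results)
```

The choice matters: rolling up a per-interval count with average understates totals, and rolling up a latency peak with average hides the spike you stored it to find.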
  18. Sending Data

    Getting your data into Graphite is as simple as sending the metric string to the relevant carbon host and port!

        echo "metric.name.as.dotted.path value epoch" | nc graphite 2003
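The same plaintext line can be sent from Python with nothing but the standard library. This is a minimal sketch: the host `graphite` and port 2003 come from the command above, while the function names and the sample metric path are assumptions.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext line: 'path value epoch' plus a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="graphite", port=2003, timestamp=None):
    """Open a TCP connection to carbon and send a single metric line."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Hypothetical metric; call send_metric(...) with the same arguments to ship it.
example = format_metric("sys.dc1.zone1.web01.cpu.load", 0.42, 1445040000)
print(example, end="")
```

The trailing newline is what terminates a metric, so several lines can be written over one connection.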
  19. There are plenty of libraries that encapsulate most or all of this incredibly complicated task for every web-scale programming language.
  20. Advanced Tricks

        # Same as last slide & set Y-Minimum to 0
        color(
          alias(
            secondYAxis(
              maxSeries(general.es.logsearch-20*.indices.docs.count)
            ),
            "Max Docs per Node"),
          "green")