InfluxDB - a distributed events and time series database

Slide 1

Slide 1 text

InfluxDB - a distributed time series, metrics, and events database
Paul Dix
paul@influxdb.com
@pauldix @influxdb

Slide 2

Slide 2 text

YC (W13), 3 people full time: Todd Persen, John Shahid, Paul Dix (me)

Slide 3

Slide 3 text

What it’s for…

Slide 4

Slide 4 text

Metrics

Slide 5

Slide 5 text

Time Series

Slide 6

Slide 6 text

Analytics

Slide 7

Slide 7 text

Events

Slide 8

Slide 8 text

Can’t you just use a regular DB?

Slide 9

Slide 9 text

order by time?

Slide 10

Slide 10 text

Doesn’t Scale

Slide 11

Slide 11 text

Example from metrics:
100 measurements per host * 10 hosts * 8640 per day (once every 10s) * 365 days
= 3,153,600,000 records per year
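
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative and not from the talk.

```python
# Back-of-the-envelope check of the record count on the slide.
measurements_per_host = 100
hosts = 10
seconds_per_day = 24 * 60 * 60
samples_per_day = seconds_per_day // 10  # one sample every 10 s
days = 365

records_per_year = measurements_per_host * hosts * samples_per_day * days
print(f"{samples_per_day} samples per day, {records_per_year:,} records per year")
```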

Slide 12

Slide 12 text

Have fun with that table…

Slide 13

Slide 13 text

But wait, we’ll just keep the summaries!

Slide 14

Slide 14 text

1h averages = 8,760,000 per year
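
As a rough check, assuming the same 100 measurements on each of 10 hosts as in the earlier example, the hourly figure and the reduction compared with raw 10-second samples work out as follows. Names are illustrative.

```python
# Rows per year when only one-hour averages are kept,
# compared with raw samples taken every 10 seconds.
series = 100 * 10                      # 100 measurements on each of 10 hosts
hourly_rows_per_year = series * 24 * 365
raw_rows_per_year = series * (86400 // 10) * 365
reduction = raw_rows_per_year // hourly_rows_per_year

print(f"{hourly_rows_per_year:,} hourly rows, {reduction}x fewer than raw")
```

The table becomes manageable, but each hourly row stands in for 360 raw samples, and that is the detail the next slide says gets lost.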

Slide 15

Slide 15 text

Lose Detail and AdHoc Queryability

Slide 16

Slide 16 text

So let’s use Cassandra, HBase, or Scaleasaurus!

Slide 17

Slide 17 text

Too much application code and complexity

Slide 18

Slide 18 text

Application logic and scripts to compute summaries

Slide 19

Slide 19 text

Application level logic for balancing

Slide 20

Slide 20 text

No data locality for AdHoc queries

Slide 21

Slide 21 text

And then there’s more…

Slide 22

Slide 22 text

Web services

Slide 23

Slide 23 text

Libraries for web services

Slide 24

Slide 24 text

Data collection

Slide 25

Slide 25 text

Visualization

Slide 26

Slide 26 text

“Building an application with an analytics component today is like building a web application in 1998. You spend months building infrastructure before getting to the actual thing you want to build.”
–Paul Dix

Slide 27

Slide 27 text

Analytics should be about analyzing and interpreting data, not the infrastructure to store and process it.

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

HTTP API
Web services built in

Slide 30

Slide 30 text

HTTP API (writes)
curl -X POST \
  'http://localhost:8086/db/mydb/series?u=paul&p=pass' \
  -d '[{"name":"foo", "columns":["val"], "points": [[3]]}]'
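
The same write can be prepared from Python using only the standard library. This is a minimal sketch: `build_write_request` is a hypothetical helper, not part of any InfluxDB client, and it only builds the request without sending it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

def build_write_request(base_url, db, user, password, series):
    """Build (but do not send) a POST to the /db/<db>/series endpoint from the slide."""
    query = urlencode({"u": user, "p": password})
    url = f"{base_url}/db/{db}/series?{query}"
    body = json.dumps(series).encode("utf-8")
    return Request(url, data=body, method="POST")

req = build_write_request(
    "http://localhost:8086", "mydb", "paul", "pass",
    [{"name": "foo", "columns": ["val"], "points": [[3]]}],
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it, which requires a server listening on that address.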

Slide 31

Slide 31 text

Data (with timestamp)
[
  {
    "name": "cpu",
    "columns": ["time", "value", "host"],
    "points": [
      [1395168540, 56.7, "foo.influxdb.com"],
      [1395168540, 43.9, "bar.influxdb.com"]
    ]
  }
]
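
Application code usually holds records as dictionaries, so a small packing step is needed to reach the name/columns/points shape above. A sketch, with `to_series` as a hypothetical helper name:

```python
import json

def to_series(name, records, columns):
    """Pack a list of dicts into the name/columns/points shape shown on the slide."""
    return {
        "name": name,
        "columns": list(columns),
        "points": [[record[column] for column in columns] for record in records],
    }

records = [
    {"time": 1395168540, "value": 56.7, "host": "foo.influxdb.com"},
    {"time": 1395168540, "value": 43.9, "host": "bar.influxdb.com"},
]
payload = [to_series("cpu", records, ["time", "value", "host"])]
print(json.dumps(payload, indent=2))
```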

Slide 32

Slide 32 text

HTTP API (queries)
curl 'http://localhost:8086/db/mydb/series?u=paul&p=pass&q=.'

Slide 33

Slide 33 text

SQL-ish
select * from events where time > now() - 1h
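
A query like this contains spaces, `*`, `>` and parentheses, so it needs URL-encoding before it goes into the `q` parameter of the HTTP API. A sketch using the standard library; the host, database and credentials repeat the earlier slides, and `build_query_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def build_query_url(base_url, db, user, password, query):
    """Return a GET URL with the query text URL-encoded into the q parameter."""
    params = urlencode({"u": user, "p": password, "q": query})
    return f"{base_url}/db/{db}/series?{params}"

query = "select * from events where time > now() - 1h"
url = build_query_url("http://localhost:8086", "mydb", "paul", "pass", query)
print(url)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["q"][0]
print(decoded)
```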

Slide 34

Slide 34 text

SQL-ish
select * from "series with weird chars ()*@#0982#$" where time > now() - 1h

Slide 35

Slide 35 text

Where Regex
select line from application_logs
where line =~ /.*ERROR.*/
and time > "2014-03-01" and time < "2014-03-03"

Slide 36

Slide 36 text

Only scans the time range
Series and time are the primary index
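
Why an index on series and time limits the scan can be illustrated with a sorted list and binary search. This is a conceptual sketch only: the in-memory layout is hypothetical and is not InfluxDB's storage format.

```python
from bisect import bisect_left

# Hypothetical in-memory layout: per series, timestamps are kept sorted
# alongside their values. Not InfluxDB's actual storage format.
index = {
    "events": {
        "times": [100, 160, 220, 280],
        "values": ["a", "b", "c", "d"],
    },
}

def range_scan(series, start, end):
    """Return (time, value) pairs with start <= time < end."""
    entry = index[series]
    lo = bisect_left(entry["times"], start)  # first point at or after start
    hi = bisect_left(entry["times"], end)    # first point at or after end
    return list(zip(entry["times"][lo:hi], entry["values"][lo:hi]))

print(range_scan("events", 150, 250))
```

Both boundaries are found by binary search, so only the points inside the requested window are read.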

Slide 37

Slide 37 text

Work with many series…

Slide 38

Slide 38 text

Select from Regex
select * from /stats\.cpu\..*/ limit 1
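
The effect of selecting from a regex can be mimicked in plain Python. The sketch below assumes that `limit 1` keeps the most recent point of each matching series; the series names and values are made up for illustration.

```python
import re

# Made-up series, each a list of (time, value) points in ascending time order.
series = {
    "stats.cpu.host1": [(1, 0.5), (2, 0.7)],
    "stats.cpu.host2": [(1, 0.2), (2, 0.1)],
    "stats.mem.host1": [(1, 512), (2, 530)],
}

def select_from_regex(series, pattern, limit):
    """Return the last `limit` points of every series whose name matches pattern."""
    rx = re.compile(pattern)
    return {
        name: points[-limit:]
        for name, points in series.items()
        if rx.search(name)
    }

latest = select_from_regex(series, r"stats\.cpu\..*", 1)
print(latest)
```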

Slide 39

Slide 39 text

Downsampling on the fly…

Slide 40

Slide 40 text

Aggregates
select percentile(value, 90) from response_times group by time(10m) where time > now() - 1d
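
What a grouped percentile computes can be sketched in a few lines. The example assumes the nearest-rank definition of a percentile, which may differ from the method InfluxDB uses, and the sample points are made up.

```python
import math
from collections import defaultdict

def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def downsample(points, bucket_seconds, p):
    """Group (timestamp, value) points into fixed time buckets and take a percentile of each."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)
    return {start: percentile(vals, p) for start, vals in sorted(buckets.items())}

points = [(0, 10), (60, 20), (120, 30), (599, 40), (600, 5), (900, 15)]
result = downsample(points, 600, 90)  # 600 s = 10 minute buckets
print(result)
```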

Slide 41

Slide 41 text

Continuous Downsampling…

Slide 42

Slide 42 text

Continuous queries (summaries)
select count(page_id) from events group by time(1h), page_id into events.[page_id]
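
The result of this continuous query can be pictured as one summary series per page id, holding hourly counts. The sketch below computes that in a single batch over made-up events; a real continuous query runs as data arrives, and the function name is hypothetical.

```python
from collections import Counter

def hourly_counts_by_page(events):
    """Count (timestamp, page_id) events per hour, keyed by target series events.<page_id>."""
    counts = Counter()
    for ts, page_id in events:
        hour = ts - ts % 3600  # start of the hour bucket, in seconds
        counts[(f"events.{page_id}", hour)] += 1
    summaries = {}
    for (series, hour), n in sorted(counts.items()):
        summaries.setdefault(series, []).append((hour, n))
    return summaries

events = [(10, 67), (20, 67), (3700, 67), (30, 12)]
summaries = hourly_counts_by_page(events)
print(summaries)
```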

Slide 43

Slide 43 text

Series per page id
select count from events.67 where time > now() - 7d

Slide 44

Slide 44 text

Continuous queries (regex downsampling)
select percentile(value, 90) as value from /stats\.*/ group by time(5m) into percentile.90.:series_name
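
The `:series_name` placeholder appears to expand to the name of each matching source series, giving one target series per source. A sketch of that naming rule, with a hypothetical helper and made-up series names:

```python
import re

def target_series(source_names, pattern, template):
    """Map each source series matching pattern to its target name.

    The :series_name placeholder in template is replaced by the source name.
    """
    rx = re.compile(pattern)
    return {
        name: template.replace(":series_name", name)
        for name in source_names
        if rx.search(name)
    }

names = ["stats.cpu.host1", "stats.mem.host1", "application_logs"]
targets = target_series(names, r"stats\..*", "percentile.90.:series_name")
print(targets)
```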

Slide 45

Slide 45 text

Percentile series per host
select value from percentile.90.stats.cpu.host1 where time > now() - 4h

Slide 46

Slide 46 text

Denormalization for performance

Slide 47

Slide 47 text

Range scans all user events for last hour
select * from events where user_id = 3 and time > now() - 1h

Slide 48

Slide 48 text

Continuous queries (fan out)
select * from events into events.[user_id]
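
Fan-out copies every event into a series named after one of its column values, so a later query for one user reads only that user's series. A batch sketch over made-up events; the helper name is hypothetical and a real continuous query would do this as writes arrive.

```python
from collections import defaultdict

def fan_out(events, key):
    """Copy each event into a per-key series named events.<value of key>."""
    series = defaultdict(list)
    for event in events:
        series[f"events.{event[key]}"].append(event)
    return dict(series)

events = [
    {"time": 1, "user_id": 3, "type": "click"},
    {"time": 2, "user_id": 7, "type": "view"},
    {"time": 3, "user_id": 3, "type": "view"},
]
by_user = fan_out(events, "user_id")
print(by_user["events.3"])
```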

Slide 49

Slide 49 text

Series per user id
select * from events.3 where time > now() - 1h

Slide 50

Slide 50 text

Distributed
Scale out, data locality, high availability

Slide 51

Slide 51 text

Raft for metadata
We owe Ben Johnson a beer or three…

Slide 52

Slide 52 text

Protobuf + TCP for queries, writes

Slide 53

Slide 53 text

Scalable
Have billions of points in 1 series* or a million different series

Slide 54

Slide 54 text

Libraries
Go, Ruby, Javascript, Python, Node.js, Clojure, Java, Perl, Haskell, R, Scala, CLI (ruby and node)

Slide 55

Slide 55 text

Visualization

Slide 56

Slide 56 text

Built-in UI

Slide 57

Slide 57 text

Grafana

Slide 58

Slide 58 text

Javascript library + D3, HighCharts, Rickshaw, NVD3, etc.
Definitely more to do here!

Slide 59

Slide 59 text

Data Collection
CollectD Proxy, StatsD backend, Carbon ingestion, OpenTSDB (soon)

Slide 60

Slide 60 text

Coming Soon

Slide 61

Slide 61 text

ugh, Documentation

Slide 62

Slide 62 text

Series Metadata

Slide 63

Slide 63 text

Binary Protocol

Slide 64

Slide 64 text

Pubsub
select * from some_series where host = "serverA" into subscription()
select percentile(value, 90) from some_series group by time(1m) into subscription()