InfluxDB - at NoVA MAMaL

Slide 1

Slide 1 text

InfluxDB - an open source distributed time series, metrics, and events database
Paul Dix
paul@influxdb.com
@pauldix @influxdb

Slide 2

Slide 2 text

YC (W13). 4 people full time. Hiring more!

Slide 3

Slide 3 text

What it’s for…

Slide 4

Slide 4 text

Metrics

Slide 5

Slide 5 text

Time Series

Slide 6

Slide 6 text

Analytics

Slide 7

Slide 7 text

Events

Slide 8

Slide 8 text

Can’t you just use a regular DB?

Slide 9

Slide 9 text

order by time?

Slide 10

Slide 10 text

Doesn’t Scale

Slide 11

Slide 11 text

Example from metrics:
100 measurements per host * 10 hosts * 8640 per day (once every 10s) * 365 days
= 3,153,600,000 records per year
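The multiplication on this slide can be checked with a few lines of Python. This is only a restatement of the slide's numbers; the variable names are chosen here for illustration.

```python
# Raw records per year, using the figures from the slide.
measurements_per_host = 100
hosts = 10
sample_interval_s = 10                                  # one sample every 10 seconds
samples_per_day = (24 * 60 * 60) // sample_interval_s   # 8640
days_per_year = 365

records_per_year = (
    measurements_per_host * hosts * samples_per_day * days_per_year
)
print(f"{records_per_year:,} records per year")
```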

Slide 12

Slide 12 text

Have fun with that table…

Slide 13

Slide 13 text

But wait, we’ll just keep the summaries!

Slide 14

Slide 14 text

1h averages = 8,760,000 per year
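The hourly figure follows from the same 1,000 series (100 measurements on each of 10 hosts) with 24 points per day instead of 8640. A small sketch of that arithmetic, with illustrative variable names:

```python
# Hourly summaries per year versus raw samples per year.
series = 100 * 10                          # measurements per host * hosts
hourly_points_per_year = series * 24 * 365
raw_points_per_year = series * 8640 * 365

# How many raw samples each hourly average replaces.
reduction_factor = raw_points_per_year // hourly_points_per_year
print(f"{hourly_points_per_year:,} summaries per year, {reduction_factor}x fewer rows")
```

Each hourly average stands in for 360 raw samples, which is where the detail gets lost.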

Slide 15

Slide 15 text

Lose Detail and AdHoc Queryability

Slide 16

Slide 16 text

So let’s use Cassandra, HBase, or Scaleasaurus!

Slide 17

Slide 17 text

Too much application code and complexity

Slide 18

Slide 18 text

Application logic and scripts to compute summaries

Slide 19

Slide 19 text

Application level logic for balancing

Slide 20

Slide 20 text

No data locality for AdHoc queries

Slide 21

Slide 21 text

And then there’s more…

Slide 22

Slide 22 text

Web services

Slide 23

Slide 23 text

Libraries for web services

Slide 24

Slide 24 text

Data collection

Slide 25

Slide 25 text

Visualization

Slide 26

Slide 26 text

“Building an application with an analytics component today is like building a web application in 1998. You spend months building infrastructure before getting to the actual thing you want to build.”
- Paul Dix

Slide 27

Slide 27 text

Analytics should be about analyzing and interpreting data, not the infrastructure to store and process it.

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

HTTP API: web services built in

Slide 30

Slide 30 text

HTTP API (writes)
curl -X POST \
  'http://localhost:8086/db/mydb/series?u=paul&p=pass' \
  -d '[{"name":"foo", "columns":["val"], "points": [[3]]}]'
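The same write can be sketched in Python with only the standard library. The URL shape, credentials, and JSON body are taken from the curl command on the slide; the helper name `build_write_request` is made up for this example, and the request is only constructed here, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_write_request(base_url, db, user, password, series):
    """Build (but do not send) a POST request for the write endpoint.

    `series` is a list of {"name", "columns", "points"} dicts, as on the slide.
    """
    query = urllib.parse.urlencode({"u": user, "p": password})
    url = f"{base_url}/db/{db}/series?{query}"
    body = json.dumps(series).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


req = build_write_request(
    "http://localhost:8086", "mydb", "paul", "pass",
    [{"name": "foo", "columns": ["val"], "points": [[3]]}],
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a server listening on that address.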

Slide 31

Slide 31 text

Data (with timestamp)
[
  {
    "name": "cpu",
    "columns": ["time", "value", "host"],
    "points": [
      [1395168540, 56.7, "foo.influxdb.com"],
      [1395168540, 43.9, "bar.influxdb.com"]
    ]
  }
]
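Application data usually starts out as one record per reading, so it has to be reshaped into the columns-plus-points layout shown on this slide. A minimal sketch of that reshaping follows; `rows_to_series` is a hypothetical helper, not part of any InfluxDB client library.

```python
import json


def rows_to_series(name, columns, rows):
    """Convert a list of dicts into the columns/points payload layout."""
    points = [[row[col] for col in columns] for row in rows]
    return [{"name": name, "columns": list(columns), "points": points}]


rows = [
    {"time": 1395168540, "value": 56.7, "host": "foo.influxdb.com"},
    {"time": 1395168540, "value": 43.9, "host": "bar.influxdb.com"},
]
payload = rows_to_series("cpu", ["time", "value", "host"], rows)
print(json.dumps(payload, indent=2))
```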

Slide 32

Slide 32 text

HTTP API (queries)
curl 'http://localhost:8086/db/mydb/series?u=paul&p=pass&q=.'
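Because the query travels in the `q` URL parameter, it needs to be URL-encoded. The sketch below builds such a URL with the standard library; the helper name `build_query_url` is invented for this example, and the query string is the one from the SQL-ish slide that follows.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(base_url, db, user, password, query):
    """Return a GET URL for the series endpoint with the query URL-encoded."""
    params = urlencode({"u": user, "p": password, "q": query})
    return f"{base_url}/db/{db}/series?{params}"


query = "select * from events where time > now() - 1h"
url = build_query_url("http://localhost:8086", "mydb", "paul", "pass", query)
print(url)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["q"][0]
```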

Slide 33

Slide 33 text

SQL-ish
select * from events where time > now() - 1h

Slide 34

Slide 34 text

SQL-ish
select * from "series with weird chars ()*@#0982#$" where time > now() - 1h

Slide 35

Slide 35 text

Where Regex
select line from application_logs
where line =~ /.*ERROR.*/
and time > "2014-03-01" and time < "2014-03-03"

Slide 36

Slide 36 text

Only scans the time range. Series and time are the primary index.
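One way to picture "series and time are the primary index" is a map from series name to points kept sorted by time, so a time-bounded query touches only the matching slice. The sketch below is a toy in-memory model of that idea and says nothing about InfluxDB's actual storage engine; the class and method names are made up.

```python
import bisect


class SeriesIndex:
    """Toy index: series name -> timestamps (sorted) and parallel values."""

    def __init__(self):
        self._times = {}
        self._values = {}

    def write(self, series, time, value):
        times = self._times.setdefault(series, [])
        values = self._values.setdefault(series, [])
        pos = bisect.bisect_right(times, time)   # keep timestamps sorted
        times.insert(pos, time)
        values.insert(pos, value)

    def scan(self, series, after, before):
        """Return points with after < time < before, without a full scan."""
        times = self._times.get(series, [])
        values = self._values.get(series, [])
        lo = bisect.bisect_right(times, after)   # first time > after
        hi = bisect.bisect_left(times, before)   # first time >= before
        return list(zip(times[lo:hi], values[lo:hi]))


index = SeriesIndex()
for t in (5, 1, 3, 2, 4):
    index.write("events", t, f"event-{t}")
result = index.scan("events", after=1, before=5)
print(result)
```

The two binary searches find the boundaries of the time range, so the cost depends on the size of the range rather than the size of the series.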

Slide 37

Slide 37 text

Aggregate on the fly…

Slide 38

Slide 38 text

Aggregates
select percentile(value, 90) from response_times
group by time(10m)
where time > now() - 1d
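To make the semantics concrete, here is a rough Python model of what a 90th-percentile query grouped into 10-minute buckets computes. The nearest-rank percentile definition is an assumption made for this sketch; the database may use a different definition, and the function names are illustrative.

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile (an assumed definition, for illustration)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def percentile_by_time(points, p, bucket_s):
    """Group (time, value) points into fixed-width buckets, one percentile each."""
    buckets = defaultdict(list)
    for t, v in points:
        buckets[t - t % bucket_s].append(v)
    return {start: percentile(vals, p) for start, vals in sorted(buckets.items())}


# Ten points in the first 10-minute bucket, two in the second.
points = [(i * 60, i + 1) for i in range(10)] + [(600, 5), (1199, 7)]
result = percentile_by_time(points, 90, bucket_s=600)
print(result)
```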

Slide 39

Slide 39 text

Continuous Aggregation…

Slide 40

Slide 40 text

Continuous queries (summaries)
select count(page_id) from events
group by time(1h), page_id
into events.[page_id]
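The effect of this continuous query can be modeled as: bucket events by hour and by page_id, count them, and write each page's counts to its own series named `events.<page_id>`. The sketch below is an offline model under that reading; a real continuous query runs inside the database as data arrives, and the function name is invented.

```python
from collections import Counter, defaultdict


def hourly_counts_by_page(events):
    """Count events per (page_id, hour) and fan the counts out per page."""
    counts = Counter()
    for event in events:
        hour_start = event["time"] - event["time"] % 3600
        counts[(f"events.{event['page_id']}", hour_start)] += 1

    series = defaultdict(list)
    for (name, hour_start), count in sorted(counts.items()):
        series[name].append((hour_start, count))
    return dict(series)


events = [
    {"time": 0, "page_id": 67},
    {"time": 100, "page_id": 67},
    {"time": 50, "page_id": 5},
    {"time": 3700, "page_id": 67},
]
summaries = hourly_counts_by_page(events)
print(summaries)
```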

Slide 41

Slide 41 text

Series per page id
select count from events.67 where time > now() - 7d

Slide 42

Slide 42 text

Work with many series…

Slide 43

Slide 43 text

Select from Regex
select * from /stats\.cpu\..*/ limit 1

Slide 44

Slide 44 text

Continuous queries (regex aggregating)
select percentile(value, 90) as value from /stats\..*/
group by time(5m)
into percentile.90.:series_name

Slide 45

Slide 45 text

Merge with Regex
select percentile(value, 90) from merge(/stats\.cpu_load\..*/)
group by time(10m)
where time > now() - 4h
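Conceptually, `merge` over a regex treats every matching series as one combined, time-ordered stream before aggregating. A small Python sketch of that idea, assuming each series is already sorted by time; the function name and sample data are illustrative only.

```python
import heapq
import re


def merge_series(series_by_name, pattern):
    """Merge all series whose name matches `pattern` into one time-ordered list."""
    regex = re.compile(pattern)
    matching = [pts for name, pts in series_by_name.items() if regex.search(name)]
    # Each input list is sorted by time, so a k-way merge keeps the order.
    return list(heapq.merge(*matching, key=lambda point: point[0]))


series_by_name = {
    "stats.cpu_load.host1": [(1, 0.5), (3, 0.7)],
    "stats.cpu_load.host2": [(2, 0.9)],
    "stats.mem.host1": [(1, 100.0)],
}
merged = merge_series(series_by_name, r"stats\.cpu_load\..*")
print(merged)
```

An aggregate such as a percentile would then be computed over `merged` rather than over each host's series separately.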

Slide 46

Slide 46 text

Percentile series per host
select value from percentile.90.stats.cpu.host1 where time > now() - 4h

Slide 47

Slide 47 text

Denormalization for performance

Slide 48

Slide 48 text

Range scans all user events for last hour
select * from events where user_id = 3 and time > now() - 1h

Slide 49

Slide 49 text

Continuous queries (fan out)
select * from events into events.[user_id]

Slide 50

Slide 50 text

Series per user id
select * from events.3 where time > now() - 1h

Slide 51

Slide 51 text

Distributed: scale out, data locality, high availability

Slide 52

Slide 52 text

Raft for metadata (goraft now, streaming soon)

Slide 53

Slide 53 text

Protobuf + TCP for queries, writes

Slide 54

Slide 54 text

Scalable: hundreds of thousands of series (soon millions)

Slide 55

Slide 55 text

Libraries: Go, Ruby, Javascript, Python, Node.js, Clojure, Java, Perl, Haskell, R, Scala, CLI (ruby and node)

Slide 56

Slide 56 text

Visualization

Slide 57

Slide 57 text

Built-in UI

Slide 58

Slide 58 text

Grafana

Slide 59

Slide 59 text

Javascript library + D3, HighCharts, Rickshaw, NVD3, etc.
Definitely more to do here!

Slide 60

Slide 60 text

Data Collection: CollectD, StatsD backend, Carbon ingestion

Slide 61

Slide 61 text

Coming Soon

Slide 62

Slide 62 text

New Clustering Implementation

Slide 63

Slide 63 text

Two Parts

Slide 64

Slide 64 text

Broker

Slide 65

Slide 65 text

Data Node

Slide 66

Slide 66 text

How writes work

Slide 67

Slide 67 text

[Diagram: a write arrives at any server]

Slide 68

Slide 68 text

[Diagram: a write arrives at any server; three brokers form a streaming Raft cluster]

Slide 69

Slide 69 text

Writes are CP (consistent and partition tolerant, in CAP terms)

Slide 70

Slide 70 text

[Diagram: a write arrives at any server; three brokers; one data node]

Slide 71

Slide 71 text

[Diagram: a write arrives at any server; three brokers; two data nodes, if replication factor = 2]

Slide 72

Slide 72 text

[Diagram: a write arrives at any server; three brokers; six data nodes]

Slide 73

Slide 73 text

How Queries Work

Slide 74

Slide 74 text

[Diagram: any server and six data nodes]
select mean(cpu_load)
where data_center = 'us-west'
and host = 'serverA'
and time > now() - 24h
group by time(10m)

Slide 75

Slide 75 text

[Diagram: any server and six data nodes]
Compute locally
select mean(cpu_load)
where data_center = 'us-west'
and host = 'serverA'
and time > now() - 24h
group by time(10m)

Slide 76

Slide 76 text

[Diagram: any server and six data nodes]
Send summary ticks
select mean(cpu_load)
where data_center = 'us-west'
and host = 'serverA'
and time > now() - 24h
group by time(10m)
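One plausible reading of "compute locally, send summary ticks" is that each data node reduces its own points to a small partial aggregate per time bucket, and the server that received the query combines those partials. The sketch below assumes the partial for a mean is a (sum, count) pair per bucket; that representation, and the function names, are assumptions for illustration rather than a description of InfluxDB's wire format.

```python
def local_partials(points, bucket_s):
    """On a data node: reduce (time, value) points to {bucket: (sum, count)}."""
    partials = {}
    for t, v in points:
        bucket = t - t % bucket_s
        total, count = partials.get(bucket, (0.0, 0))
        partials[bucket] = (total + v, count + 1)
    return partials


def combine_means(partials_from_nodes):
    """On the coordinating server: merge partials and finish the mean."""
    totals = {}
    for partials in partials_from_nodes:
        for bucket, (total, count) in partials.items():
            acc_total, acc_count = totals.get(bucket, (0.0, 0))
            totals[bucket] = (acc_total + total, acc_count + count)
    return {b: total / count for b, (total, count) in sorted(totals.items())}


node_a = local_partials([(0, 1.0), (10, 3.0)], bucket_s=600)
node_b = local_partials([(20, 5.0), (600, 4.0)], bucket_s=600)
means = combine_means([node_a, node_b])
print(means)
```

Sending sums and counts rather than per-node means matters: averaging the two node means for the first bucket would give 3.5, while the true mean of the three points is 3.0.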

Slide 77

Slide 77 text

Clustering Goal: 1-2M values per second

Slide 78

Slide 78 text

Potential cluster size: 3-5 brokers, 50 data nodes
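Taken together with the 1-2M values per second goal, this cluster size implies a rough per-node write rate. The sketch below assumes writes are spread evenly across data nodes and uses the replication factor of 2 from the earlier write-path slide as an example; both are assumptions, not stated targets.

```python
# Rough per-data-node write rate implied by the clustering goal.
goal_values_per_s = (1_000_000, 2_000_000)
data_nodes = 50
replication_factor = 2   # example value from the write-path slide

# Even spread, each value stored once.
per_node = [goal // data_nodes for goal in goal_values_per_s]

# Even spread, each value stored on `replication_factor` data nodes.
per_node_replicated = [
    goal * replication_factor // data_nodes for goal in goal_values_per_s
]
print(per_node, per_node_replicated)
```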

Slide 79

Slide 79 text

Binary Protocol

Slide 80

Slide 80 text

Pubsub
select * from some_series
where host = 'serverA'
into subscription()

select percentile(value, 90) from some_series
group by time(1m)
into subscription()