Slide 1

Slide 1 text

InfluxDB - an open source distributed time series database
Paul Dix @pauldix paul@influxdb.com

Slide 2

Slide 2 text

About me…

Slide 3

Slide 3 text

Microsoft, failed startup, Air Force Space Command, McAfee, EastMedia, Mint Digital, KGB (kind of failed startup), failed startup, Benchmark Solutions (failed finance startup), Thomson Reuters, InfluxDB

Slide 5

Slide 5 text

Organizer, NYC Machine Learning (4900+ members)

Slide 6

Slide 6 text

Series Editor - “Data & Analytics”

Slide 7

Slide 7 text

Y Combinator (W13)

Slide 8

Slide 8 text

Time series?

Slide 9

Slide 9 text

Metrics

Slide 10

Slide 10 text

Time Series

Slide 11

Slide 11 text

Analytics

Slide 12

Slide 12 text

Events

Slide 13

Slide 13 text

Measurements AND Events Over Time

Slide 14

Slide 14 text

Data model
• Databases
• Time series (or tables, but you can have millions)
• Points (or rows, but column oriented)

Slide 15

Slide 15 text

Data
[
  {
    "name": "cpu",
    "columns": ["time", "sequence_number", "value", "host"],
    "points": [
      [1395168540, 1, 56.7, "foo.influxdb.com"],
      [1395168540, 2, 43.9, "bar.influxdb.com"]
    ]
  }
]
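The payload above maps naturally onto a small Go type. The following is a minimal sketch, not InfluxDB's own client code: the names Series and encodeSeries are hypothetical, and only the JSON field names come from the slide.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Series mirrors one element of the JSON array on the slide.
// The type name is an assumption; the JSON keys are from the slide.
type Series struct {
	Name    string          `json:"name"`
	Columns []string        `json:"columns"`
	Points  [][]interface{} `json:"points"`
}

// encodeSeries marshals a batch of series into the payload format.
func encodeSeries(batch []Series) (string, error) {
	b, err := json.Marshal(batch)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	batch := []Series{{
		Name:    "cpu",
		Columns: []string{"time", "sequence_number", "value", "host"},
		// Each point is one row; values line up with Columns by position.
		Points: [][]interface{}{
			{1395168540, 1, 56.7, "foo.influxdb.com"},
			{1395168540, 2, 43.9, "bar.influxdb.com"},
		},
	}}
	out, err := encodeSeries(batch)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Points are kept as untyped rows because a single series can mix numbers and strings across columns.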

Slide 16

Slide 16 text

Everything is indexed by series and time.

Slide 17

Slide 17 text

Simple Install
No external dependencies

Slide 18

Slide 18 text

brew update
brew install influxdb

Slide 19

Slide 19 text

RPM, Debian packages
http://influxdb.org/download

Slide 20

Slide 20 text

http://localhost:8083

Slide 25

Slide 25 text

HTTP API
Web services built in

Slide 26

Slide 26 text

HTTP API (writes)
curl -X POST \
  'http://localhost:8086/db/mydb/series?u=paul&p=pass' \
  -d '[{"name":"foo", "columns":["val"], "points": [[3]]}]'
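The same write can be issued from Go with the standard library. This is a sketch under assumptions: newWriteRequest is a hypothetical helper, and the host, database, and credentials are simply the values from the curl call above. It builds the request without sending it.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// newWriteRequest builds, but does not send, a POST like the curl call above.
func newWriteRequest(baseURL, db, user, pass, body string) (*http.Request, error) {
	params := url.Values{}
	params.Set("u", user)
	params.Set("p", pass)
	// Encode sorts keys, so the query string reads p=...&u=...; order is not significant.
	endpoint := fmt.Sprintf("%s/db/%s/series?%s", baseURL, url.PathEscape(db), params.Encode())
	return http.NewRequest(http.MethodPost, endpoint, strings.NewReader(body))
}

func main() {
	body := `[{"name":"foo", "columns":["val"], "points": [[3]]}]`
	req, err := newWriteRequest("http://localhost:8086", "mydb", "paul", "pass", body)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// To actually send it against a running server: http.DefaultClient.Do(req)
}
```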

Slide 27

Slide 27 text

Data (with timestamp)
[
  {
    "name": "cpu",
    "columns": ["time", "value", "host"],
    "points": [
      [1395168540, 56.7, "foo.influxdb.com"],
      [1395168540, 43.9, "bar.influxdb.com"]
    ]
  }
]

Slide 28

Slide 28 text

HTTP API (queries)
curl 'http://localhost:8086/db/mydb/series?u=paul&p=pass&q=.'

Slide 29

Slide 29 text

SQL-ish
select * from events where time > now() - 1h
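A query like this contains spaces and symbols, so it has to be URL-encoded before it goes into the q parameter of the HTTP API. A minimal Go sketch, assuming the endpoint layout from the HTTP API slides; queryURL is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/url"
)

// queryURL returns the GET URL for running a query against one database.
func queryURL(baseURL, db, user, pass, query string) (*url.URL, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return nil, err
	}
	u.Path = "/db/" + db + "/series"
	params := url.Values{}
	params.Set("u", user)
	params.Set("p", pass)
	params.Set("q", query) // Encode percent-escapes the query text
	u.RawQuery = params.Encode()
	return u, nil
}

func main() {
	u, err := queryURL("http://localhost:8086", "mydb", "paul", "pass",
		"select * from events where time > now() - 1h")
	if err != nil {
		panic(err)
	}
	fmt.Println(u.String())
}
```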

Slide 30

Slide 30 text

SQL-ish
select * from "series with weird chars ()*@#0982#$" where time > now() - 1h

Slide 31

Slide 31 text

Where Regex
select line from application_logs
where line =~ /.*ERROR.*/
  and time > "2014-03-01" and time < "2014-03-03"

Slide 32

Slide 32 text

Only scans the time range
Series and time are the primary index

Slide 33

Slide 33 text

Work with many series…

Slide 34

Slide 34 text

Select from Regex
select * from /stats\.cpu\..*/ limit 1
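To see which series a pattern like this would select, the match can be tried out with Go's regexp package. This is an illustration only: matchSeries and the series names are made up, and the slides do not say which regex engine or anchoring rules the server applies.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchSeries returns the names that the pattern matches anywhere in the string.
func matchSeries(names []string, pattern string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var matched []string
	for _, n := range names {
		if re.MatchString(n) {
			matched = append(matched, n)
		}
	}
	return matched, nil
}

func main() {
	// Hypothetical series names for illustration.
	names := []string{"stats.cpu.host1", "stats.cpu.host2", "stats.mem.host1", "events"}
	matched, err := matchSeries(names, `stats\.cpu\..*`)
	if err != nil {
		panic(err)
	}
	fmt.Println(matched)
}
```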

Slide 35

Slide 35 text

Downsampling on the fly…

Slide 36

Slide 36 text

Aggregates
select percentile(value, 90) from response_times
group by time(10m)
where time > now() - 1d
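Conceptually, a group by time(10m) aggregate buckets points into fixed windows and reduces each window to one value. The Go sketch below illustrates that idea only; it is not InfluxDB's implementation. The nearest-rank percentile definition is an assumption (the slides do not say which method is used), and Point, percentile, and downsample are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Point is one measurement; Time is in Unix seconds, as in the slide data.
type Point struct {
	Time  int64
	Value float64
}

// percentile returns the nearest-rank p-th percentile (0 < p <= 100)
// of a non-empty slice. Nearest-rank is an assumption of this sketch.
func percentile(values []float64, p float64) float64 {
	sorted := append([]float64(nil), values...) // copy so the input is not reordered
	sort.Float64s(sorted)
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// downsample groups points into windows of windowSecs seconds and reduces
// each window to its p-th percentile. Keys are window start times.
// Timestamps are assumed to be non-negative.
func downsample(points []Point, windowSecs int64, p float64) map[int64]float64 {
	buckets := map[int64][]float64{}
	for _, pt := range points {
		start := pt.Time - pt.Time%windowSecs
		buckets[start] = append(buckets[start], pt.Value)
	}
	out := make(map[int64]float64, len(buckets))
	for start, vals := range buckets {
		out[start] = percentile(vals, p)
	}
	return out
}

func main() {
	points := []Point{
		{Time: 1395168540, Value: 56.7},
		{Time: 1395168600, Value: 43.9},
		{Time: 1395169300, Value: 61.2},
	}
	fmt.Println(downsample(points, 600, 90)) // 600 seconds = 10 minutes
}
```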

Slide 37

Slide 37 text

Continuous Downsampling…

Slide 38

Slide 38 text

Continuous queries (summaries)
select count(page_id) from events
group by time(1h), page_id
into events.[page_id]

Slide 39

Slide 39 text

Series per page id
select count from events.67 where time > now() - 7d
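The effect of the summary query is one output series per page id, holding an hourly count. A rough Go sketch of that fan-out, assuming events carry a Unix timestamp and an integer page id; Event and summarize are hypothetical names, and the real continuous query runs inside the server rather than in client code.

```go
package main

import "fmt"

// Event is one row of the hypothetical "events" series.
type Event struct {
	Time   int64 // Unix seconds
	PageID int
}

const hour = 3600

// summarize counts events per page id and per hour. The outer key is the
// target series name ("events.<page_id>"); the inner key is the hour start.
func summarize(events []Event) map[string]map[int64]int {
	out := map[string]map[int64]int{}
	for _, e := range events {
		series := fmt.Sprintf("events.%d", e.PageID)
		if out[series] == nil {
			out[series] = map[int64]int{}
		}
		bucket := e.Time - e.Time%hour // assumes non-negative timestamps
		out[series][bucket]++
	}
	return out
}

func main() {
	events := []Event{
		{Time: 10, PageID: 67},
		{Time: 20, PageID: 67},
		{Time: 3700, PageID: 67},
		{Time: 30, PageID: 5},
	}
	fmt.Println(summarize(events))
}
```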

Slide 40

Slide 40 text

Continuous queries (regex downsampling)
select percentile(value, 90) as value from /^stats\.*/
group by time(5m)
into percentile.90.5m.:series_name
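The :series_name placeholder in the into clause stands for the name of each source series the regex matched, so every input series gets its own downsampled output series. A tiny Go sketch of that naming rule; targetSeries is a hypothetical helper and stats.cpu.host1 is an example name.

```go
package main

import (
	"fmt"
	"strings"
)

// targetSeries fills the :series_name placeholder with the source series name.
func targetSeries(template, source string) string {
	return strings.Replace(template, ":series_name", source, -1)
}

func main() {
	fmt.Println(targetSeries("percentile.90.5m.:series_name", "stats.cpu.host1"))
}
```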

Slide 41

Slide 41 text

Percentile series per host
select value from percentile.90.5m.stats.cpu.host1 where time > now() - 4h

Slide 42

Slide 42 text

Data Collection
Client libraries, CollectD, StatsD, Carbon ingestion, OpenTSDB (soon), Riemann (soon)

Slide 43

Slide 43 text

Built-in UI

Slide 44

Slide 44 text

Grafana

Slide 45

Slide 45 text

Behind the scenes

Slide 46

Slide 46 text

#golang

Slide 47

Slide 47 text

Garbage Collector
Generational won’t save us

Slide 48

Slide 48 text

MMAP + Unsafe?

Slide 49

Slide 49 text

Storage engines
LevelDB, RocksDB, HyperLevelDB, LMDB

Slide 50

Slide 50 text

Range Deletes
Wildly Expensive

Slide 51

Slide 51 text

Shards & Shard Spaces

Slide 52

Slide 52 text

Query Parser
YACC & Bison

Slide 53

Slide 53 text

Raft
Metadata, servers, cluster state

Slide 54

Slide 54 text

Data replication
Not write scalable!

Slide 55

Slide 55 text

TCP + Protobuf
Intra-cluster communication, queries, replication

Slide 56

Slide 56 text

How data is distributed

Slide 57

Slide 57 text

Shard
type Shard struct {
    Id        uint32
    StartTime time.Time
    EndTime   time.Time
    ServerIds []uint32
}
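Because each shard covers a time interval, a query with a time range only needs the shards that overlap it. The Go sketch below shows one way to pick them. It makes assumptions the slide does not state: intervals are treated as half-open [StartTime, EndTime), the week-long shard duration is an example, and shardsForRange and unix are hypothetical helpers.

```go
package main

import (
	"fmt"
	"time"
)

// Shard is the struct from the slide.
type Shard struct {
	Id        uint32
	StartTime time.Time
	EndTime   time.Time
	ServerIds []uint32
}

// unix is a small helper for building UTC times from Unix seconds.
func unix(sec int64) time.Time { return time.Unix(sec, 0).UTC() }

// shardsForRange returns the shards whose interval overlaps [from, to).
// Treating both intervals as half-open is an assumption of this sketch.
func shardsForRange(shards []Shard, from, to time.Time) []Shard {
	var hit []Shard
	for _, s := range shards {
		if s.StartTime.Before(to) && from.Before(s.EndTime) {
			hit = append(hit, s)
		}
	}
	return hit
}

func main() {
	week := int64(7 * 24 * 3600) // example shard duration, not from the slides
	shards := []Shard{
		{Id: 1, StartTime: unix(0), EndTime: unix(week), ServerIds: []uint32{1, 2}},
		{Id: 2, StartTime: unix(week), EndTime: unix(2 * week), ServerIds: []uint32{2, 3}},
		{Id: 3, StartTime: unix(2 * week), EndTime: unix(3 * week), ServerIds: []uint32{3, 1}},
	}
	for _, s := range shardsForRange(shards, unix(week+10), unix(week+20)) {
		fmt.Println(s.Id, s.ServerIds)
	}
}
```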

Slide 58

Slide 58 text

Multiple shards per duration
Named “split” in the configuration

Slide 59

Slide 59 text

Data for a series for a given interval exists in a shard*
*by default, but can be modified

Slide 60

Slide 60 text

hash(database, series) % split
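The formula above picks one of the split shards for a given database and series. A minimal Go sketch of the idea follows. The hash function is an assumption: the slides do not name one, so FNV-1a from the standard library is used purely for illustration, and shardIndex is a hypothetical name.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardIndex maps (database, series) to one of `split` shards for an interval.
// split must be greater than zero. FNV-1a is an illustrative choice only.
func shardIndex(database, series string, split uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(database))
	h.Write([]byte{0}) // separator so ("ab", "c") and ("a", "bc") hash differently
	h.Write([]byte(series))
	return h.Sum32() % split
}

func main() {
	for _, series := range []string{"cpu", "mem", "response_times"} {
		fmt.Println(series, shardIndex("mydb", series, 3))
	}
}
```

Since the series name is part of the hash input, all points of one series for an interval land in the same shard, while different series spread across the split.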

Slide 61

Slide 61 text

Scale out with many series

Slide 62

Slide 62 text

Questions?

Slide 63

Slide 63 text

Thank you!
Paul Dix @pauldix paul@influxdb.com