InfluxDB - an open source
distributed time series,
metrics, and events database
Paul Dix
paul@influxdb.com
@pauldix
@influxdb
Slide 2
Slide 2 text
YC (W13)
4 people full time.
hiring more!
Slide 3
Slide 3 text
What it’s for…
Slide 4
Slide 4 text
Metrics
Slide 5
Slide 5 text
Time Series
Slide 6
Slide 6 text
Analytics
Slide 7
Slide 7 text
Events
Slide 8
Slide 8 text
Can’t you just use a
regular DB?
Slide 9
Slide 9 text
order by time?
Slide 10
Slide 10 text
Doesn’t Scale
Slide 11
Slide 11 text
Example from metrics:
100 measurements per host *
10 hosts *
8,640 per day (once every 10s) *
365 days
= 3,153,600,000 records per year
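The multiplication on this slide can be checked in a few lines. This is only a back-of-the-envelope sketch; the variable names are illustrative and not from the talk.

```python
# Back-of-the-envelope record count for the metrics example.
SECONDS_PER_DAY = 24 * 60 * 60

measurements_per_host = 100
hosts = 10
sample_interval_s = 10  # one sample every 10 seconds
days = 365

samples_per_day = SECONDS_PER_DAY // sample_interval_s
records_per_year = measurements_per_host * hosts * samples_per_day * days

print(f"{samples_per_day:,} samples per series per day")
print(f"{records_per_year:,} records per year")
```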
Slide 12
Slide 12 text
Have fun with that
table…
Slide 13
Slide 13 text
But wait, we’ll just keep
the summaries!
Slide 14
Slide 14 text
1h averages =
8,760,000 per year
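A rough sketch of what such a rollup looks like, assuming points are plain (unix_seconds, value) pairs. The helper `hourly_means` is hypothetical and is not part of any InfluxDB API. It reproduces the row count from this slide, and the toy data shows how one short spike disappears into the hourly average.

```python
from collections import defaultdict

def hourly_means(points):
    """Average (unix_seconds, value) points into one value per hour."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % 3600].append(value)  # key = start of the hour
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}

# One hour of 10-second samples, flat at 1.0 except for a single spike.
points = [(t, 1.0) for t in range(0, 3600, 10)]
points[100] = (1000, 361.0)

summary = hourly_means(points)
print(summary)  # a single hourly value; the spike is no longer visible

# Row count if only hourly averages are kept for every series.
series = 100 * 10  # measurements per host * hosts
rows_per_year = series * 24 * 365
print(f"{rows_per_year:,} rows per year")
```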
Slide 15
Slide 15 text
Lose Detail and
Ad Hoc Queryability
Slide 16
Slide 16 text
So let’s use Cassandra,
HBase, or Scaleasaurus!
Slide 17
Slide 17 text
Too much application
code and complexity
Slide 18
Slide 18 text
Application logic and
scripts to compute
summaries
Slide 19
Slide 19 text
Application level logic
for balancing
Slide 20
Slide 20 text
No data locality for
Ad Hoc queries
Slide 21
Slide 21 text
And then there’s
more…
Slide 22
Slide 22 text
Web services
Slide 23
Slide 23 text
Libraries for web
services
Slide 24
Slide 24 text
Data collection
Slide 25
Slide 25 text
Visualization
Slide 26
Slide 26 text
“Building an application with an analytics
component today is like building a web
application in 1998. You spend months building
infrastructure before getting to the actual thing
you want to build.”
–Paul Dix
Slide 27
Slide 27 text
Analytics should be about
analyzing and interpreting data,
not the infrastructure to store and
process it.
Slide 28
Slide 28 text
No content
Slide 29
Slide 29 text
HTTP API
Web services built in
Slide 30
Slide 30 text
HTTP API (writes)
curl -X POST \
'http://localhost:8086/db/mydb/series?u=paul&p=pass' \
-d '[{"name":"foo", "columns":["val"], "points": [[3]]}]'
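The same write can be sketched from Python with only the standard library. The URL shape, credentials, and payload are copied from the curl call above. The helper name `build_write_request` is made up for illustration, and the actual send is left commented out because it needs a running server.

```python
import json
import urllib.parse
import urllib.request

def build_write_request(host, db, user, password, series):
    """Build (but do not send) a POST mirroring the curl example."""
    query = urllib.parse.urlencode({"u": user, "p": password})
    url = f"http://{host}/db/{db}/series?{query}"
    body = json.dumps(series).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")

# One series named "foo" with a single column and a single point.
series = [{"name": "foo", "columns": ["val"], "points": [[3]]}]
req = build_write_request("localhost:8086", "mydb", "paul", "pass", series)

print(req.get_method(), req.full_url)

# Sending requires a server listening on localhost:8086:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```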
JavaScript library + D3,
HighCharts, Rickshaw,
NVD3, etc.
Definitely more to do here!
Slide 60
Slide 60 text
Data Collection
CollectD, StatsD backend, Carbon ingestion
Slide 61
Slide 61 text
Coming Soon
Slide 62
Slide 62 text
New Clustering
Implementation
Slide 63
Slide 63 text
Two Parts
Slide 64
Slide 64 text
Broker
Slide 65
Slide 65 text
Data Node
Slide 66
Slide 66 text
How writes work
Slide 67
Slide 67 text
Any
server
Write
Slide 68
Slide 68 text
Broker
Broker
Broker
Any
server
Write
Streaming Raft Cluster
Slide 69
Slide 69 text
Writes are CP
Slide 70
Slide 70 text
Broker
Data
Node
Broker
Broker
Any
server
Write
Slide 71
Slide 71 text
Broker
Data
Node
Data
Node
Broker
Broker
Any
server
Write
If replication factor = 2
Slide 72
Slide 72 text
Broker
Data
Node
Data
Node
Broker
Broker
Any
server
Write
Data
Node
Data
Node
Data
Node
Data
Node
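The slides show a write fanning out from the brokers to as many data nodes as the replication factor, but they do not say how those nodes are picked. The sketch below is one assumed scheme (hash the series name, then take consecutive nodes). The function `choose_replicas` and the node names are hypothetical, and this is not the actual InfluxDB placement logic.

```python
import hashlib

def choose_replicas(series_name, data_nodes, replication_factor=2):
    """Pick `replication_factor` distinct data nodes for a series (assumed scheme)."""
    if replication_factor > len(data_nodes):
        raise ValueError("replication factor exceeds number of data nodes")
    # A stable hash, so every server computes the same placement.
    digest = hashlib.sha256(series_name.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(data_nodes)
    return [data_nodes[(start + i) % len(data_nodes)]
            for i in range(replication_factor)]

nodes = [f"data-node-{i}" for i in range(6)]
replicas = choose_replicas("cpu_load", nodes, replication_factor=2)
print(replicas)
```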
Slide 73
Slide 73 text
How Queries Work
Slide 74
Slide 74 text
Data
Node
Data
Node
Any
server
Data
Node
Data
Node
Data
Node
Data
Node
select mean(cpu_load)
where data_center = 'us-west'
and host = 'serverA'
and time > now() - 24h
group by time(10m)
Slide 75
Slide 75 text
Data
Node
Data
Node
Any
server
Data
Node
Data
Node
Data
Node
Data
Node
Compute Locally
select mean(cpu_load)
where data_center = 'us-west'
and host = 'serverA'
and time > now() - 24h
group by time(10m)
Slide 76
Slide 76 text
Data
Node
Data
Node
Any
server
Data
Node
Data
Node
Data
Node
Data
Node
Send Summary Ticks
select mean(cpu_load)
where data_center = 'us-west'
and host = 'serverA'
and time > now() - 24h
group by time(10m)
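"Compute locally, send summary ticks" can be sketched as a two-step aggregation. The slides do not say what a summary tick contains. This sketch assumes each data node sends a (sum, count) pair per 10-minute bucket, which is enough for the receiving server to produce an exact mean. All function names are hypothetical.

```python
from collections import defaultdict

BUCKET_S = 600  # group by time(10m)

def local_summary(points):
    """On a data node: reduce raw (unix_seconds, value) points to per-bucket (sum, count)."""
    ticks = defaultdict(lambda: [0.0, 0])
    for ts, value in points:
        bucket = ts - ts % BUCKET_S
        ticks[bucket][0] += value
        ticks[bucket][1] += 1
    return {bucket: tuple(sc) for bucket, sc in ticks.items()}

def merge_means(summaries):
    """On the server that received the query: combine ticks into per-bucket means."""
    totals = defaultdict(lambda: [0.0, 0])
    for summary in summaries:
        for bucket, (s, n) in summary.items():
            totals[bucket][0] += s
            totals[bucket][1] += n
    return {bucket: s / n for bucket, (s, n) in sorted(totals.items())}

# Two data nodes each hold part of the raw points.
node_a = local_summary([(0, 1.0), (10, 3.0), (600, 5.0)])
node_b = local_summary([(20, 5.0), (610, 7.0)])
result = merge_means([node_a, node_b])
print(result)
```

Sending sums and counts rather than per-node means keeps the merged result correct when nodes hold different numbers of points for the same bucket.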
Slide 77
Slide 77 text
Clustering Goal:
1-2M values per second
Slide 78
Slide 78 text
Potential Cluster Size:
3-5 Brokers
50 Data Nodes
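For a sense of scale, the goal and the potential cluster size can be combined into a per-node write rate. This is a rough estimate only: it assumes writes spread evenly across data nodes, and a replication factor of 2 is borrowed from the write example rather than stated as part of the goal.

```python
def per_node_rate(values_per_s, data_nodes, replication_factor=1):
    """Values per second each data node must absorb, assuming an even spread."""
    return values_per_s * replication_factor // data_nodes

for goal in (1_000_000, 2_000_000):
    single = per_node_rate(goal, data_nodes=50)
    replicated = per_node_rate(goal, data_nodes=50, replication_factor=2)
    print(f"{goal:,}/s cluster-wide: {single:,}/s per node, "
          f"{replicated:,}/s per node with replication factor 2")
```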
Slide 79
Slide 79 text
Binary Protocol
Slide 80
Slide 80 text
Pubsub
select * from some_series
where host = 'serverA'
into subscription()
select percentile(90, value) from some_series
group by time(1m)
into subscription()
Slide 81
Slide 81 text
Custom Functions
select myFunc(value) from some_series
Slide 82
Slide 82 text
Column Indexes
Slide 83
Slide 83 text
Dictionaries
Slide 84
Slide 84 text
Rack aware sharding
and querying
Slide 85
Slide 85 text
Multi-datacenter
replication
Push and bi-directional
Slide 86
Slide 86 text
Need help?
support@influxdb.com
Thanks!
paul@influxdb.com
@pauldix