Building InfluxDB, an open source distributed events database
Paul Dix
@pauldix
paul@influxdb.com
Slide 2
About me…
• Organizer of NYC Machine Learning (4600+ members)
• Author of “Service Oriented Design with Ruby and Rails”
• Creator of Feedjira, Typhoeus, and others
• Columbia 2009* (CS)
• YC (W13)
Slide 3
Series Editor - “Data & Analytics”
Slide 4
Our YC Company
Platform for real-time metrics and monitoring
Slide 5
How was YC?
• Awesome
• Time to iterate
• Fantastic network
Slide 6
How we arrived here
• Had to build InfluxDB to even get to the product
• Crowded space with no clear differentiation for us
• Better opportunity as an open source platform
Slide 7
Platforms should be open source.
Slide 8
Monitorama and the 6-week pivot
Slide 9
How was pivoting?
Slide 10
At first, it sucked.
Slide 11
And then it was awesome!
Slide 12
InfluxDB - an events database
Slide 13
Events Database?
time series, metrics, discrete events, analytics
Slide 14
Metrics
Slide 15
Time Series
Slide 16
Analytics
Slide 17
Events
Slide 18
What do other people do?
• DB (HBase, Cassandra, MySQL, Redis, etc.)
• Web services
• Cron jobs or long-running workers for summary tables
Slide 19
Building anything with an analytics component is like writing a web application in 1998.
Slide 20
Project Goals
• Store metrics AND events
• Horizontally scalable
• Nothing else to run and manage
• Shouldn’t have to write server code for downsampling/summaries
Slide 21
InfluxDB
• Written in Go
• LevelDB for storage engine
• No dependencies (other than glibc)
• Distributed
• HTTP API (see the write sketch below)
• SQL-like query language
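A hedged write example for the HTTP API mentioned above: a minimal Go sketch that POSTs one point. The endpoint path, port 8086, credentials, and JSON body shape follow the 0.x-era API as an assumption (check the docs for your version), and the database "mydb" and series "events" are made-up placeholders.

// Hedged sketch: write a point to InfluxDB over the HTTP API.
// Assumes the 0.x-era endpoint POST /db/<db>/series on port 8086;
// "mydb" and "events" are made-up placeholders.
package main

import (
    "bytes"
    "fmt"
    "net/http"
)

func main() {
    body := []byte(`[{"name": "events",
                      "columns": ["type", "user_id"],
                      "points": [["click", 42]]}]`)

    resp, err := http.Post(
        "http://localhost:8086/db/mydb/series?u=root&p=root",
        "application/json",
        bytes.NewReader(body),
    )
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println("status:", resp.Status)
}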
Slide 22
Cool Integrations
• StatsD Backend
• CollectD Proxy
• CLI (Ruby or Node)
• Libraries for many languages
• Graphite
• OpenTSDB (soon)
Storage
• Schema-less
• Indexed on (series, column, time, sequence number) - see the key sketch below
• Microsecond precision
• Doesn’t store null column values
• Millions of series
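A minimal sketch of the composite key implied by that index, assuming each value is stored under a big-endian (series, column, time, sequence) key so a LevelDB range scan over a series/column prefix walks points in time order. The 8-byte IDs and field widths are illustrative, not InfluxDB's actual on-disk format.

// Hedged sketch of a composite (series, column, time, sequence) key.
// Big-endian encoding makes byte order match numeric order, so a range
// scan over a series/column prefix returns points sorted by time.
// The 8-byte IDs and widths are illustrative, not the real format.
package main

import (
    "encoding/binary"
    "fmt"
)

func makeKey(seriesID, columnID uint64, timeMicro int64, sequence uint64) []byte {
    key := make([]byte, 32)
    binary.BigEndian.PutUint64(key[0:8], seriesID)
    binary.BigEndian.PutUint64(key[8:16], columnID)
    binary.BigEndian.PutUint64(key[16:24], uint64(timeMicro))
    binary.BigEndian.PutUint64(key[24:32], sequence)
    return key
}

func main() {
    fmt.Printf("%x\n", makeKey(1, 2, 1396310400000000, 1))
}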
Slide 27
SQL-ish
select * from some_series
where time > now() - 1h
Slide 28
Aggregates
select percentile(90, value) from some_series
group by time(10m)
where time > now() - 1d
Slide 29
Select from Regex
select * from /stats\.cpu\..*/
limit 1
Slide 30
Where against Regex
select value from some_log_series
where value =~ /.*ERROR.*/ and
time > "2014-03-01" and time < "2014-03-03"
Slide 31
Continuous queries (fan out)
select * from events
into events.[user_id]
Slide 32
Continuous queries (summaries)
select count(page_id) from events
group by time(1h), page_id
into events.[page_id]
Slide 33
Continuous queries (regex downsampling)
select max(value), context from /stats\.*/
group by time(5m)
into max.:series_name
Slide 34
Tooling considerations
Slide 35
Why Go?
Slide 36
Quicker and easier than C or C++
Libraries, memory managed, previous experience.
Slide 37
Existing Raft implementation (goraft)
Slide 38
Compiled binary with no dependencies
Slide 39
Good enough performance*
Slide 40
Community and language momentum
NEW SHINY THINGS!!1
Slide 41
Why NOT Go?
Slide 42
GC
Mark and sweep.
Large heaps if we want to cache.
Slide 43
GC
Lack of control over memory.
Even generational is sadness.
Slide 44
How data is distributed
• Shards - contiguous blocks of time (see the lookup sketch below)
• Replication factor (read scalability)
• Split - break blocks of time into multiple shards (write scalability)
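A minimal sketch of routing a point to the shard that owns its time block, as described above. The seven-day block size, the block-to-shard map, and the use of time.Truncate are illustrative assumptions, not InfluxDB's actual shard logic.

// Hedged sketch: route a timestamp to the shard owning its time block.
// The 7-day block size and the block->shard map are illustrative only.
package main

import (
    "fmt"
    "time"
)

const shardDuration = 7 * 24 * time.Hour

// shardStart returns the start of the contiguous time block t falls into.
func shardStart(t time.Time) time.Time {
    return t.Truncate(shardDuration)
}

func main() {
    shards := map[int64]string{} // block start (unix seconds) -> shard id
    for _, ts := range []string{
        "2014-03-01T10:00:00Z",
        "2014-03-02T10:00:00Z",
        "2014-03-20T10:00:00Z",
    } {
        t, _ := time.Parse(time.RFC3339, ts)
        start := shardStart(t).Unix()
        if _, ok := shards[start]; !ok {
            shards[start] = fmt.Sprintf("shard-%d", len(shards)+1)
        }
        fmt.Println(ts, "->", shards[start])
    }
}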
Slide 45
Split
• Hashing (db, series) to pick a shard - see the sketch below
• Series data always lives in the same shard
• Match regex and randomly distribute
• Write scalability at the cost of losing data locality
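A minimal sketch of that hash-based split: hashing (db, series) keeps all of one series' points in the same shard within a time block. The FNV-1a hash and the split factor of 4 are illustrative assumptions.

// Hedged sketch: pick a shard within a time block by hashing (db, series).
// Every point of a given series hashes to the same index, preserving
// series locality; FNV-1a and the split factor of 4 are illustrative.
package main

import (
    "fmt"
    "hash/fnv"
)

func shardIndex(db, series string, splits int) int {
    h := fnv.New64a()
    h.Write([]byte(db))
    h.Write([]byte("."))
    h.Write([]byte(series))
    return int(h.Sum64() % uint64(splits))
}

func main() {
    for _, s := range []string{"cpu.load", "cpu.load", "events", "requests"} {
        fmt.Println(s, "->", shardIndex("mydb", s, 4))
    }
}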
Slide 46
Now a bit of code…
• Shards - contiguous blocks of time
• Servers have multiple shards
• Query can come into any server and hit multiple shards (local or remote)
Slide 47
responses := make([]chan *protocol.Response, 0)
// shards are arranged by time.
// set up response channels to collect results
for _, shard := range shards {
    responseChan := make(chan *protocol.Response)
    // parallelize it!
    go shard.Query(querySpec, responseChan)
    responses = append(responses, responseChan)
}
Slide 48
// loop through in order pulling results.
// Since shards are ordered they come back
// in the right order.
for _, responseChan := range responses {
    for {
        // ordered, processed results onto response.
        response := <-responseChan
        if *response.Type == endStreamResponse {
            break
        }
    }
}
Slide 49
// shard.go
func (s *Shard) Query(q *QuerySpec, c chan *protocol.Response) {
    req := s.createRequest(q)
    s.server.Request(req, c)
}
Slide 50
func (s *Server) Request(r *protocol.Request, c chan *protocol.Response) {
    // sends request over the wire and returns
}
// single goroutine handles responses from server
func (s *Server) handleResponses() {
    for {
        r := <-s.responses
        // returns the chan that was sent with
        // call to Request
        c := s.getResponseChannel(r)
        c <- r
    }
}
Slide 51
This caused a race condition that took days to debug…
Slide 52
responses := make([]chan *protocol.Response, 0)
// shards are arranged by time.
// set up response channels to collect results
for _, shard := range shards {
    responseChan := make(chan *protocol.Response)
    // parallelize it!
    go shard.Query(querySpec, responseChan)
    responses = append(responses, responseChan)
}
Slide 53
Single goroutine to read responses and send to channels
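A standalone, hedged sketch (not the InfluxDB code) of why this combination can wedge: the single dispatcher goroutine forwards results to unbuffered per-shard channels, but the query side drains those channels strictly in shard order, so a result for a later shard blocks the dispatcher and nothing else gets delivered. Run as-is, the Go runtime reports the deadlock; giving each channel even a small buffer lets it complete, which is the shape of the fix on the next slide and why the buffer-size and memory questions follow.

// Hedged, standalone sketch of the failure mode: one dispatcher goroutine,
// unbuffered per-shard channels, and a reader that drains shards in order.
// The dispatcher blocks sending shard 1's result before shard 0's, the
// reader blocks waiting on shard 0, and Go reports
// "all goroutines are asleep - deadlock!".
package main

import "fmt"

func main() {
    chans := []chan int{make(chan int), make(chan int)} // unbuffered

    go func() { // the single "handleResponses"-style dispatcher
        chans[1] <- 42 // a result for shard 1 arrives first and blocks forever
        chans[0] <- 7  // shard 0's result never gets sent
    }()

    for i, c := range chans { // reader drains shards in order
        fmt.Println("shard", i, "got", <-c)
    }
}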
Slide 54
responses := make([]chan *protocol.Response, 0)
// shards are arranged by time.
// set up response channels to collect results
for _, shard := range shards {
    responseChan := make(chan *protocol.Response, 10)
    // parallelize it!
    go shard.Query(querySpec, responseChan)
    responses = append(responses, responseChan)
}
Slide 55
How big should the buffer be?
Slide 56
What if we run out of memory?
Slide 57
Channels and goroutines are awesome, but they’re not magic.
Slide 58
Open Questions…
Slide 59
Is there a better pattern for distributed queries?
Slide 60
Is there a way to get around GC?
Memcashier and MMap
Slide 61
Exciting features coming up…
• Copy/move shards within the cluster
• Custom functions in JS
• Binary Protocol and Pubsub
Slide 62
Our goal is to make it as easy to build an analytics product as it is to write a web application.