Building InfluxDB, an open source distributed events database

Building InﬂuxDB, an open source distributed events database Paul Dix
@pauldix paul@inﬂuxdb.com

About me… • Organizer NYC Machine Learning (4600+ members) •
Author “Service Oriented Design with Ruby and Rails” • Creator of Feedjira, Typhoeus, and others • Columbia 2009* (CS) • YC (W13)

Series Editor - “Data & Analytics”

Our YC Company Platform for real-time metrics and monitoring

How was YC? • Awesome • Time to iterate •
Fantastic network

How we arrived here • Had to build InﬂuxDB to
even get to the product • Crowded space with no clear differentiation for us • Better opportunity as an open source platform

Platforms should be open source.

Monitorama and the 6 week pivot

How was pivoting?

At ﬁrst, it sucked.

And then it was awesome!

InﬂuxDB - an events database

time series, metrics, discrete events, analytics Events Database?

Metrics

Time Series

Analytics

Events

What do other people do? • DB (Hbase, Cassandra, MySQL,
Redis, etc) • Web services • Cron jobs or long running workers for summary tables

Building anything with an analytics component is like writing a
web application in 1998.

Project Goals • Store metrics AND events • Horizontally scalable
• Nothing else to run and manage • Shouldn’t have to write server code for downsampling/summaries

InﬂuxDB • Written in Go • LevelDB for storage engine
• No dependencies (other than glibc) • Distributed • HTTP API • SQL like query language

Cool Integrations • StatsD Backend • CollectD Proxy • CLI
(ruby or node) • Libraries for many languages • Graphite • OpenTSDB (soon)

Grafana

Data [ { "name": "events", "columns": ["type", "email"], "points": [
["signup", "[email protected]"], ["paid", "[email protected]"] ] } ]

Data (with timestamp) [ { "name": "cpu", "columns": ["time", "value",
"host"], "points": [ [1395168540, 56.7, "foo.influxdb.com"], [1395168540, 43.9, "bar.influxdb.com"] ] } ]

Storage • Schema-less • Indexed on (series, column, time, sequence
number) • Microsecond precision • Doesn’t store null column values • Millions of series

SQL-ish select * from some_series where time > now() -
1h

Aggregates select percentile(90, value) from some_series group by time(10m) where
time > now() - 1d

Select from Regex select * from /stats\.cpu\..*/ limit 1

Where against Regex select value from some_log_series where value =~
/.*ERROR.*/ and time > "2014-03-01" and time < "2014-03-03"

Continuous queries (fan out) select * from events into events.[user_id]

Continuous queries (summaries) select count(page_id) from events group by time(1h),
page_id into events.[page_id]

Continuous queries (regex downsampling) select max(value), context from /stats\.*/ group
by time(5m) into max.:series_name

Tooling considerations

Why Go?

Quicker and easier than C or C++ Libraries, memory managed,
previous experience.

Existing Raft Implementation (goraft)

Compiled binary with no dependencies

Good enough performance*

Community and language momentum NEW SHINY THINGS!!1

Why NOT Go?

GC Mark and sweep. Large heaps if we want to
cache.

GC Lack of control over memory. Even generational is sadness.

How data is distributed • Shards - contiguous blocks of
time • Replication factor (read scalability) • Split - break blocks of time into multiple shards (write scalability)

Split • Hashing db, series • Series data always lives
in the same shard • Match regex and randomly distribute • Writes scalability at the cost of losing data locality

Now a bit of code… • Shards - contiguous blocks
of time • Servers have multiple shards • Query can come into any server and hit multiple shards (local or remote)

responses := make([]chan *protocol.Response, 0) // shards are arranged by
time. // set up response channels to collect results for _, shard := range shards { responseChan := make(chan *protocol.Response) // parallelize it! go shard.Query(querySpec, responseChan) responses = append(responses, responseChan) }

// loop through in order pulling results. // Since shards
are ordered they come back // in the right order. for _, responseChan := range responses { for { // ordered, processed results onto response. response := <-responseChan if *response.Type == endStreamResponse { break } } }

! ! // shard.go func (s *Shard) Query(q *QuerySpec, c
chan *protocol.Response) { req := s.createRequest(q) s.server.Request(req, c) }

! func (s *Server) Request(r *protocol.Request, c chan *protocol.Response) {
// sends request over the wire and returns } ! // single goroutine handles responses from server func (s *Server) handleResponses() { for { r := <-s.responses // returns the chan that was sent with // call to Request c := s.getResponseChannel(r) c <- r } }

This caused a race condition that took days to debug…

time. // set up response channels to collect results for _, shard := range shards { responseChan := make(chan *protocol.Response) // parallelize it! go shard.Query(querySpec, responseChan) responses = append(responses, responseChan) }

Single goroutine to read responses and send to channels

time. // set up response channels to collect results for _, shard := range shards { responseChan := make(chan *protocol.Response, 10) // parallelize it! go shard.Query(querySpec, responseChan) responses = append(responses, responseChan) }

How big should the buffer be?

What if we run out of memory?

Channels and Goroutines are awesome, but they’re not magic.

Open Questions…

Is there a better pattern for distributed queries?

Is there a way to get around GC? Memcashier and
MMap

Exciting features coming up… • Copy/move shards within the cluster
• Custom functions in JS • Binary Protocol and Pubsub

Our goal is to make it as easy to build
an analytics product as it is to write a web application.

Thank you! Paul Dix @pauldix paul@inﬂuxdb.com

Building InfluxDB, an open source distributed e...

Building InfluxDB, an open source distributed events database

More Decks by Paul Dix

Other Decks in Programming

Featured

Transcript