Slide 1

Building InfluxDB, an open source distributed events database Paul Dix @pauldix paul@influxdb.com

Slide 2

About me… • Organizer NYC Machine Learning (4600+ members) • Author “Service Oriented Design with Ruby and Rails” • Creator of Feedjira, Typhoeus, and others • Columbia 2009* (CS) • YC (W13)

Slide 3

Series Editor - “Data & Analytics”

Slide 4

Our YC company: a platform for real-time metrics and monitoring

Slide 5

How was YC? • Awesome • Time to iterate • Fantastic network

Slide 6

How we arrived here • Had to build InfluxDB to even get to the product • Crowded space with no clear differentiation for us • Better opportunity as an open source platform

Slide 7

Platforms should be open source.

Slide 8

Monitorama and the 6-week pivot

Slide 9

How was pivoting?

Slide 10

At first, it sucked.

Slide 11

And then it was awesome!

Slide 12

InfluxDB - an events database

Slide 13

Events Database? Time series, metrics, discrete events, analytics

Slide 14

Metrics

Slide 15

Time Series

Slide 16

Analytics

Slide 17

Events

Slide 18

What do other people do? • DB (HBase, Cassandra, MySQL, Redis, etc.) • Web services • Cron jobs or long-running workers for summary tables

Slide 19

Building anything with an analytics component is like writing a web application in 1998.

Slide 20

Project Goals • Store metrics AND events • Horizontally scalable • Nothing else to run and manage • Shouldn’t have to write server code for downsampling/summaries

Slide 21

InfluxDB • Written in Go • LevelDB as the storage engine • No dependencies (other than glibc) • Distributed • HTTP API • SQL-like query language

Slide 22

Cool Integrations • StatsD backend • collectd proxy • CLI (Ruby or Node) • Libraries for many languages • Graphite • OpenTSDB (soon)

Slide 23

Grafana

Slide 24

Data

[
  {
    "name": "events",
    "columns": ["type", "email"],
    "points": [
      ["signup", "[email protected]"],
      ["paid", "[email protected]"]
    ]
  }
]
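
As a rough sketch of how a client might POST this payload to the HTTP write API, here is a hedged Go example; the port, the /db/:name/series path, the database name, and the credentials are assumptions modeled on the 0.x-era API rather than anything shown on the slide:

package main

import (
    "bytes"
    "fmt"
    "net/http"
)

func main() {
    // Illustrative payload mirroring the slide's series name and columns.
    body := []byte(`[{"name": "events",
                     "columns": ["type", "email"],
                     "points": [["signup", "user@example.com"]]}]`)

    // Assumed endpoint, database name, and credentials (illustration only).
    url := "http://localhost:8086/db/mydb/series?u=root&p=root"
    resp, err := http.Post(url, "application/json", bytes.NewReader(body))
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println("write status:", resp.Status)
}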

Slide 25

Data (with timestamp)

[
  {
    "name": "cpu",
    "columns": ["time", "value", "host"],
    "points": [
      [1395168540, 56.7, "foo.influxdb.com"],
      [1395168540, 43.9, "bar.influxdb.com"]
    ]
  }
]

Slide 26

Storage • Schema-less • Indexed on (series, column, time, sequence number) • Microsecond precision • Doesn’t store null column values • Millions of series

Slide 27

SQL-ish

select * from some_series where time > now() - 1h

Slide 28

Aggregates

select percentile(90, value) from some_series group by time(10m) where time > now() - 1d

Slide 29

Select from Regex

select * from /stats\.cpu\..*/ limit 1

Slide 30

Where against Regex

select value from some_log_series where value =~ /.*ERROR.*/ and time > "2014-03-01" and time < "2014-03-03"

Slide 31

Continuous queries (fan out)

select * from events into events.[user_id]

Slide 32

Continuous queries (summaries)

select count(page_id) from events group by time(1h), page_id into events.[page_id]

Slide 33

Continuous queries (regex downsampling)

select max(value), context from /stats\.*/ group by time(5m) into max.:series_name

Slide 34

Tooling considerations

Slide 35

Why Go?

Slide 36

Quicker and easier than C or C++. Libraries, managed memory, previous experience.

Slide 37

Existing Raft Implementation (goraft)

Slide 38

Compiled binary with no dependencies

Slide 39

Good enough performance*

Slide 40

Community and language momentum NEW SHINY THINGS!!1

Slide 41

Why NOT Go?

Slide 42

GC: mark and sweep. Large heaps if we want to cache.

Slide 43

GC: lack of control over memory. Even generational is sadness.

Slide 44

How data is distributed • Shards - contiguous blocks of time • Replication factor (read scalability) • Split - break blocks of time into multiple shards (write scalability)
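
To make "contiguous blocks of time" concrete, here is a minimal sketch of the time-bucketing idea; the one-week shard duration and the helper name are illustrative assumptions, not InfluxDB's actual implementation:

package main

import (
    "fmt"
    "time"
)

// shardDuration is an assumed block size; the real value is configurable.
const shardDuration = 7 * 24 * time.Hour

// timeBlock returns the start of the contiguous block of time a point falls
// into. Points whose timestamps land in the same block are written to the
// same shard (or group of shards, when the block is split).
func timeBlock(t time.Time) time.Time {
    return t.Truncate(shardDuration)
}

func main() {
    t := time.Date(2014, 3, 18, 15, 30, 0, 0, time.UTC)
    fmt.Println("point at", t, "maps to the shard block starting", timeBlock(t))
}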

Slide 45

Split • Hashing db, series • Series data always lives in the same shard • Match regex and randomly distribute • Write scalability at the cost of losing data locality
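
A rough sketch of the hashing idea behind Split, assuming a simple FNV hash over database and series name (the real routing code may differ):

package main

import (
    "fmt"
    "hash/fnv"
)

// pickSplit chooses which of n shards covering a time block receives a write
// by hashing (database, series). A given series always hashes to the same
// split, which is why its data stays together within each block of time.
func pickSplit(db, series string, n uint32) uint32 {
    h := fnv.New32a()
    h.Write([]byte(db))
    h.Write([]byte(series))
    return h.Sum32() % n
}

func main() {
    fmt.Println(pickSplit("mydb", "cpu.load.host1", 4)) // always the same split
    fmt.Println(pickSplit("mydb", "cpu.load.host2", 4)) // may land on a different split
}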

Slide 46

Now a bit of code… • Shards - contiguous blocks of time • Servers have multiple shards • Query can come into any server and hit multiple shards (local or remote)

Slide 47

responses := make([]chan *protocol.Response, 0)

// shards are arranged by time.
// set up response channels to collect results
for _, shard := range shards {
    responseChan := make(chan *protocol.Response)

    // parallelize it!
    go shard.Query(querySpec, responseChan)
    responses = append(responses, responseChan)
}

Slide 48

// loop through in order pulling results.
// Since shards are ordered they come back
// in the right order.
for _, responseChan := range responses {
    for {
        // ordered, processed results onto response.
        response := <-responseChan
        if *response.Type == endStreamResponse {
            break
        }
    }
}

Slide 49

// shard.go
func (s *Shard) Query(q *QuerySpec, c chan *protocol.Response) {
    req := s.createRequest(q)
    s.server.Request(req, c)
}

Slide 50

func (s *Server) Request(r *protocol.Request, c chan *protocol.Response) {
    // sends request over the wire and returns
}

// single goroutine handles responses from server
func (s *Server) handleResponses() {
    for {
        r := <-s.responses

        // returns the chan that was sent with
        // call to Request
        c := s.getResponseChannel(r)
        c <- r
    }
}

Slide 51

This caused a race condition that took days to debug…

Slide 52

responses := make([]chan *protocol.Response, 0)

// shards are arranged by time.
// set up response channels to collect results
for _, shard := range shards {
    responseChan := make(chan *protocol.Response)

    // parallelize it!
    go shard.Query(querySpec, responseChan)
    responses = append(responses, responseChan)
}

Slide 53

Single goroutine to read responses and send to channels
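
A stripped-down sketch of why buffering matters here: the query side drains per-shard channels in shard order, while the single dispatcher goroutine forwards responses in arrival order. The names and shapes below are illustrative, not the actual InfluxDB code:

package main

import "fmt"

func main() {
    // Two per-shard response channels. With a buffer, the dispatcher can park
    // shard B's responses while the consumer is still reading shard A. With
    // unbuffered channels the single dispatcher would block on chB <- "b1"
    // and never deliver shard A's remaining responses: a deadlock.
    chA := make(chan string, 10)
    chB := make(chan string, 10)

    // Single dispatcher goroutine, standing in for handleResponses: responses
    // arrive interleaved across shards, not in the order the consumer reads.
    go func() {
        chB <- "b1" // arrives first, read last
        chA <- "a1"
        chA <- "end"
        chB <- "end"
    }()

    // Consumer reads shard A to completion, then shard B, mirroring the
    // ordered loop over response channels in the query path.
    for _, ch := range []chan string{chA, chB} {
        for msg := range ch {
            if msg == "end" {
                break
            }
            fmt.Println(msg)
        }
    }
}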

Slide 54

responses := make([]chan *protocol.Response, 0)

// shards are arranged by time.
// set up response channels to collect results
for _, shard := range shards {
    responseChan := make(chan *protocol.Response, 10)

    // parallelize it!
    go shard.Query(querySpec, responseChan)
    responses = append(responses, responseChan)
}

Slide 55

How big should the buffer be?

Slide 56

What if we run out of memory?

Slide 57

Channels and Goroutines are awesome, but they’re not magic.

Slide 58

Open Questions…

Slide 59

Is there a better pattern for distributed queries?

Slide 60

Is there a way to get around the GC? Memcashier and mmap
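
One way to keep a large cache out of the collector's reach is anonymous mmap'd memory, which the Go GC neither scans nor moves; a minimal Unix-only sketch of the idea (an illustration, not what InfluxDB does):

package main

import (
    "fmt"
    "syscall"
)

func main() {
    // Ask the OS for 64 MB of anonymous memory. The returned []byte is backed
    // by the mapping rather than the Go heap, so the GC ignores it; the
    // trade-off is manual management (and Munmap when finished).
    data, err := syscall.Mmap(-1, 0, 64<<20,
        syscall.PROT_READ|syscall.PROT_WRITE,
        syscall.MAP_ANON|syscall.MAP_PRIVATE)
    if err != nil {
        panic(err)
    }
    defer syscall.Munmap(data)

    msg := "cached bytes live outside the GC heap"
    copy(data, msg)
    fmt.Println(string(data[:len(msg)]))
}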

Slide 61

Exciting features coming up… • Copy/move shards within the cluster • Custom functions in JS • Binary Protocol and Pubsub

Slide 62

Our goal is to make it as easy to build an analytics product as it is to write a web application.

Slide 63

Thank you! Paul Dix @pauldix paul@influxdb.com