Slide 1

Slide 1 text

Introducing InfluxDB, an open source distributed time series database
Paul Dix
@pauldix
[email protected]

Slide 2

Slide 2 text

About me
● Co-founder, CEO of Errplane (YC W13)
● Organizer of NYC Machine Learning
● Series editor for Addison Wesley’s “Data & Analytics” series
● Author of “Service Oriented Design with Ruby & Rails”
● Created Feedzirra, Typhoeus, SaxMachine, and Domainatrix
● Attending NYC.rb since 2005

Slide 3

Slide 3 text

What is a time series?

Slide 4

Slide 4 text

Metrics

Slide 9

Slide 9 text

Events
● Measurements
● Exceptions
● Page Views
● User actions
● Commits
● Deploys
● Things happening in time...

Slide 10

Slide 10 text

Analytics
operations, developers, users, business

Slide 11

Slide 11 text

Things you want to ask questions about, visualize, or summarize over time.

Slide 12

Slide 12 text

Actually a summarization

Slide 13

Slide 13 text

Also a summarization

Slide 14

Slide 14 text

Isn’t a time series database just a regular database ordered by a time column?

Slide 15

Slide 15 text

Why a database for time series?
● Billions of data points
● Scale horizontally
● HTTP native
● API to build on
● Built-in tools for downsampling/summarizing
● Automatically clear out old data if we want
● Process/monitor data as it comes in (like Storm)

Slide 16

Slide 16 text

Visualize and Summarize
● Graphs & dashboards
● Last 10 minutes
● Last 4 hours
● Last 24 hours
● Past week
● Past month
● YTD
● All Time

Slide 17

Slide 17 text

Data Collection
● Statsd (https://github.com/etsy/statsd/)
● CollectD (http://collectd.org/)
● Heka (https://github.com/mozilla-services/heka)
● l2met (https://github.com/ryandotsmith/l2met)
● Libraries
● Framework integrations
● Cloud integrations (AWS, OpenStack)
● Third-party integrations

Slide 18

Slide 18 text

Existing Tools
● RRDTool (metrics)
● Graphite (metrics)
● OpenTSDB (metrics + events)
● Kairos (metrics + events)

Slide 19

Slide 19 text

Something missing...

Slide 24

Slide 24 text

InfluxDB: harness lightning, get 1.21 gigawatts.

Slide 25

Slide 25 text

InfluxDB
● Written in Go
● Uses LevelDB for storage (may change)
● Self-contained binary
● No external dependencies
● Distributed (in December)

Slide 26

Slide 26 text

HTTP Native
● Read/write data via HTTP
● Manage via HTTP
● Security model to allow access directly from browser

Slide 27

Slide 27 text

How data is organized
● Databases (like in MySQL, Postgres, etc.)
● Time series (kind of like tables)
● Points or events (kind of like rows)

Slide 28

Slide 28 text

Security
● Cluster admins
● Database admins
● Database users
  ○ read permissions
    ■ only certain series
    ■ only queries with a column having a specific value (e.g. customer_id=32)
  ○ write permissions
    ■ only certain series
    ■ only with columns having a specific value
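The read-permission idea above (restricting a user to certain series and to points where a column has a specific value) can be pictured with a small sketch. Everything here is hypothetical: `ReadPermission`, `readable?` and `filter_readable` are illustrative names, not part of InfluxDB or its Ruby client, and the slide does not say how the server actually enforces these rules.

```ruby
# Hypothetical model of one user's read permission: the series the user may
# read, plus a single column/value pair that every visible point must match.
ReadPermission = Struct.new(:series, :column, :value)

# True when the point belongs to an allowed series and carries the required value.
def readable?(permission, series_name, point)
  permission.series.include?(series_name) &&
    point[permission.column] == permission.value
end

# Keep only the points the permission allows the user to see.
def filter_readable(permission, series_name, points)
  points.select { |point| readable?(permission, series_name, point) }
end

permission = ReadPermission.new(['user_events'], 'customer_id', 32)
points = [
  { 'customer_id' => 32, 'action' => 'login' },
  { 'customer_id' => 7,  'action' => 'login' }
]

visible = filter_readable(permission, 'user_events', points)
puts visible.length
```

Write permissions could follow the same shape, checking the series name and column value of each incoming point before it is stored.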

Slide 29

Slide 29 text

InfluxDB Setup
● http://play.influxdb.org
● OSX
  ○ brew update && brew install influxdb
● http://influxdb.org/download
● Ubuntu
  ○ sudo dpkg -i influxdb_latest_amd64.deb
● RedHat
  ○ sudo rpm -ivh influxdb-latest-1.i686.rpm

Slide 30

Slide 30 text

Create a database
require 'influxdb'
influxdb = InfluxDB::Client.new
influxdb.create_database(database)

Slide 31

Slide 31 text

Add a user
database = 'site_development'
username = 'foo'
password = 'bar'
influxdb.create_database_user(database, username, password)

Slide 32

Slide 32 text

Write points
influxdb = InfluxDB::Client.new(database, :username => username, :password => password)
data = {
  val_a: 21,
  some_other_val: "hi",
  another: 23212.1,
  is_awesome: true
}
influxdb.write_point("some_series", data)

Slide 33

Slide 33 text

Write points
curl -X POST 'http://db/mydb/series?u=paul&p=pass' -d \
  '[{"name":"foo", "columns":["val"], "points": [[3]]}]'
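For readers who want the curl call in plain Ruby, here is a minimal sketch that builds the same request with only the standard library and does not send it. The helper names `series_payload` and `build_write_request` are made up for illustration, and the host `db` and the credentials are the placeholders from the slide. How the Ruby client converts a hash into this layout is an assumption, not something the slides state.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Convert a flat hash of values into the name/columns/points layout
# used in the curl body (one hash becomes one point).
def series_payload(series, data)
  [{ name: series, columns: data.keys.map(&:to_s), points: [data.values] }]
end

# Build (but do not send) the POST request; u and p are the query
# parameters shown in the curl example.
def build_write_request(base_url, user, pass, payload)
  uri = URI(base_url)
  uri.query = URI.encode_www_form(u: user, p: pass)
  request = Net::HTTP::Post.new(uri)
  request.body = JSON.generate(payload)
  request
end

payload = series_payload('foo', { val: 3 })
request = build_write_request('http://db/mydb/series', 'paul', 'pass', payload)
puts request.path
puts request.body
```

Sending it would be a matter of passing `request` to `Net::HTTP#request` on a connection to a running server.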

Slide 34

Slide 34 text

Querying
select * from user_events where time > now() - 4h
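Since reads also go over HTTP, a query like this one has to be URL-encoded before it is sent. The sketch below assumes the query travels as a `q` parameter on the same `/series` endpoint used for writes; that parameter name is an assumption not shown on the slides, and `query_uri` is a hypothetical helper.

```ruby
require 'uri'

# Build the URI for a read request.
# ASSUMPTION: the query string is passed as the `q` parameter.
def query_uri(base_url, user, pass, query)
  uri = URI(base_url)
  uri.query = URI.encode_www_form(u: user, p: pass, q: query)
  uri
end

query = 'select * from user_events where time > now() - 4h'
uri = query_uri('http://db/mydb/series', 'paul', 'pass', query)
puts uri
```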

Slide 35

Slide 35 text

JSON data returned
[{
  "name": "foo",
  "columns": ["time", "sequence_number", "val1", "val2"],
  "points": [
    [1384295094, 3, "paul", 23],
    [1384295094, 2, "john", 92],
    [1384295094, 1, "todd", 61]
  ]
}, {...}]
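Because the response separates column names from point values, client code usually wants to zip them back together. A minimal sketch, using the sample response from this slide (without the elided `{...}` entry) and a hypothetical helper named `rows_by_series`:

```ruby
require 'json'

# Turn each series in the response into an array of hashes keyed by column name.
def rows_by_series(json_text)
  JSON.parse(json_text).each_with_object({}) do |series, result|
    result[series['name']] = series['points'].map do |point|
      series['columns'].zip(point).to_h
    end
  end
end

response = <<~JSON
  [{
    "name": "foo",
    "columns": ["time", "sequence_number", "val1", "val2"],
    "points": [
      [1384295094, 3, "paul", 23],
      [1384295094, 2, "john", 92],
      [1384295094, 1, "todd", 61]
    ]
  }]
JSON

rows = rows_by_series(response)
puts rows['foo'].first
```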

Slide 36

Slide 36 text

select count(state) from user_events group by time(5m), state where time > now() - 7d
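To make the grouping concrete, here is a rough in-memory sketch of what `group by time(5m), state` computes: each event is assigned to the 5-minute bucket containing its timestamp, and events are counted per (bucket, state) pair. The sample events and the helper name are invented for illustration; this is not how InfluxDB executes the query internally.

```ruby
BUCKET_SECONDS = 5 * 60

# Count events per (bucket start, state) pair.
# The bucket start is the timestamp rounded down to a multiple of bucket_seconds.
def count_by_bucket_and_state(events, bucket_seconds = BUCKET_SECONDS)
  events
    .group_by { |e| [e[:time] - (e[:time] % bucket_seconds), e[:state]] }
    .transform_values(&:count)
end

# Invented sample data: epoch seconds and a state column.
events = [
  { time: 1384295000, state: 'NY' },
  { time: 1384295094, state: 'NY' },
  { time: 1384295094, state: 'CA' },
  { time: 1384295400, state: 'NY' }
]

counts = count_by_bucket_and_state(events)
counts.each { |(bucket, state), n| puts "#{bucket} #{state} #{n}" }
```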

Slide 37

Slide 37 text

select percentile(value, 90) from response_times group by time(30s) where time > now() - 1h
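As a sketch of the idea, the snippet below computes a percentile per 30-second bucket using the nearest-rank method. Nearest-rank is an assumption chosen for simplicity; the slides do not say which percentile definition InfluxDB uses, and the sample points are invented.

```ruby
# Nearest-rank percentile: the smallest value such that at least pct percent
# of the values are less than or equal to it.
def percentile(values, pct)
  return nil if values.empty?

  sorted = values.sort
  rank = (pct * sorted.length / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# Group points into fixed-size time buckets and take the percentile of each.
def percentile_by_bucket(points, pct, bucket_seconds)
  points
    .group_by { |p| p[:time] - (p[:time] % bucket_seconds) }
    .transform_values { |bucket| percentile(bucket.map { |p| p[:value] }, pct) }
end

# Invented response times; time is in seconds.
points = [
  { time: 0,  value: 100 },
  { time: 10, value: 200 },
  { time: 20, value: 300 },
  { time: 30, value: 50 }
]

result = percentile_by_bucket(points, 90, 30)
result.each { |bucket, value| puts "#{bucket} #{value}" }
```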

Slide 38

Slide 38 text

Continuous Queries (downsampling)
select percentile(value, 90) from response_times group by time(5m) into response_times.percentiles.90

Slide 39

Slide 39 text

Regexes
select * from events where email =~ /.*gmail\.com/

Slide 40

Slide 40 text

select percentile(value, 99) from /stats\.*/ into :series_name.percentiles.99
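One way to read this query: every series whose name matches the regex is processed, and `:series_name` in the target is replaced with the matching series' name. The sketch below only illustrates that name fan-out; `fan_out_targets` is a hypothetical helper and the series names are invented. Note that the pattern as written on the slide matches any name containing "stats".

```ruby
# Map each matching source series to the target series it would write into,
# substituting the source name for the :series_name placeholder.
def fan_out_targets(series_names, pattern, target_template)
  series_names.grep(pattern).map do |name|
    [name, target_template.sub(':series_name', name)]
  end.to_h
end

series_names = ['stats.cpu', 'stats.mem', 'user_events']
targets = fan_out_targets(series_names, /stats\.*/, ':series_name.percentiles.99')
targets.each { |source, target| puts "#{source} -> #{target}" }
```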

Slide 41

Slide 41 text

select count(value) from seriesA merge seriesB
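A rough mental model for `merge`: the points of both series are combined into one time-ordered stream, and the function is applied to the combined stream. The sketch below assumes ascending time order and invented sample points; the slide does not specify ordering or how ties are handled.

```ruby
# Combine any number of series into one stream ordered by time.
def merge_series(*series)
  series.flatten.sort_by { |point| point[:time] }
end

# Count the points that actually carry a value.
def count_values(points)
  points.count { |point| !point[:value].nil? }
end

series_a = [{ time: 1, value: 10 }, { time: 4, value: 40 }]
series_b = [{ time: 2, value: 20 }, { time: 3, value: 30 }, { time: 5, value: 50 }]

merged = merge_series(series_a, series_b)
puts count_values(merged)
```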

Slide 42

Slide 42 text

Querying
● Functions
  ○ min, max, median, mode, percentiles, derivative, standard deviation
● Where clauses
● Group by clauses (time and other columns)
● Periodically delete old raw data

Slide 43

Slide 43 text

Built-in UI

Slide 44

Slide 44 text

CLI

Slide 45

Slide 45 text

Libraries
● Ruby
● Frontend JS
● Node
● Python
● PHP
● Go (soon)
● Java (soon)

Slide 46

Slide 46 text

Ideas to come...
● Custom functions
  ○ Embedded Lua, YARN-like interface, or both?
● Custom real-time queries
  ○ define custom logic and InfluxDB will feed it data
● Queries triggering web hooks
  ○ pair with custom functions for monitoring/anomaly detection

Slide 47

Slide 47 text

Project Status
● Based on work at https://errplane.com
  ○ 2 billion points per month
● http://influxdb.org
● Code available at https://github.com/influxdb
● API finalized in the next month
● Clustered version in December
● Production ready by end of year

Slide 48

Slide 48 text

We need your help
● API, what else would you like to see?
● Client libraries
● Visualization tools
● Data collection integrations
● Comments/feedback on the mailing list
● http://influxdb.org/overview/

Slide 49

Slide 49 text

Share the love
● Star or watch the project on http://github.com/influxdb/influxdb
● Tweet, blog, shout, whisper

Slide 50

Slide 50 text

OSS lives and dies by adoption/popularity

Slide 51

Slide 51 text

MongoDB has 4,406 stars

Slide 52

Slide 52 text

MongoDB valued at $1.2B

Slide 53

Slide 53 text

Each star worth $272,355.00

Slide 54

Slide 54 text

Help InfluxDB get to 10k stars!
Go forth and build!

Slide 55

Slide 55 text

Thanks! @pauldix [email protected]