Introducing InfluxDB, an open source distributed time series database

Paul Dix
November 12, 2013

InfluxDB is an open source metrics and analytics database based on our work at Errplane. These are the slides from the talk I gave at NYC.rb on 11/12.

Transcript

  1. Introducing InfluxDB, an open source distributed time series database
    Paul Dix, @pauldix, paul@errplane.com
  2. About me
    • Co-founder, CEO of Errplane (YC W13)
    • Organizer of NYC Machine Learning
    • Series editor for Addison Wesley’s “Data & Analytics” series
    • Author of “Service Oriented Design with Ruby & Rails”
    • Created Feedzirra, Typhoeus, SaxMachine, and Domainatrix
    • Attending NYC.rb since 2005
  3. What is a time series?

  4. Metrics

  5. None
  6. None
  7. None
  8. None
  9. Events
    • Measurements
    • Exceptions
    • Page Views
    • User actions
    • Commits
    • Deploys
    • Things happening in time...
  10. Analytics: operations, developers, users, business

  11. Things you want to ask questions about, visualize, or summarize over time.
  12. Actually a summarization

  13. Also a summarization

  14. Isn’t a time series database just a regular database ordered by a time column?
  15. Why a database for time series?
    • Billions of data points
    • Scale horizontally
    • HTTP native
    • API to build on
    • Built in tools for downsampling/summarizing
    • Automatically clear out old data if we want
    • Process/monitor data as it comes in (like Storm)
  16. Visualize and Summarize
    • Graphs & dashboards
    • Last 10 minutes
    • Last 4 hours
    • Last 24 hours
    • Past week
    • Past month
    • YTD
    • All Time
  17. Data Collection
    • Statsd (https://github.com/etsy/statsd/)
    • CollectD (http://collectd.org/)
    • Heka (https://github.com/mozilla-services/heka)
    • l2met (https://github.com/ryandotsmith/l2met)
    • Libraries
    • Framework integrations
    • Cloud integrations (AWS, OpenStack)
    • Third-party integrations
  18. Existing Tools
    • RRDTool (metrics)
    • Graphite (metrics)
    • OpenTSDB (metrics + events)
    • Kairos (metrics + events)
  19. Something missing...

  20. None
  21. None
  22. None
  23. None
  24. InfluxDB: harness lightning, get 1.21 gigawatts.

  25. InfluxDB
    • Written in Go
    • Uses LevelDB for storage (may change)
    • Self contained binary
    • No external dependencies
    • Distributed (in December)
  26. HTTP Native
    • Read/write data via HTTP
    • Manage via HTTP
    • Security model to allow access directly from browser
  27. How data is organized
    • Databases (like in MySQL, Postgres, etc)
    • Time series (kind of like tables)
    • Points or events (kind of like rows)
  28. Security
    • Cluster admins
    • Database admins
    • Database users
      ◦ read permissions
        ▪ only certain series
        ▪ only queries with a column having a specific value (e.g. customer_id=32)
      ◦ write permissions
        ▪ only certain series
        ▪ only with columns having a specific value
  29. InfluxDB Setup
    • http://play.influxdb.org
    • OSX
      ◦ brew update && brew install influxdb
    • http://influxdb.org/download
    • Ubuntu
      ◦ sudo dpkg -i influxdb_latest_amd64.deb
    • RedHat
      ◦ sudo rpm -ivh influxdb-latest-1.i686.rpm
  30. Create a database
    require 'influxdb'
    influxdb = InfluxDB::Client.new
    influxdb.create_database(database)

  31. Add a user
    database = 'site_development'
    username = 'foo'
    password = 'bar'
    influxdb.create_database_user(database, username, password)
  32. Write points
    influxdb = InfluxDB::Client.new(database, :username => username, :password => password)
    data = { val_a: 21, some_other_val: "hi", another: 23212.1, is_awesome: true }
    influxdb.write_point("some_series", data)
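As a rough illustration of what a write looks like on the wire: the HTTP API takes a JSON array of series, each with a `name`, a list of `columns`, and a list of `points`. The sketch below maps a hash like `data` onto that shape using only Ruby's standard library. `series_payload` is a hypothetical helper written for this example, not part of the influxdb gem, and whether the gem builds its request body exactly this way is an assumption.

```ruby
require 'json'

# Hypothetical helper (not part of the influxdb gem): turn one point,
# given as a Ruby hash, into the name/columns/points structure.
def series_payload(series_name, point)
  [{
    'name'    => series_name,
    'columns' => point.keys.map(&:to_s), # column names in hash order
    'points'  => [point.values]          # one row, same order as columns
  }]
end

data = { val_a: 21, some_other_val: "hi", another: 23212.1, is_awesome: true }
payload = series_payload('some_series', data)

# This string is what would go in the body of the HTTP POST.
puts JSON.generate(payload)
```

Keeping column names and row values in the same order is what lets several rows share one `columns` list.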
  33. Write points
    curl -X POST 'http://db/mydb/series?u=paul&p=pass' -d \
      '[{"name":"foo", "columns":["val"], "points": [[3]]}]'
  34. Querying
    select * from user_events where time > now() - 4h
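Query results come back as JSON, one object per series, each with a `columns` array and a `points` array of rows. Here is a minimal sketch of turning such a response into one hash per row, using only the standard library. The sample body mirrors the response on the next slide (minus its elided `{...}` entry); `rows_from` is a hypothetical helper, not a client library method, and treating `time` as Unix seconds is an assumption.

```ruby
require 'json'

# Sample response body, copied from the slide that follows.
response = <<~BODY
  [{
    "name": "foo",
    "columns": ["time", "sequence_number", "val1", "val2"],
    "points": [
      [1384295094, 3, "paul", 23],
      [1384295094, 2, "john", 92],
      [1384295094, 1, "todd", 61]
    ]
  }]
BODY

# Hypothetical helper: pair every point with the series' column names.
def rows_from(series)
  series['points'].map { |point| series['columns'].zip(point).to_h }
end

JSON.parse(response).each do |series|
  rows_from(series).each do |row|
    # Assumes the time column holds Unix seconds.
    puts "#{series['name']}: #{row['val1']}=#{row['val2']} at #{Time.at(row['time']).utc}"
  end
end
```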
  35. [{
      "name": "foo",
      "columns": ["time", "sequence_number", "val1", "val2"],
      "points": [
        [1384295094, 3, "paul", 23],
        [1384295094, 2, "john", 92],
        [1384295094, 1, "todd", 61]
      ]
    }, {...}]
    JSON data returned
  36. select count(state) from user_events group by time(5m), state where time > now() - 7d
  37. select percentile(value, 90) from response_times group by time(30s) where time > now() - 1h
  38. select percentile(value, 90) from response_times group by time(5m) into response_times.percentiles.90
    Continuous Queries (downsampling)
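To make the `group by time(...)` queries above concrete, here is an in-memory sketch of the same idea: put each point in a fixed-width time bucket, then reduce each bucket to one value. It is not InfluxDB's implementation. `aggregate_by_time` and `percentile` are hypothetical helpers, the sample points are made up, buckets are assumed to align to multiples of the bucket width, and the percentile uses the nearest-rank definition, which may differ from what InfluxDB computes.

```ruby
# Nearest-rank percentile (an assumption; InfluxDB may define it differently).
def percentile(values, pct)
  return nil if values.empty?
  sorted = values.sort
  rank = (pct / 100.0 * sorted.length).ceil
  sorted[[rank, 1].max - 1]
end

# Group [time, value] points into buckets of bucket_seconds and reduce
# each bucket's values with the given block.
def aggregate_by_time(points, bucket_seconds)
  points
    .group_by { |time, _value| time - (time % bucket_seconds) }
    .map { |bucket_start, bucket| [bucket_start, yield(bucket.map(&:last))] }
    .sort_by(&:first)
end

# Made-up [unix_seconds, value] points spanning two 5-minute buckets.
points = [[0, 10], [60, 20], [120, 30], [299, 40], [300, 5], [450, 15]]

counts = aggregate_by_time(points, 300) { |values| values.length }
p90    = aggregate_by_time(points, 300) { |values| percentile(values, 90) }

p counts # => [[0, 4], [300, 2]]
p p90    # => [[0, 40], [300, 15]]
```

A continuous query amounts to running a reduction like this on a schedule and writing the result rows into the series named after `into`.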
  39. Regexes
    select * from events where email =~ /.*gmail\.com/

  40. select percentile(value, 99) from /stats\.*/ into :series_name.percentiles.99
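My reading of this query is that the regex selects every matching series and `:series_name` is replaced with each source series' name to form the target. Under that assumption, the name expansion can be sketched as below. `target_names` and the series names are hypothetical; the regex and template are the ones in the query.

```ruby
# Hypothetical helper: map each series whose name matches the pattern
# to the target name produced by filling in the :series_name placeholder.
def target_names(series_names, pattern, template)
  series_names
    .grep(pattern)
    .map { |name| [name, template.sub(':series_name', name)] }
    .to_h
end

series_names = ['stats.cpu', 'stats.memory', 'events.login'] # made-up names
targets = target_names(series_names, /stats\.*/, ':series_name.percentiles.99')

targets.each { |source, target| puts "#{source} -> #{target}" }
```

Note that `/stats\.*/` is unanchored, so it matches any name containing `stats`; a stricter pattern such as `/^stats\./` would only match names that start with `stats.`.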

  41. select count(value) from seriesA merge seriesB
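One way to picture `merge`: the points of both series are treated as a single stream ordered by time, and the aggregate runs over that stream. A small sketch under that assumption follows. `merge_series` is a hypothetical helper, the points are made up, and the ascending sort order is chosen only for illustration.

```ruby
# Hypothetical helper: combine any number of series of [time, value]
# points into one list ordered by time.
def merge_series(*series)
  series.flatten(1).sort_by { |time, _value| time }
end

series_a = [[1384295094, 3], [1384295100, 7]] # made-up points
series_b = [[1384295090, 1], [1384295097, 4]]

merged = merge_series(series_a, series_b)

# count(value) over the merged stream
puts merged.length
```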

  42. Querying
    • Functions
      ◦ min, max, median, mode, percentiles, derivative, standard deviation
    • Where clauses
    • Group by clauses (time and other columns)
    • Periodically delete old raw data
  43. Built in UI

  44. CLI

  45. Libraries
    • Ruby
    • Frontend JS
    • Node
    • Python
    • PHP
    • Go (soon)
    • Java (soon)
  46. Ideas to come...
    • Custom functions
      ◦ Embedded Lua, YARN-like interface, or both?
    • Custom real-time queries
      ◦ define custom logic and InfluxDB will feed it data
    • Queries triggering web hooks
      ◦ pair with custom functions for monitoring/anomaly detection
  47. Project Status
    • Based on work at https://errplane.com
      ◦ 2 billion points per month
    • http://influxdb.org
    • Code available at https://github.com/influxdb
    • API finalized in the next month
    • Clustered version in December
    • Production ready by end of year
  48. We need your help
    • API, what else would you like to see?
    • Client libraries
    • Visualization tools
    • Data collection integrations
    • Comments/feedback on the mailing list
    • http://influxdb.org/overview/
  49. Share the love
    • Star or watch the project on http://github.com/influxdb/influxdb
    • Tweet, blog, shout, whisper
  50. OSS lives and dies by adoption/popularity

  51. MongoDB has 4,406 stars

  52. MongoDB valued at $1.2B

  53. Each star worth $272,355.00

  54. Help InfluxDB get to 10k stars! go forth and build!

  55. Thanks! @pauldix paul@errplane.com