InfluxDB - a distributed events and time series database
Slides from my lightning talk at the GopherCon pre-party.
Paul Dix
April 27, 2014
Transcript
InfluxDB - a distributed time series, metrics, and events database
Paul Dix paul@influxdb.com @pauldix @influxdb
YC (W13), 3 people full time: Todd Persen, John Shahid, Paul Dix (me)
What it’s for…
Metrics
Time Series
Analytics
Events
Can’t you just use a regular DB?
order by time?
Doesn’t Scale
Example from metrics: 100 measurements per host * 10 hosts * 8640 per day (once every 10s) * 365 days = 3,153,600,000 records per year
Have fun with that table…
But wait, we’ll just keep the summaries!
1h averages = 8,760,000 per year
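The arithmetic on the two sizing slides above can be checked with a few lines of Python. This is only a sanity check of the numbers; the variable names are illustrative and do not come from the talk.

```python
# Back-of-the-envelope sizing from the slides; all names are illustrative.
MEASUREMENTS_PER_HOST = 100
HOSTS = 10
SAMPLES_PER_DAY = 24 * 60 * 60 // 10  # one sample every 10 s
DAYS_PER_YEAR = 365

series_count = MEASUREMENTS_PER_HOST * HOSTS
raw_records_per_year = series_count * SAMPLES_PER_DAY * DAYS_PER_YEAR

# Keeping only 1-hour averages: one row per series per hour.
hourly_summaries_per_year = series_count * 24 * DAYS_PER_YEAR

print(f"{raw_records_per_year:,}")       # 3,153,600,000
print(f"{hourly_summaries_per_year:,}")  # 8,760,000
```

The summaries are 360 times smaller than the raw data, which is why they are tempting, and also why the detail is gone.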
Lose Detail and AdHoc Queryability
So let’s use Cassandra, HBase, or Scaleasaurus!
Too much application code and complexity
Application logic and scripts to compute summaries
Application level logic for balancing
No data locality for AdHoc queries
And then there’s more…
Web services
Libraries for web services
Data collection
Visualization
“Building an application with an analytics component today is like building a web application in 1998. You spend months building infrastructure before getting to the actual thing you want to build.” (Paul Dix)
Analytics should be about analyzing and interpreting data, not the infrastructure to store and process it.
HTTP API: web services built in
HTTP API (writes): curl -X POST 'http://localhost:8086/db/mydb/series?u=paul&p=pass' -d '[{"name":"foo", "columns":["val"], "points": [[3]]}]'
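The same write can be prepared from Python with only the standard library. This is a minimal sketch that mirrors the curl call on the slide; `build_write_request` is a hypothetical helper, not part of any InfluxDB client library, and the request is built but not sent, so no server is needed to run it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_write_request(base_url, db, user, password, series):
    """Build (but do not send) a POST to the write endpoint shown on the slide."""
    query = urlencode({"u": user, "p": password})
    url = f"{base_url}/db/{db}/series?{query}"
    body = json.dumps(series).encode("utf-8")
    return Request(url, data=body, method="POST")


# Same payload as the curl example: one series, one column, one point.
series = [{"name": "foo", "columns": ["val"], "points": [[3]]}]
req = build_write_request("http://localhost:8086", "mydb", "paul", "pass", series)
print(req.get_method(), req.full_url)
```

Against a running server, passing `req` to `urllib.request.urlopen` would perform the write.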
Data (with timestamp): [{"name": "cpu", "columns": ["time", "value", "host"], "points": [[1395168540, 56.7, "foo.influxdb.com"], [1395168540, 43.9, "bar.influxdb.com"]]}]
HTTP API (queries) curl 'http://localhost:8086/db/mydb/series?u=paul&p=pass&q=.'
SQL-ish: select * from events where time > now() - 1h
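A query string like the one above has to be URL-encoded before it goes into the `q` parameter of the query endpoint. The sketch below assumes the endpoint and parameter names shown on the slides (`/db/<db>/series`, `u`, `p`, `q`); `build_query_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(base_url, db, user, password, query):
    """Return a GET URL for the query endpoint, with the query text URL-encoded."""
    params = urlencode({"u": user, "p": password, "q": query})
    return f"{base_url}/db/{db}/series?{params}"


query = "select * from events where time > now() - 1h"
url = build_query_url("http://localhost:8086", "mydb", "paul", "pass", query)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["q"][0]
print(decoded == query)
```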
SQL-ish: select * from "series with weird chars ()*@#0982#$" where time > now() - 1h
Where Regex: select line from application_logs where line =~ /.*ERROR.*/ and time > "2014-03-01" and time < "2014-03-03"
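To make the semantics of that query concrete, here is a small in-memory sketch of the same filter: a regex on the `line` column combined with an exclusive time window. It illustrates what the query asks for, not how InfluxDB evaluates it; `filter_lines` and the sample rows are made up for the example.

```python
import re
from datetime import datetime


def filter_lines(rows, pattern, start, end):
    """Keep lines that match the regex and whose time lies strictly between start and end."""
    rx = re.compile(pattern)
    return [r["line"] for r in rows if start < r["time"] < end and rx.search(r["line"])]


rows = [
    {"time": datetime(2014, 2, 28, 23, 59), "line": "ERROR too early"},
    {"time": datetime(2014, 3, 1, 12, 0), "line": "INFO started"},
    {"time": datetime(2014, 3, 2, 8, 30), "line": "db ERROR timeout"},
]
hits = filter_lines(rows, r".*ERROR.*", datetime(2014, 3, 1), datetime(2014, 3, 3))
print(hits)  # ['db ERROR timeout']
```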
Only scans the time range. Series and time are the primary index.
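The idea of indexing by series and then by time can be sketched with a toy in-memory structure: points are kept sorted by timestamp per series, so a time-bounded query touches only the matching slice. This is an illustration of the concept only and makes no claim about InfluxDB's actual storage engine; `SeriesIndex` is a hypothetical class.

```python
from bisect import bisect_left, bisect_right


class SeriesIndex:
    """Toy index: for each series, points are kept sorted by timestamp."""

    def __init__(self):
        self._times = {}   # series name -> sorted list of timestamps
        self._values = {}  # series name -> values, in the same order

    def write(self, series, timestamp, value):
        times = self._times.setdefault(series, [])
        values = self._values.setdefault(series, [])
        i = bisect_right(times, timestamp)  # keep both lists sorted by time
        times.insert(i, timestamp)
        values.insert(i, value)

    def scan(self, series, start, end):
        """Return (time, value) pairs with start <= time < end."""
        times = self._times.get(series, [])
        values = self._values.get(series, [])
        lo = bisect_left(times, start)
        hi = bisect_left(times, end)
        return list(zip(times[lo:hi], values[lo:hi]))


idx = SeriesIndex()
for ts, v in [(10, 1.0), (30, 3.0), (20, 2.0), (40, 4.0)]:
    idx.write("cpu", ts, v)
print(idx.scan("cpu", 20, 40))  # [(20, 2.0), (30, 3.0)]
```

The two binary searches find the slice boundaries in logarithmic time, so the cost of a query depends on the size of the time range rather than on the total size of the series.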
Work with many series…
Select from Regex: select * from /stats\.cpu\..*/ limit 1
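Selecting from a regex means the pattern is applied to series names rather than to column values. A minimal sketch, assuming unanchored matching (the talk does not say whether patterns are anchored) and using made-up series names:

```python
import re


def matching_series(names, pattern):
    """Return the series names that the pattern matches anywhere in the name."""
    rx = re.compile(pattern)
    return [n for n in names if rx.search(n)]


names = ["stats.cpu.host1", "stats.cpu.host2", "stats.mem.host1", "events"]
matched = matching_series(names, r"stats\.cpu\..*")
print(matched)  # ['stats.cpu.host1', 'stats.cpu.host2']
```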
Downsampling on the fly…
Aggregates: select percentile(90, value) from response_times group by time(10m) where time > now() - 1d
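Downsampling on the fly amounts to bucketing points by time and applying an aggregate to each bucket. The sketch below does this for a percentile over 10-minute buckets. It assumes the nearest-rank definition of a percentile, which the talk does not specify, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile (an assumed definition, not taken from the talk)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def group_by_time(points, interval_s, p):
    """Bucket (timestamp, value) points by interval and take a percentile per bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % interval_s].append(value)
    return {start: percentile(vals, p) for start, vals in sorted(buckets.items())}


# Timestamps in seconds; 600 s corresponds to time(10m).
points = [(0, 10), (60, 20), (120, 30), (599, 40), (600, 5), (900, 15)]
result = group_by_time(points, 600, 90)
print(result)  # {0: 40, 600: 15}
```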
Continuous Downsampling…
Continuous queries (summaries): select count(page_id) from events group by time(1h), page_id into events.[page_id]
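What this continuous query produces can be sketched in memory: events are counted per hour and per `page_id`, and each count lands in a series whose name has `[page_id]` replaced by the column value. The interpolation rule is inferred from the slides, and `hourly_counts_by_page` is a hypothetical helper.

```python
from collections import Counter


def hourly_counts_by_page(events):
    """Return {target series name: {hour start (s): count}}."""
    counts = Counter()
    for ev in events:
        hour = ev["time"] - ev["time"] % 3600  # start of the 1h bucket
        counts[(f"events.{ev['page_id']}", hour)] += 1
    summary = {}
    for (series, hour), n in counts.items():
        summary.setdefault(series, {})[hour] = n
    return summary


events = [
    {"time": 10, "page_id": 67},
    {"time": 20, "page_id": 67},
    {"time": 3700, "page_id": 67},
    {"time": 30, "page_id": 12},
]
summary = hourly_counts_by_page(events)
print(summary)  # {'events.67': {0: 2, 3600: 1}, 'events.12': {0: 1}}
```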
Series per page id: select count from events.67 where time > now() - 7d
Continuous queries (regex downsampling): select percentile(value, 90) as value from /stats\.*/ group by time(5m) into percentile.90.:series_name
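The naming side of this query can be sketched as follows: every source series that matches the regex gets its own target series, with `:series_name` replaced by the source name. Both the substitution rule and the unanchored matching are assumptions read off the slides; `downsample_targets` is a hypothetical helper.

```python
import re


def downsample_targets(series_names, pattern, template):
    """Map each matching source series to the target series it would be written to."""
    rx = re.compile(pattern)
    return {
        name: template.replace(":series_name", name)
        for name in series_names
        if rx.search(name)
    }


names = ["stats.cpu.host1", "stats.mem.host1", "events"]
targets = downsample_targets(names, r"stats\.*", "percentile.90.:series_name")
print(targets["stats.cpu.host1"])  # percentile.90.stats.cpu.host1
```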
Percentile series per host: select value from percentile.90.stats.cpu.host1 where time > now() - 4h
Denormalization for performance
Range scans all user events for the last hour: select * from events where user_id = 3 and time > now() - 1h
Continuous queries (fan out): select * from events into events.[user_id]
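Fan-out is denormalization: each event is copied into a series named after its `user_id`, so a later per-user query reads one small series instead of filtering the whole `events` series. A minimal in-memory sketch, with `fan_out` as a hypothetical helper:

```python
from collections import defaultdict


def fan_out(events, key, prefix="events"):
    """Copy each event into a per-key series, e.g. events.3 for user_id 3."""
    out = defaultdict(list)
    for ev in events:
        out[f"{prefix}.{ev[key]}"].append(ev)
    return dict(out)


events = [
    {"time": 1, "user_id": 3, "action": "login"},
    {"time": 2, "user_id": 7, "action": "view"},
    {"time": 3, "user_id": 3, "action": "logout"},
]
by_user = fan_out(events, "user_id")
print([ev["action"] for ev in by_user["events.3"]])  # ['login', 'logout']
```

The trade-off is extra storage and write work in exchange for cheaper reads.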
Series per user id: select * from events.3 where time > now() - 1h
Distributed: scale out, data locality, high availability
Raft for metadata. We owe Ben Johnson a beer or three…
Protobuf + TCP for queries, writes
Scalable: have billions of points in 1 series* or a million different series
Libraries: Go, Ruby, JavaScript, Python, Node.js, Clojure, Java, Perl, Haskell, R, Scala, CLI (Ruby and Node)
Visualization
Built-in UI
Grafana
JavaScript library + D3, HighCharts, Rickshaw, NVD3, etc. Definitely more to do here!
Data Collection: CollectD proxy, StatsD backend, Carbon ingestion, OpenTSDB (soon)
Coming Soon
ugh, Documentation
Series Metadata
Binary Protocol
Pubsub: select * from some_series where host = "serverA" into subscription(); select percentile(90, value) from some_series group by time(1m) into subscription()
Custom Functions: select myFunc(value) from some_series
Rack aware sharding and querying
Multi-datacenter replication: push and bi-directional
Indexes?
Ponies? Tell @jvshahid that you want your pony ;)
But it’s ready to go now. Production deployments already running.
Need help? support@influxdb.com. Thanks! paul@influxdb.com @pauldix