InfluxDB at Paris Data Geeks
Paul Dix
June 26, 2014
Transcript
InfluxDB - an open source distributed time series database
Paul Dix @pauldix paul@influxdb.com
About me…
Microsoft, failed startup, Air Force Space Command, McAfee, EastMedia, Mint Digital, KGB (kind of failed startup), failed startup, Benchmark Solutions (failed finance startup), Thomson Reuters, InfluxDB
Organizer NYC Machine Learning (4900+ members)
Series Editor - “Data & Analytics”
Y Combinator (W13)
Time series?
Metrics
Time Series
Analytics
Events
Measurements AND Events Over Time
Data model
• Databases
• Time series (or tables, but you can have millions)
• Points (or rows, but column oriented)
Data:
    [
      {
        "name": "cpu",
        "columns": ["time", "sequence_number", "value", "host"],
        "points": [
          [1395168540, 1, 56.7, "foo.influxdb.com"],
          [1395168540, 2, 43.9, "bar.influxdb.com"]
        ]
      }
    ]
Everything is indexed by series and time.
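The data shape above maps naturally onto a small Go type. The following is a minimal sketch, not InfluxDB's own code: the `Series` type and `encodeSeries` helper are hypothetical names, and the only thing taken from the slides is the JSON layout (`name`, `columns`, `points`).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Series mirrors the JSON shape from the slide: one named series,
// a list of column names, and rows of values in column order.
// The type name is an assumption made for this sketch.
type Series struct {
	Name    string          `json:"name"`
	Columns []string        `json:"columns"`
	Points  [][]interface{} `json:"points"`
}

// encodeSeries serializes a batch of series into the JSON array form.
func encodeSeries(batch []Series) (string, error) {
	b, err := json.Marshal(batch)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	batch := []Series{{
		Name:    "cpu",
		Columns: []string{"time", "sequence_number", "value", "host"},
		Points: [][]interface{}{
			{1395168540, 1, 56.7, "foo.influxdb.com"},
			{1395168540, 2, 43.9, "bar.influxdb.com"},
		},
	}}
	out, err := encodeSeries(batch)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Each row in `Points` must list its values in the same order as `Columns`; nothing in this sketch enforces that.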
Simple install: no external dependencies
    brew update
    brew install influxdb
RPM, Debian packages: http://influxdb.org/download
http://localhost:8083
HTTP API: web services built in
HTTP API (writes):
    curl -X POST \
      'http://localhost:8086/db/mydb/series?u=paul&p=pass' \
      -d '[{"name":"foo", "columns":["val"], "points": [[3]]}]'
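The same write can be prepared from Go with the standard library. This is a sketch under assumptions: `newWriteRequest` is a hypothetical helper, and only the endpoint path and the `u`/`p` parameters come from the curl example. It builds the request without sending it.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// newWriteRequest builds (but does not send) a POST to the
// /db/<database>/series endpoint shown in the curl example.
func newWriteRequest(base, db, user, pass, body string) (*http.Request, error) {
	q := url.Values{}
	q.Set("u", user)
	q.Set("p", pass)
	endpoint := fmt.Sprintf("%s/db/%s/series?%s", base, url.PathEscape(db), q.Encode())
	return http.NewRequest(http.MethodPost, endpoint, strings.NewReader(body))
}

func main() {
	body := `[{"name":"foo", "columns":["val"], "points": [[3]]}]`
	req, err := newWriteRequest("http://localhost:8086", "mydb", "paul", "pass", body)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// To send it against a running server: resp, err := http.DefaultClient.Do(req)
}
```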
Data (with timestamp):
    [
      {
        "name": "cpu",
        "columns": ["time", "value", "host"],
        "points": [
          [1395168540, 56.7, "foo.influxdb.com"],
          [1395168540, 43.9, "bar.influxdb.com"]
        ]
      }
    ]
HTTP API (queries):
    curl 'http://localhost:8086/db/mydb/series?u=paul&p=pass&q=.'
SQL-ish:
    select * from events where time > now() - 1h
SQL-ish:
    select * from "series with weird chars ()*@#0982#$" where time > now() - 1h
Where regex:
    select line from application_logs where line =~ /.*ERROR.*/ and time > "2014-03-01" and time < "2014-03-03"
Only scans the time range: series and time are the primary index
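One way to see why a series-plus-time index makes range scans cheap is to look at how sortable keys could be built for an ordered key-value store. The slides do not show the actual key layout, so the layout below (series id, timestamp, sequence number, all big-endian) and the names `pointKey` and `inRange` are assumptions for illustration only.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// pointKey builds a 24-byte key: series id, timestamp, sequence number.
// Big-endian encoding makes byte-wise ordering match numeric ordering,
// so all points of one series sit next to each other, sorted by time.
// This layout is assumed for the sketch, not taken from InfluxDB.
func pointKey(seriesID, timestamp, seq uint64) []byte {
	key := make([]byte, 24)
	binary.BigEndian.PutUint64(key[0:8], seriesID)
	binary.BigEndian.PutUint64(key[8:16], timestamp)
	binary.BigEndian.PutUint64(key[16:24], seq)
	return key
}

// inRange reports whether key falls in the half-open range [start, end).
func inRange(key, start, end []byte) bool {
	return bytes.Compare(key, start) >= 0 && bytes.Compare(key, end) < 0
}

func main() {
	// Scan bounds: series 7, one hour starting at 1395168000.
	start := pointKey(7, 1395168000, 0)
	end := pointKey(7, 1395171600, 0)

	fmt.Println(inRange(pointKey(7, 1395168540, 1), start, end)) // true: same series, inside the hour
	fmt.Println(inRange(pointKey(7, 1395175000, 1), start, end)) // false: same series, too late
	fmt.Println(inRange(pointKey(8, 1395168540, 1), start, end)) // false: different series
}
```

With keys like these, a query for one series over one hour only has to seek to `start` and read until `end`; points from other series and other time ranges are never touched.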
Work with many series…
Select from regex:
    select * from /stats\.cpu\..*/ limit 1
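To see which series such a pattern would select, the regex from the query can be tried against a list of names with Go's `regexp` package. This is only a sketch: `matchSeries` and the series names are hypothetical, and it assumes the server matches unanchored, the way `MatchString` does.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchSeries returns the names that the pattern matches anywhere
// in the string (unanchored, like regexp.MatchString).
func matchSeries(pattern string, names []string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, n := range names {
		if re.MatchString(n) {
			out = append(out, n)
		}
	}
	return out, nil
}

func main() {
	names := []string{"stats.cpu.host1", "stats.cpu.host2", "stats.mem.host1", "events"}
	matched, err := matchSeries(`stats\.cpu\..*`, names)
	if err != nil {
		panic(err)
	}
	fmt.Println(matched) // [stats.cpu.host1 stats.cpu.host2]
}
```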
Downsampling on the fly…
Aggregates:
    select percentile(value, 90) from response_times group by time(10m) where time > now() - 1d
Continuous Downsampling…
Continuous queries (summaries):
    select count(page_id) from events group by time(1h), page_id into events.[page_id]
Series per page id:
    select count from events.67 where time > now() - 7d
Continuous queries (regex downsampling):
    select percentile(value, 90) as value from /^stats\.*/ group by time(5m) into percentile.90.5m.:series_name
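The `into` clauses above use two placeholders: `[page_id]` for a column value and `:series_name` for the source series. How they might expand into target series names is sketched below; `expandTarget` is a hypothetical helper, not InfluxDB's implementation, and it assumes plain textual substitution.

```go
package main

import (
	"fmt"
	"strings"
)

// expandTarget fills in an "into" template: ":series_name" becomes the
// source series name and "[column]" becomes that column's value.
// Plain string substitution is an assumption made for this sketch.
func expandTarget(template, seriesName string, columns map[string]string) string {
	out := strings.Replace(template, ":series_name", seriesName, -1)
	for col, val := range columns {
		out = strings.Replace(out, "["+col+"]", val, -1)
	}
	return out
}

func main() {
	// One output series per distinct page_id value.
	fmt.Println(expandTarget("events.[page_id]", "events", map[string]string{"page_id": "67"}))
	// prints: events.67

	// One output series per source series matched by the regex.
	fmt.Println(expandTarget("percentile.90.5m.:series_name", "stats.cpu.host1", nil))
	// prints: percentile.90.5m.stats.cpu.host1
}
```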
Percentile series per host:
    select value from percentile.90.5m.stats.cpu.host1 where time > now() - 4h
Data collection: client libraries, CollectD, StatsD, Carbon ingestion, OpenTSDB (soon), Riemann (soon)
Built-in UI
Grafana
Behind the scenes
#golang
Garbage collector: generational won’t save us
MMAP + Unsafe?
Storage engines: LevelDB, RocksDB, HyperLevelDB, LMDB
Range deletes: wildly expensive
Shards & Shard Spaces
Query parser: YACC & Bison
Raft: metadata, servers, cluster state
Data replication: not write scalable!
TCP + Protobuf: intra-cluster communication, queries, replication
How data is distributed
Shard:
    type Shard struct {
        Id        uint32
        StartTime time.Time
        EndTime   time.Time
        ServerIds []uint32
    }
Multiple shards per duration: named “split” in the configuration
Data for a series for a given interval exists in a shard*
* by default, but can be modified
hash(database, series) % split
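The formula above can be sketched as follows. The slides do not name the hash function, so FNV-1a from Go's standard library stands in for it here; `shardIndex` and the zero-byte separator between database and series are likewise assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardIndex picks one of `split` shards for a (database, series) pair
// by hashing both names and taking the result modulo split.
// FNV-1a is a stand-in; the real hash function is not shown in the slides.
func shardIndex(database, series string, split uint32) uint32 {
	if split == 0 {
		return 0 // avoid division by zero; treat as a single shard
	}
	h := fnv.New32a()
	h.Write([]byte(database))
	h.Write([]byte{0}) // separator so ("ab","c") and ("a","bc") hash differently
	h.Write([]byte(series))
	return h.Sum32() % split
}

func main() {
	for _, series := range []string{"stats.cpu.host1", "stats.cpu.host2", "events"} {
		fmt.Println(series, "-> shard", shardIndex("mydb", series, 4))
	}
}
```

Because the series name is part of the hash input, all points for one series in a given interval land in the same shard, while different series spread across the split. That is why this scheme scales out with many series rather than with one very large series.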
Scale out with many series
Questions?
Thank you!
Paul Dix @pauldix paul@influxdb.com