InfluxDB at Paris Data Geeks
Paul Dix
June 26, 2014
Transcript
InfluxDB - an open source distributed time series database
Paul Dix @pauldix paul@influxdb.com
About me…
Microsoft, failed startup, Air Force Space Command, McAfee, EastMedia, Mint Digital, KGB (kind of failed startup), failed startup, Benchmark Solutions (failed finance startup), Thomson Reuters, InfluxDB
Organizer NYC Machine Learning (4900+ members)
Series Editor - “Data & Analytics”
Y Combinator (W13)
Time series?
Metrics
Time Series
Analytics
Events
Measurements AND Events Over Time
Data model
• Databases
• Time series (or tables, but you can have millions)
• Points (or rows, but column oriented)
Data
  [
    {
      "name": "cpu",
      "columns": ["time", "sequence_number", "value", "host"],
      "points": [
        [1395168540, 1, 56.7, "foo.influxdb.com"],
        [1395168540, 2, 43.9, "bar.influxdb.com"]
      ]
    }
  ]
Everything is indexed by series and time.
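The payload shape from the "Data" slide can be modeled in Go roughly as follows. This is a minimal sketch, not InfluxDB's own client code: the `Series` type and `encodeSeries` helper are hypothetical names introduced here for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Series mirrors the JSON shape on the slide: a series name, a list of
// column names, and rows ("points") whose values follow the column order.
// The type name is an assumption made for this sketch.
type Series struct {
	Name    string          `json:"name"`
	Columns []string        `json:"columns"`
	Points  [][]interface{} `json:"points"`
}

// encodeSeries renders a batch of series as a JSON array.
func encodeSeries(batch []Series) (string, error) {
	b, err := json.Marshal(batch)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	batch := []Series{{
		Name:    "cpu",
		Columns: []string{"time", "value", "host"},
		Points: [][]interface{}{
			{1395168540, 56.7, "foo.influxdb.com"},
			{1395168540, 43.9, "bar.influxdb.com"},
		},
	}}
	out, err := encodeSeries(batch)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Because each point is a plain array, the column list is sent once per series rather than once per row.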
Simple Install: no external dependencies
  brew update
  brew install influxdb
RPM, Debian packages: http://influxdb.org/download
http://localhost:8083
HTTP API: web services built in
HTTP API (writes)
  curl -X POST \
    'http://localhost:8086/db/mydb/series?u=paul&p=pass' \
    -d '[{"name":"foo", "columns":["val"], "points": [[3]]}]'
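The same write can be sketched from Go with the standard library. The helper names `writeURL` and `writePoints` are hypothetical, the `application/json` content type is an assumption (the curl call on the slide does not set one), and a local stand-in server replaces InfluxDB so the example runs on its own.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// writeURL builds the write endpoint shown on the slide:
// <base>/db/<database>/series?u=<user>&p=<password>
func writeURL(base, db, user, pass string) string {
	q := url.Values{}
	q.Set("u", user)
	q.Set("p", pass)
	return base + "/db/" + url.PathEscape(db) + "/series?" + q.Encode()
}

// writePoints POSTs a JSON body to the endpoint and returns the HTTP status.
func writePoints(client *http.Client, endpoint, body string) (int, error) {
	// Content type is an assumption; the slide's curl call leaves it implicit.
	resp, err := client.Post(endpoint, "application/json", strings.NewReader(body))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body)
	return resp.StatusCode, nil
}

func main() {
	// Stand-in server so the sketch runs without an InfluxDB instance.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	endpoint := writeURL(srv.URL, "mydb", "paul", "pass")
	status, err := writePoints(srv.Client(), endpoint,
		`[{"name":"foo","columns":["val"],"points":[[3]]}]`)
	if err != nil {
		panic(err)
	}
	fmt.Println(status)
}
```

Pointing `writeURL` at `http://localhost:8086` instead of the stand-in server would target the address used on the slide.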
Data (with timestamp)
  [
    {
      "name": "cpu",
      "columns": ["time", "value", "host"],
      "points": [
        [1395168540, 56.7, "foo.influxdb.com"],
        [1395168540, 43.9, "bar.influxdb.com"]
      ]
    }
  ]
HTTP API (queries)
  curl 'http://localhost:8086/db/mydb/series?u=paul&p=pass&q=.'
SQL-ish
  select * from events where time > now() - 1h
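A query such as the one above contains spaces, `*`, `>` and parentheses, so it has to be URL-escaped before it goes into the `q` parameter. A minimal sketch, with `queryURL` as a hypothetical helper name:

```go
package main

import (
	"fmt"
	"net/url"
)

// queryURL builds <base>/db/<database>/series?u=...&p=...&q=<escaped query>.
// url.Values.Encode escapes each value and sorts the keys alphabetically.
func queryURL(base, db, user, pass, query string) string {
	q := url.Values{}
	q.Set("u", user)
	q.Set("p", pass)
	q.Set("q", query)
	return base + "/db/" + url.PathEscape(db) + "/series?" + q.Encode()
}

func main() {
	fmt.Println(queryURL("http://localhost:8086", "mydb", "paul", "pass",
		"select * from events where time > now() - 1h"))
}
```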
SQL-ish
  select * from "series with weird chars ()*@#0982#$" where time > now() - 1h
Where Regex
  select line from application_logs where line =~ /.*ERROR.*/ and time > "2014-03-01" and time < "2014-03-03"
Only scans the time range: series and time are the primary index
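Why a (series, time) primary index lets a query touch only the requested time range can be illustrated with a sorted in-memory slice. This is a conceptual sketch only, not InfluxDB's storage code: the `point` type, the ordering function, and the half-open range are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// point is a hypothetical record keyed by series name, then timestamp.
type point struct {
	series string
	time   int64
	value  float64
}

// less orders points by series first and time second, like a composite key.
func less(a, b point) bool {
	if a.series != b.series {
		return a.series < b.series
	}
	return a.time < b.time
}

// scanRange returns the points of one series with start <= time < end.
// The input must already be sorted with less.
func scanRange(points []point, series string, start, end int64) []point {
	// Binary search for the first key >= (series, start).
	i := sort.Search(len(points), func(i int) bool {
		return !less(points[i], point{series: series, time: start})
	})
	var out []point
	// Walk forward and stop as soon as the key leaves the requested range.
	for ; i < len(points); i++ {
		p := points[i]
		if p.series != series || p.time >= end {
			break
		}
		out = append(out, p)
	}
	return out
}

func main() {
	points := []point{
		{"mem", 15, 512}, {"cpu", 30, 61.0}, {"cpu", 10, 56.7}, {"cpu", 20, 43.9},
	}
	sort.Slice(points, func(i, j int) bool { return less(points[i], points[j]) })
	for _, p := range scanRange(points, "cpu", 15, 30) {
		fmt.Println(p.series, p.time, p.value)
	}
}
```

The scan cost depends on the number of points inside the range, not on the total number of points stored for the series.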
Work with many series…
Select from Regex
  select * from /stats\.cpu\..*/ limit 1
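Which series a pattern like this selects can be checked with Go's `regexp` package. The sketch assumes the query's regex flavor behaves like Go's for this pattern; `matchSeries` and the series names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchSeries returns the series names that match the pattern.
// The match is unanchored, as in the slide's /stats\.cpu\..*/.
func matchSeries(names []string, pattern string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, n := range names {
		if re.MatchString(n) {
			out = append(out, n)
		}
	}
	return out, nil
}

func main() {
	names := []string{"stats.cpu.host1", "stats.cpu.host2", "stats.mem.host1"}
	matched, err := matchSeries(names, `stats\.cpu\..*`)
	if err != nil {
		panic(err)
	}
	fmt.Println(matched)
}
```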
Downsampling on the fly…
Aggregates
  select percentile(value, 90) from response_times group by time(10m) where time > now() - 1d
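What a percentile grouped by a time interval computes can be sketched as follows. The nearest-rank percentile definition is an assumption (the deck does not say which method InfluxDB uses), and `sample`, `percentile` and `downsample` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// sample is a hypothetical (timestamp, value) pair.
type sample struct {
	time  int64 // seconds since the epoch, assumed non-negative
	value float64
}

// percentile returns the p-th percentile using the nearest-rank method.
// values must be non-empty.
func percentile(values []float64, p float64) float64 {
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// downsample groups samples into buckets of the given width in seconds and
// returns the p-th percentile per bucket, keyed by bucket start time.
func downsample(samples []sample, bucket int64, p float64) map[int64]float64 {
	groups := map[int64][]float64{}
	for _, s := range samples {
		start := s.time - s.time%bucket
		groups[start] = append(groups[start], s.value)
	}
	out := make(map[int64]float64, len(groups))
	for start, vals := range groups {
		out[start] = percentile(vals, p)
	}
	return out
}

func main() {
	samples := []sample{{0, 120}, {200, 80}, {599, 300}, {600, 95}, {900, 110}}
	result := downsample(samples, 600, 90) // 600 s corresponds to time(10m)
	starts := make([]int64, 0, len(result))
	for s := range result {
		starts = append(starts, s)
	}
	sort.Slice(starts, func(i, j int) bool { return starts[i] < starts[j] })
	for _, s := range starts {
		fmt.Println(s, result[s])
	}
}
```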
Continuous Downsampling…
Continuous queries (summaries)
  select count(page_id) from events group by time(1h), page_id into events.[page_id]
Series per page id
  select count from events.67 where time > now() - 7d
Continuous queries (regex downsampling)
  select percentile(value, 90) as value from /^stats\.*/ group by time(5m) into percentile.90.5m.:series_name
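The `into` clause fans results out to one target series per source series. Assuming `:series_name` is simply replaced by the name of each series the regex matched, the target names can be sketched like this (`targetSeries` is a hypothetical helper, not part of InfluxDB):

```go
package main

import (
	"fmt"
	"strings"
)

// targetSeries fills the ":series_name" placeholder of an "into" template
// with the name of the source series. Plain substitution is an assumption.
func targetSeries(into, source string) string {
	return strings.ReplaceAll(into, ":series_name", source)
}

func main() {
	for _, src := range []string{"stats.cpu.host1", "stats.cpu.host2"} {
		fmt.Println(targetSeries("percentile.90.5m.:series_name", src))
	}
}
```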
Percentile series per host
  select value from percentile.90.5m.stats.cpu.host1 where time > now() - 4h
Data Collection: client libraries, CollectD, StatsD, Carbon ingestion, OpenTSDB (soon), Riemann (soon)
Built-in UI
Grafana
Behind the scenes
#golang
Garbage Collector: generational won’t save us
MMAP + Unsafe?
Storage engines: LevelDB, RocksDB, HyperLevelDB, LMDB
Range deletes: wildly expensive
Shards & Shard Spaces
Query Parser: YACC & Bison
Raft: metadata, servers, cluster state
Data replication: not write scalable!
TCP + Protobuf: intra-cluster communication, queries, replication
How data is distributed
Shard
  type Shard struct {
      Id        uint32
      StartTime time.Time
      EndTime   time.Time
      ServerIds []uint32
  }
Multiple shards per duration: named “split” in the configuration
Data for a series for a given interval exists in a shard* (*by default, but can be modified)
hash(database, series) % split
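The formula above picks one of the `split` shards for an interval. A minimal sketch follows; the deck does not name the hash function, so FNV-1a and the zero-byte separator are assumptions, and `splitIndex` is a hypothetical name.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// splitIndex computes hash(database, series) % split.
// FNV-1a is an assumption; the deck does not say which hash is used.
func splitIndex(database, series string, split uint32) uint32 {
	if split == 0 {
		return 0 // avoid division by zero; treat as a single shard
	}
	h := fnv.New32a()
	h.Write([]byte(database))
	h.Write([]byte{0}) // separator so ("ab","c") and ("a","bc") hash differently
	h.Write([]byte(series))
	return h.Sum32() % split
}

func main() {
	for _, series := range []string{"stats.cpu.host1", "stats.cpu.host2", "events"} {
		fmt.Println(series, splitIndex("mydb", series, 4))
	}
}
```

Since the hash input includes the series name, all points of one series for an interval map to the same shard, while different series spread across the split, which matches the "scale out with many series" point.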
Scale out with many series
Questions?
Thank you! Paul Dix @pauldix paul@influxdb.com