Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
InfluxDB at Paris Data Geeks
Search
Paul Dix
June 26, 2014
2
170
InfluxDB at Paris Data Geeks
Paul Dix
June 26, 2014
Tweet
Share
More Decks by Paul Dix
See All by Paul Dix
InfluxDB IOx Project Update - 2021-02-10
pauldix
0
260
InfluxDB IOx data lifecycle and object store persistence
pauldix
1
660
InfluxDB 2.0 and Flux
pauldix
1
760
Flux and InfluxDB 2.0
pauldix
1
1.5k
Querying Prometheus with Flux
pauldix
1
960
Flux (#fluxlang): a new (time series) data scripting language
pauldix
7
5.3k
At Scale, Everything is Hard
pauldix
2
740
IFQL and the future of InfluxData
pauldix
2
1.4k
Time series & monitoring with InfluxDB and the TICK stack
pauldix
0
490
Featured
See All Featured
RailsConf 2023
tenderlove
30
1.3k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
231
22k
世界の人気アプリ100個を分析して見えたペイウォール設計の心得
akihiro_kokubo
PRO
66
37k
Gemini Prompt Engineering: Practical Techniques for Tangible AI Outcomes
mfonobong
2
280
State of Search Keynote: SEO is Dead Long Live SEO
ryanjones
0
120
Design of three-dimensional binary manipulators for pick-and-place task avoiding obstacles (IECON2024)
konakalab
0
350
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
234
17k
Breaking role norms: Why Content Design is so much more than writing copy - Taylor Woolridge
uxyall
0
170
Into the Great Unknown - MozCon
thekraken
40
2.3k
Design in an AI World
tapps
0
140
Digital Projects Gone Horribly Wrong (And the UX Pros Who Still Save the Day) - Dean Schuster
uxyall
0
360
The Language of Interfaces
destraynor
162
26k
Transcript
InfluxDB - an open source distributed time series database Paul
Dix @pauldix paul@influxdb.com
About me…
Microsoft, failed startup, Air Force Space Command, McAffee, EastMedia, Mint
Digital, KGB (kind of failed startup), failed startup, Benchmark Solutions (failed finance startup), Thomson Reuters, InfluxDB
None
Organizer NYC Machine Learning (4900+ members)
Series Editor - “Data & Analytics”
Y Combinator (W13)
Time series?
Metrics
Time Series
Analytics
Events
Measurements AND Events Over Time
Data model • Databases • Time series (or tables, but
you can have millions) • Points (or rows, but column oriented)
Data [ { "name": "cpu", "columns": [ "time", "sequence_number", "value",
"host" ], "points": [ [1395168540, 1, 56.7, "foo.influxdb.com"], [1395168540, 2, 43.9, "bar.influxdb.com"] ] } ]
Everything is indexed by series and time.
Simple Install No external dependencies
brew update brew install influxdb
RPM, Debian packages ! http://influxdb.org/download
http://localhost:8083
None
None
None
None
HTTP API Web services built in
HTTP API (writes) curl -X POST \ 'http://localhost:8086/db/mydb/series?u=paul&p=pass' \ -d
'[{"name":"foo", "columns":["val"], "points": [[3]]}]'
Data (with timestamp) [ { "name": "cpu", "columns": ["time", "value",
"host"], "points": [ [1395168540, 56.7, "foo.influxdb.com"], [1395168540, 43.9, "bar.influxdb.com"] ] } ]
HTTP API (queries) curl 'http://localhost:8086/db/mydb/series?u=paul&p=pass&q=.'
SQL-ish select * from events where time > now() -
1h
SQL-ish select * from “series with weird chars ()*@#0982#$” where
time > now() - 1h
Where Regex select line from application_logs where line =~ /.*ERROR.*/
and time > "2014-03-01" and time < "2014-03-03"
Only scans the time range Series and time are the
primary index
Work with many series…
Select from Regex select * from /stats\.cpu\..*/ limit 1
Downsampling on the fly…
Aggregates select percentile(90, value) from response_times group by time(10m) where
time > now() - 1d
Continuous Downsampling…
Continuous queries (summaries) select count(page_id) from events group by time(1h),
page_id into events.[page_id]
Series per page id select count from events.67 where time
> now() - 7d
Continuous queries (regex downsampling) select percentile(value, 90) as value from
/^stats\.*/ group by time(5m) into percentile.90.5m.:series_name
Percentile series per host select value from percentile.90.stats.cpu.host1 where time
> now() - 4h
Data Collection Client libraries, CollectD, StatsD, Carbon ingestion, OpenTSDB (soon),
Riemann (soon)
Built-in UI
Grafana
Behind the scenes
#golang
Garbage Collector Generational won’t save us
MMAP + Unsafe?
Storage engines LevelDB, RocksDB, HyperLevelDB, LMDB
Range Deletes Wildly Expensive
Shards & Shard Spaces
Query Parser YACC & Bison
Raft Metadata, servers, cluster state
Data replication Not write scalable!
TCP + Protobuf Intra-cluster communication, queries, replication
How data is distributed
Shard type Shard struct { Id uint32 StartTime time.Time EndTime
time.Time ServerIds []uint32 }
Multiple shards per duration Named “split” in the configuration
Data for a series for a given interval exists in
a shard* *by default, but can be modified
hash(database, series) % split
Scale out with many series
Questions?
Thank you! Paul Dix @pauldix paul@influxdb.com