Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
InfluxDB at Paris Data Geeks
Search
Paul Dix
June 26, 2014
2
160
InfluxDB at Paris Data Geeks
Paul Dix
June 26, 2014
Tweet
Share
More Decks by Paul Dix
See All by Paul Dix
InfluxDB IOx Project Update - 2021-02-10
pauldix
0
210
InfluxDB IOx data lifecycle and object store persistence
pauldix
1
570
InfluxDB 2.0 and Flux
pauldix
1
680
Flux and InfluxDB 2.0
pauldix
1
1.3k
Querying Prometheus with Flux
pauldix
1
800
Flux (#fluxlang): a new (time series) data scripting language
pauldix
7
5k
At Scale, Everything is Hard
pauldix
2
660
IFQL and the future of InfluxData
pauldix
2
1.3k
Time series & monitoring with InfluxDB and the TICK stack
pauldix
0
420
Featured
See All Featured
Typedesign – Prime Four
hannesfritz
40
2.4k
Thoughts on Productivity
jonyablonski
68
4.4k
YesSQL, Process and Tooling at Scale
rocio
170
14k
Measuring & Analyzing Core Web Vitals
bluesmoon
4
180
The World Runs on Bad Software
bkeepers
PRO
66
11k
For a Future-Friendly Web
brad_frost
175
9.4k
GraphQLとの向き合い方2022年版
quramy
44
13k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
Fantastic passwords and where to find them - at NoRuKo
philnash
50
2.9k
[RailsConf 2023] Rails as a piece of cake
palkan
53
5.1k
Raft: Consensus for Rubyists
vanstee
137
6.7k
Statistics for Hackers
jakevdp
796
220k
Transcript
InfluxDB - an open source distributed time series database Paul
Dix @pauldix paul@influxdb.com
About me…
Microsoft, failed startup, Air Force Space Command, McAffee, EastMedia, Mint
Digital, KGB (kind of failed startup), failed startup, Benchmark Solutions (failed finance startup), Thomson Reuters, InfluxDB
None
Organizer NYC Machine Learning (4900+ members)
Series Editor - “Data & Analytics”
Y Combinator (W13)
Time series?
Metrics
Time Series
Analytics
Events
Measurements AND Events Over Time
Data model • Databases • Time series (or tables, but
you can have millions) • Points (or rows, but column oriented)
Data [ { "name": "cpu", "columns": [ "time", "sequence_number", "value",
"host" ], "points": [ [1395168540, 1, 56.7, "foo.influxdb.com"], [1395168540, 2, 43.9, "bar.influxdb.com"] ] } ]
Everything is indexed by series and time.
Simple Install No external dependencies
brew update brew install influxdb
RPM, Debian packages ! http://influxdb.org/download
http://localhost:8083
None
None
None
None
HTTP API Web services built in
HTTP API (writes) curl -X POST \ 'http://localhost:8086/db/mydb/series?u=paul&p=pass' \ -d
'[{"name":"foo", "columns":["val"], "points": [[3]]}]'
Data (with timestamp) [ { "name": "cpu", "columns": ["time", "value",
"host"], "points": [ [1395168540, 56.7, "foo.influxdb.com"], [1395168540, 43.9, "bar.influxdb.com"] ] } ]
HTTP API (queries) curl 'http://localhost:8086/db/mydb/series?u=paul&p=pass&q=.'
SQL-ish select * from events where time > now() -
1h
SQL-ish select * from “series with weird chars ()*@#0982#$” where
time > now() - 1h
Where Regex select line from application_logs where line =~ /.*ERROR.*/
and time > "2014-03-01" and time < "2014-03-03"
Only scans the time range Series and time are the
primary index
Work with many series…
Select from Regex select * from /stats\.cpu\..*/ limit 1
Downsampling on the fly…
Aggregates select percentile(90, value) from response_times group by time(10m) where
time > now() - 1d
Continuous Downsampling…
Continuous queries (summaries) select count(page_id) from events group by time(1h),
page_id into events.[page_id]
Series per page id select count from events.67 where time
> now() - 7d
Continuous queries (regex downsampling) select percentile(value, 90) as value from
/^stats\.*/ group by time(5m) into percentile.90.5m.:series_name
Percentile series per host select value from percentile.90.stats.cpu.host1 where time
> now() - 4h
Data Collection Client libraries, CollectD, StatsD, Carbon ingestion, OpenTSDB (soon),
Riemann (soon)
Built-in UI
Grafana
Behind the scenes
#golang
Garbage Collector Generational won’t save us
MMAP + Unsafe?
Storage engines LevelDB, RocksDB, HyperLevelDB, LMDB
Range Deletes Wildly Expensive
Shards & Shard Spaces
Query Parser YACC & Bison
Raft Metadata, servers, cluster state
Data replication Not write scalable!
TCP + Protobuf Intra-cluster communication, queries, replication
How data is distributed
Shard type Shard struct { Id uint32 StartTime time.Time EndTime
time.Time ServerIds []uint32 }
Multiple shards per duration Named “split” in the configuration
Data for a series for a given interval exists in
a shard* *by default, but can be modified
hash(database, series) % split
Scale out with many series
Questions?
Thank you! Paul Dix @pauldix paul@influxdb.com