InfluxDB at Paris Data Geeks
Paul Dix
June 26, 2014
Transcript
InfluxDB - an open source distributed time series database
Paul Dix @pauldix paul@influxdb.com
About me…
Microsoft, failed startup, Air Force Space Command, McAfee, EastMedia, Mint Digital, KGB (kind of failed startup), failed startup, Benchmark Solutions (failed finance startup), Thomson Reuters, InfluxDB
Organizer NYC Machine Learning (4900+ members)
Series Editor - “Data & Analytics”
Y Combinator (W13)
Time series?
Metrics
Time Series
Analytics
Events
Measurements AND Events Over Time
Data model
• Databases
• Time series (or tables, but you can have millions)
• Points (or rows, but column oriented)
Data
[
  {
    "name": "cpu",
    "columns": ["time", "sequence_number", "value", "host"],
    "points": [
      [1395168540, 1, 56.7, "foo.influxdb.com"],
      [1395168540, 2, 43.9, "bar.influxdb.com"]
    ]
  }
]
Everything is indexed by series and time.
Simple Install: No external dependencies
brew update
brew install influxdb
RPM, Debian packages: http://influxdb.org/download
http://localhost:8083
HTTP API: Web services built in
HTTP API (writes)
curl -X POST \
  'http://localhost:8086/db/mydb/series?u=paul&p=pass' \
  -d '[{"name":"foo", "columns":["val"], "points": [[3]]}]'
Data (with timestamp)
[
  {
    "name": "cpu",
    "columns": ["time", "value", "host"],
    "points": [
      [1395168540, 56.7, "foo.influxdb.com"],
      [1395168540, 43.9, "bar.influxdb.com"]
    ]
  }
]
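The write call from the slides can also be assembled from a client program. The following is a minimal Python sketch, assuming the endpoint, database name and credentials shown on the slides; the helper name `build_write_request` is hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_write_request(base_url, database, user, password, series):
    """Build (but do not send) a POST for the write endpoint shown on the slide."""
    query = urllib.parse.urlencode({"u": user, "p": password})
    url = f"{base_url}/db/{database}/series?{query}"
    body = json.dumps(series).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


# Payload copied from the "Data (with timestamp)" slide.
cpu_points = [
    {
        "name": "cpu",
        "columns": ["time", "value", "host"],
        "points": [
            [1395168540, 56.7, "foo.influxdb.com"],
            [1395168540, 43.9, "bar.influxdb.com"],
        ],
    }
]

request = build_write_request(
    "http://localhost:8086", "mydb", "paul", "pass", cpu_points
)
print(request.get_method(), request.full_url)
# To send it to a running server: urllib.request.urlopen(request)
```

Each row in "points" has to list its values in the same order as "columns", which is why the payload is kept as one structure and serialized in a single step.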
HTTP API (queries)
curl 'http://localhost:8086/db/mydb/series?u=paul&p=pass&q=.'
SQL-ish
select * from events where time > now() - 1h
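A query such as the one above contains spaces and operators, so it has to be URL-encoded before it goes into the `q` parameter of the query endpoint. A small Python sketch, assuming the host, database and credentials from the slides; `build_query_url` is a hypothetical helper name.

```python
import urllib.parse


def build_query_url(base_url, database, user, password, query):
    """Return the GET URL for the query endpoint with the query URL-encoded."""
    params = urllib.parse.urlencode({"u": user, "p": password, "q": query})
    return f"{base_url}/db/{database}/series?{params}"


query = "select * from events where time > now() - 1h"
url = build_query_url("http://localhost:8086", "mydb", "paul", "pass", query)
print(url)
```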
SQL-ish
select * from "series with weird chars ()*@#0982#$" where time > now() - 1h
Where Regex
select line from application_logs where line =~ /.*ERROR.*/ and time > "2014-03-01" and time < "2014-03-03"
Only scans the time range: Series and time are the primary index
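Why a time-bounded query only touches the requested range can be illustrated with a toy index. This Python sketch is not InfluxDB's storage engine; it only assumes that points are grouped by series and kept sorted by time, so a range query becomes two binary searches. The class and method names are hypothetical.

```python
import bisect


class SeriesIndex:
    """Toy index: points are grouped per series and kept sorted by timestamp."""

    def __init__(self):
        self._times = {}
        self._values = {}

    def write(self, series, timestamp, value):
        times = self._times.setdefault(series, [])
        values = self._values.setdefault(series, [])
        pos = bisect.bisect_right(times, timestamp)
        times.insert(pos, timestamp)
        values.insert(pos, value)

    def scan(self, series, start, end):
        """Return (time, value) pairs with start < time < end."""
        times = self._times.get(series, [])
        values = self._values.get(series, [])
        lo = bisect.bisect_right(times, start)  # first point after start
        hi = bisect.bisect_left(times, end)     # first point at or after end
        return list(zip(times[lo:hi], values[lo:hi]))


index = SeriesIndex()
for ts, val in [(10, "a"), (30, "c"), (20, "b"), (40, "d")]:
    index.write("application_logs", ts, val)

print(index.scan("application_logs", 15, 35))
```

Points outside the two bounds are never visited, which is the property the slide describes.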
Work with many series…
Select from Regex
select * from /stats\.cpu\..*/ limit 1
Downsampling on the fly…
Aggregates
select percentile(value, 90) from response_times group by time(10m) where time > now() - 1d
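What a percentile grouped by time(10m) computes can be sketched in a few lines of Python: assign each point to a fixed-width time bucket, then take the percentile inside each bucket. The nearest-rank percentile definition used here is an assumption, and InfluxDB's own implementation may differ; the function names are hypothetical.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile (an assumed definition, for illustration only)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def downsample(points, bucket_seconds, pct):
    """Group (timestamp, value) pairs into time buckets, one percentile per bucket."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        bucket_start = timestamp - timestamp % bucket_seconds
        buckets[bucket_start].append(value)
    return {start: percentile(vals, pct) for start, vals in sorted(buckets.items())}


# Ten response times in the first 10-minute bucket, two in the second.
response_times = [(i * 10, i + 1) for i in range(10)] + [(600, 100), (700, 200)]
print(downsample(response_times, 600, 90))
```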
Continuous Downsampling…
Continuous queries (summaries)
select count(page_id) from events group by time(1h), page_id into events.[page_id]
Series per page id
select count from events.67 where time > now() - 7d
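The fan-out performed by the summary query can be pictured as follows: every event is counted into an hourly bucket of a series named after its page id. This Python sketch only illustrates the naming and grouping; the function name is hypothetical, and the real continuous query runs inside the database.

```python
from collections import Counter, defaultdict


def fan_out_counts(events, bucket_seconds=3600):
    """events: iterable of (timestamp, page_id).

    Returns {series_name: {bucket_start: count}}, one series per page id.
    """
    summaries = defaultdict(Counter)
    for timestamp, page_id in events:
        bucket_start = timestamp - timestamp % bucket_seconds
        summaries[f"events.{page_id}"][bucket_start] += 1
    return {name: dict(counts) for name, counts in summaries.items()}


events = [(0, 67), (10, 67), (3600, 67), (20, 12)]
print(fan_out_counts(events))
```

Reading the summary for one page then only touches that page's series, as in the `events.67` query above.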
Continuous queries (regex downsampling)
select percentile(value, 90) as value from /^stats\.*/ group by time(5m) into percentile.90.5m.:series_name
Percentile series per host
select value from percentile.90.5m.stats.cpu.host1 where time > now() - 4h
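How the `:series_name` placeholder expands can be sketched in Python: each series whose name matches the regex gets its own target series. The series names below are made up for illustration, Python's `re` module stands in for the database's regex engine, and treating the pattern as a search (not a full match) is an assumption.

```python
import re


def target_series(pattern, series_names, template):
    """Map each series matching the pattern to its downsampled target name."""
    regex = re.compile(pattern)
    return {
        name: template.replace(":series_name", name)
        for name in series_names
        if regex.search(name)
    }


names = ["stats.cpu.host1", "stats.cpu.host2", "events"]
print(target_series(r"^stats\.*", names, "percentile.90.5m.:series_name"))
```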
Data Collection: Client libraries, CollectD, StatsD, Carbon ingestion, OpenTSDB (soon), Riemann (soon)
Built-in UI
Grafana
Behind the scenes
#golang
Garbage Collector: Generational won’t save us
MMAP + Unsafe?
Storage engines: LevelDB, RocksDB, HyperLevelDB, LMDB
Range Deletes: Wildly Expensive
Shards & Shard Spaces
Query Parser: YACC & Bison
Raft: Metadata, servers, cluster state
Data replication: Not write scalable!
TCP + Protobuf: Intra-cluster communication, queries, replication
How data is distributed
Shard
type Shard struct {
    Id        uint32
    StartTime time.Time
    EndTime   time.Time
    ServerIds []uint32
}
Multiple shards per duration: Named “split” in the configuration
Data for a series for a given interval exists in a shard*
*by default, but can be modified
hash(database, series) % split
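The placement rule can be sketched in Python: first narrow the shards to those whose time range covers the point, then use the hash of database and series, modulo the number of those shards (the split), to pick one. The `Shard` dataclass mirrors the Go struct from the slides; the CRC32 hash, the inclusive start and exclusive end bounds, and the function names are assumptions made for illustration, not InfluxDB's actual choices.

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass
class Shard:
    id: int
    start_time: int  # Unix seconds, assumed inclusive
    end_time: int    # Unix seconds, assumed exclusive
    server_ids: List[int]


def stable_hash(database, series):
    """Stand-in hash; the hash function InfluxDB uses is not given in the slides."""
    return zlib.crc32(f"{database}\x00{series}".encode("utf-8"))


def pick_shard(shards, database, series, timestamp):
    """Pick the shard for a point: filter by time range, then hash % split."""
    candidates = sorted(
        (s for s in shards if s.start_time <= timestamp < s.end_time),
        key=lambda s: s.id,
    )
    if not candidates:
        raise LookupError(f"no shard covers timestamp {timestamp}")
    return candidates[stable_hash(database, series) % len(candidates)]


shards = [
    Shard(1, 0, 100, [1]),
    Shard(2, 0, 100, [2]),
    Shard(3, 100, 200, [1]),
    Shard(4, 100, 200, [2]),
]
print(pick_shard(shards, "mydb", "cpu", 50).id)
```

Because the series name is part of the hash key, all points of one series in an interval land in the same shard, while different series spread across the split.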
Scale out with many series
Questions?
Thank you! Paul Dix @pauldix paul@influxdb.com