Paul Dix
October 15, 2015
The new InfluxDB storage engine and some query language ideas
Short talk I gave at GrafanaCon
Transcript
The new InfluxDB storage engine and some query language ideas
Paul Dix CEO at InfluxDB @pauldix paul@influxdb.com
preliminary intro materials…
Everything is indexed by time and series
Shards (10/10/2015, 10/11/2015, 10/12/2015, 10/13/2015): data is organized into shards of time; each is an underlying DB, so it is efficient to drop old data
InfluxDB data: temperature,device=dev1,building=b1 internal=80,external=18 1443782126
Measurement: temperature
Tags: device=dev1,building=b1
Fields: internal=80,external=18
Timestamp: 1443782126
We actually store up to ns scale timestamps but I couldn’t fit on the slide
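To make the four parts of the example line concrete, here is a minimal Go sketch that splits it into measurement, tags, fields, and timestamp. The `Point` type and `parseLine` function are hypothetical names for illustration, not InfluxDB's parser, and the sketch assumes no escaping or quoting and that every field is a float.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Point is a hypothetical holder for one parsed line (not InfluxDB's own type).
type Point struct {
	Measurement string
	Tags        map[string]string
	Fields      map[string]float64
	Timestamp   int64
}

// parseLine splits "measurement,tag=v,... field=v,... timestamp".
// Assumption: no escaped commas/spaces, no quoted strings, float fields only.
func parseLine(line string) (Point, error) {
	parts := strings.Fields(line)
	if len(parts) != 3 {
		return Point{}, fmt.Errorf("want 3 space-separated sections, got %d", len(parts))
	}
	key := strings.Split(parts[0], ",")
	p := Point{Measurement: key[0], Tags: map[string]string{}, Fields: map[string]float64{}}
	for _, kv := range key[1:] {
		pair := strings.SplitN(kv, "=", 2)
		if len(pair) != 2 {
			return Point{}, fmt.Errorf("bad tag %q", kv)
		}
		p.Tags[pair[0]] = pair[1]
	}
	for _, kv := range strings.Split(parts[1], ",") {
		pair := strings.SplitN(kv, "=", 2)
		if len(pair) != 2 {
			return Point{}, fmt.Errorf("bad field %q", kv)
		}
		f, err := strconv.ParseFloat(pair[1], 64)
		if err != nil {
			return Point{}, fmt.Errorf("bad field value %q: %v", kv, err)
		}
		p.Fields[pair[0]] = f
	}
	ts, err := strconv.ParseInt(parts[2], 10, 64)
	if err != nil {
		return Point{}, fmt.Errorf("bad timestamp %q: %v", parts[2], err)
	}
	p.Timestamp = ts
	return p, nil
}

func main() {
	p, err := parseLine("temperature,device=dev1,building=b1 internal=80,external=18 1443782126")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", p)
}
```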
Each series and field maps to a unique ID
temperature,device=dev1,building=b1#internal → 1
temperature,device=dev1,building=b1#external → 2
Data per ID is tuples ordered by time
temperature,device=dev1,building=b1#internal → ID 1 → (1443782126,80)
temperature,device=dev1,building=b1#external → ID 2 → (1443782126,18)
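A rough sketch of that mapping in Go: each "series key#field" string gets a numeric ID, and each ID owns a slice of (time, value) tuples kept in time order. The `store` type and its methods are made up for this illustration and say nothing about how InfluxDB actually assigns IDs.

```go
package main

import (
	"fmt"
	"sort"
)

// tuple is one (timestamp, value) pair.
type tuple struct {
	Time  int64
	Value float64
}

// store is a hypothetical in-memory model: key -> ID, ID -> ordered tuples.
type store struct {
	ids    map[string]uint64
	data   map[uint64][]tuple
	nextID uint64
}

func newStore() *store {
	return &store{ids: map[string]uint64{}, data: map[uint64][]tuple{}}
}

// idFor returns the ID for seriesKey#field, assigning the next free ID if new.
func (s *store) idFor(seriesKey, field string) uint64 {
	k := seriesKey + "#" + field
	if id, ok := s.ids[k]; ok {
		return id
	}
	s.nextID++
	s.ids[k] = s.nextID
	return s.nextID
}

// write inserts the tuple at its time-ordered position for the ID.
func (s *store) write(seriesKey, field string, t int64, v float64) {
	id := s.idFor(seriesKey, field)
	pts := s.data[id]
	i := sort.Search(len(pts), func(i int) bool { return pts[i].Time > t })
	pts = append(pts, tuple{})
	copy(pts[i+1:], pts[i:])
	pts[i] = tuple{Time: t, Value: v}
	s.data[id] = pts
}

func main() {
	s := newStore()
	series := "temperature,device=dev1,building=b1"
	s.write(series, "internal", 1443782126, 80)
	s.write(series, "external", 1443782126, 18)
	for key, id := range s.ids {
		fmt.Println(id, key, s.data[id])
	}
}
```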
Storage Requirements
High write throughput to hundreds of thousands of series
Awesome read performance
Better Compression
Writes can’t block reads
Reads can’t block writes
Write multiple ranges simultaneously
Hot backups
Many databases open in a single process
InfluxDB’s Time Structured Merge Tree (TSM Tree): like LSM, but different
Components (similar to LSM Trees)
WAL: same
In memory cache: like MemTables
Index files: like SSTables
awesome time series data → WAL (an append only file) → in memory index
In Memory Cache
// cache and flush variables
cacheLock sync.RWMutex
cache map[string]Values
flushCache map[string]Values
dirtySort map[string]bool
Keys look like temperature,device=dev1,building=b1#internal
Writes can come in while the WAL flushes
Values can come in out of order: mark the key if so, and sort at query time
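One way these four variables could work together, as a hedged sketch rather than the engine's real code: writes go to `cache`, a flush swaps `cache` into `flushCache` so new writes are not blocked, and `dirtySort` records keys that need sorting when they are read. The `memCache` and `point` types and all method names are assumptions, and the sketch assumes only one flush is in flight at a time.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// point is a simplified stand-in for a cached value: a timestamp and a float.
type point struct {
	unixNano int64
	value    float64
}

type memCache struct {
	cacheLock  sync.RWMutex
	cache      map[string][]point // accepts new writes
	flushCache map[string][]point // snapshot being flushed, if any
	dirtySort  map[string]bool    // keys that received out-of-order points
}

func newMemCache() *memCache {
	return &memCache{cache: map[string][]point{}, dirtySort: map[string]bool{}}
}

// write appends a point and marks the key dirty if it arrived out of order.
func (c *memCache) write(key string, p point) {
	c.cacheLock.Lock()
	defer c.cacheLock.Unlock()
	prev := c.cache[key]
	if len(prev) == 0 {
		prev = c.flushCache[key] // compare against the pending snapshot instead
	}
	if n := len(prev); n > 0 && p.unixNano < prev[n-1].unixNano {
		c.dirtySort[key] = true
	}
	c.cache[key] = append(c.cache[key], p)
}

// read returns a copy of everything cached for key, sorted only if marked dirty.
func (c *memCache) read(key string) []point {
	c.cacheLock.RLock()
	defer c.cacheLock.RUnlock()
	out := append([]point{}, c.flushCache[key]...)
	out = append(out, c.cache[key]...)
	if c.dirtySort[key] {
		sort.Slice(out, func(i, j int) bool { return out[i].unixNano < out[j].unixNano })
	}
	return out
}

// startFlush swaps in an empty cache so writes continue during the flush.
func (c *memCache) startFlush() map[string][]point {
	c.cacheLock.Lock()
	defer c.cacheLock.Unlock()
	c.flushCache = c.cache
	c.cache = map[string][]point{}
	return c.flushCache
}

// endFlush drops the snapshot and recomputes which remaining keys are unsorted.
func (c *memCache) endFlush() {
	c.cacheLock.Lock()
	defer c.cacheLock.Unlock()
	c.flushCache = nil
	c.dirtySort = map[string]bool{}
	for key, vals := range c.cache {
		v := vals
		if !sort.SliceIsSorted(v, func(i, j int) bool { return v[i].unixNano < v[j].unixNano }) {
			c.dirtySort[key] = true
		}
	}
}

func main() {
	c := newMemCache()
	key := "temperature,device=dev1,building=b1#internal"
	c.write(key, point{20, 81})
	c.write(key, point{10, 80}) // out of order, marks the key dirty
	c.startFlush()
	c.write(key, point{30, 82}) // accepted while the flush is pending
	fmt.Println(c.read(key))
	c.endFlush()
}
```

Sorting a copy at read time means readers only need the read lock, which fits the requirement that reads and writes should not block each other for long.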
Values in Memory
type Value interface {
	Time() time.Time
	UnixNano() int64
	Value() interface{}
	Size() int
}
awesome time series data → WAL (an append only file) → in memory index → on disk index (periodic flushes)
The Index: contiguous blocks of time
Data File (Min Time: 10000, Max Time: 29999)
Data File (Min Time: 30000, Max Time: 39999)
Data File (Min Time: 70000, Max Time: 99999)
The Index: data files can overlap
Data File (Min Time: 10000, Max Time: 29999)
Data File (Min Time: 15000, Max Time: 39999)
Data File (Min Time: 70000, Max Time: 99999)
The Index: but a specific series must not overlap
cpu,host=A (Min Time: 10000, Max Time: 20000)
cpu,host=A (Min Time: 21000, Max Time: 39999)
Data File (Min Time: 70000, Max Time: 99999)
The Index: data files are ordered by ascending time; a file will never overlap with more than 2 others
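Since every data file carries a min and max time, a query only needs to open the files whose range intersects its own. A minimal Go sketch of that check follows; `fileRange` and `overlapping` are hypothetical names and the linear scan is just the simplest possible approach, not necessarily what the engine does.

```go
package main

import "fmt"

// fileRange is a hypothetical summary of one data file's time bounds.
type fileRange struct {
	name             string
	minTime, maxTime int64
}

// overlapping returns the files whose [minTime, maxTime] intersects [min, max].
func overlapping(files []fileRange, min, max int64) []fileRange {
	var out []fileRange
	for _, f := range files {
		if f.minTime <= max && f.maxTime >= min {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	files := []fileRange{
		{"a", 10000, 29999},
		{"b", 15000, 39999},
		{"c", 70000, 99999},
	}
	for _, f := range overlapping(files, 20000, 35000) {
		fmt.Println(f.name, f.minTime, f.maxTime)
	}
}
```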
Data files are read only, like LSM SSTables
The Index: data files periodically get compacted (like LSM)
Data File (Min Time: 10000, Max Time: 29999), Data File (Min Time: 30000, Max Time: 39999), Data File (Min Time: 70000, Max Time: 99999) → Data File (Min Time: 10000, Max Time: 99999)
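At the level of a single series, compaction boils down to merging time-ordered runs into one. The sketch below merges two sorted runs in Go; the names are hypothetical, and letting the newer run win on a duplicate timestamp is an assumption made for the example, not something stated in the talk.

```go
package main

import "fmt"

// point is one (time, value) pair for a single series.
type point struct {
	t int64
	v float64
}

// mergeSorted merges two time-ordered runs into one time-ordered run.
// Assumption: on equal timestamps the point from the newer run is kept.
func mergeSorted(older, newer []point) []point {
	out := make([]point, 0, len(older)+len(newer))
	i, j := 0, 0
	for i < len(older) && j < len(newer) {
		switch {
		case older[i].t < newer[j].t:
			out = append(out, older[i])
			i++
		case older[i].t > newer[j].t:
			out = append(out, newer[j])
			j++
		default:
			out = append(out, newer[j])
			i++
			j++
		}
	}
	out = append(out, older[i:]...)
	out = append(out, newer[j:]...)
	return out
}

func main() {
	older := []point{{10000, 1}, {20000, 2}, {29999, 3}}
	newer := []point{{20000, 9}, {30000, 4}, {39999, 5}}
	fmt.Println(mergeSorted(older, newer))
}
```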
Compacting while appending new data
func (w *WriteLock) LockRange(min, max int64) {
	// sweet code here
}
func (w *WriteLock) UnlockRange(min, max int64) {
	// sweet code here
}
LockRange should block until we get it
Locking happens inside each Shard
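The slides leave the bodies as "sweet code here", so what follows is only one plausible way to fill them in, not the actual implementation: a mutex plus condition variable guarding a list of held time ranges. `LockRange` waits until no held range overlaps the requested one, so a compaction and an append on disjoint ranges can proceed at the same time. The `timeRange` type, the `held` slice, and the `overlaps` helper are assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

type timeRange struct{ min, max int64 }

// WriteLock is a sketch of a range lock over time; the fields are assumptions.
type WriteLock struct {
	mu   sync.Mutex
	cond *sync.Cond
	held []timeRange
}

func NewWriteLock() *WriteLock {
	w := &WriteLock{}
	w.cond = sync.NewCond(&w.mu)
	return w
}

// overlaps reports whether [min, max] intersects any held range.
// Callers must hold w.mu (or be the only goroutine using w).
func (w *WriteLock) overlaps(min, max int64) bool {
	for _, r := range w.held {
		if r.min <= max && r.max >= min {
			return true
		}
	}
	return false
}

// LockRange blocks until no held range overlaps [min, max], then records it.
func (w *WriteLock) LockRange(min, max int64) {
	w.mu.Lock()
	defer w.mu.Unlock()
	for w.overlaps(min, max) {
		w.cond.Wait()
	}
	w.held = append(w.held, timeRange{min, max})
}

// UnlockRange releases a range previously taken with the same bounds.
func (w *WriteLock) UnlockRange(min, max int64) {
	w.mu.Lock()
	defer w.mu.Unlock()
	for i, r := range w.held {
		if r.min == min && r.max == max {
			w.held = append(w.held[:i], w.held[i+1:]...)
			break
		}
	}
	w.cond.Broadcast() // wake waiters so they can re-check for overlap
}

func main() {
	w := NewWriteLock()
	w.LockRange(10000, 29999) // e.g. a compaction holds this range

	done := make(chan struct{})
	go func() {
		w.LockRange(20000, 39999) // overlaps, so it waits for the unlock below
		fmt.Println("overlapping writer acquired its range")
		w.UnlockRange(20000, 39999)
		close(done)
	}()

	w.LockRange(70000, 99999) // disjoint, so it does not wait
	fmt.Println("disjoint range acquired")
	w.UnlockRange(70000, 99999)
	w.UnlockRange(10000, 29999)
	<-done
}
```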
Back to the data files…
Data File (Min Time: 10000, Max Time: 29999)
Data File (Min Time: 30000, Max Time: 39999)
Data File (Min Time: 70000, Max Time: 99999)
Data File Layout: similar to SSTables
Blocks have up to 1,000 points by default
4 byte position means data files can be at most 4GB
Data Files
type dataFile struct {
	f    *os.File
	size uint32
	mmap []byte
}
Memory mapping lets the OS handle caching for you
Compressed Data Blocks
Timestamps: encoding based on precision and deltas
Timestamps (best case): run length encoding. Deltas are all the same for a block (only requires start time, delta, and count)
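The best case is easy to sketch: if every delta in a block is identical, the block reduces to three numbers. The Go below shows the idea only; `rleBlock`, `encodeRLE`, and `decode` are hypothetical names and the real on-disk format is not shown in the talk.

```go
package main

import "fmt"

// rleBlock describes a run of evenly spaced timestamps.
type rleBlock struct {
	start, delta int64
	count        int
}

// encodeRLE returns the run-length form and true if all deltas are equal.
// It returns false for an empty slice or for irregular spacing.
func encodeRLE(ts []int64) (rleBlock, bool) {
	if len(ts) == 0 {
		return rleBlock{}, false
	}
	if len(ts) == 1 {
		return rleBlock{start: ts[0], count: 1}, true
	}
	delta := ts[1] - ts[0]
	for i := 2; i < len(ts); i++ {
		if ts[i]-ts[i-1] != delta {
			return rleBlock{}, false
		}
	}
	return rleBlock{start: ts[0], delta: delta, count: len(ts)}, true
}

// decode expands the block back into the original timestamps.
func (b rleBlock) decode() []int64 {
	out := make([]int64, b.count)
	for i := range out {
		out[i] = b.start + int64(i)*b.delta
	}
	return out
}

func main() {
	ts := []int64{1443782126, 1443782136, 1443782146, 1443782156}
	if b, ok := encodeRLE(ts); ok {
		fmt.Println(b, b.decode())
	}
}
```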
Timestamps (good case): Simple8B, from Anh and Moffat, "Index compression using 64-bit words"
Timestamps (worst case): raw values, for nano-second timestamps with large deltas
float64: double delta, from Facebook’s Gorilla (google: gorilla time series facebook); https://github.com/dgryski/go-tsz
booleans are bits!
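Storing booleans as bits means eight values per byte. A small Go sketch of packing and unpacking, with hypothetical function names and an assumed bit order (least significant bit first):

```go
package main

import "fmt"

// packBools stores each bool as one bit, least significant bit first.
func packBools(vals []bool) []byte {
	out := make([]byte, (len(vals)+7)/8)
	for i, v := range vals {
		if v {
			out[i/8] |= 1 << uint(i%8)
		}
	}
	return out
}

// unpackBools reads back the first n bits written by packBools.
func unpackBools(data []byte, n int) []bool {
	out := make([]bool, n)
	for i := range out {
		out[i] = data[i/8]&(1<<uint(i%8)) != 0
	}
	return out
}

func main() {
	vals := []bool{true, false, true, true, false, false, false, false, true}
	packed := packBools(vals)
	fmt.Println(len(vals), "bools in", len(packed), "bytes:", unpackBools(packed, len(vals)))
}
```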
int64 uses zig-zag encoding, the same as Protobufs (adding double delta and RLE)
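Zig-zag encoding maps signed integers to unsigned ones so that values near zero, positive or negative, become small numbers that compress well. This is the standard formulation used by Protocol Buffers, sketched here in Go with illustrative function names:

```go
package main

import "fmt"

// zigzagEncode maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
func zigzagEncode(v int64) uint64 {
	return uint64(v<<1) ^ uint64(v>>63)
}

// zigzagDecode reverses zigzagEncode.
func zigzagDecode(u uint64) int64 {
	return int64(u>>1) ^ -int64(u&1)
}

func main() {
	for _, v := range []int64{0, -1, 1, -2, 2, 1443782126} {
		u := zigzagEncode(v)
		fmt.Println(v, "->", u, "->", zigzagDecode(u))
	}
}
```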
string uses Snappy, the same compression LevelDB uses (might add dictionary compression)
How does it perform?
Compression depends greatly on the shape of your data
Write throughput depends on batching, CPU, and memory
One test:
100,000 series
100,000 points per series
10,000,000,000 total points
5,000 points per request
c3.8xlarge, writes from 4 other systems
~390,000 points/sec
~3 bytes/point (random floats, could be better)
~400 IOPS
30%-50% CPU
There’s room for improvement!
Detailed writeup https://influxdb.com/docs/v0.9/concepts/storage_engine.html
Query Language Ideas
Three different kinds of functions
Aggregates: select mean(value) from cpu where host = 'A' and time > now() - 4h group by time(5m)
Transformations: select derivative(value) from cpu where host = 'A' and time > now() - 4h group by time(5m)
Selectors: select min(value) from cpu where host = 'A' and time > now() - 4h group by time(5m)
Then there are fills: select mean(value) from cpu where host = 'A' and time > now() - 4h group by time(5m) fill(0)
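To illustrate what an aggregate with group by time and a fill does, here is a small Go sketch that computes the mean per fixed-width window and substitutes a fill value for empty windows. It is an illustration of the semantics only: `meanByWindow` is a made-up function, and it aligns windows to the query start, which is an assumption and may differ from how InfluxDB aligns its buckets.

```go
package main

import "fmt"

// point is one sample; t is in seconds for this example.
type point struct {
	t int64
	v float64
}

// meanByWindow returns one mean per window in [start, end).
// Windows with no points get the fill value, like fill(0) in the query.
func meanByWindow(pts []point, start, end, window int64, fill float64) []float64 {
	if window <= 0 || end <= start {
		return nil
	}
	n := (end - start + window - 1) / window
	sums := make([]float64, n)
	counts := make([]int, n)
	for _, p := range pts {
		if p.t < start || p.t >= end {
			continue // outside the time filter
		}
		i := (p.t - start) / window
		sums[i] += p.v
		counts[i]++
	}
	out := make([]float64, n)
	for i := range out {
		if counts[i] == 0 {
			out[i] = fill
		} else {
			out[i] = sums[i] / float64(counts[i])
		}
	}
	return out
}

func main() {
	pts := []point{{0, 10}, {100, 20}, {700, 5}}
	// Three 5-minute (300 s) windows; the middle one is empty and gets the fill.
	fmt.Println(meanByWindow(pts, 0, 900, 300, 0))
}
```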
How to differentiate between the different types?
How do we chain functions together without making breaking changes to InfluxQL?
Mix jQuery style with InfluxQL:
SELECT mean(value).fill(previous).derivative(1s).scale(100).as('mvg_avg') FROM measurement WHERE time > now() - 4h GROUP BY time(1m)
D3 style:
SELECT mean(value)
  .fill(previous)
  .derivative(1s)
  .scale(100)
  .as('mvg_avg')
FROM measurement
WHERE time > now() - 4h
GROUP BY time(1m)
Moving the FROM?
SELECT
  from('cpu').mean(value)
  from('memory').mean(value)
WHERE time > now() - 4h
GROUP BY time(1m)
Consistent time and filtering applied to both
JOIN:
SELECT join(
  from('errors').count(value),
  from('requests').count(value)
).fill(0).count(value)
WHERE time > now() - 4h
GROUP BY time(1m)
Thank you! Paul Dix @pauldix paul@influxdb.com