InfluxDB IOx data lifecycle and object store persistence
Talk given on 1/13/2021 at the InfluxDB IOx community tech talks. It covers how IOx manages the data lifecycle for incoming and queried data, and goes into detail about how and when data is persisted to object storage.
• Writes fill up segment 1
• Close segment, attempt to persist in background
• New writes into segment 2, which fills up
• Close segment, attempt to persist in background
• New writes into segment 3, which fills up
• Close segment, if segment 1 persisted, clear it
• New writes (will create segment 4, or not)
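The segment sequence above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how IOx implements it: the names `Segment`, `SegmentedBuffer`, `mark_persisted` and the `max_segments=3` limit are hypothetical, and background persistence is simulated by calling `mark_persisted` by hand.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    id: int
    entries: list = field(default_factory=list)
    size: int = 0
    persisted: bool = False


class SegmentedBuffer:
    """Hypothetical write buffer that keeps a bounded number of closed segments."""

    def __init__(self, segment_bytes, max_segments=3):
        self.segment_bytes = segment_bytes
        self.max_segments = max_segments  # assumed limit on closed segments kept
        self.next_id = 1
        self.closed = []                  # closed segments, oldest first
        self.open = self._new_segment()

    def _new_segment(self):
        seg = Segment(id=self.next_id)
        self.next_id += 1
        return seg

    def mark_persisted(self, segment_id):
        # Stand-in for the background job reporting a successful persist.
        for seg in self.closed:
            if seg.id == segment_id:
                seg.persisted = True

    def _make_room(self):
        # Clear the oldest closed segment only if it has been persisted.
        while len(self.closed) >= self.max_segments and self.closed[0].persisted:
            self.closed.pop(0)
        if len(self.closed) < self.max_segments:
            self.open = self._new_segment()
        else:
            self.open = None              # no room: the next segment is not created

    def write(self, payload):
        if self.open is None:
            self._make_room()
            if self.open is None:
                return False              # write rejected
        if self.open.entries and self.open.size + len(payload) > self.segment_bytes:
            self.closed.append(self.open) # close the full segment
            self._make_room()
            if self.open is None:
                return False
        self.open.entries.append(payload)
        self.open.size += len(payload)
        return True
```

With 4-byte segments, three 4-byte writes fill segments 1 to 3. A fourth write is rejected until `mark_persisted(1)` is called, after which segment 1 is cleared and segment 4 is created.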
  – Default based on time (every 2h, for instance)
  – Rows can only exist in a single partition
• Each partition has chunks
  – Used to persist blocks of data within a partition
  – Closing a chunk and persisting it
    § Triggered on size
    § Triggered on time
    § Triggered by explicit API call
  – Chunks may have overlapping data
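As a rough sketch of the two ideas above, the following Python maps a row timestamp to a single time-based partition and checks the three chunk-close triggers. The function names, the key format, and the thresholds are assumptions made for illustration; the talk does not specify them.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def partition_key(ts, window=timedelta(hours=2)):
    """Map a timezone-aware timestamp to exactly one time-window partition."""
    buckets = (ts - EPOCH) // window          # whole windows since the epoch
    start = EPOCH + buckets * window          # start of the window containing ts
    return start.strftime("%Y-%m-%dT%H")      # hypothetical key format


def should_close_chunk(size_bytes, opened_at, now, max_bytes, max_age,
                       requested=False):
    """Close on explicit request, on size, or on age, whichever comes first."""
    return requested or size_bytes >= max_bytes or (now - opened_at) >= max_age
```

For example, a row stamped 2021-01-13 15:30 UTC falls into the partition starting at 14:00, and no other.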
table (measurement)
• Metadata file
• Tables
• Columns & types
    § Summary Stats (min, max, count)
• Writer Stats
    § For each writer, min and max sequence
    § For each writer, min and max segment (if applicable)
• Catalog file (for whole DB)
• All partitions & chunks
• New file each chunk persisted
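To make the metadata contents more concrete, here is a minimal Python sketch that derives per-column summary stats and per-writer sequence ranges from the rows of one chunk. The row layout (`writer`, `seq`, `fields`) and the output dictionary shape are hypothetical; the actual file format is not described in the slides.

```python
def summarize_chunk(rows):
    """Build column summary stats and writer stats for one chunk.

    Assumes each row looks like {"writer": int, "seq": int, "fields": {...}}
    and that all values of a given column share one comparable type.
    """
    columns = {}
    writers = {}
    for row in rows:
        # Track the min and max sequence number seen for each writer.
        w = writers.setdefault(
            row["writer"], {"min_seq": row["seq"], "max_seq": row["seq"]}
        )
        w["min_seq"] = min(w["min_seq"], row["seq"])
        w["max_seq"] = max(w["max_seq"], row["seq"])
        # Track type, min, max and count for each column.
        for name, value in row["fields"].items():
            c = columns.setdefault(
                name,
                {"type": type(value).__name__, "min": value, "max": value, "count": 0},
            )
            c["min"] = min(c["min"], value)
            c["max"] = max(c["max"], value)
            c["count"] += 1
    return {"columns": columns, "writers": writers}
```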
Applied at query time
• Apply to open chunks (cached delete results)
• Apply to read-buffer incrementally as chunks get read
• When next chunk persisted, mark in catalog
• Tombstones matched to each chunk they apply to
• Compact old chunks
• In background
• On different servers
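One way to picture tombstones applied at query time is a filter over scanned rows, plus a cheap range check that matches a tombstone to the chunks it can affect. The sketch below is an assumption-laden illustration: the `Tombstone` fields (a single tag equality and a half-open time range) and the function names are hypothetical, not the IOx delete predicate format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tombstone:
    table: str
    tag: str
    value: str
    start: int  # inclusive
    end: int    # exclusive

    def overlaps(self, chunk_min_time, chunk_max_time):
        # A tombstone only needs to be attached to chunks whose time range it touches.
        return self.start <= chunk_max_time and chunk_min_time < self.end

    def matches(self, table, row):
        return (
            table == self.table
            and row.get(self.tag) == self.value
            and self.start <= row["time"] < self.end
        )


def scan(table, rows, tombstones):
    """Return the rows of a chunk with tombstoned rows filtered out at read time."""
    return [r for r in rows if not any(t.matches(table, r) for t in tombstones)]
```

The stored data is left untouched by the scan; only the query result changes, which is what lets the physical removal wait for a later compaction.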
  – Should be cheap(ish) at query time
• You can do them in the background!
  – Even on a different server
  – Compact chunks into new ones and write catalog
• Drops are incredibly cheap
  – New structure means drops on measurements!
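Background compaction can be sketched as rewriting a set of chunks into one new chunk without the deleted rows, then publishing a new catalog. In this Python illustration the catalog is a plain dictionary and `compact`, `is_deleted` and `new_id` are hypothetical names; returning a fresh catalog instead of mutating the old one is a design assumption that mirrors writing a new catalog file.

```python
def compact(catalog, chunk_ids, is_deleted, new_id):
    """Merge the given chunks into one new chunk and return a new catalog.

    catalog: {"version": int, "chunks": {chunk_id: [row, ...]}}
    is_deleted: callable deciding whether a row is covered by a delete.
    """
    # Gather surviving rows from every chunk being compacted.
    rows = [
        row
        for cid in chunk_ids
        for row in catalog["chunks"][cid]
        if not is_deleted(row)
    ]
    rows.sort(key=lambda r: r["time"])

    # Keep untouched chunks, drop the compacted ones, add the new chunk.
    chunks = {cid: c for cid, c in catalog["chunks"].items() if cid not in chunk_ids}
    chunks[new_id] = rows
    return {"version": catalog["version"] + 1, "chunks": chunks}
```

Because the old catalog is never modified, readers can keep using it until the new one is published, and the work can run on a different server from the one serving queries.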
In process
  – Query (without optimizations)
• WAL Segment Persistence PR up
• Chunk Persistence within two weeks
• Recovery shortly after
• Arrow Flight RPC!
• Some optimization and numbers
• Builds in Feb (hopefully)