InfluxDB IOx data lifecycle and object store persistence

Paul Dix
January 13, 2021

Talk given on 1/13/2021 at the InfluxDB IOx community tech talks. It covers how IOx manages the data lifecycle for incoming and queried data, and details how and when data is persisted to object storage.

Transcript

  1. Paul Dix – CTO & co-founder
    @pauldix
    InfluxDB IOx data lifecycle
    and object store persistence

  2. Terms
    • Writer ID
    – Unique u32 server identifier
    • Mutable Buffer
    – in-memory writable & queryable DB
    • Read Buffer
    – in-memory read-only optimized DB
    • Object Store
    – storage backed by S3, Azure, Google, Local Disk, or Memory

  3. Terms
    • Partition
    – User defined area of non-overlapping data
    • Chunk
    – Block of data in a partition, potentially overlapping other chunks
    • WAL Segment
    – Collection of writes, deletes, schema modifications

  4. The Write, Replication, & Subscription Lifecycle

  5. Input to Flatbuffers
    Writer ID
    Sequence Number

  6. The WAL Buffer

  7. WAL Buffer
    • Max Buffer Size
    • Segment Size
    • Persist?
    – On rollover and/or open time
    • Max Behavior
    – Reject write
    – Drop new write
    – Drop un-persisted Segment
    • Segments have monotonically increasing ID, writer ID
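
The settings on this slide could be captured in a small configuration object. This is only a sketch: the names `WalBufferConfig`, `MaxBehavior`, and every field name are hypothetical and not taken from IOx, and the 300MB/100MB values are borrowed from the example on the following slide.

```python
from dataclasses import dataclass
from enum import Enum


class MaxBehavior(Enum):
    """What to do when the buffer is at max size (illustrative names)."""
    REJECT_WRITE = "reject_write"
    DROP_NEW_WRITE = "drop_new_write"
    DROP_UNPERSISTED_SEGMENT = "drop_unpersisted_segment"


@dataclass(frozen=True)
class WalBufferConfig:
    max_buffer_bytes: int      # "Max Buffer Size"
    segment_bytes: int         # "Segment Size"
    persist_segments: bool     # "Persist?"
    max_behavior: MaxBehavior  # "Max Behavior"

    def max_segments(self) -> int:
        # Number of full segments that fit in the buffer at once.
        return self.max_buffer_bytes // self.segment_bytes


MB = 1024 * 1024
config = WalBufferConfig(
    max_buffer_bytes=300 * MB,
    segment_bytes=100 * MB,
    persist_segments=True,
    max_behavior=MaxBehavior.REJECT_WRITE,
)
```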

  8. WAL Buffer Example (success)
    • 300MB max, 100MB Segments
    • Writes fill up segment 1
    • Close segment, attempt to persist in background
    • New writes into segment 2, which fills up
    • Close segment, attempt to persist in background
    • New writes into segment 3, which fills up
    • Close segment, if segment 1 persisted, clear it
    • New writes (will create segment 4, or not)
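
The walk-through above can be simulated in a few lines. This is a minimal sketch under stated assumptions, not the IOx implementation: sizes are whole megabytes, a write never spans two segments, background persistence is modeled by calling a hypothetical `mark_persisted`, persisted segments are cleared lazily on the next write rather than at close time, and a full buffer rejects the write.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    id: int
    size_mb: int = 0
    closed: bool = False
    persisted: bool = False


@dataclass
class WalBuffer:
    max_mb: int
    segment_mb: int
    segments: List[Segment] = field(default_factory=list)
    next_id: int = 1

    def used_mb(self) -> int:
        return sum(s.size_mb for s in self.segments)

    def mark_persisted(self, segment_id: int) -> None:
        # Stand-in for a background persist job finishing.
        for s in self.segments:
            if s.id == segment_id and s.closed:
                s.persisted = True

    def write(self, size_mb: int) -> bool:
        # Clear the oldest segments, but only if they are already persisted.
        while (self.used_mb() + size_mb > self.max_mb
               and self.segments and self.segments[0].persisted):
            self.segments.pop(0)
        if self.used_mb() + size_mb > self.max_mb:
            return False  # assumed max behavior: reject the write
        if not self.segments or self.segments[-1].closed:
            self.segments.append(Segment(id=self.next_id))
            self.next_id += 1
        current = self.segments[-1]
        current.size_mb += size_mb
        if current.size_mb >= self.segment_mb:
            current.closed = True  # rollover: the segment is now persistable
        return True


buf = WalBuffer(max_mb=300, segment_mb=100)
for _ in range(3):
    buf.write(100)             # fills segments 1, 2, and 3
buf.mark_persisted(1)          # background persist of segment 1 succeeded
accepted = buf.write(100)      # segment 1 is cleared, segment 4 is created
```

If `mark_persisted(1)` is never called, the fourth write returns `False`, which corresponds to the "or not" case on the slide.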

  9. WAL Buffer Properties
    • Persistence optional
    – Replication for durability
    – Local disk (later and if persisting)
    • Steady-state takes max buffer memory
    • Segments captured on schedule or space
    • Serverless replication via segments in object store

  10. WAL Object Store Location
    • //wal///.segment
    • 1/mydb/wal/000/000/001.segment
    • Up to 999,999,998 segments
    – Maybe increase this by adding more zero padding at the top level
    • 1 list operation to get DB dirs
    • 3 list operations to get latest segment
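
The example path suggests a nine-digit, zero-padded segment ID split into three directory levels. The sketch below is inferred only from that example: the assumption that the first path component is the writer ID, and the helper names `wal_segment_path`, `list_children`, and `latest_segment_path`, are illustrative. The object store is faked with a plain list of keys.

```python
def wal_segment_path(writer_id: int, db_name: str, segment_id: int) -> str:
    """Build a path shaped like 1/mydb/wal/000/000/001.segment."""
    if not 0 <= segment_id < 10**9:
        raise ValueError("segment ID must fit in nine digits")
    digits = f"{segment_id:09d}"
    return (f"{writer_id}/{db_name}/wal/"
            f"{digits[0:3]}/{digits[3:6]}/{digits[6:9]}.segment")


def list_children(keys, prefix):
    """Emulate a delimiter-style list call: immediate names under prefix."""
    return sorted({k[len(prefix):].split("/", 1)[0]
                   for k in keys if k.startswith(prefix)})


def latest_segment_path(keys, writer_id: int, db_name: str):
    """Find the newest segment with one list call per directory level."""
    prefix = f"{writer_id}/{db_name}/wal/"
    for _ in range(3):
        children = list_children(keys, prefix)
        if not children:
            return None
        last = children[-1]  # zero padding makes string order match ID order
        prefix += last if last.endswith(".segment") else last + "/"
    return prefix


keys = [wal_segment_path(1, "mydb", i) for i in (1, 2, 1000)]
latest = latest_segment_path(keys, 1, "mydb")
```

The zero padding is what lets a lexicographic listing double as a numeric ordering, so the newest segment is always the last entry at each level.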

  11. What Segments to Read on Restart?

  12. The Write, Replication, & Subscription Lifecycle

  13. Mutable Buffer to Chunk Persistence

  14. Mutable Buffer Structure
    • Data is partitioned
    – User configurable
    – Default based on time (every 2h, for instance)
    – Rows can only exist in a single partition
    • Each partition has chunks
    – Used to persist blocks of data within a partition
    – Closing a chunk and persisting it
    § Triggered on size
    § Triggered on time
    § Triggered by explicit API call
    – Chunks may have overlapping data
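
As an illustration of the default time-based partitioning and the three chunk-close triggers, here is a small sketch. The key format, the function names, and the threshold parameters are assumptions made for the example; the slide only states that partitioning defaults to time (every 2h, for instance) and that a chunk closes on size, time, or an explicit API call.

```python
from datetime import datetime, timedelta, timezone


def partition_key(ts: datetime, hours: int = 2) -> str:
    """Map a timestamp to the start of its time window (hours must divide 24)."""
    ts = ts.astimezone(timezone.utc)
    window_start = ts.replace(hour=ts.hour - ts.hour % hours,
                              minute=0, second=0, microsecond=0)
    return window_start.strftime("%Y-%m-%dT%H")


def should_close_chunk(size_bytes: int, age: timedelta, explicit_request: bool,
                       max_bytes: int, max_age: timedelta) -> bool:
    """A chunk closes on size, on time, or on an explicit API call."""
    return explicit_request or size_bytes >= max_bytes or age >= max_age


key = partition_key(datetime(2021, 1, 13, 15, 42, tzinfo=timezone.utc))
```

Because each row maps to exactly one key, a row can only exist in a single partition, while several chunks inside that partition may still cover overlapping data.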

  15. Chunk Persistence to Object Store
    • Parquet files, 1 per table (measurement)
    • Metadata file
    – Tables
    – Columns & types
    § Summary Stats (min, max, count)
    – Writer Stats
    § For each writer, min and max sequence
    § For each writer, min and max segment (if applicable)
    • Catalog file (for whole DB)
    – All partitions & chunks
    – New file each chunk persisted
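
To make the metadata contents concrete, the sketch below computes per-column summary stats and per-writer sequence ranges for one chunk. The dictionary layout, the field names, and the input shapes are assumptions for illustration; the talk does not specify the metadata file format.

```python
def column_stats(values):
    """Summary stats for one column: type, min, max, and non-null count."""
    present = [v for v in values if v is not None]
    if not present:
        return {"type": None, "min": None, "max": None, "count": 0}
    return {"type": type(present[0]).__name__,
            "min": min(present), "max": max(present), "count": len(present)}


def chunk_metadata(tables, writes):
    """tables: {table: {column: [values]}}; writes: [(writer_id, sequence)]."""
    writer_ranges = {}
    for writer_id, sequence in writes:
        lo, hi = writer_ranges.get(writer_id, (sequence, sequence))
        writer_ranges[writer_id] = (min(lo, sequence), max(hi, sequence))
    return {
        "tables": {table: {col: column_stats(vals) for col, vals in cols.items()}
                   for table, cols in tables.items()},
        "writers": {str(w): {"min_sequence": lo, "max_sequence": hi}
                    for w, (lo, hi) in writer_ranges.items()},
    }


meta = chunk_metadata(
    tables={"cpu": {"usage": [0.5, 0.9, None]}},
    writes=[(1, 7), (1, 9), (2, 3)],
)
```

Stats like these let a reader skip a chunk without opening its Parquet files, and the writer ranges record which WAL entries the chunk already covers.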

  16. Chunk Properties
    • Immutable once persisted
    • Can compact chunks
    • Bulk import new chunks
    • Read replicas
    • Load only recent chunks

  17. Chunk Object Store Location
    • //chunks///…
    • 1/mydb/chunks/2020-01-13/1/cpu.parquet

  18. Catalog Object Store Location
    • //catalog/<0 padded id>.checkpoint
    • 1/mydb/catalog/000000000001.checkpoint
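
A checkpoint path like the example can be built with a fixed-width format, and zero padding means the newest checkpoint is simply the lexicographically largest key. This sketch is inferred from the example path alone: the 12-digit width matches the example, while the assumption that the leading `1` is the writer ID and both function names are illustrative.

```python
def catalog_checkpoint_path(writer_id: int, db_name: str, checkpoint_id: int) -> str:
    """Build a path shaped like 1/mydb/catalog/000000000001.checkpoint."""
    return f"{writer_id}/{db_name}/catalog/{checkpoint_id:012d}.checkpoint"


def latest_checkpoint(keys):
    """Pick the newest checkpoint; string order equals ID order when padded."""
    checkpoints = [k for k in keys if k.endswith(".checkpoint")]
    return max(checkpoints) if checkpoints else None


keys = [catalog_checkpoint_path(1, "mydb", i) for i in (1, 2, 10)]
newest = latest_checkpoint(keys)
```

Without the padding, `10.checkpoint` would sort before `2.checkpoint`, and a plain listing could no longer be used to find the most recent catalog.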

  19. Catalog Data
    • Partitions
    • Chunks
    – Metadata (when persisted, when last queried)
    – Schema
    – Summary Stats
    – Writer Stats
    • Writer Stats on Open Chunks
    – Min Segment ID with Data

  20. Catalog Properties
    • Schema information instantly (validation!)
    • Cheap schema renames
    • Point-in-time recovery

  21. Restart/Recovery (single database)
    • Get checkpoint from catalog
    • Determine oldest WAL Segment to start from
    • Read WAL into Mutable Buffer
    – Only write into buffer for writes not persisted
    • Fetch recently queried chunks
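
The replay step can be sketched as a filter over WAL entries. This simplified model assumes the checkpoint exposes a starting segment ID and, per writer, the highest sequence number already persisted in chunks, and that everything at or below that number is safe to skip. The field names and data shapes are hypothetical.

```python
def recover(checkpoint, wal_segments):
    """Rebuild the mutable buffer from WAL entries that were never persisted.

    checkpoint: {"start_segment": int,
                 "persisted_max_sequence": {writer_id: sequence}}
    wal_segments: {segment_id: [(writer_id, sequence, payload), ...]}
    """
    mutable_buffer = []
    persisted = checkpoint["persisted_max_sequence"]
    for segment_id in sorted(wal_segments):
        if segment_id < checkpoint["start_segment"]:
            continue  # older than the oldest segment holding unpersisted data
        for writer_id, sequence, payload in wal_segments[segment_id]:
            max_persisted = persisted.get(writer_id)
            if max_persisted is not None and sequence <= max_persisted:
                continue  # already covered by a persisted chunk
            mutable_buffer.append(payload)
    return mutable_buffer


checkpoint = {"start_segment": 2, "persisted_max_sequence": {1: 5}}
wal_segments = {
    1: [(1, 1, "a")],
    2: [(1, 5, "b"), (1, 6, "c")],
    3: [(1, 7, "d"), (2, 1, "e")],
}
recovered = recover(checkpoint, wal_segments)
```

Here segment 1 is skipped entirely, write `b` is skipped because writer 1 already persisted through sequence 5, and writer 2 has no persisted data so its write is replayed.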

  22. Handling Deletes
    • Tombstones in the WAL (and in-memory)
    • Applied at query time
    • Apply to open chunks (cached delete results)
    • Apply to read-buffer incrementally as chunks get read
    • When next chunk persisted, mark in catalog
    • Tombstones matched to each chunk they apply to
    • Compact old chunks
    – In background
    – On different servers
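
A tombstone can be modeled as a table name plus a row predicate that is applied when rows are read, with compaction later rewriting chunks so the tombstone is no longer needed. This is a sketch of the idea only: the `Tombstone` type, the predicate representation, and the row format are assumptions, not the IOx design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class Tombstone:
    table: str
    matches: Callable[[Row], bool]  # True for rows the delete applies to


def query(table: str, rows: List[Row], tombstones: List[Tombstone]) -> List[Row]:
    """Apply deletes at query time by filtering out tombstoned rows."""
    active = [t for t in tombstones if t.table == table]
    return [r for r in rows if not any(t.matches(r) for t in active)]


def compact(table: str, chunks: List[List[Row]],
            tombstones: List[Tombstone]) -> List[Row]:
    """Rewrite several chunks into one, physically dropping deleted rows."""
    merged = [row for chunk in chunks for row in chunk]
    return query(table, merged, tombstones)


rows = [{"host": "a", "usage": 1}, {"host": "b", "usage": 2}]
tombstones = [Tombstone("cpu", lambda r: r["host"] == "a")]
visible = query("cpu", rows, tombstones)
compacted = compact("cpu", [rows[:1], rows[1:]], tombstones)
```

Since persisted chunks are immutable, the filter is the only option until compaction runs, and because compaction only reads old chunks and writes new ones it can run in the background or on a different server.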

  23. Delete Properties
    • They’re expensive!
    – Expensive to rewrite data
    – Should be cheap(ish) at query time
    • You can do them in the background!
    – Even on a different server
    – Compact chunks to new and write catalog
    • Drops are incredibly cheap
    – New structure means drops on measurements!

  24. Project Update
    • Mutable Buffer + Read Buffer Lifecycle
    – In process
    – Query (without optimizations)
    • WAL Segment Persistence PR up
    • Chunk Persistence within two weeks
    • Recovery shortly after
    • Arrow Flight RPC!
    • Some optimization and numbers
    • Builds in Feb (hopefully)

  25. Thank You
