Kick-off Meetup: Alex P, "Cassandra internals, or how we ripped off Memtables"

Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

What’s inside?

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Data Types
ByteBuffer-backed
Easy to serialise
Easy to compose
AbstractType<T> implements Comparator<ByteBuffer> {
  public abstract T compose(ByteBuffer bytes);
  public abstract ByteBuffer decompose(T value);
}
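A minimal sketch of the compose / decompose idea from this slide. The class names `SketchType` and `IntSketchType` are hypothetical stand-ins, not Cassandra classes, and the 4-byte big-endian encoding is an assumption made for illustration.

```java
import java.nio.ByteBuffer;
import java.util.Comparator;

// Hypothetical, simplified stand-in for the AbstractType shown on the slide.
abstract class SketchType<T> implements Comparator<ByteBuffer> {
    public abstract T compose(ByteBuffer bytes);
    public abstract ByteBuffer decompose(T value);
}

// Hypothetical 32-bit integer type, stored as 4 big-endian bytes (assumption).
final class IntSketchType extends SketchType<Integer> {
    @Override
    public Integer compose(ByteBuffer bytes) {
        // Absolute read: leaves the buffer's position untouched.
        return bytes.getInt(bytes.position());
    }

    @Override
    public ByteBuffer decompose(Integer value) {
        ByteBuffer buf = ByteBuffer.allocate(4);
        buf.putInt(0, value); // absolute write, position stays at 0
        return buf;
    }

    @Override
    public int compare(ByteBuffer a, ByteBuffer b) {
        // Ordering is defined on the decoded values, not on the raw bytes.
        return Integer.compare(compose(a), compose(b));
    }

    public static void main(String[] args) {
        IntSketchType type = new IntSketchType();
        ByteBuffer raw = type.decompose(42);
        System.out.println(type.compose(raw));
    }
}
```

Because every type exposes the same `Comparator<ByteBuffer>` contract, a sorted container can hold raw buffers and stay unaware of the concrete value type.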

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

Commit log

Slide 11

Slide 11 text

Commit Log
Quite straightforward
Appends writes & mutations
Flushes writes based on strategy (periodic, size, etc.)
Used as temporary storage for HA & performance

Slide 12

Slide 12 text

Memtable
The commit log is:
  replayed (into the Memtable) if the node crashed
  never accessed directly for reads
  always accessed for writes
  maintained until the Memtable is flushed to an SSTable
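The append / replay / discard life cycle described on the last two slides can be sketched as follows. `CommitLogSketch` and its record format (two `writeUTF` strings per mutation) are assumptions for illustration, not Cassandra's on-disk format. The sketch also reopens the file on every append, which a real log would avoid.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.TreeMap;

final class CommitLogSketch {
    private final Path file;

    CommitLogSketch(Path file) { this.file = file; }

    // Append one key/value mutation to the end of the log (sequential write).
    void append(String key, String value) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(
                file, StandardOpenOption.CREATE, StandardOpenOption.APPEND))) {
            out.writeUTF(key);
            out.writeUTF(value);
        }
    }

    // Rebuild the in-memory table by replaying every logged mutation in order.
    TreeMap<String, String> replay() throws IOException {
        TreeMap<String, String> memtable = new TreeMap<>();
        if (!Files.exists(file)) return memtable;
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            while (true) {
                String key = in.readUTF();
                memtable.put(key, in.readUTF());
            }
        } catch (EOFException endOfLog) {
            // Reached the end of the log: replay is complete.
        }
        return memtable;
    }

    // Once the memtable is flushed to an SSTable, the log is no longer needed.
    void discard() throws IOException { Files.deleteIfExists(file); }

    // Small end-to-end run: write, "crash", replay, clean up.
    static TreeMap<String, String> demo() {
        try {
            CommitLogSketch log = new CommitLogSketch(
                    Files.createTempFile("commitlog-sketch", ".log"));
            log.append("k1", "v1");
            log.append("k2", "v1");
            log.append("k1", "v2"); // the later mutation wins on replay
            TreeMap<String, String> recovered = log.replay();
            log.discard();
            return recovered;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that reads never touch the log here either: it is only written on mutation and read during recovery.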

Slide 13

Slide 13 text

Memtable

Slide 14

Slide 14 text

Memtable

Slide 15

Slide 15 text

Memtable
Not a row cache
Intermediate location for data until flushed to SSTable
Uses SkipListMap for CFname / CF lookup
CF extends AbstractColumnContainer

Slide 16

Slide 16 text

Memtable
Container holds SortedColumns (atomic / thread-safe / array / tree-map backed, etc.)
Atomic is implemented with SnapTreeMap:
  navigable map (range queries)
  atomic, consistent iteration
  very fast snapshots / clones

Slide 17

Slide 17 text

Memtable
SnapTreeMap relies on Comparator internally:
  simple comparison due to binary data type uniformity
  simple range queries (head/tail/from/to/including/excluding)
  100% complies with Cassandra interface / guarantees
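A rough sketch of a comparator-driven, navigable in-memory table over byte buffers. It does not use SnapTreeMap: the JDK's `ConcurrentSkipListMap` is swapped in as a stand-in because it ships with Java and offers the same head/tail/sub-map style of range query. `MemtableSketch`, the UTF-8 column names, and the unsigned byte ordering are assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

final class MemtableSketch {
    // One comparator over raw bytes serves every key (unsigned, lexicographic).
    static final Comparator<ByteBuffer> UNSIGNED_BYTES = (a, b) -> {
        int n = Math.min(a.remaining(), b.remaining());
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a.get(a.position() + i) & 0xff,
                                      b.get(b.position() + i) & 0xff);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(a.remaining(), b.remaining());
    };

    private final ConcurrentSkipListMap<ByteBuffer, ByteBuffer> columns =
            new ConcurrentSkipListMap<>(UNSIGNED_BYTES);

    static ByteBuffer bytes(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    static String string(ByteBuffer b) {
        // duplicate() so decoding does not disturb the stored buffer's position.
        return StandardCharsets.UTF_8.decode(b.duplicate()).toString();
    }

    void put(String name, String value) {
        columns.put(bytes(name), bytes(value));
    }

    // Column names in [from, to], both bounds inclusive.
    List<String> slice(String from, String to) {
        List<String> names = new ArrayList<>();
        for (ByteBuffer name
                : columns.subMap(bytes(from), true, bytes(to), true).keySet()) {
            names.add(string(name));
        }
        return names;
    }

    // Full copy of the map; SnapTreeMap would do this lazily, copy-on-write.
    ConcurrentSkipListMap<ByteBuffer, ByteBuffer> snapshot() {
        return columns.clone();
    }

    public static void main(String[] args) {
        MemtableSketch memtable = new MemtableSketch();
        for (String name : new String[] {"d", "b", "a", "c"}) {
            memtable.put(name, "value-" + name);
        }
        System.out.println(memtable.slice("b", "c"));
    }
}
```

The stand-in differs from SnapTreeMap in one point the slides stress: `clone()` here copies every entry, so it does not give the cheap snapshots the talk attributes to SnapTreeMap.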

Slide 18

Slide 18 text

SSTable

Slide 19

Slide 19 text

SSTable
Sorted String Table
Stored on disk
Uses a Bloom filter to reduce disk lookups
Writes sequentially (performant)
Uses compaction to reduce overhead (configurable)
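To illustrate how a Bloom filter lets a read skip an SSTable that cannot contain a key, here is a small sketch. `BloomFilterSketch`, the two hash functions, and the sizes are assumptions chosen for brevity; they are not what Cassandra uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

final class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomFilterSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // 32-bit FNV-1a, used here as the second base hash.
    private static int fnv1a(byte[] data) {
        int hash = 0x811c9dc5;
        for (byte b : data) {
            hash ^= (b & 0xff);
            hash *= 16777619;
        }
        return hash;
    }

    // Derive the i-th bit index from two base hashes (double hashing).
    private int index(byte[] data, int i) {
        int h1 = Arrays.hashCode(data);
        int h2 = fnv1a(data);
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) bits.set(index(data, i));
    }

    // false: the key was definitely never added, so the disk read can be skipped.
    // true: the key may be present (false positives are possible).
    boolean mightContain(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(data, i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        filter.add("row-1");
        System.out.println(filter.mightContain("row-1"));
    }
}
```

The filter never reports a key as absent after it was added, which is the property that makes it safe to consult before going to disk.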

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

SSTable
Immutable
Indexed (data location by key held in memory)
>= 1 per CF

Slide 22

Slide 22 text

SSTable Compaction
Merges SSTables
Discards tombstones, reclaims space
Refreshes location index
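The merge step above can be sketched as follows. This is a simplification under stated assumptions: tables are passed oldest first and the newest table wins (a real system resolves conflicts by write timestamp), an empty `Optional` stands in for a tombstone, and every table takes part in the merge so tombstones can be dropped immediately. `CompactionSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.SortedMap;
import java.util.TreeMap;

final class CompactionSketch {
    // Merge tables (oldest first) into one sorted table without tombstones.
    static TreeMap<String, String> compact(
            List<? extends SortedMap<String, Optional<String>>> sstables) {
        TreeMap<String, Optional<String>> merged = new TreeMap<>();
        for (SortedMap<String, Optional<String>> table : sstables) {
            merged.putAll(table); // entries from newer tables overwrite older ones
        }
        TreeMap<String, String> result = new TreeMap<>();
        // Keep live values only; a tombstone removes the key and reclaims its space.
        merged.forEach((key, value) -> value.ifPresent(v -> result.put(key, v)));
        return result;
    }

    static TreeMap<String, String> demo() {
        SortedMap<String, Optional<String>> older = new TreeMap<>();
        older.put("a", Optional.of("1"));
        older.put("b", Optional.of("2"));

        SortedMap<String, Optional<String>> newer = new TreeMap<>();
        newer.put("b", Optional.empty()); // tombstone: "b" was deleted later
        newer.put("c", Optional.of("3"));

        return compact(Arrays.asList(older, newer));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints {a=1, c=3}
    }
}
```

Since the inputs stay immutable and the output is a fresh sorted table, the key-to-location index can simply be rebuilt for the new table once the merge finishes.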

Slide 23

Slide 23 text

Lessons?
Making an embedded in-memory store:
  basic data types are expressible in terms of byte buffers
  memory allocation / access patterns need to be addressed
  maintain a Commit Log for durability
  in concurrent environments, use concurrent data structures
  use data structures that give the maximum available features: a sorted index per column would allow searches on an arbitrary column
  levelling / compaction simplifies sequencing data
  use approximate data structures (approximate histograms, Bloom filters, sketches) to reduce overhead and get some big-O guarantees

Slide 24

Slide 24 text

@ifesdjeen