Kick-off Meetup: Alex P " Cassandra internals, or how we ripped off Memtables"

Munich NoSQL

April 24, 2014

Transcript

  1. Data Types

     ByteBuffer-backed
     Easy to serialise
     Easy to compose

     AbstractType<T> implements Comparator<ByteBuffer> {
         public abstract T compose(ByteBuffer bytes);
         public abstract ByteBuffer decompose(T value);
     }
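A minimal sketch of the compose/decompose pattern the slide shows, using a hypothetical `Int32Type` (the real Cassandra type hierarchy is richer, but the shape is the same: values round-trip through `ByteBuffer`, and the type doubles as a comparator over the serialised form):

```java
import java.nio.ByteBuffer;
import java.util.Comparator;

// Hypothetical minimal analogue of an AbstractType<T> subclass:
// a 32-bit integer type that composes/decomposes via ByteBuffer.
class Int32Type implements Comparator<ByteBuffer> {

    // Deserialise: read the value without disturbing the caller's buffer position.
    public Integer compose(ByteBuffer bytes) {
        return bytes.duplicate().getInt();
    }

    // Serialise: pack the value into a fresh 4-byte buffer, ready for reading.
    public ByteBuffer decompose(Integer value) {
        return (ByteBuffer) ByteBuffer.allocate(4).putInt(value).flip();
    }

    // Compare directly on the serialised representation.
    @Override
    public int compare(ByteBuffer a, ByteBuffer b) {
        return Integer.compare(a.duplicate().getInt(), b.duplicate().getInt());
    }
}
```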
  2. Commit Log

     Quite straightforward: appends writes & mutations
     Flushes writes based on a strategy (periodic, size, etc.)
     Used as temporary storage for HA & performance
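The append-then-flush behaviour could be sketched like this. All names here are hypothetical, and a "size" flush strategy is assumed for illustration; Cassandra's real commit log is segment-based and considerably more involved:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical append-only commit log: mutations are appended sequentially
// as length-prefixed records, and a flush strategy decides when to fsync.
class CommitLog implements AutoCloseable {
    private final FileChannel channel;
    private final long flushThreshold; // "size" strategy: fsync every N bytes
    private long unflushedBytes = 0;

    CommitLog(Path path, long flushThreshold) throws IOException {
        this.channel = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
        this.flushThreshold = flushThreshold;
    }

    synchronized void append(byte[] mutation) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(4 + mutation.length);
        buf.putInt(mutation.length).put(mutation).flip();
        while (buf.hasRemaining()) channel.write(buf); // sequential append
        unflushedBytes += 4 + mutation.length;
        if (unflushedBytes >= flushThreshold) {
            channel.force(false); // durable up to this point
            unflushedBytes = 0;
        }
    }

    @Override
    public void close() throws IOException {
        channel.force(false);
        channel.close();
    }
}
```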
  3. Memtable

     The commit log is replayed (into the Memtable) if the node crashed
     Never accessed directly for reads
     Always accessed for writes
     Maintained until the Memtable is flushed to an SSTable
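Crash-recovery replay can be sketched as reading the log's records back in order so each mutation can be re-applied to a fresh memtable. This assumes a simple, hypothetical length-prefixed record format; the real on-disk layout differs:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical replayer for a commit log of length-prefixed records.
class CommitLogReplayer {
    static List<byte[]> replay(Path logPath) throws IOException {
        List<byte[]> mutations = new ArrayList<>();
        try (FileChannel ch = FileChannel.open(logPath, StandardOpenOption.READ)) {
            ByteBuffer all = ByteBuffer.allocate((int) ch.size());
            while (all.hasRemaining() && ch.read(all) != -1) { /* fill */ }
            all.flip();
            while (all.remaining() >= 4) {
                int len = all.getInt();
                if (all.remaining() < len) break; // torn tail write: stop replay
                byte[] mutation = new byte[len];
                all.get(mutation);
                mutations.add(mutation); // caller re-applies these to the Memtable
            }
        }
        return mutations;
    }
}
```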
  4. Memtable

     Not a row cache: an intermediate location for data until flushed to an SSTable
     Uses a SkipListMap for CF name → CF lookup
     CF extends AbstractColumnContainer
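The skip-list lookup can be sketched with the JDK's `ConcurrentSkipListMap`. The class and method names below are hypothetical, and the column family is reduced to a sorted map of column name to value for brevity:

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentSkipListMap;

// Sketch of a memtable: a concurrent skip-list map from row key to a
// column container (here just a nested sorted map of column -> value).
class Memtable {
    private final ConcurrentSkipListMap<String, ConcurrentSkipListMap<String, ByteBuffer>>
            rows = new ConcurrentSkipListMap<>();

    // Writes always go through the memtable.
    void put(String rowKey, String column, ByteBuffer value) {
        rows.computeIfAbsent(rowKey, k -> new ConcurrentSkipListMap<>())
            .put(column, value);
    }

    ByteBuffer get(String rowKey, String column) {
        ConcurrentSkipListMap<String, ByteBuffer> cf = rows.get(rowKey);
        return cf == null ? null : cf.get(column);
    }
}
```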
  5. Memtable

     The container holds SortedColumns (atomic / thread-safe / array- or tree-map-backed, etc.)
     The atomic variant is implemented with SnapTreeMap:
       navigable map (range queries)
       atomic, consistent iteration
       very fast snapshots / clones
  6. Memtable

     SnapTreeMap relies on a Comparator internally
     Simple comparisons, thanks to binary data-type uniformity
     Simple range queries (head/tail/from/to, including/excluding bounds)
     Complies 100% with Cassandra's interface / guarantees
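SnapTreeMap itself is a third-party structure, but the navigable-map range operations the slide lists are the standard `NavigableMap` contract, shown here on the JDK's `ConcurrentSkipListMap` as a stand-in:

```java
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class RangeQueryDemo {
    public static void main(String[] args) {
        NavigableMap<String, Integer> columns = new ConcurrentSkipListMap<>();
        columns.put("a", 1);
        columns.put("b", 2);
        columns.put("c", 3);

        // head: everything up to "b", inclusive
        System.out.println(columns.headMap("b", true));        // {a=1, b=2}
        // tail: everything strictly after "b"
        System.out.println(columns.tailMap("b", false));       // {c=3}
        // sub-range with mixed inclusive/exclusive bounds
        System.out.println(columns.subMap("a", false, "c", true)); // {b=2, c=3}
    }
}
```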
  7. SSTable

     Sorted String Table, stored on disk
     Uses a Bloom filter to reduce disk lookups
     Writes sequentially (performant)
     Uses compaction to reduce overhead (configurable)
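A toy Bloom filter illustrates why it cuts disk lookups: "might contain" can false-positive but never false-negative, so a negative answer lets the read path skip the SSTable entirely. This is a sketch with hypothetical names, not Cassandra's implementation:

```java
import java.util.BitSet;

// Minimal Bloom filter: k hash probes into a bit array.
class BloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    BloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Derive k probe positions from two hash values (double hashing).
    private int probe(String key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        for (int i = 0; i < hashes; i++) bits.set(probe(key, i));
    }

    // False positives possible; false negatives impossible.
    boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++)
            if (!bits.get(probe(key, i))) return false;
        return true;
    }
}
```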
  8. Lessons? Making an embedded in-memory store:

     Basic data types are expressible in terms of byte buffers
     Memory allocation / access patterns must be addressed
     Maintain a commit log for durability
     In concurrent environments, use concurrent data structures
     Use data structures that give the maximum available features: a sorted index per column would allow searches on an arbitrary column
     Levelling / compaction simplifies sequencing data
     Use approximate data structures (approximate histograms, Bloom filters, sketches) to reduce overhead and get some big-O guarantees