Kick-off Meetup: Alex P " Cassandra internals, or how we ripped off Memtables"

Munich NoSQL

April 24, 2014

Transcript

  1. Data Types

     ByteBuffer-backed
     Easy to serialise
     Easy to compose

     AbstractType<T> implements Comparator<ByteBuffer> {
         public abstract T compose(ByteBuffer bytes);
         public abstract ByteBuffer decompose(T value);
     }
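A minimal sketch of the compose/decompose pattern the slide shows, using a hypothetical `Int32Type` (the real Cassandra type hierarchy is richer, but the shape is the same: values round-trip through `ByteBuffer`, and the type doubles as a comparator over the serialised form):

```java
import java.nio.ByteBuffer;
import java.util.Comparator;

// Hypothetical minimal analogue of an AbstractType<T> subclass:
// a 32-bit integer type that composes/decomposes via ByteBuffer.
class Int32Type implements Comparator<ByteBuffer> {

    // Deserialise: read the value without disturbing the caller's buffer position.
    public Integer compose(ByteBuffer bytes) {
        return bytes.duplicate().getInt();
    }

    // Serialise: pack the value into a fresh 4-byte buffer, ready for reading.
    public ByteBuffer decompose(Integer value) {
        return (ByteBuffer) ByteBuffer.allocate(4).putInt(value).flip();
    }

    // Compare directly on the serialised representation.
    @Override
    public int compare(ByteBuffer a, ByteBuffer b) {
        return Integer.compare(a.duplicate().getInt(), b.duplicate().getInt());
    }
}
```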
  2. Commit Log

     Quite straightforward: appends writes & mutations
     Flushes writes based on a strategy (periodic, size, etc.)
     Used as temporary storage for HA & performance
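The append-then-flush behaviour could be sketched like this. All names here are hypothetical, and a "size" flush strategy is assumed for illustration; Cassandra's real commit log is segment-based and considerably more involved:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical append-only commit log: mutations are appended sequentially
// as length-prefixed records, and a flush strategy decides when to fsync.
class CommitLog implements AutoCloseable {
    private final FileChannel channel;
    private final long flushThreshold; // "size" strategy: fsync every N bytes
    private long unflushedBytes = 0;

    CommitLog(Path path, long flushThreshold) throws IOException {
        this.channel = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
        this.flushThreshold = flushThreshold;
    }

    synchronized void append(byte[] mutation) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(4 + mutation.length);
        buf.putInt(mutation.length).put(mutation).flip();
        while (buf.hasRemaining()) channel.write(buf); // sequential append
        unflushedBytes += 4 + mutation.length;
        if (unflushedBytes >= flushThreshold) {
            channel.force(false); // durable up to this point
            unflushedBytes = 0;
        }
    }

    @Override
    public void close() throws IOException {
        channel.force(false);
        channel.close();
    }
}
```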
  3. Memtable

     The commit log is replayed (into the Memtable) if the node crashed
     Never accessed directly for reads
     Always accessed for writes
     Maintained until the Memtable is flushed to an SSTable
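Crash-recovery replay can be sketched as reading the log's records back in order so each mutation can be re-applied to a fresh memtable. This assumes a simple, hypothetical length-prefixed record format; the real on-disk layout differs:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical replayer for a commit log of length-prefixed records.
class CommitLogReplayer {
    static List<byte[]> replay(Path logPath) throws IOException {
        List<byte[]> mutations = new ArrayList<>();
        try (FileChannel ch = FileChannel.open(logPath, StandardOpenOption.READ)) {
            ByteBuffer all = ByteBuffer.allocate((int) ch.size());
            while (all.hasRemaining() && ch.read(all) != -1) { /* fill */ }
            all.flip();
            while (all.remaining() >= 4) {
                int len = all.getInt();
                if (all.remaining() < len) break; // torn tail write: stop replay
                byte[] mutation = new byte[len];
                all.get(mutation);
                mutations.add(mutation); // caller re-applies these to the Memtable
            }
        }
        return mutations;
    }
}
```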
  4. Memtable

     Not a row cache: an intermediate location for data until flushed to an SSTable
     Uses a SkipListMap for CF name → CF lookup
     CF extends AbstractColumnContainer
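The skip-list lookup can be sketched with the JDK's `ConcurrentSkipListMap`. The class and method names below are hypothetical, and the column family is reduced to a sorted map of column name to value for brevity:

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentSkipListMap;

// Sketch of a memtable: a concurrent skip-list map from row key to a
// column container (here just a nested sorted map of column -> value).
class Memtable {
    private final ConcurrentSkipListMap<String, ConcurrentSkipListMap<String, ByteBuffer>>
            rows = new ConcurrentSkipListMap<>();

    // Writes always go through the memtable.
    void put(String rowKey, String column, ByteBuffer value) {
        rows.computeIfAbsent(rowKey, k -> new ConcurrentSkipListMap<>())
            .put(column, value);
    }

    ByteBuffer get(String rowKey, String column) {
        ConcurrentSkipListMap<String, ByteBuffer> cf = rows.get(rowKey);
        return cf == null ? null : cf.get(column);
    }
}
```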
  5. Memtable

     The container holds SortedColumns (atomic / thread-safe / array- or tree-map-backed, etc.)
     The atomic variant is implemented with SnapTreeMap:
       navigable map (range queries)
       atomic, consistent iteration
       very fast snapshots / clones
  6. Memtable

     SnapTreeMap relies on a Comparator internally
     Simple comparisons, thanks to binary data-type uniformity
     Simple range queries (head/tail/from/to, including/excluding bounds)
     Complies 100% with Cassandra's interface / guarantees
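SnapTreeMap itself is a third-party structure, but the navigable-map range operations the slide lists are the standard `NavigableMap` contract, shown here on the JDK's `ConcurrentSkipListMap` as a stand-in:

```java
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class RangeQueryDemo {
    public static void main(String[] args) {
        NavigableMap<String, Integer> columns = new ConcurrentSkipListMap<>();
        columns.put("a", 1);
        columns.put("b", 2);
        columns.put("c", 3);

        // head: everything up to "b", inclusive
        System.out.println(columns.headMap("b", true));        // {a=1, b=2}
        // tail: everything strictly after "b"
        System.out.println(columns.tailMap("b", false));       // {c=3}
        // sub-range with mixed inclusive/exclusive bounds
        System.out.println(columns.subMap("a", false, "c", true)); // {b=2, c=3}
    }
}
```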
  7. SSTable

     Sorted String Table, stored on disk
     Uses a Bloom filter to reduce disk lookups
     Writes sequentially (performant)
     Uses compaction to reduce overhead (configurable)
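A toy Bloom filter illustrates why it cuts disk lookups: "might contain" can false-positive but never false-negative, so a negative answer lets the read path skip the SSTable entirely. This is a sketch with hypothetical names, not Cassandra's implementation:

```java
import java.util.BitSet;

// Minimal Bloom filter: k hash probes into a bit array.
class BloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    BloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Derive k probe positions from two hash values (double hashing).
    private int probe(String key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        for (int i = 0; i < hashes; i++) bits.set(probe(key, i));
    }

    // False positives possible; false negatives impossible.
    boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++)
            if (!bits.get(probe(key, i))) return false;
        return true;
    }
}
```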
  8. Lessons? Making an embedded in-memory store:

     Basic data types are expressible in terms of byte buffers
     Memory allocation / access patterns must be addressed
     Maintain a commit log for durability
     In concurrent environments, use concurrent data structures
     Use data structures that give the maximum available features: a sorted index per column would allow searches on an arbitrary column
     Levelling / compaction simplifies sequencing data
     Use approximate data structures (approximate histograms, Bloom filters, sketches) to reduce overhead and get some big-O guarantees