Cassandra for Data Analytics Backends

αλεx π
September 24, 2015

Transcript

  1. Sequence ID

    Used to avoid timestamp resolution collisions and to ensure sub-resolution order.
    Snapshot the data on overflow or timeout.
    Ensures idempotence.
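One way to picture the sequence ID is as a tie-breaker attached to each timestamp. The sketch below is only an illustration under stated assumptions: the slide does not say how the IDs are generated, so the name `withSequenceIds`, the integer `Timestamp` type, and the rule "restart at 0 when the timestamp changes" are all hypothetical.

```haskell
-- Hypothetical timestamp type; the slides do not specify a resolution.
type Timestamp = Int

-- Pair every timestamp with a sequence ID. The ID restarts at 0 whenever
-- the timestamp changes, so events that share a timestamp still get
-- distinct, ordered (timestamp, sequence ID) keys.
-- Assumes the input is already in arrival order.
withSequenceIds :: [Timestamp] -> [(Timestamp, Int)]
withSequenceIds = go Nothing
  where
    go _ [] = []
    go prev (t : ts) =
      let sq = case prev of
                 Just (t', s) | t' == t -> s + 1
                 _                      -> 0
      in (t, sq) : go (Just (t, sq)) ts
```

Because tuples compare lexicographically, sorting on the resulting pairs keeps the arrival order of events that collided on the same timestamp.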
  2. Range Tables (diagram: a timeline of timestamps ts1 to ts13)
  3. Full Table Scan (diagram: the timeline ts1 to ts13 with Start and End markers)
  4. Open Range (diagram: the timeline ts1 to ts13 with Start and End markers)
  5. “Between” Range (diagram: the timeline ts1 to ts13 with Start and End markers)
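The three query shapes above (full scan, open range, "between" range) can be sketched as predicates over timestamps. This is a minimal in-memory illustration, not how the talk's backend issues queries: the `Range` type, `inRange`, and `select` are hypothetical names, an open range is assumed to mean "only one bound given", and both bounds are assumed inclusive.

```haskell
-- Hypothetical timestamp type.
type Timestamp = Int

-- The query shapes from the slides; From/Until model an open range
-- with a single bound (an assumption).
data Range
  = FullScan
  | From Timestamp
  | Until Timestamp
  | Between Timestamp Timestamp

-- Does a timestamp fall inside the range? Bounds are inclusive here.
inRange :: Range -> Timestamp -> Bool
inRange FullScan _ = True
inRange (From start) t = t >= start
inRange (Until end) t = t <= end
inRange (Between start end) t = start <= t && t <= end

-- Select the rows whose timestamp falls inside the range.
select :: Range -> [(Timestamp, a)] -> [(Timestamp, a)]
select r = filter (inRange r . fst)
```

For example, `select (Between 4 9)` over rows keyed ts1 to ts13 keeps the six rows keyed 4 through 9.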
  6. data Step data cursor = Yield data !cursor | Skip !cursor | Done

    data Stream data = ∃cursor. Stream (cursor → Step data cursor) cursor
  7. map

    maps :: (a → b) → Stream a → Stream b
    Yield data cursor → Yield (f data) cursor
    Skip cursor → Skip cursor
    Done → Done
  8. filter

    filters :: (a → Bool) → Stream a → Stream a
    Yield data cursor | p data → Yield data cursor
                      | otherwise → Skip cursor
    Skip cursor → Skip cursor
    Done → Done
  9. reduce/fold

    foldls :: (Monoid acc) => (acc → a → acc) → acc → Stream a → acc
    Yield x cursor → loop (f z x) cursor
    Skip cursor → loop z cursor
    Done → z
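The stream slides use pseudo-Haskell (`data` is a reserved word, so it cannot be a type variable). A compilable sketch of the same idea might look as follows. It renames the type variables to `a` and `s`, adds a hypothetical `fromList` helper so there is something to run, and drops the `Monoid` constraint from `foldls` because the fold itself does not use it.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE BangPatterns #-}

-- One step of a stream: an element plus the next cursor, a bare cursor
-- advance, or the end of the stream.
data Step a s = Yield a !s | Skip !s | Done

-- A stream hides its cursor type behind an existential.
data Stream a = forall s. Stream (s -> Step a s) s

-- Hypothetical helper: the cursor is simply the rest of the list.
fromList :: [a] -> Stream a
fromList = Stream next
  where
    next []       = Done
    next (x : xs) = Yield x xs

maps :: (a -> b) -> Stream a -> Stream b
maps f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Yield x s' -> Yield (f x) s'
      Skip s'    -> Skip s'
      Done       -> Done

-- Rejected elements become Skip, so the cursor still advances.
filters :: (a -> Bool) -> Stream a -> Stream a
filters p (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Yield x s' | p x       -> Yield x s'
                 | otherwise -> Skip s'
      Skip s'    -> Skip s'
      Done       -> Done

-- Strict left fold; the accumulator is returned when the stream is Done.
foldls :: (acc -> a -> acc) -> acc -> Stream a -> acc
foldls f z0 (Stream next s0) = loop z0 s0
  where
    loop !z s = case next s of
      Yield x s' -> loop (f z x) s'
      Skip s'    -> loop z s'
      Done       -> z
```

A pipeline such as `foldls (+) 0 (maps (* 2) (filters even (fromList [1 .. 10])))` keeps the even numbers, doubles them, and sums the result in a single pass over the cursor.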
  10. Append

    class Monoid a where
      mempty  :: a            -- ^ Identity of 'mappend'
      mappend :: a -> a -> a  -- ^ An associative operation
  11. Count Example

    data Count = Count Int

    instance Monoid Count where
      mempty = Count 0
      mappend (Count a) (Count b) = Count $ a + b

    instance Aggregate Count Int where
      combine (Count a) = a
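The Count example can be made to compile on a recent GHC with two adjustments, sketched below. The `Aggregate` class is not defined on the slides, so its declaration here (with a functional dependency) is an assumption, as are the helper names `countAll` and `mergeChunks`. Recent versions of `base` also require a `Semigroup` instance, so the append moves to `<>`.

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FunctionalDependencies #-}

data Count = Count Int

-- The associative append lives in Semigroup on current GHC versions.
instance Semigroup Count where
  Count a <> Count b = Count (a + b)

instance Monoid Count where
  mempty = Count 0

-- Assumed shape of the slides' Aggregate class: turn an accumulator
-- into its final result.
class Monoid acc => Aggregate acc result | acc -> result where
  combine :: acc -> result

instance Aggregate Count Int where
  combine (Count a) = a

-- Count the elements of a list by mapping each one to Count 1.
countAll :: [a] -> Int
countAll = combine . foldMap (const (Count 1))

-- Merge partial counts computed over separate chunks.
mergeChunks :: [Count] -> Int
mergeChunks = combine . mconcat
```

Because the append is associative and has an identity, partial counts from separate chunks can be merged in any grouping, which is what `mergeChunks` relies on.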
  12. byte address and points

    byte address:  0    8    16   24   32   40        n*8
                   +----+----+----+----+----+---------+----+
    points:        | α0 | α1 | α2 | α3 | α4 |   ...   | αn |
                   +----+----+----+----+----+---------+----+
  13. byte address and points, one row per dimension

    base 0 +       0     8     16    24    32    40          n*8
                   +-----+-----+-----+-----+-----+-----------+-----+
                   | α00 | α01 | α02 | α03 | α04 |    ...    | α0n |
                   +-----+-----+-----+-----+-----+-----------+-----+

    base n*8 +     0     8     16    24    32    40          n*8
                   +-----+-----+-----+-----+-----+-----------+-----+
                   | α10 | α11 | α12 | α13 | α14 |    ...    | α1n |
                   +-----+-----+-----+-----+-----+-----------+-----+

    base m*n*8 +   0     8     16    24    32    40          n*8
                   +-----+-----+-----+-----+-----+-----------+-----+
                   | αm0 | αm1 | αm2 | αm3 | αm4 |    ...    | αmn |
                   +-----+-----+-----+-----+-----+-----------+-----+
  14. Advantages

    No serialisation overhead.
    Fast relative access.
    Easy to go multi-dimensional.
    Easy to implement atomic in-memory operations.
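The byte-offset layout from the preceding diagrams can be sketched with the `Foreign` modules from `base`. This is an illustration under assumptions: every cell is taken to be an 8-byte `Double`, `cols` is the number of cells per row, and `offset2` and `roundTrip` are hypothetical names. It does not show the atomic operations mentioned on the slide.

```haskell
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Every cell is assumed to be 8 bytes wide.
cellSize :: Int
cellSize = 8

-- Byte offset of cell (row, col) in a row-major grid with `cols` cells per row.
offset2 :: Int -> Int -> Int -> Int
offset2 cols row col = (row * cols + col) * cellSize

-- Fill a rows x cols grid with the value row * 10 + col, then read one
-- cell back straight from its byte offset, with no (de)serialisation step.
roundTrip :: Int -> Int -> (Int, Int) -> IO Double
roundTrip rows cols (r, c) =
  allocaBytes (rows * cols * cellSize) $ \buf -> do
    mapM_ (\(i, j) -> pokeByteOff buf (offset2 cols i j) (cellValue i j)) cells
    peekByteOff buf (offset2 cols r c)
  where
    cells = [(i, j) | i <- [0 .. rows - 1], j <- [0 .. cols - 1]]
    cellValue i j = fromIntegral (i * 10 + j) :: Double
```

Going from one dimension to two only changes the offset arithmetic; the buffer itself stays a flat block of bytes.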
  15. P(X | blue) = (number of blue near X) / (total number of blue)

    P(X | red) = (number of red near X) / (total number of red)
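The two ratios above can be computed directly from labelled points. In the sketch below, "near" is assumed to mean "within a Euclidean radius of X"; the slide does not define it, and the names `Colour`, `Point`, and `likelihood` are hypothetical.

```haskell
data Colour = Blue | Red deriving (Eq, Show)

type Point = (Double, Double)

distance :: Point -> Point -> Double
distance (x1, y1) (x2, y2) = sqrt (dx * dx + dy * dy)
  where
    dx = x1 - x2
    dy = y1 - y2

-- P(X | colour): the share of points of that colour lying within
-- `radius` of X. Returns 0 when there are no points of that colour.
likelihood :: Double -> Point -> Colour -> [(Point, Colour)] -> Double
likelihood radius x colour samples
  | total == 0 = 0
  | otherwise  = fromIntegral near / fromIntegral total
  where
    sameColour = [p | (p, c) <- samples, c == colour]
    total = length sameColour
    near = length (filter (\p -> distance p x <= radius) sameColour)
```

With four blue points of which two lie within the radius, `likelihood` returns 0.5 for `Blue`.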
  16. byte address and payloads

    0         8         16
    +---------+---------+
    | Mean(x0)| Var(x0) |
    +---------+---------+

    16        24        32
    +---------+---------+
    | Mean(x1)| Var(x1) |
    +---------+---------+

    2n*8      (2n+1)*8
    +---------+---------+
    | Mean(xn)| Var(xn) |
    +---------+---------+
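The mean/variance layout places feature i's mean at byte 2i*8 and its variance at byte (2i+1)*8. The sketch below packs such pairs into a flat buffer and reads one back. It assumes 8-byte `Double` payloads and population variance; `meanOffset`, `varOffset`, `meanVar`, and `packAndRead` are hypothetical names, and the caller is assumed to pass non-empty feature columns and a valid index.

```haskell
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Byte offsets of the two payloads for feature i.
meanOffset, varOffset :: Int -> Int
meanOffset i = 2 * i * 8
varOffset i = (2 * i + 1) * 8

-- Population mean and variance of one feature column (assumed non-empty).
meanVar :: [Double] -> (Double, Double)
meanVar xs = (m, v)
  where
    n = fromIntegral (length xs)
    m = sum xs / n
    v = sum [(x - m) * (x - m) | x <- xs] / n

-- Pack (mean, variance) for every feature into one buffer, then read
-- the pair for feature i back from its byte offsets.
packAndRead :: [[Double]] -> Int -> IO (Double, Double)
packAndRead features i =
  allocaBytes (16 * length features) $ \buf -> do
    mapM_ (store buf) (zip [0 ..] features)
    m <- peekByteOff buf (meanOffset i)
    v <- peekByteOff buf (varOffset i)
    pure (m, v)
  where
    store buf (k, xs) = do
      let (m, v) = meanVar xs
      pokeByteOff buf (meanOffset k) m
      pokeByteOff buf (varOffset k) v
```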
  17. bit address

    0                               8
    +---+---+---+---+---+---+---+---+
    | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
    +---+---+---+---+---+---+---+---+
    8                               16
    +---+---+---+---+---+---+---+---+
    | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
    +---+---+---+---+---+---+---+---+
    16                              24
    +---+---+---+---+---+---+---+---+
    | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
    +---+---+---+---+---+---+---+---+
    24                              32
    +---+---+---+---+---+---+---+---+
    | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
    +---+---+---+---+---+---+---+---+
  18. Advantages

    64 bits per 8-byte Long.
    Easy to represent by the long-array using offsets, bit shifts and masks.
    Easy to implement atomic in-memory operations.
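The "offsets, bit shifts and masks" addressing can be sketched with `Data.Bits` from `base`. This is a pure, list-backed illustration under assumptions: `BitSet`, `emptyBits`, `set`, and `member` are hypothetical names, and the atomic in-memory updates mentioned on the slide are not modelled here.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word64)

-- A bit set stored as 64-bit words.
type BitSet = [Word64]

-- Enough zeroed words to hold nBits bits.
emptyBits :: Int -> BitSet
emptyBits nBits = replicate ((nBits + 63) `shiftR` 6) 0

-- Bit i lives in word (i / 64) at position (i mod 64).
wordIndex, bitIndex :: Int -> Int
wordIndex i = i `shiftR` 6
bitIndex i = i .&. 63

-- Turn bit i on by OR-ing a one-bit mask into its word.
set :: Int -> BitSet -> BitSet
set i ws = [if k == wordIndex i then w .|. mask else w | (k, w) <- zip [0 ..] ws]
  where
    mask = 1 `shiftL` bitIndex i

-- Test bit i by AND-ing its word with the same mask.
member :: Int -> BitSet -> Bool
member i ws = case drop (wordIndex i) ws of
  (w : _) -> w .&. (1 `shiftL` bitIndex i) /= 0
  []      -> False
```

Setting bits 3 and 70 in a 128-bit set touches word 0 (mask 8) and word 1 (mask 64) respectively.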