PolyConf 2015 - Rocking the Time Series boat with C, Haskell and ClojureScript

ClojureWerkz
  35+ high-quality Clojure libraries
  User reports from all over the world
  20+ active contributors
  We value documentation

Introduction

Being Ukrainian

Re-Learning everything

Evaluating Ideas

Fighting Compiler

Patching GHC

Talking to the People

Time Series Data
  Monotonically increasing primary key
  Write-only data
  Range Queries
  Rolling Aggregates

Time Series Data
  LevelDB backend
  Tight Combination of Haskell and C
  Optimising space, reads
  Flexible aggregates
  Parallel queries

Start With Querying

How do I Parallel?

3-Step Aggregate?
  Local Aggregate (parallel)
  Append
  Combine

Sequence ID
  Used to avoid timestamp resolution collisions
  To ensure sub-resolution order
  Snapshot the data on overflow or timeout
  Ensures idempotence
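One way to picture the Sequence ID slide is a composite key made of the timestamp and a per-timestamp counter. The sketch below is an assumption about how such keys could be assigned, not the talk's implementation: `Key`, `assignKeys`, the field widths, and the choice to restart the counter at 0 when the timestamp changes are all hypothetical.

```haskell
import Data.Word (Word32, Word64)

-- Hypothetical composite key: timestamp first, sequence ID second.
-- The derived Ord compares the timestamp, then the sequence ID, so two
-- events with the same timestamp still have a well-defined order.
data Key = Key
  { keyTimestamp :: !Word64
  , keySequence  :: !Word32
  } deriving (Eq, Ord, Show)

-- Assign keys to timestamps in arrival order. The counter grows while the
-- timestamp repeats and restarts at 0 when the timestamp changes
-- (an assumption made for this sketch).
assignKeys :: [Word64] -> [Key]
assignKeys = go Nothing
  where
    go _ [] = []
    go prev (t : ts) = k : go (Just k) ts
      where
        k = Key t (nextSeq prev)
        nextSeq (Just (Key t' n)) | t' == t = n + 1
        nextSeq _ = 0
```

Loaded in GHCi, `assignKeys [10, 10, 11]` gives three distinct keys even though two events share a timestamp.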

Range Tables store the snapshotted ranges

Full Table Scan
[Diagram: range tables 1 to 13 with Start and End markers]

Open Range
[Diagram: range tables 1 to 13 with Start and End markers]

“Between” Range
[Diagram: range tables 1 to 13 with Start and End markers]
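The three query shapes above (full table scan, open range, "between" range) can be read as a filter over range tables. The following is a minimal sketch under stated assumptions: `RangeTable`, `Query`, `overlaps`, and `selectTables` are hypothetical names, keys are plain `Int`s, and an open range is assumed to have a start but no end.

```haskell
-- Hypothetical range table: the inclusive key range covered by one snapshot.
data RangeTable = RangeTable
  { rangeStart :: Int
  , rangeEnd   :: Int
  } deriving (Eq, Show)

-- The three query shapes named on the slides.
data Query
  = FullScan          -- every range table
  | OpenRange Int     -- from a start key onwards (assumed meaning)
  | Between Int Int   -- inclusive start and end keys

-- Does a range table hold keys the query may need?
overlaps :: Query -> RangeTable -> Bool
overlaps FullScan _ = True
overlaps (OpenRange from) (RangeTable _ e) = e >= from
overlaps (Between from to) (RangeTable s e) = s <= to && e >= from

-- Keep only the range tables that have to be read for a query.
selectTables :: Query -> [RangeTable] -> [RangeTable]
selectTables q = filter (overlaps q)
```

Each selected table can then be read independently of the others, which is what makes the per-table work a candidate for parallel execution.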

Stream Fusion
  Reading Data from LevelDB

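{-# LANGUAGE BangPatterns, ExistentialQuantification #-}
data Step a s = Yield a !s | Skip !s | Done
-- The state type s is hidden; GHC writes ∃s as a forall before the constructor.
data Stream a = forall s. Stream (s -> Step a s) s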

maps :: (a -> b) -> Stream a -> Stream b
maps f (Stream next0 s0) = Stream next s0
  where
    next !s = case next0 s of   -- step the underlying stream from state s
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s' -> Yield (f x) s'

filters :: (a -> Bool) -> Stream a -> Stream a
filters p (Stream next0 s0) = Stream next s0
  where
    next !s = case next0 s of
      Done -> Done
      Skip s' -> Skip s'
      Yield x s'
        | p x       -> Yield x s'
        | otherwise -> Skip s'   -- rejected elements become Skip

foldls :: (b -> a -> b) -> b -> Stream a -> b
foldls f z (Stream next s0) = loop z s0
  where
    loop z s = case next s of
      Yield x s' -> loop (f z x) s'
      Skip s'    -> loop z s'
      Done       -> z
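To see the three combinators work together, here is a self-contained sketch that runs a filter, a map, and a fold as a single pass. `fromList` and `sumOfEvenSquares` are hypothetical helpers added for illustration; a plain list stands in for a LevelDB iterator, which the slides do not show.

```haskell
{-# LANGUAGE BangPatterns, ExistentialQuantification #-}

data Step a s = Yield a !s | Skip !s | Done
data Stream a = forall s. Stream (s -> Step a s) s

-- Hypothetical source: a list stands in for a LevelDB iterator.
fromList :: [a] -> Stream a
fromList xs0 = Stream next xs0
  where
    next []       = Done
    next (x : xs) = Yield x xs

maps :: (a -> b) -> Stream a -> Stream b
maps f (Stream next0 s0) = Stream next s0
  where
    next !s = case next0 s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s' -> Yield (f x) s'

filters :: (a -> Bool) -> Stream a -> Stream a
filters p (Stream next0 s0) = Stream next s0
  where
    next !s = case next0 s of
      Done -> Done
      Skip s' -> Skip s'
      Yield x s'
        | p x       -> Yield x s'
        | otherwise -> Skip s'

-- Strict left fold that drives the stream to completion.
foldls :: (b -> a -> b) -> b -> Stream a -> b
foldls f z0 (Stream next s0) = loop z0 s0
  where
    loop !z !s = case next s of
      Yield x s' -> loop (f z x) s'
      Skip s'    -> loop z s'
      Done       -> z

-- Filter, map, and fold composed; only the final fold walks the data.
sumOfEvenSquares :: [Int] -> Int
sumOfEvenSquares = foldls (+) 0 . maps (\x -> x * x) . filters even . fromList
```

None of the stages builds an intermediate list: each one only wraps the step function of the stage before it, and `foldls` is the only loop.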

Local Aggregate (parallel) Append Combine

-- Fold step initial finalize. The accumulator type x is hidden;
-- the result type b is expected to be a Monoid.
data Fold a b = forall x. Fold (x -> a -> x) x (x -> b)

Append

class Monoid a where
  mempty  :: a            -- ^ Identity of 'mappend'
  mappend :: a -> a -> a  -- ^ An associative operation

Combine

class (Monoid intermediate) => Aggregate intermediate end where
  combine :: intermediate -> end

Count

data Count = Count Int

-- Works for any element type a, so no quantifier is needed here.
op_count :: Fold a Count
op_count = Fold (\i _ -> i + 1) 0 Count

instance Monoid Count where
  mempty = Count 0
  mappend (Count a) (Count b) = Count $ a + b

instance Aggregate Count Int where
  combine (Count a) = a
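The three steps (local aggregate, append, combine) can be run end to end on the Count example. This is a sketch under assumptions: `runFold`, `chunksOf`, `opCount`, and `countInChunks` are hypothetical names, the chunks are folded sequentially here although each local fold is independent and could run in parallel, and a `Semigroup` instance is added because current GHC versions require one for every `Monoid`.

```haskell
{-# LANGUAGE ExistentialQuantification, MultiParamTypeClasses #-}

import Data.List (foldl')

data Fold a b = forall x. Fold (x -> a -> x) x (x -> b)

-- Step 1, local aggregate: run one fold over one chunk of input.
runFold :: Fold a b -> [a] -> b
runFold (Fold step initial finalize) = finalize . foldl' step initial

class Monoid intermediate => Aggregate intermediate end where
  combine :: intermediate -> end

newtype Count = Count Int deriving (Eq, Show)

-- Step 2, append: merging two partial counts adds them.
instance Semigroup Count where
  Count a <> Count b = Count (a + b)

instance Monoid Count where
  mempty = Count 0

-- Step 3, combine: turn the merged intermediate into the end result.
instance Aggregate Count Int where
  combine (Count n) = n

opCount :: Fold a Count
opCount = Fold (\i _ -> i + 1) (0 :: Int) Count

-- Hypothetical helper: split the input into chunks of at most n elements.
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | null xs   = []
  | otherwise = h : chunksOf n t
  where
    (h, t) = splitAt (max 1 n) xs

-- Fold every chunk locally, append the partial results, then combine.
countInChunks :: Int -> [a] -> Int
countInChunks n = combine . mconcat . map (runFold opCount) . chunksOf n
```

Because the append step is associative and has an identity, the result does not depend on how the input is chunked, which is the property that makes the local step safe to parallelise.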

Mean

-- Intermediate result: every value seen so far.
data Mean a = Mean [a]

op_mean :: (Integral a) => Fold a (Mean a)
op_mean = Fold (flip (:)) [] Mean

instance (Integral a) => Monoid (Mean a) where
  mempty = Mean []
  mappend (Mean a) (Mean b) = Mean $ a ++ b

instance (Integral a) => Aggregate (Mean a) Double where
  combine (Mean []) = 0
  combine (Mean a) = s / l
    where s = fromIntegral $ sum a
          l = fromIntegral $ length a

Other examples
  Streaming Histogram
  Median
  Percentiles
  You name it

Combining Queries

Group

op_groupBy :: (Ord a, Monoid b) => (r -> Maybe a) -> (Fold r b) -> Fold r (MapResult a b)
op_groupBy groupFn (Fold f z0 e) =
  let -- first record of a group starts from z0, later ones extend the group's state
      subStep n Nothing  = return $! (f z0 n)
      subStep n (Just a) = return $! (f a n)
      -- records for which groupFn returns Nothing are left out
      localStep m record = maybe m (\r -> Map.alter (subStep record) r m) (groupFn record)
      done a = MapResult $ Map.map e a
  in Fold localStep Map.empty done
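A small usage sketch may help show how grouping wraps an existing fold. Everything beyond the grouping logic itself is an assumption: `MapResult` is defined here as a hypothetical newtype because the slides use it without showing its definition, and `runFold`, `opGroupBy`, `opSum`, and `sumByParity` are illustrative names. It relies on `Data.Map.Strict` from the containers package.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.List (foldl')
import qualified Data.Map.Strict as Map

data Fold a b = forall x. Fold (x -> a -> x) x (x -> b)

runFold :: Fold a b -> [a] -> b
runFold (Fold step initial finalize) = finalize . foldl' step initial

-- Hypothetical wrapper; the slides use MapResult without defining it.
newtype MapResult k v = MapResult (Map.Map k v) deriving (Eq, Show)

-- Keep one accumulator of the inner fold per group key.
opGroupBy :: Ord k => (r -> Maybe k) -> Fold r b -> Fold r (MapResult k b)
opGroupBy groupFn (Fold f z0 e) = Fold localStep Map.empty done
  where
    -- A new group starts from z0; an existing group extends its state.
    subStep n Nothing  = Just $! f z0 n
    subStep n (Just a) = Just $! f a n
    -- Records without a group key are skipped.
    localStep m record =
      maybe m (\k -> Map.alter (subStep record) k m) (groupFn record)
    -- Finalize every group's accumulator with the inner fold's finalizer.
    done = MapResult . Map.map e

opSum :: Fold Int Int
opSum = Fold (+) 0 id

-- Sum the inputs separately for even (True) and odd (False) numbers.
sumByParity :: [Int] -> MapResult Bool Int
sumByParity = runFold (opGroupBy (Just . even) opSum)
```

Since the result of `opGroupBy` is itself a `Fold`, it can be passed to `opGroupBy` again, which is one way to read the "nested aggregates" idea.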

Other examples
  Several Aggregates in one run
  Group by field, time or combination
  Nested aggregates of any type

Recap
  Break Down the Queries
  Aggregate each part independently
  Combine aggregates
  Transform the Result

Binary Data Format
  Optimising even further

DbSchema [ ("field1", DbtLong) , ("field2", DbtString) , ("field3", DbtShort)]

Offset Reads

Fast reads
Partial payload decoding
Field Names are implicit
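Partial decoding can be pictured as walking the schema and skipping bytes until the wanted field is reached. The sketch below is not the talk's binary format. The widths (8 bytes for `DbtLong`, 2 bytes for `DbtShort`) and the 2-byte big-endian length prefix for `DbtString` are assumptions, and `beInt`, `encodedWidth`, and `readField` are hypothetical names. A list of bytes stands in for the real payload.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8)

data DbType = DbtLong | DbtString | DbtShort deriving (Eq, Show)

-- Field names live only in the schema, never in the payload.
type DbSchema = [(String, DbType)]

-- Big-endian unsigned integer from raw bytes.
beInt :: [Word8] -> Integer
beInt = foldl (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0

-- Encoded width of the value at the head of the payload.
-- ASSUMPTION: long = 8 bytes, short = 2 bytes, string = 2-byte length prefix.
encodedWidth :: DbType -> [Word8] -> Int
encodedWidth DbtLong _ = 8
encodedWidth DbtShort _ = 2
encodedWidth DbtString bytes = 2 + fromIntegral (beInt (take 2 bytes))

-- Raw bytes of one field; earlier fields are skipped, not decoded.
-- Returns Nothing for an unknown field or a truncated payload.
readField :: DbSchema -> String -> [Word8] -> Maybe [Word8]
readField [] _ _ = Nothing
readField ((name, ty) : rest) wanted bytes
  | length bytes < w = Nothing
  | name == wanted   = Just (take w bytes)
  | otherwise        = readField rest wanted (drop w bytes)
  where
    w = encodedWidth ty bytes
```

With only fixed-width fields the offsets could be computed once from the schema; the length prefix is what forces this sketch to look at the payload while skipping.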

Schema Migrations

Adding fields appends to the end
Written data is unchanged or nullified
Removed fields are ignored and unavailable
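The migration rules above can be sketched as reading an old record through a newer schema. This is an illustration under assumptions, not the actual storage code: `FieldState`, `Schema`, and `readRecord` are hypothetical, decoded values are modelled as a plain list, and a removed field is assumed to keep its slot in already written data.

```haskell
-- Hypothetical field status in the current schema.
data FieldState = Active | Removed deriving (Eq, Show)

-- Fields in the order they were added; new fields go at the end.
type Schema = [(String, FieldState)]

-- Read a stored record through the current schema.
--   * Records written before a field was added are shorter than the
--     schema, so the missing tail reads as Nothing.
--   * Removed fields still occupy a slot in old data but are not returned.
readRecord :: Schema -> [v] -> [(String, Maybe v)]
readRecord schema values =
  [ (name, v) | ((name, Active), v) <- zip schema padded ]
  where
    padded = map Just values ++ repeat Nothing
```

Because new fields are only ever appended, a record written under an older schema never needs to be rewritten to stay readable.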

Consensus

Snapshot consensus
Rolling CRC of the data
Asynchronous
No quorum for snapshot reads
Parallel Reads from Snapshotted Data
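A rolling checksum can be sketched as a CRC state that is extended with every appended record, so two replicas can compare one number per snapshot. The code below uses the standard bitwise CRC-32 (reflected polynomial 0xEDB88320); whether the talk's system uses this exact CRC variant is an assumption, and `RollingCrc`, `emptyCrc`, `appendRecord`, and `checksum` are hypothetical names.

```haskell
import Data.Bits (complement, shiftR, testBit, xor)
import Data.Word (Word32, Word8)

-- Running checksum state kept by a replica (not yet finalized).
newtype RollingCrc = RollingCrc Word32 deriving (Eq, Show)

emptyCrc :: RollingCrc
emptyCrc = RollingCrc 0xFFFFFFFF

-- Feed one byte into the state: bitwise, reflected CRC-32.
updateByte :: Word32 -> Word8 -> Word32
updateByte crc byte = iterate shiftStep (crc `xor` fromIntegral byte) !! 8
  where
    shiftStep c
      | testBit c 0 = (c `shiftR` 1) `xor` 0xEDB88320
      | otherwise   = c `shiftR` 1

-- Extend the checksum with a newly appended record.
appendRecord :: RollingCrc -> [Word8] -> RollingCrc
appendRecord (RollingCrc crc) = RollingCrc . foldl updateByte crc

-- Final value that replicas would compare for a snapshot.
checksum :: RollingCrc -> Word32
checksum (RollingCrc crc) = complement crc
```

Replicas that appended the same records in the same order end up with the same state, so comparing `checksum` values is a cheap equality test that does not require shipping the data itself.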

Future Plans

Conclusions

Partial Decoding
Parallel Queries
Composable, extendable query system
Lightweight Consensus
Lightweight Data Format

@ifesdjeen
