Slide 1

Slide 1 text

Alex Petrov Scalable time-series applications with Cassandra

Slide 2

Slide 2 text

Cyanite past, present and future

Slide 3

Slide 3 text

Requirements

Slide 4

Slide 4 text

Throughput: Incoming data; Aggregates; Read queries

Slide 5

Slide 5 text

Scalability: Paths / metrics; Historical data; Readers / writers

Slide 6

Slide 6 text

Aggregated vs raw

Slide 7

Slide 7 text

Predefined list of reports to aggregate; Reduced number of data points; Faster queries; Precision loss due to aggregation

Slide 8

Slide 8 text

"servers.*.workers.busyWorkers": "10s:7d,1m:21d,15m:5y"
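A rule like the one above reads as a comma-separated list of precision:period pairs: one point every 10 seconds kept for 7 days, one per minute kept for 21 days, one per 15 minutes kept for 5 years. A minimal Python sketch of parsing such a rule follows; the function names and the set of unit suffixes are assumptions for illustration, not Cyanite's actual parser.

```python
import re

# Seconds per unit suffix. The suffix set is an assumption based on the
# example rule "10s:7d,1m:21d,15m:5y"; a year is taken as 365 days.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def parse_duration(text):
    """Convert a string such as '15m' into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhdwy])", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def parse_retention(spec):
    """Parse 'precision:period,...' into (precision, period) pairs in seconds."""
    rules = []
    for part in spec.split(","):
        precision, period = part.split(":")
        rules.append((parse_duration(precision), parse_duration(period)))
    return rules


if __name__ == "__main__":
    for precision, period in parse_retention("10s:7d,1m:21d,15m:5y"):
        print(f"one point every {precision}s, kept for {period}s")
```

The period of each rule is also a natural candidate for a per-row TTL when the aggregate is written, which is how the "metric expiry" idea could map onto the storage layer.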

Slide 9

Slide 9 text

Graphite

Slide 10

Slide 10 text

Store numeric time-series data; Render graphs of this data on demand; https://graphiteapp.org/

Slide 11

Slide 11 text

Ecosystem

Slide 12

Slide 12 text

carbon: listens for time-series data; whisper: DB for storing TS data; graphite-web: UI & API for graphs

Slide 15

Slide 15 text

Downsides: Single-host solution; Plenty of disk seeks; Optimised for space; Sharding / replication is “manual”

Slide 16

Slide 16 text

Scaling: Cluster topology stored on every node; Manual replication; Changing cluster topology is non-trivial

Slide 17

Slide 17 text

Scaling: Stateless service; Automatic shard assignment; Replication; Easy management; Easy topology changes

Slide 18

Slide 18 text

reminds you of anything?

Slide 19

Slide 19 text

Cassandra perfect match

Slide 20

Slide 20 text

Cyanite: Stateless; Async I/O; Custom scheduler; No Whisper files; Distributed; Horizontally scalable

Slide 21

Slide 21 text

Cyanite responsibilities: Carbon-compatible listener; Aggregate data in-memory; Flush aggregates to Cassandra; Path storage; Retrieve paths for query; Aggregate the query results

Slide 22

Slide 22 text

Cassandra features: Metric expiry, TTL; User Defined Types; SASI Indexes, LIKE queries; Aggregate Functions; Offline data loading; Clustering range queries; IN partition key locking

Slide 23

Slide 23 text

Globbing index

Slide 24

Slide 24 text

app.cluster.server.subsystem.metric

Slide 25

Slide 25 text

Built with SASI indexes; Fast, scalable queries; Optimised for glob

Slide 26

Slide 26 text

CREATE TABLE segments (
  parent text,
  segment text,
  pos int,
  length int,
  leaf boolean,
  PRIMARY KEY (parent, segment)
);

Slide 27

Slide 27 text

app.cluster.server.subsystem.metric
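One way to picture how a dotted path could be broken into rows of the table above is the Python sketch below. The column meanings are assumptions inferred from the queries on the following slides: `segment` is taken to hold the path prefix up to position `pos`, `parent` the prefix one level up (with 'root' for the first level), `length` the total depth of the path, and `leaf` a marker for the final segment. The actual Cyanite layout may differ.

```python
def path_to_segment_rows(path):
    """Split a dotted metric path into one row per level.

    Column semantics are hypothetical (see the lead-in): each row stores the
    prefix up to `pos` in `segment` and the previous prefix in `parent`.
    """
    parts = path.split(".")
    rows = []
    parent = "root"
    for pos in range(1, len(parts) + 1):
        segment = ".".join(parts[:pos])
        rows.append({
            "parent": parent,
            "segment": segment,
            "pos": pos,
            "length": len(parts),
            "leaf": pos == len(parts),
        })
        parent = segment
    return rows


if __name__ == "__main__":
    for row in path_to_segment_rows("app.cluster.server.subsystem.metric"):
        print(row)
```

Under these assumptions a prefix query such as app.* becomes a lookup on `parent = 'app'` at `pos = 2`, which is the shape of the queries shown on the next slides.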

Slide 28

Slide 28 text

Wildcard: *
SELECT * FROM segments WHERE parent = 'root' AND pos = 1

Slide 29

Slide 29 text

Postfix: *.*.*.*.metric
SELECT * FROM segments WHERE pos = 4

Slide 30

Slide 30 text

Prefix: app.*
SELECT * FROM segments WHERE parent = 'app' AND pos = 2 ALLOW FILTERING

Slide 31

Slide 31 text

Suffix: abc.*.metric
SELECT * FROM segments WHERE pos = 3 AND segment LIKE 'abc.%' ALLOW FILTERING
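The index queries above narrow down candidate paths; the glob semantics themselves can be illustrated with a small Python sketch that matches patterns in memory. It assumes Graphite-style globbing in which `*` matches any run of characters inside a single dot-separated segment; the helper names are hypothetical and this is not how Cyanite evaluates globs against Cassandra.

```python
import re


def glob_to_regex(pattern):
    """Translate a Graphite-style glob into a compiled regular expression.

    '*' matches within one segment only (it never crosses a dot) and
    '?' matches a single non-dot character.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"[^.]*")
        elif ch == "?":
            parts.append(r"[^.]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def match_paths(pattern, paths):
    """Return the paths that match the whole glob pattern, in input order."""
    regex = glob_to_regex(pattern)
    return [p for p in paths if regex.fullmatch(p)]


if __name__ == "__main__":
    known = ["app.web.cpu", "app.db.cpu", "abc.x.metric", "abc.x.y.metric"]
    print(match_paths("app.*", known))         # no match: paths have 3 segments
    print(match_paths("abc.*.metric", known))  # matches only the 3-segment path
```

Because `*` is confined to one segment, the number of segments in the pattern fixes the depth of any match, which is why the queries above can filter on `pos`.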

Slide 32

Slide 32 text

Query engine

Slide 33

Slide 33 text

Query engine. 2-Step transformations: Inner; Outer; Cross metric

Slide 34

Slide 34 text

Paths: a.b.c1, a.b.c2
Query: scale(a.b.*, 10.0)
Input: {"a.b.c1" [1 2 3] "a.b.c2" [5 6 7]}
Output: {"a.b.c1" [10 20 30] "a.b.c2" [50 60 70]}
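The scale example maps every series to a series of the same name with each point multiplied by a constant. A minimal Python sketch of that behaviour, using a dict of lists in place of the map shown on the slide (the function name and the treatment of missing points as `None` are assumptions):

```python
def scale(series, factor):
    """Multiply every point of every series by `factor`, keeping series names.

    Missing points (None) are passed through unchanged.
    """
    return {
        name: [None if value is None else value * factor for value in values]
        for name, values in series.items()
    }


if __name__ == "__main__":
    data = {"a.b.c1": [1, 2, 3], "a.b.c2": [5, 6, 7]}
    print(scale(data, 10.0))
```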

Slide 35

Slide 35 text

Path: a.b.c
Query: derivative(a.b.c)
Input: {"a.b.c" [1 3 6]}
Output: {"derivative(a.b.c)" [nil 2 3]}
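The derivative example replaces each point with its difference from the previous point, so the first point has no value. A minimal Python sketch of that rule, with `None` standing in for the `nil` on the slide (the function name and the handling of gaps are assumptions):

```python
def derivative(values):
    """Return point-to-point differences; the first element is always None.

    A difference is also None when either neighbouring point is missing.
    """
    if not values:
        return []
    result = [None]
    for previous, current in zip(values, values[1:]):
        if previous is None or current is None:
            result.append(None)
        else:
            result.append(current - previous)
    return result


if __name__ == "__main__":
    print({"derivative(a.b.c)": derivative([1, 3, 6])})
```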

Slide 36

Slide 36 text

Paths: a.b.c1, a.b.c2
Query: sumSeries(a.b.c1,a.b.c2)
Input: {"a.b.c1" [1 1 1] "a.b.c2" [2 2 2]}
Output: {"sumSeries(a.b.c1,a.b.c2)" [3 3 3]}
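Unlike scale and derivative, sumSeries combines several series into one by adding the points at each position. A minimal Python sketch, assuming the series are already aligned and of equal length; the function name, the naming of the result, and the treatment of missing points are assumptions:

```python
def sum_series(series):
    """Add aligned series point by point and return a single named series.

    Missing points (None) are skipped; a position where every series is
    missing stays None.
    """
    name = "sumSeries({})".format(",".join(series))
    summed = []
    for column in zip(*series.values()):
        present = [value for value in column if value is not None]
        summed.append(sum(present) if present else None)
    return {name: summed}


if __name__ == "__main__":
    print(sum_series({"a.b.c1": [1, 1, 1], "a.b.c2": [2, 2, 2]}))
```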

Slide 37

Slide 37 text

Data model

Slide 38

Slide 38 text

CREATE TYPE IF NOT EXISTS metric_resolution (
  precision int,
  period int
);
CREATE TYPE IF NOT EXISTS metric_id (
  path text,
  resolution frozen<metric_resolution>
);

Slide 39

Slide 39 text

CREATE TYPE IF NOT EXISTS metric_point (
  max double,
  mean double,
  min double,
  sum double
);
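The metric_point type holds the four aggregates kept for each time window. A minimal Python sketch of how raw points could be rolled up in memory into such values, one per window of `precision` seconds; the function name, the (timestamp, value) input shape, and the window alignment are assumptions for illustration:

```python
from collections import defaultdict


def aggregate(points, precision):
    """Roll up (timestamp, value) pairs into per-window max/mean/min/sum.

    Each window starts at a multiple of `precision` seconds. The result maps
    window start time to a dict shaped like the metric_point type.
    """
    buckets = defaultdict(list)
    for timestamp, value in points:
        buckets[timestamp - timestamp % precision].append(value)
    return {
        start: {
            "max": max(values),
            "mean": sum(values) / len(values),
            "min": min(values),
            "sum": sum(values),
        }
        for start, values in sorted(buckets.items())
    }


if __name__ == "__main__":
    raw = [(100, 1.0), (104, 3.0), (111, 5.0)]
    for start, point in aggregate(raw, 10).items():
        print(start, point)
```

Keeping all four aggregates means a reader can later pick the one that suits the query, at the cost of the precision loss noted earlier for aggregated data.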

Slide 40

Slide 40 text

Scaling

Slide 41

Slide 41 text

Scaling Cyanite: Use DTCS; Cyanite is stateless; For high loads, split readers and writers; Load-balance with HAProxy; Colocate Cyanite & Cassandra

Slide 42

Slide 42 text

Coming soon

Slide 43

Slide 43 text

Coming soon: P-Square histograms; T-Digest quantiles; Gorilla Compression; Cassandra aggregates; Custom glob indexes; Scheduler

Slide 44

Slide 44 text

Coming soon: Kafka ingester; Statsd protocol support; Standalone cyanite; Dynamic thresholds
