Slide 1

Slide 1 text

The Evolution of a Real-Time Analytics Platform
Geoff Wagstaff, @TheDeveloper

Slide 2

Slide 2 text

The Now dashboard

Slide 3

Slide 3 text

The Trends dashboard

Slide 4

Slide 4 text

Building Real-Time Analytics: Behind the “Now” dashboard

Slide 5

Slide 5 text

Back in 2009
- 1 server
- LAMP stack
- Conventional hosting

Slide 6

Slide 6 text

LiveStats v1

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Meltdown!

Slide 9

Slide 9 text

Problem? First taste of scale: WRITES

Slide 10

Slide 10 text

Reads are easy to scale
(Diagram: the primary takes the writes; Replica 1, Replica 2 and Replica 3 each serve reads.)

Slide 11

Slide 11 text

Writes? Not so much. :(
(Diagram: MANY WRITES all hit the single primary; Replica 1, Replica 2 and Replica 3 still only serve reads.)

Slide 12

Slide 12 text

Scale Horizontally

Slide 13

Slide 13 text

(Diagram: three nodes, each taking requests. Stack: Nginx -> PHP-FPM <--> Memcache)

Slide 14

Slide 14 text

Problems

Slide 15

Slide 15 text

- Stupidly high data transfer: several TB per day
- DB -> app -> DB round trips
- High latency on DB ops
- Race conditions

Slide 16

Slide 16 text

Redis to the rescue! “Advanced in-memory key-value store”

Slide 17

Slide 17 text

Rich Data types

Slide 18

Slide 18 text

Rich Data types
- Keys: GET, SET
- Hashes: HGET, HSET, HMSET
- Lists: LPUSH, LPOP, BLPOP
- Sets: SADD, SREM, SMEMBERS
- Sorted Sets: ZADD, ZREM, ZRANGE, ZINTERSTORE

Slide 19

Slide 19 text

Solved concurrency problems
- Distributed locks
- Fast counters
- Fan-out Pub/Sub broadcast
- Message queues
(Diagram: three services sharing redis-1 and redis-2.)
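
The fast-counter and distributed-lock patterns on this slide can be sketched roughly as follows. This is an illustration, not the deck's actual code: the key names, the helper functions and the `InMemoryRedis` stand-in are all hypothetical. The stand-in only mirrors the redis-py method names (`incr`, `set` with `nx`/`px`, `get`, `delete`) so the sketch runs without a server.

```python
import uuid


class InMemoryRedis:
    """Hypothetical stand-in for a Redis client, so the sketch runs anywhere.

    Method names and arguments mirror redis-py; key expiry (px) is accepted
    but not simulated.
    """

    def __init__(self):
        self.data = {}

    def incr(self, name, amount=1):
        self.data[name] = int(self.data.get(name, 0)) + amount
        return self.data[name]

    def set(self, name, value, px=None, nx=False):
        if nx and name in self.data:
            return None  # key already exists, so SET ... NX fails
        self.data[name] = value
        return True

    def get(self, name):
        return self.data.get(name)

    def delete(self, name):
        return 1 if self.data.pop(name, None) is not None else 0


def count_pageview(client, site_id):
    # INCR is atomic on the server: no read-modify-write round trip,
    # so concurrent writers cannot lose an update.
    return client.incr(f"pageviews:{site_id}")


def acquire_lock(client, resource, ttl_ms=5000):
    # SET key token NX PX ttl succeeds only if nobody holds the lock.
    token = uuid.uuid4().hex
    if client.set(f"lock:{resource}", token, nx=True, px=ttl_ms):
        return token
    return None


def release_lock(client, resource, token):
    # Only the holder may release. Against a real server this check-and-delete
    # would need to run atomically (for example in a Lua script).
    if client.get(f"lock:{resource}") == token:
        client.delete(f"lock:{resource}")
        return True
    return False
```

With a real redis-py client, `get` returns bytes unless the client is created with `decode_responses=True`, so the token comparison would need adjusting.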

Slide 20

Slide 20 text

ACID

Slide 21

Slide 21 text

Atomic, Consistent, Isolated, Durable
Other ACID DBs: MySQL, MongoDB

Slide 22

Slide 22 text

Fast

Slide 23

Slide 23 text

Fast: Redis 2.6.16 on a 2.4GHz i7 MBP

Slide 24

Slide 24 text

Redis deployment
- Single-process, one per core
- Run on m1.medium: 1 core, 3.5GB memory
- Redis cluster is coming!
- Now on ElastiCache

Slide 25

Slide 25 text

Building Historical Analytics: Behind the “Trends” dashboard

Slide 26

Slide 26 text

Trends v1

Slide 27

Slide 27 text

Trends v1
- Sharded MySQL from the outset
- Aging
- Unreliable

Slide 28

Slide 28 text

The Trends dashboard

Slide 29

Slide 29 text

MongoDB vs Cassandra

Slide 30

Slide 30 text

MongoDB
- Document store: no schema, flexible
- Compelling replication & sharding features
- Fast in-place field updates, similar to Redis

Slide 31

Slide 31 text

Attempt #1: Store & aggregate
- Document for each list item, timestamp and site
- Aggregation framework: match, group, sort
- Collection per list type
✔ Flexible
✔ Made app simpler
X Huge number of documents
X Slow aggregate queries: ~1s+
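
To make the match, group, sort flow concrete, here is a small pure-Python sketch of the same three stages over one-document-per-item data. The field names (`site`, `ts`, `item`, `count`) and the sample documents are assumptions for illustration. It stands in for the aggregation framework rather than calling MongoDB.

```python
from collections import defaultdict

# Hypothetical documents: one per (list item, timestamp, site).
docs = [
    {"site": 1, "ts": 1000, "item": "/home", "count": 3},
    {"site": 1, "ts": 1060, "item": "/home", "count": 2},
    {"site": 1, "ts": 1060, "item": "/pricing", "count": 4},
    {"site": 2, "ts": 1000, "item": "/home", "count": 9},
]


def top_items(docs, site, start_ts, end_ts):
    """Return (item, total) pairs for one site and time range, largest first."""
    totals = defaultdict(int)
    for doc in docs:
        # match: keep only this site's documents inside the time range
        if doc["site"] == site and start_ts <= doc["ts"] < end_ts:
            # group: sum the counts per list item
            totals[doc["item"]] += doc["count"]
    # sort: highest total first
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Every query has to touch one document per item per timestamp, which is consistent with the slide's complaint about document counts and slow aggregates.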

Slide 32

Slide 32 text

Attempt #2
- Document per list, timestamp and site
- Collection per list type
✔ Faster lookups (no aggregation)
✔ Fewer documents
✔ Smaller _id
X Document size limit
X Unordered
X High data transfer
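
A rough sketch of the second layout, assuming each document embeds the whole list as an item-to-count map. The `collection` dict, the `items` field and both helpers are hypothetical and only model the shape of the data, not MongoDB itself.

```python
# Hypothetical collection: one document per (site, ts), holding the whole list.
collection = {}


def record_hit(site, ts, item, amount=1):
    """Bump one item's count inside the list document, creating it if needed."""
    doc = collection.setdefault((site, ts), {"_id": (site, ts), "items": {}})
    # Comparable to an in-place field increment on the embedded map.
    doc["items"][item] = doc["items"].get(item, 0) + amount


def read_list(site, ts):
    """Fetch the single list document and order it in the application."""
    doc = collection.get((site, ts))
    if doc is None:
        return []
    # The embedded map is unordered and arrives whole, so sorting (and any
    # truncation to a top N) happens after the full document is transferred.
    return sorted(doc["items"].items(), key=lambda kv: kv[1], reverse=True)
```

The lookup is a single fetch by `_id`, but the document grows with every new item, which is where the size-limit and data-transfer downsides come from.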

Slide 33

Slide 33 text

MongoStat

Slide 34

Slide 34 text

Downsides
- High random I/O
- Document size & relocation
- Fragmentation
- Database lock

Slide 35

Slide 35 text

K.O. MongoDB

Slide 36

Slide 36 text

Cassandra
- Distributed hash ring: masterless
- Linear scalability
- Built for scale + write throughput
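
The hash ring idea can be illustrated with a minimal consistent-hashing sketch. This is a simplification under stated assumptions, not Cassandra's implementation: the `HashRing` class is hypothetical, MD5 is used here only as a convenient stand-in hash, and replication is left out.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring: every node owns several tokens on the ring."""

    def __init__(self, nodes, vnodes=64):
        # Each node gets `vnodes` tokens so load spreads evenly.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _hash(key):
        # First 8 bytes of MD5 as an integer token (stand-in partitioner).
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def node_for(self, key):
        # Walk clockwise to the first token after the key's hash, wrapping
        # around at the end of the ring. Any node can compute this, so no
        # master is needed to route a request.
        idx = bisect.bisect_right(self._tokens, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

Adding a node only takes over the key ranges next to its new tokens, so existing keys either stay where they were or move to the new node. That property is what lets capacity grow by adding machines.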

Slide 37

Slide 37 text

CQL

Slide 38

Slide 38 text

CQL
SELECT sql AS cql FROM mysql WHERE query_language = “good”
- Not as scary as Column Families + Thrift
- SQL Schemas + Querying

Slide 39

Slide 39 text

CQL
CREATE TABLE d_aggregate_day (
  sid int,
  ts int,
  s text,
  v counter,
  PRIMARY KEY (sid, ts, s)
);
Partition key: sid. Clustering keys: ts, s.
Distributed counters!
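
As a rough model of how this table behaves, the sketch below keeps one partition per `sid` and orders rows inside it by the clustering columns `(ts, s)`. The `AggregateDayModel` class is hypothetical and purely in-memory. `INCREMENT_CQL` shows the relative-update form that counter columns require. The reading of the column names (site id, timestamp, list item) is an assumption.

```python
from collections import defaultdict

# Counter columns are changed with relative updates rather than plain inserts.
INCREMENT_CQL = (
    "UPDATE d_aggregate_day SET v = v + 1 "
    "WHERE sid = ? AND ts = ? AND s = ?"
)


class AggregateDayModel:
    """In-memory model of d_aggregate_day: partition by sid, cluster by (ts, s)."""

    def __init__(self):
        # sid -> {(ts, s): counter value}
        self.partitions = defaultdict(lambda: defaultdict(int))

    def increment(self, sid, ts, s, amount=1):
        # An increment on a missing row creates it, as counters do.
        self.partitions[sid][(ts, s)] += amount

    def select_range(self, sid, ts_from, ts_to):
        # One partition, a contiguous slice of clustering keys, returned
        # in clustering order.
        rows = self.partitions.get(sid, {})
        return [
            (ts, s, v)
            for (ts, s), v in sorted(rows.items())
            if ts_from <= ts <= ts_to
        ]
```

Because all rows for one `sid` live in the same partition, a time-range read for a site stays on one replica set and comes back already ordered by `ts`.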

Slide 40

Slide 40 text

BASE

Slide 41

Slide 41 text

Basically Available, Soft-state, Eventually consistent

Slide 42

Slide 42 text

- Eventual consistency isn’t a problem
- More efficient with the disk
- Low maintenance
- Cheap

Slide 43

Slide 43 text

Redis + Cassandra = win
- Redis as a speed layer + aggregator for lists
- Cassandra as timeseries counter storage
(Diagram: Collector -> Redis -> Cassandra, with periodic flushes to Cassandra.)
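
The collector, speed layer and periodic flush can be sketched as below. Everything here is an assumption made for illustration: the `SpeedLayer` class, the 60-second bucket size and the `sink` callback (which stands in for a counter write to the long-term store) are not taken from the talk.

```python
from collections import Counter


class SpeedLayer:
    """Buffers hits in memory and hands aggregated deltas to a sink on flush."""

    def __init__(self, sink, bucket_seconds=60):
        self.sink = sink  # callable(site_id, bucket, item, delta)
        self.bucket_seconds = bucket_seconds
        self.pending = Counter()

    def collect(self, site_id, item, ts):
        # Round the timestamp down to its bucket and bump the in-memory count.
        bucket = ts - ts % self.bucket_seconds
        self.pending[(site_id, bucket, item)] += 1

    def flush(self):
        # Swap the buffer first so hits arriving mid-flush land in a new one.
        batch, self.pending = self.pending, Counter()
        for (site_id, bucket, item), delta in batch.items():
            self.sink(site_id, bucket, item, delta)
        return len(batch)
```

Many hits collapse into one counter increment per key per flush, so the write load on the long-term store depends on the number of distinct keys rather than on raw traffic.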

Slide 44

Slide 44 text

- Exploit each DB's strengths
- Build an indestructible service
- Use the best tools for the job

Slide 45

Slide 45 text

Thanks!
Geoff Wagstaff, @TheDeveloper
engineering.gosquared.com