Evolution of a Real-Time Web Analytics Platform

A talk about the data stores in use at GoSquared, given at the AllYourBase conference.

Geoff Wagstaff

October 18, 2013

Transcript

  1. Stupidly high data transfer: several TB per day; DB -> app -> DB round trips; high latency on DB ops; race conditions
  2. Rich data types: keys (GET, SET); hashes (HGET, HSET, HMSET); lists (LPUSH, LPOP, BLPOP); sets (SADD, SREM, SRANGE); sorted sets (ZADD, ZREM, ZRANGE, ZINTERSTORE)
  3. Distributed locks; fast counters; fan-out (Pub/Sub broadcast); message queues. Solved concurrency problems. (Diagram: three services and two Redis instances, redis-1 and redis-2.)
  4. Redis deployment: single process, one per core; run on m1.medium (1 core, 3.5GB memory); Redis Cluster is coming! Now on ElastiCache
  5. MongoDB: document store (no schema, flexible); compelling replication & sharding features; fast in-place field updates, similar to Redis
  6. Attempt #1: store & aggregate. Document for each list item, timestamp and site; aggregation framework (match, group, sort); collection per list type. ✔ Flexible; ✔ made app simpler; X huge number of documents; X slow aggregate queries (~1s+)
  7. Attempt #2: document per list, timestamp and site; collection per list type. ✔ Faster lookups (no aggregation); ✔ fewer documents; ✔ smaller _id; X document size limit; X unordered; X high data transfer
  8. CQL
  9. CQL: SELECT sql AS cql FROM mysql WHERE query_language = “good”. Not as scary as Column Families + Thrift; SQL schemas + querying
  10. CQL: CREATE TABLE d_aggregate_day (sid int, ts int, s text, v counter, PRIMARY KEY (sid, ts, s)); sid is the partition key and (ts, s) the clustering key. Distributed counters!
  11. Redis + Cassandra = win: Redis as a speed layer + aggregator for lists; Cassandra as timeseries counter storage. (Diagram: Collector -> Redis -> Cassandra, with periodic flushes to Cassandra.)
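Slide 1 lists race conditions caused by DB -> app -> DB round trips. Slide 3 credits Redis's fast counters with solving these concurrency problems. This is a minimal illustration, assuming two clients bump the same counter. A read-modify-write done in the app loses an update when the two reads interleave. A single server-side increment does not lose it. Redis's INCR behaves this way because Redis executes one command at a time. `FakeStore` is a hypothetical in-memory stand-in, not a Redis client.

```python
class FakeStore:
    """Hypothetical in-memory stand-in for a key-value store (not a real client)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key, 0)

    def set(self, key, value):
        self.data[key] = value

    def incr(self, key, amount=1):
        # One server-side operation: nothing can run between the read and the
        # write, which is the guarantee INCR gives in Redis.
        self.data[key] = self.data.get(key, 0) + amount
        return self.data[key]


def lost_update_demo():
    store = FakeStore()
    # Both clients read, modify in the app, then write back (a round trip each).
    a = store.get("pageviews")      # client A reads 0
    b = store.get("pageviews")      # client B also reads 0 before A writes
    store.set("pageviews", a + 1)
    store.set("pageviews", b + 1)   # overwrites A's write
    return store.get("pageviews")


def atomic_demo():
    store = FakeStore()
    store.incr("pageviews")  # client A
    store.incr("pageviews")  # client B
    return store.get("pageviews")


print(lost_update_demo())  # 1: one of the two increments was lost
print(atomic_demo())       # 2
```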
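Slide 2 groups the Redis commands by data type. The sketch below is a rough model of the sorted-set commands, written as a hypothetical in-memory `SortedSet` class. It supports three operations:
- ZADD-style inserts.
- ZRANGE-style inclusive ranges ordered by score, with ties broken by member name as Redis does.
- A ZINTERSTORE-style intersection that sums scores, which is Redis's default aggregate.

The page paths and counts are made-up data. Real code would send these commands to Redis through a client library.

```python
class SortedSet:
    """Hypothetical in-memory model of one Redis sorted set (member -> score)."""

    def __init__(self):
        self.scores = {}

    def zadd(self, score, member):
        # Like ZADD: returns 1 for a new member, 0 if only the score changed.
        is_new = member not in self.scores
        self.scores[member] = score
        return int(is_new)

    def zrem(self, member):
        # Like ZREM: returns the number of members removed (0 or 1 here).
        if member in self.scores:
            del self.scores[member]
            return 1
        return 0

    def zrange(self, start, stop):
        # Like ZRANGE: ascending by score, ties ordered by member, inclusive
        # indices, negative indices counted from the end (-1 = last).
        ordered = sorted(self.scores, key=lambda m: (self.scores[m], m))
        n = len(ordered)
        start = start + n if start < 0 else start
        stop = stop + n if stop < 0 else stop
        start = max(start, 0)
        if stop < 0 or start > stop:
            return []
        return ordered[start:stop + 1]


def zinterstore(*sets):
    # Like ZINTERSTORE with default weights and AGGREGATE SUM: keep members
    # present in every input and add up their scores.
    common = set(sets[0].scores).intersection(*(s.scores for s in sets[1:]))
    result = SortedSet()
    for member in common:
        result.zadd(sum(s.scores[member] for s in sets), member)
    return result


# Hypothetical per-day page counts.
day1, day2 = SortedSet(), SortedSet()
for score, page in [(5, "/home"), (3, "/pricing"), (1, "/blog")]:
    day1.zadd(score, page)
for score, page in [(2, "/home"), (6, "/pricing"), (7, "/docs")]:
    day2.zadd(score, page)

both = zinterstore(day1, day2)
print(both.zrange(0, -1))  # ['/home', '/pricing'] (scores 7 and 9)
```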
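Slide 3 lists distributed locks among Redis's uses but does not say how they were built. One widely documented single-instance pattern works in two steps:
- Acquire with SET key token NX PX ttl.
- Release with a compare-and-delete, usually a Lua script run with EVAL so the check and the delete are atomic.

The sketch below applies that pattern to a hypothetical in-memory `FakeRedis` with a manually advanced clock. `acquire`, `release` and the `lock:` key prefix are illustrative names, not GoSquared's code.

```python
import uuid


class FakeRedis:
    """Hypothetical in-memory stand-in for the two Redis operations a lock needs."""

    def __init__(self):
        self.data = {}  # key -> (value, expires_at_ms)
        self.now = 0    # manually advanced clock, in milliseconds

    def _get_live(self, key):
        entry = self.data.get(key)
        if entry is not None and entry[1] <= self.now:
            del self.data[key]  # the key's TTL has run out
            return None
        return entry

    def set_nx_px(self, key, value, ttl_ms):
        # Like SET key value NX PX ttl_ms: only succeeds if the key is absent.
        if self._get_live(key) is not None:
            return False
        self.data[key] = (value, self.now + ttl_ms)
        return True

    def delete_if_equals(self, key, value):
        # Compare-and-delete; in real Redis this must be atomic (e.g. a Lua script).
        entry = self._get_live(key)
        if entry is not None and entry[0] == value:
            del self.data[key]
            return True
        return False


def acquire(r, resource, ttl_ms=10_000):
    # A random token identifies the holder, so nobody releases another's lock.
    token = str(uuid.uuid4())
    return token if r.set_nx_px(f"lock:{resource}", token, ttl_ms) else None


def release(r, resource, token):
    return r.delete_if_equals(f"lock:{resource}", token)


r = FakeRedis()
t1 = acquire(r, "site-42")                  # service A gets the lock
t2 = acquire(r, "site-42")                  # service B is refused: None
r.now += 10_000                             # A stalls past the TTL; the lock expires
t3 = acquire(r, "site-42")                  # B can now take it
released_stale = release(r, "site-42", t1)  # False: A no longer holds the lock
released_own = release(r, "site-42", t3)    # True
```

The TTL keeps a crashed holder from blocking everyone forever. The token check stops a holder whose lock already expired from deleting the next holder's lock.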
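Slide 4 notes that Redis is single-process, so it was run one per core, and slide 3's diagram shows two instances. Spreading keys over several instances needs a routing rule. Redis Cluster, the feature slide 4 says is coming, maps every key to one of 16384 hash slots using CRC16(key) mod 16384. When a key contains a `{...}` hash tag, only the tag is hashed. Below is a sketch of that mapping used for client-side routing. The even split of slots between `redis-1` and `redis-2` is a hypothetical assignment.

```python
import binascii

NODES = ["redis-1", "redis-2"]  # instance names from slide 3
NUM_SLOTS = 16384                # number of hash slots in Redis Cluster


def hash_slot(key: str) -> int:
    # If the key has a non-empty {...} section, hash only that "hash tag",
    # so related keys (e.g. all keys for one site) share a slot.
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    # binascii.crc_hqx with an initial value of 0 computes CRC16-CCITT (XMODEM),
    # the CRC16 variant Redis Cluster uses.
    return binascii.crc_hqx(key.encode(), 0) % NUM_SLOTS


def node_for(key: str) -> str:
    # Hypothetical static layout: each node owns an equal, contiguous slot range.
    return NODES[hash_slot(key) * len(NODES) // NUM_SLOTS]


for k in ["{site:42}:pageviews", "{site:42}:visitors", "{site:7}:pageviews"]:
    print(k, hash_slot(k), node_for(k))
```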
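Slide 6's first attempt stored one document per list item and computed lists with the aggregation framework's match, group and sort stages. The sketch shows such a pipeline twice:
- As the list of stage documents that pymongo's `collection.aggregate()` accepts.
- As a plain-Python equivalent that shows the work done on every query: each matching document is scanned and counted.

That per-query scan fits the slide's huge document counts and ~1s+ queries. The field names `site`, `ts` and `item` are assumptions.

```python
from collections import Counter


def top_items_pipeline(site, start, end):
    # Pipeline as it could be passed to pymongo's collection.aggregate().
    # Field names (site, ts, item) are hypothetical.
    return [
        {"$match": {"site": site, "ts": {"$gte": start, "$lt": end}}},
        {"$group": {"_id": "$item", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
    ]


def top_items_in_memory(docs, site, start, end):
    # Same computation in plain Python: scan the matching documents,
    # count them per item, sort by count descending.
    matched = (d for d in docs if d["site"] == site and start <= d["ts"] < end)
    counts = Counter(d["item"] for d in matched)
    # Ties are broken by item name here; MongoDB's $sort makes no promise for ties.
    return [{"_id": item, "count": n}
            for item, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


docs = [  # one document per list item, timestamp and site (made-up data)
    {"site": 42, "ts": 100, "item": "/home"},
    {"site": 42, "ts": 100, "item": "/pricing"},
    {"site": 42, "ts": 160, "item": "/home"},
    {"site": 7, "ts": 160, "item": "/home"},
]
print(top_items_in_memory(docs, 42, 0, 200))
# [{'_id': '/home', 'count': 2}, {'_id': '/pricing', 'count': 1}]
```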
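Slide 7's second attempt keeps one document per list, timestamp and site, so a read is a single lookup instead of an aggregation. This is a hedged sketch that assumes a hypothetical document shape `{"_id": "<site>:<ts>", "items": {<item>: <count>}}`. The document is updated with an upserted `$inc`, the in-place field update slide 5 praises.

The counts inside the document have no sort order. A top-N read therefore fetches the whole document and sorts it on the client, which is where the slide's "unordered" and "high data transfer" costs come from. The whole list must also fit under MongoDB's 16 MB document size limit.

```python
def list_doc_id(site, ts):
    # Hypothetical compact _id combining site and timestamp ("smaller _id").
    return f"{site}:{ts}"


def inc_update(item, amount=1):
    # Update document as it could be passed to pymongo's
    # collection.update_one({"_id": ...}, update, upsert=True).
    # An item containing "." would be read as a nested path, so real keys need escaping.
    return {"$inc": {f"items.{item}": amount}}


def apply_inc_in_memory(collection, doc_id, item, amount=1):
    # Plain-Python equivalent of the upserted $inc above.
    doc = collection.setdefault(doc_id, {"_id": doc_id, "items": {}})
    doc["items"][item] = doc["items"].get(item, 0) + amount


def top_n(collection, doc_id, n):
    # The stored map has no count order, so the client pulls the whole
    # document and sorts it itself.
    items = collection.get(doc_id, {"items": {}})["items"]
    return sorted(items.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


lists = {}
for path in ["/home", "/pricing", "/home", "/docs", "/home", "/pricing"]:
    apply_inc_in_memory(lists, list_doc_id(42, 100), path)
print(top_n(lists, list_doc_id(42, 100), 2))  # [('/home', 3), ('/pricing', 2)]
```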
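Slide 10's table puts `sid` in the partition key and `ts, s` in the clustering key, with `v` as a counter. The rough in-memory model below shows what that layout means:
- Each site is one partition.
- Rows inside a partition are ordered by `(ts, s)`.
- Counters are only ever incremented.
- A day's list for one site is a single-partition range read.

The class and the sample data are hypothetical. The CQL in the comments targets the slide's table.

```python
from collections import defaultdict

# CQL against slide 10's table that this model imitates:
#   UPDATE d_aggregate_day SET v = v + 1 WHERE sid = ? AND ts = ? AND s = ?;
#   SELECT ts, s, v FROM d_aggregate_day WHERE sid = ? AND ts >= ? AND ts < ?;


class AggregateDayModel:
    """Hypothetical in-memory model of d_aggregate_day's layout (not a driver)."""

    def __init__(self):
        # Partition key sid -> {clustering key (ts, s) -> counter v}.
        # Cassandra uses the partition key to choose replica nodes and keeps the
        # rows of a partition sorted by clustering key; this model sorts on read.
        self.partitions = defaultdict(dict)

    def increment(self, sid, ts, s, delta=1):
        # Counter columns are only ever incremented or decremented, never set.
        rows = self.partitions[sid]
        rows[(ts, s)] = rows.get((ts, s), 0) + delta

    def select_range(self, sid, ts_from, ts_to):
        # One partition, range on the first clustering column, clustering order.
        rows = self.partitions.get(sid, {})
        return [(ts, s, v) for (ts, s), v in sorted(rows.items()) if ts_from <= ts < ts_to]


table = AggregateDayModel()
day = 1381968000  # hypothetical day bucket (Unix seconds at midnight UTC)
for page in ["/home", "/home", "/pricing"]:
    table.increment(42, day, page)
table.increment(42, day + 86400, "/home")
table.increment(7, day, "/home")
print(table.select_range(42, day, day + 86400))
# [(1381968000, '/home', 2), (1381968000, '/pricing', 1)]
```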
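Slide 11 combines the two stores. Redis absorbs every hit as a cheap increment, and the aggregated deltas are periodically flushed into Cassandra's counters. The slide does not show the flush itself, so the sketch below assumes one plausible approach. It renames the live Redis hash so new hits go to a fresh key, then adds each accumulated delta to the matching Cassandra counter.

`FakeRedisHashes`, the `agg:` key names and the `defaultdict` standing in for `d_aggregate_day` are all hypothetical. A real job would also need to handle a crash between the rename and the final delete.

```python
from collections import defaultdict


class FakeRedisHashes:
    """Hypothetical in-memory stand-in for the Redis hash/key commands used below."""

    def __init__(self):
        self.hashes = {}

    def hincrby(self, key, field, amount=1):  # HINCRBY
        h = self.hashes.setdefault(key, {})
        h[field] = h.get(field, 0) + amount
        return h[field]

    def exists(self, key):  # EXISTS
        return key in self.hashes

    def rename(self, src, dst):  # RENAME (overwrites dst)
        self.hashes[dst] = self.hashes.pop(src)

    def hgetall(self, key):  # HGETALL
        return dict(self.hashes.get(key, {}))

    def delete(self, key):  # DEL
        self.hashes.pop(key, None)


def record_hit(redis, sid, ts, item):
    # Collector path: one cheap increment per hit, nothing written to Cassandra yet.
    redis.hincrby(f"agg:{sid}:{ts}", item)


def flush(redis, counters, sid, ts):
    # Periodic job: move the live hash aside so new hits start a fresh one,
    # then add the accumulated deltas to the Cassandra counters.
    live, snapshot = f"agg:{sid}:{ts}", f"agg:{sid}:{ts}:flushing"
    if not redis.exists(live):
        return 0
    redis.rename(live, snapshot)
    deltas = redis.hgetall(snapshot)
    for item, delta in deltas.items():
        # Real write: UPDATE d_aggregate_day SET v = v + ? WHERE sid = ? AND ts = ? AND s = ?
        counters[(sid, ts, item)] += delta
    redis.delete(snapshot)
    return len(deltas)


redis = FakeRedisHashes()
counters = defaultdict(int)  # stand-in for d_aggregate_day: (sid, ts, s) -> v
flushed = []
for item in ["/home", "/home", "/pricing"]:
    record_hit(redis, 42, 100, item)
flushed.append(flush(redis, counters, 42, 100))
record_hit(redis, 42, 100, "/home")
flushed.append(flush(redis, counters, 42, 100))
print(dict(counters))  # {(42, 100, '/home'): 3, (42, 100, '/pricing'): 1}
```

Batching this way turns many per-hit writes into one counter update per item per flush. The cost is that Cassandra lags Redis by up to one flush interval.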