Evolution of a Real-Time Web Analytics Platform

A talk on the data stores in use at GoSquared, given at the AllYourBase conference.

Geoff Wagstaff

October 18, 2013

Transcript

  1. The Evolution of a Real-Time Analytics Platform Geoff Wagstaff @TheDeveloper

  2. The Now dashboard

  3. The Trends dashboard

  4. Building Real-Time Analytics Behind the “Now” dashboard

  5. Back in 2009: 1 server, LAMP stack, conventional hosting

  6. LiveStats v1

  7. (no text on this slide)

  8. Meltdown!

  9. Problem? First taste of scale: WRITES

  10. Reads are easy to scale. [Diagram: writes go to the primary; reads are spread across Replica 1, Replica 2 and Replica 3.]

  11. Writes? Not so much. :( [Diagram: MANY WRITES! all land on the single primary; reads still go to Replica 1, Replica 2 and Replica 3.]

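The asymmetry on the two slides above can be sketched in a few lines. This is a minimal illustration, not GoSquared's code: the `ReplicatedStore` class and its method names are hypothetical, and replication is modelled as an immediate in-memory copy. Reads rotate over the replicas, so adding replicas adds read capacity; every write still has to pass through the one primary.

```python
from itertools import cycle


class ReplicatedStore:
    """Toy primary/replica setup (hypothetical names, in-memory only)."""

    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = cycle(range(replica_count))
        self.primary_writes = 0
        self.replica_reads = [0] * replica_count

    def write(self, key, value):
        # Every write lands on the single primary, however many replicas exist.
        self.primary[key] = value
        self.primary_writes += 1
        # Replication is modelled as an immediate copy; real replication lags.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Reads are spread round-robin, so each replica serves a share.
        index = next(self._next_replica)
        self.replica_reads[index] += 1
        return self.replicas[index].get(key)


store = ReplicatedStore(replica_count=3)
for n in range(300):
    store.write("visitors", n)
for _ in range(300):
    store.read("visitors")

print(store.primary_writes)  # writes handled by the one primary
print(store.replica_reads)   # reads handled by each replica
```

Adding a fourth replica would lower each replica's share of reads, but the primary's write count would not change.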
  12. Scale Horizontally

  13. [Diagram: three nodes, each receiving requests: Nginx -> PHP-FPM <--> Memcache]

  14. Problems

  15. Stupidly high data transfer: several TB per day. DB -> app -> DB round trips. High latency on DB ops. Race conditions.

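The race condition that comes with DB -> app -> DB round trips can be reproduced deterministically. The sketch below is an assumption about the failure mode rather than GoSquared's actual code: two requests read the same counter, each adds one in the application, and each writes its result back, so one hit is lost. An increment applied inside the store has no such window. The `CounterStore` class is hypothetical.

```python
class CounterStore:
    """In-memory stand-in for a database table of counters (hypothetical)."""

    def __init__(self):
        self.rows = {}

    def get(self, key):
        return self.rows.get(key, 0)

    def set(self, key, value):
        self.rows[key] = value

    def incr(self, key, amount=1):
        # The read and the write happen in one step inside the store.
        self.rows[key] = self.rows.get(key, 0) + amount
        return self.rows[key]


# Read-modify-write in the app: both requests read before either writes.
db = CounterStore()
seen_by_a = db.get("pageviews")      # request A reads 0
seen_by_b = db.get("pageviews")      # request B reads 0
db.set("pageviews", seen_by_a + 1)   # A writes 1
db.set("pageviews", seen_by_b + 1)   # B also writes 1, so A's hit is lost
lost_update_total = db.get("pageviews")

# Increment inside the store: same two hits, no gap between read and write.
db2 = CounterStore()
db2.incr("pageviews")
db2.incr("pageviews")
atomic_total = db2.get("pageviews")

print(lost_update_total, atomic_total)  # prints "1 2"
```

The second form also needs one trip to the store per hit instead of two, which is the data-transfer half of the same problem.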
  16. Redis to the rescue! “Advanced in-memory key-value store”

  17. Rich Data types

  18. Rich Data types. Keys: GET, SET. Hashes: HGET, HSET, HMSET. Lists: LPUSH, LPOP, BLPOP. Sets: SADD, SREM, SMEMBERS. Sorted Sets: ZADD, ZREM, ZRANGE, ZINTERSTORE.

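As an illustration of why sorted sets suit ranked lists such as "top pages", here is a small in-memory model of two sorted-set operations. It is not Redis and not a Redis client: `MiniSortedSet` is a hypothetical stand-in that mimics the behaviour of ZINCRBY and ZREVRANGE (both real Redis commands, though not listed on the slide) so the example runs without a server. Only non-negative indexes are supported.

```python
class MiniSortedSet:
    """Hypothetical in-memory model of a Redis sorted set."""

    def __init__(self):
        self._scores = {}

    def zincrby(self, amount, member):
        # Like ZINCRBY: add to the member's score, starting from 0 if absent.
        self._scores[member] = self._scores.get(member, 0) + amount
        return self._scores[member]

    def zrevrange(self, start, stop):
        # Like ZREVRANGE with an inclusive stop: highest scores first.
        # Equal scores are ordered by member, descending.
        ranked = sorted(
            self._scores.items(),
            key=lambda pair: (pair[1], pair[0]),
            reverse=True,
        )
        return [member for member, _ in ranked][start:stop + 1]


top_pages = MiniSortedSet()
for path in ["/", "/pricing", "/", "/blog", "/", "/pricing"]:
    top_pages.zincrby(1, path)

print(top_pages.zrevrange(0, 1))  # the two most-viewed paths
```

The ranking is maintained as a side effect of counting, so reading the top of the list does not require scanning or sorting in the application.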
  19. Distributed locks. Fast counters. Fan-out Pub/Sub broadcast. Message queues. Solved concurrency problems. [Diagram: three services talking to redis-1 and redis-2.]

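The slide does not show how the distributed locks were built, so the following is only a sketch of one common pattern, under stated assumptions: set a key only if it is absent, with an expiry so a crashed holder cannot block others forever. `MiniRedis`, its method names, and the `lock:rollup` key are hypothetical; the class models the behaviour of `SET key value NX EX ttl` in memory with a manual clock so the example is deterministic.

```python
class MiniRedis:
    """Hypothetical in-memory stand-in for the few behaviours used here."""

    def __init__(self):
        self._data = {}   # key -> (value, expires_at)
        self.now = 0.0    # manual clock, advanced by hand in the example

    def _live_value(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.now >= expires_at:
            del self._data[key]   # expired keys behave as if never set
            return None
        return value

    def set_if_absent(self, key, value, ttl):
        # Models SET key value NX EX ttl: succeeds only if the key is unset.
        if self._live_value(key) is not None:
            return False
        self._data[key] = (value, self.now + ttl)
        return True

    def delete_if_equal(self, key, value):
        # Only the current holder may release. Against a real server this
        # check-and-delete must itself be atomic, e.g. a server-side script.
        if self._live_value(key) == value:
            del self._data[key]
            return True
        return False


r = MiniRedis()
got_a = r.set_if_absent("lock:rollup", "service-a", ttl=30)    # A takes the lock
got_b = r.set_if_absent("lock:rollup", "service-b", ttl=30)    # B is refused
bad_release = r.delete_if_equal("lock:rollup", "service-b")    # B cannot release A's lock
r.now = 31.0                                                   # A crashed; the lock expires
got_b_later = r.set_if_absent("lock:rollup", "service-b", ttl=30)

print(got_a, got_b, bad_release, got_b_later)
```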
  20. ACID

  21. Atomic, Consistent, Isolated, Durable. Other ACID DBs: MySQL, MongoDB

  22. Fast

  23. Fast Redis 2.6.16 on 2.4GHz i7 MBP

  24. Redis deployment: single-process, one per core. Run on m1.medium (1 core, 3.5GB memory). Redis Cluster is coming! Now on ElastiCache.

  25. Behind the “Trends” dashboard Building Historical Analytics

  26. Trends v1

  27. Trends v1: sharded MySQL from the outset. Aging. Unreliable.

  28. The Trends dashboard

  29. MongoDB vs Cassandra

  30. MongoDB. Document store: no schema, flexible. Compelling replication & sharding features. Fast in-place field updates, similar to Redis.

  31. Attempt #1: Store & aggregate. Document for each list item, timestamp and site. Aggregation framework: match, group, sort. Collection per list type. ✔ Flexible. ✔ Made app simpler. X Huge number of documents. X Slow aggregate queries: ~1s+.

  32. Attempt #2: Document per list, timestamp and site. Collection per list type. ✔ Faster lookups (no aggregation). ✔ Fewer documents. ✔ Smaller _id. X Document size limit. X Unordered. X High data transfer.

  33. MongoStat

  34. Downsides: high random I/O, document size & relocation, fragmentation, database lock

  35. K.O. MongoDB

  36. Cassandra. Distributed hash ring: masterless. Linear scalability. Built for scale + write throughput.

  37. CQL

  38. CQL: SELECT sql AS cql FROM mysql WHERE query_language = “good”. Not as scary as Column Families + Thrift. SQL schemas + querying.

  39. CQL: CREATE TABLE d_aggregate_day ( sid int, ts int, s text, v counter, PRIMARY KEY (sid, ts, s) ); Partition key: sid. Clustering key: ts, s. Distributed counters!

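How the primary key on the slide above shapes reads and writes can be sketched with a plain-Python model. This is not a Cassandra client: the `CounterTable` class is hypothetical, and the reading of the column names (`sid` as a site id, `ts` as a timestamp, `s` as the list item string) is a guess from context. The first key column, `sid`, selects the partition; the remaining columns, `ts` and `s`, order the rows inside it, which is what makes a time-range read within one site cheap. The CQL in the comments shows the general shape of a counter update and a range read.

```python
from collections import defaultdict


class CounterTable:
    """Hypothetical in-memory model of d_aggregate_day."""

    def __init__(self):
        # partition key (sid) -> {clustering key (ts, s) -> counter value v}
        self._partitions = defaultdict(dict)

    def increment(self, sid, ts, s, amount=1):
        # Models: UPDATE d_aggregate_day SET v = v + ?
        #         WHERE sid = ? AND ts = ? AND s = ?
        rows = self._partitions[sid]
        rows[(ts, s)] = rows.get((ts, s), 0) + amount

    def read_range(self, sid, ts_from, ts_to):
        # Models: SELECT ts, s, v FROM d_aggregate_day
        #         WHERE sid = ? AND ts >= ? AND ts <= ?
        # Rows come back in clustering order: by ts, then by s.
        rows = self._partitions.get(sid, {})
        return [
            (ts, s, v)
            for (ts, s), v in sorted(rows.items())
            if ts_from <= ts <= ts_to
        ]


table = CounterTable()
table.increment(1, 20131017, "/pricing")
table.increment(1, 20131018, "/")
table.increment(1, 20131018, "/", amount=2)
table.increment(2, 20131018, "/")   # a different site, so a different partition

print(table.read_range(1, 20131017, 20131018))
```

Counters are only ever incremented or decremented, never read and rewritten by the client, so concurrent writers do not need to coordinate.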
  40. BASE

  41. Basically Available, Soft-state, Eventually consistent

  42. Eventual consistency isn’t a problem. More efficient with the disk. Low maintenance. Cheap.

  43. Redis + Cassandra = win. Redis as a speed layer + aggregator for lists. Cassandra as timeseries counter storage. [Diagram: Collector -> Redis -> Cassandra, with periodic flushes to Cassandra.]

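The collector -> Redis -> Cassandra flow can be sketched as follows. The slides only state that Redis acts as a speed layer with periodic flushes to Cassandra; the mechanism shown here (accumulate deltas, drain them, apply one counter write per key) is an assumption, and `SpeedLayer`, `CounterStorage` and `flush` are hypothetical in-memory stand-ins rather than real clients.

```python
from collections import Counter


class SpeedLayer:
    """Stand-in for Redis: absorbs every hit in memory (hypothetical)."""

    def __init__(self):
        self.pending = Counter()

    def record(self, sid, ts, item):
        self.pending[(sid, ts, item)] += 1

    def drain(self):
        # Hand over the accumulated deltas and start again from zero.
        drained, self.pending = self.pending, Counter()
        return drained


class CounterStorage:
    """Stand-in for Cassandra: long-term counters (hypothetical)."""

    def __init__(self):
        self.totals = Counter()
        self.write_count = 0

    def add(self, key, delta):
        self.totals[key] += delta
        self.write_count += 1


def flush(speed_layer, storage):
    # One storage write per distinct key, however many hits it absorbed.
    for key, delta in speed_layer.drain().items():
        storage.add(key, delta)


redis_like = SpeedLayer()
cassandra_like = CounterStorage()

for _ in range(1000):
    redis_like.record(1, 20131018, "/")
for _ in range(250):
    redis_like.record(1, 20131018, "/pricing")

flush(redis_like, cassandra_like)   # in a real service this would run on a timer

print(cassandra_like.write_count)   # storage writes needed for 1250 hits
print(cassandra_like.totals[(1, 20131018, "/")])
```

Because each flush sends deltas rather than absolute values, the storage side only needs counter increments, which matches the counter column on the CQL slide.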
  44. Exploit DBs' strengths. Build an indestructible service. Use the best tools for the job.

  45. Thanks! Geoff Wagstaff @TheDeveloper engineering.gosquared.com