Evolution of a Real-Time Web Analytics Platform

A talk on the data stores in use at GoSquared, given at the AllYourBase conference.

Geoff Wagstaff

October 18, 2013

Transcript

  1. The Evolution of a Real-Time Analytics Platform
    Geoff Wagstaff
    @TheDeveloper

  2. The Now dashboard

  3. The Trends dashboard

  4. Building Real-Time Analytics
    Behind the “Now” dashboard

  5. Back in 2009
    1 server
    LAMP stack
    Conventional hosting

  6. LiveStats v1

  8. Meltdown!

  9. Problem?
    First taste of scale
    WRITES

  10. Reads are easy to scale
    Primary: Writes
    Replica 1: Reads
    Replica 2: Reads
    Replica 3: Reads

  11. Writes? Not so much.
    Primary: MANY WRITES! :(
    Replica 1: Reads
    Replica 2: Reads
    Replica 3: Reads
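
The topology on the last two slides can be sketched as a toy Python model. It assumes one primary and three replicas that each replay every write. The `Cluster` class and its fields are hypothetical, and this is an illustration rather than GoSquared's actual setup. It suggests why extra replicas spread the read load while every node still has to apply every write.

```python
from itertools import cycle


class Cluster:
    """Toy primary/replica model: counts the work each node performs."""

    def __init__(self, replica_count):
        self.primary_writes = 0
        self.replica_applied = [0] * replica_count  # replicated writes applied
        self.replica_reads = [0] * replica_count
        self._next_replica = cycle(range(replica_count))

    def write(self):
        # Every write lands on the primary and is then replayed on each replica.
        self.primary_writes += 1
        for i in range(len(self.replica_applied)):
            self.replica_applied[i] += 1

    def read(self):
        # Reads are spread round-robin across the replicas.
        self.replica_reads[next(self._next_replica)] += 1


cluster = Cluster(replica_count=3)
for _ in range(900):
    cluster.read()
for _ in range(900):
    cluster.write()

print(cluster.replica_reads)    # reads handled by each replica
print(cluster.primary_writes)   # writes handled by the primary
print(cluster.replica_applied)  # writes each replica still had to apply
```

In this model each replica serves a third of the reads, but the write count per node does not fall as replicas are added.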

  12. Scale Horizontally

  13. Requests -> Node, Node, Node
    Nginx -> PHP-FPM; Memcache

  14. Problems

  15. Stupidly high data transfer: several TB per day
    DB -> app -> DB round trips
    High latency on DB ops
    Race conditions

  16. Redis to the rescue!
    “Advanced in-memory key-value store”

  17. Rich Data types

  18. Rich Data types
    Keys: GET, SET
    Hashes: HGET, HSET, HMSET
    Lists: LPUSH, LPOP, BLPOP
    Sets: SADD, SREM, SMEMBERS
    Sorted Sets: ZADD, ZREM, ZRANGE, ZINTERSTORE

  19. Solved concurrency problems
    Distributed locks
    Fast counters
    Fan-out Pub/Sub broadcast
    Message queues
    Service, Service, Service -> redis-1, redis-2
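
To illustrate the race conditions mentioned earlier and why server-side counters help, here is a small Python sketch. The `Store` class is a hypothetical in-memory stand-in, not a Redis client. Its `incr` method only mirrors the semantics of the Redis INCR command, where the read-modify-write happens inside the store as one step.

```python
class Store:
    """Hypothetical in-memory stand-in for a key-value store."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key, 0)

    def set(self, key, value):
        self._data[key] = value

    def incr(self, key):
        # Mirrors Redis INCR semantics: the increment is a single step
        # inside the store, so no other client can interleave with it.
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]


store = Store()

# Round-trip pattern (DB -> app -> DB): both clients read before either writes.
seen_by_a = store.get("pageviews")
seen_by_b = store.get("pageviews")
store.set("pageviews", seen_by_a + 1)
store.set("pageviews", seen_by_b + 1)
lost_update_total = store.get("pageviews")  # one of the two increments is lost

# Atomic pattern: each client sends a single increment command.
store.incr("visitors")
store.incr("visitors")
atomic_total = store.get("visitors")

print(lost_update_total, atomic_total)
```

The round-trip version ends with a count of 1 after two increments, while the atomic version ends with 2.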

  20. ACID

  21. Atomic
    Consistent
    Isolated
    Durable
    Other ACID DBs: MySQL, MongoDB

  22. Fast

  23. Fast
    Redis 2.6.16 on 2.4GHz i7 MBP

  24. Redis deployment
    Single-process, one per core
    Run on m1.medium - 1 core, 3.5GB memory
    Redis cluster is coming!
    Now on ElastiCache

  25. Building Historical Analytics
    Behind the “Trends” dashboard

  26. Trends v1

  27. Trends v1
    Sharded MySQL from outset
    Aging
    Unreliable

  28. The Trends dashboard

  29. MongoDB vs Cassandra

  30. MongoDB
    Document store: no schema, flexible
    Compelling replication & sharding features
    Fast in-place field updates similar to Redis

  31. Attempt #1: Store & aggregate
    Document for each list item, timestamp and site
    Aggregation framework: match, group, sort
    Collection per list type
    Pros: flexible; made app simpler
    Cons: huge number of documents; slow aggregate queries (~1s+)
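
One way to picture this first layout is the Python sketch below: one document per list item, timestamp and site, summed at query time in match, group and sort steps. The field names (`site`, `ts`, `item`, `count`) and all values are invented for illustration, and plain Python stands in for MongoDB's aggregation framework.

```python
from collections import Counter

# One document per (site, timestamp, list item); all values are invented.
docs = [
    {"site": 1, "ts": 100, "item": "/home", "count": 5},
    {"site": 1, "ts": 100, "item": "/pricing", "count": 2},
    {"site": 1, "ts": 160, "item": "/home", "count": 4},
    {"site": 1, "ts": 160, "item": "/blog", "count": 7},
    {"site": 2, "ts": 100, "item": "/home", "count": 9},
]


def top_items(documents, site, start_ts, end_ts):
    """Match by site and time range, group by item, sort by total descending."""
    totals = Counter()
    for doc in documents:
        # match
        if doc["site"] == site and start_ts <= doc["ts"] <= end_ts:
            # group
            totals[doc["item"]] += doc["count"]
    # sort
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


result = top_items(docs, site=1, start_ts=100, end_ts=160)
print(result)
```

Every query has to visit each matching document, which fits the slide's note that the document count was huge and aggregate queries were slow.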

  32. Attempt #2
    Document per list, timestamp and site
    Collection per list type
    Pros: faster lookups (no aggregation); fewer documents; smaller _id
    Cons: document size limit; unordered; high data transfer
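
The second layout might look roughly like the Python sketch below: one document per list, timestamp and site, with the list items held in a nested map. The `_id` format and field names are hypothetical. The `increment` helper is only similar in spirit to an in-place MongoDB `$inc` on a nested field.

```python
# One document per (list type, site, timestamp); items live in a nested map.
# The _id layout and field names are hypothetical.
doc = {"_id": "1:100", "items": {}}


def increment(document, item, by=1):
    """In-place field update, similar in spirit to $inc on items.<item>."""
    items = document["items"]
    items[item] = items.get(item, 0) + by


for page in ["/home", "/blog", "/home", "/pricing", "/home", "/blog"]:
    increment(doc, page)

# A lookup is a single fetch by _id, but the map carries no ranking, so the
# client has to pull the whole document and sort it itself.
ranked = sorted(doc["items"].items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

Fetching and sorting the whole map on the client is consistent with the "unordered" and "high data transfer" drawbacks listed on the slide.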

  33. MongoStat

  34. Downsides
    High random I/O
    Document size & relocation
    Fragmentation
    Database lock

  35. K.O. MongoDB

  36. Cassandra
    Distributed hash ring: masterless
    Linear scalability
    Built for scale + write throughput

  37. CQL

  38. CQL
    SELECT sql AS cql FROM mysql WHERE query_language = 'good'
    Not as scary as Column Families + Thrift
    SQL Schemas + Querying

  39. CQL
    CREATE TABLE d_aggregate_day (
      sid int,      -- partition key
      ts int,       -- clustering key
      s text,       -- clustering key
      v counter,
      PRIMARY KEY (sid, ts, s));
    Distributed counters!
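
As a rough mental model of this table, the Python sketch below groups rows by the partition key (`sid`) and keeps them sorted by the clustering columns (`ts`, `s`) within each partition. It is a toy in-memory model, not a Cassandra client. The sample values, including the day-style integers used for `ts`, are assumptions.

```python
from bisect import insort
from collections import defaultdict

# partition key (sid) -> list of [ts, s, v] rows kept sorted by (ts, s)
partitions = defaultdict(list)


def increment(sid, ts, s, by=1):
    """Rough analogue of a CQL counter update:
    UPDATE d_aggregate_day SET v = v + ? WHERE sid = ? AND ts = ? AND s = ?
    """
    rows = partitions[sid]
    for row in rows:
        if row[0] == ts and row[1] == s:
            row[2] += by
            return
    insort(rows, [ts, s, by])  # new row, placed in clustering order


def range_query(sid, start_ts, end_ts):
    """Rough analogue of:
    SELECT ts, s, v FROM d_aggregate_day WHERE sid = ? AND ts >= ? AND ts <= ?
    """
    return [tuple(r) for r in partitions[sid] if start_ts <= r[0] <= end_ts]


increment(7, 20131018, "chrome", 3)
increment(7, 20131017, "firefox")
increment(7, 20131018, "chrome", 2)
increment(7, 20131019, "safari")

rows = range_query(7, 20131017, 20131018)
print(rows)
```

Because all rows for one `sid` sit together and are ordered by `ts`, a time-range read for a single site touches one partition.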

  40. BASE

  41. Basically Available
    Soft-state
    Eventually consistent

  42. Eventual consistency isn’t a problem
    More efficient with the disk
    Low maintenance
    Cheap

  43. Redis + Cassandra = win
    Redis as a speed layer + aggregator for lists
    Cassandra as timeseries counter storage
    Collector -> Redis -> Cassandra (periodic flushes to Cassandra)
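
The collector, speed layer and periodic flush can be sketched in Python as follows. This is a minimal illustration under assumptions: `SpeedLayer` is a hypothetical in-memory buffer standing in for Redis, and a plain dict stands in for the Cassandra counter table. The talk does not describe the real flush mechanism in this detail.

```python
from collections import Counter


class SpeedLayer:
    """Buffers counter increments in memory (standing in for Redis)."""

    def __init__(self):
        self.pending = Counter()

    def record(self, sid, ts, s, by=1):
        self.pending[(sid, ts, s)] += by

    def flush(self, sink):
        """Hand accumulated deltas to long-term storage, then reset the buffer."""
        batch, self.pending = self.pending, Counter()
        for key, delta in batch.items():
            sink[key] = sink.get(key, 0) + delta  # one counter update per key
        return len(batch)


cassandra = {}  # hypothetical stand-in for the counter table
layer = SpeedLayer()

for _ in range(1000):
    layer.record(7, 20131018, "chrome")
layer.record(7, 20131018, "firefox", by=5)

writes = layer.flush(cassandra)
print(writes, cassandra)
```

In this sketch 1,001 incoming events collapse into two counter updates, which is the appeal of aggregating in a fast layer before writing to long-term storage.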

  44. Exploit DBs' strengths
    Build an indestructible service
    Use the best tools for the job

  45. Thanks!
    Geoff Wagstaff
    @TheDeveloper
    engineering.gosquared.com
