Building a Scalable Event Service with Cassandra: Design to Code

Cassandra Summit Europe 2014

Tareq Abedrabbo

December 04, 2014

Transcript

  1. Building a Scalable Event
    Service with Cassandra
    Design to Code
    David Borsos & Tareq Abedrabbo
    Cassandra Summit Europe 2014

  2. About Us
    David Borsos
    Senior Consultant at
    OpenCredo
    Tareq Abedrabbo
    CTO at OpenCredo

  3. This talk is about…
    ‣ What we built
    ‣ Why we built it
    ‣ How we built it

  4. Project background

  5. • High street retailer
    • Microservices, event-driven architecture
    • Java, Cassandra, Cloud Foundry, RabbitMQ

  6. Why do we need an
    event service?

  7. ✓ Capture millions of platform and business events
    ✓ Trigger downstream processes asynchronously
    ✓ Customise standard processes in a non-intrusive way
    ✓ Provide a system-wide transaction log
    ✓ Analytics
    ✓ Auditing
    ✓ System testing

  8. However…
    - Ambiguous requirements
    - New paradigm and emerging architecture
    - We need to look at the problem as a whole
    - We need to avoid building useless features
    - We need to avoid accumulating technical debt

  9. Design principles

  10. Design for a distributed
    system

  11. What is an event?

  12. • A simple event is an opaque value, typically a time series item
      • meter reading
    • A structured event can have an arbitrarily complex structure that can evolve over time
      • user registration event

  13. Simplify the data
    model

  14. Evolution of Event
    • Payload and metadata as simple collections of key/value pairs
    • The type is persisted with each event
      • to make events readable
      • to avoid managing schemas

  15. What does the event
    store look like?

  16. An event store should offer
    - A simple request/response paradigm with clear guarantees
    - Accessibility, ideally even from legacy services
    - The ability to query for events

  17. Event Store
    ☛ Event Service

  18. Resource-driven
    Design

  19. Event service API,
    version 1: store and read
    an event

  20. • Store an event
    • POST /api/events/
    • Read an event
    • GET /api/events/{eventId}

  21. Anatomy of an Event
    {
    "type" : "SOME.EVENT.TYPE",
    "source" : "some-component:instance",
    "metadata" : {
    "anyMetaKey" : "someMetaValue",
    "tags" : [ "tag1", "tag2" ]
    },
    "payload" : {
    "anyKey1" : "someValue",
    "anyKey2" : 3
    }
    }
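
The store and read operations on the event structure above can be sketched in a few lines of Python. This is only an illustration, not the service described in the talk: `InMemoryEventStore`, the rule that all four top-level fields are required, and the use of a time-based UUID as the event id are assumptions made for the example.

```python
import uuid
from typing import Any, Dict

# Assumption: the four top-level fields shown on the slide are all mandatory.
REQUIRED_FIELDS = ("type", "source", "metadata", "payload")


class InMemoryEventStore:
    """Hypothetical stand-in for the event service, backed by a dict."""

    def __init__(self) -> None:
        self._events: Dict[str, Dict[str, Any]] = {}

    def store(self, event: Dict[str, Any]) -> str:
        """Plays the role of POST /api/events/: validate, assign an id, keep the event."""
        missing = [field for field in REQUIRED_FIELDS if field not in event]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        # uuid1 is time-based, loosely analogous to a CQL timeuuid.
        event_id = str(uuid.uuid1())
        self._events[event_id] = dict(event, id=event_id)
        return event_id

    def read(self, event_id: str) -> Dict[str, Any]:
        """Plays the role of GET /api/events/{eventId}."""
        return self._events[event_id]
```

Storing the example event from the slide and reading it back by the returned id yields the same payload with an added `id` field; the type travels with the event, so no separate schema lookup is needed to interpret it.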

  22. and the architecture to
    support the
    requirements…

  23. The Event Table
    Key | Event
        | timestamp | type | payload | …
    id1 | 123       | X    | <>      | …
    id2 | 456       | Y    | <>      | …

  24. Event service API,
    version 2: querying
    events and notifications

  25. • Query events
    • GET /api/events?{parameters}
    • {parameters} can consist of the following fields:
      • start, end, startOffset, limit, tag, type, order

  26. • Examples:
    • GET /api/events?start={startTime}&end={endTime}
    • GET /api/events?startOffset=3600000&type=someType
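
One way a service could turn these parameters into a concrete time window is sketched below. The slides do not define the semantics, so the following are assumptions: `start` and `end` are epoch milliseconds, `startOffset` is a number of milliseconds counted back from the current time, and a missing `end` means "now". The function name `resolve_time_range` is hypothetical.

```python
from typing import Dict, Tuple


def resolve_time_range(params: Dict[str, str], now_ms: int) -> Tuple[int, int]:
    """Return (start_ms, end_ms) for a query, under the assumptions stated above."""
    if "startOffset" in params:
        # Assumption: the offset is relative to the current time.
        start = now_ms - int(params["startOffset"])
    elif "start" in params:
        start = int(params["start"])
    else:
        start = 0
    end = int(params["end"]) if "end" in params else now_ms
    if start > end:
        raise ValueError("start must not be after end")
    return start, end
```

Under these assumptions, `startOffset=3600000` selects the events of the last hour.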

  27. Modelling time series
    and queries

  28. Querying
    One denormalised table for each query
    Query Key
    id1 ts11 id1 ts12 id2 ts21 id2 ts22
    p1 b1 v11
    p2 b2 v12 v21
    p2 b2 v22

  29. • Cons:
    • Denormalise for each query, again and again
    • Higher disk usage
      • Disk space is cheap, but not free
    • Write latency is affected
    • Time-bucketed indexes can create hot spots (hot shards)

  30. Flexible, adaptable
    architecture

  31. Adapt the data model
    to real-world constraints

  32. • Same service contract
    • Indices are updated synchronously or asynchronously
    • Basic client guarantee: if a POST is successful the event has been persisted “sufficiently”
    • Events can be published to a message broker
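
The write path described on this slide can be sketched as follows. This is an illustration under assumptions, not the authors' code: `EventWriter` and its callbacks are hypothetical, the event is persisted before the call returns, and index updates run inline unless an executor is supplied, in which case they are submitted as background tasks.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

Event = Dict[str, object]
Handler = Callable[[Event], None]


class EventWriter:
    """Hypothetical write path: persist first, then index, then publish."""

    def __init__(self, persist: Handler, index_updaters: List[Handler],
                 publish: Optional[Handler] = None,
                 executor: Optional[ThreadPoolExecutor] = None) -> None:
        self._persist = persist
        self._index_updaters = index_updaters
        self._publish = publish
        self._executor = executor

    def post(self, event: Event) -> List[Future]:
        # The client guarantee rests on this step: it must succeed before we return.
        self._persist(event)
        pending: List[Future] = []
        for update in self._index_updaters:
            if self._executor is None:
                update(event)  # synchronous index update
            else:
                pending.append(self._executor.submit(update, event))  # asynchronous
        if self._publish is not None:
            self._publish(event)  # for example, hand the event to a message broker
        return pending
```

In asynchronous mode a successful `post` means only that the event itself is stored; the indices may lag briefly, which is why the service contract stays the same in both modes.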

  33. • Pros
    • Decoupling
      • Clients are unaware of the implementation details
    • Intuitive ReSTful interface
    • Disk consumption is more reasonable
    • Easily extensible
    • Pub/sub

  34. • Cons
    • Not primarily optimised for latency
      • Still sufficiently performant for our use-cases
    • More complex service code
      • Needs to execute multiple CQL queries in sequence
    • Cluster hotspots can still occur, in theory

  35. Indices
    • Ascending and descending time buckets for each query type
    • Index value references an event stored in the main table by its id

  36. Indices
    create table events_by_type_asc (
      tbucket text, type text, eventid timeuuid,
      primary key ((type, tbucket), eventid))
      with clustering order by (eventid asc);
    create table events_by_type_desc (
      tbucket text, type text, eventid timeuuid,
      primary key ((type, tbucket), eventid))
      with clustering order by (eventid desc);
    Ascending and descending time buckets for each query type
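
To read from such an index, the service needs the list of `tbucket` values that cover a time range. The slides do not say how buckets are sized or named, so the sketch below assumes hourly buckets in UTC with the text format `YYYYMMDDHH`; `bucket_for` and `buckets_for_range` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone
from typing import List

# Assumption: one bucket per hour, named by its UTC start, e.g. "2014120409".
BUCKET_FORMAT = "%Y%m%d%H"


def bucket_for(ts_ms: int) -> str:
    """Bucket name for a timestamp given in epoch milliseconds."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).strftime(BUCKET_FORMAT)


def buckets_for_range(start_ms: int, end_ms: int, descending: bool = False) -> List[str]:
    """All bucket names touched by [start_ms, end_ms], in the requested order."""
    current = datetime.fromtimestamp(start_ms / 1000, tz=timezone.utc).replace(
        minute=0, second=0, microsecond=0)
    end = datetime.fromtimestamp(end_ms / 1000, tz=timezone.utc)
    buckets: List[str] = []
    while current <= end:
        buckets.append(current.strftime(BUCKET_FORMAT))
        current += timedelta(hours=1)
    return buckets[::-1] if descending else buckets
```

Each bucket name, combined with the event type, forms one partition key, so a range query becomes one lookup per bucket against the ascending or the descending table depending on the requested order.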

  37. Example:
    Implementing Pagination

  38. Pagination
    GET /api/events?start=141..&type=X&limit=5
    type    time
    type1   bucket1
    type1   bucket2   id1 id2
    type1   bucket3   id3 id4
    type1   bucket4   id5 id6 id7 id8
    type1   bucket5

  39. Pagination
    GET /api/events?start=141..&type=X&limit=5
    type    time
    type1   bucket1
    query range
    type1   bucket2   id1 id2
    type1   bucket3   id3 id4
    type1   bucket4   id5 id6 id7 id8
    type1   bucket5

  40. Pagination
    GET /api/events?start=141..&type=X&limit=5
    type    time      ◀ query range ▶
    type1   bucket1
    type1   bucket2   id1 id2
    type1   bucket3   id3 id4
    type1   bucket4   id5 id6 id7 id8
    type1   bucket5

  43. GET /api/events?start=141..&type=X&limit=5
    {
    "events" : [
    {
    "id" : "uuid1",
    "type" : "X",
    "metadata" : { … },
    "payload" : { … }
    },
    { … }
    ],
    "continuation" : "/api/events?continueFrom=uuid7&type=X&limit=5"
    }
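
The bucket-by-bucket pagination shown on the preceding slides can be sketched as below. This is a simplified illustration, not the service code: the index is modelled as an in-memory dict keyed by (type, bucket), `fetch_page` is a hypothetical name, and the continuation value is assumed to be the id of the first event of the next page. A real implementation would issue one CQL query per bucket and could locate the starting bucket from the timestamp embedded in the timeuuid instead of scanning.

```python
from typing import Dict, List, Optional, Tuple

# (event type, time bucket) -> event ids in clustering order
Index = Dict[Tuple[str, str], List[str]]


def fetch_page(index: Index, event_type: str, buckets: List[str], limit: int,
               continue_from: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Collect up to `limit` ids across buckets; return (page, continuation id or None)."""
    page: List[str] = []
    skipping = continue_from is not None
    for bucket in buckets:
        for event_id in index.get((event_type, bucket), []):
            if skipping:
                if event_id != continue_from:
                    continue  # still before the continuation point
                skipping = False
            if len(page) == limit:
                # The page is full; this id becomes the start of the next page.
                return page, event_id
            page.append(event_id)
    return page, None


index: Index = {
    ("type1", "bucket2"): ["id1", "id2"],
    ("type1", "bucket3"): ["id3", "id4"],
    ("type1", "bucket4"): ["id5", "id6", "id7", "id8"],
}
buckets = ["bucket1", "bucket2", "bucket3", "bucket4", "bucket5"]
first_page, token = fetch_page(index, "type1", buckets, limit=5)
second_page, end_token = fetch_page(index, "type1", buckets, limit=5, continue_from=token)
```

Empty buckets cost a lookup but contribute nothing to the page, which is one reason the service has to run several queries in sequence to fill a single response.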

  44. Performance Characteristics
    • 70 to 85 million events per day
    • Client latency increases moderately with increased parallel load (40ms to 60ms, +10ms on the client)
    • Current performance far exceeds current target volumes
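
To put the daily volume in perspective, it can be converted to an average rate per second. The calculation below assumes the load is spread evenly over 24 hours, which the slides do not claim; real peak rates may be considerably higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_events_per_second(events_per_day: int) -> float:
    """Average rate, assuming events are spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


low = average_events_per_second(70_000_000)   # roughly 810 events per second
high = average_events_per_second(85_000_000)  # roughly 984 events per second
```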

  45. Lessons learnt
    • Scalability is not only about raw performance and latency
    • Experiment
    • Simplify
    • Understand Thrift, use CQL

  46. Links
    • http://www.opencredo.com/blog
    • @davib0
    • @tareq_abedrabbo
    Thank you! Any questions?

  47. Future improvements

  48. • Data model improvements: User Defined Types
    • DateTiered compaction
    • Analytics with Spark
    • Add other data views