Slide 1

Building a Scalable Event Service with Cassandra: Design to Code
Tareq Abedrabbo, OpenCredo
Code Mesh 2014

Slide 2

About Me
• CTO at OpenCredo
• We are a software consultancy and delivery company
• Open source, NoSQL/Big Data, cloud

Slide 3

This talk is about…
• What we built
• Why we built it
• How we built it

Slide 4

Project background

Slide 5

• High street retailer
• Decoupled microservices architecture
• Java-based, event-driven platform
• Cassandra, Cloud Foundry, RabbitMQ

Slide 6

No content

Slide 7

Why do we need an event service?

Slide 8

• Capture millions of platform and business events
• Trigger downstream processes asynchronously
• Customise standard processes in a non-intrusive way
• Provide a system-wide transaction log
• Analytics
• System testing

Slide 9

I know, I will use technology X!

Slide 10

However…
• Ambiguous requirements
• New paradigm, emerging architecture
• We need to look at the problem as a whole
• We need to avoid building useless features
• We need to avoid accumulating technical debt

Slide 11

Design principles

Slide 12

1. Simplicity (yes, really!)

Slide 13

No content

Slide 14

2. Decoupling

Slide 15

• Contract-first design
• Flexibility in the implementation
• Ability to evolve while minimising impact of changes

Slide 16

3. Scalability and fault-tolerance

Slide 17

• Choosing the right architecture
• Choosing the right model
• Choosing the right tools

Slide 18

What is an event?

Slide 19

• A simple event is an opaque value, typically a time series item
  • e.g. a meter reading
• A structured event can have an arbitrarily complex structure that can evolve over time
  • e.g. a user registration event

Slide 20

What does the event store look like?

Slide 21

Event service API, version 1: store and read an event

Slide 22

• It needs to be simple and accessible
  • a service only cares about emitting events
  • at that stage, we didn’t care much about the structure of each individual event
  • accessible ideally even from outside the platform
• Resource-oriented design: ReST
• Simple request/response paradigm

Slide 23

• Store an event
  • POST /api/events/
• Read an event
  • GET /api/events/{eventId}

Slide 24

Anatomy of an Event

{
  "type" : "DEMOENTITY.DEMOPROCESS.DEMOTASK",
  "source" : "demoapp1:2.5:981a24b3-2860-40ba-90d4-c47ef1a70abe",
  "clientTimestamp" : 1401895567594,
  "serverTimestamp" : 1401895568616,
  "platformContext" : {
    "id" : "demoapp1",
    "version" : "2.5"
  },
  "businessContext" : {
    "channel" : "WEB"
  },
  "payload" : {
    "message" : "foo",
    "anInteger" : 33,
    "bool" : false
  }
}

Slide 25

and the architecture to support the requirements…

Slide 26

No content

Slide 27

The Event Table

Key | Event
id1 | timestamp: 123, type: X, payload: <blob>, …
id2 | timestamp: 456, type: Y, payload: <blob>, …

Slide 28

• Store the payload as a blob
• Established a minimal service contract
• Established semantics: POST

Slide 29

Event service API, version 2: querying events and notifications

Slide 30

• Query events
  • GET /api/events?{queryString}
• {queryString} can consist of the following fields:
  • start, end, startOffset, limit, tag, type, order

Slide 31

• Examples:
  • GET /api/events?start={startTime}&end={endTime}
  • GET /api/events?startOffset=3600000&type=someType

Slide 32

How do we model time series?

Slide 33

Simple Time Series Modelling
Using timestamps as a clustering column

Key | Timestamp → Value
id1 | ts11 → v11, ts12 → v12, ts13 → v13
id2 | ts21 → v21, ts22 → v22, ts23 → v23
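
A minimal CQL sketch of this model (table and column names are illustrative, not from the talk):

-- one partition per series; the timestamp clustering column keeps values ordered
create table simple_events (
    id text,
    ts timestamp,
    value blob,
    primary key (id, ts)
);

-- read a time slice of a single series
select ts, value from simple_events
where id = 'meter-42'
and ts >= '2014-06-01 00:00:00' and ts < '2014-06-02 00:00:00';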

Slide 34

• Pros
  • Simple
  • Works well for simple data structures
  • Good read and write performance
• Cons
  • Hard limit on partition size (2 billion cells)
  • Limited flexibility
  • Limited querying

Slide 35

Time Bucketing
Adding a time bucket to the partition key

Key | Timestamp → Value
id1, bucket1 | ts11 → v11
id1, bucket2 | ts12 → v12, ts13 → v13
id2, bucket1 | ts21 → v21, ts22 → v22
id2, bucket2 | ts23 → v23
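
The same sketch with a time bucket added to the partition key (the hourly bucket format is an assumption for illustration):

-- the partition key now combines the series id and a coarse time bucket
create table bucketed_events (
    id text,
    bucket text,      -- e.g. '2014-06-04:14' for an hourly bucket
    ts timestamp,
    value blob,
    primary key ((id, bucket), ts)
);

-- a read spanning a bucket boundary needs one query per bucket
select ts, value from bucketed_events
where id = 'meter-42' and bucket = '2014-06-04:14';
select ts, value from bucketed_events
where id = 'meter-42' and bucket = '2014-06-04:15';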

Slide 36

• Mitigates the partition size issue
• Queries become slightly more complex
• Write performance is not affected
• Reads may be slower, potentially hitting multiple buckets

Slide 37

How about querying?

Slide 38

Querying
One denormalised table for each query

Query Key | (id, timestamp) → value
p1, b1 | (id1, ts11) → v11
p2, b2 | (id1, ts12) → v12, (id2, ts21) → v21
p2, b2 | (id2, ts22) → v22
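
For example, an "events by type" query could be served by a table of this shape; a sketch only, with hypothetical names (the indices actually used appear later):

-- one denormalised table dedicated to the "by type" query
create table events_by_type (
    type text,
    tbucket text,       -- time bucket, part of the partition key
    eventid timeuuid,   -- clustering column, time-ordered within the bucket
    primary key ((type, tbucket), eventid)
);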

Slide 39

• Denormalise for each query
• Higher disk usage
  • Disk space is cheap, but not free
• Write latency is affected
• Time-bucketed indexes can create hot spots (hot shards)

Slide 40

There is obviously no optimal solution…

Slide 41

Event Store ☛ Event Service

Slide 42

No content

Slide 43

• Same service contract
• Basic client guarantee: if a POST is successful, the event has been persisted “sufficiently”
• Indices are updated asynchronously
• Events can be published to a message broker

Slide 44

This is actually CQRS

Slide 45

Evolution of Event
• Payload and meta-data as simple collections of key/value pairs
• The type is persisted with each event
  • to make events readable
  • to avoid managing schemas

Slide 46

Primary Event Store

create table events (
    id timeuuid primary key,
    source text,
    type text,
    cts timestamp,
    sts timestamp,
    bct map<text, text>,
    bcv map<text, text>,
    pct map<text, text>,
    pcv map<text, text>,
    plt map<text, text>,
    plv map<text, text>
);

Events are simply keyed by id.
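
A sketch of storing one event, reusing values from the sample event shown earlier; it assumes the maps are map<text, text> and that bcv, pcv and plv hold the business context, platform context and payload values respectively:

-- illustrative insert; now() generates the timeuuid key server-side
insert into events (id, source, type, cts, sts, bcv, pcv, plv)
values (
    now(),
    'demoapp1:2.5:981a24b3-2860-40ba-90d4-c47ef1a70abe',
    'DEMOENTITY.DEMOPROCESS.DEMOTASK',
    1401895567594,                          -- cts: client timestamp
    1401895568616,                          -- sts: server timestamp
    {'channel': 'WEB'},                     -- bcv
    {'id': 'demoapp1', 'version': '2.5'},   -- pcv
    {'message': 'foo'}                      -- plv
);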

Slide 47

Indices
• Ascending and descending time buckets for each query type
• Index value ‘points’ to an event stored in the main table

Slide 48

Indices

create table events_by_time_asc (
    tbucket text,
    eventid timeuuid,
    primary key (tbucket, eventid))
with clustering order by (eventid asc);

create table events_by_time_desc (
    tbucket text,
    eventid timeuuid,
    primary key (tbucket, eventid))
with clustering order by (eventid desc);

Ascending and descending time buckets for each query type.
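
Reads then take two steps: page the index first, then fetch the full events from the main table. A sketch (the bucket literal is illustrative; the event ids are the ones from the sample response below):

-- 1. page the index; the desc table yields newest-first within a bucket
select eventid from events_by_time_desc
where tbucket = '2014-06-04:15'
limit 5;

-- 2. fetch the full events by id from the main table
select * from events
where id in (8d4ce680-ebfc-11e3-81c5-09597ebbf2cb,
             9f00e9d0-ebfc-11e3-81c5-09597ebbf2cb);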

Slide 49

Implementing Pagination

Slide 50

Pagination
GET /api/events?start=141..&type=X&limit=5

(diagram: index partitions laid out by type and time bucket;
type1 bucket1: empty, type1 bucket2: id1 id2, type1 bucket3: id3 id4,
type1 bucket4: id5 id6 id7 id8, type1 bucket5: empty)

Slide 51

Pagination
GET /api/events?start=141..&type=X&limit=5

(same diagram, with the query range marked on the time axis, starting after bucket1)

Slide 52

Pagination
GET /api/events?start=141..&type=X&limit=5

(same diagram, with the query range also marked on the type axis; bucket5 now shown under type2)

Slide 53

Pagination
GET /api/events?start=141..&type=X&limit=5

(same diagram as the previous slide)

Slide 54

Pagination
GET /api/events?start=141..&type=X&limit=5

(same diagram as the previous slide)

Slide 55

{
  "count" : 1,
  "continuation" : "http://event-service-location/api/events?continueFrom=9f00e9d0-ebfc-11e3-81c5-09597ebbf2cb&end=1401965827000&limit=1",
  "events" : [ {
    "type" : "DEMOENTITY.DEMOPROCESS.DEMOTASK",
    "source" : "demoapp1:2.5:981a24b3-2860-40ba-90d4-c47ef1a70abe",
    "clientTimestamp" : 1401895567594,
    "serverTimestamp" : 1401895568616,
    "platformContext" : {
      "id" : "demoapp1",
      "version" : "2.5"
    },
    "businessContext" : { },
    "payload" : { },
    "self" : "http://event-service-location/api/events/8d4ce680-ebfc-11e3-81c5-09597ebbf2cb"
  } ]
}
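
A sketch of how the continueFrom token can map onto the index tables: the last timeuuid returned becomes a clustering-key bound for the next page (the bucket literal is illustrative):

-- resume just after the event id carried in continueFrom
select eventid from events_by_time_asc
where tbucket = '2014-06-04:15'
and eventid > 9f00e9d0-ebfc-11e3-81c5-09597ebbf2cb
limit 1;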

Slide 56

• Pros
  • Decoupling: clients are unaware of the implementation details
  • Intuitive ReSTful interface
  • Disk consumption is more reasonable
  • Easily extensible
  • Pub/sub

Slide 57

• Cons
  • Not optimised purely for latency
    • Still sufficiently performant for our use-cases
  • More complex service code
    • Needs to execute multiple CQL queries in sequence
  • Cluster hotspots can still occur, in theory

Slide 58

Where do we go from here?

Slide 59

• Data model improvements: User Defined Types
• More sophisticated error handling
• Analytics with Spark
• Add other data views

Slide 60

Lessons learnt
• Scalability is not only about raw performance
• Experiment
• Simplify
• Understand Thrift, use CQL

Slide 61

Links
• OpenCredo: http://www.opencredo.com/blog
• Twitter: @tareq_abedrabbo

Thank you! Questions?