Event Sourcing, Stream Processing & Serverless

Slide 1

Slide 1 text

Event Sourcing, Stream Processing & Serverless
Ben Stopford, Office of the CTO, Confluent

Slide 2

Slide 2 text

What we’re going to talk about:
• Event Sourcing
  • What is it, and how does it relate to Event Streaming?
• Stream Processing as a kind of “Database”
  • What does this mean?
• Serverless Functions
  • How does this relate?

Slide 3

Slide 3 text

Can you do event sourcing with Kafka?

Slide 4

Slide 4 text

Traditional Event Sourcing

Slide 5

Slide 5 text

Popular example: a shopping cart. The database table matches what the user sees.

Slide 6

Slide 6 text

Event Sourcing stores events (a timeseries of user activity, e.g. at 12.42, 12.44, 12.49, 12.50, 12.59), then derives the ‘current state view’ with a chronological reduce.
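A minimal sketch of the chronological reduce, in Python. The event types (`ItemAdded`, `ItemRemoved`), the `CartEvent` shape and the item names are hypothetical; only the timestamps come from the slide. State is derived by sorting the events by time and folding them, one at a time, into an initially empty cart.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Dict, List


@dataclass(frozen=True)
class CartEvent:
    timestamp: str  # zero-padded "HH.MM", so string order is time order
    kind: str       # hypothetical event types: "ItemAdded" or "ItemRemoved"
    item: str
    qty: int = 1


def apply_event(cart: Dict[str, int], event: CartEvent) -> Dict[str, int]:
    """Fold one event into the cart, returning a new dict (state is never edited in place)."""
    new_cart = dict(cart)
    if event.kind == "ItemAdded":
        new_cart[event.item] = new_cart.get(event.item, 0) + event.qty
    elif event.kind == "ItemRemoved":
        remaining = new_cart.get(event.item, 0) - event.qty
        if remaining > 0:
            new_cart[event.item] = remaining
        else:
            new_cart.pop(event.item, None)
    return new_cart


def current_state(events: List[CartEvent]) -> Dict[str, int]:
    """Chronological reduce: sort by time, then fold from an empty cart."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    return reduce(apply_event, ordered, {})


events = [
    CartEvent("12.42", "ItemAdded", "socks", 2),
    CartEvent("12.44", "ItemAdded", "hat"),
    CartEvent("12.49", "ItemRemoved", "socks"),
    CartEvent("12.50", "ItemAdded", "scarf"),
    CartEvent("12.59", "ItemRemoved", "hat"),
]
print(current_state(events))  # {'socks': 1, 'scarf': 1}
```

Because the events are the source of truth, the same fold can be rerun at any time to rebuild the view.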

Slide 7

Slide 7 text

Traditional Event Sourcing: store immutable events in a database, in time order (persist each event to a table of events).

Slide 8

Slide 8 text

Traditional Event Sourcing (Read): query by customer Id (+session?), then do a chronological reduce on read, inside the app.
- No schema migration
- Similar to ‘schema on read’

Slide 9

Slide 9 text

3 Benefits

Slide 10

Slide 10 text

Evidentiary: accountants don’t use erasers (e.g. audit, ledger, git).

Slide 11

Slide 11 text

Replayability: recover corrupted data after a programmatic bug.
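To illustrate replayability, a small sketch assuming a hypothetical log of deposit events: a buggy reducer corrupts the derived balances, but because the events themselves are untouched, fixing the reducer and replaying the log from the start recovers the correct state.

```python
from functools import reduce
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, int]  # hypothetical deposit event: (account, amount)
Balances = Dict[str, int]


def buggy_apply(balances: Balances, event: Event) -> Balances:
    account, amount = event
    # BUG: overwrites the balance instead of adding to it.
    return {**balances, account: amount}


def fixed_apply(balances: Balances, event: Event) -> Balances:
    account, amount = event
    return {**balances, account: balances.get(account, 0) + amount}


def rebuild(log: List[Event], apply_fn: Callable[[Balances, Event], Balances]) -> Balances:
    """Derive state from scratch by replaying every event, in order."""
    return reduce(apply_fn, log, {})


log: List[Event] = [("alice", 100), ("bob", 50), ("alice", 25)]  # never modified

corrupted = rebuild(log, buggy_apply)  # {'alice': 25, 'bob': 50}
repaired = rebuild(log, fixed_apply)   # {'alice': 125, 'bob': 50}
```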

Slide 12

Slide 12 text

Analytics: keep the data needed to extract trends and behaviors, i.e. non-lossy (e.g. insight, metrics, ML).

Slide 13

Slide 13 text

Traditional Event Sourcing
• Use a database (any one will do)
• Create a table and insert events as they occur
• Query all the events associated with your problem*
• Reduce them chronologically to get the current state
*Aggregate ID in DDD parlance
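The four steps can be sketched with Python's built-in `sqlite3` module standing in for “any database”. The table layout, the event types and the use of the customer id as the aggregate ID are assumptions for illustration.

```python
import json
import sqlite3
from typing import Dict

conn = sqlite3.connect(":memory:")
# Step 2: one row per event; `seq` gives a total, chronological order.
conn.execute(
    """CREATE TABLE events (
           seq          INTEGER PRIMARY KEY AUTOINCREMENT,
           aggregate_id TEXT NOT NULL,   -- assumed: the customer id
           event_type   TEXT NOT NULL,
           payload      TEXT NOT NULL    -- JSON body
       )"""
)


def append(aggregate_id: str, event_type: str, payload: dict) -> None:
    """Insert an event as it occurs; rows are never updated or deleted."""
    with conn:
        conn.execute(
            "INSERT INTO events (aggregate_id, event_type, payload) VALUES (?, ?, ?)",
            (aggregate_id, event_type, json.dumps(payload)),
        )


def load_cart(aggregate_id: str) -> Dict[str, int]:
    """Steps 3 and 4: query one aggregate's events in order, then reduce them."""
    cart: Dict[str, int] = {}
    rows = conn.execute(
        "SELECT event_type, payload FROM events WHERE aggregate_id = ? ORDER BY seq",
        (aggregate_id,),
    )
    for event_type, payload in rows:
        body = json.loads(payload)
        if event_type == "ItemAdded":
            cart[body["item"]] = cart.get(body["item"], 0) + body["qty"]
        elif event_type == "ItemRemoved":
            left = cart.get(body["item"], 0) - body["qty"]
            if left > 0:
                cart[body["item"]] = left
            else:
                cart.pop(body["item"], None)
    return cart


append("customer-1", "ItemAdded", {"item": "socks", "qty": 2})
append("customer-2", "ItemAdded", {"item": "hat", "qty": 1})
append("customer-1", "ItemRemoved", {"item": "socks", "qty": 1})
append("customer-1", "ItemAdded", {"item": "scarf", "qty": 1})
print(load_cart("customer-1"))  # {'socks': 1, 'scarf': 1}
```

The `WHERE aggregate_id = ?` query in step 3 is exactly what a database gives you for free.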

Slide 14

Slide 14 text

Traditional Event Sourcing with Kafka
• Use Kafka (instead of a database)
• Create a topic (instead of a table) and insert events as they occur
• Query all the events associated with your problem*
• Reduce them chronologically to get the current state
*Aggregate ID in DDD parlance

Slide 15

Slide 15 text

Confusion: you can’t query Kafka by, say, Customer Id*
*Aggregate ID in DDD parlance

Slide 16

Slide 16 text

Events are a good write model, but make a tricky read model

Slide 17

Slide 17 text

CQRS is a tonic: cache the projection in a ‘View’. Events/commands accumulate in the log, a stream processor projects them into a view (a cache, DB, KTable, etc.), and queries by customer Id go to the view.
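A toy, in-memory sketch of the CQRS idea, assuming a Python list stands in for the log and a dict for the view (this is not the Kafka or Kafka Streams API). The log can only be read in order, so the hypothetical `CartProjection` consumes it from a tracked offset and maintains a per-customer view that can be queried directly. The gap between the two reads also shows eventual consistency.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# The log: an append-only sequence of (customer_id, item, qty_delta) events.
# Finding one customer's events means scanning it all, hence the cached view.
Event = Tuple[str, str, int]


@dataclass
class CartProjection:
    """Hypothetical stream processor that materializes a queryable view."""
    view: Dict[str, Dict[str, int]] = field(default_factory=dict)
    offset: int = 0  # position of the next unread event in the log

    def poll(self, log: List[Event]) -> None:
        # Consume only the events not yet seen, in log order.
        for customer_id, item, qty in log[self.offset:]:
            cart = self.view.setdefault(customer_id, {})
            cart[item] = cart.get(item, 0) + qty
            if cart[item] <= 0:
                del cart[item]
        self.offset = len(log)

    def query(self, customer_id: str) -> Dict[str, int]:
        # Reads hit the view, never the log.
        return dict(self.view.get(customer_id, {}))


log: List[Event] = [("c1", "socks", 2), ("c2", "hat", 1), ("c1", "socks", -1)]
projection = CartProjection()
projection.poll(log)
log.append(("c1", "scarf", 1))   # events keep accumulating in the log
print(projection.query("c1"))    # {'socks': 1}  (the view lags behind the log)
projection.poll(log)
print(projection.query("c1"))    # {'socks': 1, 'scarf': 1}
```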

Slide 18

Slide 18 text

Even with CQRS, Event Sourcing is Hard: CQRS helps, but it’s still quite hard if you’re a CRUD app.

Slide 19

Slide 19 text

What’s the problem? Compared with a CRUD system, CQRS is harder:
• Eventually consistent
• Multi-model (complexity ∝ number of schemas in the log)
• More moving parts

Slide 20

Slide 20 text

Eventual consistency is often good for serving layers. Example: The New York Times (https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/). The source of truth holds every article since 1851 as normalized assets (images, articles, bylines and tags are all separate messages), which are denormalized into a “Content View”.

Slide 21

Slide 21 text

If your system is both simple and transactional, stick with CRUD and an audit/history table (written by a trigger, or captured with CDC):
• Evidentiary: Yes
• Replayable: N/A to a web app
• Analytics: Yes
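A sketch of the CRUD-plus-history approach using SQLite triggers through Python's `sqlite3` module. The `cart` and `cart_history` tables are hypothetical, and CDC is not shown. The application keeps doing ordinary inserts and updates; the triggers record every change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cart (
        customer_id TEXT PRIMARY KEY,
        items       TEXT NOT NULL
    );
    -- History table: one row per change, never updated.
    CREATE TABLE cart_history (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id TEXT NOT NULL,
        old_items   TEXT,
        new_items   TEXT,
        changed_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    -- Triggers copy every insert and update into the history table.
    CREATE TRIGGER cart_insert AFTER INSERT ON cart
    BEGIN
        INSERT INTO cart_history (customer_id, old_items, new_items)
        VALUES (NEW.customer_id, NULL, NEW.items);
    END;
    CREATE TRIGGER cart_update AFTER UPDATE ON cart
    BEGIN
        INSERT INTO cart_history (customer_id, old_items, new_items)
        VALUES (NEW.customer_id, OLD.items, NEW.items);
    END;
    """
)

# Ordinary CRUD: the app only ever reads and writes the `cart` table.
with conn:
    conn.execute("INSERT INTO cart VALUES ('c1', 'socks')")
    conn.execute("UPDATE cart SET items = 'socks,hat' WHERE customer_id = 'c1'")

current = conn.execute("SELECT items FROM cart WHERE customer_id = 'c1'").fetchone()[0]
history = conn.execute("SELECT old_items, new_items FROM cart_history ORDER BY id").fetchall()
print(current)  # socks,hat
print(history)  # [(None, 'socks'), ('socks', 'socks,hat')]
```

The `cart` table stays the simple, transactional read model, while `cart_history` provides the evidentiary trail.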

Slide 22

Slide 22 text

More advanced: Use a Bi-Temporal Database
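A bi-temporal store records two times per fact: when it became true in the world (valid time) and when the database learned it (transaction time). The sketch below is a simplified illustration under stated assumptions: integer timestamps, no end dates or deletes, and a hypothetical `as_of` query. Real bi-temporal databases offer much richer semantics.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    key: str
    value: str
    valid_from: int   # valid time: when the fact became true in the world
    recorded_at: int  # transaction time: when the database learned it


def as_of(facts: List[Fact], key: str, valid_at: int, known_at: int) -> Optional[str]:
    """Hypothetical query: value of `key` at valid time `valid_at`,
    as the database knew it at transaction time `known_at`."""
    candidates = [
        f for f in facts
        if f.key == key and f.valid_from <= valid_at and f.recorded_at <= known_at
    ]
    if not candidates:
        return None
    # Latest valid time wins; for the same valid time, the latest recording
    # (i.e. a correction) wins.
    return max(candidates, key=lambda f: (f.valid_from, f.recorded_at)).value


facts = [
    Fact("c1.address", "London", valid_from=1, recorded_at=1),
    # The customer moved at t=5, but we only found out at t=8.
    Fact("c1.address", "Paris", valid_from=5, recorded_at=8),
]
print(as_of(facts, "c1.address", valid_at=6, known_at=7))  # London
print(as_of(facts, "c1.address", valid_at=6, known_at=9))  # Paris
```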

Slide 23

Slide 23 text

Use Traditional Event Sourcing judiciously, where it makes sense

Slide 24

Slide 24 text

CQRS comes into its own when the events move data

Slide 25

Slide 25 text

Online Transaction Processing, e.g. a flight booking system:
- Flight prices are served 10,000 times for every booking
- Consistency is required only at booking time

Slide 26

Slide 26 text

CQRS with event movement (central write, global read): “Book Flight” requests go to a single write model and events accumulate in the log; the log carries them to views in each region, which serve “Get Flights” queries locally.
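A toy model of “central write, global read” for the flight example, assuming in-memory Python classes in place of real services and a made-up pricing rule. Only `CentralWriter.book` checks consistency; the hypothetical `RegionalView` replicas just replay the writer's log and may lag behind it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical event published by the writer: (flight, seats_left, price).
FlightEvent = Tuple[str, int, int]


@dataclass
class CentralWriter:
    """The single write model: the only place consistency is enforced."""
    seats: Dict[str, int]
    prices: Dict[str, int]
    log: List[FlightEvent] = field(default_factory=list)

    def book(self, flight: str) -> bool:
        if self.seats.get(flight, 0) <= 0:
            return False                 # consistency check at booking time
        self.seats[flight] -= 1
        self.prices[flight] += 10        # made-up pricing rule, for illustration
        self.log.append((flight, self.seats[flight], self.prices[flight]))
        return True


@dataclass
class RegionalView:
    """A read-only replica fed from the log; serves 'Get Flights' locally."""
    flights: Dict[str, Tuple[int, int]] = field(default_factory=dict)
    offset: int = 0

    def catch_up(self, log: List[FlightEvent]) -> None:
        for flight, seats_left, price in log[self.offset:]:
            self.flights[flight] = (seats_left, price)
        self.offset = len(log)

    def get_flight(self, flight: str) -> Optional[Tuple[int, int]]:
        return self.flights.get(flight)


writer = CentralWriter(seats={"LHR-JFK": 1}, prices={"LHR-JFK": 400})
london, tokyo = RegionalView(), RegionalView()

booked = writer.book("LHR-JFK")      # True: the last seat is taken
london.catch_up(writer.log)
print(london.get_flight("LHR-JFK"))  # (0, 410)
print(tokyo.get_flight("LHR-JFK"))   # None: this view has not caught up yet
rejected = writer.book("LHR-JFK")    # False: the writer rejects the sold-out flight
```

Reads scale out by adding views, while correctness rests on the one place that accepts bookings.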

Slide 27

Slide 27 text

The exact same logic applies to microservices

Slide 28

Slide 28 text

Microservices: the Orders, Fraud, Billing and Email services, connected by the Orders event stream.

Slide 29

Slide 29 text

Consistent? The Fraud Service doesn’t have to be consistent with the Orders Service, because it just creates new data (new events).
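A sketch of why the Fraud Service needs no consistency with the Orders Service, assuming hypothetical `OrderCreated` and `FraudChecked` event types and a made-up threshold rule: the service only reads order events and emits new events of its own, never modifying order data.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OrderCreated:      # hypothetical event emitted by the Orders Service
    order_id: str
    customer_id: str
    amount: float


@dataclass(frozen=True)
class FraudChecked:      # hypothetical new event emitted by the Fraud Service
    order_id: str
    suspicious: bool


FRAUD_THRESHOLD = 1_000.0  # made-up rule, purely for illustration


def fraud_service(orders: List[OrderCreated]) -> List[FraudChecked]:
    """Read order events and emit new events. Order data is never modified,
    so no transaction has to span both services."""
    return [FraudChecked(o.order_id, o.amount > FRAUD_THRESHOLD) for o in orders]


orders = [OrderCreated("o1", "c1", 50.0), OrderCreated("o2", "c2", 5_000.0)]
checks = fraud_service(orders)
print(checks)
```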

Slide 30

Slide 30 text

Microservices (the Orders, Fraud, Billing and Email services around the Orders stream): start to build things “Event Driven”.

Slide 31

Slide 31 text

Event Streaming

Slide 32

Slide 32 text

Event Streaming is a more general form of Event Sourcing/CQRS.
Event Streaming:
• Events as shared data model
• Many microservices
• Polyglot persistence
• Data-in-flight
Traditional Event Sourcing:
• Events as a storage model
• Single microservice
• Single DB
• Data-at-rest

Slide 33

Slide 33 text

Benefits of Event Streaming stand out where there are multiple data sources.

Slide 34

Slide 34 text

Join, Filter, Transform and Summarize Events from Different Sources: events from the Fraud, Orders, Payment and Customer services accumulate in the event log, and a projection is created with the Kafka Streams API.
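An in-memory Python sketch of the four operations on events from two sources, assuming hypothetical order and payment tuples. It runs as a one-off batch and is not the Kafka Streams API; a Kafka Streams projection would apply the same join, filter, transform and summarize steps continuously as events arrive.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical events from two services.
orders: List[Tuple[str, str, float]] = [  # (order_id, customer_id, amount)
    ("o1", "c1", 20.0),
    ("o2", "c2", 5.0),
    ("o3", "c1", 12.5),
]
payments: List[Tuple[str, str]] = [       # (order_id, status)
    ("o1", "PAID"),
    ("o2", "DECLINED"),
    ("o3", "PAID"),
]


def paid_cents_by_customer(orders, payments) -> Dict[str, int]:
    status_by_order = dict(payments)             # index one source by its key
    totals: Dict[str, int] = defaultdict(int)
    for order_id, customer_id, amount in orders:
        status = status_by_order.get(order_id)   # join on order_id
        if status != "PAID":                     # filter
            continue
        cents = round(amount * 100)              # transform
        totals[customer_id] += cents             # summarize
    return dict(totals)


print(paid_cents_by_customer(orders, payments))  # {'c1': 3250}
```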

Slide 35

Slide 35 text

KStreams & KSQL have different positioning:
• KStreams is a library for dataflow programming:
  • App logic lives in the stream processor and can use state stores
  • Statefulness is limited by operational constraints
• KSQL is a ‘database’ for event preparation:
  • App logic is a separate process (it can’t use state stores)
  • Statefulness is unlimited, like a DB
  • The app uses a consumer in any language

Slide 36

Slide 36 text

This difference makes most sense if we look to the future.

Slide 37

Slide 37 text

Cloud & Serverless

Slide 38

Slide 38 text

Thesis
• Serverless provides real-time, event-driven infrastructure and compute.
• A stream processor provides the corollary: a database-equivalent for real-time, event-driven data.

Slide 39

Slide 39 text

Using FaaS
• Write a function
• Upload it
• Configure a trigger (HTTP, Event, Object Store, Database, Timer, etc.)
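A minimal sketch of the “write a function” step, assuming a handler shaped loosely like an AWS Lambda Python handler behind an HTTP trigger (`event` and `context` arguments, a `statusCode`/`body` response). Uploading and trigger configuration are platform-specific and not shown; the function is invoked locally here with a hand-built event.

```python
import json
from typing import Any, Dict, Optional


def handler(event: Dict[str, Any], context: Optional[Any] = None) -> Dict[str, Any]:
    """Stateless: the response depends only on the incoming event."""
    # Assumed event shape: the HTTP body arrives as a JSON string under "body".
    body = json.loads(event.get("body") or "{}")
    name = body.get("name", "world")
    return {"statusCode": 200, "body": json.dumps({"greeting": f"hello {name}"})}


# Local invocation with a hand-built event; no platform involved.
response = handler({"body": json.dumps({"name": "kafka"})})
print(response["statusCode"], json.loads(response["body"]))
```

Keeping the handler free of stored state is what lets the platform start, stop and scale copies of it freely.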

Slide 40

Slide 40 text

FaaS in a Nutshell
• Fully managed (runs in a container pool)
• Cold starts can be (very) slow: 100 ms to 45 s (AWS: 250 ms to 7 s)
• Pay for execution time (not resources used)
• Auto-scales with load
• 0 to 1000+ concurrent functions
• Event driven
• Stateless
• Short lived (limit 5 to 15 mins)
• Weak ordering guarantees

Slide 41

Slide 41 text

Where is FaaS useful?
• Spiky workloads
• Use cases that don’t typically warrant massive parallelism, e.g. CI systems
• A general-purpose programming paradigm?

Slide 42

Slide 42 text

But there are open questions

Slide 43

Slide 43 text

Serverless Developer Ecosystem (currently quite poor)
• Runtime diagnostics
• Monitoring
• Deploy loop
• Testing
• IDE integration

Slide 44

Slide 44 text

Serverless programming will likely become prevalent.
[Chart: Amazon, Google and Microsoft plotted on a scale from “Harder than current approaches” to “Easier than current approaches”]

Slide 45

Slide 45 text

In the future it seems unlikely we’ll manage our own infrastructure.

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

Event-Streaming approaches this from a different angle

Slide 48

Slide 48 text

FaaS is event-driven, but it isn’t streaming.

Slide 49

Slide 49 text

Serverless functions handle only one event source: Customers, Orders and Payments each feed their own FaaS/μS, so combining them is complex, with timing issues and scaling limits.

Slide 50

Slide 50 text

KSQL simplifies these issues by pre-preparing events from different sources (Orders, Payments and a Customers table) into one event stream. The app sends SQL to KSQL, and its own logic runs on the other side of a process boundary:
CREATE STREAM order-payments AS SELECT * FROM orders, payments, customers LEFT JOIN…
Each output event combines an Order, a Payment and a Customer.

Slide 51

Slide 51 text

KSQL prepares data so, when a function is called, a single event has all the data that function needs.
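A sketch of that split, assuming hypothetical `Order`, `Payment` and `Customer` types: `prepare` plays the stateful KSQL role (a left join against a customers table, assuming the order and payment were already matched by order id), and `handle` plays the stateless function, which gets everything it needs from the single event. This mimics the idea in plain Python; it is not KSQL.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Order:
    order_id: str
    customer_id: str
    amount: float


@dataclass(frozen=True)
class Payment:
    order_id: str
    status: str


@dataclass(frozen=True)
class Customer:
    customer_id: str
    email: str


@dataclass(frozen=True)
class OrderPayment:
    """One pre-joined event carrying everything the function needs."""
    order: Order
    payment: Payment
    customer: Optional[Customer]  # LEFT JOIN semantics: may be missing


def prepare(order: Order, payment: Payment,
            customers: Dict[str, Customer]) -> OrderPayment:
    """Stateful step (the KSQL role in this sketch): enrich from the customers table."""
    if order.order_id != payment.order_id:
        raise ValueError("order and payment must already be matched by order_id")
    return OrderPayment(order, payment, customers.get(order.customer_id))


def handle(event: OrderPayment) -> str:
    """Stateless step (the FaaS role): no lookups, no stored state."""
    if event.payment.status != "PAID":
        return f"skip {event.order.order_id}"
    to = event.customer.email if event.customer else "unknown"
    return f"email {to}: order {event.order.order_id} confirmed"


customers = {"c1": Customer("c1", "ann@example.com")}
event = prepare(Order("o1", "c1", 20.0), Payment("o1", "PAID"), customers)
print(handle(event))  # email ann@example.com: order o1 confirmed
```

Because `handle` touches nothing but its argument, any number of copies can run in parallel.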

Slide 52

Slide 52 text

KSQL also separates stateful operations from event-driven application logic

Slide 53

Slide 53 text

KSQL as a “Data Layer” for Serverless Functions
• STATEFUL: KSQL filters, transforms, joins and summarizes Orders, Payments and the Customers table
• STATELESS: FaaS functions are fully elastic and autoscale with load

Slide 54

Slide 54 text

Familiar: a stateful layer feeding a stateless layer.

Slide 55

Slide 55 text

A traditional application pairs the application with a database. An event-driven application pairs KSQL, as the stateful streaming data layer, with FaaS functions as the stateless compute layer, giving massive linear scalability with elasticity.

Slide 56

Slide 56 text

No content

Slide 57

Slide 57 text

Think Event-Driven: use stream processors to make the consumption of events both simple and scalable.

Slide 58

Slide 58 text

Summary
• Events underpin the storage models of truthful/factful architectures.
• Event sourcing is most useful when it embraces events as data-in-flight.
• A stream processor provides a database-like equivalent for real-time, event-driven data.
• Serverless provides the corollary: real-time, event-driven infrastructure and compute.

Slide 59

Slide 59 text

Things I didn’t tell you 1/2
• Tools like KSQL provide data provisioning, not state mutation.
  • Good for offline services & data pipelines
  • Not good for CRUD (but it’s OK to mix and match)
• Kafka’s serverless integration is in its early stages.
  • Existing connector for Kafka (limited functionality)
  • Confluent connector coming
• Can KSQL handle large state?
  • Unintended rebalances can stall processing
  • Static membership (KIP-345): name the list of stream processors
  • Increase the timeout for rebalance after node removal (group.max.session.timeout.ms)
  • Worst-case reload: RocksDB at ~GbE speed

Slide 60

Slide 60 text

Things I didn’t tell you 2/2
• Can Kafka be used for long-term storage?
  • Log files are immutable once they roll (unless compacted)
  • Jun spent a decade working on DB2
• Careful:
  • Historical reads can stall real-time requests (cached)
  • ZFS has several page cache optimizations
  • Tiered storage will help

Slide 61

Slide 61 text

Find out More
• Peeking Behind the Curtains of Serverless Platforms, Wang et al.
• Cloud Programming Simplified: A Berkeley View on Serverless Computing
• Neil Avery, Journey to Event Driven, Part 3: The Affinity Between Events, Streams and Serverless
• Designing Event Driven Systems, Ben Stopford

Slide 62

Slide 62 text

Thank you
@benstopford
Book: https://www.confluent.io/designing-event-driven-systems
GitHub: http://bit.ly/kafka-microservice-examples (an example ecosystem built with streams; includes KSQL, Control Centre, Elastic, etc.)