Slide 1

Slide 1 text

Stream Processing for the Serverless Generation
Ben Stopford, Office of the CTO, Confluent

Slide 2

Slide 2 text

When it comes to data, we tend to think in databases

Slide 3

Slide 3 text

Evolution of software systems (increasing complexity): Monolith -> Distributed Monolith -> Microservices -> Event-Driven Microservices
[Diagram: apps, search, NoSQL, monitoring, security, DWH and Hadoop systems connected through a streaming platform (Kafka)]

Slide 4

Slide 4 text

Backend services grow faster than the front end
Tightly coupled vs. loosely coupled
e.g. Netflix have ~400 backend microservices fed by Kafka
[Diagram: many backend apps, search, NoSQL, monitoring, security, DWH and Hadoop systems connected through a streaming platform]

Slide 5

Slide 5 text

In the serverless world, which is inherently event driven, stream processors will become as important as databases are today.

Slide 6

Slide 6 text

Streaming Platform
[Diagram: apps, search, monitoring, NoSQL and DWH systems acting as PRODUCERs and CONSUMERs around a streaming platform]

Slide 7

Slide 7 text

• Event Storage: Kafka stores petabytes of data
• Stream Processing: real-time processing over streams and tables
• Scalability: clusters of hundreds of machines. Global.
Roots in big data messaging

Slide 8

Slide 8 text

Are Serverless Functions and Stream Processors related? They are both functions we define that are triggered by streams of events.

Slide 9

Slide 9 text

FaaS in brief • Write a function • Upload • Configure a trigger (HTTP, Event, Object Store, Database, Timer etc.)

Slide 11

Slide 11 text

FaaS in a Nutshell • Fully managed (runs in a container) • Pay as you use • Auto-scales with load: ~0-1000 concurrent functions • Short lived (max ~5 mins) • Weak ordering guarantees • Cold starts can be slow: 100ms to 45s (AWS: 250ms to 7s)

Slide 12

Slide 12 text

Where is FaaS useful? • Interesting for spiky workloads (i.e. extremities) • Grid compute: HPC, Genomics, Finance • Interesting for use cases that wouldn’t typically warrant the cost of conventional massive parallelism, e.g. CI systems • Serverless programming model

Slide 13

Slide 13 text

But there are open questions

Slide 14

Slide 14 text

Serverless Developer Ecosystem • Runtime diagnostics • Monitoring • Deploy loop • Testing • IDE integration Currently quite poor

Slide 15

Slide 15 text

[Chart: scale from "Harder than current approaches" to "Easier than current approaches"; labels: Amazon, Google, Microsoft]

Slide 16

Slide 16 text

We’ll come back to this one!

Slide 17

Slide 17 text

FaaS is event-driven, but it isn’t streaming

Slide 18

Slide 18 text

Simple online retail example When the order is created ..and the payment has been completed => get the customer’s info and send them an email confirming the purchase.

Slide 19

Slide 19 text

Serverless Way: event-driven (not streaming)
[Diagram: Orders, Customers, Payments; three FaaS functions; stores labelled "All Customer data" and "All Payment data"]
- Too slow for high velocity use cases: ~5-10 messages per second
- Correctness: what if the payment isn’t there when the order arrives?

Slide 20

Slide 20 text

Event-Streaming Platforms sew these operations together
[Diagram: Orders and Payments streams plus a Customers table (fed by the Customers stream) inside a single KStreams process boundary; stateful or stateless]
• No network calls.
• 50,000-100,000 messages per second, per thread
• Better correctness

Slide 21

Slide 21 text

Event Driven vs Stream Processing

Slide 22

Slide 22 text

Stream processors can be considered the databases of the event driven world

Slide 23

Slide 23 text

A little detail

Slide 24

Slide 24 text

Three key features
• Stream-stream join (combine in real-time)
  • Unlike a database join, you only need to consider how late data might be
• Stream-table join (enrich)
  • More like a database join on one side only
• Aggregate (summarize)
  • Big data sets are too large to consume in raw form

Slide 25

Slide 25 text

Join events that happened recently Stream-Stream Join

Slide 26

Slide 26 text

KSQL Joining two streams orders.join(payments) Bob’s Order Bob’s Payment Jill’s Payment Jill’s Order Orders Payments

Slide 27

Slide 27 text

KSQL Join is on the key (messages have keys in Kafka) orders.join(payments) Bob’s Order Bob’s Payment Jill’s Payment Jill’s Order

Slide 28

Slide 28 text

KSQL Joining Two Streams: Streaming systems don’t know when data is going to arrive orders.join(payments) Bob’s Order Bob’s Payment Jill’s Payment Jill’s Order

Slide 29

Slide 29 text

KSQL Bob’s payment arrives – nothing to join with orders.join(payments) Bob’s Payment Bob’s Order Jill’s Payment Jill’s Order

Slide 30

Slide 30 text

KSQL Message gets buffered Key-value store Bob’s Order Bob’s Payment Jill’s Payment Jill’s Order

Slide 31

Slide 31 text

KSQL Jill’s order arrives and gets buffered Key-value store Bob’s Order Jill’s Payment Jill’s Order Bob’s Payment

Slide 32

Slide 32 text

KSQL Another non-matching record is buffered Key-value store Bob’s Order Jill’s Payment Jill’s Order Bob’s Payment

Slide 33

Slide 33 text

KSQL MATCH - based on key comparison Bob’s Order Jill’s Payment Jill’s Order Bob’s Payment

Slide 34

Slide 34 text

KSQL MATCH – Create output event Bob’s Order Jill’s Payment Jill’s Order Bob’s Payment

Slide 35

Slide 35 text

KSQL Continue Bob’s Order Jill’s Payment Jill’s Order Bob’s Payment

Slide 36

Slide 36 text

KSQL 2nd MATCH – Create another output event Bob’s Order Jill’s Payment Jill’s Order Bob’s Payment

Slide 37

Slide 37 text

KSQL 2nd MATCH – Create another output event Jill’s Payment Jill’s Order Bob’s Payment Bob’s Order
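
The buffering walk-through on the preceding slides can be sketched in plain Java. This is a minimal illustration, not the Kafka Streams or KSQL implementation: the `Event` class, the `join` method and the output format are hypothetical names chosen for the example, a buffered record is dropped once it matches, and there is no time window to expire late data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: plain Java, not the Kafka Streams or KSQL API.
class StreamStreamJoinSketch {

    // Hypothetical event shape: source stream, join key, payload.
    static final class Event {
        final String stream; // "orders" or "payments"
        final String key;    // messages have keys in Kafka; the join is on this key
        final String value;

        Event(String stream, String key, String value) {
            this.stream = stream;
            this.key = key;
            this.value = value;
        }
    }

    // Processes events in arrival order and returns the joined output events.
    static List<String> join(List<Event> arrivals) {
        // One key-value buffer per input stream (assumption: at most one pending record per key).
        Map<String, String> bufferedOrders = new HashMap<>();
        Map<String, String> bufferedPayments = new HashMap<>();
        List<String> output = new ArrayList<>();
        for (Event e : arrivals) {
            boolean isOrder = e.stream.equals("orders");
            Map<String, String> ownBuffer = isOrder ? bufferedOrders : bufferedPayments;
            Map<String, String> otherBuffer = isOrder ? bufferedPayments : bufferedOrders;
            String match = otherBuffer.remove(e.key);
            if (match == null) {
                ownBuffer.put(e.key, e.value); // nothing to join with yet: buffer the record
            } else {
                String order = isOrder ? e.value : match;
                String payment = isOrder ? match : e.value;
                output.add(order + " + " + payment); // MATCH: create an output event
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // Same arrival order as the slides: Bob's payment first, then Jill's order.
        List<Event> arrivals = Arrays.asList(
            new Event("payments", "bob", "Bob's Payment"),
            new Event("orders", "jill", "Jill's Order"),
            new Event("orders", "bob", "Bob's Order"),
            new Event("payments", "jill", "Jill's Payment"));
        join(arrivals).forEach(System.out::println);
    }
}
```

A real stream processor bounds these buffers with a join window, which is why the state of a stream-stream join grows with how late data is allowed to be.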

Slide 38

Slide 38 text

Enrichment of event stream using a table Stream-Table join

Slide 39

Slide 39 text

KSQL Join a Stream with a Table Customers Orders Query Cust1 Table of Customers

Slide 40

Slide 40 text

First we need some source: Database, Stream Processor etc. Data is now saved in a topic in Kafka (event storage)
[Diagram: apps, search and NoSQL systems around a streaming platform holding the Customers topic]

Slide 41

Slide 41 text

KSQL: a kind of data virtualization. The Table of Customers is built from the Customers topic in event storage: 1. Reload 2. Keep up to date
[Diagram: apps, search and NoSQL systems around a streaming platform holding the Customers topic]
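
A stream-table join can be sketched the same way. In this minimal plain-Java illustration, the table is rebuilt by replaying a changelog of customer updates (the "reload" step) and each order is then enriched by a local lookup. The class, method and field names are hypothetical, and the example uses inner-join semantics as an assumption: orders whose customer is unknown are dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: plain Java, not the Kafka Streams or KSQL API.
class StreamTableJoinSketch {

    // Hypothetical record shapes.
    static final class CustomerUpdate {
        final String customerId;
        final String email;

        CustomerUpdate(String customerId, String email) {
            this.customerId = customerId;
            this.email = email;
        }
    }

    static final class Order {
        final String orderId;
        final String customerId;

        Order(String orderId, String customerId) {
            this.orderId = orderId;
            this.customerId = customerId;
        }
    }

    // Rebuilds the table by replaying the changelog; later updates overwrite earlier ones.
    static Map<String, String> materialize(List<CustomerUpdate> changelog) {
        Map<String, String> table = new HashMap<>();
        for (CustomerUpdate update : changelog) {
            table.put(update.customerId, update.email);
        }
        return table;
    }

    // Enriches each order with a local table lookup, so no network call is needed per event.
    static List<String> enrich(List<Order> orders, Map<String, String> customers) {
        List<String> enriched = new ArrayList<>();
        for (Order order : orders) {
            String email = customers.get(order.customerId);
            if (email != null) { // inner join: skip orders with no matching customer
                enriched.add(order.orderId + " -> " + email);
            }
        }
        return enriched;
    }

    public static void main(String[] args) {
        Map<String, String> customers = materialize(Arrays.asList(
            new CustomerUpdate("cust1", "bob@old.example"),
            new CustomerUpdate("cust1", "bob@new.example"),
            new CustomerUpdate("cust2", "jill@mail.example")));
        List<Order> orders = Arrays.asList(
            new Order("order1", "cust1"),
            new Order("order2", "cust3"));
        enrich(orders, customers).forEach(System.out::println);
    }
}
```

Keeping the table up to date means applying each new changelog record to the same map as it arrives, so the state held is proportional to the table size.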

Slide 42

Slide 42 text

Summarizing Event Streams Filters and Aggregations

Slide 43

Slide 43 text

Summarizing data streams (Event Sourced) (Read/Write) KV Store Event Storage Current state is streamed to Kafka KStreams Payments User-> Balance

Slide 44

Slide 44 text

Calc Balance Payments Event Storage User Balance Bob $500 Sally $250 George $32 Sum payments to get the user’s account balance SELECT user, sum(amount) FROM payments GROUP BY user;
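
The GROUP BY above amounts to keeping one running total per user. A minimal plain-Java sketch of that idea follows; the `Payment` class and `sumByUser` method are hypothetical names, amounts are whole dollars for simplicity, and a real stream processor would also emit each updated balance downstream as it changes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: plain Java, not the Kafka Streams or KSQL API.
class BalanceAggregateSketch {

    // Hypothetical payment event: who was paid and how much (whole dollars).
    static final class Payment {
        final String user;
        final long amount;

        Payment(String user, long amount) {
            this.user = user;
            this.amount = amount;
        }
    }

    // Folds the payment stream into a user -> balance table.
    // The state held is one entry per distinct user (the aggregation key).
    static Map<String, Long> sumByUser(List<Payment> payments) {
        Map<String, Long> balances = new TreeMap<>();
        for (Payment payment : payments) {
            balances.merge(payment.user, payment.amount, Long::sum);
        }
        return balances;
    }

    public static void main(String[] args) {
        Map<String, Long> balances = sumByUser(Arrays.asList(
            new Payment("Bob", 300),
            new Payment("Sally", 250),
            new Payment("Bob", 200),
            new Payment("George", 32)));
        System.out.println(balances);
    }
}
```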

Slide 45

Slide 45 text

Summarizing data streams (Windowed) (Read/Write) KV Store Event Storage Current state is streamed to Kafka KStreams Page Views 1 minute window Page Views per min
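
A windowed aggregate adds the window to the grouping key. The plain-Java sketch below counts page views per page per one-minute tumbling window. It is an illustration under assumptions: `PageView`, `countPerMinute` and the `page@windowStart` key format are hypothetical, timestamps are non-negative epoch milliseconds, and late or out-of-order data is not treated specially.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: plain Java, not the Kafka Streams or KSQL API.
class PageViewWindowSketch {

    static final long WINDOW_MS = 60_000L; // 1 minute tumbling window

    // Hypothetical page view event with an event-time timestamp in milliseconds.
    static final class PageView {
        final String page;
        final long timestampMs;

        PageView(String page, long timestampMs) {
            this.page = page;
            this.timestampMs = timestampMs;
        }
    }

    // Counts views per (page, window start). The window start is the timestamp
    // rounded down to the nearest minute, so windows never overlap.
    static Map<String, Long> countPerMinute(List<PageView> views) {
        Map<String, Long> counts = new TreeMap<>();
        for (PageView view : views) {
            long windowStart = view.timestampMs - (view.timestampMs % WINDOW_MS);
            counts.merge(view.page + "@" + windowStart, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countPerMinute(Arrays.asList(
            new PageView("/home", 0L),
            new PageView("/home", 59_999L),
            new PageView("/home", 60_000L),
            new PageView("/cart", 61_000L)));
        System.out.println(counts);
    }
}
```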

Slide 46

Slide 46 text

Three key features • The stream-stream join (combine in real-time) • The stream-table join (enrich) • The aggregate (summarize) (lots more: transactions, chained operations, queryable state etc.)

Slide 47

Slide 47 text

The operations are stateful to different degrees • The stream-stream join (state ∝ buffer) • The stream-table join (state ∝ table size) • The aggregate (state ∝ cardinality of aggregation key)

Slide 48

Slide 48 text

Two modes of operation

Slide 49

Slide 49 text

API-based (most common today)

    orders
        .filter((id, order) -> order.state().equals("CREATED"))
        .join(payments)
        .transform(MyEmailer::new, STORE)
        .to("sent-emails");

JVM only. Similar to Flink, Storm, Samza, Spark etc.

Slide 50

Slide 50 text

Process boundary Orders Payments KStreams Customers Table Customers Use the API Business logic

Slide 51

Slide 51 text

Orders Payments KStreams Customers Table Customers Is mixing state and business logic a good idea? Avoid Being too Stateful

Slide 52

Slide 52 text

Use KSQL

    CREATE STREAM order_payments AS
      SELECT *
      FROM orders
      LEFT JOIN payments WITHIN 1 HOUR
        ON orders.orderId = payments.orderId
      WHERE orders.state = 'CREATED';

(KSQL requires a WITHIN window on stream-stream joins; 1 HOUR is an illustrative value.)

Slide 53

Slide 53 text

Event Storage + Messaging Event Storage, query layer, App KSQL KSQL KSQL Query Layer Apps Search Business Logic Joined, enriched, summarized event stream

Slide 54

Slide 54 text

Orders Payments KSQL Customers Table Customers Use KSQL Server (or Cloud Service) My code SELECT * FROM orders, payments, customers WHERE … Joined, enriched event stream Business Logic

Slide 55

Slide 55 text

Stateless layer scales easily KSQL Stateful Data Layer Stateless Application layer (any language) Scales quickly Event Storage Denormalized Events Three event streams from different event sources

Slide 56

Slide 56 text

Pattern Should be Familiar
[Diagram: stateless apps, search and monitoring systems on top of a stateful streaming platform]

Slide 57

Slide 57 text

Comparing back to Serverless Functions
FaaS
- Autoscale based on demand
- Pay as you use
- Simple programming model
- Stateless
- High latency
- High throughput (if batched)
- One event source
Stream Processors
- Stateful & stateless operations at high throughputs
- Join different event sources / enrich
- Correctness even after failure
- Rich semantics for dataflow programming
- Effectively infinite storage in Kafka
- Doesn’t autoscale (scale manually / programmatically)
- More complex

Slide 58

Slide 58 text

They complement each other: stream processors act as a “data layer” for FaaS
[Diagram: stateful layer: KSQL combines Orders, Payments and the Customers table (transaction); stateless layer: FaaS functions fed through the AWS Lambda Connector]
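
With the joins done in the data layer, the function itself can stay stateless. The sketch below shows what such a function might look like for the order confirmation example, in plain Java. It assumes a hypothetical denormalized event (`OrderConfirmed`) that already carries the order, payment and customer fields. It is not tied to any particular FaaS provider's handler interface.

```java
// Illustrative sketch only: a stateless function over an already-joined event.
class ConfirmationEmailFunction {

    // Hypothetical denormalized event produced by the stateful data layer.
    static final class OrderConfirmed {
        final String orderId;
        final String customerName;
        final String customerEmail;
        final String amountPaid;

        OrderConfirmed(String orderId, String customerName, String customerEmail, String amountPaid) {
            this.orderId = orderId;
            this.customerName = customerName;
            this.customerEmail = customerEmail;
            this.amountPaid = amountPaid;
        }
    }

    // Stateless: the result depends only on the input event, so no lookups
    // against customer or payment stores are needed inside the function.
    static String handle(OrderConfirmed event) {
        return "To: " + event.customerEmail + "\n"
            + "Hi " + event.customerName + ", we received your payment of "
            + event.amountPaid + " for order " + event.orderId + ".";
    }

    public static void main(String[] args) {
        System.out.println(handle(
            new OrderConfirmed("order1", "Bob", "bob@mail.example", "$500")));
    }
}
```

Because every invocation is independent, many copies of the function can run concurrently without coordination, which is what lets the stateless layer scale quickly.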

Slide 59

Slide 59 text

Broader pattern (easier to consume, keep apps stateless)
[Diagram: Orders Service, Payment Service and Customer Service emit Order, Payment and Customer events; the streaming platform serves denormalized events to apps, search and NoSQL stores]
Most languages supported

Slide 60

Slide 60 text

Event storage + Stream Processing make data self service (real time & historical)
[Diagram: event streams (Orders, Payments, Customers, Distinct Visits) listed with Msgs/Day and History (1w, All); select organizational events, apply stream processing, choose a destination (C*, Postgres, Lambda, other Kafka)]

    SELECT * FROM ORDERS O, CUSTOMERS C WHERE O.REGION = 'EU' AND C.TYPE = 'Platinum'

Slide 61

Slide 61 text

Streaming Platforms apply these patterns across ecosystems: Event Streaming Platform (Storage + Stream Processing)

Slide 62

Slide 62 text

The Future It’s an evolving field

Slide 63

Slide 63 text

FaaS FaaS FaaS Orders Payments KSQL Customers Table Customers Connector FaaS FaaS • Ordering (by partition) • Batching

Slide 64

Slide 64 text

Maybe: inherit Kafka’s transactional guarantees
[Diagram: stateful layer: KSQL over Orders, Payments and the Customers table (transaction); stateless layer: FaaS functions]

Slide 65

Slide 65 text

In Summary • Stream processors can operate like databases for this event driven world. • FaaS is one of many “end points” • FaaS has unique properties: • Pay as you use • Load driven autoscaling • Programming model •Trick: split application logic from data preparation & adopt event-first model.

Slide 66

Slide 66 text

In this increasingly event driven world stream processors become as important as databases are today.

Slide 67

Slide 67 text

CRUD: Application + Database
Event-Driven / Event-Streaming: stateless compute layer (FaaS) + stateful data layer (KSQL)
Massive linear scalability with elasticity

Slide 69

Slide 69 text

Streaming platforms provide a unique alternative.
[Diagram: Billing, Shipping, Fraud and Fulfilment services connected through a streaming platform]

Slide 70

Slide 70 text

Thank you @benstopford Book: https://www.confluent.io/designing-event-driven-systems

Slide 71

Slide 71 text

Rate today’s session Session page on oreillysacon.com/ny O’Reilly Events App