Event Sourcing, Stream Processing and Serverless (Kafka Summit SF19)

benstopford
October 01, 2019

In this talk we’ll look at the relationship between three of the most disruptive software engineering paradigms: event sourcing, stream processing and serverless. We’ll debunk some of the myths around event sourcing. We’ll look at the inevitability of event-driven programming in the serverless space and we’ll see how stream processing links these two concepts together with a single ‘database for events’. As the story unfolds we’ll dive into some use cases, examine the practicalities of each approach, particularly the stateful elements, and finally extrapolate how their future relationship is likely to unfold. Key takeaways include: The different flavors of event sourcing and where their value lies. The difference between stream processing at application- and infrastructure-levels. The relationship between stream processors and serverless functions. The practical limits of storing data in Kafka and stream processors like KSQL.


Transcript

  1. Event Sourcing, Stream Processing & Serverless
    Ben Stopford, Office of the CTO, Confluent
  2. What we’re going to talk about
    • Event Sourcing
      • What is it and how does it relate to Event Streaming?
    • Stream Processing as a kind of “Database”
      • What does this mean?
    • Serverless Functions
      • How do they relate?
  3. Can you do event sourcing with Kafka?

  4. Traditional Event Sourcing

  5. Popular example: Shopping Cart
    Table matches what the user sees.
    [Diagram: Apps, Search, Database]
  6. Event Sourcing stores events, then derives the ‘current state view’
    [Diagram: a time series of user-activity events (12.42, 12.44, 12.49, 12.50, 12.59) is chronologically reduced to derive the current state]
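The chronological reduce on this slide can be sketched as a fold over the event list. This is a minimal illustration, not the speaker's code: the event shape `(timestamp, action, item)` and the item names are assumptions; only the timestamps come from the slide.

```python
from functools import reduce

# Hypothetical cart events: (timestamp, action, item).
# Timestamps are fixed-width strings, so they sort lexicographically.
events = [
    ("12.42", "add", "socks"),
    ("12.44", "add", "shirt"),
    ("12.49", "remove", "socks"),
    ("12.50", "add", "hat"),
    ("12.59", "add", "socks"),
]

def apply(cart, event):
    """Return the cart after applying one event, without mutating the input."""
    _, action, item = event
    if action == "add":
        return cart | {item}
    if action == "remove":
        return cart - {item}
    return cart  # unknown event types are ignored

def current_state(events):
    # Sort to guarantee chronological order, then fold from an empty cart.
    return reduce(apply, sorted(events, key=lambda e: e[0]), frozenset())

cart = current_state(events)
```

Because the events are never modified, the state at any earlier moment can be derived the same way by reducing a prefix of the list.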
  7. Traditional Event Sourcing (Store immutable events in a database in time order)
    [Diagram: Apps persist events to a table of events]
  8. Traditional Event Sourcing (Read)
    [Diagram: Apps query by customer Id (+session?) and perform a chronological reduce on read (done inside the app)]
    - No schema migration
    - Similar to ‘schema on read’
  9. 3 Benefits

  10. Evidentiary: Accountants don’t use erasers (e.g. audit, ledger, git)

  11. Replayability: Recover corrupted data after a programmatic bug

  12. Analytics: Keep the data needed to extract trends and behaviors, i.e. non-lossy (e.g. insight, metrics, ML)
  13. Traditional Event Sourcing
    • Use a database (any one will do)
    • Create a table and insert events as they occur
    • Query all the events associated with your problem*
    • Reduce them chronologically to get the current state
    *Aggregate ID in DDD parlance
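The four steps above can be sketched with SQLite from the Python standard library. The table layout, the `delta` field and the customer names are assumptions made for illustration; the slide only says that any database will do.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE events ("
    " seq INTEGER PRIMARY KEY AUTOINCREMENT,"  # insertion order doubles as time order
    " customer_id TEXT NOT NULL,"              # the aggregate ID
    " payload TEXT NOT NULL)"
)

def append(customer_id, event):
    """Insert an event as it occurs; existing rows are never updated."""
    db.execute(
        "INSERT INTO events (customer_id, payload) VALUES (?, ?)",
        (customer_id, json.dumps(event)),
    )

def load_cart(customer_id):
    """Query one customer's events and reduce them chronologically."""
    rows = db.execute(
        "SELECT payload FROM events WHERE customer_id = ? ORDER BY seq",
        (customer_id,),
    )
    cart = {}
    for (payload,) in rows:
        event = json.loads(payload)
        qty = cart.get(event["item"], 0) + event["delta"]
        if qty > 0:
            cart[event["item"]] = qty
        else:
            cart.pop(event["item"], None)
    return cart

append("alice", {"item": "socks", "delta": 2})
append("bob", {"item": "hat", "delta": 1})
append("alice", {"item": "shirt", "delta": 1})
append("alice", {"item": "socks", "delta": -2})
```

The `WHERE customer_id = ?` clause is the step that a database makes easy and that a plain log does not.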
  14. Traditional Event Sourcing with Kafka
    • Use Kafka (in place of a database)
    • Create a topic (in place of a table) and insert events as they occur
    • Query all the events associated with your problem*
    • Reduce them chronologically to get the current state
    *Aggregate ID in DDD parlance
  15. Confusion: You can’t query Kafka by, say, Customer Id*
    *Aggregate ID in DDD parlance
  16. If we can’t query by Customer ID then what do we do?
  17. CQRS is a tonic: Cache the projection in a ‘View’
    [Diagram: Events/Commands are written to the streaming platform; a Stream Processor builds a View (Cache/DB/KTable etc.) that apps query by customer Id]
    • Events are the Storage Model
    • Regenerate the view rather than doing schema migration
  18. CQRS provides the benefits of event sourcing using a “Materialized View”
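A minimal sketch of the materialized-view idea: the log is the storage model, and a view is produced by replaying it through a projection function. `EventLog` is an in-memory stand-in for a topic, and both projection functions are hypothetical; none of these names come from Kafka or the talk.

```python
class EventLog:
    """Append-only list standing in for a topic (an assumption for illustration)."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def read_from(self, offset=0):
        return self._events[offset:]

def build_view(log, project):
    """Replay the whole log through `project` to (re)generate a view."""
    view = {}
    for event in log.read_from(0):
        project(view, event)
    return view

def cart_items(view, event):
    # View v1: the set of items currently in each customer's cart.
    items = view.setdefault(event["customer_id"], set())
    if event["type"] == "added":
        items.add(event["item"])
    elif event["type"] == "removed":
        items.discard(event["item"])

def cart_activity(view, event):
    # View v2: a different shape (event count per customer), built by
    # replaying the same events rather than migrating v1's schema.
    view[event["customer_id"]] = view.get(event["customer_id"], 0) + 1

log = EventLog()
log.append({"customer_id": "c1", "type": "added", "item": "socks"})
log.append({"customer_id": "c2", "type": "added", "item": "hat"})
log.append({"customer_id": "c1", "type": "removed", "item": "socks"})
log.append({"customer_id": "c1", "type": "added", "item": "shirt"})

view_v1 = build_view(log, cart_items)
view_v2 = build_view(log, cart_activity)
```

Queries by customer Id go to the view (`view_v1["c1"]`), never to the log itself, which is how the view answers the question the log cannot.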
  19. Even with CQRS, Event Sourcing is Hard
    CQRS helps, but it’s still quite hard if you’re a CRUD app
  20. What’s the problem?
    Harder:
    • Eventually Consistent
    • Multi-model (Complexity ∝ #Schemas in the log)
    • More moving parts
    [Diagram: CRUD System compared with CQRS]
  21. New York Times Website
    • Source of Truth: every article since 1851
    • Normalized assets (images, articles, bylines, tags all separate messages)
    • Denormalized into “Content View”
    https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/
  22. If CRUD makes sense there are other ways: audit tables, CDC, etc.
    [Slide labels: Trigger, Evidentiary, Replayable, N/A to web app, Analytics, CDC]
  23. More advanced: Use a Bi-Temporal Database

  24. Events make most sense where data has to move

  25. This is where CQRS comes into its own!

  26. Online Transaction Processing: e.g. a Flight Booking System
    - Flight price served 10,000 x #bookings
    - Consistency required only at booking time
  27. CQRS with event movement
    [Diagram: Book Flight is a central write to the streaming platform; several Views each serve Get Flights as global reads]
  28. The exact same logic applies to microservices

  29. Event Sourcing for Microservices
    [Diagram: Basket Service, Fraud Service, Billing Service, Email Service; Basket Events]
  30. Event Sourcing for Microservices
    [Diagram: Basket Service, Fraud Service, Billing Service, Email Service; Basket Events]
    • Events are the storage model
    • Each microservice creates a view that suits its use case
  31. Event Sourcing “with a DB” for monoliths. Event Streaming for Microservices & Scale. (Often via CQRS)
  32. Event Streaming

  33. Event Streaming is a more general form of Event Sourcing/CQRS
    Event Streaming:
    • Events as shared data model
    • Many microservices
    • Polyglot persistence
    • Event-Driven processing
    Traditional Event Sourcing:
    • Events as a storage model
    • Single microservice
    • Single DB
    • Data-at-rest
  34. Event Streaming is about many event sources (Join, Filter, Transform and Summarize)
    [Diagram: Fraud Service, Orders Service, Payment Service, Customer Service; Event Log; Projection created in Kafka Streams API]
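The four operations named on this slide can be sketched over two small event sources. This is plain Python over in-memory lists, not the Kafka Streams API; a stream processor would do the same work incrementally as events arrive. The field names and values are assumptions.

```python
from collections import defaultdict

# Two hypothetical event sources.
orders = [
    {"order_id": 1, "customer": "c1", "amount_cents": 1250},
    {"order_id": 2, "customer": "c2", "amount_cents": 400},
    {"order_id": 3, "customer": "c1", "amount_cents": 9900},
]
payments = [
    {"order_id": 1, "status": "PAID"},
    {"order_id": 2, "status": "DECLINED"},
    {"order_id": 3, "status": "PAID"},
]

def paid_totals(orders, payments):
    """Total paid amount per customer, in dollars."""
    payment_by_order = {p["order_id"]: p for p in payments}
    totals = defaultdict(float)
    for order in orders:
        payment = payment_by_order.get(order["order_id"])   # join on order_id
        if payment is None or payment["status"] != "PAID":  # filter
            continue
        dollars = order["amount_cents"] / 100                # transform
        totals[order["customer"]] += dollars                 # summarize
    return dict(totals)

projection = paid_totals(orders, payments)
```

The result is a projection in the sense used on the slide: a derived shape that suits one consumer, computed from several sources.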
  35. KStreams & KSQL have different positioning
    • KStreams is a library for Dataflow programming:
      • App Logic & Stream Processor (including state) are combined
      • Apps are stateful
      • JVM only
    • KSQL is a ‘database’ for event preparation:
      • App sends SQL to a separate process
      • Apps are stateless
      • Connect from any language
  36. This difference makes most sense if we look to the future.
  37. Cloud & Serverless

  38. Thesis
    • Serverless provides event-driven infrastructure
    • KSQL is the corollary: an event-driven database
  39. Serverless Functions (FaaS)
    • Write a function
    • Upload
    • Configure a trigger (HTTP, Messaging, Object Store, Database, Timer etc.)
    [Diagram: Event Source, Request, Respond]
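As a rough illustration of the "write a function" step, the sketch below follows the `handler(event, context)` convention used by AWS Lambda's Python runtime. The event field `name` and the response shape are assumptions; the real payload depends on which trigger is configured.

```python
import json

def handler(event, context):
    """Stateless function: everything it needs arrives in `event`.

    `context` carries runtime metadata on a real platform; it is unused here.
    """
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello {name}"}),
    }

# Local invocation, standing in for the platform calling the function
# when the configured trigger fires.
response = handler({"name": "kafka"}, None)
```

Uploading the function and configuring the trigger are platform-specific steps and are not shown.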
  40. FaaS in a Nutshell
    • Fully managed (runs in a container pool)
    • Pay for execution time (not resources used)
    • Auto-scales with load
    • 0-1000+ concurrent functions
    • Stateless
    • Short lived (limit 5-15 mins)
    • Weak ordering guarantees
    • Cold starts can be (very) slow: 100ms to 45s (AWS: 250ms to 7s)
  41. Where is FaaS useful?
    • Spiky workloads and ‘occasional’ use cases
    • Use cases that don’t typically warrant massive parallelism, e.g. CI systems
    • General purpose programming paradigm?
  42. But there are open questions

  43. Serverless Developer Ecosystem: currently quite poor
    • Runtime diagnostics
    • Monitoring
    • Deploy loop
    • Testing
    • IDE integration
  44. Serverless programming will likely become prevalent
    [Diagram labels: Harder than current approaches; Easier than current approaches; Amazon, Google, Microsoft]
  45. In the future it seems unlikely we’ll manage our own infrastructure. But where will we manage our data?
  46. None
  47. Event-Streaming approaches this from a different angle

  48. FaaS is event-driven, but it isn’t streaming

  49. Serverless functions handle only one event source
    Complex, timing issues, scaling limits
    [Diagram: Customers, Orders and Payments event sources, each feeding a FaaS/μS]
  50. A slightly more complex example: Send email only to platinum customers
  51. Event is received by serverless function
    [Diagram: Payments Event Source feeding a FaaS/μS]

  52. Blocks and calls the database to get customer+order
    [Diagram: FaaS/μS issues Get customer and Get order]

  53. Is it a ‘Platinum’ customer?
    [Diagram: FaaS/μS issues Get customer and Get order, then checks whether the customer is platinum]

  54. Send email if ‘Platinum’
    [Diagram: FaaS/μS issues Get customer and Get order, then maybe sends email]

  55. Increase Load: 100 concurrent functions doing IO.
    [Diagram: Payments Event Source feeding many FaaS/μS instances]

  56. Only send 2 emails.
    [Diagram: Payments Event Source feeding many FaaS/μS instances]
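The walkthrough in slides 51 to 56 can be sketched as a function that performs two lookups for every payment event, whether or not an email is sent. `FakeDatabase`, the record shapes and the email addresses are hypothetical; the counter only exists to make the per-event IO visible.

```python
class FakeDatabase:
    """In-memory stand-in for a remote database; counts round trips."""

    def __init__(self, customers, orders):
        self.customers = customers
        self.orders = orders
        self.calls = 0

    def get_customer(self, customer_id):
        self.calls += 1
        return self.customers[customer_id]

    def get_order(self, order_id):
        self.calls += 1
        return self.orders[order_id]

def on_payment(payment, db, send_email):
    """Function body: block on two lookups, then decide."""
    customer = db.get_customer(payment["customer_id"])  # blocking IO
    order = db.get_order(payment["order_id"])           # blocking IO
    if customer["type"] == "PLATINUM":
        send_email(customer["email"], order)

db = FakeDatabase(
    customers={
        "c1": {"type": "PLATINUM", "email": "a@example.com"},
        "c2": {"type": "STANDARD", "email": "b@example.com"},
    },
    orders={"o1": {"total": 50}, "o2": {"total": 20}, "o3": {"total": 5}},
)
sent = []
payments = [
    {"customer_id": "c1", "order_id": "o1"},
    {"customer_id": "c2", "order_id": "o2"},
    {"customer_id": "c2", "order_id": "o3"},
]
for payment in payments:
    on_payment(payment, db, lambda to, order: sent.append((to, order)))
```

Three events cost six database calls and produce one email, which is the inefficiency the slides point at: the IO scales with the number of events, not with the number of emails.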
  57. KSQL simplifies
    [Diagram: App Logic sends SQL across the process boundary to KSQL, which reads Orders, Payments and the Customers Table and emits a combined Order, Payment, Customer event]
    CREATE STREAM foo AS SELECT * FROM orders, payments, customers LEFT JOIN… WHERE customer.type = ‘PLATINUM’
    - Handles timing issues
    - No “per-event” IO
    - Price efficient
  58. Functions have no additional data dependencies: Everything is in the event!
  59. Queries filter out the events you need (much like you filter rows in a database query)
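A minimal sketch of the split described in slides 57 to 59: a preparation step joins and filters up front, and the function receives a self-contained event. `prepare_platinum_events` is a plain Python stand-in for the continuous query, not KSQL itself, and it simply skips events whose customer or order is missing rather than handling timing issues as the slide says KSQL does. All names and record shapes are assumptions.

```python
def prepare_platinum_events(payments, orders, customers):
    """Stand-in for the stream processor: join, then filter, before any function runs."""
    for payment in payments:
        customer = customers.get(payment["customer_id"])
        order = orders.get(payment["order_id"])
        if customer is None or order is None:
            continue  # this sketch skips incomplete joins
        if customer["type"] == "PLATINUM":
            yield {"payment": payment, "order": order, "customer": customer}

def on_platinum_payment(event, send_email):
    """The function: no lookups, everything is in the event."""
    send_email(event["customer"]["email"], event["order"])

customers = {
    "c1": {"type": "PLATINUM", "email": "a@example.com"},
    "c2": {"type": "STANDARD", "email": "b@example.com"},
}
orders = {"o1": {"total": 50}, "o2": {"total": 20}, "o3": {"total": 5}}
payments = [
    {"customer_id": "c1", "order_id": "o1"},
    {"customer_id": "c2", "order_id": "o2"},
    {"customer_id": "c2", "order_id": "o3"},
]

prepared = list(prepare_platinum_events(payments, orders, customers))
sent = []
for event in prepared:
    on_platinum_payment(event, lambda to, order: sent.append((to, order)))
```

Only one function invocation happens for three payments, and it performs no IO of its own; the stateful work sits entirely in the preparation step.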
  60. KSQL as a “Database” for Event-Driven Infrastructure
    [Diagram: KSQL (stateful) prepares the events we need from Orders, Payments and the Customers Table; FaaS instances provide stateless, elastic compute that autoscales with load]
  61. Traditional Application vs. Event-Driven Application
    [Diagram: Traditional: Application on top of a Database. Event-Driven: stateless FaaS compute layer on top of a stateful, streaming KSQL data layer]
    Massive linear scalability with elasticity
  62. Event-Driven vs. Event Streaming
    • Multiple Event Sources. Event Driven: Use Database + ETL + Code. Event Streaming: Handles automatically.
    • Efficiency. Event Driven: Extract data from DB in the FaaS (IO). Event Streaming: Only the data you need.
    • Logic-driven data requests. Event Driven: Call DB from the FaaS (IO). Event Streaming: DB/KStreams, KSQL DB?
  63. None
  64. Event Streaming Platform

  65. Summary
    • Event Streaming provides the benefits of Event Sourcing to microservices and data pipelines.
      • Events are the data model.
      • Projections are the serving model: matching to each specific use case
      • Serving layer can be regenerated from the log (CQRS)
    • KSQL provides the same benefits for event-driven programs, e.g. preparing the event streams for each FaaS application’s specific needs
    • In serverless architectures this drives efficiency: a ‘database-equivalent’ for event-driven infrastructure.
  66. Can I Build This?
    [Diagram: FaaS (AWS Lambda / Azure Functions); Connectors (in Preview); Hosted KSQL (In Preview); Confluent Cloud]
  67. Things I didn’t tell you
    • Tools like KSQL provide data provisioning, not state mutation.
      • Use single writers. Try KSQL DB?
    • Can KSQL handle large state?
      • Unintended rebalance can stall processing
      • Static membership (KIP-345): name the list of stream processors
      • Increase the timeout for rebalance after node removal (group.max.session.timeout.ms)
      • Worst case reload: RocksDB ~GbE speed
    • Can Kafka be used for long term storage?
      • Log files are immutable once they roll (unless compacted)
      • Jun spent a decade working on DB2
      • Careful:
        • Historical reads can stall real-time requests (cached)
        • ZFS has several page cache optimizations
        • Tiered storage will help
  68. Find out More
    • Peeking Behind the Curtains of Serverless Platforms, Wang et al.
    • Cloud Programming Simplified: A Berkeley View on Serverless Compute
    • Neil Avery’s Journey to Event Driven Part 3. The Affinity Between Events, Streams and Serverless.
    • Designing Event Driven Systems, Ben Stopford
  69. Thank you
    @benstopford
    Book: https://www.confluent.io/designing-event-driven-systems