Event Sourcing, Stream Processing & Serverless

In this talk we’ll look at the relationship between three of the most disruptive software engineering paradigms: event sourcing, stream processing and serverless. We’ll debunk some of the myths around event sourcing. We’ll look at the inevitability of event-driven programming in the serverless space, and we’ll see how stream processing links these two concepts together with a single ‘database for events’. As the story unfolds we’ll dive into some use cases, examine the practicalities of each approach (particularly the stateful elements) and finally extrapolate how their future relationship is likely to unfold.

Key takeaways include:
• The different flavors of event sourcing and where their value lies.
• The difference between stream processing at application and infrastructure levels.
• The relationship between stream processors and serverless functions.
• The practical limits of storing data in Kafka and stream processors like KSQL.

benstopford

May 14, 2019

Transcript

  1. Event Sourcing, Stream Processing & Serverless Ben Stopford Office of

    the CTO, Confluent
  2. What we’re going to talk about • Event Sourcing •

    What is it, and how does it relate to Event Streaming? • Stream Processing as a kind of “Database” • What does this mean? • Serverless Functions • How does this relate?
  3. Can you do event sourcing with Kafka?

  4. Traditional Event Sourcing

  5. Popular example: Shopping Cart DB Apps Search Apps Apps Database

    Table matches what the user sees.
  6. Event Sourcing stores events, then derives the ‘current state view’

    Diagram labels: Event (12.42, 12.44, 12.49, 12.50, 12.59), Timeseries of user activity, Chronological Reduce, DERIVE, Apps.
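The "chronological reduce" on slide 6 can be sketched as a fold over time-ordered events. This is a minimal illustration only: the `CartEvent` type, the `add`/`remove` actions and the item names are hypothetical, and the timestamps simply reuse the values shown on the slide.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class CartEvent:
    timestamp: str  # e.g. "12.42", reusing the slide's values
    action: str     # "add" or "remove" (hypothetical event types)
    item: str


def apply(cart: dict, event: CartEvent) -> dict:
    """Fold one event into the cart and return the new state."""
    updated = dict(cart)
    if event.action == "add":
        updated[event.item] = updated.get(event.item, 0) + 1
    elif event.action == "remove":
        remaining = updated.get(event.item, 0) - 1
        if remaining > 0:
            updated[event.item] = remaining
        else:
            updated.pop(event.item, None)
    return updated


def current_state(events) -> dict:
    """Derive the 'current state view' by reducing events in time order."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    return reduce(apply, ordered, {})


events = [
    CartEvent("12.44", "add", "socks"),
    CartEvent("12.42", "add", "shoes"),
    CartEvent("12.49", "add", "hat"),
    CartEvent("12.50", "remove", "socks"),
    CartEvent("12.59", "add", "shoes"),
]
cart = current_state(events)
print(cart)
```

The events stay immutable; only the derived view changes, so the same log can be replayed later with a corrected `apply` function.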
  7. Traditional Event Sourcing (Store immutable events in a database in

    time order) Apps Search NoSQL Monitoring Security Apps Apps STREAMING PLATFORM Table of events Persist events Apps Apps
  8. Traditional Event Sourcing (Read) Apps Search NoSQL Monitoring Security Apps

    Apps STREAMING PLATFORM Apps Search Monitoring Apps Apps Chronological Reduce on read (done inside the app) Query by customer Id (+session?) - No schema migration - Similar to ‘schema on read’
  9. 3 Benefits

  10. Evidentiary Accountants don’t use erasers (e.g. audit, ledger, git)

  11. Replayability Recover corrupted data after a programmatic bug

  12. Analytics Keep the data needed to extract trends and behaviors

    i.e. non-lossy (e.g. insight, metrics, ML)
  13. Traditional Event Sourcing • Use a database (any one will

    do) • Create a table and insert events as they occur • Query all the events associated with your problem* • Reduce them chronologically to get the current state *Aggregate ID in DDD parlance
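The four steps on slide 13 can be sketched with any database; the example below uses an in-memory SQLite table. The table layout, the `append` and `load_state` helpers and the customer fields are all assumptions made for illustration, not something the talk prescribes.

```python
import json
import sqlite3

# Step 1: use a database (any one will do); here, in-memory SQLite.
conn = sqlite3.connect(":memory:")

# Step 2: create a table and insert events as they occur.
conn.execute(
    "CREATE TABLE events ("
    " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
    " aggregate_id TEXT NOT NULL,"
    " payload TEXT NOT NULL)"
)


def append(aggregate_id: str, event: dict) -> None:
    """Append an immutable event; rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO events (aggregate_id, payload) VALUES (?, ?)",
        (aggregate_id, json.dumps(event)),
    )


def load_state(aggregate_id: str) -> dict:
    """Steps 3 and 4: query one aggregate's events, then reduce them in order."""
    rows = conn.execute(
        "SELECT payload FROM events WHERE aggregate_id = ? ORDER BY seq",
        (aggregate_id,),
    )
    state: dict = {}
    for (payload,) in rows:
        state.update(json.loads(payload))  # later events override earlier fields
    return state


append("customer-1", {"email": "old@example.com"})
append("customer-2", {"email": "other@example.com"})
append("customer-1", {"email": "new@example.com", "tier": "gold"})

print(load_state("customer-1"))
```

The `WHERE aggregate_id = ?` filter is the step that a plain Kafka topic does not offer, which is the confusion slide 15 points at.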
  14. Traditional Event Sourcing with Kafka • Use a database Kafka

    • Create a table topic insert events as they occur • Query all the events associated with your problem* • Reduce them chronologically to get the current state *Aggregate ID in DDD parlance
  15. Confusion: You can’t query Kafka by, say, Customer Id* *Aggregate

    ID in DDD parlance
  16. Events are a good write model, but make a tricky

    read model
  17. CQRS is a tonic: Cache the projection in a ‘View’

    Apps Search Monitoring Apps Apps STREAMING PLATFORM Query by customer Id Apps Search NoSQL Apps Apps DWH Hadoop STREAMING PLATFORM View Events/Command Events accumulate in the log Stream Processor Cache/DB/KTable etc.
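The idea on slide 17, caching a projection of the log in a view that can be queried by customer ID, can be sketched in plain Python. The `OrdersByCustomerView` class, the event fields and the list standing in for the log are hypothetical; in the talk's architecture the view would be maintained by a stream processor and held in a cache, database or KTable.

```python
from collections import defaultdict


class OrdersByCustomerView:
    """Read-side projection: total order amount per customer."""

    def __init__(self) -> None:
        self._totals = defaultdict(int)
        self._offset = 0  # how much of the log has been applied so far

    def catch_up(self, log: list) -> None:
        """Apply any events appended since the last call."""
        for event in log[self._offset:]:
            self._totals[event["customer_id"]] += event["amount"]
        self._offset = len(log)

    def total_for(self, customer_id: str) -> int:
        """Query by customer ID without scanning the log."""
        return self._totals.get(customer_id, 0)


# Write side: events accumulate in an append-only log.
log = [
    {"customer_id": "c1", "amount": 30},
    {"customer_id": "c2", "amount": 5},
]

view = OrdersByCustomerView()
view.catch_up(log)

log.append({"customer_id": "c1", "amount": 12})
stale = view.total_for("c1")  # view has not seen the new event yet
view.catch_up(log)
fresh = view.total_for("c1")
print(stale, fresh)
```

The gap between `stale` and `fresh` is the eventual consistency that the deck lists as one of the costs of this pattern.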
  18. Even with CQRS, Event Sourcing is Hard CQRS helps, but

    it’s still quite hard if you’re a CRUD app
  19. What’s the problem? Harder: • Eventually Consistent • Multi-model (Complexity

    ∝ #Schemas in the log) • More moving parts Apps Search NoSQL Monitoring Security Apps Apps STREAMING PLATFORM CRUD System CQRS
  20. Eventual Consistency is often good for serving layers Source of

    Truth Every article since 1851 https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/ Normalized assets (images, articles, bylines, tags all separate messages) Denormalized into “Content View”
  21. If your system is both simple and transactional: stick with

    CRUD and an audit/history table. Diagram labels: Trigger, CDC. Evidentiary: Yes • Replayable: N/A to web app • Analytics: Yes
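A CRUD table with an audit/history table, as recommended on slide 21, might look like the sketch below. It assumes the slide's "Trigger" label refers to a database trigger that populates the history table; the `accounts` schema and the balance values are made up for illustration, and SQLite is used only because it runs in-process.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL
    );

    CREATE TABLE accounts_history (
        history_id INTEGER PRIMARY KEY AUTOINCREMENT,
        account_id INTEGER NOT NULL,
        old_balance INTEGER,
        new_balance INTEGER,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP
    );

    -- Every UPDATE on the CRUD table leaves a row in the history table.
    CREATE TRIGGER accounts_audit AFTER UPDATE ON accounts
    BEGIN
        INSERT INTO accounts_history (account_id, old_balance, new_balance)
        VALUES (OLD.id, OLD.balance, NEW.balance);
    END;
    """
)

# The application keeps doing ordinary CRUD.
conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
conn.execute("UPDATE accounts SET balance = 80 WHERE id = 1")
conn.execute("UPDATE accounts SET balance = 130 WHERE id = 1")

current = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
history = conn.execute(
    "SELECT old_balance, new_balance FROM accounts_history ORDER BY history_id"
).fetchall()
print(current, history)
```

The application reads the current row directly and stays transactionally consistent, while the history table supplies the evidentiary trail.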
  22. More advanced: Use a Bi-Temporal Database

  23. Use Traditional Event Sourcing judiciously, where it makes sense

  24. CQRS comes into its own when the events move data

  25. Online Transaction Processing: e.g. a Flight Booking System - Flight

    price served 10,000 x #bookings - Consistency required only at booking time
  26. CQRS with event movement Apps Search Monitoring Apps Apps

    STREAMING PLATFORM Apps Search NoSQL Apps Apps DWH Hadoop STREAMING PLATFORM View Book Flight Events accumulate in the log Apps Search Apps STREAMING PLATFORM View Apps Search NoSQL Apps DWH STREAMING PLATFORM View Get Flights Get Flights Get Flights Global Read Central Write
  27. The exact same logic applies to microservices

  28. Microservices Orders Service Fraud Service Billing Service Email Service Orders

  29. Fraud service doesn’t have to be consistent with the Orders

    service because it just creates new data (new events) Orders Service Fraud Service Billing Service Email Service Orders Consistent?
  30. Microservices Orders Service Fraud Service Billing Service Email Service Orders

    Start to build things “Event Driven”
  31. Event Streaming

  32. Event Streaming is a more general form of Event Sourcing/CQRS

    Event Streaming: • Events as shared data model • Many microservices • Polyglot persistence • Data-in-flight
    Traditional Event Sourcing: • Events as a storage model • Single microservice • Single DB • Data-at-rest
  33. Benefits of Event Streaming stand out where there are multiple

    data sources.
  34. Join, Filter, Transform and Summarize Events from Different Sources Fraud

    Service Orders Service Payment Service Customer Service Event Log Projection created in Kafka Streams API
  35. KStreams & KSQL have different positioning • KStreams is a

    library for Dataflow programming: • App logic lives in stream processor and can use state stores • Statefulness limited by operational constraints. • KSQL is a ‘database’ for event preparation: • App logic is a separate process (can’t use state stores) • Statefulness unlimited, like a DB. • App uses consumer in any language
  36. This difference makes most sense if we look to

    the future.
  37. Cloud & Serverless

  38. Thesis • Serverless provides real-time, event-driven infrastructure and compute. •

    A stream processor provides the corollary: a database-equivalent for real-time, event-driven data.
  39. Using FaaS • Write a function • Upload • Configure

    a trigger (HTTP, Event, Object Store, Database, Timer etc.)
  40. FaaS in a Nutshell • Fully managed (Runs in a

    container pool) • Cold starts can be (very) slow: 100ms – 45s (AWS 250ms-7s) • Pay for execution time (not resources used) • Auto-scales with load • 0-1000+ concurrent functions • Event driven • Stateless • Short lived (limit 5-15 mins) • Weak ordering guarantees
  41. Where is FaaS useful? • Spiky workloads • Use cases

    that don’t typically warrant massive parallelism e.g. CI systems. • General purpose programming paradigm?
  42. But there are open questions

  43. Serverless Developer Ecosystem • Runtime diagnostics • Monitoring • Deploy

    loop • Testing • IDE integration Currently quite poor
  44. Harder than current approaches Easier than current approaches Amazon Google

    Microsoft Serverless programming will likely become prevalent
  45. In the future it seems unlikely we’ll manage our own

    infrastructure.
  46. None
  47. Event-Streaming approaches this from a different angle

  48. FaaS is event-driven But it isn’t streaming

  49. Complex, Timing issues, Scaling limits Customers Event Source Orders Event

    Source Payments Event Source Serverless functions handle only one event source FaaS/μS FaaS/μS FaaS/μS
  50. Send SQL Process boundary Orders Payments KSQL Customers Table Customers

    KSQL simplifies these issues by pre-preparing events from different sources into one event stream App Logic CREATE STREAM order-payments AS SELECT * FROM orders, payments, customers LEFT JOIN… Order Payment Customer
  51. KSQL prepares data so, when a function is called, a

    single event has all the data that function needs.
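The division of labour described on slides 50 and 51 can be sketched in plain Python. Here `enrich` is only a stand-in for the join the stream processor would perform (it is not KSQL and uses no Kafka API), and `handle` plays the role of a stateless serverless function. All field names and sample records are hypothetical.

```python
def enrich(orders, payments, customers):
    """Stand-in for the stream processor: emit one combined event per order."""
    payments_by_order = {p["order_id"]: p for p in payments}
    for order in orders:
        yield {
            "order": order,
            # Left join: the payment is None when the order is not yet paid.
            "payment": payments_by_order.get(order["order_id"]),
            "customer": customers.get(order["customer_id"]),
        }


def handle(event: dict) -> str:
    """Stateless function body: everything it needs arrives in the event."""
    order_id = event["order"]["order_id"]
    name = event["customer"]["name"]
    if event["payment"] is None:
        return f"Order {order_id}: awaiting payment from {name}"
    return f"Order {order_id}: paid {event['payment']['amount']} by {name}"


orders = [
    {"order_id": "o1", "customer_id": "c1"},
    {"order_id": "o2", "customer_id": "c2"},
]
payments = [{"order_id": "o1", "amount": 25}]
customers = {"c1": {"name": "Ada"}, "c2": {"name": "Grace"}}

results = [handle(event) for event in enrich(orders, payments, customers)]
for line in results:
    print(line)
```

Because `handle` performs no lookups and keeps no state, many copies can run concurrently; the stateful join lives entirely on the data-layer side.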
  52. KSQL also separates stateful operations from event-driven application logic

  53. FaaS FaaS FaaS KSQL Customers Table KSQL as a “Data

    Layer” for Serverless Functions FaaS FaaS STATELESS Fully elastic STATEFUL Orders Payments Customers Autoscale with load Filter, transform, join, summarizations
  54. Familiar Apps Search Apps Apps STREAMING PLATFORM

    Apps Search Monitoring Apps Apps STREAMING PLATFORM Apps Search Apps Apps Search Monitoring Apps Apps Stateful Stateless
  55. FaaS Traditional Application Event-Driven Application Application Database KSQL Stateful Data

    Layer FaaS FaaS FaaS FaaS FaaS Streaming Stateless Stateless Stateless Compute Layer Massive linear scalability with elasticity
  56. None
  57. Use stream processors to make the consumption of events both

    simple and scalable Think Event- Driven
  58. Summary • Events underpin the storage models of truthful/factful architectures.

    • Event sourcing is most useful when it embraces events as data-in-flight • A stream processor provides a database-like equivalent for real-time, event-driven data • Serverless provides the corollary: real-time, event-driven infrastructure and compute
  59. Things I didn’t tell you 1/2 • Tools like KSQL

    provide data provisioning, not state mutation. • Good for offline services & data pipelines • Not good for CRUD (but it’s ok to mix and match) • Kafka’s serverless integration is in its early stages. • Existing connector for Kafka (Limited functionality). • Confluent connector coming. • Can KSQL handle large state? • Unintended rebalance can stall processing • Static membership (KIP-345) – name the list of stream processors • Increase the timeout for rebalance after node removal (group.max.session.timeout.ms) • Worst case reload: RocksDB ~GbE speed
  60. Things I didn’t tell you 2/2 • Can Kafka be

    used for long term storage? • Log files are immutable once they roll (unless compacted) • Jun spent a decade working on DB2 • Careful: • Historical reads can stall real-time requests (cached) • ZFS has several page cache optimizations • Tiered storage will help
  61. Find out More • Peeking Behind the Curtains of Serverless

    Platforms, Wang et al. • Cloud Programming Simplified: A Berkeley View on Serverless Compute • Neil Avery’s Journey to Event Driven Part 3. The Affinity Between Events, Streams and Serverless. • Designing Event Driven Systems, Ben Stopford
  62. Thank you @benstopford Book: https://www.confluent.io/designing-event-driven-systems Github: http://bit.ly/kafka-microservice-examples Example ecosystem built

    with streams. Includes KSQL, Control Center, Elastic etc.