Streams vs. Serverless: Friend or Foe?

Serverless platforms let us build fine-grained functions that are fully managed and naturally autoscale. Stream processors give us rich streaming primitives that can be stateful, stateless and scale to high throughputs. Both tie their roots back to streams of events, but are real-time event streams enough for contemporary applications?

In this talk, we’ll look at how functions and stream processors both compare and complement. We'll look at why stream processors like Kafka Streams provide both stateful and stateless operations. Finally, we'll look at how emerging architectures for event-driven systems are quite different from the ones we are used to, whether they're based on microservices, functions or full-blown streaming systems.

benstopford

May 10, 2019

Transcript

  1. Old and emerging categories (timeline, 1970 to 2020): Relational Database, Data warehouse, Messaging, Cloud, NoSQL & Big Data, Event Streaming, Serverless
  2. Thesis

    • A stream processor provides a database-equivalent for real-time, event-driven data
    • Serverless provides the corollary: real-time, event-driven infrastructure and compute
  3. Roots in big data messaging

    • Event Storage: Kafka stores 100s of TBs of data
    • Stream Processing: real-time processing over streams and tables
    • Scalability: large clusters, HA, global
  4. [Diagram: Apache Kafka as a streaming platform between producers and consumers: Apps, Search, Monitoring, NoSQL, DWH, Hadoop]
  5. Kafka. Where it fits in: Microservices

    Increasing system complexity: Monolith, Distributed Monolith, Microservices, Event-Driven Microservices
    [Diagram: many streaming platforms connecting Apps, Search, NoSQL, Monitoring, Security, DWH, Hadoop]
  6. [Diagram: Kafka as a streaming platform for Apps, Search and Monitoring]

    • Kafka: Summarizing Data-in-Flight, Streaming View Creation, Streaming ETL (many event sources, read-optimized format)
    • Kafka: Event-Driven Microservices Framework (many event sources, microservices)
  7. Using FaaS

    • Write a function
    • Upload
    • Configure a trigger (HTTP, Event, Object Store, Database, Timer etc.)
  8. FaaS in a Nutshell

    • Fully managed (runs in a container)
    • Pay as you use
    • Auto-scales with load (0-1000 functions on AWS)
    • Event driven (allowing a broad range of event sources)
    • Short lived (max ~5 mins)
    • Can be slow to start: 100ms – 45s (AWS 250ms-7s)
  9. Where is FaaS useful?

    • Interesting for spiky workloads (i.e. extremities)
    • Grid compute: HPC, Genomics, Finance
    • Interesting for use cases that wouldn’t typically warrant massive parallelism, e.g. CI systems
    • General application programming?
  10. Serverless Developer Ecosystem: currently quite poor

    • Runtime diagnostics
    • Monitoring
    • Deploy loop
    • Testing
    • IDE integration
  11. Event driven but not streaming: complex, timing issues, scaling limits

    [Diagram: Customers, Orders and Payments event sources, each feeding a FaaS/μS]
  12. Embedded: blend data held in streams and tables at database speeds

    [Diagram: inside one process boundary, App Logic uses the KStreams API over Orders, Payments and a Customers table]
  13. Kafka Streams API (JVM only; code abridged for brevity)

        orders
          .filter((id, order) -> order.state().equals("CREATED"))
          .join(payments)
          .join(customers)
          .transform(MyEmailer::new, STORE)
          .to("sent-emails")
  14. Server: pre-provision data for a stateless compute layer in any language

    [Diagram: App Logic sends SQL across the process boundary to KSQL, which holds Orders, Payments and a Customers table]

        CREATE STREAM order-payments AS SELECT * FROM orders, payments, customers LEFT JOIN…
  15. First we need some source: Database, Stream Processor etc.

    Data is now saved in a topic in Kafka.
    [Diagram: a Customers topic held in Event Storage on the streaming platform]
  16. KStreams: load entire stream into Kafka/KStreams (Table of Customers)

    (1) Reload, (2) Keep up to date
    [Diagram: KStreams builds a Table of Customers from the Customers topic in Event Storage]
  17. KStreams: once fully loaded we do the stream-table join

    [Diagram: the Orders stream joined with the Table of Customers, backed by Event Storage]
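Slides 16 and 17 describe loading a whole stream into a table and then joining a second stream against it. A minimal sketch of that idea, assuming plain Java collections in place of Kafka topics and Kafka Streams tables (the class, method names and record shapes are hypothetical, not Kafka Streams API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: plain collections stand in for Kafka topics and
// Kafka Streams tables. None of these names are Kafka Streams API.
class StreamTableJoinSketch {

    // Step 1: replay the whole customers stream (customerId -> email) into a
    // local table. Later events for the same key overwrite earlier ones.
    static Map<String, String> loadTable(List<Map.Entry<String, String>> customerEvents) {
        Map<String, String> table = new HashMap<>();
        for (Map.Entry<String, String> event : customerEvents) {
            table.put(event.getKey(), event.getValue());
        }
        return table;
    }

    // Step 2: join each order (orderId -> customerId) against the table with
    // a local lookup. Orders with no matching customer are dropped here
    // (inner-join behaviour, chosen for the sketch).
    static List<String> join(List<Map.Entry<String, String>> orders,
                             Map<String, String> table) {
        List<String> enriched = new ArrayList<>();
        for (Map.Entry<String, String> order : orders) {
            String email = table.get(order.getValue());
            if (email != null) {
                enriched.add(order.getKey() + " -> " + email);
            }
        }
        return enriched;
    }
}
```

The lookup in step 2 never leaves the process, which is the point the deck makes about joining "at database speeds".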
  18. State Stores work like a little personal DB (Read/Write)

    Mutations are streamed to Kafka.
    [Diagram: KStreams processing Payments with a KV Store, backed by Event Storage]
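Slide 18's state store can be pictured as a local key-value map whose every write is also appended to a log, so the map can be rebuilt by replaying that log. The sketch below assumes an in-memory list standing in for the Kafka topic; `ChangelogStoreSketch` and its methods are hypothetical names, not the Kafka Streams state store API:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: a list stands in for the Kafka changelog topic.
class ChangelogStoreSketch {
    private final Map<String, Long> local = new HashMap<>();
    private final List<Map.Entry<String, Long>> changelog;

    // Creating a store replays any mutations already recorded in the log,
    // which is how a restarted instance would recover its state.
    ChangelogStoreSketch(List<Map.Entry<String, Long>> changelog) {
        this.changelog = changelog;
        for (Map.Entry<String, Long> mutation : changelog) {
            local.put(mutation.getKey(), mutation.getValue());
        }
    }

    // Writes go to the local map and are appended to the log.
    void put(String key, long value) {
        local.put(key, value);
        changelog.add(Map.entry(key, Long.valueOf(value)));
    }

    // Reads are served from the local map only; returns null if absent.
    Long get(String key) {
        return local.get(key);
    }
}
```

Constructing a second `ChangelogStoreSketch` over the same list yields the same contents as the first, without any of the original `put` calls being repeated.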
  19. Calc Balance: sum payments to get the user’s account balance

    User    Balance
    Bob     $500
    Sally   $250
    George  $32

    In KSQL: SELECT user, sum(value) FROM payments GROUP BY user;
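The aggregation on slide 19 runs continuously: each payment event updates a running total for its user. A minimal sketch of that incremental sum, assuming payments arrive one at a time as (user, value) pairs; the class and method names are hypothetical and the individual payment amounts in the usage note are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: an in-memory running sum per user.
class RunningBalanceSketch {
    private final Map<String, Long> balances = new HashMap<>();

    // Apply one payment event and return that user's updated balance.
    long onPayment(String user, long value) {
        return balances.merge(user, value, Long::sum);
    }

    // Copy of the current balance table, one row per user.
    Map<String, Long> snapshot() {
        return new HashMap<>(balances);
    }
}
```

For example, feeding payments of 300 and 200 for Bob, 250 for Sally and 32 for George leaves a snapshot matching the balances on the slide.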
  20. Chain Operations Together: Think Unix Pipe

    Payments → Calc Balance → Balance < 0? → Overdraft notification, Overdraft Charge
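Slide 20's "Unix pipe" idea is that each stage consumes the previous stage's output. A rough sketch of the balance and overdraft stages as composed functions, assuming batch lists in place of streams; the stage names are hypothetical and the overdraft charge stage is left out for brevity:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: three small stages chained like a Unix pipe.
class OverdraftPipelineSketch {

    // Stage 1: sum payments per user (TreeMap keeps users in sorted order).
    static Map<String, Long> calcBalance(List<Map.Entry<String, Long>> payments) {
        Map<String, Long> balances = new TreeMap<>();
        for (Map.Entry<String, Long> p : payments) {
            balances.merge(p.getKey(), p.getValue(), Long::sum);
        }
        return balances;
    }

    // Stage 2: keep only users whose balance is below zero.
    static List<String> overdrawn(Map<String, Long> balances) {
        List<String> users = new ArrayList<>();
        for (Map.Entry<String, Long> e : balances.entrySet()) {
            if (e.getValue() < 0) {
                users.add(e.getKey());
            }
        }
        return users;
    }

    // Stage 3: turn each overdrawn user into a notification message.
    static List<String> toNotifications(List<String> users) {
        List<String> notices = new ArrayList<>();
        for (String user : users) {
            notices.add("Overdraft notification for " + user);
        }
        return notices;
    }

    // The whole pipeline: payments | calcBalance | overdrawn | toNotifications
    static List<String> run(List<Map.Entry<String, Long>> payments) {
        return toNotifications(overdrawn(calcBalance(payments)));
    }
}
```

Each stage can be tested or replaced on its own, which is the property the pipe analogy points at.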
  21. Note: Transactions only work in Kafka

    If you call out to some other service you’re on your own
  22. Comparing back to Serverless Functions

    Serverless
    - Autoscale based on demand
    - Simple programming model
    - Stateless
    - High latency
    - One event source

    Stream Processors
    - Stateful & Stateless operations at high throughputs
    - Join different event sources
    - Correctness even after failure
    - Rich semantics for dataflow programming
    - Scales automatically, but not with load
    - More complex
  23. Competitive Advantage

    Serverless caters for stateless event driven compute:
    • Resource Scheduling
    • VM provisioning
    • Instance recycling
    • Tenant isolation
    • Network/CPU/Disk
    • Security
  24. Competitive Advantage

    Stream Processors specialize in the following, for real-time event data from many places:
    • Combining
    • Filtering
    • Transforming
    • Summarizing
  25. They complement in much the same way that a database complements a traditional application

    [Diagram: a stateful streaming platform alongside stateless Apps, Search and Monitoring]
  26. State has “weight” that impedes elasticity

    [Diagram: the streaming platform side is labelled Low Elasticity; the Apps, Search and Monitoring side is labelled High Elasticity]
  27. Same Concept applies in Streaming Applications (Fulfilment, Logistics, Trade Lifecycle, Fraud, Payment Processing)

    [Diagram: Customers, Orders and Payments event sources (source of truth, stateful) feed KSQL, which applies a projection for a stateless Fraud Service]

        SELECT * FROM orders, payments, customers WHERE …
  28. Stateless layer scales easily

    [Diagram: three event streams from different event sources in Event Storage; KSQL as the stateful data layer emits denormalized events; the stateless application layer scales quickly]
  29. Back to FaaS: event driven but not streaming

    Stream Processing
    • Rich functionality for handling multiple streams and tables
    • Often stateful

    FaaS
    • Unopinionated on data
    • Stateless
  30. Stream processors can act as a “data layer” for FaaS (pre-provisioning the data each function needs)

    [Diagram: KSQL (stateful) combines Orders, Payments and a Customers table and feeds stateless FaaS instances; labelled "Transaction"]
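Slide 30's split can be sketched from the function's side: if the data layer hands over an event that already contains everything needed, the function itself needs no lookups and no local state. The sketch below is a loose illustration under that assumption; the event field names (`orderId`, `email`, `paymentStatus`) and the class are hypothetical, and a parallel stream stands in for many function instances:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch only: a stateless handler over pre-joined events.
class StatelessFunctionSketch {

    // The function sees only the event it is handed: no lookups, no state.
    // Assumes the data layer has already joined order, payment and customer.
    static String handle(Map<String, String> event) {
        return "To " + event.get("email") + ": order " + event.get("orderId")
                + " is " + event.get("paymentStatus");
    }

    // Because handle() keeps no state, invocations can run on any number of
    // workers; a parallel stream stands in for that here. Collecting to a
    // list preserves the input order.
    static List<String> invokeAll(List<Map<String, String>> events) {
        return events.parallelStream()
                .map(StatelessFunctionSketch::handle)
                .collect(Collectors.toList());
    }
}
```

The result does not depend on how many workers ran the handler, which is what lets the stateless layer scale independently of the stateful one.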
  31. Massive linear scalability with elasticity

    [Diagram: a traditional application (Application over a Database) beside an event-driven application (a stateless FaaS compute layer over KSQL as a stateful streaming data layer)]
  32. Original Thesis

    • A stream processor provides a database-equivalent for real-time, event-driven data
    • Serverless provides the corollary: real-time, event-driven infrastructure and compute
  33. Final things…

    • Tools like KSQL provide data provisioning, not state mutation.
    • Good for data processing / pipelines. Most backend services (i.e. offline).
    • Not good for CRUD.
    • It’s ok to mix and match.
    • Kafka’s serverless integration is in its early stages.
    • Existing connector for Kafka.
    • Limited functionality.
    • Confluent connector coming.
    • Not integrated with Confluent Cloud, yet.
  34. Find out More

    • Peeking Behind the Curtains of Serverless Platforms, Wang et al.
    • Cloud Programming Simplified: A Berkeley View on Serverless Compute
    • Neil Avery’s Journey to Event Driven Part 3. The Affinity Between Events, Streams and Serverless.
    • Designing Event Driven Systems, Ben Stopford