Streams vs. Serverless: Friend or Foe?

Serverless platforms let us build fine-grained functions that are fully managed and naturally autoscale. Stream processors give us rich streaming primitives that can be stateful or stateless and that scale to high throughputs. Both trace their roots back to streams of events, but are real-time event streams enough for contemporary applications?

In this talk, we'll look at how functions and stream processors compare with and complement each other. We'll look at why stream processors like Kafka Streams provide both stateful and stateless operations. Finally, we'll look at how emerging architectures for event-driven systems are quite different from the ones we are used to, whether they're based on microservices, functions or full-blown streaming systems.

benstopford

May 10, 2019

Transcript

  1. Streams vs Serverless: Friend or Foe? Ben Stopford, Office of the CTO, Confluent
  2. Old and emerging categories (timeline, 1970 to 2020): Relational Database, Data Warehouse, Messaging, Cloud, NoSQL & Big Data, Event Streaming, Serverless
  3. In Emerging Categories take caution

  4. Thesis • A stream processor provides a database-equivalent for real-time, event-driven data • Serverless provides the corollary: real-time, event-driven infrastructure and compute
  5. Event-Streaming

  6. Roots in big data messaging • Event Storage: Kafka stores 100s of TBs of data • Stream Processing: real-time processing over streams and tables • Scalability: large clusters, HA, global
  7. [Diagram: Apache Kafka as a streaming platform between producers and consumers, connecting apps, search, monitoring, NoSQL, DWH and Hadoop systems]
  8. > 2 trillion messages per day

  9. The three big stream processors: Flink, KSQL / Kafka Streams, Spark Streaming
  10. Increasing System Complexity. Kafka, where it fits in: Microservices. Monolith, Distributed Monolith, Microservices, Event-Driven Microservices. [Diagram: a growing number of streaming platforms, each connecting apps, search, NoSQL, monitoring, security, DWH and Hadoop systems]
  11. Kafka as a streaming platform: Summarizing Data-in-Flight, Streaming View Creation, Streaming ETL (many event sources, read-optimized format). Kafka as an event-driven microservices framework (many event sources, microservices).
  12. Serverless Functions

  13. Using FaaS • Write a function • Upload • Configure a trigger (HTTP, Event, Object Store, Database, Timer etc.)
  14. None
  15. FaaS in a Nutshell • Fully managed (runs in a container) • Pay as you use • Auto-scales with load (0-1000 functions on AWS) • Event driven (allowing a broad range of event sources) • Short lived (max ~5 mins) • Can be slow to start: 100ms to 45s (AWS 250ms-7s)
  16. Where is FaaS useful? • Interesting for spiky workloads (i.e. extremities) • Grid compute: HPC, Genomics, Finance • Interesting for use cases that wouldn’t typically warrant massive parallelism, e.g. CI systems • General application programming?
  17. But there are open questions

  18. Serverless Developer Ecosystem (currently quite poor) • Runtime diagnostics • Monitoring • Deploy loop • Testing • IDE integration
  19. [Chart: Amazon, Google and Microsoft on a scale from "harder than current approaches" to "easier than current approaches"]
  20. In the future it seems unlikely we’ll manage our own

    infrastructure.
  21. None
  22. Event-Streaming approaches this from a different angle

  23. FaaS is event-driven But it isn’t streaming

  24. Event driven but not streaming: Customers, Orders and Payments event sources each feed a FaaS/μS. Complex, timing issues, scaling limits.
  25. Embedded: blend data held in streams and tables at database speeds. [Diagram: inside one process boundary, the KStreams API combines the Orders and Payments streams with a Customers table and feeds the app logic]
  26. Kafka Streams API (JVM only; code abridged for brevity): orders.filter((id, order) -> order.state().equals("CREATED")).join(payments).join(customers).transform(MyEmailer::new, STORE).to("sent-emails")
  27. Streaming can also be a server process

  28. Server: pre-provision data for a stateless compute layer in any language. Send SQL to KSQL: CREATE STREAM order-payments AS SELECT * FROM orders, payments, customers LEFT JOIN… [Diagram: KSQL combines Orders, Payments and a Customers table on the other side of a process boundary from the app logic]
  29. Event Streaming provisions real-time data directly to application logic... Let’s

    explore the internals…
  30. #1 Streaming Joins

  31. KStreams Joining two streams orders.join(payments) Bob’s Order Bob’s Payment Jill’s

    Payment Jill’s Order Orders Payments
  32. KStreams Joining two streams orders.join(payments) Bob’s Order Bob’s Payment Jill’s

    Payment Jill’s Order
  33. KStreams Joining two streams orders.join(payments) Bob’s Order Bob’s Payment Jill’s

    Payment Jill’s Order
  34. KStreams Joining two streams orders.join(payments) Bob’s Payment Bob’s Order Jill’s

    Payment Jill’s Order
  35. KStreams Joining two streams Key-value store Bob’s Order Bob’s Payment

    Jill’s Payment Jill’s Order
  36. KStreams Joining two streams Key-value store Bob’s Order Jill’s Payment

    Jill’s Order Bob’s Payment
  37. KStreams Joining two streams Key-value store Bob’s Order Jill’s Payment

    Jill’s Order Bob’s Payment
  38. KStreams Joining two streams Bob’s Order Jill’s Payment Jill’s Order

    Bob’s Payment
  39. KStreams Joining two streams Bob’s Order Jill’s Payment Jill’s Order

    Bob’s Payment
  40. KStreams Joining two streams Bob’s Order Jill’s Payment Jill’s Order

    Bob’s Payment
  41. KStreams Joining two streams Bob’s Order Jill’s Payment Jill’s Order

    Bob’s Payment
  42. KStreams Joining two streams Jill’s Payment Jill’s Order Bob’s Payment

    Bob’s Order
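
The join animation in slides 31 to 42 can be approximated with a plain-Java sketch. It is not the Kafka Streams API: `StreamJoinSketch`, `onOrder` and `onPayment` are hypothetical names, each event is assumed to match at most one event on the other side, and the time window that Kafka Streams applies to stream-stream joins is left out. Whichever side arrives first waits in a key-value buffer until its partner shows up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of a buffered stream-stream join; not the Kafka Streams API. */
class StreamJoinSketch {
    // One key-value buffer per input stream, keyed by customer.
    private final Map<String, String> bufferedOrders = new HashMap<>();
    private final Map<String, String> bufferedPayments = new HashMap<>();
    private final List<String> joined = new ArrayList<>();

    /** Join with a waiting payment if there is one, otherwise buffer the order. */
    void onOrder(String key, String order) {
        String payment = bufferedPayments.remove(key);
        if (payment != null) {
            joined.add(order + " + " + payment);
        } else {
            bufferedOrders.put(key, order);
        }
    }

    /** Mirror image of onOrder for the payments stream. */
    void onPayment(String key, String payment) {
        String order = bufferedOrders.remove(key);
        if (order != null) {
            joined.add(order + " + " + payment);
        } else {
            bufferedPayments.put(key, payment);
        }
    }

    List<String> joined() {
        return joined;
    }

    public static void main(String[] args) {
        StreamJoinSketch join = new StreamJoinSketch();
        join.onOrder("bob", "Bob's Order");        // buffered, no payment yet
        join.onPayment("jill", "Jill's Payment");  // buffered, no order yet
        join.onPayment("bob", "Bob's Payment");    // matches Bob's buffered order
        join.onOrder("jill", "Jill's Order");      // matches Jill's buffered payment
        System.out.println(join.joined());
    }
}
```

Arrival order does not matter here, which is the point of the buffers: Bob's order waits for his payment, while Jill's payment waits for her order.
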
  43. #2 Stream-Table Joins

  44. KStreams Join a Stream with a Table Customers Orders Query

    Cust1 Table of Customers
  45. First we need some source: Database, Stream Processor etc. Customers data is now saved in a topic in Kafka (Event Storage).
  46. KStreams: load entire stream into Kafka/KStreams. Table of Customers: 1. Reload 2. Keep up to date. [Diagram: the Customers topic in event storage feeds the table]
  47. KStreams Once fully loaded we do the stream-table join Customers

    Orders Table Table of Customers Event Storage
  48. N.B. Tables are Stateful
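
A stream-table join, as described in slides 44 to 48, can be sketched in plain Java under a few assumptions: the table is a map holding the latest value per key, a null value on the customers stream means delete, and orders for unknown customers are dropped (inner-join behaviour). `StreamTableJoinSketch` and its method names are hypothetical and are not part of Kafka Streams.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Plain-Java sketch of a stream-table join; not the Kafka Streams API. */
class StreamTableJoinSketch {
    // The "table": latest customer record per customer id, held locally.
    private final Map<String, String> customers = new HashMap<>();

    /** Apply one event from the customers topic; a null value is treated as a delete. */
    void onCustomerUpdate(String customerId, String customer) {
        if (customer == null) {
            customers.remove(customerId);
        } else {
            customers.put(customerId, customer);
        }
    }

    /** Enrich an order with a local lookup; empty when the customer is unknown. */
    Optional<String> onOrder(String customerId, String order) {
        String customer = customers.get(customerId);
        if (customer == null) {
            return Optional.empty();
        }
        return Optional.of(order + " for " + customer);
    }

    public static void main(String[] args) {
        StreamTableJoinSketch join = new StreamTableJoinSketch();
        join.onCustomerUpdate("cust1", "Bob (old address)");
        join.onCustomerUpdate("cust1", "Bob (new address)"); // table keeps only the latest
        System.out.println(join.onOrder("cust1", "order-42"));
        System.out.println(join.onOrder("cust2", "order-43"));
    }
}
```

The table is the stateful part: it has to be fully loaded and kept up to date before lookups give the right answer, while the order stream itself is processed one event at a time.
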

  49. #3 Local State Stores

  50. State Stores work like a little personal DB (Read/Write) KV

    Store Event Storage Mutations are streamed to Kafka KStreams Payments
  51. Calc Balance: sum payments (from event storage) to get the user’s account balance. In KSQL: SELECT user, sum(value) FROM payments GROUP BY user; Result (User: Balance): Bob: $500, Sally: $250, George: $32
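
The state store idea from slides 50 and 51 can be sketched in plain Java: a local key-value map holds the running balance per user, and every mutation is also appended to a changelog so the store can be rebuilt after a restart. `BalanceStoreSketch`, the in-memory changelog list and the `user=balance` entry format are assumptions made for illustration; in Kafka Streams the changelog would be a Kafka topic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of a local state store backed by a changelog; not the Kafka Streams API. */
class BalanceStoreSketch {
    private final Map<String, Long> balances = new HashMap<>(); // local key-value store
    private final List<String> changelog = new ArrayList<>();   // stand-in for a changelog topic

    /** Add a payment to the user's balance and record the mutation. */
    long onPayment(String user, long amount) {
        long updated = balances.merge(user, amount, Long::sum);
        changelog.add(user + "=" + updated);
        return updated;
    }

    /** Rebuild a store by replaying the changelog; later entries overwrite earlier ones. */
    static Map<String, Long> restore(List<String> changelog) {
        Map<String, Long> restored = new HashMap<>();
        for (String entry : changelog) {
            String[] keyValue = entry.split("=");
            restored.put(keyValue[0], Long.parseLong(keyValue[1]));
        }
        return restored;
    }

    Map<String, Long> balances() {
        return balances;
    }

    List<String> changelog() {
        return changelog;
    }

    public static void main(String[] args) {
        BalanceStoreSketch store = new BalanceStoreSketch();
        store.onPayment("Bob", 300);
        store.onPayment("Sally", 250);
        store.onPayment("Bob", 200);
        System.out.println(store.balances());
        System.out.println(restore(store.changelog()));
    }
}
```
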
  52. Fraud Detection over payments (event storage): can also use windows, e.g. fraud detection.
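
One way to picture a windowed operation is a count of payments per user per tumbling window. The rule below (flag a user once they exceed a fixed number of payments in one window) is an invented example, not the fraud logic from the talk, and `WindowedFraudSketch`, `windowMillis` and `maxPaymentsPerWindow` are hypothetical names. Timestamps are assumed to be non-negative milliseconds.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java sketch of a tumbling-window count; not the Kafka Streams API. */
class WindowedFraudSketch {
    private final long windowMillis;
    private final int maxPaymentsPerWindow;
    // Payment count per "user@windowStart".
    private final Map<String, Integer> counts = new HashMap<>();

    WindowedFraudSketch(long windowMillis, int maxPaymentsPerWindow) {
        this.windowMillis = windowMillis;
        this.maxPaymentsPerWindow = maxPaymentsPerWindow;
    }

    /** Returns true when this payment pushes the user's count for its window past the limit. */
    boolean onPayment(String user, long timestampMillis) {
        // Tumbling windows: each timestamp belongs to exactly one fixed-size window.
        long windowStart = timestampMillis - (timestampMillis % windowMillis);
        int count = counts.merge(user + "@" + windowStart, 1, Integer::sum);
        return count > maxPaymentsPerWindow;
    }

    public static void main(String[] args) {
        WindowedFraudSketch detector = new WindowedFraudSketch(60_000L, 2);
        System.out.println(detector.onPayment("bob", 1_000L));
        System.out.println(detector.onPayment("bob", 2_000L));
        System.out.println(detector.onPayment("bob", 3_000L));  // third payment in the same minute
        System.out.println(detector.onPayment("bob", 61_000L)); // new window, count starts again
    }
}
```
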
  53. #4 Operations chain

  54. Calc Balance Payments Event Storage Chain Operations Together: Think Unix

    Pipe Balance < 0? Overdraft notification Overdraft Charge
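
The chained operations on slide 54 can be sketched as three small stages, each handing its result to the next, much like a Unix pipe. This is plain Java rather than the Kafka Streams API; `OverdraftPipelineSketch`, the message formats and the `OVERDRAFT_FEE` value are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of chained stream operations; not the Kafka Streams API. */
class OverdraftPipelineSketch {
    static final long OVERDRAFT_FEE = 25; // hypothetical fee, for illustration only

    private final Map<String, Long> balances = new HashMap<>();
    private final List<String> notifications = new ArrayList<>();
    private final List<String> charges = new ArrayList<>();

    /** Stage 1: fold the payment into the running balance and pass it downstream. */
    void onPayment(String user, long amount) {
        long balance = balances.merge(user, amount, Long::sum);
        onBalance(user, balance);
    }

    /** Stage 2: filter, only negative balances continue. */
    private void onBalance(String user, long balance) {
        if (balance < 0) {
            onOverdraft(user, -balance);
        }
    }

    /** Stage 3: fan out to the two outputs. */
    private void onOverdraft(String user, long overdrawnBy) {
        notifications.add(user + " is overdrawn by " + overdrawnBy);
        charges.add(user + " charged " + OVERDRAFT_FEE);
    }

    List<String> notifications() {
        return notifications;
    }

    List<String> charges() {
        return charges;
    }

    public static void main(String[] args) {
        OverdraftPipelineSketch pipeline = new OverdraftPipelineSketch();
        pipeline.onPayment("bob", 100);
        pipeline.onPayment("bob", -150);
        System.out.println(pipeline.notifications());
        System.out.println(pipeline.charges());
    }
}
```
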
  55. #5 Transactions

  56. Calc Balance Payments Event Storage Transactions Balance < 0? Overdraft

    notification Overdraft Charge Transaction
  57. Note: Transactions only work in Kafka. If you call out to some other service you’re on your own.
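
The all-or-nothing behaviour of slides 56 and 57 can be illustrated with a plain-Java sketch in which outputs are staged in a private buffer and only become visible if the whole unit of work completes. This shows the idea only; it is not how Kafka implements transactions, and `TransactionSketch` and `inTransaction` are hypothetical names. A call to an external service made inside the unit of work would not be undone on abort, which is the caveat on slide 57.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Plain-Java sketch of atomic multi-message output; not Kafka's transaction protocol. */
class TransactionSketch {
    private final List<String> committedLog = new ArrayList<>();

    /**
     * Runs the work against a private buffer. The buffered messages are appended
     * to the log only if the work finishes without throwing a RuntimeException.
     */
    boolean inTransaction(Consumer<List<String>> work) {
        List<String> pending = new ArrayList<>();
        try {
            work.accept(pending);
        } catch (RuntimeException e) {
            return false; // abort: nothing from this unit of work becomes visible
        }
        committedLog.addAll(pending);
        return true;
    }

    List<String> committedLog() {
        return committedLog;
    }

    public static void main(String[] args) {
        TransactionSketch log = new TransactionSketch();
        log.inTransaction(out -> {
            out.add("overdraft-notification:bob");
            out.add("overdraft-charge:bob");
        });
        log.inTransaction(out -> {
            out.add("overdraft-notification:jill");
            throw new IllegalStateException("crash before the charge was written");
        });
        System.out.println(log.committedLog());
    }
}
```
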
  58. #6 Queryable State

  59. Query tables stored in KStreams (not supported in KSQL yet)

  60. Comparing back to Serverless Functions. Serverless: • Autoscale based on demand • Simple programming model • Stateless • High latency • One event source. Stream Processors: • Stateful & stateless operations at high throughputs • Join different event sources • Correctness even after failure • Rich semantics for dataflow programming • Scales automatically, but not with load • More complex
  61. But the differences are arguably more compelling than the similarities

  62. Competitive Advantage Serverless caters for stateless event driven compute: •

    Resource Scheduling • VM provisioning • Instance recycling • Tenant isolation • Network/CPU/Disk • Security
  63. Competitive Advantage Stream Processors specialize in: • Combining • Filtering

    • Transforming • Summarizing real-time event data from many places
  64. How do these differences relate?

  65. They complement in much the same way that a database complements a traditional application. [Diagram labels: streaming platforms, apps, search, monitoring; Stateful, Stateless]
  66. State has “weight” that impedes elasticity. [Diagram labels: streaming platforms, apps, search, monitoring; Low Elasticity, High Elasticity]
  67. Same Concept applies in Streaming Applications (Fulfilment, Logistics, Trade Lifecycle, Fraud, Payment Processing). Source of Truth: Customers, Orders and Payments event sources. KSQL applies a projection: SELECT * FROM orders, payments, customers WHERE … [Diagram labels: Fraud Service; Stateful, Stateless]
  68. Stateless layer scales easily. Three event streams from different event sources (Event Storage); KSQL as the stateful data layer; denormalized events; stateless application layer that scales quickly.
  69. Back to FaaS: event driven but not streaming. Stream Processing • Rich functionality for handling multiple streams and tables • Often stateful. FaaS • Unopinionated on data • Stateless
  70. Stream processors can act as a “data layer” for FaaS (pre-provisioning the data each function needs). [Diagram labels: Orders, Payments, Customers, Customers Table, KSQL, Transaction, FaaS; Stateful, Stateless]
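
A rough plain-Java sketch of the "data layer for FaaS" idea: the stream processor is assumed to have already joined order, payment and customer data into one denormalized event, so the function can be a pure, stateless handler with no lookups of its own. `EnrichedOrder`, `EmailFunctionSketch` and the field names are hypothetical and are not tied to any FaaS platform's handler interface.

```java
/** Plain-Java sketch of a stateless function fed by pre-joined events. */
class EmailFunctionSketch {
    /** Stateless handler: everything it needs arrives inside the event. */
    static String handle(EnrichedOrder event) {
        return "To: " + event.customerEmail
                + " | Order " + event.orderId
                + " confirmed, payment of " + event.paymentAmount + " received";
    }

    public static void main(String[] args) {
        EnrichedOrder event = new EnrichedOrder("order-42", "bob@example.com", 500);
        System.out.println(handle(event));
    }
}

/** Denormalized event, assumed to be produced upstream by joining orders, payments and customers. */
class EnrichedOrder {
    final String orderId;
    final String customerEmail;
    final long paymentAmount;

    EnrichedOrder(String orderId, String customerEmail, long paymentAmount) {
        this.orderId = orderId;
        this.customerEmail = customerEmail;
        this.paymentAmount = paymentAmount;
    }
}
```

Because the handler holds no state, any number of copies can run side by side, which is what lets the compute layer scale independently of the stateful data layer.
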
  71. Traditional Application: Application plus Database. Event-Driven Application: FaaS as the stateless compute layer plus KSQL as the stateful data layer (streaming). Massive linear scalability with elasticity.
  72. Original Thesis • A stream processor provides a database-equivalent for

    real-time, event-driven data • Serverless provides the corollary: real-time, event-driven infrastructure and compute
  73. Final things… • Tools like KSQL provide data provisioning, not state mutation. • Good for data processing / pipelines. Mostly backend services (i.e. offline). • Not good for CRUD. • It’s ok to mix and match. • Kafka’s serverless integration is in its early stages. • Existing connector for Kafka. • Limited functionality. • Confluent connector coming. • Not integrated with Confluent Cloud, yet.
  74. Find out More • Peeking Behind the Curtains of Serverless Platforms, Wang et al. • Cloud Programming Simplified: A Berkeley View on Serverless Computing • Neil Avery’s Journey to Event Driven Part 3: The Affinity Between Events, Streams and Serverless • Designing Event Driven Systems, Ben Stopford
  75. Thank you @benstopford Book: https://www.confluent.io/designing-event-driven-systems