Stream Processing for the Serverless Generation

benstopford
February 06, 2019

Transcript

  1. Stream Processing for the Serverless Generation. Ben Stopford, Office of

    the CTO, Confluent
  2. When it comes to data, we tend to think in

    databases
  3. Increasing Complexity. Evolution of software systems:

    Monolith, Distributed Monolith, Microservices, Event-Driven Microservices. [Diagram labeled Kafka: a growing number of streaming platforms, each connecting apps, search, NoSQL, DWH, Hadoop, monitoring and security systems]
  4. Backend services grow faster than the front end

    Tightly coupled vs. loosely coupled, e.g. Netflix has ~400 backend microservices fed by Kafka. [Diagram: many apps behind streaming platforms that also feed search, NoSQL, DWH, Hadoop, monitoring and security systems]
  5. In the serverless world, which is inherently event driven, stream

    processors will become as important as databases are today.
  6. Streaming Platform: PRODUCER, CONSUMER

    [Diagram: apps, search, monitoring, NoSQL, DWH and Hadoop systems around a streaming platform]
  7. Event Storage: Kafka stores petabytes of data. Stream Processing: real-time

    processing over streams and tables. Scalability: clusters of hundreds of machines, global. Roots in big data messaging.
  8. Are Serverless Functions and Stream Processors related? They are both

    functions we define that are triggered by streams of events.
  9. FaaS in brief • Write a function • Upload •

    Configure a trigger (HTTP, Event, Object Store, Database, Timer etc.)
  10. None
  11. FaaS in a Nutshell • Fully managed (Runs in a

    container) • Pay as you use • Auto-scales with load ~ 0-1000 concurrent functions • Short lived (max ~5 mins) • Weak ordering guarantees • Cold starts can be slow: 100ms – 45s (AWS 250ms-7s)
  12. Where is FaaS useful? • Interesting for spiky workloads (i.e.

    Extremities) • Grid compute: HPC, Genomics, Finance • Interesting for use cases that wouldn’t typically warrant the cost of conventional massive parallelism e.g. CI systems. • Serverless programming model
  13. But there are open questions

  14. Serverless Developer Ecosystem • Runtime diagnostics • Monitoring • Deploy

    loop • Testing • IDE integration Currently quite poor
  15. Harder than current approaches Easier than current approaches Amazon Google

    Microsoft
  16. We’ll come back to this one!

  17. FaaS is event-driven But it isn’t streaming

  18. Simple online retail example When the order is created ..and

    the payment has been completed => get the customer’s info and send them an email confirming the purchase.
  19. Serverless Way: event-driven (not streaming) Orders Customers Payments FaaS FaaS

    FaaS - Too slow for high velocity use cases ~ 5-10 messages per second - Correctness: what if the payment isn’t there when the order arrives? All Customer data All Payment data
  20. Process boundary Orders Payments KStreams Customers Table Customers Event-Streaming Platforms

    sew these operations together Stateful or Stateless • No network calls. • 50,000-100,000 messages per second, per thread • Better correctness
  21. Event Driven vs Stream Processing

  22. Stream processors can be considered the databases of the event

    driven world
  23. A little detail

  24. Three key features • Stream-stream join (combine in real-time) • Unlike

    a database join, you only need to consider how late data might be • Stream-table join (enrich) • More like a database join on one side only • Aggregate (summarize) • Big data sets are too large
  25. Join events that happened recently Stream-Stream Join

  26. KSQL Joining two streams orders.join(payments) Bob’s Order Bob’s Payment Jill’s

    Payment Jill’s Order Orders Payments
  27. KSQL Join is on the key (messages have keys in

    Kafka) orders.join(payments) Bob’s Order Bob’s Payment Jill’s Payment Jill’s Order
  28. KSQL Joining Two Streams: Streaming systems don’t know when data

    is going to arrive orders.join(payments) Bob’s Order Bob’s Payment Jill’s Payment Jill’s Order
  29. KSQL Bob’s payment arrives – nothing to join with orders.join(payments)

    Bob’s Payment Bob’s Order Jill’s Payment Jill’s Order
  30. KSQL Message gets buffered Key-value store Bob’s Order Bob’s Payment

    Jill’s Payment Jill’s Order
  31. KSQL Jill’s order arrives and gets buffered Key-value store Bob’s

    Order Jill’s Payment Jill’s Order Bob’s Payment
  32. KSQL Another non-matching record is buffered Key-value store Bob’s Order

    Jill’s Payment Jill’s Order Bob’s Payment
  33. KSQL MATCH - based on key comparison Bob’s Order Jill’s

    Payment Jill’s Order Bob’s Payment
  34. KSQL MATCH – Create output event Bob’s Order Jill’s Payment

    Jill’s Order Bob’s Payment
  35. KSQL Continue Bob’s Order Jill’s Payment Jill’s Order Bob’s Payment

  36. KSQL 2nd MATCH – Create another output event Bob’s Order

    Jill’s Payment Jill’s Order Bob’s Payment
  37. KSQL 2nd MATCH – Create another output event Jill’s Payment

    Jill’s Order Bob’s Payment Bob’s Order
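The walk-through on slides 26 to 37 can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the KSQL or Kafka Streams implementation: the class `StreamStreamJoinSketch`, its method names, and the arrival order used in `main` are hypothetical, and a real stream-stream join would also bound the buffers with a time window instead of removing a record as soon as it matches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a keyed stream-stream join; not the Kafka Streams or KSQL API.
class StreamStreamJoinSketch {
    // One buffer per input stream, keyed by message key (the "key-value store" on the slides).
    private final Map<String, String> bufferedOrders = new HashMap<>();
    private final Map<String, String> bufferedPayments = new HashMap<>();
    private final List<String> output = new ArrayList<>();

    // An order arrives: match it against a buffered payment with the same key, or buffer it.
    void onOrder(String key, String order) {
        String payment = bufferedPayments.remove(key);
        if (payment == null) {
            bufferedOrders.put(key, order);
        } else {
            output.add(key + ": " + order + " + " + payment);
        }
    }

    // A payment arrives: match it against a buffered order with the same key, or buffer it.
    void onPayment(String key, String payment) {
        String order = bufferedOrders.remove(key);
        if (order == null) {
            bufferedPayments.put(key, payment);
        } else {
            output.add(key + ": " + order + " + " + payment);
        }
    }

    List<String> output() {
        return output;
    }

    public static void main(String[] args) {
        StreamStreamJoinSketch join = new StreamStreamJoinSketch();
        join.onPayment("bob", "payment");   // nothing to join with yet, so it is buffered
        join.onOrder("jill", "order");      // another non-matching record is buffered
        join.onOrder("bob", "order");       // first match: an output event is created
        join.onPayment("jill", "payment");  // second match: another output event
        join.output().forEach(System.out::println);
    }
}
```

Because either side may arrive first, both sides need a buffer; this is why the state of a stream-stream join grows with the amount of data that has to be held back.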
  38. Enrichment of event stream using a table Stream-Table join

  39. KSQL Join a Stream with a Table Customers Orders Query

    Cust1 Table of Customers
  40. First we need some source: Database, Stream Processor etc.

    Customers: data is now saved in a topic in Kafka (Event Storage). [Diagram: apps, search and NoSQL systems around a streaming platform]
  41. KSQL Kind of data virtualization. Table of Customers: 1. Reload

    2. Keep up to date. [Diagram: Customers topic in Event Storage on the streaming platform, alongside apps, search and NoSQL systems]
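A stream-table join as described on slides 38 to 41 could look roughly like the following in plain Java. It is a sketch only: `StreamTableJoinSketch`, the customer ids and the e-mail addresses are made up for illustration, and the table here is an in-memory map rather than a KSQL table backed by a Kafka topic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a stream-table join; not the Kafka Streams or KSQL API.
class StreamTableJoinSketch {
    // The table: latest value per key, built from the Customers event stream.
    private final Map<String, String> customers = new HashMap<>();

    // Table side: every customer event upserts the row for its key.
    // Replaying the stored events reloads the table; new events keep it up to date.
    void onCustomerEvent(String customerId, String email) {
        customers.put(customerId, email);
    }

    // Stream side: each order is enriched with a local lookup, with no network call.
    // Returns null when the customer is unknown, mirroring an inner join that emits nothing.
    String onOrder(String orderId, String customerId) {
        String email = customers.get(customerId);
        return email == null ? null : orderId + " for " + email;
    }

    public static void main(String[] args) {
        StreamTableJoinSketch join = new StreamTableJoinSketch();
        join.onCustomerEvent("cust1", "bob@example.com");      // 1. Reload
        join.onCustomerEvent("cust1", "bob@new.example.com");  // 2. Keep up to date
        System.out.println(join.onOrder("order42", "cust1"));
    }
}
```

Only the table side is buffered, so the state is proportional to the table size, while orders pass straight through.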
  42. Summarizing Event Streams Filters and Aggregations

  43. Summarizing data streams (Event Sourced) (Read/Write) KV Store Event Storage

    Current state is streamed to Kafka KStreams Payments User-> Balance
  44. Calc Balance Payments Event Storage User Balance Bob $500 Sally

    $250 George $32 Sum payments to get the user’s account balance SELECT user, sum(amount) FROM payments GROUP BY user;
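The `SELECT user, sum(amount) ... GROUP BY user` aggregation can be pictured as a running total per key. The sketch below is an assumption-laden illustration in plain Java (the class `BalanceAggregateSketch` and the individual payment amounts are hypothetical, chosen so that the totals match the balances on the slide); it is not how KSQL executes the query.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a running aggregate; not the KSQL or Kafka Streams API.
class BalanceAggregateSketch {
    // State store: one entry per aggregation key (user),
    // so the state grows with the cardinality of the key.
    private final Map<String, Long> balanceByUser = new HashMap<>();

    // Apply one payment event and return the user's updated balance,
    // which a stream processor would also emit downstream.
    long onPayment(String user, long amount) {
        return balanceByUser.merge(user, amount, Long::sum);
    }

    Map<String, Long> balances() {
        return balanceByUser;
    }

    public static void main(String[] args) {
        BalanceAggregateSketch agg = new BalanceAggregateSketch();
        agg.onPayment("Bob", 300);
        agg.onPayment("Sally", 250);
        agg.onPayment("Bob", 200);
        agg.onPayment("George", 32);
        System.out.println(agg.balances());
    }
}
```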
  45. Summarizing data streams (Windowed) (Read/Write) KV Store Event Storage Current

    state is streamed to Kafka KStreams Page Views 1 minute window Page Views per min
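A windowed aggregate adds the window to the aggregation key. The following plain-Java sketch assumes fixed, non-overlapping one-minute windows aligned to the epoch and non-negative millisecond timestamps; `WindowedCountSketch` and its method names are hypothetical, and late-data handling and window expiry are left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a one-minute windowed count; not the KSQL or Kafka Streams API.
class WindowedCountSketch {
    static final long WINDOW_MS = 60_000L;

    // State: window start (ms) -> page -> number of views seen in that window.
    private final Map<Long, Map<String, Integer>> counts = new HashMap<>();

    // Start of the one-minute window containing the timestamp (assumes timestampMs >= 0).
    static long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % WINDOW_MS);
    }

    // Count one page view and return the updated count for its window.
    int onPageView(String page, long timestampMs) {
        return counts
                .computeIfAbsent(windowStart(timestampMs), w -> new HashMap<>())
                .merge(page, 1, Integer::sum);
    }

    // Read the current count for a page in the window starting at windowStartMs.
    int count(String page, long windowStartMs) {
        Map<String, Integer> window = counts.get(windowStartMs);
        return window == null ? 0 : window.getOrDefault(page, 0);
    }

    public static void main(String[] args) {
        WindowedCountSketch views = new WindowedCountSketch();
        views.onPageView("/home", 1_000L);
        views.onPageView("/home", 59_999L);
        views.onPageView("/home", 60_000L); // falls into the next window
        System.out.println(views.count("/home", 0L));
        System.out.println(views.count("/home", 60_000L));
    }
}
```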
  46. Three key features • The stream-stream join (combine in real-time)

    • The stream-table join (enrich) • The aggregate (summarize) (lots more: transactions, chained operations, queryable state etc.)
  47. The operations are stateful to different degrees • The stream-stream

    join (state ∝ buffer) • The stream-table join (state ∝ table size) • The aggregate (state ∝ cardinality of aggregation key)
  48. Two modes of operation

  49. API-based (Most common today) orders.filter((id, order) -> order.state().equals("CREATED")).join(payments)

    .transform(MyEmailer::new, STORE).to("sent-emails") JVM only. Similar to Flink, Storm, Samza, Spark etc.
  50. Process boundary Orders Payments KStreams Customers Table Customers Use the

    API Business logic
  51. Orders Payments KStreams Customers Table Customers Is mixing state and

    business logic a good idea? Avoid Being too Stateful
  52. Use KSQL CREATE STREAM order_payments AS SELECT * FROM orders

    LEFT JOIN payments ON orders.orderId = payments.orderId WHERE orders.state = 'CREATED';
  53. Event Storage + Messaging Event Storage, query layer, App KSQL

    KSQL KSQL Query Layer Apps Search Business Logic Joined, enriched, summarized event stream
  54. Orders Payments KSQL Customers Table Customers Use KSQL Server (or

    Cloud Service) My code SELECT * FROM orders, payments, customers WHERE … Joined, enriched event stream Business Logic
  55. Stateless layer scales easily KSQL Stateful Data Layer Stateless Application

    layer (any language) Scales quickly Event Storage Denormalized Events Three event streams from different event sources
  56. Pattern Should be Familiar Apps Search Apps Apps S T

    R E A M I N G P L AT F O R M Apps Search Monitorin Apps Apps S T R E A M I N G P L AT F O R M Apps Search Apps Apps Search Monitor Apps Apps Stateful Stateless
  57. Comparing back to Serverless Functions. FaaS: - Autoscale based on

    demand - Pay as you use - Simple programming model - Stateless - High latency - High throughput (if batched) - One event source. Stream Processors: - Stateful & stateless operations at high throughput - Join different event sources / enrich - Correctness even after failure - Rich semantics for dataflow programming - Effectively infinite storage in Kafka - Doesn’t autoscale (scale manually / programmatically) - More complex
  58. They complement each other: stream

    processors act as a “data layer” for FaaS. [Diagram: Orders, Payments and Customers; stateful KSQL with a Customers table; AWS Lambda Connector; stateless FaaS functions; Transaction]
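With the stream processor acting as the data layer, the function itself can stay a pure function of the event it receives. The sketch below illustrates that idea for the retail example in plain Java; `ConfirmationEmailFunction` and the field names `orderId`, `amount` and `customerEmail` are assumptions made for illustration, and no real FaaS runtime or connector API is used.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stateless function body; the event shape is an assumption, not a real schema.
class ConfirmationEmailFunction {
    // Everything the function needs is already in the joined, enriched event,
    // so it performs no lookups and keeps no state between invocations.
    static String handle(Map<String, String> enrichedOrder) {
        return "To: " + enrichedOrder.get("customerEmail") + "\n"
                + "Subject: Order " + enrichedOrder.get("orderId") + " confirmed\n"
                + "We received your payment of " + enrichedOrder.get("amount") + ".";
    }

    public static void main(String[] args) {
        Map<String, String> event = new HashMap<>();
        event.put("orderId", "order42");
        event.put("amount", "$25");
        event.put("customerEmail", "bob@example.com");
        System.out.println(handle(event));
    }
}
```

Since the handler holds no state, any number of copies can run concurrently, which is the property that lets the FaaS layer autoscale.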
  59. Broader pattern (easier to consume, keep apps stateless)

    Most languages supported. [Diagram: Orders Service, Payment Service and Customer Service; streaming platform; denormalized events (Order, Payment, Customer); apps, search and NoSQL systems]
  60. Event storage + Stream Processing make data self service (real time & historical)

    SELECT * FROM ORDERS O, CUSTOMERS C WHERE O.REGION = 'EU' AND C.TYPE = 'Platinum' [Diagram: event streams (Orders, Payments, Customers, Distinct Visits) listed with Msgs/Day and History (1w, All); select organizational events; stream processing; destinations (C*, Postgres, Lambda, Other Kafka)]
  61. Streaming Platforms apply these patterns across ecosystems Event Streaming Platform

    (Storage + Stream Processing)
  62. The Future It’s an evolving field

  63. FaaS FaaS FaaS Orders Payments KSQL Customers Table Customers Connector

    FaaS FaaS • Ordering (by partition) • Batching
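Ordering by partition and batching could be combined along the following lines. This is a speculative sketch of the idea in plain Java, not the behavior of any existing connector: `PartitionedBatchingSketch` is hypothetical, and `String.hashCode` stands in for whatever partitioner the platform actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of per-partition ordering plus batching; not a real connector API.
class PartitionedBatchingSketch {
    // Events with the same key always map to the same partition (assumed hash partitioner).
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Groups {key, value} events by partition in arrival order, then cuts each partition
    // into batches of at most batchSize events, e.g. one batch per function invocation.
    static Map<Integer, List<List<String>>> batch(List<String[]> events, int numPartitions, int batchSize) {
        Map<Integer, List<List<String>>> batchesByPartition = new TreeMap<>();
        for (String[] event : events) {
            int partition = partitionFor(event[0], numPartitions);
            List<List<String>> batches =
                    batchesByPartition.computeIfAbsent(partition, p -> new ArrayList<>());
            if (batches.isEmpty() || batches.get(batches.size() - 1).size() == batchSize) {
                batches.add(new ArrayList<>()); // current batch is full, start a new one
            }
            batches.get(batches.size() - 1).add(event[0] + ":" + event[1]);
        }
        return batchesByPartition;
    }

    public static void main(String[] args) {
        List<String[]> events = new ArrayList<>();
        events.add(new String[] {"bob", "order"});
        events.add(new String[] {"jill", "order"});
        events.add(new String[] {"bob", "payment"});
        System.out.println(batch(events, 4, 2));
    }
}
```

Order is preserved only within a partition, so events for one key stay in sequence as long as each partition's batches are processed one after another.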
  64. FaaS FaaS FaaS Transaction Orders Payments KSQL Customers Table Customers

    Stateless Stateful Maybe: Inherit Kafka’s Transactional Guarantees FaaS FaaS
  65. In Summary • Stream processors can operate like databases for

    this event driven world. • FaaS is one of many “end points” • FaaS has unique properties: • Pay as you use • Load driven autoscaling • Programming model •Trick: split application logic from data preparation & adopt event-first model.
  66. In this increasingly event driven world stream processors become as

    important as databases are today.
  67. FaaS CRUD Event-Driven Application Database KSQL Stateful Data Layer FaaS

    FaaS FaaS FaaS FaaS Event-Streaming Stateless Stateless Stateless Compute Layer Massive linear scalability with elasticity
  68. None
  69. Streaming platforms provide a unique alternative. Billing Shipping Fraud Fraud

    Fulfilment Streaming Platform
  70. Thank you @benstopford Book: https://www.confluent.io/designing-event-driven-systems

  71. Rate today’s session Session page on oreillysacon.com/ny O’Reilly Events App