
Stream Processing for the Serverless Generation

benstopford
February 06, 2019



Transcript

  1. Increasing complexity: the evolution of software systems from Monolith to Distributed Monolith to Microservices to Event-Driven Microservices. (Diagram: apps, search, NoSQL, DWH, Hadoop, monitoring and security systems connected through a streaming platform, Kafka.)
  2. Backend services grow faster than the front end, moving from tightly coupled to loosely coupled. For example, Netflix has ~400 backend microservices fed by Kafka.
  3. In the serverless world, which is inherently event driven, stream processors will become as important as databases are today.
  4. Streaming Platform: producers and consumers connected through the platform. (Diagram: apps, search, NoSQL, DWH, Hadoop and monitoring systems around the streaming platform.)
  5. Roots in big data messaging • Event Storage: Kafka stores petabytes of data • Stream Processing: real-time processing over streams and tables • Scalability: clusters of hundreds of machines, global
  6. Are Serverless Functions and Stream Processors related? They are both functions we define that are triggered by streams of events.
  7. FaaS in brief • Write a function • Upload • Configure a trigger (HTTP, Event, Object Store, Database, Timer etc.)
  8. FaaS in a Nutshell • Fully managed (runs in a container) • Pay as you use • Auto-scales with load: ~0-1000 concurrent functions • Short lived (max ~5 mins) • Weak ordering guarantees • Cold starts can be slow: 100ms to 45s (AWS: 250ms to 7s)
  9. Where is FaaS useful? • Interesting for spiky workloads (i.e. extremities) • Grid compute: HPC, Genomics, Finance • Interesting for use cases that wouldn’t typically warrant the cost of conventional massive parallelism, e.g. CI systems • Serverless programming model
  10. Serverless Developer Ecosystem: currently quite poor • Runtime diagnostics • Monitoring • Deploy loop • Testing • IDE integration
  11. Simple online retail example: when the order is created and the payment has been completed, get the customer’s info and send them an email confirming the purchase.
  12. Serverless Way: event-driven (not streaming) - Too slow for high velocity use cases: ~5-10 messages per second - Correctness: what if the payment isn’t there when the order arrives? (Diagram: Orders, Customers and Payments; three FaaS functions; stores holding all Customer data and all Payment data.)
  13. Event-Streaming Platforms sew these operations together, stateful or stateless, inside one process boundary • No network calls • 50,000-100,000 messages per second, per thread • Better correctness (Diagram: Orders and Payments streams and a Customers table feeding KStreams.)
  14. Three key features • Stream-stream join (combine in real time): unlike a database join, you only need to consider how late data might be • Stream-table join (enrich): more like a database join, on one side only • Aggregate (summarize): big data sets are too large
  15. KSQL: the join is on the key (messages have keys in Kafka). orders.join(payments) (Diagram: Bob’s Order, Bob’s Payment, Jill’s Payment, Jill’s Order.)
  16. KSQL: joining two streams. Streaming systems don’t know when data is going to arrive. orders.join(payments)
  17. KSQL: Bob’s payment arrives; there is nothing to join it with. orders.join(payments)
  18. KSQL: Jill’s order arrives and gets buffered in a key-value store.
  19. KSQL: another non-matching record is buffered in the key-value store.
  20. KSQL: MATCH, based on key comparison.
  21. KSQL: 2nd MATCH, which creates another output event.
  22. KSQL: 2nd MATCH, which creates another output event.
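The buffering walkthrough on slides 15-22 can be sketched in plain Java, without Kafka. This is only an illustration under stated assumptions: the class `StreamStreamJoinSketch` and its methods are hypothetical, each buffer holds one record per key, the arrival order in `demo()` is one ordering consistent with the slides, and the time window that bounds the buffer in a real stream-stream join is omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a keyed stream-stream join; not the KSQL or Kafka Streams implementation.
class StreamStreamJoinSketch {
    // One key-value buffer per input stream. A real join would also expire
    // buffered records once the join window has passed.
    private final Map<String, String> orders = new HashMap<>();
    private final Map<String, String> payments = new HashMap<>();
    private final List<String> output = new ArrayList<>();

    void onOrder(String key, String order) {
        orders.put(key, order);              // buffer the arriving record
        String payment = payments.get(key);  // look for a match on the other side
        if (payment != null) {
            output.add(order + " + " + payment);
        }
    }

    void onPayment(String key, String payment) {
        payments.put(key, payment);          // buffer the arriving record
        String order = orders.get(key);      // look for a match on the other side
        if (order != null) {
            output.add(order + " + " + payment);
        }
    }

    List<String> output() {
        return output;
    }

    // Replays one arrival order that is consistent with the slides (an assumption).
    static List<String> demo() {
        StreamStreamJoinSketch join = new StreamStreamJoinSketch();
        join.onPayment("bob", "Bob's Payment");   // nothing to join with, buffered
        join.onOrder("jill", "Jill's Order");     // no match yet, buffered
        join.onOrder("bob", "Bob's Order");       // MATCH on key "bob"
        join.onPayment("jill", "Jill's Payment"); // 2nd MATCH on key "jill"
        return join.output();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Each arriving record is buffered first and then probed against the other side, so a match is produced whichever side arrives last.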
  23. First we need some source: Database, Stream Processor etc. The Customers data is now saved in a topic in Kafka (Event Storage).
  24. KSQL: a kind of data virtualization. Table of Customers: 1. Reload 2. Keep up to date. (Diagram: the Customers topic in Event Storage feeding the table.)
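The "reload, then keep up to date" idea behind the Customers table, together with the stream-table join it enables, might look roughly like the following plain-Java sketch. All names (`CustomerTableSketch`, `reload`, `apply`, `enrich`) and the sample addresses are hypothetical; the point is only that the table is the latest value per key from the topic, and that enriching an order is a local lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a table materialized from a topic; not KSQL's implementation.
class CustomerTableSketch {
    // Latest known email per customer id.
    private final Map<String, String> table = new HashMap<>();

    // Step 1 (reload): replay every {customerId, email} event already stored in the topic.
    void reload(String[][] storedEvents) {
        for (String[] event : storedEvents) {
            apply(event[0], event[1]);
        }
    }

    // Step 2 (keep up to date): apply each new event; the latest value per key wins.
    void apply(String customerId, String email) {
        table.put(customerId, email);
    }

    // Stream-table join: enrich an order with the customer's current row.
    // Returns null when the customer is not in the table.
    String enrich(String customerId, String order) {
        String email = table.get(customerId);
        return email == null ? null : order + " -> " + email;
    }

    public static void main(String[] args) {
        CustomerTableSketch customers = new CustomerTableSketch();
        customers.reload(new String[][] {
            {"bob", "bob@old.example"},
            {"jill", "jill@example.com"},
            {"bob", "bob@new.example"}   // a later event overwrites the earlier one
        });
        System.out.println(customers.enrich("bob", "order-1"));
    }
}
```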
  25. Summarizing data streams (Event Sourced): KStreams reads Payments from Event Storage and keeps User -> Balance in a (read/write) KV store. Current state is streamed to Kafka.
  26. Calc Balance: sum payments to get the user’s account balance. SELECT user, sum(amount) FROM payments GROUP BY user; Result (User, Balance): Bob $500, Sally $250, George $32.
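As a rough plain-Java analogue of the GROUP BY query above, the running balance can be kept in a key-value map that is updated on every payment event. The class `BalanceSketch` is hypothetical, and the individual payment amounts in `main` are invented for illustration, chosen only so that they sum to the balances shown on the slide.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a streaming aggregate; not the KSQL or Kafka Streams implementation.
class BalanceSketch {
    // State: one running total per user (the aggregation key).
    private final Map<String, Long> balances = new HashMap<>();

    // Applies one payment event and returns the user's updated balance,
    // which is the "current state" a stream processor would emit downstream.
    long onPayment(String user, long amount) {
        return balances.merge(user, amount, Long::sum);
    }

    long balanceOf(String user) {
        return balances.getOrDefault(user, 0L);
    }

    public static void main(String[] args) {
        BalanceSketch sketch = new BalanceSketch();
        sketch.onPayment("Bob", 200);    // illustrative amounts, not from the deck
        sketch.onPayment("Bob", 300);
        sketch.onPayment("Sally", 250);
        sketch.onPayment("George", 32);
        System.out.println(sketch.balanceOf("Bob"));
    }
}
```

The state held is one entry per user, however many payment events arrive.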
  27. Summarizing data streams (Windowed): KStreams reads Page Views from Event Storage and keeps Page Views per min, over a 1 minute window, in a (read/write) KV store. Current state is streamed to Kafka.
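A windowed aggregate differs from the plain one only in its key: the state is kept per key and per window. A minimal plain-Java sketch, assuming tumbling (non-overlapping) one-minute windows aligned to the epoch and event timestamps in milliseconds; the class `PageViewWindowSketch` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a tumbling-window count; not the Kafka Streams implementation.
class PageViewWindowSketch {
    static final long WINDOW_MS = 60_000L; // 1 minute window

    // State: one counter per (page, window start) pair.
    private final Map<String, Long> counts = new HashMap<>();

    // Start of the window that contains the given timestamp.
    static long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % WINDOW_MS);
    }

    // Counts one page view and returns the updated count for its window.
    long onPageView(String page, long timestampMs) {
        return counts.merge(page + "@" + windowStart(timestampMs), 1L, Long::sum);
    }

    long count(String page, long windowStartMs) {
        return counts.getOrDefault(page + "@" + windowStartMs, 0L);
    }

    public static void main(String[] args) {
        PageViewWindowSketch views = new PageViewWindowSketch();
        views.onPageView("/home", 1_000L);
        views.onPageView("/home", 59_999L);
        views.onPageView("/home", 60_000L); // falls into the next window
        System.out.println(views.count("/home", 0L));
    }
}
```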
  28. Three key features • The stream-stream join (combine in real time) • The stream-table join (enrich) • The aggregate (summarize) (lots more: transactions, chained operations, queryable state etc.)
  29. The operations are stateful to different degrees • The stream-stream join (state ∝ buffer) • The stream-table join (state ∝ table size) • The aggregate (state ∝ cardinality of aggregation key)
  30. API-based (most common today): orders.filter((id, order) -> order.state().equals("CREATED")).join(payments).transform(MyEmailer::new, STORE).to("sent-emails") • JVM only • Similar to Flink, Storm, Samza, Spark etc.
  31. Is mixing state and business logic a good idea? Avoid being too stateful. (Diagram: Orders, Payments and a Customers table inside KStreams.)
  32. Use KSQL: CREATE STREAM order_payments AS SELECT * FROM orders LEFT JOIN payments ON orders.orderId = payments.orderId WHERE orders.state = 'CREATED';
  33. Event Storage + Messaging; Event Storage, query layer, App. (Diagram: a KSQL query layer between Event Storage and the business logic in apps, producing a joined, enriched, summarized event stream.)
  34. Use KSQL Server (or Cloud Service): SELECT * FROM orders, payments, customers WHERE … produces a joined, enriched event stream for the business logic in my code. (Diagram: Orders, Payments and a Customers table in KSQL.)
  35. Stateless layer scales easily: a stateful data layer (KSQL over Event Storage) turns three event streams from different event sources into denormalized events; the stateless application layer (any language) scales quickly.
  36. Pattern should be familiar: stateful vs. stateless. (Diagram: apps, search and monitoring systems around the streaming platform.)
  37. Comparing back to Serverless Functions. FaaS: - Autoscale based on demand - Pay as you use - Simple programming model - Stateless - High latency - High throughput (if batched) - One event source. Stream Processors: - Stateful & stateless operations at high throughput - Join different event sources / enrich - Correctness even after failure - Rich semantics for dataflow programming - Effectively infinite storage in Kafka - Doesn’t autoscale (scale manually / programmatically) - More complex
  38. They complement each other: stream processors act as a “data layer” for FaaS. (Diagram: Orders, Payments and a Customers table in stateful KSQL, feeding stateless FaaS functions through the AWS Lambda Connector; Transaction.)
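To illustrate the split, here is a sketch of what the stateless side could look like once the stream processor has already joined order, payment and customer into one denormalized event: the function needs no lookups and no state of its own. The class `ConfirmationEmailFunction` and the field names `orderId`, `amount` and `customerEmail` are assumptions made for this example, not part of any FaaS provider's API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stateless function: takes one denormalized event, returns the email text.
class ConfirmationEmailFunction implements Function<Map<String, String>, String> {
    @Override
    public String apply(Map<String, String> event) {
        // Everything needed is already in the event; no database or service calls.
        return "To: " + event.get("customerEmail") + "\n"
            + "Order " + event.get("orderId") + " confirmed, payment of "
            + event.get("amount") + " received.";
    }

    public static void main(String[] args) {
        Map<String, String> event = new HashMap<>();
        event.put("orderId", "42");                    // illustrative values
        event.put("amount", "$10");
        event.put("customerEmail", "bob@example.com");
        System.out.println(new ConfirmationEmailFunction().apply(event));
    }
}
```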
  39. Broader pattern (easier to consume, keep apps stateless): events from the Orders Service, Payment Service and Customer Service are combined into denormalized events (Order, Payment, Customer) for apps, search and NoSQL. Most languages supported.
  40. Event storage + Stream Processing make data self service (real time & historical). Select organizational events with stream processing, e.g. SELECT * FROM ORDERS O, CUSTOMERS C WHERE O.REGION = ‘EU’ AND C.TYPE = ‘Platinum’ (Diagram: event streams (Orders, Payments, Customers, Distinct Visits) listed with Msgs/Day and History (1w, All); destinations: C*, Postgres, Lambda, other Kafka.)
  41. Connector between KSQL and FaaS • Ordering (by partition) • Batching (Diagram: Orders, Payments and a Customers table in KSQL; a connector; FaaS functions.)
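One way to picture "ordering by partition" combined with batching: records keep their relative order within each partition and are cut into batches of bounded size, each batch standing for one function invocation. This is a hypothetical plain-Java sketch, not how the AWS Lambda connector is implemented; `PartitionBatcherSketch`, its parameters and the sample data are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of per-partition batching; not a real connector's implementation.
class PartitionBatcherSketch {
    // partitions[i] is the partition of values[i]; records are given in arrival order.
    // Returns, per partition, the list of batches in the order they would be sent.
    static Map<Integer, List<List<String>>> batch(int[] partitions, String[] values, int maxBatch) {
        Map<Integer, List<List<String>>> result = new TreeMap<>();
        for (int i = 0; i < values.length; i++) {
            List<List<String>> batches =
                result.computeIfAbsent(partitions[i], p -> new ArrayList<>());
            // Start a new batch when there is none yet or the current one is full.
            if (batches.isEmpty() || batches.get(batches.size() - 1).size() == maxBatch) {
                batches.add(new ArrayList<>());
            }
            batches.get(batches.size() - 1).add(values[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] partitions = {0, 1, 0, 0, 1};
        String[] values = {"a", "b", "c", "d", "e"};
        System.out.println(batch(partitions, values, 2));
    }
}
```

Order is only preserved inside a partition, which matches the weak ordering guarantees listed for FaaS earlier in the deck.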
  42. Maybe: inherit Kafka’s transactional guarantees. (Diagram: Orders, Payments and a Customers table in stateful KSQL; stateless FaaS functions; Transaction.)
  43. In Summary • Stream processors can operate like databases for this event-driven world. • FaaS is one of many “end points” • FaaS has unique properties: • Pay as you use • Load-driven autoscaling • Programming model • Trick: split application logic from data preparation & adopt an event-first model.
  44. Event-Driven Application: massive linear scalability with elasticity. (Diagram: FaaS doing CRUD against a database, compared with an event-streaming stateful data layer (KSQL) feeding a stateless compute layer of FaaS functions.)