Stateful Functions – Building general-purpose Applications and Services on Apache Flink

The slides from my keynote at Flink Forward Europe 2019 in Berlin.

The presentation introduces "Stateful Functions", a new library for building general-purpose applications on Apache Flink. It brings together ideas from Stateful Stream Processing and FaaS to create a new way of building Stateful Applications.

As an introduction, the talk shows the growth of the Flink community in the last year, and recaps some of the work on streaming data processing.

Stephan Ewen

October 08, 2019

Transcript

  1. © 2019 Ververica. Stephan Ewen, Co-founder, CTO @ Ververica, Apache Flink PMC Member. Stateful Functions – Building general-purpose Applications and Services on Apache Flink
  2. Top 3 project in Apache, by mailing list activity … and top 7 by commit activity. Source: Apache Annual Report 2018, https://s3.amazonaws.com/files-dist/AnnualReports/FY2018%20Annual%20Report.pdf
  3. Stand back! I’m going to run a batch job…; Flink ≤ 1.8; Flink 1.9 / 1.10; Batching it like a pro there…
  4. Batch on Streaming in Apache Flink 1.9: Fine-grained batch fault tolerance; Table API Restructuring; Blink Query Engine; Python Table API; Catalogs; Hive Table Support; Unified Operator Runtime
  5. Batch / Streaming - Features in Progress (selection): Full Hive compatibility; Python UDFs; Interactive Programs; Better Memory Management for Streaming State Backends; New Scheduler; Resource Profile Support; Machine Learning Pipelines; Unaligned Checkpoints; Unified Source API
  6. API Stack in Flink 1.9 (diagram labels, in extraction order): Flink Task Runtime batch env. stream env. DataSet batch DataStream streaming batch & streaming StreamTransformation Old Flink Query Proc. Blink Query Proc. batch & streaming SQL / Table API batch & streaming
  7. API Stack future goal (diagram labels, in extraction order): Flink Task Runtime DataStream batch & streaming batch & streaming StreamTransformation Blink Query Proc. batch & streaming SQL / Table API batch & streaming
  8. Stream Processing: offline | real-time Data Processing; event-driven | databases Applications. Stream Processing is at the Intersection of Data Processing and Applications
  9. Functions as a Service (diagram of λ functions): elastically scalable; “lightweight resource footprint”
  10. Functions as a Service – Handling State in Applications (diagram of λ functions): state consistency? scaling the database? connections, request rates, … often bottlenecked by state access & I/O
  11. Handling state remains a challenge for applications, also in the serverless world.
  12. Composition of Functions (diagram of λ functions): Not straightforward to build more complex applications. Lack of messaging / composition primitives. Workflows of functions as a workaround, but not a general solution.
  13. Stream Processing, F-a-a-S (diagram of λ functions): simplicity / generality; state management; composability; lightweight resources; performance; event-driven. Can we combine some of these properties?
  14. Bringing together ideas from Stateful Stream Processing and FaaS to create a new way of building Stateful Applications: https://statefun.io/
  15. Stateful Functions (diagram labels): f(a,b) functions; event ingress; event egress; snapshot state; mass storage (S3, GCF, ECS, HDFS, …)
  16. Stateful Functions (diagram as on slide 15): Event ingresses supply events that trigger functions
  17. Stateful Functions (diagram as on slide 15): Multiple functions send events to each other. Arbitrary addressing, no restriction to DAG
  18. Stateful Functions (diagram as on slide 15): Functions have locally embedded state
  19. Stateful Functions (diagram as on slide 15): State and messaging are consistent with exactly-once semantics
  20. Stateful Functions (diagram as on slide 15): No database required. All persistence goes directly to blob storage
  21. Stateful Functions (diagram as on slide 15): Event egresses to respond via event streams
  22. Logical/Virtual Instances (diagram labels, in extraction order): A F C memory secondary storage Shard 1 G H I B function virtual instance Shard 2 D E K L M N
  23. Logical/Virtual Instances (diagram labels, in extraction order): A F C Shard 1 G H I B Shard 2 D E K L M N. Steps: message to "K"; load "K"; possibly evict other; K.invoke(message)
  24. Apache Flink is the State and Event Streaming Fabric. Apache Flink Dataflow Graph: Ingress & Router, Function Dispatcher, Feedback Operator, Egress, with edge labels (keyBy), (side output), (loop). Conceptual Dataflow: Ingress/Router, Functions, Egress
  25. Running Stateful Functions on Apache Flink: Exactly-once checkpointing for streaming loops. Diagram labels: Function Dispatcher; Feedback Operator; loop; feedback
  26. Example: Ride Sharing App. Event streams: Driver status updates; Passenger ride requests; Ride status update. Functions: Driver; Ride; Passenger; Geo-index. Interactions: update; create bill; Inform / book; bid; lookup; update cell. States: seeking; confirmed; riding; free; bidding; booked
  27. Stream Processing / Streaming SQL (event/stream-centric): data preparation; combining knowledge/information; filtering, enriching, aggregating, joining events. Stateful Functions (state-centric): coordination; (interacting) state machines; complex event/state interactions. F-a-a-S (stateless / compute-centric): “occasional” actions or spiky loads; compute-intensive or blocking
  28. Putting it all together: Ridesharing again. FaaS: render map/route image; create a receipt PDF; send email. Stateful Functions: ride life-cycle; driver-to-ride matching. Stream Processing: traffic models; demand forecast & pricing. Event streams: Billing; Passenger updates; Driver position updates; Driver status updates
  29. Is Stateful Functions part of Apache Flink? Not yet, but we would like it to be! Fully Open Source on Ververica’s GitHub under ASL 2. Propose contribution for Apache Flink (Flink Improvement Proposal). Community discussion about project proposal. Upon acceptance, handover to the Flink project.
  30. The Megastars behind the Stateful Functions Project: Daryl, Robert, Ufuk, Konstantin, Holger, Olivia, Markos, Enrico, Charles, Jamie G., Thomas, Greg, Jamie C., Ricky, … And a big “Thank you!” to everyone who helped and tried it out!
  31. Learn more at: Technical deep-dive session • https://statefun.io/ • https://github.com/ververica/stateful-functions/ • https://ververica.com/blog/
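
The programming model described on the "Stateful Functions" and "Logical/Virtual Instances" slides (addressable functions with local state that message each other, loaded on demand and possibly evicted) can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyRuntime`, `send`, `emit`, `max_loaded`, and the `counter`/`printer` functions are hypothetical names invented for illustration and are not the Stateful Functions API. The model runs in a single process and does not cover checkpointing, sharding, or exactly-once guarantees.

```python
from collections import OrderedDict, deque


class ToyRuntime:
    """Hypothetical single-process model; NOT the Stateful Functions API."""

    def __init__(self, functions, max_loaded=2):
        self.functions = functions    # function type -> handler
        self.max_loaded = max_loaded  # how many instances fit in "memory"
        self.memory = OrderedDict()   # address -> state, least recently used first
        self.storage = {}             # address -> state ("secondary storage")
        self.mailbox = deque()        # pending (address, message) pairs
        self.egress = []              # events leaving the application

    def send(self, address, message):
        """Address a message to any instance; no restriction to a DAG."""
        self.mailbox.append((address, message))

    def emit(self, event):
        """Hand an event to the (toy) egress."""
        self.egress.append(event)

    def _load(self, address):
        """Load an instance's state into memory, possibly evicting another."""
        if address in self.memory:
            self.memory.move_to_end(address)
        else:
            self.memory[address] = self.storage.pop(address, None)
            while len(self.memory) > self.max_loaded:
                evicted, state = self.memory.popitem(last=False)
                self.storage[evicted] = state
        return self.memory[address]

    def run(self):
        """Deliver messages until the mailbox is empty."""
        while self.mailbox:
            address, message = self.mailbox.popleft()
            state = self._load(address)
            handler = self.functions[address[0]]
            # The handler returns the new state of this virtual instance.
            self.memory[address] = handler(self, address, state, message)


def counter(rt, address, state, message):
    """Counts messages per instance and notifies the printer function."""
    count = (state or 0) + 1
    rt.send(("printer", "out"), f"{address[1]} seen {count} time(s)")
    return count


def printer(rt, address, state, message):
    """Forwards every message it receives to the egress."""
    rt.emit(message)
    return state


rt = ToyRuntime({"counter": counter, "printer": printer}, max_loaded=2)
for key in ["A", "B", "C", "A"]:  # events arriving from a (toy) ingress
    rt.send(("counter", key), "ping")
rt.run()
print(rt.egress)
```

With `max_loaded=2`, delivering the second message to instance `A` first reloads its state from `storage`, which mirrors the "load K, possibly evict other, K.invoke(message)" steps on the slide. A least-recently-used eviction policy is an assumption of this sketch; the slides do not say which policy is used.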