Slide 1

Slide 1 text

Stateful Functions – Building general-purpose Applications and Services on Apache Flink
Stephan Ewen, Co-founder and CTO @ Ververica, Apache Flink PMC Member
© 2019 Ververica

Slide 2

Slide 2 text

The State of Apache Flink

Slide 3

Slide 3 text

Passed 10,000 Stars on GitHub in August

Slide 4

Slide 4 text

Top 3 project in Apache by mailing list activity, and top 7 by commit activity
Source: Apache Annual Report 2018, https://s3.amazonaws.com/files-dist/AnnualReports/FY2018%20Annual%20Report.pdf

Slide 5

Slide 5 text

Flink 1.9 is the biggest Apache Flink release to date

Slide 6

Slide 6 text

The last six months, feature-wise:
State-of-the-Art Batch Processing on a Stream Processor

Slide 7

Slide 7 text

Flink ≤ 1.8: “Stand back! I’m going to run a batch job…”
Flink 1.9 / 1.10: “Batching it like a pro there…”

Slide 8

Slide 8 text

Batch on Streaming in Apache Flink 1.9
• Fine-grained batch fault tolerance
• Table API Restructuring
• Blink Query Engine
• Python Table API
• Catalogs
• Hive Table Support
• Unified Operator Runtime

Slide 9

Slide 9 text

Batch / Streaming - Features in Progress (selection)
• Full Hive compatibility
• Python UDFs
• Interactive Programs
• Better Memory Management for Streaming State Backends
• New Scheduler
• Resource Profile Support
• Machine Learning Pipelines
• Unaligned Checkpoints
• Unified Source API

Slide 10

Slide 10 text

API Stack in Flink 1.9 (diagram layers, bottom to top)
• Flink Task Runtime
• batch env. | stream env.
• DataSet (batch) | DataStream (streaming) | StreamTransformation (batch & streaming)
• Old Flink Query Proc. | Blink Query Proc. (batch & streaming)
• SQL / Table API (batch & streaming)

Slide 11

Slide 11 text

API Stack future goal (diagram layers, bottom to top)
• Flink Task Runtime
• DataStream (batch & streaming) | StreamTransformation (batch & streaming)
• Blink Query Proc. (batch & streaming)
• SQL / Table API (batch & streaming)

Slide 12

Slide 12 text

Let’s look at building Applications

Slide 13

Slide 13 text

Stream Processing is at the Intersection of Data Processing and Applications
• Data Processing: offline | real-time
• Applications: event-driven | databases

Slide 14

Slide 14 text

Building an Application Today

Slide 15

Slide 15 text

Building an Application Today
The big trend: Serverless FaaS

Slide 16

Slide 16 text

Functions as a Service
λ: an event-driven function

Slide 17

Slide 17 text

Functions as a Service
• elastically scalable
• “lightweight resource footprint”

Slide 18

Slide 18 text

Functions as a Service – Handling State in Applications
• state consistency?
• scaling the database? (connections, request rates, …)
• often bottlenecked by state access & I/O

Slide 19

Slide 19 text

Handling state remains a challenge for applications, also in the serverless world.

Slide 20

Slide 20 text

Composition of Functions
• Not straightforward to build more complex applications
• Lack of messaging / composition primitives
• Workflows of functions as a workaround, but not a general solution

Slide 21

Slide 21 text

event-driven, state management, composable … that sounds like … Stream Processing

Slide 22

Slide 22 text

Stream Processing and F-a-a-S
• simplicity / generality
• state management
• composability
• lightweight resources
• performance
• event-driven
Can we combine some of these properties?

Slide 23

Slide 23 text

Today… we announce…

Slide 24

Slide 24 text

Bringing together ideas from Stateful Stream Processing and FaaS to create a new way of building Stateful Applications
https://statefun.io/

Slide 25

Slide 25 text

Stateful Functions
(Diagram: an event ingress feeds a set of functions f(a,b), which feed an event egress; function state is snapshotted to mass storage (S3, GCF, ECS, HDFS, …))

Slide 26

Slide 26 text

Stateful Functions
Event ingresses supply events that trigger functions

Slide 27

Slide 27 text

Stateful Functions
• Multiple functions send events to each other
• Arbitrary addressing, no restriction to a DAG
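
To make the addressing idea concrete, here is a minimal single-process sketch in Python. It is not the Stateful Functions API: the `Runtime` class, the `(type, id)` address tuples, and the `ping` / `pong` functions are all hypothetical names chosen for illustration. It only shows that functions can message each other by address, including in a cycle, which a DAG-shaped pipeline would not allow.

```python
from collections import deque


class Runtime:
    """Toy in-process dispatcher (an assumption, not the real runtime)."""

    def __init__(self):
        self.functions = {}     # function type -> handler
        self.mailbox = deque()  # pending (address, message) pairs
        self.egress = []        # messages leaving the application

    def register(self, fn_type, handler):
        self.functions[fn_type] = handler

    def send(self, address, message):
        # An address is a (function type, instance id) tuple.
        self.mailbox.append((address, message))

    def run(self):
        # Deliver messages one at a time until none are left.
        while self.mailbox:
            (fn_type, fn_id), message = self.mailbox.popleft()
            self.functions[fn_type](self, fn_id, message)


def ping(rt, fn_id, remaining):
    # Forward to the "pong" function with the same instance id.
    rt.send(("pong", fn_id), remaining)


def pong(rt, fn_id, remaining):
    if remaining > 0:
        # Message back to "ping": this closes a cycle, so the graph is not a DAG.
        rt.send(("ping", fn_id), remaining - 1)
    else:
        rt.egress.append((fn_id, "done"))


rt = Runtime()
rt.register("ping", ping)
rt.register("pong", pong)
rt.send(("ping", "a"), 2)  # plays the role of an event ingress
rt.run()
```

After `run()` returns, `rt.egress` holds the single message that left the ping/pong loop.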

Slide 28

Slide 28 text

Stateful Functions
Functions have locally embedded state

Slide 29

Slide 29 text

Stateful Functions
State and messaging are consistent with exactly-once semantics

Slide 30

Slide 30 text

Stateful Functions
• No database required
• All persistence goes directly to blob storage

Slide 31

Slide 31 text

Stateful Functions
Event egresses to respond via event streams

Slide 32

Slide 32 text

Logical/Virtual Instances
(Diagram: Shard 1 and Shard 2 each host function virtual instances, labeled A, B, C, …; some instances are held in memory, the others in secondary storage)

Slide 33

Slide 33 text

Logical/Virtual Instances
A message to "K" arrives:
• load "K" (possibly evict other)
• K.invoke(message)
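
The load / evict / invoke sequence can be sketched as a small Python class. This is an illustration only: the `Shard` class, its least-recently-used eviction policy, and the `seen` counter used as instance state are assumptions, and the slides do not say which eviction policy the real runtime uses.

```python
from collections import OrderedDict


class Shard:
    """Toy shard holding a bounded number of virtual instances in memory."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = OrderedDict()  # instance id -> state, oldest first
        self.storage = {}            # stands in for secondary storage

    def _load(self, instance_id):
        if instance_id in self.memory:
            # Already resident: mark as most recently used.
            self.memory.move_to_end(instance_id)
            return self.memory[instance_id]
        if len(self.memory) >= self.capacity:
            # Possibly evict another instance to make room.
            evicted_id, evicted_state = self.memory.popitem(last=False)
            self.storage[evicted_id] = evicted_state
        # Load from secondary storage, or start with fresh state.
        state = self.storage.pop(instance_id, {"seen": 0})
        self.memory[instance_id] = state
        return state

    def deliver(self, instance_id, message):
        state = self._load(instance_id)
        state["seen"] += 1  # the "invoke" step: here it only counts messages
        return state["seen"]


shard = Shard(capacity=2)
shard.deliver("K", "m1")
shard.deliver("L", "m2")
shard.deliver("M", "m3")          # memory is full, so "K" is evicted
count = shard.deliver("K", "m4")  # "K" is reloaded with its earlier state
```

The point of the sketch is that an instance such as "K" keeps its state across eviction, so the number of addressable instances is not limited by memory.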

Slide 34

Slide 34 text

Apache Flink is the State and Event Streaming Fabric
• Conceptual Dataflow: Ingress/Router, Functions, Egress
• Apache Flink Dataflow Graph: Ingress & Router, (keyBy), Function Dispatcher, (side output), Egress; the Function Dispatcher and the Feedback Operator are connected by a (loop)
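
The (keyBy) step in the dataflow graph routes every message for a given function address to the same parallel dispatcher. A rough Python sketch of that idea follows. The function name `shard_for` and the CRC32-modulo scheme are assumptions for illustration; Flink's actual key assignment works differently internally.

```python
import zlib


def shard_for(function_type, function_id, num_shards):
    """Map a function address to a shard index in a stable way."""
    key = f"{function_type}/{function_id}".encode("utf-8")
    # crc32 is deterministic across processes, unlike Python's built-in hash().
    return zlib.crc32(key) % num_shards


# Every message addressed to the same instance lands on the same shard.
first = shard_for("driver", "d-42", num_shards=2)
second = shard_for("driver", "d-42", num_shards=2)
```

Because the mapping depends only on the address, all state for one virtual instance can live on a single shard.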

Slide 35

Slide 35 text

Running Stateful Functions on Apache Flink
Exactly-once checkpointing for streaming loops
(Diagram: Function Dispatcher and Feedback Operator, connected by a feedback loop)

Slide 36

Slide 36 text

Example: Ride Sharing App
• Event streams: Driver status updates, Passenger ride requests, Ride status update
• Functions: Driver, Ride, Passenger, Geo-index
• Messages between functions: update, create bill, inform / book, bid, lookup, update cell
• States shown: seeking, confirmed, riding; free, bidding, booked
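
One way to read the driver side of this example is as a small state machine. The Python sketch below assumes that the states free, bidding, and booked belong to the Driver function. The event names (`ride_offered`, `bid_accepted`, `bid_rejected`, `ride_finished`) and the transition table are hypothetical and not taken from the slides.

```python
# Allowed transitions: (current status, event) -> next status.
# Event names are assumptions made for this sketch.
DRIVER_TRANSITIONS = {
    ("free", "ride_offered"): "bidding",
    ("bidding", "bid_accepted"): "booked",
    ("bidding", "bid_rejected"): "free",
    ("booked", "ride_finished"): "free",
}


class Driver:
    """Toy driver function whose embedded state is a single status field."""

    def __init__(self, driver_id):
        self.driver_id = driver_id
        self.status = "free"

    def on_event(self, event):
        next_status = DRIVER_TRANSITIONS.get((self.status, event))
        if next_status is None:
            raise ValueError(f"{event!r} not allowed in state {self.status!r}")
        self.status = next_status
        return self.status


driver = Driver("d-1")
driver.on_event("ride_offered")  # free -> bidding
driver.on_event("bid_accepted")  # bidding -> booked
```

Keeping the transitions in a table makes illegal sequences, such as accepting a bid while the driver is free, fail loudly instead of corrupting state.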

Slide 37

Slide 37 text

• Stream Processing / Streaming SQL (event/stream-centric): data preparation; combining knowledge/information; filtering, enriching, aggregating, joining events
• Stateful Functions (state-centric): coordination, (interacting) state machines; complex event/state interactions
• F-a-a-S (stateless / compute-centric): “occasional” actions or spiky loads; compute-intensive or blocking

Slide 38

Slide 38 text

Putting it all together: Ridesharing again
• FaaS: render map/route image, create a receipt PDF, send email
• Stateful Functions: ride life-cycle, driver-to-ride matching
• Stream Processing: traffic models, demand forecast & pricing
• Streams in the diagram: Billing, Passenger updates, Driver position updates, Driver status updates

Slide 39

Slide 39 text

Is Stateful Functions part of Apache Flink?
Not yet, but we would like it to be!
• Fully Open Source on Ververica’s GitHub under ASL 2
• Propose contribution for Apache Flink (Flink Improvement Proposal)
• Community discussion about project proposal
• Upon acceptance, handover to the Flink project

Slide 40

Slide 40 text

The Megastars behind the Stateful Functions Project
Daryl, Robert, Ufuk, Konstantin, Holger, Olivia, Markos, Enrico, Charles, Jamie G., Thomas, Greg, Jamie C., Ricky, …
And a big “Thank you!” to everyone who helped and tried it out!

Slide 41

Slide 41 text

Learn more at
• Technical deep-dive session
• https://statefun.io/
• https://github.com/ververica/stateful-functions/
• https://ververica.com/blog/

Slide 42

Slide 42 text

Enjoy the conference!