
News From Flink's Engine Room: "Full Steam Ahead"

With every release, Flink gets closer to becoming a truly unified batch and stream processor. This unification is not only a big effort for the API layer; it also poses great challenges for the runtime, which has to handle streaming and batch workloads alike, and do so efficiently. Over the past two years, the community has therefore spent a lot of effort on preparing Flink’s engine for these challenges. At the same time, the community started addressing one of Flink’s biggest operational limitations: making it fully resource elastic.

In this talk I want to explain the big changes that happened in Flink’s runtime, how they help our users, and in which direction the runtime will evolve from here. We will start by recapping how the community improved Flink’s batch capabilities and made the system more extensible. Next, we will take a look at Flink’s new unified pipelined region scheduler and see how it improves resource utilization. Last but not least, I want to explain how the reactive execution mode will enable auto-scaling and make Flink fully resource elastic.

Till Rohrmann

October 22, 2020

Transcript

  1. © 2020 Ververica News from Flink’s engine room: “Full steam

    ahead” Till Rohrmann @stsffap
  2. © 2020 Ververica Scheduling And Failover

  3. © 2020 Ververica Recap: Batch & Streaming Unification One engine

    to rule them all • Batch is just a bounded stream! 3
  4. © 2020 Ververica Unbounded Stream Processing Processing Data as It

    Arrives 4 older more recent Watermarks Sources
  5. © 2020 Ververica Bounded Stream Processing Having All Data Available

    5 older more recent Watermarks Sources
  6. © 2020 Ververica How to Process Bounded Streams Fast? •

    All data is available at start time ─ Massively parallel out-of-order ingestion ─ Latency not very important → efficient batching of records ─ Optimized operators ─ Results are ready at the end → no watermarks, no incremental results ─ Job can be executed in stages Boundedness Allows Different Execution Strategies 6
  7. © 2020 Ververica Recap: Faster Failover for Bounded Streams •

    Avoid redundant work due to failovers • Separate the topology into pipelined regions • Store the results produced by each pipelined region • Resume the computation from the latest available result FLIP-1: Fine Grained Recovery 7 Src Map Sink Operator Result
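To make the pipelined-region idea concrete, here is a minimal Python sketch (not Flink's actual implementation) of how a topology can be split into pipelined regions: tasks joined by pipelined edges must run together, so the graph is partitioned by cutting every blocking edge, using a small union-find.

```python
# Sketch (not Flink's code): compute pipelined regions of a job graph.
# Tasks connected by pipelined edges must run at the same time;
# blocking edges cut the graph into independently schedulable regions.

def pipelined_regions(tasks, edges):
    """edges: list of (src, dst, kind) with kind 'pipelined' or 'blocking'."""
    parent = {t: t for t in tasks}

    def find(t):
        # Union-find root lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    # Merge the endpoints of every pipelined edge into one region.
    for src, dst, kind in edges:
        if kind == "pipelined":
            parent[find(src)] = find(dst)

    regions = {}
    for t in tasks:
        regions.setdefault(find(t), set()).add(t)
    return sorted(map(sorted, regions.values()))

# Src -> Map is pipelined, Map -> Sink is blocking (as on the slide):
print(pipelined_regions(
    ["Src", "Map", "Sink"],
    [("Src", "Map", "pipelined"), ("Map", "Sink", "blocking")]))
# -> [['Map', 'Src'], ['Sink']]
```

With the blocking edge in between, `Src` and `Map` form one region whose result can be stored, and `Sink` forms a second region that can be resumed from that result after a failover.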
  8. © 2020 Ververica What’s The Point? • TPC-H Query 3

    • Exponential failure rate Benefits of FLIP-1 8
    SELECT l_orderkey, SUM(l_extendedprice*(1-l_discount)) AS revenue,
           o_orderdate, o_shippriority
    FROM customer, orders, lineitem
    WHERE c_mktsegment = '[SEGMENT]'
      AND c_custkey = o_custkey
      AND l_orderkey = o_orderkey
      AND o_orderdate < date '[DATE]'
      AND l_shipdate > date '[DATE]'
    GROUP BY l_orderkey, o_orderdate, o_shippriority;
  9. © 2020 Ververica How to Benefit From FLIP-1? • FLIP-1

    introduced with Flink 1.9 • FLIP-1 is used when using the Blink Table Planner • DataSet jobs use pipelined mode by default → FLIP-1 won’t have any effect unless the ExecutionMode is changed ─ ExecutionConfig.setExecutionMode(ExecutionMode.BATCH) ─ ExecutionConfig.setExecutionMode(ExecutionMode.BATCH_FORCED) It is not always on! 9
  10. © 2020 Ververica Problems When Scheduling Bounded Streams • Lazy-from-sources

    scheduling strategy ─ Task centric view ─ Schedule tasks as soon as inputs are ready Flink’s Old Scheduler 10 SELECT customerId, name FROM customers, orders WHERE customerId = orderCustomerId Csts Ords Join Blocking Pipelined Tasks to schedule: Csts, Ords, Join #Available slots: 1 Scheduling Order 1: 1. Ords 2. ? Scheduling Order 2: 1. Csts 2. Ords 3. Join
  11. © 2020 Ververica Pipelined Regions Scheduler • The scheduling unit is

    the pipelined region (all tasks that need to run at the same time) • Schedule a pipelined region as soon as all of its inputs are ready Pipelined Region Centric View 11 Csts Ords Join Blocking Pipelined Pipelined region Pipelined region Scheduling order: 1. Csts 2. Ords + Join
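The scheduling order on the slide can be sketched in a few lines of Python (an illustrative simulation, not Flink's scheduler): a region becomes schedulable once every region feeding it over a blocking edge has finished, so `Csts` runs first and `Ords` + `Join` run together afterwards.

```python
# Sketch: derive the scheduling order of pipelined regions.
# A region is ready once all regions producing its blocking inputs
# have finished (each "round" below is one scheduling step).

def region_schedule(regions, blocking_edges):
    """regions: list of task sets; blocking_edges: (producer, consumer) tasks."""
    of = {t: i for i, r in enumerate(regions) for t in r}
    deps = {i: set() for i in range(len(regions))}
    for src, dst in blocking_edges:
        if of[src] != of[dst]:
            deps[of[dst]].add(of[src])

    done, order = set(), []
    while len(done) < len(regions):
        ready = [i for i in range(len(regions))
                 if i not in done and deps[i] <= done]
        order.append([sorted(regions[i]) for i in ready])
        done.update(ready)
    return order

# Csts feeds Join over a blocking edge; Ords and Join are pipelined:
print(region_schedule([{"Csts"}, {"Ords", "Join"}], [("Csts", "Join")]))
# -> [[['Csts']], [['Join', 'Ords']]]
```

This is also why one slot suffices: only the region that can actually make progress occupies resources at any point in time.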
  12. © 2020 Ververica Benefits of Pipelined Region Scheduler • Reliable

    scheduling of bounded jobs under constrained resources ─ Guarantees to make progress as long as the largest pipelined region can be run ─ No more deadlocks due to bad scheduling decisions • Better resource utilization ─ Only schedule tasks which can actually make progress 12
  13. © 2020 Ververica Unified Batch & Streaming Scheduling & Failover

    • Pipelined regions are units for scheduling & failover • Generalizes well to streaming/unbounded workloads → Just a single large pipelined region which produces infinite results ─ If single pipelined region: Pipelined region scheduling == “All at once” scheduling strategy Putting the Pieces Together 13 Pipelined region
  14. © 2020 Ververica Elastic Streaming Pipelines

  15. © 2020 Ververica Changing Workloads Change is The Only Constant

    15
  16. © 2020 Ververica Elastic Streaming Pipelines Adjust to The Actual

    Workload 16
  17. © 2020 Ververica Deployment Modes Flink is Not Always in

    Charge 17 Active deployments: • Yarn, Mesos, Kubernetes • Flink can ask for more resources Oblivious deployments: • Standalone, Containerized • Resources are assigned by a third party
  18. © 2020 Ververica Reactive Execution Mode Reacting to Available Resources

    18 Job Master Resource Manager Need ∞ resources TaskExecutor Register( ) Assign( ) TaskExecutor Register( )
  19. © 2020 Ververica How Can Flink Declare ∞ Resources? Old

    slot allocation protocol • Every task asks for its slot individually • Fails if we cannot obtain all slots ⇒ Won’t work if we want to react to available resources Declarative slot allocation protocol • Declare the amount of required resources • ResourceManager tries to fulfill the declared resources as well as possible • Reactive mode declares ∞ resource requirements → all slots go to the JobMaster as soon as they arrive • FLIP-138: Declarative Resource Management A New Slot Allocation Protocol 19
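The declarative protocol can be illustrated with a small simulation (class and method names here are illustrative, not Flink's internal API): the job declares its requirement once, and the ResourceManager forwards arriving slots for as long as the requirement is not yet met. Declaring an infinite requirement, as reactive mode does, means every slot that ever registers goes to the job.

```python
import math

# Sketch of FLIP-138-style declarative slot allocation:
# the requirement is declared up front, and arriving TaskExecutor
# slots are assigned until the declaration is fulfilled.

class ResourceManager:
    def __init__(self):
        self.required = 0
        self.assigned = []

    def declare(self, slots):
        # One declaration replaces per-task slot requests.
        self.required = slots

    def register_slot(self, slot):
        # A TaskExecutor slot arrives; hand it over if still needed.
        if len(self.assigned) < self.required:
            self.assigned.append(slot)

rm = ResourceManager()
rm.declare(math.inf)  # reactive mode: "need infinite resources"
for slot in ["slot-1", "slot-2", "slot-3"]:
    rm.register_slot(slot)
print(rm.assigned)  # all three slots reach the job
```

In contrast, the old protocol would have failed the job outright if any individual slot request could not be served.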
  20. © 2020 Ververica How to Make Use of Changing Resources?

    Old scheduler 1. Pre-determine the parallelism 2. Ask for slots 3. Execute the job Declarative scheduler 1. Declare required resources 2. Wait for resources to arrive 3. Decide on the parallelism based on available resources ⇒ Invert resource declaration and deciding on parallelism 4. Adjust parallelism if more resources arrive The Declarative Scheduler 20
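The inversion described above can be sketched as a single function (illustrative only, not Flink's API), using the numbers from the example on the following slides: the job declares 4 required slots for a target parallelism of 2, so each parallel pipeline needs 2 slots, and the actual parallelism is decided only after seeing how many slots arrived.

```python
# Sketch: the declarative scheduler decides the parallelism from the
# slots that actually arrived, instead of pre-determining it.

def decide_parallelism(required_slots, target_parallelism, available_slots):
    # Slots needed per parallel pipeline (2 in the slides' example:
    # 4 required slots for a target parallelism of 2).
    per_pipeline = required_slots // target_parallelism
    return min(target_parallelism, available_slots // per_pipeline)

print(decide_parallelism(4, 2, 0))  # no slots yet -> parallelism 0
print(decide_parallelism(4, 2, 2))  # two slots    -> parallelism 1
print(decide_parallelism(4, 2, 4))  # all four     -> parallelism 2
```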
  21. © 2020 Ververica The Declarative Scheduler A Small Example 21

    JobGraph Resources ExecutionGraph ∅ Required Available Used 4 0 0 Parallelism: 0
  22. © 2020 Ververica The Declarative Scheduler A Small Example 22

    JobGraph ExecutionGraph Resources Required Available Used 4 2 2 Parallelism: 1
  23. © 2020 Ververica The Declarative Scheduler A Small Example 23

    JobGraph ExecutionGraph Resources Required Available Used 4 4 2 ⇒ Take checkpoint and trigger job restart How to make use of the new resources? Parallelism: 1
  24. © 2020 Ververica The Declarative Scheduler A Small Example 24

    JobGraph ExecutionGraph Resources Required Available Used 4 4 4 Parallelism: 2
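The rescaling step in the example above can be summarized as a tiny sketch (names are illustrative): when more slots become available than the running parallelism uses, the scheduler takes a checkpoint to preserve state and restarts the job at the higher parallelism.

```python
# Sketch of the rescaling decision from the example: going from
# parallelism 1 to 2 once the two additional slots have arrived.

def maybe_rescale(current_parallelism, new_parallelism, events):
    if new_parallelism > current_parallelism:
        events.append("checkpoint")                    # preserve state
        events.append(f"restart@p={new_parallelism}")  # redeploy tasks
        return new_parallelism
    return current_parallelism

events = []
p = maybe_rescale(1, 2, events)  # two more slots arrived: 1 -> 2
print(p, events)
```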
  25. © 2020 Ververica Outlook Autoscaling • User defined RescalingPolicies set

    target value ─ target: Ideal parallelism to run the job with • Periodically querying the RescalingPolicies for target values • Declare target resource requirements • Rely on declarative scheduler to rescale job when new resources arrive Enabling Flink to Scale an Application 25 t = 1 t = 1 ResourceManager
  26. © 2020 Ververica Outlook Autoscaling • User defined RescalingPolicies set

    target value ─ target: Ideal parallelism to run the job with • Periodically querying the RescalingPolicies for target values • Declare target resource requirements • Rely on declarative scheduler to rescale job when new resources arrive Enabling Flink to Scale an Application 26 t = 2 t = 1 ResourceManager #Target Slots: 3
  27. © 2020 Ververica Outlook Autoscaling • User defined RescalingPolicies set

    target value ─ target: Ideal parallelism to run the job with • Periodically querying the RescalingPolicies for target values • Declare target resource requirements • Rely on declarative scheduler to rescale job when new resources arrive Enabling Flink to Scale an Application 27 t = 2 t = 1 ResourceManager Allocate 3rd slot
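The autoscaling loop sketched on these slides reduces to a simple cycle (the `RescalingPolicy` interface is an outlook and the shape below is an assumption): poll the policy for its target parallelism, translate it into a slot declaration, and let the declarative scheduler do the actual rescaling when those slots arrive.

```python
# Sketch of the autoscaling outlook: each polling step turns the
# policy's target parallelism into a declared slot requirement.

def autoscale_step(policy_target, slots_per_pipeline, declare):
    declare(policy_target * slots_per_pipeline)

declared = []
# t=1: the policy's target is 2; t=2: load grew, the target is 3.
for target in (2, 3):
    autoscale_step(target, 1, declared.append)
print(declared)  # slot declarations over time
```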
  28. © 2020 Ververica User Benefits • Better resource utilization under

    changing workloads (no more under/over-provisioning) • Easier operations ─ Resources can be added on the fly ─ Flink can better tolerate resource loss • Easier deployments ─ Application style deployments w/o running a cluster 28
  29. © 2020 Ververica Conclusion • Unified scheduling and failover for

    batch & streaming • Flink schedules and fails over batch jobs now more efficiently • Flink will soon support fully elastic streaming pipelines ─ Being able to better handle changing workloads • Reactive mode will ease operations and deployment significantly What to take home? 29
  30. © 2020 Ververica THANK YOU!

  31. © 2020 Ververica Ververica is hiring! Write me (till@ververica.com) or

    visit https://www.ververica.com/careers
  32. © 2020 Ververica QUESTIONS?