Kubernetes-like Reconciliation Protocol for Managed Flink Services

Sharon Xie, Flink Babysitter
Founding Engineer @ Decodable

Our Journey to Automate Babysitting Flink

Agenda
• The declarative UX for managed Flink services
• Step by Step Implementation
• Q&A

Wishes for Managed Flink Services

Declarative vs. Imperative
Declarative (What)
• I want a chocolate cake to feed 10 people
Imperative (How)
• Drive to store
• Buy eggs, cocoa powder, butter, flour
• Drive home
• Preheat oven
• Mix ingredients
• Place in a baking tray…

Platform for Managed Flink Services

Challenges
• Network is unreliable
• Arbitrary network latency
• Software and hardware can fail
• Flink jobs are sensitive to external changes

Kubernetes Reconciliation Protocol
Step 1: Get Target & Actual State
Step 2: Reconcile
If (Target State != Actual State) { // FIX IT }
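The two steps above can be sketched as a single function. This is a minimal illustration under assumptions, not the implementation from the talk: `reconcile`, `get_target`, `get_actual`, and `fix` are hypothetical names, and plain dictionaries stand in for the DB (target state) and the Flink cluster (actual state).

```python
from typing import Callable, Optional

State = Optional[dict]  # None means "the resource does not exist"


def reconcile(
    get_target: Callable[[], State],
    get_actual: Callable[[], State],
    fix: Callable[[State, State], None],
) -> bool:
    """Run one reconciliation pass. Returns True if a fix was attempted."""
    # Step 1: get target and actual state.
    target = get_target()
    actual = get_actual()
    # Step 2: reconcile only when they differ.
    if target != actual:
        fix(target, actual)
        return True
    return False


# In-memory stand-ins (hypothetical) for the target and actual state.
target_store = {"parallelism": 4}
actual_store = {"parallelism": 2}


def apply_fix(target: State, actual: State) -> None:
    # "FIX IT": here we simply overwrite the actual state with the target.
    actual_store.clear()
    actual_store.update(target or {})


first_pass = reconcile(lambda: dict(target_store), lambda: dict(actual_store), apply_fix)
second_pass = reconcile(lambda: dict(target_store), lambda: dict(actual_store), apply_fix)
```

The first pass finds a difference and fixes it; the second pass finds nothing to do, which is the property that makes it safe to run the loop repeatedly.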

V0: Get target and actual state

V0 Result
✅ Basic declarative API
✅ Single Source of Truth Store
  ◦ Can always recreate the service based off the DB
❌ Can’t update Flink jobs
❌ No reconciliation

V1: Update the Flink cluster when the pipeline is updated

Flink Controller
• Debezium
  ◦ Gets notified when the target state changes
• FlinkDeployer
  ◦ Takes actions based on the job specification
• StateWatch
  ◦ Implements K8S Watch API
  ◦ Listens to Flink state changes and updates the DB with the actual state

Flink Controller
• Stateless
  ◦ Debezium does a full table scan when the service starts
• Idempotent
  ◦ FlinkDeployer can issue the same commands (create/delete a cluster) without changing the result
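The idempotency property can be illustrated with a small sketch. The class names `FakeFlink` and `IdempotentDeployer` are hypothetical, and an in-memory dictionary stands in for real cluster management. The point is only that replaying a create or delete command, as would happen after a full table scan on restart, leaves the result unchanged.

```python
class FakeFlink:
    """In-memory stand-in (hypothetical) for the system hosting Flink clusters."""

    def __init__(self) -> None:
        self.clusters = {}  # cluster name -> spec
        self.create_calls = 0  # counts real creations, not replayed commands


class IdempotentDeployer:
    def __init__(self, flink: FakeFlink) -> None:
        self.flink = flink

    def create(self, name: str, spec: dict) -> None:
        # Creating a cluster that already exists with the same spec is a no-op.
        if self.flink.clusters.get(name) == spec:
            return
        self.flink.clusters[name] = dict(spec)
        self.flink.create_calls += 1

    def delete(self, name: str) -> None:
        # Deleting a cluster that is already gone is not an error.
        self.flink.clusters.pop(name, None)


flink = FakeFlink()
deployer = IdempotentDeployer(flink)
deployer.create("pipeline-1", {"parallelism": 2})
deployer.create("pipeline-1", {"parallelism": 2})  # replayed command, same result
clusters_after_create = dict(flink.clusters)
deployer.delete("pipeline-1")
deployer.delete("pipeline-1")  # replayed delete, still no error
```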

V1 Result
✅ Flink clusters match the target states if there are no errors
❌ Lacks error handling
  ◦ Flink job creation/deletion can fail
❌ StateWatch can miss events
  ◦ When a Flink cluster is deleted during the service restart/downtime

V2: Reconcile if actual state doesn’t match target

V2 Result
✅ Reconciler runs a scheduled task for eventual consistency
  ◦ Any transient network issues can be recovered
  ◦ Missing StateWatch events can be reconciled
❌ No auto healing for Flink runtime issues
❌ No auto scaling for workload changes
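A scheduled reconciler of this kind can be sketched as a periodic sweep that diffs every target spec against what is actually running, so a deletion that StateWatch missed still gets noticed on the next pass. The function names `plan_sweep` and `run_scheduled` and the shape of the specs are assumptions made for illustration.

```python
import time
from typing import Callable


def plan_sweep(target: dict, actual: dict) -> dict:
    """Diff every pipeline's target spec against what is actually running."""
    return {
        "create": sorted(name for name in target if name not in actual),
        "update": sorted(
            name for name in target if name in actual and actual[name] != target[name]
        ),
        "delete": sorted(name for name in actual if name not in target),
    }


def run_scheduled(
    sweep: Callable[[], None],
    interval_seconds: float,
    passes: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run the sweep a fixed number of times, pausing between passes."""
    for i in range(passes):
        sweep()
        if i < passes - 1:
            sleep(interval_seconds)


# p1 was deleted while the service was down, p2 drifted, p3 is an orphan.
target = {"p1": {"parallelism": 2}, "p2": {"parallelism": 4}}
actual = {"p2": {"parallelism": 1}, "p3": {"parallelism": 1}}
plan = plan_sweep(target, actual)

plans = []
run_scheduled(lambda: plans.append(plan_sweep(target, actual)), 30.0, 3, sleep=lambda s: None)
```

Because each pass recomputes the full diff, a pass that fails because of a transient network issue is simply retried by the next one.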

V3: Auto Healing and Scaling

Auto Controller
• Stop jobs that are unrecoverable
  ◦ E.g., external system issues
• Rules engine to auto fix issues
  ◦ When Flink RPC times out, increase akka.ask.timeout
• Scale up/down based on the metrics
  ◦ E.g., lag is going up for an extended period of time, scale up with a larger machine
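The three behaviours above can be sketched as a tiny rules engine that maps an observed job status to an updated target spec. Everything here is an assumption made for illustration: the status fields (`error`, `lag_rising_minutes`), the 30 minute threshold, the doubling of the timeout, the machine sizes, and the `"<seconds> s"` format of the timeout value. The slides only say that the timeout is increased and that a larger machine is used.

```python
from dataclasses import dataclass
from typing import Callable, List

SIZES = ["small", "medium", "large"]  # hypothetical machine sizes


@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # inspects the observed job status
    action: Callable[[dict], dict]   # returns an updated target spec


def stop_job(spec: dict) -> dict:
    return {**spec, "desired_state": "stopped"}


def double_ask_timeout(spec: dict) -> dict:
    # Assumes the value is written as "<seconds> s", e.g. "10 s".
    seconds = int(spec["akka.ask.timeout"].split()[0])
    return {**spec, "akka.ask.timeout": f"{seconds * 2} s"}


def scale_up(spec: dict) -> dict:
    # Move to the next larger size, staying at the largest if already there.
    index = SIZES.index(spec["machine_size"])
    return {**spec, "machine_size": SIZES[min(index + 1, len(SIZES) - 1)]}


RULES: List[Rule] = [
    Rule("stop-unrecoverable", lambda s: s.get("error") == "external_system", stop_job),
    Rule("rpc-timeout", lambda s: s.get("error") == "rpc_timeout", double_ask_timeout),
    Rule("sustained-lag", lambda s: s.get("lag_rising_minutes", 0) >= 30, scale_up),
]


def apply_rules(status: dict, spec: dict, rules: List[Rule] = RULES) -> dict:
    """Return the spec produced by the first matching rule, or the spec unchanged."""
    for rule in rules:
        if rule.matches(status):
            return rule.action(spec)
    return spec


spec = {"desired_state": "running", "akka.ask.timeout": "10 s", "machine_size": "small"}
after_timeout = apply_rules({"error": "rpc_timeout"}, spec)
after_lag = apply_rules({"lag_rising_minutes": 45}, spec)
after_outage = apply_rules({"error": "external_system"}, spec)
healthy = apply_rules({}, spec)
```

In this sketch the rules only produce a new target spec; applying it to the cluster is left to the same reconciliation loop that handles user-initiated updates.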

Event Order Challenges

Solution
• Version: monotonically increasing with every successful update from the API
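The slide does not spell out how the version is consumed. One plausible reading, sketched below under that assumption, is that each consumer remembers the highest version it has applied per pipeline and ignores any event carrying an older one. `VersionGate` and `should_apply` are hypothetical names, and versions are assumed to start at 1.

```python
class VersionGate:
    """Remembers the highest version applied per pipeline and rejects older ones."""

    def __init__(self) -> None:
        self.applied = {}  # pipeline id -> highest version applied so far

    def should_apply(self, pipeline_id: str, version: int) -> bool:
        latest = self.applied.get(pipeline_id, 0)
        if version < latest:
            return False  # stale event that arrived out of order
        # Equal versions pass: re-applying the same spec is harmless
        # as long as the deployer is idempotent.
        self.applied[pipeline_id] = version
        return True


gate = VersionGate()
# Events for one pipeline arriving out of order: 1, 3, 2, 3.
decisions = [gate.should_apply("pipeline-1", v) for v in (1, 3, 2, 3)]
```

Letting duplicates through is a design choice in this sketch; a stricter gate could reject `version <= latest` if replays are undesirable.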

✅ Fully managed Flink service with continuous reconciliation

One More Thing…

BYOC - Bring Your Own Cloud
• Control plane must be able to authenticate the data planes
• Network communication should be encrypted

V4: Support BYOC

BYOC
• Bidirectional gRPC channel over mTLS
  ◦ 🔒 Encryption
  ◦ Authentication
• DB access lives in the control plane
• Data plane continues processing data in the case of a prolonged network partition
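The "mutual" part of mTLS means the side accepting the connection also demands a certificate from its peer. The sketch below shows only that requirement, using Python's standard `ssl` module rather than gRPC's own credentials API. It assumes the control plane is the side accepting connections, which the slides do not state, and `build_control_plane_context` and its file path parameters are hypothetical.

```python
import ssl
from typing import Optional


def build_control_plane_context(
    ca_file: Optional[str] = None,
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
) -> ssl.SSLContext:
    # Server-side context: assumes the control plane accepts connections.
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # Mutual TLS: reject any peer that does not present a valid certificate.
    context.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # CA that signs the data plane certificates.
        context.load_verify_locations(cafile=ca_file)
    if cert_file:
        # The control plane's own certificate and private key.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


context = build_control_plane_context()
```

With real certificate files supplied, the same context gives both properties listed above: the channel is encrypted, and a data plane without a certificate signed by the expected CA fails the handshake.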

Other Benefits
• Resource Efficiency
  ◦ A control plane can manage multiple data planes
• Can relocate data planes to different k8s clusters for
  ◦ Disaster recovery
  ◦ Better resource utilization

Summary
• Users ❤ Declarative APIs
• Continuous reconciliation makes distributed error handling easier with eventual consistency
• Control / Data plane separation enables a more flexible architecture

Q&A
@sharon_rxie
