
Kubernetes-like Reconciliation Protocol for Managed Flink Services

Sharon Xie
November 13, 2023


This talk was given at Flink Forward 2023.

Want your Flink jobs to keep running without failures? Inspired by the robustness of Kubernetes, we created a managed Flink service that brings a similar experience. Users specify the desired Flink job states, and our platform ensures the jobs remain in those states. We embraced Kubernetes-style reconciliation loops: constant monitoring, comparison of actual and desired states, and proactive actions to resolve any issues.
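
The reconciliation loop described above can be sketched as a single "compare and fix" pass. This is a hypothetical Python illustration, not the platform's code: `JobSpec`, `reconcile_once`, and the two plain dictionaries standing in for the database (target state) and the running cluster (actual state) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical desired (or observed) state of one Flink job."""

    image: str
    parallelism: int


def reconcile_once(target: Dict[str, JobSpec], actual: Dict[str, JobSpec]) -> List[str]:
    """One pass of a reconcile loop: make `actual` match `target`.

    `target` stands in for the database (the user's desired state) and
    `actual` for what is really running; both are plain dicts here.
    Returns the actions taken, so an already-converged pass returns [].
    """
    actions: List[str] = []
    for job_id in sorted(set(target) | set(actual)):
        want: Optional[JobSpec] = target.get(job_id)
        have: Optional[JobSpec] = actual.get(job_id)
        if want == have:
            continue  # already converged: nothing to fix
        if want is None:
            del actual[job_id]  # running but no longer desired
            actions.append(f"delete {job_id}")
        elif have is None:
            actual[job_id] = want  # desired but not running
            actions.append(f"create {job_id}")
        else:
            # Spec drifted; a real Flink upgrade typically stops with a savepoint and redeploys.
            actual[job_id] = want
            actions.append(f"update {job_id}")
    return actions


target = {"orders": JobSpec("flink:1.17", 4), "clicks": JobSpec("flink:1.17", 2)}
actual = {"orders": JobSpec("flink:1.17", 2), "stale": JobSpec("flink:1.16", 1)}
print(reconcile_once(target, actual))  # ['create clicks', 'update orders', 'delete stale']
print(reconcile_once(target, actual))  # []
```

Running the same pass twice is harmless: once actual matches target, the loop finds nothing to fix, which is what makes it safe to repeat continuously.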

We've diverged from the conventional Kubernetes operator approach. Our implementation enables a single control plane to manage multiple data planes and allows relocating Flink jobs to different Kubernetes clusters for better cluster utilization and for disaster recovery. With Debezium integration at its core, our reconciliation protocol guarantees efficiency and scalability.
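
A single control plane driving several data planes from database change events might look roughly like this. It is a hypothetical sketch, not the platform's code: `DataPlane`, `ControlPlane`, `on_change`, and the `id`/`cluster`/`spec` row columns are invented, and in-memory objects stand in for the change-event stream, the database, and the Kubernetes clusters. The event dictionaries loosely follow Debezium's change-event envelope, which carries a row's `before` and `after` images alongside an `op` code (`c`, `u`, `d`, or `r` for snapshot reads).

```python
from typing import Dict, List, Optional


class DataPlane:
    """Stand-in for one Kubernetes cluster running Flink jobs (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.jobs: Dict[str, dict] = {}

    def apply(self, job_id: str, spec: dict) -> None:
        self.jobs[job_id] = spec  # idempotent: re-applying the same spec changes nothing

    def delete(self, job_id: str) -> None:
        self.jobs.pop(job_id, None)  # idempotent: deleting a missing job is a no-op


class ControlPlane:
    """Routes Debezium-style change events to the data plane named in each row."""

    def __init__(self, planes: List[DataPlane]) -> None:
        self.planes = {p.name: p for p in planes}

    def on_change(self, event: dict) -> None:
        before: Optional[dict] = event.get("before")
        after: Optional[dict] = event.get("after")
        # Row deleted ("d") or moved to another cluster: remove the job from the old plane.
        if before is not None and (after is None or before["cluster"] != after["cluster"]):
            self.planes[before["cluster"]].delete(before["id"])
        # Creates ("c"), updates ("u") and snapshot reads ("r") carry an "after" image.
        if after is not None:
            self.planes[after["cluster"]].apply(after["id"], after["spec"])


east, west = DataPlane("east"), DataPlane("west")
control = ControlPlane([east, west])
row = {"id": "orders", "cluster": "east", "spec": {"parallelism": 4}}
control.on_change({"op": "c", "before": None, "after": row})
control.on_change({"op": "u", "before": row, "after": {**row, "cluster": "west"}})
print(east.jobs, west.jobs)  # {} {'orders': {'parallelism': 4}}
```

Relocating a job is just an update to its `cluster` column. Because `apply` and `delete` are idempotent, replaying an event, or re-reading every row when the service starts, does no harm.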

In this talk, you will learn how we designed and implemented such a reconciliation protocol, including various reconciliation methods tailored to the unique demands of Flink.

Transcript

  1. Declarative vs. Imperative
    Declarative (What)
    • I want a chocolate cake to feed 10 people
    Imperative (How)
    • Drive to store
    • Buy eggs, cocoa powder, butter, flour
    • Drive home
    • Preheat oven
    • Mix ingredients
    • Place in a baking tray…
  2. Challenges
    • Network is unreliable
    • Arbitrary network latency
    • Software and hardware can fail
    • Flink jobs are sensitive to external changes
  3. Kubernetes Reconciliation Protocol
    Step 1: Get target & actual state
    Step 2: Reconcile
      If (Target State != Actual State) { // FIX IT }
  4. V0 Result
    ✅ Basic declarative API
    ✅ Single source-of-truth store
      ◦ Can always recreate the service based on the DB
    ❌ Can't update Flink jobs
    ❌ No reconciliation
  5. Flink Controller
    • Debezium
      ◦ Gets notified when the target state changes
    • FlinkDeployer
      ◦ Takes actions based on the job specification
    • StateWatch
      ◦ Implements the K8s Watch API
      ◦ Listens to Flink state changes and updates the DB with the actual state
  6. Flink Controller
    • Stateless
      ◦ Debezium does a full table scan when the service starts
    • Idempotent
      ◦ FlinkDeployer can issue the same commands (create/delete a cluster) repeatedly without changing the result
  7. V1 Result
    ✅ Flink clusters match the target states if there are no errors
    ❌ Lacks error handling
      ◦ Flink job creation/deletion can fail
    ❌ StateWatch can miss events
      ◦ E.g., when a Flink cluster is deleted during a service restart/downtime
  8. V2 Result
    ✅ Reconciler runs a scheduled task for eventual consistency
      ◦ Any transient network issues can be recovered
      ◦ Missing StateWatch events can be reconciled
    ❌ No auto-healing for Flink runtime issues
    ❌ No autoscaling for workload changes
  9. Auto Controller
    • Stop jobs that are unrecoverable
      ◦ E.g., external system issues
    • Rules engine to auto-fix issues
      ◦ When Flink RPC times out, increase akka.ask.timeout
    • Scale up/down based on metrics
      ◦ E.g., if lag keeps going up for an extended period of time, scale up to a larger machine
  10. BYOC (Bring Your Own Cloud)
    • The control plane must be able to authenticate the data planes
    • Network communication should be encrypted
  11. BYOC
    • Bidirectional gRPC channel over mTLS
      ◦ 🔒 Encryption
      ◦ Authentication
    • DB access lives in the control plane
    • Data plane continues processing data in the case of a prolonged network partition
  12. Other Benefits
    • Resource efficiency
      ◦ A control plane can manage multiple data planes
    • Can relocate data planes to different k8s clusters for
      ◦ Disaster recovery
      ◦ Better resource utilization
  13. Summary
    • Users ❤ declarative APIs
    • Continuous reconciliation makes distributed error handling easier with eventual consistency
    • Control/data plane separation enables a more flexible architecture
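
The scheduled reconciler from slide 8 can be sketched as a periodic full resync. In this hypothetical Python version, `FlakyCluster`, `Reconciler`, `tick`, and `run_forever` are invented names, and the cluster object simulates a data plane whose create calls fail with transient network errors. The design choice worth noting is that each pass lists the jobs actually running rather than trusting StateWatch events, so a missed event or a failed call is corrected on a later pass.

```python
import threading
from typing import Callable, Set


class FlakyCluster:
    """Hypothetical data plane whose first few create calls fail (e.g. network errors)."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.running: Set[str] = set()

    def list_jobs(self) -> Set[str]:
        return set(self.running)

    def create(self, job_id: str) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("transient network error")
        self.running.add(job_id)

    def delete(self, job_id: str) -> None:
        self.running.discard(job_id)


class Reconciler:
    """Periodic resync that reads actual state from the cluster on every pass."""

    def __init__(self, desired: Callable[[], Set[str]], cluster: FlakyCluster) -> None:
        self.desired = desired  # e.g. a query against the target-state table
        self.cluster = cluster

    def tick(self) -> int:
        """Run one pass; return how many jobs still differ from the target afterwards."""
        want, have = self.desired(), self.cluster.list_jobs()
        for job_id in want - have:
            try:
                self.cluster.create(job_id)
            except ConnectionError:
                pass  # transient failure: the next tick retries it
        for job_id in have - want:
            self.cluster.delete(job_id)
        return len(want ^ self.cluster.list_jobs())

    def run_forever(self, interval_s: float, stop: threading.Event) -> None:
        # Event.wait works as a sleep that returns early once stop is set.
        while not stop.is_set():
            self.tick()
            stop.wait(interval_s)


cluster = FlakyCluster(failures=2)
cluster.running.add("orphan")  # e.g. a job whose deletion failed earlier
reconciler = Reconciler(lambda: {"orders"}, cluster)
print([reconciler.tick() for _ in range(3)])  # [1, 1, 0]
```

The system converges after the transient failures stop, which is the eventual-consistency property the slide relies on; no individual call has to succeed the first time.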
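
The rules engine on slide 9 might look roughly like the following sketch. Only the two rule ideas come from the slide; everything else is an assumption for illustration: the `JobView` shape, detecting RPC timeouts by looking for `TimeoutException` in the last exception text, doubling `akka.ask.timeout` up to an arbitrary 120 s cap, the five-sample "lag keeps growing" test, and the `small`/`medium`/`large` machine ladder.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

MACHINE_SIZES = ["small", "medium", "large"]  # hypothetical size ladder


@dataclass
class JobView:
    """Hypothetical snapshot of one job: Flink config, machine size and recent signals."""

    config: Dict[str, str]
    machine: str = "small"
    last_exception: str = ""
    lag_samples: List[int] = field(default_factory=list)


def rpc_timed_out(job: JobView) -> bool:
    return "TimeoutException" in job.last_exception  # assumed detection heuristic


def bump_ask_timeout(job: JobView) -> str:
    # Durations are written like "10 s"; double the number, capped at 120 s (arbitrary).
    seconds = int(job.config.get("akka.ask.timeout", "10 s").split()[0])
    job.config["akka.ask.timeout"] = f"{min(seconds * 2, 120)} s"
    return "akka.ask.timeout -> " + job.config["akka.ask.timeout"]


def lag_keeps_growing(job: JobView, window: int = 5) -> bool:
    # "Extended period" modeled as: the last `window` samples strictly increase.
    recent = job.lag_samples[-window:]
    return len(recent) == window and all(a < b for a, b in zip(recent, recent[1:]))


def scale_up(job: JobView) -> str:
    idx = MACHINE_SIZES.index(job.machine)
    job.machine = MACHINE_SIZES[min(idx + 1, len(MACHINE_SIZES) - 1)]
    return "machine -> " + job.machine


RULES: List[Tuple[Callable[[JobView], bool], Callable[[JobView], str]]] = [
    (rpc_timed_out, bump_ask_timeout),
    (lag_keeps_growing, scale_up),
]


def apply_rules(job: JobView) -> List[str]:
    """Fire every rule whose condition matches; the changes become the new target state."""
    return [action(job) for condition, action in RULES if condition(job)]


job = JobView(
    config={"akka.ask.timeout": "10 s"},
    last_exception="java.util.concurrent.TimeoutException",
    lag_samples=[100, 250, 400, 900, 1500],
)
print(apply_rules(job))  # ['akka.ask.timeout -> 20 s', 'machine -> medium']
```

Keeping rules as (condition, action) pairs means the auto controller only edits the target state; the ordinary reconcile loop then carries out the change.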
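
Slide 11 runs the control/data plane channel over gRPC with mTLS. The snippet below does not use gRPC. It uses Python's standard `ssl` module only to show what mutual TLS adds to ordinary TLS: the accepting side sets `verify_mode = ssl.CERT_REQUIRED`, so a peer without a certificate signed by the trusted CA is rejected, while the dialing side both verifies the server and presents its own certificate. The function names and optional file-path parameters are hypothetical, and the talk does not say which plane listens and which dials.

```python
import ssl
from typing import Optional


def make_server_context(
    ca_file: Optional[str] = None,
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
) -> ssl.SSLContext:
    """TLS context for the listening side: encrypts traffic and requires a client cert."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # the "mutual" part: unauthenticated peers are rejected
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # CA that signs the peers' certificates
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def make_client_context(
    ca_file: Optional[str] = None,
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
) -> ssl.SSLContext:
    """TLS context for the dialing side: verifies the server and presents its own cert."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

In the Python gRPC library the same requirement is expressed by passing `require_client_auth=True` to `grpc.ssl_server_credentials`.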