sanposhiho
August 26, 2024

A Tale of Two Plugins: Safely Extending the Kubernetes Scheduler with WebAssembly

KubeDay Japan 2024

https://sched.co/1egpd

Transcript

  1. A Tale of Two Plugins: Safely Extending the Kubernetes Scheduler with WebAssembly

    Kensei Nakada (@sanposhiho)
  2. Hello! こんにちは! 👋

    Kensei Nakada (@sanposhiho)
    • Software Engineer @
    • Kubernetes maintainer (SIG-Scheduling approver, SIG-Autoscaling)
    • Kubernetes contributor award 2022, 2023
  3. Kubernetes Scheduler

    The control plane component that finds the best Node for every Pod to run on.
    Many factors to consider: Image Locality, Taint/Toleration, Resource, Ports, NodeAffinity, PodAffinity/AntiAffinity, etc.
  4. Scheduler Plugins

    Each scheduling factor is implemented as a plugin. The Kubernetes scheduler consists of many plugins:
    • Image Locality Plugin
    • TaintToleration Plugin
    • Resource Fit Plugin
    • NodePorts Plugin
    • NodeAffinity Plugin
    • Inter-Pod Affinity Plugin
    • etc.
  5. Scheduling Framework

    The underlying architecture of the scheduler, which is pluggable and extensible. A plugin works at one or more extension points in the scheduling framework.
    • Filter: filters out Nodes that cannot run the Pod (insufficient resources, no match with NodeAffinity, etc.).
    • Score: scores Nodes to determine the best one (image locality, etc.).
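The two extension points above can be sketched in Go as follows. This is a minimal illustration, not the real Scheduling Framework API: `Node`, `Pod`, `FilterPlugin`, `ScorePlugin`, `resourceFit`, `imageLocality`, and `schedule` are all hypothetical, simplified names chosen for this example.

```go
package main

import "fmt"

// Node and Pod are hypothetical, heavily simplified stand-ins for the
// Kubernetes objects; the real types carry far more fields.
type Node struct {
	Name    string
	FreeCPU int64           // free CPU in millicores
	Images  map[string]bool // images already present on the node
}

type Pod struct {
	CPURequest int64
	Image      string
}

// FilterPlugin and ScorePlugin are simplified versions of the two
// extension points (assumption: the real interfaces differ).
type FilterPlugin interface {
	Filter(pod Pod, node Node) bool
}

type ScorePlugin interface {
	Score(pod Pod, node Node) int64
}

// resourceFit rejects nodes without enough free CPU.
type resourceFit struct{}

func (resourceFit) Filter(pod Pod, node Node) bool {
	return node.FreeCPU >= pod.CPURequest
}

// imageLocality prefers nodes that already hold the Pod's image.
type imageLocality struct{}

func (imageLocality) Score(pod Pod, node Node) int64 {
	if node.Images[pod.Image] {
		return 100
	}
	return 0
}

// schedule drops nodes rejected by any filter, sums the scores of the
// remaining nodes, and returns the name of the best one.
func schedule(pod Pod, nodes []Node, filters []FilterPlugin, scorers []ScorePlugin) (string, bool) {
	best, bestScore, found := "", int64(-1), false
	for _, n := range nodes {
		feasible := true
		for _, f := range filters {
			if !f.Filter(pod, n) {
				feasible = false
				break
			}
		}
		if !feasible {
			continue
		}
		var total int64
		for _, s := range scorers {
			total += s.Score(pod, n)
		}
		if total > bestScore {
			best, bestScore, found = n.Name, total, true
		}
	}
	return best, found
}

func main() {
	nodes := []Node{
		{Name: "node-a", FreeCPU: 200, Images: map[string]bool{"nginx": true}},
		{Name: "node-b", FreeCPU: 1000},
		{Name: "node-c", FreeCPU: 1000, Images: map[string]bool{"nginx": true}},
	}
	pod := Pod{CPURequest: 500, Image: "nginx"}
	name, ok := schedule(pod, nodes, []FilterPlugin{resourceFit{}}, []ScorePlugin{imageLocality{}})
	fmt.Println(name, ok) // node-c true
}
```

Here node-a is filtered out for insufficient CPU, and node-c wins over node-b because it already has the image.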
  6. Extensibility matters!

    • Scheduling requirements depend on the use case, size, etc. of the cluster.
      ◦ We don’t want to implement all scheduling use cases.
    • Scheduling is complicated; users don’t want to implement their own scheduler from scratch.
      ◦ Extensibility lets users focus on writing their custom logic and rely on the upstream scheduler for the fundamental scheduling logic.
  7. Webhook (Extender)

    The scheduler has a webhook-based extension called “Extender”. Each registered webhook is called at specific point(s) during scheduling.
    • No need to rebuild the scheduler to extend it.
    • Flexible implementation.
    • It hurts scheduling latency very badly.
    • The functionality is very limited.
  8. Golang plugin (Scheduling Framework)

    You can implement your own plugin based on the Scheduling Framework. It’s designed to provide better extensibility than the webhook (extender).
    • More extensible than the webhook (extender).
    • No overhead between the scheduler and plugins.
    • Requires a fork/rebuild/replacement of the scheduler.
  9. Wasm extension (NEW!!)

    Evolving from the Golang plugin, you can write a plugin compiled to a Wasm module.
    • (Will be) as extensible as the Golang plugin.
    • Less troublesome to set up (easier distribution, no rebuild, etc.).
    • Can be written in many languages (only a TinyGo SDK exists for now, though).
    • It impacts scheduling latency negatively (but less than the extender).
    • Wasm sandbox limitations.
  10. So… will the Wasm extension replace all plugins?

    A Golang plugin can still be the best choice if…
    • Scheduler latency is critical in your cluster.
      ◦ The more Pods your cluster usually gets, the faster the scheduling you need.
    • You need to handle tons of various objects.
      ◦ The overhead from object transfer and GC would be bigger (discussed later).
  11. How it works

    Load Wasm modules into the scheduler. Load from:
    • Remote hosts.
    • Local files.
    (Diagram labels: Scheduling Framework, Wasm extension, Go plugin.)
  13. How it works

    Forwards the function calls (Filter(...), Score(...), etc.) from the Scheduling Framework to Wasm modules.
    (Diagram labels: Scheduling Framework, Wasm extension, Go plugin.)
  14. How it works

    Forwards the function calls (Filter(...), Score(...), etc.) from the Scheduling Framework to Wasm modules.
    Application Binary Interface (ABI): the contract between the host (scheduler) and the Wasm module.
  15. How it works

    The Wasm module fetches additional data (Pod(...), Node(...), etc.) from the scheduler side as necessary. These functions exposed by the host (scheduler) are also defined in the ABI.
  16. TinyGo SDK

    We provide an SDK to make it easier for non-Wasm people to create Wasm modules. It’d be hard for people to implement a Wasm module from scratch based only on the ABI. The SDK lets people develop Wasm modules with an experience very similar to writing Golang plugins: they just need to implement the corresponding interfaces.
  18. TinyGo SDK

    Why TinyGo, not Go?
    • Go didn’t support exported functions, which we wanted for a performant design.
    • …but that support is actually coming now! We can explore a Go SDK with it in the future.
    Issue: cmd/compile: add go:wasmexport directive #65199
  19. Implement an object transfer

    • Only numeric types are supported in function signatures.
      ◦ For example, we cannot define Filter(pod *v1.Pod).
    • The guest can only operate on its own memory.
      ◦ Objects cannot be passed by reference.
    So, how do we transfer objects (Pods, Nodes, etc.)?
  20. Implement an object transfer

    Guest: “Put the Pod at the address (ptr). Don’t use more than limit.”
    Host: “I stored the Pod in your memory. The length is xxxx.”
    The host can read/write anything in the guest’s memory.
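The exchange above can be simulated in plain Go to show that only numbers (`ptr`, `limit`, and the returned length) cross the boundary. This is a sketch under stated assumptions: the byte slice `guestMemory` stands in for the guest's linear memory, JSON is used as the wire format purely for illustration, and `hostWritePod` and `guestReadPod` are hypothetical names, not the project's ABI.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Pod is a hypothetical, heavily simplified stand-in for a Kubernetes Pod.
type Pod struct {
	Name     string `json:"name"`
	NodeName string `json:"nodeName"`
}

// guestMemory models the guest's linear memory. The host can read and
// write every byte of it.
type guestMemory []byte

// hostWritePod plays the host: it serializes the Pod and copies it into
// guest memory at ptr, using at most limit bytes. It always returns the
// full serialized length so the guest can detect a too-small buffer.
func hostWritePod(mem guestMemory, pod Pod, ptr, limit uint32) (uint32, error) {
	data, err := json.Marshal(pod)
	if err != nil {
		return 0, err
	}
	if uint64(ptr)+uint64(limit) > uint64(len(mem)) {
		return 0, fmt.Errorf("buffer [%d, %d) is outside guest memory", ptr, uint64(ptr)+uint64(limit))
	}
	n := uint32(len(data))
	if n <= limit {
		copy(mem[ptr:ptr+n], data)
	}
	return n, nil
}

// guestReadPod plays the guest: it offers a buffer (ptr, limit), lets the
// host fill it, and decodes the bytes from its own memory.
func guestReadPod(mem guestMemory, ptr, limit uint32, hostPod Pod) (Pod, error) {
	n, err := hostWritePod(mem, hostPod, ptr, limit)
	if err != nil {
		return Pod{}, err
	}
	if n > limit {
		return Pod{}, fmt.Errorf("pod needs %d bytes, buffer has %d", n, limit)
	}
	var pod Pod
	err = json.Unmarshal(mem[ptr:ptr+n], &pod)
	return pod, err
}

func main() {
	mem := make(guestMemory, 1024)
	pod, err := guestReadPod(mem, 16, 256, Pod{Name: "nginx", NodeName: "node-a"})
	fmt.Println(pod.Name, pod.NodeName, err)
}
```

Returning the required length even when the buffer is too small lets the guest retry with a larger buffer; whether the real ABI behaves this way is not stated in the slides.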
  21. Object transfer is costly…

    • Lazy loading: get objects only when you need them.
    • Cache: don’t get the same object more than once.
    Example: the Pod is fetched from the host (or taken from the cache) at pod.Spec(), while the NodeList is not fetched.
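Lazy loading and caching can be combined in one small wrapper, sketched below. The names `lazyPod`, `PodSpec`, and `countFetches` are hypothetical, and the `fetch` callback stands in for the costly call to the host.

```go
package main

import "fmt"

// PodSpec is a hypothetical, simplified stand-in for the Pod spec.
type PodSpec struct {
	NodeName string
}

// lazyPod defers the costly host call until Spec() is first used and
// caches the result for later calls.
type lazyPod struct {
	fetch func() PodSpec // stands in for the object transfer from the host
	spec  *PodSpec       // nil until the first Spec() call
}

// Spec fetches the spec on first use and serves the cached copy afterwards.
func (p *lazyPod) Spec() PodSpec {
	if p.spec == nil {
		s := p.fetch()
		p.spec = &s
	}
	return *p.spec
}

// countFetches calls Spec() the given number of times and reports how
// many host fetches actually happened.
func countFetches(calls int) int {
	fetches := 0
	pod := &lazyPod{fetch: func() PodSpec {
		fetches++
		return PodSpec{NodeName: "node-a"}
	}}
	for i := 0; i < calls; i++ {
		_ = pod.Spec()
	}
	return fetches
}

func main() {
	fmt.Println(countFetches(0), countFetches(3)) // 0 1
}
```

A plugin that never touches the Pod pays no transfer cost, and one that reads it repeatedly pays only once.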
  22. (Diagram: Scheduling Framework → Wasm extension (Go plugin) → Wasm module; PreScore(...), PreScore(), Pod())

    Just ask to start PreScore(). At this point, no object transfer has been made yet.
  23. (Diagram: Scheduling Framework → Wasm extension (Go plugin) → Wasm module; PreScore(...), PreScore(), Pod())

    (If the Pod isn’t in the cache) Request the Pod object from the host.
  24. Garbage collection overhead

    Wasm has only one thread, so GC runs inline with plugin execution. This inline GC overhead was over half the latency of a plugin execution.
    wasilibs/nottinygc: High performance GC alternative for TinyGo targeting WASI
    • nottinygc is awesome; it gave a ~50% latency reduction in plugin execution in some scenarios.
    • But, given that the repository is archived, we cannot keep relying on it.
  25. Garbage collection overhead

    Wasm has only one thread, so GC runs inline with plugin execution. This inline GC overhead was over half the latency of a plugin execution.
    -gc=leaking flag: only allocate memory, never free it.
    • The Wasm module’s memory usage keeps growing, but that isn’t a problem if the Wasm module is short-lived.
    • We tried killing and recreating modules after every scheduling cycle, but didn’t get good performance because the recreation was costly.
  26. Garbage collection overhead

    Wasm has only one thread and GC is inlined…
    • -gc=leaking flag: only allocate memory, never free it.
    • wasilibs/nottinygc: High performance GC alternative for TinyGo targeting WASI
  27. Benchmark

    Performance matters for the scheduler because there is basically only one scheduler in the cluster. We have two layers of benchmarking in the project.
    • Plugin-level benchmark: to see how long each part takes.
    • Scheduler-level benchmark: to see how much the Wasm overhead actually impacts scheduling latency.
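A plugin-level benchmark can be sketched with the standard library's `testing.Benchmark`, which runs outside `go test`. The `filter` function and `node` type below are hypothetical placeholders for a real plugin call; the project's own benchmarks are not shown in the slides, and the scheduler-level layer is not covered here.

```go
package main

import (
	"fmt"
	"testing"
)

// node is a hypothetical, simplified stand-in for a Kubernetes Node.
type node struct {
	freeCPU int64
}

// filter is a placeholder for the plugin logic being measured.
func filter(podCPU int64, n node) bool {
	return n.freeCPU >= podCPU
}

// sink keeps the compiler from discarding the measured loop.
var sink int

// benchmarkFilter measures one full pass of filter over all nodes.
func benchmarkFilter(nodes []node) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			feasible := 0
			for _, n := range nodes {
				if filter(500, n) {
					feasible++
				}
			}
			sink = feasible
		}
	})
}

func main() {
	nodes := make([]node, 1000)
	for i := range nodes {
		nodes[i].freeCPU = int64(i)
	}
	res := benchmarkFilter(nodes)
	fmt.Printf("%d iterations, %d ns/op\n", res.N, res.NsPerOp())
}
```

Running the same harness against a native Go plugin and its Wasm counterpart would give a rough view of where the Wasm overhead comes from.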
  28. Summary

    Wasm is a valuable option to consider when you start adding extensibility to your system. But you must consider…
    • SDK design: make it easier for people to build Wasm guests.
    • Object transfer: if you need to operate on large or many objects, you need to put effort into reducing the transfer cost.
    • Benchmark: keep benchmarking. Performance is always a concern.