
A tale of two plugins: safely extending the Kubernetes Scheduler with WebAssembly

sanposhiho
September 05, 2023

Transcript

  1. 1
    A tale of two plugins:
    safely extending the Kubernetes
    Scheduler with WebAssembly
    Kensei Nakada (@sanposhiho)
    ↑Slide URL↑


  2. 2
    Get this slide!


  3. 3
    Platform Engineer at
    Kubernetes SIG-Scheduling approver
    Kubernetes contributor award 2022 winner
    Kensei Nakada (sanposhiho)

    Hello! Kia ora! こんにちは!


  4. 4
    Platform Engineer at
    Kubernetes SIG-Scheduling approver
    Kubernetes contributor award 2022 winner
    Kensei Nakada (sanposhiho)

    ↑ Slide URL!
    Hello! Kia ora! こんにちは!


  5. 5
    Agenda
    01 The scheduler extensions
    02 The wasm extension on the scheduler
    03 The wasm extension deep-dive
    04 Project status / What’s next


  6. 6
    Pod … a group of containers, which is
    the smallest execution unit in Kubernetes.
    Node … a virtual or physical machine,
    where Pods run.
    Pod・Node


  7. 7
    The component that schedules each Pod onto a Node.
    It checks many factors (resources, affinity, volumes, etc.)
    and decides the best Node for the Pod.
    Kubernetes scheduler


  8. 8
    The scheduler extensions


  9. 9
    Control your scheduling
    ● Built-in scheduling constraints
    ○ Scheduling constraints on the Pod spec: control the scheduling per Pod.
    ○ KubeSchedulerConfiguration: control the scheduling per cluster.
    ● Extend the scheduler
    ○ Extender: via webhook
    ○ Plugin: via your own scheduler plugin


  10. 10
    Control your scheduling
    ● Built-in scheduling constraints
    ○ Scheduling constraints on the Pod spec: control the scheduling per Pod.
    ○ KubeSchedulerConfiguration: control the scheduling per cluster.
    ● Extend the scheduler
    ○ Extender: via webhook
    ○ Plugin: via your own scheduler plugin


  11. 11
    NodeAffinity


  12. 12
    PodAffinity


  13. 13
    PodTopologySpread


  14. 14
    Control your scheduling
    ● Built-in scheduling constraints
    ○ Scheduling constraints on the Pod spec: control the scheduling per Pod.
    ○ KubeSchedulerConfiguration: control the scheduling per cluster.
    ● Extend the scheduler
    ○ Extender: via webhook
    ○ Plugin: via your own scheduler plugin


  15. 15
    Control your scheduling
    ● Built-in scheduling constraints
    ○ Scheduling constraints on the Pod spec: control the scheduling per Pod.
    ○ KubeSchedulerConfiguration: control the scheduling per cluster.
    ● Extend the scheduler
    ○ Extender: via webhook
    ○ Plugin: via your own scheduler plugin


  16. 16
    Control your scheduling
    ● Built-in scheduling constraints
    ○ Scheduling constraints on the Pod spec: control the scheduling per Pod.
    ○ KubeSchedulerConfiguration: control the scheduling per cluster.
    ● Extend the scheduler
    ○ Extender: via webhook
    ○ Plugin: via your own scheduler plugin
    ○ WebAssembly: via a WebAssembly plugin


  17. 17
    Unique use cases sometimes require extending your scheduler.
    ● Batch job requirements:
    ○ Start several Pods at the same time (coscheduling)
    ○ Elastic resource quota (capacityscheduling)
    ● You’ll find many other use cases in kubernetes-sigs/scheduler-plugins
    Extend your scheduler


  18. 18
    Control your scheduling
    ● Built-in scheduling constraints
    ○ Scheduling constraints on the Pod spec: control the scheduling per Pod.
    ○ KubeSchedulerConfiguration: control the scheduling per cluster.
    ● Extend the scheduler
    ○ Extender: via webhook
    ○ Plugin: via your own scheduler plugin
    ○ WebAssembly: via a WebAssembly plugin


  19. 19
    A webhook-based extension mechanism for the scheduler.
    Each webhook is called at a specific point in scheduling:
    ● Filter: Filter in Scheduling Framework
    ● Prioritize: Score in Scheduling Framework
    ● Preempt: PostFilter in Scheduling Framework
    ● Bind: Bind in Scheduling Framework
    Extender
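
    As a rough illustration of the Filter hook above, here is a minimal sketch of an extender
    webhook in Go. It assumes the extender is registered in KubeSchedulerConfiguration with a
    urlPrefix pointing at this server and a filterVerb of "filter"; the request/response shapes
    are simplified local structs mirroring the extender v1 types (ExtenderArgs /
    ExtenderFilterResult) — in a real extender, import k8s.io/kube-scheduler/extender/v1 and
    check its exact field names.

    package main

    import (
        "encoding/json"
        "log"
        "net/http"
    )

    // Simplified stand-ins for the extender v1 wire types; field names are
    // approximate, so verify them against k8s.io/kube-scheduler/extender/v1.
    type extenderArgs struct {
        Pod       json.RawMessage `json:"pod"`
        NodeNames *[]string       `json:"nodenames,omitempty"`
    }

    type extenderFilterResult struct {
        NodeNames   *[]string         `json:"nodenames,omitempty"`
        FailedNodes map[string]string `json:"failedNodes,omitempty"`
        Error       string            `json:"error,omitempty"`
    }

    // filter is called by the scheduler once per scheduling attempt with the Pod
    // and the candidate Nodes.
    func filter(w http.ResponseWriter, r *http.Request) {
        var args extenderArgs
        if err := json.NewDecoder(r.Body).Decode(&args); err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        // Toy logic: accept every candidate Node. A real extender would inspect
        // args.Pod and move rejected Nodes into FailedNodes with a reason.
        result := extenderFilterResult{NodeNames: args.NodeNames}
        w.Header().Set("Content-Type", "application/json")
        _ = json.NewEncoder(w).Encode(result)
    }

    func main() {
        http.HandleFunc("/filter", filter) // hypothetical path used as the filterVerb
        log.Fatal(http.ListenAndServe(":8888", nil))
    }

    Every Filter call now costs an extra HTTP round trip, which is where the latency numbers on
    the following slides come from.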


  20. 20
    ● 👍 No need to rebuild the scheduler. (Just pass the URL via config)
    ● 👍 Flexibility of implementation.
    ● 👎 It hurts scheduling latency badly.
    ● (👎 See this for more disadvantages.)
    Extender


  21. 21
    Default scheduler
    (with no extension)


  22. 22
    10 times slower
    with one extender


  23. 23
    Scheduling Framework: the pluggable architecture of the scheduler.
    ● Decouples all scheduling logic from the scheduler’s core implementation.
    ● One scheduling factor = one plugin.
    ● You can extend the scheduler by creating your own plugins.
    Plugin (Scheduling framework)


  24. 24
    Scheduling Framework


  25. 25
    Plugin interface
    ScorePlugin interface


  26. 26
    Plugin interface
    Implement the interface
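
    For reference (the code on the slide is a screenshot), here is a minimal sketch of a Score
    plugin implementing the framework interface. The ScorePlugin interface used here matches
    k8s.io/kubernetes/pkg/scheduler/framework around the v1.28 era of this talk; the exact
    signatures change between Kubernetes releases, and the plugin name and scoring logic are
    made up for illustration.

    package nodenamescore

    import (
        "context"

        v1 "k8s.io/api/core/v1"
        "k8s.io/apimachinery/pkg/runtime"
        "k8s.io/kubernetes/pkg/scheduler/framework"
    )

    // Name is a hypothetical plugin name used for registration.
    const Name = "NodeNameScore"

    type NodeNameScore struct{}

    var _ framework.ScorePlugin = &NodeNameScore{}

    // New is the plugin factory the scheduler calls at startup.
    func New(_ runtime.Object, _ framework.Handle) (framework.Plugin, error) {
        return &NodeNameScore{}, nil
    }

    func (pl *NodeNameScore) Name() string { return Name }

    // Score runs once per (Pod, Node) pair; toy logic: longer Node names score higher.
    func (pl *NodeNameScore) Score(_ context.Context, _ *framework.CycleState, _ *v1.Pod, nodeName string) (int64, *framework.Status) {
        return int64(len(nodeName)) % (framework.MaxNodeScore + 1), nil
    }

    // ScoreExtensions is nil because this plugin doesn't normalize scores.
    func (pl *NodeNameScore) ScoreExtensions() framework.ScoreExtensions { return nil }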


  27. 27
    Integrate plugins into the scheduler
    Integrate & rebuild
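
    The “integrate & rebuild” step boils down to registering the out-of-tree plugin and building
    your own scheduler binary. A minimal sketch, assuming the plugin package sketched above lives
    at a hypothetical module path; app.NewSchedulerCommand / app.WithPlugin come from
    k8s.io/kubernetes/cmd/kube-scheduler/app, and the factory signature again depends on the
    Kubernetes version.

    package main

    import (
        "os"

        "k8s.io/kubernetes/cmd/kube-scheduler/app"

        // hypothetical module path for the plugin package sketched above
        "example.com/my-scheduler/plugins/nodenamescore"
    )

    func main() {
        // Register the out-of-tree plugin; the resulting binary is a full
        // kube-scheduler that can enable NodeNameScore via KubeSchedulerConfiguration.
        command := app.NewSchedulerCommand(
            app.WithPlugin(nodenamescore.Name, nodenamescore.New),
        )
        if err := command.Execute(); err != nil {
            os.Exit(1)
        }
    }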


  28. 28


  29. 29
    ● 👍 More extension points are available.
    ● 👍 No overhead to call plugins.
    ● 👎 Cannot be used casually. (requires a rebuild, etc.)
    Plugin (Scheduling framework)


  30. 30
    ● 👍 More extension points are available.
    ● 👍 No overhead to call plugins.
    ● 👎 Cannot be used casually. (requires a rebuild, etc.)
    Plugin (Scheduling framework)


  31. 31
    ● Maintenance cost
    ○ Need to fork the scheduler and keep it consistent with your Kubernetes version.
    ● There should be only one scheduler in the cluster.
    ○ Need to route all Pods through the new scheduler in some way.
    ○ May need to convince the people managing the scheduler in your cluster
    (your infra team, cloud vendor, etc.).
    ○ May need to maintain multiple scheduling plugins owned by different teams in the scheduler.
    Hurdles for Plugin extension


  32. 32
    ● Maintenance cost
    ○ Need to fork the scheduler and keep it consistent with your Kubernetes version.
    ● There should be only one scheduler in the cluster.
    ○ Need to route all Pods through the new scheduler in some way.
    ○ May need to convince the people managing the scheduler in your cluster
    (your infra team, cloud vendor, etc.).
    ○ May need to maintain multiple scheduling plugins owned by different teams in the scheduler.
    Can we call it a true pluggable system? 🤔
    Hurdles for Plugin extension


  33. 33
    The wasm extension on the scheduler


  34. 34
    WebAssembly is a way to safely run code compiled from other languages.
    ● Wasm runtimes execute wasm guests (xxxx.wasm).
    ● Wasm guests can only call functions imported from the host.
    ○ = They cannot do anything else.
    You may have heard of it mostly around browsers, but…
    WebAssembly
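
    To make the host/guest split concrete, here is a minimal sketch of a Go host embedding a wasm
    runtime (wazero, the pure-Go runtime this project embeds) to run a guest binary. The guest
    file name and the exported/imported function names are made up; the point is that the guest
    can only call what the host explicitly exports to it, and only numbers cross the boundary.

    package main

    import (
        "context"
        "fmt"
        "log"
        "os"

        "github.com/tetratelabs/wazero"
        "github.com/tetratelabs/wazero/imports/wasi_snapshot_preview1"
    )

    func main() {
        ctx := context.Background()

        // The wasm runtime lives inside the host process.
        r := wazero.NewRuntime(ctx)
        defer r.Close(ctx)

        // Needed if the guest was built against WASI (e.g. TinyGo -target=wasi).
        wasi_snapshot_preview1.MustInstantiate(ctx, r)

        // The guest can only use functions the host explicitly exports to it.
        if _, err := r.NewHostModuleBuilder("host").
            NewFunctionBuilder().
            WithFunc(func() uint32 { return 42 }). // a host function the guest may import
            Export("answer").
            Instantiate(ctx); err != nil {
            log.Fatal(err)
        }

        // plugin.wasm is a hypothetical guest compiled from another language.
        guest, err := os.ReadFile("plugin.wasm")
        if err != nil {
            log.Fatal(err)
        }
        mod, err := r.Instantiate(ctx, guest)
        if err != nil {
            log.Fatal(err)
        }

        // Call a (hypothetical) exported guest function; only numeric types
        // cross the host/guest boundary.
        results, err := mod.ExportedFunction("score").Call(ctx, 3)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println("score:", results[0])
    }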


  35. 35
    WebAssembly for backend


  36. 36
    kubernetes-sigs/kube-scheduler-wasm-extension


  37. 37
    The wasm plugin is implemented
    to follow the Scheduling Framework,
    and it embeds a wasm runtime
    to run the wasm guests inside the scheduler!
    How it works
    Wasm plugin


  38. 38
    It’s very tough for non-wasm people
    to write their own wasm guest that satisfies the ABI.
    We’re providing a TinyGo SDK
    so that people can develop wasm plugins
    with an experience similar to
    native Golang plugins
    – you just need to implement interfaces.
    TinyGo SDK


  39. 39
    Interfaces in SDK
    ScorePlugin interface


  40. 40
    Interfaces in SDK
    Implement the interface
    & compile it to wasm
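
    Roughly, writing a guest with the SDK feels like writing a normal Go plugin. The sketch below
    is written from memory: the guest package paths and the registration helper (score.SetPlugin)
    are placeholders that may not match the SDK exactly, and the build flags are approximate —
    copy the real names and commands from the repo’s examples.

    package main

    import (
        "sigs.k8s.io/kube-scheduler-wasm-extension/guest/api"       // placeholder path
        "sigs.k8s.io/kube-scheduler-wasm-extension/guest/api/proto" // placeholder path
        "sigs.k8s.io/kube-scheduler-wasm-extension/guest/score"     // placeholder path
    )

    // nameLength scores Nodes by the length of their name (toy logic), just to
    // show that guest code reads like an ordinary Go plugin.
    type nameLength struct{}

    func (nameLength) Score(_ api.CycleState, _ proto.Pod, nodeName string) (int32, *api.Status) {
        return int32(len(nodeName) % 100), nil
    }

    // main registers the plugin at the Score extension point; the SDK exports the
    // ABI functions that the scheduler host calls.
    func main() {
        score.SetPlugin(nameLength{})
    }

    // Compile to a wasm guest roughly like (flags approximate):
    //   tinygo build -o plugin.wasm -target=wasi -scheduler=none main.go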


  41. 41
    Wasm plugins are portable, easy to distribute across the
    community, and easy to apply to your scheduler.
    Add your wasm plugin


  42. 42
    Wasm plugins are portable, easy to distribute across the
    community, and easy to apply to your scheduler.
    Add your wasm plugin
    Wow, it can use http(s)!


  43. 43
    It's a balanced solution between the extender and the Golang plugin.
    ● 👍 All extension points are available.
    ● 👍 No need to change the scheduler’s code, no need to rebuild!
    ● 👍 Easy to distribute plugins. (via http(s))
    ● 👍 Plugins can be written in many languages.
    ● 👎 A negative impact on latency.
    ● 👎 Wasm-specific limitations.
    So… will the wasm extension replace all plugins?


  44. 44
    It's a balanced solution between the extender and the Golang plugin.
    ● 👍 All extension points are available.
    ● 👍 No need to change the scheduler’s code, no need to rebuild!
    ● 👍 Easy to distribute plugins. (via http(s))
    ● 👍 Plugins can be written in many languages.
    ● 👎 A negative impact on latency.
    ● 👎 Wasm-specific limitations.
    So… will the wasm extension replace all plugins?


  45. 45
    Default scheduler
    (with no extension)


  46. 46
    Twice as slow
    with one wasm extension


  47. 47
    🤯
    10 times slower
    with one extender


  48. 48
    The Golang plugin can still be the best choice if…
    ● The scheduler’s latency is super critical in your cluster.
    ○ The bigger the cluster, the faster the scheduling needs to be.
    ● You need to do heavy calculation or handle tons of various objects.
    ○ Due to inlined GC and the latency of passing objects from host to guest.
    (both are discussed in a later section)
    So… will wasm extension replace all plugins?


  49. 49
    Deep dive into the wasm extension


  50. 50
    The wasm plugin is implemented
    to follow the Scheduling Framework,
    and it embeds a wasm runtime
    to run the wasm guests inside the scheduler!
    How it works
    Wasm plugin


  51. 51
    Wasm Go plugin Wasm Guest


  52. 52
    The contract for how the host and the guest communicate.
    Just like an API, but with B instead of P. (Application Binary Interface)
    ABI


  53. 53
    WebAssembly’s sandbox model and its limitations:
    ● The guest can only operate on its own memory.
    ● The guest’s memory is exported to the host so that the host can
    read or write it.
    ● Only numeric types are supported in function signatures.
    Wasm functions can only operate on the guest’s memory


  54. 54
    Example ABI to get a URI from the host. (http-wasm.io)
    ● Guest: allocates enough memory for the URI and gives the host the
    linear memory offset and the maximum length in bytes.
    ● Host: puts the URI there and tells the guest its length.
    How to pass things from host
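
    Sketched in Go with wazero, the host side of such an ABI looks roughly like this: the host
    function receives the offset and limit the guest passed in, writes into the guest’s exported
    memory, and returns the length. The module/function names mirror the http-wasm get_uri
    example cited above, but treat the details as illustrative.

    package host

    import (
        "context"

        "github.com/tetratelabs/wazero"
        wazeroapi "github.com/tetratelabs/wazero/api"
    )

    // instantiateHTTPHost registers a host function the guest can import as
    // ("http_handler", "get_uri").
    func instantiateHTTPHost(ctx context.Context, r wazero.Runtime) error {
        _, err := r.NewHostModuleBuilder("http_handler").
            NewFunctionBuilder().
            // get_uri(buf, buf_limit) -> len: the guest passes an offset into its
            // own linear memory plus the maximum length it allocated.
            WithFunc(func(ctx context.Context, mod wazeroapi.Module, buf, bufLimit uint32) uint32 {
                uri := []byte("/v1.0/hi?name=panda") // value owned by the host
                if uint32(len(uri)) <= bufLimit {
                    // Write into the guest's exported linear memory.
                    mod.Memory().Write(buf, uri)
                }
                // Tell the guest the real length; if it exceeds bufLimit, the
                // guest can allocate more memory and call again.
                return uint32(len(uri))
            }).
            Export("get_uri").
            Instantiate(ctx)
        return err
    }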


  55. 55
    How to pass things from host


  56. 56
    One function to fetch the protobuf-encoded Pod, Node, etc.
    To address the performance concern, we do:
    ● Lazy loading
    ● Caching
    ● Faster garbage collection
    ● (Other enhancements planned for the future)
    Protobuf encoding


  57. 57
    Only fetch objects from the host when they are actually accessed.
    Lazy loading
    The Pod is fetched,
    but the NodeList isn’t.


  58. 58
    The scheduler refers to the same resource state within one scheduling cycle.
    → So why not keep a cache in the wasm guest!
    We reduce the number of host-guest round trips as much as possible:
    ● Fetch objects from the host only when the guest actually needs them.
    ● Fetch the same object from the host only once during one scheduling
    cycle.
    Lazy loading with caching
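
    The pattern, sketched guest-side in Go (the host call here is a hypothetical stand-in for the
    SDK’s real ABI): fetch an object from the host only on first access, then reuse it for the
    rest of the scheduling cycle.

    package guestcache

    // hostGetEncodedPod is a placeholder for the imported host function that
    // copies the protobuf-encoded Pod being scheduled into guest memory.
    func hostGetEncodedPod() []byte { return nil }

    // cycleCache caches host-fetched objects for one scheduling cycle.
    type cycleCache struct {
        pod []byte // nil until a plugin actually reads the Pod
    }

    // Pod returns the Pod for the current cycle, calling the host at most once.
    func (c *cycleCache) Pod() []byte {
        if c.pod == nil { // lazy: nothing is fetched until first access
            c.pod = hostGetEncodedPod()
        }
        return c.pod // cached: later extension points in the same cycle reuse it
    }

    // reset is called when a new scheduling cycle starts, invalidating the cache.
    func (c *cycleCache) reset() { c.pod = nil }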


  59. 59
    ● Wasm has only one thread, so GC runs inline with plugin execution.
    ● The inlined GC overhead was more than half the latency of a plugin
    execution 😢
    Garbage collection overhead


  60. 60
    wasilibs/nottinygc requires some build flags, but performs much better.
    We saw around a 50% latency reduction in some scenarios.
    nottinygc
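
    For reference, opting a TinyGo guest into nottinygc looks roughly like the snippet below, as
    I recall from its README (an import for side effects plus custom-GC build flags); double-check
    the nottinygc and SDK docs for the exact flags.

    package main

    import (
        // Imported for its side effects: replaces TinyGo's default GC/allocator
        // at link time.
        _ "github.com/wasilibs/nottinygc"
    )

    func main() {}

    // Built roughly like (flags approximate):
    //   tinygo build -o plugin.wasm -target=wasi -gc=custom -tags=custommalloc main.go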


  61. 61
    Performance is a critical factor for this project.
    We have two levels of benchmark tests:
    ● Plugin-level benchmarks, using Go’s benchmark testing feature.
    ○ How much time is spent in which part.
    ● scheduler_perf, which runs the wasm plugin in the scheduler
    and observes the scheduler’s metrics.
    ○ How much the wasm extension actually slows down scheduling.
    Benchmark tests
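
    The plugin-level benchmarks are ordinary Go benchmarks. A self-contained sketch of the shape
    (the scorer interface and fake implementation below are stand-ins for the real wasm-backed
    plugin under test):

    package plugin_test

    import (
        "context"
        "testing"
    )

    // scorer stands in for the wasm-backed Score plugin under test.
    type scorer interface {
        Score(ctx context.Context, nodeName string) (int64, error)
    }

    // fakeScorer keeps the example self-contained; the real benchmark would
    // instantiate the plugin together with its wasm guest.
    type fakeScorer struct{}

    func (fakeScorer) Score(context.Context, string) (int64, error) { return 50, nil }

    func BenchmarkScore(b *testing.B) {
        var p scorer = fakeScorer{}
        ctx := context.Background()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            // Call the plugin the same way the scheduler would, once per iteration.
            if _, err := p.Score(ctx, "node-1"); err != nil {
                b.Fatal(err)
            }
        }
    }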


  62. 62
    Project status / What’s next


  63. 63
    It’s at an early stage, but it already has enough functionality to write
    a simple wasm plugin!
    See the examples if you’re interested!
    Project status


  64. 64
    ● Support all extension points.
    ● Get resources other than Pods and Nodes.
    ● Further performance improvement.
    ● Other language examples for guest.
    What’s left to do


  65. 65
    ● Join us in #sig-scheduling on the Kubernetes Slack!
    ● Shout out to all contributors so far, especially Adrian for tons of
    contributions, and to everyone, especially Chris, for all the help
    bringing me here to Wellington.
    We ran through many things in 30 min 🏃💨. Thanks, all!
    That’s all!


  66. 66
    Unused slides 😅
    Just leaving them here in case someone is interested.


  67. 67
    How is the scheduler working


  68. 68
    Pod is created → started on Node
    Pod is created and the scheduler
    notices it.


  69. 69
    Pod is created → started on Node
    The scheduler decides where to
    go. (= Scheduling)
    Node A


  70. 70
    Pod is created → started on Node
    Node A


  71. 71
    Pod is created → started on Node


  72. 72
    Kubernetes scheduler 101


  73. 73
    The scheduler needs to consider many things:
    ● Resource requests on the Pod.
    ● Affinity requirements on the Pod. (PodAffinity, NodeAffinity)
    ● How Pods are currently spread across each domain. (PodTopologySpread)
    ● Taints on Nodes / tolerations on the Pod.
    ● …etc
    Scheduling factors


  74. 74
    The architecture inside the scheduler
    ● The scheduler is composed of many Plugins.
    ○ One scheduling factor = one plugin (NodeAffinity plugin, etc)
    ● Each plugin is created to work on one or more extension points.
    ○ Filter: filtering Nodes that don’t fit the requirements
    ○ Score: scoring the remaining Nodes
    ○ …
    Scheduling Framework


  75. 75
    Scheduling Framework


  76. 76
    Scheduling Framework


  77. 77
    [Diagram: Node1–Node4 flow through the Filter extension point
    (FilterA, FilterB) and then the Score extension point (ScoreA, ScoreB).]


  78. 78
    Role: filtering out Nodes that shouldn’t or cannot run the Pod.
    For example:
    ● Nodes that don’t have enough resources to run the Pod
    ● Nodes that don’t match a required NodeAffinity on the Pod
    ● …
    Scheduling Framework - Filter
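
    A minimal sketch of a Filter plugin against the framework interface (signatures as of the
    v1.28-era framework package; the region annotation/label used here is invented for
    illustration):

    package noderegion

    import (
        "context"

        v1 "k8s.io/api/core/v1"
        "k8s.io/apimachinery/pkg/runtime"
        "k8s.io/kubernetes/pkg/scheduler/framework"
    )

    // NodeRegion rejects Nodes that are not in the region a Pod asks for via a
    // hypothetical "example.com/region" annotation.
    type NodeRegion struct{}

    var _ framework.FilterPlugin = &NodeRegion{}

    func New(_ runtime.Object, _ framework.Handle) (framework.Plugin, error) {
        return &NodeRegion{}, nil
    }

    func (pl *NodeRegion) Name() string { return "NodeRegion" }

    // Filter runs once per (Pod, Node) pair; returning an Unschedulable status
    // removes the Node from the candidates.
    func (pl *NodeRegion) Filter(_ context.Context, _ *framework.CycleState, pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
        want, ok := pod.Annotations["example.com/region"]
        if !ok {
            return nil // the Pod doesn't care about regions; the Node passes
        }
        if nodeInfo.Node().Labels["example.com/region"] != want {
            return framework.NewStatus(framework.Unschedulable, "node is not in the requested region")
        }
        return nil
    }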


  79. 79
    [Diagram: the same flow; two Nodes are rejected (×) by the Filter
    plugins and don’t reach the Score extension point.]


  80. 80
    Role: scoring each Node that passed all Filter plugins.
    For example, give higher scores to Nodes that
    ● already have the container image(s) of the Pod
    ● match a preferred NodeAffinity on the Pod
    ● …
    Scheduling Framework - Score


  81. 81
    [Diagram: the Score plugins score the Nodes that passed the Filter
    plugins (e.g. 50, 70, 40, 20); the rejected Nodes (×) are not scored.]



  82. 82
    The cluster-level scheduling configuration:
    ● Disable/enable plugins in the scheduler
    ● Change the behavior of plugins for all scheduling
    KubeSchedulerConfiguration


  83. 83
    KubeSchedulerConfiguration
    disable/enable plugins


  84. 84
    Change a plugin’s behavior


  85. 85
    We explored the Go standard library package named plugin, but gave up.
    You can see our investigation and discussion here:
    ● https://github.com/kubernetes/kubernetes/issues/106705
    ● https://github.com/kubernetes/kubernetes/issues/100723
    Several attempts to make it easier


  86. 86
    People have started to use Wasm to provide extensibility
    – load and run a wasm binary in the host.
    In Golang, there are already some examples:
    - Dapr Wasm middleware
    - Trivy Modules
    - knqyf263/go-plugin
    WebAssembly with Golang


  87. 87
    ● Lack of exported functions
    ● Performance problems
    See this for more information.
    Why not normal Go (GOOS=wasip1 GOARCH=wasm)?
