Slide 1

Slide 1 text

1 A tale of two plugins: safely extending the Kubernetes Scheduler with WebAssembly Kensei Nakada (@sanposhiho) ↑Slide URL↑

Slide 2

Slide 2 text

2 Get this slide!

Slide 3

Slide 3 text

3 Platform Engineer at Kubernetes SIG-Scheduling approver Kubernetes contributor award 2022 winner Kensei Nakada (sanposhiho)
 Hello! Kia ora! こんにちは!

Slide 4

Slide 4 text

4 Platform Engineer at Kubernetes SIG-Scheduling approver Kubernetes contributor award 2022 winner Kensei Nakada (sanposhiho)
 ↑ Slide URL! Hello! Kia ora! こんにちは!

Slide 5

Slide 5 text

5 Agenda 01 The scheduler extensions 02 The wasm extension on the scheduler 03 The wasm extension deep-dive 04 Project status / What’s next

Slide 6

Slide 6 text

6 Pod … a group of containers, which is the smallest execution unit in Kubernetes. Node … a virtual or physical machine, where Pods run. Pod・Node

Slide 7

Slide 7 text

7 The component that literally schedules each Pod to a Node. It checks many factors (resources, affinity, volumes, etc.) and decides the best Node for the Pod. Kubernetes scheduler

Slide 8

Slide 8 text

8 The scheduler extensions

Slide 9

Slide 9 text

9 Control your scheduling
● Built-in scheduling constraints (the scheduling constraints on Pod spec) … control the scheduling per Pod.
● KubeSchedulerConfiguration … control the scheduling per cluster.
● Extend the scheduler
○ Extender … via webhook
○ Plugin … via your own scheduler plugin

Slide 10

Slide 10 text

10 Control your scheduling
● Built-in scheduling constraints (the scheduling constraints on Pod spec) … control the scheduling per Pod.
● KubeSchedulerConfiguration … control the scheduling per cluster.
● Extend the scheduler
○ Extender … via webhook
○ Plugin … via your own scheduler plugin

Slide 11

Slide 11 text

11 NodeAffinity

Slide 12

Slide 12 text

12 PodAffinity

Slide 13

Slide 13 text

13 PodTopologySpread

Slide 14

Slide 14 text

14 Control your scheduling
● Built-in scheduling constraints (the scheduling constraints on Pod spec) … control the scheduling per Pod.
● KubeSchedulerConfiguration … control the scheduling per cluster.
● Extend the scheduler
○ Extender … via webhook
○ Plugin … via your own scheduler plugin

Slide 15

Slide 15 text

15 Control your scheduling
● Built-in scheduling constraints (the scheduling constraints on Pod spec) … control the scheduling per Pod.
● KubeSchedulerConfiguration … control the scheduling per cluster.
● Extend the scheduler
○ Extender … via webhook
○ Plugin … via your own scheduler plugin

Slide 16

Slide 16 text

16 Control your scheduling
● Built-in scheduling constraints (the scheduling constraints on Pod spec) … control the scheduling per Pod.
● KubeSchedulerConfiguration … control the scheduling per cluster.
● Extend the scheduler
○ Extender … via webhook
○ Plugin … via your own scheduler plugin
○ WebAssembly … via WebAssembly plugin

Slide 17

Slide 17 text

17 Unique use cases sometimes require extending your scheduler. ● Batch job requirements: ○ Start several Pods at the same time (coscheduling) ○ Elastic resource quota (capacityscheduling) ● You’ll find many other use cases in kubernetes-sigs/scheduler-plugins Extend your scheduler

Slide 18

Slide 18 text

18 Control your scheduling
● Built-in scheduling constraints (the scheduling constraints on Pod spec) … control the scheduling per Pod.
● KubeSchedulerConfiguration … control the scheduling per cluster.
● Extend the scheduler
○ Extender … via webhook
○ Plugin … via your own scheduler plugin
○ WebAssembly … via WebAssembly plugin

Slide 19

Slide 19 text

19 Webhook-based extension of the scheduler. Each webhook is called at a specific point in scheduling: ● Filter: Filter in Scheduling Framework ● Prioritize: Score in Scheduling Framework ● Preempt: PostFilter in Scheduling Framework ● Bind: Bind in Scheduling Framework Extender
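The extender contract above can be sketched as a tiny webhook server in Go. This is a hedged illustration: the struct fields here (`PodName`, `NodeNames`, `FailedNodes`) are simplified stand-ins for the real extender API types in `k8s.io/kube-scheduler/extender/v1`, and the rejection rule is a toy.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Simplified stand-ins for the scheduler's extender API types.
type ExtenderArgs struct {
	PodName   string   `json:"podName"`
	NodeNames []string `json:"nodeNames"`
}

type ExtenderFilterResult struct {
	NodeNames   []string          `json:"nodeNames"`
	FailedNodes map[string]string `json:"failedNodes"`
}

// filter keeps nodes that pass a custom rule; here, a toy rule
// that rejects any node named "node-bad".
func filter(args ExtenderArgs) ExtenderFilterResult {
	result := ExtenderFilterResult{FailedNodes: map[string]string{}}
	for _, n := range args.NodeNames {
		if n == "node-bad" {
			result.FailedNodes[n] = "rejected by custom rule"
			continue
		}
		result.NodeNames = append(result.NodeNames, n)
	}
	return result
}

func main() {
	// The scheduler would POST ExtenderArgs to this URL during Filter;
	// here we stand up a test server and call it ourselves.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var args ExtenderArgs
		if err := json.NewDecoder(r.Body).Decode(&args); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		json.NewEncoder(w).Encode(filter(args))
	}))
	defer srv.Close()

	body, _ := json.Marshal(ExtenderArgs{PodName: "pod-a", NodeNames: []string{"node-good", "node-bad"}})
	resp, _ := http.Post(srv.URL+"/filter", "application/json", bytes.NewReader(body))
	var result ExtenderFilterResult
	json.NewDecoder(resp.Body).Decode(&result)
	fmt.Println("feasible nodes:", result.NodeNames)
}
```

Every scheduling cycle pays this HTTP round-trip per configured extender, which is where the latency numbers on the following slides come from.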

Slide 20

Slide 20 text

20 ● 👍 No need to rebuild the scheduler. (Just pass the URL via config) ● 👍 Flexibility of implementation. ● 👎 It badly affects scheduling latency. ● (👎 See this for more disadvantages.) Extender

Slide 21

Slide 21 text

21 Default scheduler (with no extension)

Slide 22

Slide 22 text

22 10 times slower with one extender

Slide 23

Slide 23 text

23 Scheduling Framework: the pluggable architecture of the scheduler. ● Decouples all scheduling logic from the scheduler’s core implementation ● One scheduling factor = one plugin ● You can extend the scheduler by creating your own plugins. Plugin (Scheduling framework)

Slide 24

Slide 24 text

24 Scheduling Framework

Slide 25

Slide 25 text

25 Plugin interface ScorePlugin interface

Slide 26

Slide 26 text

26 Plugin interface Implement the interface
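Implementing the interface looks roughly like this. Note this is a simplified mirror for illustration: the real `ScorePlugin` interface in `k8s.io/kubernetes/pkg/scheduler/framework` also takes a `context.Context` and `*CycleState` and returns a `*Status`, and the plugin name "PreferGPUNode" and its scoring rule are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified mirror of the Scheduling Framework's ScorePlugin interface.
type ScorePlugin interface {
	Name() string
	Score(podName, nodeName string) (int64, error)
}

// PreferGPUNode is a toy plugin that gives a higher score to nodes
// whose name suggests they have GPUs.
type PreferGPUNode struct{}

func (p *PreferGPUNode) Name() string { return "PreferGPUNode" }

func (p *PreferGPUNode) Score(podName, nodeName string) (int64, error) {
	if strings.Contains(nodeName, "gpu") {
		return 100, nil
	}
	return 10, nil
}

func main() {
	var plugin ScorePlugin = &PreferGPUNode{}
	for _, node := range []string{"gpu-node-1", "cpu-node-1"} {
		score, _ := plugin.Score("my-pod", node)
		fmt.Printf("%s scores %s: %d\n", plugin.Name(), node, score)
	}
}
```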

Slide 27

Slide 27 text

27 Integrate plugins into the scheduler Integrate & rebuild

Slide 28

Slide 28 text

28

Slide 29

Slide 29 text

29 ● 👍 More extension points are available. ● 👍 No overhead to call plugins. ● 👎 Cannot use it casually. (requires rebuild etc) Plugin (Scheduling framework)

Slide 30

Slide 30 text

30 ● 👍 More extension points are available. ● 👍 No overhead to call plugins. ● 👎 Cannot use it casually. (requires rebuild etc) Plugin (Scheduling framework)

Slide 31

Slide 31 text

31 ● Maintenance cost ○ You need to fork the scheduler and keep it consistent with your Kubernetes version. ● There should be only one scheduler in the cluster. ○ You need to let all Pods go through the new scheduler in some way. ○ You may need to convince the people managing the scheduler in your cluster. (your infra team, cloud vendor, etc) ○ You may need to maintain multiple scheduling plugins owned by different teams in one scheduler. Hurdles for Plugin extension

Slide 32

Slide 32 text

32 ● Maintenance cost ○ You need to fork the scheduler and keep it consistent with your Kubernetes version. ● There should be only one scheduler in the cluster. ○ You need to let all Pods go through the new scheduler in some way. ○ You may need to convince the people managing the scheduler in your cluster. (your infra team, cloud vendor, etc) ○ You may need to maintain multiple scheduling plugins owned by different teams in one scheduler. Can we call it a true pluggable system? 🤔 Hurdles for Plugin extension

Slide 33

Slide 33 text

33 The wasm extension on the scheduler

Slide 34

Slide 34 text

34 WebAssembly is a way to safely run code compiled from other languages. ● Wasm runtimes execute wasm guests (xxxx.wasm) ● Wasm guests import functions from the host. ○ = They cannot do anything else. You may have heard of it around browsers, but… WebAssembly

Slide 35

Slide 35 text

35 WebAssembly for backend

Slide 36

Slide 36 text

36 kubernetes-sigs/kube-scheduler-wasm-extension

Slide 37

Slide 37 text

37 The wasm plugin is implemented to follow the Scheduling Framework, hosting a wasm runtime inside the scheduler to run the wasm guests. How it works Wasm plugin

Slide 38

Slide 38 text

38 It’s very tough for people unfamiliar with wasm to write their own wasm guests that satisfy the ABIs. We provide a TinyGo SDK so that people can develop wasm plugins with an experience similar to native Golang plugins – you just need to implement the interfaces. TinyGo SDK

Slide 39

Slide 39 text

39 Interfaces in SDK ScorePlugin interface

Slide 40

Slide 40 text

40 Interfaces in SDK Implement the interface & compile it to wasm
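A guest plugin built against the SDK looks roughly like the sketch below. This is a paraphrase, not the SDK's actual API: the real interfaces and registration hooks live in kubernetes-sigs/kube-scheduler-wasm-extension's guest packages, and the names `scorePlugin`, `preferZoneA`, and `registered` here are illustrative.

```go
package main

import "strings"

// scorePlugin mirrors the shape of an SDK-style scoring interface
// the guest implements (simplified for illustration).
type scorePlugin interface {
	Score(podName, nodeName string) int32
}

// preferZoneA is a toy plugin: nodes in "zone-a" score higher.
type preferZoneA struct{}

func (preferZoneA) Score(podName, nodeName string) int32 {
	if strings.HasPrefix(nodeName, "zone-a-") {
		return 100
	}
	return 0
}

// registered stands in for an SDK SetPlugin-style registration; the
// host calls exported wasm functions that dispatch to this value.
var registered scorePlugin = preferZoneA{}

// main is required by TinyGo; the host invokes exported functions
// rather than main. Build (illustrative): tinygo build -o plugin.wasm -target=wasi .
func main() {}
```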

Slide 41

Slide 41 text

41 The wasm plugins are portable: easy for communities to distribute, and easy to apply to your scheduler. Add your wasm plugin

Slide 42

Slide 42 text

42 The wasm plugins are portable: easy for communities to distribute, and easy to apply to your scheduler. Add your wasm plugin Wow, it can use http(s)!

Slide 43

Slide 43 text

43 It's a balanced solution between the extender and the Golang plugin. ● 👍 All extension points are available. ● 👍 No need to change the scheduler’s code, no need to rebuild! ● 👍 Easy to distribute plugins. (via http(s)) ● 👍 It can be written in many languages. ● 👎 A bad impact on latency. ● 👎 Wasm-specific limitations. So… will the wasm extension replace all plugins?

Slide 44

Slide 44 text

44 It's a balanced solution between the extender and the Golang plugin. ● 👍 All extension points are available. ● 👍 No need to change the scheduler’s code, no need to rebuild! ● 👍 Easy to distribute plugins. (via http(s)) ● 👍 It can be written in many languages. ● 👎 A bad impact on latency. ● 👎 Wasm-specific limitations. So… will the wasm extension replace all plugins?

Slide 45

Slide 45 text

45 Default scheduler (with no extension)

Slide 46

Slide 46 text

46 Twice as slow with one wasm extension

Slide 47

Slide 47 text

47 🤯 10 times slower with one extender

Slide 48

Slide 48 text

48 The Golang plugin can still be the best choice if… ● The scheduler’s latency is super critical in your cluster. ○ The bigger the cluster, the faster the scheduling needs to be. ● You need to do heavy calculation or handle tons of various objects. ○ Due to inlined GC and the latency of passing objects from host to guest. (both are discussed in a later section) So… will the wasm extension replace all plugins?

Slide 49

Slide 49 text

49 Deep dive into the wasm extension

Slide 50

Slide 50 text

50 The wasm plugin is implemented to follow the Scheduling Framework, hosting a wasm runtime inside the scheduler to run the wasm guests. How it works Wasm plugin

Slide 51

Slide 51 text

51 Wasm Go plugin Wasm Guest

Slide 52

Slide 52 text

52 A contract for how the host and the guest communicate. Just like an API, but with B instead of P (Application Binary Interface) ABI

Slide 53

Slide 53 text

53 WebAssembly’s sandbox model and its limitations: ● The guest can only operate on its own memory. ● The guest memory is exported to the host, so the host can read or write anything in it. ● Only numeric types are supported. The wasm function can only operate on its own memory

Slide 54

Slide 54 text

54 Example ABI to get a URI from the host. (http-wasm.io) ● Guest: allocates enough memory for the URI and gives the host the linear memory offset and maximum length in bytes. ● Host: puts the URI there and tells the guest its length. How to pass things from host
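The handoff above can be simulated in plain Go, modeling the guest's linear memory as a byte slice. This is a toy model, not the real ABI: the URI value, buffer sizes, and function names are illustrative, but the shape (only numbers cross the boundary; the host writes into guest-exported memory) matches the description.

```go
package main

import "fmt"

// guestMemory models the wasm guest's linear memory, which is
// exported so the host can read and write it directly.
var guestMemory = make([]byte, 1024)

// hostGetURI plays the host's role: write the URI into guest memory
// at the given offset, or report the required size if the buffer is
// too small. Only numbers (offset, lengths) cross the boundary.
func hostGetURI(offset, maxLen uint32) uint32 {
	uri := "/schedule/pod-a" // a value only the host knows
	if uint32(len(uri)) > maxLen {
		return uint32(len(uri)) // guest must retry with a bigger buffer
	}
	copy(guestMemory[offset:], uri)
	return uint32(len(uri))
}

// guestReadURI plays the guest's role: allocate a buffer, hand its
// offset and capacity to the host, then decode the bytes written there.
func guestReadURI() string {
	const offset, maxLen = 0, 256
	n := hostGetURI(offset, maxLen)
	return string(guestMemory[offset : offset+n])
}

func main() {
	fmt.Println("guest received:", guestReadURI())
}
```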

Slide 55

Slide 55 text

55 How to pass things from host

Slide 56

Slide 56 text

56 One function to fetch protobuf-encoded Pods, Nodes, etc. To address the performance concerns, we use: ● Lazy loading ● Caching ● Faster garbage collection ● (Other future planned enhancements) Protobuf encoding

Slide 57

Slide 57 text

57 Only fetch objects from the host when they are actually accessed. Lazy loading Pod is fetched. But NodeList isn’t.

Slide 58

Slide 58 text

58 The scheduler refers to the same resource status throughout one scheduling cycle. → So why not have a cache in the wasm guest! We reduce the number of host-guest communications as much as possible: ● Fetch objects from the host only when the guest actually needs them. ● Fetch the same object from the host only once during one scheduling cycle. Lazy loading with caching
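The two rules above can be sketched as a small cache in the guest. This is an illustrative model, not the project's code: `hostFetch` stands in for the real protobuf-decoding host call, and the type and method names are hypothetical.

```go
package main

import "fmt"

// objectCache sketches lazy loading with caching: call the host only
// on first access, reuse the cached copy for the rest of the cycle.
type objectCache struct {
	cache     map[string][]byte
	hostCalls int // counts round-trips to the host
}

func newObjectCache() *objectCache {
	return &objectCache{cache: map[string][]byte{}}
}

// hostFetch stands in for the expensive host call that returns a
// protobuf-encoded object.
func (c *objectCache) hostFetch(name string) []byte {
	c.hostCalls++
	return []byte("protobuf-encoded " + name) // placeholder payload
}

// Get fetches an object lazily: from the host on first access,
// from the cache afterwards.
func (c *objectCache) Get(name string) []byte {
	if obj, ok := c.cache[name]; ok {
		return obj
	}
	obj := c.hostFetch(name)
	c.cache[name] = obj
	return obj
}

// Reset drops the cache at the end of a scheduling cycle, since the
// next cycle may see updated resource status.
func (c *objectCache) Reset() { c.cache = map[string][]byte{} }

func main() {
	c := newObjectCache()
	c.Get("pod/a") // first access: one host call
	c.Get("pod/a") // cached: no extra host call
	fmt.Println("host calls:", c.hostCalls)
}
```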

Slide 59

Slide 59 text

59 ● Wasm has only one thread, and GC is inlined. ● The inlined GC overhead was over half the latency of a plugin execution 😢 Garbage collection overhead

Slide 60

Slide 60 text

60 wasilibs/nottinygc requires some flags but performs much better. We saw around a 50% latency reduction in some scenarios: nottinygc

Slide 61

Slide 61 text

61 Performance is a critical factor for the project. We have two levels of benchmark tests: ● Plugin-level benchmark tests, using the Golang benchmark test feature. ○ How much time each part takes. ● The scheduler_perf tests, running the wasm plugin in the scheduler and observing the scheduler’s metrics. ○ How much the wasm extension actually slows down scheduling. Benchmark tests
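The plugin-level benchmarks follow the standard Go benchmark shape. A minimal sketch, assuming a stand-in `scoreNode` function to measure (the real project benchmarks its wasm plugin calls, typically via `go test -bench`):

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// scoreNode stands in for a plugin's Score logic so there is
// something to measure; the rule is a toy.
func scoreNode(nodeName string) int64 {
	if strings.Contains(nodeName, "gpu") {
		return 100
	}
	return 10
}

// BenchmarkScore is the usual Go benchmark shape: the body runs
// b.N times and the framework reports ns/op.
func BenchmarkScore(b *testing.B) {
	for i := 0; i < b.N; i++ {
		scoreNode("gpu-node-1")
	}
}

func main() {
	// testing.Benchmark runs a benchmark outside `go test`.
	result := testing.Benchmark(BenchmarkScore)
	fmt.Println("ns/op:", result.NsPerOp())
}
```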

Slide 62

Slide 62 text

62 Project status / What’s next

Slide 63

Slide 63 text

63 It’s at an early stage, but it already has enough functionality to write a simple wasm plugin! See the examples if you’re interested! Project status

Slide 64

Slide 64 text

64 ● Support all extension points. ● Get resources other than Pods and Nodes. ● Further performance improvements. ● Guest examples in other languages. What’s left to do

Slide 65

Slide 65 text

65 ● Join us in #sig-scheduling on Kubernetes slack! ● Shout out to all contributors so far, especially Adrian for tons of contributions, and to everyone, especially Chris, for all the help bringing me here to Wellington. We ran through many things in 30 min 🏃💨. Thanks all! That’s all!

Slide 66

Slide 66 text

66 Unused slides 😅 Just leaving them here in case someone is interested.

Slide 67

Slide 67 text

67 How the scheduler works

Slide 68

Slide 68 text

68 Pod is created → started on Node Pod is created and the scheduler notices it.

Slide 69

Slide 69 text

69 Pod is created → started on Node The scheduler decides where to go. (= Scheduling) Node A

Slide 70

Slide 70 text

70 Pod is created → started on Node Node A

Slide 71

Slide 71 text

71 Pod is created → started on Node

Slide 72

Slide 72 text

72 Kubernetes scheduler 101

Slide 73

Slide 73 text

73 The scheduler needs to consider many things: ● The resource requests on the Pod. ● Affinity requirements on the Pod. (PodAffinity, NodeAffinity) ● How Pods are currently spread across each domain. (PodTopologySpread) ● Taints on Nodes / tolerations on the Pod. ● …etc Scheduling factors

Slide 74

Slide 74 text

74 The architecture inside the scheduler ● The scheduler is composed of many Plugins. ○ One scheduling factor = one plugin (NodeAffinity plugin, etc) ● Each plugin is created to work on one or more extension points. ○ Filter: filtering Nodes that don’t fit the requirements ○ Score: scoring the remaining Nodes ○ … Scheduling Framework

Slide 75

Slide 75 text

75 Scheduling Framework

Slide 76

Slide 76 text

76 Scheduling Framework

Slide 77

Slide 77 text

77 [Diagram: Node1, Node2, Node3, Node4 flow through the Filter extension point (FilterA, FilterB) and then the Score extension point (ScoreA, ScoreB)]

Slide 78

Slide 78 text

78 Role: filtering out Nodes that shouldn’t/cannot run the Pod. For example: ● Nodes that don’t have enough resources to run the Pod ● Nodes that don’t match a required NodeAffinity on the Pod ● … Scheduling Framework - Filter
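A Filter pass in that spirit can be sketched in a few lines of Go. The `node` type, its fields, and the CPU-only rule are illustrative, not the real framework types:

```go
package main

import "fmt"

// node is a toy stand-in for a cluster node with its free capacity.
type node struct {
	name    string
	freeCPU int64 // millicores
}

// filterNodes drops nodes that cannot satisfy the pod's CPU request,
// mirroring the Filter role described above.
func filterNodes(nodes []node, podCPU int64) []node {
	var feasible []node
	for _, n := range nodes {
		if n.freeCPU >= podCPU {
			feasible = append(feasible, n) // node can run the pod
		}
	}
	return feasible
}

func main() {
	nodes := []node{{"node1", 4000}, {"node2", 250}, {"node3", 1000}}
	for _, n := range filterNodes(nodes, 500) {
		fmt.Println("feasible:", n.name)
	}
}
```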

Slide 79

Slide 79 text

79 [Diagram: the same pipeline, with two Nodes rejected (×) by the Filter plugins before reaching Score]

Slide 80

Slide 80 text

80 Role: scoring each Node that passed all Filter plugins. For example, give higher scores to Nodes that ● already have the container image(s) of the Pod ● match a preferred NodeAffinity on the Pod ● … Scheduling Framework - Score
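The image-locality example above can be sketched as a Score pass. Again the types and the 100/10 weighting are illustrative, not the real framework's:

```go
package main

import (
	"fmt"
	"sort"
)

// scoredNode pairs a node name with the score a plugin assigned it.
type scoredNode struct {
	name  string
	score int64
}

// scoreNodes gives a higher score to nodes that already cached the
// pod's container image, so the image pull can be skipped.
func scoreNodes(nodes []string, hasImage map[string]bool) []scoredNode {
	scored := make([]scoredNode, 0, len(nodes))
	for _, n := range nodes {
		s := int64(10)
		if hasImage[n] {
			s = 100 // prefer nodes that already have the image
		}
		scored = append(scored, scoredNode{n, s})
	}
	// Highest score first: the scheduler picks among the top nodes.
	sort.Slice(scored, func(i, j int) bool { return scored[i].score > scored[j].score })
	return scored
}

func main() {
	ranked := scoreNodes([]string{"node1", "node2"}, map[string]bool{"node2": true})
	fmt.Println("best node:", ranked[0].name)
}
```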

Slide 81

Slide 81 text

81 [Diagram: the pipeline with two Nodes rejected (×) by Filter, and Score assigning the Nodes scores 50, 70, 40, 20]


Slide 82

Slide 82 text

82 The cluster-level scheduling configuration: ● Disable/enable plugins in the scheduler ● Change plugins’ behavior for all scheduling KubeScheduler Configuration

Slide 83

Slide 83 text

83 KubeScheduler Configuration disable/enable plugins

Slide 84

Slide 84 text

84 Change plugin’s behavior

Slide 85

Slide 85 text

85 We explored the Golang standard package named plugin, but gave up. You can see our investigation and discussion here: ● https://github.com/kubernetes/kubernetes/issues/106705 ● https://github.com/kubernetes/kubernetes/issues/100723 Several attempts to make it easier

Slide 86

Slide 86 text

86 People have started using Wasm to provide extensibility: load and run a wasm binary in the host. In Golang, there are already some examples: - Dapr Wasm middleware - Trivy Modules - knqyf263/go-plugin Powered by WebAssembly with Golang

Slide 87

Slide 87 text

87 ● Lack of exported functions ● Performance problem See this for more information. Why not normal Go GOOS=wasip1 GOARCH=wasm?