Slide 1

Slide 1 text

Scheduling recipes
Andrea Righi

Slide 2

Slide 2 text

What is a scheduler?
● Kernel component that determines
  ○ Where each task needs to run
  ○ When each task needs to run
  ○ For how long each task needs to run
● Scheduler
  ○ CPU allocator (space)
  ○ Task allocator (time)

Slide 3

Slide 3 text

sched_ext: extensible scheduler class
● Technology in the Linux kernel that allows scheduling policies to be implemented as BPF programs (GPLv2)
● Available since Linux v6.12
● Key features
  ○ Bespoke scheduling policies
  ○ Rapid experimentation
  ○ Safety (can’t crash the kernel)

Slide 4

Slide 4 text

BPF + user-space schedulers
[Diagram: three layers. Kernel: sched_ext core. BPF: BPF scheduler, driven through sched_ext callbacks. User space: user-space scheduler, via libbpf / libbpf-rs.]

Slide 5

Slide 5 text

Gaming: fps consistency
[Chart: EEVDF vs scx_flash]

Slide 6

Slide 6 text

Server workload: Llama benchmark
● ~2.6x throughput

Slide 7

Slide 7 text

Ingredients

Slide 8

Slide 8 text

Shared runqueue

Slide 9

Slide 9 text

Per-CPU runqueues

Slide 10

Slide 10 text

Runtime allocation (fairness)

Slide 11

Slide 11 text

Topology complexity
NUMA node 0
  L3 cache
    Core 0 (big): CPU 0, CPU 1
    Core 1 (big): CPU 0, CPU 1
  L3 cache
    Core 2 (LITTLE): CPU 0, CPU 1
    Core 3 (LITTLE): CPU 0, CPU 1
NUMA node 1
  L3 cache
    Core 0 (big): CPU 0, CPU 1
    Core 1 (big): CPU 0, CPU 1
  L3 cache
    Core 2 (LITTLE): CPU 0, CPU 1
    Core 3 (LITTLE): CPU 0, CPU 1
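One way to reason about the topology on this slide is to model it as data and rank CPU pairs by how much hardware they share. The sketch below is a user-space toy model, not sched_ext code: the `Cpu` record, the global CPU and core numbering, and the `distance` scale are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    cpu_id: int  # global CPU number (hypothetical renumbering of the slide)
    node: int    # NUMA node
    llc: int     # L3 cache domain, numbered globally
    core: int    # physical core, numbered globally
    big: bool    # True for "big" cores, False for "LITTLE" cores


def build_topology():
    """Build the slide's layout: 2 nodes x 2 L3 caches x 2 cores x 2 CPUs."""
    cpus = []
    cpu_id = 0
    for node in range(2):
        for llc_in_node in range(2):
            llc = node * 2 + llc_in_node
            big = llc_in_node == 0  # first L3 of each node holds the big cores
            for core_in_llc in range(2):
                core = llc * 2 + core_in_llc
                for _ in range(2):  # two CPUs (hardware threads) per core
                    cpus.append(Cpu(cpu_id, node, llc, core, big))
                    cpu_id += 1
    return cpus


def distance(a, b):
    """Assumed scale: 0 same CPU, 1 same core, 2 same L3, 3 same node, 4 remote."""
    if a.cpu_id == b.cpu_id:
        return 0
    if a.core == b.core:
        return 1
    if a.llc == b.llc:
        return 2
    if a.node == b.node:
        return 3
    return 4
```

A scheduler could use a ranking like this to prefer the closest idle CPU when migrating a task, though real policies also have to weigh core type and load.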

Slide 12

Slide 12 text

Task wake-up
[Diagram: a waker task on CPU 0 wakes up a wakee task on CPU 1; each CPU has its own cache]

Slide 13

Slide 13 text

Recipes

Slide 14

Slide 14 text

Recipe #1: The empty scheduler
● Empty scheduler (use sched_ext default)
  ○ Global run-queue
  ○ Round-robin scheduler (20ms time slice)
  ○ Built-in idle CPU selection policy
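The default behaviour described here (one global queue, round-robin, fixed time slice) can be illustrated with a small user-space simulation. This is a toy model in Python, not the sched_ext implementation: the `round_robin` function and its single-CPU assumption are hypothetical, and only the 20ms slice comes from the slide.

```python
from collections import deque

SLICE_MS = 20  # time slice quoted on the slide


def round_robin(tasks, slice_ms=SLICE_MS):
    """Simulate one CPU draining a single global FIFO queue.

    tasks: dict mapping task name -> remaining runtime in ms.
    Returns a list of (task, ran_ms) tuples in execution order.
    """
    queue = deque(tasks.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        ran = min(slice_ms, remaining)
        timeline.append((name, ran))
        if remaining > ran:
            # Slice expired with work left: go to the back of the queue.
            queue.append((name, remaining - ran))
    return timeline
```

For example, `round_robin({"a": 30, "b": 20})` runs `a` for 20ms, then `b` for 20ms, then `a` for its remaining 10ms.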

Slide 15

Slide 15 text

Recipe #2: The global vs local runqueue scheduler
● Round-robin scheduler that can use either a single global queue or multiple per-CPU runqueues
  ○ Time slice scaled proportionally to the task’s priority (weight) and inversely proportionally to the number of contending tasks (fairness)
  ○ Perfect load balancing (global) vs bad load balancing (local)
  ○ Not really good if you overload the system (round-robin)
  ○ Bad for cache locality and scalability (global) vs good for cache locality and scalability (local)
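The time slice rule in this recipe (proportional to weight, inversely proportional to the number of contending tasks) can be written down as a one-line formula. The sketch below is an assumption about its shape, not the demo scheduler's actual code: the 20ms base slice is borrowed from Recipe #1, while the default weight of 100 and the 1ms lower bound are hypothetical parameters chosen for illustration.

```python
BASE_SLICE_MS = 20.0   # assumed base slice, taken from Recipe #1
DEFAULT_WEIGHT = 100   # assumed weight of a default-priority task


def time_slice_ms(weight, nr_contending, base_ms=BASE_SLICE_MS, min_ms=1.0):
    """Scale the slice up with task weight and down with queue contention."""
    nr = max(nr_contending, 1)  # avoid dividing by zero on an empty queue
    slice_ms = base_ms * weight / DEFAULT_WEIGHT / nr
    return max(slice_ms, min_ms)  # clamp so slices never become too short
```

With these assumptions, a task of weight 200 competing with three others (four contending tasks in total) gets `time_slice_ms(200, 4)`, which is 10.0 ms.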

Slide 16

Slide 16 text

Recipe #3: The “yell at your PC” scheduler
● Adjust the number of CPUs used based on the noise level around your PC
  ○ Shows potential BPF / user-space interactions
  ○ Good for energy consumption (if the environment is quiet)
  ○ Not usable in public places (e.g., library, open-plan office, etc.)
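The user-space half of this recipe needs some policy that turns a noise measurement into a CPU count. A minimal sketch of one such policy follows; the linear mapping, the `cpus_for_noise` name, and the 30 dB / 90 dB thresholds are all hypothetical and not taken from the demo scheduler.

```python
def cpus_for_noise(noise_db, nr_cpus, quiet_db=30.0, loud_db=90.0):
    """Map a noise level linearly onto 1..nr_cpus usable CPUs.

    At or below quiet_db only one CPU is used; at or above loud_db all
    CPUs are used. Thresholds are illustrative assumptions.
    """
    if nr_cpus < 1:
        raise ValueError("nr_cpus must be at least 1")
    frac = (noise_db - quiet_db) / (loud_db - quiet_db)
    frac = min(max(frac, 0.0), 1.0)  # clamp to [0, 1]
    return 1 + round(frac * (nr_cpus - 1))
```

In a hybrid design the user-space program would publish this number to the BPF side (for example through a BPF map), and the BPF scheduler would restrict dispatching to that many CPUs.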

Slide 17

Slide 17 text

Conclusion

Slide 18

Slide 18 text

Key takeaways
● The one-size-fits-all scheduler approach is no longer sufficient
● CPU allocation can be more effective than time allocation
● Shared queues vs local queues can be relevant
● Prioritizing waker → wakee pipelines can help improve responsiveness
● Hybrid schedulers (BPF + user space) have great potential

Slide 19

Slide 19 text

References
● Demo schedulers: https://github.com/sched-ext/scx/tree/kr-demo

Slide 20

Slide 20 text

Questions