Slide 1

Slide 1 text

Danian: Tail latency reduction of networking application through an O(1) scheduler
Gustavo Pantuza, Lucas A. C. Bleme, Marcos Augusto M. Vieira, Luiz Filipe M. Vieira
26th IEEE Symposium on Computers and Communications
Athens, Greece, September 5-8, 2021
IEEE ISCC 2021

Slide 2

Slide 2 text

Agenda
■ Introduction
■ Thread scheduling
■ Caladan
■ Danian
■ Experiments
■ Results
■ Future work
■ Conclusion

Slide 3

Slide 3 text

Introduction
■ The Tail at Scale (2013)
■ Shenango (2019)
■ Caladan (2020)
■ Danian (2021)
Hypothetical example of latency percentiles: p50 = 1 ms, p95 = 5 ms, p99 = 10 ms
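The percentiles in the hypothetical example can be computed from raw latency samples. The following is a minimal sketch using the nearest-rank method; the function name `latency_percentile` and the choice of nearest-rank are assumptions made for illustration and are not taken from the talk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator: ascending order for doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/*
 * Hypothetical helper (not from the talk): nearest-rank percentile.
 * Sorts `samples` in place and returns the value at rank
 * ceil(pct * n / 100), with ranks counted from 1.
 */
double latency_percentile(double *samples, size_t n, unsigned int pct)
{
    assert(n > 0 && pct >= 1 && pct <= 100);
    qsort(samples, n, sizeof(samples[0]), cmp_double);
    size_t rank = ((size_t)pct * n + 99) / 100; /* integer ceiling */
    return samples[rank - 1];
}
```

Calling `latency_percentile(samples, n, 99)` returns the p99 value, the sample that 99% of requests do not exceed. This is the "tail" the rest of the talk targets.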

Slide 4

Slide 4 text

Thread scheduling
■ Lottery Scheduling (1994)
■ Scheduler Activations (1991)
■ Caladan (2020)

Slide 5

Slide 5 text

Caladan
■ Schedules threads onto CPUs
■ Runs on top of DPDK
■ Reads control signals every 5 μs
■ Implemented inside Shenango

Slide 6

Slide 6 text

Caladan
Figure: simplified Caladan architecture, based on the architecture description in the original Caladan paper

Slide 7

Slide 7 text

Danian
“In the 5000 years between the events of the Arrakis Revolt and the time the Lost Ones returned from The Scattering, Caladan's name was shortened to Dan, and all things pertaining to Dan were known as Danian.”
Source: https://dune.fandom.com/wiki/Caladan

Slide 8

Slide 8 text

Danian
■ Works inside the Caladan ksched
■ Adds a memoization array
■ Intercepts thread join/leave events
■ Algorithm to assign CPU→thread
■ O(n) → O(1)
Source: https://dune.fandom.com/wiki/Caladan
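The O(n) → O(1) change can be illustrated by contrasting a linear scan with a per-core memo lookup. This is a sketch only: `toy_thread`, `pick_by_scan`, `pick_by_memo`, and both constants are hypothetical names invented for the example, and the scan is a generic linear search, not Caladan's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCPU 8      /* illustrative core count (assumption) */
#define NTHREADS 16 /* illustrative thread count (assumption) */

/* Hypothetical stand-in for a scheduler thread record. */
struct toy_thread {
    bool idle;              /* true when the thread is parked */
    unsigned int last_core; /* core the thread last ran on */
};

/* O(n): walk every thread looking for an idle one that last ran on `core`. */
struct toy_thread *pick_by_scan(struct toy_thread *ths, size_t n,
                                unsigned int core)
{
    for (size_t i = 0; i < n; i++)
        if (ths[i].idle && ths[i].last_core == core)
            return &ths[i];
    return NULL;
}

/* O(1): a single array lookup in a per-core memo table, no loop. */
struct toy_thread *pick_by_memo(struct toy_thread *memo[NCPU],
                                unsigned int core)
{
    assert(core < NCPU);
    struct toy_thread *th = memo[core];
    return (th != NULL && th->idle) ? th : NULL;
}
```

Both functions answer the same question, but the cost of the memo lookup does not grow with the number of threads. The price is that the memo table has to be kept up to date whenever a thread joins or leaves.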

Slide 9

Slide 9 text

Danian

struct proc {
        pid_t pid;
        ...
        struct thread *last_run[NCPU];  /* per-core memoization array */
        ...
};

Slide 10

Slide 10 text

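Danian

static struct thread *
sched_pick_last_kthread(struct proc *p, unsigned int core)
{
        struct thread *th;

        /* O(1) lookup: the thread that last ran on this core */
        th = p->last_run[core];
        if (th && !th->active) {
                /* not currently active, so it can be reused */
                return th;
        }

        /* otherwise fall back to the tail of the idle list */
        return list_tail(&p->idle_threads, struct thread, idle_link);
}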
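The slides do not show how `last_run` is kept current. The sketch below is one possible way the join/leave interception could maintain it. It assumes "join" means a thread starts running on a core and "leave" means it parks. The hook names `on_thread_join` and `on_thread_leave`, the helper `pick_memoized`, and the trimmed-down structs are all hypothetical and are not taken from Caladan or Danian.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCPU 4 /* illustrative core count (assumption) */

/* Hypothetical, trimmed-down stand-ins for the structures on the slides. */
struct thread {
    bool active; /* true while the thread runs on a core */
};

struct proc {
    struct thread *last_run[NCPU]; /* memo: last thread seen on each core */
};

/* Hypothetical hook: thread `th` starts running on `core`. */
void on_thread_join(struct proc *p, struct thread *th, unsigned int core)
{
    assert(core < NCPU);
    th->active = true;
    p->last_run[core] = th; /* remember it for the next pick on this core */
}

/* Hypothetical hook: thread `th` parks; its memo entry is kept for reuse. */
void on_thread_leave(struct thread *th)
{
    th->active = false;
}

/* Memo lookup; NULL tells the caller to fall back to the idle list. */
struct thread *pick_memoized(struct proc *p, unsigned int core)
{
    assert(core < NCPU);
    struct thread *th = p->last_run[core];
    return (th != NULL && !th->active) ? th : NULL;
}
```

Each hook does a constant amount of work, so the bookkeeping does not reintroduce a cost that depends on the number of threads.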

Slide 11

Slide 11 text

Experiments
■ CloudLab
■ Client/Server
■ Latency percentiles
■ Varying number of threads
■ CPU usage
■ Netperf

Slide 12

Slide 12 text

Experiments

Slide 13

Slide 13 text

Results

Slide 14

Slide 14 text

Results

Slide 15

Slide 15 text

Results

Slide 16

Slide 16 text

Results

Slide 17

Slide 17 text

Results

Slide 18

Slide 18 text

Results

Slide 19

Slide 19 text

Future work
■ NFVs with lthreads inside Caladan
■ SDN using Caladan as the control plane
Source: https://dune.fandom.com/wiki/Caladan

Slide 20

Slide 20 text

Conclusion
■ Thread picking reduced from O(n) to O(1)
■ Memoization using an LRU policy
■ 5% reduction in tail latency (p99)
■ 15% reduction in CPU usage
