
Danian: tail latency reduction of networking application through an O(1) scheduler

Gustavo Pantuza
September 01, 2021

Paper presented at: ISCC 2021

Abstract:
Core allocation for application threads is a problem of non-trivial complexity and computational cost in Unix systems. The Caladan scheduler aims to reduce the cost of allocating threads to cores at microsecond scale. The Danian system uses memoization to optimize Caladan's thread-picking algorithm, which selects the best thread for a given core. These improvements directly affect applications distributed across a data center network. The cost of the thread-picking operation dropped from O(n) to O(1), CPU time was reduced by 7%, and tail latency was reduced by 3% in the Caladan synthetic experiment and by 5% in the Netperf experiment.

Key words:
Real Time Communication Services,
Distributed Systems Architecture and Management,
Optimization and Management,
Network Reliability,
Network Design

Transcript

  1. Danian: Tail latency reduction of networking application through an O(1) scheduler

    Gustavo Pantuza, Lucas A. C. Bleme, Marcos Augusto M. Vieira, Luiz Filipe M. Vieira
    26th IEEE Symposium on Computers and Communications (IEEE ISCC 2021), Athens, Greece, September 5-8, 2021
  2. Agenda

    ▪ Introduction ▪ Thread scheduling ▪ Caladan ▪ Danian ▪ Experiments ▪ Results ▪ Future work ▪ Conclusion
  3. Introduction

    ▪ The Tail at Scale (2013) ▪ Shenango (2019) ▪ Caladan (2020) ▪ Danian (2021)
    [Figure: Hypothetical Example; latency percentiles p50, p95, p99 shown against 1 ms, 5 ms, 10 ms]
  4. Thread scheduling

    ▪ Lottery Scheduling (1994) ▪ Scheduler Activations (1991) ▪ Caladan (2020)
  5. Caladan

    ▪ Schedules threads onto CPUs ▪ Runs on top of DPDK ▪ Reads control signals every 5 μs ▪ Implemented inside Shenango
  6. Caladan

    [Figure: Simplified version of the Caladan architecture, based on the architecture description in the original Caladan paper]
  7. Danian

    “In the 5000 years between the events of the Arrakis Revolt and the time the Lost Ones returned from The Scattering, Caladan's name was shortened to Dan, and all things pertaining to Dan were known as Danian.” Source: https://dune.fandom.com/wiki/Caladan
  8. Danian

    ▪ Works inside Caladan's ksched ▪ Adds a memoization array ▪ Intercepts thread join/leave events ▪ Algorithm to assign CPU→thread ▪ O(n) → O(1)
    Source: https://dune.fandom.com/wiki/Caladan
  9. Danian

    struct proc {
        pid_t pid;
        /* ... */
        struct thread *last_run[NCPU];  /* memo: last thread that ran on each core */
        /* ... */
    };
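  10. Danian

    static struct thread *
    sched_pick_last_kthread(struct proc *p, unsigned int core)
    {
        struct thread *th;

        /* memoized lookup: the thread that last ran on this core */
        th = p->last_run[core];
        if (th && !th->active) {  /* guard against an empty memo slot */
            return th;
        }

        /* fallback: the thread at the tail of the idle list */
        return list_tail(&p->idle_threads, struct thread, idle_link);
    }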
  11. Experiments

    ▪ CloudLab ▪ Client/Server ▪ Latency percentiles ▪ Varying number of threads ▪ CPU usage ▪ Netperf
  12. Experiments

  13. Results

  14. Results

  15. Results

  16. Results

  17. Results

  18. Results

  19. Future work

    ▪ NFVs with lthreads inside Caladan ▪ SDN using Caladan as the control plane
    Source: https://dune.fandom.com/wiki/Caladan
  20. Conclusion

    ▪ Thread picking from O(n) to O(1) ▪ Memoization using LRU policy ▪ -5% on tail latency (p99) ▪ -15% CPU usage