
Queue theory 101 (node.js edition)


Queueing theory is perhaps one of the most important mathematical theories in systems design and analysis, yet few engineers ever learn it. This talk introduces the basics of queueing theory and explores the ramifications of queue behavior on system performance and resiliency, with emphasis on async and Node.js behavior.

Avishai Ish-Shalom

November 15, 2021



  1. Attack of the killer queues They are everywhere! In your

    drivers, your sockets, your event loop! No one is safe
  2. • Distributions have width • Improbable results do happen •

    Aggregate effects vs. particular effects • A single numeric aggregate cannot capture the behavior The world is made of distributions
  3. Variability/Dispersion • How “wide” the distribution is • Various measures:

    stddev, Variance, IQD, MAD... • Distributions are infinite, our systems are not ⇒ cutoffs, timeouts • Easy to raise variation, hard to reduce it
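To make the measures above concrete, here is a small sketch (with a made-up latency sample) comparing stddev, which an outlier dominates, against MAD (median absolute deviation), which is robust to it:

```javascript
// Hypothetical latency sample in ms, with a single outlier.
const sample = [10, 12, 11, 9, 13, 200];

const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length;

const stddev = xs => {
  const m = mean(xs);
  return Math.sqrt(mean(xs.map(x => (x - m) ** 2)));
};

const median = xs => {
  const s = [...xs].sort((a, b) => a - b);
  const mid = s.length >> 1;
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
};

// Median absolute deviation: median distance from the median.
const mad = xs => {
  const med = median(xs);
  return median(xs.map(x => Math.abs(x - med)));
};

console.log(stddev(sample).toFixed(1)); // ~70.4: dominated by the outlier
console.log(mad(sample));               // 1.5: barely notices it
```

The same data yields wildly different "width" numbers depending on the measure, which is why no single aggregate tells the whole story.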
  4. Variability effects on utilization Suppose you need to get from

    Jerusalem to Tel-Aviv: • Train takes 40 minutes • Mean delay = 5 minutes • Delay P90 = 30 minutes • Delay P99 = 60 minutes How early should you leave to be in Tel-Aviv by noon? With which SLA? How much time are you wasting in total?
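Working through the slide's numbers (a rough sketch; the delay figures are the ones given above):

```javascript
// Numbers from the slide: a 40-minute ride plus a delay distribution.
const RIDE_MIN = 40;
const delay = { mean: 5, p90: 30, p99: 60 };

// To arrive by noon with a given confidence, budget for that delay percentile.
const leaveMinutesBeforeNoon = p => RIDE_MIN + delay[p];

console.log(leaveMinutesBeforeNoon('p90')); // 70: 90% SLA
console.log(leaveMinutesBeforeNoon('p99')); // 100: 99% SLA

// The *expected* trip is only 45 minutes, so meeting a 99% SLA
// wastes ~55 minutes on a typical day:
console.log(leaveMinutesBeforeNoon('p99') - (RIDE_MIN + delay.mean)); // 55
```

The gap between the mean (5 min) and the tail (60 min) is pure variability cost: the wider the delay distribution, the more capacity (here, your time) you must waste to hit the SLA.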
  5. The curse of high variation • Utilization is limited by

    high variation • Group work latency follows high percentiles (think Map/Reduce, Fork/Join) • Customer satisfaction follows high percentiles • Disasters follow tail behavior • Failure demand (e.g. retries)
  6. Head of line blocking • When some task takes longer,

    service center is “blocked” • Other tasks in the queue are blocked by the “head of line” • A single slow task will cause a bunch of other tasks to wait ◦ Bad news for high latency percentiles
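A toy FIFO simulation makes the head-of-line effect concrete (a sketch: all tasks arrive at once, one service center):

```javascript
// Toy FIFO queue: all tasks arrive at t=0, one service center.
// Returns how long each task waits before its service starts.
function fifoWaits(serviceTimes) {
  let clock = 0;
  return serviceTimes.map(s => {
    const wait = clock;
    clock += s;
    return wait;
  });
}

// One slow (10ms) task at the head, four fast (1ms) tasks behind it:
console.log(fifoWaits([10, 1, 1, 1, 1])); // [ 0, 10, 11, 12, 13 ]
// Every task behind the slow one waits >= 10ms, regardless of its own cost.
```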
  7. Tasks should be independent, but... • Shared resources have queues

    ◦ Disks, CPUs, Thread pools, connection pools, DB locks, sockets, event loop… • Event loop phases share the same service center • Head-of-line blocking → cross task interaction ◦ Slow tasks raise latency of unrelated tasks ◦ Arrival spikes • High variance makes this worse
  8. Capacity & latency • Queue length (and latency) rise to

    infinity as utilization approaches 1 • Decent latency ⇒ over-provisioned capacity • The slower the service, the higher the penalty ρ = arrival rate / service rate = utilization, Q = queue length http://queuemulator.gh.scylladb.com/
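For the simplest textbook model (M/M/1: Poisson arrivals, exponential service, one server), the curve the slide describes has a closed form, Q = ρ/(1−ρ). A quick sketch:

```javascript
// M/M/1 steady-state approximations. This is a simplification of real
// systems, but it shows the shape of the curve as utilization -> 1.
const utilization = (arrivalRate, serviceRate) => arrivalRate / serviceRate;
const queueLength = rho => rho / (1 - rho);                    // mean jobs in system
const latency = (rho, serviceTime) => serviceTime / (1 - rho); // mean time in system

for (const rho of [0.5, 0.8, 0.9, 0.99]) {
  console.log(rho, queueLength(rho).toFixed(1), latency(rho, 1).toFixed(1));
}
// rho = 0.5 -> Q = 1;  rho = 0.9 -> Q = 9;  rho = 0.99 -> Q = 99
// Queue length and latency blow up non-linearly as rho approaches 1.
```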
  9. Implications Infinite queues: • Memory pressure / OOM • High

    latency • Stale work Always limit queue size! Give work items a TTL
  10. • 10% fluctuation at 𝜌 = 0.5 will hardly affect

    latency (~ 1.1x) • 10% fluctuation at 𝜌 = 0.9 will kill you (~ 10x latency) • Be careful when overloading resources • During peak load we must be extra careful • Highly varied load must be capped Utilization fluctuates
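The slide's ~1.1x and ~10x figures fall straight out of the 1/(1−ρ) latency scaling; a quick check:

```javascript
// Latency scales like 1/(1 - rho). Compare latency before and after
// a fractional load bump (e.g. bump = 0.1 for a 10% fluctuation).
const latencyRatio = (rho, bump) => (1 - rho) / (1 - rho * (1 + bump));

console.log(latencyRatio(0.5, 0.1).toFixed(2)); // ~1.11: barely noticeable
console.log(latencyRatio(0.9, 0.1).toFixed(2)); // ~10.00: catastrophic
```

Same 10% wobble in load, wildly different outcomes: headroom at ρ = 0.5 absorbs it, while at ρ = 0.9 the same wobble consumes nearly all remaining capacity.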
  11. Kingman formula • The higher the variance, the worse the

    latency/utilization curve gets • On both service rate and arrival rate • High variance ⇒ run at low utilization * Oh and btw your percentile curve is worse too Queuemulator
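Kingman's approximation for a G/G/1 queue ties the slide together: mean wait ≈ (ρ/(1−ρ)) · ((Ca² + Cs²)/2) · τ, where Ca² and Cs² are the squared coefficients of variation of inter-arrival and service times, and τ is the mean service time. A sketch:

```javascript
// Kingman's formula (VUT equation) for mean wait in a G/G/1 queue:
//   Wq ~= rho/(1 - rho) * (ca2 + cs2)/2 * serviceTime
// ca2, cs2: squared coefficients of variation of arrivals and service.
const kingmanWait = (rho, ca2, cs2, serviceTime) =>
  (rho / (1 - rho)) * ((ca2 + cs2) / 2) * serviceTime;

// Same utilization (0.9), same mean service time (1):
console.log(kingmanWait(0.9, 1, 1, 1).toFixed(1)); // ~9.0  (M/M/1-like)
console.log(kingmanWait(0.9, 4, 1, 1).toFixed(1)); // ~22.5 (bursty arrivals)
```

Doubling the arrival burstiness (Ca² from 1 to 4) multiplies the wait by 2.5x at the *same* utilization, which is why high variance forces you to run at low utilization.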
  12. • High utilization → high latency ◦ Non-linear! • High

    variance → high latency • Never use unlimited queues • Interactive systems→ short queues; Batch systems → long queues • Maintain proper utilization Executive summary
  13. Node queueing summary • Event loop queues are unlimited • Easy

    to overload • Blocking ⇒ high latency • Large microtasks kill QoS • await/.then()/process.nextTick() can still hog the event loop
  14. Avoid blocking the event loop; specifically, CPU heavy tasks •

    Immediate suspects: large JSONs, RegEx, SSR ◦ REDoS, JSON DoS ◦ Size limits ◦ Use async/stream friendly JSON parsers (bfj, JSONStream) ◦ Offload server side rendering (react/vue/angular) to workers • Offload heavy tasks to workers, remote processes (piscina) • Limit loops, recursion, etc. • Avoid sync functions Thou shalt not block!
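The "size limits" bullet can be sketched as a guard in front of JSON.parse (names here are hypothetical; the limit is an example, not a recommendation):

```javascript
// Hypothetical guard: refuse to JSON.parse oversized payloads on the
// event loop. Parse time grows with input size and blocks everything
// else while it runs.
const MAX_JSON_BYTES = 1 << 20; // 1 MiB; pick a limit that fits your SLA

function parseJsonBounded(text) {
  if (Buffer.byteLength(text, 'utf8') > MAX_JSON_BYTES) {
    throw new Error('payload too large, refusing to parse synchronously');
  }
  return JSON.parse(text);
}

console.log(parseJsonBounded('{"ok":true}').ok); // true
```

For payloads that legitimately exceed the limit, stream-parse them (e.g. the bfj/JSONStream libraries the slide mentions) or hand them to a worker instead of rejecting.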
  15. • Split to small microtasks • Use setImmediate to unblock

    the loop • Work will continue in the check phase • Remember: Promise.then()/await/process.nextTick() requeue as microtasks and will not yield the loop When in doubt, defer const yieldControl = () => new Promise((resolve) => setImmediate(resolve)) // Do something await yieldControl() // Let other tasks run // Do more work after waking up
  16. Apply some backpressure baby If the upstream applies pressure on

    you, apply pressure backwards on the upstream! • Load needs to be controlled to avoid overload • How do we tell upstreams we’re overloaded? • Blocking semantics implicitly apply backpressure • Network protocols support this (TCP backpressure, HTTP 429, 503, etc)
  17. Backpressure? But how? • For the lazy: Limit HTTP connections

    (express, koa) ◦ TCP backpressure • Limit concurrency (promise-pool, token buckets) • Reject requests when event loop lag rises (node-toobusy) ◦ HTTP backpressure: 503, 429 • When in doubt, await
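The "token buckets" bullet can be sketched in a few lines (a minimal illustration, not a production limiter; real services would pair this with a 429/503 response):

```javascript
// Minimal token bucket: admit a request only while tokens remain,
// refilling at a fixed rate. Requests that find no token are shed
// (respond 429/503) rather than queued.
class TokenBucket {
  constructor(capacity, refillPerSec) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillPerSec = refillPerSec;
    this.last = Date.now();
  }
  tryRemove() {
    const now = Date.now();
    this.tokens = Math.min(
      this.capacity,
      this.tokens + ((now - this.last) / 1000) * this.refillPerSec
    );
    this.last = now;
    if (this.tokens >= 1) { this.tokens -= 1; return true; }
    return false; // shed instead of queueing
  }
}

const bucket = new TokenBucket(2, 10); // burst of 2, refill 10/sec
console.log(bucket.tryRemove(), bucket.tryRemove(), bucket.tryRemove());
// true true false -- the third request is rejected, keeping the queue bounded
```

Shedding early is the whole point: a rejected request costs a few microseconds, while a queued one at high utilization can wait far longer than its own service time.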
  18. TLDR • Never block the event loop • Break work into

    small microtasks, defer • Event loop queueing will kill your latency • Monitor event loop lag • Do not overload. Use backpressure and load shedding • Maintain proper (low) utilization • Reduce variation wherever possible