Tail Latency in Node.js: Energy Efficient Turbo Boosting for Long Latency Requests in Event-Driven Web Services

VEE 2019

Yuhao Zhu

April 14, 2019

Transcript

  1. 1.

    Wenzhi Cui, Daniel Richins, Yuhao Zhu, Vijay Janapa Reddi

    Tail Latency in Node.js: Energy Efficient Turbo Boosting for Long Tail Requests in JavaScript
  3. 6.

    Connecting Things (2020s): 50 Billion Devices

    “The Internet of Things” (Cisco)
    www.cisco.com/web/about/ac79/docs/innov/IoT_IBSG_0411FINAL.pdf
  7. 18.

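    Event-driven Programming: Emerging Approach

    [Figure: the application's tasks enter an event queue (head to tail) that is drained by a single-threaded event loop; DB access, file I/O, and network operations are handled as asynchronous I/O.]

        const fs = require('fs');

        // The callback runs later, once the file contents are available.
        fs.readFile('input.txt', function (err, data) {
          if (err) return console.error(err);
          console.log(data.toString());
        });
        // Runs immediately, without waiting for the read to finish.
        console.log("Program continues…");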
  12. 29.

    Experimental Setup

    ▸ 1 Gbps network
    ▸ Intel i7-4790K, 4 physical cores with hyper-threading, 32 GB DRAM, 240 GB SSD
    ▸ wrk2: a customized load-testing tool that simulates real-world workloads
  13. 30.

    Applications

    Benchmark       I/O Type   #Requests   Description
    Etherpad lite   N/A        20K         Real-time word processor
    Todo            Redis      40K         Online task manager
    Lighter         Disk       40K         Blogging engine
    Let’s Chat      MongoDB    10K         Web-based chat application
    Client Manager  MongoDB    40K         Online address book

    GitHub repo: https://github.com/nodebenchmark/benchmarks
  18. 35.

    Etherpad: An Example

    [Figure: CDF of request latency (ms) with the tail region highlighted. Marked points: 50% at 1.85 ms, 90% at 7.30 ms, 99.9% at 36.91 ms.]
  20. 37.

    [Figure: CDFs of request latency (ms) for Todo, Lighter, Client Manager, and Let’s Chat, each with its tail region highlighted. Marked (latency in ms, percentile) points on the four curves: (0.47, 50%), (1.12, 90%), (1.80, 99.9%); (0.81, 50%), (1.40, 90%), (8.75, 99.9%); (12.52, 50%), (20.30, 90%), (39.54, 99.9%); (1.65, 50%), (2.66, 90%), (12.99, 99.9%).]

    On average across the five applications, tail latency (99.9%) is 9.1x longer than median request latency.
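The 9.1x figure can be checked against the (median, 99.9%) points marked on the CDFs. The sketch below is illustrative only: `percentile` uses the nearest-rank definition, which is an assumption (the slides do not say how percentiles were computed), and both function names are hypothetical.

```javascript
'use strict';

// Nearest-rank percentile over a list of latency samples (ms).
// Assumption: the slides do not state which percentile definition was used.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// [median, 99.9th percentile] in ms, as marked on the CDFs above
// (Etherpad first, then the four other applications).
const reported = [
  [1.85, 36.91],
  [0.47, 1.80],
  [0.81, 8.75],
  [12.52, 39.54],
  [1.65, 12.99],
];

// Mean of the per-application tail-to-median ratios.
function meanTailRatio(points) {
  const ratios = points.map(([median, tail]) => tail / median);
  return ratios.reduce((sum, r) => sum + r, 0) / ratios.length;
}

console.log(meanTailRatio(reported).toFixed(1) + 'x'); // prints "9.1x"
```

The per-application ratios range from roughly 3x to 20x, so the mean hides considerable variation between workloads.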
  21. 39.

    System Overview

    Tools to root-cause tail latency in Node.js, in three steps: Step 1, Step 2, Step 3.
  28. 46.

    System Overview

    ▸ Step 1: Tail Latency Reconstruction
      ▹ Instrumented layers: event-driven runtime (libuv), JS to C++ bindings, JavaScript runtime (V8), File/Net JS libraries
      ▹ Per-request (Req1, Req2, …) event data yields the Event-Dependency Graph (EDG) over events e1 to e5 and its event critical path
    ▸ Step 2: Tail Latency Bottleneck Analysis
      ▹ Request latency is split into Exec, Queue, and I/O for each request
      ▹ Output: IO (%), Queue (%), Exec (%), GC (%), JIT (%), …
    ▸ Step 3: Tail Latency Optimization
      ▹ Queue Boosting (event queue), VM Boosting, VM Optimization, VM Tuning

    Steps 1 and 2 root-cause the tail in Node.js; Step 3 mitigates it. The approach combines static instrumentation with dynamic analysis and optimization.
  33. 51.

    Step 1: Latency Reconstruction

        var count = N; // N: number of files still to be read
        fs.readdir("/data", function dir(err, files) {
          files.forEach(function file(f, index) {
            var fname = …;
            fs.readFile(fname, function read(err, data) {
              count -= 1;
              // Respond once the last outstanding read has completed.
              if (count == 0) sendResponse();
            });
          });
        });

    [Figure: event graph for the code above. Source -> readdir -> readFile1, readFile2, readFile3, …, readFileN -> Sink.]
  36. 55.

    Event Dependency Graph (EDG)

    [Figure: an Event-Dependency Graph (EDG) over events e1 to e5, with the event critical path highlighted.]

    How do we obtain the latency of each event?
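One way to read an event critical path off an EDG is as the highest-latency path from source to sink. The sketch below assumes that definition and a hypothetical graph representation (`edges`, `latency`); the latencies are made-up illustration values, not measurements from the talk.

```javascript
'use strict';

// Highest-latency path through a DAG of events, from `source` to `sink`.
// `edges` maps an event to the events that depend on it;
// `latency` maps an event to its own latency in ms (missing means 0).
function criticalPath(edges, latency, source, sink) {
  const memo = new Map();

  // Returns { cost, path } for the most expensive path from `node` to `sink`,
  // or null when `sink` is unreachable from `node`.
  function best(node) {
    if (memo.has(node)) return memo.get(node);
    let result = null;
    if (node === sink) {
      result = { cost: latency[node] || 0, path: [node] };
    } else {
      for (const next of edges[node] || []) {
        const tail = best(next);
        if (tail === null) continue;
        const cost = (latency[node] || 0) + tail.cost;
        if (result === null || cost > result.cost) {
          result = { cost, path: [node, ...tail.path] };
        }
      }
    }
    memo.set(node, result);
    return result;
  }

  return best(source);
}

// Hypothetical EDG shaped like the readdir/readFile example, with N = 3.
const edges = {
  source: ['readdir'],
  readdir: ['readFile1', 'readFile2', 'readFile3'],
  readFile1: ['sink'],
  readFile2: ['sink'],
  readFile3: ['sink'],
};
// Made-up latencies in ms, for illustration only.
const latency = { readdir: 2, readFile1: 1, readFile2: 5, readFile3: 3 };

const ecp = criticalPath(edges, latency, 'source', 'sink');
console.log(ecp.path.join(' -> '), ecp.cost + ' ms');
```

With these values the slowest read (`readFile2`) determines the request latency, which is why only events on the critical path are worth optimizing.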
  37. 59.

    Deconstructing Event Latency

    ▸ Server-side latency: Queue, Exec., I/O
    ▸ Compute time: Node.js runtime, user code, GC, interrupt, IC miss, JIT engine
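As a rough illustration of the queue/exec/I/O split, the sketch below derives the three components from four timestamps per event. The timestamp names (`ioStart`, `ready`, `execStart`, `execEnd`) are hypothetical and the values are made up; the talk does not show its trace format.

```javascript
'use strict';

// Splits one event's latency into I/O, queueing, and execution time.
// Assumed (hypothetical) timestamps, in ms:
//   ioStart   - the asynchronous operation is issued
//   ready     - the operation completes and its callback is enqueued
//   execStart - the event loop starts running the callback
//   execEnd   - the callback returns
function decompose(event) {
  return {
    io: event.ready - event.ioStart,
    queue: event.execStart - event.ready,
    exec: event.execEnd - event.execStart,
  };
}

// Sums the components over a list of events (for example, the events on a
// request's critical path) and reports each as a percentage of the total.
function breakdown(events) {
  const total = { io: 0, queue: 0, exec: 0 };
  for (const event of events) {
    const parts = decompose(event);
    total.io += parts.io;
    total.queue += parts.queue;
    total.exec += parts.exec;
  }
  const sum = total.io + total.queue + total.exec;
  const percent = {};
  for (const key of Object.keys(total)) {
    percent[key] = sum === 0 ? 0 : (100 * total[key]) / sum;
  }
  return { total, percent };
}

// Made-up trace for illustration only.
const trace = [
  { ioStart: 0, ready: 2, execStart: 5, execEnd: 6 },
  { ioStart: 6, ready: 7, execStart: 12, execEnd: 16 },
];
console.log(breakdown(trace));
```

In this made-up trace, queueing accounts for half of the total latency even though each callback is short, which is the kind of pattern a per-component breakdown is meant to expose.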
  39. 61.

    Offline Instrumentation + Online Analysis

    ▸ Instrument the Node.js runtime (event-driven runtime (libuv), JS to C++ bindings, JavaScript runtime (V8), File/Net JS libraries) so that at runtime we can easily obtain the EDG and event latency info.
    ▸ Identify root causes of long tails at runtime: per-request Exec, Queue, and I/O latency feeds the tail latency bottleneck analysis (IO (%), Queue (%), Exec (%), GC (%), JIT (%), …).
  43. 65.

    Step 2: EDG-based Bottleneck Analysis

    [Figure: latency (ms) broken down into IO, Queue, and Exec for request types A to D and their average, non-tail (n) vs. tail (t).]

    Etherpad: Queue and Exec. latency increase in tails, and I/O latency is not dominant for this particular application.
  46. 68.

    EDG-based Bottleneck Analysis

    [Figure: latency (ms) broken down into IO, Queue, and Exec for request types V to X and their average, non-tail (n) vs. tail (t).]

    Client Manager: Queue and Exec. latency dominate in tails, but unlike Etherpad, I/O plays a notable role in the requests.

    On average, queuing and native code execution time account for ~80% of tail latency.
  47. 75.

    Breakdown Within Compute

    [Figure: breakdown of compute time.]

    We should focus optimization efforts on garbage collection and generated native code.
  48. 76.

    Step 3: Tail Latency Optimization

    ▸ Leveraging the turbo boosting capability of modern CPUs (Intel Turbo Boosting)
    ▸ Key: wisely choose what to boost
    ▸ GC Boosting
      ▹ Boost GC
    ▸ Queue Boosting
      ▹ Boost when the system is “busy”
      ▹ Use event queue stats as “hints”

    [Figure: Tail Latency Optimization, comprising Queue Boosting (event queue), VM Boosting, VM Optimization, and VM Tuning.]
  50. 79.

    Optimization 1: VM Optimization (GC Boost)

    ▸ Observations:
      ▹ GCs are infrequent, so boosting them adds little overall energy overhead
      ▹ IPC during GC is relatively high: ~1.3
    ▸ Implementation:
      ▹ User-space DVFS embedded in Google V8: enter boosting at GC prologues and exit boosting at GC epilogues
      ▹ More benefits if we have access to a fine-grained per-core DVFS mechanism
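The prologue/epilogue pairing can be sketched as a small controller. This is not the authors' V8 patch: `setFrequencyGHz` is a hypothetical stand-in for whatever user-space DVFS interface is available, the hook names are illustrative, and the 2.6 GHz and 4.0 GHz values are the ones quoted on the evaluation slides.

```javascript
'use strict';

const NORMAL_GHZ = 2.6; // normal operating frequency from the evaluation setup
const BOOST_GHZ = 4.0;  // maximum boost frequency from the evaluation setup

// Returns a pair of hooks to call at GC start and GC end.
// `setFrequencyGHz` is a hypothetical callback that applies a CPU frequency.
function createGcBooster(setFrequencyGHz) {
  let depth = 0; // tolerate nested prologue/epilogue pairs
  return {
    onGcPrologue() {
      depth += 1;
      if (depth === 1) setFrequencyGHz(BOOST_GHZ); // enter boosting
    },
    onGcEpilogue() {
      if (depth === 0) return; // unmatched epilogue, ignore
      depth -= 1;
      if (depth === 0) setFrequencyGHz(NORMAL_GHZ); // exit boosting
    },
  };
}

// Usage with a recording stub in place of a real DVFS call.
const applied = [];
const booster = createGcBooster((ghz) => applied.push(ghz));
booster.onGcPrologue();
booster.onGcEpilogue();
console.log(applied); // [ 4, 2.6 ]
```

Keeping a depth counter means the frequency changes only on the outermost prologue and epilogue, so nested or repeated notifications do not cause redundant frequency transitions.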
  54. 84.

    Optimization 2: Queue Boost

    ▸ More general compute acceleration: boost when the system is “busy”
    ▸ How do you detect that?
    ▸ Rely on two queue-related heuristics:
      ▹ # of events in the queue
      ▹ Processing time of the head-of-line event
    ▸ Implementation:
    ▸ Periodic sampling: every 1 ms
    ▸ Dynamic thresholding
      ▹ Sample the average number of events and the average per-event processing time
      ▹ Scale each average by 2-3x to obtain a dynamic threshold

    [Figure: a queue monitor observes the event queue and drives DVFS.]
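A minimal sketch of the two heuristics with dynamic thresholding follows. The class and parameter names are hypothetical, the running average is a simple cumulative mean (the slides do not specify the averaging scheme), and boosting when either signal exceeds its threshold is an assumption. The caller would invoke `sample` once per sampling period (1 ms on the slides).

```javascript
'use strict';

// Decides whether to boost from two queue signals sampled periodically:
// the number of queued events and the processing time of the head-of-line event.
class QueueMonitor {
  // `factor` scales the running average into a threshold (2-3x on the slides).
  constructor(factor = 2) {
    this.factor = factor;
    this.samples = 0;
    this.avgLength = 0;
    this.avgHeadTimeMs = 0;
  }

  // Feeds one sample and returns true when the system looks "busy".
  sample(queueLength, headTimeMs) {
    // Compare against thresholds derived from the history seen so far.
    const busy =
      this.samples > 0 &&
      (queueLength > this.factor * this.avgLength ||
        headTimeMs > this.factor * this.avgHeadTimeMs);

    // Fold the new sample into the cumulative means.
    this.samples += 1;
    this.avgLength += (queueLength - this.avgLength) / this.samples;
    this.avgHeadTimeMs += (headTimeMs - this.avgHeadTimeMs) / this.samples;
    return busy;
  }
}

// Made-up samples: a steady queue, then a burst.
const monitor = new QueueMonitor(2);
const decisions = [
  monitor.sample(4, 1.0),
  monitor.sample(4, 1.0),
  monitor.sample(12, 1.0), // queue length jumps to 3x the average
  monitor.sample(4, 1.0),
];
console.log(decisions); // [ false, false, true, false ]
```

Because the threshold tracks the observed average, the same monitor adapts to workloads with very different baseline queue depths instead of relying on a fixed cutoff.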
  68. 98.

    Evaluation

    ▸ Our system: normal operating frequency at 2.6 GHz, boosts to max 4.0 GHz during GC boost and Queue boost
    ▸ Different Variants:
      ▹ GC Boost
      ▹ GC Boost with GC Parameter Tuning
      ▹ Queue Boost
      ▹ GC Tuning + GC Boost + Queue Boost
    ▸ Baseline
    ▸ Static Frequency: 3.3 GHz and 4.0 GHz
    ▸ Adrenaline [HPCA 2015]
    ▸ Rubik [MICRO 2015]

    [Figure: Etherpad Lite, normalized energy vs. tail reduction (%) for Combined, GC Boost, GC Boost+Tuning, and QBoost.]
  71. 101.

    [Figure: normalized energy vs. tail reduction (%) for Todo, Lighter, Client Manager, and Let’s Chat, comparing Combined, GC Boost, GC Boost+Tuning, and QBoost against Adrenaline, Rubik, and static 3.3 GHz and 4.0 GHz.]

    Our system Pareto-dominates existing solutions: 14-21% tail reduction with only 3-14% energy overhead over baseline.
  74. 104.

    Conclusions

    Node.js uniquely combines an event-driven programming model with a managed-language runtime, presenting a new landscape and new challenges for tail latency optimization.

    The event-dependency graph (EDG) and the event critical path (ECP) are critical to deconstructing tail latency in Node.js.

    Intelligently leverage existing hardware features, turbo boosting in particular, to reduce latency with little to no energy overhead.