Tail Latency in Node.js: Energy Efficient Turbo Boosting for Long Latency Requests in Event-Driven Web Services

VEE 2019

Yuhao Zhu

April 14, 2019

Transcript

  1. Wenzhi Cui, Daniel Richins, Yuhao Zhu, Vijay Janapa Reddi. Tail Latency in Node.js: Energy Efficient Turbo Boosting for Long Tail Requests in JavaScript
  3. Connecting People (2010s)
  6. Connecting Things (2020s): 50 Billion Devices. “The Internet of Things” (Cisco), www.cisco.com/web/about/ac79/docs/innov/IoT_IBSG_0411FINAL.pdf
  10. Thread-based Programming: Traditional Approach. [Diagram labels: Client Requests; Blocking I/O; Client Response; Limited resources & thrashing.]
  11. Thread-based Programming [Welsh et al. ’00]
  21. Event-driven Programming: Emerging Approach. [Diagram labels: Single-threaded event loop; Application Tasks; Event Queue (Head, Tail); Asynchronous I/O (DB Access, File I/O, Network).]

        const fs = require('fs');

        // Register a callback; readFile returns immediately.
        fs.readFile('input.txt', function (err, data) {
          if (err) return console.error(err);
          console.log(data.toString());
        });
        // Runs before the callback above, while the read is still pending.
        console.log("Program continues…");
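
The ordering in the callback example above (the last line prints before the file contents) can be mimicked with a toy event queue. This is only a sketch of the idea, not how libuv is implemented; `EventLoop` and `fakeReadFile` are hypothetical names, and the "I/O" completes instantly by queueing its callback at the tail.

```javascript
// Minimal single-threaded event loop simulation (illustrative only; not libuv).
class EventLoop {
  constructor() {
    this.queue = []; // head = index 0, tail = last index
  }
  // Enqueue a callback at the tail, as a completed async operation would.
  enqueue(callback, ...args) {
    this.queue.push(() => callback(...args));
  }
  // Drain the queue from the head, one event at a time, on a single thread.
  run() {
    while (this.queue.length > 0) {
      const event = this.queue.shift();
      event();
    }
  }
}

const log = [];
const loop = new EventLoop();

// Hypothetical stand-in for fs.readFile: "completes" by queueing the callback.
function fakeReadFile(name, callback) {
  loop.enqueue(callback, null, `contents of ${name}`);
}

fakeReadFile('input.txt', (err, data) => {
  if (err) return log.push(`error: ${err}`);
  log.push(data);
});
log.push('Program continues...');

loop.run();
console.log(log);
```

The main script runs to completion first; only then does the loop dequeue the callback from the head of the queue.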
  22. Thread-based Programming and Event-driven Programming [Welsh et al. ’00]
  23. The Changing Programming Language Landscape
  26. Managed Language; Event-driven Execution Model
  28. Taming Tail Latencies in Event-Driven Web Services. [Figure: fraction of requests vs. request latency, with the tail marked.]
  29. Experimental Setup: Intel i7-4790K (4 physical cores with hyper-threading), 32 GB DRAM, 240 GB SSD, 1 Gbps network. Load generator: Wrk2, a customized load-testing tool that simulates real-world workloads.
  30. Applications

    Benchmark        I/O Type   #Requests   Description
    Etherpad lite    N/A        20K         Real-time word processor
    Todo             Redis      40K         Online task manager
    Lighter          Disk       40K         Blogging engine
    Let’s Chat       MongoDB    10K         Web-based chat application
    Client Manager   MongoDB    40K         Online address book

    GitHub repo: https://github.com/nodebenchmark/benchmarks
  35. Etherpad: An Example. [Figure: CDF of request latency (ms) with the tail region marked; marked points (latency in ms, percentile): (1.85, 50%), (7.30, 90%), (36.91, 99.9%).]
  37. [Figure: four CDFs of request latency (ms), each with its tail region marked; panel titles: Todo, Lighter, Client Manager, Let’s Chat. Marked points (latency in ms, percentile): (0.47, 50%), (1.12, 90%), (1.80, 99.9%); (0.81, 50%), (1.40, 90%), (8.75, 99.9%); (12.52, 50%), (20.30, 90%), (39.54, 99.9%); (1.65, 50%), (2.66, 90%), (12.99, 99.9%).] Averaged across the five applications, tail latency (99.9th percentile) is 9.1x the median request latency.
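
As a quick arithmetic check, the 9.1x figure matches the mean of the per-application ratios between the 99.9th-percentile and median points marked on the CDFs (including the Etherpad points from the earlier slide). Reading the figure as a mean over the five applications is an interpretation rather than something the slide states; the variable names are illustrative.

```javascript
// (median, 99.9th percentile) latency in ms, read off the marked CDF points.
const markedPoints = [
  { p50: 1.85, p999: 36.91 }, // Etherpad
  { p50: 0.47, p999: 1.80 },
  { p50: 0.81, p999: 8.75 },
  { p50: 12.52, p999: 39.54 },
  { p50: 1.65, p999: 12.99 },
];

// Tail-to-median ratio per application, then the arithmetic mean.
const ratios = markedPoints.map(({ p50, p999 }) => p999 / p50);
const meanRatio = ratios.reduce((sum, r) => sum + r, 0) / ratios.length;

console.log(ratios.map((r) => r.toFixed(2)).join(', ')); // 19.95, 3.83, 10.80, 3.16, 7.87
console.log(meanRatio.toFixed(1)); // 9.1
```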
  46. System Overview
    ▸ Step 1: Tail Latency Reconstruction. Event-driven runtime (libuv), JS to C++ bindings, JavaScript runtime (V8), File/Net JS libraries; event data for Req1, Req2, …; Event-Dependency Graph (EDG) over events e1 to e5 with the Event Critical Path.
    ▸ Step 2: Tail Latency Bottleneck Analysis. Request latency per request (Req1, Req2) split into Exec, Queue, and I/O; IO (%), Queue (%), Exec (%), GC (%), JIT (%), …
    ▸ Step 3: Tail Latency Optimization. Queue Boosting, VM Boosting, VM Optimization, VM Tuning, Event Queue.
    ▸ Slide annotations: Tools to root-cause tail in Node.js; Root-causing tail in Node.js; Mitigating tail in Node.js; Static Instrumentation; Dynamic Analysis & Optimization.
  51. Step 1: Latency Reconstruction

        var count = N;
        fs.readdir("/data", function dir(err, files) {
          // Issue one asynchronous read per file.
          files.forEach(function file(f, index) {
            var fname = …;
            fs.readFile(fname, function read(err, data) {
              // Respond only after the last outstanding read completes.
              count -= 1;
              if (count === 0) sendResponse();
            });
          });
        });

    [Graph nodes: Source, readdir, readFile1, readFile2, readFile3, readFileN, Sink.]
  55. Event Dependency Graph (EDG). [Figure: an Event-Dependency Graph over events e1 to e5, with the Event Critical Path marked.] How do we obtain the latency of each event?
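
Once each event has a latency, one natural way to extract a critical path is to take the longest-latency path through the graph. The sketch below assumes that definition and an acyclic graph; the node names echo the readdir/readFile example, and every latency value is made up for illustration.

```javascript
// Event-dependency graph: each event has a latency (ms) and the events it
// depends on. All names and numbers below are hypothetical.
const edg = {
  readdir:   { latency: 2, deps: [] },
  readFile1: { latency: 5, deps: ['readdir'] },
  readFile2: { latency: 9, deps: ['readdir'] },
  readFile3: { latency: 4, deps: ['readdir'] },
  respond:   { latency: 1, deps: ['readFile1', 'readFile2', 'readFile3'] },
};

// Longest-latency path ending at `sink`, via memoized depth-first search.
// Assumes the graph is acyclic.
function criticalPath(graph, sink) {
  const memo = new Map();
  function longestTo(id) {
    if (memo.has(id)) return memo.get(id);
    let best = { total: 0, path: [] };
    for (const dep of graph[id].deps) {
      const candidate = longestTo(dep);
      if (candidate.total > best.total) best = candidate;
    }
    const result = {
      total: best.total + graph[id].latency,
      path: [...best.path, id],
    };
    memo.set(id, result);
    return result;
  }
  return longestTo(sink);
}

const ecp = criticalPath(edg, 'respond');
console.log(ecp.path.join(' -> '), ecp.total); // readdir -> readFile2 -> respond 12
```

Only the slowest of the parallel reads ends up on the path, which is why shortening the other reads would not reduce the request latency in this toy graph.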
  59. Deconstructing Event Latency. [Diagram labels: Server-side Latency split into Queue, Exec., and I/O; Compute Time: Node.js Runtime, User Code, GC, Interrupt, IC Miss, JIT Engine.]
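
The queue / exec / I/O split can be illustrated with four timestamps per event. The field names and the exact points at which timestamps are taken are assumptions made for this sketch; the slides do not specify how the instrumented runtime records them.

```javascript
// Hypothetical per-event timestamps (ms); field names are assumptions.
//   issued:   the asynchronous operation was started
//   ready:    the I/O finished and the callback was enqueued
//   started:  the callback was dequeued and began executing
//   finished: the callback returned
function decomposeEvent({ issued, ready, started, finished }) {
  return {
    io: ready - issued,       // time spent waiting on I/O
    queue: started - ready,   // time spent waiting in the event queue
    exec: finished - started, // time spent executing the callback
  };
}

const parts = decomposeEvent({ issued: 0, ready: 4, started: 7, finished: 9 });
console.log(parts); // { io: 4, queue: 3, exec: 2 }
```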
  61. Offline Instrumentation + Online Analysis. Instrument the Node.js runtime so that at runtime we can easily obtain the EDG and event latency information, then identify the root causes of long tails at runtime. [Diagram labels: Event-driven runtime (libuv); JS to C++ bindings; JavaScript runtime (V8); File/Net JS libraries; Req1, Req2, …; Request Latency (Exec, Queue, I/O); Tail Latency Bottleneck Analysis: IO (%), Queue (%), Exec (%), GC (%), JIT (%), ….]
  65. Step 2: EDG-based Bottleneck Analysis. [Figure: latency (ms) by request type (A, B, C, D, Avg; suffix n = non-tail, t = tail), broken into IO, Queue, and Exec.] Etherpad: Queue and Exec. latency increase in tails, and I/O latency is not dominant for this particular application.
  68. EDG-based Bottleneck Analysis. [Figure: latency (ms) by request type (V, W, X, Avg; suffix n = non-tail, t = tail), broken into IO, Queue, and Exec.] Client Manager: Queue and Exec. latency dominate in tails, but unlike Etherpad, I/O plays a notable role in the requests. On average, queuing and native code execution time contribute ~80% of the tail latencies.
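
A breakdown of this kind can be sketched by splitting requests at a latency percentile and computing each component's share within the tail and non-tail groups. The request data, the 75th-percentile cut, and the function names are all invented for illustration and do not reproduce the measurements in the figures.

```javascript
// Hypothetical per-request breakdowns (ms) along each request's critical path.
const requests = [
  { io: 1.0, queue: 0.5, exec: 1.5 },
  { io: 1.2, queue: 0.4, exec: 1.4 },
  { io: 1.1, queue: 0.6, exec: 1.3 },
  { io: 1.5, queue: 6.0, exec: 7.5 }, // a slow request
];

const total = (r) => r.io + r.queue + r.exec;

// Split requests at a latency percentile (nearest-rank on sorted totals).
function split(reqs, percentile) {
  const sorted = [...reqs].sort((a, b) => total(a) - total(b));
  const cut = Math.ceil((percentile / 100) * sorted.length);
  return { nonTail: sorted.slice(0, cut), tail: sorted.slice(cut) };
}

// Each component's share (%) of the summed latency within a group.
function shares(group) {
  const sum = { io: 0, queue: 0, exec: 0 };
  for (const r of group) {
    for (const k of Object.keys(sum)) sum[k] += r[k];
  }
  const all = sum.io + sum.queue + sum.exec;
  return Object.fromEntries(
    Object.entries(sum).map(([k, v]) => [k, (100 * v) / all])
  );
}

const { nonTail, tail } = split(requests, 75);
console.log('non-tail shares (%):', shares(nonTail));
console.log('tail shares (%):', shares(tail));
```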
  75. Breakdown Within Compute. [Figure: breakdown of Compute Time.] We should focus optimization efforts on Garbage Collection and Generated Native Code.
  76. Step 3: Tail Latency Optimization
    ▸ Leveraging the turbo boosting capability of modern CPUs
    ▸ Key: wisely choose what to boost
    ▸ GC Boosting
      ▹ Boost GC
    ▸ Queue Boosting
      ▹ Boost when the system is “busy”
      ▹ Use event queue stats as “hints”
    [Diagram labels: Tail Latency Optimization; Queue Boosting; VM Boosting; VM Optimization; VM Tuning; Event Queue; (Intel Turbo Boosting).]
  79. Optimization 1: VM Optimization (GC Boost)
    ▸ Observations:
      ▹ GCs are infrequent, so boosting them adds little overall energy overhead
      ▹ IPC during GC is relatively high: ~1.3
    ▸ Implementation
      ▹ User-space DVFS embedded in Google V8: enter boosting at GC prologues and exit boosting at GC epilogues
      ▹ More benefits if we have access to a fine-grained, per-core DVFS mechanism
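
The prologue/epilogue pattern can be sketched in plain JavaScript with a stand-in governor. This is not the V8-embedded implementation the slide describes: `FakeGovernor` and `withBoost` are hypothetical, nothing here touches real hardware, and the 2.6 GHz and 4.0 GHz values are the operating points quoted in the evaluation slides.

```javascript
// Hypothetical frequency governor; a real system would issue DVFS requests here.
class FakeGovernor {
  constructor(baseGHz, boostGHz) {
    this.baseGHz = baseGHz;
    this.boostGHz = boostGHz;
    this.currentGHz = baseGHz;
    this.history = [baseGHz]; // every frequency the governor has been set to
  }
  set(freqGHz) {
    this.currentGHz = freqGHz;
    this.history.push(freqGHz);
  }
}

// Wrap a GC-like phase: boost in the prologue, restore in the epilogue.
// try/finally guarantees the epilogue runs even if the phase throws.
function withBoost(governor, phase) {
  governor.set(governor.boostGHz); // prologue: enter boosting
  try {
    return phase();
  } finally {
    governor.set(governor.baseGHz); // epilogue: exit boosting
  }
}

const governor = new FakeGovernor(2.6, 4.0);
const result = withBoost(governor, () => 'gc done' /* stand-in for GC work */);
console.log(result, governor.history); // gc done [ 2.6, 4, 2.6 ]
```

Restoring the base frequency in a `finally` block keeps the boost window bounded to the wrapped phase, which is the property that limits the energy cost.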
  84. Optimization 2: Queue Boost
    ▸ More general compute acceleration: boost when the system is “busy”
    ▸ How do you detect that?
    ▸ Rely on two queue-related heuristics:
      ▹ # of events in the queue
      ▹ Processing time of the head-of-line event
    ▸ Implementation:
      ▸ Periodic Sampling: every 1 ms
      ▸ Dynamic Thresholding
        ▹ Sample the average number of events and the average per-event processing time
        ▹ Amplify the average value by 2-3x to set a dynamic threshold
    [Diagram labels: Queue Monitor; DVFS.]
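
The dynamic-thresholding idea can be sketched as a monitor that keeps running averages of the two heuristics and flags a sample as busy when it exceeds the amplified average. Treating the two heuristics as an either/or condition, the running-average update, and all names are assumptions of this sketch; the slides give only the 1 ms sampling period and the 2-3x amplification.

```javascript
// Sketch of dynamic thresholding for Queue Boost. All names are hypothetical.
class QueueMonitor {
  constructor(amplification = 2) {
    this.amplification = amplification; // the slides give a 2-3x range
    this.samples = 0;
    this.avgQueueLength = 0;
    this.avgHeadTimeMs = 0;
  }

  // Called once per sampling period (1 ms in the slides).
  // Returns true when the current sample suggests the system is "busy".
  sample(queueLength, headTimeMs) {
    // Assumption: exceeding either threshold counts as busy.
    const busy =
      this.samples > 0 &&
      (queueLength > this.amplification * this.avgQueueLength ||
        headTimeMs > this.amplification * this.avgHeadTimeMs);

    // Update the running averages incrementally, after the decision.
    this.samples += 1;
    this.avgQueueLength += (queueLength - this.avgQueueLength) / this.samples;
    this.avgHeadTimeMs += (headTimeMs - this.avgHeadTimeMs) / this.samples;
    return busy;
  }
}

const monitor = new QueueMonitor(2);
const decisions = [
  monitor.sample(4, 1.0),
  monitor.sample(4, 1.0),
  monitor.sample(5, 1.0),
  monitor.sample(12, 1.0), // queue length spikes past 2x the running average
];
console.log(decisions); // [ false, false, false, true ]
```

In a full system the `true` result would trigger a DVFS request; here it is only returned so the decision logic can be inspected in isolation.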
  98. Evaluation
    ▸ Our system: normal operating frequency of 2.6 GHz; boosts to a maximum of 4.0 GHz during GC Boost and Queue Boost
    ▸ Different Variants:
      ▹ GC Boost
      ▹ GC Boost with GC Parameter Tuning
      ▹ Queue Boost
      ▹ GC Tuning + GC Boost + Queue Boost
    ▸ Baseline
    ▸ Static Frequency: 3.3 GHz and 4.0 GHz
    ▸ Adrenaline [HPCA 2015]
    ▸ Rubik [MICRO 2015]
    [Figure: Etherpad Lite, normalized energy vs. tail reduction (%) for Combined, GC Boost, GC Boost+Tuning, and QBoost.]
  101. [Figure: normalized energy vs. tail reduction (%) for Todo, Lighter, Client Manager, and Let’s Chat; series: Combined, GC Boost, GC Boost+Tuning, QBoost, Adrenaline, Rubik, 3.3 GHz, 4.0 GHz.] Our approach Pareto-dominates existing solutions: 14-21% tail reduction with only 3-14% energy overhead over baseline.
  104. Conclusions
    ▸ Node.js uniquely combines an event-driven programming model with a managed language runtime, presenting a new landscape and new challenges for tail latency optimization.
    ▸ The event-dependency graph (EDG) and event critical path (ECP) are critical to deconstructing tail latency in Node.js. [Figure: EDG over events e1 to e5 with the Event Critical Path.]
    ▸ Intelligently leverage existing hardware features, turbo boosting in particular, to reduce latency with little to no energy overhead. [Diagram labels: Tail Latency Optimization; Queue Boosting; VM Boosting; VM Optimization; VM Tuning; Event Queue.]