Slide 1

Tail Latency in Node.js: Energy Efficient Turbo Boosting for Long Tail Requests in JavaScript
Wenzhi Cui, Daniel Richins, Yuhao Zhu, Vijay Janapa Reddi

Slide 3

Connecting People (2010s)

Slide 6

Connecting Things (2020s)
▸ 50 Billion Devices (“The Internet of Things,” Cisco)
  www.cisco.com/web/about/ac79/docs/innov/IoT_IBSG_0411FINAL.pdf

Slide 10

Thread-based Programming: Traditional Approach
▸ Each client request is handled by its own thread, which performs blocking I/O before sending the client response
▸ Limited resources & thrashing

Slide 11

Thread-based Programming [Welsh et al. ’00]

Slide 21

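Event-driven Programming: Emerging Approach
▸ Application tasks are appended at the tail of an event queue; a single-threaded event loop takes events from the head.
▸ Slow operations (DB access, file I/O, network) are issued as asynchronous I/O; their callbacks re-enter the event queue when the I/O completes.

    var fs = require("fs");

    // Asynchronous read: the callback is queued once the file has been read.
    fs.readFile("input.txt", function (err, data) {
      if (err) return console.error(err);
      console.log(data.toString());
    });

    // Runs immediately, before the file contents are printed.
    console.log("Program continues…");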
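The flow above can be modeled in a few lines of plain JavaScript. This is a toy sketch, not how libuv works: the `EventLoop` class and its `post`, `asyncIO`, and `run` methods are hypothetical names, and I/O "completes" only when the queue is empty, a simplification. It shows why the `readFile` callback runs after `console.log("Program continues…")`: the current task always runs to completion before the next event is taken from the head of the queue.

```javascript
// Toy model of a single-threaded event loop: one FIFO event queue drained from
// the head, plus simulated asynchronous I/O. Illustrative only; libuv's real
// loop has several phases and performs I/O via the OS or a thread pool.
class EventLoop {
  constructor() {
    this.queue = [];     // event queue: push() adds at the tail, shift() takes the head
    this.pendingIO = []; // simulated in-flight asynchronous I/O operations
  }

  // Enqueue an application task at the tail of the event queue.
  post(task) {
    this.queue.push(task);
  }

  // Issue a non-blocking "I/O"; its completion later enqueues `callback`.
  asyncIO(result, callback) {
    this.pendingIO.push(() => this.post(() => callback(null, result)));
  }

  // Run until no queued events and no pending I/O remain.
  run() {
    while (this.queue.length > 0 || this.pendingIO.length > 0) {
      if (this.queue.length > 0) {
        const task = this.queue.shift(); // head-of-line event
        task();                          // runs to completion; nothing preempts it
      } else {
        this.pendingIO.shift()();        // simplification: I/O completes only when idle
      }
    }
  }
}

// Mirrors the fs.readFile example: the read callback runs only after the
// current task has finished, so "Program continues…" is logged first.
const log = [];
const loop = new EventLoop();
loop.post(() => {
  loop.asyncIO("file contents", (err, data) => log.push("read: " + data));
  log.push("Program continues…");
});
loop.run();
console.log(log.join(" | ")); // prints Program continues… | read: file contents
```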

Slide 22

Thread-based Programming vs. Event-driven Programming [Welsh et al. ’00]

Slide 23

The Changing Programming Language Landscape

Slide 26

Managed Language + Event-driven Execution Model

Slide 28

Taming Tail Latencies in Event-Driven Web Services
[Figure: fraction of requests vs. request latency, with the tail highlighted]

Slide 29

Experimental Setup
▸ Server: Intel i7-4790K (4 physical cores with hyper-threading), 32 GB DRAM, 240 GB SSD
▸ Network: 1 Gbps
▸ Load generator: wrk2, a customized load-testing tool that simulates real-world workloads

Slide 30

Applications

Benchmark        I/O Type   #Requests   Description
Etherpad Lite    N/A        20K         Real-time word processor
Todo             Redis      40K         Online task manager
Lighter          Disk       40K         Blogging engine
Let’s Chat       MongoDB    10K         Web-based chat application
Client Manager   MongoDB    40K         Online address book

GitHub repo: https://github.com/nodebenchmark/benchmarks

Slide 35

Etherpad: An Example
[Figure: CDF of Etherpad request latency (ms), with the tail region highlighted]
▸ 50th percentile (median): 1.85 ms
▸ 90th percentile: 7.30 ms
▸ 99.9th percentile: 36.91 ms

Slide 37

[Figure: latency CDFs (ms) with tail regions for Todo, Lighter, Client Manager, and Let’s Chat; the four panels report (50th, 90th, 99.9th percentile) latencies of (0.47, 1.12, 1.80), (0.81, 1.40, 8.75), (12.52, 20.30, 39.54), and (1.65, 2.66, 12.99) ms]
Averaged across the five applications, the 99.9th-percentile tail latency is 9.1x the median request latency.
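The percentile points and the 9.1x figure can be checked with a few lines of JavaScript. The `percentile` helper uses the nearest-rank definition, which is an assumption (the slides do not say how percentiles were computed). The (median, 99.9th percentile) pairs are the values reported on the CDF slides: Etherpad Lite first, then the four other panels in slide order, without assigning those panels to specific applications.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("percentile of an empty sample");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// (median, 99.9th percentile) latency pairs in ms, as reported on the slides.
const reported = [
  [1.85, 36.91], // Etherpad Lite
  [0.47, 1.8],
  [0.81, 8.75],
  [12.52, 39.54],
  [1.65, 12.99],
];

// Ratio of tail (99.9th percentile) to median latency, averaged over apps.
const ratios = reported.map(([p50, p999]) => p999 / p50);
const meanRatio = ratios.reduce((sum, r) => sum + r, 0) / ratios.length;
console.log(meanRatio.toFixed(1)); // prints 9.1
```

With raw measurements, `percentile(latencies, 99.9)` over the per-request latencies recorded by the load generator would give the tail point directly.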

Slide 38

System Overview: Tools to Root-cause Tail in Node.js
▸ Step 1: Tail Latency Reconstruction. Instrument the event-driven runtime (libuv), the JS-to-C++ bindings, the JavaScript runtime (V8), and the file/network JS libraries; from per-request event data (Req1, Req2, …) build an Event-Dependency Graph (EDG) and find its event critical path.
▸ Step 2: Tail Latency Bottleneck Analysis (root-causing tail in Node.js). Break each request's latency into Exec, Queue, and I/O time, and attribute tail latency to IO (%), Queue (%), Exec (%), GC (%), JIT (%), …
▸ Step 3: Tail Latency Optimization (mitigating tail in Node.js). VM optimization (VM tuning, VM boosting) and event-queue optimization (queue boosting).
▸ The system pairs static instrumentation with dynamic analysis & optimization.

Slide 51

Step 1: Latency Reconstruction

    var fs = require("fs");

    var count = N;  // N: number of files in /data
    fs.readdir("/data", function dir(err, files) {
      files.forEach(function file(f, index) {
        var fname = …;
        fs.readFile(fname, function read(err, data) {
          count -= 1;
          if (count == 0) sendResponse();  // the last read completes the request
        });
      });
    });

EDG for this request: Source → readdir → {readFile1, readFile2, readFile3, …, readFileN} → Sink
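A runnable variant of the slide's fan-out/fan-in pattern, as a sketch. It assumes each file name is resolved relative to the directory with `path.join` (the slide elides how `fname` is built), takes the directory and a completion callback as parameters in place of the hard-coded `"/data"` and `sendResponse`, and records hypothetical event labels so the Source → readdir → readFile → Sink shape of the graph is visible.

```javascript
const fs = require("fs");
const path = require("path");

// Fan-out/fan-in: read every entry of `dir` concurrently and invoke `done`
// exactly once, after the last read completes (or on the first error).
// Assumes `dir` contains only regular files.
function readAll(dir, done) {
  const events = ["source"]; // callback labels, in the order they ran
  let finished = false;
  const finish = (err, contents) => {
    if (finished) return;    // guard: report only the first outcome
    finished = true;
    done(err, events, contents);
  };

  fs.readdir(dir, function onDir(err, files) {
    if (err) return finish(err);
    events.push("readdir");
    let count = files.length; // N = number of entries in the directory
    const contents = new Array(count);
    if (count === 0) {
      events.push("sink");
      return finish(null, contents);
    }
    files.forEach(function onFile(f, index) {
      fs.readFile(path.join(dir, f), function onRead(err, data) {
        if (err) return finish(err);
        events.push("readFile:" + f);
        contents[index] = data.toString();
        count -= 1;
        if (count === 0) {    // the last read completes the request
          events.push("sink");
          finish(null, contents);
        }
      });
    });
  });
}
```

For example, `readAll("/data", (err, events) => { if (!err) console.log(events); })` prints the labels in completion order; the `readFile:*` entries may appear in any order because the reads run concurrently.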

Slide 55

Event Dependency Graph (EDG)
[Figure: an event-dependency graph over events e1 to e5, with the event critical path highlighted]
How do we obtain the latency of each event?
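Once each event's latency is known, the request latency follows the event critical path. A minimal sketch, assuming the critical path is the path with the largest total event latency through an acyclic EDG, given as an adjacency list in which every successor also appears in `latency`. The five-event graph and its latencies are hypothetical, not measurements from the talk.

```javascript
// Critical path of an event-dependency graph (EDG), taken here to be the
// path with the largest total event latency. Assumes the graph is acyclic.
//   latency: { event: ms }   edges: { event: [successor events] }
function criticalPath(latency, edges) {
  const nodes = Object.keys(latency);
  const succ = (n) => edges[n] || [];

  // Topological order via Kahn's algorithm.
  const indegree = Object.fromEntries(nodes.map((n) => [n, 0]));
  for (const n of nodes) for (const m of succ(n)) indegree[m] += 1;
  const ready = nodes.filter((n) => indegree[n] === 0);
  const order = [];
  while (ready.length > 0) {
    const n = ready.shift();
    order.push(n);
    for (const m of succ(n)) {
      indegree[m] -= 1;
      if (indegree[m] === 0) ready.push(m);
    }
  }
  if (order.length !== nodes.length) throw new Error("EDG contains a cycle");

  // dist[n]: largest latency sum over paths ending at n (n included).
  const dist = {};
  const prev = {};
  for (const n of order) {
    if (dist[n] === undefined) dist[n] = latency[n]; // a source event
    for (const m of succ(n)) {
      const candidate = dist[n] + latency[m];
      if (dist[m] === undefined || candidate > dist[m]) {
        dist[m] = candidate;
        prev[m] = n;
      }
    }
  }

  // Walk back from the event with the largest accumulated latency.
  let end = order[0];
  for (const n of order) if (dist[n] > dist[end]) end = n;
  const path = [end];
  while (prev[path[0]] !== undefined) path.unshift(prev[path[0]]);
  return { length: dist[end], path };
}

// Hypothetical five-event graph and latencies (ms), for illustration only.
const result = criticalPath(
  { e1: 2, e2: 3, e3: 1, e4: 4, e5: 2 },
  { e1: ["e2", "e3"], e2: ["e4"], e3: ["e4"], e4: ["e5"] }
);
console.log(result.path.join(" -> "), result.length); // prints e1 -> e2 -> e4 -> e5 11
```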

Slide 59

Deconstructing Event Latency
▸ Server-side latency = queuing + execution + I/O
▸ Execution is compute time, spent in the Node.js runtime, user code, GC, interrupts, IC (inline cache) misses, and the JIT engine
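Given per-event timestamps, the server-side latency of one event splits into the three components above. A sketch, assuming the instrumentation records four hypothetical timestamps per event; the slides do not specify the exact probe points, and attributing the execution part further (GC, JIT, IC misses) requires runtime-level instrumentation that this sketch does not model.

```javascript
// Split one event's server-side latency into I/O, queuing, and execution.
// Timestamps are in ms at four hypothetical probe points:
//   ioIssued  - the asynchronous I/O request was submitted
//   enqueued  - the I/O completed and the event entered the event queue
//   execStart - the event loop dequeued the event and started its callback
//   execEnd   - the callback returned
function decompose(e) {
  const io = e.enqueued - e.ioIssued;
  const queue = e.execStart - e.enqueued;
  const exec = e.execEnd - e.execStart;
  if (io < 0 || queue < 0 || exec < 0) throw new Error("timestamps out of order");
  return { io, queue, exec, total: io + queue + exec };
}

// Hypothetical timestamps for one event.
const parts = decompose({ ioIssued: 0, enqueued: 4, execStart: 10, execEnd: 12 });
console.log(parts); // prints { io: 4, queue: 6, exec: 2, total: 12 }
```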

Slide 61

Offline Instrumentation + Online Analysis
▸ Offline: instrument the Node.js runtime (the libuv event-driven runtime, the JS-to-C++ bindings, the V8 JavaScript runtime, and the file/network JS libraries) so that the EDG and event latency information can be obtained easily at runtime.
▸ Online: identify the root causes of long tails at runtime, by breaking each request's latency (Req1, Req2, …) into Exec, Queue, and I/O and attributing tail latency to IO (%), Queue (%), Exec (%), GC (%), JIT (%), …
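The talk instruments libuv, the bindings, and V8 directly. For experimentation without rebuilding Node.js, the public (experimental) `async_hooks` API offers a coarser event-level view: `init` reports each new asynchronous resource together with the resource that triggered it, and `before`/`after` bracket its callback. The sketch below is a swapped-in technique, not the authors' instrumentation; the `recorder` object and its fields are hypothetical. It records trigger edges and, per resource, the creation-to-callback wait (I/O or timer delay plus queuing, not separated) and the callback execution time.

```javascript
const async_hooks = require("async_hooks");

// Hypothetical recorder for an EDG-like trace built from async_hooks events.
const recorder = {
  edges: [],          // [triggerAsyncId, asyncId, type]: which resource caused which
  created: new Map(), // asyncId -> creation time (ns, BigInt)
  waitNs: new Map(),  // asyncId -> creation-to-first-callback time (I/O or delay + queuing)
  execNs: new Map(),  // asyncId -> total callback execution time
  started: new Map(), // asyncId -> start time of the running callback
};

// AsyncHook callbacks must stay synchronous: console.log() is itself
// asynchronous and would re-enter the hooks, so only in-memory maps are updated.
const hook = async_hooks.createHook({
  init(asyncId, type, triggerAsyncId) {
    recorder.edges.push([triggerAsyncId, asyncId, type]);
    recorder.created.set(asyncId, process.hrtime.bigint());
  },
  before(asyncId) {
    const now = process.hrtime.bigint();
    const created = recorder.created.get(asyncId);
    if (created !== undefined && !recorder.waitNs.has(asyncId)) {
      recorder.waitNs.set(asyncId, now - created);
    }
    recorder.started.set(asyncId, now);
  },
  after(asyncId) {
    const start = recorder.started.get(asyncId);
    if (start === undefined) return;
    const spent = process.hrtime.bigint() - start;
    recorder.execNs.set(asyncId, (recorder.execNs.get(asyncId) || 0n) + spent);
    recorder.started.delete(asyncId);
  },
});
hook.enable();
```

After a run, `recorder.edges` approximates the EDG and the two timing maps give per-event waiting and execution time; separating I/O from queuing would still need probes inside the runtime, as in the talk.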

Slide 65

Step 2: EDG-based Bottleneck Analysis
[Figure: Etherpad latency (ms) by request type (A to D and their average) for non-tail (n) and tail (t) requests, broken down into IO, Queue, and Exec]
Etherpad: queue and execution latency increase in the tail, and I/O latency is not dominant for this particular application.

Slide 68

EDG-based Bottleneck Analysis
[Figure: Client Manager latency (ms) by request type (V to X and their average) for non-tail (n) and tail (t) requests, broken down into IO, Queue, and Exec]
Client Manager: queue and execution latency dominate in the tail, but unlike Etherpad, I/O plays a notable role in its requests.
On average, queuing and native code execution time contribute ~80% of the tail latency.
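The per-class bars in these charts can be reproduced from per-request breakdowns: split requests into tail and non-tail, then average each component within each class. A minimal sketch; the `bottleneckBreakdown` function, the `{ io, queue, exec }` record shape (ms), and the rule "the slowest (100 − p)% of requests, at least one, form the tail" are assumptions for illustration, not the talk's exact procedure.

```javascript
// Average the I/O, queue, and execution components separately for tail and
// non-tail requests, and report what share of the average tail latency is
// queuing plus execution. Each request is { io, queue, exec } in ms.
function bottleneckBreakdown(requests, tailPercentile = 99.9) {
  if (requests.length === 0) throw new Error("no requests");
  const total = (r) => r.io + r.queue + r.exec;

  // Slowest first; the slowest (100 - p)% of requests, at least one, form the tail.
  const byLatency = [...requests].sort((a, b) => total(b) - total(a));
  const tailCount = Math.max(1, Math.ceil((byLatency.length * (100 - tailPercentile)) / 100));
  const tail = byLatency.slice(0, tailCount);
  const nonTail = byLatency.slice(tailCount);

  // Mean of each component over a group (all zeros for an empty group).
  const average = (group) => {
    const sum = { io: 0, queue: 0, exec: 0 };
    for (const r of group) {
      sum.io += r.io;
      sum.queue += r.queue;
      sum.exec += r.exec;
    }
    const n = Math.max(group.length, 1);
    return { io: sum.io / n, queue: sum.queue / n, exec: sum.exec / n };
  };

  const t = average(tail);
  return {
    tail: t,
    nonTail: average(nonTail),
    // Share of the average tail latency spent queuing or executing.
    queueExecShareOfTail: (t.queue + t.exec) / total(t),
  };
}
```

Applied to per-request records from the instrumentation, `queueExecShareOfTail` is the kind of quantity behind the ~80% observation, though the slides do not state exactly how that average was taken.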

Slide 75

Breakdown Within Compute
[Figure: breakdown of compute time]
We should focus optimization efforts on garbage collection and generated native code.

Slide 76

Step 3: Tail Latency Optimization
▸ Leverage the turbo boosting capability of modern CPUs (Intel Turbo Boost)
▸ Key: choose wisely what to boost
▸ GC Boosting
  ▹ Boost GC
▸ Queue Boosting
  ▹ Boost when the system is “busy”
  ▹ Use event queue statistics as “hints”
[Figure: tail latency optimization acts on the VM (VM optimization: VM tuning, VM boosting) and on the event queue (queue boosting)]

Slide 79

Optimization 1: VM Optimization (GC Boost)
▸ Observations:
  ▹ GCs are infrequent, so boosting them adds little overall energy overhead
  ▹ IPC during GC is relatively high (~1.3), so GC is likely to speed up well at a higher frequency
▸ Implementation:
  ▹ User-space DVFS embedded in Google V8: enter boosting at GC prologues and exit boosting at GC epilogues
  ▹ More benefit is possible with access to a fine-grained, per-core DVFS mechanism
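The prologue/epilogue pairing can be sketched as a small controller that raises the frequency when a GC starts and restores it when the GC ends. In the talk this logic lives inside V8; the JavaScript below only models the control logic. The `GcBoost` class and the injected `setFrequencyMHz` actuator are hypothetical; on Linux such an actuator might write to the cpufreq `scaling_setspeed` file under the `userspace` governor, but that wiring is an assumption and is left out. The 2.6 GHz and 4.0 GHz defaults are the operating points given in the evaluation.

```javascript
// Models GC boosting: boost at the GC prologue, restore at the GC epilogue.
// `setFrequencyMHz` is an injected (hypothetical) DVFS actuator.
class GcBoost {
  constructor(setFrequencyMHz, baseMHz = 2600, boostMHz = 4000) {
    this.setFrequencyMHz = setFrequencyMHz;
    this.baseMHz = baseMHz;
    this.boostMHz = boostMHz;
    this.depth = 0; // defensive: tolerate overlapping prologue/epilogue pairs
  }

  onGcPrologue() {
    if (this.depth === 0) this.setFrequencyMHz(this.boostMHz);
    this.depth += 1;
  }

  onGcEpilogue() {
    if (this.depth === 0) return; // unmatched epilogue: ignore
    this.depth -= 1;
    if (this.depth === 0) this.setFrequencyMHz(this.baseMHz);
  }
}

// Usage with a fake actuator that just records the requested frequencies.
const requested = [];
const boost = new GcBoost((mhz) => requested.push(mhz));
boost.onGcPrologue();
boost.onGcEpilogue();
console.log(requested.join(",")); // prints 4000,2600
```

Injecting the actuator keeps the policy testable and makes it easy to swap a chip-wide setter for a per-core one, which the slide expects to yield more benefit.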

Slide 84

Optimization 2: Queue Boost
▸ More general compute acceleration: boost when the system is “busy”
▸ How do we detect that?
▸ Rely on two queue-related heuristics:
  ▹ Number of events in the queue
  ▹ Processing time of the head-of-line event
[Figure: a queue monitor watches the event queue and drives DVFS]
▸ Implementation:
  ▹ Periodic sampling: every 1 ms
  ▹ Dynamic thresholding: sample the average number of events and the average per-event processing time, then amplify each average by 2-3x to set a dynamic threshold
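The two heuristics and the dynamic threshold can be sketched as a monitor that is sampled once per period. Assumptions not stated on the slides: the averages are running means over all samples so far, the amplification factor defaults to 2.5 (inside the 2-3x range), and boosting triggers when either heuristic exceeds its threshold. `QueueMonitor` and `sample` are hypothetical names.

```javascript
// Decide whether to boost from two event-queue heuristics:
//   queueLength - number of events waiting in the queue
//   headTimeMs  - how long the head-of-line event has been processing
class QueueMonitor {
  constructor(amplify = 2.5) {
    this.amplify = amplify; // threshold = amplify x running average
    this.samples = 0;
    this.avgLength = 0;
    this.avgHeadTimeMs = 0;
  }

  // Called once per sampling period (every 1 ms on the slides). Returns true to boost.
  sample(queueLength, headTimeMs) {
    // Compare against thresholds derived from the samples seen so far.
    const boost =
      this.samples > 0 &&
      (queueLength > this.amplify * this.avgLength ||
        headTimeMs > this.amplify * this.avgHeadTimeMs);

    // Update the running means incrementally.
    this.samples += 1;
    this.avgLength += (queueLength - this.avgLength) / this.samples;
    this.avgHeadTimeMs += (headTimeMs - this.avgHeadTimeMs) / this.samples;
    return boost;
  }
}

// A burst in queue length (9 vs. an average of 2) triggers a boost.
const monitor = new QueueMonitor();
const decisions = [2, 2, 2, 9].map((len) => monitor.sample(len, 1));
console.log(decisions.join(",")); // prints false,false,false,true
```

Node.js does not expose its event-queue length to JavaScript, so in practice the inputs would come from runtime instrumentation like that of Step 1; `perf_hooks.monitorEventLoopDelay()` provides a related but different signal (event-loop delay).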

Slide 95

Evaluation
▸ Our system: normal operating frequency of 2.6 GHz, boosting to a maximum of 4.0 GHz during GC Boost and Queue Boost
▸ Variants:
  ▹ GC Boost
  ▹ GC Boost with GC parameter tuning
  ▹ Queue Boost
  ▹ GC Tuning + GC Boost + Queue Boost
▸ Baselines:
  ▹ Static frequency: 3.3 GHz and 4.0 GHz
  ▹ Adrenaline [HPCA 2015]
  ▹ Rubik [MICRO 2015]
[Figure: Etherpad Lite, tail reduction (%) vs. normalized energy for Combined, GC Boost, GC Boost+Tuning, and QBoost]

Slide 101

[Figure: tail reduction (%) vs. normalized energy for Todo, Lighter, Client Manager, and Let’s Chat, comparing Combined, GC Boost, GC Boost+Tuning, and QBoost with Adrenaline, Rubik, and static 3.3 GHz and 4.0 GHz]
Our boosting variants Pareto-dominate existing solutions: 14-21% tail reduction with only 3-14% energy overhead over the baseline.
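“Pareto-dominate” means a configuration is at least as good on both axes (higher tail reduction, lower normalized energy) and strictly better on at least one. A sketch of that check; the point format `{ name, tailReduction, energy }` is an assumption, and the actual values would come from the measurements in the plots, which are not reproduced here.

```javascript
// Each point: { name, tailReduction (%), energy (normalized) }.
// a dominates b if a is no worse on both axes and strictly better on one.
function dominates(a, b) {
  const noWorse = a.tailReduction >= b.tailReduction && a.energy <= b.energy;
  const better = a.tailReduction > b.tailReduction || a.energy < b.energy;
  return noWorse && better;
}

// Points not dominated by any other point form the Pareto front.
function paretoFront(points) {
  return points.filter((p) => !points.some((q) => dominates(q, p)));
}
```

Running `paretoFront` over all variants and baselines for one application lists the configurations that offer the best trade-offs; every baseline that falls off the front is dominated by some variant.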

Slide 104

Conclusions
▸ Node.js uniquely combines an event-driven programming model with a managed language runtime, presenting a new landscape and new challenges for tail latency optimization.
▸ The event-dependency graph (EDG) and event critical path (ECP) are critical to deconstructing tail latency in Node.js.
▸ Intelligently leveraging existing hardware features, turbo boosting in particular, reduces tail latency with little to no energy overhead.