
Tail Latency in Node.js: Energy Efficient Turbo Boosting for Long Latency Requests in Event-Driven Web Services

VEE 2019

Yuhao Zhu

April 14, 2019

Transcript

  1. Wenzhi Cui
    Daniel Richins
    Yuhao Zhu
    Vijay Janapa Reddi
    Tail Latency in Node.js: Energy Efficient Turbo Boosting for Long Tail Requests in JavaScript

  3. Connecting People (2010s)

  6. Connecting Things (2020s)
    50 Billion Devices (“The Internet of Things”, Cisco)
    www.cisco.com/web/about/ac79/docs/innov/IoT_IBSG_0411FINAL.pdf

  10. Thread-based Programming: Traditional Approach
    [Diagram labels: Client Requests; Blocking I/O; Client Response; Limited resources & thrashing]

  11. Thread-based Programming [Welsh et al. ’00]

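  18. Event-driven Programming: Emerging Approach
    Single-threaded event loop
    Application tasks feed the event queue (head to tail)
    Asynchronous I/O: DB access, file I/O, network

    const fs = require('fs');

    // Start a non-blocking read; the callback is queued as an event once the read completes.
    fs.readFile('input.txt', function (err, data) {
      if (err) return console.error(err);
      console.log(data.toString());
    });
    // Runs first, because readFile returns immediately.
    console.log("Program continues…");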

  22. Thread-based Programming and Event-driven Programming [Welsh et al. ’00]

  23. The Changing Programming Language Landscape

  26. Managed Language; Event-driven Execution Model

  28. Taming Tail Latencies in Event-Driven Web Services
    [Figure: fraction of requests vs. request latency, with the tail marked]

  29. Experimental Setup
    Intel i7-4790K, 4 physical cores with hyper-threading, 32 GB DRAM, 240 GB SSD
    1 Gbps network
    wrk2: a customized load-testing tool that simulates real-world workloads

  30. Applications
    Benchmark       I/O Type  #Requests  Description
    Etherpad Lite   N/A       20K        Real-time word processor
    Todo            Redis     40K        Online task manager
    Lighter         Disk      40K        Blogging engine
    Let’s Chat      MongoDB   10K        Web-based chat application
    Client Manager  MongoDB   40K        Online address book
    GitHub repo: https://github.com/nodebenchmark/benchmarks

  35. Etherpad: An Example
    [Figure: CDF of request latency (ms), with the tail region highlighted. Marked points: (1.85 ms, 50%), (7.30 ms, 90%), (36.91 ms, 99.9%)]

  37. [Figure: four CDF panels of request latency (ms), each with its tail region highlighted. Panel labels: Todo, Lighter, Client Manager, Let’s Chat. Marked points (latency in ms, percentile), in the order they appear on the slide:
      (0.47, 50%), (1.12, 90%), (1.80, 99.9%)
      (0.81, 50%), (1.40, 90%), (8.75, 99.9%)
      (12.52, 50%), (20.30, 90%), (39.54, 99.9%)
      (1.65, 50%), (2.66, 90%), (12.99, 99.9%)]
    Averaged over the five applications, tail latency (99.9th percentile) is 9.1x the median request latency.
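The 50% and 99.9% points on these CDFs are percentiles of the measured request latencies. As a minimal sketch of how such numbers could be derived from raw samples, the snippet below uses the nearest-rank method. The helper name `percentile` and the sample values are illustrative assumptions, not data or code from the talk.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of all
// samples are less than or equal to it. `percentile` is a hypothetical helper.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Made-up latencies in milliseconds, for illustration only.
const latenciesMs = [1, 2, 2, 3, 3, 3, 4, 5, 9, 40];
const median = percentile(latenciesMs, 50);
const tail = percentile(latenciesMs, 99.9);
console.log(`median=${median} ms, p99.9=${tail} ms, ratio=${(tail / median).toFixed(1)}x`);
```

A single slow request is enough to pull the 99.9th percentile far from the median, which is why the talk reports both.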

  38. System Overview
    Tools to root-cause and mitigate tail latency in Node.js, in three steps:
    Step 1: Tail Latency Reconstruction
    ▹ Instrumented layers: event-driven runtime (libuv), JS to C++ bindings, JavaScript runtime (V8), file/net JS libraries
    ▹ Collects event data per request (Req1, Req2, …) and builds the Event-Dependency Graph (EDG) over events e1 to e5, with its event critical path
    Step 2: Tail Latency Bottleneck Analysis
    ▹ Splits each request's latency into Exec, Queue, and I/O
    ▹ Reports IO (%), Queue (%), Exec (%), GC (%), JIT (%)
    Step 3: Tail Latency Optimization
    ▹ VM optimization: VM boosting, VM tuning
    ▹ Event queue: queue boosting
    Slide annotations: “Root-causing Tail in Node.js”, “Mitigating Tail in Node.js”, “Static Instrumentation”, “Dynamic Analysis & Optimization”

  51. Step 1: Latency Reconstruction
    const fs = require('fs');

    var count = N; // outstanding readFile callbacks
    fs.readdir("/data", function dir(err, files) {
      files.forEach(function file(f, index) {
        var fname = …;
        fs.readFile(fname, function read(err, data) {
          count -= 1;
          // The last read callback to finish sends the response.
          if (count == 0)
            sendResponse();
        });
      });
    });
    [Diagram: event graph with nodes Source, readdir, readFile1, readFile2, readFile3, …, readFileN, Sink]

  55. Event Dependency Graph (EDG)
    [Diagram: Event-Dependency Graph (EDG) over events e1 to e5, with the event critical path highlighted]
    How do we obtain the latency of each event?
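One way to read the event critical path is as the highest-latency chain of dependent events from the first event of a request to its last. The sketch below illustrates that reading on a small graph. The edges and latencies are hypothetical (the slide only names e1 to e5), and `criticalPath` is an illustrative helper, not the authors' tool.

```javascript
// Hypothetical EDG: each event has a latency (ms) and the events it triggers.
const edg = {
  e1: { latencyMs: 2, next: ["e2", "e3"] },
  e2: { latencyMs: 5, next: ["e4"] },
  e3: { latencyMs: 1, next: ["e4"] },
  e4: { latencyMs: 3, next: ["e5"] },
  e5: { latencyMs: 1, next: [] },
};

// Longest-latency path starting at `id`, memoized; assumes the graph is acyclic.
function criticalPath(graph, id, memo = new Map()) {
  if (memo.has(id)) return memo.get(id);
  let best = { totalMs: 0, path: [] };
  for (const child of graph[id].next) {
    const candidate = criticalPath(graph, child, memo);
    if (candidate.totalMs > best.totalMs) best = candidate;
  }
  const result = {
    totalMs: graph[id].latencyMs + best.totalMs,
    path: [id, ...best.path],
  };
  memo.set(id, result);
  return result;
}

const ecp = criticalPath(edg, "e1");
console.log(ecp.path.join(" -> "), `${ecp.totalMs} ms`);
```

Events off the critical path (e3 in this made-up graph) can get slower without changing the request latency, so analysis and optimization can concentrate on the events that are on it.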

  59. Deconstructing Event Latency
    Server-side latency: Queue, Exec., I/O
    Exec. (compute time): Node.js runtime, user code
    Node.js runtime: GC, interrupt, IC miss, JIT engine

  61. Offline Instrumentation + Online Analysis
    Offline: instrument the Node.js runtime (event-driven runtime libuv, JS to C++ bindings, JavaScript runtime V8, file/net JS libraries) so that at run time we can easily obtain the EDG and event latency information.
    Online: identify the root causes of long tails at run time by splitting each request's latency (Req1, Req2, …) into Exec, Queue, and I/O and reporting the bottleneck breakdown: IO (%), Queue (%), Exec (%), GC (%), JIT (%).
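Once each event on a request's critical path carries its queue, execution, and I/O time, the bottleneck breakdown is a matter of summing and normalizing. A minimal sketch follows, assuming a hypothetical record shape (`queueMs`, `execMs`, `ioMs`) and made-up timings. The instrumentation that would produce such records is not shown.

```javascript
// Per-event timings along one request's critical path (hypothetical values, ms).
const criticalPathEvents = [
  { queueMs: 4, execMs: 3, ioMs: 1 },
  { queueMs: 6, execMs: 5, ioMs: 1 },
];

// Sum each component and express it as a share of the total request latency.
function breakdown(events) {
  const totals = { queueMs: 0, execMs: 0, ioMs: 0 };
  for (const ev of events) {
    totals.queueMs += ev.queueMs;
    totals.execMs += ev.execMs;
    totals.ioMs += ev.ioMs;
  }
  const totalMs = totals.queueMs + totals.execMs + totals.ioMs;
  const pct = (ms) => (totalMs === 0 ? 0 : (100 * ms) / totalMs);
  return {
    totalMs,
    queuePct: pct(totals.queueMs),
    execPct: pct(totals.execMs),
    ioPct: pct(totals.ioMs),
  };
}

console.log(breakdown(criticalPathEvents));
```

Splitting the Exec share further into GC and JIT time would need additional per-event fields from the runtime, which this sketch leaves out.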

  65. Step 2: EDG-based Bottleneck Analysis
    [Figure: latency (ms), broken down into IO, Queue, and Exec, for request types A to D and their average; non-tail (n) and tail (t) requests shown side by side]
    Etherpad: Queue and Exec. latency increase in the tail, and I/O latency is not dominant for this particular application.

  68. EDG-based Bottleneck Analysis
    [Figure: latency (ms), broken down into IO, Queue, and Exec, for request types V, W, and X and their average; non-tail (n) and tail (t) requests shown side by side]
    Client Manager: Queue and Exec. latency dominate in the tail, but unlike Etherpad, I/O plays a notable role in the requests.
    On average, queuing and native code execution time account for ~80% of tail latency.

  75. Breakdown Within Compute
    [Figure: breakdown of compute time]
    We should focus optimization efforts on garbage collection and generated native code.

  76. Step 3: Tail Latency Optimization
    ▸ Leverage the turbo boosting capability of modern CPUs (Intel Turbo Boost)
    ▸ Key: wisely choose what to boost
    ▸ GC Boosting
    ▹ Boost during garbage collection
    ▸ Queue Boosting
    ▹ Boost when the system is “busy”
    ▹ Use event queue stats as “hints”
    [Diagram: Tail Latency Optimization, split into VM optimization (VM boosting, VM tuning) and the event queue (queue boosting)]

  79. Optimization 1: VM Optimization (GC Boost)
    ▸ Observations:
    ▹ GCs are infrequent, so boosting them adds little overall energy overhead
    ▹ IPC during GC is relatively high: ~1.3
    ▸ Implementation:
    ▹ User-space DVFS embedded in Google V8: enter boosting at GC prologues and exit boosting at GC epilogues
    ▹ More benefit if a fine-grained, per-core DVFS mechanism is available
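The prologue/epilogue idea can be modeled in a few lines. The talk's version lives inside V8, so the following is only a JavaScript sketch of the control logic. `GcBooster` and `setFrequencyGHz` are hypothetical names, the frequency setter just records its argument instead of touching real hardware, and the 2.6 GHz and 4.0 GHz defaults are the operating points quoted in the evaluation.

```javascript
// Hypothetical stand-in for a user-space DVFS call; a real system would
// talk to the OS frequency governor instead of recording to an array.
const frequencyLog = [];
function setFrequencyGHz(ghz) {
  frequencyLog.push(ghz);
}

class GcBooster {
  constructor(setFrequency, normalGHz = 2.6, boostGHz = 4.0) {
    this.setFrequency = setFrequency;
    this.normalGHz = normalGHz;
    this.boostGHz = boostGHz;
    this.activeGcs = 0; // guards against nested prologue/epilogue pairs
  }

  // Called when a garbage collection is about to start.
  onGcPrologue() {
    if (this.activeGcs++ === 0) this.setFrequency(this.boostGHz);
  }

  // Called when a garbage collection has finished.
  onGcEpilogue() {
    if (this.activeGcs > 0 && --this.activeGcs === 0) {
      this.setFrequency(this.normalGHz);
    }
  }
}

const booster = new GcBooster(setFrequencyGHz);
booster.onGcPrologue(); // GC starts: raise the frequency
booster.onGcEpilogue(); // GC ends: return to the normal frequency
console.log(frequencyLog);
```

The counter keeps the frequency boosted until the outermost collection finishes, which is one simple way to handle overlapping notifications. Whether the original implementation needs that guard is not stated on the slides.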

  84. Optimization 2: Queue Boost
    ▸ More general compute acceleration: boost when the system is “busy”
    ▸ How do you detect that?
    ▸ Rely on two queue-related heuristics:
    ▹ # of events in the queue
    ▹ Processing time of the head-of-line event
    ▸ Implementation:
    ▹ Periodic sampling: every 1 ms
    ▹ Dynamic thresholding: sample the average number of events and the average per-event processing time, then amplify each average by 2-3x to obtain a dynamic threshold
    [Diagram labels: Queue, Monitor, DVFS]
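The two heuristics and the dynamic threshold can be sketched as a small monitor that is fed one sample per period. This is an illustration under stated assumptions, not the authors' implementation: the class name `QueueBoostMonitor` is made up, the averages are plain cumulative means, and the two heuristics are combined with a logical OR, none of which the slides specify.

```javascript
class QueueBoostMonitor {
  constructor(amplification = 2) {
    this.amplification = amplification; // the slide suggests 2-3x
    this.samples = 0;
    this.avgQueueLength = 0;
    this.avgHeadMs = 0;
  }

  // Called once per sampling period (every 1 ms in the talk).
  // Returns true when the system looks "busy" and should be boosted.
  sample(queueLength, headProcessingMs) {
    const busy =
      this.samples > 0 &&
      (queueLength > this.amplification * this.avgQueueLength ||
        headProcessingMs > this.amplification * this.avgHeadMs);
    // Update the running averages after the decision, so the current
    // sample is compared against history only.
    this.samples += 1;
    this.avgQueueLength += (queueLength - this.avgQueueLength) / this.samples;
    this.avgHeadMs += (headProcessingMs - this.avgHeadMs) / this.samples;
    return busy;
  }
}

const monitor = new QueueBoostMonitor(2);
const decisions = [
  monitor.sample(4, 1.0),
  monitor.sample(4, 1.0),
  monitor.sample(12, 1.0), // queue length jumps above 2x the running average
];
console.log(decisions);
```

Because the thresholds follow the observed averages, the same monitor adapts to applications with very different baseline queue lengths, which a fixed threshold would not.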

  95. Evaluation
    ▸ Our system: normal operating frequency at 2.6 GHz, boosts to max 4.0 GHz during GC boost and Queue boost
    ▸ Different Variants:
    ▹ GC Boost
    ▹ GC Boost with GC Parameter Tuning
    ▹ Queue Boost
    ▹ GC Tuning + GC Boost + Queue Boost
    ▸ Baseline
    ▸ Static Frequency: 3.3 GHz and 4.0 GHz
    ▸ Adrenaline [HPCA 2015]
    ▸ Rubik [MICRO 2015]
    [Figure: Etherpad Lite, normalized energy vs. tail reduction (%) for Combined, GC Boost, GC Boost+Tuning, and QBoost]

  101. [Figure: normalized energy vs. tail reduction (%), one panel each for Todo, Lighter, Client Manager, and Let’s Chat. Series: Combined, GC Boost, GC Boost+Tuning, QBoost, Adrenaline, Rubik, 3.3 GHz, 4.0 GHz]
    Our approach Pareto-dominates existing solutions: 14-21% tail reduction with only 3-14% energy overhead over baseline.
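For readers unfamiliar with the term, one configuration Pareto-dominates another when it is no worse on every metric and strictly better on at least one. Here the metrics are normalized energy (lower is better) and tail reduction (higher is better). The sketch below spells that out. The function name `dominates` and both data points are hypothetical and are not measurements from the talk.

```javascript
// A configuration is described by its normalized energy (lower is better)
// and its tail latency reduction in percent (higher is better).
function dominates(a, b) {
  const noWorse =
    a.energy <= b.energy && a.tailReductionPct >= b.tailReductionPct;
  const strictlyBetter =
    a.energy < b.energy || a.tailReductionPct > b.tailReductionPct;
  return noWorse && strictlyBetter;
}

// Hypothetical points for illustration only.
const candidate = { energy: 1.1, tailReductionPct: 20 };
const alternative = { energy: 1.8, tailReductionPct: 15 };
console.log(dominates(candidate, alternative));
console.log(dominates(alternative, candidate));
```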

  104. Conclusions
    Node.js uniquely combines an event-driven programming model with a managed-language runtime, presenting a new landscape and new challenges for tail latency optimization.
    The event-dependency graph (EDG) and event critical path (ECP) are critical to deconstructing tail latency in Node.js.
    Intelligently leveraging existing hardware features, turbo boosting in particular, reduces tail latency with little to no energy overhead.