Tail Latency in Node.js: Energy Efficient Turbo Boosting for Long Latency Requests in Event-Driven Web Services

VEE 2019

Yuhao Zhu

April 14, 2019

Transcript

  1. Wenzhi Cui
    Daniel Richins
    Yuhao Zhu
    Vijay Janapa Reddi
    Tail Latency in Node.js:
    Energy Efficient Turbo Boosting for Long Tail
    Requests in JavaScript

  2. Connecting People (2010s)

  4. Connecting Things (2020s)
    50 Billion Devices
    “The Internet of Things” (Cisco)
    www.cisco.com/web/about/ac79/docs/innov/IoT_IBSG_0411FINAL.pdf

  6. Thread-based Programming: Traditional Approach
    Client Requests
    Blocking I/O
    Client Response
    Limited resources & thrashing

  10. Thread-based Programming
    [Welsh et al. ’00]

  11. Event-driven Programming: Emerging Approach
    Single-threaded event loop
    Event Queue (Head, Tail)
    Application Tasks
    Asynchronous I/O: DB Access, File I/O, Network

    var fs = require('fs');
    // The callback runs later, once the read completes.
    fs.readFile('input.txt', function (err, data) {
      if (err) return console.error(err);
      console.log(data.toString());
    });
    // This line runs first, because readFile does not block.
    console.log("Program continues…");
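
The queue-with-head-and-tail picture on this slide can be modeled in a few lines. This is a minimal sketch, not the actual Node.js or libuv event loop: `EventQueue`, `enqueue`, `run`, and `readFileAsync` are hypothetical names, and I/O completion is simulated by queuing the callback directly.

```javascript
// Toy model of a single-threaded event loop (hypothetical names, not libuv).
class EventQueue {
  constructor() {
    this.events = []; // index 0 is the head, the last index is the tail
  }

  // Append a callback at the tail of the queue.
  enqueue(callback) {
    this.events.push(callback);
  }

  // Drain the queue: run the head event to completion before the next one.
  run() {
    while (this.events.length > 0) {
      const head = this.events.shift();
      head();
    }
  }
}

const log = [];
const queue = new EventQueue();

// Simulated asynchronous read: the callback is queued, not run immediately.
function readFileAsync(name, callback) {
  queue.enqueue(() => callback(null, `contents of ${name}`));
}

readFileAsync('input.txt', (err, data) => log.push(data));
log.push('Program continues');
queue.run();
```

Because the callback only runs when the loop reaches it, the main program's entry lands in `log` before the simulated file contents, mirroring the ordering in the slide's `fs.readFile` example.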

  21. Thread-based Programming
    Event-driven Programming
    [Welsh et al. ’00]

  22. The Changing Programming Language Landscape

  25. Managed Language
    Event-driven Execution Model

  26. Taming Tail Latencies in Event-Driven Web Services
    [Figure: fraction of requests vs. request latency, with the tail marked]

  27. Experimental Setup
    Intel i7-4790K, 4 physical cores with hyper-threading, 32 GB DRAM, 240 GB SSD
    (1 Gbps Network)
    Wrk2: a customized load-testing tool that simulates real-world workloads

  28. Applications
    Benchmark        I/O Type   #Requests   Description
    Etherpad Lite    N/A        20K         Real-time word processor
    Todo             Redis      40K         Online task manager
    Lighter          Disk       40K         Blogging engine
    Let’s Chat       MongoDB    10K         Web-based chat application
    Client Manager   MongoDB    40K         Online address book
    GitHub repo: https://github.com/nodebenchmark/benchmarks

  29. Etherpad: An Example
    [Figure: CDF of request latency (ms), with the tail region marked]
    Labeled points: (1.85, 50%), (7.30, 90%), (36.91, 99.9%)

  34. [Figure: four CDFs of request latency (ms), each with the tail region marked.
    Panel titles: Todo, Lighter, Client Manager, Let’s Chat]
    Labeled points, one set per panel, in the order extracted:
    (0.47, 50%), (1.12, 90%), (1.80, 99.9%)
    (0.81, 50%), (1.40, 90%), (8.75, 99.9%)
    (12.52, 50%), (20.30, 90%), (39.54, 99.9%)
    (1.65, 50%), (2.66, 90%), (12.99, 99.9%)
    Tail latency (99.9%) is 9.1x longer than median request latency
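
The tail-to-median comparison on this slide can be computed from raw latency samples roughly as follows. This is a sketch under stated assumptions: it uses the nearest-rank percentile definition, which the talk does not specify, and `percentile`, `tailToMedianRatio`, and the sample data are hypothetical.

```javascript
// Nearest-rank percentile over a list of latencies in milliseconds.
function percentile(latencies, p) {
  if (latencies.length === 0) throw new Error('no samples');
  const sorted = [...latencies].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.min(Math.max(rank, 1), sorted.length) - 1];
}

// Ratio of tail latency (99.9th percentile by default) to the median.
function tailToMedianRatio(latencies, tailPercentile = 99.9) {
  return percentile(latencies, tailPercentile) / percentile(latencies, 50);
}

// Synthetic data for illustration only: 990 fast requests and 10 slow ones.
const samples = Array.from({ length: 990 }, () => 2).concat(
  Array.from({ length: 10 }, () => 40)
);
const ratio = tailToMedianRatio(samples);
```

With these synthetic samples the median is 2 ms and the 99.9th percentile is 40 ms, so `ratio` is 20.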

  36. System Overview
    Step 1: Tail Latency Reconstruction
    Node.js stack: File/Net JS libraries, JS to C++ bindings, JavaScript runtime (V8), event-driven runtime (libuv)
    Event data for Req1, Req2, … yields the Event-Dependency Graph (EDG) over events e1, e2, e3, e4, e5 and the event critical path
    Step 2: Tail Latency Bottleneck Analysis
    Request latency of Req1, Req2 split into Exec, Queue, I/O
    Breakdown: IO (%), Queue (%), Exec (%), GC (%), JIT (%)
    Step 3: Tail Latency Optimization
    VM Optimization: VM Boosting, VM Tuning
    Event Queue: Queue Boosting
    Annotations across the builds: Tools to Root-cause Tail in Node.js; Root-causing Tail in Node.js; Mitigating Tail in Node.js; Static Instrumentation; Dynamic Analysis & Optimization

  45. Step 1: Latency Reconstruction
    var count = N;
    fs.readdir("/data", function dir(err, files) {
      files.forEach(function file(f, index) {
        var fname = …;
        fs.readFile(fname, function read(err, data) {
          count -= 1;
          if (count == 0)
            sendResponse();
        });
      });
    });
    EDG: Source -> readdir -> readFile1, readFile2, readFile3, …, readFileN -> Sink

  50. Event Dependency Graph (EDG)
    [Figure: Event-Dependency Graph (EDG) over events e1, e2, e3, e4, e5, with the event critical path highlighted]
    How do we obtain the latency of each event?
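
Once each event has a latency, the event critical path is the longest-latency chain of dependent events in the EDG. The following is a minimal sketch of that idea, not the authors' implementation: the graph shape, the latency values, and the names `events` and `criticalPath` are all made up for illustration.

```javascript
// Each event records its own latency and the events it depends on.
// The graph and the numbers are hypothetical.
const events = {
  e1: { latency: 2, deps: [] },
  e2: { latency: 5, deps: ['e1'] },
  e3: { latency: 1, deps: ['e1'] },
  e4: { latency: 4, deps: ['e2', 'e3'] },
  e5: { latency: 3, deps: ['e3'] },
};

// Longest-latency path ending at any event, found with a memoized DFS.
// Assumes the graph is acyclic (edges point from an event to the later
// events it triggers).
function criticalPath(graph) {
  const best = new Map(); // event id -> { total, path }

  function visit(id) {
    if (best.has(id)) return best.get(id);
    let longest = { total: 0, path: [] };
    for (const dep of graph[id].deps) {
      const candidate = visit(dep);
      if (candidate.total > longest.total) longest = candidate;
    }
    const result = {
      total: longest.total + graph[id].latency,
      path: [...longest.path, id],
    };
    best.set(id, result);
    return result;
  }

  let overall = { total: 0, path: [] };
  for (const id of Object.keys(graph)) {
    const result = visit(id);
    if (result.total > overall.total) overall = result;
  }
  return overall;
}

const critical = criticalPath(events);
```

For this made-up graph the critical path is e1, e2, e4 with a total latency of 11, so shortening e3 or e5 would not reduce the request latency.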

  54. Deconstructing Event Latency
    Server-side Latency: Queue, Exec., I/O
    Compute Time: Node.js Runtime, User Code
    Node.js Runtime: GC, Interrupt, IC Miss, JIT Engine

  58. Offline Instrumentation + Online Analysis
    Instrument the Node.js runtime (File/Net JS libraries, JS to C++ bindings, JavaScript runtime (V8), event-driven runtime (libuv)) so that at runtime we can easily obtain the EDG and event latency information.
    Identify root causes of long tails at runtime: split the request latency of Req1, Req2, … into Exec, Queue, and I/O, then run tail latency bottleneck analysis over IO (%), Queue (%), Exec (%), GC (%), JIT (%).

  60. Step 2: EDG-based Bottleneck Analysis
    [Figure: latency (ms) by request type, split into IO, Queue, and Exec, for non-tail requests (A n, B n, C n, D n, Avg n) and tail requests (A t, B t, C t, D t, Avg t)]
    Etherpad: Queue and Exec. latencies increase in tails, and I/O latency is not dominant for this particular application.

  64. EDG-based Bottleneck Analysis
    [Figure: latency (ms) by request type, split into IO, Queue, and Exec, for non-tail requests (V n, W n, X n, Avg n) and tail requests (V t, W t, X t, Avg t)]
    Client Manager: Queue and Exec. latencies dominate in tails, but unlike Etherpad, I/O plays a notable role in the requests.
    On average, queuing and native code execution time account for ~80% of the tail latencies.

  67. Breakdown Within Compute
    Compute Time
    We should focus optimization efforts on Garbage Collection and Generated Native Code

  74. Step 3: Tail Latency Optimization
    ▸ Leveraging the turbo boosting capability of modern CPUs (Intel Turbo Boosting)
    ▸ Key: wisely choose what to boost
    ▸ GC Boosting
    ▹ Boost GC
    ▸ Queue Boosting
    ▹ Boost when the system is “busy”
    ▹ Use event queue stats as “hints”
    [Figure: Tail Latency Optimization. VM Optimization: VM Boosting, VM Tuning. Event Queue: Queue Boosting]

  75. Optimization 1: VM Optimization (GC Boost)
    ▸ Observations:
    ▹ GCs are infrequent, so boosting them adds little overall energy overhead
    ▹ IPC during GC is relatively high: ~1.3
    ▸ Implementation:
    ▹ User-space DVFS embedded in Google V8: enter boosting at GC prologues and exit boosting at GC epilogues
    ▹ More benefits if we have access to a fine-grained per-core DVFS mechanism
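
The enter-at-prologue, exit-at-epilogue control flow can be modeled as below. This is only a sketch in JavaScript of the idea: the talk embeds the mechanism inside V8 itself, and `GcBooster`, `onGcPrologue`, `onGcEpilogue`, and `setFrequency` are hypothetical names. The two frequencies are the ones quoted in the Evaluation slide; no real DVFS interface is called.

```javascript
// Model of GC boosting: raise the frequency only for the duration of a GC.
// setFrequency is a hypothetical stand-in for a user-space DVFS call.
const NORMAL_GHZ = 2.6; // normal operating frequency in the evaluation setup
const BOOST_GHZ = 4.0; // maximum boost frequency in the evaluation setup

class GcBooster {
  constructor(setFrequency) {
    this.setFrequency = setFrequency;
    this.depth = 0; // guards against nested prologue/epilogue pairs
  }

  // Called when a garbage collection starts.
  onGcPrologue() {
    if (this.depth === 0) this.setFrequency(BOOST_GHZ);
    this.depth += 1;
  }

  // Called when a garbage collection finishes.
  onGcEpilogue() {
    if (this.depth === 0) return; // unmatched epilogue: nothing to undo
    this.depth -= 1;
    if (this.depth === 0) this.setFrequency(NORMAL_GHZ);
  }
}

// Record frequency changes instead of touching real hardware.
const frequencyLog = [];
const booster = new GcBooster((ghz) => frequencyLog.push(ghz));

booster.onGcPrologue();
// ... the collector would run here at the boosted frequency ...
booster.onGcEpilogue();
```

The depth counter is a design choice of this sketch so that an inner prologue/epilogue pair does not drop the frequency while an outer collection is still running.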

  78. Optimization 2: Queue Boost
    ▸ More general compute acceleration: Boost when the system is “busy”
    ▸ How do you detect that?
    ▸ Rely on two queue-related heuristics:
    ▹ # of events in the queue
    ▹ Processing time of the head-of-line event
    ▸ Implementation:
    ▹ Periodic sampling: every 1 ms
    ▹ Dynamic thresholding
    ▹ Sample the average number of events and the average per-event processing time
    ▹ Amplify each average by 2-3x to obtain a dynamic threshold
    [Figure: Queue Monitor driving DVFS]
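
The dynamic-thresholding heuristic above might look like the following. This is a sketch under assumptions the slide does not settle: it boosts when either heuristic exceeds its threshold, it keeps a running average over all samples seen so far, and `QueueMonitor` and `sample` are hypothetical names. The 1 ms timer and the actual DVFS call are left out.

```javascript
// Model of queue boosting with dynamic thresholds (hypothetical names).
class QueueMonitor {
  constructor(amplification = 2) {
    this.amplification = amplification; // the slide suggests 2-3x
    this.samples = 0;
    this.avgQueueLength = 0;
    this.avgProcessingMs = 0;
  }

  // Feed one periodic sample and return true when the system looks
  // "busy" enough to boost.
  sample(queueLength, headProcessingMs) {
    // Compare against thresholds derived from the averages seen so far.
    const busy =
      this.samples > 0 &&
      (queueLength > this.amplification * this.avgQueueLength ||
        headProcessingMs > this.amplification * this.avgProcessingMs);

    // Update the running averages incrementally.
    this.samples += 1;
    this.avgQueueLength += (queueLength - this.avgQueueLength) / this.samples;
    this.avgProcessingMs +=
      (headProcessingMs - this.avgProcessingMs) / this.samples;
    return busy;
  }
}

const monitor = new QueueMonitor(2);
const decisions = [
  monitor.sample(4, 1.0), // first sample: no history yet
  monitor.sample(5, 1.2), // close to the average
  monitor.sample(20, 1.1), // queue length spikes
  monitor.sample(4, 9.0), // head-of-line event runs long
];
```

Deriving the thresholds from observed averages, rather than fixing them, lets the same monitor adapt to applications whose normal queue depth and event cost differ.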

  83. Evaluation
    ▸ Our system: normal operating frequency at 2.6 GHz, boosts to max 4.0 GHz during GC boost and Queue boost
    ▸ Different Variants:
    ▹ GC Boost
    ▹ GC Boost with GC Parameter Tuning
    ▹ Queue Boost
    ▹ GC Tuning + GC Boost + Queue Boost
    ▸ Baseline
    ▸ Static Frequency: 3.3 GHz and 4.0 GHz
    ▸ Adrenaline [HPCA 2015]
    ▸ Rubik [MICRO 2015]
    [Figure: Etherpad Lite, normalized energy vs. tail reduction (%) for Combined, GC Boost, GC Boost+Tuning, and QBoost]

  97. [Figure: normalized energy vs. tail reduction (%) for Todo, Lighter, Client Manager, and Let’s Chat, comparing Combined, GC Boost, GC Boost+Tuning, QBoost, Adrenaline, Rubik, 3.3 GHz, and 4.0 GHz]
    Pareto-dominates existing solutions: 14-21% tail reduction with only 3-14% energy overhead over baseline.
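
Pareto dominance in this energy vs. tail-reduction plane means being no worse on both axes and strictly better on at least one. The following sketch makes that check concrete; `dominates`, `paretoFront`, and every number in it are hypothetical and are not results from the talk.

```javascript
// A configuration is a point with a tail reduction (higher is better)
// and a normalized energy (lower is better).
function dominates(a, b) {
  const noWorse =
    a.tailReductionPct >= b.tailReductionPct && a.normEnergy <= b.normEnergy;
  const strictlyBetter =
    a.tailReductionPct > b.tailReductionPct || a.normEnergy < b.normEnergy;
  return noWorse && strictlyBetter;
}

// Keep only the points that no other point dominates.
function paretoFront(points) {
  return points.filter((p) => !points.some((q) => dominates(q, p)));
}

// Made-up numbers for illustration only.
const configs = [
  { name: 'A', tailReductionPct: 20, normEnergy: 1.1 },
  { name: 'B', tailReductionPct: 15, normEnergy: 1.5 },
  { name: 'C', tailReductionPct: 25, normEnergy: 2.0 },
];
const front = paretoFront(configs).map((p) => p.name);
```

Here A dominates B (more tail reduction for less energy), while A and C trade off against each other, so the front is A and C.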

  100. Conclusions
    Node.js uniquely combines an event-driven programming model with a managed language runtime, presenting a new landscape and new challenges for tail latency optimization.
    The event-dependency graph (EDG) and event critical path (ECP) are critical to deconstructing tail latency in Node.js.
    Intelligently leverage existing hardware features, turbo boosting in particular, to reduce latency with little to no energy overhead.