Microarchitectural Implications of Event-driven Server-side Web Applications

Yuhao Zhu
December 09, 2015

MICRO 2015

Transcript

  1. MICRO 2015
    Microarchitectural Implications of Event-driven Server-side Web Applications
    Yuhao Zhu, UT Austin
    with Daniel Richins, Matthew Halpern, Vijay Janapa Reddi

  2. Instruction Supply is a Critical Aspect of Microarchitecture Design

  3. Exploit Instruction Locality (a.k.a. Common Case Design)
    Instruction supply structures and the locality each one exploits:
      Cache: hot instructions
      Branch Predictor: hot branch history patterns
      TLB: hot pages
    Workloads: SPEC CPU (mostly); Event-driven Applications

  9. Locality Lost
    Workloads: SPEC CPU (mostly); Event-driven Applications
    Instruction supply: Cache, Branch Predictor, TLB
    Locality exploited: hot instructions, hot branch history patterns, hot pages

  10. Locality Lost
    Workloads: SPEC CPU (mostly); Event-driven Applications
    Instruction supply: Cache, Branch Predictor, TLB
    Program properties: tight loops, hot pages, little branch aliasing

  12. Event-driven Execution Model
    Diagram labels: Client Request; Event Queue (Head, Tail); Single-threaded Event Loop; DB Access; File I/O; Network; Our Focus
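
The event-driven execution model on this slide can be illustrated with a minimal sketch. It assumes that events are dequeued from the head of the queue by one thread, and that a handler may enqueue follow-up events (for example, the completion of a DB access) at the tail. The handler names `on_request` and `on_db_done` and the event tuples are hypothetical and are not taken from the talk or from Node.js.

```python
from collections import deque


def run_event_loop(initial_events, handlers):
    """Drain the queue on a single thread, one event at a time."""
    queue = deque(initial_events)  # left end is the head, right end is the tail
    log = []
    while queue:
        name, payload = queue.popleft()          # dequeue from the head
        follow_ups = handlers[name](payload, log)
        queue.extend(follow_ups)                 # enqueue new events at the tail
    return log


def on_request(payload, log):
    # Hypothetical handler: record the request and pretend that an
    # asynchronous DB access later completes as a separate event.
    log.append(f"request {payload}")
    return [("db_done", payload)]


def on_db_done(payload, log):
    # Hypothetical handler: the DB result is ready, so respond.
    log.append(f"respond {payload}")
    return []


handlers = {"request": on_request, "db_done": on_db_done}
log = run_event_loop([("request", 1), ("request", 2)], handlers)
print(log)
```

Because the follow-up events go to the tail, the handlers of different client requests interleave, so consecutive events need not execute the same code.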

  21. Applications
    Application      Domain
    Etherpad Lite    Document Collaboration
    Let’s Chat       Messaging
    Lighter          Content Management
    Mud              Gaming
    Todo             Task Management
    Word Finder      API Services
    https://github.com/nodebenchmark/

  22. Locality Lost
    [Figure: L1 I-Cache MPKI (axis 0 to 120) for SPEC CPU 2006 (lbm, leslie3d, libquantum, astar, hmmer, mcf, bzip2, namd, gromacs, zeusmp, calculix, GemsFDTD, soplex, sphinx3, bwaves, milc, dealII, wrf, h264ref, gamess, tonto, xalancbmk, povray, perlbench, sjeng, gcc, gobmk, omnetpp, cactusADM) and NodeJS (Word Finder, Todo, Mud, Etherpad, Let's Chat, Lighter), with the SPEC CPU 2006 average marked]
    I-Cache Parameters: 32 KB, 64 B cache line, 8-way
    Node.js has 4.2× higher MPKI than SPEC CPU.
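
As a small worked example of the metric on this slide, the sketch below computes MPKI (misses per kilo-instruction) and the number of sets implied by the stated I-cache parameters. The miss and instruction counts are made-up inputs chosen for illustration and are not measurements from the talk.

```python
def mpki(misses, instructions):
    """Misses per kilo-instruction: misses per 1000 executed instructions."""
    return misses * 1000 / instructions


def num_sets(cache_bytes, line_bytes, ways):
    """Number of sets in a set-associative cache."""
    return cache_bytes // (line_bytes * ways)


# I-cache parameters from the slide: 32 KB, 64 B cache line, 8-way.
sets = num_sets(32 * 1024, 64, 8)

# Hypothetical counter values, for illustration only.
example = mpki(misses=4_200, instructions=100_000)

print(sets, example)
```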

  28. Root Cause Analysis
    Diagram labels: High I-$ Miss Ratio; Large Instruction Reuse-distance
    [Figure: up to four panels of Dynamic Instructions (%) (0 to 100) vs. Reuse Distance (log scale, 2^0 to 2^16); panel labels shown: omnetpp (worst), lbm (best), Let’s Chat]
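
The slide does not define reuse distance. The sketch below assumes the common stack-distance definition: the number of distinct cache lines touched between two consecutive accesses to the same line. The trace is a toy example, and the list-based implementation is chosen for clarity rather than speed.

```python
def reuse_distances(trace):
    """Return the reuse distance of every access (None for a first access)."""
    stack = []       # most recently used line is at the end
    distances = []
    for line in trace:
        if line in stack:
            idx = stack.index(line)
            # Lines above idx were touched since the last access to `line`.
            distances.append(len(stack) - 1 - idx)
            stack.pop(idx)
        else:
            distances.append(None)
        stack.append(line)
    return distances


toy_trace = ["a", "b", "c", "a", "b", "b"]
result = reuse_distances(toy_trace)
print(result)
```

Under this definition, an access hits in a fully associative LRU cache of N lines only if its reuse distance is smaller than N, which is why large reuse distances translate into a high miss ratio.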

  35. Root Cause Analysis
    Instruction stream: Event 1, Event 2, Event 3
    Diagram labels: High I-$ Miss Ratio; Large Instruction Reuse-distance; Inter-event Code Reuse; Large Event Footprint

  40. Root Cause Analysis
    [Figure: multiple panels of Instructions (%) or Static Instructions (%) (0 to 100) vs. # of reuses (32 to 256 in steps of 32, then > 256), split into Intra-event and Inter-event]
    Most instruction reuses are inter-event.
    Diagram labels: High I-$ Miss Ratio; Large Instruction Reuse-distance; Inter-event Code Reuse; Large Event Footprint
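
One way to separate the two kinds of reuse is sketched below. It assumes a trace of `(event_id, line)` pairs and calls a reuse intra-event when the previous access to the same line happened in the same event, and inter-event otherwise. The trace format and the function name are assumptions made for illustration.

```python
def classify_reuses(trace):
    """Count reuses whose previous access was in the same or another event."""
    last_event = {}                       # line -> event id of its last access
    counts = {"intra": 0, "inter": 0}
    for event_id, line in trace:
        if line in last_event:
            kind = "intra" if last_event[line] == event_id else "inter"
            counts[kind] += 1
        last_event[line] = event_id
    return counts


toy_trace = [(1, "a"), (1, "b"), (1, "a"), (2, "a"), (2, "b"), (3, "b")]
counts = classify_reuses(toy_trace)
print(counts)
```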

  46. Root Cause Analysis
    [Figure: Events (%) (0 to 100) vs. Event Footprint (Bytes, 2^13 to 2^21) for Word Finder, Todo, Mud, Etherpad, Let's Chat, Lighter, with the 32 KB L1 I-$ size marked]
    Only 12% of events fit in a 32 KB I-$.
    Most events’ footprints do not fit in a typical I-$.
    Diagram labels: High I-$ Miss Ratio; Large Instruction Reuse-distance; Inter-event Code Reuse; Large Event Footprint
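
The footprint comparison can be sketched as follows, assuming that an event's footprint is the number of distinct 64 B cache lines its instruction fetches touch, expressed in bytes. The address ranges below are synthetic and are not data from the benchmarked applications.

```python
LINE_BYTES = 64
ICACHE_BYTES = 32 * 1024


def event_footprint_bytes(addresses):
    """Bytes covered by the distinct cache lines touched by one event."""
    return len({addr // LINE_BYTES for addr in addresses}) * LINE_BYTES


def fraction_fitting(events, capacity=ICACHE_BYTES):
    """Fraction of events whose footprint fits in the given capacity."""
    fits = sum(1 for addrs in events if event_footprint_bytes(addrs) <= capacity)
    return fits / len(events)


# Synthetic events: instruction addresses at 4-byte granularity.
small = range(0, 8 * 1024, 4)      # touches 8 KB of code
large = range(0, 128 * 1024, 4)    # touches 128 KB of code
events = [small, large, large, large]

fraction = fraction_fitting(events)
print(event_footprint_bytes(small), event_footprint_bytes(large), fraction)
```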

  54. Root Cause Analysis
    Diagram labels: Event-driven Applications; Few Event Types; Minimal Tight Loops; Inter-event Code Reuse; Large Event Footprint; Large Instruction Reuse-distance; High I-$ Miss Ratio
    Label groups: Application Characteristics; Microarchitecture Behaviors

  60. Can we better capture instruction locality to improve instruction supply efficiency?

  61. Scale Up Hardware Resources?
    [Figure: I-Cache MPKI (0 to 120) vs. I-Cache size (KB: 16, 64, 256, 1024) for Let's Chat, Word Finder, Todo, Mud, Etherpad, Lighter, with the SPEC CPU 2006 average at 32 KB marked; a further axis is labeled Norm. Time]
    ~256 KB Needed!
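
A rough illustration of why a cache must grow to cover the reused footprint before misses drop: the sketch below sweeps the capacity of a fully associative LRU cache over a synthetic cyclic trace. Full associativity, the 48 KB footprint, and the four passes are all assumptions; this is not the simulation setup used in the talk.

```python
from collections import OrderedDict

LINE_BYTES = 64


def lru_misses(trace, capacity_lines):
    """Count misses of a fully associative LRU cache over a line trace."""
    cache = OrderedDict()              # oldest entry first, newest last
    misses = 0
    for line in trace:
        if line in cache:
            cache.move_to_end(line)    # refresh recency on a hit
        else:
            misses += 1
            if len(cache) >= capacity_lines:
                cache.popitem(last=False)   # evict the least recently used
            cache[line] = True
    return misses


# Synthetic workload: 48 KB of code (768 lines) executed 4 times in a row.
footprint_lines = 48 * 1024 // LINE_BYTES
trace = list(range(footprint_lines)) * 4

sweep = {kb: lru_misses(trace, kb * 1024 // LINE_BYTES) for kb in (16, 32, 64, 256)}
print(sweep)
```

In this toy trace, every capacity below the footprint misses on every access, and the miss count falls to the compulsory misses only once the whole footprint fits.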

  69. Exploit Inter-Event Locality
    Instruction stream: Event 1, Event 2, Event 3, …
    1. Retain the reused portion of an event’s footprint in the cache
    2. Prefetch the unretained part

  74. Exploit Inter-Event Locality
    PRINCIPLES
    1. Retain the reused portion of an event’s footprint in the cache
    2. Prefetch the unretained part
    PRACTICES
    1. Retain the reused portion of an event’s footprint in the cache
    ▸ LRU Cache Insertion Policy (LIP) (Qureshi et al., [ISCA’07])
    ▹ Insert incoming line into LRU position, not MRU position
    [Animation: cache lines a b c d e f g h i ordered from MRU to LRU; new lines j, k, … arrive; a later access to a is a MISS]
    Inter-event Locality Lost!

  87. Exploit Inter-Event Locality
    2. Prefetch the unretained part
    ▸ LRU Cache Insertion Policy (LIP) (Qureshi et al., [ISCA’07])

    ▹ Insert incoming line into LRU position, not MRU position
    21
    PRINCIPLES
    PRACTICES
    1. Retain the reused portion of an
    event’s footprint in the cache
    a b c d e f g h i
    MRU LRU

    View full-size slide

  88. Exploit Inter-Event Locality
    2. Prefetch the unretained part
    ▸ LRU Cache Insertion Policy (LIP) (Qureshi et al., [ISCA’07])

    ▹ Insert incoming line into LRU position, not MRU position
    21
    PRINCIPLES
    PRACTICES
    1. Retain the reused portion of an
    event’s footprint in the cache
    a b c d e f g h i
    MRU LRU
    j

    View full-size slide

  89. Exploit Inter-Event Locality
    2. Prefetch the unretained part
    ▸ LRU Cache Insertion Policy (LIP) (Qureshi et al., [ISCA’07])

    ▹ Insert incoming line into LRU position, not MRU position
    21
    PRINCIPLES
    PRACTICES
    1. Retain the reused portion of an
    event’s footprint in the cache
    a b c d e f g h i
    MRU LRU
    j

    View full-size slide

  90. Exploit Inter-Event Locality
    2. Prefetch the unretained part
    ▸ LRU Cache Insertion Policy (LIP) (Qureshi et al., [ISCA’07])

    ▹ Insert incoming line into LRU position, not MRU position
    21
    PRINCIPLES
    PRACTICES
    1. Retain the reused portion of an
    event’s footprint in the cache
    a b c d e f g h i
    MRU LRU
    k j

    View full-size slide

  91. Exploit Inter-Event Locality
    2. Prefetch the unretained part
    ▸ LRU Cache Insertion Policy (LIP) (Qureshi et al., [ISCA’07])

    ▹ Insert incoming line into LRU position, not MRU position
    21
    PRINCIPLES
    PRACTICES
    1. Retain the reused portion of an
    event’s footprint in the cache
    a b c d e f g h i
    MRU LRU
    j
    k

    View full-size slide

  92. Exploit Inter-Event Locality
    PRINCIPLES
    1. Retain the reused portion of an event’s footprint in the cache
    2. Prefetch the unretained part
    PRACTICES
    ▸ LRU Cache Insertion Policy (LIP) (Qureshi et al., [ISCA’07])
    ▹ Insert incoming line into LRU position, not MRU position
    a b c d e f g h i (MRU at left, LRU at right)
    j
    k
    a

  93. Exploit Inter-Event Locality
    PRINCIPLES
    1. Retain the reused portion of an event’s footprint in the cache
    2. Prefetch the unretained part
    PRACTICES
    ▸ LRU Cache Insertion Policy (LIP) (Qureshi et al., [ISCA’07])
    ▹ Insert incoming line into LRU position, not MRU position
    a b c d e f g h i (MRU at left, LRU at right)
    j
    k
    a
    HIT

  94. Exploit Inter-Event Locality
    PRINCIPLES
    1. Retain the reused portion of an event’s footprint in the cache
    2. Prefetch the unretained part
    PRACTICES
    ▸ LRU Cache Insertion Policy (LIP) (Qureshi et al., [ISCA’07])
    ▹ Insert incoming line into LRU position, not MRU position
    a b c d e f g h i (MRU at left, LRU at right)
    j
    k
    Reused Portion Retained!
    a
    HIT
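The insertion-policy idea on the preceding slides can be sketched in a few lines of Python. This is a minimal illustration, not the hardware design from Qureshi et al.: the `RecencyStack` class, the `hits_on_reuse` helper, the single 9-line set, and the one-off lines `j` through `r` are all assumptions made here for the example. The one-off lines stand in for the part of an event's footprint that is not reused.

```python
class RecencyStack:
    """One cache set as a recency stack: index 0 is MRU, the last index is LRU."""

    def __init__(self, lines, insert_at_lru):
        self.stack = list(lines)          # initial contents, MRU first
        self.ways = len(self.stack)       # capacity is fixed by the initial contents
        self.insert_at_lru = insert_at_lru

    def access(self, line):
        """Return True on a hit; on a miss, evict the LRU line and insert."""
        if line in self.stack:
            self.stack.remove(line)
            self.stack.insert(0, line)    # a hit promotes the line to MRU
            return True
        self.stack.pop()                  # evict the line in the LRU position
        if self.insert_at_lru:
            self.stack.append(line)       # LIP: the newcomer starts at LRU
        else:
            self.stack.insert(0, line)    # conventional LRU: the newcomer starts at MRU
        return False


def hits_on_reuse(insert_at_lru, one_off_lines):
    """Start with a..i cached, stream one-off lines, then re-touch a..i."""
    cache = RecencyStack("abcdefghi", insert_at_lru)
    for line in one_off_lines:
        cache.access(line)
    return sum(cache.access(line) for line in "abcdefghi")


if __name__ == "__main__":
    print("LIP hits:", hits_on_reuse(True, "jklmnopqr"))
    print("MRU-insertion hits:", hits_on_reuse(False, "jklmnopqr"))
```

With MRU insertion, nine one-off lines push every one of `a` to `i` out of the set. With LRU-position insertion, each one-off line only displaces the previous newcomer, so most of the originally cached lines are still present when they are touched again.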

  95. [Figure: I-cache MPKI (axis 0 to 100) for Word Finder, Todo, Mud, Etherpad, Let's Chat, and Lighter]

  96. [Figure: I-cache MPKI (axis 0 to 100) for Word Finder, Todo, Mud, Etherpad, Let's Chat, and Lighter; annotation: SPEC CPU 2006 Average @ 32KB]

  97. [Figure: I-cache MPKI (axis 0 to 100) for Word Finder, Todo, Mud, Etherpad, Let's Chat, and Lighter; series: Baseline]

  98. [Figure: I-cache MPKI (axis 0 to 100) for Word Finder, Todo, Mud, Etherpad, Let's Chat, and Lighter; series: Baseline, LIP]

  99. Exploit Inter-Event Locality
    PRINCIPLES
    1. Retain the reused portion of an event’s footprint in the cache
    2. Prefetch the unretained part
    PRACTICES

  100. Exploit Inter-Event Locality
    PRINCIPLES
    1. Retain the reused portion of an event’s footprint in the cache
    2. Prefetch the unretained part
    PRACTICES
    ▸ Temporal Instruction Fetch Streaming (TIFS) (Ferdman et al., [MICRO’08])
    ▹ Find patterns in miss sequences

  101. Exploit Inter-Event Locality
    PRINCIPLES
    1. Retain the reused portion of an event’s footprint in the cache
    2. Prefetch the unretained part
    PRACTICES
    ▸ Temporal Instruction Fetch Streaming (TIFS) (Ferdman et al., [MICRO’08])
    ▹ Find patterns in miss sequences
    … …

  102. Exploit Inter-Event Locality
    PRINCIPLES
    1. Retain the reused portion of an event’s footprint in the cache
    2. Prefetch the unretained part
    PRACTICES
    ▸ Temporal Instruction Fetch Streaming (TIFS) (Ferdman et al., [MICRO’08])
    ▹ Find patterns in miss sequences
    … …
    x y z

  104. Exploit Inter-Event Locality
    PRINCIPLES
    1. Retain the reused portion of an event’s footprint in the cache
    2. Prefetch the unretained part
    PRACTICES
    ▸ Temporal Instruction Fetch Streaming (TIFS) (Ferdman et al., [MICRO’08])
    ▹ Find patterns in miss sequences
    … …
    x y z x

  106. Exploit Inter-Event Locality
    PRINCIPLES
    1. Retain the reused portion of an event’s footprint in the cache
    2. Prefetch the unretained part
    PRACTICES
    ▸ Temporal Instruction Fetch Streaming (TIFS) (Ferdman et al., [MICRO’08])
    ▹ Find patterns in miss sequences
    … …
    x y z x y z

  107. Exploit Inter-Event Locality
    PRINCIPLES
    1. Retain the reused portion of an event’s footprint in the cache
    2. Prefetch the unretained part
    PRACTICES
    ▸ Temporal Instruction Fetch Streaming (TIFS) (Ferdman et al., [MICRO’08])
    ▹ Find patterns in miss sequences
    … …
    x y z x y z y

  109. Exploit Inter-Event Locality
    PRINCIPLES
    1. Retain the reused portion of an event’s footprint in the cache
    2. Prefetch the unretained part
    PRACTICES
    ▸ Temporal Instruction Fetch Streaming (TIFS) (Ferdman et al., [MICRO’08])
    ▹ Find patterns in miss sequences
    … …
    x y z x y z y z …
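"Find patterns in miss sequences" can be illustrated with a small software model. The sketch below is a simplification written for this transcript, not the hardware organization described by Ferdman et al.: the class name `TemporalStreamPrefetcher`, the unbounded log, the `depth` parameter, and the `count_uncovered` helper are all assumptions. It logs the miss sequence, remembers where each address last appeared, and on a repeat replays the addresses that followed it last time.

```python
class TemporalStreamPrefetcher:
    """Replays what followed an address the last time it appeared in the miss log."""

    def __init__(self, depth=2):
        self.depth = depth   # how many following addresses to prefetch (assumed knob)
        self.log = []        # miss addresses in the order they were observed
        self.index = {}      # address -> position of its most recent log entry

    def observe(self, addr):
        """Record addr and return the addresses predicted to follow it."""
        predicted = []
        pos = self.index.get(addr)
        if pos is not None:
            predicted = self.log[pos + 1 : pos + 1 + self.depth]
        self.index[addr] = len(self.log)
        self.log.append(addr)
        return predicted


def count_uncovered(miss_stream, depth=2):
    """Count the would-be misses that no earlier prefetch covered.

    miss_stream is the sequence of addresses that would miss without
    prefetching; no cache is modeled here.
    """
    prefetcher = TemporalStreamPrefetcher(depth)
    prefetched = set()
    uncovered = 0
    for addr in miss_stream:
        if addr in prefetched:
            prefetched.discard(addr)      # covered by an earlier prefetch
        else:
            uncovered += 1
        prefetched.update(prefetcher.observe(addr))
    return uncovered


if __name__ == "__main__":
    stream = list("xyzxyzxyz")
    print(len(stream), "would-be misses,", count_uncovered(stream), "left uncovered")
```

In this model the first `x y z` and the second `x` are uncovered; after that, every address in the repeating pattern has already been requested by the time it is needed.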

  110. [Figure: I-cache MPKI (axis 0 to 100) for Word Finder, Todo, Mud, Etherpad, Let's Chat, and Lighter; series: Baseline, LIP]

  111. [Figure: I-cache MPKI (axis 0 to 100) for Word Finder, Todo, Mud, Etherpad, Let's Chat, and Lighter; series: Baseline, LIP, LIP+TIFS]

  112. [Figure: I-cache MPKI (axis 0 to 100) for Word Finder, Todo, Mud, Etherpad, Let's Chat, and Lighter; series: Baseline, LIP, LIP+TIFS]
    88% Average MPKI Reduction
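For readers unfamiliar with the metric on these charts, MPKI is misses per thousand instructions, and a reduction is measured relative to the baseline. The helper names below are made up for this note, and averaging the per-application reductions is only one plausible reading of "Average MPKI Reduction"; the talk does not say which averaging was used.

```python
def mpki(misses, instructions):
    """Misses per kilo-instruction."""
    return 1000.0 * misses / instructions


def mpki_reduction(baseline_mpki, improved_mpki):
    """Fractional reduction relative to the baseline (0.75 means 75% fewer misses)."""
    return 1.0 - improved_mpki / baseline_mpki


def average_reduction(pairs):
    """Arithmetic mean of per-application reductions (an assumed averaging method)."""
    reductions = [mpki_reduction(base, improved) for base, improved in pairs]
    return sum(reductions) / len(reductions)


if __name__ == "__main__":
    # Hypothetical counts chosen only to exercise the formulas; not data from the talk.
    print(mpki(40_000, 1_000_000))
    print(average_reduction([(40.0, 10.0), (20.0, 10.0)]))
```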

  113. Exploit Instruction Locality

    a.k.a., Common Case Design
    SPEC CPU (mostly)
    Instruction Supply
    Cache: Hot instructions
    Branch Predictor: Hot branch history patterns
    TLB: Hot pages
    Event-driven Applications
    Cache

  114. Exploit Instruction Locality

    a.k.a., Common Case Design
    SPEC CPU (mostly)
    Instruction Supply
    Cache: Hot instructions
    Branch Predictor: Hot branch history patterns
    TLB: Hot pages
    Event-driven Applications
    Cache
    Branch Predictor
    TLB

  115. Beyond Instruction Cache — Branch Predictor
    [Figure: branch misprediction rate (%), axis 0 to 20, for SPEC CPU 2006 (milc, GemsFDTD, lbm, libquantum, calculix, zeusmp, wrf, perlbench, leslie3d, cactusADM, hmmer, xalancbmk, tonto, gamess, dealII, sphinx3, h264ref, omnetpp, soplex, bwaves, namd, povray, mcf, gromacs, gcc, bzip2, sjeng, astar, gobmk) and NodeJS (Word Finder, Todo, Mud, Etherpad, Let's Chat, Lighter)]
    Tournament Predictor: 12-bit history register, 256 local branch histories

  116. Beyond Instruction Cache — Branch Predictor
    [Figure: branch misprediction rate (%), axis 0 to 20, for SPEC CPU 2006 (milc, GemsFDTD, lbm, libquantum, calculix, zeusmp, wrf, perlbench, leslie3d, cactusADM, hmmer, xalancbmk, tonto, gamess, dealII, sphinx3, h264ref, omnetpp, soplex, bwaves, namd, povray, mcf, gromacs, gcc, bzip2, sjeng, astar, gobmk) and NodeJS (Word Finder, Todo, Mud, Etherpad, Let's Chat, Lighter)]
    Tournament Predictor: 12-bit history register, 256 local branch histories
    2.4 X Higher!
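The slide names a tournament predictor with a 12-bit history register and 256 local branch histories but gives no further detail. The sketch below shows one common way such a predictor is organized, purely as an illustration: the 10-bit local history length, the 2-bit saturating counters, the PC-modulo indexing, and the chooser indexed by global history are all assumptions, as are the class and function names.

```python
def bump(counter, up):
    """Move a 2-bit saturating counter toward 3 (up) or toward 0 (down)."""
    return min(3, counter + 1) if up else max(0, counter - 1)


class TournamentPredictor:
    def __init__(self, global_bits=12, local_entries=256, local_bits=10):
        self.global_mask = (1 << global_bits) - 1
        self.local_mask = (1 << local_bits) - 1
        self.local_entries = local_entries
        self.global_history = 0
        self.local_histories = [0] * local_entries      # per-branch history registers
        self.global_counters = [1] * (1 << global_bits)
        self.local_counters = [1] * (1 << local_bits)
        self.chooser = [1] * (1 << global_bits)         # >= 2 means trust the global side

    def _components(self, pc):
        slot = pc % self.local_entries
        local_pred = self.local_counters[self.local_histories[slot]] >= 2
        global_pred = self.global_counters[self.global_history] >= 2
        return slot, local_pred, global_pred

    def predict(self, pc):
        _, local_pred, global_pred = self._components(pc)
        use_global = self.chooser[self.global_history] >= 2
        return global_pred if use_global else local_pred

    def update(self, pc, taken):
        slot, local_pred, global_pred = self._components(pc)
        g = self.global_history
        h = self.local_histories[slot]
        if local_pred != global_pred:
            # Train the chooser only when the two components disagree.
            self.chooser[g] = bump(self.chooser[g], global_pred == taken)
        self.local_counters[h] = bump(self.local_counters[h], taken)
        self.global_counters[g] = bump(self.global_counters[g], taken)
        self.local_histories[slot] = ((h << 1) | int(taken)) & self.local_mask
        self.global_history = ((g << 1) | int(taken)) & self.global_mask


def count_mispredictions(predictor, trace):
    """trace is a list of (pc, taken) pairs; returns the number of wrong predictions."""
    wrong = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong
```

A predictor like this learns short, repeating per-branch patterns quickly. That is the kind of "hot branch history pattern" the common-case design relies on, and it offers a way to think about why workloads with less repetitive branch behavior would mispredict more often.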

  117. Beyond Instruction Cache — TLB
    [Figure: L1 I-TLB MPKI, axis 0 to 4, for SPEC CPU 2006 (libquantum, lbm, hmmer, astar, milc, bzip2, mcf, namd, sjeng, bwaves, zeusmp, cactusADM, leslie3d, GemsFDTD, wrf, gromacs, sphinx3, gamess, calculix, h264ref, gobmk, soplex, tonto, perlbench, dealII, omnetpp, povray, gcc, xalancbmk) and NodeJS (Word Finder, Todo, Mud, Etherpad, Let's Chat, Lighter)]
    I-TLB Parameters: 64 KB, 4-way, 4 KB page size

  118. Beyond Instruction Cache — TLB
    [Figure: L1 I-TLB MPKI, axis 0 to 4, for SPEC CPU 2006 (libquantum, lbm, hmmer, astar, milc, bzip2, mcf, namd, sjeng, bwaves, zeusmp, cactusADM, leslie3d, GemsFDTD, wrf, gromacs, sphinx3, gamess, calculix, h264ref, gobmk, soplex, tonto, perlbench, dealII, omnetpp, povray, gcc, xalancbmk) and NodeJS (Word Finder, Todo, Mud, Etherpad, Let's Chat, Lighter)]
    72 X Higher!
    I-TLB Parameters: 64 KB, 4-way, 4 KB page size
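To make the I-TLB pressure concrete, here is a small model of a set-associative TLB with LRU replacement. It is a sketch under stated assumptions: the slide gives "64 KB, 4-way, 4 KB page size", and the 64-entry capacity used below is a hypothetical value picked for the example, as are the class name and the synthetic page walks.

```python
from collections import OrderedDict


class SetAssociativeTLB:
    def __init__(self, entries=64, ways=4, page_size=4096):
        self.ways = ways
        self.page_size = page_size
        self.num_sets = entries // ways
        # One OrderedDict per set; insertion order tracks recency (oldest first).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.misses = 0

    def access(self, vaddr):
        """Look up the page containing vaddr; return True on a TLB hit."""
        vpn = vaddr // self.page_size
        entries = self.sets[vpn % self.num_sets]
        if vpn in entries:
            entries.move_to_end(vpn)        # mark as most recently used
            return True
        self.misses += 1
        if len(entries) == self.ways:
            entries.popitem(last=False)     # evict the least recently used entry
        entries[vpn] = True
        return False


def misses_for_footprint(pages, passes=2):
    """Walk `pages` consecutive code pages `passes` times and count TLB misses."""
    tlb = SetAssociativeTLB()
    for _ in range(passes):
        for page in range(pages):
            tlb.access(page * tlb.page_size)
    return tlb.misses


if __name__ == "__main__":
    print("64-page footprint:", misses_for_footprint(64))
    print("128-page footprint:", misses_for_footprint(128))
```

In this model, a footprint that fits in the TLB only pays compulsory misses, while a cyclic walk over twice as many pages misses on every access. That cliff is one way to think about why a large instruction footprint shows up as a much higher I-TLB MPKI.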

  119. Event-based processing is a fundamental computation pattern.

  120. Event-based processing is a fundamental computation pattern.
    Web, Mobile, Internet-of-Things, Sensor networks, Cloud

  121. MICRO 2015
    Microarchitectural Implications of Event-driven Server-side Web Applications
    Yuhao Zhu
    UT Austin
    with Daniel Richins, Matthew Halpern, Vijay Janapa Reddi