Microarchitectural Implications of Event-driven Server-side Web Applications

Yuhao Zhu
December 09, 2015

MICRO 2015

Transcript

  1. MICRO 2015: Microarchitectural Implications of Event-driven Server-side Web Applications

    Yuhao Zhu, UT Austin, with Daniel Richins, Matthew Halpern, Vijay Janapa Reddi
  5. Exploit Instruction Locality, a.k.a. Common Case Design

    Instruction supply: cache (hot instructions), branch predictor (hot branch history patterns), TLB (hot pages). Workloads: SPEC CPU (mostly); event-driven applications.
  6. Locality Lost

    Workloads: SPEC CPU (mostly); event-driven applications. Instruction supply: cache (hot instructions), branch predictor (hot branch history pattern), TLB (hot pages).
  7. Locality Lost

    Workloads: SPEC CPU (mostly); event-driven applications. Labels: tight loops; hot pages; little branch aliasing. Instruction supply: cache, branch predictor, TLB.
  13. Event-driven Execution Model

    Diagram labels: Client Request; Event Queue (Head, Tail); Single-threaded Event Loop; DB Access; File I/O; Network; Our Focus.
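
The execution model on this slide (client requests appended at the tail of an event queue, a single-threaded loop consuming from the head) can be sketched roughly as follows. This is a minimal illustration in Python rather than Node.js; `run_event_loop`, `make_request`, and the event names are hypothetical and not taken from the talk.

```python
from collections import deque


def run_event_loop(initial_events):
    """Drain a FIFO event queue on a single thread.

    Each event is a (name, handler) pair. A handler receives the shared log
    and returns a list of follow-up events, which are appended at the tail.
    """
    queue = deque(initial_events)  # head = left end, tail = right end
    log = []
    while queue:
        _name, handler = queue.popleft()   # take the event at the head
        for follow_up in handler(log):     # run the handler to completion
            queue.append(follow_up)        # enqueue callbacks at the tail
    return log


def make_request(client_id):
    """Build a hypothetical client request whose handler schedules a DB callback."""
    def on_request(log):
        log.append(f"request {client_id}")

        def on_db_result(log):
            log.append(f"db result {client_id}")
            return []

        return [(f"db:{client_id}", on_db_result)]

    return (f"req:{client_id}", on_request)


log = run_event_loop([make_request(1), make_request(2)])
```

With two requests queued, both request handlers run before either DB callback, so handlers for different requests interleave on the one thread.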
  14. Applications

    Application: Domain
    Etherpad Lite: Document Collaboration
    Let’s Chat: Messaging
    Lighter: Content Management
    Mud: Gaming
    Todo: Task Management
    Word Finder: API Services
    https://github.com/nodebenchmark/
  19. Locality Lost

    [Figure: L1 I-cache MPKI (axis 0 to 120) for SPEC CPU 2006 (lbm, leslie3d, libquantum, astar, hmmer, mcf, bzip2, namd, gromacs, zeusmp, calculix, GemsFDTD, soplex, sphinx3, bwaves, milc, dealII, wrf, h264ref, gamess, tonto, xalancbmk, povray, perlbench, sjeng, gcc, gobmk, omnetpp, cactusADM) and NodeJS (Word Finder, Todo, Mud, Etherpad, Let's Chat, Lighter), with the SPEC CPU 2006 average marked. I-cache parameters: 32 KB, 64 B cache line, 8-way.] Node.js has 4.2 X higher MPKI than SPEC CPU.
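
For reference, MPKI is misses per thousand executed instructions. A small sketch of the arithmetic, using made-up counter values that are purely illustrative and not measurements from the talk:

```python
def mpki(misses, instructions):
    """Misses per kilo-instruction: miss count normalized to 1,000 instructions."""
    return misses * 1000 / instructions


# Hypothetical counter values, for illustration only (not data from the talk).
baseline_mpki = mpki(misses=1_500, instructions=1_000_000)
workload_mpki = mpki(misses=6_000, instructions=1_000_000)
ratio = workload_mpki / baseline_mpki
```

Normalizing by instruction count lets workloads of different lengths be compared on one axis, which is how the figure places SPEC CPU 2006 and the Node.js applications side by side.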
  22. Root Cause Analysis

    [Figure: three panels of dynamic instructions (%) (axis 0 to 100) vs. reuse distance (log scale, 2^0 to 2^16): omnetpp (worst), lbm (best), Let’s Chat.] Factors: high I-$ miss ratio; large instruction reuse distance.
  23. Root Cause Analysis

    [Figure: four panels of dynamic instructions (%) (axis 0 to 100) vs. reuse distance (log scale, 2^0 to 2^16); panel titles not recoverable.] Factors: high I-$ miss ratio; large instruction reuse distance.
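
The slides do not define reuse distance. The sketch below assumes the common stack-distance definition: the number of distinct instructions executed between two consecutive executions of the same instruction. The function name and the toy traces are hypothetical.

```python
def reuse_distances(trace):
    """Return, for each access, the number of distinct addresses touched since
    the previous access to the same address (None for a first access)."""
    last_pos = {}   # address -> index of its most recent access
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_pos:
            between = trace[last_pos[addr] + 1 : i]
            distances.append(len(set(between)))
        else:
            distances.append(None)
        last_pos[addr] = i
    return distances


tight_loop = reuse_distances(list("ababab"))   # two-instruction loop
long_gap = reuse_distances(list("abcdea"))     # 'a' recurs after four others
```

Under this definition a tight loop produces small distances, while code that only recurs after a lot of other code has run produces large ones; a line whose reuse distance exceeds the cache capacity is likely to have been evicted before it is reused.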
  27. Root Cause Analysis

    [Diagram: instruction stream … Event 1, Event 2, Event 3 …] Factors: high I-$ miss ratio; large instruction reuse distance; inter-event code reuse; large event footprint.
  33. Root Cause Analysis

    [Figure: four panels of static instructions (%) (axis 0 to 100) vs. # of reuses (32 to 256 in steps of 32, then > 256), split into intra-event and inter-event reuses.] Most instruction reuses are inter-event. Factors: high I-$ miss ratio; large instruction reuse distance; inter-event code reuse; large event footprint.
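
One way to separate intra-event from inter-event reuses is sketched below. It assumes each event is available as a list of instruction addresses in execution order; that input format and the function name are hypothetical, and the talk's actual methodology is not described in this transcript.

```python
def classify_reuses(events):
    """Count reuses whose previous execution was in the same event (intra)
    versus in an earlier event (inter).

    events: list of events, each a list of instruction addresses in order.
    """
    last_event = {}   # address -> index of the event that last executed it
    intra = inter = 0
    for event_id, trace in enumerate(events):
        for addr in trace:
            if addr in last_event:
                if last_event[addr] == event_id:
                    intra += 1
                else:
                    inter += 1
            last_event[addr] = event_id
    return intra, inter


# Toy trace: the second event re-executes code first run by the first event.
intra_count, inter_count = classify_reuses([["a", "b", "a"], ["a", "b"]])
```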
  38. Root Cause Analysis

    [Figure: three panels of events (%) (axis 0 to 100) vs. event footprint (bytes, 2^13 to 2^21), with the 32 KB L1 I-$ size marked; applications: Word Finder, Todo, Mud, Etherpad, Let's Chat, Lighter.] Only 12% of events fit in a 32 KB I-$. Factors: high I-$ miss ratio; large instruction reuse distance; inter-event code reuse; large event footprint.
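
A rough sketch of how an event footprint could be compared with the cache size. It assumes the footprint is the number of distinct 64 B instruction cache lines an event touches, multiplied by the line size; that definition, the function names, and the toy address traces are assumptions, while the 32 KB and 64 B parameters come from the slides.

```python
LINE_BYTES = 64            # cache line size from the slides
CACHE_BYTES = 32 * 1024    # L1 I-cache size from the slides


def event_footprint(addresses, line_bytes=LINE_BYTES):
    """Footprint in bytes: distinct cache lines touched times the line size."""
    lines = {addr // line_bytes for addr in addresses}
    return len(lines) * line_bytes


def fraction_fitting(events, cache_bytes=CACHE_BYTES):
    """Fraction of events whose footprint is no larger than the cache."""
    fits = sum(1 for addrs in events if event_footprint(addrs) <= cache_bytes)
    return fits / len(events)


# Toy data: one small event and one that touches 64 KB of code.
small_event = [0, 4, 64]
large_event = list(range(0, 64 * 1024, 64))
share = fraction_fitting([small_event, large_event])
```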
  41. Root Cause Analysis

    [Figure: two panels of events (%) (axis 0 to 100) vs. event footprint (bytes, 2^13 to 2^21).] Most events’ footprints do not fit in a typical I-$. Factors: high I-$ miss ratio; large instruction reuse distance; inter-event code reuse; large event footprint.
  47. Root Cause Analysis

    Factors: high I-$ miss ratio; large instruction reuse distance; inter-event code reuse; large event footprint; few event types; minimal tight loops; event-driven applications. Grouping labels: Application Characteristics; Microarchitecture Behaviors.
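  50. Scale Up Hardware Resources?

    [Figure: I-cache MPKI (axis 0 to 120) vs. I-cache size (16, 64, 256, 1024 KB), with the SPEC CPU 2006 average @ 32 KB marked; normalized time (axis 1.0 to 2.0) with further ticks 1.6, 2.4, 3.2 whose axis label is not recoverable; applications: Let's Chat, Word Finder, Todo, Mud, Etherpad, Lighter.]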
  55. Scale Up Hardware Resources?

    [Figure: three panels of I-cache MPKI (axis 0 to 120) vs. I-cache size (16, 64, 256, 1024 KB); normalized time (axis 1.0 to 2.0) with further ticks 1.6, 2.4, 3.2 whose axis label is not recoverable; applications: Let's Chat, Word Finder, Todo, Mud, Etherpad, Lighter.] ~256 KB Needed!
  58. Exploit Inter-Event Locality

    1. Retain the reused portion of an event’s footprint in the cache. 2. Prefetch the unretained part. [Diagram: instruction stream … Event 1, Event 2, Event 3 …]
  61. Exploit Inter-Event Locality

    PRINCIPLES: 1. Retain the reused portion of an event’s footprint in the cache. 2. Prefetch the unretained part. PRACTICES: 1. Retain the reused portion of an event’s footprint in the cache.
  71. Exploit Inter-Event Locality

    PRINCIPLES: 1. Retain the reused portion of an event’s footprint in the cache. 2. Prefetch the unretained part. PRACTICES for principle 1: LRU Cache Insertion Policy (LIP) (Qureshi et al., [ISCA’07]): insert the incoming line into the LRU position, not the MRU position. [Animation, apparently showing conventional MRU insertion: the cache holds lines a through i, ordered MRU to LRU; incoming lines j, k, … push the older lines out; the later access to a is a MISS.] Inter-event Locality Lost!
  79. Exploit Inter-Event Locality

    PRINCIPLES: 1. Retain the reused portion of an event’s footprint in the cache. 2. Prefetch the unretained part. PRACTICES for principle 1: LRU Cache Insertion Policy (LIP) (Qureshi et al., [ISCA’07]): insert the incoming line into the LRU position, not the MRU position. [Animation: the cache holds lines a through i, ordered MRU to LRU; incoming lines j and k are inserted at the LRU position; the later access to a is a HIT.] Reused Portion Retained!
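
The contrast between inserting at the MRU position and inserting at the LRU position can be sketched with a toy model of a single fully associative set. This is an illustrative simulation, not the evaluation setup from the talk; the class name, the capacity of 10, and the line names are hypothetical.

```python
class RecencyStack:
    """One cache set kept as a recency-ordered list (index 0 = MRU, last = LRU)."""

    def __init__(self, capacity, insert_at_mru):
        self.capacity = capacity
        self.insert_at_mru = insert_at_mru
        self.lines = []

    def access(self, line):
        """Return True on a hit, False on a miss (the line is then inserted)."""
        if line in self.lines:
            self.lines.remove(line)
            self.lines.insert(0, line)    # a hit always promotes to MRU
            return True
        if len(self.lines) == self.capacity:
            self.lines.pop()              # evict the line in the LRU position
        if self.insert_at_mru:
            self.lines.insert(0, line)    # conventional LRU replacement
        else:
            self.lines.append(line)       # LIP: place new line in LRU position
        return False


def reuse_hit_after_stream(insert_at_mru):
    cache = RecencyStack(capacity=10, insert_at_mru=insert_at_mru)
    for line in "abcdefghi":      # code that a later event will reuse
        cache.access(line)
    for line in "jklmnopqrs":     # one-time code executed in between
        cache.access(line)
    return cache.access("a")      # does the reused line survive?


lru_hit = reuse_hit_after_stream(insert_at_mru=True)
lip_hit = reuse_hit_after_stream(insert_at_mru=False)
```

With MRU insertion the one-time lines push `a` through `i` out of the set, so the reuse misses. With LRU-position insertion the one-time lines only displace each other, so the reuse hits.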
  81. [Figure: I-cache MPKI (axis 0 to 100) for Word Finder, Todo, Mud, Etherpad, Let's Chat, and Lighter]

    Marked: SPEC CPU 2006 Average @ 32KB
  83. [Figure: I-cache MPKI (axis 0 to 100) for Word Finder, Todo, Mud, Etherpad, Let's Chat, and Lighter]

    Series: Baseline; LIP
  94. Exploit Inter-Event Locality

    PRINCIPLES: 1. Retain the reused portion of an event’s footprint in the cache. 2. Prefetch the unretained part. PRACTICES for principle 2: Temporal Instruction Fetch Streaming (TIFS) (Ferdman et al., [MICRO’08]): find patterns in miss sequences. [Animation: miss sequence … x y z x y z y z …]
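
The idea of finding patterns in miss sequences can be sketched with a toy replay model: log past misses, and when a miss address recurs, prefetch the addresses that followed it last time. This is a simplified software illustration, not the hardware design from the cited paper; the function name and the `depth` parameter are hypothetical.

```python
def covered_misses(misses, depth=2):
    """For each miss in the baseline miss sequence, report whether a
    history-based prefetcher would already have fetched it.

    misses: baseline miss addresses in order.
    depth:  how many logged successors are prefetched when a miss recurs.
    """
    history = []     # log of past misses, oldest first
    last_seen = {}   # address -> index of its latest occurrence in history
    pending = set()  # lines prefetched after the previous miss
    covered = []
    for addr in misses:
        covered.append(addr in pending)
        pos = last_seen.get(addr)
        if pos is not None:
            # Replay what followed this address the last time it missed.
            pending = set(history[pos + 1 : pos + 1 + depth])
        else:
            pending = set()
        last_seen[addr] = len(history)
        history.append(addr)
    return covered


# The miss sequence drawn on the slide: x y z x y z y z
coverage = covered_misses(list("xyzxyzyz"))
```

In this toy run the first pass over x, y, z and the recurrence of x are uncovered, and the recurring misses after that are covered because they follow the logged sequence.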
  97. [Figure: I-cache MPKI (axis 0 to 100) for Word Finder, Todo, Mud, Etherpad, Let's Chat, and Lighter]

    Series: Baseline; LIP; LIP+TIFS. 88% Average MPKI Reduction
  99. Exploit Instruction Locality, a.k.a. Common Case Design

    Instruction supply: cache (hot instructions), branch predictor (hot branch history patterns), TLB (hot pages). Workloads: SPEC CPU (mostly); event-driven applications. Highlighted: cache, branch predictor, TLB.
  101. Beyond Instruction Cache: Branch Predictor

    [Figure: branch misprediction (%) (axis 0 to 20) for SPEC CPU 2006 (milc, GemsFDTD, lbm, libquantum, calculix, zeusmp, wrf, perlbench, leslie3d, cactusADM, hmmer, xalancbmk, tonto, gamess, dealII, sphinx3, h264ref, omnetpp, soplex, bwaves, namd, povray, mcf, gromacs, gcc, bzip2, sjeng, astar, gobmk) and NodeJS (Word Finder, Todo, Mud, Etherpad, Let's Chat, Lighter). Tournament predictor: 12-bit history register, 256 local branch histories.] 2.4 X Higher!
  103. Beyond Instruction Cache: TLB

    [Figure: L1 I-TLB MPKI (axis 0 to 4) for SPEC CPU 2006 (libquantum, lbm, hmmer, astar, milc, bzip2, mcf, namd, sjeng, bwaves, zeusmp, cactusADM, leslie3d, GemsFDTD, wrf, gromacs, sphinx3, gamess, calculix, h264ref, gobmk, soplex, tonto, perlbench, dealII, omnetpp, povray, gcc, xalancbmk) and NodeJS (Word Finder, Todo, Mud, Etherpad, Let's Chat, Lighter). I-TLB parameters: 64 KB, 4-way, 4 KB page size.] 72 X Higher!
  107. MICRO 2015: Microarchitectural Implications of Event-driven Server-side Web Applications

    Yuhao Zhu, UT Austin, with Daniel Richins, Matthew Halpern, Vijay Janapa Reddi