Microarchitectural Implications of Event-driven Server-side Web Applications

Yuhao Zhu
December 09, 2015

MICRO 2015

Transcript

  1. MICRO 2015: Microarchitectural Implications of Event-driven Server-side Web Applications

    Yuhao Zhu (UT Austin), with Daniel Richins, Matthew Halpern, and Vijay Janapa Reddi
  2. Instruction Supply is a Critical Aspect of Microarchitecture Design

  8. Exploit Instruction Locality, a.k.a. Common Case Design

    Instruction supply structures: Cache, Branch Predictor, TLB. Locality exploited: hot instructions, hot branch history patterns, hot pages. Workloads shown: SPEC CPU (mostly) and Event-driven Applications.
  9. Locality Lost

    Workloads: SPEC CPU (mostly) and Event-driven Applications. Locality: hot instructions, hot branch history pattern, hot pages. Instruction supply structures: Cache, Branch Predictor, TLB.
  11. Locality Lost

    Workloads: SPEC CPU (mostly) and Event-driven Applications. Locality: tight loops, hot pages, little branch aliasing. Instruction supply structures: Cache, Branch Predictor, TLB.
  20. Event-driven Execution Model

    [Diagram labels: Event Queue (Head, Tail), Single-threaded Event Loop, Client Request, DB Access, File I/O, Network, and an "Our Focus" callout.]
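
The execution model named on the slide above (an event queue with a head and a tail, drained by a single-threaded event loop) can be sketched roughly as follows. This is a toy illustration, not Node.js internals: the `EventLoop` class, the event names, and the idea that a request handler queues a `db_done` completion event are all assumptions made for the example.

```python
from collections import deque


class EventLoop:
    """Toy single-threaded event loop: one FIFO queue, one handler at a time."""

    def __init__(self):
        self.queue = deque()  # head is the left end, tail is the right end
        self.log = []         # order in which events were handled

    def enqueue(self, name, handler):
        # New events (client requests, I/O completions) join at the tail.
        self.queue.append((name, handler))

    def run(self):
        # Take events from the head and run each handler to completion.
        while self.queue:
            name, handler = self.queue.popleft()
            self.log.append(name)
            handler(self)


def handle_request(loop):
    # Hypothetical request handler: it starts some I/O, and the completion
    # comes back later as a new event at the tail of the queue.
    loop.enqueue("db_done", lambda _loop: None)


loop = EventLoop()
loop.enqueue("request_1", handle_request)
loop.enqueue("request_2", handle_request)
loop.run()
print(loop.log)
```

Because handlers run one after another on a single thread, the instruction stream is a concatenation of per-event code, which is the property the later slides analyze.
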
  21. Applications

    Application      Domain
    Etherpad Lite    Document Collaboration
    Let’s Chat       Messaging
    Lighter          Content Management
    Mud              Gaming
    Todo             Task Management
    Word Finder      API Services

    https://github.com/nodebenchmark/
  27. Locality Lost

    [Figure: L1 I-Cache MPKI (0 to 120) for SPEC CPU 2006 (lbm, leslie3d, libquantum, astar, hmmer, mcf, bzip2, namd, gromacs, zeusmp, calculix, GemsFDTD, soplex, sphinx3, bwaves, milc, dealII, wrf, h264ref, gamess, tonto, xalancbmk, povray, perlbench, sjeng, gcc, gobmk, omnetpp, cactusADM) and Node.js (Word Finder, Todo, Mud, Etherpad, Let's Chat, Lighter), with the SPEC CPU 2006 average marked. I-Cache parameters: 32 KB, 64 B cache line, 8-way.]

    Node.js has 4.2× higher MPKI than SPEC CPU.
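
For readers unfamiliar with the metric on the slide above, MPKI is misses per thousand (kilo) instructions. A minimal sketch of the arithmetic follows; the counter values are arbitrary placeholders chosen for the example, not measurements from the talk.

```python
def mpki(misses, instructions):
    """Misses per kilo-instruction: misses divided by (instructions / 1000)."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return misses * 1000.0 / instructions


# Placeholder counters (made up for illustration only).
workload_a = mpki(misses=45_000, instructions=1_000_000)
workload_b = mpki(misses=9_000, instructions=1_000_000)
print(workload_a, workload_b, workload_a / workload_b)
```

Normalizing by instruction count rather than by accesses lets workloads with different instruction mixes be compared on one axis, which is how the figure compares SPEC CPU 2006 with the Node.js applications.
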
  34. Root Cause Analysis

    High I-$ Miss Ratio, traced to Large Instruction Reuse-distance.

    [Figure: four panels, each plotting Dynamic Instructions (%) (0 to 100) against Reuse Distance (log scale, 2^0 to 2^16). Panels labeled during the build: omnetpp (worst), lbm (best), Let’s Chat.]
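
The reuse-distance metric plotted above can be sketched as follows, assuming the common definition (the slide does not spell one out): the number of distinct addresses touched between two consecutive accesses to the same address. The function name and the single-letter traces are illustrative only.

```python
def reuse_distances(trace):
    """For each access, return the number of distinct addresses touched since
    the previous access to the same address (None for a first access)."""
    stack = []  # recency stack: most recently used address is at the end
    out = []
    for addr in trace:
        if addr in stack:
            idx = stack.index(addr)
            # Everything above addr in the stack was touched since its last use.
            out.append(len(stack) - 1 - idx)
            stack.pop(idx)
        else:
            out.append(None)
        stack.append(addr)
    return out


tight_loop = list("abab")     # a two-instruction loop body
long_path = list("abcabb")    # more distinct code between reuses
print(reuse_distances(tight_loop))
print(reuse_distances(long_path))
```

Under an LRU policy, an access hits in a fully associative cache only if its reuse distance is smaller than the number of lines the cache holds, which is why large reuse distances translate into a high miss ratio.
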
  39. Root Cause Analysis

    High I-$ Miss Ratio, traced to Large Instruction Reuse-distance, in turn traced to Inter-event Code Reuse and Large Event Footprint.

    [Diagram: an instruction stream (… …) divided into Event 1, Event 2, Event 3.]
  45. Root Cause Analysis

    [Figure: four panels plotting Instructions (%) or Static Instructions (%) (0 to 100) against # of reuses (bins from 32 to 256, plus > 256), with Intra-event and Inter-event series.]

    Most instruction reuses are inter-event.
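
One way to read the intra-event versus inter-event split above: a reuse is intra-event when the previous execution of the same instruction happened in the same event, and inter-event otherwise. The sketch below assumes that reading and a trace of hypothetical `(event_id, address)` pairs; it is not the authors' measurement tool.

```python
from collections import Counter


def classify_reuses(trace):
    """Count reuses whose previous access was in the same event ("intra")
    versus in an earlier event ("inter")."""
    last_event = {}  # address -> id of the event that touched it last
    counts = Counter()
    for event_id, addr in trace:
        if addr in last_event:
            kind = "intra" if last_event[addr] == event_id else "inter"
            counts[kind] += 1
        last_event[addr] = event_id
    return counts


# Three toy events that share code (a, b, d) but contain almost no loops.
trace = [
    (1, "a"), (1, "b"), (1, "c"),
    (2, "a"), (2, "b"), (2, "d"),
    (3, "a"), (3, "d"), (3, "d"),
]
counts = classify_reuses(trace)
print(counts["intra"], counts["inter"])
```
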
  50. Root Cause Analysis

    [Figure: Events (%) (0 to 100) against Event Footprint (Bytes, 2^13 to 2^21) for Word Finder, Todo, Mud, Etherpad, Let's Chat, and Lighter, with the 32 KB L1 I-$ size marked.]

    Only 12% of events fit in a 32 KB I-$.
  53. Root Cause Analysis

    [Figure: two panels plotting Events (%) (0 to 100) against Event Footprint (Bytes, 2^13 to 2^21).]

    Most events’ footprints do not fit in a typical I-$.
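
A rough sketch of the footprint check behind the figures above, assuming an event's footprint is the number of distinct 64 B instruction cache lines it touches multiplied by the line size (the slides give the 64 B line and 32 KB cache, but not the exact footprint definition). The two synthetic events are invented for the example.

```python
LINE_BYTES = 64            # cache line size from the I-cache parameters
ICACHE_BYTES = 32 * 1024   # L1 I-cache size from the I-cache parameters


def event_footprint_bytes(instr_addresses, line_bytes=LINE_BYTES):
    """Bytes of distinct cache lines touched by one event's instructions."""
    lines = {addr // line_bytes for addr in instr_addresses}
    return len(lines) * line_bytes


def fraction_fitting(events, cache_bytes=ICACHE_BYTES):
    """Fraction of events whose footprint is no larger than the cache."""
    fits = sum(1 for e in events if event_footprint_bytes(e) <= cache_bytes)
    return fits / len(events)


# Synthetic events: straight-line code of 4 KB and of 64 KB (4-byte steps).
small_event = range(0, 4 * 1024, 4)
large_event = range(0, 64 * 1024, 4)
print(event_footprint_bytes(small_event), event_footprint_bytes(large_event))
print(fraction_fitting([small_event, large_event]))
```

This capacity-only check ignores set conflicts, so it is optimistic: an event that "fits" by size can still miss in a real 8-way cache.
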
  59. Root Cause Analysis

    Chain of causes: High I-$ Miss Ratio; Large Instruction Reuse-distance; Inter-event Code Reuse and Large Event Footprint; Few Event Types and Minimal Tight Loops; Event-driven Applications. The diagram groups these under two labels: Microarchitecture Behaviors and Application Characteristics.
  60. Can we better capture instruction locality to improve instruction supply efficiency?

  68. Scale Up Hardware Resources?

    [Figure: three panels plotting I-Cache MPKI (0 to 120) against I-Cache size (16 to 1024 KB) for Let's Chat, Word Finder, Todo, Mud, Etherpad, and Lighter, with the SPEC CPU 2006 average at 32 KB marked, plus a Norm. Time axis.]

    ~256 KB needed!
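
The shape of a cache-size sweep like the one above can be illustrated with a toy fully associative LRU model. Everything here is an assumption made for the sketch: the model ignores associativity, and the 3000-line cyclic working set is invented, not taken from the benchmarks.

```python
from collections import OrderedDict

LINE_BYTES = 64


def lru_misses(trace, capacity_lines):
    """Count misses of a fully associative LRU cache holding capacity_lines."""
    cache = OrderedDict()  # oldest (LRU) entry first, newest (MRU) last
    misses = 0
    for line in trace:
        if line in cache:
            cache.move_to_end(line)        # hit: refresh recency
        else:
            misses += 1
            if len(cache) >= capacity_lines:
                cache.popitem(last=False)  # evict the LRU line
            cache[line] = True
    return misses


# Hypothetical workload: 3000 distinct lines (about 188 KB), replayed 4 times.
trace = list(range(3000)) * 4
for size_kb in (16, 64, 256, 1024):
    capacity = size_kb * 1024 // LINE_BYTES
    print(size_kb, lru_misses(trace, capacity))
```

With a cyclic access pattern, LRU misses on every access until the whole working set fits, so growing the cache gives no benefit below that point and only cold misses above it.
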
  73. Exploit Inter-Event Locality

    1. Retain the reused portion of an event’s footprint in the cache
    2. Prefetch the unretained part

    [Diagram: an instruction stream (… …) divided into Event 1, Event 2, Event 3.]
  86. Exploit Inter-Event Locality

    PRINCIPLES
    1. Retain the reused portion of an event’s footprint in the cache
    2. Prefetch the unretained part

    PRACTICES
    ▸ LRU Cache Insertion Policy (LIP) (Qureshi et al., [ISCA’07])
      ▹ Insert incoming line into LRU position, not MRU position

    [Animation: the cache holds lines a to i (MRU to LRU). New lines j, k, and later lines stream in; a subsequent access to a is a MISS. Caption: "Inter-event Locality Lost!"]
  94. Exploit Inter-Event Locality

    PRINCIPLES
    1. Retain the reused portion of an event’s footprint in the cache
    2. Prefetch the unretained part

    PRACTICES
    ▸ LRU Cache Insertion Policy (LIP) (Qureshi et al., [ISCA’07])
      ▹ Insert incoming line into LRU position, not MRU position

    [Animation: the cache holds lines a to i (MRU to LRU). New lines j and k arrive; a subsequent access to a is a HIT. Caption: "Reused Portion Retained!"]
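
The contrast between the two animations (incoming lines inserted at the MRU position versus at the LRU position) can be sketched with a single cache set modeled as a recency list. The assumptions are labeled in the code: a nine-line set, line a starting at MRU, and streaming lines named l through r beyond the j and k shown on the slides.

```python
def simulate(trace, ways, insert_at_lru, initial=()):
    """One cache set as a recency list: index 0 is MRU, the last index is LRU.
    Returns (per-access hit flags, final recency list)."""
    stack = list(initial)
    hits = []
    for line in trace:
        if line in stack:
            stack.remove(line)
            stack.insert(0, line)   # hit: promote to MRU under both policies
            hits.append(True)
            continue
        hits.append(False)
        if len(stack) >= ways:
            stack.pop()             # evict the line in the LRU position
        if insert_at_lru:
            stack.append(line)      # LIP: new line enters at the LRU position
        else:
            stack.insert(0, line)   # conventional: new line enters at MRU


    return hits, stack


resident = list("abcdefghi")        # assumed: a is MRU, i is LRU
streaming = list("jklmnopqr")       # assumed: nine new lines, then a is reused
trace = streaming + ["a"]

mru_hits, mru_stack = simulate(trace, 9, insert_at_lru=False, initial=resident)
lip_hits, lip_stack = simulate(trace, 9, insert_at_lru=True, initial=resident)
print("MRU insertion, reuse of a hits:", mru_hits[-1])
print("LIP, reuse of a hits:", lip_hits[-1], "final set:", "".join(lip_stack))
```

With LIP the streaming lines replace one another in the LRU slot, so only one resident line is sacrificed and the rest of the reused footprint survives until the next event touches it.
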
  98. [Figure: I-cache MPKI (0 to 100) for Word Finder, Todo, Mud, Etherpad, Let's Chat, and Lighter, comparing Baseline and LIP, with the SPEC CPU 2006 average at 32 KB marked.]

  109. Exploit Inter-Event Locality

    PRINCIPLES
    1. Retain the reused portion of an event’s footprint in the cache
    2. Prefetch the unretained part

    PRACTICES
    ▸ Temporal Instruction Fetch Streaming (TIFS) (Ferdman et al., [MICRO’08])
      ▹ Find patterns in miss sequences

    [Animation: miss sequence … … x y z x y z y z …]
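
The idea of finding patterns in miss sequences can be illustrated with a heavily simplified temporal-stream prefetcher: log the misses, and when an address misses again, prefetch whatever followed it last time. This is a toy model written for this transcript, not the TIFS hardware design; the function name, the unbounded history, and the `depth` parameter are assumptions.

```python
def temporal_stream_prefetch(miss_trace, depth=2):
    """Replay a baseline miss sequence through a toy temporal-stream
    prefetcher and return the misses that no prefetch covered."""
    history = []        # log of addresses in the order they were requested
    last_pos = {}       # address -> index of its latest entry in history
    prefetched = set()  # lines fetched ahead of use (toy prefetch buffer)
    uncovered = []
    for addr in miss_trace:
        if addr in prefetched:
            prefetched.discard(addr)   # covered: the line was already fetched
        else:
            uncovered.append(addr)     # still a miss
        if addr in last_pos:
            # Prefetch the addresses that followed addr the last time.
            start = last_pos[addr] + 1
            prefetched.update(history[start:start + depth])
        last_pos[addr] = len(history)
        history.append(addr)
    return uncovered


# The miss sequence from the slide animation: x y z x y z y z
uncovered = temporal_stream_prefetch(list("xyzxyzyz"))
print(uncovered)
```

In this toy run the first x, y, z are unavoidable, the second x triggers the stream, and every later miss in the sequence is covered by a prefetch.
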
  112. [Figure: I-cache MPKI (0 to 100) for Word Finder, Todo, Mud, Etherpad, Let's Chat, and Lighter, comparing Baseline, LIP, and LIP+TIFS.]

    88% Average MPKI Reduction
  114. Exploit Instruction Locality, a.k.a. Common Case Design

    Instruction supply structures: Cache, Branch Predictor, TLB, listed for both SPEC CPU (mostly) and Event-driven Applications. Locality exploited: hot instructions, hot branch history patterns, hot pages.
  116. Beyond Instruction Cache: Branch Predictor

    [Figure: Misprediction (%) (0 to 20) for SPEC CPU 2006 (milc, GemsFDTD, lbm, libquantum, calculix, zeusmp, wrf, perlbench, leslie3d, cactusADM, hmmer, xalancbmk, tonto, gamess, dealII, sphinx3, h264ref, omnetpp, soplex, bwaves, namd, povray, mcf, gromacs, gcc, bzip2, sjeng, astar, gobmk) and Node.js (Word Finder, Todo, Mud, Etherpad, Let's Chat, Lighter). Tournament predictor: 12-bit history register, 256 local branch histories.]

    2.4× higher!
  118. Beyond Instruction Cache: TLB

    [Figure: L1 I-TLB MPKI (0 to 4) for SPEC CPU 2006 (libquantum, lbm, hmmer, astar, milc, bzip2, mcf, namd, sjeng, bwaves, zeusmp, cactusADM, leslie3d, GemsFDTD, wrf, gromacs, sphinx3, gamess, calculix, h264ref, gobmk, soplex, tonto, perlbench, dealII, omnetpp, povray, gcc, xalancbmk) and Node.js (Word Finder, Todo, Mud, Etherpad, Let's Chat, Lighter). I-TLB parameters: 64 KB, 4-way, 4 KB page size.]

    72× higher!
  123. Event-based processing is a fundamental computation pattern.

    Web, Mobile, Internet-of-Things, Sensor networks, Cloud