Basic Cache Optimizations

David Liman
February 16, 2012

Advanced computer architecture

Transcript

  1. Cache Performance Metrics

    Hit Time: time to deliver a line in the cache to the processor; 1-2 clock cycles (L1), 5-20 clock cycles (L2).
    Miss Penalty: 50-200 clock cycles from main memory.
    Miss Rate: fraction of memory references not found in the cache (misses / accesses); 3-10% (L1), less than 1% (L2).
  2. CPU Execution Time = IC * (CPI_execution + Memory Accesses / Instruction * Miss Rate * Miss Penalty) * Clock cycle time

    Average Memory Access Time = Hit time + Miss rate * Miss penalty
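As a worked example, the two formulas above can be evaluated directly; every number here is an illustrative assumption, not a value from the slides:

```python
# Worked example of the two formulas above.
# All numbers below are illustrative assumptions.

ic = 1_000_000            # instruction count
cpi_execution = 1.0       # base CPI, ignoring memory stalls
accesses_per_instr = 1.5  # memory accesses per instruction
miss_rate = 0.05          # 5% of accesses miss
miss_penalty = 100        # clock cycles per miss
clock_cycle_time = 1e-9   # 1 ns clock

# CPU Execution Time = IC * (CPI_execution +
#   Memory Accesses / Instruction * Miss Rate * Miss Penalty) * Clock cycle time
cpu_time = ic * (cpi_execution +
                 accesses_per_instr * miss_rate * miss_penalty) * clock_cycle_time

# Average Memory Access Time = Hit time + Miss rate * Miss penalty
hit_time = 1  # cycles
amat = hit_time + miss_rate * miss_penalty

print(f"CPU time: {cpu_time:.4f} s")   # → 0.0085 s
print(f"AMAT: {amat:.1f} cycles")      # → 6.0 cycles
```

Note how the memory stalls (1.5 × 0.05 × 100 = 7.5 cycles per instruction) dwarf the base CPI of 1.0, which is why the miss rate and miss penalty are the optimization targets of the following slides.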
  3. Classifying cache misses

    Compulsory misses: the first access to a block not yet in the cache; also called cold-start or first-reference misses (they occur even in an infinite cache).
    Capacity misses: the cache cannot contain all the blocks the program needs, so blocks are discarded and later retrieved (the misses that remain in a fully associative cache).
    Conflict misses: in a set-associative or direct-mapped cache, too many blocks map to the same set; also called collision misses (the extra misses of an n-way associative cache).
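The taxonomy above can be made concrete with a small simulation: one common way to separate the classes is to compare a direct-mapped cache against a fully associative LRU cache of the same size. Everything here (names, the 4-block size, the trace) is an illustrative assumption:

```python
# Classify misses for a direct-mapped cache by comparing against a fully
# associative LRU cache of the same size (an assumed, illustrative model).
from collections import OrderedDict

NUM_BLOCKS = 4  # cache size in blocks (an assumption for this demo)

def classify_misses(block_trace):
    direct = {}           # set index -> block address (direct mapped)
    full = OrderedDict()  # fully associative LRU cache of the same size
    seen = set()          # block addresses referenced at least once
    classes = []
    for blk in block_trace:
        idx = blk % NUM_BLOCKS
        hit = direct.get(idx) == blk
        fa_hit = blk in full
        # maintain the fully associative LRU model
        if fa_hit:
            full.move_to_end(blk)
        else:
            if len(full) >= NUM_BLOCKS:
                full.popitem(last=False)  # evict least recently used
            full[blk] = True
        if hit:
            classes.append("hit")
        elif blk not in seen:
            classes.append("compulsory")  # first-ever reference to the block
        elif fa_hit:
            classes.append("conflict")    # fully associative would have hit
        else:
            classes.append("capacity")    # misses even when fully associative
        if not hit:
            direct[idx] = blk
        seen.add(blk)
    return classes

# Blocks 0 and 4 map to the same set in a 4-block direct-mapped cache:
print(classify_misses([0, 4, 0, 4]))
# → ['compulsory', 'compulsory', 'conflict', 'conflict']
```

The repeated 0/4 accesses miss only because both blocks compete for one set; a fully associative cache of the same size would hold both, which is exactly the definition of a conflict miss.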
  4. Compulsory misses are independent of cache size.

    Capacity misses decrease as capacity increases; if the upper-level memory is too small, too much time is spent moving data between the two levels (thrashing).
    Conflict misses: in a set-associative or direct-mapped cache, too many blocks map to the same set (collision misses in an n-way associative cache).
  5. Cache optimizations

    1. Larger block size to reduce miss rate
    2. Bigger cache to reduce miss rate
    3. Higher associativity to reduce miss rate
    4. Multilevel caches to reduce miss penalty
    5. Giving priority to read misses over writes to reduce miss penalty
  6. 1. Larger block size to reduce miss rate

    Takes advantage of spatial locality and reduces compulsory misses.
    Increases miss penalty; with fewer blocks in the cache, conflict misses increase, and capacity misses increase if the cache is small.
    Past some point, the larger miss penalty outweighs the decrease in miss rate.
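A small sketch of the spatial-locality point, under assumed sizes and a purely sequential byte-address trace; on such a trace every miss is compulsory, and doubling the block size halves the miss rate:

```python
# Illustrative sketch: a direct-mapped cache over a sequential trace.
# Sizes and the trace are assumptions chosen to make the effect visible.

def miss_rate(addresses, block_size, num_blocks=64):
    cache = {}  # set index -> block number
    misses = 0
    for addr in addresses:
        blk = addr // block_size
        idx = blk % num_blocks
        if cache.get(idx) != blk:
            misses += 1
            cache[idx] = blk
    return misses / len(addresses)

sequential = list(range(4096))  # purely sequential byte addresses
for bs in (16, 32, 64):
    print(f"block size {bs:2d}: miss rate {miss_rate(sequential, bs):.4f}")
# → 0.0625, 0.0312, 0.0156: one miss per block, so bigger blocks miss less
```

The flip side from the slide does not show up in this toy model: each of those rarer misses now transfers a bigger block, so the miss penalty grows, and with a fixed-size cache the fewer, larger blocks invite more conflict and capacity misses on less sequential traces.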
  7. 2. Larger cache to reduce miss rate

    Increasing the capacity of the cache reduces capacity misses, at the cost of a longer hit time and higher cost and power.
  8. 3. Higher associativity to reduce miss rate

    Increasing associativity reduces the miss rate (fewer conflict misses).
    But the clock cycle can lengthen, hit time increases (larger MUX), and hardware cost increases (more comparators).
    Rule of thumb: a direct-mapped cache of size N has about the same miss rate as a two-way set-associative cache of size N/2.
  9. 4. Multilevel caches to reduce miss penalty

    Improves miss penalty and miss rate, but makes measurement more complicated:
    local miss rate = misses in a cache / memory accesses to this cache
    global miss rate = misses in the cache / total memory accesses by the processor
    The global miss rate should be used when measuring the second-level cache.
    Speed of the first level => matches the clock rate of the processor
    Speed of the second level => sets the miss penalty of the first level
    First-level cache design => fast hits, few misses
    Second-level cache design => fewer misses: higher associativity, larger blocks
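A quick illustration of local versus global miss rates, with assumed counts; note that L2's local rate looks alarming while its global rate is tiny, which is why the slide says to use the global rate for the second level:

```python
# Local vs. global miss rates for a two-level cache.
# The counts are illustrative assumptions.

accesses = 1000      # total memory accesses by the processor
l1_misses = 40       # accesses that miss in L1 (and therefore reach L2)
l2_misses = 10       # of those, accesses that also miss in L2

l1_miss_rate = l1_misses / accesses  # for L1, local == global: it sees everything
l2_local = l2_misses / l1_misses     # misses in L2 / accesses reaching L2
l2_global = l2_misses / accesses     # misses in L2 / all processor accesses

print(f"L1 miss rate:        {l1_miss_rate:.3f}")  # → 0.040
print(f"L2 local miss rate:  {l2_local:.3f}")      # → 0.250
print(f"L2 global miss rate: {l2_global:.3f}")     # → 0.010
```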
  10. Average memory access time = Hit time L1 + Miss rate L1 × Miss penalty L1

    becomes
    Average memory access time = Hit time L1 + Miss rate L1 × (Hit time L2 + Miss rate L2 × Miss penalty L2)
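Plugging assumed numbers into the two formulas above shows how the second level shrinks the effective L1 miss penalty:

```python
# Two-level AMAT, with assumed, illustrative latencies and miss rates.

hit_time_l1 = 1        # cycles
miss_rate_l1 = 0.04    # 4% of accesses miss in L1
hit_time_l2 = 10       # cycles
miss_rate_l2 = 0.25    # L2 local miss rate
miss_penalty_l2 = 100  # cycles to main memory

# Single level: an L1 miss pays the full trip to memory.
amat_one_level = hit_time_l1 + miss_rate_l1 * miss_penalty_l2

# Two levels: the L1 miss penalty becomes an L2 access.
amat_two_level = hit_time_l1 + miss_rate_l1 * (
    hit_time_l2 + miss_rate_l2 * miss_penalty_l2)

print(f"one level:  {amat_one_level:.2f} cycles")  # → 5.00 cycles
print(f"two levels: {amat_two_level:.2f} cycles")  # → 2.40 cycles
```

The L1 miss penalty drops from 100 cycles to 10 + 0.25 × 100 = 35 cycles, cutting the average access time roughly in half under these assumptions.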
  11. 5. Giving priority to read misses over writes

    Write-back cache:
    Before: write the dirty block to memory, then do the read.
    After: copy the dirty block to the write buffer, then do the read, then do the write.
    Write-through cache: the simple option is to make a read miss wait until the write buffer is empty, but pending writes then delay the read; the alternative is to check the write buffer on a read miss and let the read proceed if there is no conflict.
  12. Assume a direct-mapped, write-through cache that maps 512 and

    1024 to the same block, and a four-word write buffer that is not checked on a read miss. Will R2 always be equal to R3?
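A minimal simulation of the hazard in this question, assuming the classic textbook instruction sequence it comes from (SW R3, 512(R0); LW R1, 1024(R0); LW R2, 512(R0)) and a simplified mapping in which addresses 512 and 1024 share one cache slot:

```python
# A sketch under assumptions: 512 and 1024 share one cache slot, and the
# write-through write buffer is NOT checked on a read miss.
# Assumed instruction sequence (the classic textbook example):
#   SW R3, 512(R0); LW R1, 1024(R0); LW R2, 512(R0)

memory = {512: 0, 1024: 7}  # main memory; 0 is the stale value at 512
cache = {}                  # slot index -> (address, value)
write_buffer = []           # pending write-through stores

def slot(addr):
    return addr % 512       # simplified mapping: 512 and 1024 both -> slot 0

def store(addr, value):
    cache[slot(addr)] = (addr, value)   # update the cache
    write_buffer.append((addr, value))  # write-through goes via the buffer

def load(addr):
    entry = cache.get(slot(addr))
    if entry and entry[0] == addr:
        return entry[1]                 # cache hit
    value = memory[addr]                # read miss served from memory...
    cache[slot(addr)] = (addr, value)   # ...without checking write_buffer!
    return value

R3 = 42
store(512, R3)   # SW R3, 512(R0): the new value sits in the write buffer
R1 = load(1024)  # LW R1, 1024(R0): evicts block 512 from the shared slot
R2 = load(512)   # LW R2, 512(R0): miss reads stale memory, not the buffer

print(R2 == R3)  # → False: R2 gets the stale 0, so "no, not always"
```

The fix is exactly what the previous slide describes: either drain the write buffer before servicing the read miss, or check the buffer's contents and forward the pending value.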