Basic Cache Optimizations

Basic Cache Optimizations David Liman

Cache Performance Metrics
Hit time: time to deliver a line in the cache to the processor. 1-2 clock cycles (L1), 5-20 clock cycles (L2).
Miss penalty: time to fetch the line from main memory. 50-200 clock cycles.
Miss rate: fraction of memory references not found in the cache (misses / accesses). 3-10% (L1), less than 1% (L2).

Time vs. space tradeoff... and cost
How do we optimize the cache?

CPU Execution Time = IC × (CPI execution + Memory Accesses / Instruction × Miss Rate × Miss Penalty) × Clock cycle time
Average memory access time = Hit time + Miss rate × Miss penalty
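As a rough illustration of the two formulas on this slide, the sketch below evaluates them in Python, using the standard form Average memory access time = Hit time + Miss rate × Miss penalty. The function names and the sample numbers (1-cycle hit, 5% miss rate, 100-cycle penalty, 1.5 memory accesses per instruction, 1 ns clock) are assumptions chosen from within the ranges quoted earlier, not measurements.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in clock cycles."""
    return hit_time + miss_rate * miss_penalty


def cpu_time(ic, cpi_execution, accesses_per_instr,
             miss_rate, miss_penalty, cycle_time):
    """CPU execution time in seconds, including memory stall cycles."""
    memory_stall_cpi = accesses_per_instr * miss_rate * miss_penalty
    return ic * (cpi_execution + memory_stall_cpi) * cycle_time


# Hypothetical machine: values are illustrative assumptions only.
print("AMAT (cycles):", amat(1, 0.05, 100))
print("CPU time (s):", cpu_time(1_000_000, 1.0, 1.5, 0.05, 100, 1e-9))
```

With these assumed numbers the memory stalls add 7.5 cycles per instruction to a base CPI of 1.0, which shows why the miss rate and miss penalty dominate the optimizations that follow.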

Classifying cache misses
Compulsory misses: the first access to a block that is not in the cache. Also called cold-start or first-reference misses (misses that occur even in an infinite cache).
Capacity misses: the cache cannot contain all the needed blocks, so blocks are discarded and later fetched again (misses that occur even in a fully associative cache).
Conflict misses: in a set-associative or direct-mapped cache, too many blocks map to the same set. Also called collision misses (misses in an n-way associative cache).

Compulsory misses: independent of cache size.
Capacity misses: decrease as capacity is increased. If the upper-level memory is too small, too much time is spent moving data between the two levels (thrashing).
Conflict misses: in a set-associative or direct-mapped cache, too many blocks map to the same set. Also called collision misses (misses in an n-way associative cache).
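The three categories can be separated mechanically by comparing a real cache against an infinite cache and a fully associative cache of the same capacity. The following is a minimal sketch of that idea for a direct-mapped cache. The function name, the LRU replacement policy for the fully associative reference cache, and the input format (a trace of block numbers) are assumptions made for illustration.

```python
from collections import OrderedDict


def classify_misses(block_trace, num_blocks):
    """Classify the misses of a direct-mapped cache holding num_blocks blocks.

    block_trace is a sequence of block numbers
    (byte addresses already divided by the block size).
    """
    seen = set()                 # "infinite cache": every block referenced so far
    fully_assoc = OrderedDict()  # fully associative LRU cache, same capacity
    direct = {}                  # direct-mapped cache: index -> resident block
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}

    for block in block_trace:
        # Look up (and update) the fully associative reference cache.
        fa_hit = block in fully_assoc
        if fa_hit:
            fully_assoc.move_to_end(block)
        else:
            if len(fully_assoc) >= num_blocks:
                fully_assoc.popitem(last=False)  # evict least recently used
            fully_assoc[block] = True

        # Look up (and update) the direct-mapped cache.
        index = block % num_blocks
        dm_hit = direct.get(index) == block
        direct[index] = block

        if not dm_hit:
            if block not in seen:
                counts["compulsory"] += 1  # would miss even in an infinite cache
            elif not fa_hit:
                counts["capacity"] += 1    # misses even when fully associative
            else:
                counts["conflict"] += 1    # only the mapping causes this miss
        seen.add(block)
    return counts


# Blocks 0 and 4 collide in a 4-block direct-mapped cache.
print(classify_misses([0, 4, 0, 4], num_blocks=4))
# Five distinct blocks do not fit in four block frames.
print(classify_misses([0, 1, 2, 3, 4, 0], num_blocks=4))
```

The first trace produces only compulsory and conflict misses, while the second produces a capacity miss on the final access to block 0.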

Cache optimizations
1. Larger block size to reduce miss rate
2. Bigger cache to reduce miss rate
3. Higher associativity to reduce miss rate
4. Multilevel caches to reduce miss penalty
5. Giving priority to read misses over writes to reduce miss penalty

1. Larger block size to reduce miss rate
Takes advantage of spatial locality.
Reduces compulsory misses.
Increases miss penalty.
Reduces the number of blocks in the cache, which increases conflict misses, and increases capacity misses if the cache is small.
Beyond some block size, the increase in miss penalty outweighs the decrease in miss rate.
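Both sides of this tradeoff can be seen in a toy simulation. The sketch below counts misses in a direct-mapped cache of fixed total size while the block size varies. The function name, the 128-byte cache size, and both address traces are assumptions picked to make the effect visible, not realistic workloads.

```python
def direct_mapped_misses(addresses, cache_bytes, block_bytes):
    """Count misses for a byte-address trace in a direct-mapped cache."""
    num_blocks = cache_bytes // block_bytes
    lines = [None] * num_blocks      # block number resident in each line
    misses = 0
    for addr in addresses:
        block = addr // block_bytes
        index = block % num_blocks
        if lines[index] != block:
            misses += 1
            lines[index] = block
    return misses


# Sequential word accesses: larger blocks exploit spatial locality.
sequential = list(range(0, 256, 4))
for block_bytes in (4, 16, 64):
    print("sequential", block_bytes, direct_mapped_misses(sequential, 128, block_bytes))

# Two addresses accessed alternately: with 64-byte blocks only two lines
# remain, both addresses map to the same line, and every access misses.
ping_pong = [0, 144] * 4
for block_bytes in (16, 64):
    print("ping-pong", block_bytes, direct_mapped_misses(ping_pong, 128, block_bytes))
```

On the sequential trace the miss count falls from 64 to 16 to 4 as the block grows, while on the alternating trace it rises from 2 to 8, because fewer, larger blocks leave more room for conflicts.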

2. Larger cache to reduce miss rate
Increases the capacity of the cache.
Longer hit time.
Higher cost and power.

3. Higher associativity to reduce miss rate
Increasing associativity reduces the miss rate.
Increases hit time (larger MUX), so the clock cycle may increase.
Increases hardware cost (more comparators).
Rule of thumb: a direct-mapped cache of size N has about the same miss rate as a two-way set-associative cache of size N/2.
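To see how associativity removes conflict misses, here is a minimal sketch of a set-associative cache that counts misses. The function name, the LRU replacement policy, and the block trace are illustrative assumptions. It demonstrates the mechanism only and does not reproduce the N versus N/2 rule of thumb, which is an empirical observation.

```python
from collections import OrderedDict


def set_assoc_misses(block_trace, num_blocks, ways):
    """Count misses in a cache of num_blocks blocks, organized into
    num_blocks // ways sets, with LRU replacement inside each set."""
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in block_trace:
        s = sets[block % num_sets]
        if block in s:
            s.move_to_end(block)       # mark as most recently used
        else:
            misses += 1
            if len(s) >= ways:
                s.popitem(last=False)  # evict least recently used block
            s[block] = True
    return misses


trace = [0, 8, 0, 8, 0, 8]             # blocks 0 and 8 share an index
print(set_assoc_misses(trace, num_blocks=8, ways=1))  # direct mapped
print(set_assoc_misses(trace, num_blocks=8, ways=2))  # two-way set associative
```

With one way, the two blocks keep evicting each other and all six accesses miss. With two ways they share a set and only the first two accesses miss.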

4. Multilevel caches to reduce miss penalty
Improvement in miss penalty.
Improvement in miss rate.
More complicated measurement:
  local miss rate = misses in a cache / memory accesses to this cache
  global miss rate = misses in the cache / total number of memory accesses generated by the processor
The global miss rate should be used when measuring the second-level cache.
Speed of the first level => clock rate of the processor.
Speed of the second level => miss penalty of the first level.
First-level cache design => fast hit, few misses.
Second-level cache design => fewer hits, fewer misses, higher associativity, larger blocks.
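The difference between the local and global miss rates is easiest to see with concrete counts. The sketch below assumes a hypothetical run with 1000 processor memory accesses, 40 misses in L1, and 20 misses in L2. The function name and the counts are illustrative assumptions.

```python
def miss_rates(cpu_accesses, l1_misses, l2_misses):
    """Return (L1 miss rate, L2 local miss rate, L2 global miss rate)."""
    l1_rate = l1_misses / cpu_accesses    # local and global are equal for L1
    l2_local = l2_misses / l1_misses      # L2 is accessed only on L1 misses
    l2_global = l2_misses / cpu_accesses  # fraction of all CPU accesses
    return l1_rate, l2_local, l2_global


l1_rate, l2_local, l2_global = miss_rates(1000, 40, 20)
print(f"L1 miss rate:        {l1_rate:.1%}")
print(f"L2 local miss rate:  {l2_local:.1%}")
print(f"L2 global miss rate: {l2_global:.1%}")
```

The L2 local miss rate looks high because L1 has already absorbed the easy hits. The L2 global miss rate equals the L1 miss rate multiplied by the L2 local miss rate, and it is the fraction of processor accesses that go all the way to main memory.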

Average memory access time = Hit time L1 + Miss rate L1 × Miss penalty L1
becomes
Average memory access time = Hit time L1 + Miss rate L1 × (Hit time L2 + Miss rate L2 × Miss penalty L2)
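A short worked example of the two-level formula follows. Miss rate L2 is taken here as the local miss rate of L2, and the sample parameters (1-cycle L1 hit, 4% L1 miss rate, 10-cycle L2 hit, 50% L2 local miss rate, 200-cycle memory penalty) are assumptions for illustration.

```python
def amat_two_level(hit_l1, miss_rate_l1, hit_l2,
                   local_miss_rate_l2, miss_penalty_l2):
    """Average memory access time (cycles) with an L1 and an L2 cache."""
    # The L1 miss penalty is the average time to service a request from L2.
    miss_penalty_l1 = hit_l2 + local_miss_rate_l2 * miss_penalty_l2
    return hit_l1 + miss_rate_l1 * miss_penalty_l1


with_l2 = amat_two_level(1, 0.04, 10, 0.5, 200)
without_l2 = 1 + 0.04 * 200   # single-level formula, same memory penalty
print("AMAT with L2 (cycles):   ", with_l2)
print("AMAT without L2 (cycles):", without_l2)
```

Under these assumptions the L1 miss penalty drops from 200 cycles to 110 cycles, so the average access time falls from about 9 cycles to about 5.4 cycles.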

5. Giving priority to read misses over writes
Write back:
  Before: write the dirty block to memory, and then do the read.
  After: copy the dirty block to the write buffer, then do the read, then do the write.
On a read miss, wait until the write buffer is empty.
A write will block the CPU (cache access time limits the clock cycle).

Assume a direct-mapped, write-through cache that maps addresses 512 and 1024 to the same block, and a four-word write buffer that is not checked on a read miss. Will R2 always be equal to R3?
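The instruction sequence for this question is not included in the transcript. The sketch below assumes the usual three-step sequence for this kind of example: store R3 to address 512, load R1 from address 1024, then load R2 from address 512. It also assumes the worst case in which the write buffer has not drained to memory before the second load. All names and values are hypothetical.

```python
def r2_equals_r3(check_buffer_on_read_miss):
    """Simulate the assumed sequence on a one-line direct-mapped,
    write-through cache with a write buffer that has not yet drained."""
    memory = {512: 0, 1024: 0}   # address 512 still holds a stale value
    cache = {}                   # the single line shared by 512 and 1024
    write_buffer = []            # pending (address, data) writes
    r3 = 42

    def store(addr, data):
        # Write through: update the cache line and queue the memory write.
        cache["addr"], cache["data"] = addr, data
        write_buffer.append((addr, data))

    def load(addr):
        if cache.get("addr") == addr:
            return cache["data"]             # hit
        data = memory[addr]                  # read miss: fetch from memory
        if check_buffer_on_read_miss:
            for pending_addr, pending_data in write_buffer:
                if pending_addr == addr:
                    data = pending_data      # newest pending write wins
        cache["addr"], cache["data"] = addr, data
        return data

    store(512, r3)       # assumed: SW R3, 512(R0)
    load(1024)           # assumed: LW R1, 1024(R0), evicts the line for 512
    r2 = load(512)       # assumed: LW R2, 512(R0), misses again
    return r2 == r3


print("buffer not checked:", r2_equals_r3(False))
print("buffer checked:    ", r2_equals_r3(True))
```

In this model the answer is no when the buffer is not checked: the second load misses and reads the stale value from memory while the new value still sits in the write buffer. Checking the buffer on a read miss, or waiting for it to drain, restores the expected result.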
