# Computer Science 146 Computer Architecture

Fall 2019 Harvard University

Instructor: Prof. David Brooks dbrooks@eecs.harvard.edu

Lecture 15: More on Caches























## Another Example

- 32-bit machine
- 64KB, 32B Block, 2-Way Set Associative
- Compute Total Size of Tag Array
  - 64KB/ 32B blocks => 2K Blocks
  - 2K Blocks / 2-way set-associative => 1K Sets
  - 32B Blocks => 5 Offset Bits
  - 1K Sets => 10 index bits
  - 32-bit address - 5 offset bits - 10 index bits = 17 tag bits
  - 17 tag bits \* 2K Blocks => 34Kb => 4.25KB
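
The arithmetic above can be checked with a short Python sketch. The function name `tag_array_size` and its parameters are illustrative, not part of the lecture, and the sketch assumes block size and set count are powers of two.

```python
import math

def tag_array_size(addr_bits, cache_bytes, block_bytes, ways):
    """Return (offset_bits, index_bits, tag_bits, total_tag_bits).

    Assumes block_bytes and the number of sets are powers of two.
    """
    num_blocks = cache_bytes // block_bytes      # 64KB / 32B = 2K blocks
    num_sets = num_blocks // ways                # 2K / 2 ways = 1K sets
    offset_bits = int(math.log2(block_bytes))    # bits selecting a byte in the block
    index_bits = int(math.log2(num_sets))        # bits selecting the set
    tag_bits = addr_bits - offset_bits - index_bits
    return offset_bits, index_bits, tag_bits, tag_bits * num_blocks

offset, index, tag, total_bits = tag_array_size(32, 64 * 1024, 32, 2)
print(offset, index, tag, total_bits, total_bits / 8 / 1024)
# prints: 5 10 17 34816 4.25
```

The last value is the tag array size in KB (34816 bits / 8 / 1024), matching the 4.25KB on the slide.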



# More Detailed Questions

- Block placement policy?
  - Where does a block go when it is fetched?
- Block identification policy?
  - How do we find a block in the cache?
- Block replacement policy?
  - When fetching a block into a full cache, how do we decide what other block gets kicked out?
- Write strategy?
  - Does any of this differ for reads vs. writes?







# Write Hit Policies

- Q1: When to propagate new values to memory?
- Write back: Information is only written to the cache.
  - Next lower level is only updated when the block is evicted (dirty bits say when data has been modified)
  - Can write at speed of cache
  - Caches become temporarily inconsistent with lower-levels of hierarchy.
  - Uses less memory bandwidth/power (multiple consecutive writes may require only 1 final write)
  - Multiple writes within a block can be merged into one write
  - Evictions now have longer latency (a dirty block must be written back first)
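
As a rough illustration of the dirty-bit behavior, here is a minimal Python sketch of a write-back cache that tracks tags only. The class `WriteBackCache`, the direct-mapped organization, and the allocate-on-write-miss behavior are assumptions made for the example, not details from the lecture.

```python
class WriteBackCache:
    """Toy direct-mapped, write-back cache (tags and dirty bits only, no data).

    Assumption for this sketch: a write miss allocates the block.
    """

    def __init__(self, num_lines, block_bytes):
        self.num_lines = num_lines
        self.block_bytes = block_bytes
        self.tags = [None] * num_lines
        self.dirty = [False] * num_lines
        self.memory_writes = 0  # write-backs sent to the next lower level

    def write(self, addr):
        block = addr // self.block_bytes
        line = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[line] != tag:
            # Evicting a modified block forces a write-back first.
            if self.tags[line] is not None and self.dirty[line]:
                self.memory_writes += 1
            self.tags[line] = tag
        # The write itself only touches the cache and marks the line dirty.
        self.dirty[line] = True

cache = WriteBackCache(num_lines=4, block_bytes=32)
for addr in (0, 4, 8, 12):       # four writes to the same block
    cache.write(addr)
print(cache.memory_writes)       # prints: 0
cache.write(4 * 32)              # conflicting block evicts the dirty one
print(cache.memory_writes)       # prints: 1
```

Four writes to one block cost a single write-back at eviction time, which is the bandwidth saving the list above describes.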











# Write Buffer Flush Policies

- When to flush?
  - Aggressive flushing => Reduce chance of stall cycles due to full write buffer
  - Conservative flushing => Write Merging more likely (entries stay around longer) => reduces memory traffic
  - On-chip L2s => More aggressive flushing
- What to flush?
  - Selective flushing of particular entries?
  - Flush everything below a particular entry
  - Flush everything
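
The trade-off between aggressive and conservative flushing can be sketched with a toy merging write buffer. This is a simplified model under stated assumptions: the function `count_memory_writes`, the "flush everything once the buffer holds `flush_threshold` blocks" rule, and the final drain are all hypothetical choices for illustration.

```python
def count_memory_writes(write_addrs, block_bytes, flush_threshold):
    """Count block writes sent to memory by a toy merging write buffer.

    Writes to a block already in the buffer are merged into its entry.
    The whole buffer is flushed once it holds flush_threshold distinct
    blocks, and whatever remains is drained at the end.
    """
    buffered_blocks = set()
    sent = 0
    for addr in write_addrs:
        buffered_blocks.add(addr // block_bytes)   # merge if already present
        if len(buffered_blocks) >= flush_threshold:
            sent += len(buffered_blocks)           # "flush everything"
            buffered_blocks.clear()
    return sent + len(buffered_blocks)             # final drain

writes = [0, 4, 8, 32, 36, 0]   # three blocks' worth of addresses, heavily reused
print(count_memory_writes(writes, block_bytes=32, flush_threshold=1))  # prints: 6
print(count_memory_writes(writes, block_bytes=32, flush_threshold=4))  # prints: 2
```

With a threshold of 1 (aggressive) every write goes to memory; with a threshold of 4 (conservative) entries stay long enough to merge, so only two block writes are sent. The sketch does not model stall cycles from a full buffer, which is the cost on the conservative side.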



## Write misses?

- Write Allocate
  - Block is allocated on a write miss
  - Standard write hit actions follow the block allocation
  - Write misses are handled like read misses
  - Goes well with write-back
- No-write Allocate
  - Write misses do not allocate a block
  - Only update lower-level memory
  - Blocks only allocate on Read misses!
  - Goes well with write-through

| Write Policy            | Hit/Miss | Writes to |
|-------------------------|----------|-----------|
| WriteBack/Allocate      | Both     | L1 Cache  |
| WriteBack/NoAllocate    | Hit      | L1 Cache  |
| WriteBack/NoAllocate    | Miss     | L2 Cache  |
| WriteThrough/Allocate   | Both     | Both      |
| WriteThrough/NoAllocate | Hit      | Both      |
| WriteThrough/NoAllocate | Miss     | L2 Cache  |
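
The table can also be read as a small decision function. The sketch below encodes exactly the six rows above; the function name `write_destination` and its boolean parameters are illustrative.

```python
def write_destination(write_through, allocate, hit):
    """Return where a write lands: "L1", "L2", or "Both" (per the table above)."""
    if not hit and not allocate:
        # No-write-allocate miss: the block is not brought into L1.
        return "L2"
    # Hit, or miss with allocation: the write goes to L1,
    # and write-through also forwards it to L2.
    return "Both" if write_through else "L1"

for wt in (False, True):
    for alloc in (True, False):
        for hit in (True, False):
            policy = ("WriteThrough" if wt else "WriteBack") + \
                     ("/Allocate" if alloc else "/NoAllocate")
            print(policy, "hit" if hit else "miss", "->",
                  write_destination(wt, alloc, hit))
```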

## Cache Performance

CPU time = (CPU execution cycles + Memory Stall Cycles)\*Clock Cycle Time

AMAT = Hit Time + Miss Rate \* Miss Penalty

- Reducing any of the three AMAT terms (hit time, miss rate, miss penalty) can have a big impact on performance
- Out-of-order processors can hide some of the miss penalty
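
The two formulas translate directly into code. In the sketch below, the function names and all numeric inputs (1-cycle hit, 5% miss rate, 100-cycle penalty, and so on) are made-up values for illustration, not figures from the lecture.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

def cpu_time(execution_cycles, memory_stall_cycles, clock_cycle_time):
    """CPU time = (CPU execution cycles + memory stall cycles) * clock cycle time."""
    return (execution_cycles + memory_stall_cycles) * clock_cycle_time

# Hypothetical cache: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
print(amat(hit_time=1, miss_rate=0.05, miss_penalty=100))   # about 6 cycles

# Halving the miss rate in this example lowers AMAT to about 3.5 cycles.
print(amat(hit_time=1, miss_rate=0.025, miss_penalty=100))

# Hypothetical program: 1e9 execution cycles, 2e8 stall cycles, 1 ns clock.
print(cpu_time(1e9, 2e8, 1e-9))                             # about 1.2 seconds
```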



# Reducing Miss Penalty: Victim Caches

- Direct mapped caches => many conflict misses
- Solution 1: More associativity (expensive)
- Solution 2: Victim Cache
- Victim Cache
  - Small (4 to 8-entry), fully-associative cache between L1 cache and refill path
  - Holds blocks discarded from cache because of evictions
  - Checked on a miss before going to L2 cache
  - Hit in victim cache => swap victim block with cache block
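
The victim cache behavior described above can be sketched as a small simulation. This is a minimal model under assumptions: the class name `DirectMappedWithVictim`, block-number addressing, and FIFO replacement inside the victim cache are choices made for the example (the slide does not specify a victim replacement policy).

```python
from collections import OrderedDict

class DirectMappedWithVictim:
    """Toy direct-mapped L1 with a small fully-associative victim cache."""

    def __init__(self, num_lines, victim_entries):
        self.num_lines = num_lines
        self.lines = [None] * num_lines   # block number held by each L1 line
        self.victim = OrderedDict()       # block number -> True, oldest first
        self.victim_entries = victim_entries
        self.l1_hits = 0
        self.victim_hits = 0
        self.l2_accesses = 0

    def access(self, block):
        line = block % self.num_lines
        resident = self.lines[line]
        if resident == block:
            self.l1_hits += 1
            return
        if block in self.victim:
            # Victim hit: the block moves back into L1 (swap with the resident).
            self.victim_hits += 1
            del self.victim[block]
        else:
            # Miss in both: go to L2.
            self.l2_accesses += 1
        self.lines[line] = block
        if resident is not None:
            self._insert_victim(resident)

    def _insert_victim(self, block):
        if self.victim_entries == 0:
            return                            # no victim cache configured
        if len(self.victim) >= self.victim_entries:
            self.victim.popitem(last=False)   # FIFO eviction (an assumption)
        self.victim[block] = True

# Blocks 0 and 4 conflict in a 4-line direct-mapped cache.
trace = [0, 4, 0, 4, 0, 4]
with_victim = DirectMappedWithVictim(num_lines=4, victim_entries=4)
without_victim = DirectMappedWithVictim(num_lines=4, victim_entries=0)
for block in trace:
    with_victim.access(block)
    without_victim.access(block)
print(with_victim.l2_accesses, with_victim.victim_hits)   # prints: 2 4
print(without_victim.l2_accesses)                         # prints: 6
```

On this ping-pong trace the victim cache turns four of the six conflict misses into victim hits, so only the two cold misses reach L2.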













