## Computer Science 146 Computer Architecture

Fall 2019 Harvard University

Instructor: Prof. David Brooks dbrooks@eecs.harvard.edu

Lecture 18: Virtual Memory



# Simple Interleaving

| Cycle | Addr | Bank0 | Bank1 | Bank2 | Bank3 | steady |
|-------|------|-------|-------|-------|-------|--------|
| 1     | 12   | A     | A     | A     | A     |        |
| 2     |      | A     | A     | A     | A     |        |
| 3     |      | T/B   | B     | B     | B     | *      |
| 4     |      | B     | T/B   | B     | B     | *      |
| 5     |      |       |       | T     |       | *      |
| 6     |      |       |       |       | T     | *      |

- 4-word access time = 6 cycles
- 4-word cycle time = 4 cycles
  - Can start a new access in cycle 5
  - Overlap access with transfer (and still use a 32-bit bus!)
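The 6-cycle and 4-cycle figures above can be reproduced with a small timing model. This is a minimal sketch, assuming that A in the table means a bank access cycle, B a bank busy (recovery) cycle, and T a one-word bus transfer; the function name and parameters are hypothetical, chosen for illustration.

```python
def interleaved_timing(n_banks, access_cycles, busy_cycles, transfer_cycles_per_word):
    """Return (latency, cycle_time) in cycles for one n_banks-word access.

    Assumed model: all banks start their access in the same cycle, then
    the words are sent over a one-word-wide bus, one bank after another.
    """
    # Latency: wait for the parallel access, then transfer every word in turn.
    latency = access_cycles + n_banks * transfer_cycles_per_word
    # Cycle time: a new access can start once the banks have recovered,
    # provided the bus can keep up with one word per transfer slot.
    cycle_time = max(access_cycles + busy_cycles,
                     n_banks * transfer_cycles_per_word)
    return latency, cycle_time


latency, cycle_time = interleaved_timing(
    n_banks=4, access_cycles=2, busy_cycles=2, transfer_cycles_per_word=1)
print(latency, cycle_time)  # 6 4
```

With these assumed parameters the banks are free again after cycle 4, which matches the note that a new access can start in cycle 5.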















# Virtual Memory: Cache Analogy

| Parameter          | First-Level Cache                        | Virtual Memory               |
|--------------------|------------------------------------------|------------------------------|
| Block (page) Size  | 16-128 Bytes                             | 4KB-64KB                     |
| Hit Time           | 1-3 clock cycles                         | 50-150 clock cycles          |
| Miss Penalty       | 8-150 clock cycles                       | 1M-10M clock cycles          |
| (access time)      | (6-130 clock cycles)                     | (0.8M-8M clock cycles)       |
| (transfer time)    | (2-20 clock cycles)                      | (0.2M-2M clock cycles)       |
| Miss Rate          | 0.1-10%                                  | 0.00001-0.001%               |
| Address Mapping    | 25-45 bit PA to 14-20 bit cache address  | 32-64 bit VA to 25-45 bit PA |
| Replacement Policy | Hardware Replacement                     | Software Replacement         |
| Total Size         | Independent of Address Space             | Processor Address Space      |
| Backing Store      | Level 2 Cache                            | Physical Disk                |
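To see why the tiny virtual memory miss rate matters, the table entries can be plugged into the usual average access time formula (hit time + miss rate × miss penalty). This is an illustrative sketch only: the specific numbers are picked from inside the ranges in the table, not measured, and the function name is hypothetical.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average access time in clock cycles: hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Assumed sample points taken from within the table's ranges.
cache_avg = average_access_time(hit_time=2, miss_rate=0.02, miss_penalty=50)
vm_avg = average_access_time(hit_time=100, miss_rate=0.000001,  # 0.0001%
                             miss_penalty=5_000_000)

print(f"first-level cache: about {cache_avg:.1f} cycles")
print(f"virtual memory:    about {vm_avg:.1f} cycles")
```

Even at a miss rate of 0.0001%, the multi-million-cycle penalty still adds a few cycles to every memory access on average, which is why page faults must be extremely rare.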


# Virtual Memory: Four Questions



- Same four questions as caches
  - Page Placement: fully associative
    - Why?
  - Page Identification: address translation
    - Indirection through one or two page tables
  - Page Replacement: Sophisticated LRU + Working set
    - Why?
  - Write Strategy: Always write-back + write allocate
    - Why?
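The "indirection through one or two page tables" step can be sketched as follows. This is a minimal illustration, assuming 4 KB pages, a 32-bit virtual address split 10/10/12, and page tables modeled as Python dictionaries; the names `translate` and `PageFault` are hypothetical and do not correspond to any particular architecture.

```python
PAGE_OFFSET_BITS = 12  # assumed 4 KB pages
LEVEL_BITS = 10        # assumed 10 index bits per page-table level


class PageFault(Exception):
    """Raised when no valid translation exists for a virtual address."""


def translate(root_table, vaddr):
    """Translate a virtual address through a two-level page table.

    root_table maps a first-level index to a second-level table (a dict),
    which maps a second-level index to a physical page number.
    """
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    vpn = vaddr >> PAGE_OFFSET_BITS
    l1_index = vpn >> LEVEL_BITS
    l2_index = vpn & ((1 << LEVEL_BITS) - 1)

    second_level = root_table.get(l1_index)
    if second_level is None or l2_index not in second_level:
        raise PageFault(hex(vaddr))  # the OS would fetch the page from disk

    ppn = second_level[l2_index]
    # The page offset passes through translation unchanged.
    return (ppn << PAGE_OFFSET_BITS) | offset


# One mapped page: first-level index 1, second-level index 2 -> physical page 0x55.
root = {1: {2: 0x55}}
print(hex(translate(root, 0x402ABC)))  # 0x55abc
```

Because any virtual page can map to any physical page, this lookup is what makes fully associative placement possible without a hardware search.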



























# Selecting Page Size

- Larger Page Size
  - Page table is smaller (size is inversely proportional to page size)
  - Larger page offset may allow larger virtually indexed, physically tagged caches
  - Page transfers can be more efficient
  - Each TLB entry maps more memory => fewer TLB misses
- Smaller Page Size
  - Less internal fragmentation: a contiguous region of virtual memory is rarely an exact multiple of the page size, so part of its last page is wasted
  - Faster process startup: small processes do not have to load large pages
- Multiple Page Sizes
  - Some processors support multiple page sizes => larger pages are power-of-2 multiples of the smallest page size
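The page size tradeoff can be made concrete with a little arithmetic. This is a minimal sketch, assuming a flat (single-level) page table, a 32-bit virtual address space, and 4-byte page table entries; the function names and the 10,000-byte region are hypothetical values chosen for illustration.

```python
def flat_page_table_bytes(va_bits, page_bytes, pte_bytes=4):
    """Size of a flat page table with one entry per virtual page."""
    num_pages = 2 ** va_bits // page_bytes
    return num_pages * pte_bytes


def internal_fragmentation(region_bytes, page_bytes):
    """Bytes wasted in the last page of a contiguous region."""
    pages_needed = -(-region_bytes // page_bytes)  # ceiling division
    return pages_needed * page_bytes - region_bytes


for page_bytes in (4 * 1024, 64 * 1024):
    table = flat_page_table_bytes(va_bits=32, page_bytes=page_bytes)
    wasted = internal_fragmentation(region_bytes=10_000, page_bytes=page_bytes)
    print(f"{page_bytes // 1024:>2} KB pages: "
          f"page table {table // 1024} KB, wasted {wasted} bytes")
```

Under these assumptions, going from 4 KB to 64 KB pages shrinks the page table by 16x but wastes far more of the last page of a small region.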







# Memory Summary

- Main Memory
  - DRAM is slow but dense
  - Interleaving/banking for high bandwidth
- Virtual Memory, Address Translation, Protection
  - Larger memory, protection, relocation, multiprogramming
  - Page tables
  - TLB: cache translations for speed
    - Access in parallel with cache tags
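Accessing the TLB in parallel with the cache tags works when the cache index comes entirely from the page offset, since those bits are identical in the virtual and physical address. A minimal sketch of that check follows; the function name is hypothetical, and the condition assumes no extra hardware or OS support for handling aliases.

```python
def index_fits_in_page_offset(cache_bytes, associativity, page_bytes):
    """True if a virtually indexed, physically tagged cache can be indexed
    using only page-offset bits (each way is no larger than one page)."""
    way_bytes = cache_bytes // associativity
    return way_bytes <= page_bytes


KB = 1024
print(index_fits_in_page_offset(32 * KB, 8, 4 * KB))   # True
print(index_fits_in_page_offset(32 * KB, 2, 4 * KB))   # False
print(index_fits_in_page_offset(32 * KB, 2, 64 * KB))  # True
```

The last two lines show the point made under page size selection: a larger page offset allows a larger (or less associative) cache to be indexed before translation completes.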