# Computer Science 146 Computer Architecture

Fall 2019 Harvard University

Instructor: Prof. David Brooks dbrooks@eecs.harvard.edu

Lecture 21: Multithreading and I/O















#### X86 Core Sizes Are ~10x Larger

| Processor                      | Total Cache (KB)    | Die Size (mm<sup>2</sup>)      | Est. Core Size (mm<sup>2</sup>)     | Core Size Ratio    | Typical Speed (MHz)    |
|--------------------------------|---------------------|--------------------------------|-------------------------------------|--------------------|------------------------|
| Intel ULV PIII-M               | 544                 | 80                             | ~34                                 | ~13                | >1000                  |
| AMD Duron                      | 192                 | 55                             | ~37                                 | ~14                | >1000                  |
| Transmeta 5800                 | 640                 | 55                             | ~25                                 | ~10                | >800                   |
| VIA C3                         | 192                 | 52                             | ~31                                 | ~12                | >800                   |
| ARM 1026EJ-S                   | 32                  | 4.6                            | 2.6                                 | 1                  | >400                   |
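
The Core Size Ratio column appears to be each estimated core size divided by the ARM 1026EJ-S core size. The short Python sketch below checks that reading against the table's numbers. The baseline choice and the function name `core_size_ratios` are assumptions made for illustration, not something the slide states.

```python
# Estimated core sizes (mm^2), copied from the table above.
core_size_mm2 = {
    "Intel ULV PIII-M": 34,
    "AMD Duron": 37,
    "Transmeta 5800": 25,
    "VIA C3": 31,
    "ARM 1026EJ-S": 2.6,
}


def core_size_ratios(sizes, baseline="ARM 1026EJ-S"):
    """Return each core size relative to the baseline core (assumed: the ARM core)."""
    base = sizes[baseline]
    return {name: size / base for name, size in sizes.items()}


ratios = core_size_ratios(core_size_mm2)
for name, ratio in ratios.items():
    print(f"{name:18s} {ratio:5.1f}x")
```

Rounded to whole numbers, the computed ratios agree with the approximate values in the table.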







## Multi-fetch Using 3 Identical Fetch Units

- Each fetch unit
  - Operates independently
  - Holds four 8-byte blocks
  - Prefetches up to 3 blocks from sequential path
  - Prefetches 2 blocks from target path as condition is evaluated
- Cache-location aware logic
  - Determines cache location of the next sequential 64B line
  - Remembers cache location of two previous 64B lines









#### Motivation: Who Cares About I/O?

- CPU Performance: 57% per year
- I/O system performance limited by *mechanical* delays (disk I/O):
  - < 10% increase per year (I/O operations per second)
- Amdahl's Law: system speed-up limited by the slowest part!
  - 10% IO & 10x CPU => ~5x Performance (lose ~50%)
  - 10% IO & 100x CPU => ~10x Performance (lose ~90%)
  - Need fast disk accesses (VM swaps, file reading, networks, etc)
- I/O bottleneck:
  - Increasing fraction of time in I/O (relative to CPU)
  - Similar to Memory Wall problem
- Why not context switch on I/O operation?
  - Must find threads to context switch to
  - Context-switching requires more memory















#### Data Rate: Inner vs. Outer Tracks

- To keep things simple, originally kept the same number of sectors per track
  - Since outer track is longer, lower bits per inch (BPI)
- Competition $\Rightarrow$ decided to keep BPI the same for all tracks ("constant bit density")
  - $\Rightarrow$  More capacity per disk
  - $\Rightarrow$  More sectors per track towards edge
  - $\Rightarrow$  Since disk spins at constant speed, outer tracks have faster data rate
- Bandwidth outer track 1.7X inner track!
  - Inner track highest density, outer track lowest, so not really constant
  - 2.1X length of track outer / inner, 1.7X bits outer / inner
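
The relationship between the 2.1X and 1.7X figures can be worked through in Python. The sketch assumes constant rotational speed, so every track takes the same time per revolution and bandwidth scales with bits per track. The function name `outer_to_inner_ratios` is hypothetical.

```python
def outer_to_inner_ratios(track_length_ratio, bits_per_track_ratio):
    """Return (bandwidth ratio, linear density ratio), outer track over inner track."""
    # Same time per revolution on every track, so data rate follows bits per track.
    bandwidth_ratio = bits_per_track_ratio
    # Bits per inch = bits on the track / length of the track.
    linear_density_ratio = bits_per_track_ratio / track_length_ratio
    return bandwidth_ratio, linear_density_ratio


bandwidth, density = outer_to_inner_ratios(track_length_ratio=2.1,
                                           bits_per_track_ratio=1.7)
print(f"outer/inner bandwidth: {bandwidth:.1f}x")
print(f"outer/inner bit density: {density:.2f}x")
```

Under these assumptions the outer track stores bits at roughly 0.8 times the inner track's linear density, which is why the slide calls the density "not really constant".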













- 1956 IBM RAMAC to early 1970s Winchester
  - Developed for mainframe computers, proprietary interfaces
  - Steady shrink in form factor: 27 in. to 14 in.
- Form factor and capacity drive the market, more than performance
- 1970s: Mainframes  $\Rightarrow$  14 inch diameter disks
- 1980s: Minicomputers, Servers  $\Rightarrow$  8", 5 1/4" diameter
- Late 1980s/early 1990s: PCs, workstations
  - Mass market disk drives become a reality
  - Pizzabox PCs  $\Rightarrow$  3.5 inch diameter disks
  - Laptops, notebooks  $\Rightarrow$  2.5 inch disks
- 2000s:
  - 1 inch for cameras, cell phones?







### What About Flash?

- Compact Flash Cards
  - Intel Strata Flash (16 Mb in 1 square cm.)
  - 100,000 write/erase cycles.
  - Standby current = 100uA, write = 45mA
  - Transfer @ 3.5MB/s, read access times in 65-150ns range
  - Compact Flash (2002) 256MB=\$73 512MB=\$170, 1GB=\$560
  - Compact Flash (2004) 256MB=\$39 512MB=\$80 1GB=\$146 2GB=\$315 4GB=\$800
- IBM/Hitachi Microdrive 4GB=\$370
  - Standby current = 20mA, write = 250mA
  - Efficiency advertised in watts/MB
- Flash vs. Disks
  - Nearly instant standby wake-up time
  - Random access to data stored
  - Tolerant to shock and vibration (1000G of operating shock)
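
The price and current figures above can be put on a common footing with a little arithmetic. The Python sketch below computes cost per MB and current ratios from the slide's numbers. It assumes 1 GB = 1024 MB and that the Flash standby and write currents apply to the Compact Flash parts being priced; both are assumptions made for this comparison.

```python
# 2004 Compact Flash prices from the slide: capacity in MB -> price in USD.
flash_prices_2004 = {256: 39, 512: 80, 1024: 146, 2048: 315, 4096: 800}
microdrive_mb, microdrive_usd = 4096, 370


def usd_per_mb(capacity_mb, price_usd):
    """Cost per megabyte in US dollars."""
    return price_usd / capacity_mb


for capacity_mb, price_usd in flash_prices_2004.items():
    print(f"Flash {capacity_mb:5d} MB: ${usd_per_mb(capacity_mb, price_usd):.3f}/MB")
print(f"Microdrive {microdrive_mb} MB: ${usd_per_mb(microdrive_mb, microdrive_usd):.3f}/MB")

# Currents in microamps: Flash standby 100 uA, write 45 mA;
# Microdrive standby 20 mA, write 250 mA.
standby_ratio = 20_000 / 100      # Microdrive standby current / Flash standby current
write_ratio = 250_000 / 45_000    # Microdrive write current / Flash write current
print(f"Microdrive draws {standby_ratio:.0f}x the standby current "
      f"and {write_ratio:.1f}x the write current")
```

At 4 GB the Microdrive is cheaper per MB, while Flash draws far less current, most of all in standby.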



#### Next two lectures

- Monday:
  - Finish up with I/O
    - I/O Buses
    - RAID Systems
  - Course Evaluations (need a volunteer to return them)
- Next Wednesday:
  - Google Cluster
  - Course Summary and Wrapup
  - Final Review (may schedule another review before final)
- Final Exam: Tue 05/25 (Boylston 105)