# Computer Science 146 Computer Architecture

Fall 2019 Harvard University

Instructor: Prof. David Brooks dbrooks@eecs.harvard.edu

Lecture 7: Dynamic Branch Prediction



#### **Tomasulo Review**

- Reservation Stations
  - Distribute RAW hazard detection
  - Renaming eliminates WAW hazards
  - Buffering values in Reservation Stations removes WARs
  - Tag match in CDB requires many associative compares
- Common Data Bus
  - Achilles heal of Tomasulo
  - Multiple writebacks (multiple CDBs) expensive
- Load/Store reordering
  - Load address compared with store address in store buffer



| Tomasulo Review                      |     |     |     |     |     |     |     |     |     |     |     |     |     |     |     |     |     |     |     |     |
|--------------------------------------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
|                                      | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12  | 13  | 14  | 15  | 16  | 17  | 18  | 19  | 20  |
| LD F0, 0(R1)                         | Iss | M1  | M2  | M3  | M4  | M5  | M6  | M7  | M8  | Wb  |     |     |     |     |     |     |     |     |     |     |
| MUL F4, F0, F2                       |     | Iss | Ex  | Ex  | Ex  | Ex  | Wb  |     |     |     |     |     |
| SD 0(R1), F0                         |     |     | Iss | M1  | M2  | M3  | Wb  |     |
| SUBI R1, R1, 8                       |     |     |     | Iss | Ex  | Wb  |     |     |     |     |     |     |     |     |     |     |     |     |     |     |
| BNEZ R1, Loop                        |     |     |     |     | Iss | Ex  | Wb  |     |     |     |     |     |     |     |     |     |     |     |     |     |
| LD F0, 0(R1)                         |     |     |     |     |     | Iss | Iss | Iss | Iss | М   | Wb  |     |     |     |     |     |     |     |     |     |
| MUL F4, F0, F2                       |     |     |     |     |     |     | Iss | Iss | Iss | Iss | Iss | Ex  | Ex  | Ex  | Ex  | Wb  |     |     |     |     |
| SD 0(R1), F0                         |     |     |     |     |     |     |     | Iss | M1  | M2  | M3  | Wb  |
| SUBI R1, R1, 8                       |     |     |     |     |     |     |     |     | Iss | Ex  | Wb  |     |     |     |     |     |     |     |     |     |
| BNEZ R1, Loop                        |     |     |     |     |     |     |     |     |     | Iss | Ex  | Wb  |     |     |     |     |     |     |     |     |
| LD F0, 0(R1)                         |     |     |     |     |     |     |     |     |     |     | Iss | Ml  | M2  | M3  | M4  | M5  | M6  | M7  | M8  | Wb  |
| MUL F4, F0, F2                       |     |     |     |     |     |     |     |     |     |     |     |     |     |     |     | Iss | Iss | Iss | Iss | Iss |
| SD 0(R1), F0                         |     |     |     |     |     |     |     |     |     |     |     |     |     |     |     |     | Iss | Iss | Iss | Iss |
| Computer Science 146<br>David Brooks |     |     |     |     |     |     |     |     |     |     |     |     |     |     |     |     |     |     |     |     |















# Why predict? Speculative Execution

- Execute *beyond* branch boundaries before the branch is resolved
- Correct Speculation
  - Avoid stall, result is computed early, performance++
- Incorrect Speculation
  - Abort/squash incorrect instructions, complexity+
  - Undo any incorrect state changes, complexity++
- Performance gain is weighed vs. penalty
- Speculation accuracy = branch prediction accuracy

Computer Science 146 David Brooks

### Dynamic Hardware Branch Prediction

- Branch behavior is monitored during program execution
  - History data can influence prediction of future executions of the branch instruction
- Branches instruction execution has two tasks/predictions
  - Condition evaluation (taken or not-taken)
  - Target address calculation (where to go when taken)
- Target prediction also applies to unconditional branches
- Branch Direction Prediction: 3 levels of complexity
  - Branch history tables, Two-level tables, hybrid predictors





| Example: Two-bit Vs. 1-bit |
|----------------------------|
| <b>Branch Prediction</b>   |

| Branch Outcome     | Т | Т | Т | N | Т | Т | Т | N | Т | Т | Т | N | % predict rate |
|--------------------|---|---|---|---|---|---|---|---|---|---|---|---|----------------|
| 1-bit Prediction   | N | Т | Т | Т | N | Т | Т | Т | N | Т | Т | Т |                |
| 1-bit Mis-Predict? | Y |   |   | Y | Y |   |   | Y | Y |   |   | Y | ~50%           |
| 2-bit Prediction   | n | Т | Т | t | Т | Т | Т | t | Т | Т | Т | t |                |
| 2-bit Mis-Predict? | Y |   |   | Y |   |   |   | Y |   |   |   | Y | ~75%           |

• 2-bit "hysterisis" helps



# **Correlating Predictors**

- 2-bit scheme only looks at branch's *own* history to predict its behavior
- What if we use other branches to predict it as well?

```
if (aa==2)aa=0; // Branch #1
if (bb==2)bb=0; // Branch #2
if (aa!=bb) {..} // Branch #3
```

- Clearly branch #3 depends on outcome of #1 and #2
- Prediction must be a function of own branch as well as recent outcomes of other branches



























# Return Address Stack

- Included in many recent processors
   Alpha 21264 => 12 entry RAS
- Procedure returns account for ~85% of indirect jumps
- Like a hardware stack, LIFO
  - Procedure Call => Push Return PC onto stack
  - Procedure Return => Prediction off of top of stack, Pop it
- RAS tends to work quite well since call depths are typically not large
- Problem: Speculative state! More next time

