Post on 20-Dec-2015
EECE476: Computer Architecture
Lecture 20: Branch Prediction
Chapter 6.6 + extra
The University of British Columbia EECE 476 © 2005 Guy Lemieux
Control Hazards Summary
• We reduced branch/jump penalty to 1 cycle
• Still have 2 remaining problems
• Utilization problem
  – We may fetch the wrong instruction(s) after branch/jump
    • Option 1: stall after every branch/jump
    • Option 2: nullify-if-branch-taken (small performance improvement)
    • Option 3: declare as a “delay slot”, always-execute (avoid)
    • Option 4: new strategies?
• Forwarding problem
  – We may depend on result of instruction(s) just before branch
    • Option 1: stall when dependence detected (HDU)
    • Option 2: forward when dependence detected (FDU)
New Strategy: Nullify-if-Not-Taken
• Previously: Nullify-if-taken
  – Instruction after branch (PC+4) “sneaks” into pipeline
  – Nullify if branch is taken (T)
• Observation
  – Branch has 2 outcomes: taken, not-taken (T or NT)
• What about nullify-if-NT?
  – This is another valid strategy
  – We “sneak” instruction from PC+4+OFFSET
• Differences?
  – Can we predict which outcome is more likely? T or NT?
  – If so, we can “sneak” the right instruction into the pipeline
  – Reduces frequency of nullify operations
Nullify-if-T vs. Nullify-if-NT
• These are 2 forms of static branch prediction:
  – Nullify-if-T: always predict NT is likely
  – Nullify-if-NT: always predict T is likely
• Main Idea
  – Predict target where branch is going (T or NT)
  – Put useful (target) instructions in pipeline after branch
  – Nullify only if we predict wrong
• Performance impact?
  – More accurate predictions → better performance
Implementing Static Branch Prediction
• Simplest static branch prediction
  – Predict backward branches T, forward branches NT
    • Requires 0 instruction bits
• More sophisticated static branch prediction
  – Define two instruction types: BEQ-likely, BEQ-unlikely
  – For each individual branch, compiler decides if branch is likely or unlikely
    • Requires 1 instruction bit in ISA to encode “likely-vs-unlikely” into each branch
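The two static schemes above can be sketched in a few lines of Python. This is only an illustration: the function names and the idea of passing the signed branch offset or a single hint bit as arguments are assumptions, not part of any real ISA definition.

```python
def predict_taken_static(branch_offset: int) -> bool:
    """Backward-taken / forward-not-taken: needs no extra instruction bits.

    branch_offset is the signed distance from the branch to its target
    (hypothetical parameter; negative means the target is behind the branch).
    """
    return branch_offset < 0


def predict_taken_hinted(likely_bit: int) -> bool:
    """Compiler-hinted scheme: one bit in the instruction says likely (1) or unlikely (0)."""
    return likely_bit == 1


# A loop-closing branch jumps backward, so it is predicted T.
print(predict_taken_static(-8))   # True
# A forward branch (for example, skipping an if-body) is predicted NT.
print(predict_taken_static(12))   # False
# The hinted form simply trusts the compiler's bit.
print(predict_taken_hinted(1))    # True
```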
Reducing Branch Pipeline Penalty
Static Methods Summary
1. Always stall
  – Works well, wastes CPU cycles.
2. Always execute (delayed branch)
  – Requires useful instruction to be scheduled by compiler
3. Nullify-if-taken (always predicts branch is NT)
  – Fetch from PC+4, PC+8, etc
  – Half of branch-forward instructions are NT
  – Some performance benefit
4. Nullify-if-not-taken (always predicts branch is T)
  – Fetch from PC+4+OFFSET, PC+8+OFFSET, etc
  – Almost all branch-backward instructions are T
  – Big performance benefit
Reducing Branch Pipeline Penalty
Dynamic Method
5. Nullify-if-mispredicted
  • Dynamically predict T or NT
  • To do this…
    – Need branch prediction
    – Predict direction based upon recent history
    – Must fetch from predicted direction (target address)
  • Note: no correctness problems arise if we mispredict (only performance)
  • Performance impact?
    – Depends on “prediction accuracy”
    – Want >= 80% to be useful
Somehow, must implement in ISA
  – ISA may adopt one or more of above policies for branch instructions
  – ISA may also adopt multiple policies (eg, multiple versions of same branch instruction)
Dynamic Branch Prediction
• Dynamic: predicted branch direction depends upon recent history
  – No history? Must guess
  – Execute same branch many times → history
• Need state information to retain history
Overview of Dynamic Branch Prediction Schemes
• Many Types of Dynamic Branch Predictors
  – Basic
    • 1-bit predictor
    • 2-bit predictor (very good)
  – Generalization
    • N-bit saturating counter (not very good)
  – Hybrid/advanced (excellent)
    • Correlating predictors
    • Multilevel predictors
  – Perfect (prescient) predictor
    • Non-causal, only works in simulation
    • Used to measure effectiveness of other prediction schemes
Dynamic 1-bit Branch Prediction
Basic scheme
• 1-bit predictor
  – Remembers most recent execution of branch
    • Was it taken or not taken?
  – Assume same outcome next time
  – Where to store 1 bit?
    • In the instruction encoding?
    • 1 global bit (DFF) in the CPU?
    • Visit this again later…
Dynamic 1-bit Branch Prediction
1-bit Predictor Example
  A = 0                          * initialize registers
Loop:
  A = A + 1                      Loop: ADD $1,$1,$2
  If A != 10 goto Loop                 BNE $1,$3, Loop

• Prediction Accuracy?

                       Prediction   Outcome   Prediction Correct?
  Middle iterations    T            T         8 correct
  Last iteration       T            NT        1 wrong
  First iteration      NT           T         1 wrong

• Last iteration NT, so next time, first iteration assumes NT
• Result: 80% accuracy (20% mispredictions)
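The 80% figure can be checked with a short simulation. This is a minimal sketch: the function name `one_bit_accuracy` is made up for illustration, and the predictor is assumed to start at NT, as if an earlier pass through the same loop had just ended.

```python
def one_bit_accuracy(outcomes, initial_prediction=False):
    """Predict each branch with the previous outcome (True = taken).

    Returns the fraction of correct predictions.
    """
    prediction = initial_prediction
    correct = 0
    for taken in outcomes:
        if prediction == taken:
            correct += 1
        prediction = taken  # 1-bit state: remember only the most recent outcome
    return correct / len(outcomes)


# One pass through the loop: BNE is taken 9 times (A = 1..9), then falls through at A = 10.
loop_pass = [True] * 9 + [False]

# Start at NT, as if a previous pass through the loop had just finished.
print(one_bit_accuracy(loop_pass, initial_prediction=False))  # 0.8
```

The two mispredictions are the first iteration (predict NT, outcome T) and the last iteration (predict T, outcome NT), matching the table above.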
Dynamic 2-bit Branch Prediction
Two basic schemes
• Simple: 2-bit “saturating counter” predictor
  – Remember two most recent outcomes?
    • History (prev,curr)
      – (T,T) Predicts T
      – (NT,NT) Predicts NT
      – (T,NT) Predicts ?
      – (NT,T) Predicts ?
  – Although a possibility, this scheme is not usually used
• Better: 2-bit “sequence” predictor
  – Mispredict twice before changing prediction
Dynamic 2-bit Sequence Prediction
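• Saturating
  – Repeating T stays in ‘11’ state
  – Repeating NT stays in ‘00’ state
• Two-in-a-row to change prediction
  – (T,NT) won’t change prediction
  – (NT,T) won’t change prediction
[Figure: state diagram with four states, 11 (predict T), 10 (predict T), 01 (predict NT), 00 (predict NT), connected by transitions labeled T and NT]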
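A state machine with these properties can be sketched as a small lookup table. The exact arrows of the slide's diagram are not recoverable from the text, so the `TRANSITIONS` table below is an assumption: it is one common textbook arrangement that satisfies the bullets above (saturate in 11 and 00, two mispredictions in a row to flip the prediction).

```python
# Assumed transition table: state -> (next state on T, next state on NT).
# The high bit of the state is the prediction (1 = predict T, 0 = predict NT).
TRANSITIONS = {
    0b11: (0b11, 0b10),  # strongly T: stays on T, one NT only weakens it
    0b10: (0b11, 0b00),  # weakly T: a second NT in a row flips the prediction
    0b01: (0b11, 0b00),  # weakly NT: a second T in a row flips the prediction
    0b00: (0b01, 0b00),  # strongly NT: stays on NT, one T only weakens it
}


def predicts_taken(state):
    return state >= 0b10


def step(state, taken):
    on_taken, on_not_taken = TRANSITIONS[state]
    return on_taken if taken else on_not_taken


def predictions(state, outcomes):
    """Return the list of predictions made while consuming the outcomes."""
    made = []
    for taken in outcomes:
        made.append(predicts_taken(state))
        state = step(state, taken)
    return made, state


# An isolated NT inside a run of T never changes the prediction.
made, final = predictions(0b11, [True, False, True, False, True])
print(made)                   # [True, True, True, True, True]

# Two NT in a row do change it.
_, final = predictions(0b11, [False, False])
print(predicts_taken(final))  # False
```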
Dynamic 2-bit Prediction Example
• 2-bit Predictor Example
  A = 0                          * initialize registers
Loop:
  A = A + 1                      Loop: ADD $1,$1,$2
  If A != 10 goto Loop                 BNE $1,$3, Loop

• Prediction Accuracy?

                       Prediction   Outcome   Prediction Correct?
  Middle iterations    T            T         8 correct
  Last iteration       T            NT        1 wrong, but next prediction still T
  First iteration      T            T         1 correct

• Last iteration is 1st mispredict, so next time, 1st iteration still predicts T
• Result: 90% accuracy (10% mispredictions)
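The 90% result can be reproduced with a quick simulation. The sketch below assumes a 2-bit up/down counter (predict T when the counter is 2 or 3) that starts fully saturated at T; the function name is hypothetical. For this particular loop pattern, a predictor that needs two mispredictions in a row to flip behaves the same way.

```python
def two_bit_accuracy(outcomes, counter=3):
    """Simulate one 2-bit counter (0..3); predict taken when counter >= 2."""
    correct = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken == taken:
            correct += 1
        # Move toward 3 on taken, toward 0 on not taken, saturating at both ends.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes)


# One pass through the loop: 9 taken branches, then 1 not taken.
loop_pass = [True] * 9 + [False]

# Run the loop twice so the second pass shows steady-state behaviour.
print(two_bit_accuracy(loop_pass * 2))  # 0.9
```

Only the loop exit is mispredicted on each pass; the single NT moves the counter from 3 to 2, which still predicts T when the loop is entered again.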
Dynamic 2-bit Prediction Results
• Effectiveness?
• Mispredictions in SPEC89 with 4096-entry branch prediction table:
  – Nasa7: 1%
  – Matrix300: 0%
  – Tomcatv: 1%
  – Doduc: 5%
  – Spice: 9%
  – FPPPP: 9%
  – Gcc: 12%
  – Espresso: 5%
  – Eqntott: 18%
  – Li: 10%
• About 90% effective!
Dynamic 2-bit Prediction Results
• Mispredictions in SPEC89 with N-entry branch prediction table:
                   N=4096   N=Infinity
  – Nasa7:         1%       0%
  – Matrix300:     0%       0%
  – Tomcatv:       1%       0%
  – Doduc:         5%       5%
  – Spice:         9%       9%
  – FPPPP:         9%       9%
  – Gcc:           12%      11%
  – Espresso:      5%       5%
  – Eqntott:       18%      18%
  – Li:            10%      10%
• Still about 90% effective!
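As a rough check on the "about 90%" claim, the listed misprediction rates can be averaged. The snippet below uses a plain unweighted mean over the ten benchmarks, which is an assumption; the slides do not say how the summary figure was obtained.

```python
# Misprediction rates in percent: (N=4096 entries, N=infinity), copied from the slide.
mispredict_pct = {
    "Nasa7": (1, 0),
    "Matrix300": (0, 0),
    "Tomcatv": (1, 0),
    "Doduc": (5, 5),
    "Spice": (9, 9),
    "FPPPP": (9, 9),
    "Gcc": (12, 11),
    "Espresso": (5, 5),
    "Eqntott": (18, 18),
    "Li": (10, 10),
}

count = len(mispredict_pct)
avg_4096 = sum(finite for finite, _ in mispredict_pct.values()) / count
avg_infinite = sum(infinite for _, infinite in mispredict_pct.values()) / count

print(avg_4096)      # 7.0
print(avg_infinite)  # 6.7
```

Going from 4096 entries to an unlimited table lowers the unweighted average by only 0.3 percentage points, which is why the table size is not the limiting factor here.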
Dynamic N-bit Prediction Scheme
• We can try to generalize the 2-bit approach
• N-bit “saturating counter” predictor
  – Increment on taken branch
  – Decrement on untaken branch
  – Predictions
    • Counter value >= (2^N)/2, predict T
    • Counter value < (2^N)/2, predict NT
• N-bit “sequence” predictor
  – X-mispredicts-in-a-row to change
  – How big is X (relative to N)? Possible?
• Effectiveness?
  – Not very… 2-bit predictors good enough!
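The N-bit saturating counter rules translate directly into code. The class and helper names below are illustrative assumptions; the helper counts how many NT outcomes in a row it takes to flip a counter that is fully saturated at T, which gives a feel for how sluggish larger counters become.

```python
class SaturatingCounter:
    """N-bit up/down counter: predict T when value >= (2^N)/2."""

    def __init__(self, bits, value=0):
        self.max_value = (1 << bits) - 1
        self.threshold = 1 << (bits - 1)  # (2^N)/2
        self.value = value

    def predict_taken(self):
        return self.value >= self.threshold

    def update(self, taken):
        if taken:
            self.value = min(self.value + 1, self.max_value)  # saturate at the top
        else:
            self.value = max(self.value - 1, 0)               # saturate at zero


def flips_needed(bits):
    """Consecutive NT outcomes needed to flip a counter saturated at T."""
    counter = SaturatingCounter(bits, value=(1 << bits) - 1)
    count = 0
    while counter.predict_taken():
        counter.update(False)
        count += 1
    return count


for bits in (1, 2, 3):
    print(bits, flips_needed(bits))
# 1 1
# 2 2
# 3 4
```

From a saturated state the count grows as 2^(N-1), so a wide counter adapts slowly when a branch genuinely changes behaviour.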
N-bit “Saturating Counter” Predictor
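[Figure: state diagram of a 3-bit saturating counter with eight states, 000, 001, 010, 011 (predict NT) and 100, 101, 110, 111 (predict T), connected by transitions labeled T and NT]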
Storing Branch History
• Where? In instruction memory?
  – Must write 1 or 2 bits into instruction, not good!
• Use special branch prediction table memory
  – Eg, 4096 entries of 2 bits each
    • Not enough for one entry per branch instruction in your program
      – Or is it?
  – Which entry goes with which branch?
    • Use lower bits of program counter (hash function)
    • Some branches will use the *same* table entry
    • Is this incorrect? No!
      – Some branches will be predicted with less accuracy… ie, slower program execution
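A table indexed by the low bits of the PC might look like the sketch below. Several details are assumptions made for illustration: instructions are taken to be 4 bytes and word-aligned (so the two lowest PC bits are dropped), each entry is a 2-bit up/down counter, and every counter starts at 1 (weakly NT). The example addresses are made up.

```python
TABLE_ENTRIES = 4096


class BranchPredictionTable:
    """4096 two-bit counters, indexed by low-order PC bits (no tags)."""

    def __init__(self, entries=TABLE_ENTRIES):
        self.entries = entries
        self.counters = [1] * entries  # assumed initial state: weakly NT

    def index(self, pc):
        # Drop the 2 byte-offset bits, then keep the low bits as the table index.
        return (pc >> 2) % self.entries

    def predict_taken(self, pc):
        return self.counters[self.index(pc)] >= 2

    def update(self, pc, taken):
        i = self.index(pc)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, 3)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)


table = BranchPredictionTable()
pc_a = 0x00400100                      # hypothetical branch address
pc_b = pc_a + 4 * TABLE_ENTRIES        # a different branch that maps to the same entry

# Train only branch A as taken.
table.update(pc_a, True)
table.update(pc_a, True)

print(table.index(pc_a) == table.index(pc_b))  # True: the two branches alias
print(table.predict_taken(pc_b))               # True: B inherits A's history
```

Because the table only produces a guess, aliasing like this can cost cycles when the guess is wrong but can never produce an incorrect program result.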
Advanced Branch Prediction 1
• Correlating Predictors
  – Create 8 branch prediction tables
    • Each table may contain ~1024 entries, 2-bits of history each entry
    • Each table is “local history”
  – 3 global bits in CPU form “global history”
    • Simple, small shift register
    • Stores outcome of 3 most recently executed branches (of all branches)
  – Key idea
    • “global history” determines which branch prediction table to use
    • “local history” works like “2-bit predictor”
  – Called a (3,2) branch-prediction buffer
    • Regular 2-bit predictor is a (0,2) predictor
  – Works better than (0,2) predictor
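A (3,2) predictor can be sketched as eight counter tables selected by a 3-bit shift register. The code below is an illustration under assumptions: each entry is a 2-bit up/down counter starting at 1, tables are indexed by word-aligned PC bits, and the class and helper names are invented. Setting `history_bits=0` gives the plain (0,2) predictor for comparison.

```python
class CorrelatingPredictor:
    """(m, 2) sketch: 2**m tables of 2-bit counters, chosen by global history."""

    def __init__(self, history_bits=3, entries=1024):
        self.history_bits = history_bits
        self.entries = entries
        self.history = 0  # shift register of the most recent branch outcomes
        self.tables = [[1] * entries for _ in range(1 << history_bits)]

    def _slot(self, pc):
        # Global history picks the table; low PC bits pick the entry.
        return self.tables[self.history], (pc >> 2) % self.entries

    def predict_taken(self, pc):
        table, i = self._slot(pc)
        return table[i] >= 2

    def update(self, pc, taken):
        table, i = self._slot(pc)
        table[i] = min(table[i] + 1, 3) if taken else max(table[i] - 1, 0)
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def accuracy(predictor, pc, outcomes):
    correct = 0
    for taken in outcomes:
        correct += predictor.predict_taken(pc) == taken
        predictor.update(pc, taken)
    return correct / len(outcomes)


# A branch that strictly alternates T, NT, T, NT, ...
alternating = [True, False] * 50
pc = 0x00400100  # hypothetical branch address

plain_acc = accuracy(CorrelatingPredictor(history_bits=0), pc, alternating)
corr_acc = accuracy(CorrelatingPredictor(history_bits=3), pc, alternating)
print(plain_acc, corr_acc)
```

On this pattern the (0,2) counter keeps bouncing between its two middle states and guesses poorly, while the (3,2) version learns a separate counter for each recent-history pattern and settles into correct predictions after a short warm-up.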
Advanced Branch Prediction 2
• Multilevel Branch Prediction
  – Eg, Tournament Predictors:
    • Use 2 different branch predictors per entry
    • Choose the best between them
  – How to decide which is best?
    • Use a third 2-bit predictor
      – Like any 2-bit predictor, eg “sequence”
      – This one says “use predictor 1” or “use predictor 2”
    • Change if current predictor is wrong (but other one was right) twice in a row
  – Works better than Correlating Predictor
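The chooser idea can be sketched for a single branch as follows. Everything here is a simplified illustration: the chooser is assumed to be a 2-bit up/down counter (values 2 and 3 mean "use predictor 2"), and the two component predictors, a 2-bit counter and a hypothetical "opposite of last outcome" predictor, are stand-ins chosen only to make the selection visible.

```python
class TwoBitPredictor:
    """Component 1: ordinary 2-bit up/down counter."""

    def __init__(self):
        self.counter = 1

    def predict_taken(self):
        return self.counter >= 2

    def update(self, taken):
        self.counter = min(self.counter + 1, 3) if taken else max(self.counter - 1, 0)


class TogglePredictor:
    """Component 2 (hypothetical): predicts the opposite of the last outcome."""

    def __init__(self):
        self.last = False

    def predict_taken(self):
        return not self.last

    def update(self, taken):
        self.last = taken


class TournamentPredictor:
    def __init__(self, first, second):
        self.first, self.second = first, second
        self.chooser = 1  # 0-1: use first, 2-3: use second

    def predict_taken(self):
        chosen = self.second if self.chooser >= 2 else self.first
        return chosen.predict_taken()

    def update(self, taken):
        first_right = self.first.predict_taken() == taken
        second_right = self.second.predict_taken() == taken
        # Move the chooser only when exactly one component was right.
        if second_right and not first_right:
            self.chooser = min(self.chooser + 1, 3)
        elif first_right and not second_right:
            self.chooser = max(self.chooser - 1, 0)
        self.first.update(taken)
        self.second.update(taken)


tournament = TournamentPredictor(TwoBitPredictor(), TogglePredictor())
outcomes = [True, False] * 50  # alternating branch
correct = 0
for taken in outcomes:
    correct += tournament.predict_taken() == taken
    tournament.update(taken)

print(correct / len(outcomes))
print(tournament.chooser >= 2)  # True: the chooser settled on the second predictor
```

On the alternating pattern the chooser quickly shifts to the component that is actually right, which is the whole point of a tournament scheme: neither component has to be good on every branch.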
Predictors Summary
• Static
  – Stall
  – Always execute (delay slots)
  – Nullify-if-T (Execute-if-NT)
  – Nullify-if-NT (Execute-if-T)
• Dynamic
  – Nullify-if-mispredicted
    • 2-bit, N-bit “saturating counter” predictor
    • 2-bit “sequence” predictor (N-bit possible?)
    • Correlating predictor
      – Concept of global / local history
    • Multilevel predictor
      – Eg, Tournament predictor