Branch Prediction (cseweb.ucsd.edu)
TRANSCRIPT
Branch Prediction
Eugene Kolinko Joon Lee
Introduction
Branch prediction: predict a branch's outcome and begin fetching, decoding, or issuing instructions along the predicted path.
Reason: to reduce the delay of conditional branch decisions.
Pipeline
Branch prediction accuracy is very important in deeply pipelined processors because a misprediction leads to a larger penalty (delay).
[Figure: five-stage pipelines (IF ID EXE MEM WB) without and with branch prediction; bubbles mark stall cycles]
Pros and Cons
Pro
• Reduces the delay when the prediction is correct
Con
• A wrong prediction causes additional delay (the misprediction penalty)
Branch History
Each branch's outcomes (Taken / Not Taken) are recorded as a history of bits, for example:
0 0 0 0 0 1 0 0 0 0 1 0 0 1 0 1 1 1 1 0 0 0 1 1 1 0 0 1
Saturating Counter
A 2-bit saturating counter has four states: Strongly Not Taken, Weakly Not Taken, Weakly Taken, Strongly Taken. Each taken outcome moves the counter one state toward Strongly Taken; each not-taken outcome moves it one state toward Strongly Not Taken, saturating at the ends. The two "taken" states predict taken; the two "not taken" states predict not taken.
[Figure: state diagram of the 2-bit saturating counter, driven by the branch history]
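The state machine above can be sketched as a minimal 2-bit saturating counter. The numeric encoding 0..3 is an assumption for illustration; the slides only name the four states.

```c
#include <stdbool.h>

/* 2-bit saturating counter states (encoding assumed):
   0 = Strongly Not Taken, 1 = Weakly Not Taken,
   2 = Weakly Taken,       3 = Strongly Taken.     */
typedef unsigned char counter2;

/* Predict taken when the counter is in one of the two "taken" states. */
bool predict(counter2 c) { return c >= 2; }

/* Move one step toward the actual outcome, saturating at 0 and 3. */
counter2 update(counter2 c, bool taken) {
    if (taken) return c < 3 ? c + 1 : 3;
    else       return c > 0 ? c - 1 : 0;
}
```

A single not-taken outcome moves Strongly Taken only to Weakly Taken, so the predictor still predicts taken; this is what tolerates the one-off loop exit.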
Two-Level Branch Predictor
A history register of recent branch outcomes indexes a pattern history table (PHT); each PHT entry is a saturating counter.

Pattern History Table
00  Strongly Not Taken
01  Weakly Taken
10  Strongly Taken
11  Strongly Not Taken
[Figure: two-level prediction example; the outcome sequence 0 0 1 0 0 1 1 0 1 1 0 1 0 0 drives the per-pattern counters through states such as Weakly Not Taken, Weakly Taken, and Strongly Taken]
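A minimal sketch of the two-level scheme described above, assuming a 4-bit history register and one 2-bit counter per history pattern (the sizes are illustrative, not from the slides):

```c
#include <stdbool.h>

#define HIST_BITS 4               /* illustrative history length */
#define PHT_SIZE  (1 << HIST_BITS)

/* Two-level predictor: the last HIST_BITS outcomes select a
   2-bit saturating counter in the pattern history table (PHT). */
typedef struct {
    unsigned history;             /* newest outcome in bit 0 */
    unsigned char pht[PHT_SIZE];  /* 2-bit counters, start at 0 */
} twolevel;

bool tl_predict(const twolevel *p) {
    return p->pht[p->history] >= 2;   /* taken iff counter in a taken state */
}

void tl_update(twolevel *p, bool taken) {
    unsigned char *c = &p->pht[p->history];
    if (taken) { if (*c < 3) (*c)++; }    /* saturate toward Strongly Taken */
    else       { if (*c > 0) (*c)--; }    /* saturate toward Strongly Not Taken */
    /* Shift the new outcome into the history register. */
    p->history = ((p->history << 1) | (taken ? 1u : 0u)) & (PHT_SIZE - 1);
}
```

Because each distinct history pattern trains its own counter, a repeating outcome pattern eventually predicts perfectly once every pattern's counter has converged.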
Basic Strategies

#   Type     Strategy                                    Accuracy (%)
1   Static   Predict all branches to be taken            76.68
1a  Static   Special-case branches predicted taken       86.70
2   Dynamic  Predict same as 1-bit history               90.40
3   Static   Predict backward branches as taken          80.28
4   Dynamic  Predict taken if branch is not in memory    90.40
5   Dynamic  Predict same as instruction 1-bit history   90.23
6   Dynamic  Address hashing, 1-bit history              90.42
7   Dynamic  Address hashing, 2-bit history              92.55
Static Advantage

while (true): statically predict taken.
Assuming 64 lines in the history table:

Type    Accuracy  Memory
Static  100%      0
1-bit   100%      8 bytes
2-bit   100%      16 bytes
1-bit Advantage

for (int i = 0; i < 10; i++) { if (i % 10 < 7) { } }

Outcomes: T T T T T T T N N N T T T T T T T

Type    Accuracy
Static  70%
1-bit   80%
2-bit   60%
2-bit Advantage

for (int i = 0; i < 10; i++) { if (i % 3 == 0) { } }

Outcomes: T N N T N N

Type    Accuracy
Static  33%
1-bit   33%
2-bit   66%
Local vs. Global

A local predictor keeps a separate branch history (and pattern history table entry) per branch address; a global predictor uses one shared history register for all branches.

[Figure: per-address branch histories vs. a single global history, each indexing a pattern history table of saturating counters]
Global Predictor: gshare

The 16-bit global branch history is XORed with the branch address to form the PHT index.
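The index computation can be sketched as follows. The 16-bit width follows the slide; shifting off the low two PC bits (word-aligned instructions) is an assumption for illustration.

```c
#define GSHARE_BITS 16   /* history/index width, per the slide */

/* gshare: fold the branch address and the global history together so
   that different branches sharing the same history pattern still map
   to different PHT counters. */
unsigned gshare_index(unsigned pc, unsigned global_history) {
    return ((pc >> 2) ^ global_history) & ((1u << GSHARE_BITS) - 1);
}
```

The XOR reduces destructive aliasing compared with indexing by history alone.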
Branch Correlation

The outcome of one branch often determines or biases a later branch:

if ( A ) … if ( A & B )
if ( A ) B = 2 … if ( B < 0 )
if ( ~A ) else if ( ~B ) else if ( C ) … if ( A & B )

In-path correlation
Selective History

Track the n most important branches in the history rather than the full global history.

[Chart: prediction accuracy: gshare 94.1%, 1 most important branch 94%, 2 most important branches 94.7%, 3 most important branches 95.1%]
Per-Address Predictability

Class           Behavior                                        Example
Loop            Taken n times, then not taken                   11111110 11111110
Repeating       Fixed-length pattern                            10110 10110
                Block pattern                                   11111 00 11111 00
Non-Repeating   No repeating pattern; predicted based on past
                outcomes (data-dependent branches)
PA Predictor

[Chart: percentage of branches best predicted (dynamic): Loop 14%, Repeating Pattern 7%, Non-Repeating Pattern 33%, Ideal Static 46%; 21% unexploited predictability]
Proposed Predictor: Per-Address Predictor w/ 3 types

[Chart: existing algorithms, weighted by execution frequency. Percentage best (dynamic): Ideal Static Best 55%, PA Best 16%, gshare Best 29%]
[Chart: proposed algorithms, weighted by execution frequency. Percentage best (dynamic): Ideal Static Best 40%, Per-Address Best 22%, Global Best 38%; annotated "38% Better" and "31% Better"]
[Chart: prediction accuracy (%), roughly 88 to 98, vs. predictor size from 32 bytes to 64 KB, for bimodal, gshare, and combined bimodal/gshare predictors]
Combined Predictor Performance
Selectively choosing between several predictors can improve overall prediction accuracy.
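The selection idea above can be sketched as a small tournament chooser: a 2-bit counter per entry picks between two component predictions (say, bimodal vs. gshare) and is nudged toward whichever component was right. Names and structure here are illustrative assumptions, not from the slides.

```c
#include <stdbool.h>

/* chooser >= 2 means "trust gshare", otherwise "trust bimodal". */
bool combined_predict(unsigned char chooser, bool bimodal_pred, bool gshare_pred) {
    return (chooser >= 2) ? gshare_pred : bimodal_pred;
}

/* Saturating update: move toward the component that was correct;
   leave the chooser alone when both were right or both were wrong. */
unsigned char chooser_update(unsigned char chooser, bool bimodal_pred,
                             bool gshare_pred, bool taken) {
    bool bimodal_ok = (bimodal_pred == taken);
    bool gshare_ok  = (gshare_pred  == taken);
    if (gshare_ok && !bimodal_ok && chooser < 3) chooser++;
    if (bimodal_ok && !gshare_ok && chooser > 0) chooser--;
    return chooser;
}
```

Each branch thus drifts toward whichever component predicts it better, which is why the combined curve in the chart above can beat both components.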
Neural Branch Prediction
• Hash the branch address to find the location in the table (index)
• Update the weight vector and bias
• Update the history register

Modification: Neural Branch Prediction

• Optimize speed with path-based prediction
  o Choose the weight vector according to the path leading up to a branch
    § Faster due to an early start
    § More accurate due to information about the path
  o Pipeline the calculation ahead of time
    § Compute the data ahead
    § Use the pipeline to compute the weights
• Optimize speed and power with an analog implementation
  o Use scaling to give more weight to the newest data
  o Adaptive threshold
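The basic (non-path-based) neural predictor described above can be sketched as a perceptron: the prediction is the sign of a dot product between a weight vector (plus bias) and the history register, with each history bit treated as +1 (taken) or -1 (not taken). The history length and training threshold below are illustrative assumptions, not values from the slides.

```c
#include <stdbool.h>

#define PHIST 8          /* history length (illustrative)        */
#define THRESHOLD 14     /* retrain while |output| is below this */

typedef struct {
    int bias;
    int weights[PHIST];
    int history[PHIST];  /* +1 = taken, -1 = not taken */
} perceptron;

/* Dot product of weights and history, plus bias. */
int perceptron_output(const perceptron *p) {
    int y = p->bias;
    for (int i = 0; i < PHIST; i++)
        y += p->weights[i] * p->history[i];
    return y;
}

/* Predict taken when the output is non-negative. */
bool perceptron_predict(const perceptron *p) {
    return perceptron_output(p) >= 0;
}

void perceptron_train(perceptron *p, bool taken) {
    int y = perceptron_output(p);
    int t = taken ? 1 : -1;
    /* Update weights and bias when wrong, or right but not yet confident. */
    if ((y >= 0) != taken || (y < THRESHOLD && y > -THRESHOLD)) {
        p->bias += t;
        for (int i = 0; i < PHIST; i++)
            p->weights[i] += t * p->history[i];
    }
    /* Shift the new outcome into the history register. */
    for (int i = PHIST - 1; i > 0; i--)
        p->history[i] = p->history[i - 1];
    p->history[0] = t;
}
```

The path-based and analog variants on this slide change where the weight vector comes from and how the dot product is computed, but the prediction rule stays the same sign test.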
Thank you