Evaluation of the Gini-index for Studying Branch Prediction Features
Veerle Desmet
Lieven Eeckhout
Koen De Bosschere
2
A simple prediction example
[Diagram: features (outlook, t°, windy, season, ...) together with past observations feed a prediction mechanism that outputs an umbrella prediction; goal = prediction accuracy of 100%]
3
A simple prediction example
• Daily prediction
• Binary prediction: yes or no
• Outcome in the evening
• Prediction strategies:
– No need in summer, yes otherwise: easy, not very accurate
– Based on humidity and temperature: more complex, very accurate
4
Predicting
• How to improve prediction accuracy?
• Shortcomings of existing models?
– Feature set
– Prediction mechanism
– Implementation limits
– ...
• This talk: evaluation of prediction features for branch prediction
5
Program execution
• Phases during instruction execution:
• Fetch = read next instruction
• Decode = analyze type and read operands
• Execute
• Write Back = write result

[Pipeline diagram: the addition R1=R2+R3 passes through Fetch, Decode (operands 4 and 3 are read), Execute (the computation), and Write Back (R1 contains 7)]
6
Pipelined architectures
Parallel versus sequential:
• Constant flow of instructions possible
• Faster applications
• Limitation due to branches

[Pipeline diagram: the instructions R1=R2+R3, R5=R2+1, R4=R3-1, R7=2*R1, R5=R6, R1=4 advance one stage per cycle through Fetch, Decode, Execute, Write Back]
7
Branches
[Pipeline diagram: the stream R1=R2+R3, R5=R2+1, R5=R6, test R1=0 flows through Fetch, Decode, Execute, Write Back; the successor of 'test R1=0' depends on its outcome (yes/no paths through R7=0, R7=2*R1, R2=R2-1), so unknown slots ('?') enter the pipeline until the test has executed]
• Branches determine program flow or execution path
• Introduce 2 bubbles affecting pipeline throughput
8
Solution
• 1 out of 8 instructions is a branch
• Waiting for the outcome of branches seriously affects the amount of parallelism
• Increasing number of pipeline stages
– Pentium 4: up to 20 stages

Predict the outcome of branches
9
Branch prediction
• Fetch those instructions that are likely to be executed
• Correct prediction eliminates bubbles
[Pipeline diagram: the same stream, but the predicted successors of 'test R1=0' (R7=2*R1, R2=R2-1) are fetched immediately, so the pipeline stays full and no bubbles appear]
10
Branch prediction
• Prediction for each branch execution
• Binary prediction: taken or not-taken
• Outcome known after the test is executed
• Prediction strategies:
– Many predictors in literature
– Static versus dynamic
11
Static branch prediction
• BTFNT: Backward Taken, Forward Not Taken
– Loops (e.g. for, while)
– "In summer, no need for an umbrella"
• Based on the type of test in the branch
– Branch-if-equal mostly not-taken
– "On Sunday, no need for an umbrella"
• Easy, prediction fixed at compile-time
• Prediction accuracy: about 75%
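To make the heuristic concrete, here is a minimal C sketch of a BTFNT predictor; the function name and the assumption that the target address is available at prediction time are illustrative, not from the talk:

    #include <stdint.h>

    /* BTFNT: a backward branch (target below the branch's own address,
       typically the back edge of a loop) is predicted taken;
       a forward branch is predicted not-taken. */
    int btfnt_predict(uint32_t branch_addr, uint32_t target_addr) {
        return target_addr < branch_addr;  /* 1 = taken, 0 = not-taken */
    }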
12
Dynamic branch prediction
• Bimodal
• Global
• Gshare
• Local

Simulations:
• SimpleScalar/Alpha
• SPEC2000 integer benchmarks
• 250M branches
13
Bimodal branch predictor
Averaging outcomes from previous years
[Diagram: the branch address indexes a table of saturating counters; e.g. counter value 2 → prediction 'taken'; after the actual outcome (e.g. taken) the counter is updated (2 → 3)]
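A minimal C sketch of such a bimodal predictor, assuming a table of 2-bit saturating counters indexed by the low bits of the branch address (table size and names are illustrative):

    #include <stdint.h>

    #define BIMODAL_SIZE 1024               /* illustrative: 1024 counters */

    static uint8_t counters[BIMODAL_SIZE];  /* values 0..3; >= 2 means taken */

    /* Predict: look up the counter for this branch, e.g. value 2 -> taken. */
    int bimodal_predict(uint32_t branch_addr) {
        return counters[branch_addr % BIMODAL_SIZE] >= 2;
    }

    /* Update: saturate the counter towards the actual outcome,
       e.g. 2 -> 3 after a taken branch. */
    void bimodal_update(uint32_t branch_addr, int taken) {
        uint8_t *c = &counters[branch_addr % BIMODAL_SIZE];
        if (taken && *c < 3) (*c)++;
        if (!taken && *c > 0) (*c)--;
    }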
14
Global branch predictor
Averaging the outcomes of the last days

[Diagram: the global history register (e.g. 0111, the outcomes of the last four branches) indexes a table of saturating counters; e.g. counter value 2 → prediction 'taken'; after the outcome (e.g. taken) the counter is updated and the outcome is shifted into the history (0111 → 1111)]
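The same counter table, now indexed by a global history register instead of the branch address; a minimal sketch under the same illustrative assumptions:

    #include <stdint.h>

    #define HISTORY_BITS 4                  /* illustrative, e.g. history 0111 */
    #define GLOBAL_SIZE  (1 << HISTORY_BITS)

    static uint8_t  gcounters[GLOBAL_SIZE]; /* 2-bit saturating counters */
    static uint32_t ghistory;               /* outcomes of the last 4 branches */

    int global_predict(void) {
        return gcounters[ghistory] >= 2;    /* e.g. counter 2 -> taken */
    }

    void global_update(int taken) {
        uint8_t *c = &gcounters[ghistory];
        if (taken && *c < 3) (*c)++;
        if (!taken && *c > 0) (*c)--;
        /* shift the outcome into the history: 0111 -> 1111 after taken */
        ghistory = ((ghistory << 1) | (taken ? 1 : 0)) & (GLOBAL_SIZE - 1);
    }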
15
Gshare branch predictor
[Diagram: the branch address XORed with the global history (e.g. 1010) indexes a saturating counter (e.g. value 2 → prediction 'taken'), which is updated with the outcome; used in the AMD K6]
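Gshare only changes how the counter table is indexed: branch address XOR global history, so branches that share a history pattern can still map to different counters. A sketch of the index computation (table size illustrative); prediction and update then work exactly as in the sketches above:

    #include <stdint.h>

    #define GSHARE_SIZE 4096  /* illustrative table size (power of two) */

    /* e.g. the address bits XOR history 1010 select the saturating counter. */
    uint32_t gshare_index(uint32_t branch_addr, uint32_t history) {
        return (branch_addr ^ history) & (GSHARE_SIZE - 1);
    }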
16
Local branch predictor
Record the day outcomes of previous years; average over same-day histories

[Diagram: the branch address selects that branch's own local history (e.g. 1111), which indexes a saturating counter (e.g. value 2) that delivers the prediction]
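A minimal sketch of the two-level local predictor: the branch address first selects that branch's own history, which then indexes the counter table (sizes and names illustrative):

    #include <stdint.h>

    #define LOCAL_ENTRIES 1024               /* illustrative: per-branch histories */
    #define LOCAL_BITS    4
    #define LOCAL_PHT     (1 << LOCAL_BITS)

    static uint32_t lhistory[LOCAL_ENTRIES]; /* e.g. 1111 for a hot loop branch */
    static uint8_t  lcounters[LOCAL_PHT];    /* 2-bit saturating counters */

    int local_predict(uint32_t branch_addr) {
        uint32_t h = lhistory[branch_addr % LOCAL_ENTRIES];
        return lcounters[h] >= 2;            /* e.g. counter 2 -> taken */
    }

    void local_update(uint32_t branch_addr, int taken) {
        uint32_t *h = &lhistory[branch_addr % LOCAL_ENTRIES];
        uint8_t  *c = &lcounters[*h];
        if (taken && *c < 3) (*c)++;
        if (!taken && *c > 0) (*c)--;
        *h = ((*h << 1) | (taken ? 1 : 0)) & (LOCAL_PHT - 1);
    }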
17
Accuracy versus storage
[Plot: prediction accuracy (%) from 75 to 100 versus predictor size in bytes (log scale, 1 to 100000) for the bimodal, global, gshare, and local predictors]
18
Branch prediction strategies
• All use a saturating-counter mechanism
• All use limited tables
– problem with so-called aliasing
• Different prediction features
• Accuracies up to 95%
• Further improvement?
• Predictive power of the features?
19
Feature selection
• Which features are relevant?
• Fewer features
– require less storage
– allow faster prediction

[Diagram: feature selection determines which features are fed to the prediction mechanism that produces the prediction]
20
Systematic feature evaluation
• Feature = input to predictor
• Power of features
– predictor size not fixed
– prediction strategy not fixed
• Decision trees:
– select a feature
– split the observations
– recursive algorithm
– easily understandable
21
Decision Tree Construction
Outlook t° windy umbrella?
sunny high no no
sunny low yes yes
overcast high no no
overcast low no no
overcast high yes yes
overcast low yes yes
rain low no yes
rain high yes yes
[Diagram: the table above shows the past observations (rows) over the features (columns); the resulting decision tree is the prediction mechanism — the root splits on outlook: rain → YES; sunny and overcast lead to a windy test: yes → YES, no → NO]
22
Gini-index
Metric for partition purity of a data set S:
Gini(S) = 1 – p0² – p1²
where pi is the relative frequency of class i in S
For binary prediction: minimum 0 maximum 0.5
The higher the Gini-index, the more difficult to predict
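Worked example on the eight umbrella observations above (5 'yes', 3 'no'): Gini(S) = 1 – (5/8)² – (3/8)² = 1 – 0.391 – 0.141 ≈ 0.47, close to the maximum of 0.5, so the unsplit set is hard to predict.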
23
Finding good split points
• If data set S is split into two subsets S0 and S1 with sizes N0 and N1 (N = N0 + N1):

Ginisplit(S) = (N0/N) · Gini(S0) + (N1/N) · Gini(S1)

• Feature with lowest Ginisplit is chosen
• Extensible to non-binary features
• Looking for features with a low Ginisplit-index, i.e. features with good predictive power
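For example, splitting the umbrella observations on 'windy' gives windy=yes → 4 observations, all 'yes' (Gini = 0) and windy=no → 3 'no', 1 'yes' (Gini = 1 – (3/4)² – (1/4)² = 0.375), so Ginisplit = (4/8)·0.375 + (4/8)·0 = 0.1875, far below the unsplit Gini of about 0.47. A minimal C sketch of this computation, with the counts from that split hard-coded (names and structure are illustrative; a decision tree would evaluate every candidate feature this way and recurse on the winner):

    #include <stdio.h>

    /* Gini-index of a subset with n0 'no' and n1 'yes' observations. */
    static double gini(int n0, int n1) {
        double n = n0 + n1;
        if (n == 0.0) return 0.0;
        double p0 = n0 / n, p1 = n1 / n;
        return 1.0 - p0 * p0 - p1 * p1;
    }

    /* Weighted Gini of a binary split into S0 = (a0,a1) and S1 = (b0,b1). */
    static double gini_split(int a0, int a1, int b0, int b1) {
        double n = a0 + a1 + b0 + b1;
        return (a0 + a1) / n * gini(a0, a1) + (b0 + b1) / n * gini(b0, b1);
    }

    int main(void) {
        /* Splitting the umbrella data on 'windy':
           windy=no -> 3 'no', 1 'yes'; windy=yes -> 0 'no', 4 'yes'. */
        printf("Ginisplit(windy) = %.4f\n", gini_split(3, 1, 0, 4)); /* 0.1875 */
        return 0;
    }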
24
Individual feature bits
[Bar chart: Ginisplit-index (0 to 0.5) for individual feature bits. Dynamic features: global history, local history, branch address, gshare-index. Static features: target direction, branch type, ending type, successor basic block]
25
Individual features
• Local history bits very good
– perfect local history uses the branch address
• Static features powerful
– non-binary
– except target direction
– known at compile-time
• Looking for good feature combinations...
26
Features as used in predictors
[Plot: Ginisplit-index (0 to 0.5) versus feature length in bits (0 to 20) for global history, branch address, gshare-index, and local history]
27
Features as used in predictors
• Static features better at small lengths
• Longer features perform better
• A few local history bits are enough
• Same behaviour as the accuracy curves
– a low Gini-index implies high accuracy
• Independent of predictor size
• Independent of prediction strategy
28
Remark
• Limitation of decision trees: outliers
– majority vote
– clean data
• Keep implementation in mind

Outlook t° windy umbrella?
sunny high no no
sunny high no yes
sunny high no no
29
Conclusion
• Need for accurate branch prediction in modern microprocessors
• Towards systematic predictor development
– selecting features
– predictive power of features
• Gini-index useful for studying branch prediction features
– without fixing any predictor aspect
Thanks for Listening