Trace Substitution
Hans Vandierendonck, Hans Logie, Koen De Bosschere
Ghent University
Euro-Par 2003, Klagenfurt
August 27, 2003 Euro-Par 2003 2
Instruction Fetch
• Wide-issue superscalar processors need to fetch past multiple branches per cycle
  – IPC=8 implies fetching ~16 instructions/cycle and predicting ~3 branches/cycle
  – Multi-ported instruction cache?
• Trace cache:
  – Packs fetch groups into traces
  – Trace tagged with start PC, path and next fetch PC
  – A multiple branch predictor (MBP) predicts the branch directions
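To make "predicting ~3 branches/cycle" concrete, here is a minimal sketch of a multiple branch predictor in the style of MGAg (the predictor used in the evaluation): one global history register indexes a table of 2-bit saturating counters, and each later branch in the same cycle is predicted with the history speculatively extended by the earlier predictions. The class and method names are hypothetical, and the sequential loop is a simplification; hardware reads the counters in parallel.

```python
class MultipleBranchPredictor:
    """Hypothetical MGAg-style predictor: global history -> 2-bit counters."""

    def __init__(self, history_bits: int = 12):
        self.mask = (1 << history_bits) - 1
        self.history = 0                            # global branch history register
        self.counters = [1] * (1 << history_bits)   # 2-bit counters, weakly not-taken

    def predict(self, num_branches: int = 3) -> list[bool]:
        """Predict up to num_branches directions for one fetch cycle."""
        preds = []
        hist = self.history
        for _ in range(num_branches):
            taken = self.counters[hist & self.mask] >= 2
            preds.append(taken)
            # speculatively extend the history with this prediction
            hist = ((hist << 1) | int(taken)) & self.mask
        return preds

    def update(self, outcomes: list[bool]) -> None:
        """Train the counters with actual outcomes and shift them into the history."""
        for taken in outcomes:
            idx = self.history & self.mask
            c = self.counters[idx]
            self.counters[idx] = min(c + 1, 3) if taken else max(c - 1, 0)
            self.history = ((self.history << 1) | int(taken)) & self.mask
```

After training on a repeating taken/taken/not-taken pattern, `predict(3)` returns `[True, True, False]`. With `history_bits=0` the table has a single counter, which corresponds to the "no history" point in the evaluation.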
The Trace Cache
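[Figure: the trace cache fetch unit. Components: instruction cache, trace cache, multiple branch predictor (MBP), MUX and fill unit. The MBP supplies the predicted path; the trace cache hit signal drives the select logic, which chooses between the predicted trace and the predicted instructions from the instruction cache. Annotation on the fill unit: only executed paths!]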
Overview
• Observation
  – Trace cache misses are (sometimes) branch mispredictions
• Trace Substitution
  – How to make use of it
• Evaluation
  – Is it worth it?
• Conclusion
Observation
• The multiple branch predictor affects the trace cache:
  – Non-perfect branch predictors reduce the trace cache hit rate
  – FIPA (fetched instructions per fetch unit access) correlates better with the TC hit rate than with MBP accuracy
[Figure: FIPA (0-16) and MBP and TC hit rates (70%-100%) for the MGAg, Mgshare, repeat and perfect predictors on gcc, vortex and the average (avg). Setup: TC: 16K traces, 4-way set-assoc., path associativity; MGAg, Mgshare: 12-bit history; repeat: 8Kbit hybrid, accessed 3x.]
TC Misses Are a Tell-Tale for MBP Misses
• Trace cache misses coincide with branch mispredictions, e.g.:
  – 16K-entry trace cache, 12-bit MGAg (high accuracy, low coverage):
    • 84.9% of TC misses are also MBP misses
    • 37.6% of MBP misses are also TC misses
  – 256-entry trace cache, 12-bit MGAg (low accuracy, higher coverage):
    • 25.1% of TC misses are also MBP misses
    • 55.9% of MBP misses are also TC misses
• This work: use TC misses to detect MBP misses and fix them
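Each pair of percentages is a pair of conditional frequencies over fetch unit accesses: "x% of TC misses are also MBP misses" is the accuracy of a TC miss as a misprediction flag, and "y% of MBP misses are also TC misses" is its coverage. A minimal sketch that computes both from a per-access log of `(tc_miss, mbp_miss)` flags; the function name and log format are assumptions, and the example log is made up, not data from this study.

```python
def miss_overlap(events: list[tuple[bool, bool]]) -> tuple[float, float]:
    """Return (accuracy, coverage) of TC misses as a detector of MBP misses.

    events: one (tc_miss, mbp_miss) pair per fetch unit access.
    accuracy = P(MBP miss | TC miss), coverage = P(TC miss | MBP miss).
    """
    tc_misses = sum(1 for tc, _ in events if tc)
    mbp_misses = sum(1 for _, mbp in events if mbp)
    both = sum(1 for tc, mbp in events if tc and mbp)
    accuracy = both / tc_misses if tc_misses else 0.0
    coverage = both / mbp_misses if mbp_misses else 0.0
    return accuracy, coverage
```

For a toy log with 3 joint misses, 1 TC-only miss, 5 MBP-only misses and 91 clean accesses, `miss_overlap` returns `(0.75, 0.375)`: most TC misses flag a real misprediction, but most mispredictions go unflagged.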
Trace Substitution
• Assumption: a TC miss implies an MBP miss
  – Correlation between branches implies that some paths never occur
  – The TC stores only those paths that do occur
• If the predicted path is wrong …
  – Fetch a different trace
  – Override the MBP with the MRU trace starting at the fetch PC
    • Detect the MRU trace from the LRU bits stored in the TC
    • No trace substitution is applied if no such trace exists
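A minimal sketch of one trace cache set with trace substitution. It assumes each way holds a trace tagged with its start PC and path (the directions of its conditional branches), and it models the LRU bits as list order (index 0 = MRU). On a regular hit the predicted trace is returned; on a miss the MRU trace starting at the fetch PC is substituted, overriding the MBP; if no trace starts there, nothing is substituted. `Trace`, `TraceCacheSet`, `fill` and `lookup` are hypothetical names, and updating the LRU order on a substitution is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trace:
    start_pc: int
    path: tuple[bool, ...]   # directions of the conditional branches in the trace
    next_pc: int             # fetch address following the trace


class TraceCacheSet:
    """One set of a path-associative trace cache (hypothetical model)."""

    def __init__(self, ways: int = 4):
        self.ways = ways
        self.traces: list[Trace] = []   # index 0 = MRU, last = LRU

    def _touch(self, trace: Trace) -> None:
        self.traces.remove(trace)
        self.traces.insert(0, trace)

    def fill(self, trace: Trace) -> None:
        """Fill unit inserts an executed trace, evicting the LRU way if full."""
        for t in self.traces:
            if t.start_pc == trace.start_pc and t.path == trace.path:
                self._touch(t)
                return
        if len(self.traces) == self.ways:
            self.traces.pop()
        self.traces.insert(0, trace)

    def lookup(self, fetch_pc: int,
               predicted_path: tuple[bool, ...]) -> tuple[Optional[Trace], bool]:
        """Return (trace, substituted) for a fetch PC and MBP-predicted path."""
        for t in self.traces:
            if t.start_pc == fetch_pc and t.path == predicted_path[:len(t.path)]:
                self._touch(t)
                return t, False          # regular TC hit on the predicted path
        for t in self.traces:            # scanned in MRU-first order
            if t.start_pc == fetch_pc:
                self._touch(t)           # assumed: substitution updates LRU order
                return t, True           # TC miss: substitute MRU trace
        return None, False               # no trace at fetch_pc: no substitution
```

In use, after filling two traces that start at `0x100`, a lookup with a predicted path that was never executed returns the MRU trace at `0x100` with `substituted=True`, while a lookup at an address with no stored trace returns `(None, False)`.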
Implementation
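[Figure: the fetch unit with trace substitution. Components: instruction cache, trace cache, multiple branch predictor (MBP), MUX and fill unit. Besides the predicted trace and its hit signal, the trace cache also delivers the MRU trace and an MRU hit signal to the select logic, which chooses between the predicted trace, the MRU trace and the predicted instructions from the instruction cache.]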
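The select logic can be sketched as a priority choice among the three fetch sources. This assumes the MUX prefers the predicted trace on a TC hit, then the MRU trace on an MRU hit, and otherwise falls back to the instruction cache; `FetchSource` and `select_fetch_source` are hypothetical names.

```python
from enum import Enum


class FetchSource(Enum):
    PREDICTED_TRACE = "pred. trace"    # TC hit on the MBP-predicted path
    MRU_TRACE = "MRU trace"            # trace substitution
    INSTRUCTION_CACHE = "pred. insn"   # fall back to the instruction cache


def select_fetch_source(hit: bool, mru_hit: bool) -> FetchSource:
    """Drive the fetch MUX from the trace cache's hit and MRU hit signals."""
    if hit:
        return FetchSource.PREDICTED_TRACE
    if mru_hit:
        return FetchSource.MRU_TRACE
    return FetchSource.INSTRUCTION_CACHE
```

Because the MRU hit signal is consulted only when the regular hit is absent, substitution never overrides a trace cache hit, which is why its benefit shrinks as the trace cache grows and misses become rare.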
Evaluation Setup
• Benchmarks
  – SPECint95 (except compress, go), reference inputs
  – 500 million instructions from the start of the program
  – Compiled for the Alpha ISA, Compaq C compiler, -O4
• Fetch Unit
  – TC: 1 trace = at most 16 instructions and 3 conditional branches; a trace ends at a system call or indirect jump
  – TC: 4-way set-assoc., path associativity
  – MBP: MGAg, varying history length
  – Instruction cache: 32 KB, 2-way, 32-byte blocks, LRU
• Metric
  – FIPA = fetched instructions per fetch unit access
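FIPA is the total number of fetched instructions divided by the number of fetch unit accesses. A minimal sketch, assuming a hypothetical log with one entry per access holding the number of instructions that access delivered; the 16-instruction cap mirrors the trace length above and is an assumption for instruction cache accesses.

```python
MAX_FETCH = 16  # assumed cap: one trace (16 instructions) per access


def fipa(fetched_per_access: list[int]) -> float:
    """Fetched instructions per fetch unit access (FIPA)."""
    if not fetched_per_access:
        return 0.0
    if any(n < 0 or n > MAX_FETCH for n in fetched_per_access):
        raise ValueError("each access delivers 0..16 instructions")
    return sum(fetched_per_access) / len(fetched_per_access)
```

For example, four accesses delivering 16, 12, 4 and 8 instructions give `fipa([16, 12, 4, 8]) == 10.0`.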
Evaluation (1)
• Observations:
  – The gap between MGAg and perfect prediction increases with TC size
  – Trace substitution fills 20-40% of the gap
  – Substitution acts only on a TC miss, so the performance gain drops as the TC grows
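Reading "fills 20-40% of the gap" as the fraction of the MGAg-to-perfect FIPA difference that substitution recovers (our interpretation of the slide), the quantity is:

```latex
\text{gap filled} =
  \frac{\mathrm{FIPA}_{\text{MGAg+subst}} - \mathrm{FIPA}_{\text{MGAg}}}
       {\mathrm{FIPA}_{\text{perfect}} - \mathrm{FIPA}_{\text{MGAg}}}
```

With hypothetical values $\mathrm{FIPA}_{\text{MGAg}} = 10.0$, $\mathrm{FIPA}_{\text{perfect}} = 12.0$ and $\mathrm{FIPA}_{\text{MGAg+subst}} = 10.6$, the gap filled is $0.6 / 2.0 = 30\%$.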
[Figure: FIPA (8-14) vs. trace cache size (64 to 16384 traces) for MGAg, MGAg+subst and perfect prediction. TC: 4-way set-associative; MGAg: 12-bit history.]
Evaluation (2)
• Observations:
  – Trace substitution compensates for a poor branch predictor
  – No history + substitution ~ MGAg with 10-bit history
  – The improvement drops as the predictor becomes more accurate
[Figure: FIPA (8.0-12.0) vs. branch history length (0-16) for MGAg and MGAg+subst. TC: 256 traces, 4-way.]
Accuracy vs. Usage
• Definitions:
  – Usage = substitutions per fetch unit access
  – Accuracy = fraction of substitutions that are correct
• Note:
  – Accuracy is limited because the correct-path trace is not always present!
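The two definitions written as ratios of event counts, with $N_{\text{acc}}$ fetch unit accesses, $N_{\text{subst}}$ substitutions, $N_{\text{corr}}$ substitutions that fetched the correct-path trace, and $N_{\text{present}}$ substitutions for which the correct-path trace was in the TC (symbol names are ours):

```latex
\text{Usage} = \frac{N_{\text{subst}}}{N_{\text{acc}}}, \qquad
\text{Accuracy} = \frac{N_{\text{corr}}}{N_{\text{subst}}}
  \le \frac{N_{\text{present}}}{N_{\text{subst}}}
```

The bound holds because a substitution can only be correct if the correct-path trace is stored in the TC, so $N_{\text{corr}} \le N_{\text{present}}$; whenever that trace is sometimes absent, accuracy stays below 100%.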
[Figure: usage and accuracy of trace substitution vs. branch history length (0-16); y-axis: fraction of accesses (0%-45%). TC: 256 traces, 4-way.]
Conclusion
• Proposed trace substitution
  – A TC miss flags an MBP miss
    • Not always correct; not all MBP misses are found
    • Fetch the MRU trace instead: cheap implementation
• Results:
  – Consistent performance improvement
    • No history + substitution ~ MGAg with 10-bit history
    • In other cases: 0.2 more instructions/access, or the same performance with a 16 times smaller MBP
  – Most effective when the MBP or TC is small