André Seznec Caps Team
IRISA/INRIA
A 256 Kbits L-TAGE branch predictor
André Seznec, IRISA/INRIA/HIPEAC
Directly derived from:
A case for (partially) tagged branch predictors, A. Seznec and P. Michaud, JILP, Feb. 2006
+ Tricks:
• Loop predictor
• Kernel/user histories
TAGE: TAgged GEometric history length predictors
The genesis
Back around 2003
2bcgskew was state-of-the-art, but it was lagging behind neural-inspired predictors on a few benchmarks
Just wanted to get the best of both behaviors and maintain:
Reasonable implementation cost:
• Use only global history
• Medium number of tables
In-time response
The basis: a multiple-length global history predictor
[Figure: tables T0 to T4, indexed with global history lengths L(0) to L(4); how their outputs are combined is left open (?)]
GEometric History Length predictor
$L(0) = 0$
$L(i) = \alpha^{i-1} L(1)$
The set of history lengths forms a geometric series
What is important: L(i) - L(i-1) increases drastically with i
Most of the storage goes to short histories!!
{0, 2, 4, 8, 16, 32, 64, 128}
Capture correlation on very long histories
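As a rough illustration of the series above, the following sketch derives the history lengths from a minimum and a maximum length. The function name, the rounding rule, and the choice to count the history-free base table as component 0 are assumptions made for the example, not details taken from the slides.

```python
def geometric_history_lengths(n_components, l_min, l_max):
    """Return [L(0), ..., L(n_components - 1)] with L(0) = 0 and
    L(i) = alpha**(i - 1) * L(1), where L(1) = l_min and the last length is l_max.

    Assumption: component 0 uses no history, and lengths are rounded to the
    nearest integer.
    """
    if n_components < 3:
        raise ValueError("need a base table and at least two history tables")
    # alpha is chosen so that the last table uses exactly l_max history bits.
    alpha = (l_max / l_min) ** (1.0 / (n_components - 2))
    lengths = [0]
    for i in range(1, n_components):
        lengths.append(int(l_min * alpha ** (i - 1) + 0.5))
    return lengths


if __name__ == "__main__":
    print(geometric_history_lengths(8, 2, 128))
```

With 8 components, a minimum of 2 and a maximum of 128, this reproduces the series {0, 2, 4, 8, 16, 32, 64, 128} shown on the slide.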
Combining multiple predictions?
Classical solution: use of a meta-predictor
• "Wasting" storage!?
• Choosing among 5 or 10 predictions??
Neural-inspired predictors (Jimenez and Lin, 2001): use an adder tree instead of a meta-predictor
Partial matching (Chen et al., 1996; Michaud, 2005): use tagged tables and the longest matching history
CBP-1 (2004): OGEHL
[Figure: tables T0 to T4, indexed with history lengths L(0) to L(4), feeding an adder (∑)]
Final computation through a sum
Prediction = sign of the sum
12 components: 3.670 misp/KI
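The sum-based final computation can be sketched in a few lines. This is a minimal illustration, assuming each table delivers a small signed counter and that a non-negative sum means "taken"; the function name and the tie-breaking rule are choices made for the example.

```python
def gehl_predict(counters):
    """Combine the signed counters read from all tables through a sum.

    Assumption: a non-negative sum predicts taken (True),
    a negative sum predicts not taken (False).
    """
    return sum(counters) >= 0


if __name__ == "__main__":
    # Four hypothetical counters, one per table.
    print(gehl_predict([3, -1, -1, 2]))
    print(gehl_predict([-4, 1, 1, 0]))
```

No meta-predictor is needed here: every table contributes to the sum, so no storage is spent on choosing among components.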
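TAGE: geometric history length + PPM-like + optimized update policy
[Figure: a tagless base predictor indexed with pc, plus tagged tables whose entries hold (ctr, u, tag). Each tagged table is indexed with a hash of pc and h[0:L1], h[0:L2] or h[0:L3]; a second hash is compared (=?) with the stored tag, and a chain of multiplexers selects the prediction.]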
[Figure: tag comparators (=?) with Hit/Miss outcomes; the longest-history hit provides Pred, the next hit provides Altpred]
Prediction computation
General case: the longest matching component provides the prediction
Special case:
• Many mispredictions on newly allocated entries (weak Ctr)
• On many applications, Altpred is more accurate than Pred for such entries
• Property dynamically monitored through a single 4-bit counter
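The selection rule described on this slide might look as follows in software. This is a sketch under stated assumptions: the `Entry` class, the signed encoding of Ctr, the test for a "newly allocated" entry (weak Ctr and U equal to zero), and the threshold applied to the 4-bit monitoring counter (here called `use_alt_on_na`) are all illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    tag: int = 0
    ctr: int = 0  # assumed 3-bit signed counter in [-4, 3]; taken when ctr >= 0
    u: int = 0    # 2-bit useful counter


def tage_predict(base_taken, hits, use_alt_on_na):
    """Return the final prediction (True = taken).

    base_taken: prediction of the tagless base predictor.
    hits: tag-matching entries, ordered from shortest to longest history.
    use_alt_on_na: 4-bit monitoring counter in [0, 15] (threshold 8 is assumed).
    """
    if not hits:
        return base_taken
    provider = hits[-1]                # longest matching component
    pred = provider.ctr >= 0
    altpred = hits[-2].ctr >= 0 if len(hits) > 1 else base_taken
    newly_allocated = provider.ctr in (-1, 0) and provider.u == 0
    if newly_allocated and use_alt_on_na >= 8:
        return altpred                 # special case: trust Altpred instead
    return pred
```

For example, `tage_predict(True, [Entry(ctr=-1)], 15)` falls back to the base prediction, while the same call with a monitoring counter of 0 follows the weak entry.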
TAGE update policy
General principle:
Minimize the footprint of the prediction.
Just update the longest history matching component and allocate at most one entry on mispredictions
A tagged table entry
[Entry layout: Tag | Ctr | U]
Ctr: 3-bit prediction counter
U: 2-bit useful counter (was the entry recently useful?)
Tag: partial tag
Updating the U counter
If (Altpred ≠ Pred) then
• Pred = taken (Pred matches the actual outcome): U = U + 1
• Pred ≠ taken (Pred is wrong): U = U - 1
Graceful aging: periodic shift of all U counters
• implemented through the reset of a single bit
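A small sketch of the U counter rules may help. The saturation bounds follow from the 2-bit width; the aging scheme shown here (alternately clearing the low bit and the high bit of every counter) is an assumption about how "the reset of a single bit" could be realized, and the function names are invented for the example.

```python
U_MAX = 3  # 2-bit useful counter saturates at 3


def update_useful(u, pred, altpred, taken):
    """Update U only when Pred and Altpred disagree."""
    if pred != altpred:
        if pred == taken:
            u = min(u + 1, U_MAX)   # the entry made the difference
        else:
            u = max(u - 1, 0)       # the entry was harmful
    return u


def age_useful(u_counters, tick):
    """Graceful aging: clear one bit of every U counter.

    Assumption: even ticks clear the most significant bit,
    odd ticks clear the least significant bit.
    """
    keep_mask = 0b01 if tick % 2 == 0 else 0b10
    return [u & keep_mask for u in u_counters]
```

Clearing a single bit column is cheap in hardware and avoids wiping all usefulness information at once, which a full reset would do.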
Allocating a new entry on a misprediction
Find a single "useless" entry with a longer history:
Privilege the smallest possible history
• To minimize footprint
But not too much
• To avoid ping-pong phenomena
Initialize Ctr as weak and U as zero
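One possible reading of this allocation policy is sketched below. The probabilistic choice (each shorter candidate is taken with probability 1/2 before moving to the next) and the decrement of U counters when no useless entry exists are assumptions made for the example; the `Entry` class and the function signature are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Entry:
    tag: int = 0
    ctr: int = 0  # assumed signed counter; 0 = weak taken, -1 = weak not taken
    u: int = 0


def allocate(candidates, provider, new_tag, taken, rng):
    """Allocate at most one entry after a misprediction.

    candidates[i]: entry the branch maps to in tagged table i
                   (ordered by increasing history length).
    provider: index of the table that gave the prediction, -1 for the base.
    Returns the index of the allocated table, or None.
    """
    longer = range(provider + 1, len(candidates))
    free = [i for i in longer if candidates[i].u == 0]
    if not free:
        for i in longer:              # assumed fallback: age the candidates
            candidates[i].u -= 1
        return None
    chosen = free[-1]
    for i in free[:-1]:               # favor short histories, but not always
        if rng.random() < 0.5:
            chosen = i
            break
    entry = candidates[chosen]
    entry.tag = new_tag
    entry.ctr = 0 if taken else -1    # weak counter in the right direction
    entry.u = 0
    return chosen
```

A caller would pass its own generator, for instance `allocate(cands, 0, 0x2A, True, random.Random(0))`.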
Improve the global history
Address + conditional branch history: path confusion on short histories
Address + path: direct hashing leads to path confusion
1. Represent all branches in the branch history
2. Also use path history (1 bit per branch, limited to 16 bits)
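The two history registers can be sketched as shift registers held in Python integers. Which address bit feeds the path history, and the default global history length of 640 bits, are assumptions for the illustration; unconditional branches would simply be inserted with `taken=True`.

```python
PATH_BITS = 16  # path history is limited to 16 bits


def update_histories(ghist, phist, pc, taken, ghist_len=640):
    """Shift one branch into the global history and the path history.

    Assumption: every branch (conditional or not) inserts one direction bit
    into ghist, and the low-order bit of pc is the path bit.
    """
    ghist = ((ghist << 1) | int(taken)) & ((1 << ghist_len) - 1)
    phist = ((phist << 1) | (pc & 1)) & ((1 << PATH_BITS) - 1)
    return ghist, phist
```

Both values would then be hashed together with the branch address to form table indices and tags.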
Design tradeoff for CBP2 (1)
13 components: bring the best accuracy on distributed traces
• 8 components not very far!
History length: Min = 4, Max = 640
• Could use any Min in [2, 6] and any Max in [300, 2000]
Design tradeoff for CBP2 (2)
Tag width tradeoff: (destructive) false match is better tolerated on shorter history
• 7 bits on T1 to 15 bits on T12
Tuning the number of table entries:
• Smaller number for very long histories
• Smaller number for very short histories
Adding a loop predictor
The loop predictor captures the number of iterations of a loop
When it successively encounters the same number of iterations 4 times, the loop predictor provides the prediction.
Advantages:
• Very reliable
• Small storage budget: 256 52-bit entries
Complexity?
• Might be difficult to manage speculative iteration numbers on deep pipelines
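The behavior of a single loop predictor entry can be sketched as below. The field names, the convention that the loop branch is taken while iterating and not taken on exit, and the omission of tags and speculative state are simplifying assumptions; the 52-bit entry layout of the actual submission is not modeled.

```python
from dataclasses import dataclass

CONF_THRESHOLD = 4  # same iteration count seen 4 times in a row


@dataclass
class LoopEntry:
    trip_count: int = 0   # taken outcomes before the exit, last execution
    current: int = 0      # taken outcomes seen so far in this execution
    confidence: int = 0   # consecutive executions with the same trip_count

    def predict(self):
        """Return True/False when confident, None otherwise."""
        if self.confidence < CONF_THRESHOLD:
            return None
        return self.current < self.trip_count

    def update(self, taken):
        if taken:
            self.current += 1
            return
        # Loop exit: compare this execution with the recorded trip count.
        if self.current == self.trip_count:
            self.confidence = min(self.confidence + 1, CONF_THRESHOLD)
        else:
            self.trip_count = self.current
            self.confidence = 1
        self.current = 0
```

Until the threshold is reached, `predict()` returns `None` and the main predictor keeps providing the prediction. The sketch updates `current` with resolved outcomes only, which sidesteps the speculative-iteration problem mentioned on the slide.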
Using a kernel history and a user history
Traces mix user and kernel activities:
Kernel activity after exception
• Global history pollution
Solution: use two separate global histories
• User history is updated only in user mode
• Kernel history is updated in both modes
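The two-history scheme is simple enough to sketch directly. The class name, the register length, and the rule that predictions in kernel mode read the kernel history are assumptions for the example.

```python
class DualHistory:
    """Separate global histories for user and kernel code (illustrative)."""

    def __init__(self, length=640):
        self.mask = (1 << length) - 1
        self.user = 0
        self.kernel = 0

    def update(self, taken, kernel_mode):
        bit = int(taken)
        # Kernel history is updated in both modes.
        self.kernel = ((self.kernel << 1) | bit) & self.mask
        # User history is updated only in user mode.
        if not kernel_mode:
            self.user = ((self.user << 1) | bit) & self.mask

    def current(self, kernel_mode):
        """History used to index the predictor in the given mode (assumed)."""
        return self.kernel if kernel_mode else self.user
```

When execution returns to user mode after an exception, the user history is exactly as it was before the kernel ran, so the pollution described above is avoided.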
L-TAGE submission accuracy (distributed traces)
3.314 misp/KI
Reducing L-TAGE complexity
Included 241.5 Kbits TAGE predictor: 3.368 misp/KI
Loop predictor beneficial only on gzip: might not be worth the extra complexity
Using fewer tables
8 components 256 Kbits TAGE predictor: 3.446 misp/KI
TAGE prediction computation time?
3 successive steps:
• Index computation
• Table read
• Partial match + multiplexor
Does not fit in a single cycle, but can be ahead pipelined!
Ahead pipelining a global history branch predictor (principle)
Initiate branch prediction X+1 cycles in advance to provide the prediction in time
Use information available:
• X-block ahead instruction address
• X-block ahead history
To ensure accuracy: use intermediate path information
Practice
Ahead pipelined TAGE: 4 parallel prediction computations
[Figure: pipeline diagram; only the labels A, B, C, Ha, b, c are recoverable]
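The idea of computing several predictions in parallel and selecting one late can be illustrated with a toy model. Everything here is a simplification under assumptions: the table is a flat list, the index derived from ahead information selects a group of 4 adjacent entries, and 2 bits of intermediate path information pick the final one. The function names are invented for the example.

```python
def ahead_read(table, ahead_index):
    """Early step: use X-block ahead address/history to read 4 candidates.

    Assumption: len(table) is a multiple of 4 and candidates are adjacent.
    """
    groups = len(table) // 4
    base = (ahead_index % groups) * 4
    return table[base:base + 4]


def late_select(candidates, intermediate_bits):
    """Late step: 2 bits of intermediate path information pick one candidate."""
    return candidates[intermediate_bits & 0b11]


if __name__ == "__main__":
    table = list(range(16))            # stand-in for prediction entries
    cands = ahead_read(table, 5)       # started several cycles early
    print(late_select(cands, 0b10))    # resolved once the path is known
```

The slow steps (index computation and table read) start early with stale information, and only a fast 4-to-1 selection remains on the critical path.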
3-branch ahead pipelined 8 component 256 Kbits TAGE
3.552 misp/KI
A final case for the Geometric History Length predictors
Delivers state-of-the-art accuracy
Uses only global information:
• Very long history: 300+ bits!!
Can be ahead pipelined
Many effective design points:
• OGEHL or TAGE
• Number of tables, history lengths
The End