
Dynamic History-Length Fitting: A third level of adaptivity for branch prediction

Toni Juan, Sanji Sanjeevan, Juan J. Navarro

Department of Computer Architecture, Universitat Politècnica de Catalunya

Presented by Danyao Wang, ECE1718, Fall 2008

ISCA '98

Overview

• Branch prediction background

• Dynamic branch predictors

• Dynamic history-length fitting (DHLF)

– Without context switches

– With context switches

• Results

• Conclusion

Why branch prediction?

• Superscalar processors with deep pipelines

– Intel Core 2 Duo: 14 stages

– AMD Athlon 64: 12 stages

– Intel Pentium 4: 31 stages

• Many cycles pass before a branch is resolved

– Waiting for the outcome wastes time

– Better to do some useful work in the meantime

• Branch prediction!

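What does it do?

    sub r1, r2, r3
    bne r1, r0, L1
    add r4, r5, r6
    …
L1: add r4, r7, r8
    sub r9, r4, r2

[Pipeline diagram: sub, bne and add move through fetch and decode in successive cycles. When the branch is fetched it is predicted taken, so fetch continues from L1 and the following instructions execute speculatively. When the branch is resolved the prediction is validated: correct.]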

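What happens when mispredicted?

    sub r1, r2, r3
    bne r1, r0, L1
    add r4, r5, r6
    …
L1: add r4, r7, r8
    sub r9, r4, r2

[Pipeline diagram: sub, bne and add move through fetch and decode in successive cycles. When the branch is fetched it is predicted taken, so fetch continues from L1 and the following instructions execute speculatively. When the branch is resolved the prediction is validated: incorrect! The speculatively executed instructions are squashed.]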

How to predict branches?

• Statically at compile time

– Simple hardware

– Not accurate enough…

• Dynamically at execution time

– Hardware predictors

• Last-outcome predictor

• Saturation counter

• Pattern predictor

• Tournament predictor

(From first to last: more complex, more accurate.)

Last-Outcome Branch Predictor

• Simplest dynamic branch predictor

• Branch prediction table with 1-bit entries

• Intuition: history repeats itself

[Figure: the lower N bits of the PC index a branch prediction table with 2^N entries.]

1-bit prediction: T or NT

– Read at fetch

– Written on misprediction
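The table lookup and update described above can be sketched in a few lines. This is a minimal illustration, not code from the talk: the class name, the "not taken" initial state and the loop trace are all assumptions made for the example.

```python
class LastOutcomePredictor:
    """1-bit-per-entry predictor: each branch is predicted to repeat its last outcome."""

    def __init__(self, index_bits: int) -> None:
        self.mask = (1 << index_bits) - 1
        # Assumption: every entry starts as "not taken" (False).
        self.table = [False] * (1 << index_bits)

    def predict(self, pc: int) -> bool:
        # Read at fetch: index the table with the lower bits of the PC.
        return self.table[pc & self.mask]

    def update(self, pc: int, taken: bool) -> None:
        # Write on misprediction: a 1-bit entry only changes when it was wrong.
        index = pc & self.mask
        if self.table[index] != taken:
            self.table[index] = taken


def count_mispredictions(predictor, trace):
    """Replay (pc, taken) pairs and count wrong predictions."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# Hypothetical trace: a loop branch taken 9 times, then not taken, executed twice.
loop_trace = ([(0x40, True)] * 9 + [(0x40, False)]) * 2
misses = count_mispredictions(LastOutcomePredictor(index_bits=4), loop_trace)
print(misses)
```

On this trace each pass through the loop costs two mispredictions, one on entry and one on exit, which is the weakness the saturation counter addresses.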

Saturation Counter Predictor

• Observation: branches highly bimodal

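• n-bit saturation counter

– Hysteresis

– n-bit entries in branch prediction table

[State diagram, e.g. 2-bit bimodal predictor: states 00, 01, 10, 11, with each taken (T) outcome moving one state toward 11. States 00 and 01 predict not-taken; 10 and 11 predict taken. 01 and 10 are the weak-bias states, 00 and 11 the strong-bias states.]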
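A single counter of this kind can be sketched as follows. The sketch assumes the usual convention that the upper half of the counter range predicts taken and that the counter starts at 00; the class name and the loop trace are illustrative only.

```python
class SaturatingCounter:
    """n-bit up/down counter that saturates at 0 and 2**n - 1."""

    def __init__(self, bits: int = 2, value: int = 0) -> None:
        self.max_value = (1 << bits) - 1
        self.value = value  # assumption: start in the strong not-taken state

    def predict(self) -> bool:
        # Upper half of the range predicts taken (10 and 11 for 2 bits).
        return self.value > self.max_value // 2

    def update(self, taken: bool) -> None:
        # Move one state toward the observed outcome, saturating at the ends.
        if taken:
            self.value = min(self.value + 1, self.max_value)
        else:
            self.value = max(self.value - 1, 0)


# Hypothetical trace: a loop branch taken 9 times, then not taken, executed 3 times.
outcomes = ([True] * 9 + [False]) * 3
counter = SaturatingCounter(bits=2)
misses = 0
for taken in outcomes:
    if counter.predict() != taken:
        misses += 1
    counter.update(taken)
print(misses)
```

After the first pass, the loop exit costs one misprediction per pass instead of two: a single not-taken outcome only moves the counter from 11 to 10, so the next loop entry is still predicted taken. That is the hysteresis mentioned above.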

Pattern Predictors

• Nearby branches often correlate

• Looks for patterns in branch history

– Branch History Register (BHR): m most recent branch outcomes

[Figure: Two-Level Predictor. The lower n bits of the PC and the m-bit history form an N-bit index into a table of 2^N saturation counters.]

Tournament Predictor

• No one-size-suits-all predictor

• Dynamically choose among different predictors

[Figure: Predictor A, Predictor B and Predictor C feed a chooser (or metapredictor), which selects the final prediction.]
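The chooser idea can be sketched with two component predictors instead of the three on the slide. Everything here is an assumption made for illustration: the 2-bit chooser per table entry, the trivial static components and the trace are not taken from the talk.

```python
class AlwaysTaken:
    """Stand-in component predictor (hypothetical): always predicts taken."""
    def predict(self, pc: int) -> bool:
        return True
    def update(self, pc: int, taken: bool) -> None:
        pass


class AlwaysNotTaken:
    """Stand-in component predictor (hypothetical): always predicts not taken."""
    def predict(self, pc: int) -> bool:
        return False
    def update(self, pc: int, taken: bool) -> None:
        pass


class TournamentPredictor:
    """Per-branch 2-bit chooser: values 0-1 select predictor A, 2-3 select B."""

    def __init__(self, a, b, index_bits: int) -> None:
        self.a, self.b = a, b
        self.mask = (1 << index_bits) - 1
        self.chooser = [1] * (1 << index_bits)  # assumption: start weakly favouring A

    def predict(self, pc: int) -> bool:
        use_b = self.chooser[pc & self.mask] >= 2
        return self.b.predict(pc) if use_b else self.a.predict(pc)

    def update(self, pc: int, taken: bool) -> None:
        a_ok = self.a.predict(pc) == taken
        b_ok = self.b.predict(pc) == taken
        i = pc & self.mask
        # Train the chooser only when the components disagree on correctness.
        if b_ok and not a_ok:
            self.chooser[i] = min(self.chooser[i] + 1, 3)
        elif a_ok and not b_ok:
            self.chooser[i] = max(self.chooser[i] - 1, 0)
        self.a.update(pc, taken)
        self.b.update(pc, taken)


# Hypothetical trace: one branch that is never taken, one that is always taken.
trace = [(0x10, False), (0x14, True)] * 10
predictor = TournamentPredictor(AlwaysTaken(), AlwaysNotTaken(), index_bits=4)
misses = 0
for pc, taken in trace:
    if predictor.predict(pc) != taken:
        misses += 1
    predictor.update(pc, taken)
print(misses)
```

Each branch ends up routed to the component that suits it, at the cost of keeping both components and the chooser table in hardware at once.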

What is the best predictor?

[Figure: plot with annotations "Optimal" and "Better".]

Observations

• Predictor performance depends on history length

• Optimal history length differs for programs

• Predictors with a fixed history length fall short of their potential

• … dynamic history length?

Dynamic History-Length Fitting (DHLF)

Intuition

• Tournament predictor

– Picks best out of many predictors

– Spatial multiplexing

– Area cost …

• DHLF: time multiplexing

– Try different history lengths during execution

– Adapt history length to code

– Hope to find the best one

2-Level Predictor Revisited

• Index = f(PC, BHR)

• gshare, f = xor, m < n

• 2-bit saturation counter

[Figure: two-level predictor. The lower n bits of the PC and the m-bit history form an n-bit index into a table of 2^n saturation counters. Annotations: "Predetermined" and "Figure out dynamically".]
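A gshare-style lookup with a configurable history length can be sketched as below. The class name, the initial counter value and the alternating trace are assumptions for illustration; with history_bits = 0 the sketch degenerates to a plain table of per-PC counters.

```python
class Gshare:
    """Two-level predictor: index = (PC xor history), truncated to n bits."""

    def __init__(self, index_bits: int, history_bits: int) -> None:
        assert 0 <= history_bits <= index_bits
        self.index_mask = (1 << index_bits) - 1
        self.history_mask = (1 << history_bits) - 1
        self.history = 0                          # BHR: m most recent outcomes
        self.counters = [1] * (1 << index_bits)   # assumption: start at 01

    def _index(self, pc: int) -> int:
        return (pc ^ self.history) & self.index_mask

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, 3)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)
        # Shift the true outcome into the history register, keeping m bits.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


def run(predictor, trace):
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# Hypothetical trace: one branch that strictly alternates taken / not taken.
trace = [(0x20, i % 2 == 0) for i in range(100)]
misses_no_history = run(Gshare(index_bits=4, history_bits=0), trace)
misses_with_history = run(Gshare(index_bits=4, history_bits=2), trace)
print(misses_no_history, misses_with_history)
```

On this trace the history-free configuration mispredicts every branch, while two bits of history let the counters learn the alternation after a short warm-up. Other branch mixes favour other lengths, which is the motivation for choosing m at run time.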

DHLF Approach

• Current history length

• Best so far length

• Misprediction counter

• Branch counter

• Table of measured misprediction rates per length

– Initialized to zero

• Sampling at fixed intervals (step size)

– Try new length: get misprediction rate (MR)

– Adjust if worse than best seen before

– Move to a random length if length has not changed for a while

• Avoids local minima
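The bookkeeping listed above can be sketched as a small controller. The slide does not spell out the adjustment rule, so the policy here is an assumption: try unmeasured neighbours of the best length, otherwise return to the best length, and jump to a random length after `patience` unchanged intervals. The names `DhlfController`, `patience` and `synthetic_miss` are hypothetical.

```python
import random


class DhlfController:
    """Sketch of interval-based history-length selection (policy details assumed)."""

    def __init__(self, index_bits: int, step: int, patience: int = 4, seed: int = 0) -> None:
        self.step = step                       # branches per sampling interval
        self.patience = patience               # assumed: intervals before a random jump
        self.rng = random.Random(seed)
        self.table = [0] * (index_bits + 1)    # mispredictions per length, 0 = unmeasured
        self.current = 0                       # current history length (assumed start)
        self.best = 0                          # best length seen so far
        self.branches = 0                      # branch counter
        self.misses = 0                        # misprediction counter
        self.unchanged = 0

    def record(self, mispredicted: bool) -> int:
        """Account for one branch; return the length to use for the next one."""
        self.branches += 1
        self.misses += int(mispredicted)
        if self.branches == self.step:
            self._end_interval()
        return self.current

    def _end_interval(self) -> None:
        # Store at least 1 so that 0 keeps meaning "not measured yet" (assumption).
        self.table[self.current] = max(self.misses, 1)
        if self.table[self.current] <= self.table[self.best]:
            self.best = self.current
        next_length = self._choose_next()
        self.unchanged = self.unchanged + 1 if next_length == self.current else 0
        if self.unchanged >= self.patience:
            # Length has not changed for a while: try a random one.
            next_length = self.rng.randrange(len(self.table))
            self.unchanged = 0
        self.current = next_length
        self.branches = 0
        self.misses = 0

    def _choose_next(self) -> int:
        # Explore unmeasured neighbours of the best length, else go back to it.
        for candidate in (self.best + 1, self.best - 1):
            if 0 <= candidate < len(self.table) and self.table[candidate] == 0:
                return candidate
        return self.best


def synthetic_miss(length: int, i: int) -> bool:
    """Hypothetical workload whose misprediction rate is lowest at length 5."""
    return (i % 20) < abs(length - 5) + 1


ctrl = DhlfController(index_bits=12, step=100)
for i in range(2000):
    ctrl.record(synthetic_miss(ctrl.current, i))
print(ctrl.best)
```

In a real predictor `record` would be driven by the actual prediction outcome, and the returned length would set how many history bits enter the index. The synthetic workload only exercises the control loop.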

DHLF Examples

Index = 12 bits; step = 16K

[Figure: annotation "Optimal".]

Experimental Methodology

• SPECint95

• gshare and dhlf-gshare

• Trace-driven simulation

• Simulated up to 200M conditional branches

• Branch history register & pattern history table immediately updated with the true outcome

DHLF Performance

• Area overhead

– Index length = 10; step size = 16K; overhead = 7%

– Index length = 16; step size = 16K; overhead = 0.02%

Optimization Strategies

• Step size

– Small: learns faster

• Has to be big enough for meaningful misprediction stats

– Big: learns slower

• Change length incrementally

– Test as many lengths as possible

• Warm-up period

– No MR count for 1 interval after length change

Context Switches

• Branch prediction table trashed periodically

• Lower prediction accuracy immediately after a context switch

• Context switch frequency affects optimal history length

Impact on Misprediction Rate


gshare. Index = 16 bits

Context-switch distance: # branches executed between context switches

Coping with Context Switches

• Upon context switch

– Discard current misprediction counter

– Save current predictor data

• misprediction table

• current history length

• Approx. 221 bits for 16-bit index, step = 16K, 13-bit misprediction counter

• Returning from a context switch

– Warm-up: no MR counter for 1 interval
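One plausible reading of the 221-bit figure is a misprediction table with one 13-bit entry per candidate history length, where the lengths run from 0 up to the index width. That reading is an assumption, and the function name below is hypothetical. The arithmetic is:

```python
def misprediction_table_bits(index_bits: int, counter_bits: int) -> int:
    """Bits needed for one misprediction count per candidate history length."""
    # Assumption: candidate history lengths are 0, 1, ..., index_bits.
    lengths = index_bits + 1
    return lengths * counter_bits


saved_bits = misprediction_table_bits(index_bits=16, counter_bits=13)
print(saved_bits)  # 17 entries * 13 bits
```

Under that reading the table alone accounts for the quoted figure, and the current history length adds only a few more bits.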

DHLF with Context Switches

[Figure legend: × dhlf-gshare with step value = 16K; gshare with all possible history lengths.]

Branch prediction table flushed every 70K instructions to simulate context switches.

Contributions

• Dynamically finds near-optimal history lengths

• Performs well for programs with different branch behaviours

• Performs well under context switches

• Can be applied to any two-level branch predictor

• Small area overhead

Backup Slides

DHLF Performance: SPECint95

dhlf-gshare; step size = 16K. Compared to all possible history lengths (no context switch)

DHLF with Context Switches

dhlf-gshare; step size = 16K; context-switch distance = 70K

dhlf-gskew

Step value = 16K. Compared to all history lengths for gskew.

dhlf-gskew with Context Switch

Step size = 16K; Context-switch distance = 70K.


DHLF Structure

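[Flowchart and data structures, as labelled on the slide:]

• Misprediction table with N entries

• Pointer to the minimum misprediction count

• Pointer to the entry for the current history length

• Branch counter and misprediction counter

• Loop: start at the initial history length, run the next interval of Nstep dynamic branches, test "current misprediction > min achieved?", adjust the history length, repeat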

DHLF Data Structure

Questions

• Is fixed context switch distance realistic?

• Does updating the PHT with true branch data immediately affect results?

– Previous studies show little impact due to this
