
Analysis of Branch Predictors

Guang Pan, Ming Lu

Dec 12, 2006

Outline

- Motivation
- Introduction
- Previous works
- Our works
- Simulation results
- Conclusion & Recommendations
- Future works

Motivation

- Branches are very frequent
  - Approx. 20% of all instructions
- Accurate branch prediction improves the performance of a superscalar or superpipelined processor.
- Decreasing the misprediction rate saves cycles.
- Decreasing the misprediction rate saves energy.

Introduction

- Need to know two things
  - Whether the branch is taken or not (direction)
  - The target address if it is taken (target)
- Direct jumps, function calls
  - Direction known (always taken), target easy to compute
- Conditional branches (typically PC-relative)
  - Direction difficult to predict, target easy to compute
- Indirect jumps, function returns
  - Direction known (always taken), target difficult to predict

Introduction (cont’)

- Framework and traces are based on a branch predictor competition (Championship Branch Prediction).
- We focused on conditional branches and evaluated several branch predictors by measurements on real traces from IBS (Instruction Benchmark Set).
- We proposed two closely related modifications of global adaptive prediction mechanisms which can achieve satisfactory accuracy.

Previous Works

- Static predictor
  - Always predict Not taken (predictor_nottaken)
    - Easy to implement
    - 30-40% accuracy … not so good
  - Always predict Taken (predictor_taken)
    - 60-70% accuracy
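As a rough illustration of how the two static predictors relate, the sketch below scores a fixed-direction predictor on a list of branch outcomes. The function name `static_accuracy` and the tiny trace are hypothetical and are not taken from the competition framework. On any one trace the two accuracies add up to 100%, which is why the two ranges on the slide mirror each other.

```python
def static_accuracy(outcomes, predict_taken):
    """Fraction of branches that a fixed-direction predictor gets right.

    outcomes: list of booleans, True meaning the branch was taken.
    predict_taken: the single direction the static predictor always guesses.
    """
    if not outcomes:
        raise ValueError("trace is empty")
    hits = sum(1 for taken in outcomes if taken == predict_taken)
    return hits / len(outcomes)


# Hypothetical trace: 7 taken outcomes followed by 3 not-taken outcomes.
trace = [True] * 7 + [False] * 3
print("always taken:    ", static_accuracy(trace, predict_taken=True))
print("always not taken:", static_accuracy(trace, predict_taken=False))
```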

Previous Works (cont’)

Local 2-bit predictor (with hysteresis)

[Figure: state diagram of the 2-bit predictor, with two "Predict Taken" states and two "Predict Not Taken" states connected by transitions labeled T (taken) and NT (not taken)]
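The hysteresis idea can be sketched as a 2-bit saturating counter. This assumes the common encoding in which states 0 and 1 predict not taken and states 2 and 3 predict taken; the class name `TwoBitCounter` is illustrative only, and the exact transitions of the diagram on the slide may differ.

```python
class TwoBitCounter:
    """2-bit saturating counter (assumed encoding: 0-1 not taken, 2-3 taken)."""

    def __init__(self, state=1):
        self.state = state  # 0 = strongly not taken ... 3 = strongly taken

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


# A loop branch: taken 9 times, then not taken once at loop exit, repeated.
counter = TwoBitCounter()
misses = 0
for outcome in ([True] * 9 + [False]) * 5:
    if counter.predict() != outcome:
        misses += 1
    counter.update(outcome)
print("mispredictions:", misses)
```

Because a single not-taken outcome only moves the counter from "strongly taken" to "weakly taken", the loop exit does not flip the prediction for the next loop entry.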

Previous Works (cont’)

- Bimodal predictor
  - Uses a table of two-bit entries, indexed with the least significant bits of the instruction addresses.
  - Entries typically do not have tags.
  - A particular counter may therefore be mapped to different branch instructions.
  - Each counter has one of four states:
    - Strongly not taken
    - Weakly not taken
    - Weakly taken
    - Strongly taken
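A minimal sketch of the bimodal table described above, assuming counter values 0 to 3 for the four states in the order listed. The class name, the `index_bits` parameter, and the example addresses are hypothetical. Instruction alignment is ignored for simplicity.

```python
class BimodalPredictor:
    """Untagged table of 2-bit counters indexed by the low bits of the PC."""

    def __init__(self, index_bits=12):
        self.mask = (1 << index_bits) - 1
        # Every counter starts at 1 (weakly not taken); this is an assumption.
        self.table = [1] * (1 << index_bits)

    def predict(self, pc):
        return self.table[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.table[i] = min(self.table[i] + 1, 3)
        else:
            self.table[i] = max(self.table[i] - 1, 0)


# With no tags, two branches whose low address bits match share one counter.
bimodal = BimodalPredictor(index_bits=4)
for _ in range(3):
    bimodal.update(0x10, True)       # train the branch at 0x10 as taken
print(bimodal.predict(0x20))         # 0x20 aliases to the same entry as 0x10
```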

Previous Works (cont’)

- Correlating predictor
  - Branch outcome correlates with the outcome of some recently executed branches
    - Use this in our prediction
  - Keep N bits of history of recent outcomes
  - Use a different M-addr-bit predictor for each different history
  - Note: N-bit history means 2^N different predictors for each branch

[Figure: correlating predictor. Labels: Branch address (4 bits); 2-bits per branch local predictors; 2-bit global branch history; Prediction]
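One way to picture the correlating scheme is a single table whose index concatenates M address bits with the N-bit global history, so each branch address owns 2^N counters. The sketch below assumes that layout; the class name and the defaults (4 address bits, 2 history bits, matching the figure labels) are illustrative.

```python
class CorrelatingPredictor:
    """2-bit counters selected by (address bits, global history bits)."""

    def __init__(self, addr_bits=4, history_bits=2):
        self.addr_mask = (1 << addr_bits) - 1
        self.hist_mask = (1 << history_bits) - 1
        self.history_bits = history_bits
        self.history = 0  # newest outcome is kept in bit 0
        self.table = [1] * (1 << (addr_bits + history_bits))

    def _index(self, pc):
        # Concatenate: address bits select the group, history selects the counter.
        return ((pc & self.addr_mask) << self.history_bits) | self.history

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(self.table[i] + 1, 3)
        else:
            self.table[i] = max(self.table[i] - 1, 0)
        self.history = ((self.history << 1) | int(taken)) & self.hist_mask


# A branch that strictly alternates taken / not taken.
predictor = CorrelatingPredictor()
misses = []
for i in range(50):
    outcome = (i % 2 == 0)
    misses.append(predictor.predict(0x4) != outcome)
    predictor.update(0x4, outcome)
print("mispredictions after warm-up:", sum(misses[10:]))
```

A single 2-bit counter cannot learn a strictly alternating branch, while the history bits here separate the two situations into different counters.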

Previous Works (cont’)

- Gshare predictor
  - Correlating predictors are often wasteful
    - Some histories are rare or even impossible
    - Yet we dedicate a counter for each history
  - Solution: hashing
    - Use a single large predictor table
    - Hash history and branch address together
    - Use the hash to index into the table
    - The hash is just an XOR, so it’s fast

[Figure: gshare predictor. K bits of branch instruction address are XORed with N bits of global branch history to form the index into a table of 2-bit predictors with 2^max(N,K) entries, which gives the prediction]
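The XOR indexing can be sketched as follows, assuming K address bits, N history bits, and a table of 2^max(N,K) two-bit counters as in the figure. The class and parameter names are illustrative and do not come from the competition framework.

```python
class GsharePredictor:
    """Single table of 2-bit counters indexed by (address XOR global history)."""

    def __init__(self, addr_bits=15, history_bits=15):
        self.addr_mask = (1 << addr_bits) - 1
        self.hist_mask = (1 << history_bits) - 1
        self.history = 0  # newest outcome is kept in bit 0
        self.table = [1] * (1 << max(addr_bits, history_bits))

    def _index(self, pc):
        # Both operands fit in max(K, N) bits, so the XOR is a valid index.
        return (pc & self.addr_mask) ^ self.history

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(self.table[i] + 1, 3)
        else:
            self.table[i] = max(self.table[i] - 1, 0)
        self.history = ((self.history << 1) | int(taken)) & self.hist_mask


# Same alternating branch: each history value reaches its own counter.
gshare = GsharePredictor(addr_bits=4, history_bits=4)
misses = []
for i in range(50):
    outcome = (i % 2 == 0)
    misses.append(gshare.predict(0xC) != outcome)
    gshare.update(0xC, outcome)
print("mispredictions after warm-up:", sum(misses[10:]))
```

Compared with concatenating address and history bits, the XOR lets branches that see few distinct histories leave the rest of the table to other branches, at the cost of occasional aliasing.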

Previous Works (cont’)

- Why is Gshare bad?
  - Needs a lot of branch instances to train the different 2-bit predictors
  - Simple 2-bit predictor
    - Has a prediction after it sees one instance of a branch
  - The gshare predictor
    - Has a prediction after it sees an instance of that branch and that particular history
- But for the same number of counters, gshare usually gives better prediction accuracy

Previous Works (cont’)

- Tag-based PPM (Prediction by Partial Matching) predictor
  - PPM was originally introduced for text compression, and it was later used for branch prediction.
  - Tag-based, global-history predictor derived from PPM.
  - Features five tables, each indexed with a different history length.
  - Prediction is given by the up-down saturating counter associated with the longest matching history.

Previous Works (cont’)

The PPM predictor features 5 tables. The “bimodal” table on the left has 4k entries, with 4 bits per entry. Each of the 4 other tables has 1k entries, with 12 bits per entry. The table on the right is the one using the most global history bits (80 bits).
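The "longest matching history wins" rule can be sketched in software as below. This is a deliberately simplified model and not the predictor evaluated in this work: a Python dict keyed on (address, history suffix) stands in for the hardware index-plus-tag lookup, so there is no aliasing and no size limit. The history lengths (10, 20, 40, 80), the 3-bit counters, and the allocate-on-misprediction policy are assumptions chosen for illustration.

```python
class PPMLikeSketch:
    """Simplified PPM-like predictor: the longest matching history provides."""

    def __init__(self, history_lengths=(10, 20, 40, 80)):
        self.history_lengths = history_lengths
        self.base = {}                                # pc -> counter ("bimodal" table)
        self.tagged = [{} for _ in history_lengths]   # (pc, history suffix) -> counter
        self.history = 0                              # newest outcome is kept in bit 0

    def _key(self, pc, t):
        length = self.history_lengths[t]
        return (pc, self.history & ((1 << length) - 1))

    def _provider(self, pc):
        # Search from the longest history down; -1 means the base table.
        for t in reversed(range(len(self.tagged))):
            if self._key(pc, t) in self.tagged[t]:
                return t
        return -1

    def _counter(self, pc, provider):
        if provider < 0:
            return self.base.get(pc, 4)               # 4 = weakly taken
        return self.tagged[provider][self._key(pc, provider)]

    def predict(self, pc):
        return self._counter(pc, self._provider(pc)) >= 4

    def update(self, pc, taken):
        provider = self._provider(pc)
        counter = self._counter(pc, provider)
        mispredicted = (counter >= 4) != taken
        counter = min(counter + 1, 7) if taken else max(counter - 1, 0)
        if provider < 0:
            self.base[pc] = counter
        else:
            self.tagged[provider][self._key(pc, provider)] = counter
        if mispredicted:
            # Allocate entries with longer histories, weakly biased to the outcome.
            for t in range(provider + 1, len(self.tagged)):
                self.tagged[t][self._key(pc, t)] = 4 if taken else 3
        longest = max(self.history_lengths)
        self.history = ((self.history << 1) | int(taken)) & ((1 << longest) - 1)


ppm = PPMLikeSketch()
misses = []
for i in range(200):
    outcome = (i % 2 == 0)                            # alternating branch
    misses.append(ppm.predict(0x40) != outcome)
    ppm.update(0x40, outcome)
print("mispredictions in the last 100 branches:", sum(misses[100:]))
```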

Our Work

- Hybrid_vote predictor
  - Combination of bimodal, gshare, and ppm predictors (hardware achievable)
  - The three predictors predict a conditional branch simultaneously
  - Uses a voting mechanism to predict
    - Compare the predictions of the three predictors
    - Choose the final prediction from the majority
  - Update each of the predictors by its own updating method
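The voting step can be sketched as a thin wrapper around three components. The wrapper below assumes each component exposes `predict(pc)` and `update(pc, taken)`; that interface, the class names, and the fixed-answer stand-in components are hypothetical and only serve to show the majority rule.

```python
class HybridVote:
    """Majority vote over three component predictors."""

    def __init__(self, predictors):
        if len(predictors) != 3:
            raise ValueError("expected exactly three component predictors")
        self.predictors = predictors

    def predict(self, pc):
        # Each component votes; at least two "taken" votes predict taken.
        votes = sum(1 for p in self.predictors if p.predict(pc))
        return votes >= 2

    def update(self, pc, taken):
        # Every component is trained by its own update rule.
        for p in self.predictors:
            p.update(pc, taken)


class FixedPredictor:
    """Stand-in component that always gives the same answer (demo only)."""

    def __init__(self, answer):
        self.answer = answer

    def predict(self, pc):
        return self.answer

    def update(self, pc, taken):
        pass


hybrid = HybridVote([FixedPredictor(True), FixedPredictor(False), FixedPredictor(True)])
print(hybrid.predict(0x100))
```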

Our Work (Cont’)

- Hybrid_select predictor
  - Combination of bimodal, gshare, and ppm predictors (hardware achievable)
  - The three predictors predict a conditional branch simultaneously
  - Uses a bimodal selecting mechanism to predict
    - Compare the predictions of the three predictors
    - Choose the final prediction by:
      - Bimodal >= 2: choose ppm prediction
      - Bimodal < 2: choose gshare prediction
  - Update each of the predictors by its own updating method
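The selection rule above can be written as a single function, assuming "Bimodal" refers to the value (0 to 3) of the bimodal 2-bit counter for the branch. The function and parameter names are illustrative.

```python
def hybrid_select(bimodal_counter, gshare_prediction, ppm_prediction):
    """Pick the final prediction using the bimodal counter as the selector.

    bimodal_counter: value of the bimodal 2-bit counter (0..3) for this branch.
    gshare_prediction, ppm_prediction: booleans, True meaning predict taken.
    """
    if not 0 <= bimodal_counter <= 3:
        raise ValueError("a 2-bit counter must be in the range 0..3")
    if bimodal_counter >= 2:
        return ppm_prediction
    return gshare_prediction


# Counter on the "taken" side: the ppm prediction is used.
print(hybrid_select(3, gshare_prediction=False, ppm_prediction=True))
# Counter on the "not taken" side: the gshare prediction is used.
print(hybrid_select(0, gshare_prediction=False, ppm_prediction=True))
```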

Simulation Results

[Figure: bar charts of MPKI (axis 0 to 180) for each predictor on each benchmark. Benchmarks: 164.gzip, 175.vpr, 176.gcc, 181.mcf, 186.crafty, 197.parser, 201.compress, 202.jess, 205.raytrace, 209.db, 213.javac, 222.mpegaudio, 227.mtrt, 228.jack, 252.eon, 253.perlbmk, 254.gap, 255.vortex, 256.bzip2, 300.twolf. Predictors: Taken, Not taken, Local 2 bits, Local 3 bits, bimodal 2 bits, bimodal 3 bits, Correlating 8x8K, Correlating 4x16K, Gshare-32K, Gshare-64K, PPM, hybrid vote, hybrid select]

Benchmark of MPKI (MPKI = Misses per 1000 Instructions)
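For readers unfamiliar with the metric, here is a small sketch of how MPKI relates to the per-branch miss rate. The function names and the numbers in the example are made up for illustration and are not results from this study.

```python
def mpki(mispredictions, instructions):
    """Mispredictions per 1000 executed instructions."""
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return 1000.0 * mispredictions / instructions


def mpki_from_rates(miss_rate, branch_fraction):
    """Same metric from the per-branch miss rate and the share of branches.

    miss_rate: mispredictions divided by predicted branches (0..1).
    branch_fraction: predicted branches divided by all instructions (0..1).
    """
    return 1000.0 * miss_rate * branch_fraction


# Hypothetical run: 50 mispredictions in 10,000 instructions.
print(mpki(50, 10_000))
# Hypothetical rates: 5% miss rate with 20% of instructions being branches.
print(mpki_from_rates(0.05, 0.20))
```

Unlike the miss rate, MPKI also reflects how often branches occur, so two programs with the same miss rate can have different MPKI.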

Simulation Results (Cont’)

[Figure: miss rate (axis 0% to 70%) of each predictor: Taken, Not taken, Local 2 bits, Local 3 bits, bimodal 2 bits, bimodal 3 bits, Correlating 8x8K, Correlating 4x16K, Gshare-32K, Gshare-64K, PPM, hybrid vote, hybrid select]

Simulation Results (Cont’)

[Figure: MPKI (axis 0 to 100) of each predictor: Taken, Not taken, Local 2 bits, Local 3 bits, bimodal 2 bits, bimodal 3 bits, Correlating 8x8K, Correlating 4x16K, Gshare-32K, Gshare-64K, PPM, hybrid vote, hybrid select]

MPKI = Misses per 1000 Instructions

Simulation Results (Cont’)

[Figure: CPU time in µs (axis 0 to 4) of each predictor: Taken, Not taken, Local 2 bits, Local 3 bits, bimodal 2 bits, bimodal 3 bits, Correlating 8x8K, Correlating 4x16K, Gshare-32K, Gshare-64K, PPM, hybrid vote, hybrid select]

Conclusion & Recommendations

- Choosing a good branch predictor plays an important role in improving performance.
- By combining current predictors, the new hybrid predictors also perform well in the benchmarks.
- Different structures of branch prediction schemes perform well on different branch structures.
- The benchmark trace files favor Not taken.
- Recommendations:
  - For embedded systems -> gshare
  - For desktop/server systems -> ppm, hybrid_vote

Future Works

Research new methods for selecting more accurate predictions among the predictors.

Research new algorithms for implementing more effective and accurate dynamic branch predictors.

Thanks and Questions?