andré seznec caps team irisa/inria 1 analysis of the o-gehl branch predictor optimized geometric...

24
1 André Seznec Caps Team IRISA/INRIA Analysis of the O-GEHL branch predictor Optimized GEometric History Length André Seznec IRISA/INRIA/HIPEAC

Upload: james-kelly

Post on 02-Jan-2016

215 views

Category:

Documents


1 download

TRANSCRIPT

1

André Seznec Caps Team

IRISA/INRIA

Analysis of the O-GEHL branch predictor

Optimized GEometric History Length

André Seznec IRISA/INRIA/HIPEAC

2

An

aly

sis

of

th

e O

-GE

HL

bra

nch

pre

dic

tor

André SeznecCaps Team

Irisa

Objectives

State of the art accuracy:

Any gain in branch prediction accuracy results in performance gain and power consumption gain

Keep the design implementable: Rely on a single prediction scheme Use only global history information:

Designers hate maintaining speculative local history

3

An

aly

sis

of

th

e O

-GE

HL

bra

nch

pre

dic

tor

André SeznecCaps Team

Irisa

L(0) ?

L(4)

L(3)

L(2)L(1)

TOT1

T2

T3

T4

The basis : A Multiple history length predictor

4

An

aly

sis

of

th

e O

-GE

HL

bra

nch

pre

dic

tor

André SeznecCaps Team

Irisa

Selecting between multiple predictions

Classic solution: Use of a meta predictor

“wasting” storage !?!

chosing among 5 or 10 predictions ??

Neural inspired predictors: Use an adder tree instead of a meta-predictor

Let’s use the adder tree

5

An

aly

sis

of

th

e O

-GE

HL

bra

nch

pre

dic

tor

André SeznecCaps Team

Irisa

L(0) ∑

L(4)

L(3)

L(2)L(1)

TOT1

T2

T3

T4

Final computation through a sum

Prediction=Sign

6

An

aly

sis

of

th

e O

-GE

HL

bra

nch

pre

dic

tor

André SeznecCaps Team

Irisa

From “old experience” on 2bcgskew

Some applications benefit from 100+ bits histories

Some don’t !!

7

An

aly

sis

of

th

e O

-GE

HL

bra

nch

pre

dic

tor

André SeznecCaps Team

Irisa

GEometric History Length predictor

L(1)1iαL(i)

0 L(0)

The set of history lengths forms a geometric series

{0, 2, 4, 8, 16, 32, 64, 128}

What is important: L(i)-L(i-1) is drastically increasing

Spends most of the storage for short history !!

8

An

aly

sis

of

th

e O

-GE

HL

bra

nch

pre

dic

tor

André SeznecCaps Team

Irisa

Update policy

Perceptron inspired threshold based

Perceptron-like threshold did not work

Reasonable fixed threshold= Number of tables

9

An

aly

sis

of

th

e O

-GE

HL

bra

nch

pre

dic

tor

André SeznecCaps Team

Irisa

Dynamic update threshold fitting

On an O-GEHL predictor, best threshold depends on the application the predictor size the counter width

By chance, on most applications, for the best fixed threshold,

updates on mispredictions ≈ updates on correct predictions

Monitor the difference

and adapt the update threshold

10

An

aly

sis

of

th

e O

-GE

HL

bra

nch

pre

dic

tor

André SeznecCaps Team

Irisa

Adaptative history length fitting (inspired by Juan et al 98)

(½ applications: L(7) < 50) ≠

(½ applications: L(7) > 150)

Let us adapt some history lengths to the behavior of each application

8 tables: T2: L(2) and L(8) T4: L(4) and L(9) T6: L(6) and L(10)

11

An

aly

sis

of

th

e O

-GE

HL

bra

nch

pre

dic

tor

André SeznecCaps Team

Irisa

Adaptative history length fitting (2)

Intuition: if high degree of conflicts on T7, stick with

short history

Implementation: monitoring of aliasing on updates on T7

through a tag bit and a counter

Simple is sufficient: Flipping from short to long histories and

vice-versa

12

An

aly

sis

of

th

e O

-GE

HL

bra

nch

pre

dic

tor

André SeznecCaps Team

Irisa

Evaluation framework

1st Championship Branch Prediction traces:

20 traces including system activity

Floating point apps : loop dominated

Integer apps: usual SPECINT

Multimedia apps

Server workload apps: very large footprint

13

An

aly

sis

of

th

e O

-GE

HL

bra

nch

pre

dic

tor

André SeznecCaps Team

Irisa

Reference configurationpresented for Championship Branch Prediction

8 tables: 2 Kentries except T1, 1Kentries 5 bit counters for T0 and T1, 4 bit counters

otherwise 1 Kbits of one bit tags associated with T7

10K + 5K + 6x8K + 1K = 64K

L(1) =3 and L(10)= 200 {0,3,5,8,12,19,31,49,75,125,200}

14

An

aly

sis

of

th

e O

-GE

HL

bra

nch

pre

dic

tor

André SeznecCaps Team

Irisa

A case for the OGEHL predictor

2nd at CBP: 2.82 misp/KI

Best practice award: The predictor the closest to a possible hardware

implementation Does not use exotic features:

• Various prime numbers, etc• Strange initial state

Very short warming intervals: Chaining all simulations: 2.84 misp/KI

15

An

aly

sis

of

th

e O

-GE

HL

bra

nch

pre

dic

tor

André SeznecCaps Team

Irisa

A case for the OGEHL predictor (2)

High accuracy 32Kbits (3,150): 3.41 misp/KI

better than any implementable 128 Kbits predictor before CBP

• 128 Kbits 2bcgkew (6,6,24,48): 3.55 misp/KI• 176 Kbits PBNP (43) : 3.67 misp/KI

1Mbits (5,300): 2.27 misp/KI• 1Mbit 2bcgskew (9,9,36,72): 3.19 misp/KI• 1888 Kbits PBNP (58): 3.23 misp/KI

16

An

aly

sis

of

th

e O

-GE

HL

bra

nch

pre

dic

tor

André SeznecCaps Team

Irisa

A case for the OGEHL predictor (3)

Robustness to variations of history lengths choices: L(1) in [2,6], L(10) in [125,300] misp. rate < 2.96 misp/KI

Geometric series: not a bad formula !! best geometric L(1)=3, L(10)=223, 2.80 misp/KI best overall {0, 2, 4, 9, 12, 18, 31, 54, 114, 145,

266} 2.78 misp/KI

17

An

aly

sis

of

th

e O

-GE

HL

bra

nch

pre

dic

tor

André SeznecCaps Team

Irisa

Impact of the number of components

4 components — 8 components• 64 Kbits: 3.02 -- 2.84 misp/KI• 256Kbits: 2.59 -- 2.44 misp/KI• 1Mbit: 2.40 -- 2.27 misp/KI

6 components — 12 components• 48 Kbits: 3.02 – 3.03 misp/KI• 768Kbits: 2.35 – 2.25 misp/KI

4 to 12 components bring high accuracy

18

An

aly

sis

of

th

e O

-GE

HL

bra

nch

pre

dic

tor

André SeznecCaps Team

Irisa

Impact of counter width

Robustness to counter width variations: 3-bit counter, 49 Kbits: 3.09 misp/KI

Dynamic update threshold fitting helps a lot

5-bit counter 79 Kbits: 2.79 misp/KI

4-bit is the best tradeoff

19

An

aly

sis

of

th

e O

-GE

HL

bra

nch

pre

dic

tor

André SeznecCaps Team

Irisa

Prediction computation time

3 successive steps: Index computation: a 3-entry XOR gate Table read Adder tree

May not fit on a single cycle: But can be ahead pipelined !

20

An

aly

sis

of

th

e O

-GE

HL

bra

nch

pre

dic

tor

André SeznecCaps Team

Irisa

Ahead pipelining a global history branch predictor (principle)

Initiate branch prediction X+1 cycles in advance to provide the prediction in time Use information available:

• X-block ahead instruction address• X-block ahead history

To ensure accuracy: Use intermediate path information

21

An

aly

sis

of

th

e O

-GE

HL

bra

nch

pre

dic

tor

André SeznecCaps Team

Irisa

Practice

Ahead OGEHL:8 // prediction computations

bcd

Ha

A

A B C D

22

An

aly

sis

of

th

e O

-GE

HL

bra

nch

pre

dic

tor

André SeznecCaps Team

Irisa

Ahead Pipelined 64 Kbits OGEHL

3-block: 2.94 misp/KI 4-block: 2.99 misp/KI 5-block: 3.04 misp/KI

Not such a huge accuracy loss

23

An

aly

sis

of

th

e O

-GE

HL

bra

nch

pre

dic

tor

André SeznecCaps Team

Irisa

A final case for the O-GEHL predictor

delivers state-of-the-art accuracy uses only global information:

Very long history: 200+ bits !! can be ahead pipelined many effective design points

Nb of tables, counter width, history lengths prediction computation logic complexity is low

(compared with concurrent predictors)

24

An

aly

sis

of

th

e O

-GE

HL

bra

nch

pre

dic

tor

André SeznecCaps Team

Irisa

The End