bungee jumps: accelerating indirect branches …lea edi, [rsi+rcx*1] cmp edi, edx jz j e ld edx,...

90
Bungee Jumps: Accelerating Indirect Branches Through Hardware/Software Co-Design Daniel S. McFarlin Craig Zilles 1

Upload: others

Post on 20-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Bungee Jumps:Accelerating Indirect Branches Through Hardware/Software

Co-Design Daniel  S.  McFarlin

Craig  Zilles

1

Page 2: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Indirect  Branches  Are  Increasingly  Predictable

2

Better

Indirect Branch

Predictability

Weighted Average Indirect

Branch BiasbtreefannkuchfastafloatknukemandelbrotmeteornbodyqueensraytraceregexdnarevcomprichardsspecnormGeomean

94.862840739 71.06865491491.69069463 64.80385336299.85320364 81.398931811

97.317006175 73.80388728699.263857238 87.481192648

92.6457219 68.124670899.046499732 75.93342432791.788141519 70.56609820898.393163943 70.73422413491.206546801 73.28397545788.678664125 71.81510786199.96575296 90.146778028

92.096928801 71.51024038297.17686519 63.895228067

95.212001287 73.546478657

0

5

10

15

20

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 Avg

Mis

pred

icts

/Kilo

Inst

rs Nehalem Sandy Bridge Haswell TAGE

0

25

50

75

100

btree

fannkuch

fasta

float

knuke

mandelb

rot

meteor

nbody

queens

raytra

ce

regexd

na

revco

mp

richard

s

specn

orm

Geomean

Bran

ch B

ias/

Pred

icta

bilit

y Indirect Branch Predictability Weighted Average Indirect Branch Bias

Page 3: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Indirect  Branches  Are  Increasingly  Predictable

3

Better

Indirect Branch

Predictability

Weighted Average Indirect

Branch BiasbtreefannkuchfastafloatknukemandelbrotmeteornbodyqueensraytraceregexdnarevcomprichardsspecnormGeomean

94.862840739 71.06865491491.69069463 64.80385336299.85320364 81.398931811

97.317006175 73.80388728699.263857238 87.481192648

92.6457219 68.124670899.046499732 75.93342432791.788141519 70.56609820898.393163943 70.73422413491.206546801 73.28397545788.678664125 71.81510786199.96575296 90.146778028

92.096928801 71.51024038297.17686519 63.895228067

95.212001287 73.546478657

0

5

10

15

20

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 Avg

Mis

pred

icts

/Kilo

Inst

rs Nehalem Sandy Bridge Haswell TAGE

0

25

50

75

100

btree

fannkuch

fasta

float

knuke

mandelb

rot

meteor

nbody

queens

raytra

ce

regexd

na

revco

mp

richard

s

specn

orm

Geomean

Bran

ch B

ias/

Pred

icta

bilit

y Indirect Branch Predictability Weighted Average Indirect Branch Bias

0

0.25

0.5

0.75

1

meteo

rray

trace

btree

fannkuch

fasta

richards

nqueens

revcomp float

specnorm

regexdna

knuke

mandelb

rotGeom

ean

predictability biasAnd  Unbiased

Page 4: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

4

In-­‐Order  Machines  Specialize  Based  on  Branch  Bias  or  Eliminate  Branch  PredicCon  Altogether

Shape

area()

area()

area()

VTable60

30

10

Page 5: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

4

In-­‐Order  Machines  Specialize  Based  on  Branch  Bias  or  Eliminate  Branch  PredicCon  Altogether

Shape

area()

area()

area()

VTable60

30

10 area()

area()

area()

if s->type == Circle

else if s->type == Rect

else if s->type == Square

else

RCPO

area()

Page 6: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

4

In-­‐Order  Machines  Specialize  Based  on  Branch  Bias  or  Eliminate  Branch  PredicCon  Altogether

Shape

area()

area()

area()

VTable60

30

10 area()

area()

area()

if s->type == Circle

else if s->type == Rect

else if s->type == Square

else

RCPO

area()

if (obj is type B) r = B::func( );else if (obj is type C) r = C::func( );else if (obj is type D) r = D::func( );else r = obj->func( );

p0 = (obj is type B);p1 = (obj is type C)p2 = (obj is type D) p0: r = B::func( );p1: r = C::func( );p2: r = D::func( );

if( !(p0 | p1 | p2)) r = obj->func( );

r = obj->func( ); PredicationRCPO Predication

Page 7: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Challenge:  Non-­‐Reconvergence  &  Large  Number  of  Targets

5

Gld r8, [rip+0x5e]ld edi, [rsi*4+0x42]movsxd rcx, r8 ld r10, [rcx*4+0x42]cmp edi, r10jnz LC

ld edx, [rsi*4+0x7d]cmp edx, 0x6jz I

Dld ecx, [rip+0x58]ld esi, [rip+0x81]lea edi, [rsi+rcx*1]cmp edi, edxjz J

Eld edx, [rcx*8+0x10]test edx, edxjz K

Bld r8, [rip+0x5b]ld edi, [rsi*4+0x92]movsxd rcx, r8ld r10, [rcx*4+0x92]cmp edi, r10jnz H

Ald r9, [rax*8+0x94]jmp r9

25

24 24 14

Fld r8, [rip+0x39]ld edi, [rsi*4+0x46]movsxd rcx, r8 ld r10, [rcx*4+0x46]cmp edi, r10jnz M

7

11

99 1

19

5 95

96 499 1

1 99

sjeng: f_in_check

Text

Page 8: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Challenge:  Non-­‐Reconvergence  &  Large  Number  of  Targets

5

Gld r8, [rip+0x5e]ld edi, [rsi*4+0x42]movsxd rcx, r8 ld r10, [rcx*4+0x42]cmp edi, r10jnz LC

ld edx, [rsi*4+0x7d]cmp edx, 0x6jz I

Dld ecx, [rip+0x58]ld esi, [rip+0x81]lea edi, [rsi+rcx*1]cmp edi, edxjz J

Eld edx, [rcx*8+0x10]test edx, edxjz K

Bld r8, [rip+0x5b]ld edi, [rsi*4+0x92]movsxd rcx, r8ld r10, [rcx*4+0x92]cmp edi, r10jnz H

Ald r9, [rax*8+0x94]jmp r9

25

24 24 14

Fld r8, [rip+0x39]ld edi, [rsi*4+0x46]movsxd rcx, r8 ld r10, [rcx*4+0x46]cmp edi, r10jnz M

7

11

99 1

19

5 95

96 499 1

1 99

sjeng: f_in_check

Text

Page 9: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Challenge:  Non-­‐Reconvergence  &  Large  Number  of  Targets

5

Gld r8, [rip+0x5e]ld edi, [rsi*4+0x42]movsxd rcx, r8 ld r10, [rcx*4+0x42]cmp edi, r10jnz LC

ld edx, [rsi*4+0x7d]cmp edx, 0x6jz I

Dld ecx, [rip+0x58]ld esi, [rip+0x81]lea edi, [rsi+rcx*1]cmp edi, edxjz J

Eld edx, [rcx*8+0x10]test edx, edxjz K

Bld r8, [rip+0x5b]ld edi, [rsi*4+0x92]movsxd rcx, r8ld r10, [rcx*4+0x92]cmp edi, r10jnz H

Ald r9, [rax*8+0x94]jmp r9

25

24 24 14

Fld r8, [rip+0x39]ld edi, [rsi*4+0x46]movsxd rcx, r8 ld r10, [rcx*4+0x46]cmp edi, r10jnz M

7

11

99 1

19

5 95

96 499 1

1 99

sjeng: f_in_check

Text

Page 10: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Challenge:  Non-­‐Reconvergence  &  Large  Number  of  Targets

5

Gld r8, [rip+0x5e]ld edi, [rsi*4+0x42]movsxd rcx, r8 ld r10, [rcx*4+0x42]cmp edi, r10jnz LC

ld edx, [rsi*4+0x7d]cmp edx, 0x6jz I

Dld ecx, [rip+0x58]ld esi, [rip+0x81]lea edi, [rsi+rcx*1]cmp edi, edxjz J

Eld edx, [rcx*8+0x10]test edx, edxjz K

Bld r8, [rip+0x5b]ld edi, [rsi*4+0x92]movsxd rcx, r8ld r10, [rcx*4+0x92]cmp edi, r10jnz H

Ald r9, [rax*8+0x94]jmp r9

25

24 24 14

Fld r8, [rip+0x39]ld edi, [rsi*4+0x46]movsxd rcx, r8 ld r10, [rcx*4+0x46]cmp edi, r10jnz M

7

11

99 1

19

5 95

96 499 1

1 99

sjeng: f_in_check

Text

Page 11: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Challenge:  Non-­‐Reconvergence  &  Large  Number  of  Targets

5

Gld r8, [rip+0x5e]ld edi, [rsi*4+0x42]movsxd rcx, r8 ld r10, [rcx*4+0x42]cmp edi, r10jnz LC

ld edx, [rsi*4+0x7d]cmp edx, 0x6jz I

Dld ecx, [rip+0x58]ld esi, [rip+0x81]lea edi, [rsi+rcx*1]cmp edi, edxjz J

Eld edx, [rcx*8+0x10]test edx, edxjz K

Bld r8, [rip+0x5b]ld edi, [rsi*4+0x92]movsxd rcx, r8ld r10, [rcx*4+0x92]cmp edi, r10jnz H

Ald r9, [rax*8+0x94]jmp r9

25

24 24 14

Fld r8, [rip+0x39]ld edi, [rsi*4+0x46]movsxd rcx, r8 ld r10, [rcx*4+0x46]cmp edi, r10jnz M

7

11

99 1

19

5 95

96 499 1

1 99

sjeng: f_in_check

Text

Page 12: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Challenge:  Non-­‐Reconvergence  &  Large  Number  of  Targets

5

Gld r8, [rip+0x5e]ld edi, [rsi*4+0x42]movsxd rcx, r8 ld r10, [rcx*4+0x42]cmp edi, r10jnz LC

ld edx, [rsi*4+0x7d]cmp edx, 0x6jz I

Dld ecx, [rip+0x58]ld esi, [rip+0x81]lea edi, [rsi+rcx*1]cmp edi, edxjz J

Eld edx, [rcx*8+0x10]test edx, edxjz K

Bld r8, [rip+0x5b]ld edi, [rsi*4+0x92]movsxd rcx, r8ld r10, [rcx*4+0x92]cmp edi, r10jnz H

Ald r9, [rax*8+0x94]jmp r9

25

24 24 14

Fld r8, [rip+0x39]ld edi, [rsi*4+0x46]movsxd rcx, r8 ld r10, [rcx*4+0x46]cmp edi, r10jnz M

7

11

99 1

19

5 95

96 499 1

1 99

sjeng: f_in_check

Text

Page 13: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Challenge:  Non-­‐Reconvergence  &  Large  Number  of  Targets

5

Gld r8, [rip+0x5e]ld edi, [rsi*4+0x42]movsxd rcx, r8 ld r10, [rcx*4+0x42]cmp edi, r10jnz LC

ld edx, [rsi*4+0x7d]cmp edx, 0x6jz I

Dld ecx, [rip+0x58]ld esi, [rip+0x81]lea edi, [rsi+rcx*1]cmp edi, edxjz J

Eld edx, [rcx*8+0x10]test edx, edxjz K

Bld r8, [rip+0x5b]ld edi, [rsi*4+0x92]movsxd rcx, r8ld r10, [rcx*4+0x92]cmp edi, r10jnz H

Ald r9, [rax*8+0x94]jmp r9

25

24 24 14

Fld r8, [rip+0x39]ld edi, [rsi*4+0x46]movsxd rcx, r8 ld r10, [rcx*4+0x46]cmp edi, r10jnz M

7

11

99 1

19

5 95

96 499 1

1 99

sjeng: f_in_check

Text

Page 14: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Challenge:  Non-­‐Reconvergence  &  Large  Number  of  Targets

5

Gld r8, [rip+0x5e]ld edi, [rsi*4+0x42]movsxd rcx, r8 ld r10, [rcx*4+0x42]cmp edi, r10jnz LC

ld edx, [rsi*4+0x7d]cmp edx, 0x6jz I

Dld ecx, [rip+0x58]ld esi, [rip+0x81]lea edi, [rsi+rcx*1]cmp edi, edxjz J

Eld edx, [rcx*8+0x10]test edx, edxjz K

Bld r8, [rip+0x5b]ld edi, [rsi*4+0x92]movsxd rcx, r8ld r10, [rcx*4+0x92]cmp edi, r10jnz H

Ald r9, [rax*8+0x94]jmp r9

25

24 24 14

Fld r8, [rip+0x39]ld edi, [rsi*4+0x46]movsxd rcx, r8 ld r10, [rcx*4+0x46]cmp edi, r10jnz M

7

11

99 1

19

5 95

96 499 1

1 99

sjeng: f_in_check

Text

Page 15: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Challenge:  Non-­‐Reconvergence  &  Large  Number  of  Targets

5

Gld r8, [rip+0x5e]ld edi, [rsi*4+0x42]movsxd rcx, r8 ld r10, [rcx*4+0x42]cmp edi, r10jnz LC

ld edx, [rsi*4+0x7d]cmp edx, 0x6jz I

Dld ecx, [rip+0x58]ld esi, [rip+0x81]lea edi, [rsi+rcx*1]cmp edi, edxjz J

Eld edx, [rcx*8+0x10]test edx, edxjz K

Bld r8, [rip+0x5b]ld edi, [rsi*4+0x92]movsxd rcx, r8ld r10, [rcx*4+0x92]cmp edi, r10jnz H

Ald r9, [rax*8+0x94]jmp r9

25

24 24 14

Fld r8, [rip+0x39]ld edi, [rsi*4+0x46]movsxd rcx, r8 ld r10, [rcx*4+0x46]cmp edi, r10jnz M

7

11

99 1

19

5 95

96 499 1

1 99

sjeng: f_in_check

Text

Page 16: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Challenge:  Non-­‐Reconvergence  &  Large  Number  of  Targets

5

Gld r8, [rip+0x5e]ld edi, [rsi*4+0x42]movsxd rcx, r8 ld r10, [rcx*4+0x42]cmp edi, r10jnz LC

ld edx, [rsi*4+0x7d]cmp edx, 0x6jz I

Dld ecx, [rip+0x58]ld esi, [rip+0x81]lea edi, [rsi+rcx*1]cmp edi, edxjz J

Eld edx, [rcx*8+0x10]test edx, edxjz K

Bld r8, [rip+0x5b]ld edi, [rsi*4+0x92]movsxd rcx, r8ld r10, [rcx*4+0x92]cmp edi, r10jnz H

Ald r9, [rax*8+0x94]jmp r9

25

24 24 14

Fld r8, [rip+0x39]ld edi, [rsi*4+0x46]movsxd rcx, r8 ld r10, [rcx*4+0x46]cmp edi, r10jnz M

7

11

99 1

19

5 95

96 499 1

1 99

sjeng: f_in_check

Text

Page 17: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Challenge:  Non-­‐Reconvergence  &  Large  Number  of  Targets

5

Gld r8, [rip+0x5e]ld edi, [rsi*4+0x42]movsxd rcx, r8 ld r10, [rcx*4+0x42]cmp edi, r10jnz LC

ld edx, [rsi*4+0x7d]cmp edx, 0x6jz I

Dld ecx, [rip+0x58]ld esi, [rip+0x81]lea edi, [rsi+rcx*1]cmp edi, edxjz J

Eld edx, [rcx*8+0x10]test edx, edxjz K

Bld r8, [rip+0x5b]ld edi, [rsi*4+0x92]movsxd rcx, r8ld r10, [rcx*4+0x92]cmp edi, r10jnz H

Ald r9, [rax*8+0x94]jmp r9

25

24 24 14

Fld r8, [rip+0x39]ld edi, [rsi*4+0x46]movsxd rcx, r8 ld r10, [rcx*4+0x46]cmp edi, r10jnz M

7

11

99 1

19

5 95

96 499 1

1 99

sjeng: f_in_check

Text

Page 18: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Challenge:  Non-­‐Reconvergence  &  Large  Number  of  Targets

5

Gld r8, [rip+0x5e]ld edi, [rsi*4+0x42]movsxd rcx, r8 ld r10, [rcx*4+0x42]cmp edi, r10jnz LC

ld edx, [rsi*4+0x7d]cmp edx, 0x6jz I

Dld ecx, [rip+0x58]ld esi, [rip+0x81]lea edi, [rsi+rcx*1]cmp edi, edxjz J

Eld edx, [rcx*8+0x10]test edx, edxjz K

Bld r8, [rip+0x5b]ld edi, [rsi*4+0x92]movsxd rcx, r8ld r10, [rcx*4+0x92]cmp edi, r10jnz H

Ald r9, [rax*8+0x94]jmp r9

25

24 24 14

Fld r8, [rip+0x39]ld edi, [rsi*4+0x46]movsxd rcx, r8 ld r10, [rcx*4+0x46]cmp edi, r10jnz M

7

11

99 1

19

5 95

96 499 1

1 99

sjeng: f_in_check

Text

Page 19: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Missed  OpCmizaCon  Opportunity:  Next  Branch  Bias

6

80859095

100

gcc9

5 li

m88ksi

mperl

95 eon

gap

gcc2

kperl

2k

gcc2

k6

h264

ref

omen

tpp

perl2k

6pov

ray sjeng

xalan

c

richa

rds

Geomea

n

Bran

ch B

ias

Page 20: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Missed  OpCmizaCon  Opportunity:  Next  Branch  Bias

6

80859095

100

gcc9

5 li

m88ksi

mperl

95 eon

gap

gcc2

kperl

2k

gcc2

k6

h264

ref

omen

tpp

perl2k

6pov

ray sjeng

xalan

c

richa

rds

Geomea

n

Bran

ch B

ias

80859095

100

bench

btree

fannk

uch

fasta

fastar

edux

knuk

e

mandelb

rotnb

ody

regexd

na

revco

mp run

spec

norm

Geomea

n

Bran

ch B

ias

Page 21: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Missed  OpCmizaCon  Opportunity:  Next  Branch  Bias

6

80859095

100

gcc9

5 li

m88ksi

mperl

95 eon

gap

gcc2

kperl

2k

gcc2

k6

h264

ref

omen

tpp

perl2k

6pov

ray sjeng

xalan

c

richa

rds

Geomea

n

Bran

ch B

ias

80859095

100

bench

btree

fannk

uch

fasta

fastar

edux

knuk

e

mandelb

rotnb

ody

regexd

na

revco

mp run

spec

norm

Geomea

n

Bran

ch B

ias

80859095

100

btree

fannk

uch

fasta

float

knuk

eman

delbr

otmete

ornb

ody

quee

nsray

trace

regex

dna

revco

mpric

hards

spec

norm

Geomea

n

Bran

ch B

ias

Page 22: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

ExploiCng  Predictability:  Benefit  from  Next  Branch  Bias

7

Gld r8, [rip+0x5e]ld edi, [rsi*4+0x42]movsxd rcx, r8 assert r9, Gld r10, [rcx*4+0x42]cmp edi, r10jnz L

Hoist From

Cld edx, [rsi*4+0x7d]assert r9, Ccmp edx, 0x6jz I

Dld ecx, [rip+0x58]ld esi, [rip+0x81]lea edi, [rsi+rcx*1]assert r9, Dcmp edi, edxjz J

Eld edx, [rcx*8+0x10]assert r9, Etest edx, edxjz KHoist

From

Hoist From

Hoist From

Bld r8, [rip+0x5b]ld edi, [rsi*4+0x92]movsxd rcx, r8ld r10, [rcx*4+0x92]cmp edi, r10assert r9, Bjnz H

Hoist From

Ald r9, [rax*8+0x94]pred-indirect-jump

25

24 24 14

Fld r8, [rip+0x39]ld edi, [rsi*4+0x46]movsxd rcx, r8 assert r9, Fld r10, [rcx*4+0x46]cmp edi, r10jnz M

11

99 1

19

5 95

96 499 1

1 99

Page 23: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

ExploiCng  Predictability:  Benefit  from  Next  Branch  Bias

7

Gld r8, [rip+0x5e]ld edi, [rsi*4+0x42]movsxd rcx, r8 assert r9, Gld r10, [rcx*4+0x42]cmp edi, r10jnz L

Hoist From

Cld edx, [rsi*4+0x7d]assert r9, Ccmp edx, 0x6jz I

Dld ecx, [rip+0x58]ld esi, [rip+0x81]lea edi, [rsi+rcx*1]assert r9, Dcmp edi, edxjz J

Eld edx, [rcx*8+0x10]assert r9, Etest edx, edxjz KHoist

From

Hoist From

Hoist From

Bld r8, [rip+0x5b]ld edi, [rsi*4+0x92]movsxd rcx, r8ld r10, [rcx*4+0x92]cmp edi, r10assert r9, Bjnz H

Hoist From

Ald r9, [rax*8+0x94]pred-indirect-jump

25

24 24 14

Fld r8, [rip+0x39]ld edi, [rsi*4+0x46]movsxd rcx, r8 assert r9, Fld r10, [rcx*4+0x46]cmp edi, r10jnz M

11

99 1

19

5 95

96 499 1

1 99

Page 24: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

ExploiCng  Predictability:  Benefit  from  Next  Branch  Bias

7

Gld r8, [rip+0x5e]ld edi, [rsi*4+0x42]movsxd rcx, r8 assert r9, Gld r10, [rcx*4+0x42]cmp edi, r10jnz L

Hoist From

Cld edx, [rsi*4+0x7d]assert r9, Ccmp edx, 0x6jz I

Dld ecx, [rip+0x58]ld esi, [rip+0x81]lea edi, [rsi+rcx*1]assert r9, Dcmp edi, edxjz J

Eld edx, [rcx*8+0x10]assert r9, Etest edx, edxjz KHoist

From

Hoist From

Hoist From

Bld r8, [rip+0x5b]ld edi, [rsi*4+0x92]movsxd rcx, r8ld r10, [rcx*4+0x92]cmp edi, r10assert r9, Bjnz H

Hoist From

Ald r9, [rax*8+0x94]pred-indirect-jump

25

24 24 14

Fld r8, [rip+0x39]ld edi, [rsi*4+0x46]movsxd rcx, r8 assert r9, Fld r10, [rcx*4+0x46]cmp edi, r10jnz M

11

99 1

19

5 95

96 499 1

1 99

Page 25: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

ExploiCng  Predictability:  Benefit  from  Next  Branch  Bias

7

Gld r8, [rip+0x5e]ld edi, [rsi*4+0x42]movsxd rcx, r8 assert r9, Gld r10, [rcx*4+0x42]cmp edi, r10jnz L

Hoist From

Cld edx, [rsi*4+0x7d]assert r9, Ccmp edx, 0x6jz I

Dld ecx, [rip+0x58]ld esi, [rip+0x81]lea edi, [rsi+rcx*1]assert r9, Dcmp edi, edxjz J

Eld edx, [rcx*8+0x10]assert r9, Etest edx, edxjz KHoist

From

Hoist From

Hoist From

Bld r8, [rip+0x5b]ld edi, [rsi*4+0x92]movsxd rcx, r8ld r10, [rcx*4+0x92]cmp edi, r10assert r9, Bjnz H

Hoist From

Ald r9, [rax*8+0x94]pred-indirect-jump

25

24 24 14

Fld r8, [rip+0x39]ld edi, [rsi*4+0x46]movsxd rcx, r8 assert r9, Fld r10, [rcx*4+0x46]cmp edi, r10jnz M

11

99 1

19

5 95

96 499 1

1 99

Page 26: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

ExploiCng  Predictability:  Benefit  from  Next  Branch  Bias

7

Gld r8, [rip+0x5e]ld edi, [rsi*4+0x42]movsxd rcx, r8 assert r9, Gld r10, [rcx*4+0x42]cmp edi, r10jnz L

Hoist From

Cld edx, [rsi*4+0x7d]assert r9, Ccmp edx, 0x6jz I

Dld ecx, [rip+0x58]ld esi, [rip+0x81]lea edi, [rsi+rcx*1]assert r9, Dcmp edi, edxjz J

Eld edx, [rcx*8+0x10]assert r9, Etest edx, edxjz KHoist

From

Hoist From

Hoist From

Bld r8, [rip+0x5b]ld edi, [rsi*4+0x92]movsxd rcx, r8ld r10, [rcx*4+0x92]cmp edi, r10assert r9, Bjnz H

Hoist From

Ald r9, [rax*8+0x94]pred-indirect-jump

25

24 24 14

Fld r8, [rip+0x39]ld edi, [rsi*4+0x46]movsxd rcx, r8 assert r9, Fld r10, [rcx*4+0x46]cmp edi, r10jnz M

11

99 1

19

5 95

96 499 1

1 99

Page 27: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

ExploiCng  Predictability:  Benefit  from  Next  Branch  Bias

7

Gld r8, [rip+0x5e]ld edi, [rsi*4+0x42]movsxd rcx, r8 assert r9, Gld r10, [rcx*4+0x42]cmp edi, r10jnz L

Hoist From

Cld edx, [rsi*4+0x7d]assert r9, Ccmp edx, 0x6jz I

Dld ecx, [rip+0x58]ld esi, [rip+0x81]lea edi, [rsi+rcx*1]assert r9, Dcmp edi, edxjz J

Eld edx, [rcx*8+0x10]assert r9, Etest edx, edxjz KHoist

From

Hoist From

Hoist From

Bld r8, [rip+0x5b]ld edi, [rsi*4+0x92]movsxd rcx, r8ld r10, [rcx*4+0x92]cmp edi, r10assert r9, Bjnz H

Hoist From

Ald r9, [rax*8+0x94]pred-indirect-jump

25

24 24 14

Fld r8, [rip+0x39]ld edi, [rsi*4+0x46]movsxd rcx, r8 assert r9, Fld r10, [rcx*4+0x46]cmp edi, r10jnz M

11

99 1

19

5 95

96 499 1

1 99

Page 28: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

ExploiCng  Predictability:  Benefit  from  Next  Branch  Bias

7

Gld r8, [rip+0x5e]ld edi, [rsi*4+0x42]movsxd rcx, r8 assert r9, Gld r10, [rcx*4+0x42]cmp edi, r10jnz L

Hoist From

Cld edx, [rsi*4+0x7d]assert r9, Ccmp edx, 0x6jz I

Dld ecx, [rip+0x58]ld esi, [rip+0x81]lea edi, [rsi+rcx*1]assert r9, Dcmp edi, edxjz J

Eld edx, [rcx*8+0x10]assert r9, Etest edx, edxjz KHoist

From

Hoist From

Hoist From

Bld r8, [rip+0x5b]ld edi, [rsi*4+0x92]movsxd rcx, r8ld r10, [rcx*4+0x92]cmp edi, r10assert r9, Bjnz H

Hoist From

Ald r9, [rax*8+0x94]pred-indirect-jump

25

24 24 14

Fld r8, [rip+0x39]ld edi, [rsi*4+0x46]movsxd rcx, r8 assert r9, Fld r10, [rcx*4+0x46]cmp edi, r10jnz M

11

99 1

19

5 95

96 499 1

1 99

Page 29: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

ExploiCng  Predictability:  Benefit  from  Next  Branch  Bias

7

Gld r8, [rip+0x5e]ld edi, [rsi*4+0x42]movsxd rcx, r8 assert r9, Gld r10, [rcx*4+0x42]cmp edi, r10jnz L

Hoist From

Cld edx, [rsi*4+0x7d]assert r9, Ccmp edx, 0x6jz I

Dld ecx, [rip+0x58]ld esi, [rip+0x81]lea edi, [rsi+rcx*1]assert r9, Dcmp edi, edxjz J

Eld edx, [rcx*8+0x10]assert r9, Etest edx, edxjz KHoist

From

Hoist From

Hoist From

Bld r8, [rip+0x5b]ld edi, [rsi*4+0x92]movsxd rcx, r8ld r10, [rcx*4+0x92]cmp edi, r10assert r9, Bjnz H

Hoist From

Ald r9, [rax*8+0x94]pred-indirect-jump

25

24 24 14

Fld r8, [rip+0x39]ld edi, [rsi*4+0x46]movsxd rcx, r8 assert r9, Fld r10, [rcx*4+0x46]cmp edi, r10jnz M

11

99 1

19

5 95

96 499 1

1 99

Page 30: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Challenge:  Invalid  Predicted  Target  Address

8

...

0: r0 = load[a]

1: r2 = r0 + r1

2: jmp [r2]Prediction Point

A

B

C

D

Valid Targets: {A, G}Predict A

Page 31: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Challenge:  Invalid  Predicted  Target  Address

8

...

0: r0 = load[a]

1: r2 = r0 + r1

2: jmp [r2]Prediction Point

A

B

C

D

Valid Targets: {A, G}Predict A

Page 32: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Challenge:  Invalid  Predicted  Target  Address

8

...

0: r0 = load[a]

1: r2 = r0 + r1

2: jmp [r2]Prediction Point

A

B

C

D

Valid Targets: {A, G}Predict A

...

0: r0 = load[a]

predict

...

2: resolve r3, A

A

B

C

D

1: r2 = r0 + r1

2*: r3 = load[r2]

Page 33: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Challenge:  Invalid  Predicted  Target  Address

8

...

0: r0 = load[a]

1: r2 = r0 + r1

2: jmp [r2]Prediction Point

A

B

C

D

Valid Targets: {A, G}Predict A

...

0: r0 = load[a]

predict

...

2: resolve r3, A

A

B

C

D

1: r2 = r0 + r1

2*: r3 = load[r2]

Page 34: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Challenge:  Invalid  Predicted  Target  Address

8

...

0: r0 = load[a]

1: r2 = r0 + r1

2: jmp [r2]Prediction Point

A

B

C

D

Valid Targets: {A, G}Predict A

...

0: r0 = load[a]

predict

...

2: resolve r3, A

A

B

C

D

1: r2 = r0 + r1

2*: r3 = load[r2]

Page 35: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Challenge:  Invalid  Predicted  Target  Address

8

...

0: r0 = load[a]

1: r2 = r0 + r1

2: jmp [r2]Prediction Point

A

B

C

D

Valid Targets: {A, G}Predict A

...

0: r0 = load[a]

predict

...

2: resolve r3, A

A

B

C

D

1: r2 = r0 + r1

2*: r3 = load[r2]

Page 36: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Challenge:  Invalid  Predicted  Target  Address

8

...

0: r0 = load[a]

1: r2 = r0 + r1

2: jmp [r2]Prediction Point

A

B

C

D

Valid Targets: {A, G}Predict A

...

0: r0 = load[a]

predict

...

2: resolve r3, A

A

B

C

D

1: r2 = r0 + r1

2*: r3 = load[r2]

M

N

O

P

...

0: r0 = load[a]

predict

...

Invalid Target: M

Page 37: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Challenge:  Invalid  Predicted  Target  Address

8

...

0: r0 = load[a]

1: r2 = r0 + r1

2: jmp [r2]Prediction Point

A

B

C

D

Valid Targets: {A, G}Predict A

...

0: r0 = load[a]

predict

...

2: resolve r3, A

A

B

C

D

1: r2 = r0 + r1

2*: r3 = load[r2]

M

N

O

P

...

0: r0 = load[a]

predict

...

Invalid Target: M

Page 38: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Challenge:  Invalid  Predicted  Target  Address

8

...

0: r0 = load[a]

1: r2 = r0 + r1

2: jmp [r2]Prediction Point

A

B

C

D

Valid Targets: {A, G}Predict A

...

0: r0 = load[a]

predict

...

2: resolve r3, A

A

B

C

D

1: r2 = r0 + r1

2*: r3 = load[r2]

M

N

O

P

...

0: r0 = load[a]

predict

...

Invalid Target: M

Page 39: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

SoluCon:  Landing  Pad

9

...

0: r0 = load[a]

predict 0x0

2: resolve r3, A

...

A

B

C

D

marker 0x0

1: r2 = r0 + r1

2*: r3 = load[r2]

Page 40: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

SoluCon:  Landing  Pad

9

...

0: r0 = load[a]

predict 0x0

2: resolve r3, A

...

A

B

C

D

marker 0x0

1: r2 = r0 + r1

2*: r3 = load[r2]

Page 41: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

SoluCon:  Landing  Pad

9

...

0: r0 = load[a]

predict 0x0

2: resolve r3, A

...

A

B

C

D

marker 0x0

1: r2 = r0 + r1

2*: r3 = load[r2]

Page 42: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

SoluCon:  Landing  Pad

9

...

0: r0 = load[a]

predict 0x0

2: resolve r3, A

...

A

B

C

D

marker 0x0

1: r2 = r0 + r1

2*: r3 = load[r2]

...

0: r0 = load[a]

predict 0x0

...

r2 = r0 + r1

jmp [r2]

Page 43: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

SoluCon:  Landing  Pad

9

...

0: r0 = load[a]

predict 0x0

2: resolve r3, A

...

A

B

C

D

marker 0x0

1: r2 = r0 + r1

2*: r3 = load[r2]

...

0: r0 = load[a]

predict 0x0

...

r2 = r0 + r1

jmp [r2]

Page 44: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

SoluCon:  Landing  Pad

9

...

0: r0 = load[a]

predict 0x0

2: resolve r3, A

...

A

B

C

D

marker 0x0

1: r2 = r0 + r1

2*: r3 = load[r2]

...

0: r0 = load[a]

predict 0x0

...

r2 = r0 + r1

jmp [r2]

Page 45: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

SoluCon:  Landing  Pad

9

...

0: r0 = load[a]

predict 0x0

2: resolve r3, A

...

A

B

C

D

marker 0x0

1: r2 = r0 + r1

2*: r3 = load[r2]

...

0: r0 = load[a]

predict 0x0

...

r2 = r0 + r1

jmp [r2]

...

0: r0 = load[a]

predict 0x0

...

0: Prediction Point

r2 = r0 + r1

jmp [r2]

Page 46: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

SoluCon:  Landing  Pad

9

...

0: r0 = load[a]

predict 0x0

2: resolve r3, A

...

A

B

C

D

marker 0x0

1: r2 = r0 + r1

2*: r3 = load[r2]

...

0: r0 = load[a]

predict 0x0

...

r2 = r0 + r1

jmp [r2]

...

0: r0 = load[a]

predict 0x0

...

0: Prediction Point

r2 = r0 + r1

jmp [r2]

Invalid Target: M

M

N

O

P

Fails 0x0 Check

Page 47: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

SoluCon:  Landing  Pad

9

...

0: r0 = load[a]

predict 0x0

2: resolve r3, A

...

A

B

C

D

marker 0x0

1: r2 = r0 + r1

2*: r3 = load[r2]

...

0: r0 = load[a]

predict 0x0

...

r2 = r0 + r1

jmp [r2]

...

0: r0 = load[a]

predict 0x0

...

0: Prediction Point

r2 = r0 + r1

jmp [r2]

Invalid Target: M

M

N

O

P

Fails 0x0 Check

Page 48: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

SoluCon:  Landing  Pad

9

...

0: r0 = load[a]

predict 0x0

2: resolve r3, A

...

A

B

C

D

marker 0x0

1: r2 = r0 + r1

2*: r3 = load[r2]

...

0: r0 = load[a]

predict 0x0

...

r2 = r0 + r1

jmp [r2]

...

0: r0 = load[a]

predict 0x0

...

0: Prediction Point

r2 = r0 + r1

jmp [r2]

Invalid Target: M

M

N

O

P

Fails 0x0 Check

Page 49: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

SoluCon:  Landing  Pad

9

...

0: r0 = load[a]

predict 0x0

2: resolve r3, A

...

A

B

C

D

marker 0x0

1: r2 = r0 + r1

2*: r3 = load[r2]

...

0: r0 = load[a]

predict 0x0

...

r2 = r0 + r1

jmp [r2]

...

0: r0 = load[a]

predict 0x0

...

0: Prediction Point

r2 = r0 + r1

jmp [r2]

Invalid Target: M

M

N

O

P

Fails 0x0 Check

Page 50: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

SoluCon:  Landing  Pad

9

...

0: r0 = load[a]

predict 0x0

2: resolve r3, A

...

A

B

C

D

marker 0x0

1: r2 = r0 + r1

2*: r3 = load[r2]

...

0: r0 = load[a]

predict 0x0

...

r2 = r0 + r1

jmp [r2]

...

0: r0 = load[a]

predict 0x0

...

0: Prediction Point

r2 = r0 + r1

jmp [r2]

Invalid Target: M

M

N

O

P

Fails 0x0 Check

r2 = r0 + r1

jmp [r2]

Redirect Fetch

No Predict: Stall Fetch

Page 51: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

SoluCon:  Landing  Pad

9

...

0: r0 = load[a]

predict 0x0

2: resolve r3, A

...

A

B

C

D

marker 0x0

1: r2 = r0 + r1

2*: r3 = load[r2]

...

0: r0 = load[a]

predict 0x0

...

r2 = r0 + r1

jmp [r2]

...

0: r0 = load[a]

predict 0x0

...

0: Prediction Point

r2 = r0 + r1

jmp [r2]

Invalid Target: M

M

N

O

P

Fails 0x0 Check

r2 = r0 + r1

jmp [r2]

Redirect Fetch

No Predict: Stall Fetch

Page 52: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

SoluCon:  Landing  Pad

9

...

0: r0 = load[a]

predict 0x0

2: resolve r3, A

...

A

B

C

D

marker 0x0

1: r2 = r0 + r1

2*: r3 = load[r2]

...

0: r0 = load[a]

predict 0x0

...

r2 = r0 + r1

jmp [r2]

...

0: r0 = load[a]

predict 0x0

...

0: Prediction Point

r2 = r0 + r1

jmp [r2]

Invalid Target: M

M

N

O

P

Fails 0x0 Check

r2 = r0 + r1

jmp [r2]

Redirect Fetch

No Predict: Stall Fetch

Necessitates  some  changes  to  indirect  call  /return  handling

Page 53: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Recovery  From  MispredicCon  

10

Page 54: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Recovery  From  MispredicCon  

10

2: resolve r3, A, RC

A

B

C

D

1: r2 = r0 + r1

2*: r3 = load[r2]

(1) resolve

fails

(2)Resteer

RCaddr to

Front End

(3)Resteer

r3to Front

End

Page 55: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Recovery  From  MispredicCon  

10

2: resolve r3, A, RC

A

B

C

D

1: r2 = r0 + r1

2*: r3 = load[r2]

(1) resolve

fails

(2)Resteer

RCaddr to

Front End

(3)Resteer

r3to Front

End

Page 56: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Recovery  From  MispredicCon  

10

2: resolve r3, A, RC

A

B

C

D

1: r2 = r0 + r1

2*: r3 = load[r2]

(1) resolve

fails

(2)Resteer

RCaddr to

Front End

(3)Resteer

r3to Front

End

Page 57: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Recovery  From  MispredicCon  

10

2: resolve r3, A, RC

A

B

C

D

1: r2 = r0 + r1

2*: r3 = load[r2]

(1) resolve

fails

(2)Resteer

RCaddr to

Front End

(3)Resteer

r3to Front

End

Page 58: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Recovery  From  MispredicCon  

10

2: resolve r3, A, RC

A

B

C

D

1: r2 = r0 + r1

2*: r3 = load[r2]

(1) resolve

fails

(2)Resteer

RCaddr to

Front End

(3)Resteer

r3to Front

End

Page 59: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Recovery  From  MispredicCon  

10

2: resolve r3, A, RC

A

B

C

D

1: r2 = r0 + r1

2*: r3 = load[r2]

(1) resolve

fails

(2)Resteer

RCaddr to

Front End

(3)Resteer

r3to Front

End

Page 60: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Recovery  From  MispredicCon  

10

2: resolve r3, A, RC

A

B

C

D

1: r2 = r0 + r1

2*: r3 = load[r2]

(1) resolve

fails

(2)Resteer

RCaddr to

Front End

(3)Resteer

r3to Front

End

undo B

undo C

RC: undo A

undo D

(4) Fetch Recovery

Code

(5) r3 value from

Resteer

r3: E

...

jmp r3

F

(6) Fetch Correct

Path starting at

r3

Page 61: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Recovery  From  MispredicCon  

10

2: resolve r3, A, RC

A

B

C

D

1: r2 = r0 + r1

2*: r3 = load[r2]

(1) resolve

fails

(2)Resteer

RCaddr to

Front End

(3)Resteer

r3to Front

End

undo B

undo C

RC: undo A

undo D

(4) Fetch Recovery

Code

(5) r3 value from

Resteer

r3: E

...

jmp r3

F

(6) Fetch Correct

Path starting at

r3

Page 62: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Recovery  From  MispredicCon  

10

2: resolve r3, A, RC

A

B

C

D

1: r2 = r0 + r1

2*: r3 = load[r2]

(1) resolve

fails

(2)Resteer

RCaddr to

Front End

(3)Resteer

r3to Front

End

undo B

undo C

RC: undo A

undo D

(4) Fetch Recovery

Code

(5) r3 value from

Resteer

r3: E

...

jmp r3

F

(6) Fetch Correct

Path starting at

r3

Page 63: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Recovery  From  MispredicCon  

10

2: resolve r3, A, RC

A

B

C

D

1: r2 = r0 + r1

2*: r3 = load[r2]

(1) resolve

fails

(2)Resteer

RCaddr to

Front End

(3)Resteer

r3to Front

End

undo B

undo C

RC: undo A

undo D

(4) Fetch Recovery

Code

(5) r3 value from

Resteer

r3: E

...

jmp r3

F

(6) Fetch Correct

Path starting at

r3

Page 64: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Recovery  From  MispredicCon  

10

2: resolve r3, A, RC

A

B

C

D

1: r2 = r0 + r1

2*: r3 = load[r2]

(1) resolve

fails

(2)Resteer

RCaddr to

Front End

(3)Resteer

r3to Front

End

undo B

undo C

RC: undo A

undo D

(4) Fetch Recovery

Code

(5) r3 value from

Resteer

r3: E

...

jmp r3

F

(6) Fetch Correct

Path starting at

r3

Page 65: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Recovery  From  MispredicCon  

10

2: resolve r3, A, RC

A

B

C

D

1: r2 = r0 + r1

2*: r3 = load[r2]

(1) resolve

fails

(2)Resteer

RCaddr to

Front End

(3)Resteer

r3to Front

End

undo B

undo C

RC: undo A

undo D

(4) Fetch Recovery

Code

(5) r3 value from

Resteer

r3: E

...

jmp r3

F

(6) Fetch Correct

Path starting at

r3

Page 66: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Hardware  Requirements

11

History

PC HASH

Write Port

Predictor Table

Read Port

DBB Tail Pointer

New

Indices

DBB

To Fetch Unit

Indices

Prediction State

Fetch BufferTo Back-End

Old

predict instruction: Insert New DBB Entry

Existing BPU

• Maintain  correspondence  between  predic<on/resolu<on• Small  FIFO:  not  many  branches  outstanding  and  no  reordering  of  predic<on  and  resolu<on  instruc<ons–Dovetails  with  exis<ng  structures  for  outstanding  branches• Single  pointer,  single  R/W  port  

Page 67: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Hardware  Requirements

11

History

PC HASH

Write Port

Predictor Table

Read Port

DBB Tail Pointer

New

Indices

DBB

To Fetch Unit

Indices

Prediction State

Fetch BufferTo Back-End

Old

predict instruction: Insert New DBB Entry

Existing BPU

• Maintain  correspondence  between  predic<on/resolu<on• Small  FIFO:  not  many  branches  outstanding  and  no  reordering  of  predic<on  and  resolu<on  instruc<ons–Dovetails  with  exis<ng  structures  for  outstanding  branches• Single  pointer,  single  R/W  port  

Page 68: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Hardware  Requirements

11

History

PC HASH

Write Port

Predictor Table

Read Port

DBB Tail Pointer

New

Indices

DBB

To Fetch Unit

Indices

Prediction State

Fetch BufferTo Back-End

Old

predict instruction: Insert New DBB Entry

Existing BPU

• Maintain  correspondence  between  predic<on/resolu<on• Small  FIFO:  not  many  branches  outstanding  and  no  reordering  of  predic<on  and  resolu<on  instruc<ons–Dovetails  with  exis<ng  structures  for  outstanding  branches• Single  pointer,  single  R/W  port  

Page 69: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Hardware  Requirements

11

History

PC HASH

Write Port

Predictor Table

Read Port

DBB Tail Pointer

New

Indices

DBB

To Fetch Unit

Indices

Prediction State

Fetch BufferTo Back-End

Old

predict instruction: Insert New DBB Entry

Existing BPU

• Maintain  correspondence  between  predic<on/resolu<on• Small  FIFO:  not  many  branches  outstanding  and  no  reordering  of  predic<on  and  resolu<on  instruc<ons–Dovetails  with  exis<ng  structures  for  outstanding  branches• Single  pointer,  single  R/W  port  

Page 70: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Hardware  Requirements

11

History

PC HASH

Write Port

Predictor Table

Read Port

DBB Tail Pointer

New

Indices

DBB

To Fetch Unit

Indices

Prediction State

Fetch BufferTo Back-End

Old

predict instruction: Insert New DBB Entry

Existing BPU

• Maintain  correspondence  between  predic<on/resolu<on• Small  FIFO:  not  many  branches  outstanding  and  no  reordering  of  predic<on  and  resolu<on  instruc<ons–Dovetails  with  exis<ng  structures  for  outstanding  branches• Single  pointer,  single  R/W  port  

Page 71: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Hardware  Requirements

11

History

PC HASH

Write Port

Predictor Table

Read Port

DBB Tail Pointer

New

Indices

DBB

To Fetch Unit

Indices

Prediction State

Fetch BufferTo Back-End

Old

predict instruction: Insert New DBB Entry

Existing BPU

• Maintain  correspondence  between  predic<on/resolu<on• Small  FIFO:  not  many  branches  outstanding  and  no  reordering  of  predic<on  and  resolu<on  instruc<ons–Dovetails  with  exis<ng  structures  for  outstanding  branches• Single  pointer,  single  R/W  port  

Page 72: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Hardware  Requirements

11

History

PC HASH

Write Port

Predictor Table

Read Port

DBB Tail Pointer

New

Indices

DBB

To Fetch Unit

Indices

Prediction State

Fetch BufferTo Back-End

Old

predict instruction: Insert New DBB Entry

Existing BPU

• Maintain  correspondence  between  predic<on/resolu<on• Small  FIFO:  not  many  branches  outstanding  and  no  reordering  of  predic<on  and  resolu<on  instruc<ons–Dovetails  with  exis<ng  structures  for  outstanding  branches• Single  pointer,  single  R/W  port  

Page 73: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Hardware  Requirements

12

• Maintain  correspondence  between  predic<on/resolu<on• Small  FIFO:  not  many  branches  outstanding  and  no  reordering  of  predic<on  and  resolu<on  instruc<ons–Dovetails  with  exis<ng  structures  for  outstanding  branches

• Single  pointer,  single  R/W  port  

History

PC HASH

Write Port

Predictor Table

Read Port

DBB Tail Pointer

Inde

x

Indices

DBB

To Fetch Unit

Indices

Prediction State

Fetch Buffer

Insert DBB Index Into Branch Resolution InstructionTo Back-End

Existing BPU

Page 74: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Hardware  Requirements

12

• Maintain  correspondence  between  predic<on/resolu<on• Small  FIFO:  not  many  branches  outstanding  and  no  reordering  of  predic<on  and  resolu<on  instruc<ons–Dovetails  with  exis<ng  structures  for  outstanding  branches

• Single  pointer,  single  R/W  port  

History

PC HASH

Write Port

Predictor Table

Read Port

DBB Tail Pointer

Inde

x

Indices

DBB

To Fetch Unit

Indices

Prediction State

Fetch Buffer

Insert DBB Index Into Branch Resolution InstructionTo Back-End

Existing BPU

Page 75: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Hardware  Requirements

12

• Maintain  correspondence  between  predic<on/resolu<on• Small  FIFO:  not  many  branches  outstanding  and  no  reordering  of  predic<on  and  resolu<on  instruc<ons–Dovetails  with  exis<ng  structures  for  outstanding  branches

• Single  pointer,  single  R/W  port  

History

PC HASH

Write Port

Predictor Table

Read Port

DBB Tail Pointer

Inde

x

Indices

DBB

To Fetch Unit

Indices

Prediction State

Fetch Buffer

Insert DBB Index Into Branch Resolution InstructionTo Back-End

Existing BPU

Page 76: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Hardware  Requirements

12

• Maintain  correspondence  between  predic<on/resolu<on• Small  FIFO:  not  many  branches  outstanding  and  no  reordering  of  predic<on  and  resolu<on  instruc<ons–Dovetails  with  exis<ng  structures  for  outstanding  branches

• Single  pointer,  single  R/W  port  

History

PC HASH

Write Port

Predictor Table

Read Port

DBB Tail Pointer

Inde

x

Indices

DBB

To Fetch Unit

Indices

Prediction State

Fetch Buffer

Insert DBB Index Into Branch Resolution InstructionTo Back-End

Existing BPU

Page 77: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Hardware  Requirements

13

History

PC HASH

Write Port

Predictor Table

Read Port

Indices

DBB

Resteer Logic To Fetch Unit

From Back-End

DBB Index from Branch Resolution Instruction

New Prediction State

Indices

• Maintain  correspondence  between  predic<on/resolu<on• Small  FIFO:  not  many  branches  outstanding  and  no  reordering  of  predic<on  and  resolu<on  instruc<ons–Dovetails  with  exis<ng  structures  for  outstanding  branches• Single  pointer,  single  R/W  port  

Page 78: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Hardware  Requirements

13

History

PC HASH

Write Port

Predictor Table

Read Port

Indices

DBB

Resteer Logic To Fetch Unit

From Back-End

DBB Index from Branch Resolution Instruction

New Prediction State

Indices

• Maintain  correspondence  between  predic<on/resolu<on• Small  FIFO:  not  many  branches  outstanding  and  no  reordering  of  predic<on  and  resolu<on  instruc<ons–Dovetails  with  exis<ng  structures  for  outstanding  branches• Single  pointer,  single  R/W  port  

Page 79: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Hardware  Requirements

13

History

PC HASH

Write Port

Predictor Table

Read Port

Indices

DBB

Resteer Logic To Fetch Unit

From Back-End

DBB Index from Branch Resolution Instruction

New Prediction State

Indices

• Maintain  correspondence  between  predic<on/resolu<on• Small  FIFO:  not  many  branches  outstanding  and  no  reordering  of  predic<on  and  resolu<on  instruc<ons–Dovetails  with  exis<ng  structures  for  outstanding  branches• Single  pointer,  single  R/W  port  

Page 80: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Hardware  Requirements

13

History

PC HASH

Write Port

Predictor Table

Read Port

Indices

DBB

Resteer Logic To Fetch Unit

From Back-End

DBB Index from Branch Resolution Instruction

New Prediction State

Indices

• Maintain  correspondence  between  predic<on/resolu<on• Small  FIFO:  not  many  branches  outstanding  and  no  reordering  of  predic<on  and  resolu<on  instruc<ons–Dovetails  with  exis<ng  structures  for  outstanding  branches• Single  pointer,  single  R/W  port  

Page 81: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Hardware  Requirements

13

History

PC HASH

Write Port

Predictor Table

Read Port

Indices

DBB

Resteer Logic To Fetch Unit

From Back-End

DBB Index from Branch Resolution Instruction

New Prediction State

Indices

• Maintain  correspondence  between  predic<on/resolu<on• Small  FIFO:  not  many  branches  outstanding  and  no  reordering  of  predic<on  and  resolu<on  instruc<ons–Dovetails  with  exis<ng  structures  for  outstanding  branches• Single  pointer,  single  R/W  port  

Page 82: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Hardware  Requirements

13

History

PC HASH

Write Port

Predictor Table

Read Port

Indices

DBB

Resteer Logic To Fetch Unit

From Back-End

DBB Index from Branch Resolution Instruction

New Prediction State

Indices

• Maintain  correspondence  between  predic<on/resolu<on• Small  FIFO:  not  many  branches  outstanding  and  no  reordering  of  predic<on  and  resolu<on  instruc<ons–Dovetails  with  exis<ng  structures  for  outstanding  branches• Single  pointer,  single  R/W  port  

Page 83: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Experimental  Methodology

14

• Profile  Guided  Op<miza<on                                                                                                                  LLVM  3.5

• Benchmarks  with  0.3%  dynamic                                                                                                instruc<on  stream  indir  branch  

• SPEC  TRAIN  input,  PHP  and  Python  first  input• Cycle  Accurate  x86  simulator  PTLSim  provides  predictability• Transform  non-­‐loop  branches  with  pred  >  bias                                                          and  (pred  -­‐  bias)  >  3%

• Run  SPEC,  PHP,  Python  on  PTLSim  using    PGO  binaries  with  and  without  predic<on  guide  on  REF  

• Specula<on  support  :  (alias/shadow  registers)                                                                                                            for  baseline  and  experimental

• Improved  Indirect  Branch  Predictor:  VPC    (TAGE  study  in  paper)•  

second approach would be to mark the DBB entries as invalid on one of these exceptional control flow events and suppress any

branch predictor updates using an invalid DBB entry.

We size the DBB empirically. In practice, the number of outstanding decomposed branches is small; there tends to be

significant back-pressure in the in-order due to head-of-line blocking in the issue stage. We found that 16 entries were more than

sufficient, resulting in a 4-bit DBB index that needs to be stored with resolution instructions. In our implementation, each DBB

entry contains 24 bits: 16 bits for the indices into the branch prediction table hierarchy and 8 bits for the prediction metadata.

The DBB itself requires one read and one write port

7. Experimental Evaluation

In our evaluation, we use LLVM 3.5, PTLSim [44], and the SPEC 2006 and 2000 benchmark suites. As it is not profitable to

decompose all branches, we use a profile-guided strategy to select the set of branches to transform. We run the TRAIN input

sets to completion in PTLSim to collect branch bias and predictability. We transform forward branches1 whose predictability

exceeds bias by at least 5%; this heuristic provided the best overall performance. Code generation is handled by LLVM at the

-O3 level with PGO and SSE2 vectorization.

For performance evaluation, we use REF input sets. We select hot spots for simulation using Intel’s Software Development

Emulator [5]; simulation is up to 10B dynamic instructions. We simulate three different width in-order superscalars (2-wide,

4-wide, and 8-wide, with configuration parameters shown in Table 1). For our experimental system, PTLSim was extended with

a decoupled branch buffer (DBB, as described in Section 6) as well as the support for speculative compilation as discussed in

Section 4.2, including “shadow registers” which allows the compiler to re-use architectural registers for speculative computations

and commit them when resolution instructions commit.

Key Structures Configuration ParametersBpred PTLSim default: GShare, 24 KB 3-table direction

predictor, 4K-entry BTB, 64-entry RASFront-End 5 stages, Experimentally Varied 2/4/8 wide

Fetch/Decode/Dispatch, 32-entry FetchBufferExecution Ports Experimentally Varied 2/4/8Functional Units Up to 2 x LD/ST, 2 x INT/SIMD-Permute, 4 x 64-bit

SIMD/FP, 1-cycle bypassL1 Caches 8-way 32 KB L1-D$, 4-way 32 KB L1-I$, 64B lines,

4-cycle latencyL2 Cache 16-way 256KB Unified, 12-cycle latencyL3 Cache 32-way 4MB LLC, 25-cycle latencyMiss Handling 64-entry Miss Buffer, 64-entry Load Fill Request

QueueMain Memory 140-cycle latency

Table 1: Machine Configuration Parameters

1Backwards-taken branches i.e. loop branches tend to be highly biased and highly predictable and are ably handled by well-known loop transformations suchas modulo scheduling.

14

Page 84: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Performance:  SPEC95,  SPEC2K,  SPEC2K6

• Performance  propor<onal  to  %  of  dynamic  instruc<ons  which  are  indirect  branches  (PDS)  and  amenable  to  transforma<on  (weighted  averaged  bias:  WAB)• richards:  4%  PDS,  41%  WAB  • m88ksim:  0.4%  PDS,  57%  WAB• Geomean:  1.1%  PDS,  62%  WAB 15

0%

6%

12%

18%

24%

30%

gcc95 li

m88ksi

mperl

95 eon

gapgcc

2kperl

2k

gcc2k

6

h264

ref

omnetpp

perl2k

6povra

ysje

ngxal

anc

richa

rds

Geomea

n

2-wide 4-wide

Page 85: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Performance:  PHP

• Specnorm:  1.4%  PDS,  18%  WAB• run:  0.6%  PDS,    49%  WAB• Geomean:  1.4%  PDS,  37%  WAB

16

0%

10%

20%

30%

40%

50%

bench

btree

fannk

uch

fasta

fastar

edux

knuk

e

mandelb

rotnb

ody

regex

dna

revco

mp run

spec

norm

Geomea

n

2-wide 4-wide

Page 86: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Performance:  Python  

• regexdna:  2.7%  PDS,      72%  WAB• revcomp:  1.0%  PDS,      90%  WAB• Geomean:  1.8%  PDS,  73%  WAB

17

0%

5%

10%

15%

20%

btree

fannk

uch

fasta

float

knuk

eman

delbr

otmete

ornb

ody

quee

nsray

trace

regex

dna

revco

mpric

hards

spec

norm

Geomea

n

Spee

dup

2-wide 4-wide

Page 87: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Efficiency

18

0%

1%

2%

3%

4%

gcc95 li

m88ksimperl95 eon gapgcc2k

perl2kgcc2k6

h264ref

omentppperl2k6

povray

sjengxalanc

richards

Geomean

SPEC

Extr

a In

strs

Issu

ed

0%

1%

2%

3%

4%

btree

fannkuchfasta float

knuke

mandelbrotmeteor

nbodyqueens

raytrace

regexdna

revcomprichards

specnorm

Geomean

Python

Extr

a In

strs

Issu

ed

0%

1%

2%

3%

4%

benchbtree

fannkuchfasta

fastaredux

knuke

mandelbrotnbody

regexdna

revcomp run

specnorm

Geomean

PHP

Extr

a In

strs

Issu

ed

Better

Page 88: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Conclusions• Straighaorward,  low-­‐cost  “enabling”  transforma<on–Leverages  DBT  profiling  and  specula<on  facili<es  

• Modest  Hardware  Requirements• Leverages  Advances  In  Indirect  Branch  Predic<on• Good  Performance  across  Integer  and  Floa<ng  Point• Maintains  the  Efficiency  of  the  In-­‐Order

19

Page 89: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Thank  You  

20

Page 90: Bungee Jumps: Accelerating Indirect Branches …lea edi, [rsi+rcx*1] cmp edi, edx jz J E ld edx, [rcx*8+0x10] test edx, edx jz K B ld r8, [rip+0x5b] ld edi, [rsi*4+0x92] movsxd rcx,

Thank  You  

20

QuesCons?