Chen-Yong Cher & T. N. Vijaykumar
School of Electrical and Computer Engineering, Purdue University
http://www.ece.purdue.edu/~vijay

Copyright © 2001 by Chen-Yong Cher & T. N. Vijaykumar

Branch prediction accuracy is not 100% because of difficult branches:
- Complex branching patterns
- Conflicts in prediction tables

Trends show deeper pipelines (e.g., the 20-stage Pentium 4). One misprediction squash costs:
- At least 15 cycles, or 15 x 4 = 60 instructions on a 4-wide machine
- At 5% mispredictions, CPI = 0.25 + 0.2 * 0.05 * 15 = 0.40 (ideal CPI of 0.25 for a 4-wide machine, with branches making up 20% of instructions)
- In practice, squashes cost more, because branch outcomes often become known later than this minimum

Branch mispredictions cause significant performance loss.
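The CPI arithmetic above can be checked with a few lines of Python. This is a sketch using the slide's numbers: the 4-wide issue width is inferred from 15 x 4 = 60, 0.2 is read as the fraction of instructions that are branches, and the function name is made up for illustration.

```python
def cpi_with_mispredictions(ideal_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Effective CPI = ideal CPI + branches/inst x misprediction rate x squash penalty."""
    return ideal_cpi + branch_fraction * mispredict_rate * penalty_cycles


width = 4                      # issue width implied by 15 x 4 = 60
penalty = 15                   # minimum squash cost in cycles
lost_slots = penalty * width   # instruction slots lost per misprediction
cpi = cpi_with_mispredictions(1 / width, 0.2, 0.05, penalty)
print(lost_slots, round(cpi, 2))  # 60 0.4
```

The misprediction term (0.15) adds 60% to the ideal CPI of 0.25, which is why the slide calls the loss significant.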


[Figure: branch PC2 with a not-taken path A… and a taken path B…, both control-flow dependent, reconverging at C…, which is control-flow independent and executed irrespective of the branch outcome.]


Skip over control-flow dependent code:
- Only for difficult branches
- Without even fetching the control-flow dependent code
- Execute the control-flow independent code
- Execute the control-flow dependent code after the branch resolves
- Conserve hardware resources

Today's OoO pipelines routinely exploit data independence, but not control-flow independence directly.


- Introduction
- Skipper: When and Where to Skip
- Skipper: How to Skip
- Results
- Conclusions


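[Figure: branch PC2 with not-taken path A, taken path B, and control-flow independent code C. The branch resolves not taken, so the "Correct" column predicts not taken and the "Incorrect" column predicts taken.]

Time   Correct              Incorrect                  Skipper
 |     Predict not taken    Predict taken              Skip
 |     Some of A & C        Some of B & C              Some of data-independent C
 |     Resolve not taken    Resolve not taken          Resolve not taken
 |     Rest of A & C        Squash ALL of B & C        A & rest of C
 v                          Re-execute ALL of A & C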


- Execution is out of order
- But fetch and rename are in order
- The instruction window maintains precise interrupts

The pipeline relies on fetching in program order.

[Figure: pipeline stages predict/fetch → decode → rename → OoO issue → reg read → execute → branch or cache → writeback]


Skipping results in out-of-order fetching:
- First fetch the control-flow independent code
- Then fetch the control-flow dependent code

Convince an in-order fetch pipeline to fetch out of order!


Skipper: When and Where to Skip


When: only for difficult branches
- JRS low-confidence predictor [MICRO '96]
- Counts consecutive correct predictions
- Identifies a branch as difficult if it was recently mispredicted
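The JRS estimator above can be sketched as a table of resetting counters. This is a minimal illustration, not the authors' hardware: the 4K-entry, 4-bit sizing comes from the simulated configuration, while indexing by PC bits only, the threshold of 15, and the class and method names are assumptions.

```python
class JRSConfidence:
    """Sketch of a JRS-style confidence estimator built from resetting counters.

    Assumptions: the table is indexed by PC bits only, and a branch counts as
    difficult while its counter is below `threshold`.
    """

    def __init__(self, entries=4096, bits=4, threshold=15):
        self.entries = entries
        self.max_count = (1 << bits) - 1  # a 4-bit counter saturates at 15
        self.threshold = threshold        # hypothetical confidence cut-off
        self.counters = [0] * entries

    def _index(self, pc):
        # Drop the byte offset of 4-byte instructions, then wrap to the table size.
        return (pc >> 2) % self.entries

    def is_difficult(self, pc):
        """Low confidence: too few consecutive correct predictions."""
        return self.counters[self._index(pc)] < self.threshold

    def update(self, pc, prediction_correct):
        i = self._index(pc)
        if prediction_correct:
            # Count consecutive correct predictions, saturating at max_count.
            self.counters[i] = min(self.counters[i] + 1, self.max_count)
        else:
            # A misprediction resets the run, so the branch becomes difficult.
            self.counters[i] = 0


jrs = JRSConfidence()
pc = 0x400100
for _ in range(15):
    jrs.update(pc, prediction_correct=True)
print(jrs.is_difficult(pc))  # False: 15 correct predictions in a row
jrs.update(pc, prediction_correct=False)
print(jrs.is_difficult(pc))  # True: recently mispredicted
```

Counters start at zero here, so every branch is treated as difficult until it builds up a run of correct predictions.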


Where: a hardware heuristic based on If-Then-Else, learned and kept in a table

        Branch PC2    # difficult branch (step 1)
        A
        …
        Jump PC3      # jump instruction (step 2)
  PC2:  B             # target of difficult branch
        …
  PC3:  C             # target of jump instruction

Reconvergence PC: PC3 for If-Then-Else, PC2 for If (step 3)
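A static sketch of the If-Then-Else rule follows. Skipper learns this in hardware and keeps it in a table; here, purely for illustration, the rule is applied to a hypothetical dictionary from PC to instruction, assuming 4-byte instructions. `Inst`, `reconvergence_pc`, and the opcode names are invented.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Inst:
    op: str                       # hypothetical opcodes: "branch", "jump", "alu"
    target: Optional[int] = None  # taken target of a branch or jump


def reconvergence_pc(program: Dict[int, Inst], branch_pc: int, inst_size: int = 4) -> int:
    """Apply the If-Then-Else rule to a forward conditional branch.

    step 1: the difficult branch jumps to PC2 (start of the B path)
    step 2: if the instruction just before PC2 is "Jump PC3", path A ends by
            jumping over B, so the code is If-Then-Else
    step 3: reconverge at PC3 for If-Then-Else, or at PC2 for a plain If
    """
    pc2 = program[branch_pc].target
    last_of_a = program.get(pc2 - inst_size)
    if last_of_a is not None and last_of_a.op == "jump":
        return last_of_a.target  # PC3
    return pc2                   # PC2


if_then_else = {
    0x100: Inst("branch", 0x10C),  # Branch PC2
    0x104: Inst("alu"),            # A
    0x108: Inst("jump", 0x114),    # Jump PC3
    0x10C: Inst("alu"),            # PC2: B
    0x110: Inst("alu"),
    0x114: Inst("alu"),            # PC3: C
}
if_only = {
    0x200: Inst("branch", 0x20C),  # Branch PC2
    0x204: Inst("alu"),            # A
    0x208: Inst("alu"),
    0x20C: Inst("alu"),            # PC2: C
}
print(hex(reconvergence_pc(if_then_else, 0x100)))  # 0x114
print(hex(reconvergence_pc(if_only, 0x200)))       # 0x20c
```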


Skipper: How to Skip


- Create a gap in the instruction window
- Fill the gap later, when the skipped instructions are fetched

- Learn the gap length from the past
- Conservatively use the larger of the if and else path lengths
- Squash if the actual instruction count exceeds the gap length

Despite out-of-order fetch, the I-window holds instructions in program order.
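The gap-length rule can be sketched as a small per-branch table. The names are hypothetical, and recording the longest length ever seen for each path is one assumed reading of "learn the gap length from the past".

```python
from collections import defaultdict


class GapLengthTable:
    """Sketch: remember, per difficult branch, the longest observed length of
    each path, and size the instruction-window gap conservatively."""

    def __init__(self):
        # branch PC -> {True: longest taken path, False: longest not-taken path}
        self.longest = defaultdict(lambda: {True: 0, False: 0})

    def record_path(self, branch_pc, taken, length):
        paths = self.longest[branch_pc]
        paths[taken] = max(paths[taken], length)

    def gap_length(self, branch_pc):
        # Reserve room for the longer of the if and else paths.
        return max(self.longest[branch_pc].values())


def must_squash(gap_length, actual_count):
    """The skipped path did not fit in the reserved gap."""
    return actual_count > gap_length


table = GapLengthTable()
table.record_path(0x100, taken=False, length=3)  # if path (A)
table.record_path(0x100, taken=True, length=5)   # else path (B)
gap = table.gap_length(0x100)
print(gap, must_squash(gap, 4), must_squash(gap, 7))  # 5 False True
```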


[Figure: out-of-order fetching. The instruction window, in program order from head to tail, holds branch PC2, then a gap for the control-flow dependent code (A or B), then the control-flow independent code C. C is fetched first; the gap is filled later.]
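To show how out-of-order fetch still leaves the window in program order, here is a toy window in which a Python list stands in for the hardware's circular buffer. The class, the method names, and the choice to pad unused gap slots with "NOP" are assumptions, not details from the slides.

```python
class InstructionWindow:
    """Toy instruction window that reserves a gap for skipped instructions."""

    EMPTY = None

    def __init__(self):
        self.slots = []  # index 0 is the head (oldest instruction)

    def append(self, inst):
        self.slots.append(inst)

    def reserve_gap(self, length):
        """Leave `length` empty slots; return the index of the first one."""
        start = len(self.slots)
        self.slots.extend([self.EMPTY] * length)
        return start

    def fill_gap(self, start, length, skipped):
        """Place the later-fetched skipped instructions in program order."""
        if len(skipped) > length:
            raise ValueError("skipped path longer than the gap: squash")
        for offset in range(length):
            # Assumption: unused slots become NOPs so commit can pass them.
            self.slots[start + offset] = skipped[offset] if offset < len(skipped) else "NOP"

    def committable(self):
        """In-order commit stops at the first slot that is still empty."""
        ready = []
        for inst in self.slots:
            if inst is self.EMPTY:
                break
            ready.append(inst)
        return ready


window = InstructionWindow()
window.append("Branch PC2")
gap = window.reserve_gap(3)  # fetch continues at the reconvergence PC
window.append("C1")
window.append("C2")
print(window.committable())  # ['Branch PC2']
window.fill_gap(gap, 3, ["A1", "A2"])
print(window.committable())  # ['Branch PC2', 'A1', 'A2', 'NOP', 'C1', 'C2']
```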


[Figure: the same instruction window, with the gap annotated by its input registers (2) and output registers (1).]


How do data-dependent instructions wait for the skipped instructions?
- Learn the outputregs written by the control-dependent instructions
- Preallocate and preassign physical registers for the outputregs, marked "busy"
- Insert pmove instructions after the gap is filled
- The pmoves copy the values produced by the skipped instructions into the preallocated registers
- If an actual output is not in the outputregs, squash

This uses the normal rename and wake-up mechanism.
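The preallocation and pmove steps can be sketched as follows. The register names, the sequential `next_free` counter standing in for a free list, and all function names are hypothetical; pmoves are represented as (destination, source) physical-register pairs.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class RenameState:
    rename_map: Dict[str, int]  # architectural register -> physical register
    next_free: int              # hypothetical free list: hand out numbers in order
    busy: Set[int] = field(default_factory=set)

    def allocate(self) -> int:
        preg = self.next_free
        self.next_free += 1
        self.busy.add(preg)  # consumers wait until a producer writes it
        return preg


def preallocate_outputs(state: RenameState, outputregs: Iterable[str]) -> Dict[str, int]:
    """At skip time: give each learned output register a fresh, busy physical
    register and point the rename map at it, so control-flow independent
    consumers rename and wake up through the normal mechanism."""
    prealloc = {}
    for reg in outputregs:
        prealloc[reg] = state.allocate()
        state.rename_map[reg] = prealloc[reg]
    return prealloc


def make_pmoves(prealloc: Dict[str, int], skipped_final_map: Dict[str, int]) -> List[Tuple[int, int]]:
    """After the gap is filled: one pmove (dest, src) per output register copies
    the value left by the skipped path into the preallocated register. If the
    resolved path never wrote a register, its final mapping is simply the one
    from before the branch."""
    return [(prealloc[reg], skipped_final_map[reg]) for reg in sorted(prealloc)]


def output_mismatch(outputregs: Iterable[str], written: Iterable[str]) -> bool:
    """Squash if the skipped path wrote a register that was not preallocated."""
    return not set(written) <= set(outputregs)


state = RenameState({"r1": 1, "r2": 2, "r3": 3}, next_free=10)
prealloc = preallocate_outputs(state, ["r3"])
print(prealloc, state.rename_map["r3"], 10 in state.busy)  # {'r3': 10} 10 True
print(make_pmoves(prealloc, {"r3": 11}))                  # [(10, 11)]
print(output_mismatch(["r3"], ["r3", "r4"]))              # True
```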


How do control-flow dependent instructions know the correct registers to source?
- Learn the inputregs read by the control-dependent instructions
- All rename maps cannot be backed up in a single cycle, so back up only the inputregs' and outputregs' maps
- Skipped instructions use the backup rename table
- If an actual input is not in the inputregs, squash

This uses the normal rename backup mechanism.
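The backup rename table can be sketched in the same hypothetical style: only the inputregs' and outputregs' mappings are copied, skipped instructions (given as `(dest, [srcs])` pairs) are renamed through that copy, and an unexpected input or output raises `SquashRequired`. All names are invented for illustration, and `itertools.count` stands in for a real free list.

```python
import itertools
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple


class SquashRequired(Exception):
    """The skipped code used a register the learned sets did not predict."""


def backup_rename_maps(rename_map: Dict[str, int], inputregs: Iterable[str],
                       outputregs: Iterable[str]) -> Dict[str, int]:
    """Copy only the inputregs' and outputregs' mappings (taken at the branch,
    before the outputregs are preassigned) instead of the whole rename table."""
    return {reg: rename_map[reg] for reg in set(inputregs) | set(outputregs)}


def rename_skipped(insts: List[Tuple[Optional[str], List[str]]],
                   backup: Dict[str, int], inputregs: Set[str], outputregs: Set[str],
                   allocate: Callable[[], int]) -> List[Tuple[Optional[int], List[int]]]:
    """Rename skipped instructions through the backup table, in program order."""
    written: Set[str] = set()
    renamed = []
    for dest, srcs in insts:
        psrcs = []
        for src in srcs:
            # A source must be a learned input or a value produced earlier on the skipped path.
            if src not in inputregs and src not in written:
                raise SquashRequired(f"input {src} not in inputregs")
            psrcs.append(backup[src])
        pdest = None
        if dest is not None:
            if dest not in outputregs:
                raise SquashRequired(f"output {dest} not in outputregs")
            pdest = allocate()
            backup[dest] = pdest  # later skipped instructions see the new mapping
            written.add(dest)
        renamed.append((pdest, psrcs))
    return renamed


rename_map = {"r1": 1, "r2": 2, "r3": 3, "r4": 4}
inputregs, outputregs = {"r1", "r2"}, {"r3"}
backup = backup_rename_maps(rename_map, inputregs, outputregs)
alloc = itertools.count(20).__next__  # hypothetical free-list stand-in
print(rename_skipped([("r3", ["r1", "r2"])], backup, inputregs, outputregs, alloc))
# [(20, [1, 2])]
try:
    rename_skipped([("r3", ["r4"])], backup, inputregs, outputregs, alloc)
except SquashRequired as exc:
    print("squash:", exc)  # squash: input r4 not in inputregs
```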


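[Figure: Skipper pipeline, fet → dec → ren → OoO issue → read → exec → mem/br → wb]
- Usual instructions: unchanged
- Difficult branch: fetch next from the reconvergence PC; back up the inputregs' and outputregs' rename maps; allocate new regs and preassign them for the outputregs, marked busy; create the Inst-Window gap
- Skipped instructions: fetch the skipped instructions; look up the backup rename table; place them in the Inst-Window gap; the last skipped instruction inserts the pmoves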


Results


SimpleScalar simulator:
- Hybrid branch predictor with 8K/8K/8K-entry tables, updated at commit
- 9-cycle misprediction penalty
- 4K-entry, 4-bit JRS confidence estimator
- 64KB 2-way L1 I- and D-caches, 2MB L2 cache
- 128-entry information table, 3KB total
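As a rough check on hardware cost, the structure sizes above imply the following storage, assuming 1 KB = 1024 bytes; the per-entry figure is derived here, not stated in the slides.

```python
KB = 1024  # assumption: 1 KB = 1024 bytes

# 4K-entry JRS table of 4-bit counters
jrs_entries, jrs_counter_bits = 4 * KB, 4
jrs_bytes = jrs_entries * jrs_counter_bits // 8

# 128-entry information table, 3 KB in total
info_entries, info_total_bytes = 128, 3 * KB
bytes_per_entry = info_total_bytes / info_entries

print(jrs_bytes, bytes_per_entry)  # 2048 24.0
```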


Speedup of 10% over the base superscalar:
- compress: deeply data-dependent code
- cc1, go: mispredictions in the control-dependent path
- perl, vortex: low misprediction rate and low coverage

[Figure: speedup (y-axis 0.90 to 1.20) for cc1, compress, go, ijpeg, li, m88ksim, perl, and vortex, with series 128 and 256]


Speedup of 8% over Polypath:
- Polypath executes both the if and else paths
- Equal I-cache bandwidth for all machines

[Figure: speedup (y-axis 0.90 to 1.20) for cc1, compress, go, ijpeg, li, m88ksim, perl, and vortex, with series Skipper 128, Polypath 128, Skipper 256, and Polypath 256]


- Actual coverage, mean: 23% of mispredictions
- Overshoot, mean: 4.3% of all branches
- Mean branch misprediction rate: Skipper's 4.06%, Superscalar's 6.53%


Exploits control-flow independence for difficult branches:
- Fetches control-independent code while the branch resolves
- Fetches control-dependent code after the branch is resolved
- Out-of-order instruction fetch
- Mechanisms: Inst-Window gap, preallocation, pmoves
- Performs better: 10% over Superscalar, 8% over Polypath


The predictor relies on fetching in order.

[Figure: in program order, branch PC2 (paths A and B), then C… and branch PC6; predictions are shifted into the prediction history, and out-of-order fetch leaves missing pattern histories.]
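One way to see the missing-history problem is a toy comparison with made-up outcomes. It assumes a global branch-history register that shifts in outcomes in fetch order; the slides do not spell out the predictor's history scheme, and the function name is hypothetical.

```python
def shift_history(outcomes, length=8):
    """Build a global branch-history register by shifting in outcomes
    (1 = taken) in the order the branches are fetched."""
    mask = (1 << length) - 1
    history = 0
    for taken in outcomes:
        history = ((history << 1) | int(taken)) & mask
    return history


earlier = [1, 0, 1]  # branches before Branch PC2 (hypothetical outcomes)
pc2 = [0]            # Branch PC2 resolves not taken
in_a = [1]           # a branch inside path A (hypothetical)

# History that Branch PC6 would see with in-order fetch...
in_order = shift_history(earlier + pc2 + in_a)
# ...versus with skipping, where C is fetched before PC2 and A contribute outcomes.
skipped = shift_history(earlier)
print(in_order, skipped)  # 21 5
```

Because the two histories differ, branch PC6 would index a different pattern-history entry than it was trained on under in-order fetch.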


The compiler might confuse the reconvergence-PC heuristic:
1. The compiler changes code patterns (trace scheduling)
   - But this is performed only on non-difficult branches
2. The compiler changes control instructions (branch to jump)
3. The compiler increases the number of control-dependent instructions (example: tail duplication)
   - This increases the gap length to an unacceptably large number


[Table: per-benchmark results for cc1, compress, go, ijpeg, li, m88ksim, perl, and vortex; columns JRS, Heuristic, Actual, Overshoot, Skipper's, and Superscalar's, grouped under heuristic accuracy, coverage, and misprediction ratio]


[Table: per-benchmark gap statistics for cc1, compress, go, ijpeg, li, m88ksim, perl, and vortex; columns #gaps, #in, #out, #inst, and #slot]