PhD Research Proposal
The Effects of OoO Execution on the Memory System
Aamer Jaleel, University of Maryland, ECE Dept.

SLIDE 1

UNIVERSITY OF MARYLAND

The Effects of OoO Execution on the Memory System
Ph.D. Research Proposal
Aamer Jaleel [email protected]

Proposal Committee:
Dr. Rajeev Barua {[email protected]}
Dr. Bruce Jacob {[email protected]}
Dr. Donald Yeung {[email protected]}


Post on 29-Sep-2020


SLIDE 2

Overview

Motivation
The Problem
  Effects on Replay Traps
  Effects on Cache Performance
A New Metric - Disorder
  Absolute Disorder
  Relative Disorder
Proposed Work
  Sensitivity Studies
  Dynamic Mechanisms

SLIDE 3

Out-of-Order Execution

Overlap idle time caused by long-latency operations with possible useful work
  Floating point latency: 10-15 cycles
  Memory latency: 100-2000 cycles
Widely held belief that a processor's OoO efficiency depends on the number of instructions it views at a given time
  Large Instruction Windows / ROBs!!!
  More Instruction Level Parallelism!!!

SLIDE 4

OoO Hardware Structures

ROB, Integer and Floating Point Issue Queues, Load and Store Queues

[Figure: processor pipeline block diagram. Labels recovered from the diagram: FETCH UNIT, RENAME, BP, IC, LP, RN, SCH, ROB, IQ, FQ, LQ, SQ, with HD and TL markers on the queues.]

BP = Branch Predictor
LP = Line Predictor
IC = Instruction Cache
RN = Register Rename

SLIDE 5

Research Trend in Increasing ILP

Proposals to build large ROBs
  Akkary et al., Lebeck et al., Skadron et al.
Circuit techniques to build large ROBs without affecting clock cycle time
  Brown et al., Henry et al., Onder et al.
Techniques to allow for more load/store queue communications
  Park et al., Akkary et al.

SLIDE 6

Large Instruction Windows

Lebeck et al. show 35-250% performance improvements with large ROBs
However, once the real effects that are usually discounted are considered, most of these performance improvements disappear
  Replay Traps
  Cache Misses

SLIDE 7

Replay Traps - Background

Replay traps are required to
  Force accesses to a particular memory location to occur in order
  Handle different-sized accesses to the same memory location
Two recovery schemes
  Flush the pipeline, then re-fetch and re-execute all instructions starting from the instruction that caused the replay trap
  No flush: re-execute ONLY the instruction that caused the replay trap and all other instructions that directly or indirectly depend on it

SLIDE 8

Types of Replay Traps

Load-Store Replay

(a) Load-Store Replay
  1. LD BYTE A (1)
  2. ST BYTE A (3)
  3. LD BYTE A (2)
  4. LD BYTE B (4)

SLIDE 9

Types of Replay Traps

Load-Store Replay
Wrong Size Replay

(a) Load-Store Replay
  1. LD BYTE A (1)
  2. ST BYTE A (3)
  3. LD BYTE A (2)
  4. LD BYTE B (4)

(b) Wrong Size Replay
  1. LD BYTE A (1)
  2. ST BYTE A (2)
  3. LD HALF A (3)
  4. LD BYTE B (4)

SLIDE 10

Types of Replay Traps

Load-Store Replay
Wrong Size Replay
Multi-Processor Load-Load Replay

(a) Load-Store Replay
  1. LD BYTE A (1)
  2. ST BYTE A (3)
  3. LD BYTE A (2)
  4. LD BYTE B (4)

(b) Wrong Size Replay
  1. LD BYTE A (1)
  2. ST BYTE A (2)
  3. LD HALF A (3)
  4. LD BYTE B (4)

(c) Load-Load Replay (panel labeled P1 and P2 in the figure)
  1. LD BYTE A (4)
  2. ST BYTE A (2)
  3. LD BYTE A (1)
  4. LD BYTE B (3)

SLIDE 11

Types of Replay Traps

Load-Store Replay
Wrong Size Replay
Multi-Processor Load-Load Replay
Load-Miss Load Replay

(a) Load-Store Replay
  1. LD BYTE A (1)
  2. ST BYTE A (3)
  3. LD BYTE A (2)
  4. LD BYTE B (4)

(b) Wrong Size Replay
  1. LD BYTE A (1)
  2. ST BYTE A (2)
  3. LD HALF A (3)
  4. LD BYTE B (4)

(c) Load-Load Replay (panel labeled P1 and P2 in the figure)
  1. LD BYTE A (4)
  2. ST BYTE A (2)
  3. LD BYTE A (1)
  4. LD BYTE B (3)

(d) Load-Miss Load Replay (panel labeled P1 and P2 in the figure)
  1. LD BYTE A (1) [marked + in the figure]
  2. ST BYTE A (2)
  3. LD BYTE A (3)
  4. LD BYTE B (4)
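
As a rough illustration of panels (a) and (b), the Python sketch below flags loads that would need a replay in a short instruction sequence. It assumes that the leading number in each panel is the program-order position and the parenthesized number is the issue-order position, which the slides do not state explicitly. The names MemOp and replay_traps are hypothetical, and the check is a simplified model rather than the simulator's actual trap logic.

```python
from dataclasses import dataclass


@dataclass
class MemOp:
    prog: int   # program-order position (assumed: leading number in the figure)
    issue: int  # issue-order position (assumed: parenthesized number)
    kind: str   # "LD" or "ST"
    size: int   # access size in bytes (BYTE = 1, HALF = 2)
    addr: str   # symbolic address such as "A" or "B"


def replay_traps(ops):
    """Return (load_prog, store_prog, reason) for each load needing a replay."""
    traps = []
    loads = [o for o in ops if o.kind == "LD"]
    stores = [o for o in ops if o.kind == "ST"]
    for ld in loads:
        for st in stores:
            # Only older stores to the same address can conflict with a load.
            if st.addr != ld.addr or st.prog > ld.prog:
                continue
            if ld.issue < st.issue:
                # The load issued before an older store to the same address.
                traps.append((ld.prog, st.prog, "load-store"))
            elif ld.size != st.size:
                # Issued in order, but the access sizes differ.
                traps.append((ld.prog, st.prog, "wrong-size"))
    return traps


panel_a = [MemOp(1, 1, "LD", 1, "A"), MemOp(2, 3, "ST", 1, "A"),
           MemOp(3, 2, "LD", 1, "A"), MemOp(4, 4, "LD", 1, "B")]
panel_b = [MemOp(1, 1, "LD", 1, "A"), MemOp(2, 2, "ST", 1, "A"),
           MemOp(3, 3, "LD", 2, "A"), MemOp(4, 4, "LD", 1, "B")]

print(replay_traps(panel_a))  # instruction 3 conflicts with store 2: load-store
print(replay_traps(panel_b))  # instruction 3 conflicts with store 2: wrong-size
```

The multiprocessor cases in panels (c) and (d) are left out because the assignment of instructions to P1 and P2 cannot be recovered from the transcript.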

SLIDE 12

The Problem

Aggressive OoO techniques result in an
  Increase in frequency of replay traps
  Increase in number of L1 cache misses

[Figure: two bar charts over the benchmarks applu, art, mgrid, swim, gcc, gzip, mcf, twolf, FAvg, and IAvg. First chart: % Total Execution Time Spent in Handling Traps (y-axis 0-60). Second chart: % Increase in L1 Cache Misses Compared to 80-Entry ROB (y-axis 0-40). Series: 80 Entry ROB, 128 Entry ROB, 256 Entry ROB, 512 Entry ROB.]

SLIDE 13

Hypothesis

Increasing ROB sizes reorders both ALU and memory instructions
Reordering of ALU instructions poses little or no threat, BUT reordering of memory instructions causes most of the negative effects

3 Processor Configurations
  ALU-in / MEM-in: ALU and memory instructions issued in order (sequential)
  ALU-out / MEM-in: ALU instructions issued out of order, memory instructions issued in order (more speculation than ALU-in/MEM-in)
  ALU-out / MEM-out: both ALU and memory instructions issued OoO (superscalar)

SLIDE 14

Experimental Framework

Sim-Alpha
  Validated Execution-Driven Simulator
  Detailed DRAM Simulator
  64K 2-way IL1/DL1 and 2MB 4-way Unified L2
  8 MSHRs per cache
  1024-entry store-wait data structure
  Supports No / Sequential / Stride Data Prefetch
Benchmarks
  SPEC2000 & Olden

Table 1: Processor Parameters

Configuration Name   ROB Size   Issue Width   IssueQ Size INT/FP   # Functional Units**   LQ/SQ Size
Alpha-80             80         2-32 Way      20/15                (4/4/1/1) x1           32/32
Alpha-128            128        2-32 Way      40/30                (4/4/1/1) x2           64/64
Alpha-256            256        2-32 Way      80/60                (4/4/1/1) x4           128/128
Alpha-512            512        2-32 Way      160/120              (4/4/1/1) x8           256/256

**INT ALU/INT MULT/FP ALU/FP MULT

SLIDE 15

OoO And Replay Trap Overhead

ALU-in/MEM-in --> ALU-out/MEM-out
  Factor of 8-15 increase in trap frequency
  15-30% increase in trap overhead

[Figure: bar chart of Trap Overhead: Cycles Lost in Traps (y-axis 0-50) for the 8-Way 80, 8-Way 128, 8-Way 256, and 8-Way 512 configurations. Series: ALU-in / MEM-in, ALU-out / MEM-in, ALU-out / MEM-out.]

SLIDE 16

OoO And Cache Misses

ALU-in/MEM-in --> ALU-out/MEM-out
  25-75% increase in L1 Cache Misses
  80-125% increase in L2 Cache Misses

[Figure: two bar charts for the 8-Way 80, 8-Way 128, 8-Way 256, and 8-Way 512 configurations, titled % Increase in L1 Cache Misses and % Increase in L2 Cache Misses (y-axis ticks 0-100 on one chart and 0-200 on the other). Normalized to # L1 Cache Misses of ALU-in/MEM-in. Series: ALU-out / MEM-in, ALU-out / MEM-out.]

SLIDE 17

What Have We Learned?

In general, increasing a processor's OoO capability conflicts with the memory ordering requirements and cache performance
Limiting memory instructions to issue in program order helps reduce these negative effects
  Downside of MEM in-order: memory ILP is hurt
Future aggressive OoO processors need a mechanism to throttle the degree to which they issue memory instructions out of order

SLIDE 18

Introducing A New Metric

Disorder: the degree to which an instruction is issued out of order

Two types of disorder
  Absolute disorder: from a program-order perspective
  Relative disorder: from a per-instruction-executed perspective

[Figure: a Scheduler maps the Initial Program Order (INST 1, INST 2, INST 3, INST 4, INST 5, INST 6, INST 7) to the Executed Program Order (INST 6, INST 3, INST 1, INST 4, INST 5, INST 7, INST 2), with Absolute Disorder and Relative Disorder annotations.]

SLIDE 19

Disorder Measurements

We measure disorder ONLY for memory instructions
When a memory instruction is fetched, it is assigned a sequential ID, i.e. the first memory instruction fetched gets sequential ID 1, the next gets sequential ID 2, and so on
In the event of a pipeline flush, the sequential ID is restored to the last successfully retired sequential ID + 1
Disorder is computed after a memory instruction has its dependencies resolved and is issued to execute
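
The ID bookkeeping described on this slide can be sketched in Python as below. This is a minimal model under stated assumptions: the class name SeqIdAllocator and its methods are hypothetical, and memory instructions are assumed to retire in program order.

```python
class SeqIdAllocator:
    """Hands out sequential IDs to memory instructions at fetch time."""

    def __init__(self):
        self.next_id = 1        # the first fetched memory instruction gets ID 1
        self.last_retired = 0   # no memory instruction has retired yet

    def fetch(self):
        """Assign the next sequential ID to a newly fetched memory instruction."""
        sid = self.next_id
        self.next_id += 1
        return sid

    def retire(self, sid):
        """Record a successful retirement (assumed to happen in program order)."""
        self.last_retired = max(self.last_retired, sid)

    def flush(self):
        """On a pipeline flush, restart numbering after the last retired ID."""
        self.next_id = self.last_retired + 1


alloc = SeqIdAllocator()
fetched = [alloc.fetch() for _ in range(5)]  # IDs 1 through 5 are in flight
alloc.retire(1)
alloc.retire(2)
alloc.flush()                                # IDs 3 through 5 are squashed
refetched = alloc.fetch()                    # numbering resumes at ID 3
print(fetched, refetched)
```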

SLIDE 20

Absolute Disorder

Degree to which an instruction is issued OoO compared to fetch order
Computed by finding the difference between the instruction issued and the instruction that should've been issued were the core in-order

4-Way Issue Order
  Cycle 101: 1, 3
  Cycle 105: 5, 7, 8
  Cycle 126: 2, 10
  Cycle 139: 4
  Cycle 213: 9, 11
  Cycle 224: 6

PROGRAM ORDER   ISSUE ORDER   ABSOLUTE DISORDER
MEM 1           MEM 1          0
MEM 2           MEM 3          1
MEM 3           MEM 5          2
MEM 4           MEM 7          3
MEM 5           MEM 8          3
MEM 6           MEM 2         -4
MEM 7           MEM 10         3
MEM 8           MEM 4         -4
MEM 9           MEM 9          0
MEM 10          MEM 11         1
MEM 11          MEM 6         -5
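
The ABSOLUTE DISORDER column of this slide can be reproduced with a short Python sketch. It assumes that the instruction an in-order core would have issued at the k-th issue slot is the one with sequential ID k, and it ignores pipeline flushes. The function name absolute_disorder is hypothetical.

```python
def absolute_disorder(issue_order):
    """Return issued ID minus expected ID for each issue slot.

    issue_order lists sequential IDs in the order they were issued.
    The expected ID at slot k (1-based) is assumed to be k.
    """
    return [sid - expected
            for expected, sid in enumerate(issue_order, start=1)]


# Issue order from the slide: cycles 101, 105, 126, 139, 213, 224 flattened.
issue_order = [1, 3, 5, 7, 8, 2, 10, 4, 9, 11, 6]
print(absolute_disorder(issue_order))
# → [0, 1, 2, 3, 3, -4, 3, -4, 0, 1, -5]
```

A positive value means the instruction issued ahead of its program-order slot, and a negative value means it issued late.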

SLIDE 21

Relative Disorder

Degree to which an instruction is issued OoO compared to other instructions
Computed by finding the minimum absolute disorder between an instruction and all other instructions issued in the current and previous cycle

4-Way Issue Order
  Cycle 101: 1, 3
  Cycle 105: 5, 7, 8
  Cycle 126: 2, 10
  Cycle 139: 4
  Cycle 213: 9, 11
  Cycle 224: 6

Relative Disorder of MEM 10
  10 - 2 = 8
  10 - 5 = 5
  10 - 7 = 3
  10 - 8 = 2
  Minimum = 2

PROGRAM ORDER   RELATIVE DISORDER
MEM 1            1
MEM 2           -3
MEM 3            2
MEM 4            2
MEM 5            2
MEM 6            3
MEM 7           -1
MEM 8            1
MEM 9           -2
MEM 10           2
MEM 11           2
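
The MEM 10 worked example can be sketched in Python as follows. The sketch assumes that "previous cycle" means the preceding cycle in which memory instructions were issued, and that the result is the signed difference of smallest magnitude. The sign and tie-breaking conventions are assumptions, so the sketch is not claimed to reproduce every entry of the column above. The function name relative_disorder is hypothetical.

```python
def relative_disorder(cycles, target):
    """Signed smallest-magnitude ID difference between target and its neighbours.

    cycles is a list of (cycle_number, [sequential IDs issued]) in issue order.
    Neighbours are the other IDs issued in the same cycle plus all IDs issued
    in the preceding issue cycle (an assumed reading of the slide).
    """
    for i, (_, ids) in enumerate(cycles):
        if target in ids:
            neighbours = [x for x in ids if x != target]
            if i > 0:
                neighbours = neighbours + list(cycles[i - 1][1])
            if not neighbours:
                return None  # nothing to compare against
            return min((target - x for x in neighbours), key=abs)
    raise ValueError(f"ID {target} was never issued")


cycles = [(101, [1, 3]), (105, [5, 7, 8]), (126, [2, 10]),
          (139, [4]), (213, [9, 11]), (224, [6])]
print(relative_disorder(cycles, 10))  # differences 8, 5, 3, 2, so the minimum is 2
```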

SLIDE 22

Disorder Trends

Vary Issue Widths from 4 to 32 wide
Vary ROB Sizes from 80 to 512 entries

[Figure: grid of Absolute Disorder and Relative Disorder plots (% Instr versus disorder, ranging from - Disorder to + Disorder), arranged by Increasing Issue Widths and Increasing ROB Sizes. Callouts in the figure: 60%, 30%, 70%, 9%.]

SLIDE 23

Disorder Results

Low disorder is due to dependency stalls and L1 miss penalty. High disorder is due to L2 miss penalty
Absolute Disorder
  Increasing ROB sizes, not issue widths, causes an increase in absolute disorder
  < 1/3rd of TOTAL memory instructions executed are issued in actual program order
Relative Disorder
  60-70% of memory instructions issued are in close proximity to each other in the instruction stream
  Spatial issue among memory instructions, i.e. "when a processor speculates it continues to speculate"

SLIDE 24

Proposed Work

Sensitivity Studies
  Cache Organization Parameters
  Load/Store Queue Studies
Dynamic Mechanisms
  Windowing of Load/Store Queue
  ?????

SLIDE 25

Disorder Vs. Cache Organization

Cache Parameters
  Cache Size
  Cache Line Size
  Cache Associativity

[Figure: grid of plots with axes labeled Vary Cache Associativity and Vary Cache Size.]

SLIDE 26

Disorder Vs. Load/Store Queue

Introduce a virtual window into the load/store queue
  Only instructions residing within the virtual window may be issued
  Others wait until the virtual window slides onto them

[Figure: two load/store queues, each holding LD/ST 0 through LD/ST N between LSQ Head and LSQ Tail. Traditional Load/Store Queue: Virtual Window Size = Inf. Virtual Load/Store Queue: Virtual Window Size = 4, bounded by Virtual Head and Virtual Tail.]
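
One way to picture the virtual window is the Python sketch below. It is an illustration under stated assumptions, not the proposed hardware: the window is assumed to be anchored at the LSQ head and to slide as the oldest entries leave the queue, and a window size of None stands in for "Inf". The class name VirtualWindowLSQ and its methods are hypothetical.

```python
from collections import deque


class VirtualWindowLSQ:
    """Load/store queue in which only entries inside a virtual window may issue."""

    def __init__(self, window_size=None):
        self.entries = deque()          # oldest entry (LSQ head) on the left
        self.window_size = window_size  # None models an unbounded window

    def insert(self, op):
        """Append a newly dispatched load or store at the LSQ tail."""
        self.entries.append(op)

    def issuable(self):
        """Entries currently allowed to issue (those inside the window)."""
        if self.window_size is None:
            return list(self.entries)
        return list(self.entries)[:self.window_size]

    def retire_head(self):
        """Remove the oldest entry, which slides the window by one position."""
        return self.entries.popleft()


lsq = VirtualWindowLSQ(window_size=4)
for i in range(8):
    lsq.insert(f"LD/ST {i}")
before = lsq.issuable()   # LD/ST 0 through LD/ST 3
lsq.retire_head()
after = lsq.issuable()    # LD/ST 1 through LD/ST 4
print(before, after)
```

With window_size=None the same class behaves like the traditional queue in the figure, where every entry is a candidate for issue.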

SLIDE 27

Windowing of Load/Store Queue

Static Mechanism (Sensitivity Study):
  Statically set the size of the virtual window based on profile information of the application in question
  Drawback: memory ILP is lost during periods of application execution where negative effects do not exist
Dynamic Mechanism:
  Dynamically vary the size of the virtual window based on application behavior during execution
  Virtual window initially starts at infinity, but as certain thresholds are reached the virtual window size is reduced
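
The slide leaves the thresholds and the shrink policy open. Purely as an illustration, the Python sketch below halves the window whenever the number of replay traps in a sampling interval exceeds a threshold. The trigger (trap count), the halving policy, the minimum size, and the function name next_window_size are all assumptions made for this example.

```python
def next_window_size(current, lsq_capacity, traps_in_interval,
                     trap_threshold, min_size=4):
    """Return the virtual window size to use for the next interval.

    current is None while the window is still unbounded ("infinity").
    The window only ever shrinks, matching the description on the slide.
    """
    if traps_in_interval <= trap_threshold:
        return current  # behavior is acceptable, keep the window as it is
    # Treat an unbounded window as spanning the whole physical LSQ.
    bounded = lsq_capacity if current is None else current
    return max(min_size, bounded // 2)


size = None                                # starts at infinity
size = next_window_size(size, 64, 10, 5)   # threshold exceeded, becomes 32
size = next_window_size(size, 64, 3, 5)    # below threshold, stays 32
print(size)
```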

SLIDE 28

Conclusions

Though the well-known mechanism of increasing ILP improves performance, it can cause side effects in the memory system
Characterized the problem in terms of
  Increased Replay Traps
  Increased Cache Misses
Source of the problem is the reordering of memory instructions
Proposing to study mechanisms to throttle the degree to which memory instructions are issued OoO

SLIDE 29

Questions?????

SLIDE 30

Improving Performance

Execution Time = Cycle time * CPI * Inst Cnt

Instruction Level Parallelism (ILP)
  Pipelining
  Multiple Issue Width
  Out-of-Order Execution
Speculation
  Cache Line Prediction
  Branch Prediction
Prefetching
  Hardware Prefetching
  Software Prefetching

SLIDE 31

Industry Trends

Two Design Philosophies
  Brainiacs: improve microprocessor performance by increasing ILP
  Speed Demons: improve microprocessor performance by increasing clock speeds

SLIDE 32

Why Use a Virtual Window?

Provide a mechanism that allows large ROBs to exploit ALU instruction ILP yet provides the benefits of smaller ROBs
Since memory instructions are held in load/store queues (LSQ), one could have large ROB sizes and small LSQs
  Effective in limiting the number of memory instructions in flight, hence reduces disorder
  Inefficient design methodology, as it can underutilize the ROB space
Allow for large ROB and LSQ sizes but create a virtual window in the LSQ
  Statically or dynamically vary the size of the virtual window

SLIDE 33

Performance Graphs

[Figure: three stacked-bar charts, one per configuration (ALU-in / MEM-in, ALU-out / MEM-in, ALU-out / MEM-out), over the benchmarks art, gcc, mcf, parser, perlbmk, swim, twolf, vortex, and vpr (y-axis 0-3). Stack components: Memory, ALU, Overhead. Each benchmark has bars a through i:]
  a - 2-Way ROB 80
  b - 4-Way ROB 80
  c - 8-Way ROB 80
  d - 4-Way ROB 128
  e - 8-Way ROB 128
  f - 4-Way ROB 256
  g - 8-Way ROB 256
  h - 4-Way ROB 512
  i - 8-Way ROB 512

SLIDE 34

Performance Graphs

[Figure: three stacked-bar charts, one per configuration (ALU-in / MEM-in, ALU-out / MEM-in, ALU-out / MEM-out), over the benchmarks art, gcc, mcf, parser, perlbmk, swim, twolf, vortex, vpr, bisort, em3d, health, mst, perimeter, and treeadd (y-axis 0-7). Stack components: Memory Stall in Commit, Non-Memory Stall in Commit, Overhead. Each benchmark has bars a through i:]
  a - 2-Way ROB 80
  b - 4-Way ROB 80
  c - 8-Way ROB 80
  d - 4-Way ROB 128
  e - 8-Way ROB 128
  f - 4-Way ROB 256
  g - 8-Way ROB 256
  h - 4-Way ROB 512
  i - 8-Way ROB 512

SLIDE 35

The Problem - Cache Misses

Aggressive OoO techniques result in an increase in application cache misses

[Figure: bar chart of % Increase in L1 Cache Misses Compared to 80-Entry ROB (y-axis 0-40) over the benchmarks ammp, applu, apsi, art, equake, fma3d, galgel, lucas, mesa, mgrid, swim, wupwise, bzip2, crafty, eon, gap, gcc, gzip, mcf, parser, perlbmk, twolf, vortex, vpr, FAVG, and IAVG. Series: 80 Entry ROB, 128 Entry ROB, 256 Entry ROB, 512 Entry ROB.]

SLIDE 36

The Problem - Replay Traps

Aggressive OoO techniques result in an increase in replay traps
  Aggravated by increasing ROB sizes

[Figure: bar chart of % Total Execution Time Spent in Handling Traps (y-axis 0-80) over the benchmarks ammp, applu, apsi, art, equake, fma3d, galgel, lucas, mesa, mgrid, swim, wupwise, bzip2, crafty, eon, gap, gcc, gzip, mcf, parser, perlbmk, twolf, vortex, vpr, FAVG, and IAVG. Series: 80 Entry ROB, 128 Entry ROB, 256 Entry ROB, 512 Entry ROB.]

SLIDE 37

[Figure: two charts for the 2-Way 80, 8-Way 80, 8-Way 128, 8-Way 256, and 8-Way 512 configurations. First chart: Decrease in L1 Cache Misses (y-axis -30 to 0). Second chart: Trap Frequency: Instructions/Trap (log-scale y-axis, 1 to 1e6).]