
Post on 19-Dec-2015


1998 Morgan Kaufmann Publishers

Chapter Six

Enhancing Performance with Pipelining


Definition

• Pipelining is an implementation technique in which multiple instructions are overlapped in execution.

• We’ll use a laundry analogy for pipelining to explain the main concepts.

• There are four stages in doing the laundry:

– put dirty clothes in the washer (wash)

– place washed clothes in the dryer (dry)

– place the dry load on the table and fold (fold)

– put clothes away (store)

• What about MIPS instructions?


Single-Cycle vs Pipelined Performance

• Look at lw, sw, add, sub, and, or, slt and beq.

• Operation time for major functional components:

– 200 ps for memory access

– 200 ps for ALU operation

– 100 ps for register file read or write

• Total execution time for 3 instructions:

– 3 x 800 ps = 2.4 ns for a single-cycle, non-pipelined processor

– 1.4 ns (see the figure on the next page) for a pipelined processor

• Total execution time for 1003 instructions:

– 1000 x 800 ps + 2400 ps = 802.4 ns for a single-cycle, non-pipelined processor

– 1000 x 200 ps + 1400 ps = 201.4 ns for a pipelined processor

• Speedup is less than the number of stages because:

– stages may be imperfectly balanced

– overhead is involved
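The totals above can be reproduced with a short Python sketch. It assumes that lw is the slowest instruction (two memory accesses, one ALU operation, one register read and one register write, 800 ps in all) and that the pipelined clock is set by the slowest stage (200 ps). The function and constant names are illustrative and do not come from the slides.

```python
# Stage latencies in picoseconds, taken from the component times on the slide.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def single_cycle_ps(num_instructions):
    # Every instruction takes one long cycle sized for the slowest one (lw).
    return num_instructions * sum(STAGE_PS.values())


def pipelined_ps(num_instructions):
    # The clock is set by the slowest stage. The first instruction needs one
    # cycle per stage; each later instruction completes one cycle after it.
    cycle = max(STAGE_PS.values())
    return (len(STAGE_PS) + num_instructions - 1) * cycle


def speedup(num_instructions):
    return single_cycle_ps(num_instructions) / pipelined_ps(num_instructions)
```

For 1003 instructions the ratio is 802400 / 201400, a little under 4. That is below the five-stage ideal because the 100 ps stages are padded out to the 200 ps clock.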


Pipelining

• Improve performance by increasing instruction throughput

• Each instruction still takes the same time to execute

• Ideal speedup is the number of stages in the pipeline. Do we achieve this?

Figure: execution of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0), each passing through instruction fetch, register read, ALU, data access, and register write. Top (nonpipelined): a new instruction starts every 8 ns. Bottom (pipelined): a new instruction starts every 2 ns. Axes: time (ns) against program execution order (in instructions).

Pipelining in MIPS - What makes it easy?

• All instructions are the same length: instruction fetch (1st pipeline stage) and decoding (2nd stage) are much easier

• MIPS has just a few instruction formats, with the source register fields in the same location in each ==> register file read and instruction decoding can be done at the same time

• Memory operands appear only in loads and stores (as opposed to 80x86, where we could operate on the operands in memory)

• Operands must be aligned in memory: need not worry about a single data transfer instruction requiring two data memory accesses.

Pipelining in MIPS - What makes it hard?

• Structural hazards: suppose we had only one memory

• Control hazards: need to worry about branch instructions

• Data hazards: an instruction depends on a previous instruction


Structural Hazards

• What if we add a fourth instruction to the following figure?

• What happens between time 6 and 8 ns?

Figure: pipelined execution of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0), with a new instruction starting every 2 ns. Axes: time (ns, 2 to 14) against program execution order (in instructions).

Control Hazards

• Possible solution:

– stall: pause before continuing the pipeline; not efficient if we have a long pipeline

– a pipeline stall is also known as a bubble

Figure: pipeline stalling on a conditional branch. Instructions shown: beq $1, $2, 40; add $4, $5, $6; lw $3, 300($0). The intervals between instruction starts are labeled 2 ns and 4 ns, the longer one being the stall. Axes: time (ns, 2 to 16) against program execution order (in instructions).

The above figure assumes that we have extra hardware in place to resolve the branch in the second stage. Otherwise the pause will be longer than 4 ns.

Control Hazards

• Another solution: Predict

Figure: branch prediction, in two panels. Top: beq $1, $2, 40; add $4, $5, $6; lw $3, 300($0), with a new instruction starting every 2 ns and no stall. Bottom: beq $1, $2, 40; add $4, $5, $6; or $7, $8, $9, with a 4 ns gap filled by a bubble before the or. Axes: time (ns, 2 to 14) against program execution order (in instructions).

Control Hazards

• Delayed branch:

Figure: delayed branch. Instructions shown: beq $1, $2, 40; add $4, $5, $6 (delayed branch slot); lw $3, 300($0), with a new instruction starting every 2 ns. Axes: time (ns, 2 to 14) against program execution order (in instructions).

Data Hazards

• Look at the following example:

add $s0, $t0, $t1

sub $t2, $s0, $t3

• We need the result $s0 from the add instruction to do the subtraction.

• Is the data ready?

• Relying on the compiler alone is not enough: dependences like this occur too often

• Solution: forwarding or bypassing, i.e., getting the missing item early from the internal resources.


Graphical representation of the instruction pipeline

• IF: instruction fetch

• ID: instruction decode

• EX: execution

• MEM: memory access

• WB: write back

• Shading: element used, White: element not used

• Right-shading: read, Left-Shading: write

Time (ns): 2 4 6 8 10

add $s0, $t0, $t1: IF ID EX MEM WB

Forwarding

• As soon as the ALU finishes the add, forward the result to the sub

Figure: add $s0, $t0, $t1 followed by sub $t2, $s0, $t3, each drawn as IF ID EX MEM WB over a time axis of 2 to 10 ns, with the add result forwarded to the sub. Vertical axis: program execution order (in instructions).

Forwarding with stall

• When an R-format instruction immediately follows a load and tries to use the loaded data, a load-use data hazard occurs.

• Forwarding alone is not enough; we need to stall in this case.

Figure: lw $s0, 20($t1) followed by sub $t2, $s0, $t3, each drawn as IF ID EX MEM WB over a time axis of 2 to 14 ns, with a bubble inserted between them. Vertical axis: program execution order (in instructions).

Reordering Code to Avoid Pipeline Stalls

• Original code:

# register $t1 has the address of v[k]
lw $t0, 0($t1) # reg $t0 = v[k]
lw $t2, 4($t1) # reg $t2 = v[k+1]
sw $t2, 0($t1) # v[k] = reg $t2
sw $t0, 4($t1) # v[k+1] = reg $t0

• Data hazard occurs on register $t2 between the second lw and the first sw

• Modified code removes the hazard

# register $t1 has the address of v[k]
lw $t0, 0($t1) # reg $t0 = v[k]
lw $t2, 4($t1) # reg $t2 = v[k+1]
sw $t0, 4($t1) # v[k+1] = reg $t0
sw $t2, 0($t1) # v[k] = reg $t2
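The effect of the reordering can be checked with a small Python sketch. It uses a deliberately simple model that is an assumption here, not something stated on the slide: an instruction costs one stall cycle when it reads a register loaded by the lw immediately before it, and nothing else stalls. The helper names are hypothetical.

```python
import re


def parse(instruction):
    """Split 'lw $t0, 0($t1)' into ('lw', ['$t0', '$t1'])."""
    opcode, operands = instruction.split(None, 1)
    return opcode, re.findall(r"\$\w+", operands)


def load_use_stalls(program):
    """Count instructions that read a register loaded by the lw just before them."""
    stalls = 0
    for previous, current in zip(program, program[1:]):
        prev_op, prev_regs = parse(previous)
        if prev_op != "lw":
            continue
        curr_op, curr_regs = parse(current)
        # sw reads both registers it names; lw and R-format instructions
        # write the first register listed and read the rest.
        sources = curr_regs if curr_op == "sw" else curr_regs[1:]
        if prev_regs[0] in sources:
            stalls += 1
    return stalls


ORIGINAL = ["lw $t0, 0($t1)", "lw $t2, 4($t1)", "sw $t2, 0($t1)", "sw $t0, 4($t1)"]
REORDERED = ["lw $t0, 0($t1)", "lw $t2, 4($t1)", "sw $t0, 4($t1)", "sw $t2, 0($t1)"]
```

Under this model the original order has one load-use stall (on $t2) and the reordered version has none.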


A Pipelined Datapath

• What do we need to add to actually split the datapath into stages?

Figure: the single-cycle datapath (PC, instruction memory, registers, sign extend, ALU, data memory, adders, and multiplexors) divided into five stages:

IF: Instruction fetch

ID: Instruction decode/register file read

EX: Execute/address calculation

MEM: Memory access

WB: Write back

Pipelined Datapath

Can you find a problem even if there are no dependencies? What instructions can we execute to manifest the problem?

Figure: the pipelined datapath, with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB separating the five stages.

IF Stage

Figure: the pipelined datapath in two panels, labeled lw in instruction fetch (top) and lw in instruction decode (bottom).

ID Stage

Figure: the pipelined datapath in two panels, labeled lw in instruction fetch (top) and lw in instruction decode (bottom), repeated from the IF Stage slide.

EX Stage

Figure: the pipelined datapath, labeled lw in execution.

MEM Stage

Figure: the pipelined datapath in two panels, labeled lw in memory (top) and lw in write back (bottom). Source label: 97108/Patterson Figure 06.15.

WB Stage

Figure: the pipelined datapath in two panels, labeled lw in memory (top) and lw in write back (bottom), repeated from the MEM Stage slide. Source label: 97108/Patterson Figure 06.15.

Corrected Datapath

Figure: the corrected pipelined datapath, with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB.

Portions of the Datapath used by a load instruction

Figure: the pipelined datapath with the portions used by a load instruction highlighted.

Graphically Representing Pipelines

• Can help with answering questions like:

– how many cycles does it take to execute this code?

– what is the ALU doing during cycle 4?

– use this representation to help understand datapaths

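Figure: multiple-clock-cycle pipeline diagram over CC 1 to CC 6 (time in clock cycles) against program execution order (in instructions). The lw occupies CC 1 to CC 5 and the sub CC 2 to CC 6.

lw $10, 20($1): IM Reg ALU DM Reg

sub $11, $2, $3: IM Reg ALU DM Reg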
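The two questions on this slide can also be answered mechanically. The Python sketch below assumes an ideal five-stage pipeline with no stalls, where instruction i (counting from 0) enters IF in clock cycle i + 1. The function names are illustrative only.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def total_cycles(num_instructions):
    # The first instruction takes one cycle per stage; each later
    # instruction finishes one cycle after the one before it.
    return len(STAGES) + num_instructions - 1


def stage_in_cycle(instruction_index, cycle):
    """Stage occupied in a 1-based clock cycle, or None if not in the pipeline."""
    position = cycle - 1 - instruction_index
    return STAGES[position] if 0 <= position < len(STAGES) else None


def instruction_in_stage(program, stage, cycle):
    """Which instruction of the program occupies the given stage in that cycle."""
    for index, text in enumerate(program):
        if stage_in_cycle(index, cycle) == stage:
            return text
    return None


PROGRAM = ["lw $10, 20($1)", "sub $11, $2, $3"]
```

For the two instructions in the figure, total_cycles gives 6, matching CC 1 to CC 6, and the ALU (the EX stage) is working on the sub during cycle 4.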


Pipeline Control

Figure: the pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) with its control lines labeled: PCSrc, RegWrite, RegDst, ALUSrc, ALUOp, Branch, MemRead, MemWrite, and MemtoReg. The instruction fields Instruction[20-16], Instruction[15-11], and Instruction[15-0] are also labeled.

Pipeline control

• We have 5 stages. What needs to be controlled in each stage?

– Instruction Fetch and PC Increment

– Instruction Decode / Register Fetch

– Execution: RegDst, ALUOp, ALUSrc

– Memory Stage: Branch, MemRead, MemWrite

– Write Back: MemtoReg, RegWrite

• How would control be handled in an automobile plant?

– a fancy control center telling everyone what to do?

– should we use a finite state machine?

Pipeline Control

• Pass control signals along just like the data

Control line values, grouped by the pipeline stage that uses them:

             Execution/Address Calculation    Memory access             Write-back
Instruction  RegDst ALUOp1 ALUOp0 ALUSrc      Branch MemRead MemWrite   RegWrite MemtoReg
R-format     1      1      0      0           0      0       0          1        0
lw           0      0      0      1           0      1       0          1        1
sw           X      0      0      1           0      0       1          0        X
beq          X      0      1      0           1      0       0          0        X

Figure: the control unit generates the EX, M, and WB control groups from the instruction in IF/ID. ID/EX holds EX, M, and WB; EX/MEM holds M and WB; MEM/WB holds WB.
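As a sketch of what "pass control signals along just like the data" means, the table can be written as a Python dictionary keyed by the EX, M, and WB groups, with each pipeline register keeping only the groups that later stages still need. The dictionary layout and the function name are assumptions made for illustration; the signal values are the ones in the table.

```python
X = None  # "don't care" entries in the table

CONTROL = {
    "R-format": {
        "EX": {"RegDst": 1, "ALUOp1": 1, "ALUOp0": 0, "ALUSrc": 0},
        "M": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 0},
    },
    "lw": {
        "EX": {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "M": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 1},
    },
    "sw": {
        "EX": {"RegDst": X, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "M": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
        "WB": {"RegWrite": 0, "MemtoReg": X},
    },
    "beq": {
        "EX": {"RegDst": X, "ALUOp1": 0, "ALUOp0": 1, "ALUSrc": 0},
        "M": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 0, "MemtoReg": X},
    },
}

# Control groups still carried by each pipeline register.
GROUPS_HELD = {
    "ID/EX": ("EX", "M", "WB"),
    "EX/MEM": ("M", "WB"),
    "MEM/WB": ("WB",),
}


def carried_signals(instruction, pipeline_register):
    """Control groups an instruction still carries in the given pipeline register."""
    groups = CONTROL[instruction]
    return {name: groups[name] for name in GROUPS_HELD[pipeline_register]}
```

For example, by the time a lw reaches MEM/WB only its WB group (RegWrite = 1, MemtoReg = 1) is left.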


Datapath with Control

Figure: the pipelined datapath with the control unit added. The EX, M, and WB control groups travel through ID/EX, EX/MEM, and MEM/WB. Labeled control lines: RegDst, ALUOp, ALUSrc, Branch, MemRead, MemWrite, RegWrite, MemtoReg, and PCSrc.

Dependencies

• Problem with starting the next instruction before the first is finished

– dependencies that go backward in time are data hazards

Figure: pipeline diagram over CC 1 to CC 9 (time in clock cycles) against program execution order (in instructions) for the sequence below, showing the dependences on $2.

sub $2, $1, $3

and $12, $2, $5

or $13, $6, $2

add $14, $2, $2

sw $15, 100($2)

Value of register $2: 10 in CC 1 to CC 4, 10/-20 in CC 5, -20 in CC 6 to CC 9

Hazard Conditions

• Type 1.a EX/MEM.RegisterRd = ID/EX.RegisterRs

• Type 1.b EX/MEM.RegisterRd = ID/EX.RegisterRt

• Type 2.a MEM/WB.RegisterRd = ID/EX.RegisterRs

• Type 2.b MEM/WB.RegisterRd = ID/EX.RegisterRt

• Classify the dependencies in the following sequence:

sub $2, $1, $3 # Reg. $2 set by sub
and $12, $2, $5 # 1st operand ($2)
or $13, $6, $2 # 2nd operand ($2)
add $14, $2, $2
sw $15, 100($2)

• sub-and: Type 1.a hazard

• sub-or: Type 2.b hazard

• sub-add: no hazard, sub-sw: no hazard
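The four conditions can be applied to the sequence with a short Python sketch. It assumes that while instruction i is in EX, EX/MEM holds instruction i-1 and MEM/WB holds instruction i-2, and it checks only the register-number comparisons listed on the slide (it does not model RegWrite, register $0, or priority between the two types). The tuple layout and names are hypothetical.

```python
# Each entry: (name, rd, rs, rt); rd is None when no register is written.
SEQUENCE = [
    ("sub", 2, 1, 3),
    ("and", 12, 2, 5),
    ("or", 13, 6, 2),
    ("add", 14, 2, 2),
    ("sw", None, 2, 15),  # sw $15, 100($2): rs is the base, rt the stored value
]


def classify_hazards(program):
    """Return (producer, consumer, type) for every condition that matches."""
    found = []
    for i, (consumer, _rd, rs, rt) in enumerate(program):
        # distance 1: producer is in EX/MEM (type 1);
        # distance 2: producer is in MEM/WB (type 2).
        for distance, prefix in ((1, "1"), (2, "2")):
            if i - distance < 0:
                continue
            producer, producer_rd, _, _ = program[i - distance]
            if producer_rd is None:
                continue
            if producer_rd == rs:
                found.append((producer, consumer, prefix + ".a"))
            if producer_rd == rt:
                found.append((producer, consumer, prefix + ".b"))
    return found
```

Run on SEQUENCE, the sketch reports sub-and as type 1.a and sub-or as type 2.b, and nothing for add or sw, since by then the sub is more than two instructions ahead.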

Forwarding

• Use temporary results, don’t wait for them to be written

– register file forwarding to handle read/write to same register

– ALU forwarding

Figure: pipeline diagram over CC 1 to CC 9 (time in clock cycles) for sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2), with the sub result forwarded from the pipeline registers. Annotation on the figure: what if this $2 was $13?

Value of register $2: 10 in CC 1 to CC 4, 10/-20 in CC 5, -20 in CC 6 to CC 9

Value of EX/MEM: -20 in CC 4, X in all other cycles

Value of MEM/WB: -20 in CC 5, X in all other cycles

Forwarding

Figure: the pipelined datapath with a forwarding unit. The unit receives EX/MEM.RegisterRd and MEM/WB.RegisterRd together with the Rs and Rt register numbers carried in ID/EX, and controls the multiplexors on the ALU inputs.

Can't always forward

• Load word can still cause a hazard:

– an instruction tries to read a register following a load instruction that writes to the same register.

• Thus, we need a hazard detection unit to stall the instruction that follows the load

Figure: pipeline diagram over CC 1 to CC 9 (time in clock cycles) against program execution order (in instructions) for the sequence below.

lw $2, 20($1)
and $4, $2, $5
or $8, $2, $6
add $9, $4, $2
slt $1, $6, $7

Stalling

• We can stall the pipeline by keeping an instruction in the same stage

Figure: pipeline diagram over CC 1 to CC 10 (time in clock cycles) for lw $2, 20($1); and $4, $2, $5; or $8, $2, $6; add $9, $4, $2; slt $1, $6, $7. A bubble is inserted after the lw: the and repeats its register read, the or repeats its instruction fetch, and every later instruction finishes one cycle later.

Hazard Detection Unit

• Stall by letting an instruction that won’t write anything go forward

Figure: the pipelined datapath with a hazard detection unit added alongside the forwarding unit. Its labeled inputs are ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, and IF/ID.RegisterRt; its labeled outputs are PCWrite, IF/IDWrite, and the select of a multiplexor that can replace the control signals entering ID/EX with 0.
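The slides show only the inputs of the hazard detection unit, so the Python sketch below fills in the usual textbook load-use test as an assumption: stall when the instruction in EX is a load and its destination (Rt) matches either source register of the instruction in ID. The function name and argument names are illustrative.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID needs a value that a load in EX has not read yet."""
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)
```

With lw $2, 20($1) in EX and and $4, $2, $5 in ID, the call must_stall(True, 2, 2, 5) is true, so PC and IF/ID are held and the control signals entering ID/EX are zeroed for one cycle.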

Branch Hazards

• When we decide to branch, other instructions are in the pipeline!

• We are predicting branch not taken

– need to add hardware for flushing instructions if we are wrong

Figure: pipeline diagram over CC 1 to CC 9 (time in clock cycles) against program execution order (in instructions) for the instructions below, listed as address then instruction. The instructions at 44, 48, and 52 enter the pipeline behind the beq before the lw at 72 is fetched.

40 beq $1, $3, 7

44 and $12, $2, $5

48 or $13, $6, $2

52 add $14, $2, $2

72 lw $4, 50($7)
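The addresses in the figure follow from the MIPS branch rule: the offset counts words from the instruction after the branch. The Python sketch below works that out and also lists the sequentially fetched instructions that would have to be flushed, assuming (as the figure suggests) that three instructions follow the branch into the pipeline before it is resolved. The function names are hypothetical.

```python
def branch_target(branch_address, word_offset):
    # The offset is in instructions (words) relative to PC + 4,
    # so shift it left by 2 to get bytes.
    return branch_address + 4 + (word_offset << 2)


def wrongly_fetched(branch_address, instructions_before_resolution=3):
    """Addresses fetched sequentially after the branch while it is still unresolved."""
    return [branch_address + 4 * k for k in range(1, instructions_before_resolution + 1)]
```

For beq $1, $3, 7 at address 40 the target is 40 + 4 + 28 = 72, and the instructions at 44, 48, and 52 are the ones to flush if the branch is taken.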


Flushing Instructions

Figure: the pipelined datapath modified to flush instructions. Labeled elements include the IF.Flush control line into IF/ID, an equality comparator (=) on the register outputs, sign extend, shift left 2, the hazard detection unit, and the forwarding unit.

Improving Performance

• Try to avoid stalls! E.g., reorder these instructions:

lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t2, 0($t1)
sw $t0, 4($t1)

• Add a branch delay slot

– the next instruction after a branch is always executed

– rely on compiler to fill the slot with something useful

More on Improving Performance

• Superpipelining: decompose the stages further (not always practical)

• Superscalar: start more than one instruction in the same cycle (extra coordination required)

– CPI can be less than 1

– IPC: instructions per clock cycle

• Dynamic pipelining:

lw $t0, 20($s2)

addu $t1, $t0, $t2

sub $s4, $s4, $t3

slti $t5, $s4, 20

– Combine extra hardware resources so later instructions can proceed in parallel.

– More complicated pipeline control

– More complicated instruction execution model


Superscalar MIPS

• Assume two instructions are issued per clock cycle, say one integer ALU operation or branch, the other load or store.

• Need to fetch and decode 64 bits of instruction

• Extra resources are required.
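A rough Python sketch of the two-issue arrangement described above follows. It assumes 32-bit instructions, treats the ideal CPI as one over the issue width, and pairs instructions purely by kind (one ALU operation or branch with one load or store), ignoring data dependences between the pair. The opcode sets and function names are illustrative.

```python
INSTRUCTION_BITS = 32  # every MIPS instruction is the same length

ALU_OR_BRANCH = {"add", "sub", "and", "or", "slt", "beq"}
LOAD_OR_STORE = {"lw", "sw"}


def fetch_width_bits(issue_width):
    return issue_width * INSTRUCTION_BITS


def ideal_cpi(issue_width):
    # Best case only: every issue slot is filled on every cycle.
    return 1 / issue_width


def issue_slot(opcode):
    if opcode in ALU_OR_BRANCH:
        return "alu/branch"
    if opcode in LOAD_OR_STORE:
        return "load/store"
    raise ValueError("unknown opcode: " + opcode)


def can_issue_together(first_opcode, second_opcode):
    # One instruction per slot, so the pair must use different slots.
    return issue_slot(first_opcode) != issue_slot(second_opcode)
```

With an issue width of 2 the fetch width is 64 bits and the ideal CPI is 0.5; add with lw can share a cycle, while lw with sw cannot.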


Dynamic Scheduling

• The hardware performs the scheduling

– hardware tries to find instructions to execute

– out of order execution is possible

– speculative execution and dynamic branch prediction

Figure: organization of a dynamically scheduled pipeline. An instruction fetch and decode unit issues in order to reservation stations; the functional units (integer, integer, ..., floating point, load/store) execute out of order; a commit unit commits in order.

Real Stuff

• All modern processors are very complicated

– DEC Alpha 21264: 9 stage pipeline, 6 instruction issue

– PowerPC and Pentium: branch history table

– Compiler technology important