Professor Nigel Topham

Director, Institute for Computing Systems Architecture

School of Informatics

Edinburgh University

Informatics 3

Computer Architecture

Inf3 Computer Architecture - 2006-2007 2

Terminology

Instruction fetch: fetch instruction from memory

Instruction issue: decode instruction, check for structural hazards, and send to execution units

Instruction execution: execute instruction once dependences are cleared

Instruction completion (or retire or commit): finish instruction and update processor state

Some combinations are possible:
– In-order issue, execution and completion
– Out-of-order issue and execution and in-order completion
– Out-of-order issue, execution and completion

Dynamic Scheduling 1: Scoreboarding

Handles all RAW, WAR, and WAW with proper stalls, but allows independent instructions to proceed

Step 1: Issue (part of original ID stage)
– Issue instruction to functional unit iff functional unit is free and no earlier instruction writes to the same destination register (WAW)

Step 2: Read operands (part of original ID stage)
– Wait until source registers become available from earlier instructions through register file (RAW)

Step 3: Execute (original EXE stage)
– Execute instruction and notify scoreboard when done

Step 4: Write result (original WB stage)
– Wait until earlier instructions read operands before writing to register file (WAR)
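
The three waiting conditions in the steps above can be written as small predicates. This is a minimal sketch, not a description of any real scoreboard: the function names and the two sets `pending_writes` and `pending_reads` are assumptions made for illustration. `pending_writes` holds registers that an earlier active instruction has yet to write, and `pending_reads` holds registers that an earlier instruction has yet to read.

```python
def can_issue(unit_busy, dest, pending_writes):
    """Step 1: the unit must be free and no active instruction may write dest (WAW)."""
    return (not unit_busy) and dest not in pending_writes


def can_read_operands(sources, pending_writes):
    """Step 2: no source may have an outstanding write from an earlier instruction (RAW)."""
    return all(src not in pending_writes for src in sources)


def can_write_result(dest, pending_reads):
    """Step 4: no earlier instruction may still be waiting to read dest (WAR)."""
    return dest not in pending_reads


# Hypothetical snapshot: MUL.D (writes F0) is still executing and
# DIV.D F10, F0, F6 has issued but has not yet read its operands.
pending_writes = {"F0", "F10"}   # registers with a write outstanding
pending_reads = {"F0", "F6"}     # registers an earlier instruction still has to read

add_may_issue = can_issue(False, "F6", pending_writes)          # ADD.D F6, F8, F2
div_may_read = can_read_operands(("F0", "F6"), pending_writes)  # blocked on F0
add_may_write = can_write_result("F6", pending_reads)           # blocked by DIV.D
```

In this snapshot ADD.D may issue, DIV.D must wait for F0, and ADD.D may not write F6 until DIV.D has read it.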

Scoreboard Organization

Instruction status: which of the four steps the instruction is in (Issue, Read Op, Execute, Write)

Functional unit status:
– Busy – functional unit is being used
– Op – type of operation to be performed (e.g., add, subtract, etc.)
– Fi – destination register
– Fj, Fk – source registers
– Qj, Qk – functional units producing Fj and Fk
– Rj, Rk – flags indicating when Fj and Fk are ready (set to No after the operands are read)

Register result status: indicates which functional unit will write the register next (one entry per register)
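
One way to picture these tables is as plain records. The sketch below is illustrative only: the class name `FunctionalUnitStatus`, the helper `issue`, and the use of `None` for an empty field are assumptions, not part of the slides. The register result status is modelled as a dictionary from register name to the unit that will write it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FunctionalUnitStatus:
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # first source register
    fk: Optional[str] = None   # second source register
    qj: Optional[str] = None   # unit producing Fj (None if no producer)
    qk: Optional[str] = None   # unit producing Fk (None if no producer)
    rj: bool = False           # Fj ready and not yet read
    rk: bool = False           # Fk ready and not yet read

    def operands_ready(self) -> bool:
        return self.busy and self.rj and self.rk


def issue(units, result, name, op, fi, fj, fk):
    """Fill in a unit's entry at issue; refuse on a busy unit or a WAW hazard."""
    if units[name].busy or fi in result:
        return False
    qj, qk = result.get(fj), result.get(fk)
    units[name] = FunctionalUnitStatus(True, op, fi, fj, fk, qj, qk,
                                       qj is None, qk is None)
    result[fi] = name          # register result status
    return True


units = {n: FunctionalUnitStatus() for n in ("Integer", "Mult1", "Mult2", "Add", "Divide")}
result = {}
issue(units, result, "Integer", "Load", "F2", None, "R3")   # L.D F2, 45(R3)
issue(units, result, "Mult1", "Mult", "F0", "F2", "F4")     # MUL.D F0, F2, F4
```

After these two calls the Mult1 entry has Qj = Integer, Rj = No and Rk = Yes, because F2 is still owed by the integer unit.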

Scoreboard Example

Instruction sequence:
L.D F6, 34(R2)
L.D F2, 45(R3)
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2

Latencies:
– Integer 1 cycle
– FP add 2 cycles
– FP multiply 10 cycles
– FP divide 40 cycles

Functional units: 1 integer (also for ld/st), 1 FP adder, 2 FP multipliers, 1 FP divider
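
To see which hazards this sequence contains, the register names can be compared pairwise. The sketch below is a simple illustration with assumed names (`PROGRAM`, `find_hazards`). It checks every earlier/later pair and does not filter out dependences hidden by an intervening write, which does not arise in this sequence.

```python
# (opcode, destination, sources) for the six instructions of the example
PROGRAM = [
    ("L.D", "F6", ("R2",)),
    ("L.D", "F2", ("R3",)),
    ("MUL.D", "F0", ("F2", "F4")),
    ("SUB.D", "F8", ("F6", "F2")),
    ("DIV.D", "F10", ("F0", "F6")),
    ("ADD.D", "F6", ("F8", "F2")),
]


def find_hazards(program):
    """Return (kind, earlier index, later index, register) for each register conflict."""
    hazards = []
    for j, (_, dest_j, srcs_j) in enumerate(program):
        for i in range(j):
            _, dest_i, srcs_i = program[i]
            if dest_i in srcs_j:
                hazards.append(("RAW", i, j, dest_i))   # later reads what earlier writes
            if dest_j in srcs_i:
                hazards.append(("WAR", i, j, dest_j))   # later writes what earlier reads
            if dest_i == dest_j:
                hazards.append(("WAW", i, j, dest_j))   # both write the same register
    return hazards


HAZARDS = find_hazards(PROGRAM)
WAR = [h for h in HAZARDS if h[0] == "WAR"]
WAW = [h for h in HAZARDS if h[0] == "WAW"]
```

The only WAR and WAW conflicts involve F6: ADD.D writes it after SUB.D and DIV.D read it, and after the first L.D wrote it.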

Scoreboard Example

cycle 1

[Instruction status table: Issue, Read Op, Execute, Write for the six instructions]

Functional unit status (units: Integer, Mult1, Mult2, Add, Divide):

Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Y     Load  F6        R2                          Y

Register result status (F0 … F30): F6 = Integer

Scoreboard Example

cycles 2 to 4

[Instruction status table: Issue, Read Op, Execute, Write for the six instructions]

Functional unit status (units: Integer, Mult1, Mult2, Add, Divide):

Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Y     Load  F6        R2                          N

Register result status (F0 … F30): F6 = Integer

Scoreboard Example

cycle 5

[Instruction status table: Issue, Read Op, Execute, Write for the six instructions]

Functional unit status (units: Integer, Mult1, Mult2, Add, Divide):

Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Y     Load  F2        R3                          Y
Mult1    Y     Mult  F0   F2   F4   Integer           N    Y
Add      Y     Sub   F8   F6   F2            Integer  Y    N
Divide   Y     Div   F10  F0   F6   Mult1             N    Y

Register result status (F0 … F30): F0 = Mult1, F2 = Integer, F8 = Add, F10 = Divide

Scoreboard Example

cycles 6 to 8

[Instruction status table: Issue, Read Op, Execute, Write for the six instructions]

Functional unit status (units: Integer, Mult1, Mult2, Add, Divide):

Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  Y     Load  F2        R3                          N
Mult1    Y     Mult  F0   F2   F4   Integer           N    Y
Add      Y     Sub   F8   F6   F2            Integer  Y    N
Divide   Y     Div   F10  F0   F6   Mult1             N    Y

Register result status (F0 … F30): F0 = Mult1, F2 = Integer, F8 = Add, F10 = Divide

H&P A.52

Scoreboard Example

cycles 9 to 12

[Instruction status table: Issue, Read Op, Execute, Write for the six instructions]

Functional unit status (units: Integer, Mult1, Mult2, Add, Divide):

Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Mult1    Y     Mult  F0   F2   F4                     N    N
Add      Y     Sub   F8   F6   F2                     N    N
Divide   Y     Div   F10  F0   F6   Mult1             N    Y

Register result status (F0 … F30): F0 = Mult1, F8 = Add, F10 = Divide

Scoreboard Example

cycle 13

[Instruction status table: Issue, Read Op, Execute, Write for the six instructions]

Functional unit status (units: Integer, Mult1, Mult2, Add, Divide):

Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Mult1    Y     Mult  F0   F2   F4                     N    N
Add      Y     Add   F6   F8   F2                     Y    Y
Divide   Y     Div   F10  F0   F6   Mult1             N    Y

Register result status (F0 … F30): F0 = Mult1, F6 = Add, F10 = Divide

Scoreboard Example

cycles 14 to 20

[Instruction status table: Issue, Read Op, Execute, Write for the six instructions]

Functional unit status (units: Integer, Mult1, Mult2, Add, Divide):

Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Mult1    Y     Mult  F0   F2   F4                     N    N
Add      Y     Add   F6   F8   F2                     N    N
Divide   Y     Div   F10  F0   F6   Mult1             N    Y

Register result status (F0 … F30): F0 = Mult1, F6 = Add, F10 = Divide

H&P A.53

Scoreboard Example

cycles 21 to 22

[Instruction status table: Issue, Read Op, Execute, Write for the six instructions]

Functional unit status (units: Integer, Mult1, Mult2, Add, Divide):

Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Add      Y     Add   F6   F8   F2                     N    N
Divide   Y     Div   F10  F0   F6                     N    N

Register result status (F0 … F30): F6 = Add, F10 = Divide

Scoreboard Example

cycles 23 to 62

[Instruction status table: Issue, Read Op, Execute, Write for the six instructions]

Functional unit status (units: Integer, Mult1, Mult2, Add, Divide):

Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
Divide   Y     Div   F10  F0   F6                     N    N

Register result status (F0 … F30): F10 = Divide

H&P A.54
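
The cycle numbers in this walk-through can be approximated with a small in-order timing model. This is a simplified sketch under stated assumptions, not a cycle-accurate scoreboard: it assumes one issue per cycle, that a unit or register freed by a write in cycle c is usable from cycle c+1, and that execution takes exactly the listed latency after operands are read. The names `scoreboard_times`, `UNIT_COUNT` and `TIMES` are illustrative.

```python
LATENCY = {"int": 1, "add": 2, "mul": 10, "div": 40}    # latencies from the example
UNIT_COUNT = {"int": 1, "add": 1, "mul": 2, "div": 1}   # functional units from the example

# (label, unit kind, destination, sources)
PROGRAM = [
    ("L.D F6, 34(R2)", "int", "F6", ("R2",)),
    ("L.D F2, 45(R3)", "int", "F2", ("R3",)),
    ("MUL.D F0, F2, F4", "mul", "F0", ("F2", "F4")),
    ("SUB.D F8, F6, F2", "add", "F8", ("F6", "F2")),
    ("DIV.D F10, F0, F6", "div", "F10", ("F0", "F6")),
    ("ADD.D F6, F8, F2", "add", "F6", ("F8", "F2")),
]


def scoreboard_times(program):
    """Return (issue, read, exec_done, write) cycles for each instruction."""
    free_after = {kind: [0] * n for kind, n in UNIT_COUNT.items()}
    history = []      # (dest, sources, read, write) of earlier instructions
    times = []
    prev_issue = 0
    for _label, kind, dest, sources in program:
        units = free_after[kind]
        unit = units.index(min(units))                 # earliest-free unit of this kind
        issue = max(prev_issue + 1, units[unit] + 1)   # in order, structural hazard
        for d, _s, _r, w in history:
            if d == dest:
                issue = max(issue, w + 1)              # WAW: wait for the earlier write
        read = issue + 1
        for d, _s, _r, w in history:
            if d in sources:
                read = max(read, w + 1)                # RAW: wait for the producer
        exec_done = read + LATENCY[kind]
        write = exec_done + 1
        for _d, s, r, _w in history:
            if dest in s:
                write = max(write, r + 1)              # WAR: wait for earlier readers
        units[unit] = write
        history.append((dest, sources, read, write))
        times.append((issue, read, exec_done, write))
        prev_issue = issue
    return times


TIMES = scoreboard_times(PROGRAM)
```

Under these assumptions ADD.D issues in cycle 13 and DIV.D writes its result in cycle 62, which agrees with the cycles shown on the slides.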

Scoreboard Summary

Dynamically schedules instructions

Forces instructions to wait on RAW, WAR, WAW dependences and structural hazards

First used in the CDC 6600 in 1964 and yielded performance improvements of 1.7 to 2.5 times

Hardware cost (size) of scoreboard equivalent to one of the functional units

Many more buses required to move around operands, results, and instructions

Dynamic Scheduling 2: Tomasulo’s Algorithm

Handles RAW with proper stalls and eliminates WAR and WAW through register renaming

Step 1: Issue
– Issue instruction to the reservation stations if there is a free reservation station
– Read operands if available or rename operands if pending (WAR and WAW)

Step 2: Execute
– Execute instruction when both operands are ready in the reservation station (RAW)

Step 3: Write result
– Send results to register file and to all reservation stations requiring the values (RAW)

Results are communicated through common data bus (CDB) (forwarding)
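
The effect of renaming can be illustrated on the six-instruction sequence from the scoreboard example. In the sketch below every destination receives a fresh tag (`T1`, `T2`, ...) and each source is rewritten to the latest tag of its register. The tag names and the helper functions are hypothetical. In Tomasulo's scheme the tags are reservation station identifiers, but the consequence is the same: WAR and WAW conflicts on register names disappear and only true (RAW) dependences remain.

```python
# (opcode, destination, sources)
PROGRAM = [
    ("L.D", "F6", ("R2",)),
    ("L.D", "F2", ("R3",)),
    ("MUL.D", "F0", ("F2", "F4")),
    ("SUB.D", "F8", ("F6", "F2")),
    ("DIV.D", "F10", ("F0", "F6")),
    ("ADD.D", "F6", ("F8", "F2")),
]


def count_name_hazards(program):
    """Count (WAR, WAW) conflicts between every earlier/later instruction pair."""
    war = waw = 0
    for j, (_, dest_j, _) in enumerate(program):
        for _, dest_i, srcs_i in program[:j]:
            war += int(dest_j in srcs_i)
            waw += int(dest_j == dest_i)
    return war, waw


def rename(program):
    """Give each destination a fresh tag; sources use the latest tag of their register."""
    latest = {}                      # architectural register -> most recent tag
    renamed = []
    for n, (op, dest, srcs) in enumerate(program, start=1):
        new_srcs = tuple(latest.get(s, s) for s in srcs)   # look up sources first
        tag = f"T{n}"
        latest[dest] = tag
        renamed.append((op, tag, new_srcs))
    return renamed


BEFORE = count_name_hazards(PROGRAM)
RENAMED = rename(PROGRAM)
AFTER = count_name_hazards(RENAMED)
```

After renaming, DIV.D reads `T1` (the first load's F6) while ADD.D writes `T6`, so the two no longer conflict.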

IBM S/360 model 91 used Tomasulo’s Algorithm

Dynamic O-O-O execution

Tags used to name flow dependencies

5 reservation stations

6 load buffers

Issue instructions to reservation stations, load buffers and store buffers

Instructions wait in reservation stations or store buffers until all their operands are collected

Functional units broadcast result and tag on the Common Data Bus (CDB) for all reservation stations, store buffers and FP registers to pick up.

[Figure: block diagram with an instruction queue fed from the instruction fetch unit, FP registers, address unit, load buffers, store buffers, reservation stations, FP adders, FP multipliers and memory unit. Instructions shown in the queue: ld f1, 4(r1); st f4, 8(r2); mul f3, f1, f2; add f4, f5, f3]

Data Structures for Tomasulo’s Algorithm

Reservation station (RS) fields:
– Op – the operation to be performed
– Qj, Qk – tags to identify the producing RS of each source operand
– Vj, Vk – actual values of the two source operands
– Busy – indicates if RS contains a valid instruction

Register file fields:
– Qi – tag to identify producing RS of next value for this register

Load buffers:
– A – the computed address from which to load

Store buffers:
– A – the computed address to which a store will take place
– Qj – tag to identify the producing RS of the store data
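
These fields map naturally onto small record types. The following sketch is illustrative only: the class names, the `name` and `busy` fields on the buffers, and the use of `None` in place of a zero tag are assumptions rather than part of the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None   # value of first operand, once known
    vk: Optional[float] = None   # value of second operand, once known
    qj: Optional[str] = None     # tag of the RS producing Vj (None = value present)
    qk: Optional[str] = None     # tag of the RS producing Vk (None = value present)

    def ready(self) -> bool:
        return self.busy and self.qj is None and self.qk is None


@dataclass
class Register:
    value: float = 0.0
    qi: Optional[str] = None     # tag of the RS that will write this register next


@dataclass
class LoadBuffer:
    name: str
    busy: bool = False
    a: Optional[int] = None      # computed address to load from


@dataclass
class StoreBuffer:
    name: str
    busy: bool = False
    a: Optional[int] = None      # computed address to store to
    qj: Optional[str] = None     # tag of the RS producing the store data


# A multiply waiting on a load, in the style of the example slides (values invented)
mult_1 = ReservationStation("Mult_1", busy=True, op="Mult", vk=4.0, qj="Load_1")
waiting = mult_1.ready()
mult_1.vj, mult_1.qj = 2.0, None   # the load result arrives
now_ready = mult_1.ready()
```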

Operation of Tomasulo’s Algorithm

Instruction Issue:
Get next instruction from head of the issue queue
If reservation station RS is available then:
    For each p in { j, k } representing operand register u
        If Reg[u].Qi == 0 then { RS.Vp = Reg[u].value; RS.Qp = 0 } // value ready now
        If Reg[u].Qi != 0 then RS.Qp = Reg[u].Qi // value not yet ready
    RS.Busy = 1 // reserve this RS
    RS.Op = instruction opcode // set the operation
    Reg[d].Qi = RS.id // destination register d is renamed to this RS

Execution:
Wait until (RS.Qj == 0) and (RS.Qk == 0), and whilst waiting:
    For each p in { j, k } If CDB.tag == RS.Qp then { RS.Vp = CDB.value; RS.Qp = 0 }
When (RS.Qj == 0) and (RS.Qk == 0), perform operation in RS.Op

Write Result:
When CDB is free, broadcast CDB = { tag = RS.id, value = RS.result } and clear RS.Busy
For each register r with Reg[r].Qi == CDB.tag: { Reg[r].value = CDB.value; Reg[r].Qi = 0 }
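
The three phases can be exercised with a small functional model. The sketch below makes several simplifying assumptions: there is one shared pool of reservation stations named `RS1` to `RS4` instead of separate adder and multiplier stations, there is no timing, and the CDB broadcast is a loop over stations and registers. The class `Tomasulo`, its method names and the register values are hypothetical. `None` stands for a zero tag.

```python
import operator

OPS = {"ADD.D": operator.add, "SUB.D": operator.sub,
       "MUL.D": operator.mul, "DIV.D": operator.truediv}


class Tomasulo:
    def __init__(self, regs, station_names):
        self.value = dict(regs)                     # Reg[u].value
        self.qi = {r: None for r in regs}           # Reg[u].Qi
        self.rs = {n: None for n in station_names}  # None means the station is free

    def issue(self, op, dest, src_j, src_k):
        free = next((n for n, e in self.rs.items() if e is None), None)
        if free is None:
            return None                             # structural stall
        entry = {"op": op, "Vj": None, "Vk": None, "Qj": None, "Qk": None}
        for p, u in (("j", src_j), ("k", src_k)):
            if self.qi[u] is None:
                entry["V" + p] = self.value[u]      # value ready now
            else:
                entry["Q" + p] = self.qi[u]         # wait for the producing station
        self.rs[free] = entry
        self.qi[dest] = free                        # rename the destination
        return free

    def ready(self, name):
        e = self.rs[name]
        return e is not None and e["Qj"] is None and e["Qk"] is None

    def execute_and_broadcast(self, name):
        if not self.ready(name):
            raise RuntimeError(name + " is still waiting for operands")
        e = self.rs[name]
        result = OPS[e["op"]](e["Vj"], e["Vk"])
        self.rs[name] = None                        # clear Busy
        for other in self.rs.values():              # CDB: waiting stations pick up the value
            if other is not None:
                for p in "jk":
                    if other["Q" + p] == name:
                        other["V" + p], other["Q" + p] = result, None
        for r, tag in self.qi.items():              # CDB: registers still tagged with this RS
            if tag == name:
                self.value[r], self.qi[r] = result, None
        return result


# The four arithmetic instructions of the example, with invented register values
t = Tomasulo({"F0": 0.0, "F2": 2.0, "F4": 4.0, "F6": 16.0, "F8": 0.0, "F10": 0.0},
             ["RS1", "RS2", "RS3", "RS4"])
mul = t.issue("MUL.D", "F0", "F2", "F4")
sub = t.issue("SUB.D", "F8", "F6", "F2")
div = t.issue("DIV.D", "F10", "F0", "F6")   # captures the current F6 in Vk at issue
add = t.issue("ADD.D", "F6", "F8", "F2")    # may overwrite F6 without waiting for DIV.D
```

Because DIV.D copied F6 into its Vk field at issue, ADD.D can run and write F6 as soon as SUB.D broadcasts, before DIV.D has even started.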

Tomasulo’s Example

cycles 1 to 3

[Instruction status table: Issue, Execute, Write for the six instructions]

Reservation stations (Add_1, Add_2, Add_3, Mult_1, Mult_2):

Name    Busy  Op    Vj           Vk           Qj      Qk
Mult_1  Y     Mult               R[F4]        Load_1
Add_1   Y     Sub   M[34+R[R2]]                       Load_1

Register status (F0 … F30): F0 = Mult_1, F8 = Add_1, load destination = Load_1

Tomasulo’s Example

cycles 4 to 6

[Instruction status table: Issue, Execute, Write for the six instructions]

Reservation stations (Add_1, Add_2, Add_3, Mult_1, Mult_2):

Name    Busy  Op    Vj           Vk           Qj      Qk
Add_1   Y     Sub   M[34+R[R2]]                       Load_1
Add_2   Y     Add                M[45+R[R3]]  Add_1
Mult_1  Y     Mult               R[F4]        Load_1
Mult_2  Y     Div                R[F6]        Mult_1

Register status (F0 … F30): F0 = Mult_1, F6 = Add_2, F8 = Add_1, F10 = Mult_2, load destination = Load_1

Tomasulo’s Example

cycles 7 to 8

[Instruction status table: Issue, Execute, Write for the six instructions]

Reservation stations (Add_1, Add_2, Add_3, Mult_1, Mult_2):

Name    Busy  Op    Vj           Vk           Qj      Qk
Add_1   Y     Sub   M[34+R[R2]]                       Load_1
Add_2   Y     Add                M[45+R[R3]]  Add_1
Mult_1  Y     Mult               R[F4]        Load_1
Mult_2  Y     Div                R[F6]        Mult_1

Register status (F0 … F30): F0 = Mult_1, F6 = Add_2, F8 = Add_1, F10 = Mult_2

Tomasulo’s Example

cycles 9 to 11

[Instruction status table: Issue, Execute, Write for the six instructions]

Reservation stations (Add_1, Add_2, Add_3, Mult_1, Mult_2):

Name    Busy  Op    Vj           Vk           Qj      Qk
Add_2   Y     Add                M[45+R[R3]]  Add_1
Mult_1  Y     Mult               R[F4]        Load_1
Mult_2  Y     Div                R[F6]        Mult_1

Register status (F0 … F30): F0 = Mult_1, F6 = Add_2, F10 = Mult_2

Tomasulo’s Example

cycles 12, 15 and 16

[Instruction status table: Issue, Execute, Write for the six instructions]

Reservation stations (Add_1, Add_2, Add_3, Mult_1, Mult_2):

Name    Busy  Op    Vj           Vk           Qj      Qk
Mult_1  Y     Mult               R[F4]        Load_1
Mult_2  Y     Div                R[F6]        Mult_1

Register status (F0 … F30): F0 = Mult_1, F10 = Mult_2

Tomasulo’s Example

cycle 55

[Instruction status table: Issue, Execute, Write for the six instructions]

Reservation stations (Add_1, Add_2, Add_3, Mult_1, Mult_2):

Name    Busy  Op    Vj           Vk           Qj      Qk
Mult_2  Y     Div                R[F6]        Mult_1

Register status (F0 … F30): F10 = Mult_2

Tomasulo’s Advantages

Register renaming:
– Qj and Qk can come from any reservation station, independent of the register file; in fact we could have many more reservation stations than registers
– Vj and Vk store the actual value to be used

Parallel release of all dependent instructions as soon as the earlier instruction completes (both SUB.D and ADD.D get the value from Load_1)

No need to wait on WAR and WAW (notice that ADD.D has issued before DIV.D has read its F6 operand and will execute as soon as the SUB.D finishes)

Dynamic unrolling of loops in hardware with dynamic register renaming