Lecture 5, Section A.8: Branch Hazards and Dynamic Scheduling via Scoreboarding
CS 203A Advanced Computer Architecture. Instructor: L.N. Bhuyan.
Oct. 26, 2004 1
Lecture 5, Section A.8
Branch Hazards and Dynamic Scheduling via Scoreboarding
Instructor: L.N. Bhuyan
CS 203A Advanced Computer Architecture

Control Hazards
• Branch problem:
  – Branches are resolved in the EX stage: 2-cycle penalty on taken branches.
  – Ideal CPI = 1. Assuming 2 cycles for all branches and 32% branch instructions, new CPI = 1 + 0.32*2 = 1.64.
• Solutions:
  – Reduce branch penalty: change the datapath (new adder needed in ID stage).
  – Fill branch delay slot(s) with a useful instruction.
  – Fixed branch prediction.
  – Static branch prediction.
  – Dynamic branch prediction.
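The CPI arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `effective_cpi` is made up for illustration, and the second call (1-cycle penalty) is an assumption that applies the same formula to the reduced penalty obtained when branches are resolved in ID.

```python
def effective_cpi(base_cpi, branch_fraction, branch_penalty):
    """CPI when every branch pays a fixed stall penalty (in cycles)."""
    return base_cpi + branch_fraction * branch_penalty

# Numbers from the slide: ideal CPI 1, 32% branches, 2-cycle penalty.
print(f"{effective_cpi(1.0, 0.32, 2):.2f}")  # prints 1.64

# Assumption: same branch mix, but a 1-cycle penalty (branch resolved in ID).
print(f"{effective_cpi(1.0, 0.32, 1):.2f}")  # prints 1.32
```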

Control Hazards – branch delay slots
• Reduced branch penalty:
  – Compute condition and target address in the ID stage: 1 cycle stall.
  – Target and condition are computed even when the instruction is not a branch.
• Branch delay slot filling: move an instruction into the slot right after the branch, hoping that its execution is necessary. Three alternatives (next slide).
  Limitations: restrictions on which instructions can be rescheduled; compile-time prediction of taken or untaken branches.

Example Nondelayed vs. Delayed Branch
Nondelayed Branch:
      add M1, M2, M3
      sub M4, M5, M6
      beq M1, M4, Exit
      or  M8, M9, M10
      xor M10, M1, M11
Exit:
Delayed Branch:
      add M1, M2, M3
      sub M4, M5, M6
      beq M1, M4, Exit
      or  M8, M9, M10
      xor M10, M1, M11
Exit:

Control Hazards: Branch Prediction
• Idea: doing something is better than waiting around doing nothing
  o Guess branch target, start executing at guessed position
  o Execute branch, verify (check) your guess
  + Minimizes penalty if guess is right (to zero)
  – May increase penalty for wrong guesses
  o Heavily researched area in the last 15 years
• Fixed branch prediction. Each of these strategies must be applied to all branch instructions indiscriminately.
  – Predict not-taken (47% actually not taken): continue to fetch instructions without stalling; do not change any state (no register write); if the branch is taken, turn the fetched instruction into a no-op and restart fetch at the target address: 1 cycle penalty.

Control Hazards: Branch Prediction
  – Predict taken (53%): more difficult, must know target before branch is decoded. No advantage in our simple 5-stage pipeline.
• Static branch prediction.
  – Opcode-based: prediction based on opcode itself and related condition. Examples: MC 88110, PowerPC 601/603.
  – Displacement-based prediction: if d < 0 predict taken, if d >= 0 predict not taken. Examples: Alpha 21064 (as option), PowerPC 601/603 for regular conditional branches.
  – Compiler-directed prediction: compiler sets or clears a predict bit in the instruction itself. Examples: AT&T 9210 Hobbit, PowerPC 601/603 (predict bit reverses opcode or displacement predictions), HP PA 8000 (as option).
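The displacement-based rule is simple enough to write down directly. The sketch below is illustrative only: `predict_taken` and `accuracy` are hypothetical helper names, and the branch trace is made-up data, not a measurement.

```python
def predict_taken(displacement):
    """Displacement-based static prediction: backward (d < 0) taken, forward (d >= 0) not taken."""
    return displacement < 0

def accuracy(trace):
    """trace: list of (displacement, actually_taken) pairs. Returns the fraction predicted correctly."""
    correct = sum(predict_taken(d) == taken for d, taken in trace)
    return correct / len(trace)

# Hypothetical trace: a loop-closing backward branch taken 3 times, then falling through,
# plus one forward branch that is not taken.
trace = [(-16, True), (-16, True), (-16, True), (-16, False), (8, False)]
print(accuracy(trace))  # prints 0.8
```

Backward branches usually close loops, which is why predicting them taken tends to work; the one misprediction in the trace is the loop exit.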

Control Hazards: Branch Prediction
• Dynamic branch prediction
  – Based on the history of a particular branch (covered later)

MIPS FP Pipe Stages
FP Instr        1   2     3     4     5   6   7     8   …
Add, Subtract   U   S+A   A+R   R+S
Multiply        U   E+M   M     M     M   N   N+A   R
Divide          U   A     R     D^28  …   D+A, D+R, D+R, D+A, D+R, A, R
Square root     U   E     (A+R)^108   …   A   R
Negate          U   S
Absolute value  U   S
FP compare      U   A     R
Stages:
  M  First stage of multiplier
  N  Second stage of multiplier
  R  Rounding stage
  S  Operand shift stage
  U  Unpack FP numbers
  A  Mantissa ADD stage
  D  Divide pipeline stage
  E  Exception test stage

R4000 Performance
• Not ideal CPI of 1:
  – Load stalls (1 or 2 clock cycles)
  – Branch stalls (2 cycles + unfilled slots)
  – FP result stalls: RAW data hazard (latency)
  – FP structural stalls: not enough FP hardware (parallelism)
[Figure: stacked bar chart, vertical axis from 0 to 4.5, one bar per benchmark (eqntott, espresso, gcc, li, doduc, nasa7, ora, spice2g6, su2cor, tomcatv). Bar segments: Base, Load stalls, Branch stalls, FP result stalls, FP structural stalls.]

FP Loop: Where are the Hazards?
Loop: LD   F0,0(R1)   ;F0=vector element
      ADDD F4,F0,F2   ;add scalar from F2
      SD   0(R1),F4   ;store result
      SUBI R1,R1,8    ;decrement pointer 8B (DW)
      BNEZ R1,Loop    ;branch R1!=zero
      NOP             ;delayed branch slot

Instruction producing result   Instruction using result   Latency in clock cycles
FP ALU op                      Another FP ALU op          3
FP ALU op                      Store double               2
Load double                    FP ALU op                  1
Load double                    Store double               0
Integer op                     Integer op                 0
• Where are the stalls?

FP Loop Showing Stalls
• 9 clocks: Rewrite code to minimize stalls?
Instruction producing result   Instruction using result   Latency in clock cycles
FP ALU op                      Another FP ALU op          3
FP ALU op                      Store double               2
Load double                    FP ALU op                  1

1 Loop: LD   F0,0(R1)   ;F0=vector element
2       stall
3       ADDD F4,F0,F2   ;add scalar in F2
4       stall
5       stall
6       SD   0(R1),F4   ;store result
7       SUBI R1,R1,8    ;decrement pointer 8B (DW)
8       BNEZ R1,Loop    ;branch R1!=zero
9       stall           ;delayed branch slot

Minimizing Stalls Technique 1: Compiler Optimization
• 6 clocks
Instruction producing result   Instruction using result   Latency in clock cycles
FP ALU op                      Another FP ALU op          3
FP ALU op                      Store double               2
Load double                    FP ALU op                  1

1 Loop: LD   F0,0(R1)
2       stall
3       ADDD F4,F0,F2
4       SUBI R1,R1,8
5       BNEZ R1,Loop    ;delayed branch
6       SD   8(R1),F4   ;altered when moved past SUBI
Swap BNEZ and SD by changing the address of SD.
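The 9-clock and 6-clock counts can be reproduced with a small in-order issue model. This is a sketch under stated assumptions: one instruction issues per clock, the latency table from the slides gives the number of intervening stall cycles, any producer/consumer pair not in the table is assumed to need 0 stalls, and only a single loop iteration is modeled. The names `LATENCY` and `issue_cycles` are illustrative.

```python
# Stall cycles required between a producer and a dependent consumer (from the slides).
# Pairs that are not listed are assumed to need 0 stalls.
LATENCY = {
    ("fp_alu", "fp_alu"): 3,
    ("fp_alu", "store"): 2,
    ("load", "fp_alu"): 1,
    ("load", "store"): 0,
    ("int", "int"): 0,
}

def issue_cycles(program):
    """program: list of (text, kind, dest, sources). Returns the issue clock of each instruction."""
    producer = {}  # register -> (kind of the producing instruction, its issue clock)
    clock = 0
    clocks = []
    for text, kind, dest, sources in program:
        earliest = clock + 1  # in order, one instruction per clock
        for reg in sources:
            if reg in producer:
                p_kind, p_clock = producer[reg]
                stall = LATENCY.get((p_kind, kind), 0)
                earliest = max(earliest, p_clock + stall + 1)
        clock = earliest
        clocks.append(clock)
        if dest is not None:
            producer[dest] = (kind, clock)
    return clocks

original = [
    ("LD F0,0(R1)",   "load",   "F0", ["R1"]),
    ("ADDD F4,F0,F2", "fp_alu", "F4", ["F0", "F2"]),
    ("SD 0(R1),F4",   "store",  None, ["R1", "F4"]),
    ("SUBI R1,R1,8",  "int",    "R1", ["R1"]),
    ("BNEZ R1,Loop",  "branch", None, ["R1"]),
    ("NOP",           "nop",    None, []),  # unfilled branch delay slot
]
scheduled = [
    ("LD F0,0(R1)",   "load",   "F0", ["R1"]),
    ("ADDD F4,F0,F2", "fp_alu", "F4", ["F0", "F2"]),
    ("SUBI R1,R1,8",  "int",    "R1", ["R1"]),
    ("BNEZ R1,Loop",  "branch", None, ["R1"]),
    ("SD 8(R1),F4",   "store",  None, ["R1", "F4"]),  # fills the delay slot
]

print(issue_cycles(original))   # prints [1, 3, 6, 7, 8, 9]
print(issue_cycles(scheduled))  # prints [1, 3, 4, 5, 6]
```

The last issue clock of each list is the per-iteration cost: 9 clocks before scheduling and 6 after.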

HW Schemes: Instruction Parallelism
• Compiler or static instruction scheduling can avoid some pipeline hazards.
  – e.g. filling the branch delay slot.
• Why in HW at run time?
  – Works when dependences cannot be known at compile time
    WAW can only be detected at run time
  – Compiler simpler
  – Code for one machine runs well on another
• Key idea: allow instructions behind a stall to proceed
      DIVD F0,F2,F4
      ADDD F10,F0,F8
      SUBD F8,F8,F14
  – Enables out-of-order execution => out-of-order completion
  – But both structural and data hazards are checked in ID in MIPS: ADDD is stalled at ID, so SUBD cannot even proceed to ID.

HW Schemes: Instruction Parallelism
• Out-of-order execution divides the ID stage:
  1. Issue: decode instructions, check for structural hazards. Issue in order if the functional unit is free and there is no WAW hazard.
  2. Read operands (RO): wait until no data hazards, then read operands. ADDD would stall at RO, and SUBD could proceed with no stalls.
• Scoreboards allow an instruction to execute whenever 1 & 2 hold, not waiting for prior instructions.
• Focusing on FP operations – assume no MEM stages:
  IF  ISSUE  …  RO  EX1 … EXm  WB? (WAR?)
                RO  EX1 … EXn  WB? (WAR?)
             …  RO  EX1 … EXp  WB
             …

Scoreboard Implications
• Out-of-order completion => WAR, WAW hazards
• Solutions for WAR
  – CDC 6600: stall the write to allow reads to take place; read registers only during the Read Operands stage.
  – Tomasulo: register renaming
• For WAW, must detect the hazard: stall in the Issue stage until the other instruction completes
• Need to have multiple instructions in the execution phase => multiple execution units or pipelined execution units
• Scoreboard replaces ID with 2 stages (Issue and RO)
• Scoreboard keeps track of dependencies and the state of operations
  – Monitors every change in the hardware.
  – Determines when to read operands, when to execute, when to write back.
  – Hazard detection and resolution is centralized.
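As a reminder of what the three hazard names mean, the sketch below classifies the hazards between an earlier and a later instruction from their destination and source registers. It is illustrative only (the `hazards` helper and the tuple encoding are assumptions); it uses the DIVD/ADDD/SUBD sequence from the earlier slide.

```python
def hazards(earlier, later):
    """Each instruction is (dest, sources). Returns the hazards `later` has with respect to `earlier`."""
    e_dest, e_srcs = earlier
    l_dest, l_srcs = later
    found = []
    if e_dest in l_srcs:
        found.append(("RAW", e_dest))  # later reads what earlier writes
    if l_dest in e_srcs:
        found.append(("WAR", l_dest))  # later writes what earlier reads
    if l_dest == e_dest:
        found.append(("WAW", l_dest))  # both write the same register
    return found

divd = ("F0",  ["F2", "F4"])    # DIVD F0,F2,F4
addd = ("F10", ["F0", "F8"])    # ADDD F10,F0,F8
subd = ("F8",  ["F8", "F14"])   # SUBD F8,F8,F14

print(hazards(divd, addd))  # prints [('RAW', 'F0')]
print(hazards(addd, subd))  # prints [('WAR', 'F8')]
print(hazards(divd, subd))  # prints []
```

SUBD is independent of DIVD, which is why it can run ahead; its only constraint is that it must not overwrite F8 before ADDD has read it.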

Four Stages of Scoreboard Control
1. Issue: decode instructions & check for structural hazards (ID1)
   If a functional unit for the instruction is free and no other active instruction has the same destination register (WAW), the scoreboard issues the instruction to the functional unit and updates its internal data structure. If a structural or WAW hazard exists, then the instruction issue stalls, and no further instructions will issue until these hazards are cleared.
2. Read operands: wait until no data hazards, then read operands (ID2)
   A source operand is available if no earlier issued active instruction is going to write it, or if the register containing the operand is being written by a currently active functional unit. When the source operands are available, the scoreboard tells the functional unit to proceed to read the operands from the registers and begin execution. The scoreboard resolves RAW hazards dynamically in this step, and instructions may be sent into execution out of order.

Four Stages of Scoreboard Control
3. Execution: operate on operands (EX)
   The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution.
4. Write result: finish execution (WB)
   Once the scoreboard is aware that the functional unit has completed execution, the scoreboard checks for WAR hazards. If there are none, it writes the result. If there is a WAR hazard, it stalls the instruction.
   Example:
      DIVD F0,F2,F4
      ADDD F10,F0,F8
      SUBD F8,F8,F14
   The CDC 6600 scoreboard would stall SUBD until ADDD reads its operands.

Three Parts of the Scoreboard
1. Instruction status: which of 4 steps the instruction is in
2. Functional unit status: indicates the state of the functional unit (FU). 9 fields for each functional unit:
   Busy: indicates whether the unit is busy or not
   Op: operation to perform in the unit (e.g., + or –)
   Fi: destination register
   Fj, Fk: source-register numbers
   Qj, Qk: functional units producing source registers Fj, Fk
   Rj, Rk: flags indicating when Fj, Fk are ready and not yet read. Set to No after operands are read.
3. Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instructions will write that register.

Detailed Scoreboard Pipeline Control (Figure A.55 on page A-76)
Instruction status: Issue
  Wait until: not Busy(FU) and not Result(D)   (not Result(D) is the WAW check)
  Bookkeeping: Busy(FU) ← yes; Op(FU) ← op; Fi(FU) ← 'D'; Fj(FU) ← 'S1'; Fk(FU) ← 'S2'; Qj ← Result('S1'); Qk ← Result('S2'); Rj ← not Qj; Rk ← not Qk; Result('D') ← FU
Instruction status: Read operands
  Wait until: Rj and Rk
  Bookkeeping: Rj ← No; Rk ← No
Instruction status: Execution complete
  Wait until: functional unit done
  Bookkeeping: none
Instruction status: Write result
  Wait until: ∀f((Fj(f) ≠ Fi(FU) or Rj(f) = No) & (Fk(f) ≠ Fi(FU) or Rk(f) = No))   (WAR check)
  Bookkeeping: ∀f(if Qj(f) = FU then Rj(f) ← Yes); ∀f(if Qk(f) = FU then Rk(f) ← Yes); Result(Fi(FU)) ← 0; Busy(FU) ← No
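The Issue row of the control table can be sketched in Python as follows. This is an illustration, not the hardware: the `FUStatus` record, `can_issue`, and `issue` are made-up names, a Python dict stands in for the register result status, and a missing operand (the unused source of a load) is simply treated as ready.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FUStatus:
    """One functional-unit status entry, with the fields named on the slides."""
    busy: bool = False
    op: Optional[str] = None
    Fi: Optional[str] = None
    Fj: Optional[str] = None
    Fk: Optional[str] = None
    Qj: Optional[str] = None
    Qk: Optional[str] = None
    Rj: bool = False
    Rk: bool = False

def can_issue(fu, dest, units, result):
    # "not Busy(FU) and not Result(D)": no structural hazard and no WAW hazard.
    return not units[fu].busy and result.get(dest) is None

def issue(fu, op, dest, s1, s2, units, result):
    # Issue bookkeeping; `result` maps a register to the unit that will write it.
    u = units[fu]
    u.busy, u.op, u.Fi, u.Fj, u.Fk = True, op, dest, s1, s2
    u.Qj, u.Qk = result.get(s1), result.get(s2)
    u.Rj, u.Rk = u.Qj is None, u.Qk is None
    result[dest] = fu

units = {name: FUStatus() for name in ("Integer", "Mult1", "Mult2", "Add", "Divide")}
result = {}
issue("Integer", "Load", "F2", None, "R3", units, result)    # LD F2,45(R3)
if can_issue("Mult1", "F0", units, result):
    issue("Mult1", "Mult", "F0", "F2", "F4", units, result)  # MULTD F0,F2,F4
print(units["Mult1"].Qj, units["Mult1"].Rj, units["Mult1"].Rk)  # prints Integer False True
```

After the second call, Mult1 records that F2 will come from the Integer unit (Qj = Integer, Rj = No) while F4 is ready (Rk = Yes).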

Scoreboard Example
• The following latencies are chosen to illustrate behavior; they are not representative.
• LD: 1 cycle (compute address + data cache access)
• ADDD and SUBD: 2 cycles
• Multiply: 10 cycles
• Divide: 40 cycles

Scoreboard Example
Instruction status
Instruction  Dest  j    k    Issue  Read ops  Exec complete  Write result
LD           F6    34+  R2
LD           F2    45+  R3
MULTD        F0    F2   F4
SUBD         F8    F6   F2
DIVD         F10   F0   F6
ADDD         F6    F8   F2

Functional unit status (Fi = dest, Fj = S1, Fk = S2, Qj = FU for j, Qk = FU for k, Rj = Fj ready?, Rk = Fk ready?; a dash marks an empty entry)
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
-      -      -        -   -        -    -       -         -

Scoreboard Example Cycle 1
Instruction status
Instruction  Dest  j    k    Issue  Read ops  Exec complete  Write result
LD           F6    34+  R2   1
LD           F2    45+  R3
MULTD        F0    F2   F4
SUBD         F8    F6   F2
DIVD         F10   F0   F6
ADDD         F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F6   -   R2  -        -        -    Yes
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
1      -      -        -   Integer  -    -       -         -

Scoreboard Example Cycle 2
Instruction status
Instruction  Dest  j    k    Issue  Read ops  Exec complete  Write result
LD           F6    34+  R2   1      2
LD           F2    45+  R3
MULTD        F0    F2   F4
SUBD         F8    F6   F2
DIVD         F10   F0   F6
ADDD         F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F6   -   R2  -        -        -    No
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
2      -      -        -   Integer  -    -       -         -
Note: I2 cannot issue because the Integer unit is busy, and no later instruction can issue because issue is in order.

Scoreboard Example Cycle 3
Instruction status
Instruction  Dest  j    k    Issue  Read ops  Exec complete  Write result
LD           F6    34+  R2   1      2         3
LD           F2    45+  R3
MULTD        F0    F2   F4
SUBD         F8    F6   F2
DIVD         F10   F0   F6
ADDD         F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F6   -   R2  -        -        -    No
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
3      -      -        -   Integer  -    -       -         -

Scoreboard Example Cycle 4
Instruction status
Instruction  Dest  j    k    Issue  Read ops  Exec complete  Write result
LD           F6    34+  R2   1      2         3              4
LD           F2    45+  R3
MULTD        F0    F2   F4
SUBD         F8    F6   F2
DIVD         F10   F0   F6
ADDD         F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
4      -      -        -   -        -    -       -         -

Scoreboard Example Cycle 5
Instruction status
Instruction  Dest  j    k    Issue  Read ops  Exec complete  Write result
LD           F6    34+  R2   1      2         3              4
LD           F2    45+  R3   5
MULTD        F0    F2   F4
SUBD         F8    F6   F2
DIVD         F10   F0   F6
ADDD         F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F2   -   R3  -        -        -    Yes
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   No

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
5      -      Integer  -   -        -    -       -         -
Now I2 is issued.

Scoreboard Example Cycle 6
Instruction status
Instruction  Dest  j    k    Issue  Read ops  Exec complete  Write result
LD           F6    34+  R2   1      2         3              4
LD           F2    45+  R3   5      6
MULTD        F0    F2   F4   6
SUBD         F8    F6   F2
DIVD         F10   F0   F6
ADDD         F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F2   -   R3  -        -        -    No
-     Mult1    Yes   Mult  F0   F2  F4  Integer  -        No   Yes
-     Mult2    No
-     Add      No
-     Divide   No

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
6      Mult1  Integer  -   -        -    -       -         -

Scoreboard Example Cycle 7
Instruction status
Instruction  Dest  j    k    Issue  Read ops  Exec complete  Write result
LD           F6    34+  R2   1      2         3              4
LD           F2    45+  R3   5      6         7
MULTD        F0    F2   F4   6
SUBD         F8    F6   F2   7
DIVD         F10   F0   F6
ADDD         F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  Yes   Load  F2   -   R3  -        -        -    No
-     Mult1    Yes   Mult  F0   F2  F4  Integer  -        No   Yes
-     Mult2    No
-     Add      Yes   Sub   F8   F6  F2  -        Integer  Yes  No
-     Divide   No

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
7      Mult1  Integer  -   -        Add  -       -         -
I3 stalls at read operands because I2 is not complete.

Scoreboard Example Cycle 8
Instruction status
Instruction  Dest  j    k    Issue  Read ops  Exec complete  Write result
LD           F6    34+  R2   1      2         3              4
LD           F2    45+  R3   5      6         7              8
MULTD        F0    F2   F4   6
SUBD         F8    F6   F2   7
DIVD         F10   F0   F6   8
ADDD         F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    Yes   Mult  F0   F2  F4  -        -        Yes  Yes
-     Mult2    No
-     Add      Yes   Sub   F8   F6  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
8      Mult1  -        -   -        Add  Divide  -         -

Scoreboard Example Cycle 9
Instruction status
Instruction  Dest  j    k    Issue  Read ops  Exec complete  Write result
LD           F6    34+  R2   1      2         3              4
LD           F2    45+  R3   5      6         7              8
MULTD        F0    F2   F4   6      9
SUBD         F8    F6   F2   7      9
DIVD         F10   F0   F6   8
ADDD         F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
10    Mult1    Yes   Mult  F0   F2  F4  -        -        No   No
-     Mult2    No
2     Add      Yes   Sub   F8   F6  F2  -        -        No   No
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
9      Mult1  -        -   -        Add  Divide  -         -
Note: I3 and I4 read operands because F2 is now available. ADDD (I6) cannot be issued because SUBD (I4) is using the adder.

Scoreboard Example Cycle 11
Instruction status
Instruction  Dest  j    k    Issue  Read ops  Exec complete  Write result
LD           F6    34+  R2   1      2         3              4
LD           F2    45+  R3   5      6         7              8
MULTD        F0    F2   F4   6      9
SUBD         F8    F6   F2   7      9         11
DIVD         F10   F0   F6   8
ADDD         F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
8     Mult1    Yes   Mult  F0   F2  F4  -        -        No   No
-     Mult2    No
0     Add      Yes   Sub   F8   F6  F2  -        -        No   No
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
11     Mult1  -        -   -        Add  Divide  -         -
Note: The Add unit takes 2 cycles, so nothing changes in cycle 10. MULTD continues.

Scoreboard Example Cycle 12
Instruction status
Instruction  Dest  j    k    Issue  Read ops  Exec complete  Write result
LD           F6    34+  R2   1      2         3              4
LD           F2    45+  R3   5      6         7              8
MULTD        F0    F2   F4   6      9
SUBD         F8    F6   F2   7      9         11             12
DIVD         F10   F0   F6   8
ADDD         F6    F8   F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
7     Mult1    Yes   Mult  F0   F2  F4  -        -        No   No
-     Mult2    No
-     Add      No
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
12     Mult1  -        -   -        -    Divide  -         -

Scoreboard Example Cycle 13
Instruction status
Instruction  Dest  j    k    Issue  Read ops  Exec complete  Write result
LD           F6    34+  R2   1      2         3              4
LD           F2    45+  R3   5      6         7              8
MULTD        F0    F2   F4   6      9
SUBD         F8    F6   F2   7      9         11             12
DIVD         F10   F0   F6   8
ADDD         F6    F8   F2   13

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
6     Mult1    Yes   Mult  F0   F2  F4  -        -        No   No
-     Mult2    No
-     Add      Yes   Add   F6   F8  F2  -        -        Yes  Yes
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
13     Mult1  -        -   Add      -    Divide  -         -
Now ADDD is issued because SUBD has completed.

Scoreboard Example Cycle 14
Instruction status
Instruction  Dest  j    k    Issue  Read ops  Exec complete  Write result
LD           F6    34+  R2   1      2         3              4
LD           F2    45+  R3   5      6         7              8
MULTD        F0    F2   F4   6      9
SUBD         F8    F6   F2   7      9         11             12
DIVD         F10   F0   F6   8
ADDD         F6    F8   F2   13     14

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
5     Mult1    Yes   Mult  F0   F2  F4  -        -        No   No
-     Mult2    No
2     Add      Yes   Add   F6   F8  F2  -        -        No   No
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
14     Mult1  -        -   Add      -    Divide  -         -

Scoreboard Example Cycle 15
Instruction status
Instruction  Dest  j    k    Issue  Read ops  Exec complete  Write result
LD           F6    34+  R2   1      2         3              4
LD           F2    45+  R3   5      6         7              8
MULTD        F0    F2   F4   6      9
SUBD         F8    F6   F2   7      9         11             12
DIVD         F10   F0   F6   8
ADDD         F6    F8   F2   13     14

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
4     Mult1    Yes   Mult  F0   F2  F4  -        -        No   No
-     Mult2    No
1     Add      Yes   Add   F6   F8  F2  -        -        No   No
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
15     Mult1  -        -   Add      -    Divide  -         -
Note: ADDD takes 2 cycles, so no change.

Scoreboard Example Cycle 16
Instruction status
Instruction  Dest  j    k    Issue  Read ops  Exec complete  Write result
LD           F6    34+  R2   1      2         3              4
LD           F2    45+  R3   5      6         7              8
MULTD        F0    F2   F4   6      9
SUBD         F8    F6   F2   7      9         11             12
DIVD         F10   F0   F6   8
ADDD         F6    F8   F2   13     14        16

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
3     Mult1    Yes   Mult  F0   F2  F4  -        -        No   No
-     Mult2    No
0     Add      Yes   Add   F6   F8  F2  -        -        No   No
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
16     Mult1  -        -   Add      -    Divide  -         -
ADDD completes execution, but MULTD and DIVD go on.

Scoreboard Example Cycle 17
Instruction status
Instruction  Dest  j    k    Issue  Read ops  Exec complete  Write result
LD           F6    34+  R2   1      2         3              4
LD           F2    45+  R3   5      6         7              8
MULTD        F0    F2   F4   6      9
SUBD         F8    F6   F2   7      9         11             12
DIVD         F10   F0   F6   8
ADDD         F6    F8   F2   13     14        16

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
2     Mult1    Yes   Mult  F0   F2  F4  -        -        No   No
-     Mult2    No
-     Add      Yes   Add   F6   F8  F2  -        -        No   No
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
17     Mult1  -        -   Add      -    Divide  -         -
ADDD stalls: it cannot write back because of the WAR hazard with DIVD, which has not yet read F6. MULTD and DIVD continue.
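The WAR stall in cycle 17 follows directly from the Write result wait condition, ∀f((Fj(f) ≠ Fi(FU) or Rj(f) = No) & (Fk(f) ≠ Fi(FU) or Rk(f) = No)). The sketch below evaluates that condition on the cycle 17 functional unit status. The function name and dict layout are illustrative assumptions, and `after_divd_read` is a simplified hypothetical state that only clears Divide's Rk flag, as happens once DIVD reads its operands.

```python
def can_write_result(fu, units):
    """True when no busy unit still has to read the register that `fu` will write (no WAR hazard)."""
    dest = units[fu]["Fi"]
    for f in units.values():
        if not f["Busy"]:
            continue
        # An operand that is ready but not yet read (R flag = Yes) must not be overwritten.
        if (f["Fj"] == dest and f["Rj"]) or (f["Fk"] == dest and f["Rk"]):
            return False
    return True

# Busy units in cycle 17 (Rj/Rk: True = Yes, False = No).
cycle_17 = {
    "Mult1":  {"Busy": True, "Fi": "F0",  "Fj": "F2", "Fk": "F4", "Rj": False, "Rk": False},
    "Add":    {"Busy": True, "Fi": "F6",  "Fj": "F8", "Fk": "F2", "Rj": False, "Rk": False},
    "Divide": {"Busy": True, "Fi": "F10", "Fj": "F0", "Fk": "F6", "Rj": False, "Rk": True},
}
print(can_write_result("Add", cycle_17))  # prints False

# Hypothetical later state: DIVD has read its operands, so its Rk flag is cleared.
after_divd_read = {**cycle_17, "Divide": {**cycle_17["Divide"], "Rk": False}}
print(can_write_result("Add", after_divd_read))  # prints True
```

Divide still holds F6 as a ready, unread operand (Rk = Yes), so the Add unit may not overwrite F6 yet.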

Scoreboard Example Cycle 18
Instruction status
Instruction  Dest  j    k    Issue  Read ops  Exec complete  Write result
LD           F6    34+  R2   1      2         3              4
LD           F2    45+  R3   5      6         7              8
MULTD        F0    F2   F4   6      9
SUBD         F8    F6   F2   7      9         11             12
DIVD         F10   F0   F6   8
ADDD         F6    F8   F2   13     14        16

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
1     Mult1    Yes   Mult  F0   F2  F4  -        -        No   No
-     Mult2    No
-     Add      Yes   Add   F6   F8  F2  -        -        No   No
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
18     Mult1  -        -   Add      -    Divide  -         -
MULTD and DIVD continue.

Scoreboard Example Cycle 19
Instruction status
Instruction  Dest  j    k    Issue  Read ops  Exec complete  Write result
LD           F6    34+  R2   1      2         3              4
LD           F2    45+  R3   5      6         7              8
MULTD        F0    F2   F4   6      9         19
SUBD         F8    F6   F2   7      9         11             12
DIVD         F10   F0   F6   8
ADDD         F6    F8   F2   13     14        16

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
0     Mult1    Yes   Mult  F0   F2  F4  -        -        No   No
-     Mult2    No
-     Add      Yes   Add   F6   F8  F2  -        -        No   No
-     Divide   Yes   Div   F10  F0  F6  Mult1    -        No   Yes

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
19     Mult1  -        -   Add      -    Divide  -         -
MULTD completes execution after 10 cycles.

Scoreboard Example Cycle 20
Instruction status
Instruction  Dest  j    k    Issue  Read ops  Exec complete  Write result
LD           F6    34+  R2   1      2         3              4
LD           F2    45+  R3   5      6         7              8
MULTD        F0    F2   F4   6      9         19             20
SUBD         F8    F6   F2   7      9         11             12
DIVD         F10   F0   F6   8
ADDD         F6    F8   F2   13     14        16

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      Yes   Add   F6   F8  F2  -        -        No   No
-     Divide   Yes   Div   F10  F0  F6  -        -        Yes  Yes

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
20     -      -        -   Add      -    Divide  -         -
MULTD completes and writes to F0.

Scoreboard Example Cycle 21
Instruction status
Instruction  Dest  j    k    Issue  Read ops  Exec complete  Write result
LD           F6    34+  R2   1      2         3              4
LD           F2    45+  R3   5      6         7              8
MULTD        F0    F2   F4   6      9         19             20
SUBD         F8    F6   F2   7      9         11             12
DIVD         F10   F0   F6   8      21
ADDD         F6    F8   F2   13     14        16

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      Yes   Add   F6   F8  F2  -        -        No   No
-     Divide   Yes   Div   F10  F0  F6  -        -        No   No

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
21     -      -        -   Add      -    Divide  -         -
Now DIVD reads its operands because F0 is available.

Scoreboard Example Cycle 22
Instruction status
Instruction  Dest  j    k    Issue  Read ops  Exec complete  Write result
LD           F6    34+  R2   1      2         3              4
LD           F2    45+  R3   5      6         7              8
MULTD        F0    F2   F4   6      9         19             20
SUBD         F8    F6   F2   7      9         11             12
DIVD         F10   F0   F6   8      21
ADDD         F6    F8   F2   13     14        16             22

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   Yes   Div   F10  F0  F6  -        -        No   No

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
22     -      -        -   -        -    Divide  -         -
ADDD writes its result because the WAR hazard is removed.

Scoreboard Example Cycle 61
Instruction status
Instruction  Dest  j    k    Issue  Read ops  Exec complete  Write result
LD           F6    34+  R2   1      2         3              4
LD           F2    45+  R3   5      6         7              8
MULTD        F0    F2   F4   6      9         19             20
SUBD         F8    F6   F2   7      9         11             12
DIVD         F10   F0   F6   8      21        61
ADDD         F6    F8   F2   13     14        16             22

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
-     Divide   Yes   Div   F10  F0  F6  -        -        No   No

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
61     -      -        -   -        -    Divide  -         -
DIVD completes execution.

Scoreboard Example Cycle 62
Instruction status
Instruction  Dest  j    k    Issue  Read ops  Exec complete  Write result
LD           F6    34+  R2   1      2         3              4
LD           F2    45+  R3   5      6         7              8
MULTD        F0    F2   F4   6      9         19             20
SUBD         F8    F6   F2   7      9         11             12
DIVD         F10   F0   F6   8      21        61             62
ADDD         F6    F8   F2   13     14        16             22

Functional unit status
Time  Name     Busy  Op    Fi   Fj  Fk  Qj       Qk       Rj   Rk
-     Integer  No
-     Mult1    No
-     Mult2    No
-     Add      No
0     Divide   No

Register result status
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
62     -      -        -   -        -    -       -         -
Execution is finished.
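The final instruction status table can be recomputed from the scoreboard rules with a short program. This is a sketch under assumptions, not a cycle-accurate model: each stage takes effect one cycle after the event it waits for, every constraint in this example points to an earlier instruction (so one pass in program order is enough), and register-file port conflicts are ignored. The names `LATENCY`, `UNIT_CLASS`, `UNITS`, and `scoreboard_times` are made up for illustration.

```python
LATENCY = {"LD": 1, "MULTD": 10, "SUBD": 2, "DIVD": 40, "ADDD": 2}  # cycles, from the slides
UNIT_CLASS = {"LD": "Integer", "MULTD": "Mult", "SUBD": "Add", "ADDD": "Add", "DIVD": "Divide"}
UNITS = {"Integer": 1, "Mult": 2, "Add": 1, "Divide": 1}            # units per class

def scoreboard_times(program):
    """program: list of (op, dest, sources). Returns (issue, read, complete, write) per instruction."""
    free_at = {cls: [0] * n for cls, n in UNITS.items()}  # write cycle of each unit's last user
    times = []
    for i, (op, dest, sources) in enumerate(program):
        earlier = list(zip(program[:i], times))
        # Issue: in order, a unit of the right class is free, no pending write to dest (WAW).
        units = free_at[UNIT_CLASS[op]]
        u = units.index(min(units))
        prev_issue = times[-1][0] if times else 0
        issue = max([prev_issue, units[u]] +
                    [t[3] for (o, d, s), t in earlier if d == dest]) + 1
        # Read operands: every earlier writer of a source has written its result (RAW).
        read = max([issue] + [t[3] for (o, d, s), t in earlier if d in sources]) + 1
        complete = read + LATENCY[op]
        # Write result: every earlier reader of dest has read its operands (WAR).
        write = max([complete] + [t[1] for (o, d, s), t in earlier if dest in s]) + 1
        units[u] = write
        times.append((issue, read, complete, write))
    return times

program = [
    ("LD",    "F6",  ["R2"]),
    ("LD",    "F2",  ["R3"]),
    ("MULTD", "F0",  ["F2", "F4"]),
    ("SUBD",  "F8",  ["F6", "F2"]),
    ("DIVD",  "F10", ["F0", "F6"]),
    ("ADDD",  "F6",  ["F8", "F2"]),
]
for (op, dest, sources), t in zip(program, scoreboard_times(program)):
    print(op, dest, t)
```

For the six instructions above this yields the same issue, read, completion, and write cycles as the cycle 62 table, including the ADDD write at cycle 22 caused by the WAR hazard on F6.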