ee457_Lab6_Part4_r3.fm 7/22/07
EE457 Lab #6 / part 4 1 / 24C Copyright 2006 Gandhi Puvvada
1 [Based on question #4.1 of Summer 95 Final] Pipelined Ripple-Carry Adder: Given below is an arrangement where a 4-bit register file is to be used with the pipelined ripple-carry adder discussed in your class notes. The register file has 8 registers, two read ports, and one write port. We need to perform only two instructions using this set-up: an ADD and a NOP (No-Operation). In the ADD, you add two source registers and store the result in the destination register. In the NOP, it does not matter whether you add or not, but you should NOT STORE any result. Here we are NOT designing any HDU (Hazard Detection Unit) or FU (Forwarding Unit) to deal with data dependencies. Let us assume that the compiler is responsible for inserting NOPs to take care of any dependencies. The instructions are 10 bits long and the formats are given below. The single-bit opcode is a "1" for ADD and a "0" for NOP.
Instructions keep coming into the IF/ID register on every clock. You are not responsible for instruction fetching.
Complete the datapath and control on the next page. Mark the sizes of all the stage registers. Control bits can be carried along with data in the stage registers. Here we are ignoring the final carry C4 and storing the 4-bit result. Do NOT be misled by Miss Bruin’s design below!
Instruction          Opcode   rs = Source Reg 1   rt = Source Reg 2   rd = Destination Reg
size of the fields   1 bit    3 bits              3 bits              3 bits
add rd, rs, rt       1        rs2 rs1 rs0         rt2 rt1 rt0         rd2 rd1 rd0
nop                  0        rs2 rs1 rs0         rt2 rt1 rt0         rd2 rd1 rd0
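The instruction format and the ADD/NOP behavior described above can be sketched as a small functional (non-pipelined) reference model. This is only an illustrative sketch: it assumes the opcode is the most significant bit of the 10-bit word, followed by rs, rt, and rd in that order, and the names `Instr`, `decode`, `ripple_add4`, `execute`, and `encode_add` are hypothetical helpers that do not come from the lab materials. It says nothing about stage registers or timing.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    opcode: int  # 1 = ADD, 0 = NOP
    rs: int      # source register 1 (0..7)
    rt: int      # source register 2 (0..7)
    rd: int      # destination register (0..7)


def decode(word):
    """Split a 10-bit word into opcode | rs | rt | rd (assumed MSB-first order)."""
    if not 0 <= word < (1 << 10):
        raise ValueError("instruction must fit in 10 bits")
    return Instr(
        opcode=(word >> 9) & 0x1,
        rs=(word >> 6) & 0x7,
        rt=(word >> 3) & 0x7,
        rd=word & 0x7,
    )


def encode_add(rd, rs, rt):
    """Hypothetical helper: build the 10-bit word for 'add rd, rs, rt'."""
    return (1 << 9) | (rs << 6) | (rt << 3) | rd


def ripple_add4(a, b):
    """Add two 4-bit values one full adder at a time; the final carry C4 is dropped."""
    carry = 0
    result = 0
    for i in range(4):
        a_bit = (a >> i) & 1
        b_bit = (b >> i) & 1
        sum_bit = a_bit ^ b_bit ^ carry
        carry = (a_bit & b_bit) | (carry & (a_bit ^ b_bit))
        result |= sum_bit << i
    return result


def execute(regs, word):
    """Apply one instruction to an 8-entry list of 4-bit register values."""
    ins = decode(word)
    if ins.opcode == 1:  # ADD stores the 4-bit sum
        regs[ins.rd] = ripple_add4(regs[ins.rs], regs[ins.rt])
    # NOP: whatever the adder computes, nothing is stored
```

A model like this can serve as a golden reference when checking a completed pipelined datapath: for a NOP-padded program, the register file contents after the last write-back should match what `execute` produces instruction by instruction.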
[Figure: Miss Bruin’s Design. A pipelined datapath with stages IF, ID, EX1, EX2, EX3, EX4, WB. The IF/ID register (Size = 10 bit) holds opcode, rs2-rs0, rt2-rt0, rd2-rd0 (rs = Source Reg 1, rt = Source Reg 2, rd = Destination Reg) and feeds the REGISTER FILE: Read_Address_1 (R1A2-R1A0), Read_Address_2 (R2A2-R2A0), Write_Address (WA2-WA0), Read_Data_1 (R1D3-R1D0), Read_Data_2 (R2D3-R2D0), Write_Data (WD3-WD0), WRITE, CLK (SYS_CLK). Four full adders (inputs A, B, Ci; outputs Co, S) are placed in EX1 through EX4, separated by the stage registers ID/EX1, EX1/EX2, EX2/EX3, EX3/EX4, and EX4/WB. Stage register sizes marked in the figure, in extraction order: 12 bits, 12 bits, 11 bits, 10 bits, 8 bits.]
EE457 Lab 6 Part 4 Revised by Gandhi Puvvada and Wei-jen Hsu Based on ee457_Lab6_Part4.fm of 10/15/04 by Gandhi Puvvada
List what major design errors you corrected in Miss Bruin’s design. __________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
[Figure: Datapath and control to be completed. Stages IF, ID, EX1, EX2, EX3, EX4, WB. The IF/ID register (Size = 10 bit) holds opcode, rs2-rs0, rt2-rt0, rd2-rd0. The REGISTER FILE has Read_Address_1 (R1A2-R1A0), Read_Address_2 (R2A2-R2A0), Write_Address (WA2-WA0), Read_Data_1 (R1D3-R1D0), Read_Data_2 (R2D3-R2D0), Write_Data (WD3-WD0), WRITE, and CLK (SYS_CLK). Four full adders (inputs A, B, Ci; outputs Co, S) and unconnected D/Q flip-flops are provided. Stage register sizes to be filled in: ID/EX1 Size = ____, EX1/EX2 Size = ____, EX2/EX3 Size = ____, EX3/EX4 Size = ____, EX4/WB Size = ____.]
rs = Source Reg 1, rt = Source Reg 2, rd = Destination Reg
2 [Based on question 5 of Summer 2003 Midterm and question 8 of Spring 1994 Final] Pipeline Design (Stalling / Flushing / Forwarding):
2.1 Bubbles are produced ________________________________________________________ (in stalling only/in flushing only/in stalling as well as in flushing/in neither stalling nor flushing).
2.2 In the early-branch design of the pipeline CPU (current lab6 based on 3rd ed.), flushing and stalling ___________________ (never occur in the same clock cycle/may sometimes occur in the same clock cycle/always occur in the same clock cycle).
In a late-branch design (based on the first edition), if the branch below is successful, do flushing and stalling both occur together or one would prevent the other? Explain.
beq $1, $2, TARGET
lw $4, 40($5)
or $8, $4, $6
________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
2.3 There are 9 (1+4+2+2) control signals generated by the control unit. Eight of these (8 out of 9) are going from the ID stage to the EX stage. Do you need to convert all the 8 signals to zero when you stall an instruction in the ID stage? Please explain below.
2.4 To ___________ (stall/flush) an instruction in ID stage, you inhibit (prevent) updating of the following register(s). (circle as many of the following as you wish) PC , IF/ID , ID/EX , EX/MEM , MEM/WB You never inhibit (prevent) updating of a stage register if you are currently _______________ _______________________________________________________________________ (flushing / stalling / can not fill this blank with either of the previous two choices).
2.5 In this question we consider the late-branch design of the first edition with one HDU in ID stage and one FU in EX stage, and an internally forwarding register file.
In the answers below, if there is a stalling, state the reason for stalling and which instruction(s) in which stage(s) are being stalled. If there is a forwarding, state the reason and also state which instruction from which stage is offering forwarding help to which instruction in which stage.
All the three streams use the same 3 instructions in different order.

Stream #1            Stream #2            Stream #3
add $3, $3, $1;      lw  $3, 40($5);      lw  $3, 40($5);
or  $6, $5, $4;      or  $6, $5, $4;      add $3, $3, $1;
lw  $3, 40($5);      add $3, $3, $1;      or  $6, $5, $4;

For stream #1 above, the following occur(s): (circle all correct choices)
(i) hazard detection and stalling by HDU (ii) forwarding by FU
(iii) internal forwarding in the reg. file (iv) none of these
Remark: ______________________________________________________________
______________________________________________________________________
______________________________________________________________________

For stream #2 above, the following occur(s): (circle all correct choices)
(i) hazard detection and stalling by HDU (ii) forwarding by FU
(iii) internal forwarding in the reg. file (iv) none of these
Remark: ______________________________________________________________
______________________________________________________________________
______________________________________________________________________

For stream #3 above, the following occur(s): (circle all correct choices)
(i) hazard detection and stalling by HDU (ii) forwarding by FU
(iii) internal forwarding in the reg. file (iv) none of these
Remark: ______________________________________________________________
______________________________________________________________________
______________________________________________________________________

2.5.1 Now reconsider the above three streams in the context of the early-branch design based on the current lab 6. Explain any differences or striking resemblances to your three answers above.
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
2.6 In this question we consider the early-branch design of our current lab 6 with two HDUs (HDU and HDU_Br) and two FUs (FU and FU_Br). Of course the register file is an internally forwarding register file. Identify the dependencies in the following instruction streams and how they should be resolved:
Stream #1                 Stream #2
add $2, $2, $2;           add $2, $3, $4;
sub $1, $2, $3;           sub $5, $6, $7;
beq $2, $0, loop1;        beq $5, $2, loop1;

Stream #3                 Stream #4
lw  $4, 40($3);           lw  $4, 40($3);
beq $4, $0, loop1;        sub $5, $6, $7;
                          beq $4, $0, loop1;

For the stream #1 above, the following occur(s): (circle all correct choices)
(i) HDU_Br initiated stalling (ii) HDU initiated stalling
(iii) forwarding by FU_Br (iv) forwarding by FU
(v) internal forwarding in the reg. file (vi) none of these
Remark: ______________________________________________________________
______________________________________________________________________
______________________________________________________________________

For the stream #2 above, the following occur(s): (circle all correct choices)
(i) HDU_Br initiated stalling (ii) HDU initiated stalling
(iii) forwarding by FU_Br (iv) forwarding by FU
(v) internal forwarding in the reg. file (vi) none of these
Remark: ______________________________________________________________
______________________________________________________________________
______________________________________________________________________

For the stream #3 above, the following occur(s): (circle all correct choices)
(i) HDU_Br initiated stalling (ii) HDU initiated stalling
(iii) forwarding by FU_Br (iv) forwarding by FU
(v) internal forwarding in the reg. file (vi) none of these
Remark: ______________________________________________________________
______________________________________________________________________
______________________________________________________________________

For the stream #4 above, the following occur(s): (circle all correct choices)
(i) HDU_Br initiated stalling (ii) HDU initiated stalling
(iii) forwarding by FU_Br (iv) forwarding by FU
(v) internal forwarding in the reg. file (vi) none of these
Remark: ______________________________________________________________
______________________________________________________________________
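As a first step before deciding how a dependency is resolved, the read-after-write (RAW) dependencies in a stream can be listed mechanically. The following is a minimal sketch, assuming only the instruction forms used in this lab (`add`, `sub`, `or`, `xor`, `and`, `lw`, `beq`); the function names `parse` and `raw_dependencies` are hypothetical. It reports which earlier instruction produces each register a later instruction reads, and deliberately says nothing about stalling or forwarding.

```python
import re


def parse(line):
    """Return (dest, sources) as register numbers for one instruction."""
    op, rest = line.strip().rstrip(";").split(None, 1)
    regs = [int(r) for r in re.findall(r"\$(\d+)", rest)]
    if op in ("add", "sub", "or", "xor", "and"):
        return regs[0], regs[1:3]      # op rd, rs, rt
    if op == "lw":
        return regs[0], regs[1:2]      # lw rt, offset(rs)
    if op == "beq":
        return None, regs[0:2]         # beq rs, rt, label (no destination)
    raise ValueError("unsupported instruction: " + op)


def raw_dependencies(stream):
    """List (producer_index, consumer_index, register) for each RAW dependency.

    Only the nearest earlier producer of each source register is reported,
    and register $0 is ignored because it always reads as zero.
    """
    parsed = [parse(line) for line in stream]
    deps = []
    for j, (_, sources) in enumerate(parsed):
        for reg in dict.fromkeys(sources):   # de-duplicate, keep order
            if reg == 0:
                continue
            for i in range(j - 1, -1, -1):
                if parsed[i][0] == reg:
                    deps.append((i, j, reg))
                    break
    return deps
```

The distance `consumer_index - producer_index` is what determines which stage the producer occupies when the consumer reaches the stage where it needs the value.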
______________________________________________________________________________________________________________________________________________
Summary: In the lab #6 design for the early-branch, we stall the branch instruction for _________ (0/1/2/3/arbitrary) clock cycles if it is dependent on an R-type instruction __________________ (in EX stage / in MEM stage). We stall the branch instruction for _________ (0/1/2/3/arbitrary) clock cycles if it is dependent on an LW instruction in EX stage (i.e. beq is dependent on lw immediately ahead of it) . We stall the branch instruction for _________ (0/1/2/3/arbitrary) clock cycles if it is dependent on an LW instruction in MEM stage.
The result of an R-type instruction is available at the end of _______ (EX/MEM/WB) stage, and the result of an LW instruction is available at the end of _______ (EX/MEM/WB) stage. However, we choose not to forward these results (to beq) from the same stage where they are generated because _________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
2.7 Whenever a load word (lw) instruction is followed by a dependent instruction (dependent on the word being loaded), the HDU detects the hazard and inserts a bubble. This being the case, to reduce the hardware, the compiler (a simple-minded design of a compiler) can be asked to put a NOP (no operation instruction) between such instructions without losing any additional performance. TRUE / FALSE
In the case of an early-branch design, can we use the same principle for control hazards with conditional branch instructions, by asking the compiler to put one NOP after every conditional branch instruction to avoid the hardware associated with flushing the instruction in the IF stage? First tell us whether this suggestion is feasible (meaning, it will produce correct output for the program). If it is feasible, do you change (lose or gain) performance by doing so? Compare with the above case of lw.
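For reference, the load-use check that an HDU performs in the ID stage can be written as a single Boolean condition. The sketch below follows the usual textbook form of the condition; the parameter names are illustrative assumptions, and the actual lab HDU may qualify the comparison further (for example, by whether the instruction in ID really reads rt).

```python
def load_use_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Return True when the instruction in ID must be stalled for one clock.

    id_ex_mem_read: 1 if the instruction currently in EX is a lw
    id_ex_rt:       destination register number of that lw
    if_id_rs/rt:    source register fields of the instruction in ID
    """
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)
```

For example, with `lw $1, 60($2)` in EX and `add $4, $1, $6` in ID the condition is true, while with `sub $10, $11, $12` in ID it is false.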
2.8 In this question we focus on the specific point of tapping of the branch control signal in the ID stage for (a) ANDing with the equality inference and (b) for HDU_Br to produce STALL_BEQ. Reproduced below is the relevant extract of the block diagram. In particular, note that both the AND gate in ID-stage and the HDU_Br take the branch control signal from the output of control unit (Point B in the figure).
Mr. Bruin claims that he discovered a problem in this design. He argues that the branch control signal for the AND gate should be taken after the flush mux (Point C) in the design to avoid erroneous branching. For example consider the following stream:
lw  $4, 40($3) ;
beq $4, $0, loop1 ;

The BEQ instruction should be stalled for 2 clock cycles to resolve its dependency on the LW. However, if register $4 contains 0 before the execution of LW, the AND gate sees a 1 on both of its inputs and would take the branch based on the wrong value of $4! So Mr. Bruin concludes that a false branch will occur. Comment on Mr. Bruin’s discovery.
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

He further offers a solution: moving the tapping of the branch control signal from point B to point C instead. Evaluate the proposed solution by answering the following:
It is _______________________________ (a must / a feasible change but does not make any difference / a feasible change that improves the design / a sin) to move the tapping of branch control signal for the AND gate from point B to point C. Explain:________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
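[Figure: Extract of the ID-stage block diagram between IF/ID and ID/EX. Labels: opcode (point A), Control, Branch at the control unit output (point B), flush mux with inputs 0 and 1 and a constant 0 input, its output (point C) carrying the EX, MEM, and WB control bits, the equality comparator (=) and AND gate driving (PC) control, HDU_Br producing STALL_BEQ, the hazard detection unit producing STALL_LW, and the combined STALL signal.]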
It is _______________________________ (a must / a feasible change but does not make any difference / a feasible change that improves the design / a sin) to move the tapping of branch control signal for the HDU_Br from point B to point C. Explain:
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________
______________________________________________________________________

Another person suggests that instead of waiting for the control unit to generate the branch control signal, the OPCODE field can be re-coded so that we can identify the BEQ instruction by inspection of a single bit among the six-bit OPCODE field. With this modification, we can bypass the control unit and get the branch control signal from point A in the figure. Is this a good suggestion or a bad one? Are there any other things we should take care of? Consider the following control sequence. Notice that in case the first BEQ is taken, the second BEQ should be flushed.

beq $0, $1, loop1 ;
beq $4, $2, loop2 ;
________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
2.9 Take a closer look at the muxes used to provide forwarding help in the EX stage, reproduced below on the left hand side:
We observe that the two muxes on the left are arranged in the particular order so that the forwarding help with higher priority (help from MEM stage) is fed into the second mux. Is this ordering significant? If the order of the muxes is reversed (as given in the "Modified design" on the right-hand side), can it be made to work? If so, what aspects/precautions need to be taken into consideration in the design of the FU (forwarding unit)? Answer the following questions:
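[Figure: Two cascaded 2-to-1 forwarding muxes (inputs labeled 0 and 1) for the rs operand in the EX stage, shown in two versions.
Original lab design: the first mux is controlled by FW_RS_WB and chooses between the original read data and the forwarded help from WB stage; the second mux is controlled by FW_RS_MEM and chooses between the first mux output and the forwarded help from MEM stage.
Modified design: the first mux is controlled by FW_RS_MEM_new and chooses between the original read data and the forwarded help from MEM stage; the second mux is controlled by FW_RS_WB_new and chooses between the first mux output and the forwarded help from WB stage.]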
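The two mux arrangements can be modeled in a few lines to experiment with select-signal combinations. This is a sketch under stated assumptions: a select value of 1 is assumed to pick the forwarded help and 0 the other input, and the function names `mux2`, `original_design`, and `modified_design` are hypothetical. It models only the muxes, not the forwarding unit that drives them.

```python
def mux2(sel, in0, in1):
    """2-to-1 mux: returns in1 when sel is 1, otherwise in0."""
    return in1 if sel else in0


def original_design(read_data, wb_help, mem_help, fw_rs_wb, fw_rs_mem):
    """WB help enters the first mux, MEM help enters the second mux."""
    first = mux2(fw_rs_wb, read_data, wb_help)
    return mux2(fw_rs_mem, first, mem_help)


def modified_design(read_data, wb_help, mem_help, fw_rs_mem_new, fw_rs_wb_new):
    """Reversed order: MEM help enters the first mux, WB help the second."""
    first = mux2(fw_rs_mem_new, read_data, mem_help)
    return mux2(fw_rs_wb_new, first, wb_help)
```

Enumerating the four select combinations for each function, with distinct placeholder values for the three data inputs, shows which source reaches the ALU in each case and can be used to check the answers to the questions that follow.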
In the following instruction sequences, we need the forwarded value for $3 ($rs). What should the 2 control signals be?
add $10, $11, $12 ;
add $3 , $3 , $3 ;
or  $6 , $3 , $4 ;
In the original design, FW_RS_WB = ____ (0/1/X), FW_RS_MEM = ____ (0/1/X)
In the modified design, FW_RS_WB = ____ (0/1/X), FW_RS_MEM = ____ (0/1/X)

add $3 , $3 , $3 ;
add $10, $11, $12 ;
or  $6 , $3 , $4 ;
In the original design, FW_RS_WB = ____ (0/1/X), FW_RS_MEM = ____ (0/1/X)
In the modified design, FW_RS_WB = ____ (0/1/X), FW_RS_MEM = ____ (0/1/X)

add $3 , $3 , $3 ;
add $3 , $3 , $3 ;
or  $6 , $3 , $4 ;
In the original design, FW_RS_WB = ____ (0/1/X), FW_RS_MEM = ____ (0/1/X)
In the modified design, FW_RS_WB = ____ (0/1/X), FW_RS_MEM = ____ (0/1/X)
From the observations made in above instruction sequences, can we generate the 2 forwarding control signals independent of each other (a) in the original design and (b) in the modified design?
__________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
2.10 [Based on Question #6 of Fall 2006 midterm] FU_Br and FU in a 5-stage early branch design:

[Figure: 5-stage early-branch pipeline (IF, ID, EX, MEM, WB) showing PC control, Instr. memory, Reg file, Data memory, HDU, HDU_Br, FU_Br, FU, and the BRANCH / Zero logic in the ID stage. One set of forwarding muxes in the EX stage is marked "Remove??".]

i    add $1, $2, $3           i    add $1, $2, $3
i+1  xor $11, $12, $13        i+1  xor $11, $12, $13
i+2  beq $4, $1, loop         i+2  sub $4, $1, $5

Your friend says that the MEM hazard cases shown in the above two streams are attended to by the FU_Br in ID stage. Agree / Disagree.
He further argues that one set of forwarding muxes in the EX stage, which redundantly attends to the very same hazard (the MEM hazard between a dependent instruction in EX stage and a donor instruction in WB stage), can be removed. Agree / Disagree. Explain with a suitable example:
[Blank time-space diagram: stage columns IF, ID, EX, M, WB; rows CC1, CC2, CC3.]
3 [Based on Question #4 of Fall 1995 Final] Modified Pipeline Design (7-stage pipeline) :
Pipelined CPU: A variation of the 5-stage pipeline CPU is the following 7-stage pipeline CPU. Here we assume that the memory accesses take two clocks - one for TLB access and the second for cache access. Hence we have IF1 and IF2 in the place of IF stage and similarly MEM1 and MEM2 in the place of MEM stage. Many details are omitted in the simple block diagrams given below. As before, we always try to resolve dependency problems through forwarding to the extent possible and will resort to stalling if forwarding cannot help.
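To visualize how the extra stages stretch the pipeline, an ideal time-space diagram (no stalls, no flushes) can be generated for either stage list. This is a minimal sketch; the names `STAGES_5`, `STAGES_7`, `time_space`, and `render` are hypothetical, and the model deliberately ignores hazards, so any bubbles still have to be worked out by hand.

```python
STAGES_5 = ["IF", "ID", "EX", "MEM", "WB"]
STAGES_7 = ["IF1", "IF2", "ID", "EX", "MEM1", "MEM2", "WB"]


def time_space(instrs, stages):
    """Return (name, cells) rows; cells[c] is the stage occupied in clock c."""
    n_cycles = len(instrs) + len(stages) - 1
    rows = []
    for i, name in enumerate(instrs):
        cells = [""] * n_cycles
        for s, stage in enumerate(stages):
            cells[i + s] = stage   # instruction i enters the pipe in clock i
        rows.append((name, cells))
    return rows


def render(rows):
    """Format the rows as a plain-text table with CC1, CC2, ... columns."""
    n_cycles = len(rows[0][1])
    header = "instr".ljust(8) + "".join(
        ("CC" + str(c + 1)).ljust(6) for c in range(n_cycles)
    )
    lines = [header]
    for name, cells in rows:
        lines.append(name.ljust(8) + "".join(cell.ljust(6) for cell in cells))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(time_space(["lw", "add"], STAGES_7)))
```

Comparing the output for `STAGES_5` and `STAGES_7` shows how many clocks separate the stage where a producer's result becomes available from the stage where a following instruction needs it.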
[Figure: Late Branch. 7-stage pipelined version of the late-branch design of the 1st edition. Stages IF1, IF2, ID, EX, MEM1, MEM2, WB, with PC control, Instr. TLB, Instr. cache, Reg file, HDU, FU, Data TLB, Data cache, and the BRANCH / Zero logic.]

[Figure: Early Branch. 7-stage pipelined version of the early-branch design of the 3rd ed. and our lab 6. Stages IF1, IF2, ID, EX, MEM1, MEM2, WB, with PC control, Instr. TLB, Instr. cache, Reg file, HDU, HDU_Br, FU_Br, FU, Data TLB, Data cache, and the BRANCH / Zero logic.]

Forwarding mux pairs in the figures are labeled pair #1, pair #2, and pair #3. One annotation reads: "Remove mux Pair 2? See Q 3.1. (Treat it as removed for Q3.2)".
3.1 The two pairs of forwarding muxes in ID stage (in the early branch design) provide forwarding help from R-type instructions in MEM1 and MEM2 to beq (and also other instructions) in ID stage. Let us investigate whether we really need 3 pairs of forwarding muxes in the EX stage. These muxes (#1, #2, and #3) provide forwarding help to a dependent instruction in the EX stage from (a) an R-type or lw instruction in WB stage, (b) an R-type instruction in MEM2 stage, and (c) an R-type instruction in MEM1 stage respectively (in that specific order to implement the needed priority). Mr. Trojan argues that the mux pair #2 can be removed but not the mux pair #1. Explain.
__________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
3.2 Compare the original 5-stage late-branch and early-branch pipelines with these 7-stage versions by answering questions in the tables on the next 7 pages (sorry, it is a long question).
3.3 Flushing of the two instructions in the IF1 and IF2 stages in the case of the 7-stage pipeline:
Note: This part of the design is common to both branch implementations (late or early).
The flushing arrangement shown on the side is extracted from the earlier diagrams. As you can see, it is hardly complete. Two of your assistants submitted the following designs to you. You are asked to finalize this design. You can adopt any one of them as is, or take any one of them and modify it to your liking.
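[Figure: Incomplete flushing arrangement in the 7-stage pipeline. Stages IF1, IF2, ID with PC control, Instr. TLB, Instr. cache, and the BR1 signal.]

[Figure: Assistant #1’s design of flush. Same IF1, IF2, ID extract of the 7-stage pipeline with PC control, Instr. TLB, Instr. cache, and BR1, with RESET inputs on two stage registers.]

[Figure: Assistant #2’s design of flush. Same IF1, IF2, ID extract of the 7-stage pipeline with PC control, Instr. TLB, Instr. cache, and BR1, with RESET inputs on two stage registers.]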
Dependency of a R-type instruction on a load word instruction, stalling by HDU to resolve the dependency problem:

Design item:
i    lw  $1, 60($2)
i+1  add $4, $1, $6
Any bubbles? How many? Where are they inserted? Complete the Time-Space diagrams. This example is completed by us.
  In 5-stage late-branch:   Bubbles = ___1___ (0/1/2/3)
  In 5-stage early-branch:  Bubbles = ___1___ (0/1/2/3)
  In 7-stage late-branch:   Bubbles = ___2___ (0/1/2/3)
  In 7-stage early-branch:  Bubbles = ___2___ (0/1/2/3)
[Time-space diagrams for lw followed by add, completed for the four designs. Two of the diagrams carry the notes "as 5-stage late-branch" and "as 7-stage late-branch".]

Design item:
i    lw  $1, 60($2)
i+1  sub $10, $11, $12
i+2  add $4, $1, $6
Any bubbles? How many? Where are they inserted?
  In 5-stage late-branch:   Bubbles = _______ (0/1/2/3)
  In 5-stage early-branch:  Bubbles = _______ (0/1/2/3)
  In 7-stage late-branch:   Bubbles = _______ (0/1/2/3)
  In 7-stage early-branch:  Bubbles = _______ (0/1/2/3)
[Blank time-space diagrams for lw, sub, add, one per design.]

Design item:
How many comparators does the HDU (not HDU_Br) have? Where do the destination register addr. inputs to the comparators come from?
  In 5-stage late-branch:   # of comparators = _____  Destination reg. addr. input(s) come(s) from: __________
  In 5-stage early-branch:  # of comparators = _____  Destination reg. addr. input(s) come(s) from: __________
  In 7-stage late-branch:   # of comparators = _____  Destination reg. addr. input(s) come(s) from: __________
  In 7-stage early-branch:  # of comparators = _____  Destination reg. addr. input(s) come(s) from: __________

Design item:
Delay slots for lw: To avoid the use of HDU, how many delay slots should we declare for lw?
  In 5-stage late-branch:   # of Delay slots = ______
  In 5-stage early-branch:  # of Delay slots = ______
  In 7-stage late-branch:   # of Delay slots = ______
  In 7-stage early-branch:  # of Delay slots = ______
Dependency of a R-type instruction on another R-type instruction; Forwarding:

Design item:
i    add $5, $7, $9
i+1  xor $1, $2, $3
i+2  or  $10, $11, $12
i+3  sub $3, $5, $1
Explain forwarding to instruction (i+3).

In 5-stage late-branch:
sub receives latest $1 from xor when sub is in ______ stage and xor is in ______ stage under the control of ________________ (FU / internal forwarding in register file).
sub receives latest $5 from ________________ due to ________________ (FU / internal forwarding in register file).

In 5-stage early-branch:
sub receives latest $1 from xor first time when sub is in ______ stage and xor is in ______ stage under the control of ________________ (FU_Br / FU / internal forwarding in register file). It receives the same value again second time when sub is in ______ stage and xor is in ______ stage under the control of ________________ (FU_Br / FU / internal forwarding in register file).
sub receives latest $5 from ________________ due to ________________ (FU_Br / FU / internal forwarding in register file).

In 7-stage late-branch:
sub receives latest $1 from xor when sub is in ______ stage and xor is in ______ stage under the control of ________________ (FU / internal forwarding in register file).
sub receives latest $5 from add when sub is in ______ stage and add is in ______ stage under the control of ________________ (FU / internal forwarding in register file).

In 7-stage early-branch:
sub receives latest $1 from xor first time when sub is in ______ stage and xor is in ______ stage under the control of ________________ (FU_Br / FU). In the absence of mux pair #2, it ____________ (receives / doesn’t receive) the same value again second time after 1 clock.
sub receives latest $5 from add first time when sub is in ______ stage and add is in ______ stage under the control of ________________ (FU_Br / FU / internal forwarding in register file). It receives the same value again second time when sub is in ______ stage and add is in ______ stage under the control of ________________ (FU_Br / FU / internal forwarding in register file).
FU_Br, FU details:

Design item:
How many comparators does the forwarding unit in ID stage (FU_Br, not FU) have? How big are the forwarding muxes (n-bit wide m-to-1 mux)? How many? Where do the data inputs to the muxes come from?

  In 5-stage late-branch:   Not applicable

  In 5-stage early-branch:
  # of comparators in FU_Br = __________________
  Forwarding mux(es) in the A-leg of equality checker (size and number (which is same for the B-leg)) = ______________________________
  Data inputs for this/these come from ______________________________________________

  In 7-stage late-branch:   Not applicable

  In 7-stage early-branch:
  # of comparators in FU_Br = __________________
  Forwarding mux(es) in the A-leg of equality checker (size and number (which is same for the B-leg)) = ______________________________
  Data inputs for this/these come from ______________________________________________

Design item:
How many comparators does the forwarding unit in EX stage (FU, not FU_Br) have? How big are the forwarding muxes (n-bit wide m-to-1 mux)? How many? Where do the data inputs to the muxes come from?

  In 5-stage late-branch:
  # of comparators in FU = __________________
  Forwarding mux(es) in the A-leg of ALU (size and number (which is same for the B-leg)) = ______________________________
  Data inputs for this/these come from ______________________________________________

  In 5-stage early-branch:
  Same as 5-stage late-branch. TRUE / FALSE

  In 7-stage late-branch:
  # of comparators in FU = __________________
  Forwarding mux(es) in the A-leg of ALU (size and number (which is same for the B-leg)) = ______________________________
  Data inputs for this/these come from ______________________________________________

  In 7-stage early-branch:
  Note: Mux Pair #2 is removed.
  # of comparators in FU = __________________
  Forwarding mux(es) in the A-leg of ALU (size and number (which is same for the B-leg)) = ______________________________
  Data inputs for this/these come from ______________________________________________
Priority in FU and FU_Br: Note:
Columns: Design item; In 5-stage late-branch; In 5-stage early-branch; In 7-stage late-branch; In 7-stage early-branch

Design item: Priority in FU (FU, not FU_Br): Forwarding to a dependent instruction standing in EX stage. Opt to forward from the nearer rather than the farther.

The FU prefers to allow forwarding help from the ________ (MEM/WB) over ________ (MEM/WB).
Priority is implemented by placing the forwarding muxes receiving forwarding help from ________ (MEM/WB) upstream of the forwarding muxes receiving forwarding help from ________ (MEM/WB).

The FU prefers to allow forwarding help from the ________ (MEM1/MEM2/WB) over ________ (MEM1/MEM2/WB) as well as ________ (MEM1/MEM2/WB). Further ________________________.
Note: Mux Pair #2 is removed.
Priority is implemented by placing the forwarding muxes receiving forwarding help from ________ (MEM1/MEM2/WB) upstream of the forwarding muxes receiving forwarding help from ________ (MEM1/MEM2/WB).

Design item: Priority in FU_Br (FU_Br, not FU): Forwarding to a BEQ instruction standing in ID stage. Opt to forward from the nearer rather than the farther.

No priority needs to be implemented in FU_Br. TRUE / FALSE
Explain: ________________________

Priority is implemented by placing the forwarding muxes receiving forwarding help from ________ (EX/MEM1/MEM2/WB) upstream of the forwarding muxes receiving forwarding help from ________ (EX/MEM1/MEM2/WB).

Not applicable (marked in 2 cells of this table)
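
The rule stated in the row headings of this table, forward from the nearer producer rather than the farther one, can be sketched in software as a first-match search. This is only an illustration in Python, not the lab's hardware: the function name, the list layout, and the treatment of register 0 are assumptions made for the sketch. In the datapath itself the same effect comes from the ordering of the forwarding muxes, as the table says.

```python
from typing import Optional, Sequence


def pick_forwarding_source(src_reg: int,
                           senior_dests: Sequence[Optional[int]]) -> Optional[int]:
    """Return the index of the nearest senior instruction that writes src_reg.

    senior_dests[0] is the destination register of the instruction just ahead
    of the consumer, senior_dests[1] the one ahead of that, and so on.
    None marks an instruction that writes no register.
    Assumption: register 0 is never forwarded (MIPS $0 is hard-wired to zero).
    """
    if src_reg == 0:
        return None
    for index, dest in enumerate(senior_dests):
        if dest is not None and dest == src_reg:
            return index  # nearest match wins; farther matches are ignored
    return None  # no match: use the value read from the register file


# Both senior instructions write $2: the nearer one (index 0) is chosen.
print(pick_forwarding_source(2, [2, 2]))     # 0
# Only the farther instruction writes $2.
print(pick_forwarding_source(2, [None, 2]))  # 1
# No senior instruction writes $5.
print(pick_forwarding_source(5, [2, 2]))     # None
```

The list can be as long as the number of senior instructions that have not yet written the register file, so the same sketch covers both the 5-stage and the 7-stage cases.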
Dependency of a BEQ instruction on an R-type instruction; Stalling through HDU_Br, Forwarding through FU_Br/FU:
Columns: Design item; In 5-stage late-branch; In 5-stage early-branch; In 7-stage late-branch; In 7-stage early-branch

Design item:
i    beq $2, $4, Target
How many instructions following a successful branch are flushed?
In 5-stage late-branch: # of instructions that need to be flushed = ________
In 5-stage early-branch: # of instructions that need to be flushed = ________
In 7-stage late-branch: # of instructions that need to be flushed = ________
In 7-stage early-branch: # of instructions that need to be flushed = ________

Design item:
i    add $1, $2, $3
i+1  beq $1, $0, loop
How many clock cycles does the BEQ have to be stalled?
In 5-stage late-branch: # of clock cycles beq needs to be stalled = ________ beq receives latest $1 from add when beq is in ________ stage and add is in ________ stage under the control of ________ (FU/internal forwarding in register file).
In 5-stage early-branch: # of clock cycles beq needs to be stalled = ________ beq receives latest $1 from add when beq is in ________ stage and add is in ________ stage under the control of ________ (FU_Br/FU/internal forwarding in register file).
In 7-stage late-branch: # of clock cycles beq needs to be stalled = ________ beq receives latest $1 from add when beq is in ________ stage and add is in ________ stage under the control of ________ (FU/internal forwarding in register file).
In 7-stage early-branch: # of clock cycles beq needs to be stalled = ________ beq receives latest $1 from add when beq is in ________ stage and add is in ________ stage under the control of ________ (FU_Br/FU/internal forwarding in register file).

Design item:
i    add $1, $2, $3
i+1  xor $11, $12, $13
i+2  beq $1, $0, loop
How many clock cycles does the BEQ have to be stalled?
In 5-stage late-branch: # of clock cycles beq needs to be stalled = ________ beq receives latest $1 from add when beq is in ________ stage and add is in ________ stage under the control of ________ (FU/internal forwarding in register file).
In 5-stage early-branch: # of clock cycles beq needs to be stalled = ________ beq receives latest $1 from add when beq is in ________ stage and add is in ________ stage under the control of ________ (FU_Br/FU/internal forwarding in register file).
In 7-stage late-branch: # of clock cycles beq needs to be stalled = ________ beq receives latest $1 from add when beq is in ________ stage and add is in ________ stage under the control of ________ (FU/internal forwarding in register file).
In 7-stage early-branch: # of clock cycles beq needs to be stalled = ________ beq receives latest $1 from add when beq is in ________ stage and add is in ________ stage under the control of ________ (FU_Br/FU/internal forwarding in register file).
Dependency of a BEQ instruction on a lw instruction; Stalling through HDU_Br, Forwarding through FU_Br/FU:
Columns: Design item; In 5-stage late-branch; In 5-stage early-branch; In 7-stage late-branch; In 7-stage early-branch

Design item:
i    lw $1, $2(40)
i+1  beq $1, $0, loop
How many clock cycles does the BEQ have to be stalled?
In 5-stage late-branch: # of clock cycles beq needs to be stalled = ________ beq receives latest $1 from lw when beq is in ________ stage and lw is in ________ stage under the control of ________ (FU/internal forwarding in register file).
In 5-stage early-branch: # of clock cycles beq needs to be stalled = ________ beq receives latest $1 from lw when beq is in ________ stage and lw is in ________ stage under the control of ________ (FU_Br/FU/internal forwarding in register file).
In 7-stage late-branch: # of clock cycles beq needs to be stalled = ________ beq receives latest $1 from lw when beq is in ________ stage and lw is in ________ stage under the control of ________ (FU/internal forwarding in register file).
In 7-stage early-branch: # of clock cycles beq needs to be stalled = ________ beq receives latest $1 from lw when beq is in ________ stage and lw is in ________ stage under the control of ________ (FU_Br/FU/internal forwarding in register file).

Design item:
i    lw $1, $2(40)
i+1  add $6, $5, $4
i+2  beq $1, $0, loop
How many clock cycles does the BEQ have to be stalled?
In 5-stage late-branch: # of clock cycles beq needs to be stalled = ________ beq receives latest $1 from lw when beq is in ________ stage and lw is in ________ stage under the control of ________ (FU/internal forwarding in register file).
In 5-stage early-branch: # of clock cycles beq needs to be stalled = ________ beq receives latest $1 from lw when beq is in ________ stage and lw is in ________ stage under the control of ________ (FU_Br/FU/internal forwarding in register file).
In 7-stage late-branch: # of clock cycles beq needs to be stalled = ________ beq receives latest $1 from lw when beq is in ________ stage and lw is in ________ stage under the control of ________ (FU/internal forwarding in register file).
In 7-stage early-branch: # of clock cycles beq needs to be stalled = ________ beq receives latest $1 from lw when beq is in ________ stage and lw is in ________ stage under the control of ________ (FU_Br/FU/internal forwarding in register file).

Design item:
i    lw $1, $2(40)
i+1  add $6, $5, $4
i+2  or $16, $15, $14
i+3  beq $1, $0, loop
How many clock cycles does the BEQ have to be stalled?
In 5-stage late-branch: # of clock cycles beq needs to be stalled = ________ beq receives latest $1 from lw when beq is in ________ stage and lw is in ________ stage under the control of ________ (FU/internal forwarding in register file).
In 5-stage early-branch: # of clock cycles beq needs to be stalled = ________ beq receives latest $1 from lw when beq is in ________ stage and lw is in ________ stage under the control of ________ (FU_Br/FU/internal forwarding in register file).
In 7-stage late-branch: # of clock cycles beq needs to be stalled = ________ beq receives latest $1 from lw when beq is in ________ stage and lw is in ________ stage under the control of ________ (FU/internal forwarding in register file).
In 7-stage early-branch: # of clock cycles beq needs to be stalled = ________ beq receives latest $1 from lw when beq is in ________ stage and lw is in ________ stage under the control of ________ (FU_Br/FU/internal forwarding in register file).
Miscellaneous:
Columns: Design item; In 5-stage late-branch; In 5-stage early-branch; In 7-stage late-branch; In 7-stage early-branch

Design item: How many comparators does the HDU_Br have? Destination register addr.(s) come(s) to HDU_Br from .....

# of comparators in HDU_Br = ________
Dest. Reg. addr.(s) come(s) from ________

# of comparators in HDU_Br = ________
Dest. Reg. addr.(s) come(s) from ________

Design item: Though it is not desirable to “delay” the BEQ execution, how late in the pipeline can you execute the BEQ instr.?

The latest stage for executing BEQ is ________ (EX/MEM/WB).

The latest stage for executing BEQ is ________ (EX/MEM1/MEM2/WB).

Design item: The earliest a BEQ can be executed from is:

The earliest stage for executing BEQ is ________ (IF/ID/EX).

The earliest stage for executing BEQ is ________ (IF1/IF2/ID/EX).

Not applicable (marked in 6 cells of this table)
4 [Based on Question #5 of Summer 2004 Midterm] Modified Pipeline Design (4-stage pipeline):
4.1 Pipelined CPU design:
Refer to your lab #6 5-stage pipeline design.
For the sake of this problem, let us assume that we have a very fast ALU and a very fast Data Memory. Because they are very fast, we can combine the EX stage and the MEM stage into one stage called EXMEM. Also, to make the problem simpler, in this question we do not consider forwarding help for BEQ instructions. Hence the FU_Br in the ID stage has been removed, and a BEQ instruction is stalled until the dependency is resolved. On the next page, a partially modified 4-stage design is presented. The HDU is not needed in this design and has been removed. The input connections to the FU and HDU_Br are reduced.
On the next page, complete the forwarding paths that carry forwarding data to the forwarding MUXes, and complete the input connections to the FU (forwarding unit).
4.2 Compare and contrast the 5-stage pipeline design of lab #6 with the 4-stage pipeline design on the next page.
4.2.1 Unlike in the 5-stage pipeline, we do not need the regular HDU for LW dependency in the 4-stage pipeline because ________________________________________________________________
However, we still need HDU_Br to stall the BEQ instructions. Answer the following questions about the stalling that occurs in these instruction sequences:
Stream #1: lw $4, $3(40) ; add $10, $4, $6 ;
For stream #1, ________ clock cycles are needed for stalling. Remark: ________________________________________________________________
Stream #2: lw $4, $3(40) ; beq $10, $4, loop1 ;
For stream #2, ________ clock cycles are needed for stalling. Remark: ________________________________________________________________
Stream #3: add $4, $3, $2 ; beq $10, $4, loop1 ;
For stream #3, ________ clock cycles are needed for stalling. Remark: ________________________________________________________________
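
One way to reason about stall counts like those asked for in the three streams is a simple timing model: compare the stage in which the producer's result becomes available with the stage in which the consumer needs it. The sketch below is a generic Python model under stated assumptions, not the answer key for this design. The function name, the stage numbering, and the assumption that a forwarding path exists from every later stage register are all hypothetical, and the two examples use the textbook 5-stage pipeline rather than the 4-stage one.

```python
def stall_cycles(produce_stage: int, consume_stage: int, distance: int) -> int:
    """Stall cycles for a consumer that trails its producer by `distance` instructions.

    Stages are numbered from 0 (IF). The producer's result is assumed to be
    latched at the end of produce_stage, and the consumer needs it at the
    start of consume_stage. Forwarding paths are assumed to exist from every
    later stage register back to the consumer (hypothetical full forwarding).
    """
    # With no stall, the producer sits in stage consume_stage + distance while
    # the consumer is in consume_stage. The value is usable once the producer
    # has moved past produce_stage.
    return max(0, (produce_stage + 1) - (consume_stage + distance))


# Textbook 5-stage numbering, used here only as an illustration.
IF, ID, EX, MEM, WB = range(5)

# lw followed immediately by a dependent ALU instruction (classic load-use case).
print(stall_cycles(produce_stage=MEM, consume_stage=EX, distance=1))  # 1
# ALU instruction followed immediately by a dependent ALU instruction.
print(stall_cycles(produce_stage=EX, consume_stage=EX, distance=1))   # 0
```

To apply the model to another design, renumber the stages for that pipeline and decide, from the datapath, where each result is produced and where the consumer reads it. A missing forwarding path (such as the removed FU_Br) changes the effective produce stage, which the model does not infer on its own.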
4.2.2 In the 5-stage pipeline, the PCWrite is under the control of ________________________________ (HDU / HDU_Br / FU / FU_Br / Successful Branch / Successful Jump / Combination of these / none of these / none, no need to control, activated all the time).
In the 4-stage pipeline, the PCWrite is under the control of ________________________________ (HDU / HDU_Br / FU / FU_Br / Successful Branch / Successful Jump / Combination of these / none of these / none, no need to control, activated all the time).
4.2.3 The forwarding unit (FU) in the case of the 5-stage pipeline has _____ (0/1/2/3/4/5/6) ________ (1-bit/2-bit/3-bit/4-bit/5-bit/32-bit) comparators, whereas the FU in the case of the 4-stage pipeline has _____ (0/1/2/3/4/5/6) ________ (1-bit/2-bit/3-bit/4-bit/5-bit/32-bit) comparators.
The FU in the case of the 4-stage pipeline produces ____________________ (one/two) outputs, of size __________ (1-bit / each 1-bit / 2-bit / each 2-bit) to control the forwarding muxes.
4.2.4 The HDU_Br (Hazard Detection Unit assisting beq) in the case of the 5-stage pipeline has _____ (0/1/2/3/4/5/6) ________ (1-bit/2-bit/3-bit/4-bit/5-bit/32-bit) comparators, whereas the same unit in the case of the 4-stage pipeline has _____ (0/1/2/3/4/5/6) ________ (1-bit/2-bit/3-bit/4-bit/5-bit/32-bit) comparators.
4.2.5 _______ (Like / Unlike) in the case of the 5-stage pipeline, we ____________ (need / don’t need) prioritization in the 4-stage pipeline in providing forwarding help to instr #3 in the sequence of three adds (instr #1 to instr #3) listed below.
4.2.6 If the clock frequency is the same for the two pipelines and we ignore the control (branch) hazard, the performance of the 4-stage pipeline is ________________________ (better than / equal to / worse than / sometimes better than and sometimes worse than) the 5-stage pipeline performance.
Explain. ________________________________________________________________
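
When comparing two pipelines at the same clock frequency, it can help to count cycles explicitly. The Python sketch below uses the standard count for an ideal pipeline (fill time plus one cycle per additional instruction) and adds stall cycles as an input. The function name and every number in the example are hypothetical; the stall count of each design is for you to determine from 4.2.1 and is not implied by the example.

```python
def total_cycles(instructions: int, stages: int, stall_cycles: int = 0) -> int:
    """Clock cycles to run `instructions` through a `stages`-deep pipeline."""
    # The first instruction needs `stages` cycles; each later one completes
    # one cycle after its predecessor, plus any stall (bubble) cycles.
    return stages + (instructions - 1) + stall_cycles


# Hypothetical workload: 100 instructions, with an assumed stall count.
print(total_cycles(100, stages=5, stall_cycles=10))  # 114
print(total_cycles(100, stages=4, stall_cycles=10))  # 113
```

With the clock period fixed, execution time is proportional to this cycle count, so the comparison comes down to the pipeline depth and, more significantly for long programs, the stall cycles each design incurs.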
4.2.7 In the 4-stage pipeline, since the ALU and the Memory are both in one stage, they can work simultaneously and this merging of ALU with Memory in a single stage does not call for extending the clock period (even if we use the original ALU and Data memory which are NOT fast). TRUE / FALSE Explain. ___________________________________________________ ______________________________________________________________________________
instr #1 add $2, $2, $2
instr #2 add $2, $2, $2
instr #3 add $2, $2, $2
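
To see what value each add in this sequence must receive, it is useful to write down the result of strict in-order execution. The Python sketch below does only that arithmetic. The initial value of $2 (1) and the function name are assumptions for illustration, and the sketch says nothing about which pipeline stage supplies the operand.

```python
def run_adds(initial: int, count: int) -> list:
    """Value of $2 after each `add $2, $2, $2`, executed strictly in order."""
    values = []
    reg2 = initial
    for _ in range(count):
        reg2 = reg2 + reg2  # add $2, $2, $2 doubles the register
        values.append(reg2)
    return values


after = run_adds(initial=1, count=3)  # initial $2 = 1 is an assumed value
print(after)  # [2, 4, 8]

# If instr #3 were handed instr #1's result instead of instr #2's,
# it would compute 2 + 2 rather than 4 + 4.
stale_result = after[0] + after[0]
print(stale_result, "instead of", after[2])  # 4 instead of 8
```

Any correct forwarding scheme must reproduce the in-order values, so instr #3 has to see the result of instr #2 whenever more than one senior instruction targets $2.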
Figure: Early Branch 4-stage pipelined CPU (partially completed datapath). Stages: IF-Stage, ID-Stage, EXMEM-Stage, WB-Stage. Stage registers: IF/ID, ID/EXMEM, EXMEM/WB. Blocks: PC, Instruction memory, Registers, Control, Sign ext., Shift Left 2, comparator (=), ALU, ALU ctrl, Data memory, Forwarding Unit, HDU_Br. Labeled signals include STALL_BEQ, STALL, Branch, IF.Flush, RegWrite_EX, WriteRegister_EX, ALUSrc, ALUOp, RegDst, MemRead, MemWrite, MemtoReg, RegWrite.
Figure: Pipelined CPU (Late Branch from 1st Ed.) for the EE457 class Lab #6, labeled "Late Branch (OLD Lab 6)". Original drawing provided by Prof. Dubois, 3/26/2000. Stages: IF-Stage, ID-Stage, EX-Stage, MEM-Stage, WB-Stage. Stage registers: IF/ID, ID/EX, EX/MEM, MEM/WB. Blocks: PC, Instruction memory, Registers, Control, Sign ext., Shift Left 2, ALU, ALU ctrl, Data memory, Forwarding unit, Hazard detection unit. Labeled signals include IF.Flush, ID.Flush, EX.Flush, Branch, ALUSrc, ALUOp, RegDst, MemRead, MemWrite, MemtoReg, RegWrite.
Figure: Early Branch (Current Lab6). Detailed implementation of Early Branch suggested in 3rd Ed. Designed by: Gandhi Puvvada. Drawn by: Wei-jen Hsu. 10/18/06. Stages: IF-Stage, ID-Stage, EX-Stage, MEM-Stage, WB-Stage. Stage registers: IF/ID, ID/EX, EX/MEM, MEM/WB. Blocks: PC, Instruction memory, Registers, Control, Sign ext., Shift Left 2, comparator (=), ALU, ALU ctrl, Data memory, Hazard detection unit, Forwarding Unit, FU_Br, HDU_Br. Labeled signals include FW_RS, FW_RT, FW_RS_MEM, FW_RS_WB, FW_RT_MEM, FW_RT_WB, STALL_BEQ, STALL_LW, STALL, Branch, IF.Flush, MemRead_EX, MemRead_MEM, RegWrite_EX, WriteRegister_EX, WriteRegister_MEM.