March 23, 2016 8:26 pm EE457 MT - Spring 2016 1 / 13 C Copyright 2016 Gandhi Puvvada
EE457 Midterm (~20-25%)
Closed-book Closed-notes Exam; No cheat sheets; No cell phones or computers
Calculators and Verilog Guides are not allowed.
Spring 2016
Instructor: Gandhi Puvvada
Thursday, 3/24/2016 (A 2H 50M exam)
05:00 PM - 07:50 PM (170 min) in HAR101
Ques#  Topic                            Page#                  Time   Points  Score
1      Lab 7 Part 3 Subpart 2           2
2      Branch Delay Slot and Lab 6 P4   3
3      Lab 7 Part 1 3-element adder     4-7
4      Virtual Memory                   7
5      LW delay slot in Early Branch    8-9
6      Cache                            10-11
7      Multi-cycle CPU                  11-12
Total                                   Cover+11+ Blank = 13   min.
Perfect Score
Student’s Last Name: _______________________________________
Student’s First Name: _______________________________________
Student’s DEN D2L username: [email protected]
1 ( points) min. Lab 7 Part 3 Subpart 2:
1.1 Complete the STALL_12 generation logic and connections to the four EN (enables).
1.1.1 Redesign the STALL_12 generation logic on the side using the D-FF which is preset (rather than cleared) using the low-active RESET_B. You may use either zero or one inverter at most. You are not allowed to use two inverters or more. Would you recommend producing STALL_12 in Verilog in a separate combinational OFL or in the same clocked always block? ____________________________
1.2 Don’t we need a WBFF (Wrist-Band Flip-Flop) here? Did we forget to add a WBFF? Y / NExplain _____________________________________________________________________________________________________________________________________________________________________________________________________________________________
1.3 After you finished the architectural design, the VLSI engineer, for her layout convenience, has swapped the order of the [SUB3+R1 mux] with [ADD4+R2 mux] as shown on the side and did not do any other changes. Is she a Trojan or a Bruin? Tr/Br Will there be any change in the unsigned overflow behavior of the overall result? Yes / No
[Figure: LAB 7 Part 3 with EX1 and EX2 merged, Block Diagram. Stages: IF, ID, EX12, WB. Labeled blocks and signals: PC, I-MEM, Reg. File, Comp Station in ID Stage (ID_XMEX12 = ID_XA Matched with EX12_RA), FU, FORW, X_Mux, R1_Mux, R2_Mux, A-3 (SUB3), A+4 (ADD4), SKIP1, SKIP2, Qualifying signals, EX12_RA, EX12_ADD4, EX12_SUB3, EX12_ADD1, EX12_MOV, WB_RA, WB_Write, WB_RD, four EN inputs, RESET_B, and the STALL_12 D-FF (D, Q, CLK, CLR) producing STALL_12 and STALL_12Q.]
[Side figure for 1.1.1: D-FF with PRE driven by RESET_B.]
[Side figure for 1.3: the SUB3 (A-3) and ADD4 (A+4) units with R1_Mux and R2_Mux, SKIP1 and SKIP2, drawn in the swapped order.]
2 ( points) min. Branch Delay Slot
2.1 Filling the delay slot:
Can a delay slot be declared for
(i) a conditional branch (beq/bne) (ii) an unconditional jump (j) (iii) a jal (iv) a jr $31
Please circle all applicable.
Who fills the delay slot? Hardware / Compiler
For a conditional branch at the end of a loop with 1000 iterations, state your order of preference to fill the delay slot among the four choices: "a", "b", "c" as shown on the side, and "d", which is to just place a NOP. _______________________
Note: if a subset of choices are of equal priority, put them in parentheses. Example: ("a","c")
Repeat for a jump (j) instruction: _____________________________________________ ("a","b","c","d")
Repeat for a jump and link (jal) instruction: ______________________________________ ("a","b","c","d")
Repeat for a MIPS return from a subroutine (jr $31) instruction: ________________________ ("a","b","c","d")
State if the option "b" in the textbook figure above is easy, or moderately difficult, or impossible for each of the four below:
(i) conditional branches (beq/bne) Easy / Moderately difficult / Impossible
(ii) unconditional jumps (j) Easy / Moderately difficult / Impossible
(iii) subroutine calls (jal) Easy / Moderately difficult / Impossible
(iv) returns from subroutines (jr $31) Easy / Moderately difficult / Impossible
2.2 Reproduced on the side are two copies of a figure from Q3.3 of Lab 6 Part 4. In that assignment, we corrected it assuming ________ (zero/one/two) delay slots. Revise the two figures on the side for the remaining two assumptions.
[Figure: Extract from your textbook, showing the delay-slot filling choices "a", "b", "c".]
[Figure, two copies: Assistant #2’s design of flush for the 7-stage pipeline. Labeled blocks: PC, PC control, Instr. TLB, Instr. cache, stages IF1, IF2, ID, BR1, and two RESET inputs.]
You are revising this based on the new assumption of ____ (0/1/2) delay slots.
You are revising this based on the new assumption of ____ (0/1/2) delay slots.
For beq, bne instructions, declaring a delay slot and filling it with NOPs makes it _______________________ (as bad as / worse than) the hardware flush solution.
For j, jal, jr $31 instructions, declaring a delay slot and filling it with NOPs makes it _______________________ (as bad as / worse than) the hardware flush solution.
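The comparison asked for in the two statements above can be explored with a small cycle-count model. The following is only a sketch in Python and not part of the exam. It assumes a single delay slot and a hardware flush scheme that discards exactly one instruction, and only when the branch is taken (predict-not-taken). The function names and the 999-of-1000 taken count for a loop-closing branch are hypothetical choices for illustration.

```python
def wasted_cycles_nop_slot(executions: int) -> int:
    """Delay slot always filled with a NOP: one wasted cycle per branch/jump
    executed, whether or not it is taken."""
    return executions


def wasted_cycles_hw_flush(executions: int, taken: int) -> int:
    """Assumed predict-not-taken hardware flush: one wasted cycle only for
    each taken branch/jump."""
    if not 0 <= taken <= executions:
        raise ValueError("taken must be between 0 and executions")
    return taken


# Loop-closing beq/bne: executed 1000 times, assumed taken 999 times.
loop_nop = wasted_cycles_nop_slot(1000)
loop_flush = wasted_cycles_hw_flush(1000, taken=999)

# j / jal / jr $31: always taken, so every execution costs a cycle either way.
jump_nop = wasted_cycles_nop_slot(1000)
jump_flush = wasted_cycles_hw_flush(1000, taken=1000)

print(loop_nop, loop_flush, jump_nop, jump_flush)
```

Under these assumptions the two schemes differ only on the executions where the branch is not taken; a different flush penalty or prediction scheme would change the numbers.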
3 ( points) min. Lab 7 Part 1 3-element adder pipeline
The Lab 7 Part 1 performs ADD $R, $Z, $Y, $X or a NOP. Our VLSI engineer, Mr Bruin, was adding our 5-stage pipelined 3-element adder (Lab 7 Part 1) as an associate processor to a bigger processor and wanted to fit it in the leftover silicon in the big chip. However, he needs to add a dummy state to cover the wire delay between two parts of the silicon which are far apart. He has a choice of adding the dummy state either (A) between the original EX1 and EX2 or (B) between the original EX2 and WB. Your choice is ____ (A / B). Explain why? ________________________________________ ____________________________________________________________________________ ____________________________________________________________________________ ____________________________________________________________________________ ____________________________________________________________________________ ____________________________________________________________________________
Do you need a Z3_mux in the EX3 (DUMMY) stage of the (B) design above? ______ Y / N
____________________________________________________________________________ ___________________________________________________________________________________
Can you stall the dependent instruction in the EX1 stage instead of in the ID stage, either in the original 5-stage pipeline or in this 6-stage pipeline? __________________________________________________________________________________________________________________________________________
Dependency for the Z register on a senior _________ (did / didn’t) lead to a stall in the original 5-stage pipeline.
Dependency for the Z register on a senior _________ (does / doesn’t) lead to a stall in this/these 6-stage pipeline(s) __ _____ __ ______ ______________ (say #A and #B or something as appropriate).
Design A is drawn in two ways on the next two pages with Z2 and Z3 muxes gathered in the dummy stage.
We can remove 3 of the 9 comparators in the design with Z2_mux and Z3_mux in natural order. T / F
We can remove 3 of the 9 comparators in the design with Z2_mux and Z3_mux in unnatural order. T / F
A senior who became a NOP or is in the process of becoming a NOP due to overflow _______________ (may be / should not be) allowed to provide forwarding help to a junior.
[Figure A: stages IF, ID, EX1, EX2 (DUMMY), EX3, WB, with Z1_mux, Z2_mux, Z3_mux.]
[Figure B: stages IF, ID, EX1, EX2, EX3 (DUMMY), WB, with Z1_mux, Z2_mux.]
[Figure (margin: 30pts): Pipelined 3-element Adder Block Diagram, LAB 7 Part1 with a dummy stage; Z2_mux and Z3_mux in natural order. Stages: IF, ID, EX1, EX2, EX3, WB. Labeled blocks and signals: PC, INS-MEM, Reg. File (ZA, YA, XA, RA, ZD, YD, XD, RD, R-Write), two Adders (XD+YD, XD+YD+ZD), X_Mux, Y_Mux, Z1_mux, Z2_mux, Z3_mux, Comp Station in ID Stage with comparators producing ID_XMEX1, ID_YMEX1, ID_ZMEX1, ID_XMEX2, ID_YMEX2, ID_ZMEX2, ID_XMEX3, ID_YMEX3, ID_ZMEX3 (for example, ID_XMEX1 = ID_XA Matched with EX1_RA), ID_RA, EX1_RA, EX2_RA, EX3_RA, WB_RA, WB_RD, WB_WRITE, EX1_COUT, EX3_COUT, RUN, EX1_RUN_IN, EX2_RUN_IN, EX3_RUN_IN, STALL, STALL_B, X_FORW1, Y_FORW1, Z_FORW1, Z_FORW2, Z_FORW3, I.F.R.F, EN.]
Complete the design (6 EN controls, forwarding paths, bubble-injection, etc.)
Generate on the next to next page: Z_FORW2, Z_FORW3
[Figure (margin: 20pts): Pipelined 3-element Adder Block Diagram, LAB 7 Part1 with a dummy stage; Z2_mux and Z3_mux in unnatural order. Same stages, blocks, and signal names as the natural-order diagram, with the positions of Z2_mux and Z3_mux (and of Z_FORW2 and Z_FORW3) exchanged.]
Complete the forwarding paths to Z_FORW2, Z_FORW3
Generate on the next page: Z_FORW2, Z_FORW3
Produce the Z_FORW2 and Z_FORW3 for both the designs below. (margin: 20pts)
For the design with Z2_mux and Z3_mux in unnatural order:
Z_FORW2
Z_FORW3
For the design with Z2_mux and Z3_mux in natural order:
Z_FORW2
Z_FORW3
4 ( points) min. Virtual Memory
MMU stands for __________________________________.
VPN stands for _____________________. PPFN stands for ____________________________
TLB stands for ___________________________ Buffer.
Given _______ (VPN/PPFN) the TLB provides __________ (VPN/PPFN).
Given _______ (VPN/PPFN) the page table provides __________ (VPN/PPFN).
We ___________ (use / don’t use) parallel search to search the page table.
We ___________ (use / don’t use) binary search (also called dictionary search) to search the page table.
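As a study aid for the virtual memory question above, here is a minimal Python sketch of one common address-translation flow. It is not taken from the exam or the course notes. It assumes 4 KB pages, a single-level page table stored as a list, and a TLB modeled as a dictionary; the names `translate`, `tlb`, and `page_table` are hypothetical.

```python
PAGE_OFFSET_BITS = 12  # assumption: 4 KB pages


def translate(vaddr: int, tlb: dict, page_table: list):
    """Return (physical address, tlb_hit) for a virtual address."""
    vpn = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    if vpn in tlb:
        # A hardware TLB compares the VPN against all stored entries at once;
        # the dictionary lookup merely stands in for that associative match.
        frame = tlb[vpn]
        hit = True
    else:
        # In this sketch the page table is indexed directly with the VPN.
        frame = page_table[vpn]
        tlb[vpn] = frame  # refill the TLB for next time
        hit = False
    return (frame << PAGE_OFFSET_BITS) | offset, hit


page_table = [7, 3, 9, 1]  # made-up frame numbers, index = VPN
tlb = {}
first = translate(0x1ABC, tlb, page_table)   # TLB miss, page-table access
second = translate(0x1ABC, tlb, page_table)  # TLB hit
print(hex(first[0]), first[1], hex(second[0]), second[1])
```

The page offset passes through untranslated in both paths; only the upper bits of the address are replaced.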
5 ( points) min. Early Branch (Lab6 Part 4 rev 3 design)
Given on the next page is the block diagram of the Lab 6 Early Branch design. Given below is the pseudo code for the HDU and HDU_Br of the Early Branch design.
In this question, we are dealing with the Load Word Delay slot and not the Branch delay slot. So please do not get confused. They are totally different.
If an ISA has declared one LW (Load Word) delay slot, it means that the compiler ____________ (should / shouldn’t) place an instr. dependent on the LW, right _________ (before / after) the LW.
Our current Lab 6 Early Branch design assumes ________ (0/1) LW delay slots.
Modify the design on the next page and the pseudo code for HDU and HDU_Br below to change the "LW delay slot" aspect of our design. So you are going to _________ (add / remove) a LW delay slot ____ (to / from) our Lab 6 design. This calls for FU_Br to change. T / F. This calls for FU to change. T / F
========================================================================================
HDU (Original Hazard Detection Unit in ID stage):
Note: Here ID/EX.WriteRegister refers to the WriteRegister after the mux governed by RegDst. We could replace it with ID/EX.WriteRegisterRt.
If [ ID/EX.MemRead and (ID/EX.WriteRegister =/= 0) and
     { (ID/EX.WriteRegister == IF/ID.ReadRegister_RS) or
       (ID/EX.WriteRegister == IF/ID.ReadRegister_RT) } ]
then make STALL_LW = 1
========================================================================================
HDU_Br (New Hazard Detection Unit in ID stage to serve the early branch):
Note: Here ID/EX.WriteRegister refers to the WriteRegister after the mux governed by RegDst.
If [ Branch and
     [ [ ID/EX.RegWrite and (ID/EX.WriteRegister =/= 0) and
         { (ID/EX.WriteRegister == IF/ID.ReadRegister_RS) or
           (ID/EX.WriteRegister == IF/ID.ReadRegister_RT) } ]
       or
       [ EX/MEM.MemRead and (EX/MEM.WriteRegister =/= 0) and
         { (EX/MEM.WriteRegister == IF/ID.ReadRegister_RS) or
           (EX/MEM.WriteRegister == IF/ID.ReadRegister_RT) } ] ] ]
then make STALL_BEQ = 1
========================================================================================
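The two conditions above can be checked by hand more easily when written as executable predicates. The Python sketch below transcribes the given pseudo code as it stands, that is, the current Lab 6 behavior and not the modified design the question asks for. The function and parameter names are hypothetical stand-ins for the pipeline-register fields.

```python
def stall_lw(id_ex_mem_read: bool, id_ex_write_reg: int,
             if_id_rs: int, if_id_rt: int) -> bool:
    """HDU: stall when the instruction in EX is a load whose destination
    (not $0) is a source register of the instruction in ID."""
    return bool(id_ex_mem_read
                and id_ex_write_reg != 0
                and id_ex_write_reg in (if_id_rs, if_id_rt))


def stall_beq(branch: bool,
              id_ex_reg_write: bool, id_ex_write_reg: int,
              ex_mem_mem_read: bool, ex_mem_write_reg: int,
              if_id_rs: int, if_id_rt: int) -> bool:
    """HDU_Br: stall a branch in ID that depends on any register-writing
    instruction in EX, or on a load currently in MEM."""
    sources = (if_id_rs, if_id_rt)
    depends_on_ex = (id_ex_reg_write
                     and id_ex_write_reg != 0
                     and id_ex_write_reg in sources)
    depends_on_mem_load = (ex_mem_mem_read
                           and ex_mem_write_reg != 0
                           and ex_mem_write_reg in sources)
    return bool(branch and (depends_on_ex or depends_on_mem_load))


# lw $8, ... in EX followed by an instruction in ID that reads $8
print(stall_lw(True, 8, 8, 9))
# beq $8, $9 in ID with a load to $9 currently in MEM
print(stall_beq(True, False, 0, True, 9, 8, 9))
```

Writing the conditions this way makes it straightforward to try individual register combinations before editing the pseudo code.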
[Figure: Early Branch (Current Lab6) block diagram. Detailed implementation of Early Branch suggested in 3rd Ed.; designed by Gandhi Puvvada, 10/18/06; drawn by Wei-jen Hsu. Stages and pipeline registers: IF-Stage, IF/ID, ID-Stage, ID/EX, EX-Stage, EX/MEM, MEM-Stage, MEM/WB, WB-Stage. Labeled blocks and signals: PC, Instruction memory, Registers, Control, Sign ext., Shift Left 2, branch-target adder, "=" comparator (Zero), ALU, ALUctrl, Data memory, Hazard detection unit, HDU_Br, Forwarding Unit, FU_Br, forwarding_mux_control, IF.Flush, Branch, STALL, STALL_BEQ, STALL_LW, ALUSrc, ALUOp, RegDst, RegWrite_EX, RegWrite, MemRead_EX, MemRead_MEM, MemRead, MemWrite, MemtoReg, WriteRegister_EX, WriteRegister_MEM, FW_RS, FW_RT, FW_RS_MEM, FW_RS_WB, FW_RT_MEM, FW_RT_WB, ALU_result, Store_data, MEM_data, REG_data.]
6 ( points) min. Cache
6.1 Shown below is a typical CAM (Content Addressable Memory) and a Data RAM to go with it.
The CAM size is specified as 32 x 16. "32" means there are 32 TAGs stored in it. Then what is 16?
____________________________________________________________________________
When you switch on Power, do you clear all 32 tags to zeros or invalidate the 32 valid bits?
____________________________________________________________________________
How wide is the TAG in the above diagram and how wide is the comparison unit? How can you tell?
________________________________________________________________________________________________________________________________________________________
2^5 = 32. So is there any 5-bit address? _________________________________________________________________________________________________________________________
Please write sizes of all 4 buses marked as .
How big is the Data RAM? _____X______
Is there any relation between the width of the CAM and the 32-bit width of the Data RAM?Yes / No. ____________________________________________________________________________________________________________________________________________________________________________________________________________________________
Is there any relation between the depth of the CAM and the depth of the Data RAM? Yes / No. ____________________________________________________________________________________________________________________________________________________________________________________________________________________________________
I read somewhere that unlike RAM (which receives address and gives out Data) a CAM would receive Data and provide Address. But I see Write Addr as input at the top of the CAM. Explain.____________________________________________________________________________________________________________________________________________________________________________________________________________________________________
The "Data in" fed to the Data RAM is used during a ______________ (Cache Miss / Cache Hit).
TAG comparison inside a CAM: Several such comparators compare the incoming Tag with all the stored Tags simultaneously.
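To make the CAM-plus-Data-RAM arrangement above concrete, here is a small behavioral model in Python. It is only a sketch under stated assumptions and not the circuit in the exam figure: the class `Cam`, its methods, the depth of 32, and the example values are all hypothetical, and the loop over entries stands in for comparators that work in parallel in hardware.

```python
class Cam:
    """Behavioral model of a tag CAM with one valid bit per entry."""

    def __init__(self, depth: int):
        self.valid = [False] * depth
        self.tags = [0] * depth

    def write(self, addr: int, tag: int) -> None:
        """Store a tag at an explicit entry address (used when filling)."""
        self.tags[addr] = tag
        self.valid[addr] = True

    def match(self, tag: int):
        """Return the address of the valid entry holding `tag`, else None."""
        for addr, (v, t) in enumerate(zip(self.valid, self.tags)):
            if v and t == tag:
                return addr
        return None


cam = Cam(depth=32)
data_ram = [0] * 32  # one data word per CAM entry in this sketch

# Fill entry 5, as would happen when a missing item is brought in.
cam.write(5, 0xBEEF)
data_ram[5] = 0x12345678

hit_addr = cam.match(0xBEEF)   # search by content, get an address back
miss_addr = cam.match(0x0000)  # invalid entries holding 0 must not match
print(hit_addr, miss_addr, hex(data_ram[hit_addr]))
```

The model shows both uses of the structure: writing at a given entry address, and searching by tag to obtain the address that then selects a word in the Data RAM.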
6.2 Cache and MM organization for direct/set-associative caches
If the TAG size is 17 bits in a 32-bit address byte-addressable processor, we can tell the size of the cache if we know further the following (state needed or not needed for each (or true or false)):
Block size in words: needed / not needed. Word size in bytes: needed / not needed.
Main Memory degree of lower-order interleaving: needed / not needed.
If it is set-associative, the degree of set-associativity (i.e. # of blocks per set): needed/not needed . If it is direct mapping, nothing more needed. True / False
At a late point of the CPU chip design (which includes the CPU and the L1-Cache), if you change the DSA (degree of set-associativity) from 2 to 4, and also double the size of the cache, the set-field will _________ (A/B/C/D/E) and the TAG field will _________ (A/B/C/D/E) where A = increase by 1 bit; B = increase by 2 bits; C = decrease by 1 bit; D = decrease by 2 bits; E = no change
The degree of Lower-Order Interleaving for the MM is decided by ___________________________________________________________________________________________________
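The address-field questions in 6.2 rest on simple arithmetic that can be tried out numerically. The Python sketch below assumes a byte-addressable machine, power-of-two sizes, and the usual tag / set / byte-offset split of the address; the function name `field_widths` and the example cache sizes are hypothetical and not given in the exam.

```python
def field_widths(cache_bytes: int, block_bytes: int, ways: int,
                 addr_bits: int = 32):
    """Return (tag_bits, set_bits, offset_bits) for a set-associative cache.

    ways = 1 models a direct-mapped cache. All sizes must be powers of two.
    """
    for n in (cache_bytes, block_bytes, ways):
        if n <= 0 or n & (n - 1):
            raise ValueError("sizes must be positive powers of two")
    sets = cache_bytes // (block_bytes * ways)
    if sets < 1:
        raise ValueError("cache too small for this block size and associativity")
    offset_bits = block_bytes.bit_length() - 1  # log2 of the block size in bytes
    set_bits = sets.bit_length() - 1            # log2 of the number of sets
    tag_bits = addr_bits - set_bits - offset_bits
    return tag_bits, set_bits, offset_bits


# Assumed example: 64 KB, 16-byte blocks, 2-way set-associative.
before = field_widths(64 * 1024, 16, ways=2)
# Double both the associativity and the cache size.
after = field_widths(128 * 1024, 16, ways=4)
print(before, after)
```

Trying other combinations, for example changing only the associativity or only the cache size, shows how each parameter moves the boundary between the tag and set fields.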
7 ( points) min. Multi-cycle CPU
Reproduced on the next page is the modified CU state diagram for the 2nd edition multi-cycle CPU design from a previous exam where we fetch the next instruction in the last clock of the current instruction.
This kind of improvement is equally suitable for the 1st edition design. True / False
Explain: _____________________________________________________________________ ____________________________________________________________________________ ____________________________________________________________________________ ____________________________________________________________________________ ____________________________________________________________________________
Why didn’t we try to apply this improvement to the last clock (State 5) of the "SW" instruction?____________________________________________________________________________ ____________________________________________________________________________ ____________________________________________________________________________ ____________________________________________________________________________
Why didn’t we try to apply this improvement to the last clock (State 8) of the "BEQ" instruction?____________________________________________________________________________ ____________________________________________________________________________ ____________________________________________________________________________ ____________________________________________________________________________
Just FYI. Nothing needs to be done here.
It is not difficult to get an A in EE457. You need to work for it and seek help from the 457 teaching team on whatever you do not understand. We are eager to help you. The advanced topics in the last 5 weeks are interesting and challenging too. About 40% of the final exam focuses on these topics. They are important for your interviews also. Best! Gandhi, TAs: Jizhe, Pezhman, Mentors: Kalpana, Bo, HW Graders: Monisha, Zihao Lab graders: Maanasa, Nita
Blank page: Please write your name and email. Tear it off and use for rough work. Do not submit.
Student’s Last Name:____________________ email: __________________