ee457 midterm exam (~24%)


Page 1: EE457 Midterm Exam (~24%)

March 22, 2018 10:13 am EE457 MT - Spring 2018 1 / 10 C Copyright 2018 Gandhi Puvvada

EE457 Midterm Exam (~24%)
Closed-book Closed-notes Exam; No cheat sheets;

Ordinary calculators may be used but not the smart phone with calculators. Verilog Guides are not needed and are not allowed. Smart phones, tablets (and any kind of computing/Internet devices) are not allowed.

This is a Crowdmark exam. Please do not write on margins or on backside.

Spring 2018
Instructor: Gandhi Puvvada

Thursday, 3/22/2018 (A 3-hour exam) 05:00 PM - 08:00 PM (180 min) in THH201
Please do not write your student ID

Ques#  Topic                            Page#                   Time      Points  Score
1      Lab 7 Part 3 Subpart 2           2-5                     90 min    130
2      Lab 7 Part 1 3-element adder     6-7                     30 min    40
3      Flushing by a successful branch  8                       15 min    25
4      Virtual Memory                   8                       5 min     15
5      Cache and MM Organization        9                       15 min    32
Total                                   Cover + 8 + Blank = 10  155 min.  240
Perfect Score                                                             230

Page 2: EE457 Midterm Exam (~24%)


1 ( 15 + 115 = 130 points) 10+80 = 90 min. Lab 7 Part 3 Subpart 2 modification:

1.1 Reproduced below is the solution to the Spring 2016 problem you are asked to go through, showing the generation of the STALL_12 logic and how that is used to control the four EN (enables).

1.1.1 Mr. _________________ (Bruin/Trojan) says, "When you stall the EX12 stage, it is not necessary to stall the WB stage. The senior #1 in the WB stage helps his junior ADD1 in the first clock and he (the junior) performs SUB3 on it in the first clock. In the next clock, ADD4 is performed on the result of the SUB3 and hence the forwarding help from the senior #1 is not needed in the 2nd clock. Hence stalling WB is unnecessary." Explanation of your answer: ____________________________________________________________________________________________________________________________________________________________________________________________________________________________________

1.1.2 Miss _________________ (Bruin/Trojan) says that she can replace the above stall generating logic with a toggle Flip-Flop shown on the side. But won’t it be toggling multiple times? What if there is a series of ADD1 instructions? If possible, generate either STALL_12 or its complement (STALL_12 bar). Explain how it is or it isn’t possible. ________________________________________________________________________________________________________________________________________________________________

[Figure: LAB 7 Part 3 with EX1 and EX2 merged Block Diagram. Stages: IF, ID, EX12, WB. Blocks: PC, I-MEM, Reg. File (XA, RA, RD, R-Write), Comp Station in ID Stage, A-3 and A+4 units in EX12 with R1_Mux (SKIP1) and R2_Mux (SKIP2), X_Mux, FU, and D flip-flops (D, Q, CLK, CLR) with RESET_B generating STALL_12. Comparator in the Comp Station: P = ID_XA, Q = EX12_RA, P=Q output ID_XMEX12 (ID_XA matched with EX12_RA). Signals: EX12_RA, EX12_MOV, EX12_SUB3, EX12_ADD4, EX12_ADD1, WB_RA, WB_Write, WB_RD, FORW, EN, CLK, RESET_B, and qualifying signals.]

[Figure for 1.1.2: toggle flip-flop (D, Q, CLK, CLR) with ADD1, CLK, and RESET_B.]

Points on this page: 7pts, 8pts

Page 3: EE457 Midterm Exam (~24%)


1.2 Now to the above design (of Lab 7 P3 SP2 on the top of the previous page), we added an EX3 stage with a MULT2 unit (which multiplies by 2) and an R3_mux (select line labeled SKIP3) to skip this doubling operation. Now we have a total of 8 operations: the 4 previous operations (MOV, SUB3, ADD4, ADD1) (abbreviated as MV, S3, A4, A1) and 4 new operations, each of which produces double the result of the corresponding previous operation (2MOV, 2SUB3, 2ADD4, 2ADD1) (abbreviated as 2MV, 2S3, 2A4, 2A1). The opcode is no longer one-hot coded. A 3-bit opcode is decoded in the ID stage to produce the 8 one-hot control signals. There is no opcode for a NOP, but the decoder can be disabled from producing any active outputs in order to inject a bubble (during power-on reset) using a Wrist-Band FF. Here, we have both RAW stalls in the ID stage (initiated by the STALL_ID signal) and a stall that allows ADD1 or 2ADD1 (double of ADD1) to perform both the SUB3 and ADD4 operations in EX12 (initiated by the STALL_12 signal). Complete the design on page 5/10 as well as a few parts below.
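The operation set described in 1.2 can be sketched in software. The following minimal Python model is only an illustration under stated assumptions: the mapping of opcode values 0 to 7 onto MV, S3, A4, A1, 2MV, 2S3, 2A4, 2A1 follows the label order of the decoder outputs and is assumed rather than confirmed by the exam text, and the names `decode` and `execute` are hypothetical. Pipelining, stalls, and forwarding are not modeled.

```python
# Assumed opcode order (taken from the label order MV ... 2A1;
# the exam text does not state the actual encoding).
OPS = ["MV", "S3", "A4", "A1", "2MV", "2S3", "2A4", "2A1"]


def decode(opcode, enable=True):
    """3-to-8 decoder: one-hot control signals, or all zeros when disabled (a bubble)."""
    if not 0 <= opcode <= 7:
        raise ValueError("opcode must fit in 3 bits")
    return {name: int(enable and index == opcode) for index, name in enumerate(OPS)}


def execute(ctrl, a):
    """Reference result of one instruction on operand a (no pipeline timing modeled)."""
    use_sub3 = ctrl["S3"] or ctrl["A1"] or ctrl["2S3"] or ctrl["2A1"]
    use_add4 = ctrl["A4"] or ctrl["A1"] or ctrl["2A4"] or ctrl["2A1"]
    use_mult2 = ctrl["2MV"] or ctrl["2S3"] or ctrl["2A4"] or ctrl["2A1"]
    result = a
    if use_sub3:
        result -= 3  # A-3 unit used (not skipped)
    if use_add4:
        result += 4  # A+4 unit used (not skipped)
    if use_mult2:
        result *= 2  # MULT2 unit in EX3 used (not skipped)
    return result
```

In this sketch, `execute(decode(3), 10)` models ADD1 as SUB3 followed by ADD4, and `decode(5, enable=False)` models an injected bubble in which no control signal is active.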

1.2.1 Stalls: STALL_ID and STALL_12

(i) They always go active together. T / F
(ii) They can go active together or independently. T / F
(iii) When they go active together, the STALL_ID gets extended beyond the STALL_12 by just 1 clock. T / F
(iv) When STALL_ID occurs without STALL_12, STALL_ID lasts for only one clock. T / F
(v) The entire pipeline (including EX3 and WB) is stalled (a) by both (b) by STALL_ID only (c) by STALL_12 only (d) by neither

1.2.2 Bubbles are injected into the next stage (a) by both (b) by STALL_ID only (c) by STALL_12 only (d) by neither

1.2.3 Circle instructions which cannot help from EX3 stage: MV, S3, A4, A1, 2MV, 2S3, 2A4, 2A1

1.2.4 Circle instructions which do not mind to postpone receiving help until they reach EX3 stage: MV, S3, A4, A1, 2MV, 2S3, 2A4, 2A1

1.2.5 Circle instructions which do not want to receive help for the second time in EX3 stage: MV, S3, A4, A1, 2MV, 2S3, 2A4, 2A1

1.2.6 You want to check to make sure if the senior on whom you (the junior) are dependent (and from whom you are receiving forwarding help) is not a NOP. True / False

1.2.7 It is not necessary to check to see that you yourself are not a NOP, if you are the junior, who is dependent on a senior and receiving forwarding help from him as long as he (the senior) is not a NOP. True / False

1.2.8 It is not necessary to check to see that you yourself are not a NOP, if you are the junior, who is dependent on a senior and are stalling because the senior can not otherwise provide you needed forwarding help in time. True / False

1.2.9 Register File: Your assistant Mr. Bruin forgot to provide Internal forwarding circuitry inside the Register File, so please complete the forwarding mux just outside the register file in the ID stage.

Points on this page (margin labels, in page order): 10pts, 3pts, 3pts, 3pts, 3pts, 3pts, 3pts, 3pts

Page 4: EE457 Midterm Exam (~24%)


Generate here the 6 items marked as on next page. Use signal names like EX12_ID_XMEX3 (page total 36 points)

[Answer area. Signal and block labels: STALL_ID, SID, HDU, FORW_12A, FORW_12B, FU_12, FORW_3, FU_3, SKIP1, SKIP2, SKIP3. Point labels, in page order: 4pts, 3pts, 3pts, 3pts, 6pts, 7pts, 6pts, 4pts.]

ID_Write is produced on the side in two ways. Comment on them. Use words like correct/incorrect, logically equivalent/different, cost wise ..., timing wise ...
________________________________________________________________________________________________________________________________

Show again here what you did to the Wrist-band FF and explain your work.
________________________________________________________________________________________________________________________________________________________________________________________________

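[Figure: ID-stage control unit (CU). A 3-to-8 decoder with inputs A0-A2 (op0, op1, op2), enable EN, and outputs Y0-Y7 labeled MV, S3, A4, A1, 2MV, 2S3, 2A4, 2A1; a D flip-flop (D, Q); RA; and two alternative circuits producing ID_Write, labeled ID_Write #1 and ID_Write #2.]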

Page 5: EE457 Midterm Exam (~24%)


[Figure: Modified LAB 7 Part 3 Block Diagram (Q#1.2). Stages: IF, ID, EX12, EX3, WB. Blocks: PC, I-MEM, Reg. File (XA, RA, RD, R-Write; no internal forwarding) with an external forwarding mux, control unit (CU) with a 3-to-8 decoder (inputs op0, op1, op2; outputs MV, S3, A4, A1, 2MV, 2S3, 2A4, 2A1), Wrist-Band FF (D, Q, CLK, CLR), A-3 and A+4 units in EX12 with R1_Mux (SKIP1) and R2_Mux (SKIP2), X12A_Mux (FORW_12A), X12B_Mux (FORW_12B), FU_12, MULT2 (A to 2A) in EX3 with R3_Mux (SKIP3), X3_Mux (FORW_3), FU_3, and HDU (STALL_ID, SID). Signals: STALL_12, EX12_RA, EX3_RA, WB_RA, ID_Write, EX12_Write, EX3_Write, WB_Write, WB_RD, EN, R_B.]

[Comp Station in ID Stage, three comparators (P, Q, P=Q):
ID_XMEX12 = ID_XA matched with EX12_RA
ID_XMEX3 = ID_XA matched with EX3_RA
ID_XMWB = ID_XA matched with WB_RA
Inferences as carried through the stage registers (XMEX12, XMEX3, XMWB in each): ID_XMEX12, ID_XMEX3, ID_XMWB; EX12_ID_XMEX12, EX12_ID_XMEX3, EX12_ID_XMWB; EX3_ID_XMEX12, EX3_ID_XMEX3, EX3_ID_XMWB; WB_ID_XMEX12, WB_ID_XMEX3, WB_ID_XMWB. The control-signal bundle in the stage registers is drawn once with the note "Assume 7 more lines like this".]

Notes:

1. Add low-active R_B (Reset_Bar) control wherever needed. 7.5 pts
2. Complete the 18 items marked as here on this page. 13*1.5 + 2 WB_FF*2 + 3 Forwards*2.5 = 31
3. The 3 inferences of the 3 comparators in the ID station are carried through the three pipeline stage registers. Cross off unneeded items (comparators, registers, wires). 9.5 pts
4. Produce the 6 items marked as on the previous page.

48pts
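The comparator station and the carrying of its inferences through the stage registers can be sketched as follows. This is a minimal Python illustration, not the exam's solution: register numbers are modeled as plain integers, the helper names `comp_station`, `advance`, and `flat_names` are hypothetical, and stalls, bubbles, and any qualification of a match by write enables are deliberately left out.

```python
def comp_station(id_xa, ex12_ra, ex3_ra, wb_ra):
    """Three ID-stage comparators: source register ID_XA against each senior's destination."""
    return {
        "ID_XMEX12": id_xa == ex12_ra,
        "ID_XMEX3": id_xa == ex3_ra,
        "ID_XMWB": id_xa == wb_ra,
    }


def advance(carried, id_inference):
    """One clock with no stall: the inferences move ID -> EX12 -> EX3 -> WB."""
    return {
        "EX12": id_inference,
        "EX3": carried["EX12"],
        "WB": carried["EX3"],
    }


def flat_names(carried):
    """Name each carried bit as <stage>_<inference>, e.g. EX12_ID_XMEX3."""
    return {
        f"{stage}_{name}": value
        for stage, inference in carried.items()
        for name, value in inference.items()
    }
```

A carried name such as EX12_ID_XMEX3 then reads as "the match found against the EX3 senior while the junior was in ID, now held in the junior's EX12 stage register". Which of these nine carried bits are actually needed is the subject of note 3 and is not decided by this sketch.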

Page 6: EE457 Midterm Exam (~24%)


2 ( 8 + 12 + 9 + 9 + 12 = 40 points) 30 min. Shifting stalling from ID to EX stage

Lab 7 Part 1 (3-element adder)

You went through ee457_MT_Sp2012_Q1.3_revised_in_Sp2018_sol.pdf. This question (Q#2.1) is to test your understanding of the same. The revised problem statement and the solution figure are reproduced below for your reference.

Problem statement: Stalling is currently in the ID stage. Your boss wanted you to move the stall to the EX1 stage and she told you something like, "... we can have a stage after the WB called WB_after ...". You do not know if your boss is a Bruin or a Trojan. Discuss the feasibility. If it is feasible discuss the details of the new design including what goes into WB_after, how many comparators are involved in stalling/forwarding, their locations, any changes to forwarding besides stalling, any changes to internally forwarding nature of the RF file to avoid duplication of hardware, overall whether it is desirable or undesirable to do this move. If it is not feasible, state reasons.

For this question, let us keep all comparators in the Comp Station in the ID Stage only.

Solution: 3+2 = 5 comparators in ID stage (which include the 3 comparators in IFRF) and 7 comparators in the EX1 stage (total 12 comparators) plus forwarding muxes as shown below.

2.1 Register re-balancing: In this design stalling occurs only in _______ (EX1/EX2) stage. Your VLSI engineer said that the EX1 stage has timing problems for whatever reason, whereas EX2 has slack. So she asks if she can remove the redundant mux in EX1 and retain the mux marked to be removed in EX2. ______ (OK/Not OK). If OK, do you need to add/change one of the 7 comparators in EX1? If it is not OK, what is the reason?
____________________________________________________________________________________________________________________________________________________________________________________________________________________________________

[Figure: solution block diagram. Stages: IF, ID, EX1, EX2, WB, WB_after. Blocks: PC, IM, RF, and two adders (inputs A, B; output S).]

8pts

Page 7: EE457 Midterm Exam (~24%)


2.2 List the two comparators stationed in ID stage whose inference is carried to the EX1 stage.
Comparator #1 compares source register ________ (use notation such as ID_ZA) with the destination register _________ (use notation such as WB_RA) and carries the inference _________ (use notation such as ID_ZMEX2).
Comparator #2 compares source register ________ (use notation such as ID_ZA) with the destination register _________ (use notation such as WB_RA) and carries the inference _________ (use notation such as ID_ZMEX2).
The inferences are carried into EX1 stage and are used in __________ (A/B/C/D) where A = stalling logic, B = forwarding logic, C = both stalling and forwarding logic, D = neither

2.3 Let us account for the 7 comparisons in the EX1 stage: The three sources EX1_XA, EX1_YA, and EX1_ZA are compared with _______________________________________ accounting for ____ of the 7 comparators. The rest are: __________________________________________________________________________________________________________________________________________________________________
These 7 are used for (complete this sentence using words like forwarding or stalling or both) ___________________________________________________________________________________________________________________________________________________________________

2.4 Let us explore to see how the design changes, if we perform R <= X + 3 + Z, instead of performing R <= X + Y + Z in the above design (i.e. constant 3 in place of variable Y )

2.4.1 Design #1: The original design with stalling in the ID stage and all comparators in the ID stage: You will have ___ in place of original ___ comparators in the IFRF. The new RF has ____ RO (Read Only) ports and ____ WO (Write Only) port(s). In addition, you will have ___ in place of original ___ comparators in the comparator station in the ID stage. Cross off forwarding muxes not needed in the block diagram below drawn for R <= X + Y + Z.

2.4.2 Design #2: Now with stalling moved to EX1 stage, how do these previous numbers change? Previous numbers are 3+2 = 5 comparators in ID stage (which include the 3 comparators in IFRF) and 7 comparators in the EX1 stage (total 12 comparators) plus forwarding muxes as shown below.

Now we need ___ + ___ = ___ comparators in ID stage (which include the ___ comparators in IFRF) and ___ comparators in the EX1 stage (total ___ comparators) plus forwarding muxes as shown above (please cross off unneeded muxes to perform X+3+Z).

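Points on this page (margin labels, in page order): 12pts, 9pts, 9pts, 12pts

[Figure: block diagram drawn for R <= X + Y + Z. Stages: IF, ID, EX1, EX2, WB. Blocks: PC, IM, RF, two adders (inputs A, B; output S), and forwarding muxes X_MUX, Y_MUX, Z1_MUX, Z2_MUX.]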

Page 8: EE457 Midterm Exam (~24%)


3 ( 6*3 = 18 + 7 bonus = 25 points) 15 min. Flushing by a successful branch:

3.1 Our Lab 6 Verilog code may have chosen to set or clear the "wrist-band" Flip-Flop (flush bit) in the stage register IF/ID on system reset using the RESET signal. Accordingly, we discussed in class a solution for the Lab 6 Part 4 question on flushing two stages of the 7-stage pipeline.

For this question, let us assume that each designer is allowed to choose to set or clear each "wrist-band" Flip-Flop. A designer can choose to set one FF and clear another. Also, some designers below assumed one delay slot, whereas some assumed zero delay slots. All 6 designs are correct based on their assumptions. Fill in the table telling us what assumptions make the designs correct.

All control units are identical and are as per the textbook design (a 1-input means it is an instruction destined to be flushed). Fill-in the table above.

4 ( 14 *1 + 1 bonus = 15 points) 5 min. Virtual Memory
MMU stands for _______________________; TLB stands for ________________________
PTBR stands for ____________________________
PT (Page Table) (essentially a LUT (Look-Up Table)) _______ has both the LHS (Left-Hand Side) and RHS (Right-Hand Side) of the LUT.
A Fully Associative TLB has both sides of LUT. T / F
Given _______ (VPN/PPFN) the TLB provides __________ (VPN/PPFN).
Given _______ (VPN/PPFN) the page table provides __________ (VPN/PPFN).
We ___________ (use / don’t use) parallel search to search the page table.
We ___________ (use / don’t use) binary search (also called dictionary search) to search the page table.
We ___________ (use / don’t use) indexing to ________ to locate the page table entry in ____________ (one/multiple) accesses to a full-length single-level page table.
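As background for question 4, the following minimal Python sketch shows one common textbook arrangement of a full-length single-level page table. It is an illustration under stated assumptions and not the exam's answer key: the 12-bit page offset, the 4-byte entry size, and the names `pte_address` and `translate` are all hypothetical choices made for the example.

```python
PAGE_OFFSET_BITS = 12  # hypothetical page size of 4 KB


def pte_address(ptbr, vpn, entry_bytes=4):
    """Address of a page table entry: table base (PTBR) plus a scaled VPN."""
    return ptbr + vpn * entry_bytes


def translate(vaddr, page_table):
    """Translate a virtual address using a list with one slot per virtual page."""
    vpn = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    ppfn = page_table[vpn]  # the slot position is the VPN; the slot holds only the PPFN
    if ppfn is None:
        raise LookupError("page fault: page not resident")
    return (ppfn << PAGE_OFFSET_BITS) | offset
```

With a 16-entry table in which virtual page 2 maps to physical frame 7, `translate(0x2ABC, table)` keeps the offset 0xABC and replaces the page number, and `pte_address(0x1000, 2)` locates the entry without searching the table.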

25pts

[Figure for question 3: six designs, #1 through #6. Each shows the IF1, IF2, and ID stages with PC, Instr. TLB, Instr. cache, control, BR1, and two wrist-band flip-flops FF1 and FF2 with RESET inputs.]

15 pts

Page 9: EE457 Midterm Exam (~24%)


5 ( 4 + 18 + 10 = 32 points) 15 min. Cache and MM Organization:

A 16-bit data (D15-D0) 32-bit (logical) address byte-addressable processor (address pins: A31-A1, /BE1-/BE0) has its cache and MM organized as shown below. Fill in the 12 boxes. Also divide the address below into appropriate fields and name the fields.
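The address-division task can be pictured with a small generic sketch. The Python function below splits a 32-bit byte address into tag, set index, word-in-block, and byte-in-word fields for a 16-bit-wide memory. The field widths passed in are hypothetical placeholders and are not the answers to this question, and the name `split_address` is an illustration only.

```python
ADDRESS_BITS = 32
BYTE_BITS = 1  # a 16-bit (2-byte) word needs one byte-select bit (A0)


def split_address(addr, index_bits, word_bits):
    """Split a byte address into (tag, set index, word in block, byte in word)."""
    byte = addr & ((1 << BYTE_BITS) - 1)
    word = (addr >> BYTE_BITS) & ((1 << word_bits) - 1)
    index = (addr >> (BYTE_BITS + word_bits)) & ((1 << index_bits) - 1)
    tag = addr >> (BYTE_BITS + word_bits + index_bits)
    return tag, index, word, byte


def tag_width(index_bits, word_bits):
    """Bits left over for the tag once the lower fields are assigned."""
    return ADDRESS_BITS - BYTE_BITS - word_bits - index_bits
```

With the placeholder widths `index_bits=8` and `word_bits=3`, the address 0x12345678 splits into tag 0x12345, set index 0x67, word 4, and byte 0, and the tag is 20 bits wide. The actual widths for this question follow from the cache organization in the figure.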

5.1 What are the drawbacks of the left-side design, which made us select the right-side design? _____________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

Points on this page (margin labels, in page order): 18pts, 4pts, 10pts

A31 A30 A29 A28 A27 A26 A25 A24 A23 A22 A21 A20 A19 A18 A17 A16 A15 A14 A13 A12 A11 A10 A9 A8 A7 A6 A5 A4 A3 A2 A1 A0
(A0 is represented by BE1-BE0)

[Figure: cache and MM organization, with a left-side design and a right-side design. Blocks: processor (CPU) with Addr. and Data pins, 16-bit buses, Cache Data RAM built from byte-wide banks (D15-8, D7-0) addressed by A11-A1, TAG RAMs (Addr, Data-in, Data-out, Valid; "One of the TAG RAMs", "5 such TAG RAMs"), comparators (Comp, =) producing Hit, XCVRs, and a 2-way lower-order interleaved MM. Bank groups are labeled Block 0's, Block 1's, Block 2's, Block 3's, Block 4's.]

[Boxes to fill in: Address?; Size of one TAG RAM?; Size of one Byte-wide bank? (cache); Size of one Byte-wide bank? (MM); Size of one Comparator?; Total address space?; Degree of Set-Associativity = ?; Total cache size in KB ______ KB.]

Page 10: EE457 Midterm Exam (~24%)


Blank page: Please write your name and email. Tear it off and use for rough work. Do not submit.

Student’s Last Name:____________________ email: __________________

It is not difficult to get an A in EE457. You need to work for it and seek help from the 457 teaching team on whatever you do not understand. We are eager to help you. The next four topics (multi-cycle CPU, pipelined CPU, cache, and virtual memory) are interesting and challenging too. They are the focus of the midterm exam. Then we cover advanced topics. Best! Gandhi; TAs: Fangzhou, Chao; Mentors: Rui, Pravin; HW Graders: Navtej, Rupam; Lab graders: Ujwala, Aashish
