Download - Review Problem 0
1
Review Problem 0
As you wait for class to start, answer the following question: What is important in a computer? What features do
you look for when buying one?
2
Review Problem 1
What aspects of a microprocessor can affect performance?
3
Review Problem 2
If a 200 MHz machine runs ½ billion instructions in 10 seconds, what is the CPI of the machine?
If a second machine with the same CPI runs the program in 5 seconds, what is it’s clock rate?
4
Review Problem 3
A program is 20% multiplication, 50% memory access, 30% other. You can quadruple multiplication speed, or double memory speed How much faster with 4x mult:
How much faster with 2x memory:
How much faster with both 4x mult & 2x memory:
5
Review Problem 4
In assembly, compute the average of $a0, $a1, $a2, $a3, and put into $v0
6
Review Problem 5
In assembly, replace the value in $a0 with its absolute value.
7
Review Problem 6
Register $a0 has the address of a 3 integer array. Set $v0 to 1 if the array is sorted (smallest to largest), 0 otherwise.
8
Review Problem 7
Sometimes it can be useful to have a program loop infinitely. We can do that, regardless of location, by the instruction:
LOOP: BEQ $7, $7, LOOP Convert this instruction to machine code
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
9
Review Problem 8
What does the number 110012 represent?
10
Review Problem 9
Perform the following binary computations.
1 0 1 1 0+ 0 0 1 1 1
1 0 0 1- 0 0 1 1
11
Review Problem 10
How would the ALU be used to help with each of the following branches? The first is filled in for you: beq ($rs == $rt) subtract $rt from $rs, use zero flag bne ($rs != $rt) bgez ($rs ≥ 0) bgtz ($rs > 0) blez ($rs ≤ 0) bltz ($rs < 0)
12
Review Problem 11
Design a 4-bit sra (shift arithmetic right) unit. Note that sra $t0, 1 = $t0/2, $t0, 2 = $t0/4, …
13
Review Problem 12
Write MIPS assembly to compute $t1 = $t0*5 without using a multiply or divide instruction.
14
Review Problem 13
What is the value of the following floating-point number?
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 1 1 0
15
Review Problem 14 What is done for these ops during each of the CPU’s execute steps at right?
add $t0, $t1, $t2 sw $t3, 16[$t4] lw $t5, 8[$t6]Instruction
Fetch
Instruction
Decode
Operand
Fetch
Execute
Result
Store
Next
Instruction
16
Review Problem 15
Immediate vals for ADDI are sign-extended, while those for ORI are extended with zeros. Build a sign-extend unit that can handle both.
17
Review Problem 16
Develop a single-cycle CPU that can do LW and SW (only). Make it as simple as possible
18
Review Problem 17
What mods are needed to support jump register: PC = Reg[RS]
SignExtn
d
WrEn AddrDin Dout
DataMemory
InstructionFetchUnit
Rs Rt Rs Rt Rd Imm16
imm16
Instructions[31:0]
[25:21]
[20:16]
[15:11]
[15:0]
BranchJump
ALUSrc
RegDst
Rd Rt
ALUcntrl
Aw Aa Ab DaDw Db
RegisterWrEn FileRegWr
MemWr MemToReg
Zero
SignExtn
dSignE
xtnd
WrEn AddrDin Dout
DataMemory
WrEn AddrDin Dout
DataMemory
InstructionFetchUnit
InstructionFetchUnit
Rs Rt Rs Rt Rd Imm16
imm16
Instructions[31:0]
[25:21]
[20:16]
[15:11]
[15:0]
BranchJump
ALUSrc
RegDst
Rd Rt
ALUcntrl
Aw Aa Ab DaDw Db
RegisterWrEn File
Aw Aa Ab DaDw Db
RegisterWrEn FileRegWr
MemWr MemToReg
Zero
Sign
Extn
d
PC
Addr[31:2] Addr[1:0] Instruction
Memory
Con
catenate
Adder
Instr[31:0]Jump
“00”PC[31:28]
Target Instr[25:0]
imm16
“1”
Zero
Branch
Cin
“0” Sign
Extn
dS
ignE
xtnd
PC
PC
Addr[31:2] Addr[1:0] Instruction
Memory
Addr[31:2] Addr[1:0] Instruction
Memory
Con
catenate
Con
catenate
Adder
Adder
Instr[31:0]Jump
“00”PC[31:28]
Target Instr[25:0]
imm16
“1”
Zero
Branch
Cin
“0”
19
Review Problem 18
Show the datapath changes and control settings needed to implement “addi $rs, $rt, imm16”
SignE
xtnd
WrEn AddrDin Dout
DataMemory
InstructionFetchUnit
Rs Rt Rs Rt Rd Imm16
imm16
Instructions[31:0]
[25:21]
[20:16]
[15:11]
[15:0]
BranchJump
ALUSrc
RegDst
Rd Rt
ALUcntrl
Aw Aa Ab DaDw Db
RegisterWrEn FileRegWr
MemWr MemToReg
Zero
SignE
xtndSign
Extnd
WrEn AddrDin Dout
DataMemory
WrEn AddrDin Dout
DataMemory
InstructionFetchUnit
InstructionFetchUnit
Rs Rt Rs Rt Rd Imm16
imm16
Instructions[31:0]
[25:21]
[20:16]
[15:11]
[15:0]
BranchJump
ALUSrc
RegDst
Rd Rt
ALUcntrl
Aw Aa Ab DaDw Db
RegisterWrEn File
Aw Aa Ab DaDw Db
RegisterWrEn FileRegWr
MemWr MemToReg
Zero
Sign
Extn
d
PC
Addr[31:2] Addr[1:0] Instruction
Memory
Con
catenate
Adder
Instr[31:0]Jump
“00”PC[31:28]
Target Instr[25:0]
imm16
“1”
Zero
Branch
Cin
“0” Sign
Extn
dS
ignE
xtnd
PC
PC
Addr[31:2] Addr[1:0] Instruction
Memory
Addr[31:2] Addr[1:0] Instruction
Memory
Con
catenate
Con
catenate
Adder
Adder
Instr[31:0]Jump
“00”PC[31:28]
Target Instr[25:0]
imm16
“1”
Zero
Branch
Cin
“0”RegDst
ALUSrc
MemToReg
RegWr
MemWr
Branch
Jump
ALUCntrl
20
Review Problem 19
To allow a CPU to spend a cycle waiting, we use a NOP (No operation) function. What are the control settings for the NOP instruction?
SignE
xtnd
WrEn AddrDin Dout
DataMemory
InstructionFetchUnit
Rs Rt Rs Rt Rd Imm16
imm16
Instructions[31:0]
[25:21]
[20:16]
[15:11]
[15:0]
BranchJump
ALUSrc
RegDst
Rd Rt
ALUcntrl
Aw Aa Ab DaDw Db
RegisterWrEn FileRegWr
MemWr MemToReg
Zero
SignE
xtndSign
Extnd
WrEn AddrDin Dout
DataMemory
WrEn AddrDin Dout
DataMemory
InstructionFetchUnit
InstructionFetchUnit
Rs Rt Rs Rt Rd Imm16
imm16
Instructions[31:0]
[25:21]
[20:16]
[15:11]
[15:0]
BranchJump
ALUSrc
RegDst
Rd Rt
ALUcntrl
Aw Aa Ab DaDw Db
RegisterWrEn File
Aw Aa Ab DaDw Db
RegisterWrEn FileRegWr
MemWr MemToReg
Zero
Sign
Extn
d
PC
Addr[31:2] Addr[1:0] Instruction
Memory
Con
catenate
Adder
Instr[31:0]Jump
“00”PC[31:28]
Target Instr[25:0]
imm16
“1”
Zero
Branch
Cin
“0” Sign
Extn
dS
ignE
xtnd
PC
PC
Addr[31:2] Addr[1:0] Instruction
Memory
Addr[31:2] Addr[1:0] Instruction
Memory
Con
catenate
Con
catenate
Adder
Adder
Instr[31:0]Jump
“00”PC[31:28]
Target Instr[25:0]
imm16
“1”
Zero
Branch
Cin
“0”
RegDst
ALUSrc
MemToReg
RegWr
MemWr
Branch
Jump
ALUCntrl
21
Review Problem 20
Design the PC for a multi-cycle CPU from basic components (AND, OR, INV, MUX, DFF, etc.)
22
Review Problem 21
Show the RTL and datapath to support jr $rs (jump to address in register $rs)
Sign
Extn
d
PC
<<
2
MD
R
AL
UO
utB
AWrEn
Addr DoutMemory
Din
IR
[25-21]
[20-16]
[15-11]
[15-0]
Aw Ab Aa Da
Registers Dw WrEn Db
4
23
Review Problem 22
What mods to the multi-cycle datapath are needed to support load indirect (note – 2 mem accesses): Reg[IR[20:16]] = MEM[MEM[Reg[A] + sign-extend(IR[15:0])]]
Sign
Extn
d
PC
<<
2
MD
R
AL
UO
utB
AWrEn
Addr DoutMemory
Din
IR
[25-21]
[20-16]
[15-11]
[15-0]
Aw Ab Aa Da
Registers Dw WrEn Db
4
24
Review Problem 23 Draw the complete state diagram (Mealy Machine) for the
Mem, Reg, and IR write enables for a machine that does just R-Type and Branch instructions
WE
State Mem Reg IR
1 0 0 1
2 0 0 0
3Br 0 0 0
3Rt 0 0 0
4Rt 0 1 0
25
Review Problem 24
WE ALU
State PC Mem Reg IR SrcA SrcB Op Dest MemIn RegIn PCSrc
1 1 0 0 1 PC 4 + X PC X ALU
2 0 0 0 0 PC <<2 + X X X X
3Br (zero) 0 0 0 A B - X X X ALUout
3Rt 0 0 0 0 A B (IR) X X X X
4Rt 0 0 1 0 X X X Rd X ALUout X
3St 0 0 0 0 A SE + X X X X
4St 0 1 0 0 X X X X ALUout X X
3Lo 0 0 0 0 A SE + X X X X
4Lo 0 0 0 0 X X X X ALUout X X
5Lo 0 0 1 0 X X X Rt X MDR X
We can decide whether SrcA=PC means SrcA is 0 or 1. How should we decide how to best convert the symbolic control values below to specific boolean values?
26
Review Problem 25
A program executes 30% ALU, 30% Load, 20% Store, and 20% Branch. What is the CPI of our multicycle processor for this
machine?
What is the CPI of our single-cycle processor for this program?
27
Review Problem 26
The pipelined CPU has the stage delays shown Is it better to speed up the ALU by 10ns, or the Data
Memory by 2ns?
Does you answer change for a single-cycle or multi-cycle CPU?
Register
Register
Register
Register
PC
DataMemory
Instr.Memory
RegisterFile
RegisterFile
25ns 20ns 30ns20ns 20ns
28
Review Problem 27
If we built our register file to have two write ports (i.e. can write two registers at once) would this help our pipelined CPU?
29
Review Problem 28
What registers are being read and written in the 5th cycle of a pipelined CPU running this code?
add $1, $2, $3nor $4, $5, $6sub $7, $8, $9slt $10, $11, $12nand $13, $14, $15
Ifetch Reg/Dec Mem WrExec
Ifetch Reg/Dec Mem WrExec
Ifetch Reg/Dec Mem WrExec
Ifetch Reg/Dec Mem WrExec
Ifetch Reg/Dec Mem WrExec
30
Review Problem 29
Do the jump instructions (j, jr) have problems with hazards?
31
Review Problem 30
What forwarding happens on the following code?
lw $t0, 0($t1)add $t2, $t3, $t3nor $0, $t0, $t4bne $t2, $0, ENDsub $t5, $t2, $t4
Ifetch Reg/Dec Mem WrExec
Ifetch Reg/Dec Mem WrExec
Ifetch Reg/Dec Mem WrExec
Ifetch Reg/Dec Mem WrExec
Ifetch Reg/Dec Mem WrExec
32
Review Problem 31
What should we do to this code to run it on a CPU with delay slots?
and $t0, $t1, $t2ori $t0, $t0, 7add $t3, $t4, $t5lw $t6, 0($t3)bgez $t6, FOOj BAR
33
Review Problem 32
Why might a compiler do this transformation?
/* Before */for (j=0; j<2000; j++)
for (i=0; i<2000; i++)x[i][j]+=1;
/* After */for (i=0; i<2000; i++)
for (j=0; j<2000; j++)x[i][j]+=1;
34
Review Problem 33
Level Hit Time Hit Rate
L1 1 cycle 95%
L2 10 cycles 90%
Main Memory 50 cycles 99%
Disk 50,000 cycles 100%
If you can speed up any level’s hit time by a factor of two, which is the best to speed up?
35
Review Problem 34
The length (number of blocks) in a direct mapped cache is always a power of 2. Why?
36
Review Problem 35
For the following access pattern, what is the smallest direct mapped cache that will not use the same cache location twice?
01391741024
37
Review Problem 36
How many total bits are requires for a direct-mapped cache with 64 KB of data and 8-byte blocks, assuming a 32-bit address?
Index bits:
Bits/block:Data:Valid:Tag:
Total size:
38
Review Problem 37
Assume we have three caches, with four one-word blocks: Direct mapped, 2-way set assoc. (w/LRU), and fully associative
How many misses will each have on this address pattern: Byte addresses: 0, 32, 0, 24, 32
39
Review Problem 38
Level Hit Time Hit Rate
L1
L2 10 cycles 90%
Main Memory 40 cycles 99%
Disk 4,000 cycles 100%
Which is the best L1 cache for this system? Direct Mapped: 1 cycle, 80% hit rate 2-way Set Associative: 2 cycle, 90% hit rate Fully Associative: 3 cycle, 95% hit rate
40
Review Problem 39
Can a direct-mapped cache ever have less cache misses than a fully associative cache of the same capacity? Why/why not?
41
Review Problem 40
Assume we have separate instruction and data L1 caches. For each feature, state which cache is most likely to have the given feature
Large blocksize
Write-back
2-cycle hit time
42
Review Problem 41
In a system with 8-bit addresses, virtual memory, and a direct mapped cache, where can these memory accesses be found?
1 011011 01010101 1 111110 00001111 0 111001 11110000 1 111001 11111111
Valid Tag Data
01101001
01100011
11011100
10001011
1 0101 1010 1 0110 1111 0 1000 0001 1 1100 0111
Valid Tag Physical page #
43
Review Problem 42
For a dynamic branch predictor, why is the Branch History Table a direct-mapped cache? Why not fully associative or set associative?
44
Review Problem 43
How would various branch predictors do on the bold branch in the following code?
A 1-bit predictor will be correct ___%
A 2-bit predictor will be correct ___%
A 2-bit correlating predictor with 2-bit global branch history will be correct ___%
while (1) { if (i<2) counter++; /* Branch when I = 2 or 3 */ i=(i+1)&3; /* I counts 0,1,2,3,0,1,2,3,… */}
45
Review Problem 44
Show the constraint graph for this code, indicating the type of hazard for each edge.
1: lw $t1, 4($a0)2: add $t2, $t1, $a03: lw $t3, 8($a1)4: sub $t4, $t3, $a25: sw $t5, 0($a3)6: beq $s0, $s1, FOO
46
Review Problem 45
Would loop unrolling & register renaming be useful for the following code? If so, what would the resulting code look like?
while (i<400) { if (x[i]==CONST) counter++; /* Count number of CONSTs in array */ i++;}
47
Review Problem 46
Schedule the code for a 4-way VLIW. Assume no delay slots, and all instructions in parallel with a branch/jump still execute.
ALU1 ALU2 Load/Store Branch/Jump
1: lw $t1, 4($a0)2: add $t2, $t1, $a03: lw $t3, 8($a1)4: sub $t2, $t3, $a25: sw $t5, 0($a3)6: beq $s0, $s1, FOO7: and $t7, $s0, $s1