lecture 5. mips processor design single-cycle mips #2
DESCRIPTION
ECM534 Advanced Computer Architecture. Lecture 5. MIPS Processor Design Single-Cycle MIPS #2. Prof. Taeweon Suh Computer Science Education Korea University. Single-Cycle MIPS. Again, keep in mind that microarchitecture is composed of 2 interacting parts Datapath Control - PowerPoint PPT PresentationTRANSCRIPT
Lecture 5. MIPS Processor DesignSingle-Cycle MIPS #2
Prof. Taeweon SuhComputer Science Education
Korea University
ECM534 Advanced Computer Architecture
Korea Univ
Single-Cycle MIPS
• Again, keep in mind that microarchitecture is composed of 2 interacting parts Datapath Control
• Let’s execute some example instructions on what we have designed so far
• Then, we are going to design control logic in detail
2
Korea Univ
Single-Cycle MIPS - lw
• Let’s start with a memory access instruction - lw
3
CLK
A RD
InstructionMemory
A1
A3WD3
RD2
RD1WE3
A2
CLK
RegisterFile
A RDData
MemoryWD
WEPCPC' Instr
CLK
op rs rt imm6 bits 5 bits 5 bits 16 bits
I-Type• STEP 1: Instruction Fetch
lw $2, 80($0)
Korea Univ
Single-Cycle MIPS - lw
• STEP 2: Decoding Read source operands from register file
4
Instr
CLK
A RD
InstructionMemory
A1
A3WD3
RD2
RD1WE3
A2
CLK
RegisterFile
A RDData
MemoryWD
WEPCPC'
25:21
CLK
lw $2, 80($0) op rs rt imm6 bits 5 bits 5 bits 16 bits
I-Type
Korea Univ
Single-Cycle MIPS - lw
• STEP 2: Decoding Sign-extend the immediate
5
SignImm
CLK
A RD
InstructionMemory
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
A RDData
MemoryWD
WEPCPC' Instr 25:21
15:0
CLK
module signext(input [15:0] a, output [31:0] y); assign y = {{16{a[15]}}, a};endmodule
lw $2, 80($0) op rs rt imm6 bits 5 bits 5 bits 16 bits
I-Type
Korea Univ
Single-Cycle MIPS - lw
6
SignImm
CLK
A RD
InstructionMemory
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
A RDData
MemoryWD
WEPCPC' Instr 25:21
15:0
SrcB
ALUResult
SrcA Zero
CLK
ALUControl2:0
ALU
010
op rs rt imm6 bits 5 bits 5 bits 16 bits
I-Type
• STEP 3: Execution Compute the memory address
lw $2, 80($0)
Korea Univ
Single-Cycle MIPS - lw
7
A1
A3WD3
RD2
RD1WE3
A2
SignImm
CLK
A RD
InstructionMemory
CLK
Sign Extend
RegisterFile
A RDData
MemoryWD
WEPCPC' Instr 25:21
15:0
SrcB20:16
ALUResult ReadData
SrcA
RegWrite
Zero
CLK
ALUControl2:0
ALU
0101
op rs rt imm6 bits 5 bits 5 bits 16 bits
I-Type
• STEP 4: Execution Read data from memory and write it to register file
lw $2, 80($0)
Korea Univ
Single-Cycle MIPS – PC
• CPU starts fetching the next instruction from PC+4
8
SignImm
CLK
A RD
InstructionMemory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
A RDData
MemoryWD
WEPCPC' Instr 25:21
15:0
SrcB20:16
ALUResult ReadData
SrcA
PCPlus4
Result
RegWrite
Zero
CLK
ALUControl2:0
ALU
0101
module adder(input [31:0] a, b, output [31:0] y); assign y = a + b;endmodule
adder pcadd1(.a (pc), .b (32'b100) .y (pcplus4));
Korea Univ
Single-Cycle MIPS - sw• Let’s execute another memory access instruction - sw
sw instruction needs to write data to memory
9
SignImm
CLK
A RD
InstructionMemory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
A RDData
MemoryWD
WEPCPC' Instr 25:21
20:16
15:0
SrcB20:16
ALUResult ReadData
WriteData
SrcA
PCPlus4
Result
MemWriteRegWrite
Zero
CLK
ALUControl2:0
ALU
10100
Example: sw $2, 84($0) op rs rt imm6 bits 5 bits 5 bits 16 bits
I-Type
Korea Univ
Single-Cycle MIPS - add, sub, and, or
• Let’s consider arithmetic and logical instructions - add, sub, and, or Write ALUResult to register file Note that R-type instructions write to rd
field of instruction (instead of rt)
10
SignImm
CLK
A RD
InstructionMemory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
01
01
A RDData
MemoryWD
WE01
PCPC' Instr 25:21
20:16
15:0
SrcB
20:16
15:11
ALUResult ReadData
WriteData
SrcA
PCPlus4WriteReg4:0
Result
RegDst MemWrite MemtoRegALUSrcRegWrite
Zero
CLK
ALUControl2:0
ALU
0varies1 001
op rs rt rd shamt funct6 bits 5 bits 5 bits 5 bits 5 bits 6 bits
R-Type
Korea Univ
Single-Cycle MIPS - beq• Let’s consider a branch instruction - beq
Determine whether register values are equal Calculate branch target address (BTA) from sign-extended immediate and PC+4
11
SignImm
CLK
A RD
InstructionMemory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
01
01
A RDData
MemoryWD
WE01
PC01
PC' Instr 25:21
20:16
15:0
SrcB
20:16
15:11
<<2
+
ALUResult ReadData
WriteData
SrcA
PCPlus4
PCBranch
WriteReg4:0
Result
RegDst Branch MemWrite MemtoRegALUSrcRegWrite
Zero
PCSrc
CLK
ALUControl2:0
ALU
01100 x0x 1
Example: beq $4,$0,around op rs rt imm6 bits 5 bits 5 bits 16 bits
I-Type
Korea Univ
Single-Cycle MIPS - or
12
SignImm
CLK
A RD
InstructionMemory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
01
01
A RDData
MemoryWD
WE01
PC01
PC' Instr 25:21
20:16
15:0
5:0
SrcB
20:16
15:11
<<2
+
ALUResult ReadData
WriteData
SrcA
PCPlus4
PCBranch
WriteReg4:0
Result
31:26
RegDst
BranchMemWriteMemtoReg
ALUSrc
RegWrite
OpFunct
ControlUnit
Zero
PCSrc
CLK
ALUControl2:0
ALU
0010
01
0
0
1
0
• Let’s see how or instruction works out in the implementation with control signals
op rs rt rd shamt funct6 bits 5 bits 5 bits 5 bits 5 bits 6 bits
R-Type
Korea Univ
Single-Cycle MIPS
13
SignImm
CLK
A RD
InstructionMemory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
01
01
A RDData
MemoryWD
WE01
PC01
PC' Instr 25:21
20:16
15:0
5:0
SrcB
20:16
15:11
<<2
+
ALUResult ReadData
WriteData
SrcA
PCPlus4
PCBranch
WriteReg4:0
Result
31:26
RegDst
BranchMemWriteMemtoReg
ALUSrc
RegWrite
OpFunct
ControlUnit
Zero
PCSrc
CLK
ALUControl2:0
ALU
• As mentioned, CPU is designed with datapath and control• Now, let’s delve into the ALU and control part design
Korea Univ
ALU (Arithmetic Logic Unit)
14
ALU
N N
N3
A B
Y
F
N = 32 in 32-bit processor
+
2 01
A B
Cout
Y
3
01
F2
F1:0
[N-1] S
NN
N
N
N NNN
N
2Zero
Extend
// slt (set less than)// $t0 = 1 if $t1 < $t2slt $t0, $t1, $t2
adder
F2:0 Function
000 A & B001 A | B010 A + B011 not used
100 A & ~B101 A | ~B110 A - B
111 SLT
Korea Univ
Verilog Code – ALU
15
module alu(input [31:0] a, b, input [2:0] alucont, output reg [31:0] result, output zero);
wire [31:0] b2, sum, slt;
assign b2 = alucont[2] ? ~b:b; // addition (sub) assign sum = a + b2 + alucont[2]; assign slt = sum[31]; // SLT
always@(*) begin case(alucont[1:0]) 2'b00: result <= a & b2; // A & B 2'b01: result <= a | b2; // A | B 2'b10: result <= sum; // A + B, A - B 2'b11: result <= slt; // SLT endcase end
// for branch assign zero = (result == 32'b0);
endmodule
ALU
N N
N3
A B
Y
FF2:0 Function
000 A & B
001 A | B
010 A + B
011 not used
100 A & ~B
101 A | ~B
110 A - B
111 SLT+
2 01
A B
Cout
Y
3
01
F2
F1:0
[N-1] S
NN
N
N
N NNN
N
2
ZeroE
xtend
Korea Univ
Control Unit
16
RegDst
BranchMemWriteMemtoReg
ALUSrcOpcode5:0
ControlUnit
ALUControl2:0Funct5:0
MainDecoder
ALUOp1:0
ALUDecoder
RegWriteOpcode and funct fields come from the fetched instruction
op rs rt rd shamt funct6 bits 5 bits 5 bits 5 bits 5 bits 6 bits
R-Type
op rs rt imm6 bits 5 bits 5 bits 16 bits
I-Type
op addr6 bits 26 bits
J-Type
Korea Univ
Control Unit - ALU Control
17
ALUOp1:0 Meaning
00 Add
01 Subtract
10 Look at Funct
11 Not Used
ALUOp1:0 Funct ALUControl2:000 X 010 (add)
X1 X 110 (subtract)
1X 100000 (add) 010 (add)
1X 100010 (sub) 110 (subtract)
1X 100100 (and) 000 (and)
1X 100101 (or) 001 (or)
1X 101010 (slt) 111 (slt)
RegDst
BranchMemWriteMemtoReg
ALUSrcOpcode5:0
ControlUnit
ALUControl2:0Funct5:0
MainDecoder
ALUOp1:0
ALUDecoder
RegWrite
• Memory access instructions (lw, sw) need to use ALU to calculate memory target address (addition)
• Branch instructions (beq, bne) need to use ALU for the equality check (subtraction)
• Implementation is completely dependent on hardware designers• But, the designers should make sure the implementation is
reasonable enough
Korea Univ
Control Unit - Main Decoder
18
Instruction Op5:0 RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp1:0
R-type 000000
lw 100011
sw 101011
beq 000100
RegDst
BranchMemWriteMemtoReg
ALUSrcOpcode5:0
ControlUnit
ALUControl2:0Funct5:0
MainDecoder
ALUOp1:0
ALUDecoder
RegWrite
ALUOp1:0 Meaning
00 Add
01 Subtract
10 Look at Funct field
11 Not Used
1 1 0 0 0 10
10
0
0 1 0 0 1 00X 1 0 1 X 00X 0 1 0 X 01
0
Korea Univ
How about Other Instructions?
19
SignImm
CLK
A RD
InstructionMemory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
01
01
A RDData
MemoryWD
WE01
PC01
PC' Instr 25:21
20:16
15:0
5:0
SrcB
20:16
15:11
<<2
+
ALUResult ReadData
WriteData
SrcA
PCPlus4
PCBranch
WriteReg4:0
Result
31:26
RegDst
BranchMemWriteMemtoReg
ALUSrc
RegWrite
OpFunct
ControlUnit
Zero
PCSrc
CLK
ALUControl2:0
ALU
Example: addi $t0, $t1, -14
• Now, we are done with the control part design• Let’s examine if the design is able to execute other instructions
Korea Univ
Control Unit - Main Decoder
20
Instruction Op5:0 RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp1:0
R-type 000000 1 1 0 0 0 0 10lw 100011 1 0 1 0 0 1 00sw 101011 0 X 1 0 1 X 00beq 000100 0 X 0 1 0 X 01addi 001000 1 0 1 0 0 0 00
Korea Univ
Control Unit - Main Decoder
21
SignImm
CLK
A RD
InstructionMemory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
01
01
A RDData
MemoryWD
WE01
PC01
PC' Instr 25:21
20:16
15:0
5:0
SrcB
20:16
15:11
<<2
+
ALUResult ReadData
WriteData
SrcA
PCPlus4
PCBranch
WriteReg4:0
Result
31:26
RegDst
BranchMemWriteMemtoReg
ALUSrc
RegWrite
OpFunct
ControlUnit
Zero
PCSrc
CLK
ALUControl2:0
ALU
• How about jump instructions? j op addr
6 bits 26 bits
J-Type
Korea Univ
Control Unit - Main Decoder
22
SignImm
CLK
A RD
InstructionMemory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
01
01
A RDData
MemoryWD
WE01
PC01 PC' Instr 25:21
20:16
15:0
5:0
SrcB
20:16
15:11
<<2
+
ALUResult ReadData
WriteData
SrcA
PCPlus4
PCBranch
WriteReg4:0
Result
31:26
RegDst
BranchMemWriteMemtoReg
ALUSrc
RegWrite
OpFunct
ControlUnit
Zero
PCSrc
CLK
ALUControl2:0
ALU
01
25:0 <<2
27:0 31:28
PCJump
Jump
• We added new hardware to support the j instruction A logic to compute the target address Mux and control signal
op addr6 bits 26 bits
J-Type
Korea Univ
Control Unit - Main Decoder
23
Instruction Op5:0 RegWrite RegDst AluSrc Branch MemWrite MemtoReg ALUOp1:0 Jump
R-type 000000 1 1 0 0 0 0 10 0
lw 100011 1 0 1 0 0 1 00 0
sw 101011 0 X 1 0 1 X 00 0
beq 000100 0 X 0 1 0 X 01 0
addi 001000 1 0 1 0 0 0 00 0
j 000100
• There should be one more output (jump) in the main decoder to support the jump instructions
0 X X X 0 X XX 1
Korea Univ
Verilog Code - Main Decoder and ALU Control
24
module maindec(input [5:0] op, output memtoreg, memwrite, output branch, alusrc, output regdst, regwrite, output jump, output [1:0] aluop);
reg [8:0] controls;
assign {regwrite, regdst, alusrc, branch, memwrite, memtoreg, jump, aluop} = controls;
always @(*) begin case(op) 6'b000000: controls <= 9'b110000010; // R-type 6'b100011: controls <= 9'b101001000; // lw 6'b101011: controls <= 9'b001010000; // sw 6'b000100: controls <= 9'b000100001; // beq 6'b001000: controls <= 9'b101000000; // addi 6'b000010: controls <= 9'b000000100; // j default: controls <= 9'bxxxxxxxxx; // ??? endcase end
endmodule
module aludec(input [5:0] funct, input [1:0] aluop, output reg [2:0] alucontrol);
always @(*) begin case(aluop) 2'b00: alucontrol <= 3'b010; // add 2'b01: alucontrol <= 3'b110; // sub default: case(funct) // RTYPE 6'b100000: alucontrol <= 3'b010; // ADD 6'b100010: alucontrol <= 3'b110; // SUB 6'b100100: alucontrol <= 3'b000; // AND 6'b100101: alucontrol <= 3'b001; // OR 6'b101010: alucontrol <= 3'b111; // SLT default: alucontrol <= 3'bxxx; // ??? endcase endcase end
endmodule
RegDst
BranchMemWriteMemtoReg
ALUSrcOpcode5:0
ControlUnit
ALUControl2:0Funct5:0
MainDecoder
ALUOp1:0
ALUDecoder
RegWrite
ALUOp1:0 Meaning
00 Add
01 Subtract
10 Look at Funct
11 Not Used
Korea Univ
Single-Cycle MIPS Performance• How fast is the single-cycle processor?• Clock cycle time (frequency) is limited by the critical path
The critical path is the path that takes the longest time What do you think the critical path is?
• The path that lw instruction goes through
25
SignImm
CLK
A RD
InstructionMemory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
01
01
A RDData
MemoryWD
WE01
PC01
PC' Instr 25:21
20:16
15:0
5:0
SrcB
20:16
15:11
<<2+
ALUResult ReadData
WriteData
SrcA
PCPlus4
PCBranch
WriteReg4:0
Result
31:26
RegDst
BranchMemWriteMemtoReg
ALUSrc
RegWrite
OpFunct
ControlUnit
Zero
PCSrc
CLK
ALUControl2:0
ALU1
0100
1
0
1
0 0
Korea Univ
Single-Cycle MIPS Performance• Critical path of single-cycle MIPSTc = tpcq_PC + tmem + max(tRFread, tsext) + tmux + tALU + tmem + tmux + tRFsetup
• In most implementations, limiting paths are: memory (instruction and data), ALU, register file.Tc = tpcq_PC + 2tmem + tRFread + 2tmux + tALU + tRFsetup
26
SignImm
CLK
A RD
InstructionMemory
+
4
A1
A3WD3
RD2
RD1WE3
A2
CLK
Sign Extend
RegisterFile
01
01
A RDData
MemoryWD
WE01
PC01
PC' Instr 25:21
20:16
15:0
5:0
SrcB
20:16
15:11
<<2
+
ALUResult ReadData
WriteData
SrcA
PCPlus4
PCBranch
WriteReg4:0
Result
31:26
RegDst
BranchMemWriteMemtoReg
ALUSrc
RegWrite
OpFunct
ControlUnit
Zero
PCSrc
CLK
ALUControl2:0
ALU1
0100
1
0
1
0 0 Elements Parameter
Register clock-to-Q tpcq_PC
Multiplexer tmux
ALU tALU
Memory read tmem
Register file read tRFread
Register file setup tRFsetup
Korea Univ
Example
27
Tc = tpcq_PC + 2tmem + tRFread + 2tmux + tALU + tRFsetup = [30 + 2(250) + 150 + 2(25) + 200 + 20] ps = 950 ps
Elements Parameter Delay (ps)
Register clock-to-Q tpcq_PC 30
Multiplexer tmux 25
ALU tALU 200
Memory read tmem 250
Register file read tRFread 150
Register file setup tRFsetup 20
• Assuming that the CPU should execute 100 billion instructions to run your program, what is the execution time of the program on a single-cycle MIPS processor?Execution Time = (#instructions) x (cycles/instruction) x (seconds/cycle) = (100 × 109) x (1) x (950 × 10-12 s)
= 95 seconds
fc = 1/Tc
fc = 1/950ps = 1.052GHz