Computer Organization, CS224
Chapter 4, Part a: The Processor
Spring 2011
With thanks to M.J. Irwin, T. Fountain, D. Patterson, and J. Hennessy for some lecture slide contents
CS224 Spring 2011
The Big Picture
• The Five Classic Components of a Computer
• Today’s Topic: Design a Single Cycle Processor
[Figure: the five classic components of a computer: Processor (Control, Datapath), Memory, Input, Output]
The Performance Perspective
• Performance of a machine is determined by:
– Instruction count
– Clock cycle time
– Clock cycles per instruction (CPI)
• Processor design (datapath and control) will determine:
– Clock cycle time
– Clock cycles per instruction
• Today: Single cycle processor
– Advantage: One clock cycle per instruction
– Disadvantage: Long cycle time
[Figure: the three performance factors: Inst. Count, CPI, Cycle Time]
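The three factors multiply into execution time: CPU time = instruction count × CPI × clock cycle time. The Python sketch below is only an illustration; the function name `cpu_time_ns` and all numbers are hypothetical, chosen to show that a CPI of 1 buys nothing if the cycle time grows by the same factor.

```python
def cpu_time_ns(instruction_count: int, cpi: float, cycle_time_ns: float) -> float:
    """CPU time (ns) = instruction count x clock cycles per instruction x cycle time."""
    return instruction_count * cpi * cycle_time_ns


# Hypothetical program of one million instructions on two hypothetical designs.
single_cycle = cpu_time_ns(1_000_000, cpi=1, cycle_time_ns=8.0)   # CPI = 1, long cycle
shorter_cycle = cpu_time_ns(1_000_000, cpi=4, cycle_time_ns=2.0)  # more cycles, shorter cycle
print(single_cycle, shorter_cycle)  # both 8,000,000 ns with these made-up numbers
```

Changing any one factor changes CPU time proportionally, which is why the single-cycle design's long cycle time matters as much as its CPI of 1.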
Big Picture: Processor Implementation
• Key Ideas
– Concept of datapath and control
– Where the instruction and data bits go
– Modern hardware organization
• Clocking, combinational, and sequential logic, using computer organization as an example
– Handling complexity
• Abstraction, use of commonality, multilevel interpretation
• Approach
– Start with a simple implementation and iteratively improve it
Processor Design Steps
1. Analyze instruction set => datapath requirements
– The meaning of each instruction is given by the register transfers (ISA model => RTL model)
– Datapath must include storage elements for ISA registers
• Possibly more
– Datapath must support each register transfer
2. Select set of datapath components and establish clocking methodology
3. Assemble datapath meeting the RTL requirements
Processor Design (cont’d)
4. Analyze implementation of each instruction to determine the setting of control points that effect the register transfer.
5. Assemble the control logic
6. RTL datapath and control design are refined to track physical design and functional validation
– Changes made for timing and errata (a.k.a. “bug”) fixes
– Amount of work varies with capabilities of CAD tools and degree of optimization for cost/performance
Subset of Instructions
• To simplify our study of processor design, we will focus on a subset of the MIPS instructions
– Memory: lw and sw
– Arithmetic: add, sub, and, ori, slt
– Branch: beq and j
• The example in lecture uses ori rather than the or covered in the text, to demonstrate one more category of instructions
• The method of implementing other instructions should follow naturally from these
MIPS Format Review
• R-Format
– add rd, rs, rt
– sub rd, rs, rt
Fields (bit widths): OP=0 (6) | rs (5) | rt (5) | rd (5) | sa (5) | funct (6)
– rs: first source register; rt: second source register; rd: result register; sa: shift amount; funct: function code
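Because the field widths are fixed (6/5/5/5/5/6 from bit 31 down to bit 0), pulling the R-format fields out of an instruction word is only shifts and masks. A minimal Python sketch; `decode_r_format` is a hypothetical helper name, not part of any library.

```python
def decode_r_format(word: int) -> dict:
    """Split a 32-bit R-format MIPS instruction word into its six fields."""
    return {
        "op": (word >> 26) & 0x3F,     # bits 31:26 (0 for R-format)
        "rs": (word >> 21) & 0x1F,     # bits 25:21, first source register
        "rt": (word >> 16) & 0x1F,     # bits 20:16, second source register
        "rd": (word >> 11) & 0x1F,     # bits 15:11, result register
        "sa": (word >> 6) & 0x1F,      # bits 10:6, shift amount
        "funct": word & 0x3F,          # bits 5:0, function code
    }


# 0x02324020 encodes add with rd = 8, rs = 17, rt = 18 (funct = 32 selects add).
print(decode_r_format(0x02324020))
```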
MIPS Format Review (cont)
• I-Format
– lw rt, rs, imm
– sw rt, rs, imm
– beq rs, rt, imm
– ori rt, rs, imm
• Reminders
– Branch uses PC-relative addressing (PC + 4 + 4 × imm)
Fields (bit widths): OP (6) | rs (5) | rt (5) | imm (16)
– rs: first source register; rt: second source register (the destination for lw and ori); imm: immediate
MIPS Format Review (cont)
• J-Format
– j target
• Reminders
– Uses pseudodirect addressing (target × 4) to address 2^28 bytes directly
– Uses the top 4 bits from the PC
Fields (bit widths): OP (6) | target (26): jump target address
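Pseudodirect addressing concatenates the top 4 PC bits, the 26-bit target, and two zero bits, so a jump reaches any word inside the current 2^28-byte (256 MB) region. A Python sketch under the lecture's convention of taking PC[31:28] from the jump's own PC; on real MIPS the upper bits come from the address of the following instruction, which only differs at a region boundary. `jump_target` is a hypothetical name.

```python
def jump_target(pc: int, target26: int) -> int:
    """New PC = PC[31:28] || target[25:0] || 00 (pseudodirect addressing)."""
    upper = pc & 0xF000_0000                 # keep the top 4 bits of the PC
    word_offset = target26 & 0x03FF_FFFF     # 26-bit word address from the instruction
    return upper | (word_offset << 2)        # x 4 turns the word address into a byte address


print(hex(jump_target(0x0040_0000, 0x0010_0008)))  # 0x400020
```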
What Happens?
• It’s hard to see how we should go about organizing the processor
• To start thinking about it, look at what happens on each instruction
– The instruction specified by the PC is fetched from memory
– One or two registers are read (lw vs. add, for instance)
– The ALU must be used to add, subtract, etc.
– The results are stored (to memory or a register)
Execution Cycle
• Instruction Fetch: obtain instruction from program storage
• Instruction Decode: determine required actions and instruction size
• Operand Fetch: locate and obtain operand data
• Execute: compute result value or status
• Result Store: deposit results in storage for later use
• Next Instruction: determine successor instruction
Implementation Overview
• Data flows through memory and functional units
[Figure: abstract datapath: PC, Instruction memory (Address, Instruction), Registers (selected by Register #), ALU, Data memory (Address, Data)]
Some Logic Design…
• Two important definitions
– Combinational: output is dependent only on current inputs
• Example: ALU
– Sequential: element contains state information
• Example: Registers
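1-bit ALU
• Using a MUX, we can combine the AND, OR, and adder operations into a single ALU
[Figure: 1-bit ALU: inputs A and B feed an AND gate, an OR gate, and a 1-bit full adder (with Cin and Cout); a MUX selected by ALUOp chooses which one drives Result]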
4-bit ALU
[Figure: four 1-bit ALUs in a ripple-carry chain: slice i takes A_i, B_i, CIn_i, and the shared ALUop and produces Result_i and COut_i, with COut_i feeding CIn_(i+1); COut3 is the carry out of the 4-bit ALU, whose block symbol has 4-bit inputs A and B]
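A behavioral Python sketch of the two slides above: each 1-bit slice computes AND, OR, and a full-adder sum side by side and a MUX picks one, while the adder's carry ripples into the next slice. The string operation names (`"and"`, `"or"`, `"add"`) are an assumption standing in for the ALUOp select lines; `alu_1bit` and `alu_4bit` are hypothetical names.

```python
def alu_1bit(a: int, b: int, carry_in: int, alu_op: str) -> tuple[int, int]:
    """One ALU slice: AND, OR and a full adder, with a MUX choosing the result."""
    and_out = a & b
    or_out = a | b
    sum_out = a ^ b ^ carry_in                               # full-adder sum
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)    # full-adder carry
    result = {"and": and_out, "or": or_out, "add": sum_out}[alu_op]  # the MUX
    return result, carry_out


def alu_4bit(a: int, b: int, alu_op: str) -> tuple[int, int]:
    """Chain four slices; COut of slice i feeds CIn of slice i + 1 (ripple carry)."""
    result, carry = 0, 0
    for i in range(4):
        bit, carry = alu_1bit((a >> i) & 1, (b >> i) & 1, carry, alu_op)
        result |= bit << i
    return result, carry  # carry is COut3


print(alu_4bit(5, 3, "add"))   # (8, 0)
print(alu_4bit(15, 1, "add"))  # (0, 1): the sum overflows 4 bits
```

The adder in every slice runs regardless of the selected operation, just as in hardware; only the MUX output changes with ALUOp.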
Combinational Elements
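[Figure: three combinational elements:
– Adder: 32-bit inputs A and B plus CarryIn; outputs 32-bit Sum and Carry
– MUX: 32-bit inputs A and B plus Select; 32-bit output Y
– ALU: 32-bit inputs A and B plus OP; outputs 32-bit Result and Zero]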
D Latches
• Modified SR latch
• Latches the value of D when C is asserted
[Figure: D latch with inputs C and D and complementary outputs Q and Q̄]
D Flip Flops
• Uses master/slave D latches
[Figure: D flip-flop built from two D latches in series (master and slave), clocked by CLK; inputs D and CLK, outputs Q and Q̄]
Storage Element: Register
• Register
– Similar to a D flip-flop, except
• N-bit input and output
• Write Enable input
– Write Enable
• 0: Data Out will not change
• 1: Data Out will become Data In
– Data changes only on the falling clock edge!
[Figure: register with Clk, N-bit Data In, and Write Enable inputs and N-bit Data Out]
Storage Element: Reg File
• Register File consists of 32 registers
– Two 32-bit output buses
• busA and busB
– One 32-bit input bus
• busW
– Register 0 hard-wired to value 0
– Register selected by
• RA selects the register to put on busA
• RB selects the register to put on busB
• RW selects the register to be written via busW when Write Enable is 1
– Clock input (CLK)
• CLK input is a factor only for write operations
• During a read, behaves as a combinational logic block
– RA or RB valid => busA or busB valid after “access time”
– Minor simplification of reality
[Figure: register file (32 32-bit registers) with 5-bit inputs RW, RA, RB, 32-bit input busW, Write Enable, and Clk; 32-bit outputs busA and busB]
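The register file's two behaviors, combinational reads and clocked writes with register 0 hard-wired to 0, can be captured in a few lines. A minimal Python sketch; the class `RegisterFile` and its methods `read` and `clock_edge` are hypothetical names, and timing (access time, setup) is ignored.

```python
class RegisterFile:
    """32 x 32-bit registers: reads are combinational, writes happen on a clock edge."""

    def __init__(self) -> None:
        self._regs = [0] * 32

    def read(self, ra: int, rb: int) -> tuple[int, int]:
        # Like combinational logic: busA and busB simply follow RA and RB.
        return self._regs[ra], self._regs[rb]

    def clock_edge(self, rw: int, bus_w: int, write_enable: bool) -> None:
        # CLK matters only for writes; writes to register 0 are discarded.
        if write_enable and rw != 0:
            self._regs[rw] = bus_w & 0xFFFF_FFFF


rf = RegisterFile()
rf.clock_edge(rw=5, bus_w=42, write_enable=True)
print(rf.read(5, 0))  # (42, 0)
```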
Storage Element: Memory
• Memory
– One input bus: Data In
– One output bus: Data Out
– Address selection
• Address selects the word to put on Data Out
• To write to an address, set Write Enable to 1
– Clock input (CLK)
• CLK input is a factor only for write operations
• During a read, behaves as a combinational logic block
– Address valid => Data Out valid after “access time”
– Minor simplification of reality
[Figure: memory with Clk, 32-bit Data In, Address, and Write Enable inputs and 32-bit Data Out]
Some Logic Design…
• All storage elements have the same clock
– Edge-triggered clocking
– “Instantaneous” state change (simplification!)
– Timing always works if the clock is slow enough
Cycle Time = Clk-to-Q + Longest Delay + Setup + Clock Skew
[Figure: timing diagram for Clk: each storage element's input must be stable for a setup time before and a hold time after the clock edge, and is a don't care otherwise]
Instruction Fetch (I.F.) RTL
• Common RTL operations
– Fetch instruction
Mem[PC]; Fetch instruction from memory
– Update program counter
• Sequential
PC <- PC + 4; Calculate next address
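Datapath: I.F. Unit
[Figure: PC (clocked by Clk) drives the Address input of Instruction Memory, which outputs the 32-bit Instruction Word; an Adder computes PC + 4 for the next PC]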
Add RTL
• Add instruction: add rd, rs, rt
Mem[PC]; Fetch instruction from memory
R[rd] <- R[rs] + R[rt]; Add operation
PC <- PC + 4; Calculate next address
Fields (bit widths): OP=0 (6) | rs (5): first source register | rt (5): second source register | rd (5): result register | sa (5): shift amount | funct (6): function code (= 32)
Sub RTL
• Sub instruction: sub rd, rs, rt
Mem[PC]; Fetch instruction from memory
R[rd] <- R[rs] - R[rt]; Sub operation
PC <- PC + 4; Calculate next address
Fields (bit widths): OP=0 (6) | rs (5): first source register | rt (5): second source register | rd (5): result register | sa (5): shift amount | funct (6): function code (= 34)
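Datapath: Reg/Reg Ops
• R[rd] <- R[rs] op R[rt];
– ALU control and RegWr are based on the decoded instruction
– Ra, Rb, and Rw come from the rs, rt, and rd fields
[Figure: register file (32 32-bit registers) with Ra = Rs, Rb = Rt, Rw = Rd taken from the Instruction; busA and busB feed the ALU (set by ALU control), whose 32-bit Result returns on busW and is written at Clk when RegWr = 1]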
OR Immediate RTL
• OR Immediate instruction: ori rt, rs, imm
Mem[PC]; Fetch instruction from memory
R[rt] <- R[rs] OR ZeroExt(imm); OR operation with Zero-Extend
PC <- PC + 4; Calculate next address
Fields (bit widths): OP (6) | rs (5): first source register | rt (5): second register (destination) | imm (16): immediate
Datapath: Immediate Ops
• Rw is set by a MUX, and ALU input B is set to busB or ZeroExt(imm)
• ALUSrc and RegDst are set based on the instruction
[Figure: Reg/Reg datapath extended with a RegDst MUX choosing Rd or Rt as Rw (the Rt read on Rb is a don't care), a Zero Ext unit widening imm16 to 32 bits, and an ALUSrc MUX choosing busB or the extended immediate as ALU input B]
Load RTL
• Load instruction: lw rt, rs, imm
Mem[PC]; Fetch instruction from memory
Addr <- R[rs] + SignExt(imm); Compute memory addr
R[rt] <- Mem[Addr]; Load data into register
PC <- PC + 4; Calculate next address
Fields (bit widths): OP (6) | rs (5): first source register | rt (5): second register (destination) | imm (16): immediate
Datapath: Load
• Extender handles sign vs. zero extension of the immediate
• MUX selects between the ALU result and the Memory output
[Figure: immediate datapath with the Zero Ext unit replaced by an Extender controlled by ExtOp; the ALU output drives the Adr input of Data Memory (Data In, WrEn = MemWr, Clk); a MemtoReg MUX selects the ALU result or the Data Memory output for busW]
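The Extender's two modes and the lw/sw address computation are easy to check numerically. A Python sketch, assuming ExtOp = 1 means sign-extend and ExtOp = 0 means zero-extend (matching the lw and ori control settings later in the lecture); `extend16` and `memory_address` are hypothetical names.

```python
def extend16(imm16: int, ext_op: int) -> int:
    """Extender: ExtOp = 1 sign-extends, ExtOp = 0 zero-extends a 16-bit value to 32 bits."""
    imm16 &= 0xFFFF
    if ext_op and imm16 & 0x8000:       # sign bit set: replicate it into bits 31:16
        return imm16 | 0xFFFF_0000
    return imm16                        # otherwise bits 31:16 are zero


def memory_address(rs_value: int, imm16: int) -> int:
    """lw/sw: Addr <- R[rs] + SignExt(imm), wrapped to 32 bits like the hardware adder."""
    return (rs_value + extend16(imm16, 1)) & 0xFFFF_FFFF


print(hex(extend16(0x8000, 1)), hex(extend16(0x8000, 0)))  # 0xffff8000 0x8000
print(hex(memory_address(0x1001_0000, 0xFFFC)))            # 0x1000fffc (offset -4)
```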
Store RTL
• Store instruction: sw rt, rs, imm
Mem[PC]; Fetch instruction from memory
Addr <- R[rs] + SignExt(imm); Compute memory addr
Mem[Addr] <- R[rt]; Store register data into memory
PC <- PC + 4; Calculate next address
Fields (bit widths): OP (6) | rs (5): first source register | rt (5): second source register | imm (16): immediate
Datapath: Store
• Register rt is passed on busB into memory
• Memory address is calculated just as in the lw case
[Figure: load datapath in which busB also feeds Data In of Data Memory, written when MemWr = 1; RegDst, ExtOp, ALUSrc, ALUctr, MemtoReg, and RegWr control the rest of the datapath]
Branch RTL
• Branch instruction: beq rs, rt, imm
Mem[PC]; Fetch instruction from memory
Cond <- R[rs] - R[rt]; Calculate branch condition
if (Cond eq 0) Test if equal
PC <- PC + 4 + SignExt(imm)*4; Calculate PC-relative address
else
PC <- PC + 4; Calculate next address
Fields (bit widths): OP (6) | rs (5): first source | rt (5): second source | imm (16): immediate
Datapath: Branch
[Figure: busA and busB (Rs and Rt) feed the ALU, set by ALUctr, which produces Zero; Zero, Branch, and imm16 feed the Next Address Logic, which updates the PC (clocked by Clk) that addresses Instruction Memory]
• More detail to come
The Next Address
• PC is byte-addressed in instruction memory
– Sequential: PC[31:0] = PC[31:0] + 4
– Branch operation: PC[31:0] = PC[31:0] + 4 + SignExt(imm) × 4
• Instruction addresses
– PC is byte-addressed, but instructions are 4 bytes long
– Therefore the 2 LSBs of the 32-bit PC are always 0
– There is no reason for the hardware to keep the 2 LSBs; simplify the hardware by using a 30-bit PC
• Sequential: PC[31:2] = PC[31:2] + 1
• Branch operation: PC[31:2] = PC[31:2] + 1 + SignExt(imm)
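The claim that the 30-bit PC loses nothing can be checked directly: dropping the two always-zero LSBs turns "+ 4" into "+ 1" and "SignExt(imm) × 4" into "SignExt(imm)". A Python sketch; `taken` stands for Branch AND Zero, and the function names are hypothetical.

```python
def _sign_ext16(imm16: int) -> int:
    """SignExt(imm) as a signed Python int."""
    imm16 &= 0xFFFF
    return imm16 - 0x1_0000 if imm16 & 0x8000 else imm16


def next_pc_32(pc: int, taken: bool, imm16: int) -> int:
    """Byte-addressed PC: PC + 4, or PC + 4 + SignExt(imm) x 4 when the branch is taken."""
    offset = _sign_ext16(imm16) * 4 if taken else 0
    return (pc + 4 + offset) & 0xFFFF_FFFF


def next_pc_30(pc_31_2: int, taken: bool, imm16: int) -> int:
    """Word-addressed PC[31:2]: PC + 1, or PC + 1 + SignExt(imm) when the branch is taken."""
    offset = _sign_ext16(imm16) if taken else 0
    return (pc_31_2 + 1 + offset) & 0x3FFF_FFFF


# A taken beq at 0x00400010 with imm = -2 lands on 0x0040000C in both forms.
print(hex(next_pc_32(0x0040_0010, True, 0xFFFE)))
print(hex(next_pc_30(0x0040_0010 >> 2, True, 0xFFFE) << 2))
```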
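Datapath: Fast, Expensive Next-I.F. Logic
• PC is incremented to the next instruction normally
• On a beq instruction, the logic can instead add the immediate × 4 to PC + 4
[Figure: 30-bit PC (clocked by Clk); two Adders compute PC + "1" and PC + 1 + SignExt(imm16) (from Instruction[15:0]) before Zero is known, and a MUX controlled by Branch and Zero selects the next PC; Addr[31:2] = PC and Addr[1:0] = "00" address Instruction Memory, which outputs Instruction[31:0]]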
Datapath: Slow, Smaller Next-I.F. Logic
• Slow because the address add cannot start until the ALU's Zero output is ready
• But probably not the critical path (LOAD usually is)
[Figure: a single 30-bit Adder with Carry In = "1" adds the PC to the output of a MUX that selects "0" or SignExt(imm16) based on Branch and Zero; Addr[31:2] = PC and Addr[1:0] = "00" address Instruction Memory, which outputs Instruction[31:0]]
Jump RTL
• Jump instruction: j target
Mem[PC]; Fetch instruction from memory
PC[31:2] <- PC[31:28] || target[25:0]; Calculate next address
Fields (bit widths): OP (6) | target (26): jump target address
Datapath: I.F. Unit with Jump
• A MUX controls whether the PC takes the pseudodirect jump target
[Figure: next-I.F. logic extended with a MUX controlled by Jump, which selects either the sequential/branch result or the 30-bit Target formed from PC[31:28] and Instruction[25:0]]
Putting it All Together
[Figure: complete single-cycle datapath: the Instruction Fetch Unit (controlled by Jump, Branch, and Zero) supplies Instruction[31:0]; the fields Rs = [25:21], Rt = [20:16], Rd = [15:11], and Imm16 = [15:0] feed the register file, the RegDst MUX, and the Extender (ExtOp); the ALUSrc MUX, the ALU (ALUctr), Data Memory (MemWr), and the MemtoReg MUX complete the path back to busW (RegWr)]
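The assembled datapath executes each RTL description in one clock cycle. The Python sketch below is a behavioral model of that cycle for the lecture's subset, not a gate-level model; the function `step`, the list `regs`, and the dict-based memories `imem`/`dmem` are hypothetical choices, and unlisted instructions simply raise an error.

```python
MASK32 = 0xFFFF_FFFF


def sign_ext16(imm16: int) -> int:
    """SignExt(imm) as a Python int (negative when bit 15 is set)."""
    return imm16 - 0x1_0000 if imm16 & 0x8000 else imm16


def step(pc: int, regs: list[int], imem: dict[int, int], dmem: dict[int, int]) -> int:
    """Execute one instruction (one clock cycle) and return the next PC."""
    word = imem[pc]                                     # Mem[PC]
    op, rs, rt = word >> 26, (word >> 21) & 0x1F, (word >> 16) & 0x1F
    rd, funct, imm = (word >> 11) & 0x1F, word & 0x3F, word & 0xFFFF
    next_pc = (pc + 4) & MASK32                         # default: PC <- PC + 4
    dest, result = None, 0

    if op == 0x00 and funct == 0x20:                    # add: R[rd] <- R[rs] + R[rt]
        dest, result = rd, regs[rs] + regs[rt]
    elif op == 0x00 and funct == 0x22:                  # sub: R[rd] <- R[rs] - R[rt]
        dest, result = rd, regs[rs] - regs[rt]
    elif op == 0x0D:                                    # ori: R[rt] <- R[rs] OR ZeroExt(imm)
        dest, result = rt, regs[rs] | imm
    elif op == 0x23:                                    # lw: R[rt] <- Mem[R[rs] + SignExt(imm)]
        dest, result = rt, dmem.get((regs[rs] + sign_ext16(imm)) & MASK32, 0)
    elif op == 0x2B:                                    # sw: Mem[R[rs] + SignExt(imm)] <- R[rt]
        dmem[(regs[rs] + sign_ext16(imm)) & MASK32] = regs[rt]
    elif op == 0x04:                                    # beq: taken when R[rs] - R[rt] is zero
        if ((regs[rs] - regs[rt]) & MASK32) == 0:
            next_pc = (next_pc + sign_ext16(imm) * 4) & MASK32
    elif op == 0x02:                                    # j: PC[31:28] || target || 00
        next_pc = (pc & 0xF000_0000) | ((word & 0x03FF_FFFF) << 2)
    else:
        raise ValueError(f"unsupported instruction {word:#010x}")

    if dest:                                            # RegWrite; register 0 stays 0
        regs[dest] = result & MASK32
    return next_pc
```

For example, loading `imem = {0: 0x34080005, 4: 0x34090003, 8: 0x01095022}` (ori $8, $0, 5; ori $9, $0, 3; sub $10, $8, $9) and calling `step` three times from PC 0 leaves 2 in register 10.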
A Real MIPS Datapath
Control Design
• Next: Designing the Control for the Single Cycle Datapath
[Figure: the five classic components of a computer: Processor (Control, Datapath), Memory, Input, Output]
Adding Control
• Analyze datapath and RTLs for control
– Identify control points for pieces of the datapath
• Instruction Fetch Unit
• Integer function units
• Memory
– Categorize type of control signal
• Flow of data through multiplexors
• Writes of state information
– Derive control signal values for each instruction
• Design and implement control with logic/PLA/ROM (for single cycle & pipelined)
Instruction Fetch (first part)
• Always fetch the next instruction
Mem[PC];
[Figure: Instruction Fetch Unit with jump support; Branch, Zero, and Jump keep their previous values while the instruction at Addr[31:2] = PC, Addr[1:0] = "00" is read from Instruction Memory]
Control for Arithmetic
• RegDst = 1, ALUSrc = 0, ALUctr = <op>, ExtOp = X, MemtoReg = 0, RegWr = 1, MemWr = 0, Branch = 0, Jump = 0
[Figure: complete single-cycle datapath with these control settings: R[Rd] receives the ALU result of R[Rs] <op> R[Rt]]
Instruction Fetch at End
• Increment PC: PC = PC + 4; (for all but Branch/Jump)
• Branch = 0, Zero = X, Jump = 0
[Figure: Instruction Fetch Unit with both MUXes selecting the sequential PC + "1" (30-bit PC) path]
Control for Immediate (ori)
• RegDst = 0, ALUSrc = 1, ALUctr = <op>, ExtOp = 0, MemtoReg = 0, RegWr = 1, MemWr = 0, Branch = 0, Jump = 0
[Figure: complete single-cycle datapath with these control settings: R[Rt] receives R[Rs] OR ZeroExt(imm16)]
Control for Load (lw)
• RegDst = 0, ALUSrc = 1, ALUctr = Add, ExtOp = 1, MemtoReg = 1, RegWr = 1, MemWr = 0, Branch = 0, Jump = 0
[Figure: complete single-cycle datapath with these control settings: R[Rt] receives Mem[R[Rs] + SignExt(imm16)]]
Control for Store (sw)
• RegDst = X, ALUSrc = 1, ALUctr = Add, ExtOp = 1, MemtoReg = X, RegWr = 0, MemWr = 1, Branch = 0, Jump = 0
[Figure: complete single-cycle datapath with these control settings: Mem[R[Rs] + SignExt(imm16)] receives R[Rt] via busB]
Control for Branch (beq)
• RegDst = X, ALUSrc = 0, ALUctr = Sub, ExtOp = X, MemtoReg = X, RegWr = 0, MemWr = 0, Branch = 1, Jump = 0
[Figure: complete single-cycle datapath with these control settings: the ALU computes R[Rs] - R[Rt] and its Zero output goes to the Instruction Fetch Unit]
Instruction Fetch (beq)
• Consider the interesting case where we branch (Zero = 1)
• Branch = 1, Zero = 1, Jump = 0
[Figure: Instruction Fetch Unit with the branch MUX selecting PC + 1 + SignExt(imm16) and the jump MUX passing it to the PC]
Control for Jump (j)
• RegDst = X, ALUSrc = X, ALUctr = X, ExtOp = X, MemtoReg = X, RegWr = 0, MemWr = 0, Branch = 0, Jump = 1
[Figure: complete single-cycle datapath with these control settings: no register or memory state changes; only the PC is updated]
Instruction Fetch (j)
• Branch = 0, Zero = X, Jump = 1
[Figure: Instruction Fetch Unit with the jump MUX selecting the Target PC[31:28] || Instruction[25:0] as the new PC[31:2]]
Control Path
[Figure: complete single-cycle datapath with its control inputs: RegDst, ALUSrc, ALUctr, ExtOp, MemtoReg, RegWr, MemWr, Branch, and Jump]
Summary of Control Signals
             add      sub      ori      lw       sw       beq      jump
func         10 0000  10 0010  (not important for the other instructions)
op           00 0000  00 0000  00 1101  10 0011  10 1011  00 0100  00 0010
RegDst       1        1        0        0        x        x        x
ALUSrc       0        0        1        1        1        0        x
MemtoReg     0        0        0        1        x        x        x
RegWrite     1        1        1        1        0        0        0
MemWrite     0        0        0        0        1        0        0
Branch       0        0        0        0        0        1        0
Jump         0        0        0        0        0        0        1
ExtOp        x        x        0        1        1        x        x
ALUctr<2:0>  Add      Sub      Or       Add      Add      Sub      xxx
(op and func codings from the green card)
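The summary table is, in effect, a lookup from (op, func) to a control word, which is also how a ROM-based control would use it. A Python sketch of that lookup; `CONTROL`, `FIELDS`, `OPCODES`, `FUNCTS`, and `control_signals` are hypothetical names, and `None` stands for a don't care (x).

```python
# Control word per instruction, copied from the summary table; None = don't care (x).
FIELDS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemWrite",
          "Branch", "Jump", "ExtOp", "ALUctr")

CONTROL = {
    "add":  (1,    0,    0,    1, 0, 0, 0, None, "Add"),
    "sub":  (1,    0,    0,    1, 0, 0, 0, None, "Sub"),
    "ori":  (0,    1,    0,    1, 0, 0, 0, 0,    "Or"),
    "lw":   (0,    1,    1,    1, 0, 0, 0, 1,    "Add"),
    "sw":   (None, 1,    None, 0, 1, 0, 0, 1,    "Add"),
    "beq":  (None, 0,    None, 0, 0, 1, 0, None, "Sub"),
    "jump": (None, None, None, 0, 0, 0, 1, None, None),
}

OPCODES = {0b000000: "R-type", 0b001101: "ori", 0b100011: "lw",
           0b101011: "sw", 0b000100: "beq", 0b000010: "jump"}
FUNCTS = {0b100000: "add", 0b100010: "sub"}   # consulted only when op = 0


def control_signals(op: int, funct: int = 0) -> dict:
    """Return the control word for an instruction, indexed by op (and func for R-type)."""
    name = OPCODES[op]
    if name == "R-type":
        name = FUNCTS[funct]
    return dict(zip(FIELDS, CONTROL[name]))


print(control_signals(0b100011))  # the lw column of the table
```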
Multilevel Decoding
• A 12-input control would be very large (2^12 = 4096 entries)
• To keep decoder size smaller, decode some control lines in each stage
• Since only R-type instructions (with op = 000000) need the function field bits, give these to the ALU control
[Figure: Main Control decodes the 6-bit op into the control lines and an N-bit ALUop; a local ALU Control combines ALUop with the 6-bit func to produce the 3-bit ALUctr for the ALU]
Multilevel Decoding: Main Control Table
             R-type    ori      lw       sw       beq       jump
op           00 0000   00 1101  10 0011  10 1011  00 0100   00 0010
RegDst       1         0        0        x        x         x
ALUSrc       0         1        1        1        0         x
MemtoReg     0         0        1        x        x         x
RegWrite     1         1        1        0        0         0
MemWrite     0         0        0        1        0         0
Branch       0         0        0        0        1         0
Jump         0         0        0        0        0         1
ExtOp        x         0        1        1        x         x
ALUop<N:0>   “R-type”  Or       Add      Add      Subtract  xxx
The Encoding of ALUop
• In this exercise, ALUop has to be 2 bits wide to represent:
– (1) “R-type” instructions
– “I-type” instructions that require the ALU to perform:
• (2) Or, (3) Add, and (4) Subtract
• To implement the full MIPS ISA, ALUop has to be 3 bits wide to represent:
– (1) “R-type” instructions
– “I-type” instructions that require the ALU to perform:
• (2) Or, (3) Add, (4) Subtract, and (5) And (e.g. andi)
[Figure: Main Control (6-bit op) produces an N-bit ALUop; the local ALU Control combines it with the 6-bit func to produce the 3-bit ALUctr]
                   R-type    ori   lw    sw    beq       jump
ALUop (Symbolic)   “R-type”  Or    Add   Add   Subtract  xxx
ALUop<2:0>         100       010   000   000   001       xxx
The Decoding of the “func” Field
                   R-type    ori   lw    sw    beq       jump
ALUop (Symbolic)   “R-type”  Or    Add   Add   Subtract  xxx
ALUop<2:0>         100       010   000   000   001       xxx
[Figure: Main Control (6-bit op) produces ALUop; the local ALU Control combines ALUop with the 6-bit func to produce the 3-bit ALUctr driving the ALU]
R-type fields: op (bits 31:26) | rs (25:21) | rt (20:16) | rd (15:11) | shamt (10:6) | funct (5:0)

funct<5:0>   Instruction Operation
10 0000      add
10 0010      subtract
10 0100      and
10 0101      or
10 1010      set-on-less-than

ALUctr<2:0>  ALU Operation
000          And
001          Or
010          Add
110          Subtract
111          Set-on-less-than
Truth Tables
                   R-type    ori   lw    sw    beq
ALUop (Symbolic)   “R-type”  Or    Add   Add   Subtract
ALUop<2:0>         100       010   000   000   001

ALUop<2:0>   func<3:0>   ALU Operation   ALUctr<2:0>
0 0 0        x x x x     Add             0 1 0
0 x 1        x x x x     Subtract        1 1 0
0 1 x        x x x x     Or              0 0 1
1 x x        0 0 0 0     Add             0 1 0
1 x x        0 0 1 0     Subtract        1 1 0
1 x x        0 1 0 0     And             0 0 0
1 x x        0 1 0 1     Or              0 0 1
1 x x        1 0 1 0     Set on <        1 1 1

funct<3:0>   Instruction Op.
0000         add
0010         subtract
0100         and
0101         or
1010         set-on-less-than
The Logic Equation for ALUctr<2>
ALUop<2:0>   func<3:0>   ALUctr<2>
0 x 1        x x x x     1
1 x x        0 0 1 0     1
1 x x        1 0 1 0     1
• ALUctr<2> = !ALUop<2> & ALUop<0> + ALUop<2> & !func<2> & func<1> & !func<0>
• This makes func<3> a don’t care
The Logic Equation for ALUctr<1>
ALUop<2:0>   func<3:0>   ALUctr<1>
0 0 0        x x x x     1
0 x 1        x x x x     1
1 x x        0 0 0 0     1
1 x x        0 0 1 0     1
1 x x        1 0 1 0     1
• ALUctr<1> = !ALUop<2> & !ALUop<1> + ALUop<2> & !func<2> & !func<0>
The Logic Equation for ALUctr<0>
ALUop<2:0>   func<3:0>   ALUctr<0>
0 1 x        x x x x     1
1 x x        0 1 0 1     1
1 x x        1 0 1 0     1
• ALUctr<0> = !ALUop<2> & ALUop<1>
+ ALUop<2> & !func<3> & func<2> & !func<1> & func<0>
+ ALUop<2> & func<3> & !func<2> & func<1> & !func<0>
The ALU Control Logic
[Figure: local ALU Control block: inputs ALUop (3 bits) and func (6 bits); output ALUctr (3 bits)]
• ALUctr<2> = !ALUop<2> & ALUop<0> + ALUop<2> & !func<2> & func<1> & !func<0>
• ALUctr<1> = !ALUop<2> & !ALUop<1> + ALUop<2> & !func<2> & !func<0>
• ALUctr<0> = !ALUop<2> & ALUop<1>
+ ALUop<2> & !func<3> & func<2> & !func<1> & func<0>
+ ALUop<2> & func<3> & !func<2> & func<1> & !func<0>
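One way to gain confidence in hand-minimized equations is to evaluate them bit by bit and compare against every row of the truth table. A Python sketch of the three ALUctr equations; `alu_ctr` and `not_` are hypothetical names, and only func<3:0> of the 6-bit funct is examined, as in the equations.

```python
def not_(bit: int) -> int:
    """Logical NOT of a single bit (the ! in the equations)."""
    return 1 - bit


def alu_ctr(alu_op: int, func: int) -> int:
    """Local ALU control: 3-bit ALUop and func<3:0> -> 3-bit ALUctr."""
    op2, op1, op0 = (alu_op >> 2) & 1, (alu_op >> 1) & 1, alu_op & 1
    f3, f2, f1, f0 = (func >> 3) & 1, (func >> 2) & 1, (func >> 1) & 1, func & 1
    c2 = (not_(op2) & op0) | (op2 & not_(f2) & f1 & not_(f0))
    c1 = (not_(op2) & not_(op1)) | (op2 & not_(f2) & not_(f0))
    c0 = ((not_(op2) & op1)
          | (op2 & not_(f3) & f2 & not_(f1) & f0)
          | (op2 & f3 & not_(f2) & f1 & not_(f0)))
    return (c2 << 2) | (c1 << 1) | c0


# (ALUop, funct) -> expected ALUctr, one entry per truth-table row (funct = 0 when unused).
ROWS = {(0b000, 0): 0b010, (0b001, 0): 0b110, (0b010, 0): 0b001,
        (0b100, 0b100000): 0b010, (0b100, 0b100010): 0b110, (0b100, 0b100100): 0b000,
        (0b100, 0b100101): 0b001, (0b100, 0b101010): 0b111}
print(all(alu_ctr(op, fn) == want for (op, fn), want in ROWS.items()))  # True
```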
Main Control Truth Table
                   R-type    ori       lw        sw        beq       jump
op                 00 0000   00 1101   10 0011   10 1011   00 0100   00 0010
RegDst             1         0         0         x         x         x
ALUSrc             0         1         1         1         0         x
MemtoReg           0         0         1         x         x         x
RegWrite           1         1         1         0         0         0
MemWrite           0         0         0         1         0         0
Branch             0         0         0         0         1         0
Jump               0         0         0         0         0         1
ExtOp              x         0         1         1         x         x
ALUop (Symbolic)   “R-type”  Or        Add       Add       Subtract  xxx
ALUop<2>           1         0         0         0         0         x
ALUop<1>           0         1         0         0         0         x
ALUop<0>           0         0         0         0         1         x
[Figure: Main Control (6-bit op) drives RegDst, ALUSrc, and the other control lines, plus a 3-bit ALUop to the local ALU Control, which also takes func and produces the 3-bit ALUctr]
Truth Table for RegWrite
             R-type    ori       lw        sw        beq       jump
op           00 0000   00 1101   10 0011   10 1011   00 0100   00 0010
RegWrite     1         1         1         0         0         0
• RegWrite = R-type + ori + lw
= !op<5> & !op<4> & !op<3> & !op<2> & !op<1> & !op<0> (R-type)
+ !op<5> & !op<4> & op<3> & op<2> & !op<1> & op<0> (ori)
+ op<5> & !op<4> & !op<3> & !op<2> & op<1> & op<0> (lw)
[Figure: one AND gate per instruction decodes op<5:0> (R-type, ori, lw, sw, beq, jump); an OR gate combines the R-type, ori, and lw terms into RegWrite]
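The sum-of-products form maps directly onto bit tests: one AND term per instruction that writes a register, ORed together. A Python sketch of the RegWrite equation; `reg_write` is a hypothetical name.

```python
def reg_write(op: int) -> int:
    """RegWrite = R-type + ori + lw, as a sum of products over op<5:0>."""
    b = [(op >> i) & 1 for i in range(6)]    # b[i] is op<i>
    nb = [1 - x for x in b]                  # nb[i] is !op<i>
    r_type = nb[5] & nb[4] & nb[3] & nb[2] & nb[1] & nb[0]   # 00 0000
    ori = nb[5] & nb[4] & b[3] & b[2] & nb[1] & b[0]         # 00 1101
    lw = b[5] & nb[4] & nb[3] & nb[2] & b[1] & b[0]          # 10 0011
    return r_type | ori | lw


# Opcodes in table order: R-type, ori, lw, sw, beq, jump.
print([reg_write(op) for op in (0b000000, 0b001101, 0b100011, 0b101011, 0b000100, 0b000010)])
# [1, 1, 1, 0, 0, 0]
```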
PLA Implementation
[Figure: PLA: an AND plane decodes op<5:0> into the six instruction terms (R-type, ori, lw, sw, beq, jump); an OR plane combines them into RegWrite, ALUSrc, MemtoReg, MemWrite, Branch, Jump, RegDst, ExtOp, ALUop<2>, ALUop<1>, and ALUop<0>]
Implementing Control
• Programmable Logic Array (PLA) vs. “Random Logic”
– Design changes
• Validation changes are common
• PLA is less work to change; area/timing impact is predictable
– Area
• Tradeoff depends on complexity of logic (# of gates)
– Timing and power
• Random logic is generally better, since individual paths can be tuned
• Alternative approach is Read Only Memory (ROM/PROM)
– Also combinational, but its size makes it slow
– Used for microcoded control with more than one state/cycle per instruction
Putting It All Together
[Figure: complete single-cycle processor: Main Control decodes Instr[31:26] into RegDst, ALUSrc, MemtoReg, RegWr, MemWr, Branch, Jump, ExtOp, and a 3-bit ALUop; ALU Control combines ALUop with func = Instr[5:0] into the 3-bit ALUctr; these signals drive the Instruction Fetch Unit, register file, Extender, ALU, and Data Memory]
Worst Case Timing (Load)
[Figure: timing diagram for lw, starting at the clock edge, with each signal changing from Old Value to New Value after:
– Clk-to-Q: PC
– Instruction Memory Access Time: Rs, Rt, Rd, Op, Func
– Delay through Control Logic: ALUctr, ExtOp, ALUSrc, MemtoReg, RegWr
– Register File Access Time: busA, busB
– Delay through Extender & Mux: ALU input B
– ALU Delay: Data Mem Address
– Data Memory Access Time: busW
– Sum of {Mux Delay + setup + skew}: Register Write occurs at the next clock edge]
Single Cycle Processor
• Advantages
– Single cycle per instruction makes logic and clock simple
• Disadvantages
– Inefficient utilization of memory and functional units, since different instructions take different lengths of time
• Each functional unit is used only once per clock cycle
• e.g. the ALU computes useful values only a small fraction of the time
– Cycle time is set by the worst-case path => long cycle times!
• Load instruction:
– PC Clk-to-Q +
– instruction memory access time +
– register file access time +
– ALU delay +
– data memory access time +
– register file setup time +
– clock skew
– All machines would have a CPI of 1, with cycle time set by the longest instruction!
Single cycle datapath => CPI = 1, CCT => long
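Because every instruction gets the same clock, the cycle time is simply the sum of the load path's delays, and every other instruction pays that cost too. A Python sketch of the sum listed above; all delay values are invented placeholders for illustration only, not figures from the lecture or from any real implementation, and `cycle_time_ps` / `max_clock_mhz` are hypothetical names.

```python
# Invented component delays in picoseconds (placeholders, not measured values).
LOAD_PATH_PS = {
    "PC clk-to-Q": 30,
    "instruction memory access": 200,
    "register file access": 150,
    "ALU delay": 120,
    "data memory access": 200,
    "register file setup": 20,
    "clock skew": 10,
}


def cycle_time_ps(path: dict[str, int]) -> int:
    """Single-cycle clock period: the sum of every delay on the worst-case (load) path."""
    return sum(path.values())


def max_clock_mhz(period_ps: int) -> float:
    """Highest clock frequency (MHz) that still meets this period."""
    return 1e6 / period_ps


period = cycle_time_ps(LOAD_PATH_PS)
print(period, round(max_clock_mhz(period)))  # 730 ps, about 1370 MHz with these placeholders
```

With these placeholder numbers an add, which never touches data memory, still takes the full 730 ps cycle; that idle time is the inefficiency the disadvantages above describe.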
5 steps to design a processor
• 1. Analyze instruction set => datapath requirements
• 2. Select set of datapath components & establish clock methodology
• 3. Assemble datapath meeting the requirements
• 4. Analyze implementation of each instruction to determine the setting of control points that effect the register transfer
• 5. Assemble the control logic
Control is the hard part
MIPS makes control easier
• Instructions same size
• Source registers always in same place
• Immediates same size, location
• Operations always on registers/immediates
Summary
[Figure: the five classic components of a computer: Processor (Control, Datapath), Memory, Input, Output]