risc architecture - iit bombay
TRANSCRIPT
CADSL
RISC Architecture: Multi-Cycle Implementation
Virendra Singh
Associate Professor Computer Architecture and Dependable Systems Lab
Department of Electrical Engineering Indian Institute of Technology Bombay
http://www.ee.iitb.ac.in/~viren/ E-mail: [email protected]
Computer Organization & Architecture
Lecture 13 (16 April 2013)
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 2
Example Processor MIPS subset MIPS Instruc,on – Subset v Arithme,c and Logical Instruc,ons
Ø add, sub, or, and, slt v Memory reference Instruc,ons
Ø lw, sw v Branch
Ø beq, j
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 3
v simple instruc,ons, all 32 bits wide v very structured, no unnecessary baggage v only three instruc,on formats
v rely on compiler to achieve performance
funct op rs1 rs2 rd shmt op rs1 rd 16 bit address/data
op 26 bit address
R
I
J
Instruction Format
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 4
How Fast Can the Clock Be?
• If every instruc,on is executed in one clock cycle, then: – Clock period must be at least 8ns to perform the longest instruc,on, i.e., lw.
– This is a single cycle machine. – It is slower because many instruc,ons take less than 8ns but are s,ll allowed that much ,me.
• Method of speeding up: Use mul,cycle datapath.
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 5
Multicycle Datapath PC
Inst
r. re
g. (I
R)
Mem
. Dat
a (M
DR
)
ALU
Out
Reg
.
A R
eg.
B R
eg.
A
LU
Reg
iste
r file
Mem
ory
Addr.
Data
One-cycle data transfer paths (need registers to hold data)
4
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 6
Multicycle Datapath Requirements
• Only one ALU, since it can be reused. • Single memory for instruc,ons and data. • Five registers added:
– Instruc,on register (IR) – Memory data register (MDR) – Three ALU registers, A and B for inputs and ALUOut for output
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 7
Multicycle Datapath PC
Inst
r. re
g. (I
R)
Mem
. Dat
a (M
DR
)
ALU
Out
Reg
.
A R
eg.
B R
eg.
A
LU
Reg
iste
r file
Mem
ory
Add
r.
Data
4
Sign extend
Shift left 2
ALU control 0-5
0-15
16-20 21-25
IorD
MemtoReg ALUOp
ALU
SrcB
ALU
SrcA
RegDst IRWrite
RegWrite
MemWrite MemRead
in1 in2
out control MUX
Shift left 2 0-25
28-31
PCSource PC
Writ
e e
tc.
26-31 to Control
FSM
11-1
5
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 8
3 to 5 Cycles for an Instruction Step R-type
(4 cycles) Mem. Ref.
(4 or 5 cycles) Branch type (3 cycles)
J-type (3 cycles)
Instruc9on fetch IR ← Memory[PC]; PC ← PC+4
Instr. decode/ Reg. fetch
A ← Reg(IR[21-‐25]); B ← Reg(IR[16-‐20]) ALUOut ← PC + (sign extend IR[0-‐15]) << 2
Execu9on, addr. Comp., branch & jump comple9on
ALUOut ← A op B
ALUOut ← A+sign extend (IR[0-‐15])
If (A= =B) then
PC←ALUOut
PC←PC[28-‐31] ||
(IR[0-‐25]<<2)
Mem. Access or R-‐type comple9on
Reg(IR[11-‐15]) ← ALUOut
MDR←M[ALUout] or M[ALUOut]←B
Memory read comple9on
Reg(IR[16-‐20]) ← MDR
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 9
Cycle 1 of 5: Instruction Fetch (IF)
• Read instruc,on into IR, M[PC] → IR • Control signals used:
» IorD = 0 select PC » MemRead = 1 read memory » IRWrite = 1 write IR
• Increment PC, PC + 4 → PC • Control signals used:
» ALUSrcA = 0 select PC into ALU » ALUSrcB = 01 select constant 4 » ALUOp = 00 ALU adds » PCSource = 00 select ALU output » PCWrite = 1 write PC
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 10
Cycle 2 of 5: Instruction Decode (ID)
• Control unit decodes instruc,on • Datapath prepares for execu,on
• R and I types, reg 1→ A reg, reg 2 → B reg » No control signals needed
• Branch type, compute branch address in ALUOut » ALUSrcA = 0 select PC into ALU » ALUSrcB = 11 Instr. Bits 0-‐15 shif 2 into ALU » ALUOp = 00 ALU adds
opcode | reg 1 | reg 2 | reg 3 | shamt | fncode
opcode | reg 1 | reg 2 | word address increment
opcode | word address jump
31-26 25-21 20-16 15-11 10-6 5-0
R
I
J
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 11
Cycle 3 of 5: Execute (EX) • R type: execute func,on on reg A and reg B, result in ALUOut
• Control signals used: » ALUSrcA = 1 A reg into ALU » ALUsrcB = 00 B reg into ALU » ALUOp = 10 instr. Bits 0-‐5 control ALU
• I type, lw or sw: compute memory address in ALUOut ← A reg + sign extend IR[0-‐15]
• Control signals used: » ALUSrcA = 1 A reg into ALU » ALUSrcB = 10 Instr. Bits 0-‐15 into ALU » ALUOp = 00 ALU adds
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 12
Cycle 3 of 5: Execute (EX) • I type, beq: subtract reg A and reg B, write ALUOut to PC
• Control signals used: » ALUSrcA = 1 A reg into ALU » ALUsrcB = 00 B reg into ALU » ALUOp = 01 ALU subtracts » If zero = 1, PCSource = 01 ALUOut to PC » If zero = 1, PCwriteCond = 1 write PC » Instruc9on complete, go to IF
• J type: write jump address to PC ← IR[0-‐25] shif 2 and four leading bits of PC
• Control signals used: » PCSource = 10 » PCWrite = 1 write PC » Instruc9on complete, go to IF
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 13
Cycle 4 of 5: Reg Write/Memory • R type, write des,na,on register from ALUOut
• Control signals used: » RegDst = 1 Instr. Bits 11-‐15 specify reg. » MemtoReg = 0 ALUOut into reg. » RegWrite = 1 write register » Instruc9on complete, go to IF
• I type, lw: read M[ALUOut] into MDR • Control signals used:
» IorD = 1 select ALUOut into mem adr. » MemRead = 1 read memory to MDR
• I type, sw: write M[ALUOut] from B reg • Control signals used:
» IorD = 1 select ALUOut into mem adr. » MemWrite = 1 write memory » Instruc9on complete, go to IF
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 14
Cycle 5 of 5: Reg Write
• I type, lw: write MDR to reg[IR(16-‐20)] • Control signals used:
» RegDst = 0 instr. Bits 16-‐20 are write reg » MemtoReg = 1 MDR to reg file write input » RegWrite = 1 read memory to MDR » Instruc9on complete, go to IF
For an alternative method of designing datapath, see N. Tredennick, Microprocessor Logic Design, the Flowchart Method, Digital Press, 1987.
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 15
1-bit Control Signals Signal name Value = 0 Value =1
RegDst Write reg. # = bit 16-20 Write reg. # = bit 11-15 RegWrite No action Write reg. ← Write data ALUSrcA First ALU Operand ← PC First ALU Operand←Reg. A MemRead No action Mem.Data Output←M[Addr.] MemWrite No action M[Addr.]←Mem. Data Input MemtoReg Reg.File Write In←ALUOut Reg.File Write In←MDR IorD Mem. Addr. ← PC Mem. Addr. ← ALUOut IRWrite No action IR ← Mem.Data Output PCWrite No action PC is written PCWriteCond No action PC is written if zero(ALU)=1
PCWrite etc. PCWrite
PCWriteCond zero(ALU)
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 16
2-bit Control Signals Signal name Value Action
ALUOp
00 ALU performs add 01 ALU performs subtract 10 Funct. field (0-5 bits of IR ) determines ALU operation
ALUSrcB
00 Second input of ALU ← B reg. 01 Second input of ALU ← 4 (constant) 10 Second input of ALU ← 0-15 bits of IR sign ext. to 32b 11 Second input of ALU ← 0-15 bits of IR sign ext. and left shift
2 bits PCSource
00 ALU output (PC +4) sent to PC 01 ALUOut (branch target addr.) sent to PC 10 Jump address IR[0-25] shifted left 2 bits, concatenated with
PC+4[28-31], sent to PC
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 17
Control: Finite State Machine
Instruction decode and register fetch
Memory access instr.
R-type instr.
Branch instr.
Jump instr.
Start
Instruction fetch State 0
State 1
FSM-M FSM-R FSM-B FSM-J
Clock cycle 1
Clock cycle 2
Clock cycles
3-5
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 18
State 0: Instruction Fetch (CC1) PC
Inst
r. re
g. (I
R)
Mem
. Dat
a (M
DR
)
ALU
Out
Reg
.
A R
eg.
B R
eg.
A
LU
Reg
iste
r file
Mem
ory
Add
r.
Data
4
Sign extend
Shift left 2
ALU control 0-5
0-15
16-20 21-25
IorD=0
MemtoReg ALUOp
=00
ALU
SrcB
=01
ALU
SrcA
=0
RegDst IRWrite
=1
RegWrite
MemWrite MemRead = 1
in1 in2
out control MUX
Shift left 2 0-25
28-31
PCSource=00 PC
Writ
e e
tc.=
1
Add
26-31 to Control
FSM
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 19
State 0 Control FSM Outputs
MemRead =1 ALUSrcA = 0 IorD = 0 IRWrite = 1 ALUSrcB = 01 ALUOp = 00 PCWrite = 1 PCSource = 00
Start
State0 Instruction fetch
State 1 Instruction decode/ Register fetch/ Branch addr.
Outputs?
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 20
State 1: Instr. Decode/Reg. Fetch/ Branch Address (CC2)
PC
Inst
r. re
g. (I
R)
Mem
. Dat
a (M
DR
)
ALU
Out
Reg
.
A R
eg.
B R
eg.
A
LU
Reg
iste
r file
Mem
ory
Add
r.
Data
4
Sign extend
Shift left 2
ALU control 0-5
0-15
16-20 21-25
IorD
MemtoReg ALUOp
= 00
ALU
SrcB
=11
ALU
SrcA
=0
RegDst IRWrite
RegWrite
MemWrite MemRead
in1 in2
out control MUX
Shift left 2 0-25
28-31
PCSource PC
Writ
e e
tc.
Add
26-31 to Control
FSM
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 21
State 1 Control FSM Outputs
MemRead =1 ALUSrcA = 0 IorD = 0 IRWrite = 1 ALUSrcB = 01 ALUOp = 00 PCWrite = 1 PCSource = 00
Start State0 Instruction fetch (IF)
State 1 Instruction decode (ID) / Register fetch / Branch addr.
ALUSrcA = 0 ALUSrcB = 11 ALUOp = 00
FSM-M FSM-R FSM-B FSM-J
Opcode = J-type Opcode = BEQ
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 22
State 1 (Opcode = lw) → FSM-M (CC3-5) PC
Inst
r. re
g. (I
R)
Mem
. Dat
a (M
DR
)
ALU
Out
Reg
.
A R
eg.
B R
eg.
A
LU
Reg
iste
r file
Add
r.
Data
4
Sign extend
Shift left 2
ALU control 0-5
0-15
16-20 21-25
IorD=1
MemtoReg=1 ALUOp
= 00
ALU
SrcB
=10
ALU
SrcA
=1
RegDst=0 IRWrite
RegWrite=1
MemWrite MemRead=1
in1 in2
out control MUX
Shift left 2 0-25
28-31
PCSource PC
Writ
e e
tc.
Add
26-31 to Control
FSM M
emor
y CC3
CC4
CC5
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 23
State 1 (Opcode= sw)→FSM-M (CC3-4)
PC
Inst
r. re
g. (I
R)
Mem
. Dat
a (M
DR
)
ALU
Out
Reg
.
A R
eg.
B R
eg.
A
LU
Reg
iste
r file
Add
r.
Data
4
Sign extend
Shift left 2
ALU control 0-5
0-15
16-20 21-25
IorD=1
MemtoReg ALUOp
= 00
ALU
SrcB
=10
ALU
SrcA
=1
RegDst=0 IRWrite
RegWrite
MemWrite=1 MemRead
in1 in2
out control MUX
Shift left 2 0-25
28-31
PCSource PC
Writ
e e
tc.
Add
26-31 to Control
FSM M
emor
y CC3
CC4
CC4
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 24
FSM-M (Memory Access)
ALUSrcA =1 ALUSrcB = 10 ALUOp = 00
MemRead = 1 IorD = 1
RegWrite = 1 MemtoReg = 1 RegDst = 0
MemWrite = 1 IorD = 1
Compute mem addrress
Read Memory data
Write memory
Write register
Opcode = “lw”
Opcode = “sw”
To state 0 (Instr. Fetch)
From state 1 Opcode = “lw” or “sw”
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 25
State 1(Opcode=R-type)→FSM-R (CC3-4) PC
Inst
r. re
g. (I
R)
Mem
. Dat
a (M
DR
)
ALU
Out
Reg
.
A R
eg.
B R
eg.
A
LU
Reg
iste
r file
Add
r.
Data
4
Sign extend
Shift left 2
ALU control 0-5
0-15
16-20 21-25
IorD
MemtoReg=0 ALUOp
= 10
ALU
SrcB
=00
ALU
SrcA
=1
RegDst=0 IRWrite
RegWrite
MemWrite MemRead
in1 in2
out control MUX
Shift left 2 0-25
28-31
PCSource
PCW
rite
etc
.
26-31 to Control
FSM M
emor
y
“funct. code”
11-15 CC3
CC4
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 26
FSM-R (R-type Instruction)
ALUSrcA =1 ALUSrcB = 00 ALUOp = 10
RegWrite = 1 MemtoReg = 0 RegDst = 1
ALU operation
Write register
To state 0 (Instr. Fetch)
From state 1 Opcode = R-type
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 27
State 1 (Opcode = beq ) → FSM-B (CC3) PC
Inst
r. re
g. (I
R)
Mem
. Dat
a (M
DR
)
ALU
Out
Reg
.
A R
eg.
B R
eg.
A
LU
Reg
iste
r file
Add
r.
Data
4
Sign extend
Shift left 2
ALU control 0-5
0-15
16-20
21-25
IorD
MemtoReg ALUOp
= 01
ALU
SrcB
=00
ALU
SrcA
=1
RegDst IRWrite
RegWrite
MemWrite MemRead
in1 in2
out control MUX
Shift left 2 0-25
28-31
PCSource 01
PCW
rite
etc
.=1
26-31 to Control
FSM M
emor
y
subtract
11-15
zero
CC3
If(zero)
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 28
Write PC on “zero”
PCWrite etc.=1 PCWrite
PCWriteCond=1
zero=1
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 29
FSM-B (Branch)
ALUSrcA =1 ALUSrcB = 00 ALUOp = 01 PCWriteCond=1 PCSource=01
Branch condition: If A – B=0 zero = 1
To state 0 (Instr. Fetch)
From state 1 Opcode = “beq”
Write PC on branch condition
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 30
State 1 (Opcode = j) → FSM-J (CC3) PC
Inst
r. re
g. (I
R)
Mem
. Dat
a (M
DR
)
ALU
Out
R
eg.
A R
eg.
B R
eg.
Reg
iste
r file
Add
r.
Data
4
Sign extend
Shift left 2 ALU
control 0-5 0-15
16-20 21-25
IorD
MemtoReg ALUOp
ALU
SrcB
ALU
SrcA
RegDst IRWrite
RegWrite
MemWrite MemRead
in1 in2
out control MUX
Shift left 2 0-25
28-31
PCSource 10
PCW
rite
etc
.
26-31 to Control
FSM
Mem
ory
11-15
zero
ALU
CC3
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 31
Write PC
PCWrite etc.=1 PCWrite=1
PCWriteCond
zero
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 32
FSM-J (Jump)
PCWrite=1 PCSource=10
To state 0 (Instr. Fetch)
From state 1 Opcode = “jump”
Write jump addr. In PC
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 33
Control FSM
Instr. decode/reg. fetch/branch
addr.
ALU operation
Write PC on branch condition
Write memory
data
Write jump addr.
to PC
Write register
Read memory
data
Instr. fetch/
adv. PC
Compute memory
addr.
Write register
lw
sw
R B
J
Start State 0 1
2 3
4 5
6
7
8 9
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 34
Control FSM (Controller)
Combinational logic
FF
FF
FF
FF
16 control outputs
6 inputs (opcode)
Next state
Present state
Clock Reset
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 35
Designing the Control FSM
• Encode states; need 4 bits for 10 states, e.g., – State 0 is 0000, state 1 is 0001, and so on.
• Write a truth table for combina,onal logic: Opcode Present state Control signals Next state 000000 0000 0001000110000100 0001 . . . . . . . . . . . . . . . .
• Synthesize a logic circuit from the truth table. • Connect four flip-‐flops between the next state outputs and present
state inputs.
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 36
Block Diagram of a Processor
Datapath (PC, register file, registers, ALU)
Controller (Control FSM)
ALU control
Reset Clock
Mem. Addr. Mem. write data Mem. data out
MemWrite MemRead ALUOp
2-bits
func
t. [0
,5]
ALU
Op
3-bi
ts
Opc
ode
6-bi
ts
zer
o
Ove
rflo
w
ALU
SrcA
ALU
SrcB
2-
bits
PC
Sour
ce
2-bi
ts
Reg
Dst
Reg
Writ
e
Mem
toR
eg
IorD
IRW
rite
PCW
rite
PCW
riteC
ond
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 37
Exceptions or Interrupts
• Condi,ons under which the processor may produce incorrect result or may “hang”. – Illegal or undefined opcode. – Arithme,c overflow, divide by zero, etc. – Out of bounds memory address.
• EPC: 32-‐bit register holds the affected instruc,on address.
• Cause: 32-‐bit register holds an encoded excep,on type. For example, – 0 for undefined instruc,on – 1 for arithme,c overflow
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 38
Implementing Exceptions PC
Inst
r. re
g. (I
R)
4
ALU control
ALUOp =01
ALU
SrcB
=01
ALU
SrcA
=0
in1
in2
out control
MUX
PCSource 11
PCW
rite
etc
.=1
26-31 to Control
FSM
A
LU
Subtract
8000 0180(hex)
EPC
Cause
EPCWrite=1
0
1
CauseWrite=1 Overflow to Control FSM
32-bit register
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 39
How Long Does It Take? Again
• Assume control logic is fast and does not affect the cri,cal ,ming. Major ,me components are ALU, memory read/write, and register read/write.
• Time for hardware opera,ons, suppose • Memory read or write 2ns • Register read 1ns • ALU opera,on 2ns • Register write 1ns
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 40
Single-Cycle Datapath • R-‐type 6ns • Load word (I-‐type) 8ns • Store word (I-‐type) 7ns • Branch on equal (I-‐type) 5ns • Jump (J-‐type) 2ns • Clock cycle ,me = 8ns • Each instruc,on takes one cycle
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 41
Multicycle Datapath • Clock cycle ,me is determined by the longest opera,on, ALU or memory:
• Clock cycle ,me = 2ns
• Cycles per instruc,on (CPI): • lw 5 (10ns) • sw 4 (8ns) • R-‐type 4 (8ns) • beq 3 (6ns) • j 3 (6ns)
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 42
CPI of a Computer
∑k (Instructions of type k) × CPIk CPI = ——————————————
∑k (instructions of type k) where CPIk = Cycles for instruction of type k Note: CPI is dependent on the instruction mix of the
program being run. Standard benchmark programs are used for specifying the performance of CPUs.
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 43
Example
• Consider a program containing: • loads 25% • stores 10% • branches 11% • jumps 2% • Arithme,c 52%
• CPI = 0.25×5 + 0.10×4 + 0.11×3 + 0.02×3 + 0.52×4
= 4.12 for mul,cycle datapath • CPI = 1.00 for single-‐cycle datapath
CADSL 16 Apr 2013 Computer Architecture@IIT Mandi 44
Multicycle vs. Single-Cycle
Performance ratio = Single cycle time / Multicycle time
(CPI × cycle time) for single-cycle = ——————————————— (CPI × cycle time) for multicycle
1.00 × 8ns = ————— = 0.97 4.12 × 2ns
Single cycle is faster in this case, but remember, performance ratio depends on the instruction mix.
CADSL
Thank You
Computer Architecture@IIT Mandi 16 Apr 2013 45