CS1104: Computer Organisation http://www.comp.nus.edu.sg/~cs1104
School of Computing, National University of Singapore
CS1104-P2-6 Processor: Datapath and Control 2
PII Lecture 6: Processor: Datapath and Control
Datapath
  Single-bus Organization
  Multiple-bus Organization
MIPS: Multicycle Datapath and Control
  Stages of Instructions
  Datapath Walkthroughs
Processor and Logic Design
Reading: Chapter 9 of textbook, which is Chapter 7 in “Computer Organization” by Hamacher, Vranesic and Zaky.
Optional reading: Chapter 5 in “Computer Organization & Design” by Patterson and Hennessy.
Recap: Organisation
[Figure: block diagram of computer organisation showing Processor (Control, Datapath), Memory, Devices (Input, Output), Cache, Registers and Bus.]
Fundamental Concepts
Processor (CPU): the active part of the computer, which does all the work (data manipulation and decision-making).
Datapath: portion of the processor which contains hardware necessary to perform all operations required by the computer (the brawn).
Control: portion of the processor (also in hardware) which tells the datapath what needs to be done (the brain).
Fundamental Concepts (2)
Instruction execution cycle: fetch, decode, execute.
Fetch: fetch next instruction (using PC) from memory into IR.
Decode: decode the instruction.
Execute: execute instruction.
[Figure: execution cycle. Instruction Fetch, Instruction Decode, Operand Fetch, Execute, Result Store, Next Instruction.]
Fundamental Concepts (3)
Fetch: Fetch next instruction into IR (Instruction Register).
Assume each word is 4 bytes and each instruction is stored in a word, and that the memory is byte addressable.
PC (Program Counter) contains address of next instruction.
IR ← [[PC]]
PC ← [PC] + 4
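The two register transfers of the fetch step can be sketched in a few lines of Python. This is only an illustration: the function name `fetch`, the dictionary-based memory and the two sample instruction words are assumptions, not part of the lecture.

```python
WORD_BYTES = 4  # each instruction occupies one 4-byte word (byte-addressable memory)

def fetch(memory, pc):
    """Return (IR, new PC) for one fetch: IR <- [[PC]], PC <- [PC] + 4."""
    ir = memory[pc]             # contents of the memory word addressed by PC
    return ir, pc + WORD_BYTES  # PC now points to the next instruction

# Hypothetical memory image: byte address -> 32-bit instruction word.
memory = {0: 0x00221820, 4: 0x8C230028}
ir, pc = fetch(memory, 0)
print(hex(ir), pc)  # prints 0x221820 4
```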
Single-bus Organization
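[Figure: single-bus organization. Registers PC, MAR, MDR, Y, Z, TEMP, IR and R0 to R(n-1) share the internal processor bus; MAR and MDR connect to the address and data lines of the memory bus. A MUX (Select) feeds either Y or Constant 4 to ALU input A; ALU input B comes from the bus. The instruction decoder and control logic generates the control signals, including the ALU control lines (Add, Sub, XOR, ...) and Carry-in.]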
Instruction Execution
An instruction can be executed by performing one or more of the following operations in some specified sequence:
Transfer a word of data from one register to another or to the ALU (Arithmetic Logic Unit).
Perform an arithmetic or a logic operation and store the result in a register.
Fetch the contents of a given memory location and load them into a register.
Store a word of data from a register into a given memory location.
Register Transfer
Register to register transfer: For each register Ri, two control signals:
Riin: used to load the data on the bus into the register.
Riout: used to place the register’s contents on the bus.
Example: To transfer contents of R1 to R4:
Set R1out to 1. This places contents of R1 on the bus.
Set R4in to 1. This loads data from the processor bus into R4.
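As a rough model of the example above, one bus cycle can be written as a function in which a single register drives the bus and any register whose "in" signal is set loads the value. The name `bus_transfer` and the dictionary of registers are assumptions made for this sketch.

```python
def bus_transfer(regs, out_reg, in_regs):
    """One clock step on a single bus.

    out_reg is the one register whose Rout signal is 1 (it drives the bus);
    in_regs are the registers whose Rin signal is 1 (they load from the bus).
    """
    bus = regs[out_reg]
    for name in in_regs:
        regs[name] = bus
    return bus

regs = {"R1": 25, "R4": 0}
bus_transfer(regs, "R1", ["R4"])  # R1out, R4in
print(regs["R4"])                 # prints 25
```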
Register Transfer (2)
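[Figure: input and output gating for the registers. Switches controlled by Riin and Riout connect register Ri to the internal processor bus; Yin gates the bus into Y, which feeds ALU input A through the MUX (Select between Y and Constant 4); Zin and Zout gate the ALU result register Z.]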
Arithmetic/Logic Operation
ALU: Performs arithmetic and logic operations on its A and B inputs.
To perform R3 ← [R1] + [R2]:
1. R1out, Yin
2. R2out, SelectY, Add, Zin
3. Zout, R3in
[Figure: register gating on the internal processor bus (Ri with Riin/Riout, Y with Yin, Z with Zin/Zout, MUX selecting Y or Constant 4 for ALU input A).]
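The three control steps for R3 ← [R1] + [R2] can be traced by hand in Python. The register values 7 and 5 are made up for the illustration; Y and Z are modelled as ordinary dictionary entries.

```python
regs = {"R1": 7, "R2": 5, "R3": 0, "Y": 0, "Z": 0}

# Step 1: R1out, Yin -> the bus carries [R1], Y loads it
regs["Y"] = regs["R1"]

# Step 2: R2out, SelectY, Add, Zin -> ALU input A = Y (via MUX), input B = bus
bus = regs["R2"]
regs["Z"] = regs["Y"] + bus

# Step 3: Zout, R3in -> the bus carries [Z], R3 loads it
regs["R3"] = regs["Z"]

print(regs["R3"])  # prints 12
```

Three steps are needed because the single bus can carry only one value per clock cycle, so the first operand has to be parked in Y before the second one is placed on the bus.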
Arithmetic/Logic Operation (2)
If there are n operations, do we need n ALU control lines?
We could use encoding, which requires only ⌈log2 n⌉ control lines for n operations. However, this increases complexity and hardware (an additional decoder is needed).
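A small sketch of the trade-off, assuming hypothetical helper names `encoded_lines` and `one_hot_decode`: encoding shrinks the number of control lines, but a decoder is then needed to recover one line per operation.

```python
def encoded_lines(n_ops):
    """Number of control lines needed to encode n_ops operations: ceil(log2 n)."""
    return max(1, (n_ops - 1).bit_length())

def one_hot_decode(code, n_ops):
    """The extra decoder: turn an encoded value back into n_ops separate lines."""
    return [1 if i == code else 0 for i in range(n_ops)]

print(encoded_lines(16))     # prints 4
print(one_hot_decode(2, 4))  # prints [0, 0, 1, 0]
```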
[Figure: ALU with inputs A and B, control lines Add, Sub, XOR, ..., and Carry-in.]
Reading a Word from Memory
Move (R1), R2 /* R2 ← [[R1]] */
1. MAR ← [R1]
2. Start a Read operation on the memory bus
3. Wait for the MFC response from the memory
4. Load MDR from the memory bus
5. R2 ← [MDR]
MDR has four control signals: MDRin, MDRout, MDRinE and MDRoutE.
[Figure: MDR connections. MDRinE and MDRoutE gate MDR to and from the memory-bus data lines; MDRin and MDRout gate it to and from the internal processor bus.]
Reading a Word from Memory (2)
Move (R1), R2 /* R2 ← [[R1]] */
Sequence of control steps:
1. R1out, MARin, Read
2. MDRinE, WMFC
3. MDRout, R2in
WMFC: Wait for arrival of MFC (Memory-Function-Completed) signal.
MFC: To accommodate variability in response time, the processor waits until it receives an indication that the Read/Write operation has been completed. The addressed device sets MFC to 1 to indicate this.
Storing a Word in Memory
Move R2, (R1) /* [R1] ← [R2] */
Sequence of control steps:
1. R1out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC
Executing a Complete Instruction
Add (R3), R1 /* R1 ← [R1] + [[R3]] */
Adds the contents of a memory location pointed to by R3 to register R1.
Sequence of control steps:
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R3out, MARin, Read
5. R1out, Yin, WMFC
6. MDRout, SelectY, Add, Zin
7. Zout, R1in, End
Steps 1 to 3: Instruction fetch
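The seven control steps can be followed in a small Python trace. This is a simplified sketch, not a cycle-accurate model: the function name `execute_add_indirect` is hypothetical, memory is a dictionary, and the memory word is assumed to arrive in MDR by the time the WMFC step completes.

```python
def execute_add_indirect(r, memory):
    """Trace Add (R3), R1 on the single-bus datapath; r maps register name -> value."""
    # 1. PCout, MARin, Read, Select4, Add, Zin
    r["MAR"] = r["PC"]
    r["Z"] = r["PC"] + 4
    # 2. Zout, PCin, Yin, WMFC (memory word assumed to reach MDR here)
    r["PC"] = r["Z"]
    r["Y"] = r["Z"]
    r["MDR"] = memory[r["MAR"]]
    # 3. MDRout, IRin
    r["IR"] = r["MDR"]
    # 4. R3out, MARin, Read
    r["MAR"] = r["R3"]
    # 5. R1out, Yin, WMFC (operand assumed to reach MDR here)
    r["Y"] = r["R1"]
    r["MDR"] = memory[r["MAR"]]
    # 6. MDRout, SelectY, Add, Zin
    r["Z"] = r["Y"] + r["MDR"]
    # 7. Zout, R1in, End
    r["R1"] = r["Z"]
    return r

regs = {"PC": 100, "R1": 10, "R3": 200}
memory = {100: "Add (R3), R1", 200: 32}  # made-up contents
execute_add_indirect(regs, memory)
print(regs["R1"], regs["PC"])  # prints 42 104
```

Note how step 1 overlaps the PC increment with the start of the memory read, and step 5 moves R1 into Y while the operand read is still in progress.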
Multiple-Bus Organization
Single-bus structure: Control sequences are long as only one data item can be transferred over the bus in a clock cycle.
Figure on next slide shows a three-bus structure.
All registers are combined into a single block called the register file, with three ports: 2 outputs allowing 2 registers to be accessed simultaneously and have their contents put on buses A and B, and 1 input allowing data on bus C to be loaded into a third register.
Buses A and B are used to transfer source operands to the A and B inputs of the ALU, and the result is transferred to the destination over bus C.
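Multiple-Bus Organization (2)
[Figure: three-bus organization. The register file drives buses A and B and is loaded from bus C. A MUX selects bus A or Constant 4 for ALU input A; bus B feeds ALU input B; the ALU result R goes to bus C. PC has an incrementer. MAR drives the address lines, MDR connects to the memory bus data lines, and IR feeds the instruction decoder.]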
Multiple-Bus Organization (3)
For the ALU, R=A (or R=B) means that its A (or B) input is passed unmodified to bus C.
Add R4, R5, R6 /* R6 ← [R4] + [R5] */
Adds the contents of R4 and R5 and places the sum in R6.
Sequence of control steps:
1. PCout, R=B, MARin, Read, IncPC
2. WMFC
3. MDRoutB, R=B, IRin
4. R4outA, R5outB, SelectA, Add, R6in, End
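Step 4 is the interesting one: with three buses, both operand reads, the addition and the write-back fit in a single step. A minimal sketch, assuming a dictionary-based register file and a hypothetical function name `add_three_bus`:

```python
def add_three_bus(regfile, src_a, src_b, dst):
    """Model step 4 of Add R4, R5, R6 on the three-bus datapath."""
    bus_a = regfile[src_a]  # R4outA: first operand on bus A
    bus_b = regfile[src_b]  # R5outB: second operand on bus B
    bus_c = bus_a + bus_b   # SelectA, Add: ALU result placed on bus C
    regfile[dst] = bus_c    # R6in: destination loads from bus C
    return bus_c

regfile = {"R4": 30, "R5": 12, "R6": 0}
add_three_bus(regfile, "R4", "R5", "R6")
print(regfile["R6"])  # prints 42
```

On the single-bus datapath the same addition took three steps, because the operands had to cross the one bus one at a time.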
Control
Hardwired control or microprogrammed control.
Hardwired control:
[Figure: hardwired control unit. The clock (CLK) drives a control step counter; a decoder/encoder combines the step count with the IR, external inputs and condition codes to generate the control signals.]
Control (2)
Microprogrammed control: Control signals generated by a program.
Control word (CW) is a microinstruction that contains individual bits that represent the various control signals.
Vertical organization: highly encoded schemes that use compact codes to specify only a small number of control functions in each microinstruction.
Horizontal organization: minimally encoded scheme in which many resources can be controlled with a single microinstruction.
Popular in Complex Instruction Set Computer (CISC) architectures because complex instruction sets require complex controllers that can more easily be implemented as microprograms.
Control (3)
Example of a horizontal organization scheme:
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R3out, MARin, Read
5. R1out, Yin, WMFC
6. MDRout, SelectY, Add, Zin
7. Zout, R1in, End
Microinstruction  PCin  PCout  MARin  Read  MDRout  IRin  Yin  Select  Add  Zin  Zout  R1out  R1in  R3out  WMFC  End
1                 0     1      1      1     0       0     0    1       1    1    0     0      0     0      0     0
2                 1     0      0      0     0       0     1    0       0    0    1     0      0     0      1     0
3                 0     0      0      0     1       1     0    0       0    0    0     0      0     0      0     0
4                 0     0      1      1     0       0     0    0       0    0    0     0      0     1      0     0
5                 0     0      0      0     0       0     1    0       0    0    0     1      0     0      1     0
6                 0     0      0      0     1       0     0    0       1    1    0     0      0     0      0     0
7                 0     0      0      0     0       0     0    0       0    0    1     0      1     0      0     1
Select=0: SelectY
Select=1: Select4
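A horizontal control word is just one bit per control signal, so it can be built mechanically from the list of signals active in a step. The sketch below assumes a 16-signal column order (PCin first, End last) and a hypothetical helper `control_word`; both are choices made for the illustration.

```python
# Assumed column order of the control word, one bit per signal.
SIGNALS = ["PCin", "PCout", "MARin", "Read", "MDRout", "IRin", "Yin", "Select",
           "Add", "Zin", "Zout", "R1out", "R1in", "R3out", "WMFC", "End"]

def control_word(active):
    """Return the control word as a bit string: '1' where the signal is asserted."""
    return "".join("1" if s in active else "0" for s in SIGNALS)

# Step 1: PCout, MARin, Read, Select4 (Select=1), Add, Zin
step1 = control_word({"PCout", "MARin", "Read", "Select", "Add", "Zin"})
# Step 7: Zout, R1in, End
step7 = control_word({"Zout", "R1in", "End"})
print(step1)  # prints 0111000111000000
print(step7)  # prints 0000000000101001
```

SelectY needs no bit of its own: it is the Select=0 case, so step 6 simply leaves the Select bit clear.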
MIPS: Multicycle Datapath and Control
Adapted from D. Patterson’s CS61C
http://www.cs.berkeley.edu/~pattrsn/61CF00
Copyright 2000 UCB
Stages of a Datapath
Problem: a single, atomic block which “executes an instruction” (performs all necessary operations beginning with fetching the instruction) would be too bulky and inefficient.
Solution: break up the process of “executing an instruction” into stages, and then connect the stages to create the whole datapath.
Smaller stages are easier to design.
Easy to optimize (change) one stage without touching the others.
Stages of a Datapath (2)
There is a wide variety of MIPS instructions: so what general steps do they have in common?
Stages
1. Instruction Fetch
2. Instruction Decode
3. ALU
4. Memory Access
5. Register Write
Stages of a Datapath (3)
Stage 1: Instruction Fetch.
No matter what the instruction is, the 32-bit instruction word must first be fetched from memory (the cache-memory hierarchy).
Also, this is where we increment PC (that is, PC = PC + 4, to point to the next instruction; byte addressing so + 4).
Stages of a Datapath (4)
Stage 2: Instruction Decode
Upon fetching the instruction, we next gather data from the fields (decode all necessary instruction data).
First, read the opcode to determine instruction type and field lengths.
Second, read in data from all necessary registers.
For add, read two registers.
For addi, read one register.
For jal, no read necessary.
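Gathering data from the fields amounts to shifting and masking the 32-bit instruction word. The sketch below uses the standard MIPS field positions (6-bit opcode, 5-bit rs, rt and rd, 6-bit funct, 16-bit immediate); the function name `decode_fields` is an assumption.

```python
def decode_fields(instr):
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31..26
        "rs": (instr >> 21) & 0x1F,      # bits 25..21
        "rt": (instr >> 16) & 0x1F,      # bits 20..16
        "rd": (instr >> 11) & 0x1F,      # bits 15..11 (R-type only)
        "funct": instr & 0x3F,           # bits 5..0 (R-type only)
        "imm": instr & 0xFFFF,           # bits 15..0 (I-type only)
    }

f = decode_fields(0x00221820)  # encoding of add $3, $1, $2
print(f["opcode"], f["rs"], f["rt"], f["rd"], f["funct"])  # prints 0 1 2 3 32
```

All fields are extracted unconditionally; the opcode then tells the control which of them are meaningful for the instruction at hand.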
Stages of a Datapath (5)
Stage 3: ALU (Arithmetic-Logic Unit)
The real work of most instructions is done here: arithmetic (+, -, *, /), shifting, logic (&, |), comparisons (slt).
What about loads and stores?
lw $t0, 40($t1)
The address we are accessing in memory = the value in $t1 plus the value 40.
We do this addition at this stage.
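The address calculation for a load or store can be sketched as follows. The 16-bit offset is sign-extended before the addition, as MIPS does for lw and sw; the function names and the base value 1000 are made up for the illustration.

```python
def sign_extend16(imm):
    """Interpret a 16-bit field as a signed two's-complement value."""
    return imm - 0x10000 if imm & 0x8000 else imm

def effective_address(base, imm16):
    """Address computed by the ALU for lw/sw: base register + offset, 32-bit wrap."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF

print(effective_address(1000, 40))      # lw $t0, 40($t1) with $t1 = 1000: prints 1040
print(effective_address(1000, 0xFFFC))  # offset field encodes -4: prints 996
```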
Stages of a Datapath (6)
Stage 4: Memory Access
Only the load and store instructions do anything during this stage; the other instructions remain idle.
Since loads and stores have this unique step, we need an extra stage to account for them.
As a result of the cache system, this stage is expected to be just as fast (on average) as the others.
Stages of a Datapath (7)
Stage 5: Register Write
Most instructions write the result of some computation into a register.
Examples: arithmetic, logical, shifts, loads, slt
What about stores, branches, jumps?
They do not write anything into a register at the end.
These remain idle during this fifth stage.
Datapath: Generic Steps
[Figure: generic datapath. PC, instruction memory (with +4 adder), registers (rs, rt, rd fields), ALU (with imm input), data memory. Stages: 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Reg. Write.]
Datapath Walkthroughs: add
add $r3,$r1,$r2 # r3 = r1+r2
Stage 1: Fetch this instruction, increment PC.
Stage 2: Decode to find that it is an add instruction, then read registers $r1 and $r2.
Stage 3: Add the two values retrieved in stage 2.
Stage 4: Idle (nothing to write to memory).
Stage 5: Write result of stage 3 into register $r3.
Datapath Walkthroughs: add (2)
[Figure: add r3, r1, r2 traced through the generic datapath. The register file reads reg[1] and reg[2]; the ALU computes reg[1]+reg[2]; the result is written to register 3.]
Datapath Walkthroughs: slti
slti $r3,$r1,17
Stage 1: Fetch this instruction, increment PC.
Stage 2: Decode to find it is an slti, then read register $r1.
Stage 3: Compare value retrieved in stage 2 with the integer 17.
Stage 4: Go idle.
Stage 5: Write the result of stage 3 in register $r3.
Datapath Walkthroughs: slti (2)
[Figure: slti r3, r1, 17 traced through the generic datapath. The register file reads reg[1]; the ALU compares it with the immediate 17 (computing reg[1]-17); the result is written to register 3.]
Datapath Walkthroughs: sw
sw $r3, 20($r1)
Stage 1: Fetch this instruction, increment PC.
Stage 2: Decode to find it is an sw, then read registers $r1 and $r3.
Stage 3: Add 20 to value in register $r1 (retrieved in stage 2).
Stage 4: Write value in register $r3 (retrieved in stage 2) into memory address computed in stage 3.
Stage 5: Go idle (nothing to write into a register).
Datapath Walkthroughs: sw (2)
[Figure: sw r3, 20(r1) traced through the generic datapath. The register file reads reg[1] and reg[3]; the ALU computes reg[1]+20; data memory performs MEM[r1+20] <- r3.]
Why Five Stages?
Could we have a different number of stages?
Yes, and other architectures do.
So why does MIPS have five stages, if instructions tend to go idle for at least one stage?
There is one instruction that uses all five stages: the load.
Datapath Walkthroughs: lw
lw $r3, 40($r1)
Stage 1: Fetch this instruction, increment PC.
Stage 2: Decode to find it is a lw, then read register $r1.
Stage 3: Add 40 to value in register $r1 (retrieved in stage 2).
Stage 4: Read value from memory address computed in stage 3.
Stage 5: Write value found in stage 4 into register $r3.
Datapath Walkthroughs: lw (2)
[Figure: lw r3, 40(r1) traced through the generic datapath. The register file reads reg[1]; the ALU computes reg[1]+40; data memory is read at that address; r3 <- MEM[r1+40] is written to register 3.]
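The four walkthroughs (add, slti, sw, lw) can be compared side by side with a toy interpreter that reports which stages each instruction leaves idle. Everything here is an assumption made for the sketch: the tuple format for instructions, the function name `run`, and the register and memory contents; stages 1 and 2 are abstracted away.

```python
def run(instr, regs, mem):
    """Execute one instruction; return the list of stages (3 to 5) that stay idle."""
    op = instr[0]                         # stages 1-2: fetch and decode (abstracted)
    idle = []
    if op == "add":                       # ("add", rd, rs, rt)
        _, rd, rs, rt = instr
        alu = regs[rs] + regs[rt]         # stage 3
        idle.append(4)                    # no memory access
        regs[rd] = alu                    # stage 5
    elif op == "slti":                    # ("slti", rt, rs, imm)
        _, rt, rs, imm = instr
        alu = 1 if regs[rs] < imm else 0  # stage 3: comparison
        idle.append(4)
        regs[rt] = alu                    # stage 5
    elif op == "sw":                      # ("sw", rt, imm, rs)
        _, rt, imm, rs = instr
        addr = regs[rs] + imm             # stage 3: address calculation
        mem[addr] = regs[rt]              # stage 4
        idle.append(5)                    # nothing written to a register
    elif op == "lw":                      # ("lw", rt, imm, rs)
        _, rt, imm, rs = instr
        addr = regs[rs] + imm             # stage 3
        value = mem[addr]                 # stage 4
        regs[rt] = value                  # stage 5
    else:
        raise ValueError(f"unsupported instruction: {op}")
    return idle

regs, mem = {1: 10, 2: 5, 3: 0}, {50: 99}
print(run(("add", 3, 1, 2), regs, mem))  # prints [4]
print(run(("lw", 3, 40, 1), regs, mem))  # prints []
```

Only the load returns an empty list, which matches the point made under "Why Five Stages?".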
What Hardware Is Needed?
PC: a register which keeps track of address of the next instruction.
General Purpose Registers
Used in stages 2 (read) and 5 (write).
We are currently working with 32 of these.
Memory
Used in stages 1 (fetch) and 4 (R/W).
Cache system makes these two stages as fast as the others, on average.
Datapath: Summary
Construct datapath based on register transfers required to perform instructions.
Control part causes the right transfers to happen.
[Figure: the generic datapath (PC, instruction memory, +4, registers with rs/rt/rd, ALU with imm, data memory) driven by a Controller that takes opcode and funct as inputs.]
Where is Logic Design Used?
Combinational circuits for ALU and other parts of the datapath.
Sequential circuits for control: the ALU, registers and other parts of the datapath need different control signals in different clock cycles and for different instructions.
[Figure: ALU driven by an ALU Control block.]
Where is Logic Design Used? (2)
High-level view of finite state machine control.
Sequential logic design can be used to assert the correct control signals at the correct times.
[Figure: state diagram. Start leads to "Instruction fetch/decode and register fetch", which branches to one of: Memory access instructions, R-type instructions, Branch instruction, Jump instruction.]
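The high-level state diagram can be expressed as a next-state function. This is only a sketch: the state names are invented here, and the transition from each instruction class back to fetch/decode is an assumption about what happens once an instruction completes.

```python
INSTRUCTION_CLASSES = {"memory_access", "r_type", "branch", "jump"}

def next_state(state, instr_class=None):
    """Next state of the high-level control FSM (state names are hypothetical)."""
    if state == "start":
        return "fetch_decode"
    if state == "fetch_decode":
        # The decoded instruction class selects which branch of the diagram to follow.
        if instr_class not in INSTRUCTION_CLASSES:
            raise ValueError(f"unknown instruction class: {instr_class}")
        return instr_class
    if state in INSTRUCTION_CLASSES:
        return "fetch_decode"  # assumed: go back and fetch the next instruction
    raise ValueError(f"unknown state: {state}")

state = next_state("start")
state = next_state(state, "r_type")
print(state)  # prints r_type
```

In hardware, the current state would be held in flip-flops and this function would be the combinational next-state logic, which is where sequential logic design comes in.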
Summary
Datapath is the hardware that performs operations necessary to execute programs.
Control instructs datapath on what to do next.
Datapath needs:
access to storage (general purpose registers and memory)
computational ability (ALU)
helper hardware (local registers and PC)
Summary (2)
Five stages of datapath (executing an instruction):
1: Instruction Fetch (Increment PC)
2: Instruction Decode (Read Registers)
3: ALU (Computation)
4: Memory Access
5: Write to Registers
• ALL instructions must go through ALL five stages.
• Datapath designed in hardware.