First implementation of the datapath - biuu.cs.biu.ac.il/~wiseman/co/co3.pdf · 2001. 10. 10.
Copyright 1995 by Coherence LTD., all rights reserved (Revised: Oct 97 by Rafi Lohev, Oct 99 by Yair Wiseman)
3 Single-Cycle Datapath
First implementation of the datapath:
– One cycle for each instruction.
– All combinational logic must stabilize within one clock cycle.
– All state elements are written exactly once, at the end of the clock cycle.
– A simplified version; it helps in understanding datapath operation.
● Architectural elements required for a single-cycle implementation:
– Memories to hold instructions and data.
– Register file.
– ALU for basic arithmetic and logical operations.
– Adders for computing the next instruction address (PC + 4) and jump/branch target addresses.
– Multiplexors to choose the correct resources for each instruction.
☛ Note: Because everything must complete in one clock cycle, the ALU cannot be used for more than one operation per instruction, so extra adders are required for computing addresses.
● Control requirements: the ALU and the multiplexors.
Instructions review
● R-type (register) instruction format:
– op: operation to be performed by the instruction.
– rs: 1st register source operand.
– rt: 2nd register source operand.
– rd: destination register; receives the result.
– shamt: shift amount.
– funct: selects the operation more specifically than op does.
● Example: add $8, $17, $18

field:  op      rs      rt      rd      shamt   funct
width:  6 bits  5 bits  5 bits  5 bits  5 bits  6 bits
value:  000000  10001   10010   01000   00000   100000
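To make the bit layout concrete, here is a minimal Python sketch that packs the six R-type fields into one 32-bit word with shifts and ORs. The function name `encode_r_type` is hypothetical (not from any MIPS toolchain); it only checks field widths and knows nothing about register names.

```python
def encode_r_type(op: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """Pack R-type fields (widths 6|5|5|5|5|6 bits) into one 32-bit word."""
    fields = (("op", op, 6), ("rs", rs, 5), ("rt", rt, 5),
              ("rd", rd, 5), ("shamt", shamt, 5), ("funct", funct, 6))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    # Each field is shifted to its bit position: op at 31-26, ..., funct at 5-0.
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# add $8, $17, $18: rd = 8, rs = 17, rt = 18, funct = 100000 (binary)
word = encode_r_type(0, 17, 18, 8, 0, 0b100000)
print(f"{word:032b}")   # 00000010001100100100000000100000
print(f"0x{word:08X}")  # 0x02324020
```

The printed binary string is exactly the "value" row of the table read left to right.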
Instructions review (Cont.)
● For load/store instructions we need a field that holds an address; 5 bits is too small.
● To preserve the principle of simplicity, we want all instructions to be the same length (32 bits), so we need more than one instruction format. The op field distinguishes the instruction formats.
– I-type (immediate) instruction format
● Example: lw $8, Astart($19)

field:  op      rs      rt      address
width:  6 bits  5 bits  5 bits  16 bits
value:  100011  10011   01000   Astart
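A matching sketch for the I-type format. Astart is symbolic on the slide, so the value 0x20 below is a made-up placeholder; the hypothetical `encode_i_type` stores a negative offset as a 16-bit two's-complement pattern.

```python
def encode_i_type(op: int, rs: int, rt: int, imm: int) -> int:
    """Pack I-type fields (widths 6|5|5|16 bits) into one 32-bit word.

    `imm` may be a signed offset (-32768..32767) or a raw 16-bit pattern;
    negative values are stored in 16-bit two's complement.
    """
    if not -(1 << 15) <= imm <= 0xFFFF:
        raise ValueError(f"immediate {imm} does not fit in 16 bits")
    if not (0 <= op < 64 and 0 <= rs < 32 and 0 <= rt < 32):
        raise ValueError("op/rs/rt out of range")
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


ASTART = 0x20  # hypothetical value: the slide leaves Astart symbolic
word = encode_i_type(35, 19, 8, ASTART)           # lw $8, Astart($19)
print(f"0x{word:08X}")                            # 0x8E680020
print(f"0x{encode_i_type(35, 19, 8, -4):08X}")    # 0x8E68FFFC
```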
Instruction Formats Handled by the Datapath

R-type instruction
field:          0      rs     rt     rd     shamt  funct
bit positions:  31-26  25-21  20-16  15-11  10-6   5-0

Load Word (LW) / Store Word (SW) instruction
field:          35 or 43  rs     rt     address
bit positions:  31-26     25-21  20-16  15-0

Branch Equal (BEQ) instruction
field:          4      rs     rt     address
bit positions:  31-26  25-21  20-16  15-0
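Because op, rs and rt occupy the same bit positions in all three formats, they can be extracted before the format is known. A hypothetical decoder sketch that pulls fields out by the bit positions in the tables and uses the op-code (0, 35, 43, 4) to pick the format:

```python
CLASSES = {0: "R-type", 35: "LW", 43: "SW", 4: "BEQ"}  # op-code -> format


def field(word: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode(word: int) -> dict:
    op = field(word, 31, 26)
    if op not in CLASSES:
        raise ValueError(f"op-code {op} is not handled by this datapath")
    # rs and rt sit in the same bit positions in all three formats
    fields = {"class": CLASSES[op], "op": op,
              "rs": field(word, 25, 21), "rt": field(word, 20, 16)}
    if op == 0:
        fields.update(rd=field(word, 15, 11), shamt=field(word, 10, 6),
                      funct=field(word, 5, 0))
    else:
        fields["address"] = field(word, 15, 0)
    return fields


print(decode(0x02324020))
# {'class': 'R-type', 'op': 0, 'rs': 17, 'rt': 18, 'rd': 8, 'shamt': 0, 'funct': 32}
```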
The Single-Cycle Datapath - Progressively
[Figure: PC → Instruction memory (Read address → Instruction), with an "Add 4" adder incrementing the PC; Registers (Read register 1, Read register 2, Write register, Write data → Read data 1, Read data 2) feeding the ALU (outputs Zero and result).]
FIGURE 3.1 Portion of the datapath needed for fetching instructions and incrementing the program counter. The fetched instruction is used by other parts of the datapath.
FIGURE 3.2 The datapath for R-type instructions. The ALU discussed in Chapter 2 can be controlled to provide all the basic ALU functions required for R-type instructions.
The Single-Cycle Datapath - Progressively
[Figure: Figure 3.3 adds a Data memory (Read address, Write address, Write data → Read data) and a sign-extend unit (16 → 32 bits) to the registers and ALU. Figure 3.4 adds a sign-extend unit (16 → 32 bits), a "shift left 2" unit and an adder that sums PC+4 (from the instruction datapath) with the shifted offset to form the branch target; the ALU Zero output goes to the branch control logic.]
FIGURE 3.3 The datapath for a load or store that does a register access. It is followed by a memory address calculation, then a read or write from memory and a write into the register file if the instruction is a load.
FIGURE 3.4 The datapath for a branch uses an ALU for evaluation of the branch condition and a separate adder for computing the branch target as the sum of the incremented PC and the sign-extended lower 16 bits of the instruction (the branch displacement), shifted left 2 bits. The unit labeled "shift left 2" performs the shift by appending 00 (binary) to the bottom of the sign-extended offset field. Since we know that the offset was sign-extended from 16 bits, the shift throws away only "sign bits". Control logic decides, based on the Zero output of the ALU, whether the incremented PC or the branch target should replace the PC.
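The branch-target path of Figure 3.4 (sign extend, shift left 2, add to PC+4) can be modelled with plain integer arithmetic. A sketch, assuming 32-bit wrap-around is emulated with a mask; `sign_extend16` and `branch_target` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Copy bit 15 into bits 31-16 (result is a 32-bit pattern)."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if value & 0x8000 else value


def branch_target(pc: int, offset16: int) -> int:
    """Branch target = (PC + 4) + (sign_extend(offset) << 2), modulo 2**32."""
    pc_plus_4 = (pc + 4) & MASK32                       # PC + 4 from the fetch adder
    shifted = (sign_extend16(offset16) << 2) & MASK32   # "shift left 2": append 00
    return (pc_plus_4 + shifted) & MASK32               # branch-target adder


print(hex(branch_target(0x00400000, 3)))       # 0x400010 (3 words past PC + 4)
print(hex(branch_target(0x00400000, 0xFFFF)))  # 0x400000 (offset -1: back to the beq itself)
```

Because the offset is sign-extended before the shift, the two bits that fall off the top are copies of the sign bit, so the shifted value is still exactly offset × 4.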
The Single-Cycle Datapath - Progressively
[Figure: Registers, ALU, sign-extend unit (16 → 32 bits) and Data memory combined; one multiplexor selects the second ALU operand (register or sign-extended immediate), another selects the register write data (ALU result or memory read data).]
FIGURE 3.5 Combining the datapaths for the memory instructions and the R-type instructions. This example shows how a single datapath can be assembled from the pieces. The multiplexors and their connections are highlighted.
The Single-Cycle Datapath - Progressively
[Figure: the datapath of Figure 3.5 with the instruction fetch unit (PC, Instruction memory, "Add 4" adder) attached.]
FIGURE 3.6 The instruction fetch portion of the datapath from Figure 3.1 is appended to the datapath of Figure 3.5 that handles memory and ALU instructions. The addition is highlighted. The result is a datapath that supports many operations of the MIPS instruction set; branches and jumps are the major missing pieces.
The Single-Cycle Datapath - Progressively
[Figure: the datapath of Figure 3.6 plus a "shift left 2" unit, a branch-target adder and a multiplexor choosing between PC + 4 and the branch target.]
FIGURE 3.7 The simple datapath for the MIPS architecture combines the elements required by the different instruction classes. This datapath can execute the basic instructions (load/store word, ALU operations and branches) in a single clock cycle. The additions to Figure 3.6, which are needed to implement branches, are highlighted.
ALU Functions
● We implement a subset of the MIPS ISA, 8 instructions (lw, sw, beq, add, sub, and, or, slt), which cover all the important instruction types.
– ALUop is a new 2-bit control signal we create to decide which ALU operation is required.
– Load, store and BEQ use ALUop to choose Add or Subtract; the other instructions use the funct code from the MIPS instruction.

ALU control input  Operation
000                AND
001                OR
010                Add
110                Subtract
111                Set on less than

From the MIPS instruction (OP, Func) and the new ALUop signal to the ALU control lines:
OP     ALUop  Func    ALU action        ALU control
Load   00     xxxxxx  Add               010
Store  00     xxxxxx  Add               010
BEQ    01     xxxxxx  Subtract          110
Add    10     100000  Add               010
Sub    10     100010  Subtract          110
AND    10     100100  AND               000
OR     10     100101  OR                001
Slt    10     101010  Set on less than  111
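The table can be read as a lookup keyed first on ALUop and then, for R-type instructions only, on the funct field. A Python sketch under that reading; the dictionaries and `alu_control` are hypothetical names, and combinations outside the 8-instruction subset are simply rejected.

```python
ALU_OPERATION = {0b000: "AND", 0b001: "OR", 0b010: "Add",
                 0b110: "Subtract", 0b111: "Set on less than"}

# funct field -> 3-bit ALU control, used only when ALUop = 10 (R-type)
FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_control(alu_op: int, funct: int) -> int:
    """Return the 3-bit ALU control input for a 2-bit ALUop and a 6-bit funct."""
    if alu_op == 0b00:    # lw / sw: compute the address, funct is "xxxxxx"
        return 0b010
    if alu_op == 0b01:    # beq: subtract to compare, funct is "xxxxxx"
        return 0b110
    if alu_op == 0b10 and funct in FUNCT_TO_ALU_CONTROL:
        return FUNCT_TO_ALU_CONTROL[funct]
    raise ValueError(f"ALUop={alu_op:02b}, funct={funct:06b} is outside the implemented subset")


print(ALU_OPERATION[alu_control(0b10, 0b101010)])  # Set on less than
print(f"{alu_control(0b01, 0b000000):03b}")        # 110
```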
Logic for ALU Control
● We must generate the logic signals A2, A1, A0 that control the ALU. ALUop itself will be derived later.

OP     ALUop  Func    A2 A1 A0
Load   00     xxxxxx  0  1  0
Store  00     xxxxxx  0  1  0
BEQ    01     xxxxxx  1  1  0
Add    10     100000  0  1  0
Sub    10     100010  1  1  0
AND    10     100100  0  0  0
OR     10     100101  0  0  1
Slt    10     101010  1  1  1

(ALUop bits are ALU1 ALU0; Func bits are F5 ... F0; an overbar denotes the complement.)

$$A_0 = \mathrm{ALU}_1\,\overline{\mathrm{ALU}_0}\,F_5\,\overline{F_4}\,\overline{F_3}\,F_2\,\overline{F_1}\,F_0 \;+\; \mathrm{ALU}_1\,\overline{\mathrm{ALU}_0}\,F_5\,\overline{F_4}\,F_3\,\overline{F_2}\,F_1\,\overline{F_0}$$

A1 is left as an exercise.

$$A_2 = \overline{\mathrm{ALU}_1}\,\mathrm{ALU}_0 \;+\; \mathrm{ALU}_1\,\overline{\mathrm{ALU}_0}\,F_5\,\overline{F_4}\,\overline{F_3}\,\overline{F_2}\,F_1\,\overline{F_0} \;+\; \mathrm{ALU}_1\,\overline{\mathrm{ALU}_0}\,F_5\,\overline{F_4}\,F_3\,\overline{F_2}\,F_1\,\overline{F_0}$$
Implementation of ALU Control
[Figure: gate-level implementation of A2 and A0 from the inputs F0-F5, ALU0 and ALU1.]
A1 is left as an exercise.
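The sum-of-products equations for A0 and A2 can be checked against the truth table by brute force, trying every funct value wherever the table says xxxxxx. A sketch with hypothetical function names; A1 is deliberately omitted, since it is left as an exercise.

```python
def NOT(x: int) -> int:
    """Complement of a single bit."""
    return x ^ 1


def a0(alu1, alu0, f5, f4, f3, f2, f1, f0):
    # OR (100101) and slt (101010), both with ALUop = 10
    return ((alu1 & NOT(alu0) & f5 & NOT(f4) & NOT(f3) & f2 & NOT(f1) & f0) |
            (alu1 & NOT(alu0) & f5 & NOT(f4) & f3 & NOT(f2) & f1 & NOT(f0)))


def a2(alu1, alu0, f5, f4, f3, f2, f1, f0):
    # beq (ALUop = 01), sub (100010) and slt (101010)
    return ((NOT(alu1) & alu0) |
            (alu1 & NOT(alu0) & f5 & NOT(f4) & NOT(f3) & NOT(f2) & f1 & NOT(f0)) |
            (alu1 & NOT(alu0) & f5 & NOT(f4) & f3 & NOT(f2) & f1 & NOT(f0)))


# Rows of the table: (ALUop, funct or None for "xxxxxx", A2, A0)
TABLE = [
    (0b00, None, 0, 0),       # Load
    (0b00, None, 0, 0),       # Store
    (0b01, None, 1, 0),       # BEQ
    (0b10, 0b100000, 0, 0),   # Add
    (0b10, 0b100010, 1, 0),   # Sub
    (0b10, 0b100100, 0, 0),   # AND
    (0b10, 0b100101, 0, 1),   # OR
    (0b10, 0b101010, 1, 1),   # Slt
]


def check() -> bool:
    for alu_op, funct, want_a2, want_a0 in TABLE:
        functs = range(64) if funct is None else [funct]   # xxxxxx: any value
        for f in functs:
            inputs = [(alu_op >> 1) & 1, alu_op & 1] + [(f >> i) & 1 for i in range(5, -1, -1)]
            if a2(*inputs) != want_a2 or a0(*inputs) != want_a0:
                return False
    return True


print(check())  # True
```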
The Single-Cycle Datapath without Control
[Figure: the datapath of Figure 3.7 with the instruction fields routed explicitly: Instruction [25-21] → Read register 1; Instruction [20-16] → Read register 2 and input 0 of the write-register multiplexor; Instruction [15-11] → input 1 of that multiplexor; Instruction [15-0] → sign extend; Instruction [5-0] → ALU control. Control lines: RegDst, RegWrite, ALUSrc, ALUOp, MemWrite, MemtoReg, PCSrc.]
FIGURE 3.8 The datapath of Figure 3.7 with all necessary multiplexors and all control lines identified. The ALU control block has also been added.
Control Line Functions
(For each signal: effect when "off" (0) and effect when "on" (1).)
● MemWrite
– Off: none.
– On: the contents of memory at the write address are replaced by the value on the write data input.
● ALUSrc
– Off: the second ALU operand comes from the second register file output (Read data 2).
– On: the second ALU operand is the sign-extended lower 16 bits of the instruction.
● RegDst
– Off: the destination register number for the Write register comes from the rt field (bits 20-16).
– On: the destination register number for the Write register comes from the rd field (bits 15-11).
● RegWrite
– Off: none.
– On: the register whose number is on the Write register input is written with the value on the Write data input.
● PCSrc
– Off: the PC is replaced by the output of the adder that computes PC + 4.
– On: the PC is replaced by the output of the special adder that computes the branch target address.
● MemtoReg
– Off: the value fed to the register Write data input comes from the ALU.
– On: the value fed to the register Write data input comes from the data memory.
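ALUSrc, RegDst, MemtoReg and PCSrc each drive a 2-to-1 multiplexor, while MemWrite and RegWrite act as write enables. A small sketch of the multiplexor behaviour only; `select` and all argument names are hypothetical, and the data values in the usage line are arbitrary.

```python
def select(ctrl: dict, *, rt: int, rd: int, read_data2: int, imm_ext: int,
           alu_result: int, mem_read_data: int, pc_plus_4: int, branch_addr: int) -> dict:
    """Outputs of the four 2-to-1 multiplexors: input 1 when the bit is on, input 0 when off."""
    return {
        "write_register": rd if ctrl["RegDst"] else rt,                   # RegDst
        "alu_operand_2": imm_ext if ctrl["ALUSrc"] else read_data2,       # ALUSrc
        "write_data": mem_read_data if ctrl["MemtoReg"] else alu_result,  # MemtoReg
        "next_pc": branch_addr if ctrl["PCSrc"] else pc_plus_4,           # PCSrc
    }


# Settings of a load (RegDst = 0, ALUSrc = 1, MemtoReg = 1, PCSrc = 0); data values are arbitrary
sel = select({"RegDst": 0, "ALUSrc": 1, "MemtoReg": 1, "PCSrc": 0},
             rt=8, rd=0, read_data2=5, imm_ext=0x20, alu_result=0x1020,
             mem_read_data=99, pc_plus_4=0x00400004, branch_addr=0x00400010)
print(sel)
```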
The Single-Cycle Datapath
[Figure: the datapath of Figure 3.8 with the Control unit added. Control takes Instruction [31-26] and produces RegDst, Branch, MemtoReg, ALUop, MemWrite, ALUSrc and RegWrite; ALU control takes ALUop and Instruction [5-0]; PCSrc, which selects between PC + 4 and the branch target, is derived from Branch and the ALU Zero output.]
MIPS op-codes for the 3 instruction classes in our datapath
Instruction  op-code (decimal)  op-code (binary): op5 op4 op3 op2 op1 op0
R-type       0                  0 0 0 0 0 0
LW           35                 1 0 0 0 1 1
SW           43                 1 0 1 0 1 1
BEQ          4                  0 0 0 1 0 0
Inputs     R-type  LW  SW  BEQ
Op5        0       1   1   0
Op4        0       0   0   0
Op3        0       0   1   0
Op2        0       0   0   1
Op1        0       1   1   0
Op0        0       1   1   0
Outputs
RegDst     1       0   x   x
ALUSrc     0       1   1   0
MemtoReg   0       1   x   x
RegWrite   1       1   0   0
MemWrite   0       0   1   0
Branch     0       0   0   1
ALUOp1     1       0   0   0
ALUOp0     0       0   0   1
The control function has 6 inputs (the op-code bits). Each column is one truth-table row: one of the four op-codes our datapath recognizes, with the value of each of the 8 control outputs for that op-code (x = don't care).
Implementation of Control Logic
[Figure: inputs Op5-Op0 feed recognizers for the four instruction classes (R-type, LW, SW, BEQ), whose outputs drive RegDst, ALUSrc, MemtoReg, RegWrite, MemWrite, Branch, ALUop1 and ALUop0.]
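One way to realize the truth table, in the spirit of the figure, is two levels of logic: recognize each op-code, then OR the recognizers into each output. A sketch, assuming don't-care (x) outputs are simply driven to 0; `control_unit` is a hypothetical name.

```python
def control_unit(op: int) -> dict:
    """Two-level (AND/OR) realization of the main-control truth table.

    Don't-care outputs (x) are produced as 0 in this sketch.
    """
    # First level: one 6-input AND term per recognized op-code
    r_format = int(op == 0b000000)
    lw = int(op == 0b100011)
    sw = int(op == 0b101011)
    beq = int(op == 0b000100)
    # Second level: each output ORs the instruction classes for which it is 1
    return {
        "RegDst": r_format,
        "ALUSrc": lw | sw,
        "MemtoReg": lw,
        "RegWrite": r_format | lw,
        "MemWrite": sw,
        "Branch": beq,
        "ALUOp1": r_format,
        "ALUOp0": beq,
    }


# Check against the truth table (None marks a don't-care entry)
EXPECTED = {
    0:  dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1, MemWrite=0, Branch=0, ALUOp1=1, ALUOp0=0),
    35: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1, MemWrite=0, Branch=0, ALUOp1=0, ALUOp0=0),
    43: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0, MemWrite=1, Branch=0, ALUOp1=0, ALUOp0=0),
    4:  dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0, MemWrite=0, Branch=1, ALUOp1=0, ALUOp0=1),
}
for op, row in EXPECTED.items():
    got = control_unit(op)
    assert all(v is None or got[k] == v for k, v in row.items()), op
print("control unit matches the table")
```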
Logic Flow -- R-type Instruction: add $x, $y, $z
(fields: $x = rd, $y = rs, $z = rt)
1) The instruction is fetched from memory; the PC is incremented.
2) Source operand registers $y and $z are read from the register file; the rs and rt fields (bits 25-21 and 20-16) select which registers are read.
3) The ALU operates on the source operands; its control lines are set by the control unit (ALUop) together with the function code.
4) The ALU result is written to the register file, using bits 15-11 of the instruction to select the correct destination register $x.
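The four steps can be traced in a tiny single-step simulator. A sketch, assuming a Python list as the register file and a dict from address to word as instruction memory (both hypothetical); the overflow traps of add/sub are not modeled, and $0 is kept at zero as in MIPS.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


# funct -> ALU operation for the R-type subset
ALU_BY_FUNCT = {
    0b100000: lambda a, b: (a + b) & MASK32,                   # add
    0b100010: lambda a, b: (a - b) & MASK32,                   # sub
    0b100100: lambda a, b: a & b,                              # and
    0b100101: lambda a, b: a | b,                              # or
    0b101010: lambda a, b: int(to_signed(a) < to_signed(b)),   # slt
}


def step_r_type(pc: int, regs: list, imem: dict) -> int:
    """Execute one R-type instruction in a single step; returns the next PC."""
    word = imem[pc]                              # 1) fetch ...
    next_pc = (pc + 4) & MASK32                  #    ... and increment the PC
    rs, rt, rd = (word >> 21) & 31, (word >> 16) & 31, (word >> 11) & 31
    a, b = regs[rs], regs[rt]                    # 2) read $y (rs) and $z (rt)
    result = ALU_BY_FUNCT[word & 0x3F](a, b)     # 3) ALU operation selected by funct
    if rd != 0:                                  # 4) write $x (rd); $0 stays 0
        regs[rd] = result
    return next_pc


regs = [0] * 32
regs[17], regs[18] = 5, 7
pc = step_r_type(0x00400000, regs, {0x00400000: 0x02324020})  # add $8, $17, $18
print(hex(pc), regs[8])  # 0x400004 12
```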
R-type Instructions Flow
[Figure: the single-cycle datapath with the path used by an R-type instruction highlighted.]
Control settings: RegDst = 1, Branch = 0, MemtoReg = 0, ALUop = 10 (R-type), MemWrite = 0, ALUSrc = 0, RegWrite = 1.
Logic Flow -- Ld/Str Instruction: lw $x, offset($y)
(fields: $x = rt, $y = rs)
1) The instruction is fetched from memory; the PC is incremented.
2) The source register value $y is read from the register file (register number in bits 25-21).
3) The ALU computes the sum of $y (the base register) and the offset in instruction bits 15-0, sign-extended to 32 bits.
4) The result of the address computation goes to the data memory.
5) The data read from data memory is written to the register file; bits 20-16 of the instruction specify the destination register number $x.
For sw $x, offset($y), steps 1-4 are the same, but $x is also read (bits 20-16) and written to data memory at the computed address; the register file is not written.
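A similar single-step sketch for lw and sw, assuming data memory is a dict from byte address to 32-bit word that reads as 0 where nothing was stored; alignment checks are omitted and all names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Copy bit 15 into bits 31-16 (result is a 32-bit pattern)."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if value & 0x8000 else value


def step_mem(pc: int, regs: list, imem: dict, dmem: dict) -> int:
    """Execute one lw/sw in a single step; returns the next PC."""
    word = imem[pc]                                          # 1) fetch
    next_pc = (pc + 4) & MASK32                              #    increment the PC
    op, rs, rt = word >> 26, (word >> 21) & 31, (word >> 16) & 31
    base = regs[rs]                                          # 2) read $y (rs)
    addr = (base + sign_extend16(word & 0xFFFF)) & MASK32    # 3) ALU add (ALUSrc = 1)
    if op == 35:                                             # lw: 4) read data memory
        if rt != 0:
            regs[rt] = dmem.get(addr, 0)                     # 5) MemtoReg = 1, write rt
    elif op == 43:                                           # sw: MemWrite = 1, RegWrite = 0
        dmem[addr] = regs[rt]
    else:
        raise ValueError(f"op-code {op} is not lw/sw")
    return next_pc


regs = [0] * 32
regs[19] = 0x1000
dmem = {0x1020: 99}
imem = {0x0: 0x8E680020,   # lw $8, 0x20($19)
        0x4: 0xAE68FFFC}   # sw $8, -4($19)
pc = step_mem(0x0, regs, imem, dmem)
pc = step_mem(pc, regs, imem, dmem)
print(regs[8], hex(pc), dmem[0xFFC])  # 99 0x8 99
```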
Ld/Str Instructions Flow
[Figure: the single-cycle datapath with the path used by a load or store highlighted.]
Control settings: RegDst = 0, Branch = 0, MemtoReg = 1, ALUop = 00 (Add), ALUSrc = 1, MemWrite = 0 for load / 1 for store, RegWrite = 1 for load / 0 for store.
Logic Flow -- Branch Instruction: beq $x, $y, offset
(fields: $x = rs, $y = rt)
1) The instruction is fetched from memory; the PC is incremented.
2) Source operand registers $x and $y are read from the register file (register numbers in bits 25-21 and 20-16).
3) The ALU subtracts the source operands to determine whether the result is zero, i.e. whether they are equal.
4) PC+4 is added to the offset from instruction bits 15-0, sign-extended to 32 bits and shifted left 2 places. The result is the branch target address, used if the branch is taken.
5) The Zero output of the ALU, combined with the Branch control signal, determines which next address is written into the PC.
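The branch steps, including the PCSrc decision, sketched in the same single-step style; the simulator structure and names are hypothetical, and PCSrc is modeled as Branch AND Zero.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Copy bit 15 into bits 31-16 (result is a 32-bit pattern)."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if value & 0x8000 else value


def step_beq(pc: int, regs: list, imem: dict) -> int:
    """Execute one beq in a single step; returns the next PC."""
    word = imem[pc]                                          # 1) fetch
    pc_plus_4 = (pc + 4) & MASK32                            #    PC + 4
    rs, rt = (word >> 21) & 31, (word >> 16) & 31
    zero = int(((regs[rs] - regs[rt]) & MASK32) == 0)        # 2-3) read, subtract, Zero output
    offset = (sign_extend16(word & 0xFFFF) << 2) & MASK32    # 4) sign-extend, shift left 2
    target = (pc_plus_4 + offset) & MASK32                   #    branch-target adder
    branch = 1                                               # Branch = 1 for beq (control table)
    pc_src = branch & zero                                   # 5) PCSrc = Branch AND Zero
    return target if pc_src else pc_plus_4


imem = {0x00400000: 0x10220003}   # beq $1, $2, 3
regs = [0] * 32
regs[1] = regs[2] = 5
print(hex(step_beq(0x00400000, regs, imem)))  # 0x400010 (taken)
regs[2] = 6
print(hex(step_beq(0x00400000, regs, imem)))  # 0x400004 (not taken)
```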
Branch Instruction Flow
[Figure: the single-cycle datapath with the path used by a branch highlighted.]
Control settings: RegDst = x, Branch = 1, MemtoReg = x, ALUop = 01 (Sub), MemWrite = 0, ALUSrc = 0, RegWrite = 0 (x = don't care).
The Problems with a Single-Cycle Datapath
1. Need for multiple ALUs, adders, etc.
2. Memory must be split (into separate data and instruction memories).
3. CPU Time = IC × CPI × Clock Cycle Time
● CPI = 1.
● IC is the same for all implementations of the MIPS ISA.
● Everything therefore depends on the clock cycle time, which must be long enough for the slowest instruction.

Instr. type  stage 1  stage 2  stage 3  stage 4  stage 5  total (ns)
R-type       I-fetch  regs     ALU      regs              38
Load         I-fetch  regs     ALU      mem      regs     48
Store        I-fetch  regs     ALU      mem               39
Branch       I-fetch  regs     ALU                        29
Jump         I-fetch  regs     ALU                        29
The table assumes ALU and adders: 10 ns; memory: 10 ns; register file: 9 ns.
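The totals in the table follow from summing the component delays along each instruction's stages, and the single-cycle clock period is their maximum. A sketch that reproduces them, treating I-fetch as a 10 ns memory access; the instruction count in the last line is a made-up example.

```python
# Component delays from the note under the table (ns); I-fetch is an instruction-memory access
DELAY_NS = {"I-fetch": 10, "regs": 9, "ALU": 10, "mem": 10}

STAGES = {
    "R-type": ["I-fetch", "regs", "ALU", "regs"],
    "Load":   ["I-fetch", "regs", "ALU", "mem", "regs"],
    "Store":  ["I-fetch", "regs", "ALU", "mem"],
    "Branch": ["I-fetch", "regs", "ALU"],
    "Jump":   ["I-fetch", "regs", "ALU"],
}

totals = {kind: sum(DELAY_NS[s] for s in stages) for kind, stages in STAGES.items()}
clock_ns = max(totals.values())   # single cycle: one clock must fit the slowest instruction
print(totals)    # {'R-type': 38, 'Load': 48, 'Store': 39, 'Branch': 29, 'Jump': 29}
print(clock_ns)  # 48


def cpu_time_ns(ic: int, cpi: int, cycle_ns: int) -> int:
    """CPU Time = IC x CPI x Clock Cycle Time."""
    return ic * cpi * cycle_ns


print(cpu_time_ns(1_000_000, 1, clock_ns))  # 48000000 (hypothetical 10**6 instructions)
```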
Example
Assume the following dynamic instruction mix for a typical program (the gcc compiler):
Instr. type  %
Loads        22
Stores       11
R-type       49
Branch       16
Jump         2
Using the table of instruction delays from the previous slide, calculate the actual average time per instruction for gcc:
(0.49 × 38 ns) + (0.22 × 48 ns) + (0.11 × 39 ns) + (0.16 × 29 ns) + (0.02 × 29 ns) = 38.69 ns
A single-cycle implementation must use the longest period, 48 ns, for every instruction; the slowdown is approx. 48 / 38.69 ≈ 1.24, i.e. about 24%. Floating-point or other longer instructions would make the slowdown much worse.
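The weighted average and the slowdown ratio, recomputed in Python as a check on the arithmetic above (mix and delays copied from the two tables):

```python
MIX = {"Loads": 0.22, "Stores": 0.11, "R-type": 0.49, "Branch": 0.16, "Jump": 0.02}
DELAY_NS = {"Loads": 48, "Stores": 39, "R-type": 38, "Branch": 29, "Jump": 29}

average_ns = sum(MIX[k] * DELAY_NS[k] for k in MIX)   # weighted average time per instruction
clock_ns = max(DELAY_NS.values())                       # single-cycle clock period
print(f"average = {average_ns:.2f} ns")                 # average = 38.69 ns
print(f"slowdown = {clock_ns / average_ns:.2f}x")       # slowdown = 1.24x
```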