single cycle processing
Mid Term Exam
KICT, IIUM
MIPS Programming
  Syllabus: Lecture 1 ~ 5
  Date and Day: 13/03/2017, Monday
  Time: 2.00 PM ~ 3.00 PM
  Venue: Class Room
  Questions
    Q1. (a) to (e): 5 × 4 = 20
    Q2. (a) to (e): 5 × 4 = 20
CSC 3402
M.M. Hafizur Rahman
Single Cycle Processor Design
Lecture 6
Office: C-5-15, KICT, IIUM
Email: [email protected]
Performance
  Recall, performance is determined by:
    Instruction count
    Clock cycles per instruction (CPI)
    Clock cycle time
  Processor design will affect:
    Clock cycles per instruction
    Clock cycle time
  Single cycle datapath and control design:
    Advantage: one clock cycle per instruction
    Disadvantage: long cycle time
  [Figure: I-Count, CPI, Cycle]
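The three factors on the Performance slide combine as execution time = instruction count × CPI × clock cycle time. The sketch below only illustrates that product; the function name `cpu_time` and all numbers are made up for illustration and do not come from the lecture.

```python
def cpu_time(instruction_count, cpi, cycle_time_ns):
    """Execution time in nanoseconds = I-Count x CPI x cycle time."""
    return instruction_count * cpi * cycle_time_ns


# Hypothetical numbers, chosen only for illustration.
# Single cycle: CPI = 1, but the cycle must fit the slowest instruction.
single_cycle = cpu_time(1_000_000, 1, 8)
# A design with a shorter cycle but more cycles per instruction.
other_design = cpu_time(1_000_000, 3, 2)

print("single cycle:", single_cycle, "ns")
print("other design:", other_design, "ns")
```

With these assumed values the shorter cycle wins even though CPI is higher, which is the trade-off the slide points at.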
Step by Step Design of a Processor
  Analyze instruction set => datapath requirements
    The meaning of each instruction is given by the register transfers
    Datapath must include storage elements for ISA registers
    Datapath must support each register transfer
  Select datapath components and clocking methodology
  Assemble datapath meeting the requirements
  Analyze implementation of each instruction
    Determine the setting of control signals for register transfer
  Assemble the control logic
MIPS Instruction
  All instructions are 32 bits wide
  Three instruction formats: R-type, I-type, and J-type
    R-type: Op6 | Rs5 | Rt5 | Rd5 | sa5 | funct6
    I-type: Op6 | Rs5 | Rt5 | immediate16
    J-type: Op6 | immediate26
  Op6: 6-bit opcode of the instruction
  Rs5, Rt5, Rd5: 5-bit source and destination register numbers
  sa5: 5-bit shift amount used by shift instructions
  funct6: 6-bit function field for R-type instructions
  immediate16: 16-bit immediate value or address offset
  immediate26: 26-bit target address of the jump instruction
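The field layout of the three formats can be illustrated by slicing a 32-bit word with shifts and masks. This is a minimal sketch; the function name `decode` and the dictionary keys are illustrative choices, not part of the lecture.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into all possible fields.

    Which fields are meaningful depends on the format (R, I, or J).
    """
    return {
        "op":    (word >> 26) & 0x3F,   # bits 31..26
        "rs":    (word >> 21) & 0x1F,   # bits 25..21
        "rt":    (word >> 16) & 0x1F,   # bits 20..16
        "rd":    (word >> 11) & 0x1F,   # bits 15..11 (R-type)
        "sa":    (word >> 6) & 0x1F,    # bits 10..6  (R-type)
        "funct": word & 0x3F,           # bits 5..0   (R-type)
        "imm16": word & 0xFFFF,         # bits 15..0  (I-type)
        "imm26": word & 0x3FFFFFF,      # bits 25..0  (J-type)
    }


# add $3, $1, $2 encodes as op=0, rs=1, rt=2, rd=3, sa=0, funct=0x20
fields = decode(0x00221820)
print(fields)
```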
MIPS Instruction
  Only a subset of the MIPS instructions is considered
    ALU instructions (R-type): add, sub, and, or, xor, slt
    Immediate instructions (I-type): addi, slti, andi, ori, xori
    Load and Store (I-type): lw, sw
    Branch (I-type): beq, bne
    Jump (J-type): j
  This subset does not include all instructions
  Sufficient to illustrate design of datapath and control
  Concepts used to implement the MIPS subset are used to construct a broad spectrum of computers
MIPS Instruction
  Instruction           Meaning            Format
  add   Rd, Rs, Rt      addition           op6 = 0 | Rs5 | Rt5 | Rd5 | 0 | 0x20
  sub   Rd, Rs, Rt      subtraction        op6 = 0 | Rs5 | Rt5 | Rd5 | 0 | 0x22
  and   Rd, Rs, Rt      bitwise and        op6 = 0 | Rs5 | Rt5 | Rd5 | 0 | 0x24
  or    Rd, Rs, Rt      bitwise or         op6 = 0 | Rs5 | Rt5 | Rd5 | 0 | 0x25
  xor   Rd, Rs, Rt      exclusive or       op6 = 0 | Rs5 | Rt5 | Rd5 | 0 | 0x26
  slt   Rd, Rs, Rt      set on less than   op6 = 0 | Rs5 | Rt5 | Rd5 | 0 | 0x2a
  addi  Rt, Rs, Im16    add immediate      0x08 | Rs5 | Rt5 | Im16
  slti  Rt, Rs, Im16    slt immediate      0x0a | Rs5 | Rt5 | Im16
  andi  Rt, Rs, Im16    and immediate      0x0c | Rs5 | Rt5 | Im16
  ori   Rt, Rs, Im16    or immediate       0x0d | Rs5 | Rt5 | Im16
  xori  Rt, Rs, Im16    xor immediate      0x0e | Rs5 | Rt5 | Im16
  lw    Rt, Im16(Rs)    load word          0x23 | Rs5 | Rt5 | Im16
  sw    Rt, Im16(Rs)    store word         0x2b | Rs5 | Rt5 | Im16
  beq   Rs, Rt, Im16    branch if equal    0x04 | Rs5 | Rt5 | Im16
  bne   Rs, Rt, Im16    branch not equal   0x05 | Rs5 | Rt5 | Im16
  j     Im26            jump               0x02 | Im26
Processor Implementation
  Single Cycle
    perform each instruction in 1 clock cycle
    clock cycle must be long enough for slowest instruction
    disadvantage: only as fast as slowest instruction
  Multi-Cycle
    break fetch/execute cycle into multiple steps
    perform 1 step in each clock cycle
    advantage: each instruction uses only as many cycles as it needs
  Pipelined
    execute each instruction in multiple steps
    perform 1 step / instruction in each clock cycle
    process multiple instructions in parallel
Register Transfer Level
  RTL is a description of data flow between registers
  RTL gives a meaning to the instructions
  All instructions are fetched from memory at address PC
  Instruction   RTL Description
  ADD           Reg(Rd) ← Reg(Rs) + Reg(Rt);                 PC ← PC + 4
  SUB           Reg(Rd) ← Reg(Rs) − Reg(Rt);                 PC ← PC + 4
  ORI           Reg(Rt) ← Reg(Rs) | zero_ext(Im16);          PC ← PC + 4
  LW            Reg(Rt) ← MEM[Reg(Rs) + sign_ext(Im16)];     PC ← PC + 4
  SW            MEM[Reg(Rs) + sign_ext(Im16)] ← Reg(Rt);     PC ← PC + 4
  BEQ           if (Reg(Rs) == Reg(Rt))
                  PC ← PC + 4 + 4 × sign_extend(Im16)
                else
                  PC ← PC + 4
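The register transfers in the RTL table can be read as small state updates. The sketch below applies them to a Python list of registers and a dictionary used as word-addressed-by-byte-address memory. The names `step`, `sign_ext`, `zero_ext`, and the memory model are assumptions made for illustration; the hard-wired zero register is not modelled.

```python
MASK = 0xFFFFFFFF  # keep every value within 32 bits


def sign_ext(imm16):
    """Interpret a 16-bit field as a signed integer."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def zero_ext(imm16):
    """Interpret a 16-bit field as an unsigned integer."""
    return imm16 & 0xFFFF


def step(name, rs, rt, rd, imm16, reg, mem, pc):
    """Apply one instruction's register transfers and return the next PC."""
    if name == "ADD":
        reg[rd] = (reg[rs] + reg[rt]) & MASK
    elif name == "SUB":
        reg[rd] = (reg[rs] - reg[rt]) & MASK
    elif name == "ORI":
        reg[rt] = reg[rs] | zero_ext(imm16)
    elif name == "LW":
        reg[rt] = mem[(reg[rs] + sign_ext(imm16)) & MASK]
    elif name == "SW":
        mem[(reg[rs] + sign_ext(imm16)) & MASK] = reg[rt]
    elif name == "BEQ":
        if reg[rs] == reg[rt]:
            return (pc + 4 + 4 * sign_ext(imm16)) & MASK
    else:
        raise ValueError("unsupported instruction: " + name)
    return (pc + 4) & MASK


reg = [0] * 32
mem = {}
reg[1], reg[2] = 5, 7
pc = step("ADD", 1, 2, 3, 0, reg, mem, 0)   # Reg(3) <- 5 + 7
pc = step("SW", 0, 3, 0, 8, reg, mem, pc)   # MEM[0 + 8] <- Reg(3)
pc = step("LW", 0, 4, 0, 8, reg, mem, pc)   # Reg(4) <- MEM[0 + 8]
print(reg[3], reg[4], pc)
```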
Instruction Execution
  R-type
    Fetch instruction:   Instruction ← MEM[PC]
    Fetch operands:      data1 ← Reg(Rs), data2 ← Reg(Rt)
    Execute operation:   ALU_result ← func(data1, data2)
    Write ALU result:    Reg(Rd) ← ALU_result
    Next PC address:     PC ← PC + 4
  I-type
    Fetch instruction:   Instruction ← MEM[PC]
    Fetch operands:      data1 ← Reg(Rs), data2 ← Extend(imm16)
    Execute operation:   ALU_result ← op(data1, data2)
    Write ALU result:    Reg(Rt) ← ALU_result
    Next PC address:     PC ← PC + 4
  BEQ
    Fetch instruction:   Instruction ← MEM[PC]
    Fetch operands:      data1 ← Reg(Rs), data2 ← Reg(Rt)
    Equality:            zero ← subtract(data1, data2)
    Branch:              if (zero) PC ← PC + 4 × (1 + sign_ext(imm16))
                         else      PC ← PC + 4
Instruction Execution
  LW
    Fetch instruction:    Instruction ← MEM[PC]
    Fetch base register:  base ← Reg(Rs)
    Calculate address:    address ← base + sign_extend(imm16)
    Read memory:          data ← MEM[address]
    Write register Rt:    Reg(Rt) ← data
    Next PC address:      PC ← PC + 4
  SW
    Fetch instruction:    Instruction ← MEM[PC]
    Fetch registers:      base ← Reg(Rs), data ← Reg(Rt)
    Calculate address:    address ← base + sign_extend(imm16)
    Write memory:         MEM[address] ← data
    Next PC address:      PC ← PC + 4
  Jump
    Fetch instruction:    Instruction ← MEM[PC]
    Target PC address:    target ← PC[31:28] || Imm26 || 00   (|| is concatenation)
    Jump:                 PC ← target
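The jump target concatenation PC[31:28] || Imm26 || 00 can be expressed with a mask and a shift. A minimal sketch; the function name `jump_target` is illustrative.

```python
def jump_target(pc, imm26):
    """target = PC[31:28] || Imm26 || 00."""
    upper4 = pc & 0xF0000000               # keep PC[31:28] in place
    middle = (imm26 & 0x3FFFFFF) << 2      # Imm26 followed by two 0 bits
    return upper4 | middle


print(hex(jump_target(0x10000008, 0x100)))
```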
What do we need?
  Two types of functional hardware elements are needed:
    elements that operate on data (called combinational elements)
    elements that contain data (called sequential or state elements)
Fetch and Execute Cycle
  Abstraction of fetch/execute implementation
    use the PC to read instruction address
    fetch the instruction from memory and increment PC
    use fields of the instruction to select registers to read
    execute depending on the instruction
    repeat
Datapath and Control
  [Figure: block diagram of a Controller and a Datapath, with signals labelled Control and Status between them]
Basic Hardware
Truth Table and Simplification
  Problem: Consider logic functions with three inputs: A, B, C.
    Output D is true if at least one input is true
    Output E is true if exactly two inputs are true
    Output F is true only if all three inputs are true
  Show the truth table for these three functions
  Show the Boolean equations for these three functions
  Show an implementation consisting of AND, OR, and NOT gates
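One way to check an answer to the truth table problem is to enumerate all eight input combinations. The sketch below computes D, E, and F from their verbal definitions and compares them with one possible sum-of-products form; the function names `outputs` and `equations` are illustrative, and the equations shown are one valid form, not necessarily the simplified answer expected in class.

```python
from itertools import product


def outputs(a, b, c):
    """D, E, F computed directly from the problem statement."""
    ones = a + b + c
    return int(ones >= 1), int(ones == 2), int(ones == 3)


def equations(a, b, c):
    """The same functions written with AND (&), OR (|), and NOT (1 - x)."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    d = a | b | c
    e = (a & b & nc) | (a & nb & c) | (na & b & c)
    f = a & b & c
    return d, e, f


print("A B C | D E F")
for a, b, c in product((0, 1), repeat=3):
    d, e, f = outputs(a, b, c)
    print(a, b, c, "|", d, e, f)
```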
A Simple Multifunction Logic Unit
  To warm up, let's build a logic unit to support the AND and OR instructions for MIPS (32-bit registers)
    we'll just build a 1-bit unit and use 32 of them
  Possible implementation using a multiplexor:
  [Figure: 1-bit unit with inputs a and b, an operation selector, and an output; 32 such units]
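The 1-bit logic unit built around a multiplexor, replicated 32 times, might be modelled as follows. The selector encoding (0 for AND, 1 for OR) and the function names are assumptions made for this sketch; the slide's figure may use a different encoding.

```python
def mux2(sel, in0, in1):
    """2-to-1 multiplexor: output in0 when sel is 0, in1 when sel is 1."""
    return in1 if sel else in0


def logic_unit_1bit(a, b, op):
    """Both gates compute in parallel; the mux picks one result.

    Assumed encoding: op = 0 selects a AND b, op = 1 selects a OR b.
    """
    return mux2(op, a & b, a | b)


def logic_unit_32bit(a, b, op):
    """Use 32 copies of the 1-bit unit, one per bit position."""
    result = 0
    for i in range(32):
        bit = logic_unit_1bit((a >> i) & 1, (b >> i) & 1, op)
        result |= bit << i
    return result


print(hex(logic_unit_32bit(0xF0F0, 0xFF00, 0)))  # AND
print(hex(logic_unit_32bit(0xF0F0, 0xFF00, 1)))  # OR
```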
Using Multiplexor
  Selects one of the inputs to be the output based on a control input
  Let's build our ALU using a MUX (multiplexor):
Implementation
  Not easy to decide the best way to implement something
    do not want too many inputs to a single gate
    do not want to have to go through too many gates (= levels)
    for our purposes, ease of comprehension is important
  Let's look at a 1-bit ALU for addition:
  How could we build a 1-bit ALU for add, and, and or?
  How could we build a 32-bit ALU?
1-Bit Adder
  [Figure: 1-bit adder circuit; label: xor]
Building a 32-bit ALU
  Ripple-carry logic for 32-bit ALU
  1-bit ALU for AND, OR and add
  Multiplexor control line
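The ripple-carry construction can be sketched by chaining 32 one-bit ALUs so that each carry-out feeds the next carry-in. The operation encoding (0 = AND, 1 = OR, 2 = add) and all function names below are assumptions for illustration, not taken from the slide's figure.

```python
def full_adder(a, b, cin):
    """1-bit adder: sum is a XOR b XOR cin, carry-out is the majority."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def alu_1bit(a, b, cin, op):
    """1-bit ALU for AND, OR and add.

    Assumed encoding of the multiplexor control: 0 = AND, 1 = OR, 2 = add.
    Returns (result bit, carry out).
    """
    s, cout = full_adder(a, b, cin)
    return (a & b, a | b, s)[op], cout


def alu_32bit(a, b, op, cin=0):
    """Ripple the carry from bit 0 (ALU0) up to bit 31 (ALU31)."""
    result, carry = 0, cin
    for i in range(32):
        bit, carry = alu_1bit((a >> i) & 1, (b >> i) & 1, carry, op)
        result |= bit << i
    return result, carry


print(alu_32bit(5, 7, 2))
```

The carry must pass through all 32 stages before the top bit settles, which is why ripple-carry is simple to understand but slow.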
Addition and Subtraction
Subtraction
  Two's complement approach: just negate b and add.
  Negation: invert each bit of b and set Cin (LSB, ALU0) to 1
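The "invert b and set the carry-in to 1" rule can be checked on 32-bit words with ordinary integer arithmetic. A minimal sketch; the function name `subtract` and the use of a 32-bit mask are choices made for illustration.

```python
MASK = 0xFFFFFFFF  # 32-bit word


def subtract(a, b):
    """Compute a - b as a + (NOT b) + 1, truncated to 32 bits."""
    b_inverted = ~b & MASK   # invert each bit of b
    carry_in = 1             # Cin of the least significant ALU set to 1
    return (a + b_inverted + carry_in) & MASK


print(subtract(7, 5))
print(hex(subtract(5, 7)))   # two's complement representation of -2
```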
Detecting Overflow
  No overflow when adding a positive and a negative number
  No overflow when subtracting numbers with the same sign
  Overflow occurs when the result has wrong sign (verify!):
  Consider the operations A + B, and A − B
    can overflow occur if B is 0?
    can overflow occur if A is 0?
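The "result has the wrong sign" rule can be tested by comparing sign bits of 32-bit two's complement words. The sketch below is one common formulation; the function names are illustrative, and operands are assumed to be given as unsigned 32-bit patterns.

```python
MASK = 0xFFFFFFFF
SIGN = 0x80000000  # bit 31, the sign bit


def add_overflows(a, b):
    """A + B overflows when A and B share a sign and the result's sign differs."""
    result = (a + b) & MASK
    return bool(~(a ^ b) & (a ^ result) & SIGN)


def sub_overflows(a, b):
    """A - B overflows when A and B differ in sign and the result's sign differs from A's."""
    result = (a - b) & MASK
    return bool((a ^ b) & (a ^ result) & SIGN)


print(add_overflows(0x7FFFFFFF, 1))           # largest positive + 1
print(add_overflows(0x7FFFFFFF, 0xFFFFFFFF))  # positive + negative
print(sub_overflows(0, 0x80000000))           # 0 - (most negative number)
```

Running the last call is one way to explore the slide's question about A being 0: negating the most negative number has no positive counterpart in 32 bits.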
Set Less Than Instruction
  MIPS has set on less than instructions
    slt rd,rs,rt     if (rs < rt) rd = 1 else rd = 0
    sltu rd,rs,rt    unsigned Th
Datapath: Fetch Cycle
  Assemble the datapath from its components
  For instruction fetching, we need:
    Program Counter
    Instruction Memory
    Adder for incrementing PC
  The least significant 2 bits of the PC are 00 since PC is a multiple of 4
  Datapath does not handle branch or jump instructions
  Improved datapath increments upper 30 bits of PC by 1
  [Figure: fetch datapath: 32-bit PC (clk) drives the Instruction Memory Address; an adder (Add) with constant 4 produces next PC]
  [Figure: improved datapath: 30-bit PC with 00 appended drives the Instruction Memory Address; a +1 incrementer produces next PC]
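The two fetch datapaths compute the same next PC: adding 4 to the full 32-bit PC, or adding 1 to the upper 30 bits and keeping the low two bits at 00. A minimal sketch, with illustrative function names and a dictionary standing in for the instruction memory (an assumption of this example):

```python
def next_pc_add4(pc):
    """Basic datapath: 32-bit adder computes PC + 4."""
    return (pc + 4) & 0xFFFFFFFF


def next_pc_upper30(pc):
    """Improved datapath: increment the upper 30 bits by 1."""
    upper30 = pc >> 2                       # drop the two 00 bits
    upper30 = (upper30 + 1) & 0x3FFFFFFF    # 30-bit incrementer
    return upper30 << 2                     # append 00 again


def fetch(instruction_memory, pc):
    """Instruction <- MEM[PC]; PC <- PC + 4."""
    return instruction_memory[pc], next_pc_add4(pc)


# Hypothetical memory contents: two words at byte addresses 0 and 4.
imem = {0: 0x00221820, 4: 0x00000000}
instruction, pc = fetch(imem, 0)
print(hex(instruction), pc)
```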
Illustration
  Instruction ← MEM[PC]
  PC ← PC + 4
Datapath for R-Type Instruction
  Instruction format: Op6 | Rs5 | Rt5 | Rd5 | sa5 | funct6
  Rs and Rt fields select two registers to read. Rd field selects register to write
  BusS and BusT provide data input to ALU. ALU result is connected to BusD
  Same clock updates PC and Rd register
  Control signals
    ALUCtrl is derived from the funct field because Op = 0 for R-type
    RegWrite is used to enable the writing of the ALU result
  [Figure: R-type datapath: PC, Instruction Memory and next-PC adder; register file with Rs, Rt, Rd, BusS, BusT, BusD and RegWrite; 32-bit ALU controlled by ALUCtrl]
Datapath for R-Type Instruction
  add Rd, Rs, Rt
  R[rd] ← R[rs] + R[rt];
Datapath for I-type Instruction
  Instruction format: Op6 | Rs5 | Rt5 | immediate16
  Second ALU input comes from the extended immediate. Rt and BusT are not used
  Same clock edge updates PC and Rt
  Rt selects register to write, not Rd
  Control signals
    ALUCtrl is derived from the Op field
    RegWrite is used to enable the writing of the ALU result
    ExtOp is used to control the extension of the 16-bit immediate
  [Figure: I-type datapath: PC, Instruction Memory and next-PC adder; register file with Rs, Rt, BusS, BusT, BusD and RegWrite; Extender (ExtOp) feeding Imm16 to the 32-bit ALU (ALUCtrl)]
Immediate Extension
  Two types of extensions
    Zero-extension for unsigned constants
    Sign-extension for signed constants
  Control signal ExtOp indicates type of extension
  Extender implementation: wiring and one AND gate
    ExtOp = 0: Upper16 = 0
    ExtOp = 1: Upper16 = sign bit
  [Figure: extender with inputs Imm16 and ExtOp, producing the upper 16 bits and the lower 16 bits]
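The extender described above (wiring plus a single AND gate) can be modelled directly: the lower 16 bits pass through, and every upper bit carries ExtOp AND the sign bit. The function name `extend` is illustrative.

```python
def extend(imm16, ext_op):
    """Extend a 16-bit immediate to 32 bits.

    ext_op = 0: zero-extension (Upper16 = 0)
    ext_op = 1: sign-extension (Upper16 = sign bit)
    """
    sign_bit = (imm16 >> 15) & 1
    upper_bit = ext_op & sign_bit            # the single AND gate
    upper16 = 0xFFFF if upper_bit else 0     # same bit wired to all 16 upper lines
    return (upper16 << 16) | (imm16 & 0xFFFF)


print(hex(extend(0x8000, 1)))  # sign-extended
print(hex(extend(0x8000, 0)))  # zero-extended
```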
Combination of R and I
  A mux selects the destination register number (the Rd input of the register file) as either Rt or Rd
  Another mux selects 2nd ALU input as either data on BusT or the extended immediate
  Control signals
    ALUCtrl is derived from either the Op or the funct field
    RegWrite enables the writing of the ALU result
    ExtOp controls the extension of the 16-bit immediate
    RegDst selects the register destination as either Rt or Rd
    ALUSrc selects the 2nd ALU source as BusT or extended immediate
  [Figure: combined R-type and I-type datapath: PC and Instruction Memory; register file (Rs, Rt, Rd, BusS, BusT, BusD, RegWrite); RegDst mux choosing Rt or Rd; Extender (ExtOp); ALUSrc mux choosing BusT or extended Imm16; ALU (ALUCtrl)]
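The two multiplexors that merge the R-type and I-type datapaths can be sketched as simple selections. The 0/1 encodings below (RegDst: 0 = Rt, 1 = Rd; ALUSrc: 0 = BusT, 1 = extended immediate) are assumptions based on the order in which the slide lists the choices, and the function names are illustrative.

```python
def select_write_register(reg_dst, rt, rd):
    """RegDst mux. Assumed encoding: 0 selects Rt (I-type), 1 selects Rd (R-type)."""
    return rd if reg_dst else rt


def select_alu_operand(alu_src, bus_t, extended_imm):
    """ALUSrc mux. Assumed encoding: 0 selects BusT, 1 selects the extended immediate."""
    return extended_imm if alu_src else bus_t


# add  Rd=3, Rs=1, Rt=2   (R-type): RegDst = 1, ALUSrc = 0
print(select_write_register(1, rt=2, rd=3), select_alu_operand(0, bus_t=7, extended_imm=100))
# addi Rt=2, Rs=1, 100    (I-type): RegDst = 0, ALUSrc = 1
print(select_write_register(0, rt=2, rd=3), select_alu_operand(1, bus_t=7, extended_imm=100))
```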
Adding Data Memory
  A data memory is added for load and store instructions
  ALU calculates data memory address
  A 3rd mux selects data on BusD as either ALU result or memory data_out
  BusT is connected to Data_in of Data Memory for store instructions
  Additional control signals
    MemRead for load instructions
    MemWrite for store instructions
    MemtoReg selects data on BusD as ALU result or Memory Data_out
  [Figure: datapath with data memory: PC and Instruction Memory; register file (Rs, Rt, Rd, BusS, BusT, BusD, RegWrite); RegDst and ALUSrc muxes; Extender (ExtOp); ALU (ALUCtrl); Data Memory (Address, Data_in, Data_out, MemRead, MemWrite); MemtoReg mux]
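The role of MemRead, MemWrite, and MemtoReg can be sketched as a memory access followed by a write-back selection. The dictionary-based memory, the function names, and the MemtoReg encoding (0 = ALU result, 1 = Data_out) are assumptions made for this example.

```python
def access_data_memory(address, data_in, mem_read, mem_write, memory):
    """Data Memory: Address comes from the ALU, Data_in comes from BusT."""
    data_out = memory.get(address, 0) if mem_read else None
    if mem_write:
        memory[address] = data_in
    return data_out


def select_bus_d(mem_to_reg, alu_result, data_out):
    """MemtoReg mux. Assumed encoding: 0 selects the ALU result, 1 selects Data_out."""
    return data_out if mem_to_reg else alu_result


memory = {}
# sw: ALU computes address 16, BusT holds 99, MemWrite = 1
access_data_memory(16, 99, mem_read=0, mem_write=1, memory=memory)
# lw: ALU computes address 16, MemRead = 1, MemtoReg = 1
loaded = access_data_memory(16, 0, mem_read=1, mem_write=0, memory=memory)
print(select_bus_d(1, alu_result=16, data_out=loaded))
```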
Datapath of LOAD
lw Rt, offset(Rs)R[rt]