single cycle processing

Mid Term Exam 1 KICT, IIUM MIPS Programming Syllabus: Lecture 1 ~ 5 Date and Day: 13/03/2017 Monday Time: 2.00 PM ~ 3.00 PM Venue: Class Room Questions Q1. (a) to (e) 5 × 4 = 20 Q2. (a) to (e) 5 × 4 = 20

Upload: inmogr

Post on 05-Apr-2017


TRANSCRIPT

Slide 1

Mid Term Exam
KICT, IIUM

MIPS Programming
Syllabus: Lecture 1 ~ 5
Date and Day: 13/03/2017, Monday
Time: 2.00 PM ~ 3.00 PM
Venue: Class Room

Questions
Q1. (a) to (e): 5 × 4 = 20
Q2. (a) to (e): 5 × 4 = 20

Slide 2

CSC 3402
Single Cycle Processor Design
Lecture 6

M.M. Hafizur Rahman
Office: C-5-15, KICT, IIUM
Email: [email protected]

Slide 3

Performance

Recall, performance is determined by:
- Instruction count
- Clock cycles per instruction (CPI)
- Clock cycle time

Processor design will affect:
- Clock cycles per instruction
- Clock cycle time

Single cycle datapath and control design:
- Advantage: one clock cycle per instruction
- Disadvantage: long cycle time

[Figure: performance as the product of I-Count, CPI, and Cycle]
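The trade-off on this slide can be checked with a little arithmetic. The following is a minimal sketch, assuming the standard relation execution time = instruction count × CPI × clock cycle time. The function name and all numbers are invented for illustration and do not come from the lecture.

```python
def execution_time(instruction_count, cpi, cycle_time_ns):
    """Execution time in nanoseconds = I-Count x CPI x cycle time."""
    return instruction_count * cpi * cycle_time_ns

# Hypothetical numbers: a single-cycle design has CPI = 1 but a long cycle.
single_cycle = execution_time(1_000_000, 1, 8)  # assumed 8 ns cycle
# A hypothetical multi-cycle design: higher CPI, shorter cycle.
multi_cycle = execution_time(1_000_000, 3, 2)   # assumed CPI 3, 2 ns cycle

print(single_cycle, multi_cycle)
```

With these made-up figures the lower CPI does not win on its own; what matters is the product of all three terms.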

Slide 4

Step by Step Design of a Processor

1. Analyze instruction set => datapath requirements
   - The meaning of each instruction is given by the register transfers
   - Datapath must include storage elements for ISA registers
   - Datapath must support each register transfer
2. Select datapath components and clocking methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction
   - Determine the setting of control signals for register transfer
5. Assemble the control logic

Slide 5

MIPS Instruction

- All instructions are 32 bits wide
- Three instruction formats: R-type, I-type, and J-type

R-type:  Op6 | Rs5 | Rt5 | Rd5 | sa5 | funct6
I-type:  Op6 | Rs5 | Rt5 | immediate16
J-type:  Op6 | immediate26

- Op6: 6-bit opcode of the instruction
- Rs5, Rt5, Rd5: 5-bit source and destination register numbers
- sa5: 5-bit shift amount used by shift instructions
- funct6: 6-bit function field for R-type instructions
- immediate16: 16-bit immediate value or address offset
- immediate26: 26-bit target address of the jump instruction
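The field layout above can be illustrated by pulling the fields out of a 32-bit word with shifts and masks. This is a small sketch in Python; the function name `decode` and the dictionary keys are chosen here for illustration only.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "op": (word >> 26) & 0x3F,     # bits 31..26
        "rs": (word >> 21) & 0x1F,     # bits 25..21
        "rt": (word >> 16) & 0x1F,     # bits 20..16
        "rd": (word >> 11) & 0x1F,     # bits 15..11 (R-type)
        "sa": (word >> 6) & 0x1F,      # bits 10..6  (R-type)
        "funct": word & 0x3F,          # bits 5..0   (R-type)
        "imm16": word & 0xFFFF,        # bits 15..0  (I-type)
        "imm26": word & 0x3FFFFFF,     # bits 25..0  (J-type)
    }

# 0x00221820 encodes add $3, $1, $2 (op = 0, funct = 0x20)
r_fields = decode(0x00221820)
# 0x8C220004 encodes lw $2, 4($1) (op = 0x23)
i_fields = decode(0x8C220004)
print(r_fields["rd"], hex(r_fields["funct"]), hex(i_fields["op"]))
```

All fields are extracted for every word; which ones are meaningful depends on the format selected by the opcode.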

Slide 6

MIPS Instruction

Only a subset of the MIPS instructions is considered:
- ALU instructions (R-type): add, sub, and, or, xor, slt
- Immediate instructions (I-type): addi, slti, andi, ori, xori
- Load and Store (I-type): lw, sw
- Branch (I-type): beq, bne
- Jump (J-type): j

This subset does not include all instructions, but it is:
- Sufficient to illustrate design of datapath and control
- Built on concepts that are used to construct a broad spectrum of computers

Slide 7

MIPS Instruction

Instruction           Meaning            Format
add   Rd, Rs, Rt      addition           op6 = 0 | Rs5 | Rt5 | Rd5 | 0 | 0x20
sub   Rd, Rs, Rt      subtraction        op6 = 0 | Rs5 | Rt5 | Rd5 | 0 | 0x22
and   Rd, Rs, Rt      bitwise and        op6 = 0 | Rs5 | Rt5 | Rd5 | 0 | 0x24
or    Rd, Rs, Rt      bitwise or         op6 = 0 | Rs5 | Rt5 | Rd5 | 0 | 0x25
xor   Rd, Rs, Rt      exclusive or       op6 = 0 | Rs5 | Rt5 | Rd5 | 0 | 0x26
slt   Rd, Rs, Rt      set on less than   op6 = 0 | Rs5 | Rt5 | Rd5 | 0 | 0x2a
addi  Rt, Rs, Im16    add immediate      0x08 | Rs5 | Rt5 | Im16
slti  Rt, Rs, Im16    slt immediate      0x0a | Rs5 | Rt5 | Im16
andi  Rt, Rs, Im16    and immediate      0x0c | Rs5 | Rt5 | Im16
ori   Rt, Rs, Im16    or immediate       0x0d | Rs5 | Rt5 | Im16
xori  Rt, Rs, Im16    xor immediate      0x0e | Rs5 | Rt5 | Im16
lw    Rt, Im16(Rs)    load word          0x23 | Rs5 | Rt5 | Im16
sw    Rt, Im16(Rs)    store word         0x2b | Rs5 | Rt5 | Im16
beq   Rs, Rt, Im16    branch if equal    0x04 | Rs5 | Rt5 | Im16
bne   Rs, Rt, Im16    branch not equal   0x05 | Rs5 | Rt5 | Im16
j     Im26            jump               0x02 | Im26

Slide 8

Processor Implementation

Single Cycle
- perform each instruction in 1 clock cycle
- clock cycle must be long enough for slowest instruction
- disadvantage: only as fast as slowest instruction

Multi-Cycle
- break fetch/execute cycle into multiple steps
- perform 1 step in each clock cycle
- advantage: each instruction uses only as many cycles as it needs

Pipelined
- execute each instruction in multiple steps
- perform 1 step / instruction in each clock cycle
- process multiple instructions in parallel

Slide 9

Register Transfer Level

- RTL is a description of data flow between registers
- RTL gives a meaning to the instructions
- All instructions are fetched from memory at address PC

Instruction   RTL Description
ADD           Reg(Rd) ← Reg(Rs) + Reg(Rt); PC ← PC + 4
SUB           Reg(Rd) ← Reg(Rs) - Reg(Rt); PC ← PC + 4
ORI           Reg(Rt) ← Reg(Rs) | zero_ext(Im16); PC ← PC + 4
LW            Reg(Rt) ← MEM[Reg(Rs) + sign_ext(Im16)]; PC ← PC + 4
SW            MEM[Reg(Rs) + sign_ext(Im16)] ← Reg(Rt); PC ← PC + 4
BEQ           if (Reg(Rs) == Reg(Rt))
                  PC ← PC + 4 + 4 × sign_extend(Im16)
              else PC ← PC + 4
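The register transfers above can be tried out in software. The following is a rough sketch, not a model of the hardware: the `step` function, the `state` dictionary, and the word-addressed `mem` dictionary are all assumptions made for illustration, and register 0 is not hardwired to zero here.

```python
def sign_ext(imm16):
    """Interpret a 16-bit value as a signed two's complement integer."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def step(state, name, rs=0, rt=0, rd=0, imm16=0):
    """Apply the RTL of one instruction to state = {'reg', 'mem', 'pc'}."""
    reg, mem = state["reg"], state["mem"]
    next_pc = state["pc"] + 4
    if name == "add":
        reg[rd] = (reg[rs] + reg[rt]) & 0xFFFFFFFF
    elif name == "sub":
        reg[rd] = (reg[rs] - reg[rt]) & 0xFFFFFFFF
    elif name == "ori":
        reg[rt] = reg[rs] | imm16  # zero_ext leaves the 16 bits unchanged
    elif name == "lw":
        reg[rt] = mem.get((reg[rs] + sign_ext(imm16)) & 0xFFFFFFFF, 0)
    elif name == "sw":
        mem[(reg[rs] + sign_ext(imm16)) & 0xFFFFFFFF] = reg[rt]
    elif name == "beq":
        if reg[rs] == reg[rt]:
            next_pc = state["pc"] + 4 + 4 * sign_ext(imm16)
    else:
        raise ValueError(f"unsupported instruction: {name}")
    state["pc"] = next_pc & 0xFFFFFFFF

state = {"reg": [0] * 32, "mem": {}, "pc": 0}
step(state, "ori", rs=0, rt=1, imm16=5)       # $1 = 5
step(state, "ori", rs=0, rt=2, imm16=7)       # $2 = 7
step(state, "add", rs=1, rt=2, rd=3)          # $3 = $1 + $2
step(state, "sw", rs=0, rt=3, imm16=16)       # MEM[16] = $3
step(state, "lw", rs=0, rt=4, imm16=16)       # $4 = MEM[16]
step(state, "beq", rs=3, rt=4, imm16=0xFFFF)  # taken, offset is -1
print(state["reg"][3], state["reg"][4], state["pc"])
```

The final branch uses an offset of -1, so PC + 4 + 4 × (-1) lands back on the branch itself.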

Slide 10

Instruction Execution

R-type
- Fetch instruction:   Instruction ← MEM[PC]
- Fetch operands:      data1 ← Reg(Rs), data2 ← Reg(Rt)
- Execute operation:   ALU_result ← func(data1, data2)
- Write ALU result:    Reg(Rd) ← ALU_result
- Next PC address:     PC ← PC + 4

I-type
- Fetch instruction:   Instruction ← MEM[PC]
- Fetch operands:      data1 ← Reg(Rs), data2 ← Extend(imm16)
- Execute operation:   ALU_result ← op(data1, data2)
- Write ALU result:    Reg(Rt) ← ALU_result
- Next PC address:     PC ← PC + 4

BEQ
- Fetch instruction:   Instruction ← MEM[PC]
- Fetch operands:      data1 ← Reg(Rs), data2 ← Reg(Rt)
- Equality:            zero ← subtract(data1, data2)
- Branch:              if (zero) PC ← PC + 4 × (1 + sign_ext(imm16))
                       else PC ← PC + 4

Slide 11

Instruction Execution

LW
- Fetch instruction:    Instruction ← MEM[PC]
- Fetch base register:  base ← Reg(Rs)
- Calculate address:    address ← base + sign_extend(imm16)
- Read memory:          data ← MEM[address]
- Write register Rt:    Reg(Rt) ← data
- Next PC address:      PC ← PC + 4

SW
- Fetch instruction:    Instruction ← MEM[PC]
- Fetch registers:      base ← Reg(Rs), data ← Reg(Rt)
- Calculate address:    address ← base + sign_extend(imm16)
- Write memory:         MEM[address] ← data
- Next PC address:      PC ← PC + 4

Jump
- Fetch instruction:    Instruction ← MEM[PC]
- Target PC address:    target ← PC[31:28] || Imm26 || 00   (|| denotes concatenation)
- Jump:                 PC ← target
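The jump target concatenation can be written with a mask, a shift, and an OR. This is a minimal sketch following the RTL as given on the slide; the function name and the example addresses are made up for illustration.

```python
def jump_target(pc, imm26):
    """target = PC[31:28] || Imm26 || 00, where || is bit concatenation."""
    upper4 = pc & 0xF0000000              # keep PC[31:28] in place
    middle = (imm26 & 0x3FFFFFF) << 2     # Imm26 followed by two 0 bits
    return upper4 | middle

print(hex(jump_target(0x00400018, 0x0100004)))
```

Shifting left by 2 appends the `00`, so the target is always a multiple of 4.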

Slide 12

What do we need?

Two types of functional hardware elements are needed:
- elements that operate on data (called combinational elements)
- elements that contain data (called sequential or state elements)

Slide 13

Fetch and Execute Cycle

Abstraction of fetch/execute implementation:
- use the PC to supply the instruction address
- fetch the instruction from memory and increment PC
- use fields of the instruction to select registers to read
- execute depending on the instruction
- repeat

Slide 14

Datapath and Control

[Figure: the Controller sends Control signals to the Datapath; the Datapath returns Status signals to the Controller]

Slide 15

Basic Hardware

Slide 16

Truth Table and Simplification

Problem: Consider logic functions with three inputs: A, B, C.
- output D is true if at least one input is true
- output E is true if exactly two inputs are true
- output F is true only if all three inputs are true

Tasks:
- Show the truth table for these three functions
- Show the Boolean equations for these three functions
- Show an implementation consisting of AND-OR-NOT gates
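One way to check an answer to this problem is to enumerate all eight input combinations. The sketch below writes D, E, and F as AND-OR-NOT expressions (E as a sum of three product terms) and prints the truth table. It is one possible formulation, not necessarily the simplified form expected in class.

```python
from itertools import product

def d(a, b, c):
    """D = A + B + C: at least one input is true."""
    return a or b or c

def e(a, b, c):
    """E = AB(not C) + A(not B)C + (not A)BC: exactly two inputs are true."""
    return (a and b and not c) or (a and not b and c) or (not a and b and c)

def f(a, b, c):
    """F = ABC: all three inputs are true."""
    return a and b and c

print("A B C | D E F")
for a, b, c in product([False, True], repeat=3):
    ins = " ".join(str(int(v)) for v in (a, b, c))
    outs = " ".join(str(int(bool(fn(a, b, c)))) for fn in (d, e, f))
    print(ins, "|", outs)
```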

Slide 17

A Simple Multifunction Logic Unit

- To warm up, let's build a logic unit to support the AND & OR instructions for MIPS (32-bit registers)
- We'll just build a 1-bit unit and use 32 of them
- Possible implementation using a multiplexor:

[Figure: 1-bit unit with inputs a and b, an operation selector, and one output, replicated as 32 units]
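The idea of one 1-bit unit replicated 32 times can be sketched as follows. The selector encoding (0 for AND, 1 for OR) and the function names are assumptions made for this example; the slide does not state them.

```python
def logic_unit_1bit(a, b, operation):
    """1-bit unit: a 2-to-1 mux picks AND (operation = 0) or OR (operation = 1)."""
    and_out = a & b
    or_out = a | b
    return or_out if operation else and_out

def logic_unit_32bit(a, b, operation):
    """Use 32 copies of the 1-bit unit, one per bit position."""
    result = 0
    for i in range(32):
        bit = logic_unit_1bit((a >> i) & 1, (b >> i) & 1, operation)
        result |= bit << i
    return result

print(hex(logic_unit_32bit(0xF0F0F0F0, 0xFF00FF00, 0)))  # AND
print(hex(logic_unit_32bit(0xF0F0F0F0, 0xFF00FF00, 1)))  # OR
```

Both gates compute their result for every bit; the multiplexor only decides which result reaches the output.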

Slide 18

Using Multiplexor

- Selects one of the inputs to be the output, based on a control input
- Let's build our ALU using a MUX (multiplexor)

Slide 19

Implementation

- Not easy to decide the best way to implement something
  - do not want too many inputs to a single gate
  - do not want to have to go through too many gates (= levels)
  - for our purposes, ease of comprehension is important
- Let's look at a 1-bit ALU for addition
- How could we build a 1-bit ALU for add, and, and or?
- How could we build a 32-bit ALU?

Slide 20

1-Bit Adder

[Figure: 1-bit adder circuit, including an xor gate]
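A 1-bit full adder can be expressed with the usual sum and carry equations. This sketch assumes the textbook formulation (sum = a xor b xor carry_in, carry_out = majority of the three inputs), which may differ in gate structure from the circuit drawn on the slide.

```python
def full_adder(a, b, carry_in):
    """Return (sum, carry_out) for three input bits."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return s, carry_out

# Print all eight rows of the adder's truth table.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            print(a, b, cin, "->", full_adder(a, b, cin))
```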

Slide 21

Building a 32-bit ALU

[Figure: a 1-bit ALU for AND, OR and add with a multiplexor control line, and the ripple-carry logic that chains 32 of them into a 32-bit ALU]
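The ripple-carry arrangement can be sketched by chaining 32 one-bit ALUs so that each carry-out feeds the next carry-in. The operation encoding (0 = AND, 1 = OR, 2 = add) and the function names are assumptions for this example.

```python
def alu_1bit(a, b, carry_in, operation):
    """1-bit ALU; operation: 0 = AND, 1 = OR, 2 = add (encoding assumed)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    result = (a & b, a | b, s)[operation]  # the multiplexor
    return result, carry_out

def alu_32bit(a, b, operation, carry_in=0):
    """Chain 32 one-bit ALUs; the carry ripples from bit 0 up to bit 31."""
    result, carry = 0, carry_in
    for i in range(32):
        bit, carry = alu_1bit((a >> i) & 1, (b >> i) & 1, carry, operation)
        result |= bit << i
    return result, carry

print(alu_32bit(5, 7, 2))
print(alu_32bit(0xFFFFFFFF, 1, 2))
```

The second call shows the carry rippling through all 32 positions and leaving the top bit as the final carry-out.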

Slide 22

Addition and Subtraction

Slide 23

Subtraction

- Two's complement approach: just negate b and add
- Negation: invert each bit of b and set Cin (LSB, ALU0) to 1
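The "invert b and set the carry-in to 1" trick can be checked with a short sketch. The helper names `add32` and `sub32` are invented here, and a Python integer addition stands in for the 32-bit adder.

```python
MASK = 0xFFFFFFFF  # keep results to 32 bits

def add32(a, b, carry_in=0):
    """32-bit adder with a carry input into bit 0."""
    return ((a & MASK) + (b & MASK) + carry_in) & MASK

def sub32(a, b):
    """a - b computed as a + (inverted b) with carry-in set to 1."""
    inverted_b = ~b & MASK
    return add32(a, inverted_b, carry_in=1)

print(hex(sub32(7, 5)), hex(sub32(5, 7)))
```

A negative result appears as its 32-bit two's complement bit pattern.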

Slide 24

Detecting Overflow

- No overflow when adding a positive and a negative number
- No overflow when subtracting numbers with the same sign
- Overflow occurs when the result has the wrong sign (verify!)
- Consider the operations A + B and A - B
  - can overflow occur if B is 0?
  - can overflow occur if A is 0?
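The sign rules above can be turned into a small check. This is a sketch under the assumption that operands are given as 32-bit patterns (0 to 0xFFFFFFFF) interpreted as two's complement values; the function names are illustrative.

```python
MASK = 0xFFFFFFFF

def sign(x):
    """Sign bit (bit 31) of a 32-bit pattern."""
    return (x >> 31) & 1

def add_overflows(a, b):
    """A + B overflows when A and B share a sign and the result's sign differs."""
    result = (a + b) & MASK
    return sign(a) == sign(b) and sign(result) != sign(a)

def sub_overflows(a, b):
    """A - B overflows when A and B differ in sign and the result's sign differs from A."""
    result = (a - b) & MASK
    return sign(a) != sign(b) and sign(result) != sign(a)

print(add_overflows(0x7FFFFFFF, 1))   # largest positive + 1
print(sub_overflows(0, 0x80000000))   # 0 - (most negative value)
```

In this sketch neither operation overflows when B is 0. With A equal to 0, only A - B can overflow, and only when B is the most negative value.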


Slide 26

Set Less Than Instruction

MIPS has set on less than instructions:
  slt   rd, rs, rt    if (rs < rt) rd = 1 else rd = 0
  sltu  rd, rs, rt    unsigned Th


Slide 45

Datapath: Fetch Cycle

- Assemble the datapath from its components
- For instruction fetching, we need:
  - Program Counter (PC)
  - Instruction Memory
  - Adder for incrementing PC
- The least significant 2 bits of the PC are 00 since PC is a multiple of 4
- Datapath does not handle branch or jump instructions
- Improved datapath increments upper 30 bits of PC by 1

[Figure: two fetch datapaths. In the first, a clocked 32-bit PC addresses the Instruction Memory and an adder computes next PC = PC + 4. In the improved datapath, a 30-bit PC is incremented by 1 and 00 is appended to form the 32-bit address.]
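The improved fetch datapath can be sketched as a 30-bit counter whose value is extended with `00` to address memory. The instruction memory is modelled here as a Python dictionary keyed by byte address, which is an assumption for illustration; the example words are arbitrary.

```python
def fetch(instruction_memory, pc30):
    """Fetch using a 30-bit PC; the two low address bits are always 00."""
    address = pc30 << 2                    # append 00 to form the byte address
    instruction = instruction_memory[address]
    next_pc30 = (pc30 + 1) & 0x3FFFFFFF    # +1 on the upper 30 bits equals +4 bytes
    return instruction, next_pc30

# Hypothetical memory holding two instruction words at addresses 0 and 4.
imem = {0: 0x00221820, 4: 0x8C220004}
word0, pc = fetch(imem, 0)
word1, pc = fetch(imem, pc)
print(hex(word0), hex(word1), pc)
```

Storing only 30 bits lets the adder increment by 1 instead of 4 and saves two bits of register and adder.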

Slide 46

Illustration

Instruction ← MEM[PC]
PC ← PC + 4

Slide 47

Datapath for R-Type Instruction

Format: Op6 | Rs5 | Rt5 | Rd5 | sa5 | funct6

- Rs and Rt fields select two registers to read. Rd field selects register to write
- BusS & BusT provide data input to ALU. ALU result is connected to BusD
- Same clock updates PC and Rd register

Control signals:
- ALUCtrl is derived from the funct field because Op = 0 for R-type
- RegWrite is used to enable the writing of the ALU result

[Figure: fetch datapath (PC, Instruction Memory, adder for next PC) feeding the 5-bit Rs, Rt and Rd fields to the register file; BusS and BusT (32 bits) drive the ALU, and the 32-bit ALU result returns on BusD. Control inputs: ALUCtrl, RegWrite, clk.]

Slide 48

Datapath for R-Type Instruction

add Rd, Rs, Rt
R[rd] ← R[rs] + R[rt];

Slide 49

Datapath for I-type Instruction

Format: Op6 | Rs5 | Rt5 | immediate16

- Second ALU input comes from the extended immediate. The Rt read port and BusT are not used
- Rt selects register to write, not Rd
- Same clock edge updates PC and Rt

Control signals:
- ALUCtrl is derived from the Op field
- RegWrite is used to enable the writing of the ALU result
- ExtOp is used to control the extension of the 16-bit immediate

[Figure: fetch datapath feeding the 5-bit Rs and Rt fields to the register file; BusS and the Extender output (Imm16 extended to 32 bits) drive the ALU, and the 32-bit ALU result returns on BusD. Control inputs: ALUCtrl, RegWrite, ExtOp, clk.]

Slide 50

Immediate Extension

- Two types of extensions
  - Zero-extension for unsigned constants
  - Sign-extension for signed constants
- Control signal ExtOp indicates type of extension
- Extender implementation: wiring and one AND gate
  - ExtOp = 0: Upper16 = 0
  - ExtOp = 1: Upper16 = sign bit

[Figure: Extender. Imm16 passes through as the lower 16 bits; ExtOp gates the sign bit onto the upper 16 bits.]
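The "wiring and one AND gate" extender can be mimicked in a few lines. This sketch follows the ExtOp encoding stated on the slide (0 = zero-extend, 1 = sign-extend); the function and variable names are chosen for illustration.

```python
def extend(imm16, ext_op):
    """ExtOp = 0: zero-extend; ExtOp = 1: sign-extend a 16-bit immediate."""
    sign_bit = (imm16 >> 15) & 1
    upper_bit = ext_op & sign_bit          # the single AND gate
    upper16 = 0xFFFF if upper_bit else 0   # same bit wired to all 16 upper bits
    return (upper16 << 16) | (imm16 & 0xFFFF)

print(hex(extend(0x8000, 0)), hex(extend(0x8000, 1)), hex(extend(0x7FFF, 1)))
```

The lower 16 bits pass through unchanged in both modes; only the upper half depends on ExtOp.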

Slide 51

Combination of R and I

- A mux selects the destination register number (the register file's Rd input) as either Rt or Rd
- Another mux selects 2nd ALU input as either data on BusT or the extended immediate

Control signals:
- ALUCtrl is derived from either the Op or the funct field
- RegWrite enables the writing of the ALU result
- ExtOp controls the extension of the 16-bit immediate
- RegDst selects the register destination as either Rt or Rd
- ALUSrc selects the 2nd ALU source as BusT or extended immediate

[Figure: combined R-type and I-type datapath. A RegDst mux chooses between the Rt and Rd fields for the register file's write port, and an ALUSrc mux chooses between BusT and the Extender output for the second ALU input. Control inputs: ALUCtrl, RegWrite, ExtOp, RegDst, ALUSrc, clk.]

Slide 52

Adding Data Memory

- A data memory is added for load and store instructions
- ALU calculates data memory address
- A 3rd mux selects data on BusD as either ALU result or memory Data_out
- BusT is connected to Data_in of Data Memory for store instructions

Additional control signals:
- MemRead for load instructions
- MemWrite for store instructions
- MemtoReg selects data on BusD as ALU result or memory Data_out

[Figure: datapath with Data Memory (Address, Data_in, Data_out). The ALU result drives the memory Address, BusT drives Data_in, and a MemtoReg mux chooses between the ALU result and Data_out for BusD. Control inputs: ALUCtrl, RegWrite, ExtOp, RegDst, ALUSrc, MemRead, MemWrite, MemtoReg, clk.]
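The way the control signals steer this datapath can be sketched in software. Everything below is an illustrative assumption rather than the lecture's implementation: the mux encodings (RegDst 0 = Rt, 1 = Rd; ALUSrc 0 = BusT, 1 = extended immediate; MemtoReg 0 = ALU result, 1 = Data_out), the use of a Python function as a stand-in for ALUCtrl, and the dictionary used as data memory.

```python
MASK = 0xFFFFFFFF

def extend(imm16, ext_op):
    """Sign-extend when ExtOp = 1 and the sign bit is set, else zero-extend."""
    if ext_op and imm16 & 0x8000:
        return imm16 | 0xFFFF0000
    return imm16

def datapath_step(reg, mem, rs, rt, rd, imm16, ctl):
    """One pass through the datapath for the given control signals."""
    bus_s, bus_t = reg[rs], reg[rt]
    alu_b = extend(imm16, ctl["ExtOp"]) if ctl["ALUSrc"] else bus_t  # ALUSrc mux
    alu_result = ctl["ALUCtrl"](bus_s, alu_b) & MASK
    data_out = mem.get(alu_result, 0) if ctl["MemRead"] else 0
    if ctl["MemWrite"]:
        mem[alu_result] = bus_t                       # BusT feeds Data_in
    if ctl["RegWrite"]:
        dest = rd if ctl["RegDst"] else rt            # RegDst mux
        reg[dest] = data_out if ctl["MemtoReg"] else alu_result  # MemtoReg mux

ADD = lambda a, b: a + b  # stand-in for the ALU operation chosen by ALUCtrl
LW = dict(RegDst=0, ALUSrc=1, ExtOp=1, ALUCtrl=ADD,
          MemRead=1, MemWrite=0, MemtoReg=1, RegWrite=1)
SW = dict(RegDst=0, ALUSrc=1, ExtOp=1, ALUCtrl=ADD,
          MemRead=0, MemWrite=1, MemtoReg=0, RegWrite=0)
R_ADD = dict(RegDst=1, ALUSrc=0, ExtOp=0, ALUCtrl=ADD,
             MemRead=0, MemWrite=0, MemtoReg=0, RegWrite=1)

reg, mem = [0] * 32, {}
reg[1], reg[2] = 100, 42
datapath_step(reg, mem, rs=1, rt=2, rd=0, imm16=0xFFFC, ctl=SW)  # MEM[$1 - 4] = $2
datapath_step(reg, mem, rs=1, rt=3, rd=0, imm16=0xFFFC, ctl=LW)  # $3 = MEM[$1 - 4]
datapath_step(reg, mem, rs=2, rt=3, rd=4, imm16=0, ctl=R_ADD)    # $4 = $2 + $3
print(mem, reg[3], reg[4])
```

The same function body serves all three instruction classes; only the control word changes, which is the point of assembling the datapath first and the control logic afterwards.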

Slide 53

Datapath of LOAD

lw Rt, offset(Rs)
R[rt]