Chapter 7: Processing Unit

Upload: kathlyn-booker

Post on 11-Jan-2016


Page 1: Chapter 7 Processing Unit Processing Unit Processing Unit  Datapath  Internal Bus Architecture  Internal Processing Hard-wiredHard-wired Microinstruction

Chapter 7: Processing Unit

Processing Unit
• Datapath
• Internal Bus Architecture
• Internal Processing
  • Hard-wired
  • Microinstruction method (briefly)

Next Lecture
• Pipelining

Fundamental Concepts

For simplicity, assume that each instruction occupies one memory word

Instruction execution stages
• Fetch stage
  • Fetch the contents of the memory location pointed to by PC and load it into IR: [IR] ← [[PC]]
  • Increment the contents of PC: [PC] ← [PC] + 4
• Execution stage
  • Carry out the instruction fetched
    • Accessing register, memory, etc.
    • Performing computation using ALU
    • Using internal and external resources
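The fetch stage can be sketched in a few lines of Python. This is only an illustration: the dictionary `memory`, the instruction strings, and the `HALT` stop marker are assumptions made for the example, not part of the slides.

```python
# Hypothetical memory: byte address -> one instruction word (4 bytes each).
memory = {0: "ADD R1,R2,R3", 4: "LDR R0, addr", 8: "HALT"}

def fetch(pc):
    """Fetch stage: [IR] <- [[PC]], then [PC] <- [PC] + 4."""
    ir = memory[pc]      # load the word pointed to by PC into IR
    return ir, pc + 4    # PC now points to the next instruction

pc = 0
trace = []
while True:
    ir, pc = fetch(pc)
    trace.append(ir)     # the execution stage would run here
    if ir == "HALT":     # illustrative stop marker, not from the slides
        break
```

After the loop, `trace` holds the three instruction words in fetch order and `pc` has advanced by 4 per fetch.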

Datapath

[Figure: single-bus datapath. The internal processor bus connects PC, MAR, MDR, Y, Z, IR, TEMP and the registers R0 through R(n-1). MAR drives the address lines and MDR connects to the data lines of the external memory bus. A MUX (Select) chooses between Constant 4 and Y for ALU input A; ALU input B comes from the bus; the ALU result goes to Z. The instruction decoder and control logic generate the control signals, including the ALU control lines (Add, Sub, XOR, Carry-in). Example instructions: ADD R1,R2,R3 and LDR R0, addr.]

Datapath with a single common bus

• ALU and all registers are on a single common bus
• The common bus is internal to the CPU (not to be confused with the external buses connecting the CPU to memory and I/O devices)
• The external memory bus connects to the CPU via MDR and MAR
• The number and function of registers R0 through R(n-1) vary from one CPU to another
• Registers can be either general purpose or special purpose
• Registers Y, Z and TEMP are transparent to the program; they are used only by the CPU for temporary storage
• Datapath: ALU, registers, and the interconnecting bus
• Assume all the registers have a clock input

Processing

Most of the operations needed to execute an instruction can be carried out by performing one or more of the following functions
• Fetch the contents of a given memory location and load them into a CPU register (e.g., LDR R0, addr)
• Store a word of data from a CPU register into a given location in memory (e.g., STO R0, addr)
• Transfer a word of data from one CPU register to another or to the ALU (e.g., MOV R2,R3 or ADD R1,#1)
• Perform an arithmetic or logical operation and store the result in a CPU register (e.g., ADD R1,R2,R3)

Register Transfer

• Registers need input and output gating
• Riin: control signal for input of Ri. When Riin = 1, data available on the common bus is loaded into Ri
• Riout: control signal for output of Ri. When Riout = 1, the contents of Ri are placed on the bus
• Example: transfer the contents of R1 to R4
  • Enable output of R1: R1out = 1
  • Enable input of R4: R4in = 1

[Figure: input and output gating on the internal processor bus. Register Ri with Riin and Riout; register Y with Yin; the MUX (Select) choosing between Constant 4 and Y for ALU input A; ALU input B from the bus; register Z with Zin and Zout.]
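The gating rule above can be sketched as a small Python function. The function name `transfer` and the dictionary encoding of the enable signals are assumptions chosen for illustration; the sketch only captures the rule that one register drives the bus while any enabled register loads from it.

```python
def transfer(regs, out_enable, in_enable):
    """One control step on a single bus.

    out_enable / in_enable map register names to True (signal = 1).
    Exactly one register may have its 'out' signal set.
    """
    drivers = [name for name, on in out_enable.items() if on]
    if len(drivers) != 1:
        raise ValueError("exactly one register may drive the bus")
    bus = regs[drivers[0]]               # Ri_out = 1: contents placed on the bus
    for name, on in in_enable.items():
        if on:
            regs[name] = bus             # Ri_in = 1: bus value loaded into Ri
    return bus

regs = {"R1": 25, "R4": 0}
transfer(regs, {"R1": True}, {"R4": True})   # R1out = 1, R4in = 1
```

After the call, R4 holds a copy of R1 and R1 is unchanged.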

Arithmetic & Logical Operation

• ALU is a combinational circuit that has no internal storage
• To add two numbers, the two operands have to be available to the ALU simultaneously
• Register Y holds one of the two numbers
• The other number is gated onto the bus
• The result is stored temporarily in Z
• Example: ADD R1, R2, R3 (R3 = R1 + R2)
  • Step 1: R1out = 1 and Yin = 1
  • Step 2: R2out = 1, Add = 1, Zin = 1
  • Step 3: Zout = 1, R3in = 1: contents of Z are transferred to R3
• Step 3 cannot be done concurrently with step 2, because only one register output can be connected to the bus at any given time

[Figure: the same gating diagram as for register transfer, with the Add control line applied to the ALU: Ri (Riin, Riout), Y (Yin), MUX (Select) with Constant 4, ALU inputs A and B, and Z (Zin, Zout) on the internal processor bus.]
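The three steps of ADD R1, R2, R3 can be traced with a short Python sketch. The register dictionary and the 32-bit word mask are assumptions for illustration; each step reads exactly one register onto the bus, which is why three steps are needed.

```python
WORD_MASK = 0xFFFFFFFF  # assumed 32-bit word width

def add_r1_r2_r3(regs):
    """ADD R1, R2, R3 (R3 = R1 + R2) on a single bus, one bus driver per step."""
    # Step 1: R1out, Yin  -> Y holds the first operand
    bus = regs["R1"]
    regs["Y"] = bus
    # Step 2: R2out, Add, Zin  -> ALU adds Y and the bus value, result into Z
    bus = regs["R2"]
    regs["Z"] = (regs["Y"] + bus) & WORD_MASK
    # Step 3: Zout, R3in  -> result moves from Z to R3
    bus = regs["Z"]
    regs["R3"] = bus
    return regs

regs = add_r1_r2_r3({"R1": 7, "R2": 5, "R3": 0, "Y": 0, "Z": 0})
```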

Register Gating and Timing of Data Transfers

• Each bit of a register consists of a flip-flop (FF)
• While Riin = 1, each FF takes on the value of its corresponding data bit on the bus
• The value loaded at a clock edge while Riin = 1 is held (locked) until Riin = 1 again
• The output of the register can be disconnected from the bus, place a 0 on the bus, or place a 1 on the bus: tri-state

[Figure: one register bit. A D flip-flop (D, Q, clocked by Clock) whose D input comes from a two-input selector (inputs labeled 0 and 1) controlled by Riin; the output Q is gated onto the bus by Riout.]

Fetch Operation

The CPU has to specify the address of the memory location and request a read operation (e.g., LDR R2, [R1])

1. Send an address to memory (MAR ← [R1])
   • CPU transfers the address of the required word into MAR
2. Start a Read operation
   • CPU uses the control lines of the memory bus to indicate that a Read operation is needed
3. Wait for the MFC (memory function complete) response
   • CPU waits until it receives an answer from memory informing it that the Read has been completed
   • When MFC is set to 1, it indicates that the specified location has been read and the contents are available on the data lines of the memory bus
   • The duration of this step depends on the speed of memory
   • Overall execution time of an instruction can be decreased by doing useful work during the wait, for example incrementing the PC
4. R2 ← [MDR]
   • The information on the memory bus is first loaded into MDR
   • The contents of the MDR are next moved into a destination register
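The four steps of the read can be sketched as a toy simulation. The `Memory` class, its fixed `latency`, and the `load` helper are assumptions introduced for illustration; real memories signal MFC through hardware handshaking rather than a countdown.

```python
class Memory:
    """Toy memory that raises MFC a fixed number of ticks after a Read.
    The latency value (>= 1) is an assumption for this sketch."""

    def __init__(self, contents, latency=3):
        self.contents = contents
        self.latency = latency
        self.countdown = None
        self.address = None
        self.data = None
        self.mfc = False

    def start_read(self, address):
        self.address = address
        self.countdown = self.latency
        self.mfc = False

    def tick(self):
        if self.countdown is not None:
            self.countdown -= 1
            if self.countdown <= 0:
                self.data = self.contents[self.address]  # data lines valid
                self.mfc = True                          # MFC = 1
                self.countdown = None

def load(regs, mem, dst, addr_reg):
    """LDR dst, [addr_reg]; returns the number of ticks spent waiting."""
    regs["MAR"] = regs[addr_reg]     # 1. MAR <- [R1]
    mem.start_read(regs["MAR"])      # 2. start a Read operation
    waited = 0
    while not mem.mfc:               # 3. wait for MFC
        mem.tick()
        waited += 1
    regs["MDR"] = mem.data           # memory bus -> MDR
    regs[dst] = regs["MDR"]          # 4. R2 <- [MDR]
    return waited

regs = {"R1": 0x100, "R2": 0, "MAR": 0, "MDR": 0}
mem = Memory({0x100: 42}, latency=3)
cycles = load(regs, mem, "R2", "R1")
```

The wait loop is where a real CPU could overlap useful work such as incrementing the PC.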

Read Timing

[Figure: timing diagram of a memory Read operation over steps 1, 2 and 3, showing the signals Clock, MARin, Address, Read, MR, MDRinE, Data, MFC and MDRout.]

Synchronous and Asynchronous Transfer

• Asynchronous transfer
  • One device initiates the transfer and waits until the other device responds
  • Enables transfer of data between two independent devices that have different speeds of operation
• Synchronous transfer
  • One of the control lines of the bus carries pulses from a clock running continuously at a fixed frequency
  • These pulses provide common timing signals to the CPU and main memory
  • Simpler implementation
  • Cannot accommodate devices of widely varying speed, except by reducing the speed of all devices to that of the slowest one
• Mixed

Store Operation

STORE R2, [R1]

• Step 1: MAR ← [R1]
• Step 2: MDR ← [R2], Write
• Step 3: Wait for MFC

• Steps 1 and 2 can be carried out simultaneously if the architecture allows it
  • This is not possible with a single CPU bus
• Step 3 may be overlapped with other operations, provided that there is no conflict
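A minimal sketch of the store sequence, assuming memory is modeled as a Python dictionary and the MFC wait completes immediately (both are simplifications, not from the slides):

```python
def store(regs, memory, src, addr_reg):
    """STORE src, [addr_reg] as three control steps."""
    regs["MAR"] = regs[addr_reg]        # Step 1: MAR <- [R1]
    regs["MDR"] = regs[src]             # Step 2: MDR <- [R2], Write
    memory[regs["MAR"]] = regs["MDR"]   # Step 3: memory completes the write (MFC)

regs = {"R1": 0x200, "R2": 99, "MAR": 0, "MDR": 0}
memory = {}
store(regs, memory, "R2", "R1")
```

Steps 1 and 2 are written as separate statements because, on a single bus, MAR and MDR cannot be loaded from two different registers in the same step.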

Execution of a Complete Instruction
Example: ADD (R3),R1

1. Instruction fetch
2. Fetch operand(s)
3. Perform the addition
4. Store results into R1

[Figure: the single-bus datapath (PC, MAR, MDR, Y, Z, MUX with Select and Constant 4, ALU with inputs A and B and control lines Add, Sub, XOR, Carry-in) connected to the external memory bus through the address and data lines.]

Execution of a Complete Instruction
Example: ADD (R3),R1

Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, WMFC
3     MDRout, IRin
4     R3out, MARin, Read
5     R1out, Yin, WMFC
6     MDRout, SelectY, Add, Zin
7     Zout, R1in, End

[Figure: the single-bus datapath with PC, MAR, MDR, Y, Z, IR, R1 and R3, the MUX (Select, Constant 4), and the ALU (inputs A and B; control lines Add, Sub, XOR, Carry-in) connected to the external memory bus.]

Steps 1, 2 and 3. Fetch & Increase PC

• Step 1: PCout, MARin, Read, Select 4, Add, Zin
  • Load the content of the PC into MAR, and send a read request (PCout, MARin, Read)
  • While waiting for a response, increment PC
    • Select constant 4 in MUX
    • ALU input B is receiving the current value in PC
    • Specify Add operation
• Step 2: move the updated value back into PC and wait for MFC (Zout, PCin, WMFC)
• Step 3: the word fetched from memory is loaded into IR (MDRout, IRin)

Steps 4, 5, 6 and 7

• Steps 4 and 5: fetch the first operand, the content of the memory location pointed to by R3
  • R3out, MARin, Read
  • R1out, Yin, WMFC
• Step 6: perform the addition
  • MDRout, Select Y, Add, Zin
• Step 7: load results into R1
  • Zout, R1in, End
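The seven control steps can be traced end to end with a small Python sketch. The register dictionary, the dictionary-based memory, the 32-bit mask, and the way the memory wait is collapsed into a `pending` value are all assumptions for illustration.

```python
WORD_MASK = 0xFFFFFFFF  # assumed 32-bit word width

def run_add_r3_indirect_r1(regs, memory):
    """Apply the seven control steps for Add (R3),R1 to a register dict."""
    # Step 1: PCout, MARin, Read, Select4, Add, Zin
    bus = regs["PC"]
    regs["MAR"] = bus
    regs["Z"] = (4 + bus) & WORD_MASK      # MUX selects Constant 4, B = bus
    pending = memory[regs["MAR"]]          # read request in flight
    # Step 2: Zout, PCin, WMFC
    regs["PC"] = regs["Z"]
    regs["MDR"] = pending                  # MFC received, word in MDR
    # Step 3: MDRout, IRin
    regs["IR"] = regs["MDR"]
    # Step 4: R3out, MARin, Read
    regs["MAR"] = regs["R3"]
    pending = memory[regs["MAR"]]
    # Step 5: R1out, Yin, WMFC
    regs["Y"] = regs["R1"]
    regs["MDR"] = pending
    # Step 6: MDRout, SelectY, Add, Zin
    regs["Z"] = (regs["Y"] + regs["MDR"]) & WORD_MASK
    # Step 7: Zout, R1in, End
    regs["R1"] = regs["Z"]
    return regs

memory = {0x000: "Add (R3),R1", 0x200: 7}   # instruction word and operand
regs = {"PC": 0, "R1": 5, "R3": 0x200, "MAR": 0, "MDR": 0,
        "IR": None, "Y": 0, "Z": 0}
run_add_r3_indirect_r1(regs, memory)
```

With R1 = 5 and the word at address [R3] equal to 7, R1 ends up holding 12 and the PC has advanced by 4.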

Branch Instructions

Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
3     MDRout, IRin
4     Offset-field-of-IRout, Add, Zin
5     Zout, PCin, End

Control sequence for an unconditional branch instruction

Steps of Unconditional Branching

Branching: the branch address is obtained by adding an offset X (given in the branch instruction) to the current value of the PC

1. Fetch an instruction
   • PCout, MARin, Read, Select 4, Add, Zin
   • Zout, PCin, Yin, WMFC
   • MDRout, IRin
2. Execute
   • Offset-field-of-IRout, Add, Zin
   • Zout, PCin, End

• PC is incremented during the fetch phase, before knowing the type of instruction being executed
• When the offset is added to the contents of the PC, the PC has already been updated to point to the instruction following the branch
• The offset is the difference between the branch target address and the address immediately following the branch
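The offset rule can be checked with a short worked example. The function names and the addresses used (1000, 1040, 980) are illustrative assumptions; the 4-byte instruction size matches the [PC] + 4 increment used in the fetch steps.

```python
INSTR_SIZE = 4  # bytes per instruction, matching the PC + 4 increment

def branch_offset(branch_addr, target_addr):
    """Offset stored in the branch: target minus the address following the branch."""
    return target_addr - (branch_addr + INSTR_SIZE)

def execute_branch(branch_addr, offset):
    """PC after the branch: the fetch phase has already added 4 before the offset is added."""
    pc = branch_addr + INSTR_SIZE   # steps 1-2: PC updated during fetch
    return pc + offset              # steps 4-5: PC <- PC + offset

forward = branch_offset(1000, 1040)    # 1040 - 1004 = 36
backward = branch_offset(1000, 980)    # 980 - 1004 = -24
```

Feeding either offset back into `execute_branch(1000, offset)` returns the original target, which is the consistency the last bullet describes.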

Steps of Conditional Branching

• Check the status of the condition codes before loading the new value into the PC
• Offset-field-of-IRout, Add, Zin
• If conditions do not match, then End

Multi-Bus Structure

• All general purpose registers are combined into a register file
• The register file can be implemented in VLSI using an array of memory cells similar to those used in RAM chips
• The register file has two outputs, allowing the contents of two registers to be placed on buses A and B simultaneously
• Compared to the single bus organization, this organization requires fewer control steps (i.e., it is faster)

[Figure: three-bus organization with Bus A, Bus B and Bus C. Components: register file, PC with Incrementer, IR and instruction decoder, MUX with Constant 4, ALU (inputs A and B, output R), MDR connected to the memory bus data lines, and MAR driving the address lines.]

Multiple Bus Operation Example
Add R4,R5,R6

Step  Action
1     PCout, R=B, MARin, Read, IncPC
2     WMFC
3     MDRoutB, R=B, IRin
4     R4outA, R5outB, Select BusA, Add, R6in, End

Control sequence for the instruction

• Steps 1 to 3: instruction fetch
• Step 4: addition

Multiple Bus Operation Example
Add R4,R5,R6

• Buses A and B are used to transfer the source operands
• Bus C is used to transfer the destination
• The path from the source to the destination goes through the ALU (where the operation is performed)
• Copies of one register to another also go through the ALU
• Temporary storage registers (Y, Z) are not needed
• Ensuring that a register can serve as both a source and a destination
  • not possible if registers are simple latches
  • the register file must be implemented using edge-triggered master-slave flip-flops
• The three-bus architecture allows execution of a register-to-register operation in a single clock cycle
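The single-step execution on three buses can be sketched as follows. The function `three_bus_add`, the dictionary register file, and the 32-bit mask are assumptions for illustration; both sources are read before the destination is written, which mirrors the edge-triggered behaviour needed when a register is both source and destination.

```python
WORD_MASK = 0xFFFFFFFF  # assumed 32-bit word width

def three_bus_add(reg_file, src_a, src_b, dst):
    """One control step: <src_a>outA, <src_b>outB, Add, <dst>in."""
    bus_a = reg_file[src_a]               # first source operand on bus A
    bus_b = reg_file[src_b]               # second source operand on bus B
    bus_c = (bus_a + bus_b) & WORD_MASK   # ALU result on bus C
    reg_file[dst] = bus_c                 # destination loaded from bus C
    return bus_c

reg_file = {"R4": 10, "R5": 32, "R6": 0}
three_bus_add(reg_file, "R4", "R5", "R6")   # Add R4,R5,R6
three_bus_add(reg_file, "R4", "R4", "R4")   # same register as source and destination
```

The same addition took three control steps on the single bus because Y and Z were needed as staging registers.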

Enhancements

• Overlap fetch and execute phases
  • Instruction unit: fetches instructions and places them into a queue ready for execution
  • It generates memory addresses based on the address of the last instruction fetched
  • Attempts to prefetch the correct instruction on branches based on a history of previous branches
  • Prefetching with branch prediction
• Including a fast cache on the same chip as the CPU
  • Hides the memory response time
  • If the desired data is found in the cache: cache hit; otherwise a cache miss
  • If a cache miss occurs, it is necessary to go to the main memory

Generating Control Signals

• To execute an instruction, the CPU must generate control signals corresponding to the current instruction
• Two types of approaches
  • Hard-wired
  • Microprogrammed

Hard-wired Control

[Figure: control unit organization. A control step counter driven by the clock (CLK), the IR (current instruction), the condition codes (e.g., result of previous computation) and external inputs (e.g., MFC) feed a decoder/encoder block that produces the control signals.]

For an instruction, many steps are needed, as shown previously

Hard-wired Control

• Several non-overlapping time slots (i.e., steps) are required for executing an instruction
• Each time slot must be long enough for the functions specified in the step to be completed
• Assume all time slots are equal
• The control unit may be based on the use of a counter driven by CLK
• The required control signals are uniquely determined by
  • contents of the control step counter
  • contents of the instruction register (i.e., instruction fetched)
  • contents of the condition code and other status flags (e.g., MFC status signal)
• The decoder/encoder is a combinational circuit that generates the required control outputs depending on the state of all its inputs

Separation of Decoding and Encoding Functions

[Figure: Clock (CLK) drives the control step counter, which has a Reset input; the step decoder produces the lines T1, T2, ..., Tn; the instruction decoder, fed by IR, produces the lines INS1, INS2, ..., INSm; these lines, the condition codes and the external inputs feed the Encoder, which outputs the control signals, including Run and End.]

Separation of Decoding and Encoding Functions

• Diagram with decoding and encoding functions separated
• The step decoder provides a separate signal line for each step in the control sequence
• The output of the instruction decoder consists of a separate line for each machine instruction
• All input signals to the encoder block should be combined to generate individual control signals (e.g., Yin, PCout, Add, End)
• Examples

Control Signals

[Figure: the control signals of the single-bus datapath: Riin and Riout for register Ri, Yin for Y, Zin and Zout for Z, Select for the MUX (Constant 4 or Y into ALU input A), and the ALU control lines Add, Sub, XOR, all around the internal processor bus.]

Generation of the Zin Control Signal

• Example encoder structure: Zin = T1 + T6 · ADD + T4 · BR + ...
• Zin is turned on during
  • slot T1 for all instructions
  • slot T6 for an ADD instruction (e.g., Add (R3),R1)
  • slot T4 for an unconditional branch

[Figure: logic for Zin. AND gates combine T6 with Add and T4 with Branch; an OR gate combines their outputs with T1 to produce Zin.]
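The encoder equation for Zin can be written directly as a Boolean function. The function name `z_in` and the string mnemonics "ADD" and "BR" are assumptions for illustration; only the three terms listed on the slide are modeled, so the "+ ..." terms for other instructions are left out.

```python
def z_in(step, instruction):
    """Zin = T1 + T6 . ADD + T4 . BR (only the terms shown on the slide)."""
    t1 = step == 1
    t4 = step == 4
    t6 = step == 6
    add = instruction == "ADD"   # line from the instruction decoder
    br = instruction == "BR"
    return t1 or (t6 and add) or (t4 and br)

# Time slots in which Zin is asserted for each instruction (steps 1..7)
zin_slots_add = [t for t in range(1, 8) if z_in(t, "ADD")]
zin_slots_br = [t for t in range(1, 8) if z_in(t, "BR")]
```

The resulting slot lists agree with the control sequences: Zin appears in steps 1 and 6 for Add (R3),R1 and in steps 1 and 4 for the unconditional branch.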

Example: Add (R3),R1

Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
3     MDRout, IRin
4     R3out, MARin, Read
5     R1out, Yin, WMFC
6     MDRout, SelectY, Add, Zin
7     Zout, R1in, End

Control sequence for instruction Add (R3),R1
(Yin at step 2 is there because steps 1 to 3 are common to all instructions)

Unconditional Branch

Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
3     MDRout, IRin
4     Offset-field-of-IRout, Add, Zin
5     Zout, PCin, End

Control sequence for an unconditional branch instruction

Generation of the End Control Signal

• Example encoder structure: End = T7 · ADD + T5 · BR + (T5 · N + T4 · N') · BRN + ...

[Figure: logic for End. AND gates combine T7 with Add, T5 with Branch, and, for the Branch<0 case, T5 with N and T4 with N'; an OR gate combines them to produce End.]
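The End equation can be sketched the same way, including the conditional term for the branch-if-negative case. The function name `end_signal` and the mnemonics "ADD", "BR" and "BRN" are illustrative assumptions; `n_flag` stands for the N condition code, and the "+ ..." terms are omitted.

```python
def end_signal(step, instruction, n_flag):
    """End = T7 . ADD + T5 . BR + (T5 . N + T4 . N') . BRN (terms from the slide only)."""
    t4, t5, t7 = step == 4, step == 5, step == 7
    add = instruction == "ADD"
    br = instruction == "BR"
    brn = instruction == "BRN"   # branch if negative (Branch<0)
    return (t7 and add) or (t5 and br) or (((t5 and n_flag) or (t4 and not n_flag)) and brn)

def last_step(instruction, n_flag):
    """First time slot at which End is asserted."""
    return next(t for t in range(1, 8) if end_signal(t, instruction, n_flag))

steps_taken = {
    "ADD": last_step("ADD", False),
    "BR": last_step("BR", False),
    "BRN taken (N=1)": last_step("BRN", True),
    "BRN not taken (N=0)": last_step("BRN", False),
}
```

When N = 0 the conditional branch ends one step early, in T4, so the new value is never loaded into the PC.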

A Complete CPU

[Figure: block diagram of a complete processor: instruction unit, integer unit and floating-point unit; instruction cache and data cache; a bus interface connecting the processor to the system bus, which links main memory and input/output.]

A Complete CPU

• The instruction unit fetches instructions from an instruction cache, or from main memory on a cache miss
• Separate processing units deal with integer and floating-point data
• The data cache is between the processing units and main memory
• Separate caches for instructions and data (split cache)
• Other processors may have one cache for both data and instructions (unified cache)
• The CPU is connected to the system bus (rest of the computer) through a bus interface
• Alternatives
  • More than two processing units: several units of the same type to increase parallelism
  • Processors that execute instructions at a rate faster than one instruction per cycle are called superscalar

Microprogrammed Control Approach

Microinstructions (steps) for Add (R3), R1

Micro-instruction  PCin  PCout  MARin  Read  MDRout  IRin  Yin  Select  Add  Zin  Zout  R1out  R1in  R3out  WMFC  End
1                  0     1      1      1     0       0     0    1       1    1    0     0      0      0      0     0
2                  1     0      0      0     0       0     1    0       0    0    1     0      0      0      1     0
3                  0     0      0      0     1       1     0    0       0    0    0     0      0      0      0     0
4                  0     0      1      1     0       0     0    0       0    0    0     0      0      1      0     0
5                  0     0      0      0     0       0     1    0       0    0    0     1      0      0      1     0
6                  0     0      0      0     1       0     0    0       1    1    0     0      0      0      0     0
7                  0     0      0      0     0       0     0    0       0    0    1     0      1      0      0     1

Figure 7.15 An example of microinstructions for Figure 7.6.


Basic Organization of a Microprogrammed Control Unit

[Figure: the IR feeds the starting address generator, which loads the µPC (driven by Clock); the µPC addresses the control store, which outputs the control word (CW). Inset: the microinstruction table of Figure 7.15 for microinstructions 1 to 7, used as the control store contents and indexed by the µPC.]

Page 40

Microprogrammed Control

• Control signals are generated by a program similar to machine language programs
• Individual bits of a control word (CW) correspond to control signals
• Each of the control steps defines a unique combination of 1s and 0s in the CW
• Microroutine: a sequence of CWs corresponding to the control sequence of a single machine instruction
• Individual control words are called microinstructions

microroutine ≈ subroutine
microinstruction ≈ instruction
microprogram counter ≈ program counter

Page 41

Basic Organization of a Microprogrammed Control Unit

• Assume that the microroutines for all instructions are stored in a special memory called a control store
• The control unit can generate control signals for any instruction by sequentially reading the CWs in the corresponding microroutine
• A microprogram counter (µPC) is used to point to the next microinstruction
• When a new instruction is fetched into IR, the starting address generator loads the starting address of the corresponding microroutine into the µPC
• The µPC is incremented to access successive microinstructions
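The read loop described above can be sketched as follows. The control store contents, the address 3 for the microroutine, and the names CONTROL_STORE, STARTING_ADDRESS and execute are assumptions made for the example; conditional branching is left out.

```python
# Hypothetical control store: each entry is the set of signals that are 1
# in that control word. Addresses 0-2 hold the common fetch steps.
CONTROL_STORE = [
    {"PCout", "MARin", "Read", "Select4", "Add", "Zin"},   # 0
    {"Zout", "PCin", "Yin", "WMFC"},                       # 1
    {"MDRout", "IRin"},                                    # 2
    {"R3out", "MARin", "Read"},                            # 3: Add (R3),R1
    {"R1out", "Yin", "WMFC"},                              # 4
    {"MDRout", "SelectY", "Add", "Zin"},                   # 5
    {"Zout", "R1in", "End"},                               # 6
]

# Stand-in for the starting address generator (assumed mapping).
STARTING_ADDRESS = {"Add (R3),R1": 3}


def execute(instruction):
    """Return the control words issued for one instruction already in IR."""
    upc = STARTING_ADDRESS[instruction]   # load the µPC with the start address
    issued = []
    while True:
        word = CONTROL_STORE[upc]         # read the CW the µPC points to
        issued.append(word)
        if "End" in word:                 # last microinstruction of the routine
            return issued
        upc += 1                          # step to the next microinstruction


if __name__ == "__main__":
    for word in execute("Add (R3),R1"):
        print(sorted(word))
```

Calling execute("Add (R3),R1") walks addresses 3 through 6 and stops at the control word that carries End.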

Page 42

Branch Instructions

• How does the control unit check the status of the condition flags or status flags on conditional branches?
  • The microinstruction set needs to be expanded to include conditional branch microinstructions
  • In addition to the branch address, these microinstructions specify the flag or bit that should be checked as a condition

Example: Microroutine for the instruction Branch on negative

Address  Microinstruction
0        PCout, MARin, Read, Select4, Add, Zin
1        Zout, PCin, Yin, WMFC
2        MDRout, IRin
3        Branch to starting address of an appropriate microroutine
...
25       If N=0, then branch to microinstruction 0
26       Offset-field-of-IRout, SelectY, Add, Zin
27       Zout, PCin, End

• After loading the instruction into IR, a branch microinstruction transfers control to the microroutine starting at location 25

Page 43

Allowing Conditional Branch in Microprogram

[Figure: the IR, external inputs and condition codes feed the starting and branch address generator, which loads the µPC; the clock-driven µPC addresses the control store, which outputs the control word (CW).]

Page 44

Allowing Conditional Branch in Microprogram

• Support for microprogram branching
  • Starting and branch address generator
  • The block loads a new address into the µPC when a microinstruction requires a branch
  • Inputs to the block include: status flags, condition flags, IR
• The µPC is incremented by one every time except in the following situations
  • When a new instruction is loaded into IR, the µPC is loaded with the starting address of the microroutine for that instruction
  • When a branch microinstruction is encountered and the branch condition is satisfied, the µPC is loaded with the branch address
  • When an End microinstruction is encountered, the µPC is loaded with the address of the first microinstruction (i.e., address 0) to fetch a new instruction into IR
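The µPC update rules can be sketched with the Branch on negative microroutine (addresses 0 to 3 and 25 to 27) as data. The dictionary keys dispatch, branch_if, target and end, and the function names, are assumptions chosen for this illustration; only sequencing is modelled, not the control signals.

```python
def next_upc(upc, micro, flags, starting_address):
    """Return the next µPC value after executing one microinstruction."""
    if micro.get("end"):
        return 0                       # End: back to the fetch microroutine
    if micro.get("dispatch"):
        return starting_address        # new instruction in IR: load start address
    condition = micro.get("branch_if")
    if condition is not None and condition(flags):
        return micro["target"]         # branch taken: load the branch address
    return upc + 1                     # default: increment by one


# Sequencing information only (hypothetical encoding of the microprogram).
MICROPROGRAM = {
    0: {}, 1: {}, 2: {},
    3: {"dispatch": True},
    25: {"branch_if": lambda flags: flags["N"] == 0, "target": 0},
    26: {},
    27: {"end": True},
}


def trace(flags, starting_address=25):
    """List the addresses visited for one Branch on negative instruction."""
    upc, visited = 0, []
    while True:
        visited.append(upc)
        upc = next_upc(upc, MICROPROGRAM[upc], flags, starting_address)
        if upc == 0:
            return visited


if __name__ == "__main__":
    print(trace({"N": 1}))   # branch condition of the machine instruction holds
    print(trace({"N": 0}))   # not negative: return to fetch from address 25
```

With N = 1 the trace runs through 26 and 27, where the offset is added to the PC; with N = 0 microinstruction 25 sends control straight back to address 0.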

Page 45

Implementation of Microinstructions

• 1st design: Assign one bit position to each control signal
  • Resulting in long microinstructions
  • Only a few bits are set to 1 in any given microinstruction
• Example of the single bus organization
  • 4 general purpose registers
  • Some of the connections to the CPU are permanently enabled: the output of IR to the decoding circuit, the two inputs of the ALU
  • A total of 20 gating signals are needed
  • Additional signals include: Read, Write, Clear Y, Set Carry-in, WMFC and End
  • Signals to specify which ALU operation to perform: 16 operations → 16 bits
  • Total of 42 bits of control signals

Page 46

Microinstructions

• An alternative: Encoded control signals
• Most signals are not needed simultaneously
• Many signals are mutually exclusive
  • Only one function of the ALU is needed at a time
  • Read and write signals to memory cannot be active at the same time
  • The source for a data transfer must be unique: cannot gate the contents of two registers simultaneously on a single bus
• Signals can be grouped so that mutually exclusive signals are placed in the same group
  • 4 bits are needed to represent the 16 functions of the ALU
  • Register output control signals can be in a group consisting of PCout, MDRout, Zout, Addressout, R0out, R1out, R2out, R3out and TEMPout: encoding with 4 bits
• Control signals can be grouped and encoded to reduce the number of bits in microinstructions
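The bit counts quoted for the two designs can be checked with a few lines of Python. The helper code_bits is a name made up for the example, and the extra "no transfer" code added to the register-output group is an assumption; the slides only state the 4-bit result.

```python
def code_bits(n_codes):
    """Bits needed to give each of n_codes alternatives a distinct binary code."""
    return max(1, (n_codes - 1).bit_length())


# One bit per signal: 20 gating signals, 6 additional signals
# (Read, Write, Clear Y, Set Carry-in, WMFC, End) and 16 ALU operations.
one_bit_per_signal = 20 + 6 + 16

# Encoded groups: only one member of a group is active at a time.
alu_bits = code_bits(16)            # 16 ALU functions
register_out_bits = code_bits(9 + 1)  # 9 output signals + assumed "no transfer"

if __name__ == "__main__":
    print(one_bit_per_signal, alu_bits, register_out_bits)
```

The ALU group alone shrinks from 16 bits to 4, which is where most of the saving comes from.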

Page 47

Field-encoded Microinstructions

Microinstruction: F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8

Field        Codes
F1 (4 bits)  0000: No transfer, 0001: PCout, 0010: MDRout, 0011: Zout, 0100: R0out, 0101: R1out, 0110: R2out, 0111: R3out, 1010: TEMPout, 1011: Offsetout
F2 (3 bits)  000: No transfer, 001: PCin, 010: IRin, 011: Zin, 100: R0in, 101: R1in, 110: R2in, 111: R3in
F3 (3 bits)  000: No transfer, 001: MARin, 010: MDRin, 011: TEMPin, 100: Yin
F4 (4 bits)  0000: Add, 0001: Sub, ..., 1111: XOR (16 ALU functions)
F5 (2 bits)  00: No action, 01: Read, 10: Write
F6 (1 bit)   0: SelectY, 1: Select4
F7 (1 bit)   0: No action, 1: WMFC
F8 (1 bit)   0: Continue, 1: End

Total: 19 bits (4 + 3 + 3 + 4 + 2 + 1 + 1 + 1)

Page 48

Field-encoded Microinstructions

• Most fields must include one inactive code for the case where no action is required
• No inactive code is reserved in the ALU field; thus the ALU is active at all times; the control on Zin makes sure that the result of an operation is gated only when appropriate
• Grouping control signals requires more hardware to decode bit patterns
• The cost of the additional hardware is offset by having a smaller control store
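Packing and decoding a field-encoded microinstruction can be sketched as below. The field names and widths are the ones listed for F1 to F8; placing F1 in the high-order bits, and the names FIELDS, pack, unpack and FETCH_STEP_1, are assumptions made for the example.

```python
# Field names and widths as listed for the field-encoded format.
FIELDS = [("F1", 4), ("F2", 3), ("F3", 3), ("F4", 4),
          ("F5", 2), ("F6", 1), ("F7", 1), ("F8", 1)]


def pack(values):
    """Pack a {field: code} dict into one integer, F1 in the high bits."""
    word = 0
    for name, width in FIELDS:
        code = values.get(name, 0)   # a missing field gets code 0
        if not 0 <= code < (1 << width):
            raise ValueError(f"{name} code {code} does not fit in {width} bits")
        word = (word << width) | code
    return word


def unpack(word):
    """Split a packed microinstruction back into its field codes."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


# PCout (F1), Zin (F2), MARin (F3), Add (F4), Read (F5), Select4 (F6)
FETCH_STEP_1 = {"F1": 0b0001, "F2": 0b011, "F3": 0b001,
                "F4": 0b0000, "F5": 0b01, "F6": 1}

if __name__ == "__main__":
    word = pack(FETCH_STEP_1)
    print(bin(word), unpack(word))
```

Code 0 means "no transfer" or "no action" in most fields, but in F4 it selects Add and in F6 it selects SelectY, which matches the remark that the ALU has no inactive code.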

Page 49

Microprogram Sequencing

• Each machine instruction is implemented by a microroutine
• A microroutine is entered by decoding an instruction into a starting address that is loaded into the µPC
• Branching capabilities are introduced through branch microinstructions
• Having a separate microroutine for each machine instruction leads to a large control store
  • There are several instructions and several addressing modes
• Organize the microprogram so that microroutines share as many common parts as possible
  • Sharing common parts requires several branch microinstructions
  • Longer time is needed to execute branch microinstructions

Page 50

Example: ADD src, Rdst

• Assume that the source operand can be specified using: register, autoincrement, autodecrement, indexed, and the indirect forms of all of these modes
• A suitable microprogram will combine all the modes
  • See next slide

Page 51

Page 52

Branch Addressing

• Branch address modification using bit-ORing
  • Branches are not always made to a single branch address
  • A direct consequence of combining microroutines
  • At the point α of the previous example, it is necessary to choose between the actions required by direct and indirect addressing modes
    • Indirect mode: microinstruction at location 170 (fetch an operand from memory)
    • Direct mode: microinstruction at location 171 (fetching an operand is bypassed)
  • Efficient branching: bit-ORing technique
    • Have the preceding microinstruction specify 170
    • Use an OR gate to change the least significant bit of 170 if direct addressing mode
• Wide Branch Addressing
  • Generating branch addresses means that the circuitry becomes more complex
  • E.g., the machine instruction fetch is completed, and an appropriate microroutine should be selected according to addressing modes
  • A simple and inexpensive way of generating required branch addresses is using a PLA
  • The OP code of a machine instruction is translated into a starting address

Page 53

Detailed Example: ADD (Rsrc)+, Rdst

Address (octal)  Microinstruction
000              PCout, MARin, Read, Clear Y, Set carry-in, Add, Zin
001              Zout, PCin, WMFC
002              MDRout, IRin
003              µBranch {µPC ← 101 (from instruction decoder); µPC5,4 ← [IR10,9]; µPC3 ← [IR10]' · [IR9]' · [IR8]}
121              Rsrcout, MARin, Read, Clear Y, Set carry-in, Add, Zin
122              Zout, Rsrcin
123              µBranch {µPC ← 170; µPC0 ← [IR8]'}, WMFC
170              MDRout, MARin, Read, WMFC
171              MDRout, Yin
172              Rdstout, Add, Zin
173              Zout, Rdstin, End

Page 54

Detailed Example: ADD (Rsrc)+, Rdst

• 3-bit field used to specify the addressing mode for the source operand
  • Bits 10 and 9 denote indexed (11), autodecrement (10), autoincrement (01), and register (00) modes
  • Bit 8 is used to specify the indirect version of the addressing mode
  • E.g., 010: direct version of the autoincrement mode
• Assume the CPU has 16 registers that can be used for addressing purposes
  • Bits 7 through 4 specify the source operand
  • Bits 3 through 0 specify the destination operand
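Extracting these fields from an instruction word can be sketched as follows. Only bits 10 through 0 are decoded, as described above; the OP code position is not given here and is ignored, and the names MODES and decode_operands are made up for the example.

```python
# Bits 10 and 9 of IR select the addressing mode of the source operand.
MODES = {0b00: "register", 0b01: "autoincrement",
         0b10: "autodecrement", 0b11: "indexed"}


def decode_operands(ir):
    """Decode the addressing-mode and register fields (bits 10..0) of IR."""
    return {
        "mode": MODES[(ir >> 9) & 0b11],   # bits 10 and 9
        "indirect": bool((ir >> 8) & 1),   # bit 8
        "rsrc": (ir >> 4) & 0xF,           # bits 7 through 4
        "rdst": ir & 0xF,                  # bits 3 through 0
    }


# Mode field 010 (autoincrement, direct), Rsrc = R3, Rdst = R1.
EXAMPLE_IR = (0b010 << 8) | (3 << 4) | 1

if __name__ == "__main__":
    print(decode_operands(EXAMPLE_IR))
```

The example word corresponds to an instruction of the form ADD (R3)+, R1.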

Page 55

Detailed Example: ADD (Rsrc)+, Rdst

• Any of the 16 general purpose registers may be involved in determining the source and destination operands
• Microinstructions refer to control signals only as Rsrcout, Rsrcin, Rdstout and Rdstin
• These signals are translated into a specific register by the decoding circuit connected to the Rsrc and Rdst address fields of IR
• Requires a two-level decoding
  • The microinstruction field must be decoded to determine that an Rsrc or Rdst is involved
  • The decoded output is used to gate the contents of the Rsrc or Rdst field in the IR into a second decoder, which produces the gating signals for the actual registers R0 through R15

Page 56

Detailed Example: ADD (Rsrc)+, Rdst Using Bit-ORing Scheme

Consider address 123:

123  µBranch {µPC ← 170; µPC0 ← [IR8]'}, WMFC

• The unmodified version causes a branch to location 170
• When a direct addressing mode appears, the fetch is bypassed by ORing the inverse of the indirect bit in the src address (bit 8 of IR) with bit position 0 of the µPC

Consider address 003:

003  µBranch {µPC ← 101 (from instruction decoder); µPC5,4 ← [IR10,9]; µPC3 ← [IR10]' · [IR9]' · [IR8]}

• The five branch addresses differ in the middle octal digit only
• The octal pattern 101 is obtained from the PLA
• The 3 bits to be ORed with the middle octal digit are supplied by the decoding circuitry connected to the src address mode field (bits 8, 9 and 10 of IR)
• Bits 4 and 5 of the µPC are set directly from bits 9 and 10 of IR
• These bits select the appropriate microinstruction for all src address modes except the register indirect mode
• Register indirect mode: set bit 3 of the µPC to 1 using the AND of [IR10]', [IR9]' and [IR8]
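The two bit-ORing rules can be tried out in a few lines of Python. This is a sketch that models only the address arithmetic; the function names are made up, and the instruction word is represented by its bits 10 through 8 alone.

```python
def ir_mode_bits(ir):
    """Return (IR10, IR9, IR8), the source addressing-mode field of IR."""
    return (ir >> 10) & 1, (ir >> 9) & 1, (ir >> 8) & 1


def branch_from_003(ir, pla_address=0o101):
    """Target of the µBranch at 003: PLA pattern 101 with mode bits ORed in."""
    ir10, ir9, ir8 = ir_mode_bits(ir)
    upc = pla_address
    upc |= (ir10 << 5) | (ir9 << 4)               # µPC5,4 <- [IR10,9]
    upc |= ((1 - ir10) & (1 - ir9) & ir8) << 3    # µPC3 <- [IR10]'.[IR9]'.[IR8]
    return upc


def branch_from_123(ir, base=0o170):
    """Target of the µBranch at 123: 170, or 171 when the mode is direct."""
    _, _, ir8 = ir_mode_bits(ir)
    return base | (1 - ir8)                       # µPC0 <- [IR8]'


# All targets reachable from 003 over the eight possible mode-field values.
TARGETS = sorted({branch_from_003(mode << 8) for mode in range(8)})

if __name__ == "__main__":
    print([oct(t) for t in TARGETS])
    print(oct(branch_from_123(0b010 << 8)), oct(branch_from_123(0b011 << 8)))
```

Enumerating the mode field gives exactly five distinct targets, which differ only in the middle octal digit; autoincrement lands on 121, the address where the microroutine for ADD (Rsrc)+, Rdst continues.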

Page 57

Microinstructions with Next-Address Field

• The previous microprogram requires several branch microinstructions
  • This reduces the operating speed of the computer
• A powerful alternative is to include an address field as a part of every microinstruction to indicate the location of the next microinstruction
  • Thus, every microinstruction becomes a branch
  • Advantages: flexibility
  • Disadvantages: expense of the additional bits for the address field
  • Typical microprogram: 4k microinstructions with 50 to 80 bits per microinstruction → a 12-bit address field is needed

Page 58

Microinstruction Sequencing

[Figure: the IR, external inputs and condition codes feed the decoding circuits, which load the µAR; the µAR addresses the control store; each microinstruction read out supplies the next address and passes through the µIR to the microinstruction decoder, which produces the control signals.]

Page 59

Microinstruction Sequencing

• Advantage: separate branch microinstructions are virtually eliminated, which makes this scheme very attractive
• The µPC is replaced by a microinstruction address register (µAR)
  • The µAR holds the address of the next microinstruction
• A new control structure that supports the next-address field and bit-ORing
  • The decoding circuit includes a PLA decoder that is used to generate the starting address of a given microroutine on the basis of the OP code field in the IR

Page 60

Microprogram Sequencing

• Example: ADD (Rsrc)+, Rdst
  • Rsrc and Rdst are used instead of referring to registers R0 through R15 explicitly
  • Actual control signals can be decoded using the data in the src and dst fields of IR
• Microinstruction 003
  • Bit-ORing is used to determine the next microinstruction based on the addressing mode of the source operand
  • The addressing mode is indicated by bits 8, 9 and 10 of IR
  • Let ORmode control whether or not this bit-ORing is used
• Microinstructions 123, 143, and 166
  • Bit-ORing is used to decide if indirect addressing of the source operand is used
  • The ORindsrc signal is used for this purpose

Page 61

Format for Microinstructions

Microinstruction: F0 | F1 | F2 | F3 | F4 | F5 | F6 | F7 | F8 | F9 | F10

Field         Codes
F0 (8 bits)   Address of next microinstruction
F1 (3 bits)   000: No transfer, 001: PCout, 010: MDRout, 011: Zout, 100: Rsrcout, 101: Rdstout, 110: TEMPout
F2 (3 bits)   000: No transfer, 001: PCin, 010: IRin, 011: Zin, 100: Rsrcin, 101: Rdstin
F3 (3 bits)   000: No transfer, 001: MARin, 010: MDRin, 011: TEMPin, 100: Yin
F4 (4 bits)   0000: Add, 0001: Sub, ..., 1111: XOR
F5 (2 bits)   00: No action, 01: Read, 10: Write
F6 (1 bit)    0: SelectY, 1: Select4
F7 (1 bit)    0: No action, 1: WMFC
F8 (1 bit)    0: NextAdrs, 1: InstDec
F9 (1 bit)    0: No action, 1: ORmode
F10 (1 bit)   0: No action, 1: ORindsrc

Page 62

Format for Microinstructions

• PLA
  • The PLA is used initially to decode the instruction OP codes
  • One bit in the microinstruction is used to indicate when the output of the PLA is gated into the µAR
• Address field
  • Each microinstruction contains an 8-bit address field that holds the address of the next microinstruction

Page 63

Implementation of the Microroutine

[Figure: table of the microroutine, giving for each octal address the bit patterns of fields F0 through F10.]

Page 64

Implementation of the Microroutine

• Microroutine for ADD (Rsrc)+, Rdst
  • Fewer microinstructions are needed because branch microinstructions are no longer required
  • Locations 003 and 123 have been combined with the microinstructions immediately preceding them
  • When microinstruction sequencing is controlled by a µPC, the End signal is used to reset the µPC to point to the starting address of the microroutine that fetches the next machine instruction
  • With this scheme, the End signal is specified explicitly in the F0 (next-address) field
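Sequencing with a next-address field can be sketched as below. This is an illustration under assumptions, not the contents of the actual control store: the branch at 003 is taken to be folded into 002 and the branch at 123 into 122, as stated above, the PLA output is fixed at 101 for this one instruction, and only the sequencing information (next address, InstDec, ORmode, ORindsrc) is modelled.

```python
PLA_ADDRESS = 0o101   # starting address supplied by the PLA for this OP code

# Hypothetical control store holding sequencing information only.
CONTROL_STORE = {
    0o000: {"next": 0o001},
    0o001: {"next": 0o002},
    0o002: {"next": 0o000, "inst_dec": True, "or_mode": True},
    0o121: {"next": 0o122},
    0o122: {"next": 0o170, "or_indsrc": True},
    0o170: {"next": 0o171},
    0o171: {"next": 0o172},
    0o172: {"next": 0o173},
    0o173: {"next": 0o000},   # next address 000 takes the place of End
}


def next_address(micro, ir):
    """Compute the value loaded into the µAR after one microinstruction."""
    ir10, ir9, ir8 = (ir >> 10) & 1, (ir >> 9) & 1, (ir >> 8) & 1
    address = PLA_ADDRESS if micro.get("inst_dec") else micro["next"]
    if micro.get("or_mode"):      # OR the addressing mode into bits 5, 4, 3
        address |= (ir10 << 5) | (ir9 << 4)
        address |= ((1 - ir10) & (1 - ir9) & ir8) << 3
    if micro.get("or_indsrc"):    # OR the inverse of the indirect bit into bit 0
        address |= 1 - ir8
    return address


def trace(ir):
    """List the µAR values visited while executing one instruction."""
    uar, visited = 0o000, []
    while True:
        visited.append(uar)
        uar = next_address(CONTROL_STORE[uar], ir)
        if uar == 0o000:
            return visited


if __name__ == "__main__":
    print([oct(a) for a in trace(0b010 << 8)])   # autoincrement, direct
    print([oct(a) for a in trace(0b011 << 8)])   # autoincrement, indirect
```

In the direct case the trace skips 170, and in the indirect case it passes through it, with no separate branch microinstruction in either path.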

Page 65

Circuitry for the control signals using the next address field

[Figure: details of the bit-ORing circuitry. The IR (with its Rsrc and Rdst fields), external inputs and condition codes feed the decoding circuits, which load the µAR under control of InstDecout, ORmode and ORindsrc; the µAR addresses the control store; each word supplies the next address and the fields F1, F2, ..., F8, F9, F10 to the microinstruction decoder; two decoders turn Rsrcout, Rsrcin, Rdstout and Rdstin into the register signals R0in, R0out through R15in, R15out, alongside the other control signals.]

Page 66

Prefetching Microinstructions

• Drawback of microprogrammed control: slow operating speed
  • Fetching microinstructions from the control store takes a long time
    • Fast control store
    • Long microinstructions
    • Prefetching
• Problems with prefetching
  • The next microinstruction may depend on the status flags and results of the current microinstruction
  • A wrong microinstruction may be prefetched
  • The fetch must be repeated with the correct address
• Disadvantages are minor and prefetching is often used

Page 67

Emulation

• Microprogrammed control provides a simple, flexible, and inexpensive way of executing machine instructions
  • Allows diverse classes of instructions to be implemented
• It is possible to define additional machine instructions and implement them with microroutines
  • We can add the instruction set of a different computer
  • A given computer can emulate the instructions of a different computer
  • No software changes need to be made to legacy programs
  • Emulation facilitates transition to a new computer system with minimal effort
• Example: Pentium 4 translates x86 CISC instructions into its internal RISC microinstructions

Page 68

Pentium 4

Page 69

Conclusion

• Speed: hardwired approach
• Flexibility: microprogrammed
• Most present day processors use hardwired