chapter 11 cpu structure and function

Yonsei University

Chapter 11

CPU Structure and Function

• Processor organization
• Register organization
• Instruction cycle
• The Pentium processor
• The PowerPC processor

Contents


• Fetch instructions
  – CPU reads an instruction from memory
• Interpret instructions
  – The instruction is decoded to determine what action is required
• Fetch data
  – The execution may require reading data from memory or an I/O module
• Process data
  – The execution may require performing an arithmetic or logical operation on data
• Write data
  – The result of an execution may require writing data to memory or an I/O module

CPU Structures (Processor organization)


CPU With The System Bus (Processor organization)


Internal Structure of the CPU (Processor organization)


• User-visible registers
  – Enable the programmer to minimize main memory references
• Control and status registers
  – Used by the control unit to control the operation of the CPU
  – Used by OS programs to control the execution of programs

Registers (Register organization)


• General purpose
• Data
• Address
• Condition codes

User-Visible Registers (Register organization)


• Can be assigned to a variety of functions
• May be truly general purpose
• May be restricted
• May be used for data or addressing

General Purpose Registers (Register organization)


• Accumulator
• Used only to hold data
• Cannot be employed in the calculation of an operand address

Data Registers (Register organization)


• Segment
• May be somewhat general purpose
• May be devoted to a particular addressing mode

Address Registers (Register organization)


• Segment pointers
  – In a machine with segmented addressing, a segment register holds the address of the base of the segment
  – There may be multiple registers
• Index registers
  – Used for indexed addressing
  – May be autoindexed
• Stack pointer
  – If there is user-visible stack addressing, then typically the stack is in memory and a dedicated register points to the top of the stack
  – This allows implicit addressing

Examples of Address Registers (Register organization)


• Whether to use completely general purpose registers or to specialize their use

• The number of registers, either general purpose or data plus address, to be provided

• Register length
  – Registers that must hold addresses must be at least long enough to hold the largest address

Design Issues (Register organization)


• Sets of individual bits that are set by the CPU hardware as the result of operations
  – e.g. result of last operation was zero
• Condition code bits are collected into one or more registers
• Usually form part of a control register
• Can be read (implicitly) by programs
  – e.g. Jump if zero
• Generally cannot be altered by programmers

Condition Code Registers (Register organization)


• CPU registers that are employed to control the operation of the CPU

• Program Counter
  – Contains the address of an instruction to be fetched
• Instruction Register
  – Contains the instruction most recently fetched
• Memory Address Register
  – Contains the address of a location in memory
• Memory Buffer Register
  – Contains a word of data to be written to memory or the word most recently read

Control & Status Registers (Register organization)


• Status information
• Condition codes plus other status information

Program Status Word (Register organization)


• Sign : Contains the sign bit of the result of the last arithmetic operation

• Zero : Set when the result is 0

• Carry : Set if an operation resulted in a carry into or borrow out of a high-order bit

• Equal : Set if a logical compare result is equality

• Overflow : Used to indicate arithmetic overflow

• Interrupt enable/disable : Used to enable or disable interrupts

• Supervisor : Indicates whether the CPU is executing in supervisor or user mode

Flags of the PSW (Register organization)
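As a rough illustration of how the sign, zero, carry and overflow flags listed above could be derived for an addition, the sketch below assumes an 8-bit two's complement ALU. The function name `add_with_flags` and the dictionary layout are hypothetical and not tied to any particular processor.

```python
def add_with_flags(a, b, width=8):
    """Add two `width`-bit values and derive PSW-style condition codes."""
    mask = (1 << width) - 1          # e.g. 0xFF for 8 bits
    sign_bit = 1 << (width - 1)      # e.g. 0x80 for 8 bits
    a &= mask
    b &= mask
    total = a + b                    # unbounded Python integer
    result = total & mask            # value that fits in the register
    flags = {
        "sign": bool(result & sign_bit),   # copy of the result's sign bit
        "zero": result == 0,               # set when the result is 0
        "carry": total > mask,             # carry out of the high-order bit
        # Overflow: both operands share a sign and the result's sign differs.
        "overflow": bool(~(a ^ b) & (a ^ result) & sign_bit),
    }
    return result, flags
```

For example, `add_with_flags(0x7F, 0x01)` models adding 1 to the largest positive 8-bit value: the sign and overflow flags are set while carry is not.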


• Pointers to a block of memory containing additional status information (Process control blocks)

• Interrupt vector register
• System stack pointer
  – If a stack is used, a system stack pointer is needed
• Page table pointer
  – In a virtual memory system
• Registers for the control of I/O operations

Other Registers (Register organization)


• Operating system support
  – Certain types of control information are of specific utility to the operating system
• Allocation of control information between registers and memory
  – Common to dedicate the first few hundred or thousand words of memory for control purposes
  – How much control information should be in registers and how much in memory

Design Issues (Register organization)


MC68000 (Register organization)


8086 (Register organization)


80386 – Pentium II (Register organization)


Instruction Cycle (Instruction cycle)


• May require memory access to fetch operands
• Indirect addressing requires more memory accesses
• Can be thought of as an additional instruction subcycle

Indirect Cycle (Instruction cycle)


Instruction Cycle (Instruction cycle)


• Alternating instruction fetch and instruction execution activities

• After an instruction is fetched, it is examined to determine whether any indirect addressing is involved
  – If so, the required operands are fetched
• Following execution, an interrupt may be processed before the next instruction fetch

Instruction Cycle with Indirect (Instruction cycle)


Instruction Cycle State Diagram (Instruction cycle)


• Depends on CPU design
• In general:
• Fetch
  – PC contains address of next instruction
  – Address moved to MAR
  – Address placed on address bus
  – Control unit requests memory read
  – Result placed on data bus, copied to MBR, then to IR
  – Meanwhile PC incremented by 1

Data Flow - Instruction Fetch (Instruction cycle)
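The fetch steps above can be sketched as register transfers. This is a minimal model that assumes word-addressed memory held in a Python list; the class name `CPU` and its attribute names are illustrative only, and the bus itself is not modeled.

```python
class CPU:
    """Toy model of the registers involved in an instruction fetch."""

    def __init__(self, memory):
        self.memory = memory  # list of instruction words, indexed by address
        self.pc = 0           # program counter
        self.mar = 0          # memory address register
        self.mbr = 0          # memory buffer register
        self.ir = 0           # instruction register

    def fetch(self):
        self.mar = self.pc                # address moved to MAR
        self.mbr = self.memory[self.mar]  # memory read, result copied to MBR
        self.pc += 1                      # meanwhile PC incremented by 1
        self.ir = self.mbr                # MBR copied to IR
        return self.ir
```

Calling `fetch()` repeatedly walks through memory in address order, leaving the most recently fetched word in `ir`.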


Data Flow - Fetch Cycle (Instruction cycle)


• IR is examined
• If indirect addressing, indirect cycle is performed
  – Rightmost N bits of MBR transferred to MAR
  – Control unit requests memory read
  – Result (address of operand) moved to MBR

Data Flow - Data Fetch (Instruction cycle)
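A small sketch of the indirect cycle follows. It assumes the address field occupies the rightmost 12 bits of the word in MBR and models memory as a dictionary; the function name `indirect_cycle` and the 12-bit default are assumptions made for illustration.

```python
def indirect_cycle(regs, memory, addr_bits=12):
    """Replace the indirect reference in MBR with the operand address."""
    address_mask = (1 << addr_bits) - 1
    regs["MAR"] = regs["MBR"] & address_mask  # rightmost N bits of MBR to MAR
    regs["MBR"] = memory[regs["MAR"]]         # memory read, result to MBR
    return regs["MBR"]
```

After the call, MBR holds the address of the operand rather than the original address field.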


Data Flow - Indirect Cycle (Instruction cycle)


• May take many forms
• Depends on instruction being executed
• May include
  – Memory read/write
  – Input/Output
  – Register transfers
  – ALU operations

Data Flow - Execute (Instruction cycle)


• Simple and predictable
• Current PC saved to allow resumption after interrupt
• Contents of PC copied to MBR
• Special memory location (e.g. stack pointer) loaded to MAR
• MBR written to memory
• PC loaded with address of interrupt handling routine
• Next instruction (first of interrupt handler) can be fetched

Data Flow - Interrupt (Instruction cycle)
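The interrupt data flow above can be sketched as follows. The sketch assumes the save location comes from a stack pointer that grows toward lower addresses; that direction, the register names, and the function name `interrupt_cycle` are assumptions, not details given on the slide.

```python
def interrupt_cycle(regs, memory, handler_address):
    """Save the current PC and redirect execution to the handler."""
    regs["MBR"] = regs["PC"]           # contents of PC copied to MBR
    regs["SP"] -= 1                    # assumption: stack grows downward
    regs["MAR"] = regs["SP"]           # save location loaded to MAR
    memory[regs["MAR"]] = regs["MBR"]  # MBR written to memory
    regs["PC"] = handler_address       # PC loaded with handler address
```

The next fetch cycle then reads the first instruction of the interrupt handler, and the saved PC allows the interrupted program to resume later.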


Data Flow - Interrupt Cycle (Instruction cycle)


Two-stage Instruction Pipeline (Instruction pipelining)


• Fetch accesses main memory
• Execution usually does not access main memory
• Can fetch next instruction during execution of current instruction
• Called instruction prefetch or fetch overlap

Instruction Pipelining - Prefetch (Instruction pipelining)


• Doubling of the execution rate is unlikely:
  – Fetch is usually shorter than execution
    • Prefetch more than one instruction?
  – Any jump or branch means that prefetched instructions are not the required instructions
    • To reduce the time loss, when a conditional branch instruction is passed on from the fetch stage to the execute stage, the fetch stage fetches the next instruction in memory after the branch instruction
    • If the branch is not taken, no time is lost
    • Otherwise, the fetched instruction must be discarded and a new instruction fetched
• Add more stages to improve performance

Improved Performance (Instruction pipelining)


• Fetch instruction (FI)
  – Read the next expected instruction into a buffer
• Decode instruction (DI)
  – Determine the opcode and the operand specifiers
• Calculate operands (CO)
  – Calculate the effective address of each source operand
• Fetch operands (FO)
  – Fetch each operand from memory
• Execute instruction (EI)
  – Perform the operation and store the result
• Write operand (WR)

Pipelining (Instruction pipelining)


Timing of Pipeline (Instruction pipelining)


• Each instruction is assumed to go through all six stages of the pipeline
  – This will not always be the case
• All stages are assumed to be performed in parallel
  – In particular, this assumes that there are no memory conflicts
  – The desired value may be in cache, in which case a memory conflict will not slow down the pipeline

Timing of Pipeline (Instruction pipelining)
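Under the idealized assumptions above (every instruction passes through all six stages, equal stage durations, no conflicts or branches), the timing can be tabulated with a short sketch. The function name `pipeline_timing` and the list-of-rows layout are illustrative choices.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WR"]


def pipeline_timing(n, stages=STAGES):
    """Return one row per instruction; column t holds the stage active in time unit t+1."""
    k = len(stages)
    total_cycles = k + (n - 1)    # time units for n instructions to drain the pipeline
    table = []
    for i in range(n):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name     # instruction i enters stage s one unit after stage s-1
        table.append(row)
    return table


if __name__ == "__main__":
    for number, row in enumerate(pipeline_timing(4), start=1):
        print(f"I{number}: " + " ".join(f"{cell:>2}" for cell in row))
```

Each row is shifted one time unit to the right of the previous one, so with `n` instructions the last one leaves the pipeline after `k + (n - 1)` time units.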


• Memory conflict if the cache is not used
• Unequal duration of stages
  – There will be some waiting involved at stages
• The conditional branch instruction can invalidate several instruction fetches
  – A similar unpredictable event is an interrupt

Limiting Factors (Instruction pipelining)


Conditional Branch in a Pipeline (Instruction pipelining)


• The CO stage may depend on the contents of a register that could be altered by a previous instruction that is still in the pipeline
  – Other such register and memory conflicts could occur

Limiting Factors (Instruction pipelining)


6-stage CPU Instruction Pipeline (Instruction pipelining)


• At each stage, overhead is involved in moving data from buffer to buffer and in performing various preparation and delivery functions
  – This overhead can lengthen the total execution time of a single instruction
  – This is significant when sequential instructions are logically dependent, either through heavy use of branching or through memory access dependencies
• The amount of control logic increases enormously with the number of stages
  – The logic controlling the gating between stages can become more complex than the stages being controlled

Speed & The Number of Stages (Instruction pipelining)


• Cycle time
  – Time needed to advance a set of instructions one stage through the pipeline

$$\tau = \max_{1 \le i \le k}[\tau_i] + d = \tau_m + d$$

  – $\tau_i$ = time delay of the circuitry in the $i$th stage
  – $\tau_m$ = maximum stage delay
  – $k$ = number of stages in the instruction pipeline
  – $d$ = time delay of a latch

Pipeline Performance (Instruction pipelining)


• In general, the time delay $d$ is equivalent to a clock pulse and $\tau_m \gg d$
• Suppose that $n$ instructions are processed
  – Total time required

$$T_k = [k + (n - 1)]\tau$$

  – Speedup factor

$$S_k = \frac{T_1}{T_k} = \frac{nk\tau}{[k + (n - 1)]\tau} = \frac{nk}{k + (n - 1)}$$

Pipeline Performance (Instruction pipelining)
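To get a feel for the speedup factor nk / (k + (n - 1)) of a k-stage pipeline processing n instructions, the sketch below evaluates it for a few values. The function name `pipeline_speedup` is hypothetical, and the formula assumes equal stage delays and no branches or conflicts.

```python
def pipeline_speedup(n, k):
    """Ideal speedup of a k-stage pipeline over no pipelining for n instructions."""
    return (n * k) / (k + (n - 1))


if __name__ == "__main__":
    for n in (1, 10, 100, 10_000):
        print(f"n={n:>6}, k=6: speedup = {pipeline_speedup(n, 6):.3f}")
```

With a single instruction there is no gain, and as n grows the speedup approaches the number of stages k.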


• Number of instructions

Speedup Factors (Instruction pipelining)


• Number of stages

Speedup Factors (Instruction pipelining)


• Multiple streams
• Prefetch branch target
• Loop buffer
• Branch prediction
• Delayed branching

Dealing with Branches (Instruction pipelining)


• Replicate the initial portions of the pipeline and allow the pipeline to fetch both instructions, making use of two streams

• Two problems with this approach
  – Contention delays for access to the registers and to memory
  – Additional branch instructions may enter the pipeline before the original branch decision is resolved
    • Each such instruction needs an additional stream
• Despite these drawbacks, this strategy can improve performance

Multiple Streams (Instruction pipelining)


• The target of the branch is prefetched in addition to instructions following branch

• Keep the target until the branch is executed
• If the branch is taken, the target has already been prefetched
• Used by IBM 360/91

Prefetch Branch Target (Instruction pipelining)


• A small, very-high-speed memory maintained by fetch stage of the pipeline and containing the n most recently fetched instructions, in sequence

• If a branch is to be taken, the hardware first checks whether the branch target is within the buffer

• If so, the next instruction is fetched from the buffer

• Very good for small loops or jumps

Loop Buffer (Instruction pipelining)
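The lookup described above can be sketched in software. This is a simplified model that assumes the buffer keeps (address, instruction) pairs for the n most recently fetched instructions and is searched linearly; real hardware would index the buffer directly. The class name `LoopBuffer` and its methods are hypothetical.

```python
from collections import deque


class LoopBuffer:
    """Holds the most recently fetched instructions, in fetch order."""

    def __init__(self, capacity):
        # A bounded deque silently drops the oldest entry when full.
        self.entries = deque(maxlen=capacity)

    def record_fetch(self, address, instruction):
        self.entries.append((address, instruction))

    def lookup(self, target):
        """Return the buffered instruction at `target`, or None on a miss."""
        for address, instruction in self.entries:
            if address == target:
                return instruction
        return None
```

A branch back to an address that is still in the buffer, as in a small loop, is served without going to memory; a miss falls back to a normal fetch.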


• With the use of prefetching, the loop buffer will contain some instructions sequentially ahead of the current instruction fetch address
• If a branch occurs to a target a few locations ahead of the address of the branch instruction, the target will already be in the buffer
  – Useful for the rather common occurrence of IF-THEN and IF-THEN-ELSE sequences
• Well suited to dealing with loops, or iterations

Benefits of Loop Buffer (Instruction pipelining)


Loop Buffer (Instruction pipelining)


• Static approaches
  – Do not depend on the execution history up to the time of the conditional branch instruction
    • Predict never taken
    • Predict always taken
    • Predict by opcode
• Dynamic approaches
  – Do depend on the execution history
  – Improve the accuracy of prediction by recording the history of conditional branch instructions
    • Taken/not taken switch
    • Branch history table

Branch Prediction (Instruction pipelining)


• Predict never taken
  – Assume that jump will not happen
  – Always fetch next instruction
  – 68020 & VAX 11/780
  – VAX will not prefetch after branch if a page fault would result (O/S v CPU design)
• Predict always taken
  – Conditional branches are taken more than 50% of the time
  – Assume that jump will happen
  – Always fetch target instruction
• Predict by opcode
  – Some instructions are more likely to result in a jump than others

Static Approaches (Instruction pipelining)


• Taken/Not taken switch
  – History bits : One or more bits can be associated with each conditional branch instruction to reflect the recent history of the instruction
  – History bits are kept in temporary high-speed storage
    • One option: associate some history bits with any conditional branch instruction that is in a cache; when the instruction is replaced in the cache, its history is lost
    • Another option: maintain a small table for recently executed branch instructions, with one or more bits in each entry

Dynamic Approaches (Instruction pipelining)


• Taken/Not taken switch (with a single bit)
  – A single bit records only whether the last execution of this instruction resulted in a branch or not
    • A drawback appears for a conditional branch instruction that is almost always taken, such as a loop instruction
    • An error in prediction will occur twice for each use of the loop : once on entering the loop and once on exiting
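The double misprediction can be checked with a small simulation. The sketch assumes the single history bit simply stores the last outcome and that the loop-closing branch is taken on every iteration except the last; the function name `one_bit_mispredictions` is hypothetical.

```python
def one_bit_mispredictions(outcomes, initial=False):
    """Count mispredictions when the prediction is always the last outcome."""
    prediction = initial
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken  # the single bit remembers only the last outcome
    return misses


# A loop-closing branch: taken three times, then not taken on exit.
one_pass = [True, True, True, False]
```

Running the loop twice in a row, `one_bit_mispredictions(one_pass * 2)`, gives four mispredictions: two per use of the loop, one on entry and one on exit.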



• Taken/Not taken switch (with two bits)
  – Two bits can record the result of the last two instances of the execution of the associated instruction, or record a state in some other fashion
    • If the last two branches of the given instruction have taken the same path, the prediction is to take that path again
    • If the prediction is wrong, it remains the same the next time the instruction is encountered
    • If the prediction is wrong again, the next prediction will be to select the opposite path

Dynamic Approaches (Instruction pipelining)
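The two-bit rule above can be modeled as a prediction plus a count of consecutive wrong guesses: the prediction flips only after two mispredictions in a row. This encoding is one possible reading of the rule, and the function name `two_bit_mispredictions` is hypothetical.

```python
def two_bit_mispredictions(outcomes, initial=True):
    """Count mispredictions when two consecutive errors are needed to flip."""
    prediction = initial
    wrong_in_a_row = 0
    misses = 0
    for taken in outcomes:
        if taken == prediction:
            wrong_in_a_row = 0             # a correct guess restores confidence
        else:
            misses += 1
            wrong_in_a_row += 1
            if wrong_in_a_row == 2:        # wrong twice in a row: switch path
                prediction = not prediction
                wrong_in_a_row = 0
    return misses
```

For a loop-closing branch that is taken three times and then falls through, repeated twice, this scheme mispredicts only on each loop exit, since the single not-taken outcome does not flip the prediction.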


Branch Prediction State Diagram (Instruction pipelining)


• Drawback of the use of history bits
  – If the decision is made to take the branch, the target instruction cannot be fetched until the target address is decoded
  – Greater efficiency could be achieved if the instruction fetch could be initiated as soon as the branch decision is made



• Branch history table
  – A small cache memory associated with the instruction fetch stage of the pipeline
  – Each entry in the table consists of three elements
    • The address of a branch instruction
    • Some number of history bits that record the state of use of that instruction
    • Information about the target instruction
  – Storing the target instruction itself yields a shorter instruction fetch time, but a larger table, compared with storing the target address
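A branch history table can be sketched as a lookup keyed by the branch instruction's address. The version below is a deliberately small model: it assumes a single history bit per entry and stores the target address rather than the target instruction. The class name `BranchHistoryTable` and its methods are hypothetical.

```python
class BranchHistoryTable:
    """Maps a branch address to its last outcome and its target address."""

    def __init__(self):
        self.entries = {}  # branch address -> {"taken": bool, "target": int}

    def predict_next_fetch(self, branch_address, fall_through):
        """Pick the address to fetch after the branch at `branch_address`."""
        entry = self.entries.get(branch_address)
        if entry is not None and entry["taken"]:
            return entry["target"]  # predicted taken: fetch the target at once
        return fall_through         # no entry or predicted not taken

    def update(self, branch_address, taken, target):
        """Record the actual outcome once the branch has executed."""
        self.entries[branch_address] = {"taken": taken, "target": target}
```

Because the target is stored in the entry, a predicted-taken branch can start fetching from the target as soon as the table lookup hits, without waiting for the target address to be decoded.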



Predict Never Taken Strategy (Instruction pipelining)


Branch History Table (Instruction pipelining)


• It is possible to improve pipeline performance by automatically rearranging instructions so that branch instructions occur later than actually desired

Delayed Branch (Instruction pipelining)


• Fetch : Instructions are fetched from the cache or from the external memory and placed into one of the two 16-byte prefetch buffers
• Decode stage 1 (D1) : All opcode and addressing-mode information is decoded
• Decode stage 2 (D2) : This stage expands each opcode into control signals for the ALU
• Execute (EX) : This stage includes ALU operations, cache access and register update
• Write back (WB) : If needed, this stage updates registers and status flags modified during the preceding execution stage

Intel 80486 Pipelining (Instruction pipelining)


• No delay introduced into the pipeline when a memory access is required

No Data Delay in the Pipeline (Instruction pipelining)


• There can be a delay for values used to compute memory addresses

Pointer Load Delay (Instruction pipelining)


• The processor accesses the cache in the EX stage of the first instruction and stores the value retrieved in the register during the WB stage

• The next instruction needs that register in the D2 stage

• When the D2 stage lines up with the WB stage of the previous instruction, bypass signal paths allow the D2 stage to have access to the same data being used by the WB stage for writing, saving one pipeline stage

Pointer Load Delay (Instruction pipelining)


• Assume that the branch is taken

Branch Instruction Timing (Instruction pipelining)


• The compare instruction updates condition codes in the WB stage and bypass paths make this available to the EX stage of the jump instruction at the same time

• In parallel, the processor runs a speculative fetch cycle to the target of the jump during the EX stage of the jump instruction

• If the processor determines a false branch condition, it discards this prefetch and continues execution with the next sequential instruction

Branch Instruction Timing (Instruction pipelining)


Pentium II Processor Registers (Pentium processor)


EFLAGS Register (Pentium processor)


• 6 condition codes
• 7 flags that may be referred to as control bits
  – Trap flag (TF)
  – Interrupt enable flag (IF)
  – Direction flag (DF)
  – I/O privilege flag (IOPL)
  – Resume flag (RF)
  – Alignment check (AC)
  – Identification flag (ID)

EFLAGS Register (Pentium processor)


Control Registers (Pentium processor)


• Flags
  – Protection enable (PE)
  – Monitor coprocessor (MP)
  – Emulation (EM)
  – Task switched (TS)
  – Extension type (ET)
  – Numeric error (NE)
  – Write protect (WP)
  – Alignment mask (AM)
  – Not write through (NW)
  – Cache disable (CD)
  – Paging (PG)

Control Registers (Pentium processor)


• Nine additional control bits
  – Virtual-8086 mode extension
  – Protected-mode virtual interrupts
  – Time stamp disable
  – Debugging extensions
  – Page size extensions
  – Physical address extension
  – Machine check enable
  – Page global enable
  – Performance counter enable

Control Register 4 (CR4) (Pentium processor)


MMX Registers (Pentium processor)


• For MMX operations, the floating-point registers are accessed directly

• The first time that an MMX instruction is executed after any floating-point operations, the FP tag word is marked valid

• The EMMS (Empty MMX State) instruction sets bits of the FP tag word to indicate that all registers are empty
  – The programmer should insert this instruction at the end of an MMX code block so that subsequent FP operations function properly
• When a value is written to an MMX register, bits [79:64] of the corresponding FP register are set to all ones

Features of MMX Registers (Pentium processor)
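The effect of an MMX write on the underlying 80-bit floating-point register can be illustrated with plain integer arithmetic. The sketch models the register as a Python integer; the function name `write_mmx` is hypothetical.

```python
def write_mmx(value64):
    """Return the 80-bit FP register image after writing a 64-bit MMX value."""
    low64_mask = (1 << 64) - 1
    upper16_ones = 0xFFFF << 64   # bits [79:64] set to all ones
    return upper16_ones | (value64 & low64_mask)
```

The low 64 bits carry the MMX data, while the upper 16 bits, which hold the sign and exponent in the floating-point format, are all ones.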


• Interrupts and exceptions

• Interrupt vector table

• Interrupt handling

Interrupt Processing (Pentium processor)


• Generated by a signal from hardware
• May occur at random times during the execution of a program
• Two sources of interrupts
  – Maskable interrupts
    • The processor does not recognize a maskable interrupt unless the interrupt enable flag (IF) is set
  – Nonmaskable interrupts
    • Recognition of such interrupts cannot be prevented

Interrupts (Pentium processor)


• Generated from software
• Provoked by the execution of an instruction
• Two sources of exceptions
  – Processor-detected exceptions
    • Result when the processor encounters an error while attempting to execute an instruction
  – Programmed exceptions
    • These are instructions that generate an exception

Exceptions (Pentium processor)


Exception and Interrupt Vector (Pentium processor)


• Every type of interrupt is assigned a number
  – This number is used to index into the interrupt vector table
• If more than one exception or interrupt is pending, the processor services them in a predictable order
• The order of priority
  – Class 1 : Traps on the previous instruction
  – Class 2 : External interrupts
  – Class 3 : Faults from fetching next instruction
  – Class 4 : Faults from decoding the next instruction
  – Class 5 : Faults on executing an instruction

Interrupt Vector Table (Pentium processor)
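The predictable service order can be sketched by sorting pending events on their priority class. The event kind strings and the function name `service_order` are hypothetical labels chosen for this example; only the five classes and their order come from the list above.

```python
# Lower class number means higher priority.
PRIORITY_CLASS = {
    "trap_on_previous_instruction": 1,
    "external_interrupt": 2,
    "fault_fetching_next_instruction": 3,
    "fault_decoding_next_instruction": 4,
    "fault_executing_instruction": 5,
}


def service_order(pending):
    """Return pending event kinds in the order they would be serviced."""
    return sorted(pending, key=lambda kind: PRIORITY_CLASS[kind])
```

For instance, if an external interrupt and a fault on executing an instruction are both pending, the external interrupt is serviced first.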


• When an interrupt occurs and is recognized by the processor
  – If the transfer involves a change of privilege level, the current stack segment register and the current extended stack pointer register are pushed onto the stack
  – The current value of the EFLAGS register is pushed onto the stack
  – Both the interrupt and trap flags are cleared
  – The current code segment pointer and the current instruction pointer are pushed onto the stack
  – If the interrupt is accompanied by an error code, the error code is pushed onto the stack
  – The interrupt vector contents are fetched and loaded into the CS and IP or EIP registers

Interrupt Handling (Pentium processor)
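The sequence above can be walked through with a toy model in which the stack is a Python list and the registers are dictionary entries. The function name `enter_interrupt`, the dictionary keys, and passing the vector as a (CS, EIP) pair are assumptions made for illustration; the bit positions used for IF (bit 9) and TF (bit 8) are those of the EFLAGS register.

```python
IF_MASK = 1 << 9  # interrupt enable flag in EFLAGS
TF_MASK = 1 << 8  # trap flag in EFLAGS


def enter_interrupt(cpu, stack, vector, privilege_change=False, error_code=None):
    """Push the interrupted context and load the handler address."""
    if privilege_change:
        stack.append(cpu["SS"])            # current stack segment register
        stack.append(cpu["ESP"])           # current extended stack pointer
    stack.append(cpu["EFLAGS"])            # EFLAGS saved before flags are cleared
    cpu["EFLAGS"] &= ~(IF_MASK | TF_MASK)  # clear interrupt and trap flags
    stack.append(cpu["CS"])                # current code segment pointer
    stack.append(cpu["EIP"])               # current instruction pointer
    if error_code is not None:
        stack.append(error_code)           # only some events carry an error code
    cpu["CS"], cpu["EIP"] = vector         # contents of the interrupt vector
```

The saved EFLAGS value still has IF and TF as they were before the interrupt, so restoring it on return re-enables whatever was enabled in the interrupted program.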


PowerPC G3 Block Diagram (PowerPC processor)


User-Visible Registers (PowerPC processor)


• Fixed-point unit
  – General
    • 32 64-bit general-purpose registers
    • These may be used to load, store and manipulate data operands, and also for register indirect addressing
  – Exception register
    • Includes 3 bits that report exceptions in integer arithmetic operations

Register Organization (PowerPC processor)


PowerPC Register Formats (PowerPC processor)


• Floating-point unit contains additional user-visible registers
  – General
    • 32 64-bit general-purpose registers
    • These may be used for all floating-point operations
  – Floating-point status and control register
    • Includes bits that control the operation of the floating-point unit and bits that record the status resulting from floating-point operations



FP Status and Control Register (PowerPC processor)


• Branch processing unit
  – Condition register
    • 8 4-bit condition code fields
  – Link register
    • Can be used in a conditional branch instruction for indirect addressing of the target address
    • Also used for call/return behavior
  – Count register
    • Can be used to control an iteration loop



• Interpretation of Bits in Condition Register

Condition Register (PowerPC processor)


PowerPC Interrupt Table (PowerPC processor)


• Machine state register
  – Fundamental to the interruption of a program is the ability to recover the state of the processor at the time of the interrupt

Interrupt Processing (PowerPC processor)


Machine State Register (PowerPC processor)


• The processor places the address of the instruction to be executed next in the Save/Restore Register 0 (SRR0)

• The processor copies machine state information from the MSR to the SRR1

• The MSR is set to a hardware-defined value specific to the interrupt type

• The processor transfers control to the appropriate interrupt handler

Interrupt Handling (PowerPC processor)
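The four steps above can be sketched as simple register assignments. This is a simplification that copies the whole MSR into SRR1 and takes the interrupt-specific MSR value and handler address as parameters; the function name `take_interrupt` and the dictionary keys are hypothetical.

```python
def take_interrupt(regs, next_instruction_address, msr_for_interrupt, handler_address):
    """Save the resume address and machine state, then enter the handler."""
    regs["SRR0"] = next_instruction_address  # where execution resumes afterwards
    regs["SRR1"] = regs["MSR"]               # machine state at the time of the interrupt
    regs["MSR"] = msr_for_interrupt          # hardware-defined value for this interrupt type
    regs["PC"] = handler_address             # control transfers to the handler
```

Keeping the resume address and the old machine state in SRR0 and SRR1 is what lets the handler restore the interrupted program's state when it returns.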
