basic pipelining concepts - chalmers · basic pipelining concepts appendix a (recommended reading,...

Basic Pipelining Concepts

Appendix A (recommended reading, not everything will be covered today)

– Basic pipelining

– Pipeline hazards

• Data hazards

• Control hazards

• Structural hazards

– Multicycle operations

2009

Instruction Execution

For each instruction:

1. Instruction fetch (IF)

2. Instruction decode, operand fetch (ID)

3. Execute computations (EX)

4. Memory access (MEM)

5. Write back results to registers (WB)

The number and types of steps can vary between different ISAs and implementations.

MIPS Single Cycle Implementation

[Figure: instruction fetch. The PC addresses the instruction memory, and an adder (ADD) computes PC + 4.]

32-bit instruction formats:

op rs rt rd shamt funct

op rs rt address/immediate
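The two 32-bit formats above can be made concrete by splitting an instruction word into its fields. The following is a minimal sketch: the field widths (6/5/5/5/5/6 bits and 6/5/5/16 bits) and the convention that opcode 0 selects the register format come from the standard MIPS encoding, not from the slide. The two example words are hand-assembled for illustration, and the jump format is ignored.

```python
def decode(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into the fields named on the slide."""
    fields = {
        "op": (word >> 26) & 0x3F,   # bits 31..26
        "rs": (word >> 21) & 0x1F,   # bits 25..21
        "rt": (word >> 16) & 0x1F,   # bits 20..16
    }
    if fields["op"] == 0:
        # Register format: op rs rt rd shamt funct
        fields["rd"] = (word >> 11) & 0x1F
        fields["shamt"] = (word >> 6) & 0x1F
        fields["funct"] = word & 0x3F
    else:
        # Immediate format: op rs rt address/immediate (16 bits)
        fields["imm"] = word & 0xFFFF
    return fields


# Hand-assembled encodings (an assumption of this sketch, not taken from the slides).
ADD_5_2_3 = 0x00432820    # add $5, $2, $3
LW_4_100_5 = 0x8CA40064   # lw  $4, 100($5)

print(decode(ADD_5_2_3))
print(decode(LW_4_100_5))
```

The decoded rs and rt fields are what the register file uses as read addresses in the decode stage, and the 16-bit immediate is what the sign-extension unit widens to 32 bits.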

MIPS Single Cycle Implementation (datapath built up stage by stage)

[Figure sequence: each slide adds one stage to the instruction fetch hardware (PC, instruction memory, adder computing PC + 4).]

• Instruction decode/register fetch: rs and rt address the register file, which delivers data_rs and data_rt; rd is routed to the register file as well; the 16-bit address/immediate field is sign-extended to 32 bits.

• Execute/address calc.: the ALU produces a result and a status output; a second adder (ADD) adds the sign-extended immediate, shifted left by 2, to PC + 4.

• Memory access: the ALU result is passed to the data memory.

• Write back: the result is written back to the register file.

Problems

• All instructions take the time required by the longest instruction

• Alternative solutions:

1. Multicycle processors

2. Pipelining

• Both solutions require the implementation to change!

We will have a closer look at pipelining!
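To see why a single-cycle design wastes time, it helps to put numbers on the stages. The sketch below uses hypothetical stage latencies (the picosecond values and the per-instruction stage lists are assumptions for illustration, not figures from the slides). The single-cycle clock must fit the slowest instruction, whereas a pipelined clock only has to fit the slowest stage.

```python
# Hypothetical stage latencies in picoseconds (illustrative, not from the slides).
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Stages in which each instruction class does useful work (assumed).
USED_STAGES = {
    "lw":  ("IF", "ID", "EX", "MEM", "WB"),
    "sw":  ("IF", "ID", "EX", "MEM"),
    "add": ("IF", "ID", "EX", "WB"),
    "beq": ("IF", "ID", "EX"),
}


def needed_ps(instr: str) -> int:
    """Time the instruction really needs: the sum of the stages it uses."""
    return sum(STAGE_PS[stage] for stage in USED_STAGES[instr])


def single_cycle_clock_ps() -> int:
    """The single-cycle clock period is set by the longest instruction."""
    return max(needed_ps(instr) for instr in USED_STAGES)


def pipelined_clock_ps() -> int:
    """A pipelined clock period is set by the longest stage."""
    return max(STAGE_PS.values())


for instr in USED_STAGES:
    wasted = single_cycle_clock_ps() - needed_ps(instr)
    print(f"{instr}: needs {needed_ps(instr)} ps, wastes {wasted} ps per single-cycle clock")
print("pipelined clock:", pipelined_clock_ps(), "ps")
```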

The Assembly Line Concept

• A pipelined processor is based on the assembly line concept

• One station for each stage in the instruction execution

• At any moment there is one instruction at each station

• One new instruction every cycle => CPI=1

Each instruction takes multiple cycles to complete, but the throughput is high!
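The difference between latency and throughput can be checked with a little arithmetic. In an ideal k-stage pipeline without stalls, the first instruction needs k cycles and each later one completes one cycle after its predecessor. The sketch below assumes k = 5 (matching the five steps listed earlier) and uses function names chosen for this example.

```python
def total_cycles(n_instructions: int, n_stages: int = 5) -> int:
    """Cycles to run n instructions through an ideal pipeline without stalls."""
    return n_stages + n_instructions - 1


def average_cpi(n_instructions: int, n_stages: int = 5) -> float:
    """Average cycles per instruction, including the time to fill the pipeline."""
    return total_cycles(n_instructions, n_stages) / n_instructions


for n in (1, 4, 100, 10_000):
    print(n, total_cycles(n), round(average_cpi(n), 4))
```

For a single instruction the average is 5 cycles (the latency), while for long instruction streams it approaches the ideal CPI of 1.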

Pipeline for MIPS

[Figure: the single-cycle datapath (instruction fetch, registers, ALU, data memory, write back) redrawn as a pipeline.]

[Figure: the same datapath with pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB between the stages; the register file has separate Read addr. and Write addr. inputs.]

Pipelining Example

add $5, $2, $3

lw $4, 100($5)

sw $4, 400($7)

beq $8, $9, 800
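The stage occupancy for these four instructions can be tabulated mechanically. The sketch below assumes an ideal five-stage pipeline with no stalls and takes the instructions in the order in which the datapath figures show them entering the pipeline (add first, then lw, sw, beq). The function name and the cycle numbering from 1 are choices made for this example.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def occupancy(program):
    """Return one dict per cycle, mapping stage name to the instruction in it."""
    n_cycles = len(program) + len(STAGES) - 1
    cycles = []
    for cycle in range(n_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            index = cycle - depth  # instruction that entered `depth` cycles ago
            if 0 <= index < len(program):
                row[stage] = program[index]
        cycles.append(row)
    return cycles


# Order assumed from the cycle-by-cycle figures.
program = ["add $5, $2, $3", "lw $4, 100($5)", "sw $4, 400($7)", "beq $8, $9, 800"]

for number, row in enumerate(occupancy(program), start=1):
    cells = "  ".join(f"{stage}={row[stage].split()[0]}" for stage in STAGES if stage in row)
    print(f"cycle {number}: {cells}")
```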

[Figure sequence: the four instructions advance through the pipelined datapath one stage per cycle, entering in the order add, lw, sw, beq. Cycles are numbered here from the first fetch.]

Cycle 1: IF add

Cycle 2: IF lw; ID add (reads $2, $3)

Cycle 3: IF sw; ID lw (reads $5, immediate 100); EX add ($2+$3)

Cycle 4: IF beq; ID sw (reads $7, $4, immediate 400); EX lw ($5+100); MEM add ($2+$3)

Cycle 5: IF (beq+4); ID beq (reads $8, $9, immediate 800); EX sw ($7+400); MEM lw (address $5+100); WB add ($2+$3 written to $5)

Cycle 6: IF (beq+8); ID (beq+4); EX beq ($8-$9 sets Z, target beq+4+3200); MEM sw ($4 stored at $7+400); WB lw (M[$5+100] written to $4)

Cycle 7: IF (beq+12); ID (beq+8); EX (beq+4); MEM beq (Z controls the PC mux, target beq+4+3200); WB sw

Cycle 8: the PC holds the branch destination (beq+4+3200); (beq+12), (beq+8) and (beq+4) are in ID, EX and MEM; WB beq
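The branch target shown in the figures, beq+4+3200 for an immediate of 800, follows from sign-extending the 16-bit immediate, shifting it left by 2 and adding it to the address after the branch. A minimal sketch of that calculation and of the PC mux controlled by Z is given below; the function names are chosen for this example.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(beq_address: int, imm: int) -> int:
    """Target = (address of beq + 4) + (sign-extended immediate shifted left by 2)."""
    return beq_address + 4 + (sign_extend16(imm) << 2)


def next_pc(beq_address: int, imm: int, zero: bool) -> int:
    """The Z flag from the ALU ($8-$9 == 0) selects between target and fall-through."""
    return branch_target(beq_address, imm) if zero else beq_address + 4


print(branch_target(0, 800))       # offset relative to the beq address
print(next_pc(0, 800, zero=False))
```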

Pipeline Hazards

• Neighboring instructions are rarely independent

• In a pipeline, this can cause conflicts called hazards

• Three main types of hazards

– Data hazards

– Control hazards

– Structural hazards


Hazard Resolution

Pipeline hazards can be resolved in many different ways

• Stall: Stop parts of the pipeline until the conflicting instructions are sufficiently separated

• Make results available earlier

– Move calculations to earlier pipeline stages

– Make results available before they have been stored

– Guess results before they have been computed(!)

• Reorder instructions

Data Hazards

• Three types

– Read-After-Write (RAW).

– Write-After-Read (WAR). Does not occur in simple pipelines.

– Write-After-Write (WAW). Does not occur in simple pipelines.

• A RAW hazard occurs when

– an instruction needs the result of an earlier instruction that is stored in a register (or memory location),

– and the earlier instruction has not yet written the result

• Usually handled by (some combination of)

– data forwarding (bypassing)

– stalling

– instruction reordering
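The three hazard types can be told apart by comparing the destination and source registers of an earlier and a later instruction. The sketch below does this for register operands only (memory locations are ignored). The `Instr` record and the `dependences` function are hypothetical names for this example, and the result is the set of dependences that could become hazards, depending on the pipeline.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[str]      # register written, or None
    srcs: Tuple[str, ...]    # registers read


def dependences(earlier: Instr, later: Instr) -> set:
    """Classify register dependences of `later` on `earlier`."""
    found = set()
    if earlier.dest is not None and earlier.dest in later.srcs:
        found.add("RAW")     # later reads what earlier writes
    if later.dest is not None and later.dest in earlier.srcs:
        found.add("WAR")     # later overwrites what earlier reads
    if earlier.dest is not None and earlier.dest == later.dest:
        found.add("WAW")     # both write the same register
    return found


add = Instr("add $5, $2, $3", "$5", ("$2", "$3"))
lw = Instr("lw $4, 100($5)", "$4", ("$5",))
sw = Instr("sw $4, 400($7)", None, ("$4", "$7"))

print(dependences(add, lw))
print(dependences(lw, sw))
print(dependences(add, sw))
```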

RAW Hazard Example

[Figure: lw $4, 100($5) is in ID reading $5 and the immediate 100 while add $5, $2, $3 is still in EX with operands $2 and $3; sw $4, 400($7) is in IF. The new value of $5 has not been written yet.]

Solution: Data forwarding

[Figure: one cycle later. add is in MEM with $2+$3, lw is in EX, sw is in ID (reading $7, $4 and the immediate 400) and beq is in IF. The value $2+$3 is forwarded to the ALU so that lw uses the new $5.]

Another RAW Hazard Example

[Figure: the same cycle. sw $4, 400($7) is in ID reading $4 while lw $4, 100($5) is still in EX, so the loaded value is not yet available.]

Solution: Stalling

[Figure: lw has advanced while sw $4, 400($7) is held in ID and beq $8, $9, 800 is held in IF; two nops sit between lw and sw.]

• PC and IF/ID are not updated until lw reaches WB

• “Bubbles” (nop = no operation) are loaded into ID/EX until lw reaches WB
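The number of bubbles follows from the distance between the stage that reads registers (ID) and the stage that writes them (WB). The sketch below assumes no forwarding and assumes that a register written in WB can be read by an instruction in ID during the same cycle, which is consistent with stalling "until lw reaches WB" and with the two nops in the figure. The function name is hypothetical.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")
READ_STAGE = STAGES.index("ID")    # registers are read here
WRITE_STAGE = STAGES.index("WB")   # results are written here


def stalls_without_forwarding(distance: int) -> int:
    """Bubbles needed when the consumer is `distance` instructions after the producer.

    Assumes a value written in WB is readable by ID in the same cycle.
    """
    if distance < 1:
        raise ValueError("the consumer must come after the producer")
    return max(0, (WRITE_STAGE - READ_STAGE) - distance)


# sw $4 directly follows lw $4: distance 1, two bubbles as in the figure.
for distance in (1, 2, 3, 4):
    print(distance, stalls_without_forwarding(distance))
```

Under the same assumptions, placing one or two independent instructions between the producer and the consumer reduces the stall to one or zero cycles, which is what instruction reordering tries to achieve.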

Speedup Equation for Pipelining

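$$\mathrm{CPI}_{pipelined} = \text{Ideal CPI} + \text{Pipeline stall cycles per instr.}$$

$$\text{Speedup} = \frac{\mathrm{CPI}_{unpipelined}}{\text{Ideal CPI} + \text{\#stall cycles/instr}} \times \frac{Tc_{unpipelined}}{Tc_{pipelined}}$$

Ideal CPI for a pipeline is normally = 1.

And, if

$$\mathrm{CPI}_{unpipelined} \times \frac{Tc_{unpipelined}}{Tc_{pipelined}} = \text{Number of pipeline stages},$$

this gives

$$\text{Speedup} = \frac{\text{\#pipeline stages}}{1 + \text{\#stall cycles/instr}}$$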
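The speedup equation is easy to evaluate numerically. The sketch below implements both the general form and the simplified form; the example values (5 stages, 0.25 stall cycles per instruction, a single-cycle baseline whose clock period is 5 times the pipelined one) are assumptions chosen for illustration.

```python
def speedup_general(cpi_unpipelined: float, tc_unpipelined: float,
                    tc_pipelined: float, stalls_per_instr: float,
                    ideal_cpi: float = 1.0) -> float:
    """Speedup = CPI_unpipelined / (Ideal CPI + stalls) * Tc_unpipelined / Tc_pipelined."""
    return cpi_unpipelined / (ideal_cpi + stalls_per_instr) * tc_unpipelined / tc_pipelined


def speedup_simple(n_stages: int, stalls_per_instr: float) -> float:
    """Simplified form, valid when CPI_unpipelined * Tc ratio equals the number of stages."""
    return n_stages / (1.0 + stalls_per_instr)


# Hypothetical numbers: 5 stages, on average 0.25 stall cycles per instruction.
print(speedup_simple(5, 0.25))
print(speedup_general(cpi_unpipelined=1.0, tc_unpipelined=5.0,
                      tc_pipelined=1.0, stalls_per_instr=0.25))
```

Both calls describe the same machine, so they should agree; without stalls the speedup would equal the number of stages.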

Control Hazards

• Occur when the program counter (PC) is changed by

– a branch or jump instruction

– an exception (interrupt, trap, etc.)

• Usually handled by (some combination of)

– branch prediction

– earlier target address calculation

– stalling

– delayed branch

– instruction reordering (static or dynamic scheduling)


Control Hazard Example

[Figure: beq $8, $9, 800 is in MEM and sw is in WB; the three following instructions (beq+4), (beq+8) and (beq+12) are already in EX, ID and IF.]

If the branch is to be taken, (beq+4) to (beq+12) should not have been fetched.

Solution: Stalling

• Stop fetching instructions after a branch instruction until the address of the next instruction has been determined

• This is very inefficient because branches tend to be very frequent


Solution: Earlier Branch Calculation

[Figure: pipelined datapath in which the shift-left-2 unit and the branch-target adder have been moved to an earlier stage.]

Both address and condition need to be calculated earlier (in ID).

Risk that the clock cycle must be lengthened, which can lead to an overall performance loss.
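The benefit of resolving branches earlier can be quantified by counting the instructions fetched behind a taken branch before the PC is redirected. The sketch below assumes one fetch per cycle and that the correct target is fetched in the cycle after the branch leaves the resolving stage; the function name is hypothetical.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")


def wrongly_fetched(resolve_stage: str) -> int:
    """Instructions fetched after a taken branch before the PC is redirected.

    While the branch moves from IF to `resolve_stage`, one younger instruction
    enters the pipeline per cycle.
    """
    return STAGES.index(resolve_stage)


print("resolved in MEM:", wrongly_fetched("MEM"))  # (beq+4), (beq+8), (beq+12)
print("resolved in ID: ", wrongly_fetched("ID"))
```

Moving the decision from MEM to ID reduces the cost of a taken branch from three wasted fetches to one, provided the extra logic in ID does not lengthen the clock cycle.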

Solution: Branch-Delay Hiding Techniques

• Stall until branch condition and target are known

– Can cause significant penalties

• Predict branch not taken (a fairly rare case)

– Execute successor instructions in sequence

– “Squash” instructions in pipeline if the branch is actually taken

– Works well if state is updated late in the pipeline

– 30%-38% of conditional branches are not taken on average

• Predict branch taken (a fairly common case)

– 62%-70% of conditional branches are taken on average

– Makes sense for more complex pipeline organizations

• Delayed branch (schedule useful instr. in delay slot)

– Define branch to take place after a following instruction

Static Scheduling and Delayed Branch

– Scheduling an instruction from before the branch is always safe

– Scheduling from the target or from the not-taken path is not always safe; it must be guaranteed that speculative instructions do no harm.

Conditional Delay Slot Execution

Cancelling or nullifying branch instruction: Cancel the instruction in the delay slot if branch does not conform with the prediction

Measurements on SPEC:

80% of the branch-delay slots can be filled with useful instructions

70% will be filled at run-time; 10% of the useful instructions will be cancelled because of mispredictions


Evaluating Branch Hazard Avoidance Techniques

$$\text{Pipeline speedup} = \frac{\text{\#pipeline stages}}{1 + \text{Branch frequency} \times \text{Branch penalty}}$$

Scheduling scheme     Branch penalty for integer programs     CPI      Speedup vs. unpipelined
Stall pipeline        1                                       1.17     4.3
Predict taken         1                                       1.17     4.3
Predict not taken     0.69                                    1.12     4.5
Delayed branch        0.21                                    1.04     4.8
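The table can be reproduced from the speedup formula. The slide does not state the branch frequency or the pipeline depth; the sketch below assumes a branch frequency of 17% (inferred from CPI = 1.17 at a penalty of 1) and 5 pipeline stages (inferred from the speedup column). Both are assumptions of this example.

```python
N_STAGES = 5             # assumed; consistent with the speedup column
BRANCH_FREQUENCY = 0.17  # assumed; inferred from CPI 1.17 at penalty 1

# Average branch penalty per scheme, as given in the table.
PENALTY = {
    "Stall pipeline": 1.0,
    "Predict taken": 1.0,
    "Predict not taken": 0.69,
    "Delayed branch": 0.21,
}


def cpi(penalty: float) -> float:
    """CPI = ideal CPI of 1 plus branch stall cycles per instruction."""
    return 1.0 + BRANCH_FREQUENCY * penalty


def speedup(penalty: float) -> float:
    """Pipeline speedup = #pipeline stages / (1 + branch frequency * branch penalty)."""
    return N_STAGES / cpi(penalty)


for scheme, penalty in PENALTY.items():
    print(f"{scheme:18s} CPI {cpi(penalty):.2f}  speedup {speedup(penalty):.1f}")
```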

Exceptions

• An exception (interrupt, trap, …) always causes a jump in execution

• Exceptions are especially difficult to handle because they cannot be predicted and may occur in different stages for different instructions

• Certain precise exceptions require that the instructions at the beginning of the pipeline are thrown away and restarted once the exception handler has finished

Respecting the Execution order

Exceptions may be generated in another order than the instruction execution order

Example sequence:

lw (e.g., page fault in MEM)

add (e.g., page fault in IF)

The add instruction causes a fault before the load

Pipeline stage     Problem causing exception
IF                 Page fault on instruction fetch; misaligned memory access; memory protection violation
ID                 Undefined or illegal opcode
EX                 Arithmetic exception
MEM                Page fault on data access; misaligned memory access; memory protection violation
WB                 None
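The lw/add example can be checked by computing the cycle in which each fault is raised. The sketch below assumes the five-stage pipeline with one instruction fetched per cycle and no stalls; the record layout and names are hypothetical. It contrasts the fault that is raised first in time with the fault that comes first in program order, which is the one that must be handled first to respect the execution order.

```python
STAGE_OFFSET = {"IF": 0, "ID": 1, "EX": 2, "MEM": 3, "WB": 4}


def fault_cycle(fetch_cycle: int, stage: str) -> int:
    """Cycle in which an instruction fetched at `fetch_cycle` is in `stage`."""
    return fetch_cycle + STAGE_OFFSET[stage]


# Example sequence from the slide: lw faults in MEM, the following add faults in IF.
faults = [
    {"instr": "lw", "fetch_cycle": 1, "stage": "MEM"},
    {"instr": "add", "fetch_cycle": 2, "stage": "IF"},
]

first_raised = min(faults, key=lambda f: fault_cycle(f["fetch_cycle"], f["stage"]))
first_in_program = min(faults, key=lambda f: f["fetch_cycle"])

print("raised first in time:   ", first_raised["instr"])
print("first in program order: ", first_in_program["instr"])
```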

Structural Hazards

• Occur when two instructions require the same resource at the same time

• Blocked resources can be

– Pipeline stages, or functional units in pipeline stages

– Memory

– Register file

• Usually handled by (a combination of)

– stalling

– instruction reordering (e.g. dynamic scheduling, which will be covered later in the course)


Structural Hazard Example

[Figure: lw $4, 100($5) is held in MEM (address $5+100) while sw $4, 400($7), beq $8, $9 and (beq+4) wait in the earlier stages; a nop is inserted in the pipeline.]

lw has to wait on a cache miss in MEM => stall for previous stages

Multicycle Operations in the Pipeline

– Integer unit: Handles integer instructions, branches, and loads/stores

– Other units: May take several cycles each. Some units are pipelined (mult, add), others are not (div)

Parallel Execution of Instructions

MULTD F2,F4,F6    IF  ID  M1  M2  M3  M4  M5  M6  M7  MEM WB
ADDD F8,F10,F12       IF  ID  A1  A2  A3  A4  MEM WB
SUBI R2,R3,#8             IF  ID  EX  MEM WB
LD F14,0(R2)                  IF  ID  EX  MEM WB

Structural and RAW hazards:

• Structural hazards: stall in ID stage when

– the functional unit is occupied

– many instructions can reach the WB stage at the same time

• RAW hazards:

– Normal bypassing from MEM and WB stages

– Stall in ID stage if any of the source operands is a destination operand of an instruction in any of the FP functional units
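Whether several instructions reach WB in the same cycle can be computed from the fetch cycle and the number of execute cycles. The sketch below takes the latencies from the diagram (7 for MULTD, 4 for ADDD, 1 for SUBI and LD) and assumes every instruction passes through IF, ID, its execute stages, MEM and WB without stalling. The second program is a hypothetical case constructed to show a conflict.

```python
from collections import defaultdict

# Execute-stage counts read off the diagram (M1..M7, A1..A4, EX).
EXEC_CYCLES = {"MULTD": 7, "ADDD": 4, "SUBI": 1, "LD": 1}


def wb_cycle(fetch_cycle: int, op: str) -> int:
    """Cycle of the WB stage: IF, then ID, then the execute stages, MEM, WB."""
    return fetch_cycle + 1 + EXEC_CYCLES[op] + 2


def wb_conflicts(program):
    """Map each WB cycle used by more than one instruction to those instructions."""
    by_cycle = defaultdict(list)
    for fetch_cycle, op in program:
        by_cycle[wb_cycle(fetch_cycle, op)].append(op)
    return {cycle: ops for cycle, ops in by_cycle.items() if len(ops) > 1}


slide_program = [(1, "MULTD"), (2, "ADDD"), (3, "SUBI"), (4, "LD")]
print([wb_cycle(c, op) for c, op in slide_program])
print(wb_conflicts(slide_program))

# Hypothetical program: an ADDD fetched three cycles after a MULTD.
print(wb_conflicts([(1, "MULTD"), (4, "ADDD")]))
```

In the hypothetical case both instructions would want the register write port in the same cycle, which is the situation the stall in ID is meant to prevent.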

WAR and WAW Hazards for Multicycle Operations

– WAR hazards are a non-issue because operands are read in program order

– WAW hazards may occur

Example of a WAW hazard:

DIVF F0,F2,F4      FP divide, 24 cycles
SUBF F0,F8,F10     FP sub, 3 cycles

SUBF finishes before DIVF: out-of-order completion

WAW hazards are avoided by:

– stalling the SUBF until DIVF reaches the MEM stage, or

– disabling the write to register F0 for the DIVF instruction
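The DIVF/SUBF example can be checked by comparing the cycles in which the two instructions write F0. The sketch below assumes back-to-back fetch, no stalls, and the stage layout IF, ID, execute stages, MEM, WB; the dictionary layout and function names are hypothetical.

```python
def write_cycle(fetch_cycle: int, exec_cycles: int) -> int:
    """Cycle of the register write (WB): IF, ID, execute stages, MEM, WB."""
    return fetch_cycle + 1 + exec_cycles + 2


def waw_hazard(first: dict, second: dict) -> bool:
    """True if the later instruction writes the same register no later than the earlier one."""
    same_dest = first["dest"] == second["dest"]
    second_write = write_cycle(second["fetch_cycle"], second["exec_cycles"])
    first_write = write_cycle(first["fetch_cycle"], first["exec_cycles"])
    return same_dest and second_write <= first_write


divf = {"op": "DIVF", "dest": "F0", "fetch_cycle": 1, "exec_cycles": 24}
subf = {"op": "SUBF", "dest": "F0", "fetch_cycle": 2, "exec_cycles": 3}

print(write_cycle(1, 24), write_cycle(2, 3))
print(waw_hazard(divf, subf))
```

Without countermeasures, DIVF would write F0 last and leave the older value in the register, although program order requires the SUBF result to survive.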

Summary

Pipelining:

– Speeds up throughput, not latency

– Speedup ≤ #stages

– Hazards are fundamental limits:

• Structural: need more HW

• Data (RAW, WAR, WAW): need forwarding and compiler scheduling

• Control: delayed branch, branch prediction

Complications:

– Precise exceptions: maintain execution order

– ISA must be designed to match pipelining requirements

– Multi-cycle operations may result in out-of-order completion

– Out-of-order completion introduces WAW hazards and problems with precise interrupts
