
PIPELINE AND VECTOR PROCESSING

CHAPTER # 9

CONTENTS

Parallel Processing
Pipelining
Arithmetic Pipeline
Instruction Pipeline
RISC Pipeline
Vector Processing
Array Processors

Figure 9-1

Processor with multiple functional units

Processor registers (connected to memory) feed the following functional units:
Adder-subtractor
Integer multiply
Logic unit
Shift unit
Incrementer
Floating-point add-subtract
Floating-point multiply
Floating-point divide

Classification by instruction stream and data stream:

Single instruction stream, single data stream (SISD).

Single instruction stream, multiple data stream (SIMD).

Multiple instruction stream, single data stream (MISD).

Multiple instruction stream, multiple data stream (MIMD).

Figure 9-2

Example of pipelining.

Ai * Bi + Ci    for i = 1, 2, 3, ..., 7

R1 ← Ai, R2 ← Bi             Input Ai and Bi
R3 ← R1 * R2, R4 ← Ci        Multiply and input Ci
R5 ← R3 + R4                 Add Ci to product

The multiplier sits between segment 1 (R1, R2) and segment 2 (R3, R4).

Table 9-1

Content of registers in pipeline example.

Clock pulse   Segment 1       Segment 2           Segment 3
number        R1      R2      R3        R4        R5
1             A1      B1      ----      ----      ----
2             A2      B2      A1*B1     C1        ----
3             A3      B3      A2*B2     C2        A1*B1+C1
4             A4      B4      A3*B3     C3        A2*B2+C2
5             A5      B5      A4*B4     C4        A3*B3+C3
6             A6      B6      A5*B5     C5        A4*B4+C4
7             A7      B7      A6*B6     C6        A5*B5+C5
8             ----    ----    A7*B7     C7        A6*B6+C6
9             ----    ----    ----      ----      A7*B7+C7
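The register contents of Table 9-1 can be reproduced with a short simulation. This is a minimal sketch, not part of the original slides: the function name `simulate_pipeline` and the use of `None` for an empty register are illustrative assumptions.

```python
def simulate_pipeline(a, b, c):
    """Clock a three-segment pipeline that computes a[i] * b[i] + c[i].

    Returns one tuple (clock pulse, R1, R2, R3, R4, R5) per pulse.
    None marks a register that holds no valid data (assumed convention).
    """
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None
    history = []
    for pulse in range(n + 2):  # n inputs plus 2 pulses to drain the pipe
        # All registers load on the same clock pulse, so every new value
        # is computed from the old contents before anything is overwritten.
        new_r5 = r3 + r4 if r3 is not None else None       # segment 3: add
        new_r3 = r1 * r2 if r1 is not None else None       # segment 2: multiply
        new_r4 = c[pulse - 1] if 1 <= pulse <= n else None  # segment 2: input Ci
        new_r1 = a[pulse] if pulse < n else None            # segment 1: input Ai
        new_r2 = b[pulse] if pulse < n else None            # segment 1: input Bi
        r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
        history.append((pulse + 1, r1, r2, r3, r4, r5))
    return history


for row in simulate_pipeline([1, 2, 3], [4, 5, 6], [7, 8, 9]):
    print(row)
```

With seven input triples the simulation runs for nine clock pulses, matching the nine rows of the table.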

Figure 9-3

Four-segment pipeline.

Input → S1 → R1 → S2 → R2 → S3 → R3 → S4 → R4

Figure 9-4

Space-time diagram for pipeline.

Horizontal axis: clock cycles 1 to 9. Vertical axis: segments. Entries: tasks T1 to T6.
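A space-time diagram of this kind can be generated mechanically. The sketch below assumes four segments, which is consistent with six tasks finishing in nine clock cycles (4 + 6 - 1 = 9). The function names `space_time` and `render` are illustrative, not from the slides.

```python
def space_time(k, n):
    """Return rows[s][c]: the task number in segment s+1 at clock cycle c+1.

    None means the segment is idle in that cycle. A k-segment pipeline
    needs k + n - 1 cycles to complete n tasks.
    """
    cycles = k + n - 1
    rows = []
    for seg in range(k):
        row = []
        for cyc in range(cycles):
            task = cyc - seg  # task i enters segment s at cycle i + s
            row.append(task + 1 if 0 <= task < n else None)
        rows.append(row)
    return rows


def render(rows):
    """Format the diagram as text, one line per segment."""
    lines = []
    for seg, row in enumerate(rows, start=1):
        cells = " ".join(f"T{t}" if t else "--" for t in row)
        lines.append(f"Segment {seg}: {cells}")
    return "\n".join(lines)


print(render(space_time(4, 6)))
```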

Figure 9-5

Multiple functional units in parallel.

Arithmetic Pipeline

1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.

Figure 9-6

Pipeline for floating-point addition and subtraction.

Inputs: exponents a, b and mantissas A, B.
Segment 1: Compare exponents by subtraction.
Segment 2: Choose exponent; align mantissas using the exponent difference.
Segment 3: Add or subtract mantissas.
Segment 4: Adjust exponent; normalize result.
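The four suboperations can be sketched in software on decimal (mantissa, exponent) pairs. This is an illustration only: the function name `fp_add`, the use of Python's `decimal.Decimal`, the normalization range 0.1 <= |mantissa| < 1, and the sample operands are all assumptions, not taken from the slides. Subtraction is handled by passing a negative mantissa.

```python
from decimal import Decimal


def fp_add(x, y):
    """Add two numbers given as (mantissa, exponent), value = mantissa * 10**exponent."""
    (ma, ea), (mb, eb) = x, y

    # Segment 1: compare the exponents by subtraction.
    diff = ea - eb

    # Segment 2: choose the larger exponent and align the other mantissa
    # by shifting it right by |diff| decimal places.
    if diff >= 0:
        exp = ea
        mb = mb.scaleb(-diff)
    else:
        exp = eb
        ma = ma.scaleb(diff)

    # Segment 3: add the mantissas (a negative mantissa gives subtraction).
    m = ma + mb

    # Segment 4: normalize the result and adjust the exponent.
    while abs(m) >= 1:
        m = m.scaleb(-1)
        exp += 1
    while m != 0 and abs(m) < Decimal("0.1"):
        m = m.scaleb(1)
        exp -= 1
    return m, exp


# 0.9504 * 10**3 + 0.8200 * 10**2
print(fp_add((Decimal("0.9504"), 3), (Decimal("0.8200"), 2)))
```

In hardware, each commented step would be one pipeline segment separated from the next by registers, so four different operand pairs can be in progress at once.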

Instruction Pipeline

1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.

Figure 9-7

Four-segment CPU pipeline.

Segment 1: Fetch instruction from memory.
Segment 2: Decode instruction and calculate effective address. Branch?
Segment 3: Fetch operand from memory.
Segment 4: Execute instruction. Interrupt?
On a branch, or after interrupt handling: Update PC, Empty pipe.

FI is the segment that fetches an instruction.

DA is the segment that decodes the instruction and calculates the effective address.

FO is the segment that fetches the operand.

EX is the segment that executes the instruction.

Segments and their purpose.

Figure 9-8

Timing of instruction pipeline.

Step:            1   2   3   4   5   6   7   8   9   10  11  12  13
Instruction 1    FI  DA  FO  EX
            2        FI  DA  FO  EX
   (Branch) 3            FI  DA  FO  EX
            4                FI  --  --  FI  DA  FO  EX
            5                    --  --  --  FI  DA  FO  EX
            6                                FI  DA  FO  EX
            7                                    FI  DA  FO  EX
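The effect of a branch on the timing of the four-segment pipeline can be sketched as follows. The model is an assumption made for illustration: the instruction after the branch is fetched once and discarded, and fetching resumes in the step after the branch leaves EX. The names `pipeline_timing` and `branch_at` are hypothetical.

```python
SEGMENTS = ["FI", "DA", "FO", "EX"]


def pipeline_timing(n, branch_at=None):
    """Return one {step: segment} dict per instruction (numbered from 1).

    branch_at is the number of the instruction that branches, or None.
    """
    rows = []
    start = 1          # step in which the next instruction is fetched
    branch_ex = None   # step in which the branch occupies EX
    for i in range(1, n + 1):
        row = {}
        if branch_ex is not None and i == branch_at + 1:
            row[start] = "FI"        # wasted fetch, discarded afterwards
            start = branch_ex + 1    # refetch once the branch has executed
        for offset, seg in enumerate(SEGMENTS):
            row[start + offset] = seg
        if i == branch_at:
            branch_ex = start + len(SEGMENTS) - 1
        rows.append(row)
        start += 1
    return rows


for number, row in enumerate(pipeline_timing(7, branch_at=3), start=1):
    print(number, row)
```

Under this model, seven instructions with a branch at instruction 3 finish in 13 steps, against 10 steps with no branch.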

Pipeline Conflicts

Resource conflicts
Data dependency conflicts
Branch difficulties

Three-segment instruction pipeline

I: Instruction fetch
A: ALU operation
E: Execute instruction

Delayed Load

LOAD:   R1 ← M[address 1]
LOAD:   R2 ← M[address 2]
ADD:    R3 ← R1 + R2
STORE:  M[address 3] ← R3

Figure 9-9

Three-segment pipeline timing.

(a) Pipeline timing with data conflict

Clock cycles     1  2  3  4  5  6
1. Load R1       I  A  E
2. Load R2          I  A  E
3. Add R1+R2           I  A  E
4. Store R3               I  A  E

(b) Pipeline timing with delayed load

Clock cycles     1  2  3  4  5  6  7
1. Load R1       I  A  E
2. Load R2          I  A  E
3. No-operation        I  A  E
4. Add R1+R2              I  A  E
5. Store R3                  I  A  E
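A compiler-style pass that applies the delayed load can be sketched as below. It is a simplified illustration under stated assumptions: instructions are represented as hypothetical (opcode, destination, sources) tuples, and a no-operation is inserted only when the instruction immediately after a load reads the loaded register.

```python
NOP = ("NOP", None, ())


def insert_delayed_loads(program):
    """Insert a no-operation after any LOAD whose result is used by the next instruction."""
    out = []
    for idx, instr in enumerate(program):
        out.append(instr)
        opcode, dest, _sources = instr
        if opcode == "LOAD" and idx + 1 < len(program):
            next_sources = program[idx + 1][2]
            if dest in next_sources:
                out.append(NOP)
    return out


def clock_cycles(program, segments=3):
    """Cycles to run a program with no stalls in a pipeline of the given depth."""
    return len(program) + segments - 1


program = [
    ("LOAD", "R1", ("M[address 1]",)),
    ("LOAD", "R2", ("M[address 2]",)),
    ("ADD", "R3", ("R1", "R2")),
    ("STORE", "M[address 3]", ("R3",)),
]

fixed = insert_delayed_loads(program)
print([instr[0] for instr in fixed], clock_cycles(fixed))
```

The first load needs no padding because the instruction after it does not read R1. Only the second load is followed directly by an instruction that uses its result.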

Figure 9-10

Examples of delayed branch.

(a) Using no-operation instructions

Clock cycles          1  2  3  4  5  6  7  8  9  10
1. Load               I  A  E
2. Increment             I  A  E
3. Add                      I  A  E
4. Subtract                    I  A  E
5. Branch to X                    I  A  E
6. No-operation                      I  A  E
7. No-operation                         I  A  E
8. Instruction in X                        I  A  E

(b) Rearranging the instructions

Clock cycles          1  2  3  4  5  6  7  8
1. Load               I  A  E
2. Increment             I  A  E
3. Branch to X              I  A  E
4. Add                         I  A  E
5. Subtract                       I  A  E
6. Instruction in X                  I  A  E
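The two ways of filling the branch delay slots can be compared with a small sketch. The helper names are hypothetical, and the rearrangement assumes that the instructions moved into the delay slots do not affect the branch and must execute whether or not the branch is taken. Two delay slots are assumed, matching the two no-operation instructions in the figure.

```python
def with_nops(program, slots=2):
    """Fill the delay slots after each branch with no-operation instructions."""
    out = []
    for instr in program:
        out.append(instr)
        if instr.startswith("Branch"):
            out.extend(["No-operation"] * slots)
    return out


def rearranged(program, slots=2):
    """Move a single branch `slots` places earlier so useful work fills the delay slots."""
    out = list(program)
    for i, instr in enumerate(program):
        if instr.startswith("Branch") and i >= slots:
            out[i - slots:i + 1] = [instr] + program[i - slots:i]
            break  # this sketch handles one branch only
    return out


def clock_cycles(n_instructions, segments=3):
    """Cycles for n instructions to pass through the pipeline without stalls."""
    return n_instructions + segments - 1


program = ["Load", "Increment", "Add", "Subtract", "Branch to X"]

# The +1 counts the instruction fetched from the branch target X.
print(clock_cycles(len(with_nops(program)) + 1))
print(clock_cycles(len(rearranged(program)) + 1))
```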

Application of Vector Processing

Long-range weather forecasting
Petroleum explorations
Seismic data analysis
Medical diagnosis
Aerodynamics and space flight simulations

Figure 9-11

Instruction format for vector processor

Operation code | Base address source 1 | Base address source 2 | Base address destination | Vector length
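The five fields of this instruction format map naturally onto a small record. The sketch below is illustrative only: the class name `VectorInstruction`, the field names, the flat list standing in for memory, and the two opcodes are assumptions, not a real instruction set.

```python
from dataclasses import dataclass


@dataclass
class VectorInstruction:
    opcode: str      # operation code
    base_src1: int   # base address of source 1
    base_src2: int   # base address of source 2
    base_dest: int   # base address of destination
    length: int      # vector length


def execute(instr, memory):
    """Apply the operation element by element to two vectors held in memory."""
    operations = {
        "ADD": lambda x, y: x + y,
        "MUL": lambda x, y: x * y,
    }
    op = operations[instr.opcode]
    for i in range(instr.length):
        memory[instr.base_dest + i] = op(
            memory[instr.base_src1 + i], memory[instr.base_src2 + i]
        )


# Two 3-element vectors at addresses 0 and 3, with the result stored at address 6.
memory = [1, 2, 3, 10, 20, 30, 0, 0, 0]
execute(VectorInstruction("ADD", 0, 3, 6, 3), memory)
print(memory[6:9])
```

A single instruction of this form replaces the loop of scalar fetch, add, store, and branch instructions that a conventional processor would execute.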

Figure 9-12

Pipeline for calculating an inner product

Source A and Source B → Multiplier pipeline → Adder pipeline
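One way to picture how the adder pipeline accumulates products is to keep one running partial sum per adder segment and combine them at the end. This is a sketch under assumptions: the slides do not state the depth of the adder pipeline, so the default of four segments and the function name are illustrative.

```python
def pipelined_inner_product(a, b, adder_segments=4):
    """Inner product accumulated as interleaved partial sums.

    Product i is added to partial sum (i mod adder_segments), which mimics
    an adder pipeline whose output returns to its input adder_segments
    cycles later.
    """
    partial = [0] * adder_segments
    for i, (x, y) in enumerate(zip(a, b)):
        product = x * y                           # multiplier pipeline output
        partial[i % adder_segments] += product    # adder pipeline accumulation
    return sum(partial)                           # final combination of partial sums


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]
print(pipelined_inner_product(a, b))
```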

Figure 9-13

Multiple module memory organization

Four memory modules, each with its own address register (AR), memory array, and data register (DR), connected to a common address bus and data bus.
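With several memory modules, consecutive addresses are often spread across the modules so that vector elements can be accessed in overlapped fashion. The sketch below assumes low-order interleaving across four modules. The figure shows only the organization, so the mapping and the function name are assumptions.

```python
def module_and_offset(address, modules=4):
    """Map an address to (module number, word within that module).

    Assumes low-order interleaving: the low-order part of the address
    selects the module, the rest selects the word inside it.
    """
    return address % modules, address // modules


# Consecutive addresses fall in different modules.
print([module_and_offset(a) for a in range(8)])
```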

Types of Array Processors

Attached array processor
SIMD array processor

Figure 9-14

Attached array processor with host computer

General-purpose computer ↔ Input-output interface ↔ Attached array processor
Main memory ↔ High-speed memory-to-memory bus ↔ Local memory

Figure 9-15

SIMD array processor organization

Master control unit

Main memory
