Advanced Topics
Chapter 17: Parallelism
CS250 -- Part V
Dr. Rajesh Subramanyan, 2005
Topics
•Introduction
•Characterizations of Parallelism
•Microscopic vs. Macroscopic
•Symmetric vs. Asymmetric
•Fine Grain vs. Coarse Grain
•Explicit vs. Implicit
•Parallel Architectures
•SISD
•SIMD
Topics
•MIMD
•Symmetric Multiprocessors
•Communication, Coordination, Contention
•Performance of Multiprocessors
•Consequences for Programmers
•Locks and Mutual Exclusion
•Programming Explicit and Implicit Parallel Computers
•Redundant Parallel Architectures
•Distributed and Cluster Computers
Introduction
•Previous parts covered
−processors
−memory
−I/O
•Now
−advanced topics
−use of parallelism to increase speed
−parallel hardware, taxonomy, limitations, etc.
•Parallel and pipelined architectures
−the two fundamental techniques for increasing speed
Characterizations of Parallelism
•Parallel and non-parallel are the extremes
•The grey region in between describes the amount and type of parallelism
−microscopic vs. macroscopic
−symmetric vs. asymmetric
−fine grain vs. coarse grain
−explicit vs. implicit
Microscopic vs. Macroscopic
•Microscopic
−parallelism in a computer remains hidden inside subcomponents
−parallel hardware within a specific component
•Macroscopic
−parallelism is a basic premise around which the system is designed
Examples
•Microscopic
−ALU, registers, physical memory, parallel bus architecture
•Macroscopic
−use of parallelism across multiple large-scale components of a computer system: multiple identical processors, multiple dissimilar processors
Symmetric vs. Asymmetric
•Symmetric
−replication of identical elements
−e.g. dual processors
•Asymmetric
−multiple elements that function at the same time but differ from each other
−e.g. a PC with a graphics coprocessor and a math coprocessor
Fine Grain vs. Coarse Grain
•Fine Grain
−parallelism at the level of individual instructions or individual data elements
−e.g. a graphics processor that uses 16 hardware units to update 16 bytes of an image at the same time
•Coarse Grain
−parallelism at the level of programs or large blocks of data
−e.g. a dual processor using one processor each to print and compose an email simultaneously
Explicit vs. Implicit
•Implicit
−hardware handles parallelism automatically, without requiring a programmer to initiate or control parallel execution
•Explicit
−programmer must control each parallel unit
Parallel Architectures
•Parallel architectures: parallelism is one of the central features around which the entire system is designed
•Flynn's classification

    Name    Meaning
    SISD    Single Instruction stream, Single Data stream
    SIMD    Single Instruction stream, Multiple Data streams
    MIMD    Multiple Instruction streams, Multiple Data streams

Terminology used to characterize computers according to the amount and type of parallelism. Note that hybrids spanning multiple types also exist.
SISD
•Single Instruction stream, Single Data stream
•Also called sequential or uniprocessor architecture
•Follows the Von Neumann architecture
•Does not support macroscopic parallelism, but can use parallelism internally
SIMD
•Single Instruction stream, Multiple Data streams
•Each instruction specifies a single operation applied to many data items simultaneously
•Vector, array processors

    for i from 1 to N {
        V[i] ← V[i] × Q;
    }

Algorithm for vector normalization in a sequential computer.
•The same computation in a SIMD computer:

    V ← V × Q

•Graphics processors
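To make this concrete, here is a minimal C sketch of the same V ← V × Q scaling using x86 SSE intrinsics, where one multiply instruction operates on four floats at once. The sketch is not from the slides; the function name and the choice of SSE are illustrative assumptions.

    #include <immintrin.h>   /* x86 SSE intrinsics */

    /* Scale the n-element vector v by q, four elements per instruction:
       a single operation applied to multiple data items simultaneously. */
    void scale_simd(float *v, int n, float q)
    {
        __m128 qv = _mm_set1_ps(q);                  /* broadcast q to all 4 lanes */
        int i = 0;
        for (; i + 4 <= n; i += 4) {
            __m128 x = _mm_loadu_ps(&v[i]);          /* load 4 floats */
            _mm_storeu_ps(&v[i], _mm_mul_ps(x, qv)); /* multiply 4 at once, store */
        }
        for (; i < n; i++)                           /* scalar tail for leftovers */
            v[i] *= q;
    }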
MIMD
•Multiple Instruction streams, Multiple Data streams
•Each processor performs independent computations at the same time
•Examples of MIMD
−symmetric multiprocessors (SMP)
−asymmetric multiprocessors (AMP)
−math and graphics coprocessors
−I/O processors
−CDC peripheral processors
Symmetric Multiprocessors

[Figure: processors P1 ... PN connected to main memory (various modules) and devices]

Symmetric multiprocessor with N identical processors. Each processor has access to memory and I/O devices.
Performance of Multiprocessors
•Speedup = execution time on a single processor / execution time required on a multiprocessor
[Figure: relative speedup vs. number of processors (1-10), ideal and typical curves]

Illustration of the ideal and typical performance of a multiprocessor as the number of processors is increased. Values on the y-axis list the relative speedup compared to a single processor.
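A small worked example of the definition (the timings are invented for illustration): if a program takes 100 seconds on a single processor and 30 seconds on four processors, then

    Speedup = T(1) / T(4) = 100 / 30 ≈ 3.3   (ideal: 4.0)

so the machine falls short of the ideal linear speedup.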
Performance of Multiprocessors
•Reasons why multiprocessor architectures have not fulfilled the promise of scalable, high-performance computing:
−OS bottlenecks
−memory contention
−I/O bottlenecks

When used for general-purpose computing, a multiprocessor does not perform well. In some cases, added overhead means performance decreases as more processors are added.
Consequences for Programmers
•Locks and mutual exclusion
•Programming explicit and implicit parallel computers
•Programming symmetric and asymmetric multiprocessors
Locks and Mutual Exclusion
•Shared variables in multiprocessors face the following problem:
−What happens if two processors attempt to increment a variable at the same time?
−The variable may be incremented only once instead of the correct two times.
•One solution: locks
Locks and Mutual Exclusion

    load  x, R5
    incr  R5
    store R5, x

An example sequence of machine instructions used to increment a variable in memory. In most architectures, an increment entails a load and a store.
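The lost-update problem is easy to reproduce in software. Below is a hypothetical pthreads sketch (not from the slides; compile with -pthread) in which two threads increment a shared variable with no lock. Because each x++ expands to a load/incr/store sequence like the one above, interleaved updates are lost and the final count usually falls short of the expected 2000000. (The racy access is also undefined behavior in C, which is precisely why synchronization is required.)

    #include <pthread.h>
    #include <stdio.h>

    static long x = 0;                    /* shared variable, NOT protected */

    static void *worker(void *arg)
    {
        for (int i = 0; i < 1000000; i++)
            x++;                          /* load, incr, store: not atomic */
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("x = %ld\n", x);           /* typically less than 2000000 */
        return 0;
    }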
Locks

    lock    17
    load    x, R5
    incr    R5
    store   R5, x
    release 17

Illustration of the instructions used to guarantee exclusive access to a variable. A separate lock is assigned to each shared item.
•Only one copy of the lock exists, so only one processor can gain access to the lock at a time.
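In software, the lock/release pair corresponds to acquiring and releasing a mutex. A minimal sketch, assuming POSIX threads (not part of the slides), of the same increment made safe:

    #include <pthread.h>

    static long x = 0;
    static pthread_mutex_t x_lock = PTHREAD_MUTEX_INITIALIZER; /* one lock per shared item */

    static void increment_x(void)
    {
        pthread_mutex_lock(&x_lock);    /* like "lock 17": one thread proceeds */
        x++;                            /* the load/incr/store is now exclusive */
        pthread_mutex_unlock(&x_lock);  /* like "release 17" */
    }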
Locks
•Problems
−programmers forget to apply locks to shared variables
−errors due to simultaneous access depend on timing and are difficult to detect
−locking can reduce performance
−locking adds overhead
Programming Explicit and Implicit Parallel Computers

From a programmer's point of view, a system that uses explicit parallelism is significantly more complex to program than a system that uses implicit parallelism.
Redundant Parallel Architectures
•Use of parallel hardware to improve reliability and prevent failure
•Multiple copies of a hardware unit operate in parallel to perform an operation (same data and same operation on all units)
•What happens if redundant copies of hardware disagree?
−votes?
−error message?
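As a sketch of the voting idea (hypothetical, not from the slides): with three redundant copies, a majority vote masks one faulty unit, and total disagreement is reported as an error.

    #include <stdio.h>

    /* Majority vote over three redundant results. Returns 0 and stores the
       agreed value in *out; returns -1 if all three copies disagree. */
    static int vote3(int a, int b, int c, int *out)
    {
        if (a == b || a == c) { *out = a; return 0; } /* a has a majority */
        if (b == c)           { *out = b; return 0; } /* a is the odd one out */
        return -1;                                    /* no majority: error */
    }

    int main(void)
    {
        int v;
        if (vote3(7, 7, 9, &v) == 0)
            printf("agreed value: %d\n", v);  /* prints 7: faulty unit outvoted */
        else
            printf("error: redundant units disagree\n");
        return 0;
    }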
Distributed and Cluster Computers
•Tightly coupled
−parallel hardware units are located inside the same computer system
•Loosely coupled
−multiple computer systems that are interconnected by a communication mechanism
−e.g. distributed architectures, cluster computers
Distributed and Cluster Computers
•Distributed architecture
−set of computers connected by a computer network or the Internet
•Cluster computer
−set of independent computers connected by a high-speed computer network, all dedicated to solving one problem at a time
−also called a network cluster
Summary
•Parallelism is one of the fundamental optimization techniques used to increase hardware performance
•Explicit parallelism gives a programmer control over the use of parallel facilities; implicit parallelism handles parallelism automatically
•The SIMD architecture allows an instruction to operate on an array of values
−vector processors
−graphics processors
Summary
•MIMD employs multiple, independent processors that operate simultaneously and can each execute a separate program
−symmetric multiprocessors
−asymmetric multiprocessors
•Speedup
−theoretically, a machine with N processors should perform N times as fast as a single processor
−in practice, speedup does not increase linearly with the number of processors, due to memory contention and communication overhead
•Alternatives to SIMD and MIMD are redundant, distributed, and cluster architectures