Advanced Topics
Chapter 17: Parallelism
CS250 -- Part V
Dr. Rajesh Subramanyan, 2005
Topics
•Introduction
•Characterizations of Parallelism
•Microscopic vs. Macroscopic
•Symmetric vs. Asymmetric
•Fine Grain vs. Coarse Grain
•Explicit vs. Implicit
•Parallel Architectures
•SISD
•SIMD
Topics
•MIMD
•Symmetric Multiprocessors
•Communication, Coordination, Contention
•Performance of Multiprocessors
•Consequences for Programmers
•Locks and Mutual Exclusion
•Programming Explicit and Implicit Parallel Computers
•Redundant Parallel Architectures
•Distributed and Cluster Computers
Introduction
•Previous parts covered
−processors
−memory
−I/O
•Now
−advanced topics
−use of parallelism to increase speed
−parallel hardware, taxonomy, limitations, etc.
•Parallel and pipelined architectures
−the two fundamental techniques for increasing speed
Characterizations of Parallelism
•Parallel and non-parallel are the extremes
•The grey region in between describes the amount and type of parallelism
−microscopic vs. macroscopic
−symmetric vs. asymmetric
−fine grain vs. coarse grain
−explicit vs. implicit
Microscopic vs. Macroscopic
•Microscopic
−parallelism in a computer remains hidden inside subcomponents
−parallel hardware within a specific component
•Macroscopic
−parallelism is a basic premise around which the system is designed
Examples
•Microscopic
−ALU, registers, physical memory, parallel bus architecture
•Macroscopic
−use of parallelism across multiple large-scale components of a computer system: multiple identical processors, multiple dissimilar processors
Symmetric vs. Asymmetric
•Symmetric
−replication of identical elements
−e.g. dual processors
•Asymmetric
−multiple elements that function at the same time but differ from each other
−e.g. a PC with a graphics coprocessor and a math coprocessor
Fine Grain vs. Coarse Grain
•Fine Grain
−parallelism at the level of individual instructions or individual data elements
−e.g. a graphics processor that uses 16 hardware units to update 16 bytes of an image at the same time
•Coarse Grain
−parallelism at the level of programs or large blocks of data
−e.g. a dual processor using one processor each to print and compose an email simultaneously
Explicit vs. Implicit
•Implicit
−hardware handles parallelism automatically, without requiring a programmer to initiate or control parallel execution
•Explicit
−programmer must control each parallel unit
Parallel Architectures
•Parallel architectures: parallelism is one of the central features around which the entire system is designed
•Flynn's classification

    Name    Meaning
    SISD    Single Instruction stream, Single Data stream
    SIMD    Single Instruction stream, Multiple Data streams
    MIMD    Multiple Instruction streams, Multiple Data streams

Terminology used to characterize computers according to the amount and type of parallelism. Note that hybrids spanning multiple types also exist.
SISD
•Single Instruction stream, Single Data stream
•Also called sequential or uniprocessor architecture
•Follows the Von Neumann architecture
•Does not support macroscopic parallelism, but can use parallelism internally
SIMD
•Single Instruction stream, Multiple Data streams
•Each instruction specifies a single operation applied to many data items simultaneously
•Vector, array processors

    for i from 1 to N {
        V[i] ← V[i] × Q;
    }

Algorithm for vector normalization in a sequential computer.
•The same computation in a SIMD computer:

    V ← V × Q

•Graphics processors
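To make this concrete, here is a minimal C sketch of the same V ← V × Q scaling using x86 SSE intrinsics, where one multiply instruction operates on four floats at once. The sketch is not from the slides; the function name and the choice of SSE are illustrative assumptions.

    #include <immintrin.h>   /* x86 SSE intrinsics */

    /* Scale the n-element vector v by q, four elements per instruction:
       a single operation applied to multiple data items simultaneously. */
    void scale_simd(float *v, int n, float q)
    {
        __m128 qv = _mm_set1_ps(q);                  /* broadcast q to all 4 lanes */
        int i = 0;
        for (; i + 4 <= n; i += 4) {
            __m128 x = _mm_loadu_ps(&v[i]);          /* load 4 floats */
            _mm_storeu_ps(&v[i], _mm_mul_ps(x, qv)); /* multiply 4 at once, store */
        }
        for (; i < n; i++)                           /* scalar tail for leftovers */
            v[i] *= q;
    }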
MIMD
•Multiple Instruction streams, Multiple Data streams
•Each processor performs independent computations at the same time
•Examples of MIMD
−symmetric multiprocessors (SMP)
−asymmetric multiprocessors (AMP)
−math and graphics coprocessors
−I/O processors
−CDC peripheral processors
Symmetric Multiprocessors

[Figure: processors P1 ... PN connected to main memory (various modules) and devices]

Symmetric multiprocessor with N identical processors. Each processor has access to memory and I/O devices.
Performance of Multiprocessors
•Speedup = execution time on a single processor / execution time required on a multiprocessor
[Figure: relative speedup vs. number of processors (1-10), ideal and typical curves]

Illustration of the ideal and typical performance of a multiprocessor as the number of processors is increased. Values on the y-axis list the relative speedup compared to a single processor.
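A small worked example of the definition (the timings are invented for illustration): if a program takes 100 seconds on a single processor and 30 seconds on four processors, then

    Speedup = T(1) / T(4) = 100 / 30 ≈ 3.3   (ideal: 4.0)

so the machine falls short of the ideal linear speedup.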
Performance of Multiprocessors
•Reasons why multiprocessor architectures have not fulfilled the promise of scalable, high-performance computing:
−OS bottlenecks
−memory contention
−I/O bottlenecks

When used for general-purpose computing, a multiprocessor does not perform well. In some cases, added overhead means performance decreases as more processors are added.
Consequences for Programmers
•Locks and mutual exclusion
•Programming explicit and implicit parallel computers
•Programming symmetric and asymmetric multiprocessors
Locks and Mutual Exclusion
•Shared variables in multiprocessors face the following problem:
−What happens if two processors attempt to increment a variable at the same time?
−The variable may be incremented only once instead of the correct two times.
•One solution: locks
Locks and Mutual Exclusion

    load  x, R5
    incr  R5
    store R5, x

An example sequence of machine instructions used to increment a variable in memory. In most architectures, an increment entails a load and a store.
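The lost-update problem is easy to reproduce in software. Below is a hypothetical pthreads sketch (not from the slides; compile with -pthread) in which two threads increment a shared variable with no lock. Because each x++ expands to a load/incr/store sequence like the one above, interleaved updates are lost and the final count usually falls short of the expected 2000000. (The racy access is also undefined behavior in C, which is precisely why synchronization is required.)

    #include <pthread.h>
    #include <stdio.h>

    static long x = 0;                    /* shared variable, NOT protected */

    static void *worker(void *arg)
    {
        for (int i = 0; i < 1000000; i++)
            x++;                          /* load, incr, store: not atomic */
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("x = %ld\n", x);           /* typically less than 2000000 */
        return 0;
    }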
Locks

    lock    17
    load    x, R5
    incr    R5
    store   R5, x
    release 17

Illustration of the instructions used to guarantee exclusive access to a variable. A separate lock is assigned to each shared item.
•Only one copy of the lock exists, so only one processor can gain access to the lock at a time.
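In software, the lock/release pair corresponds to acquiring and releasing a mutex. A minimal sketch, assuming POSIX threads (not part of the slides), of the same increment made safe:

    #include <pthread.h>

    static long x = 0;
    static pthread_mutex_t x_lock = PTHREAD_MUTEX_INITIALIZER; /* one lock per shared item */

    static void increment_x(void)
    {
        pthread_mutex_lock(&x_lock);    /* like "lock 17": one thread proceeds */
        x++;                            /* the load/incr/store is now exclusive */
        pthread_mutex_unlock(&x_lock);  /* like "release 17" */
    }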
Locks
•Problems
−programmers forget to apply locks to shared variables
−errors due to simultaneous access depend on timing and are difficult to detect
−locking can reduce performance
−locking adds overhead
Programming Explicit and Implicit Parallel Computers

From a programmer's point of view, a system that uses explicit parallelism is significantly more complex to program than a system that uses implicit parallelism.
Redundant Parallel Architectures
•Use of parallel hardware to improve reliability and prevent failure
•Multiple copies of a hardware unit operate in parallel to perform an operation (same data and same operation on all units)
•What happens if redundant copies of hardware disagree?
−votes?
−error message?
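As a sketch of the voting idea (hypothetical, not from the slides): with three redundant copies, a majority vote masks one faulty unit, and total disagreement is reported as an error.

    #include <stdio.h>

    /* Majority vote over three redundant results. Returns 0 and stores the
       agreed value in *out; returns -1 if all three copies disagree. */
    static int vote3(int a, int b, int c, int *out)
    {
        if (a == b || a == c) { *out = a; return 0; } /* a has a majority */
        if (b == c)           { *out = b; return 0; } /* a is the odd one out */
        return -1;                                    /* no majority: error */
    }

    int main(void)
    {
        int v;
        if (vote3(7, 7, 9, &v) == 0)
            printf("agreed value: %d\n", v);  /* prints 7: faulty unit outvoted */
        else
            printf("error: redundant units disagree\n");
        return 0;
    }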
Distributed and Cluster Computers
•Tightly coupled
−parallel hardware units are located inside the same computer system
•Loosely coupled
−multiple computer systems that are interconnected by a communication mechanism
−e.g. distributed architectures, cluster computers
Distributed and Cluster Computers
•Distributed architecture
−set of computers connected by a computer network or the Internet
•Cluster computer
−set of independent computers connected by a high-speed computer network, all dedicated to solving one problem at a time
−also called a network cluster
Summary
•Parallelism is one of the fundamental optimization techniques used to increase hardware performance
•Explicit parallelism gives a programmer control over the use of parallel facilities; implicit parallelism handles parallelism automatically
•The SIMD architecture allows an instruction to operate on an array of values
−vector processors
−graphics processors
Summary
•MIMD employs multiple, independent processors that operate simultaneously and can each execute a separate program
−symmetric multiprocessors
−asymmetric multiprocessors
•Speedup
−theoretically, a machine with N processors should perform N times as fast as a single processor
−in practice, speedup does not increase linearly with the number of processors, due to memory contention and communication overhead
•Alternatives to SIMD and MIMD are redundant, distributed, and cluster architectures