chpt5 (1)
7/24/2019 Chpt5 (1)
1/68
Chapter 5
The Sieve of Eratosthenes
Chapter Objectives
Analysis of block allocation schemes
Function MPI_Bcast
Performance enhancements
Focus Problem: The Greek mathematician Eratosthenes (Er'-a-tas'-the-nez, 276-194 BC) wanted to find a way of generating the prime numbers up to some number n.
  - No formula will generate these primes.
  - However, he devised a method which has become known as the sieve of Eratosthenes.
Outline to the Solution
The sequential algorithm
Sources of parallelism
Data decomposition options
Parallel algorithm development, analysis
An MPI program
Benchmarking
Optimizations
Sieve of Eratosthenes
Sequential Algorithm in Pseudocode
1. Create a list of unmarked natural numbers 2, 3, …, n
2. k ← 2
3. Repeat
   (a) Mark all multiples of k between k² and n
   (b) k ← smallest unmarked number > k
   until k² > n
4. The unmarked numbers are primes
Data Structure Used For Sequential Algorithm
Assume a Boolean array of n - 1 elements.
Array indices are 0 through n-2 and they represent the numbers 2, 3, ..., n.
The boolean value at index i represents whether or not the number i+2 is marked.
  - Indices that are marked represent composite numbers (i.e., not prime).
Initially, all numbers are unmarked.
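A minimal C sketch of the sequential algorithm using the array layout above, where `marked[i]` stands for the number i + 2. The function name `sequential_sieve` is hypothetical, and counting the unmarked cells at the end is just one way of reporting the primes.

```c
#include <assert.h>
#include <stdlib.h>

/* Sequential sieve using the layout above: marked[i] stands for the
 * number i + 2, for i = 0 .. n-2.  Returns the number of primes <= n,
 * or -1 if the array cannot be allocated. */
int sequential_sieve(int n)
{
    assert(n >= 2);
    int size = n - 1;                   /* cells for 2, 3, ..., n */
    char *marked = calloc(size, 1);     /* all numbers start unmarked (0) */
    if (marked == NULL)
        return -1;

    long k = 2;                         /* step 2: k <- 2 */
    do {
        /* step 3(a): mark all multiples of k between k*k and n */
        for (long j = k * k; j <= n; j += k)
            marked[j - 2] = 1;
        /* step 3(b): k <- smallest unmarked number > k */
        do {
            k++;
        } while (k <= n && marked[k - 2]);
    } while (k * k <= n);               /* until k*k > n */

    /* step 4: the unmarked numbers are primes */
    int count = 0;
    for (int i = 0; i < size; i++)
        if (!marked[i])
            count++;
    free(marked);
    return count;
}
```

For example, `sequential_sieve(100)` returns 25, the number of primes up to 100.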
One Method to Parallelize
Because the focus of the algorithm is the marking of elements in an array, domain decomposition makes sense.
Domain decomposition
  - Divide data into n-1 pieces
  - Associate computational steps with data
One primitive task per array element
  - These will be agglomerated into larger groups of elements.
Parallelizing Algorithm Step 3(a)
Recall Step 3(a): Mark all multiples of k between k² and n
The following straightforward modification allows this to be computed in parallel:
for all j where k² ≤ j ≤ n do
   if j mod k = 0 then
      mark j (i.e., it is not a prime)
   endif
endfor
Each j above represents a primitive task.
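To see why each j can be its own primitive task, here is a hedged C sketch of the loop above (the name `mark_multiples_naive` is hypothetical, and it reuses the layout where cell j - 2 stands for the number j). Each iteration reads only j and writes only its own cell, so the iterations are independent; the sketch also counts the modulo tests, which is the cost that fast marking in a block decomposition avoids.

```c
#include <assert.h>

/* Step 3(a) as independent primitive tasks: the task for j tests j mod k
 * and marks j if k divides it.  marked[j - 2] stands for the number j.
 * No iteration touches another iteration's cell, so the iterations could
 * run in any order, or in parallel.  Returns the number of modulo tests. */
long mark_multiples_naive(char *marked, long n, long k)
{
    assert(k >= 2);
    long tests = 0;
    for (long j = k * k; j <= n; j++) {    /* for all j where k*k <= j <= n */
        tests++;
        if (j % k == 0)
            marked[j - 2] = 1;             /* j is not a prime */
    }
    return tests;
}
```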
Parallelizing Algorithm Step 3(b)
Recall Step 3(b): Find smallest unmarked number > k
Parallelizing requires two steps:
  - Min-reduction (to find smallest unmarked number > k)
  - Broadcast (to get result to all tasks)
Plus, remember these are in a repeat-until loop which loops until k² > n.
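A serial C simulation of these two steps, assuming p tasks that each own a contiguous block of the marked array (cell i again stands for the number i + 2). The names `local_candidate`, `next_k_min_reduce`, `block_low`, and `block_size` are hypothetical; in a message-passing program the loops over tasks would be replaced by an actual reduction and broadcast across processes.

```c
#include <assert.h>
#include <limits.h>

/* Smallest unmarked number > k inside one task's block, or INT_MAX
 * (the identity for MIN) if the block has none. */
static int local_candidate(const char *marked, int low, int size, int k)
{
    for (int i = low; i < low + size; i++)
        if (!marked[i] && i + 2 > k)
            return i + 2;
    return INT_MAX;
}

/* Min-reduction over the p local candidates, followed by a simulated
 * broadcast: every task receives the same global minimum. */
int next_k_min_reduce(const char *marked, const int *block_low,
                      const int *block_size, int p, int k, int *received)
{
    int global_min = INT_MAX;
    for (int t = 0; t < p; t++) {                 /* min-reduction */
        int c = local_candidate(marked, block_low[t], block_size[t], k);
        if (c < global_min)
            global_min = c;
    }
    for (int t = 0; t < p; t++)                   /* broadcast */
        received[t] = global_min;
    return global_min;
}
```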
Good News - Bad News
We have found lots of parallelism to e
Agglomeration Goals
We want to:
  - Consolidate tasks
  - Reduce communication cost
  - Balance computations among processes
We often call the result of partitioning, agglomeration, and mapping the data decomposition or just the decomposition.
Data Decomposition Options
1. Interleaved (cyclic)
  - Different PEs handle the sets of integers below, where p is the number of PEs:
      P0 handles 2, 2+p, 2+2p, ...
      P1 handles 3, 3+p, 3+2p, ...
      P2 handles 4, 4+p, 4+2p, ...
  - It's easy to determine the owner or handler of each number: the number i is handled by process (i-2) mod p.
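A small C sketch of the cyclic owner rule, plus a helper that counts how many cells each process marks for a single prime (both function names are hypothetical). Counting the marks for the prime 2 makes the load imbalance of this scheme concrete.

```c
#include <assert.h>

/* Interleaved (cyclic) decomposition: number i (i >= 2) is handled by
 * process (i - 2) mod p. */
int cyclic_owner(long i, int p)
{
    assert(i >= 2 && p >= 1);
    return (int) ((i - 2) % p);
}

/* Counts the cells each of the p processes marks when sieving with the
 * prime k, i.e. the multiples of k from k*k to n.  counts must hold p values. */
void cyclic_marks_for_prime(long n, int p, long k, long *counts)
{
    for (int r = 0; r < p; r++)
        counts[r] = 0;
    for (long j = k * k; j <= n; j += k)
        counts[cyclic_owner(j, p)]++;
}
```

With n = 100 and p = 2, `cyclic_marks_for_prime` reports 49 marks for process 0 and none for process 1; with p = 4, processes 0 and 2 share all the work for the prime 2 (24 and 25 marks) while processes 1 and 3 mark nothing.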
Data Decomposition Options
1. Interleaved (cyclic) - continued
  - But this scheme leads to a load imbalance for this problem.
  - If we are using two processes, process 0 marks the multiples of 2 among the even numbers while process 1 would mark the multiples of 2 among the odd numbers, of which there are none. Process 0 marks about (n-1)/2 elements and process 1 marks none.
  - On the other hand, for four processes, process 2 holds only multiples of 4 (4, 8, 12, ...) and process 0 holds the remaining even numbers, so these two processes do all of the marking for the prime 2 while processes 1 and 3 sit idle.
  - Moreover, finding the ne
Data Decomposition Options
2. Block
  - Array [1, n] will be divided into p contiguous blocks of roughly the same size, one for each PE.
  - We want to balance the loads with minimum differences between the processes.
  - It is not desirable to have some processes doing no work at all.
  - We'll tolerate the added complication of determining the owner when n is not a multiple of p.
Block Decomposition Options
We want to balance the workload when n is not a multiple of p.
Each process gets either $\lceil n/p \rceil$ or $\lfloor n/p \rfloor$ elements.
De seek simple e
Method #1
Let r = n mod p.
If r = 0, all blocks have the same size.
Else
  - First r blocks have size $\lceil n/p \rceil$
  - Remaining p-r blocks have size $\lfloor n/p \rfloor$
Method #1 Calculations
Let r = n mod p.
The first element controlled by process i is
$i\lfloor n/p \rfloor + \min(i, r)$
Method #1 Calculations (cont.) 2/4
Let r = n mod p.
Last element controlled by process i:
Note this is just the element immediately before the first element controlled by process i+1.
$(i+1)\lfloor n/p \rfloor + \min(i+1, r) - 1$
Figure: 17 elements divided among 5 processes
Method #1 Calculations (cont.) 3/4
Let r = n mod p.
Process controlling element j
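The Method #1 calculations can be sketched in C as follows (function names hypothetical). `m1_low` and `m1_high` use the first- and last-element expressions for this method; `m1_owner` is written directly from the block sizes (the first r blocks have $\lfloor n/p \rfloor + 1$ elements) rather than as a single closed-form expression, and it assumes n ≥ p.

```c
#include <assert.h>

/* Method #1 block decomposition of n elements (indices 0..n-1) among p
 * processes: with q = floor(n/p) and r = n mod p, the first r blocks get
 * q + 1 elements and the remaining p - r blocks get q. */
static int min_int(int a, int b) { return a < b ? a : b; }

/* First element controlled by process i. */
int m1_low(int i, int p, int n)  { return i * (n / p) + min_int(i, n % p); }

/* Last element: the one just before the first element of process i + 1. */
int m1_high(int i, int p, int n) { return (i + 1) * (n / p) + min_int(i + 1, n % p) - 1; }

/* Process controlling element j, from the block sizes: the first r blocks
 * cover indices 0 .. r*(q+1) - 1, the rest are blocks of q elements. */
int m1_owner(int j, int p, int n)
{
    int q = n / p, r = n % p;
    assert(q >= 1 && j >= 0 && j < n);
    if (j < r * (q + 1))
        return j / (q + 1);
    return r + (j - r * (q + 1)) / q;
}
```

For 17 elements and 5 processes the blocks start at 0, 4, 8, 11, and 14, so the two larger blocks go to processes 0 and 1.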
Method #1 Calculations (cont.) 4/4
Although deriving the e
Method #2
Scatters larger blocks among processes
  - Not all given to PEs with lowest indices
First element controlled by process i will be $\lfloor in/p \rfloor$
Last element controlled by process i will be $\lfloor (i+1)n/p \rfloor - 1$
Process controlling element j will be $\lfloor (p(j+1)-1)/n \rfloor$
Method #2 (cont.) 2/3
Scatters larger blocks among processes
  - Not all given to PEs with lowest indices
i    First element $\lfloor in/p \rfloor$
0    0
1    3
2    6
3    10
4    13
Figure: 17 elements divided among 5 processes
Method #2 (cont.) 3/3
Some E
Comparing Methods
Operations    Method 1    Method 2
Low index     4           2
High index    6           4
Owner         7           4
Assuming no operations for "floor" function
Our choice: Method 2
Another E
Macros in C
A macro (in any language) is an in-line routine that is e
Short if-then-else in C
The construct in C of
logical ? then-part : else-part
For e
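As an illustration, here is a `MIN` macro built from the conditional operator (the helper `smaller_plus_one` is hypothetical). Parenthesizing every parameter and the whole body keeps the expansion correct when the arguments are expressions; note that the argument the macro returns is evaluated twice, so arguments with side effects such as `i++` should be avoided.

```c
#include <assert.h>

/* MIN written with the short if-then-else (conditional) operator. */
#define MIN(a,b) ((a)<(b)?(a):(b))

int smaller_plus_one(int x, int y)
{
    /* expands to ((x + 1)<(y)?(x + 1):(y)) */
    return MIN(x + 1, y);
}
```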
Define Block Decomposition Macros
#define BLOCK_LOW(id,p,n) ((id)*(n)/(p))
Given id, p, and n, this e
Define Block Decomposition Macros
#define BLOCK_SIZE(id,p,n) \
   (BLOCK_LOW((id)+1,p,n)-BLOCK_LOW(id,p,n))
Given id, p, and n, this e
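A sketch of a full set of Method #2 macros, with `BLOCK_SIZE` passing p and n to every `BLOCK_LOW` call. `BLOCK_LOW` matches the slides; `BLOCK_HIGH` and `BLOCK_OWNER` are written here from the Method #2 formulas for the last element of a block and for the owner of element j, and the helper `block_layout` is hypothetical.

```c
#include <assert.h>

/* Method #2 block-decomposition macros. */
#define BLOCK_LOW(id,p,n)   ((id)*(n)/(p))
#define BLOCK_HIGH(id,p,n)  (BLOCK_LOW((id)+1,p,n)-1)
#define BLOCK_SIZE(id,p,n)  (BLOCK_LOW((id)+1,p,n)-BLOCK_LOW(id,p,n))
#define BLOCK_OWNER(j,p,n)  (((p)*((j)+1)-1)/(n))

/* Records each process's first index and block size (arrays of length p). */
void block_layout(int p, int n, int *low, int *size)
{
    assert(p >= 1 && n >= 1);
    for (int id = 0; id < p; id++) {
        low[id] = BLOCK_LOW(id, p, n);
        size[id] = BLOCK_SIZE(id, p, n);
    }
}
```

For 17 elements and 5 processes, the low indices come out as 0, 3, 6, 10, 13, with the 4-element blocks going to processes 2 and 4.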
Local vs. Global Indices
Local:    0 1 | 0 1 2 | 0 1 | 0 1 2 |  0  1  2
Global:   0 1 | 2 3 4 | 5 6 | 7 8 9 | 10 11 12
Note: We need to distinguish between these.
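Converting between the two index spaces only needs the global index of a block's first element. This minimal sketch (function names hypothetical) assumes the Method #2 `BLOCK_LOW` macro:

```c
#include <assert.h>

/* Method #2: global index of the first element of block id. */
#define BLOCK_LOW(id,p,n) ((id)*(n)/(p))

/* Local index within process id's block -> global index. */
int local_to_global(int id, int p, int n, int local)
{
    return BLOCK_LOW(id, p, n) + local;
}

/* Global index (known to lie in process id's block) -> local index. */
int global_to_local(int id, int p, int n, int global)
{
    return global - BLOCK_LOW(id, p, n);
}
```

With 13 elements and 5 processes, local index 2 of process 4 is global index 12, and global index 9 is local index 2 of process 3.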
Fast Marking
Block decomposition allows for the same marking as the sequential algorithm, but it is sped up:
We don't check each array element to see if it is a multiple of k (which requires n/p modulo operations within each block for each prime).
Instead, within each block:
  - Find the first multiple of k, say cell j
  - Then mark the cells j, j + k, j + 2k, j + 3k, ...
This allows a loop similar to the one in the sequential program.
  - Requires about (n/p)/k assignment statements.
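A minimal C sketch of fast marking for one block, assuming the block holds the consecutive numbers starting at `low_value` (the function name `fast_mark_block` is hypothetical). It finds the first multiple of k that needs marking, never below k², then strides by k, and it returns the number of assignments so the (n/p)/k estimate can be checked.

```c
#include <assert.h>

/* The block holds low_value, low_value + 1, ..., low_value + size - 1 at
 * local indices 0 .. size-1.  Marks the multiples of k in the block without
 * any per-cell modulo test and returns the number of assignments made. */
long fast_mark_block(char *marked, long low_value, long size, long k)
{
    assert(k >= 2 && low_value >= 2);
    long start = k * k;                 /* smaller multiples were handled by smaller primes */
    if (start < low_value) {
        long rem = low_value % k;       /* one modulo per block, not per cell */
        start = (rem == 0) ? low_value : low_value + (k - rem);   /* first multiple >= low_value */
    }
    long assignments = 0;
    for (long i = start - low_value; i < size; i += k) {
        marked[i] = 1;
        assignments++;
    }
    return assignments;
}
```

For the block 50..99 and k = 7, it performs 7 assignments (56, 63, ..., 98), where the modulo approach would test all 50 cells.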
Decomposition Affects Implementation
The largest prime used by the sieve algorithm is bounded by $\sqrt{n}$.
The first process has $\lfloor n/p \rfloor$ elements.
  - If $\lfloor n/p \rfloor > \sqrt{n}$, then the first process will control all primes through $\sqrt{n}$.
  - Normally p is much smaller than $\sqrt{n}$ (that is, n is much larger than $p^2$), so this will be the case.
Consequently, in this case, the first process can broadcast the ne
Convert the Sequential Algorithm to a Parallel Algorithm
1. Create list of unmarked natural numbers 2, 3, …, n   (each process creates its share of the list)
2. k ← 2   (each process does this)
3. Repeat
   (a) Mark all multiples of k between k² and n   (each process marks its share of the list)
   (b) k ← smallest unmarked number > k   (process 0 only)
   (c) Process 0 broadcasts k to rest of processes
   until k² > n
4. The unmarked numbers are primes
5. Reduction to determine number of primes found
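To connect this outline with the MPI code that follows, here is a serial C simulation (not MPI code) in which the p "processes" are loop iterations over separately allocated blocks. It assumes the Method #2 macros applied to the n - 1 numbers 2, ..., n, lets process 0 choose each new k, treats the broadcast as a shared variable, and sums the per-block counts in place of the reduction. The function name `simulated_parallel_sieve` and the `MAX_PROCS` limit are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define BLOCK_LOW(id,p,n)  ((id)*(n)/(p))
#define BLOCK_SIZE(id,p,n) (BLOCK_LOW((id)+1,p,n)-BLOCK_LOW(id,p,n))
#define MAX_PROCS 64

/* Returns the number of primes <= n, or -1 if allocation fails or process
 * 0's block does not reach sqrt(n). */
int simulated_parallel_sieve(int n, int p)
{
    assert(n >= 2 && p >= 1 && p <= MAX_PROCS && p <= n - 1);
    char *marked[MAX_PROCS];
    long low_value[MAX_PROCS], size[MAX_PROCS];

    /* step 1: each process creates its share of the list */
    for (int id = 0; id < p; id++) {
        low_value[id] = 2 + BLOCK_LOW(id, p, n - 1);
        size[id] = BLOCK_SIZE(id, p, n - 1);
        marked[id] = calloc(size[id], 1);
        if (marked[id] == NULL) {
            while (id-- > 0)
                free(marked[id]);
            return -1;
        }
    }
    long last0 = low_value[0] + size[0] - 1;      /* largest value in process 0 */
    if (last0 * last0 < n) {
        for (int id = 0; id < p; id++)
            free(marked[id]);
        return -1;
    }

    long k = 2;        /* step 2: every process starts with k = 2 */
    long index = 0;    /* process 0's local index of k */
    do {
        /* step 3(a): each process marks multiples of k in its own block */
        for (int id = 0; id < p; id++) {
            long first;
            if (k * k > low_value[id])
                first = k * k - low_value[id];
            else if (low_value[id] % k == 0)
                first = 0;
            else
                first = k - low_value[id] % k;
            for (long i = first; i < size[id]; i += k)
                marked[id][i] = 1;
        }
        /* step 3(b): process 0 alone finds the next unmarked number.  If it
           runs past its block, k becomes last0 + 1, whose square exceeds n,
           so the loop ends. */
        do {
            index++;
        } while (index < size[0] && marked[0][index]);
        k = index + 2;
        /* step 3(c): the broadcast is implicit, since every simulated
           process reads the same variable k */
    } while (k * k <= n);

    /* steps 4-5: count unmarked cells per process, then sum (the reduction) */
    int global_count = 0;
    for (int id = 0; id < p; id++) {
        for (long i = 0; i < size[id]; i++)
            if (!marked[id][i])
                global_count++;
        free(marked[id]);
    }
    return global_count;
}
```

`simulated_parallel_sieve(100, 3)` returns 25, and the function returns -1 when process 0's block does not reach $\sqrt{n}$, for example with n = 100 and p = 50.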
Function MPI_Bcast
int MPI_Bcast (
   void *buffer,           /* Addr of 1st element */
   int count,              /* # elements to broadcast */
   MPI_Datatype datatype,  /* Type of elements */
   int root,               /* ID of root process */
   MPI_Comm comm)          /* Communicator */
MPI_Bcast (&k, 1, MPI_INT, 0, MPI_COMM_WORLD);
Task/Channel Graph for 4 Processes
Red channels are I/O channels.
Black channels are used for the reduction step.
Task/Channel Model Added Assumption
The analysis of algorithms typically performed assumes that this model supports the concurrent transmission of messages from multiple tasks, as long as
  - they use different channels
  - no two active channels have the same source or destination.
This is claimed to be a reasonable assumption
  - based on current commercial systems
  - for some clusters
This is not a reasonable assumption for networks of workstations connected by a hub, or for any communication system supporting only one message at a time.
See Ch. 3, p. 88 of Quinn's text.
Analysis
χ (i.e., "chi") is the time needed to mark a cell
Sequential e
Code for Sieve of Eratosthenes (complete code starts on page 124)
#include <mpi.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>   /* malloc, exit, atoi */
#include "MyMPI.h"
#define MIN(a,b) ((a)<(b)?(a):(b))
MyMPI.h is a header file containing the macros we need and function prototypes for the utilities we are developing.
Quinn includes some other macros in MyMPI.h that are needed for later programs in this book.
After this, we will always include this file in our code.
int main (int argc, char *argv[])
{
   ...   /* Bunch of data declarations here */
   MPI_Init (&argc, &argv);
Capturing Command Line Values
Example: Invoking the UNIX compiler mpicc
mpicc -o myprog myprog.c
would result in the following values being passed to mpicc:
argc = 4, i.e., the number of tokens on the command line (an int)
argv[0] = "mpicc"      each argv[i] is a char array
argv[1] = "-o"
argv[2] = "myprog", i.e., the name for the output (executable) file
argv[3] = "myprog.c", i.e., the source file
if ( F )
if (argc != 2) {
   if (!id) printf ("Command line: %s <m>\n", argv[0]);
   MPI_Finalize();
   exit (1);
}
n = atoi(argv[1]);
We are assuming the user will specify the upper range of the sieve as a command line argument, e.g., sieve 1000.
If this argument is missing (argc != 2), we terminate the processing and return a 1.
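A hedged sketch of the same argument check as a separate function (the name `parse_sieve_limit` is hypothetical). It swaps `atoi` for the standard `strtol` so that a malformed limit such as `10x` can be rejected instead of being silently truncated; the caller would still print the usage message and exit with status 1.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Reads the sieve's upper limit from argv[1].  Returns 1 and stores the
 * value in *n on success; returns 0 if the argument is missing, is not a
 * complete integer, or is smaller than 2. */
int parse_sieve_limit(int argc, char *argv[], long *n)
{
    if (argc != 2)
        return 0;
    char *end;
    errno = 0;
    long value = strtol(argv[1], &end, 10);
    if (errno != 0 || end == argv[1] || *end != '\0' || value < 2)
        return 0;
    *n = value;
    return 1;
}
```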
low_value = 2 + BLOCK_LOW(id,p,n-1);
high_value = 2 + BLOCK_HIGH(id,p,n-1);
size = BLOCK_SIZE(id,p,n-1);
We use the macros defined to do the block decomposition used by Method 2. Remember these are defined in the header file MyMPI.h.
The macros are applied to n-1 elements because the array holds the numbers 2, ..., n; adding 2 converts an index into the number it represents.
We will give each process a contiguous block of the array that will store the marks. The values above can differ between processes since they have different id numbers.
proc0_size = (n-1)/p;
if ((2 + proc0_size) < (int) sqrt((double) n)) {
   if (!id) printf ("Too many processes\n");
   MPI_Finalize();
   exit (1);
}
Recall, this algorithm works only if the square of the largest value in process 0 is greater than the upper limit of the sieve. This code checks for that and exits if it does not hold.
marked = (char *) malloc (size);
if (marked == NULL) {
   printf ("Cannot allocate enough memory\n");
   MPI_Finalize();
   exit (1);
}
This allocates memory for the process's share of the array, with "marked" a pointer to a char array.
A byte is the smallest unit of memory that can be indexed.
for (i = 0; i < size; i++) marked[i] = 0;
At last, we have step 1 of the algorithm.
if (!id) index = 0;
prime = 2;
This looks strange, but the variable index is only the index in the array of process 0.
We conditionalize its initialization to process 0 to emphasize this. Only the id of 0 will make this true.
It is a good idea to do this to keep the local and global indices straight.
Each process sets prime to 2. This is step 2 of the algorithm.
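do {
   if (prime * prime > low_value)
      first = prime * prime - low_value;
   else {
      if (!(low_value % prime)) first = 0;
      else first = prime - (low_value % prime);
   }
This is step 3 in the sequential algorithm.
We need to determine the (local) index corresponding to the first integer needing marking.
If prime² is larger than low_value, marking starts at prime², since every smaller multiple of prime is also a multiple of a smaller prime and is already marked.
% is the modulo operator in C and returns the remainder.
If the remainder is 0, then we start marking at local index 0; otherwise we move in to the first multiple of prime.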
   for (i = first; i < size; i += prime) marked[i] = 1;
This loop does the sieving. Each process marks the multiples of the current prime number from the first index through the end of its array.
This completes step 3(a).
   if (!id) {
      while (marked[++index]);
      prime = index + 2;
   }
Process 0 now finds the ne
   MPI_Bcast (&prime, 1, MPI_INT, 0, MPI_COMM_WORLD);
   MPI_Reduce (...
   ... "%f\n", elapsed_time);
   MPI_Finalize ();
   return 0;
Turn off timer, print the results, and finalize.
Benchmarking
Test case: Find all primes up to 100 million.
Run the sequential algorithm on one processor.
Determine χ in nanoseconds by
  - This assumes comple
Benchmarking (cont.)
Estimate running time of parallel algorithm by substituting χ and λ into e
Processors   Predicted (sec)   Actual (sec)
1            24.900
2            12.721            13.011
3             8.843             9.039
4             6.768             7.055
5             5.794             5.993
6             4.964             5.159
7             4.371             4.687
8             3.927             4.222
Observation: As illustrated in Fig. 5.7, this is a very close approximation.
Improvements
Delete even integers
  - Cuts number of computations in half
  - Frees storage for larger values of n
  - Cuts the e
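A minimal C sketch of deleting the even integers, assuming cell i stands for the odd number 2i + 3 (the function name `odd_only_sieve` is hypothetical). Since even multiples are no longer stored, each odd prime k is struck starting at k² with a step of 2k, and the prime 2 is counted separately.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts the primes <= n storing only the odd numbers 3, 5, 7, ..., n:
 * cell i stands for 2*i + 3.  Returns -1 if allocation fails. */
int odd_only_sieve(long n)
{
    if (n < 2)
        return 0;
    long cells = (n - 1) / 2;                        /* odd numbers 3..n */
    char *marked = calloc(cells > 0 ? cells : 1, 1);
    if (marked == NULL)
        return -1;
    for (long k = 3; k * k <= n; k += 2) {
        if (marked[(k - 3) / 2])
            continue;                                /* k is composite */
        for (long m = k * k; m <= n; m += 2 * k)     /* odd multiples only */
            marked[(m - 3) / 2] = 1;
    }
    int count = 1;                                   /* the prime 2 */
    for (long i = 0; i < cells; i++)
        if (!marked[i])
            count++;
    free(marked);
    return count;
}
```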
Reorganize Loops
Suppose the cache has 4 lines of 4 bytes each. With the even integers deleted, line 1 holds 3, 5, 7, 9 and line 2 holds 11, 13, 15, 17, etc.
Then if we sieve all the multiples of one prime before doing the ne
Reorganize Loops
Now use 8 bytes in two cache lines and sieve multiples of all primes for the first 8 bytes before going to the ne
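One way to realize this loop reorganization is a segmented sieve: find the sieving primes up to $\sqrt{n}$ first, then apply all of them to one segment of `seg_len` cells before moving on to the next. This is a sketch under stated assumptions: the function name and the `seg_len` parameter are hypothetical, it stores every number rather than only odd ones to stay short, and the point of the technique is choosing a segment small enough to stay in cache.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Counts the primes <= n, processing 2..n in segments of seg_len cells and
 * applying every sieving prime to a segment before moving to the next.
 * Returns -1 if allocation fails. */
int segmented_sieve(long n, long seg_len)
{
    assert(n >= 2 && seg_len >= 1);
    long root = 1;
    while ((root + 1) * (root + 1) <= n)
        root++;                                       /* floor(sqrt(n)) */

    char *small = calloc(root + 1, 1);                /* small[v] = 1: v composite */
    char *seg = malloc(seg_len);
    if (small == NULL || seg == NULL) {
        free(small);
        free(seg);
        return -1;
    }
    for (long k = 2; k * k <= root; k++)              /* sieving primes 2..root */
        if (!small[k])
            for (long m = k * k; m <= root; m += k)
                small[m] = 1;

    int count = 0;
    for (long low = 2; low <= n; low += seg_len) {
        long high = low + seg_len - 1;
        if (high > n)
            high = n;
        memset(seg, 0, seg_len);
        for (long k = 2; k <= root; k++) {            /* every sieving prime ... */
            if (small[k])
                continue;
            long start = k * k;
            if (start < low)
                start = (low + k - 1) / k * k;        /* first multiple >= low */
            for (long m = start; m <= high; m += k)   /* ... on this segment only */
                seg[m - low] = 1;
        }
        for (long v = low; v <= high; v++)
            if (!seg[v - low])
                count++;
    }
    free(small);
    free(seg);
    return count;
}
```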
Comparing (as shown in text)
Comparing 4 Versions
Execution time (sec)
Procs   Sieve 1   Sieve 2   Sieve 3   Sieve 4
1       24.900    12.237    12.466    2.543
2       12.721     6.609     6.378    1.330
3        8.843     5.019     4.272    0.901
4        6.768     4.072     3.201    0.679
5        5.794     3.652     2.559    0.543
6        4.964     3.270     2.127    0.456
7        4.371     3.059     1.820    0.391
8        3.927     2.856     1.585    0.342
10-fold improvement (Sieve 4 vs. Sieve 1 on one processor)
7-fold improvement (Sieve 4 on 8 processors vs. 1 processor)
Note: Graphical display of this chart in Fig. 5.10.
Summary
Sieve of Eratosthenes: parallel design uses domain decomposition
Compared two block distributions
  - Chose one with simpler formulas
Introduced MPI_Bcast
Optimizations reveal importance of ma
New Sieve Material Added
Reference: Parallel Computing: Theory and Practice, Second Edition, Michael Quinn, McGraw-Hill, 1994, pages 10-13.
The following slides are not from material in our current text.
Sieve of Eratosthenes: A Control-Parallel Approach
Data parallelism refers to using multiple PEs to apply the same sequence of operations to different data elements.
Functional or control parallelism involves applying a different sequence of operations to different data elements.
Model assumed for this e
A Control Parallel Sieve Approach
Each processor repeats the following two-step process:
  - Identify the ne
Control Parallel Sieve (cont.)
Basic algorithm for shared memory MIMD
1. Processor accesses variable holding current prime
2. Searches for ne
Parallel Speedup Metric
(Initial Overview)
A measure of the reduction in running time due to parallelism.
Speedup = (sequential time)/(parallel time)
  - The sequential time is the worst case sequential running time.
  - The parallel time is the worst case parallel running time.
How Much Speedup is Possible?
Suppose n = 1000.
Sequential algorithm
  - Time to strike out multiples of prime p is $\lceil (n+1-p^2)/p \rceil$
  - Multiples of 2: $\lceil ((1000+1)-4)/2 \rceil = \lceil 997/2 \rceil = 499$
  - Multiples of 3: $\lceil ((1000+1)-9)/3 \rceil = \lceil 992/3 \rceil = 331$
  - Total time to strike all prime multiples = 1411, i.e., the number of "steps"
2 PEs give speedup 1411/706 = 2.00
3 PEs give speedup 1411/499 = 2.83
  - 3 PEs require 499 strikeout time units, so no more speedup is possible using additional PEs
  - Multiples of 2 dominate with 499 strikeout steps
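The strike-out arithmetic for n = 1000 can be reproduced with a short C sketch (function names hypothetical). It counts $\lfloor (n - p^2)/p \rfloor + 1$ strikes for each prime p with $p^2 \le n$, which equals $\lceil (n+1-p^2)/p \rceil$, and bounds the parallel time from below by the larger of an even split of the total and the largest single count (the multiples of 2), assuming each prime's strike-outs are done by a single PE.

```c
#include <assert.h>

static int is_prime(long v)
{
    if (v < 2)
        return 0;
    for (long d = 2; d * d <= v; d++)
        if (v % d == 0)
            return 0;
    return 1;
}

/* Strikes needed for the multiples of prime p from p*p up to n. */
long strikeout_steps(long n, long p) { return (n - p * p) / p + 1; }

/* Total strikes over all sieving primes p with p*p <= n. */
long total_strikeout_steps(long n)
{
    long total = 0;
    for (long p = 2; p * p <= n; p++)
        if (is_prime(p))
            total += strikeout_steps(n, p);
    return total;
}

/* Lower bound on the parallel time with procs PEs. */
long parallel_time_bound(long n, long procs)
{
    assert(n >= 4 && procs >= 1);
    long total = total_strikeout_steps(n);
    long even_split = (total + procs - 1) / procs;   /* ceil(total / procs) */
    long largest = strikeout_steps(n, 2);            /* multiples of 2 dominate */
    return even_split > largest ? even_split : largest;
}
```

For n = 1000 this gives 1411 total steps and bounds of 706 for 2 PEs and 499 for 3 or more PEs, matching the speedups 1411/706 ≈ 2.00 and 1411/499 ≈ 2.83.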