chpt5 (1)

Upload: light

Post on 22-Feb-2018


TRANSCRIPT

  • 7/24/2019 Chpt5 (1)

    1/68

    Chapter 5

    The Sieve of Eratosthenes

  • 7/24/2019 Chpt5 (1)

    2/68

    Chapter Objectives

    Analysis of block allocation schemes

    Function MPI_Bcast

    Performance enhancements

    Focus Problem: The Greek mathematician Eratosthenes (Er'-a-tas'-the-nez', 276-194 BC) wanted to find a way of generating the prime numbers up to some number n.

    - No formula will generate these primes.
    - However, he devised a method which has become known as the sieve of Eratosthenes.

  • 7/24/2019 Chpt5 (1)

    3/68

    Outline to the Solution

    The sequential algorithm

    Sources of parallelism

    Data decomposition options

    Parallel algorithm development, analysis

    An MPI program

    Benchmarking

    Optimizations

  • 7/24/2019 Chpt5 (1)

    4/68

    Sieve of Eratosthenes

    Sequential Algorithm in Pseudocode

    1. Create a list of unmarked natural numbers 2, 3, ..., n
    2. k ← 2
    3. Repeat
       (a) Mark all multiples of k between $k^2$ and n
       (b) k ← smallest unmarked number > k
       until $k^2 > n$
    4. The unmarked numbers are primes
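The pseudocode above can be sketched in C as follows. This is a minimal illustration, not the chapter's program: the function name `sieve_count` is hypothetical, entry i of the array stands for the number i + 2, and a `while` loop replaces the repeat-until (it only skips the first pass when $2^2 > n$, where nothing would be marked anyway).

```c
#include <assert.h>
#include <stdlib.h>

/* Count the primes in 2..n with the sequential sieve.
   marked[i] stands for the number i + 2.
   Returns -1 if memory cannot be allocated. */
int sieve_count(int n)
{
    assert(n >= 2);
    int size = n - 1;                          /* the numbers 2..n */
    char *marked = calloc((size_t)size, 1);    /* step 1: all unmarked */
    if (marked == NULL) return -1;

    int k = 2;                                 /* step 2 */
    while ((long)k * k <= n) {                 /* step 3: stop once k*k > n */
        for (long m = (long)k * k; m <= n; m += k)
            marked[m - 2] = 1;                 /* 3(a): mark multiples of k */
        do { k++; } while (marked[k - 2]);     /* 3(b): next unmarked number */
    }

    int count = 0;                             /* step 4: unmarked = prime */
    for (int i = 0; i < size; i++)
        if (!marked[i]) count++;
    free(marked);
    return count;
}
```

Calling `sieve_count(100)` counts the primes up to 100. The search in step 3(b) stays inside the array because k is at most n/2 whenever the loop body runs.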

  • 7/24/2019 Chpt5 (1)

    5/68

  • 7/24/2019 Chpt5 (1)

    6/68

    Data Structure Used For Sequential Algorithm

    Assume a Boolean array of n-1 elements

    Array indices are 0 through n-2 and they represent the numbers 2, 3, ..., n.

    The Boolean value at index i represents whether or not the number i+2 is marked.

    - Indices that are marked represent composite numbers (i.e., not prime)
    - Initially, all numbers are unmarked

  • 7/24/2019 Chpt5 (1)

    7/68

    One Method to Parallelize

    Because the focus of the algorithm is the marking of elements in an array, domain decomposition makes sense.

    Domain decomposition
    - Divide data into n-1 pieces
    - Associate computational steps with data

    One primitive task per array element
    - These will be agglomerated into larger groups of elements.

  • 7/24/2019 Chpt5 (1)

    8/68

    Parallelizing Algorithm Step 3(a)

    Recall Step 3(a):
       3(a) Mark all multiples of k between $k^2$ and n

    The following straightforward modification allows this to be computed in parallel:

       for all j where $k^2 \le j \le n$ do
          if j mod k = 0 then
             mark j (i.e., it is not a prime)
          endif
       endfor

    Each j above represents a primitive task

  • 7/24/2019 Chpt5 (1)

    9/68

    Parallelizing Algorithm Step 3(b)

    Recall Step 3(b):
       3(b) Find smallest unmarked number > k

    Parallelizing requires two steps:
    - Min-reduction (to find smallest unmarked number > k)
    - Broadcast (to get result to all tasks)

    Plus, remember these are in a repeat-until loop which loops until $k^2 > n$.

  • 7/24/2019 Chpt5 (1)

    10/68

    Good News - Bad News

    We have found lots of parallelism to exploit

  • 7/24/2019 Chpt5 (1)

    11/68

    Agglomeration Goals

    We want to:
    - Consolidate tasks
    - Reduce communication cost
    - Balance computations among processes

    We often call the result of partitioning, agglomeration, and mapping the data decomposition, or just the decomposition.

  • 7/24/2019 Chpt5 (1)

    12/68

    Data Decomposition Options

    1. Interleaved (cyclic)
    - Different PEs handle the below sets of integers, where p is the number of PEs:
         P0 handles 2, 2+p, 2+2p, ...
         P1 handles 3, 3+p, 3+2p, ...
         P2 handles 4, 4+p, 4+2p, ...
    - It's easy to determine the owner or handler of each number: the number i is handled by process (i-2) mod p
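The ownership rule for the interleaved scheme is short enough to write down directly. The sketch below uses two hypothetical helper names, `cyclic_owner` and `cyclic_marks`; the second one simply counts how many multiples of k a given process would have to mark, which makes it easy to experiment with how evenly the marking work is spread.

```c
#include <assert.h>

/* Process that handles the integer i (i >= 2) when the numbers
   2, 3, ..., n are dealt out cyclically to p processes. */
int cyclic_owner(int i, int p)
{
    assert(i >= 2 && p > 0);
    return (i - 2) % p;
}

/* Number of multiples of k in k*k..n that process `id` must mark
   under the interleaved decomposition. */
int cyclic_marks(int id, int p, int k, int n)
{
    int marks = 0;
    for (long m = (long)k * k; m <= n; m += k)
        if (cyclic_owner((int)m, p) == id) marks++;
    return marks;
}
```

For instance, `cyclic_marks(0, 2, 2, 100)` and `cyclic_marks(1, 2, 2, 100)` compare the work of two processes when k = 2 and n = 100.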

  • 7/24/2019 Chpt5 (1)

    13/68

    Data Decomposition Options

    1. Interleaved (cyclic) - continued
    - But this scheme leads to a load imbalance for this problem.
    - If we are using two processes, process 0 marks the multiples of 2 among the even numbers while process 1 marks the multiples of 2 among the odd numbers. Process 0 marks (n-1)/2 elements; process 1 marks none.
    - On the other hand, for four processes, process 2 is marking multiples of 4, which is duplicating process 0's work.
    - Moreover, finding the next ...

  • 7/24/2019 Chpt5 (1)

    14/68

    Data Decomposition Options

    2. Block
    - Array [1,n] will be divided into p contiguous blocks of roughly the same size for each PE
    - We want to balance the loads with minimum differences between the processes.
    - It is not desirable to have some processes doing no work at all.
    - We'll tolerate the added complication to determine owner when n is not a multiple of p

  • 7/24/2019 Chpt5 (1)

    15/68

    Block Decomposition Options

    We want to balance the workload when n is not a multiple of p

    Each process gets either $\lceil n/p \rceil$ or $\lfloor n/p \rfloor$ elements

    We seek simple expressions ...

  • 7/24/2019 Chpt5 (1)

    16/68

    Method #1

    Let r = n mod p

    If r = 0, all blocks have same size

    Else
    - First r blocks have size $\lceil n/p \rceil$
    - Remaining p-r blocks have size $\lfloor n/p \rfloor$

    E

  • 7/24/2019 Chpt5 (1)

    17/68

    /,

    E

  • 7/24/2019 Chpt5 (1)

    18/68

    Method #1 Calculations

    Let r = n mod p

    The first element controlled by process i is $i \lfloor n/p \rfloor + \min(i, r)$

    E

  • 7/24/2019 Chpt5 (1)

    19/68

    Method #1 Calculations (cont. 2/4)

    Let r = n mod p

    Last element controlled by process i: $(i+1) \lfloor n/p \rfloor + \min(i+1, r) - 1$

    Note this is just the element immediately before the first element controlled by process i+1.

    Example: 17 elements divided among 5 processes
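A small sketch of the Method #1 index calculations, under stated assumptions. The last-element formula is the one given above; the first-element formula follows from the note that it sits immediately after the previous block. The owner formula is not legible in this transcript, so `m1_owner` below uses a piecewise rule derived from the block sizes rather than any particular closed form. All function names are hypothetical.

```c
#include <assert.h>

static int min_int(int a, int b) { return a < b ? a : b; }

/* Method #1: the first r = n mod p blocks hold one extra element. */
int m1_low(int i, int p, int n)
{
    return i * (n / p) + min_int(i, n % p);
}

/* Last element of block i = element just before block i+1 starts. */
int m1_high(int i, int p, int n)
{
    return m1_low(i + 1, p, n) - 1;
}

/* Owner of element j, written piecewise (an assumption, see lead-in):
   the first r blocks have size q+1, the remaining ones have size q. */
int m1_owner(int j, int p, int n)
{
    int q = n / p, r = n % p;
    assert(0 <= j && j < n);
    if (j < r * (q + 1)) return j / (q + 1);
    return (j - r) / q;
}
```

With 17 elements and 5 processes, the two larger blocks go to processes 0 and 1, so the blocks start at 0, 4, 8, 11 and 14.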

  • 7/24/2019 Chpt5 (1)

    20/68

    Method #1 Calculations (cont. 3/4)

    Let r = n mod p

    Process controlling element j

    E

  • 7/24/2019 Chpt5 (1)

    21/68

    Method #1 Calculations (cont. 4/4)

    Although deriving the expressions ...

  • 7/24/2019 Chpt5 (1)

    22/68

    Method #2

    Scatters larger blocks among processes
    - Not all given to PEs with lowest indices

    First element controlled by process i will be $\lfloor in/p \rfloor$

    Last element controlled by process i will be $\lfloor (i+1)n/p \rfloor - 1$

    Process controlling element j will be $\lfloor (p(j+1)-1)/n \rfloor$

  • 7/24/2019 Chpt5 (1)

    23/68

    Method #2 (cont. 2/3)

    Scatters larger blocks among processes
    - Not all given to PEs with lowest indices

    Process   First element
    0         0
    1         3
    2         6
    3         10
    4         13

    17 elements divided among 5 processes
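The three Method #2 expressions translate almost verbatim into C, because integer division of non-negative values already takes the floor. The function names below are hypothetical, and the `long` casts are an added precaution against overflow in the products.

```c
#include <assert.h>

/* First element of block i: floor(i*n/p). */
int m2_low(int i, int p, int n)
{
    return (int)((long)i * n / p);
}

/* Last element of block i: floor((i+1)*n/p) - 1. */
int m2_high(int i, int p, int n)
{
    return m2_low(i + 1, p, n) - 1;
}

/* Process controlling element j: floor((p*(j+1) - 1)/n). */
int m2_owner(int j, int p, int n)
{
    assert(0 <= j && j < n);
    return (int)(((long)p * (j + 1) - 1) / n);
}
```

For 17 elements and 5 processes these functions give the block starts listed above.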

  • 7/24/2019 Chpt5 (1)

    24/68

    Method #2 (cont. 3/3)

    E

  • 7/24/2019 Chpt5 (1)

    25/68

    25

    Some E

  • 7/24/2019 Chpt5 (1)

    26/68

    Comparing Methods

    Operations    Method 1   Method 2
    Low index     4          2
    High index    6          4
    Owner         7          4

    Assuming no operations for "floor" function

    Our choice: Method 2

  • 7/24/2019 Chpt5 (1)

    27/68

    2,

    Another E

  • 7/24/2019 Chpt5 (1)

    28/68

    Macros in C

    A macro (in any language) is an in-line routine that is expanded ...

  • 7/24/2019 Chpt5 (1)

    29/68

    Short if-then-else in C

    The construct in C of

       logical ? then-part : else-part

    evaluates to then-part when logical is true and to else-part otherwise.

    For e

  • 7/24/2019 Chpt5 (1)

    30/68

    9>

    E

  • 7/24/2019 Chpt5 (1)

    31/68

    Define Block Decomposition Macros

    #define BLOCK_LOW(id,p,n) \
            ((id)*(n)/(p))

    Given id, p, and n, this expands to the first index of the block controlled by process id.

  • 7/24/2019 Chpt5 (1)

    32/68

    Define Block Decomposition Macros

    #define BLOCK_SIZE(id,p,n) \
            (BLOCK_LOW((id)+1,p,n) - \
             BLOCK_LOW(id,p,n))

    Given id, p, and n, this expands to the number of elements in the block controlled by process id.

  • 7/24/2019 Chpt5 (1)

    33/68

    Local vs. Global Indices

    Local:    0 1  |  0 1 2  |  0 1  |  0 1 2  |  0 1 2
    Global:   0 1  |  2 3 4  |  5 6  |  7 8 9  |  10 11 12

    Note: We need to distinguish between these.
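To make the local/global distinction concrete, here is a hedged sketch of the two conversions using Method #2 style macros. `BLOCK_HIGH` and `BLOCK_OWNER` are written here from the Method #2 formulas and are assumptions about the header's contents; `local_to_global` and `global_to_local` are hypothetical helper names.

```c
#include <assert.h>

#define BLOCK_LOW(id,p,n)   ((id)*(n)/(p))
#define BLOCK_HIGH(id,p,n)  (BLOCK_LOW((id)+1,p,n)-1)
#define BLOCK_SIZE(id,p,n)  (BLOCK_HIGH(id,p,n)-BLOCK_LOW(id,p,n)+1)
#define BLOCK_OWNER(j,p,n)  (((p)*((j)+1)-1)/(n))

/* Global index of the element stored at position `local`
   in the block of process `id`. */
int local_to_global(int local, int id, int p, int n)
{
    assert(0 <= local && local < BLOCK_SIZE(id, p, n));
    return BLOCK_LOW(id, p, n) + local;
}

/* Position of global element `global` inside its owner's block. */
int global_to_local(int global, int p, int n)
{
    assert(0 <= global && global < n);
    return global - BLOCK_LOW(BLOCK_OWNER(global, p, n), p, n);
}
```

With 13 elements on 5 processes, global element 6 is local element 1 of process 2, and local element 2 of process 4 is global element 12.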

  • 7/24/2019 Chpt5 (1)

    34/68

    91

    E

  • 7/24/2019 Chpt5 (1)

    35/68

    Fast Marking

    Block decomposition allows for the same marking as the sequential algorithm, but it is sped up:

    We don't check each array element to see if it is a multiple of k (which requires n/p modulo operations within each block for each prime).

    Instead, within each block
    - Find the first multiple of k, say cell j
    - Then mark the cells j, j+k, j+2k, j+3k, ...

    This allows a loop similar to the one in the sequential program
    - Requires about (n/p)/k assignment statements.
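The fast-marking idea can be sketched as a single function that works on one block. The name `mark_block` and its return value (the number of assignments performed) are illustrative choices, not part of the original program; `low` is the number represented by the first cell of the block.

```c
#include <assert.h>

/* Mark the multiples of k, starting no lower than k*k, inside a block
   holding the numbers low..low+size-1 (marked[i] stands for low + i).
   Returns how many assignments were made. */
int mark_block(char *marked, int low, int size, int k)
{
    assert(k >= 2 && size >= 0);
    long first;                      /* local index of first cell to mark */
    if ((long)k * k > low)
        first = (long)k * k - low;   /* k*k lies at or beyond block start */
    else if (low % k == 0)
        first = 0;                   /* block starts on a multiple of k */
    else
        first = k - low % k;         /* step forward to the next multiple */

    int assignments = 0;
    for (long i = first; i < size; i += k) {   /* j, j+k, j+2k, ... */
        marked[i] = 1;
        assignments++;
    }
    return assignments;
}
```

Only one modulo operation is needed per block and per prime; the loop itself is pure strided assignment, about size/k iterations.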

  • 7/24/2019 Chpt5 (1)

    36/68

    Decomposition Affects Implementation

    Largest prime used by sieve algorithm is bounded by $\sqrt{n}$

    First process has $\lfloor n/p \rfloor$ elements
    - If $n/p > \sqrt{n}$, then the first process will control all primes through $\sqrt{n}$.
    - Normally n is much larger than p, so this will be the case.

    Consequently, in this case, the first process can broadcast the next ...

  • 7/24/2019 Chpt5 (1)

    37/68

    Convert the Sequential Algorithm to a Parallel Algorithm

    1. Create list of unmarked natural numbers 2, 3, ..., n     [Each process creates its share of list]
    2. k ← 2                                                    [Each process does this]
    3. Repeat
       (a) Mark all multiples of k between $k^2$ and n          [Each process marks its share of list]
       (b) k ← smallest unmarked number > k                     [Process 0 only]
       (c) Process 0 broadcasts k to rest of processes
       until $k^2 > n$
    4. The unmarked numbers are primes
    5. Reduction to determine number of primes found
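The parallel steps can be tried out without an MPI installation by simulating the p processes inside one ordinary C function. This is only a sketch under stated assumptions: the function name `simulated_parallel_sieve` is hypothetical, the "broadcast" is a shared variable, the "reduction" is a plain sum, and the macros are written with all three arguments passed through.

```c
#include <assert.h>
#include <stdlib.h>

#define BLOCK_LOW(id,p,n)  ((id)*(n)/(p))
#define BLOCK_SIZE(id,p,n) (BLOCK_LOW((id)+1,p,n)-BLOCK_LOW(id,p,n))

/* Simulate the block-decomposed sieve for p processes in one program.
   Returns the number of primes <= n, or -1 if process 0 would not hold
   every prime up to sqrt(n) or memory cannot be allocated. */
int simulated_parallel_sieve(int n, int p)
{
    assert(n >= 2 && p >= 1);
    int m = n - 1;                               /* the numbers 2..n */
    int size0 = BLOCK_SIZE(0, p, m);
    long top0 = 2 + size0 - 1;                   /* largest number in block 0 */
    if (top0 * top0 < n) return -1;              /* too many processes */

    char *marked = calloc((size_t)m, 1);         /* all shares, unmarked */
    if (marked == NULL) return -1;

    int k = 2;                                   /* every process: k = 2 */
    int idx0 = 0;                                /* process 0 only */
    do {
        for (int id = 0; id < p; id++) {         /* each process marks its share */
            int low_value = 2 + BLOCK_LOW(id, p, m);
            int size = BLOCK_SIZE(id, p, m);
            char *local = marked + BLOCK_LOW(id, p, m);
            long first;
            if ((long)k * k > low_value) first = (long)k * k - low_value;
            else if (low_value % k == 0) first = 0;
            else first = k - low_value % k;
            for (long i = first; i < size; i += k) local[i] = 1;
        }
        /* process 0 only: next unmarked number in its own block */
        do { idx0++; } while (idx0 < size0 && marked[idx0]);
        k = idx0 + 2;                            /* "broadcast" of k */
    } while ((long)k * k <= n);

    int global_count = 0;                        /* "reduction": sum of counts */
    for (int id = 0; id < p; id++) {
        int count = 0;
        for (int i = BLOCK_LOW(id, p, m); i < BLOCK_LOW(id + 1, p, m); i++)
            if (!marked[i]) count++;
        global_count += count;
    }
    free(marked);
    return global_count;
}
```

The bounds check in the search loop is an addition for safety: if process 0 runs out of unmarked numbers, k becomes one more than its largest number, whose square already exceeds n, so the loop ends.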

  • 7/24/2019 Chpt5 (1)

    38/68

    Function MPI_Bcast

    int MPI_Bcast (
       void *buffer,           /* Addr of 1st element */
       int count,              /* # elements to broadcast */
       MPI_Datatype datatype,  /* Type of elements */
       int root,               /* ID of root process */
       MPI_Comm comm)          /* Communicator */

    MPI_Bcast (&k, 1, MPI_INT, 0, MPI_COMM_WORLD);

  • 7/24/2019 Chpt5 (1)

    39/68

    Task/Channel Graph for 4 Processes

    Red are I/O channels

    Black are used for the reduction step.

  • 7/24/2019 Chpt5 (1)

    40/68

    Task/Channel Model Added Assumption

    The analysis of algorithms typically performed assumes that this model supports the concurrent transmission of messages from multiple tasks, as long as
    - they use different channels
    - no two active channels have the same source or destination.

    This is claimed to be a reasonable assumption
    - based on current commercial systems
    - for some clusters

    This is not a reasonable assumption for networks of workstations connected by a hub, or for any communications system supporting only one message at a time.

    See Ch. 3, p. 88 of Quinn's text

  • 7/24/2019 Chpt5 (1)

    41/68

    Analysis

    χ (i.e., "chi") is the time needed to mark a cell

    Sequential execution ...

  • 7/24/2019 Chpt5 (1)

    42/68

    Code for Sieve of Eratosthenes (Complete code starts on page 124)

    #include <mpi.h>
    #include <math.h>
    #include <stdio.h>
    #include "MyMPI.h"
    #define MIN(a,b) ((a)<(b)?(a):(b))

    MyMPI.h is a header file containing the macros we need and function prototypes for the utilities we are developing.

    Quinn includes some other macros in MyMPI.h that are needed for later programs in this book.

    After this, we will always include this file in our code.

  • 7/24/2019 Chpt5 (1)

    43/68

    int main (int argc, char *argv[])
    {
       ... /* Bunch of data declarations here */
       MPI_Init (&argc, &argv);

  • 7/24/2019 Chpt5 (1)

    44/68

    Capturing Command Line Values

    Example: Invoking the UNIX compiler mpicc

       mpicc -o myprog myprog.c

    would result in the following values being passed to mpicc:

       argc = 4              (i.e., number of tokens on command line; an int)
       argv[0] = mpicc       (each argv[i] is a char array)
       argv[1] = -o
       argv[2] = myprog      (i.e., name for object file)
       argv[3] = myprog.c    (i.e., source file)

    if (argc != 2) ...

  • 7/24/2019 Chpt5 (1)

    45/68

    if (argc != 2) {
       if (!id) printf ("Command line: %s <m>\n", argv[0]);
       MPI_Finalize();
       exit (1);
    }
    n = atoi(argv[1]);

    We are assuming the user will specify the upper range of the sieve as a command line argument, e.g., sieve 1000

    If this argument is missing (argc ≠ 2), we terminate the processing and exit with status 1.

  • 7/24/2019 Chpt5 (1)

    46/68

    low_value = 2 + BLOCK_LOW(id,p,n-1);
    high_value = 2 + BLOCK_HIGH(id,p,n-1);
    size = BLOCK_SIZE(id,p,n-1);

    We use the macros defined to do the block decomposition used by method 2.

    Remember these are defined in the header file MyMPI.h

    We will give each process a contiguous block of the array that will store the marks.

    Values above can differ for processes since they have different id numbers.

  • 7/24/2019 Chpt5 (1)

    47/68

    proc0_size = (n-1)/p;
    if ((2 + proc0_size) < (int) sqrt((double) n)) {
       if (!id) printf ("Too many processes\n");
       MPI_Finalize();
       exit (1);
    }

    Recall, this algorithm works only if the square of the largest value in process 0 is greater than the upper limit of the sieve.

    This code checks for that and exits if the condition does not hold.

  • 7/24/2019 Chpt5 (1)

    48/68

    marked = (char *) malloc (size);
    if (marked == NULL) {
       printf ("Cannot allocate enough memory\n");
       MPI_Finalize();
       exit (1);
    }

    This allocates memory for the process' share of the array, with "marked" a pointer to a char array

    A byte is the smallest unit of memory that can be indexed

  • 7/24/2019 Chpt5 (1)

    49/68

    for (i = 0; i < size; i++) marked[i] = 0;

    At last, we have step 1 of the algorithm

    if (!id) index = 0;
    prime = 2;

    This looks strange, but the variable index is only the index in the array of process 0.

    We conditionalize its initialization to process 0 to emphasize this. Only the id of 0 will make this true.

    It is a good idea to do this to keep straight the local and global indices.

    Each process sets prime to 2. This is step 2 of the algorithm.

  • 7/24/2019 Chpt5 (1)

    50/68

    do {
       if (prime * prime > low_value)
          first = prime * prime - low_value;
       else {
          if (!(low_value % prime)) first = 0;
          else first = prime - (low_value % prime);
       }

    This is step 3 in the sequential algorithm.

    We need to determine the (local) index corresponding to the first integer needing marking.

    % is the modulo operator in C; it returns the remainder

    If the remainder is 0, then we start marking at 0, otherwise we move in to the first multiple of prime.

  • 7/24/2019 Chpt5 (1)

    51/68

       for (i = first; i < size; i += prime) marked[i] = 1;

    This loop does the sieving.

    Each process marks the multiples of the current prime number from the first index through the end of the array.

    This completes step 3(a)

       if (!id) {
          while (marked[++index]);
          prime = index + 2;
       }

    Process 0 now finds the next prime.

  • 7/24/2019 Chpt5 (1)

    52/68

       MPI_Bcast (&prime, 1, MPI_INT, 0, MPI_COMM_WORLD);

  • 7/24/2019 Chpt5 (1)

    53/68

    MPI_Reduce (...);
    ...
    printf (..., elapsed_time);
    MPI_Finalize ();
    return 0;

    Turn off timer, print the results, and finalize.

  • 7/24/2019 Chpt5 (1)

    54/68

    Benchmarking

    Test case: Find all primes < 100 million

    Run sequential algorithm on one processor

    Determine χ in nanoseconds by dividing the sequential run time by $n \ln \ln n$
    - This assumes complexity $\Theta(n \ln \ln n)$

  • 7/24/2019 Chpt5 (1)

    55/68

    Benchmarking (cont.)

    Estimate running time of parallel algorithm by substituting χ and λ into the expression ...

  • 7/24/2019 Chpt5 (1)

    56/68

    Experimental Results

    Processors   Predicted (sec)   Actual (sec)
    1            24.900            24.900
    2            12.721            13.011
    3             8.843             9.039
    4             6.768             7.055
    5             5.794             5.993
    6             4.964             5.159
    7             4.371             4.687
    8             3.927             4.222

    Observation: As illustrated in Fig. 5.7, this is a very close approximation

  • 7/24/2019 Chpt5 (1)

    57/68

    Improvements

    Delete even integers
    - Cuts number of computations in half
    - Frees storage for larger values of n
    - Cuts the execution time ...
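A minimal sequential sketch of the "delete even integers" idea: only the odd numbers 3, 5, 7, ... are stored, so the array is half as long and only odd multiples are marked. The function name `odd_sieve_count` and the index mapping (cell i stands for 2i + 3) are choices made for this illustration, not taken from the original program.

```c
#include <assert.h>
#include <stdlib.h>

/* Count primes <= n while storing only the odd numbers 3, 5, 7, ...
   marked[i] stands for the number 2*i + 3.
   Returns -1 if memory cannot be allocated. */
int odd_sieve_count(int n)
{
    assert(n >= 2);
    if (n == 2) return 1;
    int size = (n - 1) / 2;                    /* odd numbers in 3..n */
    char *marked = calloc((size_t)size, 1);
    if (marked == NULL) return -1;

    for (long k = 3; k * k <= n; k += 2) {
        if (marked[(k - 3) / 2]) continue;     /* k is composite, skip it */
        for (long m = k * k; m <= n; m += 2 * k)
            marked[(m - 3) / 2] = 1;           /* odd multiples of k only */
    }

    int count = 1;                             /* the prime 2 */
    for (int i = 0; i < size; i++)
        if (!marked[i]) count++;
    free(marked);
    return count;
}
```

The step of 2k in the inner loop skips the even multiples, which no longer have a cell in the array.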

  • 7/24/2019 Chpt5 (1)

    58/68

    Reorganize Loops

    Suppose cache has 4 lines of 4 bytes each. So line 1 holds 3, 5, 7, 9; line 2 holds 11, 13, 15, 17; etc.

    Then if we sieve all the multiples of one prime before doing the next ...

  • 7/24/2019 Chpt5 (1)

    59/68

    Reorganize Loops

    Now use 8 bytes in two cache lines and sieve multiples of all primes for the first 8 bytes before going to the next ...

  • 7/24/2019 Chpt5 (1)

    60/68

    Comparing (as shown in text) ...

  • 7/24/2019 Chpt5 (1)

    61/68

    Comparing 4 Versions

    Procs   Sieve 1   Sieve 2   Sieve 3   Sieve 4
    1       24.900    12.237    12.466    2.543
    2       12.721     6.609     6.378    1.330
    3        8.843     5.019     4.272    0.901
    4        6.768     4.072     3.201    0.679
    5        5.794     3.652     2.559    0.543
    6        4.964     3.270     2.127    0.456
    7        4.371     3.059     1.820    0.391
    8        3.927     2.856     1.585    0.342

    10-fold improvement

    7-fold improvement

    Note: Graphical display of this chart in Fig. 5.10

  • 7/24/2019 Chpt5 (1)

    62/68

    Summary

    Sieve of Eratosthenes: parallel design uses domain decomposition

    Compared two block distributions
    - Chose one with simpler formulas

    Introduced MPI_Bcast

    Optimizations reveal importance of maximizing ...

  • 7/24/2019 Chpt5 (1)

    63/68

    New Sieve Material Added

    Reference: Parallel Computing: Theory and Practice, Second Edition, Michael Quinn, McGraw-Hill, 1994, pages 10-13.

    The following slides are not from material in our current text.

  • 7/24/2019 Chpt5 (1)

    64/68

    Sieve of Eratosthenes: A Control-Parallel Approach

    Data parallelism refers to using multiple PEs to apply the same sequence of operations to different data elements.

    Functional or control parallelism involves applying a different sequence of operations to different data elements

    Model assumed for this example ...

  • 7/24/2019 Chpt5 (1)

    65/68

    A Control Parallel Sieve Approach

    Each processor repeats the following two step process:
    - Identify the next ...

  • 7/24/2019 Chpt5 (1)

    66/68

    Control Parallel Sieve (cont.)

    Basic algorithm for shared memory MIMD
    1. Processor accesses variable holding current prime
    2. Searches for next ...

  • 7/24/2019 Chpt5 (1)

    67/68

    Parallel Speedup Metric (Initial Overview)

    A measure of the decrease in running time due to parallelism.

    Speedup = (sequential time)/(parallel time)
    - The sequential time is the worst case sequential running time
    - The parallel time is the worst case parallel running time.

  • 7/24/2019 Chpt5 (1)

    68/68

    How Much Speedup is Possible?

    Suppose n = 1000

    Sequential algorithm
    - Time to strike out multiples of prime p is $\lceil (n+1-p^2)/p \rceil$
    - Multiples of 2: $\lceil ((1000+1)-4)/2 \rceil = \lceil 997/2 \rceil = 499$
    - Multiples of 3: $\lceil ((1000+1)-9)/3 \rceil = \lceil 992/3 \rceil = 331$
    - Total time to strike all prime multiples = 1411 (i.e., number of "steps")

    2 PEs gives speedup 1411/706 = 2.00

    3 PEs gives speedup 1411/499 = 2.83
    - 3 PEs require 499 strikeout time units, so no more speedup is possible using additional PEs
    - Multiples of 2 dominate with 499 strikeout steps
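The step counts on this slide can be reproduced with a short simulation. The sketch assumes a simple schedule that the slide does not spell out: whenever a processor becomes free it takes the next prime, and the time to find that prime is ignored. The names `strike_time`, `sequential_steps`, `parallel_steps` and the limit `MAX_PES` are hypothetical.

```c
#include <assert.h>

#define MAX_PES 16

/* Time units to strike the multiples of k between k*k and n:
   ceil((n + 1 - k*k) / k), computed with integer arithmetic. */
static int strike_time(int n, int k)
{
    return (n + 1 - k * k + k - 1) / k;
}

static int is_prime(int x)
{
    if (x < 2) return 0;
    for (int d = 2; d * d <= x; d++)
        if (x % d == 0) return 0;
    return 1;
}

/* Total strike-out steps of the sequential algorithm. */
int sequential_steps(int n)
{
    int total = 0;
    for (int k = 2; k * k <= n; k++)
        if (is_prime(k)) total += strike_time(n, k);
    return total;
}

/* Finish time with `pes` processors under the assumed schedule:
   the earliest-free processor always takes the next prime. */
int parallel_steps(int n, int pes)
{
    assert(pes >= 1 && pes <= MAX_PES);
    int busy_until[MAX_PES] = {0};
    for (int k = 2; k * k <= n; k++) {
        if (!is_prime(k)) continue;
        int next = 0;                          /* earliest-free processor */
        for (int i = 1; i < pes; i++)
            if (busy_until[i] < busy_until[next]) next = i;
        busy_until[next] += strike_time(n, k);
    }
    int finish = 0;
    for (int i = 0; i < pes; i++)
        if (busy_until[i] > finish) finish = busy_until[i];
    return finish;
}
```

Dividing `sequential_steps(1000)` by `parallel_steps(1000, 2)` or `parallel_steps(1000, 3)` gives the two speedups quoted above. Adding a fourth processor does not reduce the finish time, because the processor striking the multiples of 2 is the bottleneck.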