
  • 8/11/2019 Design Parallel Programs


    Designing Parallel Programs

    Dr. Tran, Van Hoai

    Department of Systems & Networking

    Faculty of Computer Science and Engineering, HCMC University of Technology

    E-mail: [email protected]

    2009-2010

    Parallel Computing 2009-2010


    Issues

    Considerations

    Parallel machine architectures

    Decomposition strategies

    Programming models

    Performance aspects: scalability, load balance

    Parallel debugging, analysis, tuning

    I/O on parallel machines

    Not easy to suggest a methodical approach


    Steps in Designing (I. Foster)

    Partitioning: decomposing the problem into small tasks which can be performed in parallel

    Communication: determining communication structures and algorithms to coordinate tasks

    Agglomeration: combining the tasks into larger ones, considering performance requirements and implementation costs

    Mapping: assigning tasks to processors to maximize processor utilization and to minimize communication costs

    (Figure: design flow — Problem → Partition → Communicate → Agglomerate → Map)
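As a toy illustration (not from the slides), the four steps applied to summing an array might look like the sketch below; the name `parallel_sum` and the chunking choices are assumptions for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(data, n_workers=4):
    """Sum `data` in parallel following Foster's four steps (a sketch)."""
    if not data:
        return 0
    # Partition + agglomerate: instead of one task per element
    # (fine-grained), group elements into n_workers coarse chunks.
    chunk = (len(data) + n_workers - 1) // n_workers
    parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    # Map: one chunk per worker thread.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum, parts))
    # Communicate: partial results are gathered and reduced by the caller.
    return sum(partials)
```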


    Other practical issues

    Data distribution: input/output & intermediate data

    Data access: managing access to shared data

    Stage synchronization


    Partition (Decomposition)

    Tasks: programmer-defined units of computation

    Tasks can be executed simultaneously

    Once defined, tasks are indivisible units of computation

    Fine-grained decomposition

    Two dimensions of decomposition:

    Domain decomposition: data associated with the problem

    Functional decomposition: computation operating on the data

    Avoiding replication


    Domain Decomposition

    Steps:

    Dividing the data into equally-sized small tasks

    Input/output & intermediate data

    Different partitions may be possible

    Different decompositions may exist for different phases

    Determining the operations of computation on each task

    Task = (data, operations)


    Functional Decomposition

    Steps:

    Dividing the computation into disjoint tasks

    Examining data requirements of the tasks

    Avoiding data replication

    (Figure: a climate model composed of atmospheric, ocean, hydrology, and land-surface model components)

    A search tree can be considered as a functional decomposition

    Functional decomposition is a program-structuring technique (modularity)


    Decomposition Methods

    Domain decomposition (data decomposition)

    Functional decomposition (task decomposition)

    Recursive decomposition

    Exploratory decomposition

    Speculative decomposition


    Recursive Decomposition

    Suitable for problems that can be solved using the divide-and-conquer paradigm

    Each of the subproblems generated by the divide step becomes a task


    Quick Sort

    QUICKSORT(A, q, r)
    if q < r then
        x := A[q]
        s := q
        for i := q + 1 to r do
            if A[i] ≤ x then
                s := s + 1
                swap(A[s], A[i])
            end if
        end for
        swap(A[q], A[s])
        QUICKSORT(A, q, s)
        QUICKSORT(A, s + 1, r)
    end if

    (Figure: example run of QUICKSORT on the array 3 2 1 5 8 4 3 7, showing each pivot's final position after partitioning until the array is sorted: 1 2 3 3 4 5 7 8)
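A minimal runnable sketch of the recursive decomposition (not from the slides): after a sequential partition step, the two subproblems are independent tasks, here run with Python threads up to an assumed depth cutoff:

```python
import threading

def quicksort(a, lo, hi, depth=2):
    """In-place quicksort; subproblems up to `depth` run as parallel tasks."""
    if lo >= hi:
        return
    # Sequential partition step, as in the slide's pseudocode.
    pivot = a[lo]
    s = lo
    for i in range(lo + 1, hi + 1):
        if a[i] <= pivot:
            s += 1
            a[s], a[i] = a[i], a[s]
    a[lo], a[s] = a[s], a[lo]
    if depth > 0:
        # Recursive decomposition: the two subproblems are independent,
        # so one can run in a separate thread while we handle the other.
        left = threading.Thread(target=quicksort, args=(a, lo, s - 1, depth - 1))
        left.start()
        quicksort(a, s + 1, hi, depth - 1)
        left.join()
    else:
        quicksort(a, lo, s - 1, 0)
        quicksort(a, s + 1, hi, 0)
```

The depth cutoff is an agglomeration decision: below it, tasks are too small to justify thread overhead.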


    Quick Sort

    (Figure: recursive decomposition of quicksort on a 12-element array; each partitioning step spawns two independent subproblems)

    Quicksort task-dependency graph based on recursive decomposition


    Minimum Finding

    Divide-and-conquer algorithms can also be used to solve problems which are traditionally solved by non-divide-and-conquer approaches

    FINDMIN(A)
    min := A[0]
    for i := 1 to n − 1 do
        if A[i] < min then
            min := A[i]
        end if
    end for

    RECURSIVE_FINDMIN(A, n)
    if n = 1 then
        min := A[0]
    else
        lmin := RECURSIVE_FINDMIN(A, n/2)
        rmin := RECURSIVE_FINDMIN(&A[n/2], n − n/2)
        if lmin < rmin then
            min := lmin
        else
            min := rmin
        end if
    end if
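The recursive version can be sketched in Python as follows (slicing stands in for the pseudocode's `&A[n/2]` pointer arithmetic; the function name is illustrative):

```python
def recursive_findmin(a):
    """Divide-and-conquer minimum: the two halves are independent tasks."""
    if len(a) == 1:
        return a[0]
    mid = len(a) // 2
    # lmin and rmin have no dependency on each other, so a parallel
    # runtime could evaluate them concurrently.
    lmin = recursive_findmin(a[:mid])
    rmin = recursive_findmin(a[mid:])
    return lmin if lmin < rmin else rmin
```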


    Domain Decomposition

    Operating on large amounts of data

    Often performed in two steps:

    Partitioning the data


    Inducing the computational partitioning from the data partitioning

    Data to be partitioned: input/output/intermediate


    Domain Decomposition

    Dense matrix-vector multiplication

    (Figure: row-wise decomposition of y = A·b, with task i computing entry y_i from row i of A and the vector b; and a 3-D grid decomposition)
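The row-wise decomposition can be sketched as one task per row of A (a sketch under those assumptions; names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def row_task(row, b):
    # Each task owns one row of A and computes one entry of y.
    return sum(a_ij * b_j for a_ij, b_j in zip(row, b))

def parallel_matvec(A, b):
    """Dense matrix-vector product with a domain decomposition by rows."""
    with ThreadPoolExecutor() as pool:
        # The computational partitioning is induced by the data
        # partitioning: one independent task per row.
        return list(pool.map(lambda row: row_task(row, b), A))
```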


    Matrix-Matrix Multiplication

    Partitioning the output data

    [ A11 A12 ]   [ B11 B12 ]   [ C11 C12 ]
    [ A21 A22 ] · [ B21 B22 ] = [ C21 C22 ]

    Partitioning

    Task 1: C11=A11B11+A12B21

    Task 2: C12=A11B12+A12B22

    Task 3: C21=A21B11+A22B21

    Task 4: C22=A21B12+A22B22
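The four independent tasks above can be sketched as follows, with blocks represented as nested lists (an illustrative sketch, not the slides' implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def matmul(X, Y):
    """Naive dense multiply for one pair of blocks."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def block_task(Ai1, Ai2, B1j, B2j):
    # One output block: Cij = Ai1·B1j + Ai2·B2j
    return matadd(matmul(Ai1, B1j), matmul(Ai2, B2j))

def block_matmul_2x2(A, B):
    """A and B are 2x2 grids of blocks; the four output-block tasks
    are independent and can run concurrently."""
    tasks = [(A[0][0], A[0][1], B[0][0], B[1][0]),  # Task 1: C11
             (A[0][0], A[0][1], B[0][1], B[1][1]),  # Task 2: C12
             (A[1][0], A[1][1], B[0][0], B[1][0]),  # Task 3: C21
             (A[1][0], A[1][1], B[0][1], B[1][1])]  # Task 4: C22
    with ThreadPoolExecutor(max_workers=4) as pool:
        c11, c12, c21, c22 = pool.map(lambda t: block_task(*t), tasks)
    return [[c11, c12], [c21, c22]]
```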


    Matrix-Matrix Multiplication

    There are different decompositions of computations

    Decomposition 1

    Task 1: C11 = A11·B11
    Task 2: C11 = C11 + A12·B21
    Task 3: C12 = A11·B12
    Task 4: C12 = C12 + A12·B22
    Task 5: C21 = A21·B11
    Task 6: C21 = C21 + A22·B21
    Task 7: C22 = A21·B12
    Task 8: C22 = C22 + A22·B22

    Decomposition 2

    Task 1: C11 = A11·B11
    Task 2: C11 = C11 + A12·B21
    Task 3: C12 = A12·B22
    Task 4: C12 = C12 + A11·B12
    Task 5: C21 = A22·B21
    Task 6: C21 = C21 + A21·B11
    Task 7: C22 = A21·B12
    Task 8: C22 = C22 + A22·B22


    Matrix-Matrix Multiplication

    Partitioning the intermediate data

    Stage 1:

    [ A11 A12 ]   [ B11 B12 ]     [ D111 D112 ]   [ D211 D212 ]
    [ A21 A22 ] · [ B21 B22 ]  →  [ D121 D122 ] , [ D221 D222 ]

    Stage 2:

    [ D111 D112 ]   [ D211 D212 ]   [ C11 C12 ]
    [ D121 D122 ] + [ D221 D222 ] = [ C21 C22 ]


    Matrix-Matrix Multiplication

    A decomposition induced by a partitioning of D

    Task 01: D111 = A11·B11
    Task 02: D211 = A12·B21
    Task 03: D112 = A11·B12
    Task 04: D212 = A12·B22
    Task 05: D121 = A21·B11
    Task 06: D221 = A22·B21
    Task 07: D122 = A21·B12
    Task 08: D222 = A22·B22
    Task 09: C11 = D111 + D211
    Task 10: C12 = D112 + D212
    Task 11: C21 = D121 + D221
    Task 12: C22 = D122 + D222


    Matrix-Matrix Multiplication

    (Figure: task-dependency graph for the 12-task decomposition — the eight multiply tasks 1–8 have no predecessors; each addition task 9–12 depends on two of them)


    Domain Decomposition

    Most widely-used decomposition technique

    Large problems often have large amounts of data

    Splitting work based on data is a natural way to obtain high concurrency

    Can be combined with other methods

    (Figure: a hybrid example — domain decomposition splits the input array into parts, then recursive decomposition is applied within each part)


    Exploratory Decomposition

    Decomposing computations corresponding to a search of a space of solutions

    Not as general-purpose as the other techniques

    Possibly resulting in speedup anomalies: slow-down or superlinear speedup
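A sketch of exploratory decomposition (names and the goal test in `explore` are hypothetical): branches of the search space run as concurrent tasks, and the rest are discarded once one succeeds — which is exactly why total work can differ from the sequential search:

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def explore(branch):
    # Hypothetical goal test: only branch 3 leads to a solution.
    return branch if branch == 3 else None

def parallel_search(branches):
    """Explore all branches concurrently; stop at the first solution."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        pending = {pool.submit(explore, b) for b in branches}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for f in done:
                if f.result() is not None:
                    for rest in pending:
                        rest.cancel()  # unexplored branches: work avoided
                    return f.result()
    return None
```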


    Solution

    (Figure: two search trees with subtrees of size m — in one case total sequential work is 2m+1 while total parallel work is 1 (superlinear speedup); in the other, total sequential work is m while total parallel work is 4m (slow-down))


    Speculative Decomposition

    Extracting concurrency in problems in which the next step is one of many possible actions that can only be determined when the current task finishes

    Principle:

    Assuming a certain outcome of currently executed tasks

    Executing some of the next steps (speculation)
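The principle can be sketched with futures; `current_task`, `likely_branch`, and `other_branch` are hypothetical stand-ins, not from the slides:

```python
from concurrent.futures import ThreadPoolExecutor

def current_task():
    return 7              # its outcome decides which branch runs next

def likely_branch():
    return "fast path"    # the speculated next step

def other_branch():
    return "slow path"

def run_with_speculation():
    """Start the predicted next step before the current task's outcome
    is known; reuse it if the prediction was right, discard it if not."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        cur = pool.submit(current_task)
        spec = pool.submit(likely_branch)  # speculative execution
        if cur.result() > 0:               # prediction was right
            return spec.result()           # speculative work is reused
        spec.cancel()                      # wrong guess: work is wasted
        return other_branch()
```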


    (Figure: network for discrete event simulation — components A–I connected from system inputs to the system output)


    Speculative Execution

    If predictions are wrong:

    Work is wasted

    Work may need to be undone (state-restoration overhead)

    May be the only way to extract concurrency


    Communication

    Communication is specified in 2 phases:

    Defining the channel structure (technology-dependent)

    Specifying the messages sent and received

    Determining communication requirements in functional decomposition is easier than in domain decomposition

    Data requirements among tasks are represented as a task-dependency graph (TDG): certain task(s) can only start once some other task(s) have finished


    Task-Dependency Graph

    Key concepts derived from the task-dependency graph:

    Degree of concurrency: the number of tasks that can be concurrently executed

    Critical path: the longest vertex-weighted path (weights represent task sizes)

    Task granularity affects both of the characteristics above
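Both quantities can be computed from a TDG. The sketch below is illustrative (the DAG and unit weights are hypothetical, shaped like the 12-task matmul decomposition); the degree of concurrency is approximated level by level, a common textbook simplification:

```python
def critical_path_length(weights, deps):
    """Longest vertex-weighted path: weights maps task -> cost,
    deps maps task -> list of prerequisite tasks."""
    memo = {}
    def finish(t):
        if t not in memo:
            memo[t] = weights[t] + max((finish(p) for p in deps.get(t, [])),
                                       default=0)
        return memo[t]
    return max(finish(t) for t in weights)

def max_degree_of_concurrency(deps, tasks):
    """Largest set of tasks runnable together, simulated level by level:
    at each step, every task whose prerequisites are finished is ready."""
    remaining = set(tasks)
    best = 0
    while remaining:
        ready = {t for t in remaining
                 if all(p not in remaining for p in deps.get(t, []))}
        best = max(best, len(ready))
        remaining -= ready
    return best
```

For the matmul-style DAG (eight independent multiplies feeding four additions, all unit weight), the critical path has length 2 and up to eight tasks can run at once.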


    Task-Interaction Graph (TIG)

    Captures the pattern of interaction between tasks

    TIG usually contains the TDG as a subgraph, i.e., there may be interactions between tasks even if there are no dependencies (e.g., accesses to shared data)

    (Figure: a 12-task example (tasks 0–11) and its task-interaction graph)

    TDG and TIG are important in developing an effective mapping (maximize concurrency and minimize overheads)
