Topology-aware Optimization of Communications for Parallel Matrix Multiplication on Hierarchical Heterogeneous HPC Platforms

Tania Malik, Vladimir Rychkov, Alexey Lastovetsky, Jean-Noël Quintin

Heterogeneous Computing Laboratory, University College Dublin, Ireland

Heterogeneity in Computing Workshop, Phoenix, Arizona, USA

19-25 May, 2014

Tania Malik (UCD HCL), Topology-aware Communication Optimization, IPDPS 2014


Outline

Motivation

Problem Formulation

Topology-aware Communication Optimization Approach

Cost function

Heuristic

Experiments

Conclusion


Motivation

Introduction

For efficient execution of data-parallel applications on an HPC platform:

Balance the load between processors

Optimize communication cost

Communications on a heterogeneous platform involve:

Multiple message hops

Non-optimal routes

Traffic congestion

These significantly affect performance

With topology information, communication operations can be optimized


Topology-Aware Optimisation of Communications

A number of topology-aware MPI collective operations have been proposed for optimal scheduling of messages

Improves communication performance

Non-intrusive to source code

Applicable to collective operations only

Does not affect point-to-point exchanges


What To Do

To address the problem of communication optimization in such data-parallel MPI applications, we must take into account:

Topology information

Application communication flow

Choose a specific parallel application

Matrix multiplication based on the Scalable Universal Matrix Multiplication Algorithm (SUMMA)

Target dedicated heterogeneous HPC platforms with network hierarchy

Interconnected clusters


Problem Formulation

Select a parallel matrix multiplication application for heterogeneous platforms based on SUMMA

SUMMA was originally designed for homogeneous platforms

Communication flow consists of multiple broadcasts

Assume the workload is already balanced

Existing load balancing algorithms are oblivious to network topology

Rearrange the existing heterogeneous data partition based on network topology and application communication flow

The approach is non-intrusive to the source code but application-specific


Communication Flow of Heterogeneous SUMMA

Figure: Communication flow of heterogeneous SUMMA for matrices A and B: one-to-all


Load Balancing

A number of partitioning algorithms exist for efficient load balancing

Column-Based Partitioning (Kalinov and Lastovetsky, 1999) (KL)

Minimising Total Communication Volume (Beaumont, Boudet, Rastello, Robert, 2001) (BR)

1D Functional Performance Model-based Partitioning (Lastovetsky, Reddy, 2007) (FPM1D)

2D Functional Performance Model-based Matrix Partitioning Algorithm (Clarke, Lastovetsky, Rychkov, 2011) (FPM-BR)


Communication Flow of Heterogeneous SUMMA

Figure: Communication flow of heterogeneous SUMMA implementing FPM-BR for matrices A and B: ring


Comparison of some SUMMA-based algorithms

Table: Comparison of some SUMMA-based algorithms

Algorithm   Data partitioning   Communication vol.   Communication flow
SUMMA       homogeneous         –                    broadcasts
BR          constant speeds     min                  nb-p2p one-to-all
FPM-BR      speed functions     min                  nb-p2p one-to-all/ring


Matrix Partitioning Algorithm

FPM-BR algorithm:

Balances the workload

Minimizes the total volume of communication

However, none of the matrix multiplication load balancing algorithms takes into account the underlying network topology

The goal is to reduce the communication cost of the parallel application that implements the FPM-BR matrix multiplication algorithm

Rearrange the existing heterogeneous data partition based on network topology and application communication flow


Exhaustive Search Partitions

Performed an exhaustive search over all possible arrangements of rectangles

Found that some arrangements reduced the communication cost while others increased it

Figure: Communication-optimal arrangements

Figure: Worst-case arrangements

Observed regularity in the communication-optimal arrangements related to the topology

Rectangles were grouped by clusters

Less inter-cluster communication

Table: Exhaustive search experimental results

                    Cost                    Exec time (sec)
                    Worst case   Optimal    Worst case   Optimal
Exhaustive search   89.80        73.59      6.00         2.78


Search Space Size

Column widths are different:

A rectangle cannot be moved to another column unless the whole columns are interchanged

Within a column, there are no restrictions on interchanging rectangles

Let

$c$ be the number of columns

$r_i$ be the number of rectangles in column $i$, $1 \le i \le c$

The number of combinations is then equal to the product $r_1! \times \dots \times r_c!$
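To get a feel for how quickly this search space grows, the product of factorials can be evaluated directly. The following is a minimal sketch; the function name `search_space_size` and the example column sizes are illustrative assumptions, not taken from the slides.

```python
from math import factorial, prod


def search_space_size(rects_per_column):
    """Number of arrangements when rectangles may only be permuted
    inside their own column: r_1! * r_2! * ... * r_c!."""
    return prod(factorial(r) for r in rects_per_column)


# Hypothetical partition with three columns holding 3, 4 and 2 rectangles.
example_size = search_space_size([3, 4, 2])  # 3! * 4! * 2!
print(example_size)
```

Even modest column sizes give a large count, which is why an exhaustive search becomes impractical as the number of processors grows.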


NP-Complete

Which arrangement of rectangles is communication-optimal?

This is an NP-complete problem

Exhaustive search can be avoided

By applying a heuristic that efficiently finds a near-optimal arrangement

This requires estimating the communication cost incurred by each data partitioning


Topology-aware Optimization Approach

Cost Function

Based on the observations from the exhaustive search

Propose a cost function for FPM-BR

Ring communication flow

Two-level network hierarchy


Cost function for Matrix A

Figure: Inter-cluster communication related to matrix A

Let

$o$ = number of overlaps of matrix rectangles

$h$ = number of inter-cluster communications

$v$ = height of overlap

$cost_A = \sum_{i=1}^{o} h(i) \times v(i)$

Worst case: $2 \times (11 + 3 + 3 + 3 + 4 + 2 + 6) = 64$

Optimal: $1 \times (6 + 8) + 2 \times (1 + 9 + 2 + 6) = 50$
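The two worked values can be checked with a few lines of code. This is a small sketch that reads each term of the arithmetic above as one overlap with its pair $(h, v)$; the helper name `weighted_cost` is an illustrative assumption.

```python
def weighted_cost(terms):
    """Sum h * v over (h, v) pairs, one pair per overlap."""
    return sum(h * v for h, v in terms)


# Worst case from the slide: h = 2 for every overlap.
worst_a = weighted_cost([(2, v) for v in (11, 3, 3, 3, 4, 2, 6)])

# Optimal case from the slide: two overlaps with h = 1, four with h = 2.
optimal_a = weighted_cost([(1, 6), (1, 8)] + [(2, v) for v in (1, 9, 2, 6)])

print(worst_a, optimal_a)
```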


Cost function for Matrix B

Figure: Inter-cluster communication related to matrix B

Let

$c$ = total number of columns

$h$ = number of inter-cluster communications

$v$ = column width

$cost_B = \sum_{i=1}^{c} h(i) \times v(i)$

Worst case: $(1 \times 12) + (2 \times 12) + (3 \times 9) = 63$

Optimal: $(1 \times 12) + (2 \times 12) + (2 \times 9) = 54$
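The slides do not spell out how $h(i)$ is obtained for a column. One possible reading, used here purely as an assumption, is that $h(i)$ counts the consecutive pairs of rectangles in column $i$ whose processors belong to different clusters. The sketch below follows that reading; the cluster labels `P` and `Q` and the column layouts are hypothetical and were chosen only so that the resulting $h$ values match the ones in the arithmetic above.

```python
def inter_cluster_hops(labels):
    """Assumed rule: count neighbouring rectangles in one column
    that sit in different clusters."""
    return sum(a != b for a, b in zip(labels, labels[1:]))


def cost_b(columns):
    """columns: list of (width, cluster labels from top to bottom)."""
    return sum(inter_cluster_hops(labels) * width for width, labels in columns)


# Hypothetical layouts reproducing h = 1, 2, 3 and h = 1, 2, 2.
worst_b = cost_b([(12, "PQ"), (12, "PQP"), (9, "PQPQ")])
optimal_b = cost_b([(12, "PQ"), (12, "PQP"), (9, "PQQP")])

print(worst_b, optimal_b)
```

Grouping the rectangles of one cluster next to each other in the last column removes one inter-cluster hop, which is the effect the optimal arrangement exploits.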


Cost function for an Arrangement M

Use the Euclidean norm

It represents the combined cost and can be used to compare any two arrangements:

$\|(cost_A(M), cost_B(M))\|$

Worst case: $\sqrt{64^2 + 63^2} \approx 89.80$

Optimal case: $\sqrt{50^2 + 54^2} \approx 73.59$

Finding the communication-optimal arrangement can be formulated as minimization of the Euclidean norm:

$\|(cost_A(M), cost_B(M))\| \to \min$

Use this cost function in the heuristic
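As a quick check of the combined cost, the norm can be computed with the standard library. This is a minimal sketch; the function name `combined_cost` is an illustrative assumption, and the inputs are the per-matrix costs quoted on the slides.

```python
from math import hypot


def combined_cost(cost_a, cost_b):
    """Euclidean norm of the pair (cost_A, cost_B)."""
    return hypot(cost_a, cost_b)


worst = combined_cost(64, 63)
optimal = combined_cost(50, 54)

# Print with three decimals; the slide values agree up to rounding.
print(f"{worst:.3f} {optimal:.3f}")
```

Because the norm is a single scalar, any two arrangements can be ranked by comparing their combined costs directly.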


Heuristic

Heuristic for the Communication-Optimal Arrangement

Propose a heuristic to avoid testing too many combinations

Permutation based on groups

Requires testing $g_2! + \dots + g_c!$ arrangements of submatrices
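The saving comes from replacing a product of factorials with a sum. The sketch below assumes that $g_i$ is the number of groups in column $i$ and that the first column is kept fixed, which is why the sum starts at $g_2$; the function name and the example group counts are hypothetical.

```python
from math import factorial, prod


def heuristic_tests(groups_per_column):
    """g_2! + ... + g_c!: the first column is skipped (assumed fixed)."""
    return sum(factorial(g) for g in groups_per_column[1:])


groups = [2, 3, 2]  # hypothetical number of groups in each column
tests_needed = heuristic_tests(groups)
# For contrast: trying every combination of the same group orders.
all_combinations = prod(factorial(g) for g in groups)

print(tests_needed, all_combinations)
```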


Heuristic for the Communication-Optimal Arrangement-2

Figure: Permutation order k = 1

Figure: Permutation order k = 2

Accept $c_1$ as the optimal order

Generate the $g_i!$ group permutations

For each permutation $k = 1$ to $g_i!$

Find the $k$ that has the minimum cost function for the extended sub-matrix

In the example, the cost function is 45 for $k = 1$ and 35 for $k = 2$

Add the minimum-cost $k$ to the resulting arrangement
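The steps above can be sketched as a greedy loop over columns. This is an illustration under stated assumptions, not the authors' implementation: `greedy_arrangement` and `toy_cost` are hypothetical names, and `toy_cost` is a deliberately simple stand-in for the Euclidean-norm cost function, which would need the real partition geometry and topology.

```python
from itertools import permutations


def greedy_arrangement(columns, cost):
    """Keep the first column as given, then extend the arrangement one
    column at a time with the cheapest order of that column's groups."""
    arrangement = [tuple(columns[0])]
    for groups in columns[1:]:
        best = min(
            permutations(groups),  # all g_i! orders of this column
            key=lambda order: cost(arrangement + [order]),
        )
        arrangement.append(best)
    return arrangement


def toy_cost(arrangement):
    """Stand-in cost (assumption): count rows where neighbouring
    columns hold groups from different clusters."""
    return sum(
        sum(a != b for a, b in zip(left, right))
        for left, right in zip(arrangement, arrangement[1:])
    )


# Hypothetical input: three columns, each with one group per cluster X, Y.
result = greedy_arrangement([("X", "Y"), ("Y", "X"), ("Y", "X")], toy_cost)
print(result)
```

Each column is decided once and never revisited, so the number of cost evaluations is the sum of the per-column permutation counts rather than their product.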

Tania Malik (UCD HCL) Topology-aware Communication Optimization IPDPS 2014 21 / 26

Page 48: Topology-aware Optimization of Communications for Parallel ... · Tania Malik (UCD HCL) Topology-aware Communication Optimization IPDPS 2014 5 / 26. Motivation What To Do To address

Heuristic

Heuristic for the Communication-Optimal Arrangement-2

Figure :Permutation orderk=1

Figure :Permutation orderk=2

Accept c1 as optimal order

Generate group permutations gi !

For each permutation k = 1 to gi

Find k that has minimum cost functionfor extended sub-matrix

Cost function for k1=45 and k2=35

Add minimum k to resultingarrangement

Tania Malik (UCD HCL) Topology-aware Communication Optimization IPDPS 2014 21 / 26

Page 49: Topology-aware Optimization of Communications for Parallel ... · Tania Malik (UCD HCL) Topology-aware Communication Optimization IPDPS 2014 5 / 26. Motivation What To Do To address

Heuristic

Heuristic for the Communication-Optimal Arrangement-2

Figure :Permutation orderk=1

Figure :Permutation orderk=1

Figure :Permutation orderk=2

Accept c1 as optimal order

Generate group permutations gi !

For each permutation k = 1 to gi

Find k that has minimum cost functionfor extended sub-matrix

Cost function for k1=45 and k2=35

Add minimum k to resultingarrangement

Heuristic

Heuristic for the Communication-Optimal Arrangement-3

Figure: Permutation order k=1

Figure: Permutation order k=2

Repeat the same steps for all c columns

Cost function of k1=74 and k2=65

Choose k2 as optimal order
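The choice on this slide amounts to taking the permutation order with the smaller cost value. A minimal Python sketch using the two values quoted above; the helper name `pick_order` is hypothetical.

```python
def pick_order(costs):
    """Return (k, cost) for the permutation order with the smallest cost."""
    k = min(costs, key=costs.get)
    return k, costs[k]

# Cost function values quoted on the slide: k1 = 74, k2 = 65.
costs = {1: 74, 2: 65}
k, c = pick_order(costs)

# Relative saving of the chosen order over the costlier one.
worst = max(costs.values())
saving = (worst - c) / worst
print(f"choose k{k} (cost {c}), {saving:.1%} below the costlier order")
```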

Experimental Result

Heterogeneous Inter-Cluster Experiments

Figure: Matrix partitioning for 32 nodes

Figure: Matrix partitioning for 90 nodes

Table: Inter-cluster experimental results

Nodes   Cost               Exec time (sec)     Ratio
        Orig   Heuristic   Orig     Heuristic
16      533    432         58.00    42.58      1.36
32      868    710         119.30   88.30      1.35
90      1719   1263        400.80   297.83     1.34
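The Ratio column agrees, to within 0.01, with the original execution time divided by the heuristic execution time. The following Python sketch recomputes it from the table values and also derives the relative reduction in the cost function. The helper names `speedup` and `cost_reduction` are hypothetical, and the cost reduction is a derived figure, not one reported on the slide.

```python
# (nodes, cost_orig, cost_heur, time_orig_s, time_heur_s, ratio_reported)
rows = [
    (16, 533, 432, 58.00, 42.58, 1.36),
    (32, 868, 710, 119.30, 88.30, 1.35),
    (90, 1719, 1263, 400.80, 297.83, 1.34),
]

def speedup(time_orig, time_heur):
    """Execution-time ratio: original arrangement over heuristic arrangement."""
    return time_orig / time_heur

def cost_reduction(cost_orig, cost_heur):
    """Fraction by which the heuristic lowers the cost function value."""
    return (cost_orig - cost_heur) / cost_orig

for nodes, c0, c1, t0, t1, reported in rows:
    print(f"{nodes:>3} nodes: speedup {speedup(t0, t1):.3f} "
          f"(reported {reported}), cost reduced by {cost_reduction(c0, c1):.1%}")
```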

Homogeneous Inter-Node Experiment

Figure: Partitioning for 4 homogeneous multi-core nodes

Table: Homogeneous inter-node experimental results

Nodes   Cost               Exec time (sec)     Ratio
        Orig   Heuristic   Orig     Heuristic
4       336    199         3.85     3.17       1.21

Conclusion

Heuristic approach for a combinatorial problem

Prediction is based on topology and communication flow

Minimize inter-cluster communication cost
