Lecture 20 - Stanford University (web.stanford.edu/class/cme213/files/lectures/lecture_20.pdf)


CME 213

Eric Darve

SPRING 2017

LINEAR ALGEBRA: MATRIX-VECTOR PRODUCTS

Application example: matrix-vector product

● We are going to use that example to illustrate additional MPI functionalities.

● This will lead us to process groups and topologies.
● First, we go over two implementations that use the functionalities we have already covered.
● Two simple approaches:
  • Row partitioning of the matrix, or
  • Column partitioning

Row partitioning

This is the most natural approach.

Matrix A, vector b

Step 1: replicate b on each process: MPI_Allgather()
Step 2: perform the product locally
See MPI code: matvecrow/
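A minimal sketch of the two steps, assuming a square n-by-n matrix with n divisible by the number of processes (function and variable names are illustrative; the course code in matvecrow/ is the reference):

#include <mpi.h>
#include <stdlib.h>

/* Row partitioning: each process owns n_local = n/p consecutive rows of A
   and the matching n_local entries of b. */
void matvec_row(const double *A_local, /* n_local x n, row-major */
                const double *b_local, /* n_local entries of b */
                double *y_local,       /* n_local entries of A*b */
                int n, MPI_Comm comm)
{
    int size;
    MPI_Comm_size(comm, &size);
    int n_local = n / size;

    /* Step 1: replicate b on every process. */
    double *b = malloc(n * sizeof(double));
    MPI_Allgather(b_local, n_local, MPI_DOUBLE,
                  b, n_local, MPI_DOUBLE, comm);

    /* Step 2: product of the local rows with the full vector b. */
    for (int i = 0; i < n_local; ++i) {
        double s = 0.0;
        for (int j = 0; j < n; ++j)
            s += A_local[i * n + j] * b[j];
        y_local[i] = s;
    }

    free(b);
}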

Column partitioning

Step 1: calculate partial products with each process

Partial products

Matrix A, vector b

Column partitioning (cont'd)

● Step 2: reduce all partial results: MPI_Reduce()
● Step 3: send sub-blocks to all processes: MPI_Scatter()

● Steps are very similar to row partitioning.

Vector Ab
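Similarly, a minimal sketch of the three steps (illustrative names and data layout, not the course code), assuming each process owns n_local = n/p columns of A and the matching entries of b, and that the full result is accumulated on rank 0 before being scattered back:

#include <mpi.h>
#include <stdlib.h>

void matvec_col(const double *A_local, /* n x n_local block, row-major */
                const double *b_local, /* n_local entries of b */
                double *y_local,       /* n_local entries of A*b */
                int n, MPI_Comm comm)
{
    int size, rank;
    MPI_Comm_size(comm, &size);
    MPI_Comm_rank(comm, &rank);
    int n_local = n / size;

    /* Step 1: partial product using only the local columns. */
    double *partial = calloc(n, sizeof(double));
    for (int j = 0; j < n_local; ++j)
        for (int i = 0; i < n; ++i)
            partial[i] += A_local[i * n_local + j] * b_local[j];

    /* Step 2: sum the partial products on rank 0. */
    double *y_full = (rank == 0) ? malloc(n * sizeof(double)) : NULL;
    MPI_Reduce(partial, y_full, n, MPI_DOUBLE, MPI_SUM, 0, comm);

    /* Step 3: scatter sub-blocks of the result back to all processes. */
    MPI_Scatter(y_full, n_local, MPI_DOUBLE,
                y_local, n_local, MPI_DOUBLE, 0, comm);

    free(partial);
    if (rank == 0) free(y_full);
}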

A better partitioning

● If the number of processes becomes large compared to the matrix size, we need a 2D partitioning:

● Each colored square can be assigned to a process.
● This allows using more processes.
● In addition, a theoretical analysis (more on this later) shows that this scheme runs faster.

Outline of algorithm: step 1

First column contains b.

Send b to the diagonal processes.

Send b down each column.

This is a broadcast operation.

Steps 2 and 3

● Step 2: perform the matrix-vector product locally.
● Step 3: reduce across columns and store the result in column 0.

Reduction across columns

[Figure: comparing reduction sizes, 2n vs. n/2]

Communication cost (in a nutshell): why is 2D partitioning better?

[Figure: larger blocks vs. narrow columns]

Difficulties with 2D partitioning

● This type of decomposition brings some difficulties.
● We used two collective operations:
  • A broadcast inside a column.
  • A reduction inside a row.

● To do this in MPI, we need two concepts:
  • Communicators or process groups. This defines a subset of all the processes. For each subset, collective operations are allowed, e.g., broadcast for the group of processes inside a column.
  • Process topologies. For matrices, there is a natural 2D topology with (i,j) block indexing. MPI supports such grids (any dimension). Using MPI grids (called "Cartesian topologies") simplifies many MPI commands.

PROCESS GROUPS AND COMMUNICATORS

Process groups

● Groups are needed for many reasons.
● Enable collective communication operations across a subset of processes.
● Allow to easily assign independent tasks to different groups of processes.
● Provide a good mechanism to integrate a parallel library into an MPI code.

Groups and communicators

● A group is an ordered set of processes.
● Each process in a group is associated with a unique integer rank. Rank values start at zero and go to N-1, where N is the number of processes in the group.

● A group is always associated with a communicator object.
● A communicator encompasses a group of processes that may communicate with each other. All MPI messages must specify a communicator.

● For example, the handle for the communicator that comprises all tasks is MPI_COMM_WORLD.

● From the programmer's perspective, a group and a communicator are almost the same. The group routines are primarily used to specify which processes should be used to construct a communicator.

● Processes may be in more than one group/communicator. They have a unique rank within each group/communicator.

Main functions

MPI provides over 40 routines related to groups, communicators, and virtual topologies!

int MPI_Comm_group(MPI_Comm comm, MPI_Group *group)

Returns the group associated with a communicator, e.g., MPI_COMM_WORLD.

int MPI_Group_incl(MPI_Group group, int p, int *ranks,
                   MPI_Group *new_group)

ranks: integer array with p entries.

Creates a new group new_group with p processes, which have ranks from 0 to p-1. Process i is the process that has rank ranks[i] in group.

int MPI_Comm_create(MPI_Comm comm, MPI_Group group, MPI_Comm *new_comm)

Creates a new communicator based on group.
See MPI code: groups/
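A minimal sketch of how these three calls fit together (the groups/ code referenced above is the reference); here we assume, purely for illustration, that we want a communicator holding the first half of the ranks of MPI_COMM_WORLD:

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int size, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Group associated with MPI_COMM_WORLD. */
    MPI_Group world_group;
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);

    /* Select the first size/2 ranks. */
    int p = size / 2;
    int *ranks = malloc(p * sizeof(int));
    for (int i = 0; i < p; ++i) ranks[i] = i;

    MPI_Group half_group;
    MPI_Group_incl(world_group, p, ranks, &half_group);

    /* Communicator built from the group; processes outside the group
       receive MPI_COMM_NULL. */
    MPI_Comm half_comm;
    MPI_Comm_create(MPI_COMM_WORLD, half_group, &half_comm);

    if (half_comm != MPI_COMM_NULL) {
        int half_rank;
        MPI_Comm_rank(half_comm, &half_rank);  /* rank within the new group */
        MPI_Comm_free(&half_comm);
    }

    MPI_Group_free(&half_group);
    MPI_Group_free(&world_group);
    free(ranks);
    MPI_Finalize();
    return 0;
}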

PROCESS TOPOLOGIES

Process topologies

● Many problems are naturally mapped to certain topologies such as grids.

● This is the case, for example, for matrices, or for 2D and 3D structured grids.

● The two main types of topologies supported by MPI are Cartesian grids and graphs.

● MPI topologies allow simplifying many common MPI tasks.

● MPI topologies are virtual: there may be no relation between the physical structure of the network and the process topology.

Advantages of using topologies

● Convenience: virtual topologies may be useful for applications with specific communication patterns.

● Communication efficiency: a particular implementation may optimize the process mapping based upon the physical characteristics of a given parallel machine.
  • For example, nodes that are nearby on the grid (East/West/North/South neighbors) may be close in the network (lowest communication time).

● The mapping of processes onto an MPI virtual topology is dependent upon the MPI implementation.

MPI functions for topologies

Many functions are available. We only cover the basic ones.

int MPI_Cart_create(MPI_Comm comm_old, int ndims,
                    int *dims, int *periods, int reorder,
                    MPI_Comm *comm_cart)

ndims: number of dimensions.
dims[i]: size of the grid along dimension i. The total grid size (product of the dims) should not exceed the number of processes in comm_old.
periods: used to specify whether or not the topology has wraparound connections. If periods[i] is non-zero, then the topology has wraparound connections along dimension i.
reorder: used to determine if the processes in the new group are to be reordered or not. If reorder is false, then the rank of each process in the new group is identical to its rank in the old group.

Example

The processes are ordered according to their rank, row-wise in increasing order:

0 (0,0)   1 (0,1)
2 (1,0)   3 (1,1)
4 (2,0)   5 (2,1)

Periodic Cartesian grids

● We chose periodicity along the first dimension (periods[0]=1), which means that any reference beyond the first or last entry of any row will be wrapped around cyclically.

● For example, row index i=-1 is mapped into i=2.

● There is no periodicity imposed on the second dimension. Any reference to a column index outside of its defined range results in an error. Try it!
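A minimal sketch creating the 3x2 grid of the example above, periodic along the first dimension only (it requires at least 6 processes in comm_old; reorder is 0 so ranks keep their values from the old communicator):

#include <mpi.h>

MPI_Comm make_3x2_grid(MPI_Comm comm_old)
{
    int dims[2]    = {3, 2};   /* 3 rows, 2 columns */
    int periods[2] = {1, 0};   /* wraparound along dimension 0 only */
    MPI_Comm comm_cart;
    MPI_Cart_create(comm_old, 2, dims, periods, /* reorder = */ 0, &comm_cart);
    return comm_cart;
}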

Obtaining your rank and coordinates

int MPI_Cart_rank(MPI_Comm comm_cart,
                  int *coords, int *rank)

int MPI_Cart_coords(MPI_Comm comm_cart, int rank,
                    int maxdims, int *coords)

● This allows retrieving a rank or the coordinates in the grid. This may be useful to get information about other processes.

● coords are the Cartesian coordinates of a process.

● Its size is the number of dimensions.
● Remember that the function MPI_Comm_rank is still available to query your own rank.

● See MPI code: mpi_cart/
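For illustration, a small sketch on a 2D grid such as the one above (function and variable names are ours, not from mpi_cart/): it retrieves the calling process's coordinates and the rank of the process sitting at grid position (0,0).

#include <mpi.h>

void where_am_i(MPI_Comm comm_cart)
{
    int rank;
    MPI_Comm_rank(comm_cart, &rank);             /* my own rank, as usual */

    int coords[2];
    MPI_Cart_coords(comm_cart, rank, 2, coords); /* my (row, column) position */

    /* Rank of the process at grid position (0, 0). */
    int corner[2] = {0, 0};
    int corner_rank;
    MPI_Cart_rank(comm_cart, corner, &corner_rank);
}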

Getting the rank of your neighbors

int MPI_Cart_shift(MPI_Comm comm_cart, int dir,
                   int s_step, int *rank_source, int *rank_dest)

● dir: direction of the shift.

● s_step: length of the shift.

● rank_dest contains the group rank of the neighboring process in the specified dimension and distance.

● rank_source is the rank of the process for which the calling process is the neighboring process in the specified dimension and distance.

● Thus, the group ranks returned in rank_dest and rank_source can be used as parameters for MPI_Sendrecv().

[Figure: rank_source and rank_dest for a shift with s_step = 4]
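A sketch of a typical use on the grid above (illustrative names): shift by one step along dimension 0 and exchange a value with the two resulting neighbors via MPI_Sendrecv(). At a non-periodic edge the returned rank is MPI_PROC_NULL, which MPI_Sendrecv() accepts.

#include <mpi.h>

void shift_exchange(MPI_Comm comm_cart)
{
    int rank_source, rank_dest;
    /* Shift by 1 along dimension 0: rank_dest is my neighbor one step away;
       rank_source is the process for which I am that neighbor. */
    MPI_Cart_shift(comm_cart, /* dir = */ 0, /* s_step = */ 1,
                   &rank_source, &rank_dest);

    double send_val = 1.0, recv_val;
    MPI_Sendrecv(&send_val, 1, MPI_DOUBLE, rank_dest,   0,
                 &recv_val, 1, MPI_DOUBLE, rank_source, 0,
                 comm_cart, MPI_STATUS_IGNORE);
}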

Splitting a Cartesian topology

● It is very common that one wants to split a Cartesian topology along certain dimensions.

● For example, we may want to create a group for the columns or rows of a matrix.

int MPI_Cart_sub(MPI_Comm comm_cart,
                 int *keep_dims, MPI_Comm *comm_subcart)

● keep_dims: boolean flag that determines whether that dimension is retained in the new communicators or split, e.g., if false then a split occurs.

Example

Consider a 3D grid with dimensions x, y, and z.

keep_dims[] = {true, false, true}
  Each sub-communicator is a 2D grid in the x-z plane (one per y coordinate).

keep_dims[] = {false, false, true}
  Each sub-communicator is a 1D grid along z (one per (x,y) pair).
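The matrix-vector product below needs exactly this kind of split: one communicator per column and one per row of the 2D grid. A minimal sketch (illustrative names, not the course's matvec2D/ code):

#include <mpi.h>

void make_row_col_comms(MPI_Comm comm_2d,
                        MPI_Comm *comm_col, MPI_Comm *comm_row)
{
    /* Keep dimension 0: each sub-communicator spans all rows for one fixed
       column, i.e., one communicator per column. */
    int keep_col[2] = {1, 0};
    MPI_Cart_sub(comm_2d, keep_col, comm_col);

    /* Keep dimension 1: one communicator per row. */
    int keep_row[2] = {0, 1};
    MPI_Cart_sub(comm_2d, keep_row, comm_row);
}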

Application example: 2D partitioning

First column contains b.

Send b to the diagonal processes.

Send b down each column. Broadcast!

Start with the 2D communicator. Use the column group.

2D topology for the matrix

Send to the diagonal block

Column-wise broadcast

matvec2D

Reduction! Use the row group.

See MPI code: matvec2D/

Code for row reduction
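The code on this slide did not survive the transcript. As a rough sketch only (not the course's matvec2D/ implementation), assuming comm_row was obtained with MPI_Cart_sub as above and the result should end up on the process in column 0 of each row:

#include <mpi.h>
#include <stddef.h>

void reduce_along_row(const double *y_partial, double *y_block,
                      int n_local, MPI_Comm comm_row, int my_col)
{
    /* Sum the local partial products across the row; rank 0 in comm_row
       (column index 0) receives this block of A*b. */
    MPI_Reduce(y_partial, (my_col == 0) ? y_block : NULL,
               n_local, MPI_DOUBLE, MPI_SUM, /* root = */ 0, comm_row);
}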

Topologies for finite-element calculations

● A typical situation is that processes need to communicate with their neighbors.

● This becomes complicated to organize for unstructured grids.
● In that case, graph topologies are very convenient. They allow defining a neighbor relationship in a general way, using a graph. Example: MPI_Graph_create

● Examples of collective communications (see the sketch below):
  • MPI_Neighbor_allgather(): gather data, and all processes get the result
  • MPI_Neighbor_alltoall(): processes send to and receive from all neighbor processes
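As an illustration only (not from the course code), a sketch that builds a 4-process ring with MPI_Graph_create and then uses MPI_Neighbor_allgather() so each process receives one value from each of its two neighbors; it assumes the communicator has exactly 4 processes.

#include <mpi.h>

void ring_neighbor_gather(MPI_Comm comm, double my_val, double recv[2])
{
    /* Adjacency of the ring 0-1-2-3-0: index[] holds cumulative degrees,
       edges[] lists the neighbors of each node in order. */
    int index[4] = {2, 4, 6, 8};
    int edges[8] = {1, 3,  2, 0,  3, 1,  0, 2};

    MPI_Comm comm_graph;
    MPI_Graph_create(comm, 4, index, edges, /* reorder = */ 0, &comm_graph);

    /* Send my_val to both neighbors and gather one value from each. */
    MPI_Neighbor_allgather(&my_val, 1, MPI_DOUBLE,
                           recv, 1, MPI_DOUBLE, comm_graph);

    MPI_Comm_free(&comm_graph);
}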
