Laplace's equation – MPI + CUDA
• Let's first assume we don't use datatypes (instead we will manually pack/unpack)
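A minimal sketch of what manual packing could look like, assuming the local block is stored row-major as doubles. Rows are contiguous in memory and can be sent as they are, but the east and west columns are strided, so they are copied through a contiguous buffer first. The names `pack_column` and `unpack_column` and the layout are assumptions for illustration, not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: row-major block with `ld` doubles per row
 * (owned cells plus any ghost cells). */

/* Copy `nrows` elements of column `col`, starting at row `first_row`,
 * into the contiguous buffer `buf` (e.g. before sending it). */
void pack_column(const double *grid, size_t ld, size_t first_row,
                 size_t nrows, size_t col, double *buf)
{
    assert(col < ld);
    for (size_t i = 0; i < nrows; ++i)
        buf[i] = grid[(first_row + i) * ld + col];
}

/* Inverse operation: scatter a received contiguous buffer into column `col`. */
void unpack_column(double *grid, size_t ld, size_t first_row,
                   size_t nrows, size_t col, const double *buf)
{
    assert(col < ld);
    for (size_t i = 0; i < nrows; ++i)
        grid[(first_row + i) * ld + col] = buf[i];
}
```

The packed buffer is what would be handed to the send call toward the east or west neighbor; the matching receive buffer is unpacked into the ghost column on the other side.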
Data distribution
Create datatypes
Exchange data with neighbors (north, south, east, west)
Do local computation
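The local computation step can be sketched as one Jacobi sweep of the 5-point stencil, a common scheme for Laplace's equation. The slides do not name the scheme, so the choice of Jacobi, the function name `jacobi_sweep`, and the one-cell ghost layer are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* One Jacobi sweep over the owned cells of an (ny+2) x (nx+2) row-major block.
 * Row 0, row ny+1, column 0 and column nx+1 hold ghost or boundary values.
 * Reads `in`, writes the owned cells of `out`, returns the largest change. */
double jacobi_sweep(const double *in, double *out, size_t nx, size_t ny)
{
    assert(nx > 0 && ny > 0);
    const size_t ld = nx + 2;
    double max_diff = 0.0;
    for (size_t i = 1; i <= ny; ++i) {
        for (size_t j = 1; j <= nx; ++j) {
            const double old = in[i * ld + j];
            const double v = 0.25 * (in[(i - 1) * ld + j] + in[(i + 1) * ld + j] +
                                     in[i * ld + j - 1] + in[i * ld + j + 1]);
            const double d = v > old ? v - old : old - v;
            if (d > max_diff)
                max_diff = d;
            out[i * ld + j] = v;
        }
    }
    return max_diff;
}
```

The returned maximum change is one possible convergence measure; in a distributed run it would still have to be reduced across all processes.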
[Figure: local block with its north, south, east, and west neighbors]
Programming CUDA
• Each little box is a light computational thread
• Going back to the data distribution methods (k-cyclic), the CUDA work distribution can be likened to an <X-cyclic, Y-cyclic> distribution
  – The goal is to evenly distribute the computational work over all threads available on the GPU
  – Warp: a group of 32 parallel threads that execute exactly the same thing (but each thread has its own distinct ID)
  – A warp executes one common instruction at a time (efficiency requires that all threads in the warp do the same thing: take the same branches, do the same operation).
  – If multiple execution paths are possible (due to branches) and the threads of a warp do not all agree on which branch to take, all of the possible execution paths are executed!
    • This also hints that atomic operations issued by threads in a warp to the same memory location would be executed sequentially
  – Occupancy
• More info @ https://docs.nvidia.com/cuda/cuda-c-programming-guide/
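To make the thread-to-cell mapping concrete, here is a plain C emulation of the index arithmetic a CUDA kernel typically performs; in a real kernel these values come from `blockIdx`, `blockDim`, and `threadIdx`. The names THREADS_PER_BLOCK_X and THREADS_PER_BLOCK_Y appear in the slides, but the 16 x 16 values and the helper function names are assumptions.

```c
#include <assert.h>

#define THREADS_PER_BLOCK_X 16 /* assumed value */
#define THREADS_PER_BLOCK_Y 16 /* assumed value */

/* Number of blocks needed to cover n cells with `threads` threads per block
 * (ceiling division), as used when choosing the grid size. */
unsigned blocks_for(unsigned n, unsigned threads)
{
    assert(threads > 0);
    return (n + threads - 1) / threads;
}

/* Global cell index of a thread along one dimension: the equivalent of
 * blockIdx.x * blockDim.x + threadIdx.x inside a kernel. */
unsigned global_index(unsigned block_idx, unsigned block_dim, unsigned thread_idx)
{
    assert(thread_idx < block_dim);
    return block_idx * block_dim + thread_idx;
}

/* Threads of the last block may fall outside the domain and must do nothing. */
int thread_is_active(unsigned gx, unsigned gy, unsigned nx, unsigned ny)
{
    return gx < nx && gy < ny;
}
```

With these assumed values a block holds 256 threads, that is 8 warps of 32. The bounds check is a branch, but only warps that straddle the edge of the domain see both outcomes, so the cost of divergence stays confined to them.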
[Figure: local domain with north, south, east, and west borders; thread-block dimensions labeled THREADS_PER_BLOCK_X and THREADS_PER_BLOCK_Y]
Debugging
• Commercial tools (DDT, TotalView, …)
• If it is possible to export an xterm: mpirun -np 2 xterm -e gdb --args <my app args>
• If not, add a sleep (or a loop around a sleep) in your application and use "gdb -p <pid>" to attach to your process (once connected to the same node where the application is running)
• gdb can execute GDB commands from a FILE (with --command=FILE, -x)
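The "loop around a sleep" approach can be sketched as follows. The function name `wait_for_debugger`, the flag `debugger_attached`, and the timeout parameter are hypothetical; the timeout is only there so that a forgotten call cannot hang a run forever.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Set to 1 from gdb ("set var debugger_attached = 1") to leave the loop. */
volatile int debugger_attached = 0;

/* Print host and pid, then sleep in 1-second steps until a debugger flips
 * the flag or `max_seconds` have elapsed. Returns the seconds waited. */
unsigned wait_for_debugger(unsigned max_seconds)
{
    char host[256] = "unknown";
    unsigned waited = 0;

    /* Leave the last byte untouched so the name stays NUL-terminated. */
    gethostname(host, sizeof(host) - 1);
    fprintf(stderr, "pid %ld on %s waiting for gdb -p\n", (long)getpid(), host);
    while (!debugger_attached && waited < max_seconds) {
        sleep(1);
        ++waited;
    }
    return waited;
}
```

Each process prints its host name and pid, so one can log in to that node, run "gdb -p <pid>", then type "set var debugger_attached = 1" followed by "continue". The same two commands could be placed in a file and passed to gdb with -x.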
Profiling
• Non-CUDA application: valgrind (free), or vtune (Intel), Score-P, Tau, Vampir
• CUDA application: nvprof from CUDA
Possible code optimizations
• CUDA:
  – As the computation is symmetrical and highly balanced, one can have a different work distribution and do more computations per thread
  – Use shared memory
  – Divide the computations into 2 parts: what needs external data and what doesn't.
• MPI:
  – Use datatypes
  – Use RMA
• Overlap communication and computations
  – Create a specialized kernel to pack and unpack all the borders in one operation
  – As starting a kernel has a high latency, merge this pack/unpack kernel with the updates based on the ghost regions
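Splitting the update into a part that needs external data and a part that does not can be sketched on the CPU as below. It assumes a 5-point stencil on a row-major block with a one-cell ghost layer; the function names are hypothetical, and on the GPU each function would correspond to a separate kernel.

```c
#include <assert.h>
#include <stddef.h>

/* 5-point average at cell (i, j) of a row-major block with leading dimension ld. */
static double stencil(const double *in, size_t ld, size_t i, size_t j)
{
    return 0.25 * (in[(i - 1) * ld + j] + in[(i + 1) * ld + j] +
                   in[i * ld + j - 1] + in[i * ld + j + 1]);
}

/* Part 1: owned cells whose four neighbors are all owned (rows 2..ny-1,
 * columns 2..nx-1). Reads no ghost data, so it can run during the exchange. */
void update_interior(const double *in, double *out, size_t nx, size_t ny)
{
    const size_t ld = nx + 2;
    for (size_t i = 2; i + 1 <= ny; ++i)
        for (size_t j = 2; j + 1 <= nx; ++j)
            out[i * ld + j] = stencil(in, ld, i, j);
}

/* Part 2: the outermost ring of owned cells, which reads the ghost layer.
 * Must run only after the neighbor exchange has completed. */
void update_border(const double *in, double *out, size_t nx, size_t ny)
{
    const size_t ld = nx + 2;
    assert(nx > 0 && ny > 0);
    for (size_t j = 1; j <= nx; ++j) {     /* north and south rows */
        out[1 * ld + j] = stencil(in, ld, 1, j);
        out[ny * ld + j] = stencil(in, ld, ny, j);
    }
    for (size_t i = 2; i + 1 <= ny; ++i) { /* west and east columns */
        out[i * ld + 1] = stencil(in, ld, i, 1);
        out[i * ld + nx] = stencil(in, ld, i, nx);
    }
}
```

One possible ordering for the overlap: start the nonblocking exchange (for example with MPI_Irecv and MPI_Isend), call `update_interior`, wait for the exchange to finish (MPI_Waitall), then call `update_border`. Together the two parts write every owned cell exactly once.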