
Lecture 9: Dense Matrices and Decomposition
CSCE 569 Parallel Computing
Department of Computer Science and Engineering
Yonghong Yan
[email protected]
http://cse.sc.edu/~yanyh


Page 1:

Lecture 9: Dense Matrices and Decomposition

CSCE 569 Parallel Computing
Department of Computer Science and Engineering
Yonghong Yan
[email protected]
http://cse.sc.edu/~yanyh

Page 2:

Review: Parallel Algorithm Design and Decomposition

• Introduction to Parallel Algorithms
  – Tasks and Decomposition
  – Processes and Mapping
• Decomposition Techniques
  – Recursive Decomposition
  – Data Decomposition
  – Exploratory Decomposition
  – Hybrid Decomposition
• Characteristics of Tasks and Interactions
  – Task Generation, Granularity, and Context
  – Characteristics of Task Interactions

Page 3:

Decomposition, Tasks, and Dependency Graphs

• Decompose work into tasks that can be executed concurrently
• Decomposition can be done in many different ways
• Tasks may be of the same, different, or even indeterminate sizes
• Task dependency graph:
  – node = task
  – edge = control dependence, output-input dependency
  – No dependency == parallelism

Page 4:

Degree of Concurrency

• Definition: the number of tasks that can be executed in parallel
• May change over program execution
• Metrics
  – Maximum degree of concurrency
    • The maximum number of concurrent tasks at any point during execution
  – Average degree of concurrency
    • The average number of tasks that can be processed in parallel over the execution of the program
• Speedup: serial_execution_time / parallel_execution_time
• Inverse relationship between degree of concurrency and task granularity
  – Coarser task granularity (fewer tasks) → lower degree of concurrency
  – Finer task granularity (more tasks) → higher degree of concurrency

Page 5:

Critical Path Length

• A directed path: a sequence of tasks that must be serialized
  – Executed one after another
• Critical path:
  – The longest weighted path through the graph
• Critical path length: the shortest time in which the program can be finished
  – A lower bound on parallel execution time

(Figure: a building project as an example task dependency graph)

Page 6:

Critical Path Length and Degree of Concurrency

(Figure: database query task dependency graphs)

Questions:
• What are the tasks on the critical path for each dependency graph?
• What is the shortest parallel execution time?
• How many processors are needed to achieve the minimum time?
• What is the maximum degree of concurrency?
• What is the average parallelism (average degree of concurrency)?
  – Total amount of work / critical path length: 2.33 (63/27) and 1.88 (64/34)

Page 7:

Task Interaction Graphs, Granularity, and Communication

• Finer task granularity → more overhead from task interactions
  – Overhead as a ratio of the useful work of a task
• Example: sparse matrix-vector product interaction graph
• Assumptions:
  – each dot (A[i][j]*b[j]) takes unit time to process
  – each communication (edge) causes an overhead of one unit of time
• If node 0 is a task: communication = 3; computation = 4
• If nodes 0, 4, and 5 are a task: communication = 5; computation = 15
  – coarser-grain decomposition → smaller communication/computation ratio (3/4 vs. 5/15)

Page 8:

Processes and Mapping

A good mapping must minimize parallel execution time by:

• Mapping independent tasks to different processes
  – Maximize concurrency
• Giving tasks on the critical path high priority for being assigned to processes
• Minimizing interaction between processes
  – mapping tasks with dense interactions to the same process
• Difficulty: these criteria often conflict with each other
  – E.g., no decomposition (i.e., one task) minimizes interaction but gives no speedup at all!

Page 9:

Recursive Decomposition: Min

procedure SERIAL_MIN(A, n)
    min := A[0];
    for i := 1 to n − 1 do
        if (A[i] < min) min := A[i];
    return min;

procedure RECURSIVE_MIN(A, n)
    if (n = 1) then
        min := A[0];
    else
        lmin := RECURSIVE_MIN(A, n/2);
        rmin := RECURSIVE_MIN(&(A[n/2]), n − n/2);
        if (lmin < rmin) then min := lmin;
        else min := rmin;
    return min;

Finding the minimum in a vector using divide-and-conquer.

Applicable to other associative operations, e.g. sum, AND, ...
Known as a reduction operation.
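The same reduction idea can be written compactly with OpenMP. Below is a minimal C sketch (not the course's reference code) that assumes an int array and uses the min reduction clause available since OpenMP 3.1:

    #include <limits.h>

    /* Parallel minimum of A[0..n-1]: each thread keeps a private minimum
       and OpenMP combines them, mirroring the divide-and-conquer RECURSIVE_MIN. */
    int parallel_min(const int *A, int n) {
        int min_val = INT_MAX;
        #pragma omp parallel for reduction(min:min_val)
        for (int i = 0; i < n; i++) {
            if (A[i] < min_val) min_val = A[i];
        }
        return min_val;
    }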

Page 10:

Data Decomposition -- The Most Commonly Used Approach

• Steps:
  1. Identify the data on which computations are performed.
  2. Partition this data across various tasks.
• Partitioning induces a decomposition of the problem, i.e. the computation is partitioned
• Data can be partitioned in various ways
  – Critical for parallel performance
• Decomposition based on
  – output data
  – input data
  – input + output data
  – intermediate data

Page 11:

Output Data Decomposition: Example

Count the frequency of itemsets in database transactions.

• Decompose the itemsets to count
  – each task computes the total count for each of its itemsets
  – append the per-task counts to produce the total count result

Page 12:

Input Data Partitioning: Example

Page 13:

Dense Matrix Algorithms

• Dense linear algebra and BLAS
• Image processing / stencil
• Iterative methods

Page 14:

Motifs

The Motifs (formerly "Dwarfs") from "The Berkeley View" (Asanovic et al.) form key computational patterns.

The Landscape of Parallel Computing Research: A View from Berkeley
http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.pdf

Page 15:

Dense Linear Algebra

• Software libraries for solving linear systems
• BLAS (Basic Linear Algebra Subprograms)
  – Vector, matrix-vector, matrix-matrix
• Linear systems: Ax = b
• Least squares: choose x to minimize ||Ax − b||₂
  – Overdetermined or underdetermined
  – Unconstrained, constrained, weighted
• Eigenvalues and eigenvectors of symmetric matrices
  – Standard (Ax = λx), generalized (Ax = λBx)
• Eigenvalues and eigenvectors of unsymmetric matrices
  – Eigenvalues, Schur form, eigenvectors, invariant subspaces
  – Standard, generalized
• Singular values and vectors (SVD)
  – Standard, generalized
• Different matrix structures
  – Real, complex; symmetric, Hermitian, positive definite; dense, triangular, banded, ...
• Level of detail
  – Simple drivers
  – Expert drivers with error bounds, extra precision, other options
  – Lower-level routines ("apply a certain kind of orthogonal transformation", matmul, ...)

Page 16:

BLAS (Basic Linear Algebra Subprograms)

• BLAS1, 1973-1977
  – 15 operations (mostly) on vectors (1-d arrays)
    • "AXPY" (y = α·x + y), dot product, scale (x = α·x)
  – Up to 4 versions of each (S/D/C/Z), 46 routines, 3300 LOC
  – Why BLAS1? They do O(n) ops on O(n) data: AXPY
    • 2n flops on 3n reads/writes
    • Computational intensity = (2n)/(3n) = 2/3

Page 17:

BLAS2

• BLAS2, 1984-1986
  – 25 operations (mostly) on matrix/vector pairs
  – "GEMV": y = α·A·x + β·y, "GER": A = A + α·x·yᵀ, triangular solve: x = T⁻¹·x
  – Up to 4 versions of each (S/D/C/Z), 66 routines, 18K LOC
• Why BLAS2? They do O(n²) ops on O(n²) data
  – Computational intensity still just ~(2n²)/(n²) = 2

Page 18:

BLAS3

• BLAS3, 1987-1988
  – 9 operations (mostly) on matrix/matrix pairs
    • "GEMM": C = α·A·B + β·C, C = α·A·Aᵀ + β·C, B = T⁻¹·B
  – Up to 4 versions of each (S/D/C/Z), 30 routines, 10K LOC
  – Why BLAS3? They do O(n³) ops on O(n²) data
    • Computational intensity = (2n³)/(4n²) = n/2 -- big at last!
    • Good for machines with caches and deep memory hierarchies

A[M][K] * B[K][N] = C[M][N]

Page 19:

Decomposition for AXPY, Matrix-Vector, and Matrix Multiplication

Page 20:

BLAS1: AXPY

• y = α·x + y
  – x and y are vectors of size N
    • In C: x[N], y[N]
  – α is a scalar
• Decomposition is simple (see the sketch below)
  – N iterations (the N elements of x and y) are distributed among threads
  – 1:1 mapping between iterations and elements of x and y
  – x and y are shared

(Figure: iterations distributed to threads T0 and T1 in chunks of 3)
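A minimal OpenMP sketch of this decomposition (assuming float vectors; schedule(static, 3) reproduces the chunk = 3 distribution shown in the figure):

    /* y = alpha*x + y: iterations are block-distributed among threads;
       x and y are shared, the loop index is private. */
    void axpy(int n, float alpha, const float *x, float *y) {
        #pragma omp parallel for schedule(static, 3)
        for (int i = 0; i < n; i++)
            y[i] = alpha * x[i] + y[i];
    }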

Page 21:

BLAS2: Matrix-Vector Multiplication

• y = A·x
  – A[M][N], x[N], y[M]
• Row-wise decomposition
  – each thread computes Mt consecutive rows of y, starting at its own i_start (see the sketch below)
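A sketch of the row-wise decomposition with Mt and i_start computed explicitly, assuming A is stored row-major as a 1-D array (names are illustrative, not the course's code):

    #include <omp.h>

    /* y = A*x: each thread computes Mt consecutive rows of y,
       starting at its own i_start. */
    void matvec(int M, int N, const float *A, const float *x, float *y) {
        #pragma omp parallel
        {
            int nthreads = omp_get_num_threads();
            int tid = omp_get_thread_num();
            int Mt = (M + nthreads - 1) / nthreads;   /* rows per thread */
            int i_start = tid * Mt;
            int i_end = (i_start + Mt < M) ? i_start + Mt : M;
            for (int i = i_start; i < i_end; i++) {
                float sum = 0.0f;
                for (int j = 0; j < N; j++)
                    sum += A[i * N + j] * x[j];
                y[i] = sum;
            }
        }
    }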

Page 22:

BLAS3: Dense Matrix Multiplication

A[M][K] * B[K][N] = C[M][N]

• Versions:
  – Base
  – Base_1: column-major order of access
  – row1D_dist
  – column1D_dist
  – rowcol2D_dist
• The decomposition amounts to calculating Mt and Nt, the block of rows/columns of C assigned to each thread

Page 23:

BLAS3: Dense Matrix Multiplication

• Row-based 1-D distribution: each of the threads T0-T3 computes a block of rows of C from the corresponding rows of A and all of B (see the sketch below)

(Figure: A × B = C with A and C partitioned row-wise among T0-T3)
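A sketch of the row-based 1-D version, assuming row-major 1-D storage (not necessarily the course's row1D_dist code):

    /* C = A*B, A[M][K] * B[K][N] = C[M][N]: the i loop (rows of C) is split
       among threads, so each thread reads its own rows of A and all of B. */
    void matmul_row1d(int M, int N, int K,
                      const float *A, const float *B, float *C) {
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < M; i++)
            for (int j = 0; j < N; j++) {
                float sum = 0.0f;
                for (int k = 0; k < K; k++)
                    sum += A[i * K + k] * B[k * N + j];
                C[i * N + j] = sum;
            }
    }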

Page 24:

BLAS3: Dense Matrix Multiplication

• Column-based 1-D distribution: each of the threads T0-T3 computes a block of columns of C from all of A and the corresponding columns of B

(Figure: A × B = C with B and C partitioned column-wise among T0-T3)

Page 25:

BLAS3: Dense Matrix Multiplication

• Row/column-based 2-D distribution (see the sketch below)
  – Needs nested parallelism: export OMP_NESTED=true
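One way to express the 2-D distribution with nested parallel regions (a sketch; the team sizes are illustrative, and OMP_NESTED=true or omp_set_nested(1) must be enabled):

    /* Rows of C are split among an outer team, columns among inner teams. */
    void matmul_rowcol2d(int M, int N, int K,
                         const float *A, const float *B, float *C) {
        #pragma omp parallel for num_threads(2)        /* row blocks */
        for (int i = 0; i < M; i++) {
            #pragma omp parallel for num_threads(2)    /* column blocks */
            for (int j = 0; j < N; j++) {
                float sum = 0.0f;
                for (int k = 0; k < K; k++)
                    sum += A[i * K + k] * B[k * N + j];
                C[i * N + j] = sum;
            }
        }
    }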

Page 26:

Dense Matrix Algorithms

• Dense linear algebra and BLAS
• Image processing / stencil
• Iterative methods

Page 27:

What is Multimedia?

• Multimedia is a combination of text, graphics, sound, animation, and video that is delivered interactively to the user by electronic or digitally manipulated means.
https://en.wikipedia.org/wiki/Multimedia

• Videos contain frames (images).

Page 28:

Image Format and Processing

• Pixels
  – Images are matrices of pixels
• Binary images
  – Each pixel is either 0 or 1

Page 29:

Image Format and Processing

• Pixels
  – Images are matrices of pixels
• Grayscale images
  – Each pixel value normally ranges from 0 (black) to 255 (white)
  – 8 bits per pixel

Page 30:

Image Format and Processing

• Pixels
  – Images are matrices of pixels
• Color images
  – Each pixel has three/four values (4 bits or 8 bits each), each representing a color scale

Page 31:

Histogram

• An image histogram is a graph of pixel intensity (on the x-axis) versus number of pixels (on the y-axis). The x-axis has all available gray levels, and the y-axis indicates the number of pixels that have a particular gray-level value.

https://www.allaboutcircuits.com/technical-articles/image-histogram-characteristics-machine-learning-image-processing/

Page 32:

Histograms of a Monochrome Image

http://homepages.inf.ed.ac.uk/rbf/BOOKS/PHILLIPS/cips2edsrc/HIST.C

Page 33:

Histogram of Color Images

• Image density

https://docs.opencv.org/3.4.0/d3/dc1/tutorial_basic_linear_transform.html

Page 34:

OpenMP Parallelization of Histogram

• Decomposition based on the output (pixel values, 0-255)
  – Each thread scans the whole image but only counts the pixels whose values it is responsible for
    • E.g., with 4 threads: 0-63 for thread 0, 64-127 for thread 1, ...
• Decomposition based on the input (image)
  – Each thread scans part of the image, counts all the pixels it sees, and stores a partial histogram locally
  – Add up all the partial histograms (see the sketch below)
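A sketch of the input-based decomposition with per-thread partial histograms (assuming an 8-bit grayscale image; an OpenMP 4.5 array-section reduction could replace the critical section):

    #include <string.h>

    #define NUM_BINS 256

    /* Each thread scans its part of the image into a private partial
       histogram; the partials are then added into the shared result. */
    void histogram(const unsigned char *img, long npixels, long hist[NUM_BINS]) {
        memset(hist, 0, NUM_BINS * sizeof(long));
        #pragma omp parallel
        {
            long local[NUM_BINS] = {0};
            #pragma omp for schedule(static)
            for (long i = 0; i < npixels; i++)
                local[img[i]]++;
            #pragma omp critical
            for (int b = 0; b < NUM_BINS; b++)
                hist[b] += local[b];
        }
    }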

Page 35:

Image Filtering

• Changing pixel values by doing a convolution between a kernel (filter) and an image.

Page 36:

Image Filtering: The Magic of the Filter Matrix

• http://lodev.org/cgtutor/filtering.html
• https://en.wikipedia.org/wiki/Kernel_(image_processing)

• It is the basis of convolutional neural networks.

Page 37:

Convolutional Neural Networks for Object Detection

• Pooling: a sample-based discretization process

http://cs231n.github.io/convolutional-networks/

Page 38:

OpenMP Parallelization of Image Filtering

• Decomposition according to the input image
• Since the input and output images are separate buffers, parallelization is straightforward (see the sketch below)
  – Could be row 1D, column 1D, or row-column 2D
• Watch for false sharing when writing the boundaries of the output image
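A row-1D sketch of the filtering loop (assuming a grayscale image, a k × k kernel, and separate input and output buffers; boundary pixels are simply skipped here):

    /* The outer row loop is split among threads; since in and out are
       different buffers, the iterations are independent. */
    void filter(int H, int W, const float *in, float *out,
                int k, const float *kernel) {
        int r = k / 2;
        #pragma omp parallel for schedule(static)
        for (int i = r; i < H - r; i++)
            for (int j = r; j < W - r; j++) {
                float sum = 0.0f;
                for (int di = -r; di <= r; di++)
                    for (int dj = -r; dj <= r; dj++)
                        sum += kernel[(di + r) * k + (dj + r)]
                             * in[(i + di) * W + (j + dj)];
                out[i * W + j] = sum;
            }
    }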

Page 39:

Dense Matrix Algorithms

• Dense linear algebra and BLAS
• Image processing / stencil
• Iterative methods

Page 40:

Iterative Methods

• Iterative methods can be expressed in the general form: x(k) = F(x(k-1))
  – Hopefully x(k) → s (the solution of my problem)
• Wide variety of computational science problems
  – CFD, molecular dynamics, weather/climate forecasting, cosmology, ...
• Will it converge? How rapidly?

Page 41:

Iterative Stencil Applications

Loop until some condition is true:
  Perform computation that involves communicating with the N, E, W, S neighbors of a point (5-point stencil)
  [Convergence test?]

A stencil is similar to image filtering/convolution: x(k) = F(x(k-1))

Page 42:

Jacobi.c

• Assignments 2 and 3:
https://passlab.github.io/CSCE569/Assignment_2/jacobi.c

Page 43:

Jacobi

• An iterative method for approximating the solution to a system of linear equations.
• Ax = b, where the i-th equation is

    a_{i,1}·x_1 + a_{i,2}·x_2 + ... + a_{i,n}·x_n = b_i

• The a's and b's are known; we want to solve for the x's. Each Jacobi update uses

    x_i = (1 / a_{i,i}) · [ b_i − Σ_{j≠i} a_{i,j}·x_j ]

Page 44:

OpenMP Parallelization of Jacobi

• Similar to image filtering
  – But enclosed by the while loop, to make it iterative
• omp parallel encloses the outer while loop
• omp for for the inner for loops
• single and reduction are needed (see the sketch below)
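A minimal sketch of that structure (not the assignment's jacobi.c; the 5-point update and the array names are placeholders):

    #include <math.h>

    /* One parallel region encloses the while loop; omp for + reduction
       computes the sweep and the error; omp single resets the error,
       swaps the arrays, and updates the shared convergence flag. */
    void jacobi_sketch(int n, double *u, double *unew, double tol, int max_iter) {
        double error = 0.0;
        int iter = 0, converged = 0;
        #pragma omp parallel
        {
            while (!converged) {
                #pragma omp single
                error = 0.0;                        /* implicit barrier follows */

                #pragma omp for reduction(+:error)
                for (int i = 1; i < n - 1; i++)
                    for (int j = 1; j < n - 1; j++) {
                        unew[i*n + j] = 0.25 * (u[(i-1)*n + j] + u[(i+1)*n + j]
                                              + u[i*n + j - 1] + u[i*n + j + 1]);
                        error += fabs(unew[i*n + j] - u[i*n + j]);
                    }                               /* barrier: error is complete */

                #pragma omp single
                {   /* swap arrays and test convergence once, for all threads */
                    double *tmp = u; u = unew; unew = tmp;
                    iter++;
                    converged = (error <= tol) || (iter >= max_iter);
                }                                   /* barrier: all threads see the flag */
            }
        }
    }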

Page 45:

Ghost Cell Exchange

• For Assignment 3 (see the sketch below):
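For the MPI version, the ghost rows can be exchanged with MPI_Sendrecv. A sketch under assumed names (each rank owns local_rows interior rows of width N plus one ghost row above and below; up and down are the neighbor ranks, or MPI_PROC_NULL at the boundaries):

    #include <mpi.h>

    /* Exchange halo rows with the up and down neighbors. */
    void exchange_ghost_rows(double *u, int local_rows, int N,
                             int up, int down, MPI_Comm comm) {
        /* send first interior row up, receive the bottom ghost row from below */
        MPI_Sendrecv(&u[1 * N], N, MPI_DOUBLE, up, 0,
                     &u[(local_rows + 1) * N], N, MPI_DOUBLE, down, 0,
                     comm, MPI_STATUS_IGNORE);
        /* send last interior row down, receive the top ghost row from above */
        MPI_Sendrecv(&u[local_rows * N], N, MPI_DOUBLE, down, 1,
                     &u[0], N, MPI_DOUBLE, up, 1,
                     comm, MPI_STATUS_IGNORE);
    }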

Page 46:

Background: C Multidimensional Arrays

Page 47:

Vector/Matrix and Array in C

• C uses row-major storage for multidimensional arrays
  – A[2][2] is followed by A[2][3]
• 3-dimensional array
  – B[3][100][100]
• Think of it as a recursive definition
  – A[4][10][32]

(Figure: memory layout of char A[4][4])

Page 48:

Column Major

Fortran is column major.

Page 49:

Array Layout: Why We Care?

1. It makes a big difference for access speed
• For performance, set up code to go in row-major order in C
  – Caching: each read from memory brings adjacent elements into the cache line
• (Bad) example: 4 vs. 16 accesses -- matmul_base_1

    for i = 1 to n
        for j = 1 to n
            A[j][i] = value    (column-order access of a row-major array)
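For contrast, the cache-friendly ordering in C keeps the row index in the outer loop so the inner loop walks contiguous memory:

    /* Row-major friendly traversal: consecutive iterations of the inner
       loop touch adjacent elements, so each cache line is fully used. */
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            A[i][j] = value;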

Page 50:

Array Layout: Why We Care?

2. It affects decomposition and data movement
• Decomposition may create submatrices that sit in non-contiguous memory locations, e.g. A3 and B1
• Submatrices in contiguous memory locations of a 2-D row-major matrix:
  – A single-row submatrix, e.g. A2
  – A submatrix formed from adjacent rows with full column length, e.g. A1

(Figure: submatrices A1, A2, A3, and B1 of a row-major matrix)

Page 51:

Array Layout: Why We Care?

2. It affects decomposition and submatrices
• Row-wise or column-wise distribution of a 2-D row-major array
• Number of data movements to exchange data between T0 and T1:
  – Row-wise: one memory copy by each
  – Column-wise: 16 copies each

(Figure: row-wise vs. column-wise distribution among T0-T3)

Page 52:

Arrays and Pointers in C

• In C, an array is a pointer plus dimensionality
  – They are literally the same in binary, i.e. a pointer to the first element, referenced as the base address
• Cast and assignment from array to pointer: int A[M][N]
  – A, &A[0][0], and A[0] have the same value, i.e. the pointer to the first element of the array
• Cast a pointer to an array
  – int *ap; int (*A)[N] = (int (*)[N])ap; A[i][j] ...
• Address calculation for array references
  – Address of A[i][j] = A + (i*N + j) * sizeof(int)
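A small self-contained example of the pointer cast and the equivalent explicit address calculation (M and N are arbitrary illustrative values):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        int M = 4, N = 8;
        int *ap = malloc(M * N * sizeof(int));
        int (*A)[N] = (int (*)[N])ap;     /* view the flat buffer as an M x N array */

        A[2][3] = 42;                     /* same element as ap[2*N + 3] */
        printf("%d %d\n", A[2][3], ap[2 * N + 3]);

        /* &A[i][j] == (char *)A + (i*N + j) * sizeof(int) */
        free(ap);
        return 0;
    }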