linear algebra on gpus
DESCRIPTION
Linear Algebra on GPUs. Jens Krüger Technische Universität München. Linear algebra?. Why are we interested in Linear Algebra? It is THE machinery to solve PDEs PDEs are at the core of many graphics applications Physics based simulation, Animation, Mesh fairing …. LA on GPUs?. - PowerPoint PPT PresentationTRANSCRIPT
Linear algebra?
1. Why are we interested in Linear Algebra?
It is THE machinery to solve PDEs1. PDEs are at the core of many graphics applications
Physics based simulation, Animation, Mesh fairing …
LA on GPUs?
2. … and why put LA on GPU?
A perfect couple…GPUs are fast stream processors,and many LA operations are “streamable”
…which goes hand in handThe solution is already on the GPU and ready for display
Getting started …
GPU as workhorse for numerical computations
Computergraphics
applications
Visual simulationVisual computing
Education and Training
Basic linearalgebra operators
General linearalgebra package
Programmable GPUsHighbandwidth
Parallelcomputing
Getting started …
GPU as workhorse for numerical computations
Computergraphics
applications
Visual simulationVisual computing
Education and Training
Basic linearalgebra operators
General linearalgebra package
Programmable GPUsHighbandwidth
Parallelcomputing
• Per-pixel vs. per-vertex operations– 6 gigapixels/second vs. 0.7 gigavertices/second– Efficient texture memory cache– Texture read-write access
– 2D Textures are even better– 2D RGBA textures really rock
– Textures best we can do
Internal affairs
1 N1
N
Vector representation
Representation (cont.)
Dense Matrix representation– Treat a dense matrix as a set of column vectors– Again, store these vectors as 2D textures
Matrix
N
N
i N Vectors
......N
1 i N
N 2D-Textures
... ...1 i N
N
i
Representation (cont.)
Banded Sparse Matrix representation– Treat a banded matrix as a set of diagonal vectors– Combine opposing vectors to save space
N
Matrix 2 Vectors
N
1 2
2 2D-Textures1 2
N-i
Operations 1
• Vector-Vector Operations– Reduced to 2D texture operations– Coded in pixel shaders
• Example: Vector1 + Vector2 Vector3
return tex0 + tex1Pass through
Vector 1 Vector 2
TexUnit 0 TexUnit 1
Vertex Shader Pixel Shader
Vector 3
Render TextureStatic quad
Operations 2 (reduce)
Reduce operation for scalar products
Reduce m x n regionin fragment shader
2 passnd
......
st1 pass
...
original Texture
...
The “single float” on GPUs
Some operations generate single float values e.g. reduce
Read-back to main-mem is slow Keep single floats on the GPU as 1x1 textures
...
Operations (cont.)
Matrix-Vector Operations– Split it up into Vector – Vector operations
Matrix
N
N
i N Vectors
......N
1 i N
N 2D-Textures
... ...1 i N
N
N
Matrixi 2 Vectors
N
1 2
2 2D-Textures1 2
N-i
Building a Framework
Presented so far:
• Representations on the GPU for– Single float values– Vectors– Matrices
• Dense• Banded• Random sparse (see SIGGRAPH ‘03)
• Operations on these representations– Add, multiply, reduce, …– Upload, download, clear, clone, …
Framework Classes (UML)
+getDevice
clClass
+getSize+clear
clVector
clPackedMatrix
+getSize+matrixVectorOp
clAbstractMatrix+getRenderSurface+getTexture+releaseTexture+releaseRenderSurface
clMemMan
clUnpackedVector+unpack+repack
clPackedVector
+reduce+multiply+copy+add+sub+vectorOp
clFragmentVector
+setRow+getRow
clFragmentMatrix
+setData+setDataSorted
clVertexMatrixclUnpackedMatrix
+setData+getData+add+mul+sub+div+invert
clFloat
Framework Example: CG
Encapsulate into Classes for more complex algorithms– Example use: Conjugate Gradient Method, complete source:
void clCGSolver::solveInit() {Matrix->matrixVectorOp(CL_SUB,X,B,R); // R = A*x-bR->multiply(-1); // R = -RR->clone(P); // P = R R->reduceAdd(R, Rho); // rho = sum(R*R);
}void clCGSolver::solveIteration() {
Matrix->matrixVectorOp(CL_NULL,P,NULL,Q);// Q = Ap;P->reduceAdd(Q,Temp); // temp = sum(P*Q);Rho->div(Temp,Alpha); // alpha = rho/temp;X->addVector(P,X,1,Alpha); // X = X + alpha*PR->subtractVector(Q,R,1,Alpha); // R = R - alpha*QR->reduceAdd(R,NewRho); // newrho = sum(R*R);NewRho->divZ(Rho,Beta); // beta = newrho/rhoR->addVector(P,P,1,Beta); // P = R+beta*P;clFloat *temp; temp=NewRho;NewRho=Rho; Rho=temp; // swap rho and newrho pointers
}void clCGSolver::solve(int maxI) {
solveInit();for (int i = 0;i< maxI;i++) solveIteration();
}int clCGSolver::solve(float rhoTresh, int maxI) {
solveInit(); Rho->clone(NewRho);for (int i = 0;i< maxI && NewRho.getData() > rhoTresh;i++) solveIteration();return i;
}
Example 12D Waves (explicit)
• Finite difference discretization:
• You could write a custom shader for this filter
• Think about this as a matrix-vector operation
t 1 t t t t t t 1i, j i 1, j i, j 1 i 1, j i, j 1 i, j i, jx x x x x (2 4 ) x x
2 2 22
2 2 2
x x xct u v
tx 3tx 4tx 5tx 6tx 7
tx 9
11tx1
2tx1
3tx1
4tx1
5tx1
6tx1
7tx1
8tx1
9tx
tx 1tx 2
tx 7
tx 3tx 4tx 5tx 6tx 7
tx 9
11tx1
2tx1
3tx1
4tx1
5tx1
6tx1
7tx1
8tx1
9tx
tx 1
tx 3tx 4tx 5tx 6tx 7
tx 9
11tx1
2tx1
3tx1
4tx1
5tx1
6tx1
7tx1
8tx1
9tx
tx 1tx 2
tx 7
2
2
2
2
2
2
2
2
2
=-
11tx1
2tx1
3tx1
4tx1
5tx1
6tx1
7tx1
8tx1
9tx
11tx1
2tx1
3tx1
4tx1
5tx1
6tx1
7tx1
8tx1
9tx
11tx1
2tx1
3tx1
4tx1
5tx1
6tx1
7t-x1
8tx1
9tx
2D Waves (cont.)
clMatrix->matrixVectorOp(CL_SUB,clCurrent,clLast,clNext); // next = matrix*current-last;
clLast->copyVector(clCurrent); // save for next iteration
clCurrent->copyVector(clNext); // save for next iteration
cluNext->unpack(clNext); // unpack for rendering
renderHF(cluNext->m_pVectorTexture); // render as heightfield
for (i=sY;i<sX*sY;i++) data[i] = ß; // setup diagonal-sY
matrix->getRow(sX*(sY-1))->setData(data);
for (i=0;i<sX*sY;i++) data[i] = (i%sX) ? ß : 0; // setup diagonal-1
matrix->getRow(sX*sY-1)->setData(data);
for (i=0;i<sX*sY;i++) data[i] = 2-4ß; // setup diagonal
matrix->getRow(sX*sY)->setData(data);
for (i=0;i<sX*sY;i++) data[i] = ((i+1)%sX) ? ß : 0; // setup diagonal+1
matrix->getRow(sX*sY+1)->setData(data);
for (i=0;i<sX*(sY-1);i++) data[i] = ß; // setup diagonal+sY
matrix->getRow(sX*(sY+1))->setData(data);
One Time Matrix Initialization:
Per Frame Iteration
2
2
2
2
2
2
2
2
2
Key Idea– Use different time discretization (e.g. Crank Nicholson)– Results in system of linear equations– Iterative solution using CG
Example 22D Waves (implicit)
=
11
tx
12tx1
3tx1
4tx1
5tx1
6tx1
7tx1
8tx1
9tx
tc3tc4
tc5tc 6tc 7
tc 9
tc1
tc2
tc 7
2D Waves (cont.)
cluRHS->computeRHS(cluLast, cluCurrent); // generate c(i,j,t)
clRHS->repack(cluRHS); // encode into RGBA
iSteps = pCGSolver->solve(iMaxSteps); // solve using CG
cluLast->copyVector(cluCurrent); // save for next iteration
clNext->unpack(cluCurrent); // unpack for rendering
renderHF(cluCurrent->m_pVectorTexture); // render as heightfield
for (i=sY;i<sX*sY;i++) data[i] = -alpha; // setup diagonal-sY
matrix->getRow(sX*(sY-1))->setData(data);
for (i=0;i<sX*sY;i++) data[i] = (i%sX) ? - alpha : 0; // setup diagonal-1
matrix->getRow(sX*sY-1)->setData(data);
for (i=0;i<sX*sY;i++) data[i] = 4*alpha+1 // setup diagonal
matrix->getRow(sX*sY)->setData(data);
for (i=0;i<sX*sY;i++) data[i] = ((i+1)%sX) ? -alpha:0; // setup diagonal+1
matrix->getRow(sX*sY+1)->setData(data);
for (i=0;i<sX*(sY-1);i++) data[i] = -alpha // setup diagonal+sY
matrix->getRow(sX*(sY+1))->setData(data);
One Time Matrix Initialization:
Per Frame Iteration