April 19, 2010 HIPS 2010 1
Transforming Linear Algebra Libraries: From Abstraction to Parallelism
Ernie Chan
Motivation
Statically
Outline
Inversion of a Triangular Matrix
Requisite Semantic Information
Static Generation of a Directed Acyclic Graph
Performance
Conclusion
Inversion of a Triangular Matrix
Formal Linear Algebra Methods Environment (FLAME)
  High-level abstractions for expressing linear algebra algorithms
Triangular Inversion (Trinv)
  R := U^{-1}
LAPACK-style Implementation
      DO J = 1, N, NB
         JB = MIN( NB, N-J+1 )
         CALL DTRSM( 'Left', 'Upper', 'No transpose', 'Non-unit',
     $               JB, N-J-JB+1, -ONE, A( J, J ), LDA,
     $               A( J, J+JB ), LDA )
         CALL DGEMM( 'No transpose', 'No transpose',
     $               J-1, N-J-JB+1, JB, ONE, A( 1, J ), LDA,
     $               A( J, J+JB ), LDA, ONE, A( 1, J+JB ), LDA )
         CALL DTRSM( 'Right', 'Upper', 'No transpose', 'Non-unit',
     $               J-1, JB, ONE, A( J, J ), LDA,
     $               A( 1, J ), LDA )
         CALL DTRTI2( 'Upper', 'Non-unit',
     $                JB, A( J, J ), LDA, INFO )
      END DO
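The four updates in the loop above can be checked on a small matrix without BLAS. The following is a minimal sketch, not the LAPACK or libflame implementation: it works on a plain list of lists with exact `Fraction` arithmetic, and the names `trinv_blk` and `trinv_unb` are hypothetical stand-ins for the blocked loop and for the unblocked `DTRTI2` step.

```python
from fractions import Fraction

def trinv_unb(A, lo, hi):
    """Invert the upper-triangular diagonal block A[lo:hi][lo:hi] in place."""
    for p in range(lo, hi):
        for c in range(p + 1, hi):            # a12 := -a12 / a11
            A[p][c] = -A[p][c] / A[p][p]
        for r in range(lo, p):                # A02 := A02 + a01 * a12
            for c in range(p + 1, hi):
                A[r][c] += A[r][p] * A[p][c]
        for r in range(lo, p):                # a01 := a01 / a11
            A[r][p] /= A[p][p]
        A[p][p] = 1 / A[p][p]                 # a11 := 1 / a11

def trinv_blk(A, nb):
    """Overwrite upper-triangular A with its inverse, block size nb."""
    n = len(A)
    for j in range(0, n, nb):
        e = j + min(nb, n - j)                # A11 spans rows/cols j..e-1
        # Trsm (left): A12 := -inv(A11) * A12, back substitution per column
        for c in range(e, n):
            for r in reversed(range(j, e)):
                s = -A[r][c]
                for k in range(r + 1, e):
                    s -= A[r][k] * A[k][c]
                A[r][c] = s / A[r][r]
        # Gemm: A02 := A02 + A01 * A12
        for r in range(j):
            for c in range(e, n):
                A[r][c] += sum(A[r][k] * A[k][c] for k in range(j, e))
        # Trsm (right): A01 := A01 * inv(A11), forward substitution per row
        for r in range(j):
            for c in range(j, e):
                s = A[r][c]
                for k in range(j, c):
                    s -= A[r][k] * A[k][c]
                A[r][c] = s / A[c][c]
        # Trinv: A11 := inv(A11)
        trinv_unb(A, j, e)
    return A

def is_inverse(U, R):
    """Check that U * R equals the identity matrix."""
    n = len(U)
    return all(sum(U[i][k] * R[k][j] for k in range(n)) == (1 if i == j else 0)
               for i in range(n) for j in range(n))

U = [[Fraction(v) for v in row] for row in [[2, 1, 3], [0, 4, 5], [0, 0, 8]]]
R = trinv_blk([row[:] for row in U], nb=2)
assert is_inverse(U, R)
```

Exact fractions keep the check free of rounding error; a real implementation would call the BLAS routines on floating-point blocks instead.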
FLASH
  Matrix of matrices
FLA_Part_2x2( A,    &ATL, &ATR,
                    &ABL, &ABR,     0, 0, FLA_TL );

while ( FLA_Obj_length( ATL ) < FLA_Obj_length( A ) )
{
  FLA_Repart_2x2_to_3x3( ATL, /**/ ATR,       &A00, /**/ &A01, &A02,
                      /* ************* */   /* ******************** */
                                              &A10, /**/ &A11, &A12,
                         ABL, /**/ ABR,       &A20, /**/ &A21, &A22,
                         1, 1, FLA_BR );
  /*-------------------------------------------------------*/
  FLASH_Trsm( FLA_LEFT, FLA_UPPER_TRIANGULAR,
              FLA_NO_TRANSPOSE, FLA_NONUNIT_DIAG,
              FLA_MINUS_ONE, A11, A12 );
  FLASH_Gemm( FLA_NO_TRANSPOSE, FLA_NO_TRANSPOSE,
              FLA_ONE, A01, A12, FLA_ONE, A02 );
  FLASH_Trsm( FLA_RIGHT, FLA_UPPER_TRIANGULAR,
              FLA_NO_TRANSPOSE, FLA_NONUNIT_DIAG,
              FLA_ONE, A11, A01 );
  FLASH_Trinv( FLA_UPPER_TRIANGULAR, FLA_NONUNIT_DIAG, A11 );
  /*-------------------------------------------------------*/
  FLA_Cont_with_3x3_to_2x2( &ATL, /**/ &ATR,       A00, A01, /**/ A02,
                                                   A10, A11, /**/ A12,
                          /* ************** */  /* ****************** */
                            &ABL, /**/ &ABR,       A20, A21, /**/ A22,
                            FLA_TL );
}
Extensible Markup Language (XML)
<?xml version="1.0" encoding="ISO-8859-1"?>
<Function name="FLA_Trinv" type="blk" variant="3">
  <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
  <Declaration>
    <Operand type="matrix" direction="TL->BR" inout="both">A</Operand>
  </Declaration>
  <Loop>
    <Guard>A</Guard>
    <Update>
      <Statement name="FLA_Trsm">
        <Option type="side">FLA_LEFT</Option>
        <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
        <Option type="trans">FLA_NO_TRANSPOSE</Option>
        <Option type="diag">FLA_NONUNIT_DIAG</Option>
        <Parameter>FLA_MINUS_ONE</Parameter>
        <Parameter partition="11">A</Parameter>
        <Parameter partition="12">A</Parameter>
      </Statement>
      <Statement name="FLA_Gemm">
        <Option type="trans">FLA_NO_TRANSPOSE</Option>
        <Option type="trans">FLA_NO_TRANSPOSE</Option>
        <Parameter>FLA_ONE</Parameter>
        <Parameter partition="01">A</Parameter>
        <Parameter partition="12">A</Parameter>
        <Parameter>FLA_ONE</Parameter>
        <Parameter partition="02">A</Parameter>
      </Statement>
      <Statement name="FLA_Trsm">
        <Option type="side">FLA_RIGHT</Option>
        <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
        <Option type="trans">FLA_NO_TRANSPOSE</Option>
        <Option type="diag">FLA_NONUNIT_DIAG</Option>
        <Parameter>FLA_ONE</Parameter>
        <Parameter partition="11">A</Parameter>
        <Parameter partition="01">A</Parameter>
      </Statement>
      <Statement name="FLA_Trinv">
        <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
        <Option type="diag">FLA_NONUNIT_DIAG</Option>
        <Parameter partition="11">A</Parameter>
      </Statement>
    </Update>
  </Loop>
</Function>
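To illustrate how such a specification can serve as a code generation intermediary, here is a minimal sketch that reads the loop body with Python's standard `xml.etree.ElementTree` and prints one call per statement. It is not the generator used in the talk: the helper name `statements` is hypothetical, and the embedded specification is abridged (declaration and most options omitted) to keep the example short.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the Trinv variant 3 specification (assumption: only the
# "side" option is kept, and the XML declaration is omitted).
SPEC = """
<Function name="FLA_Trinv" type="blk" variant="3">
  <Loop>
    <Guard>A</Guard>
    <Update>
      <Statement name="FLA_Trsm">
        <Option type="side">FLA_LEFT</Option>
        <Parameter>FLA_MINUS_ONE</Parameter>
        <Parameter partition="11">A</Parameter>
        <Parameter partition="12">A</Parameter>
      </Statement>
      <Statement name="FLA_Gemm">
        <Parameter>FLA_ONE</Parameter>
        <Parameter partition="01">A</Parameter>
        <Parameter partition="12">A</Parameter>
        <Parameter>FLA_ONE</Parameter>
        <Parameter partition="02">A</Parameter>
      </Statement>
      <Statement name="FLA_Trsm">
        <Option type="side">FLA_RIGHT</Option>
        <Parameter>FLA_ONE</Parameter>
        <Parameter partition="11">A</Parameter>
        <Parameter partition="01">A</Parameter>
      </Statement>
      <Statement name="FLA_Trinv">
        <Parameter partition="11">A</Parameter>
      </Statement>
    </Update>
  </Loop>
</Function>
"""

def statements(spec_xml):
    """Return (name, options, operands) for each Statement in the update."""
    root = ET.fromstring(spec_xml)
    result = []
    for stmt in root.iter("Statement"):
        options = [opt.text for opt in stmt.findall("Option")]
        operands = []
        for par in stmt.findall("Parameter"):
            part = par.get("partition")
            # Operand A with partition "12" names the block A12; a Parameter
            # without a partition attribute is a scalar constant.
            operands.append(par.text + part if part else par.text)
        result.append((stmt.get("name"), options, operands))
    return result

for name, options, operands in statements(SPEC):
    print(f"{name}( {', '.join(options + operands)} );")
```

Each printed line has the shape of one call in the loop body of the FLASH code, which is the step from XML to FLASH.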
Requisite Semantic Information
Partitioning Scheme
Problem Size*
Updates
<?xml version="1.0" encoding="ISO-8859-1"?>
<Function name="FLA_Trinv" type="blk" variant="3">
  <Option type="uplo">FLA_UPPER_TRIANGULAR</Option>
  <Declaration>
    <Operand type="matrix" direction="TL->BR" inout="both">A</Operand>
  </Declaration>
  <Loop>
    <Guard>A</Guard> <!-- while m( ATL ) < m( A ) -->
    <Update>
      <Statement name="FLA_Trsm">
        <!-- 'Left', 'Upper', 'No transpose', 'Non-unit', -ONE, A11, A12 -->
      </Statement>
      <Statement name="FLA_Gemm">
        <!-- 'No transpose', 'No transpose', ONE, A01, A12, ONE, A02 -->
      </Statement>
      <Statement name="FLA_Trsm">
        <!-- 'Right', 'Upper', 'No transpose', 'Non-unit', ONE, A11, A01 -->
      </Statement>
      <Statement name="FLA_Trinv">
        <!-- 'Upper', 'Non-unit', A11 -->
      </Statement>
    </Update>
  </Loop>
</Function>
Input and Output Parameters
<?xml version="1.0" encoding="ISO-8859-1"?>
<Function name="FLA_Trsm">
  <Declaration>
    <Operand type="scalar" inout="in">alpha</Operand>
    <Operand type="matrix" inout="in">A</Operand>
    <Operand type="matrix" inout="both">B</Operand>
  </Declaration>
</Function>
<Function name="FLA_Gemm">
  <Declaration>
    <Operand type="scalar" inout="in">alpha</Operand>
    <Operand type="matrix" inout="in">A</Operand>
    <Operand type="matrix" inout="in">B</Operand>
    <Operand type="scalar" inout="in">beta</Operand>
    <Operand type="matrix" inout="both">C</Operand>
  </Declaration>
</Function>
<Function name="FLA_Trinv">
  <Declaration>
    <Operand type="matrix" inout="both">A</Operand>
  </Declaration>
</Function>
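The `inout` attributes are what later allow read and write sets to be derived for each task. A minimal sketch of that derivation follows, under two assumptions that are not in the slides: the three declarations are wrapped in a hypothetical `<Library>` root so they form one well-formed document, and the helper names `access_modes` and `reads_writes` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: <Library> is a hypothetical wrapper element added here so the
# three declarations can be parsed as a single XML document.
DECLS = """
<Library>
  <Function name="FLA_Trsm">
    <Declaration>
      <Operand type="scalar" inout="in">alpha</Operand>
      <Operand type="matrix" inout="in">A</Operand>
      <Operand type="matrix" inout="both">B</Operand>
    </Declaration>
  </Function>
  <Function name="FLA_Gemm">
    <Declaration>
      <Operand type="scalar" inout="in">alpha</Operand>
      <Operand type="matrix" inout="in">A</Operand>
      <Operand type="matrix" inout="in">B</Operand>
      <Operand type="scalar" inout="in">beta</Operand>
      <Operand type="matrix" inout="both">C</Operand>
    </Declaration>
  </Function>
  <Function name="FLA_Trinv">
    <Declaration>
      <Operand type="matrix" inout="both">A</Operand>
    </Declaration>
  </Function>
</Library>
"""

def access_modes(xml_text):
    """Map each function name to the inout modes of its matrix operands."""
    modes = {}
    for fn in ET.fromstring(xml_text).iter("Function"):
        modes[fn.get("name")] = [op.get("inout") for op in fn.iter("Operand")
                                 if op.get("type") == "matrix"]
    return modes

def reads_writes(modes, name, blocks):
    """Split the matrix blocks passed to one call into read and write sets."""
    reads, writes = set(), set()
    for inout, block in zip(modes[name], blocks):
        if inout in ("in", "both"):
            reads.add(block)
        if inout in ("out", "both"):
            writes.add(block)
    return reads, writes

MODES = access_modes(DECLS)
# Gemm in the Trinv loop: A02 := A02 + A01 * A12
print(reads_writes(MODES, "FLA_Gemm", ["A01", "A12", "A02"]))
```

Scalars are ignored here because only matrix blocks can create dependencies between tasks.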
Static Generation of a DAG
Code Generation
  Convert XML representation to FLASH code generation intermediary
  Annotated with input and output information
Create directed acyclic graph (DAG) by statically unrolling the loop
  Operations on submatrix blocks (tasks) are vertices
  Data dependencies between tasks are edges
Data Dependencies
  Flow (read-after-write)
    S1: A = B + C;
    S2: D = A + E;
  Anti (write-after-read)
    S3: F = A + G;
    S4: A = H + I;
  Output (write-after-write)
    S5: A = J + K;
    S6: A = L + M;
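The three kinds of dependency reduce to set intersections between what an earlier statement reads and writes and what a later one reads and writes. A minimal sketch, with the hypothetical helper `dependencies` and each statement encoded as a `(reads, writes)` pair of variable-name sets:

```python
def dependencies(first, second):
    """Classify the dependencies of `second` on the earlier statement `first`.

    Each statement is a (reads, writes) pair of sets of variable names.
    """
    reads1, writes1 = first
    reads2, writes2 = second
    kinds = []
    if writes1 & reads2:       # second reads what first wrote
        kinds.append("flow")
    if reads1 & writes2:       # second overwrites what first read
        kinds.append("anti")
    if writes1 & writes2:      # both write the same variable
        kinds.append("output")
    return kinds

# The six statements from the slide, as (reads, writes)
S1 = ({"B", "C"}, {"A"})
S2 = ({"A", "E"}, {"D"})
S3 = ({"A", "G"}, {"F"})
S4 = ({"H", "I"}, {"A"})
S5 = ({"J", "K"}, {"A"})
S6 = ({"L", "M"}, {"A"})

for label, pair in (("S1, S2", (S1, S2)), ("S3, S4", (S3, S4)), ("S5, S6", (S5, S6))):
    print(label, dependencies(*pair))
```

A pair of statements can carry more than one kind at once, which is why the function returns a list.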
Problem Size
  Problem size cannot be determined a priori
  Fix the block size or loop unrolling factor
    Balance between instruction footprint and data granularity of tasks
Example
  Trinv on 3x3 matrix of blocks
Trinv Iteration 1
  Tasks: Trsm0, Trsm1, Trinv2
Trinv Iteration 2
  Tasks: Trsm3, Gemm4, Trsm5, Trinv6
Trinv Iteration 3
  Tasks: Trsm7, Trsm8, Trinv9
[Figure: DAG for Trinv on a 3x3 matrix of blocks. Vertices: Trsm0, Trsm1, Trinv2, Trsm3, Gemm4, Trsm5, Trinv6, Trsm7, Trsm8, Trinv9]
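The ten tasks and their edges can be reproduced by unrolling the Trinv loop over a fixed 3x3 grid of blocks. The sketch below is one possible reading of that process, not the generator from the talk: `unroll_trinv` and `build_dag` are hypothetical names, blocks are identified by `(row, column)` pairs, and the edge set contains every pairwise dependency, so it may include edges that a drawn DAG would leave out as transitively implied.

```python
def unroll_trinv(n):
    """List the tasks of Trinv (variant 3) on an n x n matrix of blocks.

    Each task is (name, reads, writes); blocks are (row, col) pairs.
    """
    tasks = []

    def add(kind, reads, writes):
        tasks.append((f"{kind}{len(tasks)}", set(reads), set(writes)))

    for k in range(n):                        # A11 is block (k, k)
        for j in range(k + 1, n):             # A12 := -inv(A11) * A12
            add("Trsm", [(k, k), (k, j)], [(k, j)])
        for i in range(k):                    # A02 := A02 + A01 * A12
            for j in range(k + 1, n):
                add("Gemm", [(i, k), (k, j), (i, j)], [(i, j)])
        for i in range(k):                    # A01 := A01 * inv(A11)
            add("Trsm", [(k, k), (i, k)], [(i, k)])
        add("Trinv", [(k, k)], [(k, k)])      # A11 := inv(A11)
    return tasks

def build_dag(tasks):
    """Return edges (earlier, later) for every flow, anti, or output dependency."""
    edges = set()
    for b, (name_b, reads_b, writes_b) in enumerate(tasks):
        for name_a, reads_a, writes_a in tasks[:b]:
            if writes_a & reads_b or reads_a & writes_b or writes_a & writes_b:
                edges.add((name_a, name_b))
    return edges

TASKS = unroll_trinv(3)
EDGES = build_dag(TASKS)
print([name for name, _, _ in TASKS])
print(sorted(EDGES))
```

For `n = 3` the generated names coincide with the task labels listed for iterations 1 to 3. Trsm0 and Trsm1 share no edge because they only read the common block A11, so they can run in parallel.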
Performance
LabVIEW
  Graphical, data flow programming language (G)
  Anti-dependencies cannot exist in G
    Copies are made when wire is split
Target Architecture
  16-core AMD processor
    4 socket quad-core Opteron 1.9 GHz
    4 GB of RAM per socket
  LabVIEW 8.6
  Windows XP
  Basic Linear Algebra Subprograms (BLAS)
    MKL 7.2
Results
  Parallelism
    Exploit parallelism inherent within DAG
  Hierarchical matrix storage
    Spatial locality
  Overhead
    Copy matrix from flat row-major storage to hierarchical matrix and back
Conclusion
Instantiate linear algebra algorithm using a code generation intermediary
Statically produce a directed acyclic graph by fixing block size or loop unrolling factor
XML → FLASH → DAG
Acknowledgments
Jim Nagle, Robert van de Geijn
We thank the other members of the FLAME team for their support
Funding
  National Instruments
  NSF Grants CCF-0540926 and CCF-0702714
More Information
http://www.cs.utexas.edu/~flame
Questions?