Tensor Contraction with Extended BLAS Kernels on CPU and GPU
Yang Shi
University of California, Irvine, EECS
Joint work with U.N. Niranjan, Animashree Anandkumar and Cris Cecka
SIAM-ALA18
Tensor Contraction: Motivation
Why do we need tensors?
Modern data is inherently multi-dimensional.
[Figure: two examples. Neural networks (layers: Input, Hidden 1, Hidden 2, Output) and the method of moments.]
What is tensor contraction?
[Figure: a 4 x 2 x 2 tensor A is contracted with a 2 x 1 matrix B to give a 4 x 2 x 1 tensor C; the slices A(:,1,:) and A(:,2,:) are each multiplied by B.]
Why do we need tensor contraction?
• Physics
• Chemistry
Why do we need tensor contraction?
• Deep learning
• Learning latent variable models with tensor decomposition. Example: topic modeling
  h: proportion of topics in a document
  A: topic-word matrix
  Third-order moment: [formula not preserved]
What do we have?
Tensor computation libraries:
• Arbitrary/restricted tensor operations of any order and dimension
• Examples: MATLAB Tensor Toolbox, BTAS, FTensor, Cyclops
Efficient computing frameworks:
• Static analysis solutions: loop reorganization, fusion
• Parallel and distributed computing systems: BatchedGEMM functions in MKL 11.3 and cuBLAS v4.1 compute many matrix-matrix multiplies at once.
What are the limitations?
• Explicit permutation takes a long time in current tensor libraries.
Consider C_{mnp} = A_{mk} B_{pkn}.
Figure: The fraction of time spent in copies/transpositions when computing C_{mnp} = A_{mk} B_{pkn}. Lines are shown with 1, 2, 3, and 6 total transpositions performed on either the input or output. (Left) CPU. (Right) GPU. Axes: n from 100 to 500 (horizontal), memory fraction from 0 to 1 (vertical).
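To make the cost concrete, here is a minimal NumPy sketch. It is not the benchmark code behind the figure; the sizes and variable names are assumptions chosen for illustration. It evaluates C_{mnp} = A_{mk} B_{pkn} in two ways: by first copying B into a permuted layout so that a single matrix multiply applies, and by contracting over k directly with `numpy.einsum`.

```python
import numpy as np

# Illustrative sizes (assumptions, not taken from the talk).
M, N, K, P = 4, 5, 6, 7
rng = np.random.default_rng(0)
A = rng.standard_normal((M, K))       # A[m, k]
B = rng.standard_normal((P, K, N))    # B[p, k, n]

# Route 1: explicit permutation. B[p, k, n] is copied into B_perm[k, n, p]
# so that k becomes the leading index and (n, p) can be merged into one.
B_perm = np.ascontiguousarray(np.transpose(B, (1, 2, 0)))  # full copy of B
C_perm = (A @ B_perm.reshape(K, N * P)).reshape(M, N, P)

# Route 2: contract over k directly, with no permuted copy in user code.
C_direct = np.einsum("mk,pkn->mnp", A, B)

# The permuted copy touches every element of B before any arithmetic starts.
elements_copied = B_perm.size
```

Both routes give the same tensor; the first pays for moving all P * K * N elements of B, which is the overhead the figure measures as a fraction of total time.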
Overview
• Propose a tensor operation kernel: StridedBatchedGEMM
  • Library-based approach that avoids memory movement
  • Constant-strided BatchedGEMM that has more optimization opportunities
• Provide evaluation strategies for tensor contractions
• Apply to tensor decomposition
• Introduce TensorLy: tensor learning in Python
BLAS Operations
BLAS (Basic Linear Algebra Subprograms): low-level routines for performing common linear algebra operations.
[Figure: matrix storage with a stride, shown for column-major and row-major layouts.]
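As a small illustration of the two layouts, the sketch below locates element (i, j) of a matrix inside a flat buffer. The helper names `offset_col_major` and `offset_row_major` are hypothetical, and `ld` stands for the leading dimension, that is, the stride between consecutive columns (column-major) or rows (row-major).

```python
def offset_col_major(i, j, ld):
    """Flat index of element (i, j) when columns are stored contiguously."""
    return i + j * ld


def offset_row_major(i, j, ld):
    """Flat index of element (i, j) when rows are stored contiguously."""
    return i * ld + j


rows, cols = 2, 3
matrix = [[1, 2, 3],
          [4, 5, 6]]

# Pack the same matrix into flat buffers using each layout.
col_major = [0] * (rows * cols)
row_major = [0] * (rows * cols)
for i in range(rows):
    for j in range(cols):
        col_major[offset_col_major(i, j, rows)] = matrix[i][j]
        row_major[offset_row_major(i, j, cols)] = matrix[i][j]

print(col_major)  # [1, 4, 2, 5, 3, 6]
print(row_major)  # [1, 2, 3, 4, 5, 6]
```

Because a BLAS routine only needs a start pointer and a leading dimension, a matrix that sits inside a larger array can be passed without copying as long as one of its two index strides is 1.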
Extended BLAS Operator
Focus: one-index contraction
If the indices of C are fixed, there are 3 x 2 x 3 x 2 x 1 = 36 cases in total.
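One way to read this count (an interpretation of the slide's arithmetic, not something the talk states): with the output fixed as C_{mnp}, pick which of the three output indices sits on the matrix A (3 choices), order A's two indices (2 choices), and order the three indices of B (3 x 2 x 1 choices). The sketch below enumerates the cases under that assumption.

```python
from itertools import permutations

output = "mnp"  # indices of C, held fixed
cases = []
for free in output:                             # index of C carried by A: 3 choices
    rest = output.replace(free, "")
    for a_idx in permutations(free + "k"):      # order of A's indices: 2 choices
        for b_idx in permutations(rest + "k"):  # order of B's indices: 3 * 2 * 1 choices
            cases.append(f"C_{output} = A_{''.join(a_idx)} B_{''.join(b_idx)}")

print(len(cases))  # 36
print(cases[0])    # C_mnp = A_mk B_npk
```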
Extended BLAS Operator
Table: Example of possible mappings to Level-3 BLAS routines. [Table body and its stride annotations are not preserved.]
Example
Table: List of the 36 possible single-mode contraction operations between a second-order tensor and a third-order tensor, and possible mappings to Level-3 BLAS routines
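The sketch below emulates what a strided batched GEMM computes. It does not call cuBLAS or MKL; it is a plain-Python stand-in whose argument names are assumptions that loosely follow BLAS conventions (column-major storage, leading dimensions `lda`, `ldb`, `ldc`, and a constant stride between consecutive matrices of the batch). It evaluates C_{mnp} = A_{mk} B_{knp} twice, batching over p and then over n, without moving any element of B or C.

```python
def strided_batched_gemm(m, n, k, A, lda, stride_a, B, ldb, stride_b,
                         C, ldc, stride_c, batch):
    """For each p in range(batch): C_p = A_p @ B_p on column-major flat buffers.

    Matrix p of each operand starts at offset p * stride_x in its buffer.
    """
    for p in range(batch):
        a0, b0, c0 = p * stride_a, p * stride_b, p * stride_c
        for j in range(n):
            for i in range(m):
                acc = 0
                for l in range(k):
                    acc += A[a0 + i + l * lda] * B[b0 + l + j * ldb]
                C[c0 + i + j * ldc] = acc


# Contraction C[m, n, p] = sum_k A[m, k] * B[k, n, p], all stored column-major.
M, N, K, P = 2, 3, 4, 5
A = list(range(1, M * K + 1))
B = list(range(1, K * N * P + 1))

# Reference result straight from the definition.
C_ref = [0] * (M * N * P)
for p in range(P):
    for n_ in range(N):
        for m_ in range(M):
            C_ref[m_ + n_ * M + p * M * N] = sum(
                A[m_ + k_ * M] * B[k_ + n_ * K + p * K * N] for k_ in range(K))

# Batch over the last mode p: each B[:, :, p] is a K x N matrix (ldb = K),
# consecutive matrices are K * N elements apart, and A is reused (stride 0).
C_last = [0] * (M * N * P)
strided_batched_gemm(M, N, K, A, M, 0, B, K, K * N, C_last, M, M * N, P)

# Batch over the middle mode n: each B[:, n, :] is a K x P matrix whose
# columns are K * N elements apart (ldb = K * N); consecutive matrices are
# K elements apart.
C_mid = [0] * (M * N * P)
strided_batched_gemm(M, P, K, A, M, 0, B, K * N, K, C_mid, M * N, M, N)

print(C_last == C_ref, C_mid == C_ref)  # True True
```

The only difference between the two calls is the choice of leading dimensions and strides, which is why a constant-stride interface can cover many of the tabulated cases without an explicit permutation.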
Analysis
Flatten vs. SBGEMM
[Figure: flattening speedup (Batch/Flat) versus n, for n from 0 to 500, in two panels; curves for Case 1.1 [n], Case 1.1 [p], Case 1.5 [p], and Case 6.1 [n].]
Prefer flattening over SBGEMM.
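A minimal NumPy sketch of what flattening means, assuming column-major (Fortran-order) storage and the contraction C_{mnp} = A_{mk} B_{knp}; the sizes are illustrative. Because modes n and p of B are adjacent in memory, B can be viewed as a K x (N*P) matrix without copying, and the whole contraction becomes one GEMM.

```python
import numpy as np

M, N, K, P = 3, 4, 5, 6
rng = np.random.default_rng(0)
A = np.asfortranarray(rng.standard_normal((M, K)))      # A[m, k]
B = np.asfortranarray(rng.standard_normal((K, N, P)))   # B[k, n, p]

# Merge modes n and p into a single column index j = n + N * p.
B_flat = B.reshape((K, N * P), order="F")
no_copy = np.shares_memory(B, B_flat)   # the reshape is only a view

# One matrix multiply, then split the column index back into (n, p).
C = (A @ B_flat).reshape((M, N, P), order="F")
```

When the modes to be merged are not adjacent in memory, this view is not available without a copy, and a batched call is the alternative that still avoids the permutation.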
Analysis
Batching in last mode vs. middle mode
[Figure: last-mode speedup ([n]/[p]) versus n, for n from 0 to 500, in two panels; curves for Case 1.1 and Case 2.1.]
On CPU, it is better to batch in the last mode when the tensor size is small or moderate.
Application: Tucker Decomposition
T_{mnp} = G_{ijk} A_{mi} B_{nj} C_{pk}
Main Steps: [equations not preserved]
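The Tucker form T_{mnp} = G_{ijk} A_{mi} B_{nj} C_{pk} can be checked numerically with a short NumPy sketch. This only reconstructs T from a given core and factors; it is not the decomposition algorithm from the talk, and the sizes are illustrative assumptions. It also shows that the expression splits into three successive contractions, each between a matrix and a third-order tensor over a single index.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 2, 3, 4   # core sizes (illustrative)
M, N, P = 5, 6, 7   # full tensor sizes (illustrative)
G = rng.standard_normal((I, J, K))   # core tensor G[i, j, k]
A = rng.standard_normal((M, I))      # factor A[m, i]
B = rng.standard_normal((N, J))      # factor B[n, j]
C = rng.standard_normal((P, K))      # factor C[p, k]

# All three contractions in one expression.
T = np.einsum("ijk,mi,nj,pk->mnp", G, A, B, C)

# The same result as three successive single-index contractions.
step1 = np.einsum("mi,ijk->mjk", A, G)
step2 = np.einsum("nj,mjk->mnk", B, step1)
T_steps = np.einsum("pk,mnk->mnp", C, step2)
```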
Application: Tucker Decomposition
Figure: Performance on Tucker decomposition. Time in seconds (log scale, 10^-2 to 10^6) versus n (20 to 120) for TensorToolbox, BTAS, Cyclops, CPU Batched, and GPU Batched.
Conclusion
• StridedBatchedGEMM for generalized tensor contractions.
• Avoids explicit transpositions or permutations.
• 10x (GPU) and 2x (CPU) speedup on small and moderately sized tensors.
• Available in cuBLAS 8.0.
Introduction to TensorLy
by Jean Kossaifi (Imperial College London), Yannis Panagakis (Imperial College London), Anima Anandkumar (Caltech)
Homepage: http://tensorly.org/dev/
Github: https://github.com/tensorly/tensorly
• Open source
  Suitable for academic / industrial applications
• Reliable and easy to use
  Depends only on NumPy, SciPy [optionally Matplotlib, MXNet and PyTorch]
  Exhaustive documentation, unit-testing for all functions
  Fast
TensorLy
[Diagram labels: user-friendly API; tensor decomposition, tensor regression, deep learning; basic tensor operations; unified backend.]
TensorLy Operators
• Kronecker
• Khatri-Rao
• Hadamard products
• Tensor unfolding/folding/vectorization
• N-mode product

• Canonical polyadic (CP)
• Non-negative CP
• Tucker (HO-SVD)
• Non-negative Tucker
• Robust tensor PCA
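TensorLy Example
import tensorly as tl
from tensorly.decomposition import tucker
# Tucker decomposition of an image tensor, then reconstruction from core and factors
# (image is assumed to be a tensor loaded elsewhere)
core, factors = tucker(image, ranks=(50, 50, 3), init='random')
tucker_reconstruction = tl.tucker_to_tensor(core, factors)
from tensorly.decomposition import parafac
# CP (PARAFAC) decomposition, then reconstruction from the factors
factors = parafac(image, rank=50, init='random')
cp_reconstruction = tl.kruskal_to_tensor(factors)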
TensorLy Backend
import tensorly as tl
tl.set_backend('numpy')  # or 'mxnet' or 'pytorch'
T = tl.tensor([[1, 2, 3], [4, 5, 6]])  # NumPy ndarray
tl.tenalg.kronecker([T, T])
tl.clip(T, a_min=2, a_max=5)
tl.set_backend('mxnet')
T = tl.tensor([[1, 2, 3], [4, 5, 6]])  # MXNet NDArray
tl.set_backend('pytorch')
T = tl.tensor([[1, 2, 3], [4, 5, 6]])  # PyTorch FloatTensor
TensorLy Example
Back-propagate through tensor operations with PyTorch
import torch
from torch.autograd import Variable
import tensorly as tl
from tensorly import tucker_to_tensor
from tensorly.random import tucker_tensor
tl.set_backend('pytorch')
# core and factors are PyTorch FloatTensors
core, factors = tucker_tensor((5, 5, 5), rank=(3, 3, 3))
# We can attach gradients
core = Variable(core, requires_grad=True)
factors = [Variable(f, requires_grad=True) for f in factors]
# lr, n_iter and tensor (the target to fit) are assumed to be defined elsewhere
optimiser = torch.optim.Adam([core] + factors, lr=lr)
for i in range(1, n_iter):
    optimiser.zero_grad()
    rec = tucker_to_tensor(core, factors)
    loss = (rec - tensor).pow(2).sum()
    # Penalty on the factors
    for f in factors:
        loss = loss + 0.01 * f.pow(2).sum()
    loss.backward()
    optimiser.step()
Contribute to TensorLy
Contributions welcome!
• If you have a cool tensor method you want to add
• If you spot a bug
Thank you!
Questions?