![Page 1: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/1.jpg)
Zhiduo Liu
Supervisor: Guy Lemieux
Sep. 28th, 2012
Accelerator Compiler for the VENICE Vector Processor
![Page 2: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/2.jpg)
Outline:
Motivation
Background
Implementation
Results
Conclusion
![Page 3: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/3.jpg)
![Page 4: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/4.jpg)
Motivation
Multi-core
GPU
FPGA
Many-core
…
CUDA
System Verilog
VHDL
OpenCL
Erlang
Computer clusters
OpenMP
MPI
Pthread
OpenHMPP
Verilog
Bluespec
Cilk
X10
OpenGL
Sh
aJava
ParC
Fortress
Chapel
Vector Processor
StreamIt
Sponge
SSE
![Page 5: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/5.jpg)
Motivation
Simplification
![Page 6: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/6.jpg)
Motivation
…
Single Description
![Page 7: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/7.jpg)
Contributions
The compiler serves as a new back-end of a single-description multiple-device language.
The compiler makes VENICE easier to program and debug.
The compiler provides auto-parallelization and optimization.
[1] Z. Liu, A. Severance, S. Singh and G. Lemieux, “Accelerator Compiler for the VENICE Vector Processor,” in FPGA 2012.
[2] C. Chou, A. Severance, A. Brant, Z. Liu, S. Sant and G. Lemieux, “VEGAS: soft vector processor with scratchpad memory,” in FPGA 2011.
![Page 8: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/8.jpg)
Outline:
Motivation
Background
Implementation
Results
Conclusion
![Page 9: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/9.jpg)
Complicated
[Figure: VENICE pipeline diagram; stage labels: ALIGN, WR, RD, ALIGN, EX1, EX2, ACCUM]
![Page 10: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/10.jpg)
Program in VENICE assembly

#include "vector.h"
int main()
{
    int A[] = {1,2,3,4,5,6,7,8};
    const int data_len = sizeof ( A );
    int *va = ( int *) vector_malloc ( data_len );
    vector_dma_to_vector ( va, A, data_len );
    vector_wait_for_dma ();
    vector_set_vl ( data_len / sizeof (int) );
    vector ( SVW, VADD, va, 42, va );
    vector_instr_sync ();
    vector_dma_to_host ( A, va, data_len );
    vector_wait_for_dma ();
    vector_free ();
}

• Allocate vectors in scratchpad
• Move data from main memory to scratchpad
• Wait for DMA transaction to be completed
• Set up for vector instructions
• Perform vector computations
• Wait for vector operations to be completed
• Move data from scratchpad to main memory
• Wait for DMA transaction to be completed
• Deallocate memory from scratchpad
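The nine steps can be mirrored in ordinary C++ to make the data flow explicit. This is only a host-side model: a `std::vector` stands in for the scratchpad, plain copies stand in for DMA, and the function name `addScalarViaScratchpad` is hypothetical and not part of the VENICE API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the scratchpad workflow; none of this is VENICE code.
std::vector<int> addScalarViaScratchpad(std::vector<int> host, int scalar) {
    assert(!host.empty());
    std::vector<int> scratch(host.size());                    // allocate in "scratchpad"
    std::copy(host.begin(), host.end(), scratch.begin());     // "DMA" in (synchronous here, so no wait)
    const std::size_t vl = scratch.size();                    // set the vector length
    for (std::size_t i = 0; i < vl; ++i) {                    // the vector add, one element at a time
        scratch[i] += scalar;
    }
    std::copy(scratch.begin(), scratch.end(), host.begin());  // "DMA" back to the host array
    return host;                                              // scratch is released at scope exit
}
```

On real hardware the two waits exist because DMA and vector execution run asynchronously; the model hides that by being sequential.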
![Page 11: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/11.jpg)
Program in Accelerator

#include "Accelerator.h"
using namespace ParallelArrays;
using namespace MicrosoftTargets;
int main()
{
    int A[] = {1,2,3,4,5,6,7,8};
    Target *tgt = CreateVectorTarget();
    IPA b = IPA( A, sizeof (A)/sizeof (int));
    IPA c = b + 42;
    tgt->ToArray( c, A, sizeof (A)/sizeof (int));
    tgt->Delete();
}

Alternative targets:
Target *tgt = CreateDX9Target();
Target *tgt = CreateMulticoreTarget();

• Create a Target
• Create Parallel Array objects
• Write expressions
• Call ToArray to evaluate expressions
• Delete Target object
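The key idea in this style is that writing an expression only records it, and nothing is computed until the result is requested. A minimal sketch of that behaviour, assuming a toy class `LazyArray` with a single "add constant" operation (all names here are hypothetical and unrelated to the real Accelerator classes):

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical miniature of a lazily evaluated parallel array.
struct Node {
    enum Kind { Leaf, AddConst };
    Kind kind;
    std::vector<int> data;        // used by Leaf
    std::shared_ptr<Node> child;  // used by AddConst
    int constant;                 // used by AddConst
};

class LazyArray {
public:
    explicit LazyArray(std::vector<int> values)
        : node_(std::make_shared<Node>(Node{Node::Leaf, std::move(values), nullptr, 0})) {}

    // Builds a new graph node; no arithmetic happens here.
    LazyArray operator+(int c) const {
        return LazyArray(std::make_shared<Node>(Node{Node::AddConst, {}, node_, c}));
    }

    // Walks the recorded graph and produces the values.
    std::vector<int> toArray() const { return eval(*node_); }

private:
    explicit LazyArray(std::shared_ptr<Node> n) : node_(std::move(n)) {}

    static std::vector<int> eval(const Node& n) {
        if (n.kind == Node::Leaf) return n.data;
        assert(n.child);
        std::vector<int> out = eval(*n.child);
        for (int& x : out) x += n.constant;
        return out;
    }

    std::shared_ptr<Node> node_;
};
```

Because the whole expression is available as a graph before evaluation, a back-end is free to optimize it and to choose how to execute it on its device.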
![Page 12: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/12.jpg)
Assembly Programming:
Write Assembly → Compile with GCC → Download to board → Get Result
Doesn't compile? Result incorrect? → back to Write Assembly

Accelerator Programming:
Write in Accelerator → Compile with Microsoft Visual Studio → Compile with GCC → Download to board → Get Result
Doesn't compile? Or result incorrect? → back to Write in Accelerator
![Page 13: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/13.jpg)
Assembly Programming:
1. Hard to program
2. Long debug cycle
3. Not portable
4. Manual: not always optimal or correct (WYSIWYG)
Accelerator Programming:
1. Easy to program
2. Easy to debug
3. Can also target other devices
4. Automated compiler optimizations
![Page 14: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/14.jpg)
Outline:
Motivation
Background
Implementation
Results
Conclusion
![Page 15: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/15.jpg)
![Page 16: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/16.jpg)
![Page 17: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/17.jpg)
#include "Accelerator.h"
using namespace ParallelArrays;
using namespace MicrosoftTargets;
int main()
{
    Target *tgtVector = CreateVectorTarget();
    const int length = 8192;
    int a[] = {1,2,3,4, … , 8192};
    int d[length];
    IPA A = IPA( a, length);
    IPA B = Evaluate( Rotate(A, [1]) + 1 );
    IPA C = Evaluate( Abs( A + 2 ));
    IPA D = ( A + B ) * C ;
    tgtVector->ToArray( D, d, length * sizeof(int));
    tgtVector->Delete();
}
[Figure: expression graph for D = (A + B) × C, with B = Rot(A, 1) + 1 and C = Abs(A + 2)]
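For checking results, the same expression can be written as a plain scalar loop. This is a reference sketch, not compiler output. It assumes that rotating by 1 places element i + 1 at index i with wrap-around, which matches the `va+1` operand in the generated code shown later in the deck; the function name `referenceD` is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Scalar reference for D = (A + B) * C, B = Rotate(A, 1) + 1, C = Abs(A + 2).
// Assumption: Rotate(A, 1)[i] == A[(i + 1) mod n].
std::vector<int> referenceD(const std::vector<int>& a) {
    assert(!a.empty());
    const std::size_t n = a.size();
    std::vector<int> d(n);
    for (std::size_t i = 0; i < n; ++i) {
        const int b = a[(i + 1) % n] + 1;  // B
        const int c = std::abs(a[i] + 2);  // C
        d[i] = (a[i] + b) * c;             // D
    }
    return d;
}
```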
![Page 18: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/18.jpg)
[Figure: expression graph for D = (A + B) × C, with B = Rot(A, 1) + 1 and C = Abs(A + 2)]
![Page 19: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/19.jpg)
[Figure: expression graph with the Rot node folded into its operand, so that B = A(rot) + 1]
![Page 20: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/20.jpg)
[Figure: the full expression graph next to its three sub-graphs: B = A(rot) + 1, C = Abs(A + 2), and D = (A + B) × C]
![Page 21: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/21.jpg)
![Page 22: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/22.jpg)
![Page 23: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/23.jpg)
[Figure: sub-graphs B = A(rot) + 1, C = Abs(A + 2), and D = (A + B) × C]
Combine Operations
![Page 24: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/24.jpg)
[Figure: sub-graphs B = A(rot) + 1, C = A |+| 2 (Abs and + merged into a single node), and D = (A + B) × C]
Combine Operations
![Page 25: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/25.jpg)
Scratchpad Memory
“Virtual Vector Register File”
![Page 26: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/26.jpg)
“Virtual Vector Register File”
![Page 27: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/27.jpg)
“Virtual Vector Register File”
Number of vector registers = ?
Vector register size = ?
![Page 28: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/28.jpg)
“Virtual Vector Register File”
Number of vector registers = ?
Vector register size = ?
![Page 29: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/29.jpg)
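[Figure: sub-graphs for B, C and D. Node labels: in B and C the array leaf is 1, the constant leaf is 0 and the root is 1; in D the leaves A, B and C are 1, the + node is 2 and the × root is 2. Nodes are numbered in evaluation order: 1 to 3 in B and C, 1 to 5 in D.]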
Evaluation Order
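The node labels on this slide are consistent with Sethi-Ullman style numbering, in which an operand that needs more registers is evaluated first. The sketch below reproduces those labels under the assumption that array leaves get 1, constants get 0, and an operator gets the larger child label, or one more when both are equal. Whether the compiler uses exactly this rule is an assumption, and the type and helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>

struct Expr {
    enum Kind { Array, Constant, Op };
    Kind kind;
    std::unique_ptr<Expr> left;   // null for leaves
    std::unique_ptr<Expr> right;  // null for leaves
};
using ExprPtr = std::unique_ptr<Expr>;

ExprPtr arrayLeaf() { return ExprPtr(new Expr{Expr::Array, nullptr, nullptr}); }
ExprPtr constLeaf() { return ExprPtr(new Expr{Expr::Constant, nullptr, nullptr}); }
ExprPtr binaryOp(ExprPtr l, ExprPtr r) {
    return ExprPtr(new Expr{Expr::Op, std::move(l), std::move(r)});
}

// Assumed labelling rule: Array -> 1, Constant -> 0,
// Op -> max of the child labels, or label + 1 when the children tie.
int label(const Expr& e) {
    if (e.kind == Expr::Array) return 1;
    if (e.kind == Expr::Constant) return 0;
    assert(e.left && e.right);
    const int l = label(*e.left);
    const int r = label(*e.right);
    return l == r ? l + 1 : std::max(l, r);
}
```

With these rules, `A(rot) + 1` is labelled 1 and `(A + B) × C` is labelled 2, as in the figure.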
![Page 30: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/30.jpg)
[Figure: expression sub-graphs for B, C and D]
Count number of virtual vector registers
![Page 31: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/31.jpg)
Count number of virtual vector registers
![Page 32: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/32.jpg)
Count number of virtual vector registers
Ref Count
A 3
![Page 33: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/33.jpg)
Count number of virtual vector registers
Ref Count
A 3
![Page 34: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/34.jpg)
Count number of virtual vector registers
Ref Count
A 3
B 1
![Page 35: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/35.jpg)
Count number of virtual vector registers
Ref Count
A 3
B 1
![Page 36: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/36.jpg)
Count number of virtual vector registers
Ref Count
A 3
B 1
C 1
![Page 37: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/37.jpg)
Count number of virtual vector registers
Ref Count
A 3
B 1
C 1
Active
A Yes
B No
C No
![Page 38: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/38.jpg)
Count number of virtual vector registers
Ref Count
A 3
B 1
C 1
Active
A Yes
B No
C No
numLoads = 1
![Page 39: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/39.jpg)
Count number of virtual vector registers
Ref Count
A 3
B 1
C 1
Active
A Yes
B No
C No
numLoads = 1
![Page 40: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/40.jpg)
Count number of virtual vector registers
Ref Count
A 2
B 1
C 1
Active
A Yes
B No
C No
numLoads = 1
![Page 41: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/41.jpg)
Count number of virtual vector registers
Ref Count
A 2
B 1
C 1
Active
A Yes
B No
C No
numLoads = 1
![Page 42: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/42.jpg)
Count number of virtual vector registers
Ref Count
A 2
B 1
C 1
Active
A Yes
B No
C No
numLoads = 1
![Page 43: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/43.jpg)
Count number of virtual vector registers
Ref Count
A 2
B 1
C 1
Active
A Yes
B No
C No
numLoads = 1
numTemps = 1
![Page 44: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/44.jpg)
Count number of virtual vector registers
Ref Count
A 2
B 1
C 1
Active
A Yes
B No
C No
numLoads = 1
numTemps = 1
numTotal = 2
maxTotal = 2
![Page 45: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/45.jpg)
Count number of virtual vector registers
Ref Count
A 2
B 1
C 1
Active
A Yes
B Yes
C No
numLoads = 2
numTemps = 1
![Page 46: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/46.jpg)
Count number of virtual vector registers
Ref Count
A 2
B 1
C 1
Active
A Yes
B Yes
C No
numLoads = 2
numTemps = 0
![Page 47: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/47.jpg)
Count number of virtual vector registers
Ref Count
A 1
B 1
C 1
Active
A Yes
B Yes
C No
numLoads = 2
numTemps = 0
![Page 48: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/48.jpg)
Count number of virtual vector registers
Ref Count
A 1
B 1
C 1
Active
A Yes
B Yes
C No
numLoads = 2
numTemps = 0
![Page 49: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/49.jpg)
Count number of virtual vector registers
Ref Count
A 1
B 1
C 1
Active
A Yes
B Yes
C No
numLoads = 2
numTemps = 1
![Page 50: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/50.jpg)
Count number of virtual vector registers
Ref Count
A 1
B 1
C 1
Active
A Yes
B Yes
C No
numLoads = 2
numTemps = 1
numTotal = 3
maxTotal = 3
![Page 51: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/51.jpg)
Count number of virtual vector registers
Ref Count
A 1
B 1
C 1
Active
A Yes
B Yes
C Yes
numLoads = 3
numTemps = 0
![Page 52: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/52.jpg)
Count number of virtual vector registers
Ref Count
A 0
B 1
C 1
Active
A No
B Yes
C Yes
numLoads = 3
numTemps = 0
![Page 53: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/53.jpg)
Count number of virtual vector registers
Ref Count
A 0
B 0
C 1
Active
A No
B No
C Yes
numLoads = 3
numTemps = 0
![Page 54: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/54.jpg)
Count number of virtual vector registers
Ref Count
A 0
B 0
C 1
Active
A No
B No
C Yes
numLoads = 3
numTemps = 0
![Page 55: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/55.jpg)
Count number of virtual vector registers
Ref Count
A 0
B 0
C 0
Active
A No
B No
C No
numLoads = 3
numTemps = 0
![Page 56: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/56.jpg)
Count number of virtual vector registers
Ref Count
A 0
B 0
C 0
Active
A No
B No
C No
numLoads = 3
numTemps = 0
![Page 57: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/57.jpg)
Count number of virtual vector registers
Ref Count
A 0
B 0
C 0
Active
A No
B No
C No
numLoads = 3
numTemps = 0
numTotal = 3
maxTotal = 3
![Page 58: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/58.jpg)
Count number of virtual vector registers
Ref Count
A 0
B 0
C 0
Active
A No
B No
C No
numLoads = 0
numTemps = 0
numTotal = 0
maxTotal = 3
![Page 59: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/59.jpg)
“Virtual Vector Register File”
Number of vector registers = 3
Vector register size = ?
![Page 60: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/60.jpg)
“Virtual Vector Register File”
Number of vector registers = 3
Vector register size = Capacity/3
![Page 61: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/61.jpg)
Convert to LIR
Result B: A(rot) 1 +
Result C: A 2 |+|
Result D: A B + C ×
[Figure: sub-graphs for B, C and D with nodes numbered in evaluation order]
![Page 62: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/62.jpg)
Code Generation
Result B: A(rot) 1 +
Result C: A 2 |+|
Result D: A B + C ×
![Page 63: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/63.jpg)
Code Generation
Result B: A(rot) 1 +
Result C: A 2 |+|
Result D: A B + C ×
1 2 3 4 ... 8192
![Page 64: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/64.jpg)
Code Generation
Result B: A(rot) 1 +
Result C: A 2 |+|
Result D: A B + C ×
1 2 3 4 ... 8192 1
![Page 65: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/65.jpg)
Code Generation
Result B: A(rot) 1 +
Result C: A 2 |+|
Result D: A B + C ×

#include "vector.h"
int main()
{
    int A[8192] = {1,2,3,4, … 8192};
    int *va = ( int *) vector_malloc ( 32772 );
    int *vb = ( int *) vector_malloc ( 32768 );
    int *vc = ( int *) vector_malloc ( 32768 );
    int *vd = ( int *) vector_malloc ( 32772 );
    int *vtemp = va;
    vector_dma_to_vector ( va, A, 32772 );
    for(int i=0; i<4; i++){
        vector_set_vl ( 1024 );
        vtemp = va; va = vd; vd = vtemp;
        vector_wait_for_dma ();
        if(i<3) vector_dma_to_vector ( va, A+(i+1)*1024, 32772 );
        if(i>0){
            vector_instr_sync ();
            vector_dma_to_host ( A+(i-1)*1024, vc, 32768 );
        }
        vector ( SVW, VADD, vb, 1, va+1 );
![Page 66: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/66.jpg)
Code Generation
Result C: A 2 |+|
Result D: A B + C ×

#include "vector.h"
int main()
{
    int A[8192] = {1,2,3,4, … 8192};
    int *va = ( int *) vector_malloc ( 32772 );
    int *vb = ( int *) vector_malloc ( 32768 );
    int *vc = ( int *) vector_malloc ( 32768 );
    int *vd = ( int *) vector_malloc ( 32772 );
    int *vtemp = va;
    vector_dma_to_vector ( va, A, 32772 );
    for(int i=0; i<4; i++){
        vector_set_vl ( 1024 );
        vtemp = va; va = vd; vd = vtemp;
        vector_wait_for_dma ();
        if(i<3) vector_dma_to_vector ( va, A+(i+1)*1024, 32772 );
        if(i>0){
            vector_instr_sync ();
            vector_dma_to_host ( A+(i-1)*1024, vc, 32768 );
        }
        vector ( SVW, VADD, vb, 1, va+1 );
        vector_abs ( SVW, VADD, vc, 2, va );
![Page 67: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/67.jpg)
Code Generation
Result D: A B + C ×

#include "vector.h"
int main()
{
    int A[8192] = {1,2,3,4, … 8192};
    int *va = ( int *) vector_malloc ( 32772 );
    int *vb = ( int *) vector_malloc ( 32768 );
    int *vc = ( int *) vector_malloc ( 32768 );
    int *vd = ( int *) vector_malloc ( 32772 );
    int *vtemp = va;
    vector_dma_to_vector ( va, A, 32772 );
    for(int i=0; i<4; i++){
        vector_set_vl ( 1024 );
        vtemp = va; va = vd; vd = vtemp;
        vector_wait_for_dma ();
        if(i<3) vector_dma_to_vector ( va, A+(i+1)*1024, 32772 );
        if(i>0){
            vector_instr_sync ();
            vector_dma_to_host ( A+(i-1)*1024, vc, 32768 );
        }
        vector ( SVW, VADD, vb, 1, va+1 );
        vector_abs ( SVW, VADD, vc, 2, va );
        vector ( VVW, VADD, vb, vb, va );
![Page 68: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/68.jpg)
Code Generation
Result D: A B + C ×

#include "vector.h"
int main()
{
    int A[8192] = {1,2,3,4, … 8192};
    int *va = ( int *) vector_malloc ( 32772 );
    int *vb = ( int *) vector_malloc ( 32768 );
    int *vc = ( int *) vector_malloc ( 32768 );
    int *vd = ( int *) vector_malloc ( 32772 );
    int *vtemp = va;
    vector_dma_to_vector ( va, A, 32772 );
    for(int i=0; i<4; i++){
        vector_set_vl ( 1024 );
        vtemp = va; va = vd; vd = vtemp;
        vector_wait_for_dma ();
        if(i<3) vector_dma_to_vector ( va, A+(i+1)*1024, 32772 );
        if(i>0){
            vector_instr_sync ();
            vector_dma_to_host ( A+(i-1)*1024, vc, 32768 );
        }
        vector ( SVW, VADD, vb, 1, va+1 );
        vector_abs ( SVW, VADD, vc, 2, va );
        vector ( VVW, VADD, vb, vb, va );
        vector ( VVW, VADD, vc, vc, vb );
    }
    vector_instr_sync ();
    vector_dma_to_host ( A+(i-1)*1024, vc, 32768 );
    vector_wait_for_dma ();
    vector_free ();
}
![Page 69: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/69.jpg)
[Figure: compiler flow chart]
Expression Graph → Convert to IR
IR: Constant folding, CSE, Move Bounds to Leaves, Sub-divide IR
Convert To LIR
LIR: Combine Memory transforms, Combine Operations, Evaluation Ordering, Buffer Counting, Need Double buffering?, Calculate Register Size
VENICE Code: Allocate Memory, Initialize Memory, Transfer Data To Scratchpad, Set VL, Write Vector Instructions, Transfer Result To Host
![Page 70: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/70.jpg)
Outline:
Motivation
Background
Implementation
Results
Conclusion
![Page 71: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/71.jpg)
370x
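Speedups, Compiler vs. Human

| | fir | 2Dfir | life | imgblend | median | motest |
|---|---|---|---|---|---|---|
| V1 | 1.04x | 0.97x | 1.01x | 1.00x | 0.99x | 0.81x |
| V4 | 1.01x | 1.12x | 1.10x | 1.02x | 1.07x | 1.01x |
| V16 | 1.09x | 1.12x | 1.38x | 0.90x | 0.96x | 1.01x |
| V64 | 1.30x | 1.42x | 2.24x | 0.92x | 0.81x | 1.04x |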
![Page 72: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/72.jpg)
![Page 73: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/73.jpg)
![Page 74: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/74.jpg)
Compare to Intel CPU

Benchmark Runtime (ms)

| CPU | fir | 2Dfir | life | imgblend | median | motest |
|---|---|---|---|---|---|---|
| Xeon E5540 (2.53GHz) | 0.07 | 0.44 | 0.53 | 0.12 | 9.97 | 0.24 |
| VENICE (V64, 100MHz) | 0.07 | 0.29 | 0.23 | 0.33 | 3.11 | 0.22 |
| Speedup | 1.0x | 1.5x | 2.3x | 0.4x | 3.2x | 1.1x |

Compile Time

| | fir | 2Dfir | life | imgblend | median | motest | geomean |
|---|---|---|---|---|---|---|---|
| Compile time (ms) | 4.74 | 5.05 | 4.49 | 4.44 | 92.72 | 24.27 | 10.12 |
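The geomean column can be re-derived from the six per-benchmark compile times. A small sketch of the calculation follows; the helper name `geomean` is our own.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean via the mean of logarithms, which avoids overflow in the product.
double geomean(const std::vector<double>& values) {
    assert(!values.empty());
    double logSum = 0.0;
    for (double v : values) {
        assert(v > 0.0);
        logSum += std::log(v);
    }
    return std::exp(logSum / static_cast<double>(values.size()));
}
```

Applied to 4.74, 5.05, 4.49, 4.44, 92.72 and 24.27 ms, this gives about 10.12 ms, in line with the table. The same helper applies to the geomean speedup columns elsewhere in the results.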
![Page 75: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/75.jpg)
Using smaller data types
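Data type per benchmark: fir byte, 2Dfir halfword, life byte, imgblend halfword, median byte, motest word

Speedup using bytes

| | fir | life | median | geomean |
|---|---|---|---|---|
| V1 | 3.93x | 4.36x | 4.07x | 4.12x |
| V4 | 3.54x | 3.83x | 4.03x | 3.79x |
| V16 | 2.90x | 3.22x | 4.00x | 3.34x |

Speedup using halfwords

| | 2Dfir | imgblend | geomean |
|---|---|---|---|
| V1 | 1.96x | 1.54x | 1.74x |
| V4 | 2.00x | 1.46x | 1.71x |
| V16 | 1.97x | 1.83x | 1.90x |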
![Page 76: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/76.jpg)
Outline:
Motivation
Background
Implementation
Results
Conclusion
![Page 77: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/77.jpg)
Conclusions:
The compiler greatly improves the programming and debugging experience for VENICE.
The compiler produces highly optimized VENICE code, with performance close to or better than hand-optimized code.
The compiler demonstrates that a high-abstraction language such as Microsoft Accelerator, given pluggable third-party back-end support, can provide a sustainable solution for emerging hardware.
![Page 78: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/78.jpg)
Thank you !
![Page 79: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/79.jpg)
Optimal VL for V16 (look-up table)

Columns are input data sizes in words; rows are instruction counts.

| Instruction count | 8192 | 16384 | 32768 | 65536 | 131072 | 262144 | 524288 | 1048576 |
|---|---|---|---|---|---|---|---|---|
| 1 to 2 | 4096 | 8192 | 8192 | 8192 | 8192 | 8192 | 8192 | 8192 |
| 3 | 2048 | 2048 | 4096 | 4096 | 8192 | 8192 | 8192 | 8192 |
| 4 to 16 | 1024 | 2048 | 2048 | 4096 | 4096 | 8192 | 8192 | 8192 |
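A look-up of this kind is straightforward to encode. The sketch below stores the V16 values from this slide. How sizes that fall between two columns are handled is not stated, so the code assumes the nearest column at or below the given size; the function name `optimalVL` is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Values taken from the V16 table on this slide.
int optimalVL(int instrCount, std::size_t dataWords) {
    assert(instrCount >= 1 && instrCount <= 16);
    static const std::array<std::size_t, 8> sizes = {
        {8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576}};
    static const int rows1to2[8]  = {4096, 8192, 8192, 8192, 8192, 8192, 8192, 8192};
    static const int row3[8]      = {2048, 2048, 4096, 4096, 8192, 8192, 8192, 8192};
    static const int rows4to16[8] = {1024, 2048, 2048, 4096, 4096, 8192, 8192, 8192};

    const int* row = instrCount <= 2 ? rows1to2 : (instrCount == 3 ? row3 : rows4to16);

    // Assumption: use the largest column not exceeding dataWords (first column if smaller).
    std::size_t col = 0;
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        if (dataWords >= sizes[i]) col = i;
    }
    return row[col];
}
```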
![Page 80: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/80.jpg)
![Page 81: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/81.jpg)
![Page 82: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/82.jpg)
“Virtual Vector Register File”
Number of vector registers = 4
Vector register size = 1024
![Page 83: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/83.jpg)
Combine Operators for Motion Estimation
| | V4 | V16 | V64 |
|---|---|---|---|
| Before (ms) | 2.03 | 0.55 | 0.30 |
| After (ms) | 1.36 | 0.37 | 0.21 |
| Speedup | 1.49x | 1.48x | 1.43x |
![Page 84: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/84.jpg)
Performance Degradation on median

Human-written compare-and-swap
    int *v_min = v_input1;
    int *v_max = v_input2;
    vector ( VVW, VOR, v_tmp, v_min, v_min );
    vector ( VVW, VSUB, v_sub, v_max, v_min );
    vector ( VVW, VCMV_LTZ, v_min, v_max, v_sub );
    vector ( VVW, VCMV_LTZ, v_max, v_tmp, v_sub );

Compiler-generated compare-and-swap
    vector ( VVW, VSUB, v_sub, v_input1, v_input2 );
    vector ( VVW, VCMV_GTEZ, v_min, v_input2, v_sub );
    vector ( VVW, VCMV_LTZ, v_min, v_input1, v_sub );
    vector ( VVW, VCMV_GTEZ, v_max, v_input1, v_sub );
    vector ( VVW, VCMV_LTZ, v_max, v_input2, v_sub );
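To see why both sequences sort a pair, each can be modelled on a single element. The semantics used here are assumptions rather than documented VENICE behaviour: VSUB is taken as dest = srcA - srcB, and the conditional moves copy srcA into dest where srcB is negative (LTZ) or non-negative (GTEZ). In the compiler version the third conditional move is taken to write the maximum, since the maximum would otherwise be left unset for that case.

```cpp
#include <cassert>
#include <utility>

// Human-written sequence: 4 vector instructions (copy, subtract, two conditional moves).
std::pair<int, int> humanCompareSwap(int in1, int in2) {
    int vMin = in1;               // v_min aliases v_input1
    int vMax = in2;               // v_max aliases v_input2
    const int tmp = vMin;         // VOR v_tmp, v_min, v_min (copy)
    const int sub = vMax - vMin;  // VSUB
    if (sub < 0) vMin = vMax;     // VCMV_LTZ
    if (sub < 0) vMax = tmp;      // VCMV_LTZ
    return {vMin, vMax};
}

// Compiler-generated sequence: 5 vector instructions (subtract, four conditional moves).
std::pair<int, int> compilerCompareSwap(int in1, int in2) {
    int vMin = 0;
    int vMax = 0;
    const int sub = in1 - in2;    // VSUB
    if (sub >= 0) vMin = in2;     // VCMV_GTEZ
    if (sub < 0) vMin = in1;      // VCMV_LTZ
    if (sub >= 0) vMax = in1;     // VCMV_GTEZ (destination assumed to be the maximum)
    if (sub < 0) vMax = in2;      // VCMV_LTZ
    return {vMin, vMax};
}
```

The human version saves one instruction by reusing the input buffers as outputs, which is consistent with the slowdown on median in the results.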
![Page 85: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/85.jpg)
Double Buffering
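Double buffering keeps two input buffers and swaps their roles each iteration, so the next chunk can be fetched while the current one is processed. The generated code earlier in the deck does this by exchanging the `va` and `vd` pointers. The sketch below is a sequential host-side model in which copies stand in for DMA, so nothing actually overlaps; the function name and the add-a-constant workload are placeholders.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sequential model of double buffering; not VENICE code.
std::vector<int> addConstantDoubleBuffered(const std::vector<int>& input,
                                           std::size_t chunk, int constant) {
    assert(chunk > 0 && input.size() % chunk == 0);
    const std::size_t chunks = input.size() / chunk;
    std::vector<int> output(input.size());
    std::vector<int> bufA(chunk), bufB(chunk);
    std::vector<int>* current = &bufA;  // chunk being computed on
    std::vector<int>* next = &bufB;     // chunk being fetched

    // Stand-in for a DMA transfer of one chunk into a buffer.
    auto fetch = [&](std::size_t index, std::vector<int>& dst) {
        const auto first = input.begin() + static_cast<std::ptrdiff_t>(index * chunk);
        std::copy(first, first + static_cast<std::ptrdiff_t>(chunk), dst.begin());
    };

    if (chunks > 0) fetch(0, *current);           // prefetch the first chunk
    for (std::size_t i = 0; i < chunks; ++i) {
        if (i + 1 < chunks) fetch(i + 1, *next);  // would overlap with compute on hardware
        for (std::size_t j = 0; j < chunk; ++j) {
            output[i * chunk + j] = (*current)[j] + constant;
        }
        std::swap(current, next);                 // the fetched chunk becomes current
    }
    return output;
}
```

The price is one extra buffer in the scratchpad, which reduces the space left for each vector register.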
![Page 86: Zhiduo Liu Supervisor: Guy Lemieux Sep. 28 th , 2012](https://reader034.vdocument.in/reader034/viewer/2022051517/56815884550346895dc5e4ca/html5/thumbnails/86.jpg)