high-level synthesis with gaut
DESCRIPTION
Dr. Philippe COUSSY from the Université de Bretagne-Sud gives an introduction to high-level synthesis principles and tools.TRANSCRIPT
![Page 1: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/1.jpg)
1/68
HighHigh--Level Synthesis with Level Synthesis with
GAUTGAUT
UniversitUniversitéé de Bretagnede Bretagne--SudSud
LabLab--STICC STICC
Philippe COUSSYPhilippe COUSSY
[email protected]@univ--ubs.frubs.fr
![Page 2: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/2.jpg)
2/68
OutlineOutline
� Lab-STICC
� High-Level Synthesis
� GAUT
� Experimental results
� Conclusion
![Page 3: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/3.jpg)
3/68
Université de Bretagne Sud
Europe
France
Brittany
![Page 4: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/4.jpg)
4/68
Lab-STICC Laboratory
� “Laboratoire des Sciences et Techniques de l’Information, de la Communication et de la Connaissance”
� From sensor to knowledge: communicate and decide
� Lab-STICC laboratory is composed of 4 research centers
� Brest (2)�
� Lorient (1)�
� Vannes (1)�
� Lab-STICC laboratory represents
� A staff of 400 people� 182 researchers
� 178 PhDs
� More than 6,5M€ of research grants for 2010
� A scientific production during the last 4 years� 65 books or chapters
� 497 journal publications
� 1012 conference publications
� 26 new patents
� 69 PhDs diploma
![Page 5: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/5.jpg)
5/68
Lab-STICC organization
� Lab-STICC is decomposed in three research domains (3 departments) �
� Dept. MOM � Microwaves and Material
Head: Patrick Queffelec
36 Faculties
� Dept. CID� Knowledge, Information, Decision
Head: Gilles Coppin
22 Faculties
� Dept. CACS � Communications, Architectures, Circuits/Systems
Head: Emmanuel Boutillon
59 Faculties
![Page 6: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/6.jpg)
6/68
OutlineOutline
� Lab-STICC
� High-Level Synthesis
� GAUT
� Experimental results
� Conclusion
![Page 7: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/7.jpg)
7/68
Typical HW design flowTypical HW design flow
RTL
Logic synthesis
Gate level netlist
Layout
GDSII
� Starting from a Register Transfer Level description, generate an IC layout
![Page 8: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/8.jpg)
8/68
Typical HW design flowTypical HW design flow
� Starting from a functional description, automatically generate an RTL architecture
RTL
Logic synthesis
Gate level netlist
Layout
GDSII
Algorithm#define N 2
typedef int matrix[N][N];
int main(const matrix A, matrix C)
{
const matrice B ={{1, 2},{ 3, 4}};
int tmp;
int i,j,k;
for (i=0;i<N;i++)
for (j=0;j<N;j++){
tmp = A[i][0]*B[0][j];
for (k=1;k<N - 1;k++)
tmp = tmp + A[i][k] * B[k][j];
C[i][j] = tmp + A[i][N-1] * B[N-1][j];
}
return 0;
}
High-Level synthesis
SystemC simulation models (CABA/TLM)
Virtual prototyping
![Page 9: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/9.jpg)
9/68
Electronic System Level Design (ESLD)
Year
Circuit complexityTransistors
Des
ign
auto
mat
ion
Designer productivity
IP- & Plateform- based design
Co-design & HLS
System-Level Design Language
& virtual prototyping
RTL
Abstraction
10050095
![Page 10: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/10.jpg)
10/68
HighHigh--level synthesislevel synthesis
� Starting from a functional description, automatically generate an RTL architecture
� Algorithmic description� No timing notion in the source code
� Mainly oriented toward data dominated application
Highly processing algorithm like filters…
� Initial description can be
“RTL oriented”
“Function oriented”
� Behavioral description� Notion of step / local timing constraints in the source code
by using the wait statements of SystemC for example
� Can be used for both data and control dominated application
Interface controller, DMA…
Filters…
![Page 11: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/11.jpg)
11/68
SynthesizableSynthesizable modelsmodels
� C for the synthesis:� No pointer
� Statically unresolved
� Arrays are allowed!
� No standard function call� printf, scanf, fopen, malloc…
� Function calls are allowed� Can be in-lined or not
� Finite precision� Bit accurate integers, fixed point, signed, unsigned…
� Based on SystemC or Mentor Graphics data types
sc_int, sc_fixed
ac_int, ac_fixed
![Page 12: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/12.jpg)
12/68
Purely functional Example #1: a simple C codePurely functional Example #1: a simple C code
#define N 16
int main(int data_in, int *data_out)
{ static const int Coeffs [N] = {98,-39,-327,439,950,-2097,-1674,9883,9883,-1674,-2097,950,439,-327,-39,98};
int Values[N];
int temp;
int sample,i,j;
sample = data_in;
temp = sample * Coeffs[N-1];
for(i = 1; i<=(N-1); i++){
temp += Values[i] * Coeffs[N-i-1];
}
for(j=(N-1); j>=2; j-=1 ){
Values[j] = Values[j-1];
}
Values[1] = sample;
*data_out=temp;
return 0;
}
![Page 13: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/13.jpg)
13/68
Purely functional ePurely functional examplexample #2: bit #2: bit accurateaccurate C++ codeC++ code
#include "ac_fixed.h" // From Mentor Graphics
#define PORT_SIZE ac_fixed<16, 12, true, AC_RND,AC_SAT>
// 16 bits, 12 bits after the point, quantization = rounding, overflow = saturation
#define N 16
int main(PORT_SIZE data_in, PORT_SIZE &data_out)
{
static const PORT_SIZE Coeffs [N]={1.1, 1.5, 1.0, 1.0, 1.7, 1.8, 1.2, 1.0, 1.6, 1.0, 1.5, 1.1, 1.9, 1.3, 1.4, 1.7};
PORT_SIZE Values[N];
PORT_SIZE temp;
PORT_SIZE sample;
sample= data_in;
temp = sample * Coeffs[N-1];
for(int i = 1; i<=(N-1); i++){
temp = Values [i] * Coeffs[N-i-1] + temp;
}
for(int j=(N-1); j>=2; j-=1 ){
Values[j] = Values [j-1];
}
Values[1] = sample;
data_out=temp;
return 0;
}
![Page 14: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/14.jpg)
14/68
HighHigh--level synthesislevel synthesis
� Starting from a functional description, automatically generate an RTL architecture
� Algorithmic description� No timing notion in the source code
� Mainly oriented toward data dominated application
Highly processing algorithm like filters…
� Initial description can be
“RTL oriented”
“Function oriented”
� Behavioral description� Notion of step / local timing constraints in the source code
by using the wait statements of SystemC for example
� Can be used for both data and control dominated application
Interface controller, DMA…
Filters…
![Page 15: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/15.jpg)
15/68
Behavioral descriptionBehavioral description
�Behavioral description�Notion of step / local timing constraints in the source code
by using the wait statements of SystemC for example
...
void addmul() {
sc_signal<sc_uint<32> > tmp1;
tmp1 = 0;
result = 0;
wait();
while (1) {
tmp1 = b * c;
wait();
result = a + tmp1;
wait();
}
}
...
Cycle-by-cycle FSMD
with reset state
Reset state
First state
Second state
![Page 16: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/16.jpg)
16/68
High-level transformations
� Loops� Loop pipelining, � loop unrolling
� None, partially, completely
� Loop merging� Loop tiling� …
� Arrays� Arrays can be mapped on memory banks� Arrays can be synthesized as registers� Constant arrays can be synthesized as logic� …
� Functions� Function calls can be in-lined� Function is synthesized as an operator
� Sequential, pipelined, functional unit…
� Single function instantiation� …
![Page 17: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/17.jpg)
17/68
HighHigh--level synthesislevel synthesis
� Starting from a functional description, automatically generate an RTL architecture� Algorithmic description
� no timing notion in the source code
� Behavioral description� Notion of step / local timing constraints in the source code
by using the wait statements of SystemC for example
� Constraints� Timing constraints: latency and/or throughput
� Resource constraints: #Operators and/or #Registers and/or #Memory, #Slices...
� Objectives � Minimization: area i.e. resources, latency, power consumption…
� Maximization: throughput
� Library of characterized operators
![Page 18: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/18.jpg)
18/68
Synthesis stepsSynthesis steps
� Compilation� Generates a formal modeling of the specification
� Selection� Chooses the architecture of the operators
� Allocation� Defines the number of operators for each selected type
� Scheduling� Defines the execution date of each operation
� Binding (or Assignment)� Defines which operator will execute a given operation
� Defines which memory element will store a data
� Architecture generation� Writes out the RTL source code in the target language e.g. VHDL
![Page 19: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/19.jpg)
19/68
Architecturegeneration
Specification
RTL architecture
Operators
Library
Selection
HLS steps: inputsHLS steps: inputs
Specification
O = ((n01+n02)*n12)-(n21+n22)
Allocation
Scheduling Binding
Compilation
Intermediate
format
Constraints
CLA Booth
Wallace
CLA
Operators library
RCA RCA
Adders multipliers subtractorsSpecification
Constraints
Operators
ibrary
![Page 20: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/20.jpg)
20/68
Architecturegeneration
Specification
RTL architecture
Operators
Library
Selection
HLS steps: CompilationHLS steps: Compilation
Specification
O = ((n01+n02)*n12)-(n21+n22)
Allocation
Scheduling Binding
Compilation
Intermediate
format
Constraints
CLA Booth
Wallace
CLA
Operators library
RCA RCA
Adders multipliers subtractors
Intermediate representation
n21n22
N0
N1
N3
+
×
-
+
N2
n01 n02
n11 n12
n31 n32
O
![Page 21: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/21.jpg)
21/68
Synthesis stepsSynthesis steps
� Compilation� Generates a formal modeling of the specification
� Selection� Chooses the architecture of the operators
� Allocation� Defines the number of operators for each selected type
� Scheduling� Defines the execution date of each operation
� Binding (or Assignment)� Defines which operator will execute a given operation
� Defines which memory element will store a data
� Architecture generation� Writes out the RTL source code in the target language e.g. VHDL
![Page 22: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/22.jpg)
22/68
Architecturegeneration
Specification
RTL architecture
Operators
Library
Selection
HLS steps: SelectionHLS steps: Selection
Specification
O = ((n01+n02)*n12)-(n21+n22)
Allocation
Scheduling Binding
Compilation
Intermediate
format
Constraints
CLA Booth
Wallace
CLA
Operators library
RCA RCA
Adders multipliers subtractors
Booth
RCA
RCA
Intermediate representation
n21n22
N0
N1
N3
+
×
-
+
N2
n01 n02
n11 n12
n31 n32
O
![Page 23: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/23.jpg)
23/68
Synthesis stepsSynthesis steps
� Compilation� Generates a formal modeling of the specification
� Selection� Chooses the architecture of the operators
� Allocation� Defines the number of operators for each selected type
� Scheduling� Defines the execution date of each operation
� Binding (or Assignment)� Defines which operator will execute a given operation
� Defines which memory element will store a data
� Architecture generation� Writes out the RTL source code in the target language e.g. VHDL
![Page 24: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/24.jpg)
24/68
Architecturegeneration
Specification
RTL architecture
Operators
Library
Selection
HLS steps: allocation HLS steps: allocation
Specification
O = ((n01+n02)*n12)-(n21+n22)
Allocation
Scheduling Binding
Compilation
Intermediate
format
Constraints
CLA Booth
Wallace
CL
Operators library
RCA RCA
Adders multipliers subtractors
Booth
RCA
RCA *1
*1
*1
Intermediate representation
n21n22
N0
N1
N3
+
×
-
+
N2
n01 n02
n11 n12
n31 n32
O
![Page 25: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/25.jpg)
25/68
Synthesis stepsSynthesis steps
� Compilation� Generates a formal modeling of the specification
� Selection� Chooses the architecture of the operators
� Allocation� Defines the number of operators for each selected type
� Scheduling� Defines the execution date of each operation
� Binding (or Assignment)� Defines which operator will execute a given operation
� Defines which memory element will store a data
� Architecture generation� Writes out the RTL source code in the target language e.g. VHDL
![Page 26: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/26.jpg)
26/68
Architecturegeneration
Specification
RTL architecture
Operators
Library
Selection
HLS steps: scheduling HLS steps: scheduling
Allocation
Scheduling Binding
Compilation
Intermediate
format
Constraints
+N0
×N1
-N3
Booth
RCA
RCA *1
*1
*1
+N2
![Page 27: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/27.jpg)
27/68
Synthesis stepsSynthesis steps
� Compilation� Generates a formal modeling of the specification
� Selection� Chooses the architecture of the operators
� Allocation� Defines the number of operators for each selected type
� Scheduling� Defines the execution date of each operation
� Binding (or Assignment)� Defines which operator will execute a given operation
� Defines which memory element will store a data
� Architecture generation� Writes out the RTL source code in the target language e.g. VHDL
![Page 28: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/28.jpg)
28/68
Architecturegeneration
Specification
RTL architecture
Operators
Library
Selection
HLS steps: binding HLS steps: binding
Allocation
Scheduling Binding
Compilation
Intermediate
format
Constraints
+
×
-
+
×
-
Operation binding
n01
n21, n11
n22, n12
R1
R3
R4
n02 R2
n31 R5
n32 R6
Data Binding
Booth
RCA
RCA *1
*1
*1
+
![Page 29: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/29.jpg)
29/68
Synthesis stepsSynthesis steps
� Compilation� Generates a formal modeling of the specification
� Selection� Chooses the architecture of the operators
� Allocation� Defines the number of operators for each selected type
� Scheduling� Defines the execution date of each operation
� Binding (or Assignment)� Defines which operator will execute a given operation
� Defines which memory element will store a data
� Architecture generation� Writes out the RTL source code in the target language e.g. VHDL
![Page 30: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/30.jpg)
30/68
HLS steps: outputHLS steps: output
+
×
-
×
Operation binding
n01
n21, n11
n22, n12
R1
R3
R4
n02 R2
n31 R5
n32 R6
Data binding
Architecturegeneration
Specification
RTL architecture
Operators
Library
Selection Allocation
Scheduling Binding
Compilation
Intermediate
format
Constraints
Controller
- FSM controller
- Programmable controller
Datapath components
- Storage components
- Functional units
- Connection components +
R3
R1
R4
R2
MU
XM
UX
R5
R6
Controller
Datapath
-x
+
R3
R1
R4
R2
MU
XM
UX
R5
R6
Controller
Datapath
+
R3
R1
R4
R2
MU
XM
UX
MU
XM
UX
R5
R6
Controller
Datapath
-x
![Page 31: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/31.jpg)
31/68
Academic tools
� Streamroller (Univ. Mich.)
� SPARK (UCSD)
� xPilot (UCLA)
� UGH (TIMA+LIP6)
� MMALPHA (IRISA+CITI+…)
� ROCCC (UC Riverside)
� GAUT (UBS / Lab-STICC)
� …
![Page 32: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/32.jpg)
32/68
Commercial tools
�CatapultC (Mentor Graphics => Calypto)
�PICO (Spin-off HP => Synfora => Synopsys)
�Cynthecizer (Forte design)
�Cyber (NEC)
�AutoPilot (AutoESL)
�C to Silicon (Candence)
�Synphony (Synopsys)
�…
![Page 33: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/33.jpg)
33/68
OutlineOutline
� Lab-STICC
� High-Level Synthesis
� GAUT
� Experimental results
� Conclusion
![Page 34: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/34.jpg)
34/68
GAUTGAUT
� An academic, free and open source HLS tool
� Dedicated to DSP applications� Data-dominated algorithm
� 1D, 2D Filters
� Transforms (Fourrier, Hadamar, DCT…)
� Channel Coding, source coding algorithms
� Input : bit-accurate C/C++ algorithm� bit-accurate integer and fixed-point from Mentor Graphics
� Output : RTL Architecture� VHDL
� SystemC � CABA: Cycle accurate and Bit accurate
� TLM: Transaction level model
� Compatible with both SocLib and MPARM virtual prototyping platforms
� Automated Test-bench generation
� Automated operators characterization
![Page 35: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/35.jpg)
35/68
GAUT: ConstraintsGAUT: Constraints
GALS/LIS
interface Data
Path
Controller
Clock enable
Bus
contr
olle
r
Data(i)
Req(i)
Ack(i)
GAUT
Internalbuses
Bit accurate Algorithm in bit-accurate C/C++
Specific
links &
protocols
Memory
Unit
Synthesis constraints- Initiation Interval (Data average throughput )
- Clock frequency
- FPGA/ASIC target technology
- Memory architecture and mapping
- I/O Timing diagram (scheduling + ports)
- GALS/LIS Interface (FIFO protocol)
![Page 36: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/36.jpg)
36/68
GAUT: Design flowGAUT: Design flow
GALS/LIS
interface Data
Path
Controller
Clock enable
Bus
contr
olle
r
Data(i)
Req(i)
Ack(i)
GAUT
Internalbuses
Bit accurate Algorithm in bit-accurate C/C++
Specific
links &
protocols
Synthesis constraints- Iteration period (Data average
throughput )
- Clock frequency
- FPGA/ASIC target technology
Synthesis constraints- Iteration Interval (Data average throughput )
- Clock frequency
- FPGA/ASIC target technology
- Memory architecture and mapping
- I/O Timing diagram (scheduling + ports)
- GALS/LIS Interface (FIFO protocol)
Memory
Unit
![Page 37: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/37.jpg)
37/68
GAUT: Compilation
![Page 38: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/38.jpg)
38/68
GAUT: DFG viewer
![Page 39: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/39.jpg)
39/68
GAUT: Operators characterization
R R
Area : operator only (nb slice)
Propagation time : reg+tri+ope+reg
Script and logic
Database, interpolation…
O
R RO
Mux
Mux
![Page 40: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/40.jpg)
40/68
GAUT: Synthesis steps
HDL coding style: FSMD, FSM+reg, FSM_ROM+reg…
Inititation Interval II
Data Assginment (Left Edge,MWBM…)
Clock period
I/O timing & memory constraints
![Page 41: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/41.jpg)
41/68
GAUT: I/O and memory constraints
![Page 42: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/42.jpg)
42/68
GAUT: Gantt viewer
![Page 43: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/43.jpg)
43/68
GAUT: Interface synthesis
Performances of interfaces depend on data locality (data fetch penality, cache miss)
Interface can be:- Ping pong buffer (scratch-pad on Local Memory Bus)- FIFO (i.e. FSL Fast Simplex Link from Xilinx)
![Page 44: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/44.jpg)
44/68
GAUT: Test-bench generation
Test-bench Generation
Modelsim Script GenerationResult File Generation
![Page 45: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/45.jpg)
45/68
GAUT: more than 100 downloads each year
![Page 46: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/46.jpg)
46/68
OutlineOutline
� Lab-STICC
� High-Level Synthesis
� GAUT
� Experimental results
� Design space exploration of HW accelerators
� MPSoC Design space exploration through virtual prototyping
� SoC hardware prototyping
� “System on board”
� Conclusion
![Page 47: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/47.jpg)
47/68
Experimental results: MJPEG decoding
Dc VLD IDPCM
Ac VLD RLD Unzig Zag
Dequant Idct Yuv2rgbDeMux Huffman table
Q table
Yuv
Block Diagram of mjpeg baseline decoder
Execution time ratio for software MJPEG decoding(by using gprof)
![Page 48: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/48.jpg)
48/68
Synthesis results
IDCT
YUV2RGB
![Page 49: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/49.jpg)
49/68
Synthesis results
IDCT
YUV2RGB
Hardwareprototyping
Virtualprototyping
![Page 50: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/50.jpg)
50/68
SoCLib: a virtual prototyping platform
� French National Research Project (ANR)
� Free and open source virtual prototyping environment� Library of SystemC simulation models
� Hardware components� CPUs, HW-ACCs, memories, busses
� VCI/OCP interface protocol is used
� Two types of model are available for each HW component� CABA (Cycle Accurate / Bit Accurate)
� TLM-DT (Transaction Level Modeling with Distributed Time)
� Software components� OS, API…
� Associated tools� Simulation, configuration, debug
� Automatic generation of simulation models
� GAUT is used, to generate simulation models of HW-ACC� CABA and TLM-DT
www.soclib.fr
![Page 51: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/51.jpg)
51/68
SoCLib: Design flow
HLS Simulationmodels
Mapping
simulateurPerformance
analysis
Debugging
Architecture
Design
![Page 52: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/52.jpg)
52/68
Experiments: architecture #1
TG Demux VLD IQ-ZZ IDCT Libu RamDAC
MIPSR3000
INST DATA
Generic Micro Network VGMN
File access
MWMR
Frame buffer
MWMR
Lock engine RAM TTY
Pure software implementation on a mono-processor architecture
![Page 53: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/53.jpg)
53/68
Experiments: architecture #2
TG Demux VLD IQ-ZZ IDCT Libu RamDAC
MIPSR3000
INST DATA
Generic Micro Network VGMN
File access
MWMR
Frame buffer
MWMR
Lock engine RAM TTY
MWMR Copro
Software implementation on a mono-processor architecture + IDCT as HW accelerator
![Page 54: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/54.jpg)
54/68
Experiments: architecture #3
TG Libu RamDAC
Demux VLD IQ-ZZ IDCTDemux VLD IQ-ZZ IDCT
Demux VLD IQ-ZZ IDCTDemux VLD IQ-ZZ IDCT
Split
MIPS
R3000
INST DATA
Generic Micro Network VGMN
File access
MWMR
Framebuffer
Lock
engineTTY
MIPS
R3000
INST DATA
MIPS
R3000
INST DATA MWMR
MIPS
R3000
INST DATA
RAM3RAM2RAM1RAM0
Parallelized software implementation on a multiprocessor architecture
![Page 55: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/55.jpg)
55/68
Experiments: architecture #4
MIPSR3000
INST DATA
File
access
MWMR
Frame
buffer
Lockengine
TTY
MIPSR3000
INST DATA
MIPSR3000
INST DATA MWMR
MIPSR3000
INST DATA
RAM3RAM2RAM1RAM0
Generic Micro Network VGMN
MWMR Copro
MWMR Copro
MWMR Copro
MWMR Copro
TG Libu RamDAC
Demux VLD IQ-ZZ IDCT
Demux VLD IQ-ZZ IDCT
Split
![Page 56: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/56.jpg)
56/68
MJPEG Results
Execution time of the application (in cycles)
to process 50 images of 48*48 pixels
14% 21%31% IDCT generated by GAUT reduces the
application latency by 14%
Parallelization of the application on
4 CPUs reduces the latency by 21%
![Page 57: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/57.jpg)
57/68
MJPEG Results
10%The 4 HW IDCT in the multiprocessor
architecture further reduce the latency
by 10%
Execution time of the application (in cycles)
to process 50 images of 48*48 pixels
![Page 58: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/58.jpg)
58/68
Execution time of the application (in cycles)
to process 50 images of 48*48 pixels
MJPEG Results
10%
Simulation time increase
38%
65%
Simulation time (in secondes)
![Page 59: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/59.jpg)
59/68
MJPEG: Hardware prototyping
� Real time decoding: 24 QCIF images/sec� IDCT: maximum I/O bandwidth (4 parallel input ports) and the lower
latency (33 cycles, Freq. 138,9Mhz)
� YUV2RGB: minimum latency (12 cycles, Freq. 249,18Mhz)
� Compared to a pure SW implementation� 10x speed-up for the IDCT function
� 5x speed-up for the yuv2rgb function
SoC design on a FPGA Xilinx Virtex 5 LX110 (XUPV5) board
![Page 60: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/60.jpg)
60/68
Prototyping platform
Sundance platform
Mother board
Daughter boardsDSP C62 C67 (Texas Instrument)
FPGA Virtex 1000E (Xilinx)
Interconnection matrixPoint to point links : Com Port (CP, up to 20 Mbytes/sec) and Sundance Digital Bus (SDB, up to 200 Mbytes/sec)
![Page 61: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/61.jpg)
61/68
DVB-DSNG receiver architecture mapping
C-functional architectureMPEG2
frameReceived
data
Hw
(FPGA)
Hw
(FPGA)
Sw
(DSP)
Sw
(DSP)
design architecture
Sw compiler
(Code
Composer)
High Level
Synthesis (GAUT)
(+ ISE)
HLS (GAUT)
(+ ISE)
Sw compiler
(Code
Composer)
Synchro Viterbi de-interleaving RS decoding
![Page 62: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/62.jpg)
62/68
DVB-DSNG receiver
� Synchronization and interleaving : Sw : C62 DSP
� Viterbi and Reed Solomon decoders : Hw : Virtex-1000E FPGA
� 4 SDB links
� 26 Mbps throughput (limited by the synchronization bloc…C64 for higher
throughputs)
![Page 63: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/63.jpg)
63/68
Viterbi decoding
• functional/application parameters : state number, throughput
State Number 8 16 32 64 128
Throughput (Mbps) 44 39 35 26 22
Synthesis Time (s) 1 1 3 9 27
Number of logic
elements223 434 1130 2712 7051
• DVB-DSNG standard : throughput : 1.5 to 72 Mbps, 64 states Viterbi decoder
0
500
1000
1500
2000
2500
3000
3500
4000
4500
5000
1 10 100
Throughput (Mbps)
Num
ber of
logi
c el
ement
s
![Page 64: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/64.jpg)
64/68
Reed Solomon decoding
0
500
1000
1500
2000
2500
3000
3500
4000
1 10 100
Throughput (Mbps)
Nu
mb
er
of lo
gic
ele
me
nts
RS(207, 187, 10): ATSC
RS(255,239,8): IEEE 802.16
RS(255,223,16): CCSDS
RS(255,205,10): IESS308
RS(255,205,16): ADSL2
RS(204,188,8): DVB-T
RS(204,188,8): DVB-C DVB-S
0
1000
2000
3000
4000
5000
6000
1 10 100
Throughput (Mbps)
Nu
mb
er
of lo
gic
ele
me
nts
• DVB-DSNG standard : 1.5 to 72 Mbps, RS (204/188) decoder
• functional/application parameters : number of input symbols, data symbols, throughput
![Page 65: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/65.jpg)
65/68
ConclusionConclusion
� HLS allows to automatically generate several RTL architectures� From an algorithmic/behavioral description and a set of constraints
� HLS allows to generate� VHDL models for synthesis purpose
� SystemC simulation models for virtual prototyping
� HLS allows to explore the design space of � Hardware accelerators
� MPSoC architectures including HW accelerators
� GAUT is free downloadable at � http://lab-sticc.fr/www-gaut
![Page 66: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/66.jpg)
66/68
References
![Page 67: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/67.jpg)
67/68
References
![Page 68: High-Level Synthesis with GAUT](https://reader034.vdocument.in/reader034/viewer/2022051609/547ea613b379596f2b8b54c7/html5/thumbnails/68.jpg)
68/68
HighHigh--Level Synthesis with Level Synthesis with
GAUTGAUT
UniversitUniversitéé de Bretagnede Bretagne--SudSud
LabLab--STICC STICC
Philippe COUSSYPhilippe COUSSY
[email protected]@univ--ubs.frubs.fr