3d-dresd alberto gallini
TRANSCRIPT
Fondazione Silvio Tronchetti Provera
Spatial Computation and Compiler techniques for configurable Architectures
Milan - July ’07
Alberto Gallini
Dept. of Information, Systems and Communication, University of Milan Bicocca.
OUTLINE
1. Intro
2. A brief look at BME
3. HCL-to-ASCL compilation framework
4. High-level Compilation Layer (HCL)
5. XiRisc+PiCoGa Architecture Specific Compilation Layer (ASCL)
LIMITS OF SILICON TECHNOLOGY
• Physical problems: leakage currents, threshold voltage control, tunneling, electromigration, high interconnect resistance, crosstalk.
• Communication: it is hard to imagine how any form of globally connected stored-program architecture could be built in a technology where communication even between adjacent switches is difficult.
• Complex design, and higher and higher costs.
V. Agarwal, H.S. Murukkathampoondi, S.W. Keckler, and D.C. Burger. Clock rate versus IPC: The end of the road for conventional microarchitectures. In International Symposium on Computer Architecture (ISCA), June 2000.
WHAT’S GOING ON
Intel Quad core
… towards multi-cores: tera-scale architecture.
ftp://download.intel.com/research/platform/videos/terascale/terascale_demo.htm
Cell Processor IBM-Toshiba-Sony (PS3)
Cell is a heterogeneous chip multiprocessor consisting of a 64-bit Power core augmented with 8 specialized co-processors based on a novel single-instruction multiple-data (SIMD) architecture called SPU (Synergistic Processor Unit), for data-intensive processing as found in cryptography, media and scientific applications. The system is integrated by a coherent on-chip bus.
Intel Accelerator Abstraction Layer (AAL)
For accelerators based on Field Programmable Gate Arrays (FPGAs) and tightly coupled to the CPU via the front side bus (FSB). Intel is committed to working with hardware vendors who build FSB-attached accelerator modules, as well as providers of compilers for FPGAs, to integrate AAL into their offerings.
RESEARCH APPROACH
ELEMENTARY DEVICES:
Molecular devices have different structural and behavioral properties from transistors.
COMPUTATIONAL DEVICES:
These structures will not be immune to structural defects, so defects will have to be managed by the computational model employed.
NON-STANDARD ARCHITECTURES:
Highly scalable computational architectures, characterized by a great number of unreliable processing elements (PEs), are mapped onto the crystal. As a first step, such architectures could be hybrid (a molecular crystal merged into a regular silicon lattice).
NEW MODELS OF COMPUTATION:
Different models of computation based on thousands of unreliable, interacting elements.
LINKING TRADITIONAL PROGRAMMING LANGUAGES:
Application deployment and efficient exploitation of spatial resources.
PRODUCTION PROCESS:
The small dimensions of molecular devices will have strong implications for the production process. Regular molecular structures can be synthesized by bottom-up self-assembly techniques.
BIO MOLECULAR ENGINE
An instrument for analysis and experimentation on architectural solutions
BME FEATURES & RESULTS
- 3D topology editor
- “Grid” support (RMI technology)
- Java and SystemC integration
- Time management and traffic monitoring
- Papers:
Alberto Gallini, Claudio Ferretti, Giancarlo Mauri, Davide Molteni: Bio-Molecular Engine: A Simulation Environment for Bio-Inspired Architectural Models of Molecular-Scale Devices Based Machines. MSV 2005: 100-106.
Alberto Gallini, Claudio Ferretti, Giancarlo Mauri: Bio Molecular Engine: a bio-inspired environment for models of growing and evolvable computation. GECCO Workshops 2005: 249-256, ACM Press, New York, NY, USA.
Guido Casiraghi, Claudio Ferretti, Alberto Gallini, Giancarlo Mauri: A Membrane Computing System Mapped on an Asynchronous, Distributed Computational Environment. Workshop on Membrane Computing 2005: 159-164.
COMPILATION FLOW
int fibonacci(int m);

int main(void){
    int x;
    for (x = 1; x < 10; x++){
        printf("\n - fibonacci(%d) = %d", x, fibonacci(x));
    }
}

int fibonacci(int m){
    unsigned int f_0 = 0;
    unsigned int f_1 = 1;
    unsigned int f_2, i;
    if (m <= 1){
        return m;
    }
    else{
        int i = 2;
        for (i = 2; i <= m; i++){
            f_2 = f_0 + f_1;
            f_0 = f_1;
            f_1 = f_2;
        }
        return f_2;
    }
}
mov $vr9.s32 <- 0
mov $vr0.u32 <- 0
mov fibonacci.f_0 <- $vr0.u32
ldc $vr1.u32 <- 1
mov fibonacci.f_1 <- $vr1.u32
ldc $vr2.s32 <- 1
ldc $vr3.s32 <- 2
mov fibonacci.i0 <- $vr3.s32
ldc $vr4.s32 <- 2
cvt $vr8.s32 <- fibonacci.f_2
mov fibonacci.f_2 <- $vr5.u32
mov fibonacci.f_0 <- fibonacci.f_1
mov fibonacci.f_1 <- fibonacci.f_2
ldc $vr7.s32 <- 1
add $vr6.s32 <- fibonacci.i0,$vr7.s32
mov fibonacci.i0 <- $vr6.s32
TARGET ARCHITECTURES
[Figure: target architecture. Standard core(s), e.g. VLIW or multiple simple RISC cores, with main memory and a configurable core, next to Device 1, Device 2, …, Device N; two “?” marks are left in the diagram.]
HCL-to-ASCL approach
High-level Compilation Layer (HCL)
ASCL (1) ASCL (2) ASCL (3)
Architecture Specific Compilation Layers
CODE LIFE-CYCLE
Overview of the logical steps:
• Statements containing loops
• Statements containing high branch-rate
• Series of associative operators
• Recursive procedures
ranking
CFG KERNEL FEATURES(1)
[Figure: loop shapes over basic blocks B1 and B2: while loop, natural loop, self loop]
Loops:+-[1]: a:13,b:4,c:1+--+-[1]: a:14,b:5,c:4+--+--+-[1]: a:18,b:9,c:1+--+--+--+-[1]: a:19,b:10,c:9+--+--+--+--+-[1]: a:28,b:19,c:9+--+--+--+--+--+-[1]: a:37,b:28,c:9+--+--+--+--+--+--+-[1]: a:46,b:37,c:9+--+--+--+--+--+--+--+-[1]: a:55,b:46,c:9+--+--+--+--+--+--+--+--+-[1]: a:64,b:55,c:9+--+--+--+--+--+--+--+--+--+-[1]: a:73,b:64,c:9+--+--+--+--+--+--+--+--+--+--+-[1]: a:82,b:73,c:9+--+--+--+--+--+--+--+--+--+--+--+-[1]: a:91,b:82,c:9+--+--+--+--+--+--+--+--+--+--+--+--+-[1]: a:100,b:91,c:9+--+--+--+--+--+--+--+--+--+--+--+--+-[2]: a:100,b:91,c:-508005+--+--+--+--+--+--+--+--+--+--+--+--+--+-[1]: a:-508105,b:-508096,c:-9+--+--+--+--+--+--+--+--+--+--+--+--+--+-[2]: a:-508105,b:-508096,c:105662236+--+--+--+--+--+--+--+--+--+--+--+--+-[3]: a:100,b:91,c:105154231+--+--+--+--+--+--+--+--+--+--+--+-[2]: a:91,b:82,c:-666677360+--+--+--+--+--+--+--+--+--+--+--+--+-[1]: a:-666677451,b:-666677442,c:-9+--+--+--+--+--+--+--+--+--+--+--+--+-[2]: a:-666677451,b:-666677442,c:-1633047632+--+--+--+--+--+--+--+--+--+--+--+-[3]: a:91,b:82,c:1995242304+--+--+--+--+--+--+--+--+--+--+-[2]: a:82,b:73,c:-664419136+--+--+--+--+--+--+--+--+--+--+--+-[1]: a:-664419218,b:-664419209,c:-9+--+--+--+--+--+--+--+--+--+--+--+-[2]: a:-664419218,b:-664419209,c:-1672992513+--+--+--+--+--+--+--+--+--+--+-[3]: a:82,b:73,c:1957555647+--+--+--+--+--+--+--+--+--+-[2]: a:73,b:64,c:-694279600+--+--+--+--+--+--+--+--+--+--+-[1]: a:-694279673,b:-694279664,c:-9+--+--+--+--+--+--+--+--+--+--+-[2]: a:-694279673,b:-694279664,c:-608504996+--+--+--+--+--+--+--+--+--+-[3]: a:73,b:64,c:-1302784596+--+--+--+--+--+--+--+--+-[2]: a:64,b:55,c:-2022114044+--+--+--+--+--+--+--+--+--+-[1]: a:-2022114108,b:-2022114099,c:-9+--+--+--+--+--+--+--+--+--+-[2]: a:-2022114108,b:-2022114099,c:1586880895+--+--+--+--+--+--+--+--+-[3]: a:64,b:55,c:-435233149+--+--+--+--+--+--+--+-[2]: a:55,b:46,c:1993909992+--+--+--+--+--+--+--+--+-[1]: a:1993909937,b:1993909946,c:1993909937+--+--+--+--+--+--+--+--+-[2]: 
a:1993909937,b:1993909946,c:760083456+--+--+--+--+--+--+--+--+--+-[1]: a:-1233826481,b:-1233826490,c:-1233826481+--+--+--+--+--+--+--+--+--+-[2]: a:-1233826481,b:-1233826490,c:2050415960+--+--+--+--+--+--+--+--+-[3]: a:1993909937,b:1993909946,c:-1484467880+--+--+--+--+--+--+--+-[3]: a:55,b:46,c:509442112+--+--+--+--+--+--+-[2]: a:46,b:37,c:1364159424+--+--+--+--+--+--+--+-[1]: a:1364159378,b:1364159387,c:1364159378+--+--+--+--+--+--+--+-[2]: a:1364159378,b:1364159387,c:803784946+--+--+--+--+--+--+--+--+-[1]: a:-560374432,b:-560374441,c:-560374432+--+--+--+--+--+--+--+--+-[2]: a:-560374432,b:-560374441,c:182405152+--+--+--+--+--+--+--+-[3]: a:1364159378,b:1364159387,c:986190098+--+--+--+--+--+--+-[3]: a:46,b:37,c:-1944617774+--+--+--+--+--+-[2]: a:37,b:28,c:612444176+--+--+--+--+--+--+-[1]: a:612444139,b:612444148,c:612444139+--+--+--+--+--+--+-[2]: a:612444139,b:612444148,c:328475948+--+--+--+--+--+--+--+-[1]: a:-283968191,b:-283968200,c:-283968191+--+--+--+--+--+--+--+-[2]: a:-283968191,b:-283968200,c:-498404480+--+--+--+--+--+--+-[3]: a:612444139,b:612444148,c:-169928532+--+--+--+--+--+-[3]: a:37,b:28,c:442515644+--+--+--+--+-[2]: a:28,b:19,c:-1970456652+--+--+--+--+--+-[1]: a:-1970456680,b:-1970456671,c:-9+--+--+--+--+--+-[2]: a:-1970456680,b:-1970456671,c:1492064119+--+--+--+--+-[3]: a:28,b:19,c:-478392533+--+--+--+-[2]: a:19,b:10,c:-1542473120+--+--+--+--+-[1]: a:-1542473139,b:-1542473130,c:-9+--+--+--+--+-[2]: a:-1542473139,b:-1542473130,c:383838768+--+--+--+-[3]: a:19,b:10,c:-1158634352+--+--+-[2]: a:18,b:9,c:-1344615568+--+--+--+-[1]: a:-1344615586,b:-1344615577,c:-9+--+--+--+-[2]: a:-1344615586,b:-1344615577,c:-478122593+--+--+-[3]: a:18,b:9,c:-1822738161+--+-[2]: a:14,b:5,c:1618198881+--+--+-[1]: a:1618198867,b:1618198876,c:1618198867+--+--+-[2]: a:1618198867,b:1618198876,c:-1774379668+--+--+--+-[1]: a:902388761,b:902388752,c:9+--+--+--+-[2]: a:902388761,b:902388752,c:1563019408+--+--+--+--+-[1]: a:660630647,b:660630656,c:660630647+--+--+--+--+-[2]: 
a:660630647,b:660630656,c:479659548+--+--+--+--+--+-[1]: a:-180971099,b:-180971108,c:-180971099+--+--+--+--+--+-[2]: a:-180971099,b:-180971108,c:2017726440+--+--+--+--+-[3]: a:660630647,b:660630656,c:-1797581308+--+--+--+-[3]: a:902388761,b:902388752,c:-234561900+--+--+-[3]: a:1618198867,b:1618198876,c:-2008941568+--+-[3]: a:14,b:5,c:-390742687+-[2]: a:13,b:4,c:-329972872+--+-[1]: a:-329972885,b:-329972876,c:-9+--+-[2]: a:-329972885,b:-329972876,c:-1203135140+-[3]: a:13,b:4,c:-1533108012
RESULT: -1533108012
Recursive functions:
int f(int a, int b, int depth){
    int i = 0;
    int c = a % b;
    if (c == 0) c++;
    if ((a < NUM_ROWS) && (a > 0)) c = f(c+a, c+b, depth+1);
    c *= (a + 0xff) * (b - 0xfa);
    if (a > 0) c += f(c-a, c-b, depth+1);
    return c;
}
Natural loop: the subgraph consisting of the set of nodes containing B1 and all the nodes from which B2 can be reached without passing through B1.
KERNEL FEATURES(2)
Nested branch speculation:
if (x > SB_NUM_COLUMN){
    a += (x * 0xFF);
}
else{
    if ((x % 5) == 0){
        a += x * __ALPHA__;
    }
    else{
        a += x * __BETA__;
    }
}
[Figure: data-flow graph of the nested branch, with inputs a, x, __ALPHA__, __BETA__, SB_NUM_COLUMN, 5, 0xFF and 0 feeding the operators >, %, ==, * and +]

Associative operators: loop unrolling & tree execution
j = 0;
while (j < NUM_COLUMNS){
    for (i = 0; i < NUM_ROWS; i++){
        sum += m[i][j];
    }
    j++;
}

j = 0;
while (j < NUM_COLUMNS){
    for (i = 0; i < NUM_ROWS-8; i += 8){
        sum += m[i][j];
        sum += m[i+1][j];
        sum += m[i+2][j];
        sum += m[i+3][j];
        sum += m[i+4][j];
        sum += m[i+5][j];
        sum += m[i+6][j];
        sum += m[i+7][j];
    }
    for (; i < NUM_ROWS; i += 1){
        sum += m[i][j];
    }
    j++;
}
[Figure: tree of + operators accumulating into sum]
CODE “LINEARIZATION”
1-Input: source program (ANSI-C)
…
if (from <= to){
    do{
        amp = amplitude[i];
        if (amp < 0)
            sum = sum - amp * SFACTOR;
        else
            sum = sum + amp * SFACTOR;
        for (j = 0; j <= i; j++){
            *(part_amplitude+i) += amplitude[j];
        }
        i++;
    }while (i <= to);
…
SUIF is exploited as front-end and the following operations are applied:
- dismantle field access expressions to address arithmetic
- dismantle structured returns
- compact multi-way branch statements
- dismantle scope statements
- dismantle if, for and while
- flatten statement lists
- rename colliding symbols
- insert struct padding
- insert struct final padding
- annotate LIR
- loop unrolling + associative operators marking
- tail-recursion elimination
- s2m
Output: annotated LIR in MACHINE-SUIF assembly
ldc $vr0.s32 <- 0
mov main.from <- $vr0.s32
ldc $vr1.s32 <- 10
mov main.to <- $vr1.s32
mov main.i <- main.from
ldc $vr2.f32 <- "0.0"
mov main.sum <- $vr2.f32
ldc $vr3.s32 <- 10
ldc $vr4.u32 <- 4
cal main.suif_tmp0 <- malloc($vr3.s32,$vr4.u32)
cvt $vr5.p32 <- main.suif_tmp0
…
INTERMEDIATE REPRESENTATION
2-Input: LIR in MACHINE-SUIF assembly
MACHSUIF is exploited to obtain and manipulate the CFG and DFG representations of the body of each procedure.
- il2cfg: the per-procedure CFG is obtained
- structural analysis (cf-analysis & df-analysis)
- kernel identification and ranking
- kernel translation to SSA representation
- analysis and optimizations (architecture dependent)
Output: SSA representation of the extracted kernel
MAPPING
3-Input: kernel SSA representation + machine code for the standard core
- DFG “refinement” as a function of:
  • physical resources of the programmable core
  • efficiency of the computation obtained by the mapped circuit
- Mapping activity definition:
  • When does a mapping begin?
  • How long does it last?
  • Is the mapping static or dynamic?
  • Is its use worthwhile?
Architecture constraints
High-level Compilation Layer (HCL)
[Figure: HCL pipeline]
• Stages: C to SUIF → LIR / UNROLL → MACHINE-SUIF → CFG → STRUCTURAL ANALYSIS
• Representations: ANSI-C → SUIF → (unrolled) Machine-SUIF → CFG (for each procedure) → annotated CFG, bit-vector results, ctrl-tree
• Additional passes: tail recursion elimination, associative operator analysis
• PRE-ANALYSIS: variable tracing, structural analysis initialization
• BACK-END: region kernel analysis, recursive procedure analysis, kernel marking
STRUCTURAL ANALYSIS IMPLEMENTATION
[Figure: for each procedure CFG (Cfg function 1, 2, 3, …, n), the nodes entry, B1, B2, B3 and exit are reduced step by step into regions (self-loop, block, if-then, block); kernel identification is then applied to the resulting regions.]
DF - STRUCTURAL ANALYSIS
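[Figure: the CFG (entry, B1, B2, B3, exit) is reduced in steps: entry, B1, B2 → entry(a), B2(a) → entry(b), B3, exit → entry(c)]
FORWARD problems:
Bottom-up pass (flow functions, computed from the last line upwards):
• $F_{entry(c)}$
• $F_{entry(b)}$
• $F_{entry(a)}$, $F_{B2(a)}$
• $F_{B1}$, $F_{B2}$, $F_{B3}$, $F_{entry}$
Top-down pass (computed from the first line downwards):
• In(entry(c))
• In(entry(b)), In(B3), In(exit)
• In(entry(a)), In(B2(a))
• In(entry), In(B1), In(B2)
BACKWARD problems:
• Out(entry(c))
• Out(entry(b)), Out(B3), Out(exit)
• Out(entry(a)), Out(B2(a))
• Out(entry), Out(B1), Out(B2)
Flow functions ($\sqcap$ denotes the confluence operator of the data-flow problem, e.g. set union for reaching definitions):
• $F_{\text{self-loop}} = F_{\text{body}}^{*}$, applied as $F_{\text{body}}(\mathit{in}(\text{body}))$
• $F_{\text{if-then}} = (F_{\text{then}} \circ F_{\text{if}/Y}) \sqcap F_{\text{if}/N} = (F_{\text{then}} \circ F_{\text{if}}) \sqcap F_{\text{if}}$
• $F_{\text{block}} = F_{b_n} \circ \dots \circ F_{b_1} \circ F_{b_0}$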
REGIONS & FLOW FUNCTIONS(1)
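Do-while
$F_{\text{do-while}} = (F_{\text{while}} \circ F_{\text{do}})^{*}$
Implementation:
$F_{\text{loop}}(\mathit{in}(\text{do})) = F_{\text{while}}(F_{\text{do}}(\mathit{in}(\text{do})))$
$\mathit{out}(\text{Do-while}) = F_{\text{loop}}(F_{\text{loop}}(\mathit{in}(\text{do})))$

While-loop
$F_{\text{while-loop}} = (F_{\text{body}} \circ F_{\text{while}})^{*}$
Implementation:
$F_{\text{loop}}(\mathit{in}(\text{while})) = F_{\text{body}}(F_{\text{while}}(\mathit{in}(\text{while})))$
$\mathit{out}(\text{While-loop}) = F_{\text{while}}(\mathit{in}(\text{while}))$
$\quad \sqcap\ F_{\text{while}}(F_{\text{loop}}(\mathit{in}(\text{while})))$
$\quad \sqcap\ F_{\text{while}}(F_{\text{loop}}(F_{\text{loop}}(\mathit{in}(\text{while}))))$

If-then-else
$F_{\text{if-then-else}} = (F_{\text{then}} \circ F_{\text{if}}) \sqcap (F_{\text{else}} \circ F_{\text{if}})$
Implementation:
$\mathit{in}(\text{THEN}) = F_{\text{if}}(\mathit{in}(\text{If-then-else}))$
$\mathit{in}(\text{ELSE}) = F_{\text{if}}(\mathit{in}(\text{If-then-else}))$
$\mathit{out}(\text{If-then-else}) = F_{\text{then}}(\mathit{in}(\text{THEN})) \sqcap F_{\text{else}}(\mathit{in}(\text{ELSE}))$

If-then
$F_{\text{if-then}} = (F_{\text{then}} \circ F_{\text{if}}) \sqcap F_{\text{if}}$
Implementation:
$\mathit{in}(\text{THEN}) = F_{\text{if}}(\mathit{in}(\text{If-then}))$
$\mathit{out}(\text{If-then}) = F_{\text{then}}(\mathit{in}(\text{THEN})) \sqcap \mathit{in}(\text{THEN})$

[Figures: region schemas for if-then (IF, THEN), if-then-else (IF, THEN, ELSE), while-loop (WHILE, BODY) and do-while (DO, WHILE)]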
REGIONS & FLOW FUNCTIONS(2)
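Natural-loop (return)
$F_{\text{Nl-return}} = (F_{\text{body}} \circ F_{\text{branch}} \circ F_{\text{while}})^{*}$
Implementation:
$F_{\text{loop}}(\mathit{in}(\text{while})) = F_{\text{body}}(F_{\text{branch}}(F_{\text{while}}(\mathit{in}(\text{while}))))$
$\mathit{out}(\text{Natural-loop}) = F_{\text{while}}(\mathit{in}(\text{while}))$
$\quad \sqcap\ F_{\text{while}}(F_{\text{loop}}(\mathit{in}(\text{while})))$
$\quad \sqcap\ F_{\text{while}}(F_{\text{loop}}(F_{\text{while}}(F_{\text{loop}}(\mathit{in}(\text{while})))))$

Natural-loop (break)
$F_{\text{Nl-break}} = (F_{\text{body}} \circ F_{\text{branch}} \circ F_{\text{while}})^{*}$
Implementation:
$F_{\text{loop}}(\mathit{in}(\text{while})) = F_{\text{body}}(F_{\text{branch}}(F_{\text{while}}(\mathit{in}(\text{while}))))$
$F_{\text{loop-branch}}(\mathit{in}(\text{while})) = F_{\text{branch}}(F_{\text{while}}(\mathit{in}(\text{while})))$
$\mathit{out}(\text{Natural-loop}) = F_{\text{while}}(\mathit{in}(\text{while}))$
$\quad \sqcap\ F_{\text{loop-branch}}(\mathit{in}(\text{while}))$
$\quad \sqcap\ F_{\text{while}}(F_{\text{loop}}(\mathit{in}(\text{while})))$
$\quad \sqcap\ F_{\text{loop-branch}}(F_{\text{while}}(F_{\text{loop}}(\mathit{in}(\text{while}))))$
$\quad \sqcap\ F_{\text{while}}(F_{\text{loop}}(F_{\text{loop}}(\mathit{in}(\text{while}))))$

[Figures: two natural-loop schemas with nodes WHILE, BODY, BRANCH and POST LOOP, one of them with an additional EXIT node]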
REGIONS & FLOW FUNCTIONS(3)
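Block
$F_{\text{block}} = F_{b_n} \circ \dots \circ F_{b_2} \circ F_{b_1}$
Implementation:
$\mathit{out}(\text{BLOCK}) = F_{b_n}(\dots F_{b_2}(F_{b_1}(\mathit{in}(B1))) \dots)$

Natural-loop (continue)
$F_{\text{Nl-continue}} = (F_{\text{body}} \circ F_{\text{branch}} \circ F_{\text{while}})^{*}$
Implementation:
$F_{\text{loop}}(\mathit{in}(\text{while})) = F_{\text{body}}(F_{\text{branch}}(F_{\text{while}}(\mathit{in}(\text{while}))))$
$F_{\text{loop-branch}}(\mathit{in}(\text{while})) = F_{\text{branch}}(F_{\text{while}}(\mathit{in}(\text{while})))$
$\mathit{out}(\text{Natural-loop}) = F_{\text{while}}(\mathit{in}(\text{while}))$
$\quad \sqcap\ F_{\text{while}}(F_{\text{loop-branch}}(\mathit{in}(\text{while})))$
$\quad \sqcap\ F_{\text{while}}(F_{\text{loop}}(\mathit{in}(\text{while})))$
$\quad \sqcap\ F_{\text{while}}(F_{\text{loop-branch}}(F_{\text{while}}(F_{\text{loop-branch}}(\mathit{in}(\text{while})))))$
$\quad \sqcap\ F_{\text{while}}(F_{\text{loop-branch}}(F_{\text{while}}(F_{\text{loop}}(\mathit{in}(\text{while})))))$
$\quad \sqcap\ F_{\text{while}}(F_{\text{loop}}(F_{\text{while}}(F_{\text{loop-branch}}(\mathit{in}(\text{while})))))$
$\quad \sqcap\ F_{\text{while}}(F_{\text{loop}}(F_{\text{while}}(F_{\text{loop}}(\mathit{in}(\text{while})))))$

[Figures: block schema (B1, B2, …, Bn); natural-loop schemas with nodes WHILE, BODY, BRANCH, INCREMENT and POST LOOP; equivalent WHILE-LOOP schema built from WHILE, IF-THEN-ELSE and POST LOOP]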
HCL BACK-END
The last pass is essentially the back-end for the ASCLs. It lets the user identify the desired properties and attach annotations to elements of the control tree at different resolutions:
• regions (i.e. sub-trees of the control tree)
• sets of basic blocks
• sets of instructions
A region-oriented query on the control tree has the following structure:
mark_region(Region *current, RegionType type, Policy p){
    if (current->get_region_type() == type)
        if (p.check_properties(current))
            if (p.instruction_level_analysis(current)){
                p.annote_kernel(current);
                return;
            }
    /* otherwise descend into the inner regions of the current one */
    for (cir IN current inner regions)
        if (!cir->isWrapper()){
            mark_region(cir, type, p);
        }
}
KERNEL I/O INTERFACE:
• Input: the USES vectors uniquely determine the input interface.
• Output: GEN ∩ (USES immediately after the current kernel)
[Figure: kernel i with neighbouring nodes a and b; IN: USES[i] = <0101 …>; OUT: GEN[i] ∩ (USES[a] ∪ USES[b])]
EXAMPLE
[Figure: control-flow graph of fibonacci(), nodes 0 to 9]
int fibonacci(int m);

int main(void){
    int x;
    for (x = 1; x < 10; x++){
        printf("\n - fibonacci(%d) = %d", x, fibonacci(x));
    }
}

int fibonacci(int m){
    unsigned int f_0 = 0;
    unsigned int f_1 = 1;
    unsigned int f_2, i;
    if (m <= 1){
        return m;
    }
    else{
        for (i = 2; i <= m; i++){
            f_2 = f_0 + f_1;
            f_0 = f_1;
            f_1 = f_2;
        }
        return f_2;
    }
}
Example: fibonacci.cfg
**** Node # 0: p s 1 3i
**** Node # 3: ubr p 0i s 8
    instruction [0] : jmp src: dst:
**** Node # 8: ret p 3 s 9
    instruction [0] : src: dst:
    instruction [1] : ldc src: 0, dst: $vr10.s32,
    instruction [2] : ret src: $vr10.s32, dst:
**** Node # 1: cbr p 0 s 2 4
    instruction [0] : src: dst:
    instruction [1] : ldc src: 0, dst: $vr0.u32,
    instruction [2] : mov src: $vr0.u32, dst: fibonacci.f_0,
    instruction [3] : ldc src: 1, dst: $vr1.u32,
    instruction [4] : mov src: $vr1.u32, dst: fibonacci.f_1,
    instruction [5] : ldc src: 1, dst: $vr2.s32,
    instruction [6] : bgt src: fibonacci.m, $vr2.s32, dst:
**** Node # 4: p 1 s 5
    instruction [0] : src: dst:
    instruction [1] : ldc src: 2, dst: $vr3.u32,
    instruction [2] : mov src: $vr3.u32, dst: fibonacci.i,
**** Node # 5: cbr p 4 6 s 6 7
    instruction [0] : src: dst:
    instruction [1] : cvt src: fibonacci.m, dst: $vr4.u32,
    instruction [2] : bgt src: fibonacci.i, $vr4.u32, dst:
**** Node # 7: ret p 5 s 9
    instruction [0] : src: dst:
    instruction [1] : cvt src: fibonacci.f_2, dst: $vr9.s32,
    instruction [2] : ret src: $vr9.s32, dst:
**** Node # 6: ubr p 5 s 5
    instruction [0] : add src: fibonacci.f_0, fibonacci.f_1, dst: $vr5.u32,
    instruction [1] : mov src: $vr5.u32, dst: fibonacci.f_2,
    instruction [2] : mov src: fibonacci.f_1, dst: fibonacci.f_0,
    instruction [3] : mov src: fibonacci.f_2, dst: fibonacci.f_1,
    instruction [4] : ldc src: 1, dst: $vr8.s32,
    instruction [5] : cvt src: $vr8.s32, dst: $vr7.u32,
    instruction [6] : add src: fibonacci.i, $vr7.u32, dst: $vr6.u32,
    instruction [7] : mov src: $vr6.u32, dst: fibonacci.i,
    instruction [8] : jmp src: dst:
**** Node # 2: ret p 1 s 9
    instruction [0] : ret src: fibonacci.m, dst:
**** Node # 9: p 2 7 8 s
[Figure: control-flow graph of fibonacci(), nodes 0 to 9]
Example: fibonacci structural analysis
Control tree:
+[root]-> 0 1 9
|+--- (1)[if then else] ->1 2 4
|+---+---+--- (4)[block] ->4 5 7
|+---+---+---+---+---+--- (5)[while loop] ->5 6
Control tree construction:
[Figure: five steps, 1) to 5), in which the CFG is progressively reduced to the root; the resulting regions are root → 0 1 9, 1 → 1 2 4, 4 → 4 5 7 and 5 → 5 6]
Pre-analysis info
GEN, PRSV, REACH-IN
GEN BIT VECTOR node(0), length 1 : {0}
GEN BIT VECTOR node(3), length 1 : {}
GEN BIT VECTOR node(8), length 1 : {}
GEN BIT VECTOR node(1), length 3 : {1-2}
GEN BIT VECTOR node(4), length 4 : {3}
GEN BIT VECTOR node(5), length 4 : {}
GEN BIT VECTOR node(7), length 4 : {}
GEN BIT VECTOR node(6), length 8 : {4-7}
GEN BIT VECTOR node(2), length 8 : {}
GEN BIT VECTOR node(9), length 8 : {}

PRSV BIT VECTOR node(0): {0-7}
PRSV BIT VECTOR node(3): {0-7}
PRSV BIT VECTOR node(8): {0-7}
PRSV BIT VECTOR node(1): {0,3-4,7}
PRSV BIT VECTOR node(4): {0-2,4-6}
PRSV BIT VECTOR node(5): {0-7}
PRSV BIT VECTOR node(7): {0-7}
PRSV BIT VECTOR node(6): {0}
PRSV BIT VECTOR node(2): {0-7}
PRSV BIT VECTOR node(9): {0-7}

REACH-IN bit-vector, node (0) : {}
REACH-IN bit-vector, node (1) : {0}
REACH-IN bit-vector, node (4) : {0-2}
REACH-IN bit-vector, node (5) : {0-7}
REACH-IN bit-vector, node (7) : {0-7}
REACH-IN bit-vector, node (6) : {0-7}
REACH-IN bit-vector, node (2) : {0-2}
REACH-IN bit-vector, node (9) : {0-7}

USES bit-vector, node (0) : {}
USES bit-vector, node (1) : {0} [m]
USES bit-vector, node (4) : {}
USES bit-vector, node (5) : {0,3,7} [m][i][i]
USES bit-vector, node (7) : {4} [f_2]
USES bit-vector, node (6) : {1-7} [f_0][f_1][i][f_2][f_0][f_1][i]
USES bit-vector, node (2) : {0} [m]
USES bit-vector, node (9) : {}
[Figure: control-flow graph of fibonacci(), nodes 0 to 9]
KERNEL MARKED
[BASIC-BLOCK] good
**** Node # 6: ubr p 5 s 5
[+] instruction [0] : add src: , , dst: ,
 |
 +---- annote : line
 +---- annote : *** while_instruction
 +---- annote : RCHIN
 +---- annote : USES
 +---- annote : GEN
[+] instruction [1] : mov src: , dst: ,
 |
 +---- annote : line
 +---- annote : *** while_instruction
[+] instruction [2] : mov src: , dst: ,
 |
 +---- annote : line
 +---- annote : *** while_instruction
[+] instruction [3] : mov src: , dst: ,
 |
 +---- annote : line
 +---- annote : *** while_instruction
[+] instruction [4] : ldc src: , dst: ,
 |
 +---- annote : *** while_instruction
[+] instruction [5] : add src: , , dst: ,
 |
 +---- annote : *** while_instruction
[+] instruction [6] : mov src: , dst: ,
 |
 +---- annote : *** while_instruction
[Figure: control-flow graph of fibonacci(), nodes 0 to 9]
iquant1_non_intra_fixed
Control tree:
[root]-> 0 2 25
+--- (0)[block] ->0 1
+--- (2)[while loop] ->2 3
+---+---+--- (3)[block] ->3 18 24
+---+---+---+---+--- (3)[if then] ->3 4
+---+---+---+---+---+--- (4)[block] ->4 10 13
+---+---+---+---+---+---+---+--- (4)[block] ->4 7
+---+---+---+---+---+---+---+---+--- (4)[if then else] ->4 5 6
+---+---+---+---+---+---+---+---+--- (7)[if then else] ->7 8 9
+---+---+---+---+---+---+---+--- (10)[if then] ->10 11
+---+---+---+---+---+---+---+---+---+--- (11)[if then] ->11 12
+---+---+---+---+---+---+---+--- (13)[if then] ->13 14
+---+---+---+---+---+---+---+---+---+---+--- (14)[block] ->14 17
+---+---+---+---+---+---+---+---+---+---+---+---+--- (14)[if then else] ->14 15 16
+---+---+---+---+--- (18)[if then else] ->18 19 20
+---+---+---+---+---+---+--- (20)[block] ->20 23
+---+---+---+---+---+---+---+---+---+--- (20)[if then else] ->20 21 22
+--- (25)[block] ->25 26

static void iquant1_non_intra_fixed(si16_t *src, si16_t *dst, ui8_t *quant_mat, si32_t mquant){
    si32_t i, val, pro;

    for (i=0; i<64; i++) {
        val = src[i];
        if (val!=0)
        {
            pro = (2*val+(val>0 ? 1 : -1))*quant_mat[i]*mquant;
            val = (pro>=0) ? (si32_t) (pro>>5) : (si32_t) ((pro + 31)>>5);
            /* mismatch control */
            if ((val&1)==0 && val!=0)
                val+= (val>0) ? -1 : 1;
        }
        /* saturation */
        dst[i] = (val>2047) ? 2047 : ((val<-2048) ? -2048 : val);
    }
}
[Figure: graphical control tree of iquant1_non_intra_fixed, with the same regions as the listing above]
CASE STUDY: XiRisc+PiCoGa and Griffy-C
PiCoGa & Griffy-C:
• Computation is broken into DFG NODES, whose complexity is at most that of a 32-bit ANSI-C standard operator.
• For each DFG node it is possible to specify the SIZE, in order to avoid wasting PiCoGA resources.
• Shifts, concatenations and bitwise operators by constants are considered routing-only operators, as they do not require RLCs for implementation.
C TO GRIFFY-C
[Figure: C-to-Griffy-C flow: C to SUIF → LIR → MACHINE-SUIF → CFG → STRUCTURAL ANALYSIS, followed by steps 1 to 3 below]
1. KERNEL IDENTIFICATION
• innermost while-region;
• “PiCoGa basic block” marking;
• selection of while-region sub-trees containing only PiCoGa basic blocks;
• kernel ranking
2. KERNEL EXTRACTION
PiCoGa kernel translation:
• SSA representation
• Cti to Cmove replacement
3. GRIFFY–C COMPILER
Example: fibonacci.lir(1)
/* Architecture is Linux */
typedef int (*__cp_1) (void);
typedef int (*__cp_2) ();
typedef int (*__cp_3) (int );
typedef char __ar_4[24];
typedef int (*__cp_5) (int );

int main(void);
extern int printf();
int fibonacci(int );

static __ar_4 _fibonacciTmp0 = {10, 32, 45, 32, 102, 105, 98, 111, 110, 97, 99, 99, 105, 40, 37, 100, 41, 32, 61, 32, 32, 37, 100, 0};

int main(void)
{
    int x;
    int suif_tmp00;
    x = (1);
  L3:;
    if ((! (x< 10))) goto L4;
    suif_tmp00 = fibonacci(x);
    (void)printf(_fibonacciTmp0, x, suif_tmp00);
    x = ((x+1));
    goto L3;
  L4:;
    /*#line 8 ""*/
    return 0 ;
}
Example: fibonacci.lir(2)
int fibonacci(int m)
{
    unsigned int f_0;
    unsigned int f_1;
    unsigned int f_2;
    unsigned int i;
    /*#line 11 "./test/fibonacci/fibonacci.c"*/
    f_0 = (0U);
    /*#line 11 "./test/fibonacci/fibonacci.c"*/
    f_1 = (1U);
    if ((! (m<= 1))) goto L2;
    /*#line 18 "./test/fibonacci/fibonacci.c"*/
    return m ;
  L2:;
    i = (2U);
  L5:;
    if ((! (i<= ((unsigned int )(m ))))) goto L6;
    /*#line 22 "./test/fibonacci/fibonacci.c"*/
    f_2 = ((f_0+f_1));
    /*#line 23 "./test/fibonacci/fibonacci.c"*/
    f_0 = (f_1);
    /*#line 24 "./test/fibonacci/fibonacci.c"*/
    f_1 = (f_2);
    i = ((i+((1 ))));
    goto L5;
  L6:;
    /*#line 26 "./test/fibonacci/fibonacci.c"*/
    return ((int )(f_2 )) ;
    return 0 ;
}
Example: fibonacci.svm(1)
/* target_lib: "suifvm"] */
/* Generated automatically by Machine SUIF */

#include <suifvm/c_printer_defs.h>

int main();
int printf();
int fibonacci();
static char _fibonacciTmp0[24];

static char _fibonacciTmp0[24] =
{10, 32, 45, 32, 102, 105, 98, 111, 110, 97, 99, 99, 105, 40, 37, 100, 41, 32, 61, 32, 32, 37, 100, 0};

int main()
{
int x;
int suif_tmp00;

/* Virtual register declarations */
int _vr0;
int _vr1;
void * _vr2;
int _vr3;
int _vr4;
int _vr5;

    _vr0 = 1;
    x = _vr0;
L3:
    _vr1 = 10;
    if (x >= _vr1) goto L4;
    suif_tmp00 = fibonacci(x);
    _vr2 = (void *)&_fibonacciTmp0;
    printf(_vr2, x, suif_tmp00);
    _vr4 = 1;
    _vr3 = x + _vr4;
    x = _vr3;
    goto L3;
L4:
    _vr5 = 0;
    return _vr5;
} /* end of main */
Example: fibonacci.svm(2)
int
fibonacci(int m)
{
unsigned int f_0;
unsigned int f_1;
unsigned int f_2;
unsigned int i;

/* Virtual register declarations */
unsigned int _vr0;
unsigned int _vr1;
int _vr2;
unsigned int _vr3;
unsigned int _vr4;
unsigned int _vr5;
unsigned int _vr6;
unsigned int _vr7;
int _vr8;
int _vr9;
int _vr10;

    _vr0 = 0;
    f_0 = _vr0;
    _vr1 = 1;
    f_1 = _vr1;
    _vr2 = 1;
    if (m > _vr2) goto L2;
    return m;
    goto L1;
L2:
    _vr3 = 2;
    i = _vr3;
L5:
    _vr4 = (unsigned int)m;
    if (i > _vr4) goto L6;
    _vr5 = f_0 + f_1;
    f_2 = _vr5;
    f_0 = f_1;
    f_1 = f_2;
    _vr8 = 1;
    _vr7 = (unsigned int)_vr8;
    _vr6 = i + _vr7;
    i = _vr6;
    goto L5;
L6:
    _vr9 = (int)f_2;
    return _vr9;
L1:
    _vr10 = 0;
    return _vr10;
} /* end of fibonacci */