Transport Triggered Architectures used for Embedded Systems
Henk Corporaal
EE department
Delft Univ. of Technology
h.corporaal@et.tudelft.nl
http://cs.et.tudelft.nl
International Symposium on NEW TRENDS IN COMPUTER ARCHITECTURE
Gent, Belgium
December 16, 1999
Topics
• MOVE project goals
• Architecture spectrum of solutions
• From VLIW to TTA
• Code generation for TTAs
• Mapping applications to processors
• Achievements
• TTA related research
MOVE project goals
• Remove bottlenecks of current ILP processors
• Tools for quick processor and system design; offer expertise in a package
• Application driven design process
• Exploit ILP to its limits (but not further!)
• Replace hardware complexity with software complexity as far as possible
• Extreme functional flexibility
• Scalable solutions
• Orthogonal concept (combine with SIMD, MIMD, FPGA function units, ...)
Architecture design spectrum
Four dimensional architecture design space: (I, O, D, S), with $S = \sum_{op} \mathrm{freq}(op) \cdot \mathrm{lt}(op)$
[Figure: design space with the axes Operations/instruction ‘O’, Instructions/cycle ‘I’, Data/operation ‘D’ and Superpipelining degree ‘S’; labelled points: (1,1,1,1), RISC, CISC, VLIW, Superscalar, Superpipelined, SIMD, Dataflow; the MOVE design space is marked]
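
One reading of the superpipelining degree S is the sum, over all operations, of freq(op) times lt(op), i.e. the frequency-weighted average operation latency. The sketch below works under that assumption; the operation mix and the function name are made up for illustration and do not come from the slides.

```python
def superpipelining_degree(op_mix):
    """Frequency-weighted average latency.

    op_mix maps an operation name to (relative frequency, latency in cycles).
    The frequencies are expected to sum to 1.
    """
    return sum(freq * latency for freq, latency in op_mix.values())


# Hypothetical operation mix, for illustration only.
EXAMPLE_MIX = {
    "alu": (0.6, 1),
    "load": (0.3, 2),
    "mul": (0.1, 3),
}

print(f"S = {superpipelining_degree(EXAMPLE_MIX):.2f}")
```

With this mix, most operations take one cycle, so S stays close to 1; a deeper pipeline raises the latencies and therefore S.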
Architecture design spectrum

| Architecture | I | O | D | S | Mpar |
|---|---|---|---|---|---|
| CISC | 0.2 | 1.2 | 1.1 | 1 | 0.26 |
| RISC | 1 | 1 | 1 | 1.2 | 1.2 |
| VLIW | 1 | 10 | 1 | 1.2 | 12 |
| Superscalar | 4 | 1 | 1 | 1.2 | 4.8 |
| Superpipelined | 1 | 1 | 1 | 3 | 3 |
| Vector | 0.1 | 1 | 64 | 5 | 32 |
| SIMD | 1 | 1 | 128 | 1.2 | 154 |
| MIMD | 32 | 1 | 1 | 1.2 | 38 |
| Dataflow | 10 | 1 | 1 | 1.2 | 12 |

Mpar is the amount of parallelism to be exploited by the compiler / application!
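
The slide does not spell out how Mpar is derived, but every tabulated value is consistent with the product I × O × D × S (up to rounding). A small sketch under that assumption, with the parameter values copied from the table:

```python
# (I, O, D, S) per architecture, copied from the table.
ARCHITECTURES = {
    "CISC": (0.2, 1.2, 1.1, 1),
    "RISC": (1, 1, 1, 1.2),
    "VLIW": (1, 10, 1, 1.2),
    "Superscalar": (4, 1, 1, 1.2),
    "Superpipelined": (1, 1, 1, 3),
    "Vector": (0.1, 1, 64, 5),
    "SIMD": (1, 1, 128, 1.2),
    "MIMD": (32, 1, 1, 1.2),
    "Dataflow": (10, 1, 1, 1.2),
}


def mpar(i, o, d, s):
    """Assumed relation (not stated on the slide): Mpar = I * O * D * S."""
    return i * o * d * s


for name, params in ARCHITECTURES.items():
    print(f"{name:15s} Mpar = {mpar(*params):.2f}")
```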
Architecture design spectrum
Which choice: I, O, D, or S? A few remarks:
• I: instructions / cycle
  • Superscalar / dataflow: limited scaling due to complexity
  • MIMD: do it yourself
• O: operations / instruction
  • VLIW: good choice if binary compatibility is not an issue
  • Speedup for all types of applications
Architecture design spectrum
• D: data / operation
  • SIMD / Vector: application has to offer this type of parallelism
  • may be a good choice for multimedia
• S: pipelining degree
  • Superpipelined: cheap solution
  • however, operation latencies may become dominant
  • unused delay slots increase
MOVE project initially concentrates on O and S
From VLIW to TTA
• VLIW
  • Scaling problems
    • number of ports on register file
    • bypass complexity
  • Flexibility problems
    • can we plug in arbitrary functionality?
• TTA: reverse the programming paradigm
  • template
  • characteristics
From VLIW to TTA
General organization of a VLIW
[Figure: block diagram with instruction memory, instruction fetch unit, instruction decode unit, FU-1 to FU-5, register file, bypassing network and data memory; the CPU boundary is marked]
From VLIW to TTA
Strong points of VLIW:
• Scalable (add more FUs)
• Flexible (an FU can be almost anything)
Weak points, with N FUs:
• Bypassing complexity: O(N^2)
• Register file complexity: O(N)
• Register file size: O(N^2)
• Register file design restricts FU flexibility
Solution: mirror programming paradigm
Transport Triggered Architecture
General organization of a TTA
[Figure: block diagram with instruction memory, instruction fetch unit, instruction decode unit, FU-1 to FU-5, register file, bypassing network and data memory; the CPU boundary is marked]
TTA structure; datapath details
[Figure: datapath with integer RF, float RF, boolean RF, instruct. unit, immediate unit, load/store unit (2x), integer ALU (2x) and float ALU; a socket is labelled]
TTA characteristics
Hardware
• Modular: Lego play tool generator
• Very flexible and scalable
  • easy inclusion of Special Function Units (SFUs)
• Low complexity
  • 50% reduction on # register ports
  • reduced bypass complexity (no associative matching)
  • up to 80% reduction in bypass connectivity
  • trivial decoding
  • reduced register pressure
Register pressure
[Figure: "Read and write ports required": ILP degree (axis 1.00 to 3.50) as a function of the number of read ports (1 to 5) and write ports (1 to 5)]
TTA characteristics
Software
A traditional operation-triggered instruction:
mul r1, r2, r3
A transport-triggered instruction:
r3 -> mul.o, r2 -> mul.t, mul.r -> r1
• Extra scheduling optimizations
• However: more difficult to schedule!
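
The relation between the two instruction styles can be sketched as a small rewriting function. This is an illustrative sketch, not the MOVE compiler's implementation: `to_moves` and `format_moves` are hypothetical names, and the mapping of source registers to the `.o` (operand) and `.t` (trigger) ports simply mirrors the `mul` example on the slide.

```python
def to_moves(opcode, dst, src1, src2):
    """Split 'opcode dst, src1, src2' into three data transports.

    Port suffixes follow the slide: .o = operand, .t = trigger, .r = result.
    """
    return [
        (src2, f"{opcode}.o"),  # operand move
        (src1, f"{opcode}.t"),  # trigger move: writing it starts the operation
        (f"{opcode}.r", dst),   # result move
    ]


def format_moves(moves):
    """Render moves in the 'source -> destination' notation of the slides."""
    return ", ".join(f"{src} -> {dst}" for src, dst in moves)


print(format_moves(to_moves("mul", "r1", "r2", "r3")))
# prints: r3 -> mul.o, r2 -> mul.t, mul.r -> r1
```

Because each of the three moves is a separate schedulable item, the scheduler gains freedom, which is also why scheduling becomes harder.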
Code generation trajectory
[Figure: code generation flow with the stages Application (C), compiler frontend, sequential code, compiler backend and parallel code; sequential simulation and parallel simulation, each with input/output; profiling data; architecture description]
• Frontend: GCC or SUIF (adapted)
TTA compiler characteristics
• Handles all ANSI C programs
• Region scheduling scope with speculative execution
• Using profiling
• Software pipelining
• Predicated execution (e.g. for stores)
• Multiple register files
• Integrated register allocation and scheduling
• Fully parametric
Code generation for TTAs
• TTA specific optimizations
  • common operand elimination
  • software bypassing
  • dead result move elimination
  • scheduling freedom of T, O and R
• Our scheduler (compiler backend) exploits these advantages
TTA specific optimizations
Bypassing can eliminate the need for RF accesses
Example:
r1 -> add.o, r2 -> add.t;
add.r -> r3;
r3 -> sub.o, r4 -> sub.t;
sub.r -> r5;
Translates into:
r1 -> add.o, r2 -> add.t;
add.r -> sub.o, r4 -> sub.t;
sub.r -> r5;
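
The rewrite in this example combines software bypassing with dead result move elimination. The following is a simplified sketch of that idea, not the MOVE scheduler itself: it works on a flat list of moves, ignores cycle boundaries, and assumes that registers are named `r<number>` and that an FU's result port stays valid until that FU is triggered again. The function names and the `live_out` parameter are hypothetical.

```python
def is_register(name):
    # Assumption: general-purpose registers are named r<number>.
    return name.startswith("r") and name[1:].isdigit()


def software_bypass(moves, live_out=frozenset()):
    """moves: list of (source, destination) transports in program order.

    Reads of a register that was just written from an FU result port are
    redirected to that port; the result move is dropped if nothing else
    needs the register.
    """
    out = list(moves)
    i = 0
    while i < len(out):
        src, dst = out[i]
        if src.endswith(".r") and is_register(dst):
            fu = src[:-2]
            result_valid = True   # result port still holds the value
            needed = False        # some later read must use the register
            overwritten = False
            for j in range(i + 1, len(out)):
                s, d = out[j]
                if s == dst:
                    if result_valid:
                        out[j] = (src, d)  # software bypass
                    else:
                        needed = True
                if d == dst:
                    overwritten = True
                    break
                if d == fu + ".t":
                    result_valid = False   # FU retriggered, old result lost
            if not overwritten and dst in live_out:
                needed = True
            if not needed:
                del out[i]                 # dead result move elimination
                continue
        i += 1
    return out


example = [
    ("r1", "add.o"), ("r2", "add.t"),
    ("add.r", "r3"),
    ("r3", "sub.o"), ("r4", "sub.t"),
    ("sub.r", "r5"),
]
for src, dst in software_bypass(example, live_out={"r5"}):
    print(f"{src} -> {dst}")
```

Here `r3` is not live afterwards, so the move `add.r -> r3` disappears and the register file access is saved. If `r3` were listed in `live_out`, the bypass would still be applied but the result move would be kept.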
Mapping applications to processors
We have described a
• Templated architecture
• Parametric compiler exploiting specifics of the template
Problem: How to tune a processor architecture for a certain application domain?
Mapping applications to processors
Move framework
[Figure: an optimizer, driven by architecture parameters and user interaction, steers the parametric compiler (output: parallel object code) and the hardware generator (output: chip) and receives feedback from both; the result is a Pareto curve (solution space) plotting cost against execution time]
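
The Pareto curve in this framework contains the design points that no other point beats on both cost and execution time. A minimal sketch of how such a front can be extracted from explored design points; the function name and the sample points are made up, and this is not claimed to be how the MOVE optimizer works internally.

```python
def pareto_front(points):
    """points: iterable of (cost, exec_time); lower is better for both.

    Returns the non-dominated points, sorted by increasing cost.
    """
    front = []
    best_time = float("inf")
    for cost, time in sorted(set(points)):
        # A more expensive point only survives if it is strictly faster.
        if time < best_time:
            front.append((cost, time))
            best_time = time
    return front


# Hypothetical (cost, execution time) pairs for explored architectures.
design_points = [(10, 100), (12, 80), (15, 90), (20, 60), (20, 70), (25, 60)]
print(pareto_front(design_points))
```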
Achievements within the MOVE project
• Transport Triggered Architecture (TTA) template
  • lego playbox toolkit
• Design framework almost operational
  • you may add your own ‘strange’ function units (no restrictions)
• Several chips have been designed by TUD and industry; their applications include
  • Intelligent datalogger
  • Video image enhancement (video stretcher)
  • MPEG2 decoder
  • Wireless communication
Video stretcher board containing TTA
Intelligent datalogger
• mixed signal
• special FUs
• on-chip RAM and ROM
• operates stand alone
• core generated automatically
• C compiler
TTA related research
• RoD: registers on demand scheduling
• SFUs: pattern detection
• CTT: code transformation tool
• Multiprocessor single chip embedded systems
• Global program optimizations
• Automatic fixed point code generation
• ReMove
RoD: Register on Demand scheduling
Phase ordering problem: scheduling vs. allocation
• Early register assignment
  • Introduces false dependencies
  • Bypassing information not available
• Late register assignment
  • Span of live ranges likely to increase, which leads to more spill code
  • Spill/reload code inserted after scheduling, which requires an extra scheduling step
• Integrated with the instruction scheduler: RoD
  • More complex
RoD
Code to schedule:
4 -> add.o, x -> add.t, add.r -> y;
r0 -> sub.o, y -> sub.t, sub.r -> z;
Schedule as it is built up step by step (the slide numbers steps 1 to 5):
4 -> add.o, r1 -> add.t
4 -> add.o, r1 -> add.t; add.r -> r1
4 -> add.o, r1 -> add.t; add.r -> sub.t
4 -> add.o, r1 -> add.t; add.r -> sub.t, r0 -> sub.o; sub.r -> r7
[Figure: next to each schedule the slide shows the RRTs, with the register entries r0; r0, r1; and r7]
Spilling
• Occurs when the number of simultaneously live variables exceeds the number of registers
• Contents of variables are stored in memory
• The impact on the performance due to the insertion of extra code must be as small as possible
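
The spill condition can be checked by computing the peak number of simultaneously live variables. A minimal sketch, assuming each variable has a single live range with inclusive cycle bounds; the names and the example ranges are hypothetical.

```python
def max_live(live_ranges):
    """live_ranges: {variable: (def_cycle, last_use_cycle)}, bounds inclusive.

    Returns the peak number of simultaneously live variables.
    """
    events = []
    for start, end in live_ranges.values():
        events.append((start, 1))     # variable becomes live
        events.append((end + 1, -1))  # variable is dead after its last use
    live = peak = 0
    # At equal cycles the -1 events sort first, so a register freed in a
    # cycle can be reused by a range that starts in that same cycle.
    for _, delta in sorted(events):
        live += delta
        peak = max(peak, live)
    return peak


def needs_spill(live_ranges, num_registers):
    """True when the register pressure exceeds the available registers."""
    return max_live(live_ranges) > num_registers


ranges = {"x": (0, 5), "y": (2, 3), "z": (4, 8)}
print(max_live(ranges), needs_spill(ranges, 2), needs_spill(ranges, 1))
```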
Spilling
[Figure: live ranges of x and y; the live range of r1 (def r1 ... use r1) is split by spill code into def r1, store r1 and load r1, use r1]
Spilling
Operation to schedule:
x -> sub.o, r1 -> sub.t; sub.r -> r3;
Code after spill code insertion:
4 -> add.o, fp -> add.t;
add.r -> z;
z -> ld.t;
ld.r -> x;
x -> sub.o, r1 -> sub.t;
sub.r -> r3;
Bypassed code:
4 -> add.o, fp -> add.t;
add.r -> ld.t;
ld.r -> sub.o, r1 -> sub.t;
sub.r -> r3;
RoD compared with early assignment
[Figure: speedup of RoD [%] (axis -5 to 35) versus number of registers (32, 24, 20, 16, 12, 10) for the benchmarks a68, bison, compress, dhrystone, gzip, sieve, sort, sum, uniq and wc, plus their average]
RoD compared with early assignment
Impact of decreasing number of registers
[Figure: cycle count increase [%] (axis 0 to 24) versus number of registers (12 to 32) for RoD and early assignment]
Special Functionality: SFUs
Mapping applications to processors
SFUs may help!
• Which one do I need?
• Tradeoff between costs and performance
SFU granularity?
• Coarse grain: do it yourself (profiling helps); the Move framework supports this
• Fine grain: tooling needed
SFUs: fine grain patterns
Why use fine grain SFUs:
• code size reduction
• register file #ports reduction
• could be cheaper and/or faster
• transport reduction
• power reduction (avoid charging non-local wires)
Which patterns need support?
• Detection of recurring operation patterns needed
SFUs: Pattern identification
Method:
• Trace analysis
• Build DDG
• Create pattern library on demand
• Fuse partial matches into complete matches
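
A much simplified sketch of the detection step: given a data dependence graph (DDG), count how often each producer/consumer opcode pair occurs. This only covers two-operation chains and skips the pattern library and the fusing of partial matches named in the method; the function name and the tiny graph are hypothetical.

```python
from collections import Counter


def count_two_op_patterns(ops, edges):
    """ops: {node_id: opcode}; edges: iterable of (producer, consumer) pairs.

    Returns a Counter keyed by (producer opcode, consumer opcode).
    """
    return Counter((ops[p], ops[c]) for p, c in edges)


# Hypothetical DDG fragment: two multiply-add chains and one add-shift chain.
ops = {1: "mul", 2: "add", 3: "mul", 4: "add", 5: "shl"}
edges = [(1, 2), (3, 4), (2, 5)]

for pattern, count in count_two_op_patterns(ops, edges).most_common():
    print(pattern, count)
```

Frequently occurring pairs are candidates for a fine grain SFU; weighting each edge by its execution count from the trace would be a natural refinement.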
SFUs: fine grain patterns
General pattern & subject graph
• multi-output
• non-tree
• operand and operation nodes
SFUs: covering results
SFUs: top-10 patterns (2 ops)
SFUs: conclusions
• Most patterns are multi-output and not tree-like
• Patterns 1, 4, 6 and 8 have implementation advantages
• 20 additional 2-node patterns give 40% reduction (in operation count)
• Group operations into classes for even better results
Now: scheduling for these patterns? How?
Source-to-Source transformations
Design transformations
Source-to-source transformations
• CTT: code transformation tool
[Figure: CTT with a GUI and a library of transformations; input C sources go in, output C sources come out]
Transformation example: loop embedding
Before:
    ....
    for (i=0;i<100;i++){
        do_something();
    }
    ....
    void do_something() {
        procedure body
    }
After:
    ....
    do_something2();
    ....
    void do_something2() {
        int i;
        for (i=0;i<100;i++){
            procedure body
        }
    }
Structure of transformation
    PATTERN {
        description of the code selection stage
    }
    CONDITIONS {
        additional constraints
    }
    RESULT {
        description of the new code
    }
Implementation
[Figure: CTT implementation with input sources, SUIF front-end (shown twice), IR, SUIF linker, Code Transformation Engine fed by the transformations, s2c and output sources]
Experimental results
Can handle transformations like:
• Loop peeling
• Index set splitting
• Loop reversal
• Loop skewing
• Loop fusion
• Wave fronting
• Inlining
• Loop fission
• Strip mining
• Code sinking
• Unswitching
• Loop embedding and extraction
Could transform 39 out of 45 SIMD loops (in a set of 9 DSP benchmarks and MPEG)
Partitioning your program for Multiprocessor single chip solutions
Multiprocessor embedded system
[Figure: three ASIPs (Asip1, Asip2, Asip3), each a core with its own SFUs (sfu1, sfu2, sfu3), together with RAM blocks, I/O and a TPU]
• An ASIP based heterogeneous multiprocessor
• How to partition and map your application?
• Splitting threads
Design transformations
Why split threads?
• Combine fine (ILP) and coarse grain parallelism
• Avoid ILP bottleneck
• Multiprocessor solution may be cheaper
  • More efficient resource use
• Wire delay problem: clustering needed!
Experimental results of partitioner
[Figure: speedup (axis 0 to 18) per benchmark for 1, 2, 3 and 4 processors]
Instant frequency tracking example
Global program optimizations
Traditional compilation path
• Compiler output is textual, i.e. assembly: loss of source-level information.
• The object code defines the program’s memory layout: efficient binary representation, but not suitable for code transformations.
[Figure: source file, compiler, assembly, assembler, object code, library code, executable]
New Compilation Path
• Structured machine-level representation of the program:
  • the representation is accessible to “binary tools”,
  • high-level information is maintained and passed to the linker,
  • code transformations on whole programs are easier.
• The link function and the section offsets information must be rethought.
[Figure: source file, front-end, machine-level IR, library code IR, linked machine code]
Inter-module Register Allocation
• After linkage, global exported variables can be allocated to registers
• Performing re-allocation of exported variables before scheduling is expensive
Solution: re-allocation after linking all modules
• Analysis of variable aliasing (is address taken?) is computed and maintained
• A larger pool of live range candidates is available for actual register allocation
Fixed-point conversion: motivation
• Cost of floating-point hardware.
• Most “embedded” programs written in ANSI C.
• C does not support fixed-point arithmetic.
• Manual writing of fixed-point programs is tedious and error-prone (insertion of scaling operations).
• Fixed-point extensions to C are only a partial solution.
Fixed-point conversion
Example:
acc += (*coef_ptr) * (*data_ptr)
[Figure: two dataflow graphs for this statement. Floating-point version: two loads, mul, add into acc. Fixed-point version: two loads, call mulh(), shifts (>>1, <<1), add into acc; the edges carry numeric annotations (4, 40, 5, 4)]
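
To illustrate the kind of scaling operations the fixed-point version needs, here is a sketch of the multiply-accumulate in a Q15 format. The format, the helper names and the meaning of `mulh()` (upper 16 bits of the 32-bit product) are assumptions made for this illustration; the slide does not define them, and the exact shifts it shows depend on the formats chosen by the converter.

```python
FRAC_BITS = 15  # assumption: Q15, i.e. 15 fractional bits in a 16-bit word


def to_fixed(x):
    """Convert a float in [-1, 1) to its Q15 integer representation."""
    return int(round(x * (1 << FRAC_BITS)))


def to_float(q):
    """Convert a Q15 integer back to a float."""
    return q / (1 << FRAC_BITS)


def mulh(a, b):
    """Upper half of the 32-bit product of two 16-bit values (assumed meaning)."""
    return (a * b) >> 16


def mac(acc, coef, data):
    """acc += coef * data in Q15.

    Q15 * Q15 gives Q30; keeping the upper 16 bits leaves Q14, so the
    product is shifted left by one bit to return to Q15.
    """
    return acc + (mulh(coef, data) << 1)


acc = mac(0, to_fixed(0.5), to_fixed(0.5))
print(to_float(acc))
```

Forgetting or misplacing one of these shifts silently changes the result by a power of two, which is why the slides call manual conversion error-prone.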
Methodology
• The user starts with a floating-point version of the application.
• The user annotates a selected set of FP variables.
• The converter automatically converts the remaining variables/temporaries and delivers feedback.
• Result: source file where floating-point variables are replaced by integer variables with appropriate scaling operations.
[Figure: C program, user annotates, annotated C program, converter, fixed-point C program]
Link-time code conversion
• Problem: linking fixed-point code with library code
  • transformations on binary code impractical
  • source-level linkage is awkward
• Solution: floating- to fixed-point conversion of library code “on the fly” during linkage.
• Advantages:
  • No need to compile in advance a specific version of the library for a particular fixed-point format.
  • Information about the fixed-point format can flow between user and library code in both directions.
Experimental Results
Accuracy metric: signal-to-noise ratio (dB)
$\mathrm{SQNR} = 10 \log_{10} \frac{E(S^2)}{E((S - S')^2)}$
S = floating-point signal, S’ = fixed-point signal
Test programs: 35th-order FIR, 6th-order IIR filters

SQNR (dB)

| program | fixed-p.1 | fixed-p.2 | floating-p. |
|---|---|---|---|
| FIR | 33.1 | 74.7 | 70.9 |
| IIR | 20.3 | 55.1 | 64.9 |
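
The accuracy metric can be computed directly from two sample sequences. The sketch below assumes the usual definition SQNR = 10 log10(E[S^2] / E[(S - S')^2]), with S the floating-point reference and S' the fixed-point output; the function name and the test signal are made up.

```python
import math


def sqnr_db(reference, approx):
    """Signal-to-quantization-noise ratio in dB for equal-length sequences.

    Both expectations are taken over the same number of samples, so the
    ratio of sums equals the ratio of means.
    """
    signal = sum(s * s for s in reference)
    noise = sum((s - a) ** 2 for s, a in zip(reference, approx))
    if noise == 0:
        return math.inf  # identical signals: no quantization noise
    return 10 * math.log10(signal / noise)


# Hypothetical signals: the approximation is off by 10% on every sample.
reference = [1.0, -1.0, 1.0, -1.0]
approx = [0.9, -0.9, 0.9, -0.9]
print(f"{sqnr_db(reference, approx):.1f} dB")
```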
Experimental Results
Performance and code size

| program | Floating-point, hardware: cycles | size | Floating-point, sw emulation: cycles | size | Fixed-point, version2: cycles | size |
|---|---|---|---|---|---|---|
| FIR | 32826 | 66 | 151849 | 170 | 39410 | 72 |
| IIR | 7422 | 73 | 39192 | 258 | 8723 | 93 |
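
The relative cost of the three variants is easier to judge as ratios. The sketch below uses the cycle counts and code sizes from this slide, assuming the column assignment floating-point hardware, floating-point software emulation, fixed-point version 2 (the flattened layout leaves some room for doubt); the dictionary keys and the function name are hypothetical.

```python
# (cycles, code size) per program and variant, as read from the slide.
RESULTS = {
    "FIR": {
        "fp_hardware": (32826, 66),
        "fp_sw_emulation": (151849, 170),
        "fixed_point_v2": (39410, 72),
    },
    "IIR": {
        "fp_hardware": (7422, 73),
        "fp_sw_emulation": (39192, 258),
        "fixed_point_v2": (8723, 93),
    },
}


def cycle_ratio(program, numerator, denominator):
    """Ratio of cycle counts between two variants of the same program."""
    return RESULTS[program][numerator][0] / RESULTS[program][denominator][0]


for program in RESULTS:
    vs_emulation = cycle_ratio(program, "fp_sw_emulation", "fixed_point_v2")
    vs_hardware = cycle_ratio(program, "fixed_point_v2", "fp_hardware")
    print(f"{program}: {vs_emulation:.2f}x faster than sw emulation, "
          f"{vs_hardware:.2f}x the cycles of FP hardware")
```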
What next?
How to map your application A(L,A,D) to hardware (L,N,C)?
• L: design level (e.g. architecture, implementation or realization level)
• A: application components
• D: dependences between application components
• N: hardware components
• C: connections between hardware components
Integrated design environment
[Figure: design cycle with software description AG(L,A,D), hardware description RG(L,N,C), mapper & scheduler, analysis, statistics, design point, exploration, steering of design transformations (and mapping), and design transformations on both the software and the hardware side]
In the MOVE project we mostly ‘closed’ the right part of the design cycle!
Conclusions / Discussion
• Billions of embedded systems with embedded processors are sold annually; how to design these systems quickly, cheaply, correctly, at low power, ...?
• We have experience with tuning architectures for applications
  • extremely flexible templated TTA; used by several companies
  • parametric code generation
  • automatic TTA design space exploration
• The challenge: automated tuning of applications for architectures: closing the Y-chart
  • design transformation framework needed