a and a tool for source level timing/energy estimation of ... · • reducing packaging costs and...
TRANSCRIPT
![Page 1: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/1.jpg)
A methodology and a tool for source level timing/energy estimation of software
P f Willi F i i f i @ l li i i h d i li i i /f iProf. William Fornaciari, [email protected], home.dei.polimi.it/fornaciaProject Manager and Contact
Prof Carlo Brandolese brandole@elet polimi itProf. Carlo Brandolese, [email protected] Developer
Politecnico di Milano – DEI Milano, Italy
![Page 2: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/2.jpg)
OutlineMotivation– Software energy and timing estimationgy g– Goals
Modeling approach & Toolchains– Source‐level modelsSource‐level models– Dynamic models– Target processor characterization models– Target processor characterization models
Res ltsResults– Estimation
l– Analysis
2
![Page 3: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/3.jpg)
Low power and energy aware designPractical market issue• Increasing market share of mobile, gasking for longer cruising life
• Limitations of battery technology
Economic issue• Reducing packaging costs and achieving energy savings
Technology issue• Enabling the realization of high‐density chips g g y p(heat poses severe constraints to reliability)
3
![Page 4: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/4.jpg)
Motivation ‐ Application scenarios
Large data centers, or data‐intensive
High performance Low power
Low energy
General purpose, and multimedia Reliability, and
battery‐supplied
Smartphones, Wireless Sensor Networks
4
pmobile multimedia
![Page 5: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/5.jpg)
Software energy and timing estimationEvaluation of the performance of an embedded system– Must consider specific contributions realted to software executionp
• User application• Operating System & BSPs• User data
– Should be performed as early as possibleT ll l i diff i l i l i• To allow exploring different implementation alternatives
– Should not depend on the availability of the target platform• ISS based solutions do exist• ISS‐based solutions do exist• Better if performed directly on the host machine
– Should provide data at level at which the application is conceivedShould provide data at level at which the application is conceived• The source code level
– Should be as fast and accurate as possiblep• A fast estimation toolchain allows automated design space exploration
5
![Page 6: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/6.jpg)
GoalsThe SWAT Project(*) aims at three main goals– Defines a general estimation approachg pp– Defines
• Execution time models• Energy consumption models
– Defines low‐level, target independent software metrics• To enable a more comprehensive analysis of the application • To support the designer in optimizing the application
I l t t l t t th d l– Implements tools to support the models• As modular and flexible as possible
(*) The SWAT Project is part of the EU‐Funded Project IP 247999 Project COMPLEX:"COdesing and power Management in PLatform‐based design space Exploration"
6
![Page 7: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/7.jpg)
Modeling approachEnergy consumption and execution time of an embedded application depend on– The structure of the application
• Its source code and, consequently, its assembly translation
– The stimuli• Specific data that the application processes
– The execution platform• The target microprocessor
h i if• The operating system, if any• The third‐party libraries
All these aspectsAll these aspects– Are complex to model as a whole with sufficient generalitySh ld b "d l d" d d ll d i d d l– Should be "decoupled" and modelled independently
7
![Page 8: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/8.jpg)
Modeling approachThe SWAT methodology decouples the different aspectsSource code modelSource code model– Captures the structure of the code at a high level of abstraction– No dependence on data or platform is accounted forNo dependence on data or platform is accounted for
Stimuli modelCollects set of representative execution profiles– Collects set of representative execution profiles
Execution platform– The target microprocessor is modeled at ISA level– The operating system and all third‐party are statistically
h i d i bi l l dcharacterized operating at binary level, as source codemay not be available
8
![Page 9: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/9.jpg)
Source‐level modelsTo model the source code, several representations have been used in literature– Concrete and abstract syntax trees– Static metrics mutuated from software engineering analysesg g y– Black‐box function‐level analysis– Intermediate representationp
SWAT models the source– Using the LLVM Compiler Infrastructure– Using the LLVM Compiler Infrastructure – Starting from the LLVM intermediate representationAdapting it to the specific purpose– Adapting it to the specific purpose• For static modelling, several semantic aspects are irrelevant and the originlLLVM code is simplified
• For dynamic modelling, the LLVM code is augmented with suitable probes
9
![Page 10: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/10.jpg)
Source‐level modelsThe simplified LLVM code is called "Basic‐block model"
int sumsqr( int* x, int n ) {
f.c
int i, t = 0;for( i = 0; i < n; i++ )
t += x[i] * x[i]return t;
}
llvm-gccdefine i32 @sumsqr(i32* %x, i32 %n) nounwind {
...bb7:
}
f.ll
bb7:%8 = load i32* %i, align 4 %9 = load i32** %1, align 8 %10 = getelementptr inbounds i32* %9, i64 %9 ...
swat-bbmodel
sumsqr:6:22:27:25:bb7:332:124:158::-:-:
}
Back‐annotation info
St ti & d i
f.bbmodel
: : ::-:-:-:(25,load,-,-,-)(25,load,-,-,-)(25,getelementprt,-,-,-)
Static & dynamic energyand timing (not yet available)
Basic‐block model
10
...]
![Page 11: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/11.jpg)
Source‐level modelsFormally the basic‐block model is expressed by the representation matrix:
Where– Each row corresponds to a basic‐block– Each column corresponds to an LLVM instruction
The element of the representation matrixp– Indicate how many LLVM instructions of type j are present in basic block i
![Page 12: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/12.jpg)
Dynamic modelDynamic models accont for data dependence– Collect profiling information by executing the codep g y g
• On the host platform, since the LLVM code can be targeted to the host • On an targtet ISS, if available (very slow)• On the actual target, using in‐circuit debugging facilities
For OS‐less application host execution is preferrable– Must provide stubs for emulating devices
The result of execution is a basic‐block traceThe result of execution is a basic‐block trace– The trace can be stored for an analysis over timeThe trace can be compacted to provide execution counts only– The trace can be compacted to provide execution counts only
12
![Page 13: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/13.jpg)
Dynamic modelSWAT provides a very flexible tracing support– Can trace basic‐blocks, functions, values, stack, ...
Tracing is based on a three‐steps process– Meta‐instrumentationMeta instrumentation
• Add tags to the LLVM code containing structured information about it• Tags are added as comments, so the resulting code is still LLVM‐compliant• This phase is independent from the information to be traced
– Instrumentation expansion• Expands tags into specific probes (function calls)• Uses a custom, awk‐like, set of expansion rules
E i– Execution• Produces the dynamic information for the specific simulus or set of stimuli• Data can be dumped in time‐trace or collected as counts in memory• Data can be dumped in time‐trace or collected as counts in memory
13
![Page 14: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/14.jpg)
Dynamic modelThe dynamic model "enriches" the basic‐block model
f.ll f.bbmodelf.ll
swat-minstr
f.bbmodel
swat-mexpand
f i llf.i.ll
compilelink
sumsqr:6:22:27:25:bb7:332:124:158::20.377:13.239:
Back‐annotation info
St ti & d i
execute
a.trace: . : . ::10:203.774:132.393:(25,load,-,4.331,2.973)(25,load,-,4.331,2.973)(25,getelementprt,-,6.104,4.610)
Static & dynamicenergy and timing
Basic‐block model withinstruction‐level energyand timing figures
swat-todyn
14
...]
g g
f.bbmodel
![Page 15: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/15.jpg)
Dynamic modelFormally the basic‐block counts are represented as
Wh h l t i i t d t b i bl kWhere each element is associated to a basic blockThe resulting LLVM instruction counts are thus:
Where each element is the cumulative execution count of a specific LLVM instructionof a specific LLVM instruction
15
![Page 16: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/16.jpg)
Dynamic modelTo target the model to the specific processor we define a "translation matrix"
Where– Each row corresponds to an LLVM instruction– Each column corresponds to an instruction of the target set
The element of the representation matrixp– Indicate how many target instructions of type j are statistically used by the compiler to translate an LLVM instruction of type i
16
![Page 17: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/17.jpg)
Dynamic modelThe energy and timing model of LLVM istruction for the specific target instruction set can thus be expressed as
WhereWhere – is an LLVM instruction
i t t i t ti– is a target instruction– indicates the estimated execution time
i di h i d– indicates the estimated energy
17
![Page 18: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/18.jpg)
Dynamic modelCombining all the equations described so far yields the total estimated energy and execution time of the code
In other words, combining, g– Excution count per each basic‐block– LLVM instruction per each basic‐blockLLVM instruction per each basic block– Target instruction per each LLVM instruction– Energy and execution time per each target instructionEnergy and execution time per each target instruction
...leads to the overall estimates
18
![Page 19: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/19.jpg)
Target processor characterizationAs we have seen– The model of the source code is expressed as LLVM code– The actual execution involves instructions of the target processor
The translation matrix links these two levels– It expresses the "average" way the compiler translates
LLVM i t t t i t tian LLVM into one or more target instruction
As an exampleAs an example:
19
![Page 20: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/20.jpg)
Target processor characterizationThe translation of an instance of an LLVM instruction– Depends on the context and optimizations selected p p
The coefficients collects in the translation matrix– Can only represent an "average" translationCan only represent an average translation– Should be derived analyzing a very large training set
The characterization procedure is structured in stepsThe characterization procedure is structured in steps– Select a suitably large set of source code benchmarksT l t h i t LLVM– Translate each source into LLVM• Count LLVM instruction frequency
Transalte each source into the target assembly– Transalte each source into the target assembly• Count LLVM instruction frequency
– Bulid a multilinear problem and solve it in least‐square senseBulid a multilinear problem and solve it in least square sense
20
![Page 21: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/21.jpg)
Target processor characterizationFormally, let– be the set of training source codesg– the count of LLVM instruction of type j
in the source code i– the count of target instruction of type k
in the source code i
The desired multilinear model can thus be expressed as
I t t i fIn compact matrix form:
21
![Page 22: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/22.jpg)
Target processor characterizationThis model is though too general– It assumes that the translation of each LLVM instruction may ypotentially involve all target instructions!
Realistically, the translation will only use a subset of y, ytarget instruction– Knowing the semantics of both instruction sets, is easy tog , ydefine a reasonable subset of target instructions
This is modeled by the set of vectorsy
where a 1 indicates thatwhere a 1 indicates that – LLVM instruction jMi h b i ll l d i l i i k– Might be potentially traslated using also target instruction k
22
![Page 23: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/23.jpg)
Target processor characterizationTo integrate these hypotheses into the model:
From a "physical" point of viewNegative coefficients have no meaning– Negative coefficients have no meaning
– Coefficients must be constrained to be positive or null
Th t i d bl i th dThe constrained problem is thus expressed as
Each solution vector collects the coefficients of the model of LVVM instruction jjThe translation matrix is thus simply given by
23
![Page 24: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/24.jpg)
Target processor characterizationThis characterization process – Is almost totally automaticy
• Only the translation “hypothesis " vectors must be coded manually
– Has been implemented in the SWAT toolchain
*.c
Training source set
llvm-gccllvm.translations
Manual input model
target-gcc
llvm.count target.count
t h t i t / tl bswat-characterize
target.model
octave/matlab
target.model
![Page 25: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/25.jpg)
The SWAT toolchainSWAT collects a set of tools analysis– Organized into several toolchains or flowsg
In particular:– Target characterization flowTarget characterization flow– Estimation & back‐annotation flow– Analysis flowAnalysis flow– Generic tracing toolchain
A few prototyping optimization tools and flows have alsoA few prototyping optimization tools and flows have also been implementedO ti d ti i– Operating modes optimizer
– Application parameters exploration and optimizationHi h l l f i "hi "– High‐level transformation "hint" generator
25
![Page 26: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/26.jpg)
The SWAT toolchainToolchains are organized as follows– Front‐End: Source code modellingg– Core: Perform specific analyses– Back‐End: Post‐processing & Report generationp g p g
SOURCES
SWAT Front‐End
Source Code Models
Estimates & Reports
Core Flow NCore Flow 1 Core Flow 2
26
Estimates & Reports
![Page 27: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/27.jpg)
Estimation resultsEnergy estimation errors (WCET Benchmarks)
Estimation error (%)
0
5
10
‐10
‐5
0
Average: ‐2.32
‐15cover bs bsort100 fibcall matmult cnt compress recursion edn expint lcdnum
Estimation error (%, absolute value)
810
1214
( , )
2
46
8Average: +5,98
27
0cover bs bsort100 fibcall matmult cnt compress recursion edn expint lcdnum
![Page 28: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/28.jpg)
Estimation resultsData dependency of energy estimation errors (WCET Benchmarks)– Same benchmark different data, namely the size of the array to be compressed
Estimation error (%)
2
4
6
6
‐4
‐2
0
Average: ‐3,74
‐10
‐8
‐6
0 20 40 60 80 100 120 140 160
28
![Page 29: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/29.jpg)
Analysis ResultsApplication summary & structure
29
![Page 30: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/30.jpg)
Analysis ResultsFunction models
30
![Page 31: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/31.jpg)
Analysis ResultsEstimates overview & detailed charts
31
![Page 32: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/32.jpg)
Analysis ResultsDynamic analysis: Basic‐block execution count
32
![Page 33: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/33.jpg)
Analysis ResultsTrace analysis: Stack size over time
33
![Page 34: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/34.jpg)
Work in progressSWAT opt. Toolchain (1)– Based on the same front-end of
sources
Based on the same front end ofthe estimation flow and usingthe results extracted from the
SWAT Front‐Endcpu
dynamic models and theexecution traces, the mainoptimization flow apply a set of
static & dynamic models
optimization flow apply a set offuzzy rules on selected portionsof the application to suggest the
build fitness function
of the application to suggest themost promising transformationsto apply.
fitness functions
inferential enginefuzzy fuzzy rules
optimization hints
34
![Page 35: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/35.jpg)
Work in progressSWAT opt. Toolchain (2)– A second optimization flow
sources
A second optimization flowexplores compileroptimization options in order
SWAT Front‐Endcpu
to determine thetransformation mix that bestfits the specific application To
llvm‐opt optimization mix
fits the specific application. Tothis purpose, the MOSTdesign space exploration
SWAT Back‐End
energy and timing
MOSTDSE engine(COMPLEX) design space exploration
engine is used to generatesets of transformation mixes
energy and timingestimates
for the LLVM optimizer untilthe best optimization recipe isfound
optimizationrecipe
found.
35
![Page 36: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/36.jpg)
Moving up to system level: RTRM (BBQ)
Scenarios
RequirementsAggregationhi
cal
Run‐Time Resource Manager (BBQ)
gg gPg
Hierarch
A t i
ResourcesAccounting
P P P
Symmetric
D1 D2 DNAsymmetric
36
Retargetability
![Page 37: A and a tool for source level timing/energy estimation of ... · • Reducing packaging costs and achieving energy savings ... Software energy and timing estimation ... explores compiler](https://reader033.vdocument.in/reader033/viewer/2022043017/5f3964576671b1153029d383/html5/thumbnails/37.jpg)
ConclusionsSoftware characterization is of paramount importance for embedded mobile applicationsThere is room for optimization at source levelPressure is also for moving up the perspectivePressure is also for moving up the perspective• Time adaptability• Retargetability• Retargetability• System‐wide resources management
• For more info Thank you!
• Prof. William Fornaciari• [email protected]• home.dei.polimi.it/fornacia
37