Computer Science 12
Design Automation for Embedded Systems
ECRTS 2011
WCET-aware Register Allocation based on Integer-Linear Programming
Heiko Falk, Norman Schmitz, Florian Schmoll
TU Dortmund
Slide 2 / 18 | © H. Falk | 2011-07-06 | ECRTS 2011
Outline
Introduction
  State of the Art in Compiler Design
  Register Allocation
Traditional ILP-based Register Allocation
  ILP Model
  Limitations
WCET-aware Register Allocation using ILP
  Model of the WCET
  Model of Pipeline-Related Spill Costs
Results
Summary & Future Work
Current State of the Art in Compiler Design
Objective Function of Compiler Optimizations
  Usually reduction of Average-Case Execution Times (ACET):
  Accelerate a “typical” execution of a program using “typical” input data
  No statements about WCETs possible
Optimization Strategy
  Naive: current compilers lack a precise ACET timing model
  An optimization is applied if it seems “promising”
  The effect of optimizations on a program’s ACET is fully unknown to the compiler itself
  ACET optimizations are not useful for WCET minimization
Register Allocation
Goals
  Considered the most important compiler optimization
  Registers are the fastest and most efficient memories
  Register allocation should make optimal use of registers
Tasks
  Assembly code before register allocation uses virtual registers (VREGs)
  Map all (potentially many) VREGs to the (usually few) physical registers (PHREGs) of a processor
  Insert memory loads and stores (spill code) whenever VREGs don’t fit into the register file
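To make the last task concrete, here is a minimal Python sketch of spill code insertion on a hypothetical toy IR (tuples of opcode, destination and sources; the names `insert_spill_code`, `slot_of` and the `[sp]0` stack-slot syntax are illustrative, not taken from the slides). It assumes the set of spilled VREGs has already been decided and simply reloads each spilled VREG before every use and stores it after every definition. A real allocator would reload into a short-lived temporary and then map the remaining VREGs to PHREGs; that step is omitted here.

```python
# Toy IR (hypothetical): each instruction is (opcode, dest, sources).
def insert_spill_code(code, spilled, slot_of):
    """Return a copy of `code` in which every VREG in `spilled` lives in memory:
    it is reloaded before each use and stored after each definition."""
    out = []
    for op, dest, srcs in code:
        # One reload per distinct spilled source operand.
        for s in dict.fromkeys(srcs):
            if s in spilled:
                out.append(("ld", s, (slot_of[s],)))
        out.append((op, dest, srcs))
        # Store the freshly defined value back to its stack slot.
        if dest in spilled:
            out.append(("st", None, (slot_of[dest], dest)))
    return out


code = [
    ("mov", "v0", ()),
    ("add", "v1", ("v0", "v0")),
    ("mul", "v2", ("v1", "v0")),
]
rewritten = insert_spill_code(code, spilled={"v0"}, slot_of={"v0": "[sp]0"})
for instr in rewritten:
    print(instr)
```

Spilling `v0` here costs one store and two loads, i.e. three spill instructions.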
Well-Known Register Allocators
Graph Coloring
  De-facto standard approach nowadays
  Heuristics decide about allocation and spill code generation
  Fast approach of moderate complexity
  Spill heuristic might lead to poor code quality
Register Allocation via Integer-Linear Programming (ILP)
  Formal mathematical model of allocation and spilling
  Achieves minimal spill code overhead, i.e. minimizes the total number of spill instructions
  Relatively high complexity, but optimal quality
[P. Briggs, Register Allocation via Graph Coloring, 1992]
[D. W. Goodwin, K. D. Wilken, Optimal and Near-optimal Global Register Allocation Using 0-1 Integer Programming, 1996]
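For contrast with the ILP approach, a minimal sketch of the graph coloring idea: Chaitin-style simplify/select with a simple "highest degree" spill heuristic. This is an assumed, simplified variant (Briggs's optimistic coloring, cited above, postpones spill decisions to the select phase instead); `color_graph` is a hypothetical name, `interference` must be a symmetric adjacency map and `k` is the number of PHREGs.

```python
def color_graph(interference, k):
    """Chaitin-style simplify/select with a highest-degree spill heuristic.
    interference: symmetric {vreg: set of interfering vregs}; k: number of PHREGs.
    Returns (coloring {vreg: register index}, set of spilled vregs)."""
    work = {v: set(ns) for v, ns in interference.items()}
    stack, spilled = [], set()
    while work:
        # Simplify: a node with fewer than k neighbours can always be colored later.
        node = next((v for v in work if len(work[v]) < k), None)
        if node is None:
            # No such node: the spill heuristic picks the node with most neighbours.
            node = max(work, key=lambda v: len(work[v]))
            spilled.add(node)
        else:
            stack.append(node)
        for n in work.pop(node):
            work[n].discard(node)
    # Select: give each node the lowest register index unused by colored neighbours.
    coloring = {}
    while stack:
        v = stack.pop()
        used = {coloring[n] for n in interference[v] if n in coloring}
        coloring[v] = min(c for c in range(k) if c not in used)
    return coloring, spilled


# Three mutually interfering VREGs, only two PHREGs: one VREG must be spilled.
tri = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
print(color_graph(tri, 2))
```

The heuristic choice of *which* node to spill is exactly where, as noted above, code quality may suffer.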
Traditional ILP-based Register Allocation
Allocation decisions: variables map VREGs to PHREGs
Spilling decisions
Constraints: guarantee correctness of allocation and spilling decisions, e.g. ensure that each VREG is assigned to at least one PHREG, that at most one VREG can be assigned to a single PHREG, ...
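The two example constraints can be written down in a notation of our own (the slide's variable names are not preserved in this copy, so these symbols are assumptions): binary a_{v,p,n} = 1 iff VREG v resides in PHREG p at program point n, P is the set of PHREGs, N the set of program points, Live(n) the VREGs live at n, and Used(n) the VREGs that must be in a register at n. This is a simplified sketch, not the exact model of Goodwin and Wilken.

```latex
\begin{aligned}
& \sum_{p \in P} a_{v,p,n} \ge 1 && \forall n \in N,\ \forall v \in \mathit{Used}(n) \\
& \sum_{v \in \mathit{Live}(n)} a_{v,p,n} \le 1 && \forall n \in N,\ \forall p \in P \\
& a_{v,p,n} \in \{0, 1\}
\end{aligned}
```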
Traditional ILP-based Register Allocation
Objective Function
  Minimizes spill code-related overhead
  Under the assumption: each spill instruction contributes the same constant amount to the objective function
  Example: minimization of spill-related code size
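A brute-force sketch of this traditional objective on a toy problem, assuming that a VREG is either kept in a register over its whole live range or spilled entirely (a simplification of real ILP models). Instead of calling an ILP solver it enumerates the 0-1 spill variables exhaustively, which is only feasible for a handful of VREGs; `min_spill_code_size` and the example data are hypothetical.

```python
from itertools import product


def min_spill_code_size(live, spill_instrs, k):
    """live: {vreg: set of program points where it is live}
    spill_instrs: {vreg: number of loads/stores needed if it is spilled}
    k: number of PHREGs.
    Returns (cost, spilled set) minimizing the total number of spill instructions."""
    vregs = sorted(live)
    points = set().union(*live.values())
    best = None
    for bits in product((0, 1), repeat=len(vregs)):  # bit = 1 means "spill"
        spilled = {v for v, b in zip(vregs, bits) if b}
        # Register-pressure constraint: at most k unspilled VREGs live at any point.
        if any(sum(1 for v in vregs if v not in spilled and p in live[v]) > k
               for p in points):
            continue
        # Constant-weight objective: every spill instruction counts as 1.
        cost = sum(spill_instrs[v] for v in spilled)
        if best is None or cost < best[0]:
            best = (cost, spilled)
    return best


live = {"a": {1, 2, 3}, "b": {2, 3}, "c": {3, 4}}
spill_instrs = {"a": 4, "b": 2, "c": 3}
print(min_spill_code_size(live, spill_instrs, k=2))  # (2, {'b'})
```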
WCET Minimization via ILP-based Allocation?
Limitation of the traditional approach
  Assumption: each spill instruction contributes the same constant amount to the objective function
  This assumption only holds for trivial objectives such as code size
Challenges
  How to model and minimize the Worst-Case Execution Time (WCET) as a non-trivial objective?
  How to deal with complex processor pipelines that execute spill instructions in parallel with other code?
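A small numeric illustration of why the constant-weight assumption breaks down for the WCET. The block names, the execution counts along the WCEP and the cost of 1 cycle per spill instruction are all hypothetical.

```python
def code_size_cost(spill_blocks):
    """Traditional objective: every spill instruction adds the same constant (1)."""
    return len(spill_blocks)


def wcet_cost(spill_blocks, wcep_exec_count, cycles_per_spill=1):
    """WCET-oriented view: a spill instruction only counts as often as its
    block executes on the WCEP (0 if the block is not on the WCEP)."""
    return sum(cycles_per_spill * wcep_exec_count.get(b, 0) for b in spill_blocks)


# Hypothetical: the loop body runs 100 times on the WCEP, "entry" once, "cold" never.
wcep_exec_count = {"entry": 1, "loop_body": 100}
in_loop = ["loop_body", "loop_body"]  # two spill instructions inside the hot loop
outside = ["entry", "cold"]           # two spill instructions outside it

print(code_size_cost(in_loop), code_size_cost(outside))                          # 2 2
print(wcet_cost(in_loop, wcep_exec_count), wcet_cost(outside, wcep_exec_count))  # 200 1
```

Both placements are equally good for code size, but differ by two orders of magnitude in their WCET contribution.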
Challenge 1: ILP Model of the WCET
The Worst-Case Execution Path (WCEP)
  WCET of a program = length of the program’s longest execution path (WCEP)
  WCET minimization: optimization of only those parts of a program that lie on the WCEP
  Code optimization off the WCEP will not reduce the WCET
Only those spill-related decision variables that actually lie on the WCEP must contribute to the ILP’s objective function.
But: spilling decisions affect the WCET of basic blocks and thus the WCEP within a program.
How to model the WCEP via ILP depending on spill-related decision variables?
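A sketch of the WCEP on an acyclic control flow graph, computed as the longest path by memoized recursion (the block names and cycle costs are made up). It shows both effects described above: extra spill cycles off the WCEP leave the WCET unchanged, while enough extra spill cycles can move the WCEP onto another path.

```python
def wcet_and_wcep(succ, cost, entry):
    """Longest path through an acyclic CFG.
    succ: {block: list of successor blocks}; cost: {block: WCET in cycles}.
    Returns (WCET, WCEP as list of blocks)."""
    memo = {}

    def longest(block):
        if block not in memo:
            tails = [longest(s) for s in succ.get(block, [])]
            tail_len, tail_path = max(tails, default=(0, []))
            memo[block] = (cost[block] + tail_len, [block] + tail_path)
        return memo[block]

    return longest(entry)


succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
cost = {"A": 10, "B": 50, "C": 20, "D": 5}
print(wcet_and_wcep(succ, cost, "A"))               # (65, ['A', 'B', 'D'])
# 10 extra spill cycles in C (off the WCEP): WCET unchanged.
print(wcet_and_wcep(succ, {**cost, "C": 30}, "A"))  # (65, ['A', 'B', 'D'])
# 40 extra spill cycles in C: the WCEP moves through C.
print(wcet_and_wcep(succ, {**cost, "C": 60}, "A"))  # (75, ['A', 'C', 'D'])
```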
Spill Code-dependent Costs
Costs of a basic block: a cost variable models the WCET of the block depending on the WCET of potentially inserted spill code:
  WCET without any spill code, plus WCET of all spill code inside the block
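Written as a formula, in a notation of our own (the slide's symbols are not preserved): c_b is the cost of basic block b, WCET_b^0 its WCET without any spill code, S_b the set of candidate spill instructions in b, x_s = 1 iff spill instruction s is actually generated, and WCET_s the cycles s adds. As Challenge 2 shows, WCET_s need not be a constant on a pipelined processor.

```latex
c_b \;=\; \mathit{WCET}_b^{\,0} \;+\; \sum_{s \in S_b} \mathit{WCET}_s \cdot x_s,
\qquad x_s \in \{0, 1\}
```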
Intraprocedural Control Flow
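Modeling of a function’s control flow:
Acyclic sub-graphs: [Figure: example CFG with basic blocks A, B, C, D, E]
  Variable for A models the WCET of the longest path starting at A
(Reducible) Loops: [Figure: example CFG with basic blocks A, B, C, D, E; loop L consists of B, C, D]
  Treat body of inner-most loop like acyclic sub-graph
  Fold loop L into a single node with its own costs
  Continue with next innermost loop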
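One plausible way to fold a loop, sketched under the assumption of a known maximum iteration count (loop bound) and a loop body whose back edge has been removed: the folded node costs the bound times the longest path through the body. This multiplication is an assumed formula, not necessarily the one used on the slide; the edges and cycle costs inside L = {B, C, D} are hypothetical.

```python
def longest(succ, cost, block, memo=None):
    """WCET of the longest path starting at `block` in an acyclic graph."""
    if memo is None:
        memo = {}
    if block not in memo:
        memo[block] = cost[block] + max(
            (longest(succ, cost, s, memo) for s in succ.get(block, [])), default=0
        )
    return memo[block]


def fold_loop(body_succ, cost, header, max_iterations):
    """Cost of a folded loop: loop bound times the longest path through its body
    (back edge and loop exits removed, so the body is acyclic)."""
    return max_iterations * longest(body_succ, cost, header)


# Innermost loop L = {B, C, D}: B -> C -> D and B -> D (back edge D -> B removed).
body_succ = {"B": ["C", "D"], "C": ["D"]}
cost = {"A": 3, "B": 2, "C": 5, "D": 1, "E": 4}
cost["L"] = fold_loop(body_succ, cost, "B", max_iterations=10)  # 10 * (2 + 5 + 1)

# Continue outward: L is now a single node between A and E.
outer_succ = {"A": ["L"], "L": ["E"]}
print(cost["L"], longest(outer_succ, cost, "A"))  # 80 87
```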
Objective Function
WCET of entire function:
  Each function has a dedicated entry block
  A variable models the WCET of the longest path within the function starting at its entry block
  A variable models the WCET of the entire function
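A standard longest-path ILP formulation consistent with this description, in a notation of our own (not necessarily the paper's exact constraints): c_b is the spill code-dependent cost of basic block b, w_b the variable for the WCET of the longest path starting at b, B and E the blocks and edges of the acyclic, loop-folded CFG, and W_f the WCET of function f. The constraints force w_b to be at least the cost of every path starting at b; minimizing W_f therefore lets the solver choose the spill decisions (through c_b) that minimize the longest path, and at the optimum W_f equals the WCET for those decisions.

```latex
\begin{aligned}
\text{minimize} \quad & W_f \\
\text{subject to} \quad & W_f = w_{\mathit{entry}} \\
& w_b \ge c_b + w_{b'} && \forall (b, b') \in E \\
& w_b \ge c_b && \forall b \in B
\end{aligned}
```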
Challenge 2: Pipeline-Related Spill Costs
Example: The Infineon TriCore Pipelines
  Integer I-Pipeline: executes usual integer ALU instructions
  Load/Store LS-Pipeline: executes memory loads/stores and address arithmetic
  Ideal case: one I- and one LS-instruction executed in parallel within the same clock cycle
  However...
    add d0,d1,d2; # d0 = d1 + d2   (I-instruction)
    ld d0,[a0];   # d0 = mem[a0]   (LS-instruction)
  WAW hazard (write after write): stalled by 1 cycle
(Some even more subtle cases of the TriCore pipelines omitted here…)
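A toy model of the dual-issue rule just described: an I-instruction followed by an LS-instruction issues in one cycle unless both write the same register (WAW), which adds one stall cycle; any other pair is issued sequentially. `pair_cycles` is a hypothetical helper, and the subtler TriCore cases mentioned on the slide are deliberately ignored.

```python
def pair_cycles(first, second):
    """Cycles for two consecutive instructions in a simplified TriCore-like model.
    Each instruction is (pipeline, destination register or None)."""
    (pipe1, dest1), (pipe2, dest2) = first, second
    dual_issue = pipe1 == "I" and pipe2 == "LS"  # I- followed by LS-instruction
    waw = dest1 is not None and dest1 == dest2   # both write the same register
    if dual_issue and not waw:
        return 1  # both issue in the same clock cycle
    return 2      # sequential issue, or 1 stall cycle caused by the WAW hazard


add = ("I", "d0")     # add d0,d1,d2
ld_d0 = ("LS", "d0")  # ld d0,[a0]
ld_d3 = ("LS", "d3")  # hypothetical load into another register
print(pair_cycles(add, ld_d0), pair_cycles(add, ld_d3))  # 2 1
```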
ILP Example for Costs of Spill Instruction s
Case 1: If the instruction i preceding s is an LS-instruction:
  s costs 1 cycle if s is actually generated:
    st [a1],d1; # i: mem[a1] = d1
    ld d0,[a0]; # s: d0 = mem[a0]
Case 2: If s is a spill-load and i is an I-instruction:
  s costs 1 cycle if s is actually generated and a WAW hazard between i and s exists via the same PHREG (d0 below):
    add d0,d1,d2; # i: d0 = d1 + d2
    ld d0,[a0];   # s: d0 = mem[a0]
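A sketch of the two cases as plain logic, followed by one standard way to express the Case 2 condition linearly. The condition is a conjunction of three binary facts (s generated, i writes PHREG p, s writes PHREG p), and an AND of binaries can be encoded by y >= x_1 + x_2 + x_3 - 2 with y >= 0 under minimization. The function names and the per-PHREG view are assumptions for illustration; a full model would consider every candidate PHREG p.

```python
from itertools import product


def spill_cost(i_is_ls, s_is_load, s_generated, waw_same_phreg):
    """Extra cycles caused by spill instruction s directly after instruction i
    (toy version of the two cases above)."""
    if not s_generated:
        return 0
    if i_is_ls:
        return 1  # Case 1: i and s both need the LS-pipeline, no parallel issue
    if s_is_load and waw_same_phreg:
        return 1  # Case 2: I-instruction i and spill-load s write the same PHREG
    return 0      # s is issued in parallel with I-instruction i


def and_via_linear_constraint(bits):
    """Smallest y in {0, 1} with y >= sum(bits) - (len(bits) - 1);
    under minimization this equals the logical AND of the binary inputs."""
    return max(0, sum(bits) - (len(bits) - 1))


# Case 2 matches the linear form for all values of
# x_s (s generated), a_ip (i writes PHREG p), a_sp (s writes PHREG p).
for x_s, a_ip, a_sp in product((0, 1), repeat=3):
    logical = spill_cost(False, True, bool(x_s), bool(a_ip and a_sp))
    assert logical == and_via_linear_constraint((x_s, a_ip, a_sp))
print("linearization agrees with Case 2")
```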
Results – Worst-Case Execution Times
Target Processor: TriCore TC1796; 100%: WCET_EST using Graph Coloring
Compiler: WCC at optimization level -O3 (42 optimizations)
[Figure: Relative WCET_EST [%] (y-axis 0% to 110%), comparing WCET-ILP and WCET-GC; chart annotations: 98%, 19%, 80%, x2]
[H. Falk, WCET-aware Register Allocation based on Graph Coloring, DAC 2009]
Results – Average-Case Execution Times
Target Processor: TriCore TC1796; 100%: ACET using Graph Coloring
Compiler: WCC at optimization level -O3 (42 optimizations)
[Figure: Relative ACET [%] (y-axis 0% to 110%), comparing WCET-ILP and WCET-GC]
Results – CPU Runtimes
ILP-based Allocator
  Runtimes range from 1 CPU second to 54:08 CPU minutes, including WCET analysis and ILP solver
  Average runtime for 55 benchmarks: 3:33 CPU minutes
WCET-aware Graph Coloring
  Average runtime for 55 benchmarks: 4:13 CPU minutes
  Reason: performs a costly WCET analysis after register allocation for each individual basic block
Summary & Future Work
Summary
  Current state of the art: compilers are unaware of timing and use naive optimization strategies
  Standard register allocators are unaware of worst-case properties and may thus generate spill code along the WCEP
  WCET-aware ILP-based register allocation: sophisticated models of the WCET and of pipeline-related spill costs
  Average WCET reductions over 55 benchmarks: 20.2%
  Outperforms WCET-aware graph coloring by a factor of 2
Future Work
  Reduce runtimes of the ILP-based register allocator
  Improve code quality further by integrating rematerialization