
2005 International Symposium on Code Generation and Optimization

Progressive Register Allocation for Irregular Architectures

David Koes

Seth Copen Goldstein

March 23, 2005


Irregular Architectures

• Few registers

• Register usage restrictions
  – address registers, hardwired registers...

• Memory operands

• Examples:
  – x86, 68k, ColdFire, ARM Thumb, MIPS16, V800, various DSPs...

[Figure: the eight x86 general-purpose registers: eax, ebx, ecx, edx, esi, edi, ebp, esp]


Fewer Registers → More Spills

• Used gcc to compile >10,000 functions from Mediabench, Spec95, Spec2000, and micro-benchmarks

• Recorded which functions spilled

[Figure: bar chart, "Percent of functions that spill" (y-axis: percent, 0 to 50), for PPC (32 registers), 68k (16 registers), and x86 (8 registers)]


Register Usage Restrictions

• Instructions may prefer or require a specific subset of registers
  – x86 multiply instruction
      imul %edx,%eax // 2 byte instruction
      imul %edx,%ecx // 3 byte instruction
  – x86 divide instruction
      idivl %ecx // eax = edx:eax/ecx
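The two restrictions above can be read as placement costs: a preference adds a small penalty, while a requirement forbids every other placement. The following is a minimal sketch under that reading. The function name `placement_cost`, the role names, and the choice to measure cost in extra bytes are illustrative assumptions, not taken from the slides.

```python
INF = float("inf")  # marks a placement the instruction cannot accept

def placement_cost(opcode, role, cls):
    """Extra code-size cost (in bytes) of putting an operand in class `cls`.

    Hypothetical cost table: only the two restrictions from the slide are
    modeled; every other placement is treated as free.
    """
    if opcode == "imul" and role == "dest":
        if cls == "mem":
            return INF                    # imul writes its result to a register
        return 0 if cls == "eax" else 1   # slide: 2 bytes with %eax, 3 otherwise
    if opcode == "idiv" and role == "dividend":
        return 0 if cls == "eax" else INF  # dividend is fixed to edx:eax
    return 0

preferred = placement_cost("imul", "dest", "eax")
penalized = placement_cost("imul", "dest", "ecx")
forbidden = placement_cost("idiv", "dividend", "ecx")
```

A preference shows up as a finite difference (`penalized - preferred`), while a hard requirement shows up as an infinite cost.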


Memory Operands

• Load/store not always needed to access variables allocated to memory
  – depends upon instruction
  – still less efficient than register access

    addl 8(%ebp), %eax
  vs
    movl 8(%ebp), %edx
    addl %edx, %eax


Register Allocation Challenges

• Optimize spill code
  – with few registers, spilling unavoidable

• Model register usage restrictions

• Exploit memory operands
  – affects spilling decisions


Previous Work

Criteria compared: Models Irregular Features, Fast, Optimal

Methods compared:

• Graph Coloring

• Integer Programming [Goodwin and Wilken 96], [Kong and Wilken 98], [Fu and Wilken 2002]

• Separated IP [Appel and George 01]

• PBQP [Scholz and Eckstein 02]


Our Goals

• Expressive
  – explicitly represent architectural irregularities and costs

• Proper model
  – an optimum solution results in optimal register allocation

• Progressive solution algorithm
  – more computation → better solution
  – decent feasible solution obtained quickly
  – competitive with current allocators


Multicommodity Network Flow (MCNF)

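[Figure: example network for two variables, a and b, flowing from source nodes through instruction and crossbar node groups to sink nodes; numeric labels 2 and 4 appear on the network]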


Modeling Usage Constraints

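int foo(int a, int b, int c)
{
    a = a*b;
    return a/c;
}

[Figure: network for foo. Variables a, b, and c flow through an imul node group and an idiv node group, each with nodes eax, edx, ecx, and mem; edge labels 1 and -1]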

not quite right…


Modeling Spills and Moves

int foo(int a, int b, int c)
{
    a = a*b;
    return a/c;
}

[Figure: the network for foo extended with additional eax/edx/ecx/mem node groups before and after the imul and idiv groups, so that variables can move between registers and memory; edge labels 1, -1, and 3]


Modeling Stores

• Simple approach flawed
  – doesn’t model memory persistency

• Solution: antivariables
  – flow only through memory
  – eviction cost = store cost
  – evict only once


Register Allocation as MCNF

• Variables → Commodities

• Variable Usage → Network Design

• Nodes → Allocation Classes (Reg/Mem)

• Register Limits → Node Capacities

• Spill Costs → Edge Costs

• Variable Definition → Source

• Variable Last Use → Sink
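The mapping above can be sketched as a small data structure. This is an illustration only: the `Network` class, the `build_network` helper, the treatment of a variable's first appearance as its definition, and the move/load/store costs of 1/3/3 are assumptions made for the example, not details taken from the slides.

```python
from dataclasses import dataclass, field

CLASSES = ("eax", "edx", "ecx", "mem")  # hypothetical allocation classes
INF = float("inf")

@dataclass
class Network:
    commodities: dict = field(default_factory=dict)  # variable -> (source layer, sink layer)
    capacity: dict = field(default_factory=dict)     # (layer, class) -> variables it can hold
    edge_cost: dict = field(default_factory=dict)    # (from class, to class) -> cost

def build_network(block, move=1, load=3, store=3):
    """Build one layer of allocation-class nodes per instruction in `block`."""
    net = Network()
    for layer, (_, operands) in enumerate(block):
        for var in operands:
            # First appearance is the source, latest appearance is the sink.
            first, _ = net.commodities.get(var, (layer, layer))
            net.commodities[var] = (first, layer)
        for cls in CLASSES:
            # Each register holds one variable; memory is unbounded.
            net.capacity[(layer, cls)] = INF if cls == "mem" else 1
    for src in CLASSES:
        for dst in CLASSES:
            if src == dst:
                net.edge_cost[(src, dst)] = 0
            elif dst == "mem":
                net.edge_cost[(src, dst)] = store
            elif src == "mem":
                net.edge_cost[(src, dst)] = load
            else:
                net.edge_cost[(src, dst)] = move
    return net

# foo from the earlier slides, with a pseudo "entry" instruction defining the parameters.
block = [("entry", ("a", "b", "c")), ("imul", ("a", "b")), ("idiv", ("a", "c"))]
net = build_network(block)
```

Here `net.commodities` records where each variable enters and leaves the network, `net.capacity` carries the register limits, and `net.edge_cost` carries the spill and move costs.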


Solving an MCNF

• Integer solution NP-complete

• Use standard IP solvers
  – commercial solvers (CPLEX) are impressive

• Exploit structure of problem
  – variety of MCNF specific solvers
    • empirically faster than IP solvers
    • Lagrangian Relaxation technique


Lagrangian Relaxation: Intuition

• Relaxes the hard constraints
  – only have to solve single commodity flow

• Combines easy subproblems using a Lagrangian multiplier
  – an additional price on each edge

Example: edges have unit capacity

[Figure: variables a and b each choose between an edge of cost 0 and an edge of cost 1; adding a price of 1 to the cost-0 edge gives it cost 0+1]

With price, solution to single commodity flow can be solution to multicommodity flow
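The unit-capacity example can be checked in a few lines. This sketch assumes the two edges are named `reg` (cost 0) and `mem` (cost 1); the names and helper functions are illustrative and do not come from the slides.

```python
COST = {"reg": 0, "mem": 1}      # edge costs from the example
CAPACITY = {"reg": 1, "mem": 1}  # every edge has unit capacity

def best_edges(price):
    """Edges minimizing cost + price for a single variable, ties included."""
    priced = {e: COST[e] + price[e] for e in COST}
    low = min(priced.values())
    return sorted(e for e, c in priced.items() if c == low)

def is_feasible(assignment):
    """True when no edge carries more variables than its capacity."""
    used = list(assignment.values())
    return all(used.count(e) <= CAPACITY[e] for e in CAPACITY)

no_price = {"reg": 0, "mem": 0}
with_price = {"reg": 1, "mem": 0}  # price of 1 on the cost-0 edge

# Solved independently without prices, a and b both take the free edge.
unpriced = {v: best_edges(no_price)[0] for v in ("a", "b")}

# With the price, both edges tie at cost 1, so a split assignment is optimal
# for each single-variable subproblem and also respects the capacities.
split = {"a": "reg", "b": "mem"}
split_is_optimal_per_variable = all(split[v] in best_edges(with_price) for v in split)
```

Without the price, the independent solutions collide on one edge. With it, the split assignment is among the optimal answers to each subproblem, which is the sense in which a single commodity solution "can be" a multicommodity solution.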


Solution Procedure

• Compute prices using iterative subgradient optimization
  – converge to optimal prices

• At each iteration, greedily construct a feasible solution using current prices
  – allocate most expensive vars first
  – can always find an allocation
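The two steps above can be sketched on a toy problem. This is a deliberately simplified illustration, not the procedure from the talk: it assumes a single pool of identical registers with one shared price (rather than a price per edge), invented spill costs, and a 1/t step size.

```python
def relaxed_choice(spill_cost, price):
    """Single-variable subproblem: use a register only if its price beats spilling."""
    return "reg" if price < spill_cost else "mem"

def greedy_feasible(spill, price, num_regs):
    """Allocate the most expensive variables first; memory always has room."""
    free, alloc = num_regs, {}
    for var in sorted(spill, key=spill.get, reverse=True):
        if free > 0 and price <= spill[var]:
            alloc[var] = "reg"
            free -= 1
        else:
            alloc[var] = "mem"
    return alloc

def cost(alloc, spill):
    """Total spill cost of the variables placed in memory."""
    return sum(spill[v] for v, where in alloc.items() if where == "mem")

def solve(spill, num_regs, iterations=10):
    price, best = 0.0, None
    for t in range(1, iterations + 1):
        # Feasible solution from the current price; keep the best seen so far.
        alloc = greedy_feasible(spill, price, num_regs)
        if best is None or cost(alloc, spill) < cost(best, spill):
            best = alloc
        # Subgradient step: raise the price when registers are over-demanded.
        demand = sum(relaxed_choice(s, price) == "reg" for s in spill.values())
        price = max(0.0, price + (demand - num_regs) / t)
    return best, price

spill = {"a": 5, "b": 3, "c": 1}  # hypothetical spill costs
best, price = solve(spill, num_regs=2)
```

In this toy run the price rises until the cheapest variable no longer wants a register. A feasible allocation is available at every iteration because the greedy pass can always fall back on memory.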


Solution Procedure

• Advantages
  + have feasible solution at each step
  + iterative nature → progressive
  + Lagrangian relaxation theory provides means for computing a lower bound
  + can compute optimality bound

• Disadvantages
  – no guarantee of optimality of solution
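The lower bound and the optimality bound can be illustrated on a toy model in which each variable either takes one of `num_regs` identical registers at no cost or is spilled at its spill cost. The model, the spill costs, and the gap definition (relative to the feasible cost) are assumptions made for this sketch. By weak duality, the relaxed value at any non-negative price cannot exceed the cost of any feasible allocation.

```python
def lower_bound(spill, num_regs, price):
    """Lagrangian bound for the toy model at a given register price.

    Each variable independently pays min(price, spill cost); the price paid
    for the registers that really exist is then refunded.
    """
    relaxed = sum(min(price, s) for s in spill.values())
    return max(0.0, relaxed - price * num_regs)  # costs are never negative

def optimality_gap(feasible_cost, bound):
    """Fraction by which a feasible solution might exceed the optimum."""
    if feasible_cost == 0:
        return 0.0
    return (feasible_cost - bound) / feasible_cost

spill = {"a": 5, "b": 3, "c": 1}  # hypothetical spill costs
feasible_cost = 1                 # spill only c, keep a and b in two registers

bound_at_zero = lower_bound(spill, 2, 0)
bound_at_one = lower_bound(spill, 2, 1)
gap = optimality_gap(feasible_cost, bound_at_one)
```

At a price of 1 the bound meets the feasible cost, so the gap is zero and the allocation is proven optimal. At a price of 0 the bound is trivial and proves nothing, which is why better prices give tighter optimality bounds.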


Evaluation

• Replace gcc’s local allocator

• Optimize for code size
  – easy to statically evaluate

• Evaluate on MediaBench, MiBench, SpecInt95, SpecInt2000
  – consider only blocks where local allocation is interesting (enough variables to spill)


Behavior of Solver


Proven Optimality

[Figure: stacked bar chart (y-axis: 0% to 100% of blocks) after 1, 10, 100, and 1000 iterations, grouped by block size: 5-10 conflicts (355 blocks), 10-15 conflicts (23 blocks), 15-20 conflicts (7 blocks), >= 20 conflicts (5 blocks). Legend: Optimal, Within 5%, Within 10%, Within 15%, Within 20%, >25%]


Comprehensive Results

[Figure: bar chart, "Improvement over gcc" (y-axis: -15% to 20%), after 1, 10, 100, and 1000 iterations for four groups of blocks; surviving group labels: 5-10 conflicts (355 blocks) and >= 20 conflicts (5 blocks). Annotation: artifact of interaction with gcc]


Progressive Nature

:-(


Contributions

• New MCNF model for register allocation
  + expressive, can model irregular architectures
  + can be solved using conventional ILP solvers

• Progressive solution procedure
  + decent initial solution
  + maintains feasible solution
  + improves solution over time
  – no optimality guarantees
