Source: llvm.org/devmtg/2009-10/registerallocationfutureworks.pdf
Future Works in LLVM Register Allocation
Talk Overview
1. Introduction
2. Upcoming Changes
3. PBQP
Register Allocation in LLVM
PHI Elimination -> Two-Address -> Liveness -> Coalescing -> Allocator -> Rewriter
PHI Elimination: lower PHI-instructions to copies
PHI Lowering
Before:
BB1: %x1 = ...
BB2: %x2 = ...
BB3: %x3 = phi [%x1, BB1], [%x2, BB2]
After:
BB1: %x1 = ...
     %xP = %x1
BB2: %x2 = ...
     %xP = %x2
BB3: %x3 = %xP
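The PHI lowering step can be sketched in a few lines. This is a minimal illustration over a hypothetical mini-IR (tuples of destination, opcode, arguments), not LLVM's actual pass; the names `lower_phis` and `%xP0` are assumptions made for the example, and block terminators are left out.

```python
def lower_phis(blocks):
    """Lower phi instructions to copies.

    blocks: dict mapping block name -> list of (dest, op, args).
    A phi's args are (value, predecessor_block) pairs.
    """
    lowered = {name: [] for name in blocks}
    pending = {name: [] for name in blocks}  # copies to append to predecessors
    counter = 0
    for name, insts in blocks.items():
        for dest, op, args in insts:
            if op == "phi":
                tmp = f"%xP{counter}"  # fresh register shared by all incoming copies
                counter += 1
                for value, pred in args:
                    pending[pred].append((tmp, "copy", [value]))
                # The phi itself becomes a plain copy from the fresh register.
                lowered[name].append((dest, "copy", [tmp]))
            else:
                lowered[name].append((dest, op, args))
    # Copies go at the end of each predecessor (this mini-IR has no terminators).
    for name in blocks:
        lowered[name].extend(pending[name])
    return lowered


blocks = {
    "BB1": [("%x1", "def", [])],
    "BB2": [("%x2", "def", [])],
    "BB3": [("%x3", "phi", [("%x1", "BB1"), ("%x2", "BB2")])],
}
lowered = lower_phis(blocks)
```

With the three blocks from the slide, `lowered["BB3"]` holds a single copy from `%xP0`, and BB1 and BB2 each gain one copy into `%xP0`.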
Two-Address: lower three-address instructions
Two-Address Instructions
x3 = x2 + x1
becomes
x3 = x2
x3 += x1
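The two-address rewrite can be illustrated with a small helper. This is a sketch under assumptions: the tuple encoding, the `COMMUTATIVE` set and the function name are made up for the example and do not mirror LLVM's pass.

```python
COMMUTATIVE = {"add", "mul", "and", "or", "xor"}


def to_two_address(dest, op, lhs, rhs):
    """Rewrite 'dest = lhs op rhs' as a copy followed by 'dest op= rhs'."""
    if dest == rhs and dest != lhs:
        # Copying lhs into dest would clobber rhs; swap operands if the op allows it.
        if op not in COMMUTATIVE:
            raise NotImplementedError("needs a temporary; not covered by this sketch")
        lhs, rhs = rhs, lhs
    insts = []
    if dest != lhs:
        insts.append((dest, "copy", lhs))  # dest = lhs
    insts.append((dest, op + "=", rhs))    # dest op= rhs
    return insts


two_addr = to_two_address("x3", "add", "x2", "x1")
```

For the slide's `x3 = x2 + x1`, this yields a copy `x3 = x2` followed by `x3 += x1`. The copy it introduces is exactly the kind of instruction the later coalescing stage tries to remove.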
Liveness: construct live intervals
Live Intervals
BB: %x1 = ...
    .
    .
    .
    %x2 = %x1
    .
    .
    .
    ... = %x2
[Figure: live intervals for %x1 and %x2 drawn alongside the block]
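A live interval for a straight-line block can be computed in one pass. The sketch below assumes a single basic block in which every register is defined before it is used and nothing is live out; the `(defs, uses)` encoding and the function name are hypothetical.

```python
def live_intervals(block):
    """Return {register: (first_def_index, last_reference_index)}.

    block: list of (defs, uses) pairs, one per instruction.
    """
    intervals = {}
    for idx, (defs, uses) in enumerate(block):
        for reg in uses:
            start, _ = intervals[reg]  # assumes reg was defined earlier in the block
            intervals[reg] = (start, idx)
        for reg in defs:
            start = intervals[reg][0] if reg in intervals else idx
            intervals[reg] = (start, idx)
    return intervals


block = [
    (["%x1"], []),       # %x1 = ...
    ([], []),            # .
    (["%x2"], ["%x1"]),  # %x2 = %x1
    ([], []),            # .
    ([], ["%x2"]),       # ... = %x2
]
intervals = live_intervals(block)
```

Here `%x1` is live from instruction 0 to the copy at 2, and `%x2` from the copy to its use at 4, which matches the two intervals drawn on the slide.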
Coalescing: aggressively eliminate copies
Coalescing
Before (live intervals %x1 and %x2):
BB: %x1 = ...
    .
    .
    .
    %x2 = %x1
    .
    .
    .
    ... = %x2
After (single live interval %x1):
BB: %x1 = ...
    .
    .
    .
    .
    .
    .
    ... = %x1
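Copy coalescing on the slide's example can be sketched as follows. The sketch assumes one basic block, no live-out values and single definitions; it only merges a copy when the source dies at the copy, so the two intervals cannot overlap. The tuple encoding and `coalesce_copies` are illustrative names, not LLVM's coalescer.

```python
def coalesce_copies(block):
    """Drop 'dst = copy src' when src's last use is the copy itself.

    block: list of (dest, op, srcs); dest is None for pure uses.
    """
    last_use, def_count = {}, {}
    for idx, (dest, op, srcs) in enumerate(block):
        for src in srcs:
            last_use[src] = idx
        if dest is not None:
            def_count[dest] = def_count.get(dest, 0) + 1

    rename, out = {}, []
    for idx, (dest, op, srcs) in enumerate(block):
        if (op == "copy" and last_use[srcs[0]] == idx
                and def_count[dest] == 1 and def_count.get(srcs[0]) == 1):
            # Intervals are disjoint: merge dst into src and delete the copy.
            rename[dest] = rename.get(srcs[0], srcs[0])
            continue
        out.append((dest, op, [rename.get(s, s) for s in srcs]))
    return out


before = [("%x1", "def", []), ("%x2", "copy", ["%x1"]), (None, "use", ["%x2"])]
after = coalesce_copies(before)
```

On the slide's block the copy disappears and the final use reads `%x1` directly.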
Allocator: compute register assignment
Rewriter: apply register assignment
Improvements
• New and better optimizations
• New allocators
• Cleaner infrastructure
1. Optimizations
Rematerialization
vr1 = <expr>
. // stuff
... = vr1
. // more stuff
... = vr1
Spilled, every reference to vr1 becomes a memory reference [M]:
[M] = <expr>
. // stuff
... = [M]
. // more stuff
... = [M]
Rematerialized, <expr> is recomputed at each use instead:
. // stuff
... = <expr>
. // more stuff
... = <expr>
Splitting
[Figure: a live range crossing several loops. Spilling it outright introduces spills into the other loops; after splitting, one part is labelled "Register" and the other "Spill".]
2. New Allocators
New Allocators
• “Linear Scan” is not, in fact, linear
• We want something faster
• Priority coloring?
• Linear Scan?
• Need to tidy the infrastructure
3. Cleaner Infrastructure
Currently...
[Diagram: Liveness Analysis -> Allocator -> Rewriter; the allocator's result reaches the rewriter as a VirtualRegister Map]
Rewrite In Place
[Diagram: Liveness Analysis -> Allocator, connected by Live Intervals]
Modify code in-place
Live intervals kept up-to-date
Live intervals remain valid post-alloc
Upcoming Changes
• Live index renumbering
• Improved splitting
• Better def/kill tracking for values
Live Indexes
BB: .
    .
    %x2 = %x1
    add %x3, %x2
    .
    .
The six instructions are numbered 1 2 3 4 5 6. Inserting "%x3 = ..." between 3 and 4 leaves no number for the new instruction (?).
With ordered indexes instead of plain numbers:
BB: .
    .
    %x2 = %x1
    %x3 = ...
    add %x3, %x2
    .
    .
The original instructions hold P1 P2 P3 P4 P5 P6, with P1 ≺ P2 ≺ P3 ≺ P4 ≺ P5 ≺ P6.
The inserted instruction gets a new index P7, with P3 ≺ P7 ≺ P4.
Live Indexes
• unsigned ➙ LiveIndex ✓
• Index Renumbering ⌚
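Why an opaque index type helps with renumbering can be shown with a small sketch. The classes below are hypothetical stand-ins, not LLVM's `LiveIndex`: anything that holds a reference to an index object keeps a valid position when the underlying numbers are reassigned.

```python
class LiveIndex:
    """Opaque position in the instruction order (illustrative stand-in)."""

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        return self.value < other.value


class IndexList:
    def __init__(self, count):
        # Dense numbering 1..count, as on the slide: no room between neighbours.
        self.order = [LiveIndex(i) for i in range(1, count + 1)]

    def renumber(self, gap=4):
        # Mutate the existing objects so outside references stay valid.
        for i, idx in enumerate(self.order, start=1):
            idx.value = i * gap

    def _next_value(self, pos):
        if pos + 1 < len(self.order):
            return self.order[pos + 1].value
        return self.order[pos].value + 2

    def insert_after(self, idx):
        pos = next(i for i, x in enumerate(self.order) if x is idx)
        if self._next_value(pos) - idx.value < 2:
            self.renumber()  # no free number: spread the indexes out first
        new = LiveIndex((idx.value + self._next_value(pos)) // 2)
        self.order.insert(pos + 1, new)
        return new


indexes = IndexList(6)
p = list(indexes.order)                  # P1..P6
interval = (p[2], p[3])                  # something that refers to P3 and P4
p7 = indexes.insert_after(p[2])          # new instruction between P3 and P4
```

After the insertion `p[2] < p7 < p[3]` holds, and the pre-existing `interval` still refers to the same two positions even though their numeric values changed during renumbering.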
Improved Splitting
• Break multi-value intervals into component values.
• Each value gets a 2nd chance at allocation.
• ... but not a 3rd.
• 13% reduction in static memory references on test case (a pathological SSE kernel).
Better Def/Kill Tracking For Values
• Defined by a PHI - Track the def block.
• Killed by a PHI - Track the appropriate predecessor.
Future Work
• Better value def/kill tracking ✓
• LiveIndex renumbering ⌚
• Improved splitting ⌚
• Rewrite-in-place ⌚
• New allocators ⌚
PBQP
Partitioned Boolean Quadratic Problems
• Discrete optimization problems
• NP-complete
• Subclass solvable in linear time
Irregular Architectures
• Multiple register classes.
• Register aliasing.
• Register pairing.
• ...
PBQP Example
Nodes X, Y, Z, each with three options.
Node cost vectors:
X: (4, 3, 6)
Y: (7, 0, 2)
Z: (9, 0, 1)
Edge cost matrices (rows: options of the first node, columns: options of the second):
X-Y:
3 8 1
8 5 3
5 2 7
Y-Z:
5 2 2
6 7 3
1 4 8
X-Z:
1 9 3
4 5 3
9 7 6
Solution [3,2,1] (options chosen for X, Y, Z):
Node Costs: 6+0+9 = 15
Edge Costs: 2+6+9 = 17
Total: 32
Solution [1,2,3]: 19
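The cost of a PBQP solution is the sum of the selected node costs plus the selected entry of every edge matrix. The sketch below evaluates the slide's two solutions. Which vector belongs to which node, and which matrix to which edge, is an assumption read off the flattened slide; it is the reading that reproduces the slide's totals of 32 and 19.

```python
# Assumed layout: solution order is (X, Y, Z), options are 1-based as on the slide.
node_costs = {"X": [4, 3, 6], "Y": [7, 0, 2], "Z": [9, 0, 1]}
edge_costs = {
    ("X", "Y"): [[3, 8, 1], [8, 5, 3], [5, 2, 7]],
    ("Y", "Z"): [[5, 2, 2], [6, 7, 3], [1, 4, 8]],
    ("X", "Z"): [[1, 9, 3], [4, 5, 3], [9, 7, 6]],
}


def total_cost(solution):
    """solution maps node name -> chosen option (1-based)."""
    cost = sum(node_costs[n][solution[n] - 1] for n in node_costs)
    for (a, b), matrix in edge_costs.items():
        # Row is the first node's option, column the second node's option.
        cost += matrix[solution[a] - 1][solution[b] - 1]
    return cost


cost_321 = total_cost({"X": 3, "Y": 2, "Z": 1})
cost_123 = total_cost({"X": 1, "Y": 2, "Z": 3})
```

Under this reading `cost_321` is 32 (15 from nodes, 17 from edges) and `cost_123` is 19, as stated on the slide.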
PBQP Example
For Register Allocation:
Nodes represent virtual registers.
Options reflect storage locations.
Option costs: typically zero cost for registers, spill cost estimate for stack slot.
Edge costs: depend on the constraint.
Node cost vectors: Z (9, 0, 0), X (4, 0, 0), Y (7, 0, 0)
Each of the three edges carries the matrix:
0 0 0
0 ∞ 0
0 0 ∞
Example 1
Interference on a Regular Architecture
Nodes X and Y, options (Sp, r0, r1)
X: (4, 0, 0)
Y: (7, 0, 0)
Edge X-Y:
      Sp  r0  r1
Sp    0   0   0
r0    0   ∞   0
r1    0   0   ∞
Cost (Sp, Sp) = 11
Cost (r0, Sp) = 7
Cost (Sp, r0) = 4
Cost (r0, r1) = 0 ✓
Cost (r0, r0) = ∞ ✗
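The costs on this slide follow directly from the two node vectors and the interference matrix. A minimal sketch, assuming X carries spill cost 4 and Y spill cost 7 (the reading under which Cost (r0, Sp) = 7); the helper names are made up for the example.

```python
import math

INF = math.inf
OPTIONS = ["Sp", "r0", "r1"]  # Sp = spill to a stack slot


def interference_matrix(options):
    """Infinite cost when both nodes pick the same register; spilling never clashes."""
    return [[INF if (a == b and a != "Sp") else 0 for b in options] for a in options]


def pair_cost(x_costs, y_costs, matrix, x_opt, y_opt):
    i, j = OPTIONS.index(x_opt), OPTIONS.index(y_opt)
    return x_costs[i] + y_costs[j] + matrix[i][j]


X = [4, 0, 0]
Y = [7, 0, 0]
M = interference_matrix(OPTIONS)
costs = {
    ("Sp", "Sp"): pair_cost(X, Y, M, "Sp", "Sp"),
    ("r0", "Sp"): pair_cost(X, Y, M, "r0", "Sp"),
    ("Sp", "r0"): pair_cost(X, Y, M, "Sp", "r0"),
    ("r0", "r1"): pair_cost(X, Y, M, "r0", "r1"),
    ("r0", "r0"): pair_cost(X, Y, M, "r0", "r0"),
}
```

The finite-cost winner is any assignment that puts X and Y in different registers, while assigning both to `r0` is ruled out by the infinite entry.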
Example 2
Interference on an Irregular Architecture
      Sp  AL  AH  BL  CL
Sp    0   0   0   0   0
AX    0   ∞   ∞   0   0
BX    0   0   0   ∞   0
Example 3
Coalescing
      Sp  AX  BX
Sp    0   0   0
AX    0   -c  0
BX    0   0   -c
Example 4
Register Pairing (Ri, Ri+1)
      Sp  s0  s1  s2  s3
Sp    0   0   0   0   0
s0    0   ∞   0   ∞   ∞
s1    0   ∞   ∞   0   ∞
s2    0   ∞   ∞   ∞   0
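Each of these edge matrices can be generated from a simple rule. The builders below are a sketch with hypothetical names; the `OVERLAPS` table encodes the x86 fact that AX overlaps AL and AH (and BX overlaps BL and BH), and the pairing rule assumes register names of the form `s<number>`.

```python
import math

INF = math.inf
OVERLAPS = {"AX": {"AL", "AH"}, "BX": {"BL", "BH"}}


def aliasing_matrix(rows, cols, overlaps):
    """Example 2: infinite cost when the two registers are equal or overlap."""
    def clash(r, c):
        return r != "Sp" and (r == c or c in overlaps.get(r, set()))
    return [[INF if clash(r, c) else 0 for c in cols] for r in rows]


def coalescing_matrix(options, benefit):
    """Example 3: reward (negative cost) when both nodes pick the same register."""
    return [[-benefit if (a == b and a != "Sp") else 0 for b in options]
            for a in options]


def pairing_matrix(rows, cols):
    """Example 4: the second register must be the successor of the first."""
    def allowed(r, c):
        if r == "Sp" or c == "Sp":
            return True
        return int(c[1:]) == int(r[1:]) + 1
    return [[0 if allowed(r, c) else INF for c in cols] for r in rows]


ex2 = aliasing_matrix(["Sp", "AX", "BX"], ["Sp", "AL", "AH", "BL", "CL"], OVERLAPS)
ex3 = coalescing_matrix(["Sp", "AX", "BX"], benefit=5)
ex4 = pairing_matrix(["Sp", "s0", "s1", "s2"], ["Sp", "s0", "s1", "s2", "s3"])
```

The value 5 passed as `benefit` stands in for the slide's symbolic `c`. All three constraints share one representation, a matrix indexed by the two nodes' options, which is what makes the formulation attractive for irregular targets.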
The PBQP Allocator
llvm/lib/CodeGen/RegAllocPBQP.cpp
regalloc -> pbqp
solution = solve pbqp
solution -> allocation
How does it work?
Solver uses a graph reduction algorithm.
Reduce problem to the empty graph with reduction rules, then reconstruct it.
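The reduce-then-reconstruct idea can be illustrated on tree-shaped graphs. The sketch below is not the solver in RegAllocPBQP.cpp: it implements only the two simplest reductions (removing a node with no neighbours, and folding a node with one neighbour into that neighbour) and gives up on cycles. All names are hypothetical, and options are 0-based here.

```python
import math

INF = math.inf


def solve_tree_pbqp(node_costs, edge_costs):
    """Solve a PBQP whose graph is a forest; returns {node: option_index}."""
    costs = {n: list(v) for n, v in node_costs.items()}
    nbrs = {n: {} for n in costs}  # nbrs[n][m]: matrix with rows = n's options
    for (a, b), m in edge_costs.items():
        nbrs[a][b] = m
        nbrs[b][a] = [list(col) for col in zip(*m)]  # transposed view
    remaining, stack = list(costs), []
    while remaining:
        n = next((x for x in remaining if len(nbrs[x]) <= 1), None)
        if n is None:
            raise ValueError("cycle: higher-degree reductions are not sketched here")
        if nbrs[n]:
            m, mat = next(iter(nbrs[n].items()))
            choice = []
            for j in range(len(costs[m])):
                # Best option of n, given that m takes option j.
                best = min(range(len(costs[n])), key=lambda i: costs[n][i] + mat[i][j])
                choice.append(best)
                costs[m][j] += costs[n][best] + mat[best][j]
            del nbrs[m][n]
            stack.append((n, m, choice))
        else:
            best = min(range(len(costs[n])), key=lambda i: costs[n][i])
            stack.append((n, None, best))
        remaining.remove(n)
    # Reconstruct: walk the reductions backwards, resolving each deferred choice.
    solution = {}
    for n, m, choice in reversed(stack):
        solution[n] = choice if m is None else choice[solution[m]]
    return solution


def total_cost(node_costs, edge_costs, solution):
    cost = sum(node_costs[n][solution[n]] for n in node_costs)
    return cost + sum(m[solution[a]][solution[b]] for (a, b), m in edge_costs.items())


# Chain X - Y - Z with options (Sp, r0) and one register to share.
nodes = {"X": [4, 0], "Y": [7, 0], "Z": [2, 0]}
clash = [[0, 0], [0, INF]]
edges = {("X", "Y"): clash, ("Y", "Z"): clash}
solution = solve_tree_pbqp(nodes, edges)
```

In this chain Y keeps the register and X and Z are spilled, for a total cost of 6; spilling Y instead would cost 7. Graphs with cycles need further reduction rules that this sketch does not attempt.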
PBQP
PROS CONS
• Ideal for Irregularity
• Very Simple
• Reasonable quality
• Slooooooow
• Perfect opportunity for a coffee
• Improved Optimizations.
• New Allocators.
• Cleaner Architecture.
• PBQP.
the end.