Optimal Chain Rule Placement for Instruction Selection based on SSA Graphs
Stefan Schäfer, Bernhard Scholz
(stefans|scholz)@it.usyd.edu.au
School of IT, University of Sydney
Outline
● Related Work
● Motivation (Instruction Selection based on SSA Form)
● Chain Rule Placement
● Implementation
● Results
● Conclusion
Instruction Selection based on SSA Graphs (1)
[Figure: compiler pipeline. Source Program → Compiler Front End → Intermediate Representation in SSA Form → Compiler Back End → Target Program. Phases shown: Machine-Independent Optimisations, Code Selection, Instruction Scheduling, Register Allocation.]
Related Work (1)
● Tree Pattern Matching
C. Fraser, R. Henry, T. Proebsting. BURG – Fast Optimal Instruction Selection and Tree Parsing. ACM SIGPLAN Notices 27(4):68–76 (1992)
– Works fine with trees (expressions)
– Problem: control flow graphs are usually directed acyclic graphs
● Code Selection for DAGs
M. A. Ertl, Optimal Code Selection in DAGs, Proceedings of POPL 1999
– DAG matching is NP-complete.
Related Work (2)
● Code Selection based on SSA Graphs
E. Eckstein, O. König, B. Scholz. Code Instruction Selection based on SSA Graphs. SCOPES 2003, Volume 2826 of Lecture Notes in Computer Science
– Introduced a heuristic code selection technique for DAGs
– Cost-optimal derivation of a graph grammar for a given SSA graph
– Chain rules used for type conversion, but optimal placement unaddressed
– "Optimal" means: cost-minimal for a given cost metric
Instruction Selection based on SSA Graphs (2)
[Figure: control flow graph with basic blocks b1 to b12 and b14. The graph is annotated with weight pairs: [3014,1] at b1, and [3014,1], [293,1], [292,1], [1712,1], [1,1], [26,1], [59.2,1], [28,1], [25,1], [34,1], [124,1] and [94,1] elsewhere. b1 contains a cast; b11, b12 and b14 each contain an add.]
[Figure: SSA graph in which the cast in b1 is used by the add nodes in b14, b11 and b12. The cast is matched by sreg ::= cast(sreg) and each add by reg ::= add(reg,reg), so the cast produces an sreg while every add expects a reg.]
Grammar rules with [time, space] costs:
reg ::= add(reg, reg) [10.0,1.0]
sreg ::= cast(sreg) [10.0,1.0]
reg ::= sreg [10.0,1.0]
sreg ::= reg [10.0,1.0]
[Figure: the same control flow graph, with basic blocks b1 to b12 and b14, shown next to the placement table.]

| strategy | time | space | tradeoff 1:4 |
|---|---|---|---|
| def (b1) | 30140 | 1 | 6028.8 |
| uses (b11, b12, b14) | 19300 | 3 | 3862.4 |
| def/uses | 19300 | 1 | 3862.4 |
| optimal | 3510 | 1 | 704 |
| placed at | b5, b9, b10 | b1 | b5, b7 |
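The def and uses figures in the strategy table can be re-derived with a few lines of arithmetic. The sketch below assumes, as an inference from the numbers rather than something the slides state, that the 1:4 tradeoff is the weighted mean (1·time + 4·space)/5 and that the def/uses row takes the better value for each metric independently. The helper name `tradeoff` is hypothetical.

```python
# Hypothetical helper: weighted mean of time and space with weights 1:4.
# The formula is inferred from the table, not stated on the slides.
def tradeoff(time, space, w_time=1, w_space=4):
    return (w_time * time + w_space * space) / (w_time + w_space)

# (time, space) totals taken from the table.
strategies = {
    "def": (30140, 1),    # one chain rule at the definition in b1
    "uses": (19300, 3),   # one chain rule at each use in b11, b12, b14
}

# Extend each entry to (time, space, tradeoff).
costs = {name: (t, s, tradeoff(t, s)) for name, (t, s) in strategies.items()}

# def/uses: take the cheaper of the two strategies separately per metric.
def_uses = tuple(min(c[i] for c in costs.values()) for i in range(3))

for name, c in costs.items():
    print(name, c)
print("def/uses", def_uses)
```

Under these assumptions the computed values agree with the def, uses and def/uses rows; the optimal row comes from the min-cut placement and is not reproduced here.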
SSA Form
● Static Single Assignment form
● There is at most one assignment to each variable.
● Each definition of a variable is distinct.
● Multiple definitions have to be resolved:
– if (e) b=32 else b=42; → if (e) b1=32 else b2=42;
● Further uses induce φ-functions:
– a=b; → a=φ(b1,b2);
● SSA graphs as intermediate data flow representation in SSA form
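The renaming on this slide can be mimicked with a toy sketch. It hard-codes the if/else shape of the example and is not a general SSA construction algorithm; the function name `rename_branches` is hypothetical.

```python
from itertools import count

def rename_branches(var, then_value, else_value):
    """SSA statements for: if (e) var=then_value else var=else_value; a=var;"""
    fresh = count(1)                      # source of fresh version numbers
    then_name = f"{var}{next(fresh)}"     # first definition, e.g. b1
    else_name = f"{var}{next(fresh)}"     # second definition, e.g. b2
    return [
        f"if (e) {then_name}={then_value} else {else_name}={else_value};",
        # The use after the merge point cannot name a single definition,
        # so a φ-function selects whichever version reached it.
        f"a=φ({then_name},{else_name});",
    ]

statements = rename_branches("b", 32, 42)
for stmt in statements:
    print(stmt)
```

Each definition receives its own name, which is what makes every definition distinct, and the φ-function appears only where the two versions meet.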
Chain Rule Placement
● Map the CFG to a network
● Reduce the network for each definition and nonterminal (a definition node dominates all of its users)
● Find a minimum cut for each reduced network
Mapping to a Network
[Figure: the CFG nodes d, u and v, each with cost 10, are mapped to a network. Every node is split into two nodes (dn/dx, un/ux, vn/vx) joined by an edge of capacity 10; the remaining edges have capacity ∞. The network also contains an additional node whose label reads "tnt d".]
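The node-splitting idea can be sketched as a small max-flow/min-cut computation. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes CFG edges d→u and d→v with u and v as the users, attaches a hypothetical `source` to the definition and a hypothetical `sink` to every user, and uses a plain Edmonds-Karp search. The function name `min_node_cut` is made up for this example.

```python
from collections import deque, defaultdict

INF = float("inf")

def min_node_cut(cost, edges, definition, users):
    """Cheapest set of CFG nodes separating `definition` from all `users`."""
    cap = defaultdict(lambda: defaultdict(float))
    for x, c in cost.items():
        cap[(x, "n")][(x, "x")] = c        # split node: finite capacity = cost
    for a, b in edges:
        cap[(a, "x")][(b, "n")] = INF      # CFG edge: infinite, never cut
    s, t = "source", "sink"                # assumed terminals
    cap[s][(definition, "n")] = INF
    for u in users:
        cap[(u, "x")][t] = INF

    def search():
        """BFS over edges with residual capacity; returns the parent map."""
        parent = {s: None}
        queue = deque([s])
        while queue:
            a = queue.popleft()
            for b, c in list(cap[a].items()):
                if c > 0 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        return parent

    total = 0.0
    while True:
        parent = search()
        if t not in parent:
            break
        # Walk back from the sink to collect the augmenting path.
        path, b = [], t
        while parent[b] is not None:
            path.append((parent[b], b))
            b = parent[b]
        flow = min(cap[a][b] for a, b in path)
        for a, b in path:
            cap[a][b] -= flow              # use up forward capacity
            cap[b][a] += flow              # open residual capacity
        total += flow

    # A node is cut if its entry half is still reachable but its exit is not.
    cut = sorted(x for x in cost if (x, "n") in parent and (x, "x") not in parent)
    return total, cut

# Costs from the slide; the edges and the user set are assumptions.
total, cut = min_node_cut({"d": 10, "u": 10, "v": 10},
                          [("d", "u"), ("d", "v")], "d", ["u", "v"])
print(total, cut)
```

Because only the split edges have finite capacity, a minimum cut can only sever nodes, so the cut corresponds to a set of blocks in which to place the chain rule. In this assumed example a single placement at d (cost 10) is cheaper than one at each of u and v (cost 20).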
Reducing each Network
● Done for each definition d and nonterminal
● Starts in each user u:
● Case 1: u is not a φ-node
– All nodes on all acyclic paths from d to u are dominated by d
– All those nodes are added to the reduced network
● Case 2: u is a φ-node and all v ∈ preds(u) are dominated by d
– All nodes on all acyclic paths from d to v are dominated by d
– All those nodes and u are added to the reduced network
[Figure: example CFG with nodes r, v1 and v2, the definitions w1 = op(...) and w2 = op'(...), and the user u = φ(..., w1, ..., w2, ...).]
● Case 3: u is a φ-node and some v ∈ preds(u) is not dominated by d
– Stop traversal for all users of d and add only d to the reduced network
– Not cost-optimal, but does not occur very often: with 2264628 nodes and 94183 φ-uses, case 3 occurs 1076 times
[Figure: example CFG with nodes r, x1, x2 and y, the definitions d1 = op(...) and d2 = op'(...), and the user u = φ(..., d1, ..., d2, ...).]
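The three cases can be told apart with a simple dominance test. The sketch below assumes that dominator sets have already been computed and are given as a mapping from each node to the set of nodes that dominate it; the function name `classify_user` and the two small example graphs are hypothetical.

```python
def classify_user(u, d, phi_nodes, preds, dominators):
    """Return 1, 2 or 3 for user `u` of definition `d`.

    dominators[x] is the set of nodes dominating x (including x itself).
    """
    if u not in phi_nodes:
        return 1                           # Case 1: ordinary user
    if all(d in dominators[v] for v in preds[u]):
        return 2                           # Case 2: d dominates every predecessor
    return 3                               # Case 3: some predecessor escapes d

# Hypothetical diamond: r -> d -> {v1, v2} -> u
dom_a = {"r": {"r"}, "d": {"r", "d"},
         "v1": {"r", "d", "v1"}, "v2": {"r", "d", "v2"},
         "u": {"r", "d", "u"}}
case_a = classify_user("u", "d", {"u"}, {"u": ["v1", "v2"]}, dom_a)

# Hypothetical graph where x2 is reached from r without passing through d.
dom_b = {"r": {"r"}, "d": {"r", "d"},
         "x1": {"r", "d", "x1"}, "x2": {"r", "x2"},
         "u": {"r", "u"}}
case_b = classify_user("u", "d", {"u"}, {"u": ["x1", "x2"]}, dom_b)

print(case_a, case_b)
```

In case 3 the reduction falls back to placing the chain rule at d, which matches the slide's remark that this fallback is not cost-optimal.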
Implementation
[Figure: tool flow. Boxes: Graph Grammar; Code Generator Generator; Source for Code Generator in L; Code Base in L; PBQP Library for L; Compiler for L; Code Generator in L. Run phase: Input Program in SSA Form; Base Rule Matching; Chain Rule Placement; Complete Matching.]
Costs (Spec2000, Time:Space 1:4)
[Figure: bar chart of cost in % (0 to 100) per benchmark for the strategies Use, Def, Def-Use and Min-Cut. Benchmarks: 168.wupwise, 171.swim, 172.mgrid, 173.applu, 175.vpr, 176.gcc, 177.mesa, 179.art, 181.mcf, 183.equake, 186.crafty, 188.ammp, 197.parser, 200.sixtrack, 252.eon, 254.gap, 255.vortex, 256.bzip2, 300.twolf, 301.apsi.]
Costs (MiBench, Time:Space 1:4)
[Figure: bar chart of cost in % (0% to 100%) per benchmark for the strategies Use, Def, Def-Use and Min-Cut. Benchmarks: bitcnts, cjpeg, crc, dijkstra, djpeg, fft, gs, ispell, lout, patricia, pgp, qsort, rawcaudio, rawdaudio, rijndael, search, sha, susan, tiff2bw, tiff2rgba, tiffdither, tiffmedian, toast, untoast.]
Execution Times (Spec2000)
[Figure: bar chart of the share of execution time in % (0% to 100%) per benchmark, split into Misc, Min Cut, Network, PBQP and Program. Benchmarks: 168.wupwise, 171.swim, 172.mgrid, 173.applu, 175.vpr, 176.gcc, 177.mesa, 179.art, 181.mcf, 183.equake, 186.crafty, 188.ammp, 197.parser, 200.sixtrack, 252.eon, 254.gap, 255.vortex, 256.bzip2, 300.twolf, 301.apsi.]
Execution Times (MiBench)
[Figure: bar chart of the share of execution time in % (0% to 100%) per benchmark, split into Misc, Min Cut, Network, PBQP and Program. Benchmarks: bitcnts, cjpeg, crc, dijkstra, djpeg, fft, gs, ispell, lout, patricia, pgp, qsort, rawcaudio, rawdaudio, rijndael, search, sha, susan, tiff2bw, tiff2rgba, tiffdither, tiffmedian, toast, untoast.]
Contributions
● Contributed to code selection based on SSA graphs
● Main Contributions:
– Formally addressed the unsolved problem of placing chain rules optimally
– Introduced an efficient and effective algorithm to place chain rules optimally with respect to an arbitrary cost metric
– Implemented a free, open-source code generator generator, enhancing rule matching with chain rule placement
– Proved the correctness of our algorithm
– Conducted experiments with Spec2000 and MiBench suites
Thank you for your attention!
Any questions or comments?