Optimal Chain Rule Placement for Instruction Selection based on SSA Graphs

Stefan Schäfer, Bernhard Scholz
(stefans|scholz)@it.usyd.edu.au

School of IT, University of Sydney

Outline

● Related Work

● Motivation (Instruction Selection based on SSA Form)

● Chain Rule Placement

● Implementation

● Results

● Conclusion

Instruction Selection based on SSA Graphs (1)

[Figure: compiler pipeline. Source Program → Compiler Front End → Intermediate Representation in SSA Form → Compiler Back End → Target Program. Phases shown: Machine-Independent Optimisations, Code Selection, Instruction Scheduling, Register Allocation.]

Related Work (1)

● Tree Pattern Matching

C. Fraser, R. Henry, T. Proebsting, BURG – Fast Optimal Instruction Selection and Tree Parsing, ACM SIGPLAN Notices 27(4):68-76 (1992)

– Works fine with trees (expressions)
– Problem: control flow graphs are usually directed acyclic graphs

● Code Selection for DAGs

M. A. Ertl, Optimal Code Selection in DAGs, Proceedings of POPL 1999

– DAG matching is NP-complete.

Related Work (2)

● Code Selection based on SSA Graphs

E. Eckstein, O. König, B. Scholz, Code Instruction Selection based on SSA Graphs, SCOPES 2003, Volume 2826 of Lecture Notes in Computer Science

– Introduced a heuristic code selection technique for DAGs
– Cost-optimal derivation of a graph grammar for a given SSA graph
– Chain rules are used for type conversion, but their optimal placement is unaddressed
– "Optimal" means cost-minimal for a given cost metric

Instruction Selection based on SSA Graphs (2)

[Figure: control flow graph with basic blocks b1 to b12 and b14, each annotated with a pair such as [3014,1], [293,1], [292,1], [1712,1], [1,1], [26,1], [59.2,1], [28,1], [25,1], [34,1], [124,1], [94,1]. Block b1 contains a cast; blocks b11, b12, and b14 each contain an add.]

Grammar rules (with costs):

reg ::= add(reg, reg) [10.0,1.0]
sreg ::= cast(sreg) [10.0,1.0]
reg ::= sreg [10.0,1.0]
sreg ::= reg [10.0,1.0]

Matching: the cast in b1 is covered by sreg ::= cast(sreg) and yields sreg; the adds in b11, b12, and b14 are covered by reg ::= add(reg,reg) and expect reg.

strategy               time          space   tradeoff 1:4
def (b1)               30140         1       6028.8
uses (b11, b12, b14)   19300         3       3862.4
def/uses               19300         1       3862.4
optimal                3510          1       704
  placed at            b5, b9, b10   b1      b5, b7
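The "tradeoff 1:4" column of the strategy comparison is consistent with a weighted mean of the time and space costs, with weight 1 on time and weight 4 on space. The slides do not state this formula; it is inferred from the numbers, and the function name `combined_cost` is hypothetical. A minimal Python sketch under that assumption:

```python
def combined_cost(time, space, time_weight=1, space_weight=4):
    """Weighted mean of time and space cost.

    The 1:4 weighting is an assumption inferred from the slide's table,
    not a formula given by the authors.
    """
    total_weight = time_weight + space_weight
    return (time_weight * time + space_weight * space) / total_weight


# (time, space) pairs taken from the "def" and "uses" rows of the table.
strategies = {
    "def (b1)": (30140, 1),
    "uses (b11, b12, b14)": (19300, 3),
}

tradeoff = {name: combined_cost(t, s) for name, (t, s) in strategies.items()}

# Assumption: the def/uses strategy takes the cheaper of the two placements
# under whichever metric is being optimised.
def_uses = min(tradeoff.values())

for name, value in tradeoff.items():
    print(f"{name}: {value:.1f}")
print(f"def/uses: {def_uses:.1f}")
```

Under this reading, placing the chain rule at the definition costs (30140 + 4·1)/5 and placing it at the three uses costs (19300 + 4·3)/5, which matches the first two rows.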


SSA Form

● Static Single Assignment form

● There is at most one assignment to each variable.

● Each definition of a variable is distinct.

● Multiple definitions have to be resolved:

– if (e) b=32 else b=42;  ->  if (e) b1=32 else b2=42;

● Further uses induce φ-functions:

– a=b;  ->  a=φ(b1,b2);

● SSA graphs as intermediate data flow representation in SSA form
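The renaming shown on this slide can be reproduced with a toy Python sketch. It handles only a single if/else followed by a join, and the function name `ssa_if_else` and the tuple representation of statements are hypothetical choices for illustration; it is not a general SSA construction algorithm.

```python
def ssa_if_else(then_stmts, else_stmts, join_stmts):
    """Rename a single if/else plus join into SSA form (toy sketch).

    Each statement is a (target, source) pair, where source is a
    variable name or a constant.
    """
    counters = {}

    def fresh(var):
        # Every definition gets a new, distinct version of the variable.
        counters[var] = counters.get(var, 0) + 1
        return f"{var}{counters[var]}"

    def rename(stmts):
        env, out = {}, []
        for target, source in stmts:
            value = env.get(source, source)  # constants pass through unchanged
            env[target] = fresh(target)
            out.append(f"{env[target]} = {value}")
        return env, out

    then_env, then_out = rename(then_stmts)
    else_env, else_out = rename(else_stmts)

    join_out = []
    for target, source in join_stmts:
        in_then, in_else = source in then_env, source in else_env
        if in_then and in_else:
            # Two definitions reach the use: merge them with a phi-function.
            value = f"φ({then_env[source]}, {else_env[source]})"
        elif not in_then and not in_else:
            value = source
        else:
            raise NotImplementedError("one-sided definitions are out of scope")
        join_out.append(f"{fresh(target)} = {value}")
    return then_out, else_out, join_out


then_out, else_out, join_out = ssa_if_else(
    then_stmts=[("b", 32)],
    else_stmts=[("b", 42)],
    join_stmts=[("a", "b")],
)
print(then_out, else_out, join_out)
```

Unlike the slide, the sketch also gives the target `a` a version number, since in SSA form every definition receives its own name.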

Chain Rule Placement

● Map the CFG to a network

● Reduce the network for each definition and nonterminal (a definition node dominates all of its users)

● Find a minimum cut for each reduced network

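Mapping to a Network

[Figure: a CFG with nodes d, u, and v, each with cost 10, and the corresponding network. Each CFG node is split into two nodes (dn/dx, un/ux, vn/vx) joined by an edge of capacity 10; the other edges shown carry capacity ∞. An additional node labelled t, indexed by nt and d, also appears.]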
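The node-splitting construction on this slide can be sketched in Python as follows. Each block x becomes an edge (x, "n") → (x, "x") whose capacity is the cost of placing the chain rule in x, and all other edges get infinite capacity, so a minimum cut can only sever block edges. The slides do not say which min-cut algorithm is used; Edmonds-Karp is chosen here for brevity. The function name, the single artificial sink, and the example CFG are assumptions made for illustration.

```python
from collections import deque

INF = float("inf")


def min_cut_placement(edges, cost, d, users):
    """Return (cut cost, blocks in the cut) separating definition d from its users."""
    cap = {}

    def add_edge(a, b, c):
        cap.setdefault(a, {})[b] = cap.get(a, {}).get(b, 0) + c
        cap.setdefault(b, {}).setdefault(a, 0)  # residual (reverse) edge

    for block, c in cost.items():
        add_edge((block, "n"), (block, "x"), c)   # cutting here = placing in block
    for a, b in edges:
        add_edge((a, "x"), (b, "n"), INF)         # CFG edges cannot be cut
    sink = ("t", "sink")                          # hypothetical common sink
    for u in users:
        add_edge((u, "x"), sink, INF)
    source = (d, "n")

    def reachable():
        # Breadth-first search over edges with remaining capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            a = queue.popleft()
            for b, c in cap[a].items():
                if c > 0 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        return parent

    flow = 0
    while True:
        parent = reachable()
        if sink not in parent:
            break
        path, b = [], sink
        while parent[b] is not None:
            path.append((parent[b], b))
            b = parent[b]
        push = min(cap[a][b] for a, b in path)
        for a, b in path:
            cap[a][b] -= push
            cap[b][a] += push
        flow += push

    # A block is in the cut if its entry is still reachable but its exit is not.
    cut = sorted(x for x in cost if (x, "n") in parent and (x, "x") not in parent)
    return flow, cut


# Hypothetical CFG: d flows through m to the two users u and v.
cfg_edges = [("d", "m"), ("m", "u"), ("m", "v")]
cheap_middle = min_cut_placement(cfg_edges, {"d": 10, "m": 3, "u": 10, "v": 10}, "d", ["u", "v"])
costly_middle = min_cut_placement(cfg_edges, {"d": 10, "m": 30, "u": 10, "v": 10}, "d", ["u", "v"])
print(cheap_middle, costly_middle)
```

In the first call the cheapest placement is the intermediate block m; in the second, where m is expensive, placing the chain rule at the definition d is cheaper than placing it at both users.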

Reducing each Network

● Done for each definition d and nonterminal

● Starts in each user u:

● Case 1: u is not a φ-node

– All nodes on all acyclic paths from d to u are dominated by d
– All those nodes are added to the reduced network

● Case 2: u is a φ-node, and all v ∈ preds(u) are dominated by d

– All nodes on all acyclic paths from d to v are dominated by d
– All those nodes and u are added to the reduced network

[Figure: CFG with root r, definitions w1 = op (...) and w2 = op' (...), predecessor blocks v1 and v2, and the φ-node u = φ(..., w1, ..., w2 ...).]

● Case 3: u is a φ-node, and some v ∈ preds(u) is not dominated by d

– Stop traversal for all users of d and add only d to the reduced network
– Not cost-optimal, but does not occur very often: 2264628 nodes, 94183 φ-uses, case 3 occurs 1076 times

[Figure: CFG with root r, definitions d1 = op (...) and d2 = op' (...), blocks x1, x2, and y, and the φ-node u = φ(..., d1, ..., d2 ...).]
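One way to realise the traversal for Case 1 is to walk predecessors backwards from the user and stop at the definition. This is a simplified Python sketch, not the authors' implementation: it assumes that d dominates every start block (so the walk never escapes above d), and the function name and the predecessor-map representation are hypothetical. For a φ-node user as in Case 2, the walk would start from the relevant predecessor blocks v instead of from u.

```python
def reduced_network_blocks(preds, d, start_blocks):
    """Collect the blocks lying between definition d and the start blocks.

    preds maps each block to a list of its CFG predecessors.
    Assumes d dominates every block in start_blocks.
    """
    seen = {d}                        # never walk above the definition
    stack = [b for b in start_blocks if b != d]
    while stack:
        block = stack.pop()
        if block in seen:
            continue
        seen.add(block)
        stack.extend(p for p in preds.get(block, ()) if p not in seen)
    return seen


# Hypothetical CFG: entry -> d -> {a, b} -> u -> x
preds = {
    "d": ["entry"],
    "a": ["d"],
    "b": ["d"],
    "u": ["a", "b"],
    "x": ["u"],
}
blocks = reduced_network_blocks(preds, "d", ["u"])
print(sorted(blocks))
```

Blocks above the definition (entry) and below the user (x) are left out, so the min-cut only has to consider blocks where a chain rule placement could actually separate d from u.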

Implementation

[Figure: tool flow. A graph grammar is fed to the code generator generator, which emits the source for a code generator in L. That source, a code base in L, and a PBQP library for L are compiled by a compiler for L into the code generator in L. Running it on an input program in SSA form involves the stages Base Rule Matching, Chain Rule Placement, and Complete Matching.]

Costs (Spec2000, Time:Space 1:4)

[Figure: chart of costs in % (axis 0 to 100) per benchmark: 168.wupwise, 171.swim, 172.mgrid, 173.applu, 175.vpr, 176.gcc, 177.mesa, 179.art, 181.mcf, 183.equake, 186.crafty, 188.ammp, 197.parser, 200.sixtrack, 252.eon, 254.gap, 255.vortex, 256.bzip2, 300.twolf, 301.apsi. Series: Use, Def, Def-Use, Min-Cut.]

Costs (MiBench, Time:Space 1:4)

[Figure: chart of costs in % (axis 0% to 100%) per benchmark: bitcnts, cjpeg, crc, dijkstra, djpeg, fft, gs, ispell, lout, patricia, pgp, qsort, rawcaudio, rawdaudio, rijndael, search, sha, susan, tiff2bw, tiff2rgba, tiffdither, tiffmedian, toast, untoast. Series: Use, Def, Def-Use, Min-Cut.]

Execution Times (Spec2000)

[Figure: chart of % time (axis 0% to 100%) per benchmark: 168.wupwise, 171.swim, 172.mgrid, 173.applu, 175.vpr, 176.gcc, 177.mesa, 179.art, 181.mcf, 183.equake, 186.crafty, 188.ammp, 197.parser, 200.sixtrack, 252.eon, 254.gap, 255.vortex, 256.bzip2, 300.twolf, 301.apsi. Series: Misc, Min Cut, Network, PBQP, Program.]

Execution Times (MiBench)

[Figure: chart of % time (axis 0% to 100%) per benchmark: bitcnts, cjpeg, crc, dijkstra, djpeg, fft, gs, ispell, lout, patricia, pgp, qsort, rawcaudio, rawdaudio, rijndael, search, sha, susan, tiff2bw, tiff2rgba, tiffdither, tiffmedian, toast, untoast. Series: Misc, Min Cut, Network, PBQP, Program.]

Contributions

● Contributed to code selection based on SSA graphs

● Main contributions:

– Formally addressed the unsolved problem of placing chain rules optimally
– Introduced an efficient and effective algorithm to place chain rules optimally with respect to an arbitrary cost metric
– Implemented a free, open-source code generator generator, enhancing rule matching with chain rule placement
– Proved the correctness of our algorithm
– Conducted experiments with the Spec2000 and MiBench suites

Thank you for your attention!

Any questions or comments?
