Fast Points-to Analysis for Languages with Structured Types
Michael Jung and Sorin A. Huss
Integrated Circuits and Systems Lab.
Department of Computer Science
Technische Universität Darmstadt, Germany
Outline
A points-to analysis is applied to answer questions like: "Which variables might be accessed by the expression *a->b?"
• Motivation – Why is this of benefit to know?
• Code Motion
• Partial Evaluation
• Design decisions for a points-to analysis
• Concepts of Steensgaard's points-to analyses
• Differences between proposed and original PTA
• Results
Motivation – Code Motion
...
*a = b * c;
d = *e + f;

Straightforward instruction sequence:
MULT r1, b, c
STORE r1, (a)
LOAD r2, (e)
ADD d, r2, f
• RAW hazard on r2
• LOAD is multi-cycle
• Pipeline stalled

Optimize CPU utilization by code motion (LOAD hoisted above MULT):
LOAD r2, (e)
MULT r1, b, c
STORE r1, (a)
ADD d, r2, f

a = &g; e = &g;
*a = b * c;
d = *e + f;
→ RAW violation
• If the sets of locations a and e may point to are disjoint → optimization is correct
• Points-to analysis computes conservative approximation of these sets
• ANSI C (C99) restrict type qualifier:
  void f(int n, int * restrict p, int * restrict q) {
    while (n--) *p++ = *q++;
  }
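The disjointness test from the bullets above can be sketched in C. This is a minimal illustration, not the analysis itself: it assumes points-to sets are already computed and encodes them as bitsets with one bit per abstract location. The names pts_set, pts_of, may_alias and can_hoist_load, and the second location h, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding: a points-to set is a bitset, one bit per location. */
typedef unsigned pts_set;
enum { LOC_G = 0, LOC_H = 1 };   /* indices of the abstract locations g and h */

/* Singleton points-to set for one location index. */
static pts_set pts_of(int loc) {
    assert(loc >= 0 && loc < 32);
    return 1u << loc;
}

/* Two pointers may alias iff their points-to sets share a location. */
static bool may_alias(pts_set p, pts_set q) {
    return (p & q) != 0;
}

/* A load through load_ptr may be moved above a store through store_ptr
 * only if the (conservative) sets prove the two cannot alias. */
static bool can_hoist_load(pts_set store_ptr, pts_set load_ptr) {
    return !may_alias(store_ptr, load_ptr);
}
```

With a = &g; e = &g; both sets are {g}, so can_hoist_load(pts_of(LOC_G), pts_of(LOC_G)) is false and the LOAD must stay below the STORE. If e could only point to h, the sets would be disjoint and the motion would be allowed.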
Motivation – Partial Evaluation
int power(int x, unsigned n) {
  int r = 1;
  while (n) {
    if (n & 1) r = r * x;
    x = x * x;
    n = n >> 1;
  }
  return r;
}

x: dynamic
n: static

#include <stdio.h>

/* Generator: emits a version of power() specialized for the static n. */
void power_gen(unsigned n) {
  printf("int power(int x, unsigned n) {\n");
  printf("  int r=1;\n");
  printf("  assert(n==%u);\n", n);
  while (n) {
    if (n & 1) printf("  r=r*x;\n");
    printf("  x=x*x;\n");
    n = n >> 1;
  }
  printf("  return r;\n}");
}

n = 3

int power(int x, unsigned n) {
  int r=1;
  assert(n==3);
  r=r*x;
  x=x*x;
  r=r*x;
  x=x*x;
  return r;
}
Relation to Points-to Analysis:
Can an expression *y be evaluated statically?

int *p, a, b;
if (a) {
  p = &a;
} else {
  p = &b;
}
b = *p;

int *p, a, b;
if (a) {
  p = &a;
  f(p);
} else {
  p = &b;
  f(p);
}
b = *p;
Storage Shape Graphs
• PTA computes a storage shape graph
  • undecidable → conservative approximation
• Tradeoff between efficiency and accuracy
  • flow sensitivity
  • context sensitivity
  • complexity of storage shape graph
[Figure: storage shape graphs for p. p with separate nodes a and b: PT(*p)={a,b}; p with node b: PT(*p)={b}; p with node a: PT(*p)={a}; p with a single merged node a,b: PT(*p)={a,b}]
• Steensgaard's first algorithm [1]:
  • flow insensitive
  • context insensitive
  • graph complexity O(n)
  • almost linear time complexity
  • 80/20 rule: 80% benefit, 20% cost [2]
[1] B. Steensgaard. Points-to analysis in almost linear time. In Symposium on Principles of Programming Languages, 1996.
[2] M. Hind and A. Pioli. Which pointer analysis should I use? In International Symposium on Software Testing and Analysis, 2000.
Concepts of Steensgaard's PTA
• Keeping the storage shape graph O(N)
  a = &d;
  [Figure: nodes before and after the joins triggered by a = &d: (a, b, c; d, e) → (a, {b,d}, c; e) → (a, {b,d}, {c,e})]
  Disjoint-set forests [3] ⇒ Join is O(α(N,N)) ⇒ Analysis is O(N α(N,N))
[3] R. Tarjan. Efficiency of a good but not linear set union algorithm. Journal of the ACM, 1975.
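A minimal C sketch of this idea, assuming a plain unification-based graph in which every equivalence class has at most one outgoing points-to edge. It is an illustration under those assumptions, not the authors' implementation; all identifiers (new_node, find, join, assign_addr, points_to) are hypothetical. Classes are kept in a disjoint-set forest with union by rank and path halving, and joining two classes also joins their pointees.

```c
#include <assert.h>

enum { MAX_NODES = 16, NIL = -1 };

static int parent[MAX_NODES];   /* disjoint-set forest */
static int rnk[MAX_NODES];      /* union-by-rank bookkeeping */
static int pointee[MAX_NODES];  /* per class representative; NIL if none */
static int node_count;

static int new_node(void) {
    assert(node_count < MAX_NODES);
    int n = node_count++;
    parent[n] = n; rnk[n] = 0; pointee[n] = NIL;
    return n;
}

/* Representative of x's class, with path halving. */
static int find(int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* Merge the classes of x and y. If both have a pointee, the pointees are
 * merged as well, so each class keeps at most one outgoing edge. */
static int join(int x, int y) {
    x = find(x); y = find(y);
    if (x == y) return x;
    if (rnk[x] < rnk[y]) { int t = x; x = y; y = t; }
    if (rnk[x] == rnk[y]) rnk[x]++;
    parent[y] = x;
    int px = pointee[x], py = pointee[y];
    if (px == NIL) {
        pointee[x] = py;
    } else if (py != NIL) {
        int r = join(px, py);
        pointee[find(x)] = find(r);
    }
    return find(x);
}

/* x = &y: y becomes (part of) the class that x points to. */
static void assign_addr(int x, int y) {
    x = find(x);
    if (pointee[x] == NIL) pointee[x] = find(y);
    else join(pointee[x], y);
}

/* Class that x points to, or NIL. */
static int points_to(int x) {
    int p = pointee[find(x)];
    return p == NIL ? NIL : find(p);
}
```

Starting from a → b → c and d → e, a call assign_addr(a, d) joins b with d and, recursively, c with e, which matches the node sets listed for the figure above.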
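• Data flow direction (Pending Joins)
  a = &b;
  a = c;
  c = &d;
  [Figure: graph for a = &b; a = c; with nodes a, b, c, where c points to NIL and a Join( ) is left pending; after c = &d; the nodes b and d are joined into {b,d}]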
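The pending-join idea can be sketched in C as follows. This is a simplified reading, not Steensgaard's exact formulation: a copy x = y whose source y points to nothing yet is recorded on y and resolved once y receives a pointee. All identifiers are hypothetical, and the forest omits union by rank for brevity.

```c
#include <assert.h>

enum { MAX_NODES = 16, MAX_PENDING = 32, NIL = -1 };

static int parent[MAX_NODES];      /* disjoint-set forest (no ranks, for brevity) */
static int pointee[MAX_NODES];     /* per representative; NIL = points to nothing yet */
static int pend_head[MAX_NODES];   /* per representative: list of deferred copies */
static int pend_node[MAX_PENDING], pend_next[MAX_PENDING];
static int node_count, pend_count;

static int join(int x, int y);
static void flush(int x);

static int new_node(void) {
    assert(node_count < MAX_NODES);
    int n = node_count++;
    parent[n] = n; pointee[n] = NIL; pend_head[n] = NIL;
    return n;
}

static int find(int x) {
    while (parent[x] != x) { parent[x] = parent[parent[x]]; x = parent[x]; }
    return x;
}

/* Make x point to class t, joining with x's existing pointee if any. */
static void flow(int x, int t) {
    x = find(x);
    if (pointee[x] != NIL) { join(pointee[x], t); return; }
    pointee[x] = find(t);
    flush(x);
}

/* x has a pointee now: resolve every copy that was waiting on it. */
static void flush(int x) {
    int e = pend_head[x];
    pend_head[x] = NIL;
    for (; e != NIL; e = pend_next[e])
        flow(pend_node[e], pointee[find(x)]);
}

static int join(int x, int y) {
    x = find(x); y = find(y);
    if (x == y) return x;
    parent[y] = x;
    for (int e = pend_head[y], next; e != NIL; e = next) {  /* move y's list to x */
        next = pend_next[e];
        pend_next[e] = pend_head[x];
        pend_head[x] = e;
    }
    pend_head[y] = NIL;
    int py = pointee[y];
    if (pointee[x] == NIL) {
        if (py != NIL) { pointee[x] = py; flush(x); }
    } else {
        flush(x);                      /* copies inherited from y resolve now */
        if (py != NIL) join(pointee[find(x)], py);
    }
    return find(x);
}

static void assign_addr(int x, int y) { flow(x, y); }   /* x = &y */

static void assign_copy(int x, int y) {                 /* x = y */
    y = find(y);
    if (pointee[y] != NIL) { flow(x, pointee[y]); return; }
    assert(pend_count < MAX_PENDING);
    int e = pend_count++;              /* defer: y points to nothing yet */
    pend_node[e] = x;
    pend_next[e] = pend_head[y];
    pend_head[y] = e;
}

static int points_to(int x) {
    int p = pointee[find(x)];
    return p == NIL ? NIL : find(p);
}
```

For the slide's sequence, assign_addr(a, b) followed by assign_copy(a, c) leaves b and d unrelated. Only the later assign_addr(c, d) triggers the deferred join, after which a and c both point to the class {b,d}.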
Points-to Analysis Comparison I
Steensgaard's analysis [4]: 4 types of nodes: blank, simple, struct, object
Proposed analysis: single kind of representation: node with fields

s->a = &a;
s->b = &b;
*(int **)s = c;

[Figure: storage shape graphs for the example, with node labels s, a, b and s, {a,b}; ordering of the node types blank, simple, struct, object]

Pro:
• Conceptually simpler (algorithm less complex)
• More precise in case of inconsistent access
Contra:
• Memory layout dependent

[4] B. Steensgaard. Points-to analysis by type inference of programs with structures and unions. In International Conference on Compiler Construction, 1996.
Points-to Analysis Comparison II
struct { int *a, *b, *c; } s;
int **d, f, g, h;

s.a = &f;
s.b = &g;
s.c = &h;

d = &s.a;
d = &s.b;

[Figure: step-by-step construction of the storage shape graph over the nodes d, s, f, g, h. Final graphs: d, s, {f,g,h} (Steensgaard's analysis) and d, s, {f,g}, h (proposed analysis)]

Steensgaard's Analysis [4]: Points-to(**d) = { f, g, h }
Proposed Analysis: Points-to(**d) = { f, g }
1. Graph initialization: one node per variable
2. Iterate over statements
   - Join nodes as necessary
   - Establish links
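The precision difference in the comparison above can be reproduced with a small C sketch. It is a simplification and not the proposed algorithm: instead of field extents and pointer offset ranges, it gives every field of s its own node (the "field-sensitive" mode) or collapses all fields into one node (the "collapsed" mode). All identifiers are hypothetical.

```c
#include <assert.h>

enum { MAX_NODES = 16, NIL = -1 };

static int parent[MAX_NODES], rnk[MAX_NODES], pointee[MAX_NODES], node_count;

static int new_node(void) {
    assert(node_count < MAX_NODES);
    int n = node_count++;
    parent[n] = n; rnk[n] = 0; pointee[n] = NIL;
    return n;
}

static int find(int x) {
    while (parent[x] != x) { parent[x] = parent[parent[x]]; x = parent[x]; }
    return x;
}

/* Unify two classes and, recursively, the classes they point to. */
static int join(int x, int y) {
    x = find(x); y = find(y);
    if (x == y) return x;
    if (rnk[x] < rnk[y]) { int t = x; x = y; y = t; }
    if (rnk[x] == rnk[y]) rnk[x]++;
    parent[y] = x;
    int px = pointee[x], py = pointee[y];
    if (px == NIL) pointee[x] = py;
    else if (py != NIL) { int r = join(px, py); pointee[find(x)] = find(r); }
    return find(x);
}

static void assign_addr(int x, int y) {   /* x = &y */
    x = find(x);
    if (pointee[x] == NIL) pointee[x] = find(y);
    else join(pointee[x], y);
}

/* Runs the slide's statements and reports whether h ends up in the same
 * class as f. field_sensitive != 0 gives s.a, s.b, s.c separate nodes;
 * otherwise all three fields share one node for s. */
static int h_merged_with_f(int field_sensitive) {
    node_count = 0;                        /* start from an empty graph */
    int d = new_node(), f = new_node(), g = new_node(), h = new_node();
    int sa = new_node();
    int sb = field_sensitive ? new_node() : sa;
    int sc = field_sensitive ? new_node() : sa;
    assign_addr(sa, f);                    /* s.a = &f; */
    assign_addr(sb, g);                    /* s.b = &g; */
    assign_addr(sc, h);                    /* s.c = &h; */
    assign_addr(d, sa);                    /* d = &s.a; */
    assign_addr(d, sb);                    /* d = &s.b; */
    assert(find(f) == find(g));            /* f and g are merged in both modes */
    return find(h) == find(f);
}
```

In the collapsed mode h is merged with f and g, corresponding to { f, g, h }. In the field-sensitive mode only the two fields reached through d are merged, so h stays separate, corresponding to { f, g }.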
Data Structures
Storage shape graphs are composed of
• Abstract locations with fields $f_1, \ldots, f_n$
• Fields $f = (q, p)$
• Field extents $q = (o, s)$
• Pointer offset ranges $p = (l, u)$
Relations
• Intervals-overlap relation
• Sub-interval relation
• Field inclusion
• Assignment data flow
[Figures: interval diagrams comparing field extents (o1, s1) and (o2, s2) and pointer offset ranges (l1, u1) and (l2, u2); assignment example *a =s *b; the formal definitions are not preserved in the transcript]
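The definitions of these relations are not preserved in the transcript. As a hedged reading only, assuming a field extent (o, s) denotes the half-open range [o, o+s) of offsets, the interval-overlap and sub-interval relations could be sketched in C as below. The type and function names are hypothetical, and the original definitions may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed reading of a field extent: offset o and size s cover [o, o+s). */
typedef struct { unsigned o, s; } extent;

static extent make_extent(unsigned o, unsigned s) {
    assert(s > 0);                    /* empty extents are not modeled here */
    extent e = { o, s };
    return e;
}

/* True if the two extents share at least one offset. */
static bool overlaps(extent a, extent b) {
    return a.o < b.o + b.s && b.o < a.o + a.s;
}

/* True if extent a lies entirely within extent b. */
static bool sub_interval(extent a, extent b) {
    return b.o <= a.o && a.o + a.s <= b.o + b.s;
}
```

For example, [0,4) and [4,8) do not overlap, [0,4) and [2,6) do, and [0,4) is a sub-interval of [0,8).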
Constraints Deduction Example
x = *y;
[Figure: constraint graph deduced for x = *y; the node and edge labels are not recoverable from the transcript]
Results
Twelve benchmarks from Todd Austin's suite as well as from the SPEC92 benchmarks. Excerpt:

benchmark   LOC      number of variables aliased (column headers: 1 2 3 4 5 6 7 20 22 270)
bc          6,098    5 1 1
espresso    11,691   17 10 1 1 1 1
li          7,355    3 1 1 1 2 1
...

• Benchmarks common for points-to analyses
• Hard to compare
• Steensgaard's second paper does not report results of this kind
Conclusions
We propose an improvement to Steensgaard's PTA that we are confident is
• more precise than,
• as fast as,
• conceptually simpler than, and thus
• easier to implement than
the original.
Thanks for your attention!