Level by Level: Making Flow- and Context-Sensitive Pointer
Analysis Scalable for Millions of Lines of Code
Hongtao Yu, Zhaoqing Zhang, Xiaobing Feng, Wei Huo
Institute of Computing Technology, Chinese Academy of Sciences
{ htyu, zqzhang, fxb, huowei }@ict.ac.cn
Jingling Xue
University of New South Wales
INSTITUTE OF COMPUTING TECHNOLOGY
Outline
• Introduction
• Framework
• Analyzing a Level
• Experiments
• Conclusion
Introduction
• Motivation
  – Who needs flow- and context-sensitive (FSCS) pointer analysis?
    • Software checking tools
    • Program understanding
    • Parallelization tools
    • Hardware synthesis
  – Existing methods cannot scale to large real programs
  – Goal: millions of lines of C code
Improving scalability
• For flow-sensitivity
  – Fewer iterations in the dataflow analysis
  – Less space for the points-to graph
• For context-sensitivity
  – Summary-based
  – Low storage penalty for summaries
  – Low penalty for applying summaries
Idea
• Level-by-level analysis
  – Analyze the pointers in decreasing order of their points-to levels
  – Fast flow-sensitive analysis on full sparse SSA
  – Fast and accurate context-sensitive analysis using a full transfer function
• Suppose
    int **q, *p, x;
  Then q has level 2, p has level 1 and x has level 0.
Contributions
• Performs a full-sparse flow-sensitive pointer analysis using a flow-insensitive algorithm
• Performs context-sensitive pointer analysis efficiently with precise full transfer functions
• Yields flow- and context-sensitive interprocedural may/must mod/ref information on a compact SSA form
• Analyzes millions of lines of code in minutes, faster than the state-of-the-art FSCS pointer analysis algorithms
Framework
Figure 1. Level-by-level pointer analysis (LevPA).
[Figure: compute points-to levels; then, for each points-to level from the highest to the lowest, a bottom-up pass evaluates transfer functions and a top-down pass propagates points-to sets, while the call graph is built incrementally.]
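The per-level loop in Figure 1 can be sketched as a small C driver. This is a skeleton under stated assumptions, not the LevPA implementation: `bottom_up_level` and `top_down_level` are hypothetical placeholders that only record which pass ran at which level, the maximum level is assumed to come from the points-to-level phase, and incremental call-graph construction is not modeled.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical trace buffer recording which pass ran at which level. */
static char trace[256];

static void log_pass(char kind, int lev) {
    char buf[16];
    snprintf(buf, sizeof buf, "%c%d ", kind, lev);
    strcat(trace, buf);
}

/* Hypothetical placeholder: would evaluate transfer functions for the
   pointers of level `lev`, visiting callees before callers. */
static void bottom_up_level(int lev) { log_pass('B', lev); }

/* Hypothetical placeholder: would propagate points-to sets of level `lev`
   from callers to callees. */
static void top_down_level(int lev) { log_pass('T', lev); }

/* Analyze levels from the highest to the lowest. Assumption: level-0
   variables point to nothing, so no pass is run for them. */
static void levpa_driver(int max_level) {
    trace[0] = '\0';
    for (int lev = max_level; lev >= 1; lev--) {
        bottom_up_level(lev);
        top_down_level(lev);
    }
}
```

For the example program, whose highest level is 2, the driver runs bottom-up level 2, top-down level 2, bottom-up level 1 and top-down level 1, consistent with the order of the slides that follow.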
Points-to level
• Property 1. If a variable x is possibly pointed to by a pointer y, then ptl(x) ≤ ptl(y).
• Property 2. If a variable y is possibly assigned to x, then ptl(x) = ptl(y).
• Points-to levels are computed by a unification-based pointer analysis
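Because unification merges both sides of an assignment into one equivalence class, the levels it yields satisfy Property 2 by construction. Below is a minimal C sketch of a simplified Steensgaard-style unification over the example program that follows (L3 to L12), not the authors' implementation. Assumptions: the slides do not define ptl formally, so here the ptl of a class is 0 if it points to nothing and otherwise 1 plus the ptl of its pointee class (assuming the unified graph is acyclic); copies allocate placeholder classes instead of Steensgaard's pending sets; the ids `VX`, `VA`, ..., the temporaries `VTMP1`/`VTMP2` and all function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAXN 64

/* Hypothetical ids for the example's variables; VTMP1/VTMP2 are temporaries
   introduced to split *p = *q and *q = &obj into simple statements. */
enum { VX, VY, VP, VQ, VA, VB, VC, VD, VE, VT, VOBJ, VTMP1, VTMP2, NUM_VARS };

static int parent[MAXN];   /* union-find parent */
static int pointee[MAXN];  /* pointee class of a representative, or -1 */
static int next_free;      /* next id for a placeholder class */

static void uf_init(void) {
    for (int i = 0; i < MAXN; i++) { parent[i] = i; pointee[i] = -1; }
    next_free = NUM_VARS;
}

static int uf_find(int v) {
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];  /* path halving */
        v = parent[v];
    }
    return v;
}

/* Merge two classes and, recursively, their pointee classes. */
static int uf_unify(int a, int b) {
    a = uf_find(a);
    b = uf_find(b);
    if (a == b) return a;
    int pa = pointee[a], pb = pointee[b];
    parent[b] = a;
    pointee[a] = (pa >= 0) ? pa : pb;
    if (pa >= 0 && pb >= 0) {
        int r = uf_unify(pa, pb);
        pointee[uf_find(a)] = r;
    }
    return uf_find(a);
}

/* Pointee class of v, creating a placeholder class if there is none yet. */
static int deref(int v) {
    int r = uf_find(v);
    if (pointee[r] < 0) {
        if (next_free >= MAXN) { fprintf(stderr, "out of nodes\n"); exit(1); }
        pointee[r] = next_free++;
    }
    return uf_find(pointee[r]);
}

static void st_addr(int x, int y)  { uf_unify(deref(x), y); }                /* x = &y */
static void st_copy(int x, int y)  { uf_unify(deref(x), deref(y)); }         /* x = y  */
static void st_load(int x, int y)  { uf_unify(deref(x), deref(deref(y))); }  /* x = *y */
static void st_store(int x, int y) { uf_unify(deref(deref(x)), deref(y)); }  /* *x = y */

/* ptl of v's class: 0 if it points to nothing, else 1 + ptl(pointee).
   Assumes an acyclic unified graph; the bound only guards the loop. */
static int ptl(int v) {
    int level = 0, r = uf_find(v);
    while (pointee[r] >= 0 && level < MAXN) {
        r = uf_find(pointee[r]);
        level++;
    }
    return level;
}

/* Flow-insensitive constraints of the example program (L3-L12). */
static void build_example(void) {
    uf_init();
    st_addr(VX, VA);  st_addr(VY, VB);          /* L3 */
    st_addr(VX, VC);  st_addr(VY, VE);          /* L6 */
    st_addr(VX, VD);  st_addr(VY, VD);          /* L7 */
    st_addr(VC, VT);                            /* L8 */
    st_copy(VP, VX);  st_copy(VQ, VY);          /* L4, L9: bind p and q */
    st_load(VTMP1, VQ); st_store(VP, VTMP1);    /* L11: *p = *q */
    st_addr(VTMP2, VOBJ); st_store(VQ, VTMP2);  /* L12: *q = &obj */
}
```

After `build_example()`, x, y, p and q land at level 2, a to e at level 1, and t and obj at level 0, matching the levels listed for the example.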
Example
int obj, t;
main() {
L1:  int **x, **y;
L2:  int *a, *b, *c, *d, *e;
L3:  x = &a; y = &b;
L4:  foo(x, y);
L5:  *b = 5;
L6:  if ( … ) { x = &c; y = &e; }
L7:  else { x = &d; y = &d; }
L8:  c = &t;
L9:  foo(x, y);
L10: *e = 10;
}
void foo(int **p, int **q) {
L11: *p = *q;
L12: *q = &obj;
}
ptl(x, y, p, q) = 2, ptl(a, b, c, d, e) = 1, ptl(t, obj) = 0
Analyze { x, y, p, q } first, then { a, b, c, d, e }, and { t, obj } last.
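Once the levels are known, the processing order is simply the variables sorted by decreasing level. A minimal C sketch, assuming levels lie in 0..`MAX_LEVEL`; the arrays hold the example's levels from this slide in a deliberately shuffled order, and `order_by_level`, `names` and `levels` are hypothetical names.

```c
#include <string.h>

#define MAX_LEVEL 8  /* assumption: points-to levels lie in 0..MAX_LEVEL */

/* Stable counting sort of variable indices by decreasing points-to level. */
static void order_by_level(const int *ptl, int n, int *order) {
    int count[MAX_LEVEL + 1];
    int start[MAX_LEVEL + 1];
    memset(count, 0, sizeof count);
    for (int i = 0; i < n; i++)
        count[ptl[i]]++;
    /* Higher levels get the earlier slots. */
    int pos = 0;
    for (int l = MAX_LEVEL; l >= 0; l--) {
        start[l] = pos;
        pos += count[l];
    }
    for (int i = 0; i < n; i++)
        order[start[ptl[i]]++] = i;
}

/* The example's variables and levels (from the slide), deliberately shuffled. */
static const char *names[] = { "t", "a", "x", "obj", "b", "y", "c", "p", "d", "q", "e" };
static const int levels[]  = {  0,   1,   2,   0,    1,   2,   1,   2,   1,   2,   1  };
```

Variables of the same level keep their input order, so the result is x, y, p, q, then a, b, c, d, e, and finally t, obj.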
Bottom-up analyze level 2
void foo(int **p, int **q) {
L11: *p1 = *q1;
L12: *q1 = &obj;
}
• p1's points-to set depends on the formal-in p
• q1's points-to set depends on the formal-in q
Bottom-up analyze level 2
main() {
L1:  int **x, **y;
L2:  int *a, *b, *c, *d, *e;
L3:  x1 = &a; y1 = &b;
L4:  foo(x1, y1);
L5:  *b = 5;
L6:  if ( … ) { x2 = &c; y2 = &e; }
L7:  else { x3 = &d; y3 = &d; }
     x4 = ϕ(x2, x3); y4 = ϕ(y2, y3);
L8:  c = &t;
L9:  foo(x4, y4);
L10: *e = 10;
}
• x1 → { a }
• y1 → { b }
• x2 → { c }
• y2 → { e }
• x3 → { d }
• y3 → { d }
• x4 → { c, d }
• y4 → { e, d }
Full-sparse Analysis
• Achieve flow-sensitivity flow-insensitively
  – Regard each SSA name as a distinct variable
  – Use a set-constraint-based pointer analysis
• Full sparseness
  – Saves time
  – Saves space
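To make this concrete, the level-2 statements of main in SSA form reduce to two kinds of inclusion constraints: x1 ⊇ {a} for x1 = &a, and x4 ⊇ x2, x4 ⊇ x3 for x4 = ϕ(x2, x3). The C sketch below solves them with a naive fixpoint; since every SSA name is its own variable, the solver never tracks program points, yet the result is flow-sensitive. It is an illustration only: the bitset encoding, the constraint arrays and `solve` are hypothetical, and a practical solver would use a worklist and also handle loads and stores.

```c
#include <stdbool.h>
#include <stddef.h>

/* Level-1 objects of the example as bits (hypothetical encoding). */
enum { OA = 1u << 0, OB = 1u << 1, OC = 1u << 2, OD = 1u << 3, OE = 1u << 4 };

/* SSA names of main's level-2 pointers; each is its own variable. */
enum { X1, Y1, X2, Y2, X3, Y3, X4, Y4, NUM_NAMES };

struct addr_c { int dst; unsigned obj; };  /* dst ⊇ {obj}, from dst = &obj */
struct copy_c { int dst; int src; };       /* dst ⊇ src, from a ϕ operand  */

static const struct addr_c addrs[] = {
    { X1, OA }, { Y1, OB }, { X2, OC }, { Y2, OE }, { X3, OD }, { Y3, OD },
};
static const struct copy_c copies[] = {
    { X4, X2 }, { X4, X3 },   /* x4 = ϕ(x2, x3) */
    { Y4, Y2 }, { Y4, Y3 },   /* y4 = ϕ(y2, y3) */
};

static unsigned pts[NUM_NAMES];

/* Naive fixpoint over the inclusion constraints. No program points are
   tracked, yet the result is flow-sensitive because of SSA renaming. */
static void solve(void) {
    for (int i = 0; i < NUM_NAMES; i++)
        pts[i] = 0;
    for (size_t i = 0; i < sizeof addrs / sizeof addrs[0]; i++)
        pts[addrs[i].dst] |= addrs[i].obj;
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < sizeof copies / sizeof copies[0]; i++) {
            unsigned merged = pts[copies[i].dst] | pts[copies[i].src];
            if (merged != pts[copies[i].dst]) {
                pts[copies[i].dst] = merged;
                changed = true;
            }
        }
    }
}
```

After `solve()`, x4 holds {c, d} and y4 holds {d, e}, as listed for the example, while x1 keeps only {a}.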
Top-down analyze level 2
main: Propagate to callsite
• L4: foo.p → { a }, foo.q → { b }
• L9: foo.p → { c, d }, foo.q → { d, e }
• After merging both callsites:
  – foo.p → { a, c, d }
  – foo.q → { b, d, e }
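The top-down step on this slide amounts to a union over callsites. A minimal C sketch, assuming points-to sets are bitsets over the level-1 objects a to e; the struct `callsite`, the table `foo_calls` and `propagate_to_formals` are hypothetical names, and the actual sets are the bottom-up results for x1, y1, x4 and y4.

```c
#include <stddef.h>

/* Level-1 objects as bits (hypothetical encoding). */
enum { OBJ_A = 1u << 0, OBJ_B = 1u << 1, OBJ_C = 1u << 2,
       OBJ_D = 1u << 3, OBJ_E = 1u << 4 };

/* Points-to sets of the actual arguments at one callsite of foo(p, q). */
struct callsite { const char *label; unsigned actual_p; unsigned actual_q; };

static const struct callsite foo_calls[] = {
    { "L4", OBJ_A, OBJ_B },                  /* foo(x1, y1) */
    { "L9", OBJ_C | OBJ_D, OBJ_D | OBJ_E },  /* foo(x4, y4) */
};

/* A formal's points-to set is the union of the corresponding actuals'
   sets over all callsites; this is where calling contexts are merged. */
static void propagate_to_formals(const struct callsite *cs, size_t n,
                                 unsigned *formal_p, unsigned *formal_q) {
    *formal_p = 0;
    *formal_q = 0;
    for (size_t i = 0; i < n; i++) {
        *formal_p |= cs[i].actual_p;
        *formal_q |= cs[i].actual_q;
    }
}
```

The two results, {a, c, d} for p and {b, d, e} for q, are exactly the sets that appear in the μ and χ lists of foo.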
Top-down analyze level 2
foo: Expand pointer dereferences
void foo(int **p, int **q) {
     μ(b, d, e)
L11: *p1 = *q1;
     χ(a, c, d)
L12: *q1 = &obj;
     χ(b, d, e)
}
Calling contexts are merged here: the μ and χ lists use the points-to sets of p and q accumulated over both callsites.
Context Condition
• To be context-sensitive
• Points-to relation ci
  – p ⟹ v (p → v): p must (may) point to v, where p is a formal parameter
• Context condition ℂ(c1,…,ck)
  – a Boolean function composed of higher-level points-to relations
• Context-sensitive μ and χ
  – μ(vi, ℂ(c1,…,ck))
  – vi+1 = χ(vi, M, ℂ(c1,…,ck))
  – M ∈ {may, must} indicates a weak/strong update
Context-sensitive μ and χ
void foo(int **p, int **q) {
     μ(b, q ⟹ b)  μ(d, q → d)  μ(e, q → e)
L11: *p1 = *q1;
     a = χ(a, must, p ⟹ a)  c = χ(c, may, p → c)  d = χ(d, may, p → d)
L12: *q1 = &obj;
     b = χ(b, must, q ⟹ b)  d = χ(d, may, q → d)  e = χ(e, may, q → e)
}
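The slide does not state how must and may are chosen for each χ. One rule that reproduces every annotation above, offered as an assumption rather than the paper's definition: a write through p to v is a must (strong) update guarded by p ⟹ v when, at every callsite where p may point to v, p points to v alone; otherwise it is a may (weak) update guarded by p → v. A C sketch of that check, with a hypothetical bitset encoding and function name:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit encoding of foo's level-1 objects. */
enum { OBJ_A = 1u << 0, OBJ_B = 1u << 1, OBJ_C = 1u << 2,
       OBJ_D = 1u << 3, OBJ_E = 1u << 4 };

/* Points-to sets of p and q at the callsites L4 and L9 (top-down results). */
static const unsigned p_sets[] = { OBJ_A, OBJ_C | OBJ_D };
static const unsigned q_sets[] = { OBJ_B, OBJ_D | OBJ_E };

/* Assumed rule: true (must, guarded by formal ⟹ v) if the formal points to
   v alone at every callsite where it may point to v; false (may, guarded by
   formal → v) otherwise. */
static bool chi_is_must(const unsigned *sets, size_t ncalls, unsigned v) {
    bool seen = false;
    for (size_t i = 0; i < ncalls; i++) {
        if (sets[i] & v) {
            seen = true;
            if (sets[i] != v)
                return false;
        }
    }
    return seen;
}
```

With the callsite sets from the top-down pass, a and b come out as must while c, d and e come out as may, matching the slide.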
Bottom-up analyze level 1
void foo(int **p, int **q) {
     μ(b1, q ⟹ b)  μ(d1, q → d)  μ(e1, q → e)
L11: *p1 = *q1;
     a2 = χ(a1, must, p ⟹ a)
     c2 = χ(c1, may, p → c)
     d2 = χ(d1, may, p → d)
L12: *q1 = &obj;
     b2 = χ(b1, must, q ⟹ b)
     d3 = χ(d2, may, q → d)
     e2 = χ(e1, may, q → e)
}
Points-to Set
• Local points-to set
  – Loc(p) = { <v, ℂ(c1,…,ck)> | ℂ(c1,…,ck) is a context condition }
  – p can point to v if and only if ℂ(c1,…,ck) holds.
  – Loc(p) is computed explicitly during the bottom-up analysis.
• Dependence set
  – Dep(p) = { <q, ℂ(c1,…,ck)> | q is a formal-in parameter of level lev and ℂ(c1,…,ck) is a context condition }
  – Ptr(p) includes Ptr(q) if and only if ℂ(c1,…,ck) holds.
Transfer function
• Trans(proc, v) = < Loc(v), Dep(v), ℂ(c1,…,ck), M >
  – v is a formal-out parameter.
  – ℂ(c1,…,ck) is a context condition: v can be modified at a callsite invoking proc only if ℂ(c1,…,ck) holds at that callsite.
  – M ∈ {may, must} indicates a may/must mod effect.
• Trans(proc) is the set of all individual transfer functions Trans(proc, v).
Bottom-up analyze level 1
• Trans(foo, a) = < { }, { <b, q ⟹ b>, <d, q → d>, <e, q → e> }, p ⟹ a, must >
• Trans(foo, c) = < { }, { <b, q ⟹ b>, <d, q → d>, <e, q → e> }, p → c, may >
• Trans(foo, b) = < { <obj, q ⟹ b> }, { }, q ⟹ b, must >
• Trans(foo, e) = < { <obj, q → e> }, { }, q → e, may >
• Trans(foo, d) = < { <obj, q → d> }, { <b, p → d ∧ q ⟹ b>, <d, p → d>, <e, p → d ∧ q → e> }, p → d ∨ q → d, may >
Bottom-up analyze level 1
int obj, t;
main() {
L1:  int **x, **y;
L2:  int *a, *b, *c, *d, *e;
L3:  x1 = &a; y1 = &b;
     μ(b1, true)
L4:  foo(x1, y1);
     a2 = χ(a1, must, true)  b2 = χ(b1, must, true)
L5:  *b2 = 5;
L6:  if ( … ) { x2 = &c; y2 = &e; }
L7:  else { x3 = &d; y3 = &d; }
     x4 = ϕ(x2, x3); y4 = ϕ(y2, y3)
L8:  c1 = &t;
     μ(d1, true)  μ(e1, true)
L9:  foo(x4, y4);
     c2 = χ(c1, may, true)  d2 = χ(d1, may, true)  e2 = χ(e1, may, true)
L10: *e2 = 10;
}
At L4, p ⟹ a and q ⟹ b hold.
At L9, p → c, p → d, q → e and q → d hold.
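Applying a transfer function at a callsite amounts to evaluating its context conditions against the relations that hold there. The C sketch below encodes Trans(foo, b) and Trans(foo, d) from the earlier slide, with each condition in disjunctive normal form over hypothetical relation bits (`P_MUST_A` for p ⟹ a, `P_MAY_D` for p → d, and so on). Assumptions: the callsite facts must list every relation used, since ⟹ is not automatically expanded into →; a may update keeps the old points-to set, as a may χ does; all type and function names are illustrative. The slides implement conditions with BDDs; DNF is used here only for brevity.

```c
#include <stdbool.h>

/* Hypothetical bit encoding of foo's points-to relations:
   MUST stands for ⟹ (must point to), MAY for → (may point to). */
enum {
    P_MUST_A = 1u << 0, P_MAY_C = 1u << 1, P_MAY_D = 1u << 2,
    Q_MUST_B = 1u << 3, Q_MAY_D = 1u << 4, Q_MAY_E = 1u << 5
};

enum { O_OBJ = 1u << 0, O_T = 1u << 1 };  /* level-0 objects obj and t */
enum { IN_B, IN_D, IN_E, NUM_IN };        /* level-1 formal-ins of foo */

/* Context condition in DNF: OR over clauses, each an AND of relation bits.
   An empty clause (0) is true. */
struct cond { int n; unsigned clause[2]; };

static bool holds(const struct cond *c, unsigned facts) {
    for (int i = 0; i < c->n; i++)
        if ((c->clause[i] & facts) == c->clause[i])
            return true;
    return false;
}

struct loc_entry { unsigned target; struct cond c; };  /* <v, C> in Loc */
struct dep_entry { int formal_in;   struct cond c; };  /* <q, C> in Dep */

/* Trans(proc, v) = < Loc(v), Dep(v), C, M > */
struct trans {
    struct loc_entry loc[1]; int nloc;
    struct dep_entry dep[3]; int ndep;
    struct cond mod;  /* v can be modified only if this holds */
    bool must;        /* M: true = must, false = may */
};

/* Trans(foo, b) and Trans(foo, d), transcribed from the slide. */
static const struct trans trans_b = {
    { { O_OBJ, { 1, { Q_MUST_B } } } }, 1,
    { { 0 } }, 0,
    { 1, { Q_MUST_B } }, true
};
static const struct trans trans_d = {
    { { O_OBJ, { 1, { Q_MAY_D } } } }, 1,
    { { IN_B, { 1, { P_MAY_D | Q_MUST_B } } },
      { IN_D, { 1, { P_MAY_D } } },
      { IN_E, { 1, { P_MAY_D | Q_MAY_E } } } }, 3,
    { 2, { P_MAY_D, Q_MAY_D } }, false
};

/* Points-to set of v after a call: `facts` are the relations holding at the
   callsite, in[] the formal-ins' points-to sets there, `old` v's set before. */
static unsigned apply(const struct trans *t, unsigned facts,
                      const unsigned in[NUM_IN], unsigned old) {
    if (!holds(&t->mod, facts))
        return old;                          /* v is not modified here */
    unsigned result = t->must ? 0u : old;    /* a may update keeps old */
    for (int i = 0; i < t->nloc; i++)
        if (holds(&t->loc[i].c, facts))
            result |= t->loc[i].target;
    for (int i = 0; i < t->ndep; i++)
        if (holds(&t->dep[i].c, facts))
            result |= in[t->dep[i].formal_in];
    return result;
}
```

At L4 (p ⟹ a, q ⟹ b), Trans(foo, b) gives b the set {obj} and d is left unchanged; at L9, b is not modified and d receives obj through its Loc entry.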
BDD and context condition
• Context conditions are implemented using BDDs
  – Compact representation
  – Efficient Boolean operations
[Figure: BDD for ℂ over the variables x1, x2, x3, with terminal nodes 0 and 1]
variable x1 represents p→a
variable x2 represents q→a
variable x3 represents p→b
BDD for ℂ = (p → a ∧ q → a) ∨ p → b
If only p → b holds at a callsite, we can evaluate ℂ|x1=0, x2=0, x3=1 to check whether ℂ holds at that callsite.
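For a fixed variable order, checking a context condition at a callsite is a single root-to-terminal walk: restricting every variable leaves a constant. The C sketch below hard-codes the reduced ordered BDD of ℂ = (x1 ∧ x2) ∨ x3 with order x1 < x2 < x3; the node table and `bdd_eval` are hypothetical, and a real BDD package would also provide a unique table and Boolean operations such as AND and OR.

```c
#include <stdbool.h>

/* A BDD node; ids 0 and 1 are the terminals false and true. */
struct bdd_node { int var; int low; int high; };

/* ROBDD for C = (x1 ∧ x2) ∨ x3, order x1 < x2 < x3, where
   x1 = p→a, x2 = q→a, x3 = p→b. */
static const struct bdd_node nodes[] = {
    { 0, 0, 0 },   /* 0: terminal false (fields unused) */
    { 0, 1, 1 },   /* 1: terminal true (fields unused)  */
    { 3, 0, 1 },   /* 2: x3 ? true : false              */
    { 2, 2, 1 },   /* 3: x2 ? true : node 2             */
    { 1, 2, 3 },   /* 4: x1 ? node 3 : node 2 (root)    */
};
enum { ROOT = 4 };

/* Evaluate C|x1=v[1],x2=v[2],x3=v[3] by following one path (v[0] unused). */
static bool bdd_eval(int root, const bool v[4]) {
    int n = root;
    while (n > 1)
        n = v[nodes[n].var] ? nodes[n].high : nodes[n].low;
    return n == 1;
}
```

Evaluating with x1 = 0, x2 = 0, x3 = 1 (only p → b holds) reaches terminal 1, so ℂ holds; with only x1 = 1 the walk reaches terminal 0.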
Experiment
• Analyzes millions of lines of code in minutes
• Faster than the state-of-the-art FSCS pointer analysis algorithms
Table 2. Performance (secs).
Benchmark | KLOC | LevPA 64-bit | LevPA 32-bit | Bootstrapping (PLDI'08) 32-bit
Icecast-2.3.1 | 22 | 2.18 | 5.73 | 29
sendmail | 115 | 72.63 | 143.68 | 939
httpd | 128 | 16.32 | 35.42 | 161
445.gobmk | 197 | 21.37 | 40.78 | /
wine-0.9.24 | 1905 | 502.29 | 891.16 | /
wireshark-1.2.2 | 2383 | 366.63 | 845.23 | /
(/: no result reported)
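Taking the ratio of the two 32-bit columns gives LevPA's speedup over Bootstrapping on the three benchmarks for which both report a time. This is simple arithmetic on the table values, assuming the two 32-bit columns were measured on comparable setups:

```latex
\text{speedup} = \frac{T_{\text{Bootstrapping}}}{T_{\text{LevPA, 32-bit}}}:\qquad
\frac{29}{5.73} \approx 5.1 \;(\text{Icecast-2.3.1}),\quad
\frac{939}{143.68} \approx 6.5 \;(\text{sendmail}),\quad
\frac{161}{35.42} \approx 4.5 \;(\text{httpd})
```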
Conclusion
• We present a scalable method for flow- and context-sensitive pointer analysis
• It analyzes the pointers in a program level by level, in terms of their points-to levels
  – Fast flow-sensitive analysis on full sparse SSA form
  – Fast and accurate context-sensitive analysis using full transfer functions represented by BDDs
• It can analyze millions of lines of C code in minutes, faster than the state-of-the-art methods
Thanks