Context-Sensitive Flow Analysis Using Instantiation Constraints
CS 343, Spring 01/02
John Whaley
Based on a presentation by Chris Unkel
Instantiation Constraints
- Flow-insensitive
- Context-sensitive
- Handles higher-order functions (function pointers) smoothly
- “Flow” analysis (“provenance” analysis)
- Inspired by Henglein’s Type Inference with Polymorphic Recursion, 1993
- Constraint-based analysis
Constraint-Based Analysis
- Pattern of program analysis
- Program is read to produce an abstract representation: a set of constraints
  - Graph
  - System of equations
- Abstract representation is processed
- Result is examined to tell us about the program
Types
[Figure: three kinds of type node. Alpha type (e.g. a2). Pointer type (e.g. ptr3), with a pointer-to-pointee edge to a4. Function type (e.g. func4), with input a5 and return a6.]
Equality Constraint
- Result of an assignment
- Values flow both ways
  - From *x to *y
  - From *y to *x
- Handle with unification
    *x = *y;
[Figure: x: ptr1 with pointee *x: a2; y: ptr3 with pointee *y: a4.]
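A minimal sketch of how unification might process the equality constraint above, assuming a union-find over type nodes in which each class has at most one pointee. The class name `TypeGraph` and its methods are illustrative and do not come from the slides.

```python
class TypeGraph:
    """Union-find over type nodes; each class may have one pointee node."""

    def __init__(self):
        self.parent = {}   # node -> parent node in the union-find forest
        self.pointee = {}  # class representative -> pointee node

    def add(self, node, pointee=None):
        self.parent.setdefault(node, node)
        if pointee is not None:
            self.parent.setdefault(pointee, pointee)
            self.pointee[self.find(node)] = pointee

    def find(self, node):
        # Path-halving find.
        while self.parent[node] != node:
            self.parent[node] = self.parent[self.parent[node]]
            node = self.parent[node]
        return node

    def unify(self, u, v):
        """Merge the classes of u and v, then unify their pointees if both exist."""
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return
        pu, pv = self.pointee.pop(ru, None), self.pointee.pop(rv, None)
        self.parent[ru] = rv
        if pu is not None and pv is not None:
            self.pointee[rv] = pv
            self.unify(pu, pv)
        elif pu is not None or pv is not None:
            self.pointee[rv] = pu if pu is not None else pv


g = TypeGraph()
g.add("ptr1", pointee="a2")  # x: ptr1, *x: a2
g.add("ptr3", pointee="a4")  # y: ptr3, *y: a4
g.unify("a2", "a4")          # *x = *y: values flow both ways, so a2 and a4 merge
print(g.find("a2") == g.find("a4"))      # True
print(g.find("ptr1") == g.find("ptr3"))  # False: x and y themselves stay distinct
```

Unifying two pointer nodes directly (for example `g.unify("ptr1", "ptr3")`) would also merge their pointees, which is the recursive part of unification.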
Instantiation Constraint
- Used to make connections across procedures
- Values flow one direction only
- Generated by naming functions
- Identified with labels
    int foo(int x)
    { … }
    …
    foo1(b);
[Figure: foo: func4 with x: a5 and a6; the instance foo1: func1 with b: a2 and a3; an instantiation edge labelled )1 between foo and foo1.]
Call Id Twice Example
    int *id(int *x)
    {
        return x;
    }
    void main(void)
    {
        int *a, *b, *c, *d;
        int e, f;
        b = &e;
        d = &f;
        a = id1(b);  /* id1: the use of id labelled 1 */
        c = id2(d);  /* id2: the use of id labelled 2 */
    }
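Generated Graph
[Figure: constraint graph for the example. Nodes id: func, x: a5, a: ptr, b: ptr, c: ptr, d: ptr, e, f, *a and *c, plus the instances id1: func and id2: func, with instantiation edges labelled )1 and )2.]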
Processing Rules (1)
[Figure: rule for function types. Before: foo: func4 (x: a5, a6) and foo1: func1 (b: a2, a3) with an edge labelled )1. After: the graph also carries edges labelled )1 and (1.]
Processing Rules (2)
[Figure: rules for pointer types, shown as before/after diagrams over ptr1 (pointee a2) and ptr3 (pointee a4), with edges labelled )1 and (1.]
Processing Rules (3)
[Figure: the “closure rule”. Nodes a1, a2 and a3 are linked by edges labelled )1 and (1; after the rule, a1 and a3 are unified into a single node (a1, a3).]
Processing Graph (1)-(5)
[Figures: the generated graph for the Call Id Twice example after successive applications of the processing rules.]
- (1)-(2): nodes id: func, x: a5, a: ptr, b: ptr, c: ptr, d: ptr, e, f, *a, *c, id1: func, id2: func; edges labelled )2, )2, )1 and (2
- (3): c and d share one node (c, d: ptr); *c is no longer a separate node
- (4)-(5): edges labelled (1 and )1 are added
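Result
[Figure: final graph. a and b share one pointer node (a, b: ptr); c and d share another (c, d: ptr); e and f are the remaining value nodes; id: func (x: a5) is linked to id1: func and id2: func by edges labelled )1, (1, )2 and (2.]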
Nuts and Bolts
- Build the constraints for a procedure
- Simplify those constraints
- Instantiate (copy) those constraints to the callers
- Polarity: distinguish between function inputs and outputs
Using the Results
- A value in one variable can end up in a second if there is a path in the graph consisting of 0 or more red/close-paren edges followed by 0 or more green/open-paren edges
  - From a to x: yes
  - From a to c: no
- Path pattern: )* (*
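The path condition above can be sketched as a small graph search. This is an illustration under assumptions: edges are directed triples `(src, dst, kind)` with kind `")"` or `"("`, label indices are ignored because the rule as stated does not mention them, and the toy graph is made up rather than taken from the slides.

```python
from collections import deque


def can_flow(edges, src, dst):
    """Return True if dst is reachable from src along a path whose edge kinds
    match )* (* : zero or more ')' edges, then zero or more '(' edges."""
    succ = {}
    for u, v, kind in edges:
        succ.setdefault(u, []).append((v, kind))
    # A search state is (node, phase): phase 0 means only ')' edges so far,
    # phase 1 means at least one '(' edge has been taken.
    start = (src, 0)
    seen = {start}
    queue = deque([start])
    while queue:
        node, phase = queue.popleft()
        if node == dst:
            return True
        for nxt, kind in succ.get(node, []):
            if kind == ")":
                if phase == 1:
                    continue  # a ')' edge after a '(' edge breaks the pattern
                state = (nxt, 0)
            else:
                state = (nxt, 1)
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False


# Hypothetical toy graph, not the graph from the slides.
edges = [
    ("n0", "n1", ")"),
    ("n1", "n2", "("),
    ("n0", "n3", "("),
    ("n3", "n4", ")"),
]
print(can_flow(edges, "n0", "n2"))  # True: ')' then '('
print(can_flow(edges, "n0", "n4"))  # False: '(' then ')'
```

Tracking the phase in the search state keeps the check to a single breadth-first pass over at most twice the number of nodes.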
An Application: Format String Vulnerability
- Some format strings can cause data to be overwritten
  - E.g. “%n%n%n%n%n%n%n%n”
- Malicious format string can gain control of program
- Problem formulation
  - Values should not flow from an unsafe source to a format string
  - Source: network input in recv
  - Format string: first argument to printf
Inst. Constraints Summary
- Result shows provenance: the source of values
- Can be used as a pointer analysis
- The problem is actually undecidable in general
  - But the algorithm seems to run quickly in practice
- Context-sensitive
  - Paren-matching through closure rule
- Interesting base analysis to build tools on
The End
Some Other Pointer Work
- Andersen
  - Flow-insensitive, with directional assignments
  - Assignment copies points-to set of rhs to lhs
  - Cycles make it slow; other work to find and collapse cycles
- Das (“one level flow”)
  - One pointer level of directionality
  - Handles the common cases in C well
    - Multiple return values, altering parameters
- Wilson and Lam
  - Flow-sensitive, context-sensitive
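As a rough illustration of the directional-assignment idea attributed to Andersen above, the sketch below copies the points-to set of the right-hand side into the left-hand side until nothing changes. It is a simplification that handles only `p = &x` and `p = q`; the function name `andersen` and its input format are assumptions made for this example.

```python
def andersen(address_of, copies):
    """address_of: list of (p, x) pairs for `p = &x`.
    copies: list of (lhs, rhs) pairs for `lhs = rhs`.
    Returns a dict mapping each variable to its points-to set."""
    pts = {}
    for p, x in address_of:
        pts.setdefault(p, set()).add(x)
    changed = True
    while changed:  # iterate to a fixpoint
        changed = False
        for lhs, rhs in copies:
            before = len(pts.setdefault(lhs, set()))
            pts[lhs] |= pts.get(rhs, set())  # directional: rhs flows into lhs only
            if len(pts[lhs]) != before:
                changed = True
    return pts


pts = andersen(
    address_of=[("b", "e"), ("d", "f")],         # b = &e; d = &f;
    copies=[("a", "b"), ("c", "d"), ("a", "c")],  # a = b; c = d; a = c;
)
print(sorted(pts["a"]))  # ['e', 'f']
print(sorted(pts["c"]))  # ['f']
```

Because `a = c` is directional, `c` does not pick up `e`; a unification-based analysis would merge the two sets.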
Further Reading
- Manuvir Das. Unification-Based Pointer Analysis with Directional Assignments. Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation, Vancouver, BC, Canada, June 2000.
- Manuel Fähndrich, Jakob Rehof, Manuvir Das. Scalable Context-Sensitive Flow Analysis Using Instantiation Constraints. Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation, Vancouver, BC, Canada, June 2000.
- R. P. Wilson and M. S. Lam. Efficient Context-Sensitive Pointer Analysis for C Programs. Proceedings of the ACM SIGPLAN 1995 Conference on Programming Language Design and Implementation, June 1995.
- L. O. Andersen. Program Analysis and Specialization for the C Programming Language. Technical Report 94-19, University of Copenhagen, 1994.
Overview
- Steensgaard ’95
  - Flow-insensitive, context-insensitive alias analysis
- Liang and Harrold ’99
  - Straightforward context-sensitive extension of Steensgaard
- Fähndrich, Rehof, Das ’00
  - Context-sensitive extension of Steensgaard that handles function pointers and higher-order functions smoothly
Steensgaard Preliminaries
- “Type inference”
  - Types here do not refer to integer, char, etc!
  - “Non-standard” or “extended” types
- Two objects sharing the same type have some property in common
  - Point to the same things
- Each variable in the program has a type
- Type rules describe a consistent typing
- Points-to sets vs. alias pairs
- Begin by ignoring the conditional join stuff
Steensgaard Basic Operation
- After an assignment x=y, x and y point to the same set of things (*x and *y are the same)
- Non-directional: x=y has same effect as y=x
- Also implies that *x and *y point to the same set of things (and **x and **y, and so on)
- Alias relation (not points-to relation) is symmetric, reflexive, transitive
- Types can be grouped into equivalence classes of objects that point to the same things
- Assignment joins equivalence classes of pointees together
Steensgaard Example (1)
    a = &x;
    b = &y;
    if (p)
        y = &z;
    else
        y = &x;
    c = &y;
[Figure: initial graph with separate nodes a, b, c, x, y, z, &x, &y, &z, *a, *b, *c and *y.]
Steensgaard Example (2)-(10)
[Figures: the same code as in Example (1), with the graph updated one statement at a time.]
- (2): a, b, c, x, y, z, &x, &y, &z, *a, *b, *c and *y are still separate nodes
- (3)-(4): a = &x puts *a and x in one class: {*a, x}
- (5)-(6): b = &y puts *b and y in one class: {*b, y}
- (7)-(8): y = &z puts *y and z in one class: {*y, z}
- (9)-(10): y = &x merges {*y, z} with {*a, x}: {*a, x, *y, z}
Steensgaard Example (Result)
[Figure: final graph for the same code. c = &y puts *c in the class of *b and y. Final pointee classes: {*a, x, *y, z} and {*b, y, *c}; a, b and c remain separate nodes.]
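The result of the example can be reproduced with a small union-find sketch. It is a simplified reading of the basic operation (unconditional joins, no conditional join), and the class `Steensgaard` with its methods `deref`, `join`, `address_of` and `assign` is illustrative naming, not code from the slides.

```python
import itertools


class Steensgaard:
    def __init__(self):
        self.parent = {}   # union-find forest over nodes
        self.pointee = {}  # class representative -> pointee node
        self._fresh = itertools.count()

    def find(self, n):
        self.parent.setdefault(n, n)
        while self.parent[n] != n:
            self.parent[n] = self.parent[self.parent[n]]
            n = self.parent[n]
        return n

    def deref(self, n):
        """Representative of *n, creating a fresh pointee node if none exists."""
        r = self.find(n)
        if r not in self.pointee:
            self.pointee[r] = ("tmp", next(self._fresh))
        return self.find(self.pointee[r])

    def join(self, u, v):
        """Merge two classes and, recursively, their pointees."""
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return
        pu, pv = self.pointee.pop(ru, None), self.pointee.pop(rv, None)
        self.parent[ru] = rv
        if pu is not None and pv is not None:
            self.pointee[rv] = pv
            self.join(pu, pv)
        elif pu is not None or pv is not None:
            self.pointee[rv] = pu if pu is not None else pv

    def address_of(self, p, x):  # p = &x
        self.join(self.deref(p), x)

    def assign(self, p, q):      # p = q
        self.join(self.deref(p), self.deref(q))

    def same_class(self, u, v):
        return self.find(u) == self.find(v)


# Flow-insensitive: both branches of the if are processed.
s = Steensgaard()
s.address_of("a", "x")  # a = &x
s.address_of("b", "y")  # b = &y
s.address_of("y", "z")  # y = &z
s.address_of("y", "x")  # y = &x
s.address_of("c", "y")  # c = &y
print(s.same_class("x", "z"))        # True:  {*a, x, *y, z}
print(s.deref("b") == s.deref("c"))  # True:  {*b, y, *c}
print(s.same_class("x", "y"))        # False: the two classes stay apart
```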
Observation: forcing pointees to join is sometimes too strong an action
- Assignment is really directional
  - After x=y, x points to everything y points to
- Using a directional assignment prevents using equivalence classes/union find
- But, if y’s points-to set is empty, OK to do nothing
- When we see x=y for y not a pointer (bottom type in this notation), don’t join immediately, but record the fact that if y later is found to point to something, we must join it with x.
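One way the deferred join might be organised is sketched below, assuming each variable has a location node whose content starts out as bottom and that every bottom node keeps a list of joins waiting on it. The names `Analysis`, `cjoin` and `pending` are hypothetical, and the sketch covers only `p = &x` and `p = q`.

```python
class Analysis:
    def __init__(self):
        self.parent = {}   # union-find forest over integer nodes
        self.content = {}  # representative -> content node (absent means bottom)
        self.pending = {}  # bottom representative -> nodes waiting to be joined
        self.loc = {}      # variable name -> its location node
        self.count = 0

    def fresh(self):
        self.count += 1
        self.parent[self.count] = self.count
        return self.count

    def find(self, n):
        while self.parent[n] != n:
            self.parent[n] = self.parent[self.parent[n]]
            n = self.parent[n]
        return n

    def var(self, name):
        """Representative of the variable's location; content starts as bottom."""
        if name not in self.loc:
            self.loc[name] = self.fresh()
            self.content[self.loc[name]] = self.fresh()
        return self.find(self.loc[name])

    def contents(self, name):
        return self.find(self.content[self.var(name)])

    def join(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return
        cu, cv = self.content.pop(ru, None), self.content.pop(rv, None)
        waiting = self.pending.pop(ru, []) + self.pending.pop(rv, [])
        self.parent[ru] = rv
        if cu is None and cv is None:
            if waiting:
                self.pending[rv] = waiting  # still bottom: keep waiting
            return
        self.content[rv] = cv if cv is not None else cu
        for w in waiting:                   # no longer bottom: deferred joins fire
            self.join(rv, w)
        if cu is not None and cv is not None:
            self.join(cu, cv)

    def cjoin(self, e1, e2):
        """Join e1 with e2 only if e2 is known to be a pointer; else record it."""
        r2 = self.find(e2)
        if r2 in self.content:
            self.join(e1, r2)
        else:
            self.pending.setdefault(r2, []).append(e1)

    def address_of(self, p, x):  # p = &x
        self.join(self.contents(p), self.var(x))

    def assign(self, p, q):      # p = q
        self.cjoin(self.contents(p), self.contents(q))


t = Analysis()
t.address_of("p", "a")  # p = &a
t.assign("p", "n")      # p = n, with n not (yet) a pointer
t.assign("q", "n")      # q = n
t.address_of("q", "b")  # q = &b
print(t.var("a") == t.var("b"))  # False: nothing was joined through n
t.address_of("n", "c")  # n = &c: n is now a pointer, so the recorded joins fire
print(t.var("a") == t.var("b") == t.var("c"))  # True
```

With an unconditional join, `a` and `b` would already share a class after the fourth statement, even though `n` never pointed to anything at that point.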
A Typing Rule
GIVEN
    x : ref(a)          x has type pointer to type a
    y : ref(b)          y has type pointer to type b
    b ⊴ a               b is not a pointer type, or a and b are the same type
we conclude that
    welltyped (x = y)   our types are well-typed for statement x = y (we have a consistent points-to graph)
Reading downward verifies consistency; upward gives constraints.
Steensgaard Summary
- Fast union-find allows solution in “near-linear” (linear times the inverse Ackermann function) time
- Does not handle structs (but see later paper by same author)
- Flow-insensitive
- Context-insensitive
- Non-directional assignments
Liang & Harrold Operation
- Do Steensgaard within each procedure to build a summary
- Then do bottom-up, top-down propagation of results
  - Bottom-up: aliases from callees to callers
  - Top-down: from callers to callees
- “FICS” = flow-insensitive context-sensitive
L & H Example (1)
    int *id(int *x)
    {
        return x;
    }
    int e, f;
    void main(void)
    {
        int *a, *b, *c, *d;
        b = &e;
        d = &f;
        a = id(b);
        c = id(d);
    }
[Figure: the two procedures, main and id.]
L & H Example (2) (Phase 1)
[Figure: per-procedure graphs. main: a, b, c, d, e, f. id: x and idret, with *x and *idret in one class.]
This is the result of applying Steensgaard to each procedure individually; the result is a summary of the pointer behavior of each function.
L & H Example (3) (Phase 2)
[Figure: first call site. The graphs of main and id from Phase 1, with bindings between the two procedures and an induced edge.]
Propagate pointer information from callees to callers (apply summaries of called functions). Bind formals to actuals, and returns to where they are assigned. (Bottom-up phase.)
L & H Example (4) (Phase 2)
[Figure: second call site, same graphs as above.]
L & H Example (5) (Phase 3)
[Figure: first call site. In id, the class of *x and *idret now also contains e: {*x, *idret, e}.]
Propagate pointer information from callers to callees. (Top-down phase.)
L & H Example (6) (Phase 3)
[Figure: second call site. In id, the class becomes {*x, *idret, e, f}.]
L & H Summary
- Cycles in call graph
  - In BU/TD phases, iterate among procedures in each SCC to find a fixpoint
- Presumes call graph pre-exists
  - With function pointers, need pointer analysis to provide call graph!
- Algorithm as expressed doesn’t handle function pointers