formal methods program slicing & dataflow analysis february 2015

Post on 30-Dec-2015


Formal Methods

Program Slicing & Dataflow Analysis

February 2015

Program Analysis

• Automatic analysis of a program

• Three main objectives
– Correctness: program verification
– Efficiency: code optimization (compilers)
– Security: understanding code vulnerabilities

• Two types of analysis
– Static Analysis: do not execute the program; reason over all inputs
– Dynamic Analysis: execute the program; reason over specific inputs

Static Analysis

• Based upon source code analysis
• Useful for:
– Semantic Analysis of Programs, e.g. Type Inference, etc.
– Optimizations and Transformations, e.g. Dataflow/Control-flow Analysis
– Program Verification, e.g. Dijkstra’s Weakest Precondition Methods

Dynamic Analysis

• Based upon one or more runs of the program on given inputs

• Useful for:
– Performance Analysis
– Dynamic Slicing
– Program Debugging

Static Analysis Techniques

• Type Inference
– Check or infer types for program expressions
• Data Flow Analysis
– Analyze variable and other dependencies
• Program Slicing
– Construct a reduced program with respect to variables of interest
• Model Checking
– Check temporal properties of programs
• Theorem Proving
– Use logical deduction to prove facts

References

Hiralal Agrawal and Joseph Horgan: Dynamic Program Slicing. ACM SIGPLAN Conference on Programming Language Design and Implementation; also in SIGPLAN Notices 25(6): 246-256 (1990)

Hiralal Agrawal, Richard A. DeMillo, and Eugene H. Spafford: Dynamic Slicing in the Presence of Unconstrained Pointers. Proceedings of the Symposium on Testing, Analysis, and Verification: 60-73 (1991)

Frank Tip: A Survey of Program Slicing Techniques. Journal of Programming Languages 3(3): 121-189 (1995)

Mark Weiser: Program Slicing. IEEE Transactions on Software Engineering 10(4): 352-357 (1984)

Static and Dynamic Program Slicing

Static Program Slicing

• Computing a reduced program with respect to a criterion: <stmt, vars>

• Helps understand dependencies in programs and helps program debugging

• Other applications:
– software testing
– software maintenance
– parallelization

#include <stdio.h>

#define YES 1
#define NO 0

int main(void) {
    int c, nl, nw, nc, inword;
    inword = NO;
    nl = 0; nw = 0; nc = 0;
    c = getchar();
    while (c != EOF) {
        nc = nc + 1;
        if (c == '\n') nl = nl + 1;
        if (c == ' ' || c == '\n' || c == '\t') inword = NO;
        else if (inword == NO) {
            inword = YES;
            nw = nw + 1;
        }
        c = getchar();
    }
    printf("%d \n", nl);
    printf("%d \n", nw);
    printf("%d \n", nc);
}

Example: Char, Line, and Word Counter

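[Figure: control flow graph of the counter program. The entry node (inword = NO; nl = 0; nw = 0; nc = 0; c = getchar();) leads to the loop test while (c != EOF). Its TRUE branch runs nc = nc + 1, then the test if (c == '\n') with nl = nl + 1 on TRUE, then the test if (c == ' ' || c == '\n' || c == '\t') with inword = NO on TRUE and the test if (inword == NO) otherwise, whose TRUE branch runs inword = YES; nw = nw + 1. All paths rejoin at c = getchar(); and return to the loop test. On loop exit the three printf statements run.]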

#include <stdio.h>

#define YES 1
#define NO 0

int main(void) {
    int c, nw, inword;
    inword = NO;
    nw = 0;
    c = getchar();
    while (c != EOF) {
        if (c == ' ' || c == '\n' || c == '\t') inword = NO;
        else if (inword == NO) {
            inword = YES;
            nw = nw + 1;
        }
        c = getchar();
    }
    printf("%d \n", nw);
}

Program Slice: Word Counter

#include <stdio.h>

#define YES 1
#define NO 0

int main(void) {
    int c, nl;
    nl = 0;
    c = getchar();
    while (c != EOF) {
        if (c == '\n') nl = nl + 1;
        c = getchar();
    }
    printf("%d \n", nl);
}

Program Slice: Line Counter

#include <stdio.h>

#define YES 1
#define NO 0

int main(void) {
    int c, nc;
    nc = 0;
    c = getchar();
    while (c != EOF) {
        nc = nc + 1;
        c = getchar();
    }
    printf("%d \n", nc);
}

Program Slice: Character Counter
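A slice can be sanity-checked against the original program: on any input, both must produce the same value for the variable named in the slicing criterion. The following is a minimal sketch of that check for the word counter, under two assumptions that are not in the slides: the input comes from a string rather than getchar(), and the function names count_all, count_words_slice, and check_word_slice are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define YES 1
#define NO 0

/* The full counter, reading from the string s instead of stdin. */
void count_all(const char *s, int *nl, int *nw, int *nc)
{
    int inword = NO;
    *nl = 0; *nw = 0; *nc = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        int c = s[i];
        *nc = *nc + 1;
        if (c == '\n') *nl = *nl + 1;
        if (c == ' ' || c == '\n' || c == '\t') inword = NO;
        else if (inword == NO) {
            inword = YES;
            *nw = *nw + 1;
        }
    }
}

/* The slice with respect to nw: statements that only affect nl or nc are dropped. */
int count_words_slice(const char *s)
{
    int inword = NO, nw = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        int c = s[i];
        if (c == ' ' || c == '\n' || c == '\t') inword = NO;
        else if (inword == NO) {
            inword = YES;
            nw = nw + 1;
        }
    }
    return nw;
}

/* The slice must agree with the full program on the criterion variable nw. */
void check_word_slice(const char *s)
{
    int nl, nw, nc;
    count_all(s, &nl, &nw, &nc);
    assert(count_words_slice(s) == nw);
}
```

Calling check_word_slice on a handful of inputs is only a test, not a proof; the guarantee that the slice is correct for all inputs comes from the dependence analysis used to construct it.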

Slicing OO Programs: Example

Ohm’s Law

class component {
  attributes Real V, I, R;
  constraints V = I * R;
  constructor component(V1, I1, R1) { V = V1; I = I1; R = R1; }
}

class parallel extends component {
  attributes component [ ] C;
  constraints
    forall X in C: (X.V = V);
    (sum X in C: X.I) = I;
    (sum X in C: 1/X.R) = 1/R;
  constructor parallel(P) { C = P; }
}

Slice WRT Resistance

Original:

class parallel extends component {
  attributes component [ ] C;
  constraints
    forall X in C: (X.V = V);
    (sum X in C: X.I) = I;
    (sum X in C: 1/X.R) = 1/R;
  constructor parallel(P) { C = P; }
}

Slice:

class parallel extends component {
  attributes component [ ] C;
  constraints
    (sum X in C: 1/X.R) = 1/R;
  constructor parallel(P) { C = P; }
}

Slice WRT Resistance

Original:

class component {
  attributes Real V, I, R;
  constraints V = I * R;
  constructor component(V1, I1, R1) { V = V1; I = I1; R = R1; }
}

Slice:

class component {
  attributes Real R;
  constraints
  constructor component(R1) { R = R1; }
}

Static Slicing Classification

• Forward vs Backward
• Intra vs Inter Procedural
• Procedural vs OO Languages

OO Slicing is a good topic for presentation.

Slicing is based upon Dataflow Analysis, and hence we will examine this topic first.

Data Flow Analysis

• A compiler does data flow analysis for various reasons: to detect common subexpressions, loop invariant operations, uninitialized variables, etc.

• Two forms of data flow analysis:
– Forward Flow
– Backward Flow

• Characterized by Data Flow Constraints

Examples of Data Flow Analysis

• Forward Flow
– Reaching Definitions (∪)
– Available Expressions (∩)

• Backward Flow
– Live Variables (∪)
– Very Busy Expressions (∩)

Summary of DF Analyses

            ∪ (LFP)                 ∩ (GFP)
Forward     Reaching Definitions    Available Expressions
Backward    Live Variables          Very Busy Expressions

KILL(B) and GEN(B) sets

Block B:
  d1: x := y + 1;
  d2: z := w + x;
  d3: v := z + u;
  d4: x := x + v;

IN = { d5: v := 10; d6: y := 20; d7: w := 30; d8: u := 40; }

KILL(B) = {d5}
GEN(B) = {d2, d3, d4}

KILL(B) eliminates each definition whose variable is re-assigned within B.

GEN(B) adds (the last) definition for each variable that is assigned in B.
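The two definitions can be turned into code directly. The sketch below is one possible reading, not a reference implementation: the struct def record, the in_block flag, and the function names gen_of and kill_of are hypothetical, definitions are assumed to be listed in program order, and sets are bit vectors in which bit k-1 stands for d<k>. KILL(B) is computed as on the slide, i.e. over the definitions outside B.

```c
#include <assert.h>
#include <string.h>

/* A definition point "d<k>: var := ..."; only the assigned variable matters here. */
struct def {
    const char *var;   /* variable being assigned */
    int in_block;      /* 1 if the definition is inside block B, 0 otherwise */
};

typedef unsigned long defset;   /* bit k-1 stands for definition d<k> */

/* GEN(B): the last definition in B of each variable that is assigned in B. */
defset gen_of(const struct def *d, int n)
{
    assert(n >= 0 && n <= 32);  /* unsigned long has at least 32 bits */
    defset gen = 0;
    for (int i = 0; i < n; i++) {
        if (!d[i].in_block) continue;
        int last = 1;
        for (int j = i + 1; j < n; j++)
            if (d[j].in_block && strcmp(d[i].var, d[j].var) == 0) last = 0;
        if (last) gen |= 1UL << i;
    }
    return gen;
}

/* KILL(B): definitions outside B whose variable is re-assigned within B. */
defset kill_of(const struct def *d, int n)
{
    assert(n >= 0 && n <= 32);
    defset kill = 0;
    for (int i = 0; i < n; i++) {
        if (d[i].in_block) continue;
        for (int j = 0; j < n; j++)
            if (d[j].in_block && strcmp(d[i].var, d[j].var) == 0) kill |= 1UL << i;
    }
    return kill;
}
```

For the block above, with d1 to d4 inside B and d5 to d8 outside, gen_of yields bits 1 to 3 (d2, d3, d4) and kill_of yields bit 4 (d5). d1 is not in GEN(B) because d4 re-assigns x later in the same block.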

Reaching Definitions

IN(B) = ∪ {OUT(P) | P → B in the graph}

OUT(B) = (IN(B) – KILL(B)) ∪ GEN(B)

[Figure: block B with IN(B) on its incoming edges and OUT(B) on its outgoing edges.]

Illustrating the equations

Block B:
  d1: x := y + 1;
  d2: z := w + x;
  d3: v := z + u;
  d4: x := x + v;

IN = { d5: v := 10; d6: y := 20; d7: w := 30; d8: u := 40; }

KILL(B) = {d5}
GEN(B) = {d2, d3, d4}

OUT(B) = (IN(B) – KILL(B)) ∪ GEN(B) = {d2, d3, d4, d6, d7, d8}
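With sets stored as bit vectors, the transfer equation is one AND-NOT followed by one OR. The sketch below replays the numbers of this example; the names transfer, check_example, and the D(k) macro are hypothetical, and bit k-1 is assumed to stand for definition d<k>.

```c
#include <assert.h>

typedef unsigned int defset;              /* bit k-1 stands for definition d<k> */
#define D(k) (1u << ((k) - 1))

/* OUT(B) = (IN(B) - KILL(B)) union GEN(B), computed on bit vectors. */
defset transfer(defset in, defset kill, defset gen)
{
    return (in & ~kill) | gen;
}

/* Replays the example: IN = {d5..d8}, KILL = {d5}, GEN = {d2, d3, d4}. */
void check_example(void)
{
    defset in   = D(5) | D(6) | D(7) | D(8);
    defset kill = D(5);
    defset gen  = D(2) | D(3) | D(4);
    assert(transfer(in, kill, gen) == (D(2) | D(3) | D(4) | D(6) | D(7) | D(8)));
}
```

The parentheses matter: set difference is applied before the union, which is why d5 disappears while d6, d7, and d8 pass through the block unchanged.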

Least Fixed Point

Theorem: Every Monotonic Function on a Finite Lattice has a Least Fixed Point.

For Reaching Definitions, the lattice of interest is P(S), the powerset of the set S of all definition points, ordered by the ⊆ (subset or equal) relation.

Note that S is finite.

The least upper bound and greatest lower bound are set union and set intersection respectively.

More on Monotonicity

• OUT(B) = (IN(B) – KILL(B)) ∪ GEN(B)
• ∪ is monotonic in both arguments
• X – Y is monotonic in X but not in Y
• Since KILL(B) is a constant for each B, its use does not violate monotonicity

Fixed point iteration is guaranteed to converge on a finite lattice when the functions are monotonic. Note: The composition of monotonic functions is monotonic.
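On a small universe these monotonicity claims can be checked exhaustively. The sketch below does so for all subsets of a four-element set, encoded as the integers 0 to 15; the function names are hypothetical and the check is an illustration on one small lattice, not a proof.

```c
#include <assert.h>

typedef unsigned int set;        /* subsets of a 4-element universe: 0..15 */
#define NSUBSETS 16u

static int subset(set a, set b) { return (a & ~b) == 0; }

/* f(X) = (X - K) union G: returns 1 if X <= Y implies f(X) <= f(Y) for all X, Y. */
int transfer_is_monotonic(set kill, set gen)
{
    for (set x = 0; x < NSUBSETS; x++)
        for (set y = 0; y < NSUBSETS; y++)
            if (subset(x, y) && !subset((x & ~kill) | gen, (y & ~kill) | gen))
                return 0;
    return 1;
}

/* g(Y) = X - Y for a fixed X: returns 1 if Y1 <= Y2 implies g(Y1) <= g(Y2). */
int difference_is_monotonic_in_y(set x)
{
    for (set y1 = 0; y1 < NSUBSETS; y1++)
        for (set y2 = 0; y2 < NSUBSETS; y2++)
            if (subset(y1, y2) && !subset(x & ~y1, x & ~y2))
                return 0;
    return 1;
}

/* The transfer function is monotonic for every constant KILL and GEN;
 * set difference is not monotonic in its second argument. */
void check_monotonicity(void)
{
    for (set k = 0; k < NSUBSETS; k++)
        for (set g = 0; g < NSUBSETS; g++)
            assert(transfer_is_monotonic(k, g));
    assert(!difference_is_monotonic_in_y(0x1u));
}
```

The counterexample for the second claim is X = {a}, Y1 = {}, Y2 = {a}: Y1 is a subset of Y2, but X – Y1 = {a} is not a subset of X – Y2 = {}.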

Example: Non-Monotonic Function

x = not(y)
y = not(x)

Boolean Lattice: <{T,F}, F ≤ T, and, or>

There are two fixed points:

1. x = T, y = F
2. x = F, y = T

No unique solution!
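The effect on fixed point iteration can be seen by running it. In the sketch below (all names hypothetical), step applies both equations simultaneously; enumerating the four states finds the two fixed points, while iterating from the bottom element (F, F) flips between (F, F) and (T, T) and reaches neither.

```c
#include <assert.h>
#include <stdbool.h>

struct state { bool x, y; };

/* One simultaneous application of x = not(y), y = not(x). */
struct state step(struct state s)
{
    struct state t = { !s.y, !s.x };
    return t;
}

/* Number of states s with step(s) == s. */
int count_fixed_points(void)
{
    int n = 0;
    for (int x = 0; x <= 1; x++)
        for (int y = 0; y <= 1; y++) {
            struct state s = { x != 0, y != 0 };
            struct state t = step(s);
            if (t.x == s.x && t.y == s.y) n++;
        }
    return n;
}

/* Iterates from bottom (F, F). Returns the number of steps taken before the
 * state stopped changing, or -1 if it was still changing after max_rounds. */
int rounds_to_converge(int max_rounds)
{
    struct state s = { false, false };
    for (int i = 0; i < max_rounds; i++) {
        struct state t = step(s);
        if (t.x == s.x && t.y == s.y) return i;
        s = t;
    }
    return -1;
}

void check_not_system(void)
{
    assert(count_fixed_points() == 2);
    assert(rounds_to_converge(1000) == -1);
}
```

Neither of the two fixed points is below the other in the lattice ordering, so there is no least fixed point for the iteration to find.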

Sketch of Algorithm: Least Fixed Point Iteration

• Forall basic blocks B Do {
    IN(B) := {};
    OUT(B) := GEN(B)
  }
• While there are changes Do {
    Forall B Do {
      IN(B) := ∪ { OUT(P) | P → B in the graph };
    }
    Forall B Do {
      OUT(B) := (IN(B) – KILL(B)) ∪ GEN(B);
    }
  }

Control Flow Graph

From Aho & Ullman, Principles of Compiler Design, 1977

KILL(B) and GEN(B)

        GEN(B)        KILL(B)
B1      {d1, d2}      {d3, d4, d5}
B2      {d3}          {d1}
B3      {d4}          {d2, d5}
B4      {d5}          {d2, d4}
B5      {}            {}

Initialize: IN(B) = {} and OUT(B) = GEN(B)
  OUT(B1) = {d1, d2}
  OUT(B2) = {d3}
  OUT(B3) = {d4}
  OUT(B4) = {d5}
  OUT(B5) = {}

Iteration 1:

IN(B) = ∪ {OUT(P) | P → B}
  B1: {d3}
  B2: {d1, d2}
  B3: {d3}
  B4: {d4}
  B5: {d4, d5}

OUT(B) = (IN(B) – KILL(B)) ∪ GEN(B)
  B1: {d1, d2}
  B2: {d2, d3}
  B3: {d3, d4}
  B4: {d5}
  B5: {d4, d5}

Iteration 2:

IN(B) = ∪ {OUT(P) | P → B}
  B1: {d2, d3}
  B2: {d1, d2, d4, d5}
  B3: {d2, d3}
  B4: {d3, d4}
  B5: {d3, d4, d5}

OUT(B) = (IN(B) – KILL(B)) ∪ GEN(B)
  B1: {d1, d2}
  B2: {d2, d3, d4, d5}
  B3: {d3, d4}
  B4: {d3, d5}
  B5: {d3, d4, d5}

Iteration 3:

IN(B) = ∪ {OUT(P) | P → B}
  B1: {d2, d3, d4, d5}
  B2: {d1, d2, d3, d4, d5}
  B3: {d2, d3, d4, d5}
  B4: {d3, d4}
  B5: {d3, d4, d5}

OUT(B) = (IN(B) – KILL(B)) ∪ GEN(B)
  B1: {d1, d2}
  B2: {d2, d3, d4, d5}
  B3: {d3, d4}
  B4: {d3, d5}
  B5: {d3, d4, d5}

Iteration 4:

IN(B) = ∪ {OUT(P) | P → B}
  B1: {d2, d3, d4, d5}
  B2: {d1, d2, d3, d4, d5}
  B3: {d2, d3, d4, d5}
  B4: {d3, d4}
  B5: {d3, d4, d5}

OUT(B) = (IN(B) – KILL(B)) ∪ GEN(B)
  B1: {d1, d2}
  B2: {d2, d3, d4, d5}
  B3: {d3, d4}
  B4: {d3, d5}
  B5: {d3, d4, d5}
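The iteration above can be reproduced with a short bit-vector program. This is a sketch under stated assumptions: the five sets in each listing are taken to belong to blocks B1 to B5 in that order, and the edge list is not read from the flow graph figure but inferred from the IN sets (B1 → B2, B2 → B1, B2 → B3, B3 → B4, B3 → B5, B4 → B5, B5 → B2). The names reaching_definitions, check_against_slides, EDGE, and D(k) are hypothetical.

```c
#include <assert.h>

#define NBLOCKS 5
typedef unsigned int defset;            /* bit k-1 stands for definition d<k> */
#define D(k) (1u << ((k) - 1))

/* GEN and KILL for B1..B5, as listed for this example. */
static const defset GEN[NBLOCKS]  = { D(1)|D(2), D(3), D(4), D(5), 0 };
static const defset KILL[NBLOCKS] = { D(3)|D(4)|D(5), D(1), D(2)|D(5), D(2)|D(4), 0 };

/* ASSUMED edges, inferred from the IN sets: EDGE[p][b] != 0 means p -> b. */
static const int EDGE[NBLOCKS][NBLOCKS] = {
    /* B1 */ {0, 1, 0, 0, 0},
    /* B2 */ {1, 0, 1, 0, 0},
    /* B3 */ {0, 0, 0, 1, 1},
    /* B4 */ {0, 0, 0, 0, 1},
    /* B5 */ {0, 1, 0, 0, 0},
};

/* Least fixed point iteration. Returns the number of passes, counting the
 * final pass in which neither IN nor OUT changed. */
int reaching_definitions(defset in[NBLOCKS], defset out[NBLOCKS])
{
    for (int b = 0; b < NBLOCKS; b++) { in[b] = 0; out[b] = GEN[b]; }
    int passes = 0, changed = 1;
    while (changed) {
        changed = 0;
        passes++;
        for (int b = 0; b < NBLOCKS; b++) {
            defset u = 0;                       /* union over predecessors */
            for (int p = 0; p < NBLOCKS; p++)
                if (EDGE[p][b]) u |= out[p];
            if (u != in[b]) { in[b] = u; changed = 1; }
        }
        for (int b = 0; b < NBLOCKS; b++) {
            defset o = (in[b] & ~KILL[b]) | GEN[b];
            if (o != out[b]) { out[b] = o; changed = 1; }
        }
    }
    return passes;
}

/* Compares the result with the sets shown for Iteration 4. */
void check_against_slides(void)
{
    defset in[NBLOCKS], out[NBLOCKS];
    assert(reaching_definitions(in, out) == 4);
    assert(in[1] == (D(1)|D(2)|D(3)|D(4)|D(5)));   /* IN(B2) */
    assert(out[3] == (D(3)|D(5)));                 /* OUT(B4) */
}
```

The OUT sets already stop changing after the second pass; the third pass still updates IN, and the fourth pass only confirms that nothing changes any more, which matches the four iterations listed.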

Uses of Reaching Definitions

• Uninitialized Variables: Add a dummy assignment for all variables at the start of the program and check where they “reach”.

• Loop Invariant Operations: An expression ‘X op Y’ in a loop is invariant if all definitions for X and Y are outside the loop.

• Static Program Slicing: We will examine the technique in more detail in the next class.

Analysis is Approximate

From Aho & Ullman, Principles of Compiler Design, 1977

[Figure: a flow graph containing a statement that will never be executed; the analysis nevertheless reports that both A := 2 and A := 3 reach point p.]

Algorithm Efficiencies

• Sets can be represented by bit vectors, so that ∪ and ∩ become bitwise OR and AND.

• The number of iterations is bounded by the number of nodes in the graph.

• By visiting the nodes B1, …, Bk in “depth-first order”, the number of iterations can be minimized. In practice, the number is <= 5.

(Reverse) Depth-first Traversal Order

Traversal sequence: IN(B1), OUT(B1), IN(B2), OUT(B2), IN(B3), OUT(B3), …, IN(B10), OUT(B10)

The path of “back edges” 10 → 7 → 4 → 3 determines the number of iterations.

Global Common Subexpressions

[Figure: three paths into point p each compute … := X * Y, and … := X * Y is computed again at p. X and Y do not change in between.]

Global Common Subexpressions

[Figure: each of the three paths now computes T := X * Y, and the computation at p becomes … := T. X and Y do not change in between.]

Available Expression

X op Y is said to be ‘available’ at a point p if every path from the start of the program to p evaluates X op Y and after the last such evaluation prior to p, there are no subsequent assignments to X or Y.

OUT(B) = (IN(B) – KILLe(B)) ∪ GENe(B)

IN(B) = ∩ {OUT(P) | P → B in the graph}

Algorithmic Sketch: Greatest Fixed Point Computation

Forall basic blocks B except initial block Do {
  IN(B) := E (set of all exprs in program);
  OUT(B) := E – KILLe(B)
}
While there are changes Do {
  Forall B Do {
    IN(B) := ∩ { OUT(P) | P → B in the graph };
  }
  Forall B Do {
    OUT(B) := (IN(B) – KILLe(B)) ∪ GENe(B);
  }
}
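The sketch below runs this computation on a small hypothetical loop (B0 → B1, B1 → B2, B2 → B1) with a single expression e0 = X * Y that B0 computes. Nothing here comes from the slides beyond the equations: the graph, the names, the init parameter, and the treatment of the initial block (IN = {}, OUT = GENe) are assumptions made for illustration. The init parameter makes it possible to compare the greatest fixed point (start from E) with what happens when the iteration starts from the empty set.

```c
#include <assert.h>

#define NB 3                     /* hypothetical blocks: B0 (initial), B1, B2 */
typedef unsigned int exprset;    /* bit i stands for expression e<i> */
#define ALL_EXPRS 0x1u           /* E = { e0 }, where e0 is X * Y */

/* Hypothetical loop: B0 -> B1, B1 -> B2, B2 -> B1. EDGE[p][b] != 0 means p -> b. */
static const int EDGE[NB][NB] = { {0, 1, 0}, {0, 0, 1}, {0, 1, 0} };

/* init is the starting value for every block except the initial one. */
void available_expressions(const exprset gen[NB], const exprset kill[NB],
                           exprset init, exprset in[NB], exprset out[NB])
{
    in[0] = 0;                   /* assumed: nothing is available on entry */
    out[0] = gen[0];
    for (int b = 1; b < NB; b++) { in[b] = init; out[b] = init & ~kill[b]; }
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int b = 1; b < NB; b++) {
            exprset meet = ALL_EXPRS;           /* intersection over predecessors */
            for (int p = 0; p < NB; p++)
                if (EDGE[p][b]) meet &= out[p];
            if (meet != in[b]) { in[b] = meet; changed = 1; }
        }
        for (int b = 1; b < NB; b++) {
            exprset o = (in[b] & ~kill[b]) | gen[b];
            if (o != out[b]) { out[b] = o; changed = 1; }
        }
    }
}

void check_available(void)
{
    const exprset gen[NB]     = { ALL_EXPRS, 0, 0 };   /* B0 computes X * Y */
    const exprset no_kill[NB] = { 0, 0, 0 };
    const exprset kill_b2[NB] = { 0, 0, ALL_EXPRS };   /* B2 assigns X */
    exprset in[NB], out[NB];

    available_expressions(gen, no_kill, ALL_EXPRS, in, out);
    assert(in[1] == ALL_EXPRS);  /* greatest fixed point: X * Y is available in the loop */

    available_expressions(gen, no_kill, 0, in, out);
    assert(in[1] == 0);          /* starting from {} stops at a smaller fixed point */

    available_expressions(gen, kill_b2, ALL_EXPRS, in, out);
    assert(in[1] == 0);          /* the assignment in B2 kills X * Y on the back edge */
}
```

The second case shows why the starting point matters: the empty sets also satisfy the equations, but that solution hides the fact that X * Y really is available throughout the loop.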

Greatest Fixed Point Iteration

Theorem: Every Monotonic Function on a Finite Lattice has a Greatest Fixed Point.

For Available Expressions, the lattice of interest is P(S), the powerset of the set S of all expressions appearing in the program, ordered by the ⊆ (subset or equal) relation.

Note that S is finite.

The least upper bound and greatest lower bound are set union and intersection respectively.

Live Variables

A variable X is live at a point p if X is referenced, before being reassigned, along some path from p to the end of the program.

IN(B) = (OUT(B) – DEF(B)) ∪ USE(B)

OUT(B) = ∪ {IN(S) | B → S in the graph}

DEF(B) = variables that are assigned in B before they are used

USE(B) = variables that are used in B before any assignment to them in B
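A minimal sketch of these equations on a hypothetical four-block program follows; the program, the graph, and all names are assumptions made for illustration and do not come from the slides. Variables a, b, and c are encoded as bits, and blocks are visited in reverse order because information flows backward.

```c
#include <assert.h>

#define NB 4
typedef unsigned int varset;
enum { VA = 1, VB = 2, VC = 4 };     /* one bit per variable a, b, c */

/* Hypothetical program:
 *   B0: a = 0; b = 1;      B1: c = a + b;
 *   B2: a = c;             B3: print(a);
 * Edges: B0 -> B1, B1 -> B2, B2 -> B1, B2 -> B3. EDGE[b][s] != 0 means b -> s. */
static const int EDGE[NB][NB] = { {0, 1, 0, 0}, {0, 0, 1, 0}, {0, 1, 0, 1}, {0, 0, 0, 0} };
static const varset DEF[NB] = { VA | VB, VC, VA, 0 };
static const varset USE[NB] = { 0, VA | VB, VC, VA };

/* Least fixed point iteration for the backward equations. */
void live_variables(varset in[NB], varset out[NB])
{
    for (int b = 0; b < NB; b++) { out[b] = 0; in[b] = USE[b]; }
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int b = NB - 1; b >= 0; b--) {
            varset o = 0;                       /* union over successors */
            for (int s = 0; s < NB; s++)
                if (EDGE[b][s]) o |= in[s];
            varset i = (o & ~DEF[b]) | USE[b];
            if (o != out[b] || i != in[b]) { out[b] = o; in[b] = i; changed = 1; }
        }
    }
}

void check_liveness(void)
{
    varset in[NB], out[NB];
    live_variables(in, out);
    assert(in[0] == 0);              /* nothing is live on entry */
    assert(in[2] == (VB | VC));      /* b stays live across B2 because of the loop */
    assert(out[2] == (VA | VB));
}
```

In this example b is never used in B2 or B3, yet it is live at the start of B2, because the back edge B2 → B1 leads to another use of b in B1.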

Live Variable Analysis

• Example of a Backward Flow Analysis.
• Useful in register allocation/deallocation
• The roles of IN and OUT are reversed compared with reaching definitions and available expressions
• This is a least fixed point iteration due to the use of ∪ in defining OUT(B).

Very Busy Expressions

X op Y is said to be ‘very busy’ at a point p if every path from p encounters X op Y before any assignment to X or Y.

DEFvb(B) = expressions X op Y in B in which X or Y is defined before computing X op Y

USEvb(B) = expressions X op Y in B in which neither X nor Y is defined before computing X op Y

IN(B) = (OUT(B) – DEFvb(B)) ∪ USEvb(B)

OUT(B) = ∩ {IN(S) | B → S in the graph}
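The equations can be tried on a hypothetical branch: B0 ends at a point p and branches to B1 and B2, which are both exits. The sketch below is an illustration under assumptions not found in the slides: the graph, the names, the single expression e0 = B op C, and the convention that OUT is empty for blocks without successors.

```c
#include <assert.h>

#define NB 3                     /* hypothetical: B0 branches to B1 and B2 */
typedef unsigned int exprset;    /* bit i stands for expression e<i> */
#define ALL_EXPRS 0x1u           /* E = { e0 }, where e0 is B op C */

/* EDGE[b][s] != 0 means b -> s. B1 and B2 have no successors. */
static const int EDGE[NB][NB] = { {0, 1, 1}, {0, 0, 0}, {0, 0, 0} };

static int has_successor(int b)
{
    for (int s = 0; s < NB; s++)
        if (EDGE[b][s]) return 1;
    return 0;
}

void very_busy(const exprset use[NB], const exprset def[NB],
               exprset in[NB], exprset out[NB])
{
    /* Greatest fixed point: start from E, except at exits where OUT = {}. */
    for (int b = 0; b < NB; b++) {
        out[b] = has_successor(b) ? ALL_EXPRS : 0;
        in[b] = (out[b] & ~def[b]) | use[b];
    }
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int b = NB - 1; b >= 0; b--) {
            if (!has_successor(b)) continue;    /* OUT stays {} at exits */
            exprset meet = ALL_EXPRS;           /* intersection over successors */
            for (int s = 0; s < NB; s++)
                if (EDGE[b][s]) meet &= in[s];
            exprset i = (meet & ~def[b]) | use[b];
            if (meet != out[b] || i != in[b]) { out[b] = meet; in[b] = i; changed = 1; }
        }
    }
}

void check_very_busy(void)
{
    exprset in[NB], out[NB];

    /* Both branches compute B op C before any assignment to B or C. */
    const exprset use1[NB] = { 0, ALL_EXPRS, ALL_EXPRS };
    const exprset def1[NB] = { 0, 0, 0 };
    very_busy(use1, def1, in, out);
    assert(out[0] == ALL_EXPRS);     /* very busy at p: a candidate for hoisting */

    /* B2 assigns B first, so B op C is no longer very busy at p. */
    const exprset use2[NB] = { 0, ALL_EXPRS, 0 };
    const exprset def2[NB] = { 0, 0, ALL_EXPRS };
    very_busy(use2, def2, in, out);
    assert(out[0] == 0);
}
```

OUT(B0) is the set of expressions that are very busy at p, so the first case corresponds to the situation where B op C can be computed once at p instead of once per branch.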

Code Hoisting

• Very Busy expressions are useful in “code hoisting”
• Example of backward flow analysis.

[Figure: after point p, one branch computes A := B op C and the other computes D := B op C; the results are used later (… := A, … := D). B and C do not change here.]

After Code Hoisting

[Figure: T := B op C is computed once at point p, and the later uses become … := T.]

Assumes that B op C does not

Programming with Partial Orders and Lattices

Terms and Exprs

LUB and GLB are basic operations

Pattern Matching with Sets

Program Flow Analysis: Reaching Definitions

Program Flow Analysis: Very Busy Expressions

Note: E is the set of all expressions in the program being analyzed.

Conditional Clauses: Shortest Distance

Conditional Clauses: Shortest Distance

function short/total
