Formal Methods Program Slicing & Dataflow Analysis February 2015

Upload: solomon-strickland

Post on 30-Dec-2015


Page 1: Formal Methods Program Slicing & Dataflow Analysis February 2015

Formal Methods

Program Slicing & Dataflow Analysis

February 2015


Program Analysis

• Automatic analysis of a program

• Three main objectives
  – Correctness: program verification
  – Efficiency: code optimization (compilers)
  – Security: understand code vulnerabilities

• Two types of analysis
  – Static Analysis: do not execute program; reason over all inputs
  – Dynamic Analysis: execute program; reason over specific input


Static Analysis

• Based upon source code analysis
• Useful for:
  – Semantic Analysis of Programs, e.g. Type Inference, etc.
  – Optimizations and Transformations, e.g. Dataflow/Control-flow Analysis
  – Program Verification, e.g. Dijkstra’s Weakest Precondition Methods


Dynamic Analysis

• Based upon one or more runs of the program on given inputs

• Useful for:
  – Performance Analysis
  – Dynamic Slicing
  – Program Debugging


Static Analysis Techniques

• Type Inference
  – Check or infer types for program expressions
• Data Flow Analysis
  – Analyze variable and other dependencies
• Program Slicing
  – Construct reduced program WRT variables of interest
• Model checking
  – Check temporal properties of programs
• Theorem proving
  – Use logical deduction to prove facts


References

Hiralal Agrawal and Joseph Horgan, Dynamic Program Slicing, ACM SIGPLAN Conf. on Programming Language Design and Implementation; also in SIGPLAN Notices, 25(6): 246-256, 1990

H. Agrawal, Richard A. DeMillo, Eugene H. Spafford: Dynamic Slicing in the Presence of Unconstrained Pointers.  Proceedings of Symposium on Testing, Analysis, and Verification, 1991: 60-73

Frank Tip, A Survey of Program Slicing Techniques, Journal of Programming Languages, 3(3): 121-189, 1995

Mark Weiser: Program Slicing.  IEEE Transactions on Software Engineering. 10(4): 352-357 (1984)


Static and Dynamic Program Slicing


Static Program Slicing

• Computing a reduced program with respect to a criterion: <stmt, vars>

• Helps understand dependencies in programs and helps program debugging

• Other applications:
  – software testing
  – software maintenance
  – parallelization


    #include <stdio.h>

    #define YES 1
    #define NO 0

    int main(void) {
        int c, nl, nw, nc, inword;

        inword = NO;
        nl = 0; nw = 0; nc = 0;
        c = getchar();
        while (c != EOF) {
            nc = nc + 1;                    /* every character counts */
            if (c == '\n') nl = nl + 1;     /* newline ends a line */
            if (c == ' ' || c == '\n' || c == '\t')
                inword = NO;                /* whitespace ends a word */
            else if (inword == NO) {
                inword = YES;               /* first character of a new word */
                nw = nw + 1;
            }
            c = getchar();
        }
        printf("%d \n", nl);
        printf("%d \n", nw);
        printf("%d \n", nc);
        return 0;
    }

Example: Char, Line, and Word Counter


[Figure: control flow graph of the char, line, and word counter]

    inword = NO; nl = 0; nw = 0; nc = 0; c = getchar();
    while (c != EOF)
        TRUE: nc = nc + 1
              if (c == '\n')
                  TRUE: nl = nl + 1
              if (c == ' ' || c == '\n' || c == '\t')
                  TRUE: inword = NO
                  otherwise: if (inword == NO)
                      TRUE: inword = YES; nw = nw + 1
              c = getchar();  (back to the while test)
    on exit from the loop:
        printf("%d \n", nl); printf("%d \n", nw); printf("%d \n", nc);


    #include <stdio.h>

    #define YES 1
    #define NO 0

    int main(void) {
        int c, nw, inword;

        inword = NO;
        nw = 0;
        c = getchar();
        while (c != EOF) {
            if (c == ' ' || c == '\n' || c == '\t')
                inword = NO;
            else if (inword == NO) {
                inword = YES;
                nw = nw + 1;
            }
            c = getchar();
        }
        printf("%d \n", nw);
        return 0;
    }

Program Slice: Word Counter


    #include <stdio.h>

    #define YES 1
    #define NO 0

    int main(void) {
        int c, nl;

        nl = 0;
        c = getchar();
        while (c != EOF) {
            if (c == '\n') nl = nl + 1;
            c = getchar();
        }
        printf("%d \n", nl);
        return 0;
    }

Program Slice: Line Counter


    #include <stdio.h>

    #define YES 1
    #define NO 0

    int main(void) {
        int c, nc;

        nc = 0;
        c = getchar();
        while (c != EOF) {
            nc = nc + 1;
            c = getchar();
        }
        printf("%d \n", nc);
        return 0;
    }

Program Slice: Character Counter


Slicing OO Programs: Example

Ohm’s Law

    class component {
        attributes Real V, I, R;
        constraints V = I * R;
        constructor component(V1, I1, R1) { V = V1; I = I1; R = R1; }
    }

    class parallel extends component {
        attributes component [ ] C;
        constraints
            forall X in C: (X.V = V);
            (sum X in C: X.I) = I;
            (sum X in C: 1/X.R) = 1/R;
        constructor parallel(P) { C = P; }
    }


Slice WRT Resistance

Original:

    class parallel extends component {
        attributes component [ ] C;
        constraints
            forall X in C: (X.V = V);
            (sum X in C: X.I) = I;
            (sum X in C: 1/X.R) = 1/R;
        constructor parallel(P) { C = P; }
    }

Slice:

    class parallel extends component {
        attributes component [ ] C;
        constraints
            (sum X in C: 1/X.R) = 1/R;
        constructor parallel(P) { C = P; }
    }


Slice WRT Resistance

Original:

    class component {
        attributes Real V, I, R;
        constraints V = I * R;
        constructor component(V1, I1, R1) { V = V1; I = I1; R = R1; }
    }

Slice:

    class component {
        attributes Real R;
        constraints
        constructor component(R1) { R = R1; }
    }


Static Slicing Classification

• Forward vs Backward
• Intra vs Inter Procedural
• Procedural vs OO Languages

OO Slicing is a good topic for presentation.

Slicing is based upon Dataflow Analysis, so we will examine that topic first.


Data Flow Analysis

• Compiler does data flow analysis for various reasons: detect common subexpressions, loop invariant operations, uninitialized variables, etc.

• Two forms of data flow analysis:
  – Forward Flow
  – Backward Flow

• Characterized by Data Flow Constraints


Examples of Data Flow Analysis

• Forward Flow
  – Reaching Definitions (U)
  – Available Expressions (Π)

• Backward Flow
  – Live Variables (U)
  – Very Busy Expressions (Π)


Summary of DF Analyses

              U (LFP)                 Π (GFP)
    Forward   Reaching Definitions    Available Expressions
    Backward  Live Variables          Very Busy Expressions


KILL(B) and GEN(B) sets

IN = { d5: v := 10; d6: y := 20; d7: w := 30; d8: u := 40; }

Block B:

    d1: x := y + 1;
    d2: z := w + x;
    d3: v := z + u;
    d4: x := x + v;

KILL(B) = {d5}    GEN(B) = {d2, d3, d4}

KILL(B) eliminates each definition whose variable is re-assigned within B.

GEN(B) adds (the last) definition for each variable that is assigned in B.


Reaching Definitions

IN(B) = U {OUT(P) | P → B in the graph}

OUT(B) = (IN(B) – KILL(B)) U GEN(B)

[Figure: basic block B, with IN(B) at its entry and OUT(B) at its exit]


Illustrating the equations

IN = { d5: v := 10; d6: y := 20; d7: w := 30; d8: u := 40; }

Block B:

    d1: x := y + 1;
    d2: z := w + x;
    d3: v := z + u;
    d4: x := x + v;

KILL(B) = {d5}    GEN(B) = {d2, d3, d4}

OUT(B) = (IN(B) – KILL(B)) U GEN(B) = {d2, d3, d4, d6, d7, d8}
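
The transfer equation maps directly onto bit operations. The following C sketch is one way to check the worked example. The names `DefSet`, `def`, `transfer`, and `example_out` are illustrative assumptions, not part of the slides, and definition d_i is assumed to be stored in bit i-1.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i-1 of a DefSet is set when definition d_i belongs to the set. */
typedef uint32_t DefSet;

/* Hypothetical helper: the singleton set {d_i}. */
DefSet def(int i) {
    assert(i >= 1 && i <= 32);
    return (DefSet)1u << (i - 1);
}

/* OUT(B) = (IN(B) - KILL(B)) U GEN(B).
 * Set difference is AND-NOT and set union is OR on bit vectors. */
DefSet transfer(DefSet in, DefSet kill, DefSet gen) {
    return (in & ~kill) | gen;
}

/* Block B from the slide: IN = {d5..d8}, KILL = {d5}, GEN = {d2,d3,d4}. */
DefSet example_out(void) {
    DefSet in   = def(5) | def(6) | def(7) | def(8);
    DefSet kill = def(5);
    DefSet gen  = def(2) | def(3) | def(4);
    return transfer(in, kill, gen);
}
```

Calling `example_out()` yields the bit vector for {d2, d3, d4, d6, d7, d8}, the same set as computed by hand above.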


Least Fixed Point

Theorem: Every Monotonic Function on a Finite Lattice has a Least Fixed Point.

For Reaching Definitions, the lattice of interest is P(S), the powerset of S, the set of all definition points, ordered by the ≤ (subset or equal) relation.

Note that S is finite.

The least upper bound and greatest lower bound are set union and set intersection respectively.


More on Monotonicity

• OUT(B) = (IN(B) – KILL(B)) U GEN(B)
• U is monotonic in both arguments
• X – Y is monotonic in X but not Y
• Since KILL(B) is a constant for each B, its use does not violate monotonicity

On a finite lattice, fixed point iteration is guaranteed to converge when the functions are monotonic. Note: The composition of monotonic functions is monotonic.


Example: Non-Monotonic Function

x = not(y)
y = not(x)

Boolean Lattice: <{T,F}, F ≤ T, and, or>

There are two fixed points:

1. x = T, y = F
2. x = F, y = T

No unique solution!
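
A small C sketch can confirm both observations by brute force: it counts the fixed points of the system and shows that iterating from the bottom element (F, F) never settles. The function names `step`, `count_fixed_points`, and `converges_from_bottom` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* One application of the system x = not(y), y = not(x) to the pair (x, y). */
void step(bool x, bool y, bool *nx, bool *ny) {
    *nx = !y;
    *ny = !x;
}

/* Count the pairs (x, y) that one step maps to themselves. */
int count_fixed_points(void) {
    int count = 0;
    for (int x = 0; x <= 1; x++) {
        for (int y = 0; y <= 1; y++) {
            bool nx, ny;
            step(x != 0, y != 0, &nx, &ny);
            if (nx == (x != 0) && ny == (y != 0)) count++;
        }
    }
    return count;
}

/* Iterate from bottom (F, F); report whether a fixed point is reached
 * within max_steps applications. */
bool converges_from_bottom(int max_steps) {
    assert(max_steps > 0);
    bool x = false, y = false;
    for (int i = 0; i < max_steps; i++) {
        bool nx, ny;
        step(x, y, &nx, &ny);
        if (nx == x && ny == y) return true;
        x = nx;
        y = ny;
    }
    return false;
}
```

Starting from (F, F) the iteration alternates between (T, T) and (F, F), so it never lands on either of the two fixed points.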


Sketch of Algorithm: Least Fixed Point Iteration

• Forall basic blocks B Do {
      IN(B) := {};
      OUT(B) := GEN(B)
  }
• While changes occur Do {
      Forall B Do {
          IN(B) := U { OUT(P) | P → B in the graph };
      }
      Forall B Do {
          OUT(B) := (IN(B) – KILL(B)) U GEN(B);
      }
  }
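
The iteration can be sketched in C with bit vectors. The GEN and KILL sets below are those of the five-block example that follows. The predecessor table `PRED` is an assumption: the control flow graph figure is not reproduced in this text, so the edges were chosen to be consistent with the IN and OUT values shown for that example. The names `DefSet`, `D`, and `reaching_definitions` are likewise illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NBLOCKS 5
typedef uint32_t DefSet;                  /* bit i-1 set <=> d_i in the set */
#define D(i) ((DefSet)1u << ((i) - 1))

/* GEN and KILL for blocks B1..B5 (array index 0..4). */
static const DefSet GEN[NBLOCKS]  = { D(1)|D(2), D(3), D(4), D(5), 0 };
static const DefSet KILL[NBLOCKS] = { D(3)|D(4)|D(5), D(1), D(2)|D(5), D(2)|D(4), 0 };

/* ASSUMED edges: PRED[b][p] is true when block p+1 -> block b+1. */
static const bool PRED[NBLOCKS][NBLOCKS] = {
    {0, 1, 0, 0, 0},   /* B1 <- B2     */
    {1, 0, 0, 0, 1},   /* B2 <- B1, B5 */
    {0, 1, 0, 0, 0},   /* B3 <- B2     */
    {0, 0, 1, 0, 0},   /* B4 <- B3     */
    {0, 0, 1, 1, 0},   /* B5 <- B3, B4 */
};

/* Least fixed point iteration. Returns the number of passes over the
 * graph, counting the final pass in which nothing changes. */
int reaching_definitions(DefSet in[NBLOCKS], DefSet out[NBLOCKS]) {
    assert(in && out);
    for (int b = 0; b < NBLOCKS; b++) { in[b] = 0; out[b] = GEN[b]; }

    int passes = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        passes++;
        /* IN(B) := union of OUT(P) over all predecessors P of B. */
        for (int b = 0; b < NBLOCKS; b++) {
            DefSet new_in = 0;
            for (int p = 0; p < NBLOCKS; p++)
                if (PRED[b][p]) new_in |= out[p];
            if (new_in != in[b]) { in[b] = new_in; changed = true; }
        }
        /* OUT(B) := (IN(B) - KILL(B)) U GEN(B). */
        for (int b = 0; b < NBLOCKS; b++) {
            DefSet new_out = (in[b] & ~KILL[b]) | GEN[b];
            if (new_out != out[b]) { out[b] = new_out; changed = true; }
        }
    }
    return passes;
}
```

Under the assumed edges, the sketch stops after four passes, the last of which changes nothing, and ends with OUT(B2) = {d2, d3, d4, d5}.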


Control Flow Graph

From Aho & Ullman, Principles of Compiler Design, 1977


KILL(B) and GEN(B)

    Block   GEN(B)       KILL(B)
    B1      {d1, d2}     {d3, d4, d5}
    B2      {d3}         {d1}
    B3      {d4}         {d2, d5}
    B4      {d5}         {d2, d4}
    B5      {}           {}

Initialize: IN(B) = {} and OUT(B) = GEN(B)

    Block   OUT(B)
    B1      {d1, d2}
    B2      {d3}
    B3      {d4}
    B4      {d5}
    B5      {}

Iteration 1: IN(B) = U {OUT(P) | P → B}, then OUT(B) = (IN(B) – KILL(B)) U GEN(B)

    Block   IN(B)                   OUT(B)
    B1      {d3}                    {d1, d2}
    B2      {d1, d2}                {d2, d3}
    B3      {d3}                    {d3, d4}
    B4      {d4}                    {d5}
    B5      {d4, d5}                {d4, d5}

Iteration 2: IN(B) = U {OUT(P) | P → B}, then OUT(B) = (IN(B) – KILL(B)) U GEN(B)

    Block   IN(B)                   OUT(B)
    B1      {d2, d3}                {d1, d2}
    B2      {d1, d2, d4, d5}        {d2, d3, d4, d5}
    B3      {d2, d3}                {d3, d4}
    B4      {d3, d4}                {d3, d5}
    B5      {d3, d4, d5}            {d3, d4, d5}

Iteration 3: IN(B) = U {OUT(P) | P → B}, then OUT(B) = (IN(B) – KILL(B)) U GEN(B)

    Block   IN(B)                   OUT(B)
    B1      {d2, d3, d4, d5}        {d1, d2}
    B2      {d1, d2, d3, d4, d5}    {d2, d3, d4, d5}
    B3      {d2, d3, d4, d5}        {d3, d4}
    B4      {d3, d4}                {d3, d5}
    B5      {d3, d4, d5}            {d3, d4, d5}

Iteration 4: IN(B) = U {OUT(P) | P → B}, then OUT(B) = (IN(B) – KILL(B)) U GEN(B)

    Block   IN(B)                   OUT(B)
    B1      {d2, d3, d4, d5}        {d1, d2}
    B2      {d1, d2, d3, d4, d5}    {d2, d3, d4, d5}
    B3      {d2, d3, d4, d5}        {d3, d4}
    B4      {d3, d4}                {d3, d5}
    B5      {d3, d4, d5}            {d3, d4, d5}


Uses of Reaching Definitions

• Uninitialized Variables: Add a dummy assignment for every variable at the start of the program and check where these assignments “reach”.

• Loop Invariant Operations: An expression ‘X op Y’ in a loop is invariant if all definitions of X and Y that reach it are outside the loop.

• Static Program Slicing: We will examine the technique in more detail in the next class.


Analysis is Approximate

From Aho & Ullman, Principles of Compiler Design, 1977

this statement will not ever be executed

A := 2 and A := 3 reach point p


Algorithm Efficiencies

• Can represent sets by bit vectors, so that U and Π become logical \/ and /\.

• The number of iterations is bounded by the number of nodes in the graph.

• By visiting the nodes B1, …, Bk in “depth-first order”, the number of iterations can be minimized. In practice, the number is <= 5.


(Reverse) Depth-first Traversal Order

Traversal Sequence: IN(B1), OUT(B1), IN(B2), OUT(B2), IN(B3), OUT(B3), …, IN(B10), OUT(B10)

The path of “back edges” 10 → 7 → 4 → 3 determines the number of iterations.


Global Common Subexpressions

[Figure: three blocks, each containing := X * Y, lead to point p, where := X * Y appears again. X and Y do not change here.]


Global Common Subexpressions

[Figure: each of the three blocks now contains T := X * Y, and the statement at point p becomes := T. X and Y do not change here.]


Available Expression

X op Y is said to be ‘available’ at a point p if every path from the start of the program to p evaluates X op Y and after the last such evaluation prior to p, there are no subsequent assignments to X or Y.

OUT(B) = (IN(B) – KILLe(B)) U GENe(B)

IN(B) = Π {OUT(P) | P → B in the graph}


Algorithmic Sketch: Greatest Fixed Point Computation

Forall basic blocks B except initial block Do {
    IN(B) := E (set of all exprs in program);
    OUT(B) := E – KILLe(B)
}
While changes occur Do {
    Forall B Do {
        IN(B) := Π { OUT(P) | P → B in the graph };
    }
    Forall B Do {
        OUT(B) := (IN(B) – KILLe(B)) U GENe(B);
    }
}
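
A C sketch of this computation on a small example may help. Everything about the example is hypothetical: a four-block graph with B1 → B2, B1 → B3, B2 → B4, B3 → B4 and a back edge B4 → B2, and two expressions, x*y and a+b. B1 computes x*y, B2 computes a+b, and B3 assigns x and then computes a+b. The sketch also assumes the initial block starts with IN = {} and OUT = GENe, which the algorithm sketch does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NBLK 4
typedef uint32_t ExprSet;
#define E1 ((ExprSet)1u)               /* the expression x*y */
#define E2 ((ExprSet)2u)               /* the expression a+b */
#define ALL_EXPRS (E1 | E2)            /* E: all expressions in the program */

/* Hypothetical blocks B1..B4 (array index 0..3). */
static const ExprSet GEN_E[NBLK]  = { E1, E2, E2, 0 };
static const ExprSet KILL_E[NBLK] = { 0,  0,  E1, 0 };

/* PRED_E[b][p] is true when block p+1 -> block b+1. */
static const bool PRED_E[NBLK][NBLK] = {
    {0, 0, 0, 0},   /* B1: initial block */
    {1, 0, 0, 1},   /* B2 <- B1, B4      */
    {1, 0, 0, 0},   /* B3 <- B1          */
    {0, 1, 1, 0},   /* B4 <- B2, B3      */
};

void available_expressions(ExprSet in[NBLK], ExprSet out[NBLK]) {
    assert(in && out);
    /* ASSUMPTION: nothing is available on entry to the initial block. */
    in[0] = 0;
    out[0] = GEN_E[0];
    /* All other blocks start at the top element and shrink. */
    for (int b = 1; b < NBLK; b++) {
        in[b] = ALL_EXPRS;
        out[b] = ALL_EXPRS & ~KILL_E[b];
    }
    bool changed = true;
    while (changed) {
        changed = false;
        /* IN(B) := intersection of OUT(P) over all predecessors P. */
        for (int b = 1; b < NBLK; b++) {
            ExprSet new_in = ALL_EXPRS;
            for (int p = 0; p < NBLK; p++)
                if (PRED_E[b][p]) new_in &= out[p];
            if (new_in != in[b]) { in[b] = new_in; changed = true; }
        }
        /* OUT(B) := (IN(B) - KILLe(B)) U GENe(B). */
        for (int b = 1; b < NBLK; b++) {
            ExprSet new_out = (in[b] & ~KILL_E[b]) | GEN_E[b];
            if (new_out != out[b]) { out[b] = new_out; changed = true; }
        }
    }
}
```

In this example x*y is not available on entry to B2, because the path B1, B3, B4, B2 reassigns x, while a+b is available on entry to B4 because both of its predecessors compute it.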


Greatest Fixed Point Iteration

Theorem: Every Monotonic Function on a Finite Lattice has a Greatest Fixed Point.

For Available Expressions, the lattice of interest is P(S), the powerset of S, the set of all expressions appearing in the program, ordered by the ≤ (subset or equal) relation.

Note that S is finite.

The least upper bound and greatest lower bound are set union and intersection respectively.


Live Variables

A variable X is live at p if X may be referenced along some path from p to the end of the program before X is reassigned.

IN(B) = (OUT(B) – DEF(B)) U USE(B)

OUT(B) = U {IN(S) | B → S in the graph}

DEF(B) = variables that are assigned in B before they are used

USE(B) = variables that are used in B before any assignment to them in B
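
The backward equations can be sketched in C as follows. The three-block program in the comment is a hypothetical example, and the names `VarSet`, `USE_V`, `DEF_V`, `SUCC_V`, and `live_variables` are assumptions made for illustration. DEF and USE follow the definitions above, so in B2 only b counts as defined, since a and c are used there before they are assigned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NB 3
typedef uint32_t VarSet;
#define VA ((VarSet)1u)   /* variable a */
#define VB ((VarSet)2u)   /* variable b */
#define VC ((VarSet)4u)   /* variable c */
#define VN ((VarSet)8u)   /* variable n */

/* Hypothetical program:
 *   B1: a := 0
 *   B2: b := a + 1; c := c + b; a := b * 2; if a < n goto B2
 *   B3: return c
 */
static const VarSet USE_V[NB] = { 0,  VA | VC | VN, VC };
static const VarSet DEF_V[NB] = { VA, VB,           0  };

/* SUCC_V[b][s] is true when block b+1 -> block s+1. */
static const bool SUCC_V[NB][NB] = {
    {0, 1, 0},   /* B1 -> B2     */
    {0, 1, 1},   /* B2 -> B2, B3 */
    {0, 0, 0},   /* B3: exit     */
};

void live_variables(VarSet in[NB], VarSet out[NB]) {
    assert(in && out);
    for (int b = 0; b < NB; b++) { in[b] = 0; out[b] = 0; }
    bool changed = true;
    while (changed) {
        changed = false;
        /* OUT(B) := union of IN(S) over all successors S of B. */
        for (int b = 0; b < NB; b++) {
            VarSet new_out = 0;
            for (int s = 0; s < NB; s++)
                if (SUCC_V[b][s]) new_out |= in[s];
            if (new_out != out[b]) { out[b] = new_out; changed = true; }
        }
        /* IN(B) := (OUT(B) - DEF(B)) U USE(B). */
        for (int b = 0; b < NB; b++) {
            VarSet new_in = (out[b] & ~DEF_V[b]) | USE_V[b];
            if (new_in != in[b]) { in[b] = new_in; changed = true; }
        }
    }
}
```

For this example the result is IN(B1) = {c, n}: both variables are live on entry to the program, which signals that c is read before any assignment to it.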


Live Variable Analysis

• Example of a Backward Flow Analysis.
• Useful in register allocation/deallocation.
• The roles of IN and OUT are reversed compared with reaching definitions and available expressions.
• This is a least fixed point iteration due to the use of the U in defining OUT(B).


Very Busy Expressions

X op Y is said to be ‘very busy’ at a point p if every path from p encounters X op Y before any assignment to X or Y.

DEFvb(B) = expressions X op Y in B in which X or Y is defined before computing X op Y

USEvb(B) = expressions X op Y in B in which neither X nor Y is defined before computing X op Y

IN(B) = (OUT(B) – DEFvb(B)) U USEvb(B)

OUT(B) = Π {IN(S) | B → S in the graph}


Code Hoisting

• Very Busy expressions are useful in “code hoisting”
• Example of backward flow analysis.

[Figure: below point p, the statements A := B op C and D := B op C, followed by the uses := A, := A, := D, := D. B and C do not change here.]


After Code Hoisting

[Figure: T := B op C is computed at point p, and the four later uses become := T, := T, := T, := T.]

Assumes that B op C does not


Programming with Partial Orders and Lattices


Terms and Exprs


LUB and GLB are basic operations


Pattern Matching with Sets


Program Flow Analysis: Reaching Definitions


Program Flow Analysis: Very Busy Expressions

Note: E is the set of all expressions in the program being analyzed.


Conditional Clauses: Shortest Distance


Conditional Clauses: Shortest Distance

function short/total