program representations xiangyu zhang. cs590z software defect analysis program representations ...

42
Program Representations Xiangyu Zhang

Post on 21-Dec-2015

235 views

Category:

Documents


0 download

TRANSCRIPT

Program Representations

Xiangyu Zhang

CS590Z Software Defect Analysis

Program Representations

Static program representations• Abstract syntax tree;• Control flow graph;• Program dependence graph;• Call graph;• Points-to relations.

Dynamic program representations• Control flow trace, address trace and value trace;• Dynamic dependence graph;• Whole execution trace;

CS590Z Software Defect Analysis

(1) Abstract syntax tree

An abstract syntax tree (AST) is a finite, labeled, directed tree, where the internal nodes are labeled by operators, and the leaf nodes represent the operands of the operators.

Program chipping.

CS590Z Software Defect Analysis

(2) Control Flow Graph (CFG)

Consists of basic blocks and edges• A maximal sequence of consecutive instructions such that

inside the basic block an execution can only proceed from one instruction to the next (SESE).

• Edges represent potential flow of control between BBs.• Program path.

B1

B2 B3

B4

CFG = <V, E, Entry, Exit>

V = Vertices, nodes (BBs)

E = Edges, potential flow of control E V × V

Entry, Exit V, unique entry and exit

CS590Z Software Defect Analysis

(2) An Example of CFG

1: sum=02: i=13: while ( i<N) do 4: i=i+15: sum=sum+i endwhile6: print(sum)

3: while ( i<N) do

1: sum=02: i=1

4: i=i+15: sum=sum+i

6: print (sum)

• BB- A maximal sequence of consecutive instructions such that inside the basic block an execution can only proceed from one instruction to the next (SESE).

CS590Z Software Defect Analysis

(3) Program Dependence Graph (PDG)– Data Dependence

S data depends T if there exists a control flow path from T to S and a variable is defined at T and then used at S.

123

456

789

10

CS590Z Software Defect Analysis

(3) PDG – Control Dependence

X dominates Y if every possible program path from the entry to Y has to pass X.

• Strict dominance, dominator, immediate dominator.

1: sum=02: i=13: while ( i<N) do 4: i=i+15: sum=sum+i endwhile6: print(sum)

3: while ( i<N) do

1: sum=02: i=1

4: i=i+15: sum=sum+i

6: print (sum)DOM(6)={1,2,3,6} IDOM(6)=3

CS590Z Software Defect Analysis

(3) PDG – Control Dependence

X post-dominates Y if every possible program path from Y to EXIT has to pass X.

• Strict post-dominance, post-dominator, immediate post-dominance.

1: sum=02: i=13: while ( i<N) do 4: i=i+15: sum=sum+i endwhile6: print(sum)

3: while ( i<N) do

1: sum=02: i=1

4: i=i+15: sum=sum+i

6: print (sum)PDOM(5)={3,5,6} IPDOM(5)=3

CS590Z Software Defect Analysis

(3) PDG – Control Dependence

Intuitively, Y is control-dependent on X iff X directly determines whether Y executes (statements inside one branch of a predicate are usually control dependent on the predicate)

• there exists a path from X to Y s.t. every node in the path other than X and Y is post-dominated by Y

• X is not strictly post-dominated by Y

Sorin Lerner

X

Y

Not post-dominated by Y

Every node is post-dominated by Y

CS590Z Software Defect Analysis

(3) PDG – Control Dependence

1: sum=02: i=13: while ( i<N) do 4: i=i+15: sum=sum+i endwhile6: print(sum)

3: while ( i<N) do

1: sum=02: i=1

4: i=i+15: sum=sum+i

6: print (sum)CD(5)=3

CD(3)=3, tricky!

A node (basic block) Y is control-dependent on another X iff X directly determines whether Y executes

• there exists a path from X to Y s.t. every node in the path other than X and Y is post-dominated by Y

• X is not strictly post-dominated by Y

CS590Z Software Defect Analysis

(3) PDG – Control Dependence is not Syntactically Explicit

1: sum=02: i=13: while ( i<N) do 4: i=i+15: if (i%2==0) 6: continue;7: sum=sum+i endwhile8: print(sum)

3: while ( i<N) do

1: sum=02: i=1

4: i=i+15: if (i%2==0)

8: print (sum)

7: print (sum)

A node (basic block) Y is control-dependent on another X iff X directly determines whether Y executes

• there exists a path from X to Y s.t. every node in the path other than X and Y is post-dominated by Y

• X is not strictly post-dominated by Y

CS590Z Software Defect Analysis

(3) PDG – Control Dependence is Tricky!

A node (basic block) Y is control-dependent on another X iff X directly determines whether Y executes

• there exists a path from X to Y s.t. every node in the path other than X and Y is post-dominated by Y

• X is not strictly post-dominated by Y

Can a statement control depends on two predicates?

CS590Z Software Defect Analysis

(3) PDG – Control Dependence is Tricky!

A node (basic block) Y is control-dependent on another X iff X directly determines whether Y executes

• there exists a path from X to Y s.t. every node in the path other than X and Y is post-dominated by Y

• X is not strictly post-dominated by Y

1: if ( p1 || p2 ) 2: s1;3: s2;

1: ? p1

Can one statement control depends on two predicates?

1: ? p2

2: s1

3: s2

What if ?1: if ( p1 && p2 ) 2: s1;3: s2;

Interprocedural CD, CD in case of exception,…

CS590Z Software Defect Analysis

(3) PDG

A program dependence graph consists of control dependence graph and data dependence graph

Why it is so important to software reliability?• In debugging, what could possibly induce the failure?• In security

p=getpassword( );…if (p==“zhang”) { send (m);}

CS590Z Software Defect Analysis

(4) Points-to Graph

Aliases: two expressions that denote the same memory location.

Aliases are introduced by:• pointers• call-by-reference• array indexing• C unions

CS590Z Software Defect Analysis

(4) Points-to Graph

Aliases: two expressions that denote the same memory location.

Aliases are introduced by:• pointers• call-by-reference• array indexing• C unions

CS590Z Software Defect Analysis

(4) Why Do We Need Points-to Graphs

Debugging

x.lock();...y.unlock(); // same object as x?

Security

F(x,y) { x.f=password; … print (y.f);}

F(a,a); disaster!

CS590Z Software Defect Analysis

(4) Points-to Graph

Points-to Graph• at a program point, compute a set of pairs of the form p -> x,

where p MAY/MUST points to x.

m(p) {r = new C();p->f = r;t = new C();if (…) q=p;r->f = t;

}

r

CS590Z Software Defect Analysis

(4) Points-to Graph

Points-to Graph• at a program point, compute a set of pairs of the form p->x,

where p MAY/MUST points to x.

m(p) {r = new C();p->f = r;t = new C();if (…) q=p;r->f = t;

}

pr

f

CS590Z Software Defect Analysis

(4) Points-to Graph

Points-to Graph• at a program point, compute a set of pairs of the form p->x,

where p MAY/MUST points to x.

m(p) {r = new C();p->f = r;t = new C();if (…) q=p;r->f = t;

}

pr

f

t

CS590Z Software Defect Analysis

(4) Points-to Graph

Points-to Graph• at a program point, compute a set of pairs of the form p->x,

where p MAY/MUST points to x.

m(p) {r = new C();p->f = r;t = new C();if (…) q=p;r->f = t;

}

p

q

r

f

t

CS590Z Software Defect Analysis

(4) Points-to Graph

Points-to Graph• at a program point, compute a set of pairs of the form p->x,

where p MAY/MUST points to x.

m(p) {r = new C();p->f = r;t = new C();if (…) q=p;r->f = t;

}

p

q

r

f

t

f

p->f->f and t are aliases

CS590Z Software Defect Analysis

(5) Call Graph

Call graph• nodes are procedures• edges are calls

Hard cases for building call graph• calls through function pointers

Can the password acquired at A be leaked at G?

CS590Z Software Defect Analysis

How to acquire and use these representations?

Will be covered by later lectures.

CS590Z Software Defect Analysis

Program Representations

Static program representations Abstract syntax tree; Control flow graph; Program dependence graph; Call graph; Points-to relations.

Dynamic program representations• Control flow trace;• Address trace, Value trace;• Dynamic dependence graph;• Whole execution trace;

CS590Z Software Defect Analysis

(1) Control Flow Trace

3: while ( i<N) do

1: sum=02: i=1

4: i=i+15: sum=sum+i

6: print (sum)

11: sum=0

31: while ( i<N) do

51: sum=sum+i32: while ( i<N) do 42: i=i+1

33: while ( i<N) do 61: print (sum)

N=2:

<… xi, …>

<… 804805737, 804805a29, …>

21: i=1

41: i=i+1

52: sum=sum+i

x is a program point, xi is an execution point

CS590Z Software Defect Analysis

(1) Control Flow Trace

3: while ( i<N) do

1: sum=02: i=1

4: i=i+15: sum=sum+i

6: print (sum)

11: sum=0 i=1

31: while ( i<N) do

41: i=i+1 sum=sum+i

32: while ( i<N) do

42: i=i+1 sum=sum+i

33: while ( i<N) do

61: print (sum)

N=2:

A More Compact CFT: < T, T, F >

11 21 31 41 51 32 42 52 33 6111 31 41 32 42 33 61

CS590Z Software Defect Analysis

(2) Dynamic Dependence Graph (DDG)Input: N=2

51: for i=1 to N do61: if (i%2==0) then

81: a=a+1

11: z=021: a=031: b=2

41: p=&b

1: z=02: a=03: b=24: p=&b5: for i = 1 to N do6: if ( i %2 == 0) then7: p=&a endif endfor8: a=a+19: z=2*(*p)10: print(z)

CS590Z Software Defect Analysis

(2) Dynamic Dependence Graph (DDG)Input: N=2

51: for i=1 to N do61: if (i%2==0) then

71: p=&a

81: a=a+191: z=2*(*p)

101: print(z)

11: z=021: a=031: b=2

41: p=&b

52: for I=1 to N do

62: if (i%2==0) then

82: a=a+192: z=2*(*p)

1: z=02: a=03: b=24: p=&b5: for i = 1 to N do6: if ( i %2 == 0) then7: p=&a endif endfor8: a=a+19: z=2*(*p)10: print(z)

One use has only one definition at runtime; One statement instance control depends on

only one predicate instance.

CS590Z Software Defect Analysis

(3) Whole Execution Trace

5:for i=1 to N

6:if (i%2==0) then

7: p=&a

8: a=a+1

9: z=2*(*p)

10: print(z)

T F

1: z=0

2: a=0

3: b=2

4: p=&b

T

Input: N=2

11: z=0

21: a=0

31: b=2

41: p=&b

51: for i = 1 to N do

61: if ( i %2 == 0) then

81: a=a+1

91: z=2*(*p)

52: for i = 1 to N do

62: if ( i %2 == 0) then

71: p=&a

82: a=a+1

92: z=2*(*p)

101: print(z)

T1

2

3

4

5

6

7

8

9

10

11

12

13

14

(3,8)(2,7)

(7,12)

(11,13)

(13,14)

(4,8)

(12,13)

(5,6)(9,10)

(10,11)

(5,7)(9,12)

(5,8)(9,13)

1

2

3

4

5,9

6,10

11

7,12

8,13

14

F

&b

&a

0

0

2

1,2

F,T

1,2

4,4

4

CS590Z Software Defect Analysis

(3) Whole Execution Trace

< tn >S1 < v1 >

< (ts, tn) >< (ts, tn) >

Multiple streams of numbers.

CS590Z Software Defect Analysis

Program Representations

Static program representations• Abstract syntax tree;• Control flow graph;• Program dependence graph;• Call graph;• Points-to relations.

Dynamic program representations• Control flow trace, address trace and value trace;• Dynamic dependence graph;• Whole execution trace;

CS590Z Software Defect Analysis

What is a slice?

S: …. = f (v)

Slice of v at S is the set of statements involved in computing v’s value at S.

[Mark Weiser, 1982] • Data dependence• Control dependence

Void main ( ) { int I=0; int sum=0; while (I<N) { sum=add(sum,I); I=add(I,1); } printf (“sum=%d\n”,sum); printf(“I=%d\n”,I);

CS590Z Software Defect Analysis

How to do slicing?

Static analysis• Input insensitive• May analysis

Dependence Graph

Characteristics• Very fast• Very imprecise

b=0

a=2

1 <=i <=N

if ((i++)%2= =1)

a=a+1 b=a*2

z=a+b

print(z)

T F

T

F

CS590Z Software Defect Analysis

Why is a static slice imprecise?

S1:a=… S2:b=…

L1:…=*p

• Use of Pointers – static alias analysis is very imprecise

• Use of function pointers – hard to know which function is called, conservative expectation results in imprecision

S1:x=… S2:x=…

L1:…=x

• All possible program paths

CS590Z Software Defect Analysis

Dynamic Slicing

Korel and Laski, 1988

Dynamic slicing makes use of all information about a particular execution of a program and computes the slice based on an execution history (trace)

• Trace consists control flow trace and memory reference trace

A dynamic slice query is a triple • <Var, Input , Execution Point>

Smaller, more precise, more helpful to the user

CS590Z Software Defect Analysis

Dynamic Slicing Example -background

1: b=02: a=23: for i= 1 to N do4: if ((i++)%2==1) then5: a = a+1 else6: b = a*2 endif done7: z = a+b8: print(z)

For input N=2,11: b=0 [b=0]

21: a=2

31: for i = 1 to N do [i=1]

41: if ( (i++) %2 == 1) then [i=1]

51: a=a+1 [a=3]

32: for i=1 to N do [i=2]

42: if ( i%2 == 1) then [i=2]

61: b=a*2 [b=6]

71: z=a+b [z=9]

81: print(z) [z=9]

CS590Z Software Defect Analysis

Issues about Dynamic Slicing

Precision – perfect

Running history – very big ( GB )

Algorithm to compute dynamic slice - slow and very high space requirement.

CS590Z Software Defect Analysis

Backward vs. Forward

1 main( )

2 {

3 int i, sum;

4 sum = 0;

5 i = 1;

6 while(i <= 10)

7 {

8 sum = sum + 1;

9 ++ i;

10 }

11 Cout<< sum;

12 Cout<< i;

13 }

An Example Program & its forward slice w.r.t. <3, sum>

CS590Z Software Defect Analysis

Comments

Want to know more?• Frank Tip’s survey paper (1995)

Static slicing is very useful for static analysis• Code transformation, program understanding, etc.• Points-to analysis is the key challenge• Not as useful in reliability as dynamic slicing

Dynamic slicing• Precise

good for defect analysis.• Solution space is much larger.• There exist hybrid techniques.

CS590Z Software Defect Analysis

Efficiency

How are dynamic slices computed? • Execution traces

control flow trace -- dynamic control dependences memory reference trace -- dynamic data dependences

• Construct a dynamic dependence graph• Traverse dynamic dependence graph to compute slices

CS590Z Software Defect Analysis

How to Detect Dynamic Dependence

Dynamic Data Dependence• Shadow space (SS)

Addr Abstract State

Virtual Space Shadow Space

[r2]

s1x: ST r1, [r2] SS(r2)=s1x

s1x

s2y: LD [r1], r2

[r1]

s2y SS(r1)=s1x

Dynamic control dependence is more tricky!