Post on 14-Dec-2015

Symbolic Execution

Amal Khalil & Juergen Dingel

CISC836: Models in Software Development: Methods, Techniques, and Tools

Winter 2015

Outline

• Overview of Classical Symbolic Execution
  – How it works
  – Application of symbolic execution
  – Challenges of symbolic execution

• Modern Symbolic Execution Techniques: Combining concrete and symbolic executions
  – Concolic Testing
  – Execution Generated Testing (EGT)

• KLEE Demo

Motivation

• Testing is a practical way to check programs.

• Manual testing is difficult: it requires knowledge of the code and constant maintenance.

• Random testing is easy to perform but often ineffective.
  – It does not guarantee coverage of all program paths.

• Symbolic execution can systematically explore a large number of program paths.
  – It is commonly used to drive the testing process and hence achieve higher path coverage.

Symbolic Execution

• A program analysis technique that executes programs parametrically, using symbolic inputs, to derive precise characterizations of their properties and their execution paths.

• First introduced in the 1970s by Lori A. Clarke [1976] and James C. King [1976] for program testing.

• Since 2003, much research effort has been devoted to improving the effectiveness, efficiency, and applicability of the traditional technique [Yang et al. 2014].

• Examples of symbolic execution tools:
  – jCUTE, JPF (Java)
  – KLEE (LLVM IR for C/C++)
  – Pex (.NET Framework)

How does Symbolic Execution work?

• The main idea is to substitute program inputs with symbolic values and then execute the program parametrically such that:

– The values of all program variables are computed as symbolic expressions over the symbolic input values;

– The execution can proceed along any feasible path.


• The result of symbolically executing a program is a tree-based structure called a symbolic execution tree (SET).
  – The nodes of a SET represent symbolic program states and the edges represent the transitions between these states.
  – Each symbolic program state consists of the program variables with their symbolic valuations, a program location, and a path constraint (PC), which is the conjunction of all the logical constraints collected over the program variables to reach that program location.
    • Decision procedures and SMT solvers are used to check the satisfiability of each path constraint (PC).
    • The set of path constraints computed by symbolic execution is used to enable various analysis, verification, and testing tasks.
  – The paths of a SET characterize all the distinct execution paths of a program.

Constraints, Decision Procedures, and SMT Solvers

• Constraints
  – X > Y ∧ Y+X ≤ 10 (X, Y are called free variables)
  – A solution of the constraint is a set of assignments, one for each free variable, that makes the constraint true.
  – {X = 3, Y = 2} is a solution but {X = 6, Y = 5} is not.
  – Types of constraints
    • Linear constraint (e.g., X > Y ∧ Y+X ≤ 10)
    • Non-linear constraint (e.g., X * Y < 100, X % 3 ∧ Y > 10, and (X >> 3) < Y)
    • Use of function symbols (e.g., f(X) > 10 ∧ (forall a. f(a) = a + 10))

• A decision procedure is a tool that can decide if a constraint is satisfiable.
  – In general, checking constraint satisfiability is undecidable.

• A constraint solver is a tool that finds satisfying assignments for a constraint, if it is satisfiable.

Note: This page is taken from Saswat Anand's slides on Symbolic Execution, 2009. http://www.cc.gatech.edu/~harrold/6340/cs6340_fall2009/Slides/SymExClass-09.pdf
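The difference between checking a candidate solution and finding one can be sketched in C. This is only an illustration: the names `constraint`, `solve`, and `check_slide_examples` are hypothetical, and the exhaustive search over a small box is a toy stand-in for a real constraint solver, which works very differently.

```c
#include <assert.h>
#include <stdbool.h>

/* The slide's example constraint: X > Y and Y + X <= 10. */
bool constraint(int x, int y) {
    return x > y && y + x <= 10;
}

/* Toy "solver" (an assumption for illustration): tries every pair in
   [lo, hi] x [lo, hi] and stores the first satisfying assignment.
   Returns false if no pair in the box satisfies the constraint. */
bool solve(int lo, int hi, int *x, int *y) {
    for (int cx = lo; cx <= hi; cx++) {
        for (int cy = lo; cy <= hi; cy++) {
            if (constraint(cx, cy)) {
                *x = cx;
                *y = cy;
                return true;
            }
        }
    }
    return false;
}

/* Re-checks the two assignments discussed on the slide. */
void check_slide_examples(void) {
    assert(constraint(3, 2));    /* 3 > 2 and 5 <= 10 */
    assert(!constraint(6, 5));   /* 6 > 5 but 11 > 10 */
}
```

Calling `solve(0, 10, &x, &y)` returns some satisfying pair from that box; which pair comes back depends only on the search order chosen here.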


Example #1

int foo (int x, int y){
1: if (x > y)
2:   x = x - y;
3: else
4:   x = y - x;
5: if (x > 0)
6:   x++;
7: else
8:   x--;
9: return x; }

Symbolic execution tree:

Loc: 1  x: X, y: Y  PC: true
  1: if(x>y) - then
  Loc: 2  x: X, y: Y  PC: X>Y
    2: x = x - y;
    Loc: 5  x: X-Y, y: Y  PC: X>Y
      5: if(x>0) - then
      Loc: 6  x: X-Y, y: Y  PC: X>Y^X-Y>0
        6: x++;
        Loc: 9  x: X-Y+1, y: Y  PC: X>Y^X-Y>0
        Path: 1, 2, 5, 6, 9  Test inputs: x: 7, y: 5
      5: if(x>0) - else
      Loc: 8  x: X-Y, y: Y  PC: X>Y^X-Y<=0
        Unsatisfiable PC >> Infeasible path
  1: if(x>y) - else
  Loc: 4  x: X, y: Y  PC: X<=Y
    4: x = y - x;
    Loc: 5  x: Y-X, y: Y  PC: X<=Y
      5: if(x>0) - then
      Loc: 6  x: Y-X, y: Y  PC: X<=Y^Y-X>0
        6: x++;
        Loc: 9  x: Y-X+1, y: Y  PC: X<=Y^Y-X>0
        Path: 1, 4, 5, 6, 9  Test inputs: x: 3, y: 9
      5: if(x>0) - else
      Loc: 8  x: Y-X, y: Y  PC: X<=Y^Y-X<=0
        8: x--;
        Loc: 9  x: Y-X-1, y: Y  PC: X<=Y^Y-X<=0
        Path: 1, 4, 5, 8, 9  Test inputs: x: 1, y: 1
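The three generated test inputs can be replayed against an instrumented copy of `foo` to confirm that each one follows the path it was derived from. The sketch below is an illustration, not part of the original slides: `foo_traced`, `infeasible_path_found`, and `check_slide_inputs` are hypothetical names, the search for the infeasible path is bounded, and it works on small values only, so integer overflow plays no role.

```c
#include <assert.h>
#include <string.h>

/* foo() from Example #1, instrumented to record the executed line numbers.
   The caller must supply a trace buffer of at least 16 bytes. */
int foo_traced(int x, int y, char *trace) {
    strcpy(trace, "1");
    if (x > y) {
        strcat(trace, ",2");
        x = x - y;
    } else {
        strcat(trace, ",4");
        x = y - x;
    }
    strcat(trace, ",5");
    if (x > 0) {
        strcat(trace, ",6");
        x++;
    } else {
        strcat(trace, ",8");
        x--;
    }
    strcat(trace, ",9");
    return x;
}

/* Bounded search for an input that follows the path 1, 2, 5, 8, 9, whose
   path constraint X>Y ^ X-Y<=0 is unsatisfiable. Returns 1 if one is found. */
int infeasible_path_found(int lo, int hi) {
    char trace[16];
    for (int x = lo; x <= hi; x++) {
        for (int y = lo; y <= hi; y++) {
            foo_traced(x, y, trace);
            if (strcmp(trace, "1,2,5,8,9") == 0)
                return 1;
        }
    }
    return 0;
}

/* Replays the three test inputs listed in the tree. */
void check_slide_inputs(void) {
    char trace[16];
    foo_traced(7, 5, trace);
    assert(strcmp(trace, "1,2,5,6,9") == 0);
    foo_traced(3, 9, trace);
    assert(strcmp(trace, "1,4,5,6,9") == 0);
    foo_traced(1, 1, trace);
    assert(strcmp(trace, "1,4,5,8,9") == 0);
}
```

A bounded search that finds nothing is not a proof of infeasibility; that conclusion comes from the unsatisfiable path constraint.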


Example #1 (line 5 changed to x >= 0)

int foo (int x, int y){
1: if (x > y)
2:   x = x - y;
3: else
4:   x = y - x;
5: if (x >= 0)
6:   x++;
7: else
8:   x--;
9: return x; }

Symbolic execution tree:

Loc: 1  x: X, y: Y  PC: true
  1: if(x>y) - then
  Loc: 2  x: X, y: Y  PC: X>Y
    2: x = x - y;
    Loc: 5  x: X-Y, y: Y  PC: X>Y
      5: if(x>=0) - then
      Loc: 6  x: X-Y, y: Y  PC: X>Y^X-Y>=0
        6: x++;
        Loc: 9  x: X-Y+1, y: Y  PC: X>Y^X-Y>=0
      5: if(x>=0) - else
      Loc: 8  x: X-Y, y: Y  PC: X>Y^X-Y<0
        Unsatisfiable PC >> Infeasible path
  1: if(x>y) - else
  Loc: 4  x: X, y: Y  PC: X<=Y
    4: x = y - x;
    Loc: 5  x: Y-X, y: Y  PC: X<=Y
      5: if(x>=0) - then
      Loc: 6  x: Y-X, y: Y  PC: X<=Y^Y-X>=0
        6: x++;
        Loc: 9  x: Y-X+1, y: Y  PC: X<=Y^Y-X>=0
      5: if(x>=0) - else
      Loc: 8  x: Y-X, y: Y  PC: X<=Y^Y-X<0
        Unsatisfiable PC >> Infeasible path

Both paths into line 8 are infeasible, so line 8 is "Dead Code".

Applications of Symbolic Execution

• Test case generation
• Infeasible path detection
• Invariant checking
• Bug finding
• Program equivalence checking
• Regression analysis
• Others

Example #2 - [Cadar & Sen 2013]

void testme_inf(int N) {
1: int sum = 0;
2: while (N > 0) {
3:   sum = sum + N;
4:   N = sym_input();
5: } }

Symbolic states along the path that always takes the loop's true branch:

SS1 - Loc: 1  N: N1  PC: true
  1: sum = 0;
SS2 - Loc: 2  N: N1, sum: 0  PC: true
  2: while (N>0) - true
SS3 - Loc: 3  N: N1, sum: 0  PC: N1>0
  3: sum = sum + N;
SS4 - Loc: 4  N: N1, sum: N1  PC: N1>0
  4: N = sym_input();
SS5 - Loc: 2  N: N2, sum: N1  PC: N1>0
  2: while (N>0) - true
SS6 - Loc: 3  N: N2, sum: N1  PC: N1>0^N2>0
  3: sum = sum + N;
SS7 - Loc: 4  N: N2, sum: N1+N2  PC: N1>0^N2>0
  4: N = sym_input();
SS8 - Loc: 2  N: N3, sum: N1+N2  PC: N1>0^N2>0
  2: while (N>0) - true
SS9 - Loc: 3  N: N3, sum: N1+N2  PC: N1>0^N2>0^N3>0
  3: sum = sum + N;
SS10 - Loc: 4  N: N3, sum: N1+N2+N3  PC: N1>0^N2>0^N3>0
  4: N = sym_input();
SS11 - Loc: 2  N: N4, sum: N1+N2+N3  PC: N1>0^N2>0^N3>0
  2: while (N>0) - true
SS12 - Loc: 3  N: N4, sum: N1+N2+N3  PC: N1>0^N2>0^N3>0^N4>0
  …

The tree grows without bound: every iteration introduces a fresh symbolic input (N2, N3, N4, …) and adds one more conjunct to the PC.

Challenges of Symbolic Execution

• Path explosion problem
  – The number of feasible paths in a program grows exponentially with the size of the program and can even be infinite for programs with unbounded loops and recursion.
  – Proposed solutions:
    • Set an upper bound on the number of loop iterations;
    • Summarize loop effects;
    • Use some abstraction criteria (e.g., subsumption) for pruning redundant paths and reducing the state space;
    • Use heuristics for path finding to achieve some user-defined coverage criteria;
    • Divide a program into independent parts and run the symbolic execution for each part in parallel.

Example #2 - [Cadar & Sen 2013]

Solution #1: Set max-depth = 2

void testme_inf(int N) {
1: int sum = 0;
2: while (N > 0) {
3:   sum = sum + N;
4:   N = sym_input();
5: } }

SS1 - Loc: 1  N: N1  PC: true
  1: sum = 0;
  SS2 - Loc: 2  N: N1, sum: 0  PC: true
    2: while (N>0) - true
    SS3 - Loc: 3  N: N1, sum: 0  PC: N1>0
      3: sum = sum + N;
      SS4 - Loc: 4  N: N1, sum: N1  PC: N1>0
        4: N = sym_input();
        SS5 - Loc: 2  N: N2, sum: N1  PC: N1>0
          2: while (N>0) - true
          SS6 - Loc: 3  N: N2, sum: N1  PC: N1>0^N2>0
          2: while (N>0) - false
          Loc: 5  N: N2, sum: N1  PC: N1>0^N2<=0
    2: while (N>0) - false
    Loc: 5  N: N1, sum: 0  PC: N1<=0
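The path constraints of the terminating paths under a depth bound follow a regular pattern, which the C sketch below generates as strings. It is an illustration under stated assumptions: the function names are hypothetical, and the bound is interpreted as the maximum number of times the loop condition may be evaluated, which is one plausible reading of the slide's max-depth.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes the path constraint of the path that runs the loop body `iters`
   times and then leaves the loop, e.g. "N1>0^N2<=0" for iters == 1.
   Returns the number of characters written, or -1 if buf is too small. */
int exit_path_constraint(int iters, char *buf, size_t size) {
    size_t used = 0;
    for (int i = 1; i <= iters + 1; i++) {
        const char *op = (i <= iters) ? ">0" : "<=0";
        int n = snprintf(buf + used, size - used, "%sN%d%s",
                         i > 1 ? "^" : "", i, op);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}

/* Prints every terminating path when the loop condition may be evaluated
   at most max_depth times; returns how many such paths there are. */
int print_bounded_paths(int max_depth) {
    char pc[256];
    int count = 0;
    for (int iters = 0; iters < max_depth; iters++) {
        if (exit_path_constraint(iters, pc, sizeof pc) < 0)
            break;
        printf("exit after %d iteration(s): PC: %s\n", iters, pc);
        count++;
    }
    return count;
}

/* Compares the generated constraints with the two exit states in the tree. */
void check_against_slide(void) {
    char pc[64];
    exit_path_constraint(0, pc, sizeof pc);
    assert(strcmp(pc, "N1<=0") == 0);
    exit_path_constraint(1, pc, sizeof pc);
    assert(strcmp(pc, "N1>0^N2<=0") == 0);
}
```

With a bound of 2 this yields the two exit states shown in the tree; SS6 is the state at which exploration is cut off.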


Example #2 - [Cadar & Sen 2013]

Solution #2: Subsumption

2: while (N>0) - true
SS3 - Loc: 3  N: N1, sum: 0  PC: N1>0
  3: sum = sum + N;
  SS4 - Loc: 4  N: N1, sum: N1  PC: N1>0
    4: N = sym_input();
    SS5 - Loc: 2  N: N2, sum: N1  PC: N1>0
      2: while (N>0) - true
      SS6 - Loc: 3  N: N2, sum: N1  PC: N1>0^N2>0
        3: sum = sum + N;
        SS7 - Loc: 4  N: N2, sum: N1+N2  PC: N1>0^N2>0
          4: N = sym_input();
          SS8 - Loc: 2  N: N3, sum: N1+N2  PC: N1>0^N2>0
          Subsumed by SS5
      2: while (N>0) - false
      SS9 - Loc: 5  N: N2, sum: N1  PC: N1>0^N2<=0
2: while (N>0) - false
SS10 - Loc: 5  N: N1, sum: 0  PC: N1<=0

Concretization of SS5: (N, sum) = {([-∞, +∞], 1), ([-∞, +∞], 2), ([-∞, +∞], 3), …}

Concretization of SS8: (N, sum) = {([-∞, +∞], 2), ([-∞, +∞], 3), ([-∞, +∞], 4), …}

Concretization of SS8 ⊆ Concretization of SS5: both states are at Loc 2, and every concrete state represented by SS8 is also represented by SS5. SS8 is therefore subsumed by SS5 and is not explored further.

Challenges of Symbolic Execution

• Inability to solve very complex and non-linear constraints
  – Proposed solutions:
    • Use concretization (e.g., Concolic Symbolic Execution);
    • Perform constraint simplification.

• Inability to handle external library calls
  – Proposed solutions:
    • Use concretization (e.g., Concolic Symbolic Execution);
    • Provide models to simulate/abstract the behavior of such external modules.

Example #3 - [Cadar & Sen 2013]

• Complex constraints

void testme(int x, int y) {
1: int z = (y*y)%50;
2: if (z == x) {
3:   if (x > y+10) {
4:     abort(); //ERROR
5:   }
6: } }

Loc: 1  x: X, y: Y  PC: true
  1: int z = (y*y)%50;
  Loc: 2  x: X, y: Y, z: (Y*Y)%50  PC: true

SE cannot handle symbolic value of z! >> Stuck!

• External system/library calls

void testme(int x, int y) {
1: int z = F(y);
2: if (z == x) {
3:   if (x > y+10) {
4:     abort(); //ERROR
5:   }
6: } }

Loc: 1  x: X, y: Y  PC: true
  1: int z = F(y);
  Loc: 2  x: X, y: Y, z: F(Y)  PC: true

Concolic Symbolic Execution

• Novelty: Simultaneous Concrete & Symbolic Executions
  – DART: Directed Automated Random Testing [Godefroid et al. 2005]
  – Execution-Generated Testing (EGT) [Cadar et al. 2005]

• "Replace symbolic expression by concrete value when symbolic expression becomes unmanageable (e.g. non-linear)."

Overview of DART

• Example #3 - [Cadar & Sen 2013]

void testme(int x, int y) {
1: int z = 2 * y;
2: if (z == x) {
3:   if (x > y + 10)
4:     abort(); //ERROR
5: } }

/* simple driver exercising testme() */
int main(){
  int inp1 = random();
  int inp2 = random();
  testme(inp1, inp2);
  return 0;
}

• Random testing alone is ineffective.
  – The probability of reaching abort() is extremely low!

• Solution?
  – Combine random testing & symbolic execution (twofold benefit).
    • Improve test coverage of random testing
    • Alleviate some of the imprecision in SE

Example #3 - [Cadar & Sen 2013]

void testme(int x, int y) {
1: int z = 2 * y;
2: if (z == x) {
3:   if (x > y + 10)
4:     abort(); //ERROR
5: } }

Run 1 - Test inputs: x = 22, y = 7  Path: 1, 2, 5

Loc: 1  x: X, y: Y  PC: true
  1: int z = 2*y;
  Loc: 2  x: X, y: Y, z: 2*Y  PC: true
    2: if(z==x) - false
    Loc: 5  x: X, y: Y, z: 2*Y  PC: 2*Y!=X

Solve: 2*Y==X  Solution: x=2, y=1

Run 2 - Test inputs: x = 2, y = 1  Path: 1, 2, 3, 5

    2: if(z==x) - true
    Loc: 3  x: X, y: Y, z: 2*Y  PC: 2*Y==X
      3: if(x>y+10) - false
      Loc: 5  x: X, y: Y, z: 2*Y  PC: 2*Y==X^X<=Y+10

Solve: 2*Y==X^X>Y+10  Solution: x=30, y=15

Run 3 - Test inputs: x = 30, y = 15  Path: 1, 2, 3, 4

      3: if(x>y+10) - true
      Loc: 4  x: X, y: Y, z: 2*Y  PC: 2*Y==X^X>Y+10

Abort >> ERROR
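The three runs above can be replayed with a small C sketch. It is a hedged illustration rather than DART itself: `run_testme`, `pc_after_run1`, `pc_after_run2`, `solve`, and `concolic_demo` are hypothetical names, the two path constraints are written out by hand instead of being collected during execution, and an exhaustive search over 1..100 stands in for the constraint solver.

```c
#include <assert.h>
#include <stdbool.h>

/* How far a run of testme() got; returned instead of calling abort(). */
enum outcome { OUTER_FALSE, INNER_FALSE, ABORT_REACHED };

/* testme() from the slide, instrumented to report the branch outcome. */
enum outcome run_testme(int x, int y) {
    int z = 2 * y;
    if (z == x) {
        if (x > y + 10)
            return ABORT_REACHED;   /* line 4: abort() */
        return INNER_FALSE;
    }
    return OUTER_FALSE;
}

/* Constraints obtained by negating the last branch of runs 1 and 2. */
bool pc_after_run1(int x, int y) { return 2 * y == x; }
bool pc_after_run2(int x, int y) { return 2 * y == x && x > y + 10; }

/* Toy stand-in for a solver: exhaustive search over 1..100 for x and y. */
bool solve(bool (*pc)(int, int), int *x, int *y) {
    for (int cy = 1; cy <= 100; cy++) {
        for (int cx = 1; cx <= 100; cx++) {
            if (pc(cx, cy)) {
                *x = cx;
                *y = cy;
                return true;
            }
        }
    }
    return false;
}

/* Replays the walk-through and returns the outcome of the final run. */
enum outcome concolic_demo(void) {
    int x = 22, y = 7;                            /* run 1: random inputs */
    assert(run_testme(x, y) == OUTER_FALSE);

    bool found = solve(pc_after_run1, &x, &y);    /* negate 2*Y != X */
    assert(found && run_testme(x, y) == INNER_FALSE);

    found = solve(pc_after_run2, &x, &y);         /* negate X <= Y+10 */
    assert(found);
    return run_testme(x, y);
}
```

The toy search may return a different solution for the last constraint than the x=30, y=15 shown above; any assignment satisfying 2*Y==X^X>Y+10 reaches the abort.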


Example #3 - [Cadar & Sen 2013]

void testme(int x, int y) {
1: int z = (y*y)%50; //int z = F(y);
2: if (z == x) {
3:   if (x > y + 10)
4:     abort(); //ERROR
5: } }

Run 1 - Test inputs: x = 22, y = 7  Path: 1, 2, 5

Loc: 1  x: X, y: Y  PC: true
  1: int z = (y*y)%50;
  Loc: 2  x: X, y: 7, z: 49  PC: true
    2: if(z==x) - false
    Loc: 5  x: X, y: 7, z: 49  PC: 49!=X

Solve: 49==X  Solution: x=49, y=7

Run 2 - Test inputs: x = 49, y = 7  Path: 1, 2, 3, 4

    2: if(z==x) - true
    Loc: 3  x: X, y: 7, z: 49  PC: 49==X
      3: if(x>y+10) - true
      Loc: 4  x: X, y: 7, z: 49  PC: 49==X^X>17

Abort >> ERROR
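The concretization step in this walk-through can be sketched in C as follows. The names `run_testme_nl`, `solve_x_for_concrete_y`, and `concretization_demo` are hypothetical, and the "solver" is reduced to the one equation that remains once y is fixed to its concrete value.

```c
#include <assert.h>

enum result { NO_ERROR, ERROR_REACHED };

/* testme() with the non-linear assignment, instrumented to report
   whether line 4 (abort) would be reached. */
enum result run_testme_nl(int x, int y) {
    int z = (y * y) % 50;
    if (z == x) {
        if (x > y + 10)
            return ERROR_REACHED;
    }
    return NO_ERROR;
}

/* Concretization: y keeps its concrete value, so (Y*Y)%50 becomes a
   constant c and the negated branch "c == X" is solved by X = c. */
int solve_x_for_concrete_y(int y) {
    return (y * y) % 50;
}

/* Replays the two runs of the walk-through. */
enum result concretization_demo(void) {
    int x = 22, y = 7;
    assert(run_testme_nl(x, y) == NO_ERROR);   /* z = 49, and 49 != 22 */
    x = solve_x_for_concrete_y(y);             /* x = 49 */
    return run_testme_nl(x, y);                /* 49 > 7 + 10 */
}
```

Concretization trades completeness for tractability: had the random run picked y = 40, z would be 0, the solved input x = 0 would fail the test x > y + 10, and this run alone would not reach the error.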


KLEE Demo

KLEE LLVM Execution Engine [Cadar et al. 2008]

https://klee.github.io/


References

• [1] King, James C. "Symbolic execution and program testing." Communications of the ACM 19.7 (1976): 385-394.

• [2] Clarke, Lori A. "A system to generate test data and symbolically execute programs." Software Engineering, IEEE Transactions on 3 (1976): 215-222.

• [3] Khurshid, Sarfraz, Corina S. Păsăreanu, and Willem Visser. "Generalized symbolic execution for model checking and testing." Tools and Algorithms for the Construction and Analysis of Systems. Springer Berlin Heidelberg, 2003. 553-568.

• [4] Godefroid, Patrice, Nils Klarlund, and Koushik Sen. "DART: directed automated random testing." ACM Sigplan Notices. Vol. 40. No. 6. ACM, 2005.

• [5] Cadar, Cristian, and Dawson Engler. "Execution generated test cases: How to make systems code crash itself." Model Checking Software. Springer Berlin Heidelberg, 2005. 2-23.

• [6] Cadar, Cristian, Daniel Dunbar, and Dawson R. Engler. "KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs." OSDI. Vol. 8. 2008.

• [7] Cadar, Cristian, and Koushik Sen. "Symbolic execution for software testing: three decades later." Communications of the ACM 56.2 (2013): 82-90.

• [8] Yang, Guowei, et al. "Directed incremental symbolic execution." ACM Transactions on Software Engineering and Methodology (TOSEM) 24.1 (2014): 3.