Symbolic Execution
Amal Khalil & Juergen Dingel
CISC836: Models in Software Development: Methods, Techniques, and Tools
Winter 2015
Outline
• Overview of Classical Symbolic Execution
  – How it works
  – Applications of symbolic execution
  – Challenges of symbolic execution
• Modern Symbolic Execution Techniques: Combining concrete and symbolic execution
  – Concolic Testing
  – Execution-Generated Testing (EGT)
• KLEE Demo
Motivation
• Testing is a practical way to verify programs.
• Manual testing is difficult: it requires knowledge of the code and constant maintenance.
• Random testing is easy to perform but ineffective.
  – It does not guarantee coverage of all program paths.
• Symbolic execution can systematically explore a large number of program paths.
  – It is commonly used to drive the testing process and hence achieve higher path coverage.
Symbolic Execution
• A program analysis technique that executes programs parametrically, using symbolic inputs, to derive precise characterizations of their properties and execution paths.
• First introduced in the 1970s by Lori A. Clarke [1976] and James C. King [1976] for program testing.
• Since 2003, much research effort has been devoted to improving the effectiveness, efficiency, and applicability of the traditional technique [Yang et al. 2014].
• Examples of symbolic execution tools:
  – jCUTE, JPF (Java)
  – KLEE (LLVM IR for C/C++)
  – Pex (.NET Framework)
How does Symbolic Execution work?
• The main idea is to substitute program inputs with symbolic values and then execute the program parametrically such that:
– The values of all program variables are computed as symbolic expressions over the symbolic input values;
– The execution can proceed along any feasible path.
How does Symbolic Execution work?
• The result of symbolically executing a program is a tree structure called a symbolic execution tree (SET).
  – The nodes of a SET represent symbolic program states, and the edges represent the transitions between these states.
  – Each symbolic program state consists of the program variables with their symbolic valuations, a program location, and a path constraint (PC): the conjunction of all the logical constraints collected over the program variables to reach that location.
    • Decision procedures and SMT solvers are used to check the satisfiability of each path constraint (PC).
    • The set of path constraints computed by symbolic execution enables various analysis, verification, and testing tasks.
  – The paths of a SET characterize all the distinct execution paths of a program.
Constraints, Decision Procedures, and SMT Solvers
• Constraints
  – X > Y ∧ Y+X ≤ 10 (X and Y are called free variables)
  – A solution of a constraint is a set of assignments, one for each free variable, that satisfies the constraint.
  – {X = 3, Y = 2} is a solution but {X = 6, Y = 5} is not.
  – Types of constraints
    • Linear constraint (e.g., X > Y ∧ Y+X ≤ 10)
    • Non-linear constraint (e.g., X * Y < 100, X % 3 ∧ Y > 10, and (X >> 3) < Y)
    • Use of function symbols (e.g., f(X) > 10 ∧ (forall a. f(a) = a + 10))
• A decision procedure is a tool that can decide whether a constraint is satisfiable.
  – In general, checking constraint satisfiability is undecidable.
• A constraint solver is a tool that finds satisfying assignments for a constraint, if it is satisfiable.
Note: This page is taken from Saswat Anand’s slides on Symbolic Execution, 2009. http://www.cc.gatech.edu/~harrold/6340/cs6340_fall2009/Slides/SymExClass-09.pdf
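The difference between checking an assignment and finding one can be illustrated with a minimal sketch. It assumes a toy "solver" that simply searches a small integer range exhaustively; real decision procedures and SMT solvers work very differently. The names `constraint` and `solve_bounded` are hypothetical and not taken from any tool.

```c
#include <assert.h>
#include <stdbool.h>

/* The linear constraint from the slide: X > Y and Y + X <= 10. */
bool constraint(int x, int y) {
    return x > y && y + x <= 10;
}

/* Hypothetical toy solver: exhaustively searches [lo, hi] x [lo, hi].
 * On success, writes the first satisfying assignment found to
 * *x_out and *y_out and returns true. Returning false only means
 * that no solution exists inside the searched range. */
bool solve_bounded(int lo, int hi, int *x_out, int *y_out) {
    assert(lo <= hi);
    for (int x = lo; x <= hi; x++) {
        for (int y = lo; y <= hi; y++) {
            if (constraint(x, y)) {
                *x_out = x;
                *y_out = y;
                return true;
            }
        }
    }
    return false;
}
```

Calling `constraint(3, 2)` and `constraint(6, 5)` reproduces the two assignments from the slide; `solve_bounded` plays the role of a constraint solver for this one constraint only.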
Example #1

int foo (int x, int y) {
1:   if (x > y)
2:     x = x - y;
3:   else
4:     x = y - x;
5:   if (x > 0)
6:     x++;
7:   else
8:     x--;
9:   return x;
}

Symbolic execution tree:

Loc: 1, x: X, y: Y, PC: true
  1: if(x>y) - then
  Loc: 2, x: X, y: Y, PC: X>Y
  2: x = x - y;
  Loc: 5, x: X-Y, y: Y, PC: X>Y
    5: if(x>0) - then
    Loc: 6, x: X-Y, y: Y, PC: X>Y^X-Y>0
    6: x++;
    Loc: 9, x: X-Y+1, y: Y, PC: X>Y^X-Y>0
    Path: 1, 2, 5, 6, 9; Test inputs: x: 7, y: 5
    5: if(x>0) - else
    Loc: 8, x: X-Y, y: Y, PC: X>Y^X-Y<=0
    Unsatisfiable PC >> Infeasible path
  1: if(x>y) - else
  Loc: 4, x: X, y: Y, PC: X<=Y
  4: x = y - x;
  Loc: 5, x: Y-X, y: Y, PC: X<=Y
    5: if(x>0) - then
    Loc: 6, x: Y-X, y: Y, PC: X<=Y^Y-X>0
    6: x++;
    Loc: 9, x: Y-X+1, y: Y, PC: X<=Y^Y-X>0
    Path: 1, 4, 5, 6, 9; Test inputs: x: 3, y: 9
    5: if(x>0) - else
    Loc: 8, x: Y-X, y: Y, PC: X<=Y^Y-X<=0
    8: x--;
    Loc: 9, x: Y-X-1, y: Y, PC: X<=Y^Y-X<=0
    Path: 1, 4, 5, 8, 9; Test inputs: x: 1, y: 1
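The tree for Example #1 can be reproduced mechanically. The following is a minimal sketch, not how any real tool is built: it assumes symbolic values are linear forms a*X + b*Y + c, enumerates the four branch combinations of foo, and uses a bounded search over [-20, 20] as a stand-in for an SMT solver. The names `Lin`, `Cond`, `sym_foo`, `solve`, and `explore` are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Symbolic value a*X + b*Y + c over the symbolic inputs X and Y. */
typedef struct { int a, b, c; } Lin;

/* One conjunct of a path constraint: (e > 0) if positive, else (e <= 0). */
typedef struct { Lin e; bool positive; } Cond;

static int eval(Lin e, int X, int Y) { return e.a * X + e.b * Y + e.c; }

static Lin sub(Lin p, Lin q) {
    Lin r = { p.a - q.a, p.b - q.b, p.c - q.c };
    return r;
}

/* Toy stand-in for a solver: bounded search for a model of the PC. */
static bool solve(const Cond pc[], int n, int *X, int *Y) {
    for (int x = -20; x <= 20; x++) {
        for (int y = -20; y <= 20; y++) {
            bool ok = true;
            for (int i = 0; i < n; i++) {
                bool holds = eval(pc[i].e, x, y) > 0;
                if (holds != pc[i].positive) ok = false;
            }
            if (ok) { *X = x; *Y = y; return true; }
        }
    }
    return false;
}

/* Symbolically executes foo along one path. then1 and then5 select the
 * branch taken at lines 1 and 5. Fills pc[0..1] with the path constraint
 * and returns the symbolic value of x at line 9. */
Lin sym_foo(bool then1, bool then5, Cond pc[2]) {
    Lin x = { 1, 0, 0 };                 /* x: X */
    Lin y = { 0, 1, 0 };                 /* y: Y */
    pc[0].e = sub(x, y);                 /* x > y  is  X - Y > 0 */
    pc[0].positive = then1;
    x = then1 ? sub(x, y) : sub(y, x);   /* line 2 or line 4 */
    pc[1].e = x;                         /* x > 0 on the updated x */
    pc[1].positive = then5;
    x.c += then5 ? 1 : -1;               /* line 6 or line 8 */
    return x;
}

/* Explores all four paths and returns the number found feasible. */
int explore(void) {
    int feasible = 0;
    for (int p = 0; p < 4; p++) {
        bool then1 = (p & 2) != 0, then5 = (p & 1) != 0;
        Cond pc[2];
        int X, Y;
        Lin r = sym_foo(then1, then5, pc);
        printf("line 1 %s, line 5 %s: x = %d*X + %d*Y + %d, ",
               then1 ? "then" : "else", then5 ? "then" : "else",
               r.a, r.b, r.c);
        if (solve(pc, 2, &X, &Y)) {
            printf("test input x=%d, y=%d\n", X, Y);
            feasible++;
        } else {
            printf("no model in the search range\n");
        }
    }
    return feasible;
}
```

Calling `explore()` from a small `main` lists one line per path. The test inputs it reports may differ from those on the slide, since any model of a path constraint is a valid test input for that path.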
Example #1 (with x >= 0 at line 5)

int foo (int x, int y) {
1:   if (x > y)
2:     x = x - y;
3:   else
4:     x = y - x;
5:   if (x >= 0)
6:     x++;
7:   else
8:     x--;
9:   return x;
}

Symbolic execution tree:

Loc: 1, x: X, y: Y, PC: true
  1: if(x>y) - then
  Loc: 2, x: X, y: Y, PC: X>Y
  2: x = x - y;
  Loc: 5, x: X-Y, y: Y, PC: X>Y
    5: if(x>=0) - then
    Loc: 6, x: X-Y, y: Y, PC: X>Y^X-Y>=0
    6: x++;
    Loc: 9, x: X-Y+1, y: Y, PC: X>Y^X-Y>=0
    5: if(x>=0) - else
    Loc: 8, x: X-Y, y: Y, PC: X>Y^X-Y<0
    Unsatisfiable PC >> Infeasible path
  1: if(x>y) - else
  Loc: 4, x: X, y: Y, PC: X<=Y
  4: x = y - x;
  Loc: 5, x: Y-X, y: Y, PC: X<=Y
    5: if(x>=0) - then
    Loc: 6, x: Y-X, y: Y, PC: X<=Y^Y-X>=0
    6: x++;
    Loc: 9, x: Y-X+1, y: Y, PC: X<=Y^Y-X>=0
    5: if(x>=0) - else
    Loc: 8, x: Y-X, y: Y, PC: X<=Y^Y-X<0
    Unsatisfiable PC >> Infeasible path

Both paths to line 8 are infeasible: line 8 is “Dead Code”.
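The dead-code conclusion can be cross-checked by brute force. This is only a sanity check under stated assumptions, not symbolic execution: it runs the x >= 0 variant of foo concretely on every input in a small range and records whether line 8 executes. The names `foo_ge` and `line8_reachable` are hypothetical. The slide's argument treats X and Y as mathematical integers; the range below is small enough that no subtraction overflows, whereas for arbitrary machine ints y - x could overflow, which is undefined behavior in C.

```c
#include <stdbool.h>

/* foo with x >= 0 at line 5; *hit_line8 records whether line 8 ran. */
int foo_ge(int x, int y, bool *hit_line8) {
    *hit_line8 = false;
    if (x > y)
        x = x - y;
    else
        x = y - x;
    if (x >= 0) {
        x++;
    } else {
        x--;                  /* line 8 in the slide's numbering */
        *hit_line8 = true;
    }
    return x;
}

/* Returns true if some input in [-bound, bound]^2 reaches line 8. */
bool line8_reachable(int bound) {
    for (int x = -bound; x <= bound; x++) {
        for (int y = -bound; y <= bound; y++) {
            bool hit;
            (void)foo_ge(x, y, &hit);
            if (hit) return true;
        }
    }
    return false;
}
```

Unlike the unsatisfiable path constraints in the tree, a `false` result from `line8_reachable` covers only the inputs that were actually tried.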
Applications of Symbolic Execution
• Test case generation
• Infeasible path detection
• Invariant checking
• Bug finding
• Program equivalence checking
• Regression analysis
• Others
Example #2 - [Cadar & Sen 2013]

void testme_inf(int N) {
1:   int sum = 0;
2:   while (N > 0) {
3:     sum = sum + N;
4:     N = sym_input();
5:   }
}

Symbolic execution along the "true" branch of the loop condition:

SS1 - Loc: 1, N: N1, PC: true
1: sum = 0;
SS2 - Loc: 2, N: N1, sum: 0, PC: true
2: while (N>0) - true
SS3 - Loc: 3, N: N1, sum: 0, PC: N1>0
3: sum = sum + N;
SS4 - Loc: 4, N: N1, sum: N1, PC: N1>0
4: N = sym_input();
SS5 - Loc: 2, N: N2, sum: N1, PC: N1>0
2: while (N>0) - true
SS6 - Loc: 3, N: N2, sum: N1, PC: N1>0^N2>0
3: sum = sum + N;
SS7 - Loc: 4, N: N2, sum: N1+N2, PC: N1>0^N2>0
4: N = sym_input();
SS8 - Loc: 2, N: N3, sum: N1+N2, PC: N1>0^N2>0
2: while (N>0) - true
SS9 - Loc: 3, N: N3, sum: N1+N2, PC: N1>0^N2>0^N3>0
3: sum = sum + N;
SS10 - Loc: 4, N: N3, sum: N1+N2+N3, PC: N1>0^N2>0^N3>0
4: N = sym_input();
SS11 - Loc: 2, N: N4, sum: N1+N2+N3, PC: N1>0^N2>0^N3>0
2: while (N>0) - true
SS12 - Loc: 3, N: N4, sum: N1+N2+N3, PC: N1>0^N2>0^N3>0^N4>0
…
Challenges of Symbolic Execution
• Path explosion problem
  – The number of feasible paths in a program can grow exponentially with the size of the program and can even be infinite for programs with unbounded loops and recursion.
  – Proposed solutions:
    • Set an upper bound on the number of loop iterations;
    • Summarize loop effects;
    • Use abstraction criteria (e.g., subsumption) to prune redundant paths and reduce the state space;
    • Use path-finding heuristics to achieve user-defined coverage criteria;
    • Divide the program into independent parts and run symbolic execution on each part in parallel.
Example #2 - [Cadar & Sen 2013]
Solution #1: Set max-depth = 2

void testme_inf(int N) {
1:   int sum = 0;
2:   while (N > 0) {
3:     sum = sum + N;
4:     N = sym_input();
5:   }
}

SS1 - Loc: 1, N: N1, PC: true
1: sum = 0;
SS2 - Loc: 2, N: N1, sum: 0, PC: true
  2: while (N>0) - false
  Loc: 5, N: N1, sum: 0, PC: N1<=0
  2: while (N>0) - true
  SS3 - Loc: 3, N: N1, sum: 0, PC: N1>0
  3: sum = sum + N;
  SS4 - Loc: 4, N: N1, sum: N1, PC: N1>0
  4: N = sym_input();
  SS5 - Loc: 2, N: N2, sum: N1, PC: N1>0
    2: while (N>0) - false
    Loc: 5, N: N2, sum: N1, PC: N1>0^N2<=0
    2: while (N>0) - true
    SS6 - Loc: 3, N: N2, sum: N1, PC: N1>0^N2>0
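The effect of a depth bound on the loop in testme_inf can be sketched as follows. This is an illustrative assumption about how a bound might be applied, not the mechanism of any particular tool: `max_depth` limits how many times the loop condition at line 2 is forked, and path constraints are built as plain strings. The name `explore_bounded` and the buffer size `PC_MAX` are hypothetical.

```c
#include <stdio.h>

#define PC_MAX 512   /* assumed upper limit on the printed PC length */

/* Enumerates the paths of testme_inf that leave the loop within
 * max_depth evaluations of the loop condition. Prints the path
 * constraint of each exit path and of the state where exploration
 * is cut off. Returns the number of exit paths found. */
int explore_bounded(int max_depth) {
    char pc[PC_MAX] = "";   /* conjuncts "Ni>0^" collected so far */
    int len = 0;
    int exits = 0;
    for (int i = 1; i <= max_depth; i++) {
        /* False branch at line 2: the loop exits with PC  pc ^ Ni<=0. */
        printf("exit after %d iteration(s): PC: %sN%d<=0\n", i - 1, pc, i);
        exits++;
        /* True branch: extend the PC with Ni>0; N(i+1) is a fresh input. */
        int n = snprintf(pc + len, (size_t)(PC_MAX - len), "N%d>0^", i);
        if (n < 0 || n >= PC_MAX - len) {
            pc[len] = '\0';   /* buffer full: drop the partial conjunct */
            break;
        }
        len += n;
    }
    if (len > 0)
        printf("cut off: PC: %.*s\n", len - 1, pc);  /* strip trailing ^ */
    else
        printf("cut off: PC: true\n");
    return exits;
}
```

With `explore_bounded(2)` the two exit paths carry the constraints N1<=0 and N1>0^N2<=0, and exploration stops at the state with N1>0^N2>0, matching the tree for max-depth = 2. Without a bound the loop would never terminate, because every iteration introduces a fresh symbolic input.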
Example #2 - [Cadar & Sen 2013]
Solution #2: Subsumption

SS1 - Loc: 1, N: N1, PC: true
1: sum = 0;
SS2 - Loc: 2, N: N1, sum: 0, PC: true
  2: while (N>0) - false
  SS10 - Loc: 5, N: N1, sum: 0, PC: N1<=0
  2: while (N>0) - true
  SS3 - Loc: 3, N: N1, sum: 0, PC: N1>0
  3: sum = sum + N;
  SS4 - Loc: 4, N: N1, sum: N1, PC: N1>0
  4: N = sym_input();
  SS5 - Loc: 2, N: N2, sum: N1, PC: N1>0
    2: while (N>0) - false
    SS9 - Loc: 5, N: N2, sum: N1, PC: N1>0^N2<=0
    2: while (N>0) - true
    SS6 - Loc: 3, N: N2, sum: N1, PC: N1>0^N2>0
    3: sum = sum + N;
    SS7 - Loc: 4, N: N2, sum: N1+N2, PC: N1>0^N2>0
    4: N = sym_input();
    SS8 - Loc: 2, N: N3, sum: N1+N2, PC: N1>0^N2>0 (subsumed by SS5)

SS8 ⊆ SS5 (SS8 is subsumed by SS5):
Concretization of SS5: (N, sum) = {([-∞, +∞], 1), ([-∞, +∞], 2), ([-∞, +∞], 3), …}
Concretization of SS8: (N, sum) = {([-∞, +∞], 2), ([-∞, +∞], 3), ([-∞, +∞], 4), …}
Every concrete state represented by SS8 is also represented by SS5, so the exploration from SS8 can be pruned.
Challenges of Symbolic Execution
• Inability to solve very complex and non-linear constraints
  – Proposed solutions:
    • Use concretization (e.g., Concolic Symbolic Execution);
    • Perform constraint simplification.
• Inability to handle external library calls
  – Proposed solutions:
    • Use concretization (e.g., Concolic Symbolic Execution);
    • Provide models to simulate/abstract the behavior of such external modules.
Example #3 - [Cadar & Sen 2013]

• Complex constraints

void testme(int x, int y) {
1:   int z = (y*y)%50;
2:   if (z == x) {
3:     if (x > y+10) {
4:       abort(); //ERROR
5:     }
6:   }
}

Loc: 1, x: X, y: Y, PC: true
1: int z = (y*y)%50;
Loc: 2, x: X, y: Y, z: (Y*Y)%50, PC: true
SE cannot handle symbolic value of z! >> Stuck!

• External system/library calls

void testme(int x, int y) {
1:   int z = F(y);
2:   if (z == x) {
3:     if (x > y+10) {
4:       abort(); //ERROR
5:     }
6:   }
}

Loc: 1, x: X, y: Y, PC: true
1: int z = F(y);
Loc: 2, x: X, y: Y, z: F(Y), PC: true
Concolic Symbolic Execution
• Novelty: Simultaneous Concrete & Symbolic Executions
  – DART: Directed Automated Random Testing [Godefroid et al. 2005]
  – Execution-Generated Testing (EGT) [Cadar et al. 2005]
• “Replace symbolic expression by concrete value when symbolic expression becomes unmanageable (e.g. non-linear).”
Overview of DART
• Example #3 - [Cadar & Sen 2013]
• Random testing alone is ineffective.
  – Probability of reaching abort() is extremely low!
• Solution?
  – Combine random testing & symbolic execution (twofold benefit).
    • Improve test coverage of random testing
    • Alleviate some of the imprecision in SE

/* simple driver exercising testme() */
int main() {
  int inp1 = random();
  int inp2 = random();
  testme(inp1, inp2);
  return 0;
}

void testme(int x, int y) {
1:   int z = 2 * y;
2:   if (z == x) {
3:     if (x > y + 10)
4:       abort(); //ERROR
5:   }
}
Example #3 - [Cadar & Sen 2013]

void testme(int x, int y) {
1:   int z = 2 * y;
2:   if (z == x) {
3:     if (x > y + 10)
4:       abort(); //ERROR
5:   }
}

Test inputs: x = 22, y = 7; Path: 1, 2, 5
  Loc: 1, x: X, y: Y, PC: true
  1: int z = 2*y;
  Loc: 2, x: X, y: Y, z: 2*Y, PC: true
  2: if(z==x) - false
  Loc: 5, x: X, y: Y, z: 2*Y, PC: 2*Y!=X
  Solve: 2*Y==X; Solution: x=2, y=1

Test inputs: x = 2, y = 1; Path: 1, 2, 3, 5
  2: if(z==x) - true
  Loc: 3, x: X, y: Y, z: 2*Y, PC: 2*Y==X
  3: if(x>y+10) - false
  Loc: 5, x: X, y: Y, z: 2*Y, PC: 2*Y==X^X<=Y+10
  Solve: 2*Y==X^X>Y+10; Solution: x=30, y=15

Test inputs: x = 30, y = 15; Path: 1, 2, 3, 4
  3: if(x>y+10) - true
  Loc: 4, x: X, y: Y, z: 2*Y, PC: 2*Y==X^X>Y+10
  Abort >> ERROR
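The three runs above follow a simple loop: execute concretely, record the branch outcomes, negate the last one, and solve for new inputs. A minimal sketch of that loop is given below under several assumptions: the branch conditions of testme are hard-coded as predicates, the "solver" is a bounded search over [-100, 100], and the loop always flips the last branch, which suffices for this program but is not a general search strategy (real tools track which branches have already been flipped). The names `Path`, `cond`, `run_testme`, `solve`, and `dart` are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

#define MAX_BRANCHES 2

/* A recorded path: outcome[i] is the direction taken at the i-th branch. */
typedef struct {
    int len;
    bool outcome[MAX_BRANCHES];
    bool aborted;
} Path;

/* The branch conditions of testme as predicates over the inputs. */
static bool cond(int i, int x, int y) {
    return i == 0 ? (2 * y == x)    /* line 2: z == x with z = 2*y */
                  : (x > y + 10);   /* line 3 */
}

/* Instrumented testme: runs concretely and records branch outcomes. */
Path run_testme(int x, int y) {
    Path p = { 0, { false, false }, false };
    int z = 2 * y;
    p.outcome[p.len++] = (z == x);           /* branch at line 2 */
    if (z == x) {
        p.outcome[p.len++] = (x > y + 10);   /* branch at line 3 */
        if (x > y + 10)
            p.aborted = true;                /* stands in for abort() */
    }
    return p;
}

/* Toy solver: bounded search for inputs whose path starts with want[]. */
static bool solve(const bool want[], int n, int *x, int *y) {
    for (int cx = -100; cx <= 100; cx++) {
        for (int cy = -100; cy <= 100; cy++) {
            bool ok = true;
            for (int i = 0; i < n; i++)
                if (cond(i, cx, cy) != want[i]) ok = false;
            if (ok) { *x = cx; *y = cy; return true; }
        }
    }
    return false;
}

/* Concolic loop. Returns the number of runs needed to reach abort(),
 * or -1 if the solver fails or the run budget is exhausted. */
int dart(int x, int y) {
    for (int run = 1; run <= 10; run++) {
        Path p = run_testme(x, y);
        printf("run %d: x=%d, y=%d%s\n", run, x, y,
               p.aborted ? " -> abort" : "");
        if (p.aborted) return run;
        bool want[MAX_BRANCHES];
        for (int i = 0; i < p.len; i++) want[i] = p.outcome[i];
        want[p.len - 1] = !want[p.len - 1];   /* negate the last branch */
        if (!solve(want, p.len, &x, &y)) return -1;
    }
    return -1;
}
```

Starting from `dart(22, 7)`, the sketch needs three runs, like the slide. The intermediate inputs differ from x=2, y=1 and x=30, y=15 because the toy search returns a different model of the same constraints.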
Example #3 - [Cadar & Sen 2013]

void testme(int x, int y) {
1:   int z = (y*y)%50; //int z = F(y);
2:   if (z == x) {
3:     if (x > y + 10)
4:       abort(); //ERROR
5:   }
}

Test inputs: x = 22, y = 7; Path: 1, 2, 5
  Loc: 1, x: X, y: Y, PC: true
  1: int z = (y*y)%50;
  Loc: 2, x: X, y: 7, z: 49, PC: true
  2: if(z==x) - false
  Loc: 5, x: X, y: 7, z: 49, PC: 49!=X
  Solve: 49==X; Solution: x=49, y=7

Test inputs: x = 49, y = 7; Path: 1, 2, 3, 4
  2: if(z==x) - true
  Loc: 3, x: X, y: 7, z: 49, PC: 49==X
  3: if(x>y+10) - true
  Loc: 4, x: X, y: 7, z: 49, PC: 49==X^X>17
  Abort >> ERROR
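Concretization can be sketched in a few lines. The assumption here is that y is kept at its concrete value, so z becomes the constant (y*y)%50 and the constraint z == X is solved trivially by X = z. Because fixing y can make line 3 unsatisfiable, the sketch tries successive concrete values of y; that retry strategy and the names `hard`, `reaches_abort`, and `concolic_search` are hypothetical.

```c
#include <stdbool.h>

/* The expression the symbolic engine cannot reason about. */
int hard(int y) { return (y * y) % 50; }

/* Concrete check: does testme(x, y) reach abort() at line 4? */
bool reaches_abort(int x, int y) {
    int z = hard(y);
    return z == x && x > y + 10;
}

/* Tries concrete values y0, y0+1, ... for y. For each one, z is
 * concretized to hard(y) and z == X is solved by setting x to it.
 * Returns true and the inputs if abort() is reached within max_tries. */
bool concolic_search(int y0, int max_tries, int *x_out, int *y_out) {
    for (int t = 0; t < max_tries; t++) {
        int y = y0 + t;     /* next concrete value for y */
        int x = hard(y);    /* model of  z == X  with z concretized */
        if (reaches_abort(x, y)) {
            *x_out = x;
            *y_out = y;
            return true;
        }
    }
    return false;
}
```

With y = 7 the first attempt already yields x = 49, as on the slide. With y = 3 the concretized constraint gives x = 9, which fails x > y + 10, so another concrete y is needed; this is the precision that concretization gives up.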
KLEE Demo
KLEE LLVM Execution Engine [Cadar et al. 2008]
https://klee.github.io/
References
• [1] King, James C. "Symbolic execution and program testing." Communications of the ACM 19.7 (1976): 385-394.
• [2] Clarke, Lori A. "A system to generate test data and symbolically execute programs." Software Engineering, IEEE Transactions on 3 (1976): 215-222.
• [3] Khurshid, Sarfraz, Corina S. Păsăreanu, and Willem Visser. "Generalized symbolic execution for model checking and testing." Tools and Algorithms for the Construction and Analysis of Systems. Springer Berlin Heidelberg, 2003. 553-568.
• [4] Godefroid, Patrice, Nils Klarlund, and Koushik Sen. "DART: directed automated random testing." ACM Sigplan Notices. Vol. 40. No. 6. ACM, 2005.
• [5] Cadar, Cristian, and Dawson Engler. "Execution generated test cases: How to make systems code crash itself." Model Checking Software. Springer Berlin Heidelberg, 2005. 2-23.
• [6] Cadar, Cristian, Daniel Dunbar, and Dawson R. Engler. "KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs." OSDI. Vol. 8. 2008.
• [7] Cadar, Cristian, and Koushik Sen. "Symbolic execution for software testing: three decades later." Communications of the ACM 56.2 (2013): 82-90.
• [8] Yang, Guowei, et al. "Directed incremental symbolic execution." ACM Transactions on Software Engineering and Methodology (TOSEM) 24.1 (2014): 3.