CS4723 Software Validation and Quality Assurance
Lecture 11: Static Bug Detection and Verification
Static bug detection
Static bug detection is a minor approach to software quality assurance compared with testing
Relative to testing, static bug detection:
Works only for specific kinds of bugs
Is sometimes not scalable
Generates false positives
Is easy to start (no build, no setup, no install …)
Can sometimes guarantee that the software is free of certain kinds of bugs
Needs no debugging
State of the art: static bug detection
Type-specific detection (a fixed specification is provided for each bug type): targets major or important types of bugs
Null pointer, memory leak, unsafe cast, injection, buffer overflow, dynamic SQL error, race condition, deadlock, infinite loop, HTML error, UI inconsistency, i18n bugs, …
Many techniques exist for each kind of bug, but most have severe limitations that prevent their practical use
Specification-based detection: model checking, symbolic execution, theorem proving
Specification
A description of the correct behavior of software
Static bug detection requires a formal specification
Three main types of specifications:
Value
Temporal
Data flow
Value Specification
The values of one or more variables must satisfy a certain constraint
Examples: Final Exam Score <= 100
sortedlist(0) >= sortedlist(1)
http_url.startsWith("http")
Sql_query belongs to Language_SQL
Temporal Specification
Two events (or a series of events) must happen in a certain order
Examples: lock() -> unlock()
file.open() -> file.close() and file.open() -> file.read()
They are different, right?
Temporal logic: lock() -> F(unlock())
(!read()) U (open())
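The two orderings can be made concrete with a minimal Java sketch that checks each specification on a single finite trace of event names. The class and method names are hypothetical. A real static checker would have to consider every possible path, not one recorded trace.

```java
import java.util.List;

// Contrasts the two orderings on one finite trace (hypothetical sketch):
//   file.open() -> file.close(): after an open, a close must eventually follow (F).
//   file.open() -> file.read():  a read may only happen after an open, i.e. (!read()) U (open()).
class OrderingSpecs {
    // open -> F(close): every "open" is followed, later in the trace, by a "close".
    static boolean closedAfterOpen(List<String> trace) {
        for (int i = 0; i < trace.size(); i++) {
            if (trace.get(i).equals("open")
                    && !trace.subList(i + 1, trace.size()).contains("close")) {
                return false;
            }
        }
        return true;
    }

    // (!read) U (open): no "read" before the first "open", and an "open" must occur.
    static boolean openBeforeRead(List<String> trace) {
        for (String event : trace) {
            if (event.equals("open")) return true;
            if (event.equals("read")) return false;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> leaky = List.of("open", "read");           // never closed
        List<String> early = List.of("read", "open", "close");  // read before open
        System.out.println(closedAfterOpen(leaky) + " " + openBeforeRead(leaky));  // false true
        System.out.println(closedAfterOpen(early) + " " + openBeforeRead(early));  // true false
    }
}
```

Each trace violates exactly one of the two specifications, which is why they must be written differently.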
Data Flow Specification
Data from a certain source must / must not flow to a certain sink
Examples: !(Contact Info -> Internet)
Password -> encryption -> Internet
Data flow specifications are mainly used for security
General Specifications
Common behaviors of all software (specification, and the bug it rules out):
a/b -> b != 0   (divide by 0)
a.field -> a != null   (null pointer reference)
a[x] -> x < a.length()   (buffer overflow)
p.malloc() -> p.free()   (memory leak)
lock(s) -> unlock(s)   (deadlock)
while(Condition) -> F(!Condition)   (infinite loop)
<script> xxx </script> -> !(User_input -> xxx)   (XSS)
!(Hard-coded string -> User Interface)   (i18n error)
Checking Specifications: Basic Ways
Value specifications: symbolic execution, abstract interpretation
Temporal specifications: model checking
Data flow specifications: graph traversal (data dependence graph)
Static symbolic execution
Basic Example
y = read(); y = 2 * y; if (y <= 12) y = 3; else y = y + 1; print("OK");
y = read();    T (y = s), where s is a symbolic variable for the input
Here T is the condition under which the statement is executed, and (y = s) is the relationship of all variables to the inputs after the statement is executed
y = 2 * y;    T (y = 2*s)
y = 3;    T & 2*s <= 12 (y = 3)
y = y + 1;    T & !(2*s <= 12) (y = 2*s + 1)
print("OK");    T & 2*s <= 12 (y = 3) | T & !(2*s <= 12) (y = 2*s + 1)
Prove y > 0? Check whether y <= 0 is satisfiable on each path:
(2*s <= 12) & (y = 3) & (y <= 0): not satisfiable
!(2*s <= 12) & (y = 2*s + 1) & (y <= 0): not satisfiable
No path reaches print("OK") with y <= 0, so y > 0 holds
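The path-by-path reasoning above can be sketched in Java for this one program. The sketch rests on several assumptions of ours:

- The single input s lies in the 32-bit int range.
- Arithmetic is over mathematical integers, so Java overflow is ignored.
- Each path condition is stored as an interval lo <= s <= hi.
- y is stored as a linear expression a*s + b.

A general symbolic executor would hand the path conditions to a constraint solver instead. The class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Static symbolic execution of:
//   y = read(); y = 2 * y; if (y <= 12) y = 3; else y = y + 1; print("OK");
class BasicSymbolicExecution {
    static final long MIN = Integer.MIN_VALUE, MAX = Integer.MAX_VALUE;

    // Returns the feasible symbolic paths that reach print("OK").
    static List<SymbolicPath> run() {
        SymbolicPath p = new SymbolicPath(MIN, MAX, 1, 0);          // y = read();  T (y = s)
        p = new SymbolicPath(p.lo, p.hi, 2 * p.a, 2 * p.b);         // y = 2 * y;   T (y = 2*s)
        // if (y <= 12): a*s + b <= 12  <=>  s <= floor((12 - b) / a), valid here because a = 2 > 0
        long bound = Math.floorDiv(12 - p.b, p.a);
        SymbolicPath thenPath = new SymbolicPath(p.lo, Math.min(p.hi, bound), 0, 3);             // y = 3;
        SymbolicPath elsePath = new SymbolicPath(Math.max(p.lo, bound + 1), p.hi, p.a, p.b + 1);  // y = y + 1;
        List<SymbolicPath> paths = new ArrayList<>();
        if (thenPath.feasible()) paths.add(thenPath);
        if (elsePath.feasible()) paths.add(elsePath);
        return paths;
    }

    // y > 0 is proved if "y <= 0" is unsatisfiable on every feasible path.
    static boolean provesYPositive(List<SymbolicPath> paths) {
        for (SymbolicPath p : paths) {
            if (p.minY() <= 0) return false;   // some input s would make y <= 0
        }
        return true;
    }

    public static void main(String[] args) {
        List<SymbolicPath> paths = run();
        paths.forEach(System.out::println);
        System.out.println("y > 0 proved: " + provesYPositive(paths));
    }
}

// One symbolic path: path condition lo <= s <= hi and symbolic value y = a*s + b.
class SymbolicPath {
    final long lo, hi, a, b;

    SymbolicPath(long lo, long hi, long a, long b) {
        this.lo = lo; this.hi = hi; this.a = a; this.b = b;
    }

    boolean feasible() { return lo <= hi; }

    // A linear function reaches its minimum at an endpoint of the interval.
    long minY() { return a >= 0 ? a * lo + b : a * hi + b; }

    @Override
    public String toString() { return lo + " <= s <= " + hi + "  (y = " + a + "*s + " + b + ")"; }
}
```

The then-path keeps s <= 6 with y = 3, and the else-path keeps s >= 7 with y = 2*s + 1 >= 15. Neither path can satisfy y <= 0.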
Static symbolic execution
Complex Example
y = read(); p = 1; while (y < 10) { y = y + 1; if (y > 2) p = p + 1; else p = p + 2; } print(p);
y = read();    T (y = s), where s is a symbolic variable for the input
p = 1;    T (p = 1, y = s)
y = y + 1 (first iteration):    T & s < 10 (y = s + 1, p = 1)
p = p + 1:    T & s < 10 & s + 1 > 2 (y = s + 1, p = 2)
p = p + 2:    T & s + 1 <= 2 (y = s + 1, p = 3)
y = y + 1 (second iteration):    T & 2 < s + 1 < 10 (y = s + 2, p = 2) | T & s + 1 <= 2 (y = s + 2, p = 3)
…
Prove p > 0?
Abstract Interpretation
Symbolic execution tries to record all changes and relations in memory with symbolic values
There is too much to record, so it does not scale
Usually only a small part of the data is useful
Abstract interpretation works in a similar way to symbolic execution, but uses abstract values instead of symbolic values…
Abstract Interpretation
Abstract domains: a map from concrete values to abstract values. Examples:
Integer -> +, -, 0
String -> [0…9]*, other
Pointer -> null / not null
Abstract operations: +, -, *, /, concatenation, …
Join: applied when two branches merge, or when a statement is executed for the second time
Each operation has the form OP: Dom * Dom -> Dom
Abstract Operations
An example on integers: Integer -> +, -, 0
+ (+) + = +
- (+) - = -
+ (+) - = ?
Two special abstract values in abstract domains: ⊤ (top) means all possible values; ⊥ (bottom) means no value
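A minimal Java sketch of this sign domain, including the two special values, is shown below. The enum and method names are our own, and only the + and * operations are implemented.

```java
// Sign domain {BOTTOM, NEG, ZERO, POS, TOP}: TOP means all possible values,
// BOTTOM means no value. Names are hypothetical.
class SignDemo {
    public static void main(String[] args) {
        System.out.println(Sign.POS.plus(Sign.POS));  // POS
        System.out.println(Sign.NEG.plus(Sign.NEG));  // NEG
        System.out.println(Sign.POS.plus(Sign.NEG));  // TOP
    }
}

enum Sign {
    BOTTOM, NEG, ZERO, POS, TOP;

    // Abstraction: map a concrete integer to its sign.
    static Sign of(long v) { return v < 0 ? NEG : v == 0 ? ZERO : POS; }

    // Abstract addition (+).
    Sign plus(Sign o) {
        if (this == BOTTOM || o == BOTTOM) return BOTTOM;
        if (this == ZERO) return o;
        if (o == ZERO) return this;
        if (this == o && this != TOP) return this;  // + (+) + = +,  - (+) - = -
        return TOP;                                  // + (+) - could be anything
    }

    // Abstract multiplication (*).
    Sign times(Sign o) {
        if (this == BOTTOM || o == BOTTOM) return BOTTOM;
        if (this == ZERO || o == ZERO) return ZERO;
        if (this == TOP || o == TOP) return TOP;
        return this == o ? POS : NEG;
    }

    // Join: least upper bound in the order BOTTOM < NEG, ZERO, POS < TOP.
    Sign join(Sign o) {
        if (this == BOTTOM) return o;
        if (o == BOTTOM) return this;
        return this == o ? this : TOP;
    }
}
```

In this sketch, + (+) - evaluates to TOP, because the sum of a positive and a negative number can have any sign.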
Abstract Interpretation
Complex Example
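y = read(); p = 1; while (y < 10) { y = y + 1; if (y > 2) p = p + 1; else p = p + 2; } print(p);
Prove p > 0?
Before p = 1: p = ⊥; after p = 1: p > 0
First iteration: p > 0 (+) 1 -> p > 0 (then-branch); p > 0 (+) 2 -> p > 0 (else-branch); p > 0 (join) p > 0 -> p > 0
Second iteration: p > 0 (+) 1 -> p > 0; p > 0 (+) 2 -> p > 0; p > 0 (join) p > 0 -> p > 0
The abstract value of p no longer changes: this is called a fixed point! So p > 0 at print(p)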
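The fixed-point iteration above can be sketched as a loop that re-analyzes the body until the abstract value at the loop head stops changing. This is a minimal sketch under stated assumptions:

- It hard-codes this one program.
- It does not track y, so both branches are assumed possible.
- The names are hypothetical.

```java
// Abstract interpretation of:
//   p = 1; while (y < 10) { y = y + 1; if (y > 2) p = p + 1; else p = p + 2; } print(p);
// y is not tracked: the loop body and both branches are treated as possible.
class LoopFixedPoint {
    enum Sign { BOTTOM, NEG, ZERO, POS, TOP }

    // Least upper bound of two signs.
    static Sign join(Sign a, Sign b) {
        if (a == Sign.BOTTOM) return b;
        if (b == Sign.BOTTOM) return a;
        return a == b ? a : Sign.TOP;
    }

    // Abstract "x + c" for a positive constant c (used for both p + 1 and p + 2).
    static Sign plusPositive(Sign x) {
        switch (x) {
            case BOTTOM: return Sign.BOTTOM;
            case ZERO:
            case POS: return Sign.POS;
            default: return Sign.TOP;   // NEG + c or TOP + c: the sign is unknown
        }
    }

    // Iterates until the abstract value of p at the loop head stops changing.
    static Sign analyze() {
        Sign head = Sign.POS;                               // after p = 1
        while (true) {
            Sign thenBranch = plusPositive(head);           // p = p + 1
            Sign elseBranch = plusPositive(head);           // p = p + 2
            Sign afterIf = join(thenBranch, elseBranch);    // the two branches merge
            Sign next = join(head, afterIf);                // loop entry merges with the back edge
            System.out.println("loop head: " + head + " -> " + next);
            if (next == head) return head;                  // fixed point
            head = next;
        }
    }

    public static void main(String[] args) {
        System.out.println("p at print(p): " + analyze());  // p at print(p): POS
    }
}
```

Here the value at the loop head is already stable after the first pass. The result POS corresponds to p > 0 at print(p).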
Abstract Interpretation
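Can we make sure there is a fixed point?
y = read(); p = 1; while (y < 10) { y = y + 1; p = p * (-1); } print(p);
Prove p != 0?
Before p = 1: p = ⊥; after p = 1: p > 0
First iteration: p > 0 (*) -1 -> p < 0
At the loop head, p > 0 (from the entry) meets p < 0 (from the loop body)
We should try to join p < 0 and p > 0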
Abstract Interpretation
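First trial: use the later value, Join(a, b) = b
y = read(); p = 1; while (y < 10) { y = y + 1; p = p * (-1); } print(p);
Prove p != 0?
Before p = 1: p = ⊥; after p = 1: p > 0
First iteration: p > 0 (*) -1 -> p < 0; Join(p > 0, p < 0) = p < 0
Second iteration: p < 0 (*) -1 -> p > 0; Join(p < 0, p > 0) = p > 0
The value keeps alternating: cannot reach a fixed point!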
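The oscillation can be reproduced with a few lines of Java. In this minimal sketch, signs are encoded as +1 and -1, and Join(a, b) = b as in the first trial. MAX_ITERATIONS is an arbitrary cut-off we add so that the demonstration terminates.

```java
// Join(a, b) = b on:  p = 1; while (y < 10) { y = y + 1; p = p * (-1); }
// Signs are encoded as +1 (p > 0) and -1 (p < 0).
class LaterValueJoin {
    static final int MAX_ITERATIONS = 10;   // arbitrary cut-off for the demo

    // Returns the iteration at which a fixed point is reached, or -1 if none is reached.
    static int iterate() {
        int head = +1;                              // after p = 1: p > 0
        for (int i = 1; i <= MAX_ITERATIONS; i++) {
            int body = head * -1;                   // p = p * (-1) flips the sign
            int next = body;                        // Join(head, body) = body, the later value
            System.out.println("iteration " + i + ": " + (head > 0 ? "p > 0" : "p < 0")
                    + " -> " + (next > 0 ? "p > 0" : "p < 0"));
            if (next == head) return i;
            head = next;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(iterate() == -1 ? "no fixed point" : "fixed point reached");
    }
}
```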
Abstract Domain
How do we make sure we reach a fixed point?
Order all abstract values as a partial order
Join operations should all be monotonic
[Figure: lattice of sign values: -, 0, + on the lower level, joined into >=0 (0, +), <=0 (-, 0) and !=0 (-, +)]
Abstract Interpretation
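Can we make sure there is a fixed point?
y = read(); p = 1; while (y < 10) { y = y + 1; p = p * (-1); } print(p);
Prove p != 0?
Before p = 1: p = ⊥; after p = 1: p > 0
First iteration: p > 0 (*) -1 -> p < 0
We should make sure the value of p stays the same or goes up; otherwise, we cannot reach a fixed point…
Join(p > 0, p < 0) = p != 0
Second iteration: p != 0 (*) -1 -> p != 0; Join(p != 0, p != 0) = p != 0
The value no longer changes, so p != 0 at print(p)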
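One compact way to encode the lattice from the "Abstract Domain" slide is as the set of signs a value may have. We assume this encoding for illustration:

- Each abstract value is a 3-bit mask.
- Join is bitwise OR, which is set union.
- The 8 possible masks are exactly ⊥, -, 0, +, <=0, >=0, !=0 and ⊤.

Because union can only add bits, the value at the loop head can only stay the same or go up, so the iteration must stop.

```java
// Lattice as sets of possible signs: bit 1 = negative, bit 2 = zero, bit 4 = positive.
//   0 = BOTTOM, 1 = "<0", 2 = "0", 3 = "<=0", 4 = ">0", 5 = "!=0", 6 = ">=0", 7 = TOP
class SignSetLattice {
    static final int NEG = 1, ZERO = 2, POS = 4;
    static final String[] NAMES = {"BOTTOM", "<0", "0", "<=0", ">0", "!=0", ">=0", "TOP"};

    // Join = set union: the result is never below either input.
    static int join(int a, int b) { return a | b; }

    // Abstract multiplication: combine every possible sign of a with every possible sign of b.
    static int times(int a, int b) {
        int result = 0;
        for (int x : new int[] {NEG, ZERO, POS}) {
            for (int y : new int[] {NEG, ZERO, POS}) {
                if ((a & x) == 0 || (b & y) == 0) continue;
                if (x == ZERO || y == ZERO) result |= ZERO;
                else result |= (x == y) ? POS : NEG;
            }
        }
        return result;
    }

    // Analyze p = 1; while (...) { p = p * (-1); } and return p at the loop head.
    static int analyze() {
        int head = POS;                                 // after p = 1
        while (true) {
            int body = times(head, NEG);                // p = p * (-1)
            int next = join(head, body);
            System.out.println("p: " + NAMES[head] + " -> " + NAMES[next]);
            if (next == head) return head;              // fixed point
            head = next;
        }
    }

    public static void main(String[] args) {
        System.out.println("at print(p): " + NAMES[analyze()]);   // at print(p): !=0
    }
}
```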
Abstract Domain
So the idea is: the abstract value of a variable can only stay the same or go up, until it reaches ⊤
Designing an appropriate domain is important. Can we use the same domain as in the !=0 example? No! Consider a = a(>0) + b(>0):
For the first domain: + (+) + = +, join(+, +) = +
For the second domain: !=0 (+) !=0 = ⊤, join(!=0, ⊤) = ⊤
Abstract Domain
Common abstract domains: numeric values, regular expressions, sets
For these domains it is relatively easy to define an order, and their operations are monotonic
Numeric Value Domains
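The most widely used domains just track '+', '-', '0'
It is also possible to use a number of value ranges
[Figure: lattice of score ranges: adjacent ranges such as 0-60, 60-90, 90-100 and 100+ join into wider ranges such as 0-90, 60-100 and 90+, then 0-100 and 60+, and finally 0+]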
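A minimal sketch of a range domain in Java follows. It assumes closed integer ranges and uses Integer.MAX_VALUE to stand for an open upper end such as "100+". Join returns the smallest range that covers both inputs. As long as the bounds come from the slide's fixed set of boundaries (0, 60, 90, 100, unbounded), joins stay inside that finite set of ranges.

```java
// Range domain: an abstract value is the range lo..hi (Integer.MAX_VALUE = unbounded above).
class Range {
    final int lo, hi;

    Range(int lo, int hi) { this.lo = lo; this.hi = hi; }

    // Join: the smallest range covering both inputs.
    Range join(Range o) { return new Range(Math.min(lo, o.lo), Math.max(hi, o.hi)); }

    // Partial order: this is below o when o covers this range.
    boolean below(Range o) { return o.lo <= lo && hi <= o.hi; }

    @Override public String toString() { return hi == Integer.MAX_VALUE ? lo + "+" : lo + "-" + hi; }

    @Override public boolean equals(Object o) {
        return o instanceof Range && ((Range) o).lo == lo && ((Range) o).hi == hi;
    }

    @Override public int hashCode() { return 31 * lo + hi; }

    public static void main(String[] args) {
        Range low = new Range(0, 60), mid = new Range(60, 90), high = new Range(90, 100);
        System.out.println(low.join(mid));                                  // 0-90
        System.out.println(mid.join(high));                                 // 60-100
        System.out.println(high.join(new Range(100, Integer.MAX_VALUE)));   // 90+
    }
}
```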
String Value Domains
Usually useful for proving string formats: do all URLs start with "http"? Do all file names end with ".php"? …
Prefix domains
[Figure: prefix lattice: abcde and abcdd join to abcd*; adding abce gives abc*; adding abl gives ab*]
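In the prefix domain, join is the longest common prefix. The minimal Java sketch below stores an abstract value as its prefix string. The trailing * is added only when printing, and the sketch does not distinguish an exact string from a prefix. The example URLs use reserved example domains.

```java
// Prefix domain: "abc" stands for "abc*", i.e. any string starting with abc.
class PrefixDomain {
    // Join of two prefixes = their longest common prefix.
    static String join(String a, String b) {
        int n = 0;
        while (n < a.length() && n < b.length() && a.charAt(n) == b.charAt(n)) n++;
        return a.substring(0, n);
    }

    // Abstract a set of concrete strings by joining them one by one.
    static String abstractOf(String... values) {
        String prefix = values[0];
        for (String v : values) prefix = join(prefix, v);
        return prefix;
    }

    public static void main(String[] args) {
        System.out.println(abstractOf("abcde", "abcdd") + "*");                 // abcd*
        System.out.println(abstractOf("abcde", "abcdd", "abce") + "*");         // abc*
        System.out.println(abstractOf("abcde", "abcdd", "abce", "abl") + "*");  // ab*
        // Proving a format: both URLs are known to start with "http".
        System.out.println(abstractOf("http://example.com", "https://example.org").startsWith("http"));  // true
    }
}
```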
Set Domains
Useful for determining the possible constant values of a variable
Join represents subsumption and merging (set union)
[Figure: set lattice over abcde, abcdd, abce, abl: {abcde, abcdd} below {abcde, abcdd, abce} below {abcde, abcdd, abce, abl}; {abcdd, abce} and {abce, abl} join to {abcdd, abce, abl}]
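For the set domain, join is set union and the order is set inclusion. A minimal Java sketch (names hypothetical):

```java
import java.util.Set;
import java.util.TreeSet;

// Set domain: an abstract value is the set of constants a variable may hold.
class SetDomain {
    // Join = union of the two sets.
    static Set<String> join(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);   // TreeSet keeps the printout sorted
        result.addAll(b);
        return result;
    }

    // Order = inclusion: a is below b when every constant in a is also in b.
    static boolean below(Set<String> a, Set<String> b) { return b.containsAll(a); }

    public static void main(String[] args) {
        Set<String> left = Set.of("abcdd", "abce"), right = Set.of("abce", "abl");
        Set<String> joined = join(left, right);
        System.out.println(joined);                 // [abcdd, abce, abl]
        System.out.println(below(left, joined));    // true
    }
}
```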
Model Checking
Basic idea: transform the program into an automaton
Program states are the states of the automaton, and statements are transitions (edges)
Check temporal properties on the automaton by traversing it
Model Checking: Model Building
Basic approach: use the control flow graph (CFG)
View all program states after a statement as ONE state
Use abstract values: view all program states after a statement with the same abstract values as ONE state
Use concrete values: view all program states after a statement with the same concrete values as ONE state; usually impossible
An example with a CFG model: checking whether a file is closed in all cases
boolean load() {
    f.open();
    line = f.read();
    while (line != null) {
        if (line.contains("key")) {
            f.close();
            return true;
        } else if (line.contains("value")) {
            f.close();
        }
        line = f.read();
    }
    return false;
}
[Figure: CFG model of load() with states Start, opened, new line read, != null / == null, key / value / none, closed (twice) and ret, annotated "f is not open"]
An example with a CFG model: traversing the model to find counterexamples
[Figure: the CFG model of load() from the previous slide]
An example with a CFG model: read must happen before close (no read after close)
[Figure: the CFG model of load() from the previous slides]
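Both checks on the load() model can be sketched as a breadth-first search over pairs of (CFG node, file state). This pairing is the product of the program model and the property. The sketch rests on our own assumptions:

- The node names and edges are a hand encoding of load().
- Branch conditions are not evaluated, so every branch is treated as feasible, as in a CFG-based model.
- The file state is one of NEW, OPEN or CLOSED.

The search returns the shortest counterexample path, or null if there is none.

```java
import java.util.*;

// Explicit-state check of a hand-written CFG model of load().
class FileModelChecker {
    enum FileState { NEW, OPEN, CLOSED }

    static final Map<String, List<String>> EDGES = new HashMap<>();

    static void edge(String from, String... to) { EDGES.put(from, Arrays.asList(to)); }

    static {
        edge("start", "open");
        edge("open", "read1");                    // f.open();  then  line = f.read();
        edge("read1", "loop");
        edge("loop", "ifKey", "returnFalse");     // line != null / line == null
        edge("ifKey", "closeKey", "ifValue");     // contains("key") / otherwise
        edge("closeKey", "returnTrue");
        edge("ifValue", "closeValue", "read2");   // contains("value") / otherwise
        edge("closeValue", "read2");
        edge("read2", "loop");                    // line = f.read(); back to the loop test
        edge("returnTrue");
        edge("returnFalse");
    }

    // File state after executing the given node.
    static FileState step(String node, FileState s) {
        if (node.equals("open")) return FileState.OPEN;
        if (node.startsWith("close")) return FileState.CLOSED;
        return s;
    }

    // BFS over (node, file state) pairs. readAfterClose selects the property:
    //   true:  a read happens while f is closed ("read must happen before close")
    //   false: load() returns while f is still open ("f is closed in all cases")
    static List<String> findCounterexample(boolean readAfterClose) {
        Map<String, String> parent = new HashMap<>();   // "node/state" -> predecessor
        Deque<String> queue = new ArrayDeque<>();
        String start = "start/" + FileState.NEW;
        parent.put(start, null);
        queue.add(start);
        while (!queue.isEmpty()) {
            String key = queue.poll();
            String node = key.split("/")[0];
            FileState s = FileState.valueOf(key.split("/")[1]);
            boolean violated = readAfterClose
                    ? node.startsWith("read") && s == FileState.CLOSED
                    : node.startsWith("return") && s == FileState.OPEN;
            if (violated) {   // rebuild the path by following predecessors
                LinkedList<String> path = new LinkedList<>();
                for (String k = key; k != null; k = parent.get(k)) path.addFirst(k.split("/")[0]);
                return path;
            }
            for (String next : EDGES.get(node)) {
                String nextKey = next + "/" + step(next, s);
                if (!parent.containsKey(nextKey)) {
                    parent.put(nextKey, key);
                    queue.add(nextKey);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("f open at return: " + findCounterexample(false));
        System.out.println("read after close: " + findCounterexample(true));
    }
}
```

On this model, the first search reports the path where the first read returns null and load() returns false with f still open. The second reports the path where the "value" branch closes f and the loop then reads again.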
Temporal Logic
The basic idea of model checking is to find a path in the model that violates the specification
Temporal logic describes the sequential relationship among a number of events (the specification), so that any specification can be read by a generic path-finding tool
There is no need to write a new path-finding tool for each proof
Usage of Temporal Logic
Describe the sequential relationship among a number of events
U (until): P U Q means that P has to be true until Q is true
!read(f) U open(f)    !close(f) U open(f)
F (future): F P means that P will be true at some time in the future
open(f) -> F close(f)    close(f) -> !F read(f)
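The meaning of U and F can be made precise on a finite sequence of events. This minimal Java sketch assumes finite traces and strong until, meaning Q must eventually happen. It checks the four formulas above on one hypothetical trace. The implications are checked at every position where their left-hand event occurs.

```java
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.Predicate;

// U and F over a finite trace of event names (one event per position).
class FiniteTraceLtl {
    // P U Q at position i: Q holds at some j >= i, and P holds at every position from i to j - 1.
    static boolean until(List<String> t, int i, Predicate<String> p, Predicate<String> q) {
        for (int j = i; j < t.size(); j++) {
            if (q.test(t.get(j))) return true;
            if (!p.test(t.get(j))) return false;
        }
        return false;   // strong until: Q never happened
    }

    // F P at position i: P holds at some position j >= i.
    static boolean future(List<String> t, int i, Predicate<String> p) {
        for (int j = i; j < t.size(); j++) {
            if (p.test(t.get(j))) return true;
        }
        return false;
    }

    static Predicate<String> is(String event) { return event::equals; }

    // "trigger -> cond" checked at every position whose event is the trigger.
    static boolean whenever(List<String> t, String trigger, IntPredicate cond) {
        for (int i = 0; i < t.size(); i++) {
            if (t.get(i).equals(trigger) && !cond.test(i)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> t = List.of("open", "read", "close", "read");   // hypothetical trace
        System.out.println(until(t, 0, is("read").negate(), is("open")));          // !read(f) U open(f): true
        System.out.println(until(t, 0, is("close").negate(), is("open")));         // !close(f) U open(f): true
        System.out.println(whenever(t, "open", i -> future(t, i, is("close"))));   // open(f) -> F close(f): true
        System.out.println(whenever(t, "close", i -> !future(t, i, is("read"))));  // close(f) -> !F read(f): false
    }
}
```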
Some Simple Checks with Graph Traversal
Check whether x flows to w
Check (!z used as a divisor) U (z is written)
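Checking whether x flows to w is a reachability question on the data dependence graph. This minimal Java sketch assumes the graph is given as an adjacency map, in which an edge a -> b means the value of a is used to compute b. The four-statement program in the comment is hypothetical.

```java
import java.util.*;

// Reachability on a data dependence graph for the hypothetical snippet:
//   x = read(); y = x + 1; z = 2; w = y * z;
class DataFlowCheck {
    // Depth-first search from source; true if sink is reachable.
    static boolean flowsTo(Map<String, List<String>> ddg, String source, String sink) {
        Deque<String> work = new ArrayDeque<>(List.of(source));
        Set<String> seen = new HashSet<>(work);
        while (!work.isEmpty()) {
            String v = work.pop();
            if (v.equals(sink)) return true;
            for (String next : ddg.getOrDefault(v, List.of())) {
                if (seen.add(next)) work.push(next);   // visit each node at most once
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> ddg = Map.of(
                "x", List.of("y"),
                "y", List.of("w"),
                "z", List.of("w"));
        System.out.println(flowsTo(ddg, "x", "w"));   // true
        System.out.println(flowsTo(ddg, "w", "x"));   // false
    }
}
```

The same traversal answers security questions such as "contact info must not flow to the Internet" once sources and sinks are marked in the graph.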
Program slicing for sum = 0 -> sum = 1
[Figure: system dependence graph for main (sum = 0; i = 1; while i < 11 { sum = add(sum, i); i = add(i, 1); }) and add(a, b) (add$result = a + b), with entry, expression, control-point, call-site, actual-in/actual-out and formal-in/formal-out nodes]
Sensitivity of graph traversal
Context sensitivity example:
x = f(x); y = f(y);
int f(int i){ return i; }
Flow sensitivity example:
x = 2; x = 3; y = x;
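The flow sensitivity example can be sketched by computing the possible constant values of each variable in two ways. The encoding is hypothetical: each statement is a target and a source, where the source is an integer constant or a variable name. The flow-insensitive version keeps one set per variable and ignores statement order. The flow-sensitive version processes statements in order and lets each assignment replace the previous value.

```java
import java.util.*;

// Flow-insensitive vs. flow-sensitive analysis of:  x = 2; x = 3; y = x;
class FlowSensitivity {
    static final String[][] PROGRAM = {{"x", "2"}, {"x", "3"}, {"y", "x"}};

    // Possible values of a source: itself if it is a constant, else the variable's current set.
    static Set<String> valuesOf(Map<String, Set<String>> env, String source) {
        if (source.matches("-?\\d+")) return Set.of(source);
        return env.getOrDefault(source, Set.of());
    }

    // One set per variable for the whole program; repeat until nothing changes.
    static Map<String, Set<String>> flowInsensitive() {
        Map<String, Set<String>> env = new HashMap<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] s : PROGRAM) {
                Set<String> target = env.computeIfAbsent(s[0], k -> new TreeSet<>());
                changed |= target.addAll(valuesOf(env, s[1]));
            }
        }
        return env;
    }

    // Statements in order; an assignment replaces (strongly updates) its target.
    static Map<String, Set<String>> flowSensitive() {
        Map<String, Set<String>> env = new HashMap<>();
        for (String[] s : PROGRAM) env.put(s[0], new TreeSet<>(valuesOf(env, s[1])));
        return env;
    }

    public static void main(String[] args) {
        System.out.println("flow-insensitive y: " + flowInsensitive().get("y"));  // [2, 3]
        System.out.println("flow-sensitive   y: " + flowSensitive().get("y"));    // [3]
    }
}
```

The flow-insensitive result [2, 3] is sound but imprecise. The flow-sensitive result [3] is exact here, at the cost of tracking the state at each program point.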
Problems of static bug detection
Lack of specifications: project-specific formal specifications are very rare
Solutions: general specifications (for typical bugs); mining specifications (for API-specific and project-specific specifications)
False positives vs. efficiency: more sensitivity -> higher cost
Path sensitivity is rarely achieved
Combining all sensitivities -> incomputable problems
State of practice: static bug detection
FindBugs: a tool developed by researchers from UMD
Widely used in industry for checking code before commit
The idea actually comes from Lint
Lint: a code-style-enforcing tool for the C language
Finds bad coding styles and raises warnings: bad naming, hard-coded strings, …
Idea: do it in reverse
Most static bug detection tools:
Set up a specification (either from users or a well-defined one), e.g., a divisor should not be 0, a null pointer should not be dereferenced, a person's salary cannot be negative
Check all possible cases to guarantee that the specification holds
Otherwise, provide counterexamples
FindBugs: detect code patterns that indicate bugs
E.g., a = null; b = a.field;    str.replace(" ", "");
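A minimal Java sketch of the pattern-based idea matches the two example patterns with regular expressions on source lines. The str.replace(" ", "") example is a bug because Java strings are immutable: replace returns a new string, so discarding the result has no effect. This is only an illustration under simplifying assumptions. FindBugs itself analyzes Java bytecode, and this sketch looks only at the immediately preceding line for the null pattern. The class name and warning texts are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Line-based sketch of two bug patterns (not how FindBugs is implemented).
class PatternDetector {
    // A whole statement that calls replace(...) and discards the returned string.
    static final Pattern IGNORED_REPLACE = Pattern.compile("^\\s*\\w+\\.replace\\(.*\\)\\s*;\\s*$");
    // An assignment of null to a variable; group 1 is the variable name.
    static final Pattern NULL_ASSIGN = Pattern.compile("^\\s*(\\w+)\\s*=\\s*null\\s*;");

    static List<String> check(List<String> lines) {
        List<String> warnings = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (IGNORED_REPLACE.matcher(line).matches()) {
                warnings.add("line " + (i + 1) + ": return value of replace() is ignored");
            }
            // Local check: is a variable dereferenced right after being set to null?
            if (i > 0) {
                Matcher m = NULL_ASSIGN.matcher(lines.get(i - 1));
                if (m.find()) {
                    Pattern deref = Pattern.compile("\\b" + Pattern.quote(m.group(1)) + "\\.\\w");
                    if (deref.matcher(line).find()) {
                        warnings.add("line " + (i + 1) + ": " + m.group(1) + " is null here but dereferenced");
                    }
                }
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        List<String> code = List.of(
                "a = null;",
                "b = a.field;",
                "str.replace(\" \", \"\");",
                "str = str.replace(\" \", \"\");");   // fine: the result is used
        check(code).forEach(System.out::println);
    }
}
```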
Characteristics of FindBugs
Based on existing concrete code patterns
Checks code patterns locally: only intra-procedural analysis. What are the advantages and disadvantages of doing so?
Performs bug ranking according to the probability and potential severity of bugs
Probability: how likely the bug is to be real
Severity: how severe the consequences may be if the bug is not fixed
Application of FindBugs-like tools
FindBugs has been adopted by a number of large companies such as Google
Usually only the issues with the highest confidence/severity are reported
A statistic from Google (2009): more than 4000 issues were identified, of which 1700 were confirmed as bugs and 1100 were fixed
The software department of USAA uses PMD, an alternative to FindBugs
Patterns to be checked: 404 bug patterns in 6 major categories
Bad Practice / Dodgy code
Correctness
Internationalization
Vulnerability / Security
Multithread correctness
Performance
Bad Practice / Dodgy Code: hackish code that is not stable and may harm future maintenance
Examples:
The equals method should not assume the type of its argument
public boolean equals(Object o) {
    Myclass my = (Myclass) o;   // flagged: throws ClassCastException if o is not a Myclass
    return my.id == this.id;
}
Abstract class defines covariant compareTo() method
int compareTo(Myclass obj) { … }
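For comparison, here is a minimal sketch of an equals method that does not assume the type of its argument. Myclass and id come from the example above; the constructor and main are illustrative. Using instanceof accepts subclasses, and a getClass() comparison is the stricter alternative.

```java
// equals that checks the argument's type instead of casting blindly.
class Myclass {
    private final int id;

    Myclass(int id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Myclass)) return false;   // also handles o == null
        Myclass other = (Myclass) o;
        return id == other.id;
    }

    @Override
    public int hashCode() { return Integer.hashCode(id); }   // consistent with equals

    public static void main(String[] args) {
        System.out.println(new Myclass(1).equals(new Myclass(1)));   // true
        System.out.println(new Myclass(1).equals("not a Myclass"));  // false, no ClassCastException
        System.out.println(new Myclass(1).equals(null));             // false
    }
}
```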
Correctness: the code pattern may result in incorrect behavior of the software
Examples:
DMI: Collections should not contain themselves
List s = new …; …
if (s.contains(s)) { … }
DMI: Invocation of hashCode on an array
int[] x = new int[10];
…
x.hashCode();
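Arrays inherit equals() and hashCode() from Object, so x.hashCode() depends on the array's identity rather than its contents. The minimal Java sketch below contrasts this with java.util.Arrays.equals and java.util.Arrays.hashCode, which work on the contents.

```java
import java.util.Arrays;

// Identity-based vs. content-based equality and hashing for arrays.
class ArrayHash {
    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] b = {1, 2, 3};
        System.out.println(a.equals(b));                                // false: identity-based
        System.out.println(Arrays.equals(a, b));                        // true: compares contents
        System.out.println(Arrays.hashCode(a) == Arrays.hashCode(b));   // true: hashes contents
    }
}
```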
Internationalization: a code pattern that will hinder future internationalization (i18n) of the software
Examples:
Using toUpperCase or toLowerCase on localized strings
String s = getLocale(key);
s.toUpperCase();
Calling getBytes() on localized strings
String s = getLocale(key);
s.getBytes();
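Both warnings concern methods that silently depend on the machine's default locale or charset. The minimal Java sketch below uses a Turkish locale for illustration. In that locale, "title" upper-cases to a string with a dotted capital I (U+0130) instead of I. Passing Locale.ROOT or an explicit charset makes the result machine-independent.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Locale- and charset-dependent string operations made explicit.
class CaseAndBytes {
    public static void main(String[] args) {
        String s = "title";
        // toUpperCase() with no argument would use the JVM default locale; here it is explicit.
        String turkish = s.toUpperCase(Locale.forLanguageTag("tr"));
        System.out.println(turkish.equals("T\u0130TLE"));                       // true: dotted capital I
        System.out.println(s.toUpperCase(Locale.ROOT));                         // TITLE
        // getBytes() with no argument uses the platform default charset; naming one is portable.
        System.out.println("\u00fc".getBytes(StandardCharsets.UTF_8).length);   // 2
    }
}
```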
Multithread correctness: a code pattern that may cause incorrect behavior in multithreaded execution
Examples: synchronization on a boxed primitive
private static Boolean inited = Boolean.FALSE;
...
synchronized (inited) {
    if (!inited) {
        init();
        inited = Boolean.TRUE;
    }
}
...
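A common fix is to synchronize on a dedicated lock object that is never reassigned. The original code has two problems:

- Boolean.FALSE and Boolean.TRUE are shared instances, so unrelated code may lock on the same object.
- Assigning to inited changes which object later threads lock on.

In the minimal sketch below, the initCount counter exists only for the demonstration.

```java
// Lazy initialization guarded by a private, final lock object.
class LazyInit {
    private static final Object LOCK = new Object();
    private static boolean inited = false;
    static int initCount = 0;   // demonstration only: counts init() calls

    static void init() { initCount++; }

    static void ensureInited() {
        synchronized (LOCK) {   // always the same object, unlike the reassigned Boolean
            if (!inited) {
                init();
                inited = true;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(LazyInit::ensureInited);
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println("init() calls: " + initCount);   // init() calls: 1
    }
}
```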
Vulnerability/Security: the code pattern may result in vulnerabilities or security issues
Examples:
SQL: a SQL query is generated from a non-constant String
String str = "select" + bb + " ddd" + …;
server.execute(str);
XSS: this code directly writes an HTTP parameter to JSP output, which allows a cross-site scripting vulnerability
Para = request.getParameter(key);
out.print(Para);
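Each pattern has a common remedy:

- XSS: escape HTML special characters before writing the parameter to the page.
- SQL: use a parameterized query through java.sql.PreparedStatement instead of string concatenation.

A minimal sketch of the escaping step follows. The helper name is hypothetical, and production code would normally use a vetted encoding library or the template engine's escaping.

```java
// Escape the HTML special characters of an untrusted string.
class HtmlEscape {
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String para = "<script>alert(1)</script>";   // stands in for request.getParameter(key)
        System.out.println(escapeHtml(para));        // &lt;script&gt;alert(1)&lt;/script&gt;
    }
}
```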
Performance: the code pattern may harm the performance of the software
Examples: SBSC: method concatenates strings using + in a loop
String s = "";
for (int i = 0; i < field.length; ++i) {
    s = s + field[i];
}
Fix: append to a StringBuffer
StringBuffer buf = new StringBuffer();
for (int i = 0; i < field.length; ++i) {
    buf.append(field[i]);
}
String s = buf.toString();
Major problem: false positives
Overall precision of 5% to 10% on open-source and industry projects
Developers want to make sure they do not waste effort on false positives
There are usually more reported bugs than developers can fix
Solution: bug ranking
Rank bug categories: some categories are more likely to be real bugs than others
How to give a score to each category? Check a large number of issues in the history of the software: what proportion of them was fixed?
Raises precision to about 30% among the top 25% ranked bugs
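The ranking idea can be sketched as scoring each category by the fraction of its past warnings that were fixed, then sorting by that score. In this minimal Java sketch, the category names and counts in main are made-up toy numbers for illustration, not data from any study.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Rank bug categories by historical fix rate: fixed / reported.
class CategoryRanking {
    static final class Stats {
        final String category;
        final int reported, fixed;

        Stats(String category, int reported, int fixed) {
            this.category = category; this.reported = reported; this.fixed = fixed;
        }

        // Fraction of past warnings in this category that developers fixed.
        double score() { return reported == 0 ? 0.0 : (double) fixed / reported; }
    }

    // Highest score first: categories whose warnings were usually fixed go on top.
    static List<Stats> rank(List<Stats> history) {
        List<Stats> sorted = new ArrayList<>(history);
        sorted.sort(Comparator.comparingDouble((Stats s) -> s.score()).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Stats> history = List.of(              // hypothetical toy counts
                new Stats("CORRECTNESS", 200, 120),
                new Stats("PERFORMANCE", 300, 60),
                new Stats("BAD_PRACTICE", 500, 50));
        for (Stats s : rank(history)) {
            System.out.printf("%-13s %.2f%n", s.category, s.score());
        }
    }
}
```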
FindBugs
Disadvantages
Cannot guarantee that the software is free of certain bugs
Still involves many false positives
Advantages
Easy to start
Scalable
Relatively fewer false positives
Somewhat like testing
Has become the most popular and practical static bug detection technique
FindBugs
Demo: install as a plugin
Run FindBugs
Review issues
Review of Static Bug Detection
Specification-based static bug detection
Value specifications: symbolic execution, abstract interpretation
Temporal specifications: model checking
Data flow specifications: dependence graph traversal
Pattern-based static bug detection: FindBugs
Bug ranking
CS4723 Course Project
Writing Test Scripts for Android Apps
Description
Learn about user-interface testing and testing Android Apps
Writing test scripts for the stock app: contact manager
Requirements
All features of the contact manager must be covered. For example, the contact manager also affects the name shown in incoming and outgoing calls and in messages.
Once the emulator is started, the test scripts should be able to run fully automatically.
Requirements
The test script should set up all data by itself and clean up the data at the end, so that the script can be executed repeatedly.
The test script should automatically open and close the emulator logging system to record test results.
Deliverables
Test scripts
System logs
Due: Apr 28th
Evaluation
Covered features (10 points)
No runtime errors (4 points)
Running normally multiple times (4 points)
Logging (2 points)
Demo
Download and install the Android SDK
Create and start an emulator
Download and install Python
Simple monkeyrunner scripts
Mouse click recorder