Alattin: Mining Alternative Patterns for Detecting Neglected Conditions
Suresh Thummalapenta and Tao Xie
Department of Computer Science
North Carolina State University
Raleigh, USA
ASE 2009
This work is supported in part by NSF grant CCF-0725190 and ARO grant W911NF-08-1-0443 and ARO grant W911NF-08-1-0105 managed by NCSU Secure Open Source Systems Initiative (SOSI)
Alattin: Motivation
Problem: Programming rules are often not well documented
General solution:
Mine common patterns across a large number of data points (e.g., code samples)
Use common patterns as programming rules to detect defects
Challenges addressed by Alattin
Limited data points: existing approaches mine specifications from a few code bases and miss specifications due to a lack of sufficient data points
Existing approaches produce a large number of false positives
Limited Data Points
Existing approaches often lack sufficient relevant data points (e.g., API call sites)
[Figure: Existing approaches mine patterns from a few code repositories (1, 2), e.g., Eclipse, Linux, … The Alattin approach uses a code search engine to search many code repositories (1, 2, …, N), e.g., open source code on the web, and then mines patterns.]
Large Number of False Positives
Existing approaches produce a large number of false positives
One major observation:
Programmers often write code in different ways for achieving the same task
Some ways are more frequent than others
[Figure: Patterns are mined from the frequent ways; when the mined patterns are used to detect violations, the infrequent ways are reported as violations, which are false positives.]
Example: java.util.Iterator.next()
PrintEntries1(ArrayList<String> entries) {
    …
    Iterator it = entries.iterator();
    if (it.hasNext()) {
        String last = (String) it.next();
    }
    …
}
Code Sample 1
PrintEntries2(ArrayList<String> entries) {
    …
    if (entries.size() > 0) {
        Iterator it = entries.iterator();
        String last = (String) it.next();
    }
    …
}
Code Sample 2
java.util.Iterator.next() throws NoSuchElementException when invoked on an iterator with no remaining elements, e.g., an iterator over an empty list
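The two guards and the unguarded call can be tried out with a minimal sketch such as the one below. The class and method names (IteratorGuards, firstViaHasNext, firstViaSize, firstUnguarded) are hypothetical and not taken from the slides; only the java.util calls are real.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical demo class; the names are illustrative only.
class IteratorGuards {
    // Guard in the style of Code Sample 1: ask the iterator itself.
    static String firstViaHasNext(List<String> entries) {
        Iterator<String> it = entries.iterator();
        if (it.hasNext()) {
            return it.next();
        }
        return null;
    }

    // Guard in the style of Code Sample 2: check the collection size first.
    static String firstViaSize(List<String> entries) {
        if (entries.size() > 0) {
            Iterator<String> it = entries.iterator();
            return it.next();
        }
        return null;
    }

    // Neglected condition: next() is called without any guard.
    static String firstUnguarded(List<String> entries) {
        return entries.iterator().next();
    }

    public static void main(String[] args) {
        List<String> empty = new ArrayList<>();
        System.out.println(firstViaHasNext(empty));
        System.out.println(firstViaSize(empty));
        try {
            firstUnguarded(empty);
        } catch (NoSuchElementException e) {
            System.out.println("unguarded next() threw NoSuchElementException");
        }
    }
}
```

Both guarded variants are safe on an empty list, which is why a rule that accepts only the hasNext() form would flag the size() form as a violation even though it is correct.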
Example: java.util.Iterator.next()
1243 code examples
Sample 1 (1218/1243)
Sample 2 (6/1243)
Mined pattern from existing approaches: “boolean check on return of Iterator.hasNext before Iterator.next”
Example: java.util.Iterator.next()
Require more general patterns (alternative patterns): P1 or P2
P1: boolean check on return of Iterator.hasNext before Iterator.next
P2: boolean check on return of ArrayList.size before Iterator.next
Existing approaches cannot mine P2, since this alternative is infrequent
Our Solution: ImMiner Algorithm
Mines alternative patterns of the form P1 or P2
Based on the observation that infrequent alternatives such as P2 are frequent among code examples that do not support P1
1243 code examples
Sample 1 (1218 / 1243)
Sample 2 (6/1243)
P2 is frequent among code examples not supporting P1
P2 is infrequent among entire 1243 code examples
Alternative Patterns
ImMiner mines three kinds of alternative patterns of the general form “P1 or P2”
Balanced: all alternatives (both P1 and P2) are frequent
Imbalanced: some alternatives (P1) are frequent and others (P2) are infrequent; represented as “P1 or P̂2”
Single: only one alternative
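The three kinds can be told apart with a few lines of code. The sketch below is only an illustration: the class AlternativeKinds and its classify method are hypothetical, and it assumes each alternative has already been labeled frequent or infrequent over the whole database and that at least one alternative is frequent.

```java
import java.util.List;

// Hypothetical helper; not part of Alattin or ImMiner.
class AlternativeKinds {
    enum Kind { BALANCED, IMBALANCED, SINGLE }

    // frequentFlags.get(i) is true when alternative i is frequent in the whole database.
    static Kind classify(List<Boolean> frequentFlags) {
        if (frequentFlags.isEmpty()) {
            throw new IllegalArgumentException("a pattern needs at least one alternative");
        }
        if (frequentFlags.size() == 1) {
            return Kind.SINGLE;
        }
        // Several alternatives: balanced only if none of them is infrequent.
        return frequentFlags.contains(false) ? Kind.IMBALANCED : Kind.BALANCED;
    }
}
```

For the Iterator.next() example, the flags would be (true, false) for P1 and P2, which this sketch classifies as imbalanced.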
ImMiner Algorithm
Uses frequent-itemset mining [Burdick et al. ICDE 01] iteratively
An input database with the following APIs for Iterator.next()
[Tables: Input database; Mapping of IDs to APIs]
ImMiner Algorithm: Frequent Alternatives
Input database → frequent itemset mining (min_sup 0.5)
Frequent item: 1
P1: boolean check on the return of Iterator.hasNext() before Iterator.next()
ImMiner: Infrequent Alternatives of P1
Split the input database into two databases: a positive database (PSD) of code examples that support P1 and a negative database (NSD) of code examples that do not
Mine patterns that are frequent in NSD and infrequent in PSD
Reason: only such patterns serve as alternatives for P1
Alternative pattern P2: “const check on the return of ArrayList.size() before Iterator.next()”
Alattin applies the ImMiner algorithm to detect neglected conditions
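The split-and-mine step can be sketched as follows. This is a simplified illustration, not the ImMiner implementation: it counts single items instead of running a frequent-itemset miner, and the class ImMinerSketch, the toy database, and item IDs 2 and 3 are assumptions made for the example (only ID 1 for the hasNext() check comes from the slides).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified, hypothetical sketch of the PSD/NSD idea using single-item supports.
class ImMinerSketch {
    // Items contained in at least a minSup fraction of the transactions.
    static Set<Integer> frequentItems(List<Set<Integer>> db, double minSup) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (Set<Integer> tx : db) {
            for (int item : tx) {
                counts.merge(item, 1, Integer::sum);
            }
        }
        Set<Integer> result = new TreeSet<>();
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if ((double) e.getValue() / db.size() >= minSup) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Alternatives of the frequent item p1: frequent in NSD and infrequent in PSD.
    static Set<Integer> infrequentAlternatives(List<Set<Integer>> db, int p1, double minSup) {
        List<Set<Integer>> psd = new ArrayList<>(); // transactions that support p1
        List<Set<Integer>> nsd = new ArrayList<>(); // transactions that do not
        for (Set<Integer> tx : db) {
            (tx.contains(p1) ? psd : nsd).add(tx);
        }
        Set<Integer> alternatives = frequentItems(nsd, minSup);
        alternatives.removeAll(frequentItems(psd, minSup));
        return alternatives;
    }

    // Toy data (assumed): 1 = hasNext() check, 2 = size() check, 3 = some unrelated check.
    static List<Set<Integer>> toyDatabase() {
        List<Set<Integer>> db = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            db.add(new TreeSet<>(Arrays.asList(1)));
        }
        db.add(new TreeSet<>(Arrays.asList(1, 3)));
        db.add(new TreeSet<>(Arrays.asList(2)));
        db.add(new TreeSet<>(Arrays.asList(2)));
        db.add(new TreeSet<>(Arrays.asList(2, 3)));
        return db;
    }
}
```

On the toy database with min_sup 0.5, item 1 is the only frequent item overall, while item 2 appears in every NSD transaction and in no PSD transaction, so it is the one reported as an alternative of item 1. Item 3 is filtered out because it is infrequent in NSD as well.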
Neglected Conditions
Neglected conditions refer to
Missing conditions that check the arguments or receiver of the API call before the API call
Missing conditions that check the return or receiver of the API call after the API call
One of the primary causes of many fatal issues such as security or buffer-overflow vulnerabilities [Chang et al. ISSTA 07]
Alattin Approach
[Figure: Classes and methods reused by the application under analysis are extracted; code samples are gathered from open source projects on the web (1, 2, …, N); pattern candidates are generated; alternative patterns are mined; neglected conditions are detected in the application and reported as violations.]
Phase 1: Issue queries and collect relevant code samples, e.g., “lang:java java.util.Iterator next”
Phase 2: Generate pattern candidates
Phase 3: Mine alternative patterns
Phase 4: Detect neglected conditions statically
Evaluation
Research questions:
Do alternative patterns exist in real applications?
What percentage of false positives is eliminated (with little or no increase in false negatives) in the detected violations?
Subjects
Two categories of subjects:
3 Java default API libraries
3 popular open source libraries
Column “Samples”: number of code examples collected from Google code search
RQ1: Balanced and Imbalanced Patterns
What percentage of balanced and imbalanced patterns exists in real applications?
Balanced patterns: 0% to 30% (average: 9.69%)
Imbalanced patterns:
30% to 100% (average: 65%) for Java default API libraries
0% to 9.5% (average: 5%) for open source libraries
Inference: Java default API libraries offer more alternative ways of writing code than the open source libraries
RQ2: False Positives and False Negatives
What percentage of false positives is eliminated (with little or no increase in false negatives)?
Applied mined patterns (“P1 or P2 or ... or Pi or Â1 or Â2 or ... or Âj”) in three modes:
Existing mode: “P1 or P2 or ... or Pi or Â1 or Â2 or ... or Âj” → P1, P2, ..., Pi
Balanced mode: “P1 or P2 or ... or Pi or Â1 or Â2 or ... or Âj” → “P1 or P2 or ... or Pi”
Imbalanced mode: “P1 or P2 or ... or Pi or Â1 or Â2 or ... or Âj” → “P1 or P2 or ... or Pi or Â1 or Â2 or ... or Âj”
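One way to read the three modes is sketched below. This is an interpretation, not the evaluation code: the class DetectionModes is hypothetical, and it assumes that existing mode treats every frequent alternative as a separate rule, that balanced mode accepts a call site satisfying any frequent alternative, and that imbalanced mode also accepts the infrequent alternatives.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of how one mined pattern is applied to one call site in each mode.
class DetectionModes {
    enum Mode { EXISTING, BALANCED, IMBALANCED }

    static boolean containsAny(Set<String> checks, List<String> alternatives) {
        for (String alternative : alternatives) {
            if (checks.contains(alternative)) {
                return true;
            }
        }
        return false;
    }

    // Returns how many violations are reported for a call site with the given condition checks.
    static int reportedViolations(Mode mode, List<String> frequent, List<String> infrequent,
                                  Set<String> checksAtCallSite) {
        switch (mode) {
            case EXISTING: {
                // Each frequent alternative is an independent rule.
                int missing = 0;
                for (String p : frequent) {
                    if (!checksAtCallSite.contains(p)) {
                        missing++;
                    }
                }
                return missing;
            }
            case BALANCED:
                // Any frequent alternative is enough.
                return containsAny(checksAtCallSite, frequent) ? 0 : 1;
            default:
                // IMBALANCED: frequent and infrequent alternatives are all accepted.
                return containsAny(checksAtCallSite, frequent)
                        || containsAny(checksAtCallSite, infrequent) ? 0 : 1;
        }
    }
}
```

Under these assumptions, a call site guarded only by the size() check is reported in existing and balanced mode but not in imbalanced mode, which is the kind of false positive the imbalanced mode is meant to remove.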
RQ2: False Positives and False Negatives
                   Existing Mode               Balanced Mode
Application        Defects  False Positives    Defects  False Positives  % of reduction  False Negatives
Java Util          37       104                37       104              0               0
Java Transaction   51       105                51       105              0               0
Java SQL           56       143                56       90               37.06           0
BCEL               2        14                 2        8                42.86           0
HSqlDB             1        0                  1        0                0               0
Hibernate          10       9                  10       8                11.11           0
AVERAGE/TOTAL                                                            15.17           0
Existing Mode vs Balanced Mode
Balanced mode reduced false positives by 15.17% without any increase in false negatives
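The reduction column and its average can be recomputed from the false-positive counts in the table. The sketch below is a hypothetical helper (FalsePositiveReduction is not from the slides); it assumes the average is the unweighted mean over the six applications and that an application with no false positives in existing mode counts as 0% reduction, which is consistent with the 15.17% reported above.

```java
// Hypothetical helper for recomputing the "% of reduction" column.
class FalsePositiveReduction {
    // Percentage of false positives removed relative to existing mode.
    static double reductionPercent(int existingFp, int newFp) {
        if (existingFp == 0) {
            return 0.0; // nothing to reduce (e.g., HSqlDB)
        }
        return 100.0 * (existingFp - newFp) / existingFp;
    }

    // Unweighted mean of the per-application reductions.
    static double averageReduction(int[] existingFp, int[] newFp) {
        double sum = 0.0;
        for (int i = 0; i < existingFp.length; i++) {
            sum += reductionPercent(existingFp[i], newFp[i]);
        }
        return sum / existingFp.length;
    }

    public static void main(String[] args) {
        int[] existing = {104, 105, 143, 14, 0, 9};
        int[] balanced = {104, 105, 90, 8, 0, 8};
        System.out.printf("%.2f%n", averageReduction(existing, balanced));
    }
}
```

For example, Java SQL drops from 143 to 90 false positives, and (143 - 90) / 143 is about 37.06%.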
RQ2: False Positives and False Negatives
                   Existing Mode               Imbalanced Mode
Application        Defects  False Positives    Defects  False Positives  % of reduction  False Negatives
Java Util          37       104                36       74               28.85           1
Java Transaction   51       105                47       76               27.62           4
Java SQL           56       143                53       81               43.36           3
BCEL               2        14                 2        6                57.14           0
HSqlDB             1        0                  1        0                0               0
Hibernate          10       9                  10       8                11.11           0
AVERAGE/TOTAL                                                            28.01           8
Existing Mode vs Imbalanced Mode
Imbalanced mode reduced false positives by 28% with a small increase in false negatives (8 in total)
Conclusion
Problem-driven methodology for advancing the mining of software engineering data by identifying new problems, patterns, mining algorithms, and defects
Alattin mines alternative patterns classified into three categories: balanced, imbalanced, and single
Alattin can be used to enhance various existing mining approaches to reduce false positives
Future work: Exploit synergy between static and dynamic analysis to further reduce false positives
Thank You