
Alattin: Mining Alternative Patterns for Detecting Neglected Conditions

Suresh Thummalapenta and Tao Xie

Department of Computer Science

North Carolina State University

Raleigh, USA

ASE 2009

This work is supported in part by NSF grant CCF-0725190, ARO grant W911NF-08-1-0443, and ARO grant W911NF-08-1-0105 managed by NCSU Secure Open Source Systems Initiative (SOSI).

Alattin: Motivation

Problem: Programming rules are often not well documented

General solution:

Mine common patterns across a large number of data points (e.g., code samples)

Use common patterns as programming rules to detect defects

Challenges addressed by Alattin

Limited data points: existing approaches mine specifications from only a few code bases and miss specifications due to a lack of sufficient data points

False positives: existing approaches produce a large number of false positives

Limited Data Points

[Figure: Existing approaches mine patterns from a few code repositories (e.g., Eclipse, Linux) and often lack sufficient relevant data points (e.g., API call sites). The Alattin approach searches open source code on the web (code repositories 1, 2, ..., N) through a code search engine and mines patterns from the results.]

Large Number of False Positives

Existing approaches produce a large number of false positives

One major observation: Programmers often write code in different ways to achieve the same task, and some ways are more frequent than others

[Figure: Patterns are mined from the frequent ways. When the mined patterns are used to detect violations, code written in the infrequent ways is reported as violations, which are false positives.]

Example: java.util.Iterator.next()

Code Sample 1

void PrintEntries1(ArrayList<String> entries) {
    …
    Iterator it = entries.iterator();
    if (it.hasNext()) {
        String last = (String) it.next();
    }
    …
}

Code Sample 2

void PrintEntries2(ArrayList<String> entries) {
    …
    if (entries.size() > 0) {
        Iterator it = entries.iterator();
        String last = (String) it.next();
    }
    …
}

java.util.Iterator.next() throws NoSuchElementException when invoked on a list without any elements
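The two guard styles can be tried out with a small runnable sketch. The class name `IteratorGuards` and its method names are hypothetical and chosen only for illustration; the sketch shows that either guard avoids the exception that an unguarded call raises on an empty list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical demo class; the names are illustrative and not from the slides.
class IteratorGuards {
    // Guard in the style of Code Sample 1: check hasNext() before next().
    static String firstViaHasNext(List<String> entries) {
        Iterator<String> it = entries.iterator();
        if (it.hasNext()) {
            return it.next();
        }
        return null; // nothing to read
    }

    // Guard in the style of Code Sample 2: check size() before next().
    static String firstViaSize(List<String> entries) {
        if (entries.size() > 0) {
            Iterator<String> it = entries.iterator();
            return it.next();
        }
        return null; // nothing to read
    }

    // No guard at all: reports whether the unguarded call throws.
    static boolean unguardedThrows(List<String> entries) {
        try {
            entries.iterator().next();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> empty = new ArrayList<>();
        System.out.println(firstViaHasNext(empty));
        System.out.println(firstViaSize(empty));
        System.out.println(unguardedThrows(empty));
    }
}
```

Both guarded methods are safe on an empty list, so a detector that knows only the hasNext() style would flag the size() style even though it is correct.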


1243 code examples:

Code Sample 1 (1218 / 1243)

Code Sample 2 (6 / 1243)

Mined pattern from existing approaches: “boolean check on return of Iterator.hasNext before Iterator.next”

Require more general patterns (alternative patterns): P1 or P2

P1: boolean check on return of Iterator.hasNext before Iterator.next

P2: const check on return of ArrayList.size before Iterator.next

Existing approaches cannot mine such patterns, since alternative P2 is infrequent


Our Solution: ImMiner Algorithm

Mines alternative patterns of the form P1 or P2

Based on the observation that infrequent alternatives such as P2 are frequent among code examples that do not support P1

1243 code examples: Code Sample 1 (1218 / 1243), Code Sample 2 (6 / 1243)

P2 is infrequent among the entire 1243 code examples

P2 is frequent among code examples not supporting P1
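The counts on this slide make the observation concrete. The worked calculation below assumes that all 6 examples in the style of Code Sample 2 fall among the examples that do not support P1, which the slide implies but does not state. The notation sup(·) is shorthand introduced here for the fraction of examples supporting a pattern.

```latex
\begin{align*}
\mathrm{sup}(P_2 \text{ over all examples}) &= \frac{6}{1243} \approx 0.48\% \\
\text{examples not supporting } P_1 &= 1243 - 1218 = 25 \\
\mathrm{sup}(P_2 \text{ over examples not supporting } P_1) &= \frac{6}{25} = 24\%
\end{align*}
```

Restricting attention to the 25 examples without P1 raises the support of P2 by a factor of about 50. Whether 24% counts as frequent depends on the minimum-support threshold applied to that subset, which the slides do not give.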

Alternative Patterns

ImMiner mines three kinds of alternative patterns of the general form “P1 or P2”

Balanced: all alternatives (both P1 and P2) are frequent

Imbalanced: some alternatives (P1) are frequent and others (P2) are infrequent. Represented as “P1 or P^2”, where ^ marks the infrequent alternative

Single: only one alternative

ImMiner Algorithm

Uses frequent-itemset mining [Burdick et al. ICDE 01] iteratively

Example: an input database with the following APIs for Iterator.next()

[Tables: input database; mapping of IDs to APIs]

ImMiner Algorithm: Frequent Alternatives

[Figure: input database → frequent itemset mining (min_sup 0.5)]

Frequent item: 1

P1: boolean-check on the return of Iterator.hasNext() before Iterator.next()

ImMiner: Infrequent Alternatives of P1

Split input database into two databases: Positive database (PSD) and Negative database (NSD)

Mine patterns that are frequent in NSD and infrequent in PSD. Reason: only such patterns serve as alternatives for P1

Alternative pattern P2: “const check on the return of ArrayList.size() before Iterator.next()”

Alattin applies ImMiner algorithm to detect neglected conditions
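The split-and-mine step can be sketched as follows. This is a simplified illustration, not the authors' implementation. It assumes that PSD holds the transactions containing P1 and NSD holds the rest. It considers single items rather than itemsets, and it replaces the cited frequent-itemset miner with plain support counting. The class name, method names, and the second threshold `altSup` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Simplified, hypothetical sketch of the positive/negative database split.
class ImMinerSketch {
    // Fraction of transactions in db that contain the item (0 for an empty db).
    static double support(List<Set<Integer>> db, int item) {
        if (db.isEmpty()) return 0.0;
        long hits = db.stream().filter(t -> t.contains(item)).count();
        return (double) hits / db.size();
    }

    // Items whose support in db is at least minSup.
    static Set<Integer> frequentItems(List<Set<Integer>> db, double minSup) {
        Set<Integer> all = new TreeSet<>();
        db.forEach(all::addAll);
        Set<Integer> frequent = new TreeSet<>();
        for (int item : all) {
            if (support(db, item) >= minSup) frequent.add(item);
        }
        return frequent;
    }

    // For each frequent item p1: items frequent in NSD but infrequent in PSD.
    static Map<Integer, Set<Integer>> mineAlternatives(
            List<Set<Integer>> db, double minSup, double altSup) {
        Map<Integer, Set<Integer>> result = new TreeMap<>();
        for (int p1 : frequentItems(db, minSup)) {
            List<Set<Integer>> psd = new ArrayList<>(); // transactions containing p1
            List<Set<Integer>> nsd = new ArrayList<>(); // transactions without p1
            for (Set<Integer> t : db) {
                (t.contains(p1) ? psd : nsd).add(t);
            }
            Set<Integer> alternatives = new TreeSet<>();
            for (int candidate : frequentItems(nsd, altSup)) {
                if (support(psd, candidate) < altSup) alternatives.add(candidate);
            }
            result.put(p1, alternatives);
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative IDs: 1 = hasNext check, 2 = size check, 3 = unrelated check.
        List<Set<Integer>> db = new ArrayList<>();
        for (int i = 0; i < 7; i++) db.add(new HashSet<>(Arrays.asList(1)));
        db.add(new HashSet<>(Arrays.asList(1, 3)));
        db.add(new HashSet<>(Arrays.asList(2)));
        db.add(new HashSet<>(Arrays.asList(2)));
        System.out.println(mineAlternatives(db, 0.5, 0.5));
    }
}
```

In the toy database, item 2 appears in only 2 of 10 transactions, so a single mining pass at min_sup 0.5 misses it. After the split it appears in every NSD transaction and in no PSD transaction, so it is reported as an alternative to item 1.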

Neglected Conditions

Neglected conditions refer to:

Missing conditions that check the arguments or receiver of the API call before the API call

Missing conditions that check the return or receiver of the API call after the API call

One of the primary reasons for many fatal issues, such as security or buffer-overflow vulnerabilities [Chang et al. ISSTA 07]
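The two kinds of condition can be illustrated with standard Java collection calls. The APIs chosen here (`List.get` and `Map.get`) and the class `NeglectedConditionExamples` are illustrative picks and do not come from the slides; removing either `if` would produce a neglected condition of the corresponding kind.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical examples of conditions placed before and after an API call.
class NeglectedConditionExamples {
    // Condition BEFORE the call: check the argument against the receiver's size.
    // Without it, List.get throws IndexOutOfBoundsException for a bad index.
    static String elementAt(List<String> items, int index) {
        if (index >= 0 && index < items.size()) {
            return items.get(index);
        }
        return null;
    }

    // Condition AFTER the call: check the return value of Map.get for null.
    // Without it, value.length() throws NullPointerException for a missing key.
    static int lengthOfValue(Map<String, String> config, String key) {
        String value = config.get(key);
        if (value != null) {
            return value.length();
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>();
        items.add("a");
        Map<String, String> config = new HashMap<>();
        config.put("host", "localhost");
        System.out.println(elementAt(items, 5));
        System.out.println(lengthOfValue(config, "port"));
    }
}
```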

Alattin Approach

[Figure: Alattin extracts the classes and methods reused by the application under analysis, collects code samples from open source projects on the web (1, 2, ..., N), generates pattern candidates, mines alternative patterns, and detects neglected conditions as violations.]

Phase 1: Issue queries and collect relevant code samples. E.g., “lang:java java.util.Iterator next”

Phase 2: Generate pattern candidates

Phase 3: Mine alternative patterns

Phase 4: Detect neglected conditions statically

Evaluation

Research questions:

Do alternative patterns exist in real applications?

What percentage of false positives is eliminated (with low or no increase in false negatives) among detected violations?

Subjects

Two categories of subjects: 3 Java default API libraries and 3 popular open source libraries

[Table: subjects]

Column “Samples”: number of code examples collected from Google code search

RQ1: Balanced and Imbalanced Patterns

What percentage of balanced and imbalanced patterns exists in real applications?

Balanced patterns: 0% to 30% (average: 9.69%)

Imbalanced patterns:

30% to 100% (average: 65%) for Java default API libraries

0% to 9.5% (average: 5%) for open source libraries

Inference: Java default API libraries offer more alternative ways of writing code than open source libraries

RQ2: False Positives and False Negatives

What percentage of false positives is eliminated (with low or no increase in false negatives)?

Applied mined patterns (“P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j”) in three modes:

Existing mode: “P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j” is applied as P1, P2, ..., Pi

Balanced mode: “P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j” is applied as “P1 or P2 or ... or Pi”

Imbalanced mode: “P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j” is applied as “P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j”
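One plausible reading of the three modes is sketched below. It assumes that Existing mode treats every frequent alternative as a separate pattern, so a call site must satisfy all of them, while the other two modes treat the alternatives as a disjunction. The slide does not spell this out, and the class `DetectionModes`, the string labels, and the representation of a call site as a set of guarding checks are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: how the same mined pattern flags call sites in each mode.
class DetectionModes {
    enum Mode { EXISTING, BALANCED, IMBALANCED }

    // guards: checks found around a call site.
    // frequent: P1..Pi; infrequent: A^1..A^j.
    static boolean isViolation(Set<String> guards, List<String> frequent,
                               List<String> infrequent, Mode mode) {
        switch (mode) {
            case EXISTING:
                // Each frequent alternative is its own pattern: all must hold.
                return !guards.containsAll(frequent);
            case BALANCED:
                // "P1 or ... or Pi": at least one frequent alternative must hold.
                return Collections.disjoint(guards, frequent);
            default:
                // IMBALANCED: any frequent or infrequent alternative suffices.
                return Collections.disjoint(guards, frequent)
                        && Collections.disjoint(guards, infrequent);
        }
    }

    public static void main(String[] args) {
        List<String> frequent = Arrays.asList("hasNext-check");
        List<String> infrequent = Arrays.asList("size-check");
        Set<String> site = new HashSet<>(Arrays.asList("size-check"));
        for (Mode mode : Mode.values()) {
            System.out.println(mode + ": " + isViolation(site, frequent, infrequent, mode));
        }
    }
}
```

Under this reading, a call site guarded only by the size() check is flagged in Existing and Balanced modes and accepted in Imbalanced mode, which is the kind of false positive the infrequent alternatives are meant to remove.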

Existing Mode vs Balanced Mode

Application       Existing Mode             Balanced Mode
                  Defects  False Positives  Defects  False Positives  % of reduction  False Negatives
Java Util         37       104              37       104              0               0
Java Transaction  51       105              51       105              0               0
Java SQL          56       143              56       90               37.06           0
BCEL              2        14               2        8                42.86           0
HSqlDB            1        0                1        0                0               0
Hibernate         10       9                10       8                11.11           0
AVERAGE/TOTAL                                                         15.17           0

Balanced mode reduced false positives by 15.17% without any increase in false negatives

Existing Mode vs Imbalanced Mode

Application       Existing Mode             Imbalanced Mode
                  Defects  False Positives  Defects  False Positives  % of reduction  False Negatives
Java Util         37       104              36       74               28.85           1
Java Transaction  51       105              47       76               27.62           4
Java SQL          56       143              53       81               43.36           3
BCEL              2        14               2        6                57.14           0
HSqlDB            1        0                1        0                0               0
Hibernate         10       9                10       8                11.11           0
AVERAGE/TOTAL                                                         28.01           8

Imbalanced mode reduced false positives by about 28% with a small increase in false negatives (8 in total)
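The reduction figures can be recomputed from the false-positive counts in the two tables. The sketch below assumes that the AVERAGE row is the unweighted mean of the six per-library percentages and that a library with no false positives in Existing mode (HSqlDB) counts as 0% reduction. Both assumptions reproduce the reported averages, though the slides do not state them. The class `ReductionStats` is a hypothetical name.

```java
// Hypothetical helper that recomputes the "% of reduction" column and its average.
class ReductionStats {
    // Percentage of false positives removed relative to Existing mode;
    // defined as 0 when Existing mode had no false positives.
    static double reduction(int existingFp, int modeFp) {
        if (existingFp == 0) return 0.0;
        return 100.0 * (existingFp - modeFp) / existingFp;
    }

    // Unweighted mean of the per-library reductions.
    static double averageReduction(int[] existingFp, int[] modeFp) {
        double sum = 0.0;
        for (int i = 0; i < existingFp.length; i++) {
            sum += reduction(existingFp[i], modeFp[i]);
        }
        return sum / existingFp.length;
    }

    public static void main(String[] args) {
        // Order: Java Util, Java Transaction, Java SQL, BCEL, HSqlDB, Hibernate.
        int[] existing = {104, 105, 143, 14, 0, 9};
        int[] balanced = {104, 105, 90, 8, 0, 8};
        int[] imbalanced = {74, 76, 81, 6, 0, 8};
        System.out.printf("balanced: %.2f%n", averageReduction(existing, balanced));
        System.out.printf("imbalanced: %.2f%n", averageReduction(existing, imbalanced));
    }
}
```

For BCEL in Imbalanced mode the formula gives (14 − 6) / 14, or about 57.14%, and with that value the six Imbalanced-mode percentages average to about 28.01%.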

Conclusion

Problem-driven methodology for advancing the mining of software engineering data by identifying new problems, patterns, mining algorithms, and defects

Alattin mines alternative patterns classified into three categories: balanced, imbalanced, and single

Alattin can be used to enhance various existing mining approaches to reduce false positives

Future work: Exploit synergy between static and dynamic analysis to further reduce false positives

Thank You
