Alattin: Mining Alternative Patterns for Detecting Neglected Conditions Suresh Thummalapenta and Tao Xie Department of Computer Science North Carolina State University Raleigh, USA ASE 2009 This work is supported in part by NSF grant CCF-0725190 and ARO grant W911NF-08-1-0443 and ARO grant W911NF-08-1-0105 managed by NCSU Secure Open Source Systems Initiative (SOSI)

Upload: bryan-parks

Post on 04-Jan-2016


TRANSCRIPT

Page 1: Alattin: Mining Alternative Patterns for Detecting Neglected Conditions Suresh Thummalapenta and Tao Xie Department of Computer Science North Carolina

Alattin: Mining Alternative Patterns for Detecting Neglected Conditions

Suresh Thummalapenta and Tao Xie

Department of Computer Science, North Carolina State University

Raleigh, USA

ASE 2009

This work is supported in part by NSF grant CCF-0725190 and ARO grant W911NF-08-1-0443 and ARO grant W911NF-08-1-0105 managed by NCSU Secure Open Source Systems Initiative (SOSI)

Page 2

Alattin: Motivation

Problem: Programming rules are often not well documented.

General solution: Mine common patterns across a large number of data points (e.g., code samples), then use the common patterns as programming rules to detect defects.

Page 3

Challenges addressed by Alattin

Limited data points: Existing approaches mine specifications from a few code bases and miss specifications due to a lack of sufficient data points.

Large number of false positives: Existing approaches produce a large number of false positives.

Page 4

Limited Data Points

[Figure: Existing approaches mine patterns from a few code repositories (e.g., Eclipse, Linux, …), which often lack sufficient relevant data points (e.g., API call sites). The Alattin approach uses a code search engine to search many code repositories (1, 2, …, N; open source code on the web) and mines patterns from the results.]

Page 5

Large Number of False Positives

Existing approaches produce a large number of false positives.

One major observation: Programmers often write code in different ways to achieve the same task, and some ways are more frequent than others.

[Figure: Patterns are mined from the frequent ways. When the mined patterns are used to detect violations, the infrequent ways are reported as violations, which are false positives.]

Page 6

Example: java.util.Iterator.next()

Code Sample 1

PrintEntries1(ArrayList<String> entries) {
    …
    Iterator it = entries.iterator();
    if (it.hasNext()) {
        String last = (String) it.next();
    }
    …
}

Code Sample 2

PrintEntries2(ArrayList<String> entries) {
    …
    if (entries.size() > 0) {
        Iterator it = entries.iterator();
        String last = (String) it.next();
    }
    …
}

java.util.Iterator.next() throws NoSuchElementException when invoked on an iterator over a list without any elements.
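
The two guards can be tried out in a small, self-contained program. This is only an illustrative sketch: the class name IteratorGuardDemo and its helper methods are hypothetical and do not come from the slides. Only the behaviour of java.util.Iterator.next() on an empty list is taken from the text above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical demo class; names are illustrative, not from the slides.
final class IteratorGuardDemo {

    // Guard in the style of Code Sample 1: check hasNext() before next().
    static String firstViaHasNext(List<String> entries) {
        Iterator<String> it = entries.iterator();
        if (it.hasNext()) {
            return it.next();
        }
        return null; // nothing to read
    }

    // Guard in the style of Code Sample 2: check size() before next().
    static String firstViaSize(List<String> entries) {
        if (entries.size() > 0) {
            Iterator<String> it = entries.iterator();
            return it.next();
        }
        return null; // nothing to read
    }

    // No guard at all: reports whether next() threw NoSuchElementException.
    static boolean unguardedNextThrows(List<String> entries) {
        try {
            entries.iterator().next();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> empty = new ArrayList<>();
        List<String> one = new ArrayList<>();
        one.add("a");
        System.out.println(firstViaHasNext(one));
        System.out.println(firstViaSize(one));
        System.out.println(unguardedNextThrows(empty));
    }
}
```

Both guards protect the call equally well, which is why a checker that only knows the hasNext() form would flag the size() form without cause.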

Page 7

Example: java.util.Iterator.next()

1243 code examples were collected: 1218 follow Code Sample 1 (1218/1243) and 6 follow Code Sample 2 (6/1243).

Mined pattern from existing approaches: “boolean check on return of Iterator.hasNext before Iterator.next”

Page 8

Example: java.util.Iterator.next()

Require more general patterns (alternative patterns): P1 or P2

P1: boolean check on return of Iterator.hasNext before Iterator.next

P2: boolean check on return of ArrayList.size before Iterator.next

Existing approaches cannot mine such patterns, since alternative P2 is infrequent.

Page 9

Our Solution: ImMiner Algorithm

Mines alternative patterns of the form P1 or P2.

Based on the observation that infrequent alternatives such as P2 are frequent among code examples that do not support P1.

1243 code examples: Sample 1 (1218/1243), Sample 2 (6/1243)

P2 is infrequent among the entire 1243 code examples.

P2 is frequent among code examples not supporting P1.
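
The observation can be checked with the numbers on this slide. The sketch below assumes that all 6 examples of Sample 2 are among the 1243 - 1218 = 25 examples that do not support P1. The class and method names are hypothetical.

```java
// Hypothetical helper for the support arithmetic; names are illustrative.
final class SupportArithmetic {

    // Support = fraction of examples that contain the pattern.
    static double support(int matching, int total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        return (double) matching / total;
    }

    public static void main(String[] args) {
        int all = 1243;          // code examples collected
        int supportingP1 = 1218; // examples following Code Sample 1
        int supportingP2 = 6;    // examples following Code Sample 2
        int notSupportingP1 = all - supportingP1; // 25

        // Support of P2 over all examples (6 out of 1243).
        System.out.println(support(supportingP2, all));
        // Support of P2 over the examples without P1 (assumption: all 6 fall here).
        System.out.println(support(supportingP2, notSupportingP1));
    }
}
```

Under that assumption the support of P2 rises from under 0.5% to 24% once the examples supporting P1 are set aside. Whether 24% counts as frequent depends on the minimum support chosen for that subset, which this slide does not state.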

Page 10

Alternative Patterns

ImMiner mines three kinds of alternative patterns of the general form “P1 or P2”:

Balanced: all alternatives (both P1 and P2) are frequent.

Imbalanced: some alternatives (P1) are frequent and others (P2) are infrequent. Represented as “P1 or P^2”.

Single: only one alternative.

Page 11

ImMiner Algorithm

Uses frequent-itemset mining [Burdick et al. ICDE 01] iteratively.

An input database with the following APIs for Iterator.next():

[Tables: input database; mapping of IDs to APIs]

Page 12

ImMiner Algorithm: Frequent Alternatives

Frequent itemset mining (min_sup 0.5) over the input database yields the frequent item 1.

P1: boolean check on the return of Iterator.hasNext() before Iterator.next()

Page 13

ImMiner: Infrequent Alternatives of P1

Split the input database into two databases: a positive database (PSD), holding the examples that support P1, and a negative database (NSD), holding the examples that do not.

Mine patterns that are frequent in NSD and infrequent in PSD. Reason: Only such patterns serve as alternatives for P1.

Alternative pattern P2: “const check on the return of ArrayList.size() before Iterator.next()”

Alattin applies the ImMiner algorithm to detect neglected conditions.
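
The split-and-mine step can be sketched as follows. This is a simplified illustration, not the authors' implementation: it considers single items only rather than full itemsets, it reuses one minimum support for every step, and the database contents and item IDs (1 for the hasNext() check, 2 for the size() check, 3 for some condition common to both styles) are hypothetical stand-ins for the tables on the slides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Simplified, hypothetical sketch: single items only, made-up database.
final class ImMinerSketch {

    // Fraction of transactions containing the item; 0 for an empty database.
    static double support(List<Set<Integer>> db, int item) {
        if (db.isEmpty()) return 0.0;
        long hits = db.stream().filter(t -> t.contains(item)).count();
        return (double) hits / db.size();
    }

    // Items whose support in db is at least minSup.
    static SortedSet<Integer> frequentItems(List<Set<Integer>> db, double minSup) {
        SortedSet<Integer> all = new TreeSet<>();
        for (Set<Integer> t : db) all.addAll(t);
        SortedSet<Integer> result = new TreeSet<>();
        for (int item : all) {
            if (support(db, item) >= minSup) result.add(item);
        }
        return result;
    }

    // Items that are frequent in NSD and infrequent in PSD, where PSD holds
    // the transactions containing `frequent` and NSD holds the rest.
    static SortedSet<Integer> infrequentAlternatives(
            List<Set<Integer>> db, int frequent, double minSup) {
        List<Set<Integer>> psd = new ArrayList<>();
        List<Set<Integer>> nsd = new ArrayList<>();
        for (Set<Integer> t : db) {
            (t.contains(frequent) ? psd : nsd).add(t);
        }
        SortedSet<Integer> result = new TreeSet<>();
        for (int item : frequentItems(nsd, minSup)) {
            if (support(psd, item) < minSup) result.add(item);
        }
        return result;
    }

    // Hypothetical database of 11 call sites.
    static List<Set<Integer>> sampleDatabase() {
        List<Set<Integer>> db = new ArrayList<>();
        for (int i = 0; i < 8; i++) db.add(new HashSet<>(Arrays.asList(1, 3)));
        db.add(new HashSet<>(Arrays.asList(1)));
        db.add(new HashSet<>(Arrays.asList(2, 3)));
        db.add(new HashSet<>(Arrays.asList(2, 3)));
        return db;
    }

    public static void main(String[] args) {
        List<Set<Integer>> db = sampleDatabase();
        System.out.println(frequentItems(db, 0.5));
        System.out.println(infrequentAlternatives(db, 1, 0.5));
    }
}
```

In this made-up database item 3 is frequent in NSD but also frequent in PSD, so it is not reported as an alternative of item 1. Only item 2 passes both conditions, which illustrates why the slide asks for patterns that are frequent in NSD and infrequent in PSD.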

Page 14

Neglected Conditions

Neglected conditions refer to:

Missing conditions that check the arguments or receiver of the API call before the API call

Missing conditions that check the return or receiver of the API call after the API call

Neglected conditions are one of the primary reasons for many fatal issues, such as security or buffer-overflow vulnerabilities [Chang et al. ISSTA 07].
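
The two kinds of conditions can be illustrated with standard Java APIs. The example is not taken from the slides: String.charAt and Map.get are chosen here only because their documented behaviour (an exception for an out-of-range index, a null return for a missing key) makes the conditions easy to see, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; the APIs are standard, the scenario is made up.
final class NeglectedConditionDemo {

    // Condition on the argument BEFORE the call:
    // String.charAt throws if the index is out of range.
    static char charAtOrDefault(String s, int index, char fallback) {
        if (index >= 0 && index < s.length()) {
            return s.charAt(index);
        }
        return fallback;
    }

    // Condition on the return value AFTER the call:
    // Map.get returns null when the key has no mapping.
    static int lengthOfValue(Map<String, String> map, String key) {
        String value = map.get(key);
        if (value != null) {
            return value.length();
        }
        return 0;
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("k", "value");
        System.out.println(charAtOrDefault("abc", 5, '?'));
        System.out.println(lengthOfValue(map, "missing"));
    }
}
```

Dropping either if statement leaves code that still compiles but fails at run time on some inputs, which is the situation the slide calls a neglected condition.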

Page 15

Alattin Approach

Phase 1: Issue queries and collect relevant code samples, e.g., “lang:java java.util.Iterator next”

Phase 2: Generate pattern candidates

Phase 3: Mine alternative patterns

Phase 4: Detect neglected conditions statically

[Figure: The classes and methods reused by the application under analysis are extracted. Relevant code samples are collected from open source projects on the web (1, 2, …, N), pattern candidates are generated from them, and alternative patterns are mined. The alternative patterns are then used to detect neglected conditions in the application under analysis, which are reported as violations.]

Page 16

Evaluation

Research questions:

Do alternative patterns exist in real applications?

What percentage of false positives is reduced (with little or no increase in false negatives) in detected violations?

Page 17

Subjects

Two categories of subjects: 3 Java default API libraries and 3 popular open source libraries

Column “Samples”: number of code examples collected from Google code search

Page 18

RQ1: Balanced and Imbalanced Patterns

What percentage of balanced and imbalanced patterns exists in real applications?

Balanced patterns: 0% to 30% (average: 9.69%)

Imbalanced patterns: 30% to 100% (average: 65%) for Java default API libraries; 0% to 9.5% (average: 5%) for open source libraries

Inference: Java default API libraries provide more different ways of writing code than open source libraries do.

Page 19

RQ2: False Positives and False Negatives

What percentage of false positives is reduced (with little or no increase in false negatives)?

Applied mined patterns (“P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j”) in three modes:

Existing mode: “P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j” is applied as P1, P2, ..., Pi

Balanced mode: “P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j” is applied as “P1 or P2 or ... or Pi”

Imbalanced mode: “P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j” is applied as “P1 or P2 or ... or Pi or A^1 or A^2 or ... or A^j”
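
One way to read the three modes is sketched below. The interpretation of the existing mode is an assumption: each of P1, ..., Pi is treated as a separate rule, so a call site is flagged as soon as it misses any of them. The class name, the Mode enum, and labels such as "P1" and "A1" are hypothetical placeholders.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the three modes; labels are placeholders.
final class PatternModes {

    enum Mode { EXISTING, BALANCED, IMBALANCED }

    // Returns true when a call site is reported as a violation.
    //   satisfied:  conditions found around the call site
    //   frequent:   frequent alternatives P1..Pi
    //   infrequent: infrequent alternatives A^1..A^j
    static boolean isViolation(Set<String> satisfied, List<String> frequent,
                               List<String> infrequent, Mode mode) {
        switch (mode) {
            case EXISTING:
                // Assumption: each Pk is a separate rule, so all must hold.
                return !satisfied.containsAll(frequent);
            case BALANCED:
                // "P1 or ... or Pi": any frequent alternative is enough.
                return Collections.disjoint(satisfied, frequent);
            case IMBALANCED:
                // Full pattern: any frequent or infrequent alternative is enough.
                return Collections.disjoint(satisfied, frequent)
                        && Collections.disjoint(satisfied, infrequent);
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        List<String> frequent = Arrays.asList("P1", "P2");
        List<String> infrequent = Arrays.asList("A1");
        Set<String> site = new HashSet<>(Arrays.asList("A1"));
        for (Mode mode : Mode.values()) {
            System.out.println(mode + ": " + isViolation(site, frequent, infrequent, mode));
        }
    }
}
```

Under this reading, a call site that satisfies only an infrequent alternative is flagged in the existing and balanced modes but accepted in the imbalanced mode, which is where the additional drop in false positives would come from.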

Page 20

RQ2: False Positives and False Negatives

Existing Mode vs Balanced Mode

Application        Existing Mode              Balanced Mode
                   Defects  False Positives   Defects  False Positives  % of reduction  False Negatives
Java Util          37       104               37       104              0               0
Java Transaction   51       105               51       105              0               0
Java SQL           56       143               56       90               37.06           0
BCEL               2        14                2        8                42.86           0
HSqlDB             1        0                 1        0                0               0
Hibernate          10       9                 10       8                11.11           0
AVERAGE/TOTAL                                                           15.17           0

Balanced mode reduced false positives by 15.17% without any increase in false negatives.
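
The “% of reduction” column and its average can be recomputed from the false-positive counts of the existing and balanced modes. The sketch below is only a check of that arithmetic. The class and method names are hypothetical, and counting a subject with zero false positives as a 0% reduction is an assumption chosen to match the HSqlDB row.

```java
// Hypothetical helper that recomputes the "% of reduction" column.
final class ReductionArithmetic {

    // Percentage of false positives removed when going from before to after.
    // Assumption: a subject with no false positives to begin with counts as 0.
    static double percentReduction(int before, int after) {
        if (before == 0) return 0.0;
        return 100.0 * (before - after) / before;
    }

    // Mean of the per-subject reductions (not the reduction of pooled counts).
    static double averageReduction(int[] before, int[] after) {
        double sum = 0.0;
        for (int i = 0; i < before.length; i++) {
            sum += percentReduction(before[i], after[i]);
        }
        return sum / before.length;
    }

    static double roundToTwoDecimals(double x) {
        return Math.round(x * 100.0) / 100.0;
    }

    public static void main(String[] args) {
        // False positives per subject, in table order.
        int[] existing = {104, 105, 143, 14, 0, 9};
        int[] balanced = {104, 105, 90, 8, 0, 8};
        for (int i = 0; i < existing.length; i++) {
            System.out.println(roundToTwoDecimals(percentReduction(existing[i], balanced[i])));
        }
        System.out.println(roundToTwoDecimals(averageReduction(existing, balanced)));
    }
}
```

The 15.17 figure is the mean of the six per-subject percentages. Pooling the counts instead (375 false positives down to 315) would give a 16% reduction, so the two ways of summarising should not be mixed.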

Page 21

RQ2: False Positives and False Negatives

Existing Mode vs Imbalanced Mode

Application        Existing Mode              Imbalanced Mode
                   Defects  False Positives   Defects  False Positives  % of reduction  False Negatives
Java Util          37       104               36       74               28.85           1
Java Transaction   51       105               47       76               27.62           4
Java SQL           56       143               53       81               43.36           3
BCEL               2        14                2        6                57.14           0
HSqlDB             1        0                 1        0                0               0
Hibernate          10       9                 10       8                11.11           0
AVERAGE/TOTAL                                                           28.01           8

Imbalanced mode reduced false positives by 28% with a small increase in false negatives (8 in total).

Page 22

Conclusion

Problem-driven methodology for advancing mining software engineering data by identifying new problems, patterns, mining algorithms, and defects

Alattin mines alternative patterns classified into three categories: balanced, imbalanced, and single

Alattin can be used to enhance various existing mining approaches to reduce false positives

Future work: Exploit synergy between static and dynamic analysis to further reduce false positives

Page 23

Thank You