50.530: Software Engineering

Sun Jun, SUTD

Date     Topic                        Remarks
Sep 15   Introduction
Sep 22   Automatic Testing
Sep 29   Delta Debugging
Oct 13   Bug Localization
Oct 20   Specification Mining
Nov 3    Race Detection
Nov 10   Hoare Logic and Proving
Nov 17   Symbolic Execution
Nov 24   Invariant Generation
Dec 1    Software Model Checking
Dec 12   Rely Guarantee Reasoning     Dec 15, 10 - 12
Dec 19   Final Exam

Course Outline

Debugging

Verification

Week 6: Specification Mining

Where is the bug?

Where the bug is depends on what the programmer wants at each step. How do we know what the programmer wants?

We “find out” what the programmer wants, borrowing ideas and techniques from machine learning.

USING LIKELY INVARIANTS FOR AUTOMATED SOFTWARE FAULT LOCALIZATION

Sahoo et al. ASPLOS 2013

The Idea

Delta Debugging can be inefficient and hard to scale because it compares a pair of concrete program states: there are too many differences, and they are too detailed.

Good Bad

The Idea

In fact, the details don’t matter. The fact that the graph is cyclic matters.

The Idea

1. Generate more passed test cases

Good Good Good

The Idea

2. Generate likely invariants

At L, x = 1 and y = -2

At L, x = 2 and y = 0

At L, x = 3 and y = 1

1 <= x <= 3 and -2 <= y <= 1

What forms of invariants do I use?

The Idea

3. Test the likely invariant with the failed test

1 <= x <= 3 and -2 <= y <= 1

Bad

At L, x = 50 and y = 0

L is a candidate root cause of the bug!
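Steps 2 and 3 can be sketched in a few lines. This is only an illustration using the values from the slides: the class name `RangeInvariant` and its methods are hypothetical, and the tool in the paper instruments real programs rather than taking lists of values.

```java
import java.util.List;

// Hypothetical helper: a closed interval [lo, hi] learned for one variable at location L.
class RangeInvariant {
    final int lo;
    final int hi;

    RangeInvariant(int lo, int hi) {
        this.lo = lo;
        this.hi = hi;
    }

    // Learn the tightest range that covers every value observed in the passing runs.
    static RangeInvariant learn(List<Integer> passingValues) {
        if (passingValues.isEmpty()) {
            throw new IllegalArgumentException("need at least one passing run");
        }
        int lo = Integer.MAX_VALUE;
        int hi = Integer.MIN_VALUE;
        for (int v : passingValues) {
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
        }
        return new RangeInvariant(lo, hi);
    }

    // True if the value observed in some run satisfies the likely invariant.
    boolean holds(int value) {
        return lo <= value && value <= hi;
    }

    public static void main(String[] args) {
        // Values of x and y at L in the three passing runs from the slides.
        RangeInvariant x = learn(List.of(1, 2, 3));
        RangeInvariant y = learn(List.of(-2, 0, 1));
        // Failing run: x = 50 and y = 0 at L.
        boolean violated = !x.holds(50) || !y.holds(0);
        System.out.println("L is a candidate root cause: " + violated);
    }
}
```

The invariant is only "likely": it reflects the passing runs that happened to be observed, so a violation marks L as a candidate, not as a proven cause.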

The Idea

4. Reduce the candidate root causes
• Dynamic program slicing: finding out which statements affect the candidate root cause
• Dynamic dependence filtering: given two root causes A and B, if B is affected by A and A comes earlier, A is more likely the real cause.

Overall Picture

Overall Picture

How to generate inputs?

What invariants to generate?

How to conclude one candidate root cause is more likely than the other?

From MySQL database server

Where is the bug? It fails when the date is 0000-Jan-01.

1. Generate Inputs

• The inputs should be “close” to the failure input, in the spirit of “nearest neighbor”.

• Systematically generate inputs based on the DDmin algorithm.

The initial good inputs + good inputs generated from DDmin

A queue of good inputs to generate more good inputs from.

A list of good inputs
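A much simplified sketch of the idea follows. It is not the paper's Algorithm 1: it only deletes one contiguous chunk of the failing input at a time, in the style of DDmin, and keeps the variants that pass, without the queue that feeds good inputs back in. The class name, the token representation and the `passes` oracle are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

class NearbyGoodInputs {
    // Returns the passing variants of the failing input obtained by deleting one
    // contiguous chunk at a time; chunks get smaller as n grows.
    static Set<List<String>> generate(List<String> failing, Predicate<List<String>> passes) {
        Set<List<String>> good = new LinkedHashSet<>();
        for (int n = 2; n <= failing.size(); n++) {
            int chunk = (failing.size() + n - 1) / n; // chunk length, rounded up
            for (int start = 0; start < failing.size(); start += chunk) {
                int end = Math.min(start + chunk, failing.size());
                // Variant = failing input without the tokens in [start, end).
                List<String> variant = new ArrayList<>(failing.subList(0, start));
                variant.addAll(failing.subList(end, failing.size()));
                if (passes.test(variant)) {
                    good.add(variant);
                }
            }
        }
        return good;
    }

    public static void main(String[] args) {
        // Toy oracle (an assumption): a run fails exactly when the token "0000" is present.
        Predicate<List<String>> passes = tokens -> !tokens.contains("0000");
        List<String> failing = List.of("date", "0000", "01", "01");
        System.out.println(generate(failing, passes));
    }
}
```

Because every variant differs from the failing input by a single deleted chunk, the good inputs found this way stay close to the failure, which is the property the slides ask for.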

Algorithm 1

Algorithm 1

Suppose the input is “SELECT DATE_FORMAT("0000-01-01", '%W %d %M %Y')”, as in the MySQL example. Does the algorithm work?

If a specification of the input format is given, we can generate better and meaningful inputs.

Algorithm 2

Generate new inputs based on type

Research Discussion

How do we guarantee to generate inputs which are close to the failure input?

Can we generate inputs at a program point closer to the failure?

2. Generate Invariants

• The invariant should rightly “guess” what the programmer wants somewhere in the program.
– Where do we generate invariants?
– What form should the invariants take?

Invariant: The returned value must be positive.

How should we know this?

2. Generate Invariants

• Where do we generate invariants?
– (in the paper) load, store and function return instructions.
• Load: array[i] * 5 + 2
• Store: array[i] = array[k] + 100;
• Return: return x + y;

How would you justify this? What is the consequence?

2. Generate Invariants

• What form should the invariants take?
– (in the paper) a range invariant, e.g., x in [1..5]

How would you justify this?

4. Reduce Candidate Causes

• Using dynamic program slicing: given a statement S, the backward slice of S contains all statements which S depends on.
– S is data dependent on a preceding statement if S refers to the data of that statement.
– S is control dependent on a preceding statement if the outcome of the latter determines whether S is executed or not.

Remove all those candidate causes which the initial failure statement does not depend on.

Dynamic Program Slicing

int[] previous = new int[5];

public int max (int[] list) {
    int max = list[0];
    for (int i = 1; i < list.length-1; i++) {
        if (max < list[i]) {
            max = list[i];
        }
    }
    previous[0] = max;
    return max;
}

[Figure: dynamic dependence graph for max, with nodes: public int max (int[] list); int max = list[0]; int i = 0; i < list.length-1; if (max < list[i]); max = list[i]; i++; previous[0] = max; i < list.length-1; return max]

So if the value of returned max caused a failure, “previous[0] = max” should not be a candidate cause.
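A backward slice over a recorded trace is a reachability computation on the dynamic dependences. The sketch below assumes the dependences were already recorded by some instrumentation, which is not shown; the class name and the trace positions in `main` are hypothetical and correspond to one run of max on a three-element array where the branch is taken once.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class BackwardSlice {
    // deps maps each executed statement (by trace position) to the earlier positions
    // it data- or control-depends on. Recording deps is assumed to happen elsewhere.
    static Set<Integer> slice(Map<Integer, List<Integer>> deps, int criterion) {
        Set<Integer> inSlice = new TreeSet<>();
        Deque<Integer> work = new ArrayDeque<>();
        work.push(criterion);
        while (!work.isEmpty()) {
            int s = work.pop();
            if (inSlice.add(s)) { // first visit: follow its dependences backwards
                for (int d : deps.getOrDefault(s, List.of())) {
                    work.push(d);
                }
            }
        }
        return inSlice;
    }

    public static void main(String[] args) {
        // Trace positions (assumed):
        // 0: int max = list[0]      1: int i = 1            2: i < list.length-1
        // 3: if (max < list[i])     4: max = list[i]        5: i++
        // 6: i < list.length-1      7: previous[0] = max    8: return max
        Map<Integer, List<Integer>> deps = Map.of(
                2, List.of(1),
                3, List.of(0, 1, 2),
                4, List.of(1, 3),
                5, List.of(1, 2),
                6, List.of(5),
                7, List.of(4),
                8, List.of(4));
        System.out.println(slice(deps, 8));
    }
}
```

In this trace, position 7 (previous[0] = max) is not reachable backwards from position 8 (return max), so it drops out of the candidate set, as the slide states.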

Exercise 1

int sum = 0;
int i = 0;

while (i < 1100) {
    sum += i;
    i++;
}

assert(sum >= 0);

Use program slicing on the assertion.

4. Reduce Candidate Causes

• Using dependency filtering: if a faulty statement that is the bug’s root cause triggers an invariant failure, then any statement using the faulty value computed by that statement might also trigger an invariant failure.

• If statement T (control/data-)depends on S, remove T.
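The filtering rule can be sketched as follows. This assumes a dependence map is available and treats dependence transitively, which is one possible reading of the rule; the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class DependenceFilter {
    // dependsOn maps a statement to the statements it directly data- or control-depends on.
    // A candidate T is dropped if it depends, directly or transitively, on another candidate.
    static Set<String> filter(Set<String> candidates, Map<String, Set<String>> dependsOn) {
        Set<String> kept = new LinkedHashSet<>();
        for (String t : candidates) {
            if (!dependsOnAnotherCandidate(t, candidates, dependsOn)) {
                kept.add(t);
            }
        }
        return kept;
    }

    static boolean dependsOnAnotherCandidate(String t, Set<String> candidates,
                                             Map<String, Set<String>> dependsOn) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(dependsOn.getOrDefault(t, Set.of()));
        while (!work.isEmpty()) {
            String s = work.pop();
            if (!seen.add(s)) {
                continue; // already explored
            }
            if (!s.equals(t) && candidates.contains(s)) {
                return true;
            }
            work.addAll(dependsOn.getOrDefault(s, Set.of()));
        }
        return false;
    }

    public static void main(String[] args) {
        // T depends on candidate S, so T is removed; U depends only on the non-candidate V.
        Set<String> candidates = Set.of("S", "T", "U");
        Map<String, Set<String>> dependsOn = Map.of("T", Set.of("S"), "U", Set.of("V"));
        System.out.println(filter(candidates, dependsOn));
    }
}
```

Note that two candidates which depend on each other, for example through a loop, would both be removed by this sketch; working on individual executed instances rather than source statements avoids that.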

Is this justified?

[Figure: two statements with invariant failures, connected by a dependency]

4. Reduce Candidate Causes

• If there are multiple failed test cases, with the same cause of failure, intersect the candidate cause set for each failed test case.

Is this justified?

Case Study

• Objects of analysis
– The Squid HTTP proxy server
– The MySQL database server
– The Apache HTTP web server
• Selected 8 real software bugs
– Have to be software versions which can be supported by the tool developed by the authors
– No concurrency bugs. Why?
– No missing code bugs. Why?

Case Study

Case Study: Effectiveness

Q1: Can the approach find the true root causes of bugs?
• For each bug, the correction patch in the bug report is used to identify the minimal set of statements which should be changed or deleted to remove the failure symptom.

Q2: How many false positives does it generate?

Is this justified?

Case Study: Effectiveness Results

Given a set of remaining causes, find out the statements the causes depend on.

Compared with Tarantula

The range of source code that has to be checked.

What are the limitations?

TZUYU: LEARNING STATEFUL TYPESTATES

Xiao et al. ASE 2013

Do you know why many languages are strongly typed?

“Language type systems probably find more bugs on a daily basis than any other approach.”

--- Engler et al. SOSP 01

Typestate

• Typestates define valid sequences of operations that can be performed upon an instance of a given type.
– method A must be invoked before method B is invoked, and method C may not be invoked in between
• Typestates associate state information with variables of that type. This state information is used to determine at compile-time which operations are valid to be invoked upon an instance of the type.
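A typestate can be written down as a small state machine over method names. The sketch below uses a deliberately simplified writer-like object; the states, the transitions and the choice of accepting state are assumptions for illustration, not a complete reading of any Java API.

```java
import java.util.List;

class WriterTypestate {
    enum State { OPEN, CLOSED, ERROR }

    // One transition of the (assumed) typestate automaton.
    static State step(State s, String method) {
        switch (s) {
            case OPEN:
                if (method.equals("write")) return State.OPEN;
                if (method.equals("close")) return State.CLOSED;
                return State.ERROR;
            case CLOSED:
                // Assumption: closing again is harmless; any other call is an error.
                return method.equals("close") ? State.CLOSED : State.ERROR;
            default:
                return State.ERROR; // ERROR is a sink state
        }
    }

    // A call sequence is valid if it ends in the accepting state CLOSED.
    // Since ERROR is a sink, this also means ERROR was never reached.
    static boolean isValid(List<String> calls) {
        State s = State.OPEN; // state right after construction
        for (String m : calls) {
            s = step(s, m);
        }
        return s == State.CLOSED;
    }

    public static void main(String[] args) {
        System.out.println(isValid(List.of("write", "write", "close")));
        System.out.println(isValid(List.of("close", "write")));
    }
}
```

A checker that tracks this state for each variable at compile time can reject the second sequence before the program ever runs.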

Example: FileWriter

java.io.FileWriter:

[Figure: partial typestate automaton with transitions FileWriter(File), write(String), close() and write(String), an accepting state and an error state]

Exercise: Look at the Java API documentation, try to complete this typestate.

Motivation

Programmers don’t document the typestate when they define a data-structure. So we learn it!

Running Example

Class: java.util.Stack<E>
Methods:
• empty(): test if this stack is empty
• peek(): look at the top element in the stack
• pop(): remove the top element in the stack
• push(E item): push an item onto the stack

At every point, the number of pop() calls must be no more than the number of push() calls.

Problem Definition

• Task: learn a model of Stack which tells what are good/bad sequences of method calls

• What models do we learn?

Deterministic Finite State Automata

Stateful Typestate

Learn DFA

• Assume that the typestate is in the form of a DFA.

• There are a number of algorithms designed to learn DFAs efficiently.
– Passive learning: use only existing test cases
– Active learning: generate new test cases on demand

The L* Learning Algorithm

Teacher knows the model which is a DFA.

Student asks two kinds of questions in order to learn.

Membership Query: Is <push, pop, push> valid?

Equivalence Query: Is this your DFA?

First Round: Membership Queries

Is the sequence <> good?

Yes.

Is the sequence <push> good?

Yes.

Is the sequence <pop> good?

No.

Is the sequence <pop,push> good?

No.

Is the sequence <pop,pop> good?

No.

First Round: Observation Table

The table is closed and consistent. I think I know now.

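Consistent: if row(tr) = row(tr’), then row(tr^<e>) = row(tr’^<e>) for all events e.
Closed: for all tr above the blue line and all events e, row(tr^<e>) equals the row of some trace above the blue line.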
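Both conditions can be checked mechanically once the membership answers are stored. The sketch below is an assumption-laden illustration, not the full L* algorithm: `upper` holds the traces above the line, `lower` holds their one-event extensions, and each row is the list of membership answers for the trace followed by each distinguishing suffix (here only the empty suffix).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

class ObservationTable {
    // Closed: every row below the line also appears as a row above the line.
    static boolean isClosed(Map<List<String>, List<Boolean>> upper,
                            Map<List<String>, List<Boolean>> lower) {
        return new HashSet<>(upper.values()).containsAll(lower.values());
    }

    // Consistent: traces with equal rows still have equal rows after any one event.
    static boolean isConsistent(Map<List<String>, List<Boolean>> upper,
                                Map<List<String>, List<Boolean>> lower,
                                List<String> alphabet) {
        for (List<String> t1 : upper.keySet()) {
            for (List<String> t2 : upper.keySet()) {
                if (!upper.get(t1).equals(upper.get(t2))) {
                    continue;
                }
                for (String e : alphabet) {
                    List<Boolean> r1 = row(extend(t1, e), upper, lower);
                    List<Boolean> r2 = row(extend(t2, e), upper, lower);
                    if (!r1.equals(r2)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    static List<String> extend(List<String> trace, String event) {
        List<String> result = new ArrayList<>(trace);
        result.add(event);
        return result;
    }

    static List<Boolean> row(List<String> trace,
                             Map<List<String>, List<Boolean>> upper,
                             Map<List<String>, List<Boolean>> lower) {
        List<Boolean> r = upper.containsKey(trace) ? upper.get(trace) : lower.get(trace);
        if (r == null) {
            throw new IllegalStateException("missing membership query for " + trace);
        }
        return r;
    }

    public static void main(String[] args) {
        // First-round table built from the five membership answers on the slides.
        Map<List<String>, List<Boolean>> upper = Map.of(
                List.<String>of(), List.of(true),
                List.of("pop"), List.of(false));
        Map<List<String>, List<Boolean>> lower = Map.of(
                List.of("push"), List.of(true),
                List.of("pop", "push"), List.of(false),
                List.of("pop", "pop"), List.of(false));
        System.out.println(isClosed(upper, lower));
        System.out.println(isConsistent(upper, lower, List.of("push", "pop")));
    }
}
```

When both checks succeed the learner builds a candidate automaton with one state per distinct row above the line, which for the first round gives the two-state Good/Bad candidate.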

First Round: Candidate

Bad

Good

First Round: Equivalence Query

No, <push, pop> is good

Previously, <push,pop> was represented by <pop>. That is obviously wrong.

Second Round: Observation Table

The table is closed and consistent. I think I know now again.

Second Round: Candidate

Bad

Good

Second Round: Equivalence Query

No, <push, push, pop, pop> is good

And it continues …

First guess:

Second guess:

Third guess:

……

I will never learn

What is wrong with L*

• It is designed to learn DFA, whereas programs are beyond DFA.

• L* requires a perfect teacher, which is infeasible.
– What if the methods have non-trivial parameters?

The Approach

• Learn stateful typestates where the predicates are conjunctions of linear inequalities.

• Learn from test cases.
– A test case is a failure if it causes an unhandled exception or an assertion failure.
• Learn using techniques from the machine learning community.

Is it justified?

TzuYu

Tester : I don’t know, let me test it out

Is <methodA,methodB> good or bad?

Possible answers: Yes. No. If x > 5, then yes; otherwise no.
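For the Stack running example, a tester can answer a membership query by simply executing the call sequence. The sketch below is an illustration only: the class name is hypothetical, the pushed value is arbitrary, and "bad" is taken to mean that the sequence raises an unhandled exception.

```java
import java.util.EmptyStackException;
import java.util.List;
import java.util.Stack;

class StackMembershipOracle {
    // Runs the call sequence on a fresh java.util.Stack.
    // The sequence is "good" if no call throws EmptyStackException.
    static boolean isGood(List<String> calls) {
        Stack<Integer> stack = new Stack<>();
        try {
            for (String call : calls) {
                switch (call) {
                    case "push":
                        stack.push(0); // the argument value is arbitrary here
                        break;
                    case "pop":
                        stack.pop();
                        break;
                    case "peek":
                        stack.peek();
                        break;
                    case "empty":
                        stack.empty();
                        break;
                    default:
                        throw new IllegalArgumentException("unknown method: " + call);
                }
            }
            return true;
        } catch (EmptyStackException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isGood(List.of("push", "pop")));
        System.out.println(isGood(List.of("pop")));
    }
}
```

Unlike the perfect teacher assumed by L*, such a tester only reports what it observed for the arguments it happened to try, so its answers may depend on the object's data, as in the "if x > 5" case.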

First Round: Membership Queries

Is the sequence <> good?

Yes.

Is the sequence <push> good?

Yes.

Is the sequence <pop> good?

No.

Is the sequence <pop,push> good?

No.

Is the sequence <pop,pop> good?

No.

First Round: Observation Table

The table is closed and consistent. I think I know now.

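Consistent: if row(tr) = row(tr’), then row(tr^<e>) = row(tr’^<e>) for all events e.
Closed: for all tr above the blue line and all events e, row(tr^<e>) equals the row of some trace above the blue line.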

First Round: Equivalence Query

Heh, <push,pop> seemed good and <pop> seemed bad. They can’t both be reaching state B, so there must be something different before invoking pop()!

Numerical Value Graph

How to distinguish these two objects?

What is a stack object

The stack object after <push>

The stack object after <>

Numerical Value Graph

• How to distinguish these two objects?

                   stack after <push>                             stack after <>
Level 0 features   [not null]                                     [not null]
Level 1 features   [not null, eleCount = 1, array is not null]    [not null, eleCount = 0, array is not null]

Level 1 features distinguishable!

SVM: Support Vector Machine

• For the stack objects: 2*eleCount>= 1
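With a single feature, the widest-margin separator for linearly separable data is the midpoint between the closest good and bad values. The sketch below computes only that special case and is not a general SVM; the class name and the sample values are assumptions. For eleCount values 1, 2, 3 (pop succeeded) and 0 (pop failed), it gives the threshold 0.5, which is the predicate 2*eleCount >= 1.

```java
import java.util.Arrays;

class OneDimensionalMargin {
    // Returns the threshold t such that "value >= t" separates good from bad with the
    // widest margin, assuming every good value is larger than every bad value.
    static double threshold(int[] good, int[] bad) {
        int minGood = Arrays.stream(good).min().getAsInt();
        int maxBad = Arrays.stream(bad).max().getAsInt();
        if (maxBad >= minGood) {
            throw new IllegalArgumentException("not separable by a single threshold");
        }
        // minGood and maxBad play the role of the support vectors.
        return (minGood + maxBad) / 2.0;
    }

    public static void main(String[] args) {
        int[] popSucceeded = {1, 2, 3}; // eleCount before a pop() that succeeded
        int[] popFailed = {0};          // eleCount before a pop() that threw
        System.out.println(threshold(popSucceeded, popFailed));
    }
}
```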

[Figure: X and O points plotted against feature 1 and feature 2, separated by a line, with the support vectors marked]

SVM: Support Vector Machine

[Figure: X and O points plotted against feature 1 and feature 2]

What do we do if the vectors are located like this?

First Round: Equivalence Query

I know now that whether eleCount >= 1 holds is important. Can you restart the learning from the beginning using three events: push, [eleCount >= 1]pop, and [!(eleCount >= 1)]pop?
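Splitting pop into two guarded events amounts to relabelling each call by whether the learned predicate holds just before the call. A minimal sketch, assuming that eleCount corresponds to `size()` on a `java.util.Stack` and using a hypothetical `label` helper:

```java
import java.util.Stack;

class GuardedEvents {
    // Maps a method call to its event label, given the object state just before the call.
    // Only pop is split; every other method keeps its plain name.
    static String label(String method, Stack<?> stack) {
        if (!method.equals("pop")) {
            return method;
        }
        // Assumption: eleCount is the number of elements, i.e. stack.size().
        return stack.size() >= 1 ? "[eleCount>=1]pop" : "[!(eleCount>=1)]pop";
    }

    public static void main(String[] args) {
        Stack<Integer> stack = new Stack<>();
        System.out.println(label("pop", stack)); // label on the empty stack
        stack.push(0);
        System.out.println(label("pop", stack)); // label after one push
    }
}
```

With the refined alphabet, <push,pop> and <pop> end in different events, so the learner no longer has to send them to the same state.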

Second Round: Observation table

Second Round: Equivalence Query

Tester : I don’t know, let me test it out

…………..Yes

TzuYu: Workflow

Experiment Results

Experiment Results

Research Discussion

Is the typestate learned guaranteed to be correct?

If no, how do we make it correct or more likely correct?

Research Discussion

[Figure: X and O points plotted against feature 1 and feature 2]

What if the vectors are located as in the figure above?

Research Discussion

As an expert programmer, how do you learn what the programmer wants?

int[] previous = new int[5];

public int max (int[] list) {
    int max = list[0];
    for (int i = 1; i < list.length-1; i++) {
        if (max < list[i]) {
            max = list[i];
        }
    }
    previous[0] = max;
    return max;
}

What does this program do and how do you know?

Research Discussion

if (card == NULL) {
    printk (KERN_ERR, "capidrv-%d: … %d!\n", card->contrnr, id);
}

How do you know there is a bug in the program?

Research Discussion

int mxser_write (struct tty_struct *tty, …) {
    struct mxser_struct *info = tty->driver_data;
    unsigned long flags;

    if (!tty || !info->xmit_buf) {
        return (0);
    }
}

There is a potential problem. Why?

Research Discussion

From what else can we learn what the programmer really wants?

The Overall View

[Figure: the behaviors we wanted vs. the behaviors we have]