Semantics-Aware Trace Analysis [PLDI 2009]

Post on 29-Nov-2014

DESCRIPTION

We present a novel dynamic program analysis that builds a semantic view of program executions. These views reflect program abstractions and aspects; however, views are not simply projections of execution traces, but are linked to each other to capture semantic interactions among abstractions at different levels of granularity in a scalable manner. We describe our approach in the context of Java and demonstrate its utility to improve regression analysis. We first formalize a subset of Java and a grammar for traces generated at program execution. We then introduce several types of views used to analyze regression bugs along with a novel, scalable technique for semantic differencing of traces from different versions of the same program. Benchmark results on large open-source Java programs demonstrate that semantic-aware trace differencing can identify precise and useful details about the underlying cause for a regression, even in programs that use reflection, multithreading, or dynamic code generation, features that typically confound other analysis techniques.

TRANSCRIPT

Kevin Hoffman, Patrick Eugster, Suresh Jagannathan

Roadmap

Motivation

Prior Approaches

Semantics-Aware Trace Analysis (SATA)

Applying SATA to Regression Analysis

Evaluation

Conclusions

Motivation

Apache XalanJ 2.4.1 works:

java … xslt.Process -xsl case1.xsl -in test.xml

java … xslt.Process -xsl case2.xsl -in test.xml

java … xslt.Process -xsl case3.xsl -in test.xml

Upgrade to 2.5.1, now it's broken!

java … xslt.Process -xsl case1.xsl -in test.xml

java … xslt.Process -xsl case2.xsl -in test.xml

java … xslt.Process -xsl case3.xsl -in test.xml

How to find the cause?

Manual inspection is hard

12 months of development from 2.4.1 to 2.5.1

79K new or changed lines of code

97 new features and bugfixes

How to find the cause?

Debugging is hard

Separation of cause and effect

○ e.g. in XalanJ, bug in XSLT compiler

Complex web of interacting components

Debugging requires in-depth domain-specific knowledge (limited resource)

Challenges: Static Analysis

Dynamically generated code

Advanced language features

Dynamic dispatch (e.g., Polymorphism)

Reflection

Advanced aspect-oriented language features

Challenges: Dynamic Analysis

Dynamic program slicing

Slices are still quite large (e.g. 1000s of events)

Control-flow similarity metrics

State-space exploration / refinement

Execution Indexing

Use structure/state of execution to compute an 'index' at each execution point

Find correlations between indices for profiling, debugging, execution comparison

Semantic Trace Views

Execution Trace (and Thread View)

--> LOG-1.addMsg('Handling..')
    ...
<-- LOG-1.addMsg(..)
--> SP-1.setRequestType('text/html')
    --> STR-1.equals('text/html')
    <-- STR-1.equals(..) ret=true
    --> NUM-1.new(32, 127)
        set NUM-1._minCharRange = 32
        set NUM-1._maxCharRange = 127
    <-- NUM-1.new(..)
    set SP-1._binConv = NUM-1
    ...
    --> LOG-1.addMsg('Set req..')
        ...
    <-- LOG-1.addMsg(..)
<-- SP-1.setRequestType(..)

Organize execution traces into “views”

Thread views based on thread ID

Method views based on top of call stack

Method View for SP.setRequestType

--> STR-1.equals('text/html')
<-- STR-1.equals(..) ret=true
--> NUM-1.new(32, 127)
<-- NUM-1.new(..)
set SP-1._binConv = NUM-1
...
--> LOG-1.addMsg('Set req..')
<-- LOG-1.addMsg(..)

Method View for LOG.addMsg

...
...

Method View for NUM.new

set NUM-1._minCharRange = 32
set NUM-1._maxCharRange = 127

Active object views based on top of call stack

Active Object View for LOG-1.addMsg

...
...

Active Object View for NUM-1.new

set NUM-1._minCharRange = 32
set NUM-1._maxCharRange = 127

Target object views

Target Object View for LOG-1

--> LOG-1.addMsg('Handling..')
<-- LOG-1.addMsg(..)
--> LOG-1.addMsg('Set req..')
<-- LOG-1.addMsg(..)

Target Object View for NUM-1

--> NUM-1.new(32, 127)
set NUM-1._minCharRange = 32
set NUM-1._maxCharRange = 127
<-- NUM-1.new(..)

Views are linked allowing for multilevel analysis (the slide links the Execution Trace with the Target Object Views for LOG-1 and NUM-1 and the Method View for SP.setRequestType)
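The view definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not RPrism's implementation: the `Event` record, the `build_views` function, the single thread name `T1`, and the `<class>-<n>` object id convention are all invented for the example. Views hold indices into the one shared trace, which is one simple way to keep them linked.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str           # "call" (-->), "return" (<--), or "set" (field write)
    target: str         # object id as written in the trace, e.g. "SP-1"
    member: str         # method or field name
    thread: str = "T1"  # assumption: the example trace has a single thread


def class_of(object_id):
    """'SP-1' -> 'SP' (assumes ids look like '<class>-<n>')."""
    return object_id.rsplit("-", 1)[0]


def build_views(trace):
    """Group trace indices into thread, method, active-object and target-object views."""
    views = {name: defaultdict(list) for name in ("thread", "method", "active", "target")}
    stacks = defaultdict(list)  # per-thread call stack of (object id, method)
    for index, event in enumerate(trace):
        stack = stacks[event.thread]
        if event.kind == "return":
            stack.pop()  # a return is observed by the caller (assumes a well-formed trace)
        views["thread"][event.thread].append(index)
        views["target"][event.target].append(index)
        if stack:  # attribute the event to whatever is on top of the call stack
            obj, method = stack[-1]
            views["method"][f"{class_of(obj)}.{method}"].append(index)
            views["active"][obj].append(index)
        if event.kind == "call":
            stack.append((event.target, event.member))
    return views


# The slide's trace with the elided "..." entries left out.
TRACE = [
    Event("call", "LOG-1", "addMsg"),
    Event("return", "LOG-1", "addMsg"),
    Event("call", "SP-1", "setRequestType"),
    Event("call", "STR-1", "equals"),
    Event("return", "STR-1", "equals"),
    Event("call", "NUM-1", "new"),
    Event("set", "NUM-1", "_minCharRange"),
    Event("set", "NUM-1", "_maxCharRange"),
    Event("return", "NUM-1", "new"),
    Event("set", "SP-1", "_binConv"),
    Event("call", "LOG-1", "addMsg"),
    Event("return", "LOG-1", "addMsg"),
    Event("return", "SP-1", "setRequestType"),
]

views = build_views(TRACE)
num_view = [TRACE[i].member for i in views["target"]["NUM-1"]]
```

Because every view stores positions in the same trace, an entry found in the target object view for NUM-1 can be looked up directly in the thread view or in the enclosing method view.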

What if we just used diff?

Collect dynamic traces:

2.4.1: sssdyntracer … xslt.Process -xsl case2.xsl -in test.xml

2.5.1: sssdyntracer … xslt.Process -xsl case2.xsl -in test.xml

Traces are about 48K entries

Run “diff” tool on traces:

Requires 25 minutes on a 1.8 GHz x64 CPU

Requires 27 GB of RAM

Produces 1594 differences (3.3% of trace)

Challenges of diff / LCS

diff based on LCS algorithm:

Intractable on large traces: Ω(n²)

Can't detect moved sequences

Is not semantic-aware

diff produces too many differences
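To make the quadratic cost concrete, here is a minimal sketch of the textbook dynamic-programming LCS in Python. It is not the diff tool measured on the slides; the function name `lcs_length` and the cell counter are assumptions added for illustration. The two letter sequences are the Old and New main-view sequences from the differencing example in this deck.

```python
def lcs_length(old, new):
    """Classic dynamic-programming LCS; returns (LCS length, table cells filled)."""
    rows, cols = len(old), len(new)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    cells = 0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            cells += 1  # one cell per (old entry, new entry) pair
            if old[i - 1] == new[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[rows][cols], cells


OLD = "C B D H X Y F E F Z".split()
NEW = "C A D F E F X Y Z".split()

length, cells = lcs_length(OLD, NEW)

# The same table for two traces of about 48K entries each.
cells_for_48k = 48_000 * 48_000
```

For the ten- and nine-entry sequences the table has 90 cells and the LCS has length 6 (C D F E F Z), so the moved run X Y is left out and shows up once as removed and once as added. For two 48K-entry traces the table grows to roughly 2.3 billion cells.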

Leveraging Semantic Views

Use secondary views (method/object) to find correlations in primary view (thread)

Robust against reorderings in other views

Correlations are semantically sound

Apply LCS/diff over fixed-sized windows in primary view to find 'best overall correlation' in primary view

Recall: What LCS would produce (figure comparing the Old and New traces)

View-based Semantic Differencing

Main View

Old: C B D H X Y F E F Z

New: C A D F E F X Y Z

Secondary View (only one of many secondary views displayed here)

H X Y Z

D X Y Z

D

View construction

Lock-step scanning of main view

Discovery of correlating secondary views

Exploration of correlating secondary views

Lock-step scanning of main view

Lock-step scanning of main view; exploration of secondary views

Apply LCS over fixed-size window in main view to find the next correlation

Lock-step scanning of main view

Apply LCS over fixed-size window in main view to find the next correlation

Lock-step scanning of main view

View-based differencing identified moved sequences properly
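Two of the steps named in the walkthrough, lock-step scanning and LCS over a fixed-size window, can be sketched as follows. This is a deliberately reduced illustration and not the RPrism algorithm: it leaves out secondary views entirely, and the names `lcs_pairs` and `windowed_diff`, the window size of 4, and the fallback of skipping one entry on each side are assumptions made for the example.

```python
def lcs_pairs(a, b):
    """Index pairs (i, j) of one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # table[i][j] = LCS length of a[i:] and b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def windowed_diff(old, new, window=4):
    """Lock-step scan; on a mismatch, re-synchronise with LCS over a small window."""
    matched, i, j = [], 0, 0
    while i < len(old) and j < len(new):
        if old[i] == new[j]:  # lock-step scanning
            matched.append((i, j))
            i, j = i + 1, j + 1
            continue
        pairs = lcs_pairs(old[i:i + window], new[j:j + window])
        if pairs:  # jump to the next correlated pair inside the window
            i, j = i + pairs[0][0], j + pairs[0][1]
        else:      # nothing in common inside the window: skip one entry on each side
            i, j = i + 1, j + 1
    old_hit = {p for p, _ in matched}
    new_hit = {q for _, q in matched}
    removed = [e for k, e in enumerate(old) if k not in old_hit]
    added = [e for k, e in enumerate(new) if k not in new_hit]
    return matched, removed, added


OLD = "C B D H X Y F E F Z".split()
NEW = "C A D F E F X Y Z".split()
matched, removed, added = windowed_diff(OLD, NEW)
```

Each LCS call touches at most window × window cells, so the cost grows with trace length rather than with its square. On the example this reduced version still reports X Y as both removed and added; correlating such moved runs is the part that the secondary views contribute in the slides.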

View-Based Differencing vs. LCS

Collect dynamic traces:

2.4.1: sssdyntracer … xslt.Process -xsl case2.xsl -in test.xml

2.5.1: sssdyntracer … xslt.Process -xsl case2.xsl -in test.xml

Traces are about 48K entries

Run view-based differencing tool on traces:

Requires 0.3 minutes instead of 25 minutes

Requires 0.1 GB instead of 27 GB of RAM

Produces 598 differences (1.2% of trace)

○ vs 1594 differences (3.3% of trace) for LCS

Regression Analysis Process

Old Program and New Program, plus Tracing Aspects, go through the AspectJ Load-time Weaver

Result: Old Program w/ Instrumentation and New Program w/ Instrumentation

Trace the Regressing Test Case and the Working Test Case(s) (but similar) on both

Traces for 4 Cases

View Trace Differencing

Likely Regression Causes

RPrism Analysis Algorithm

Suspected differences set: Old Program, Regressing Test Case VS New Program, Regressing Test Case

Expected differences set: Old Program, Working Test Case VS New Program, Working Test Case

Regression differences set: New Program, Working Test Case VS New Program, Regressing Test Case

Suspected differences set, Expected differences set, and Regression differences set are combined into Results

4 Regressions on 3 Projects

Daikon

Dynamic invariant detector from MIT

Used as a test subject in 11 other publications

Apache XalanJ

Implements XML XPath and XSLT

Interprets XSLT or compiles XSLT to Java bytecode

Used in Sun JDK to implement javax.xml.* classes

Apache Derby (720 KLOC)

Embedded or client/server relational DB

AKA Sun Java DB, included in JDK 6

Daikon Regression

About Daikon

169 KLOC, 1100 classes

Dynamic invariant detector from MIT

Used as a test subject in 11 other publications

About the Regression

Regression first studied by JUnit/CIA [FSE '06]

○ 1 week of differences

Execution traces about 15K entries in length

Daikon Regression

42 differences before, 3 after analysis

Same accuracy as LCS

12.9x speedup

12.1 times less memory

XalanJ-1725 Regression

About XalanJ

365 KLOC, 1500 classes

Implements XPath and XSLT for XML

Used by Sun to implement javax.xml.* classes

About the Regression

Regression from version 2.5.1 to 2.5.2

○ 4 months of code changes, 84 major changes

Execution traces about 98K entries in length

Regressing behavior exhibited within dynamically generated code

XalanJ-1725 Regression

296 differences before, 1 after analysis

LCS failed to find the regression cause

82.8x speedup

269 times less memory

XalanJ-1802 Regression

About XalanJ

365 KLOC, 1500 classes

Implements XPath and XSLT for XML

Used by Sun to implement javax.xml.* classes

About the Regression

Regression from version 2.4.1 to 2.5.1

○ 79K lines of code changed over 12 months

○ 97 bugfixes and feature enhancements

Execution traces about 44K entries in length

Regressing behavior exhibited within a completely rearchitected module

XalanJ-1802 Regression

184 differences before, 10 after analysis

Same accuracy as LCS

9.4x speedup

35.4 times less memory

Derby-1633 Regression

About Derby

720K lines of code

Embedded or client/server relational DB

AKA Sun Java DB, included in JDK 6

About the Regression

Regression from version 10.1.2.1 to 10.1.3.1

○ 7 months of changes, 9 enhancements, 97 bugfixes

Execution traces about 335K entries in length

Involves multiple threads, larger code base (2x), and longer running traces (3x)

Derby-1633 Regression

2663 differences before, 6 after analysis

LCS completely failed (out of memory failure at 32 GB)

Summary / Future Directions

New view-based model for traces

Facilitates semantics-aware dynamic analyses

One application is efficient trace differencing

Full formal framework in paper

Other potential applications:

Race detection

Object-protocol enforcement

Data-mining from traces

Malware detection

Download RPrism, try it out!

http://cs.purdue.edu/homes/kjhoffma/rprism/

Contact Information:

Kevin Hoffman

kjhoffma@cs.purdue.edu

View-based Diff vs LCS

Regression Cause Analysis

Factors affecting false negatives:

Dynamic traces are complete, set A must contain cause

Differences in set B produced correct output, not likely to contain the direct regression cause

Intersecting with set C can introduce false negatives (e.g., regression caused by code removal)

Factors affecting false positives:

Choice of similar test case affects quality of set B

Intersecting/subtracting set C also helps

Set A is the suspected differences set

Set B is the expected differences set

Set C is the regression differences set
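One plausible reading of the factors above is that the reported causes are the suspected differences, minus the expected ones, restricted to those also seen in the regression differences set. The slides do not spell out the exact formula, so the combination `(A - B) & C`, the function name, and the difference identifiers `d1` to `d6` below are all assumptions for illustration.

```python
def likely_regression_causes(suspected, expected, regression):
    """Assumed combination (A - B) & C: drop expected differences, keep those tied to the regressing run."""
    return (suspected - expected) & regression


# Hypothetical difference identifiers, invented for this example only.
A = {"d1", "d2", "d3", "d4"}  # old vs new program, regressing test case
B = {"d2", "d5"}              # old vs new program, working test case
C = {"d1", "d3", "d6"}        # new program, working vs regressing test case

causes = likely_regression_causes(A, B, C)
```

Here `d2` is dropped because it also occurs when the output is still correct, and `d4` is dropped because it does not separate the working run from the regressing run. The second filter is the step that can introduce false negatives, for example when the regression was caused by removed code.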

Lock-step Scanning of Main View

Lock-step Scanning of Main View

Exploration of Secondary Views with LCS

Apply LCS over Fixed-size Window in Main View to Find the Next Correlation

Exploration of Secondary Views with LCS

Apply LCS over Fixed-size Window in Main View to Find the Next Correlation

Lock-step Scanning of Main View
