Semantics-Aware Trace Analysis [PLDI 2009]
Post on 29-Nov-2014
Kevin Hoffman, Patrick Eugster, Suresh Jagannathan
Roadmap
Motivation
Prior Approaches
Semantics-Aware Trace Analysis (SATA)
Applying SATA to Regression Analysis
Evaluation
Conclusions
Motivation
Apache XalanJ 2.4.1 works:
java … xslt.Process -xsl case1.xsl -in test.xml
java … xslt.Process -xsl case2.xsl -in test.xml
java … xslt.Process -xsl case3.xsl -in test.xml
Upgrade to 2.5.1, now it's broken!
java … xslt.Process -xsl case1.xsl -in test.xml
java … xslt.Process -xsl case2.xsl -in test.xml
java … xslt.Process -xsl case3.xsl -in test.xml
How to find the cause?
Manual inspection is hard
12 months of development from 2.4.1 to 2.5.1
79K new or changed lines of code
97 new features and bugfixes
How to find the cause?
Debugging is hard
Separation of cause and effect
○ e.g. in XalanJ, bug in XSLT compiler
Complex web of interacting components
Debugging requires in-depth domain-specific knowledge (limited resource)
Prior Approaches
Challenges: Static Analysis
Dynamically generated code
Advanced language features
Dynamic dispatch (e.g., Polymorphism)
Reflection
Advanced aspect-oriented language features
Challenges: Dynamic Analysis
Dynamic program slicing
Slices are still quite large (e.g. 1000s of events)
Control-flow similarity metrics
State-space exploration / refinement
Execution Indexing
Use structure/state of execution to compute an 'index' at each execution point
Find correlations between indices for profiling, debugging, execution comparison
Semantics-Aware Trace Analysis (SATA)
Semantic Trace Views
Execution Trace
--> LOG-1.addMsg('Handling..')
...
<-- LOG-1.addMsg(..)
--> SP-1.setRequestType('text/html')
--> STR-1.equals('text/html')
<-- STR-1.equals(..) ret=true
--> NUM-1.new(32, 127)
set NUM-1._minCharRange = 32
set NUM-1._maxCharRange = 127
<-- NUM-1.new(..)
set SP-1._binConv = NUM-1
...
--> LOG-1.addMsg('Set req..')
...
<-- LOG-1.addMsg(..)
<-- SP-1.setRequestType(..)
Organize execution traces into “views”
Semantic Trace Views
Execution Trace (Thread View)
--> LOG-1.addMsg('Handling..')
...
<-- LOG-1.addMsg(..)
--> SP-1.setRequestType('text/html')
--> STR-1.equals('text/html')
<-- STR-1.equals(..) ret=true
--> NUM-1.new(32, 127)
set NUM-1._minCharRange = 32
set NUM-1._maxCharRange = 127
<-- NUM-1.new(..)
set SP-1._binConv = NUM-1
...
--> LOG-1.addMsg('Set req..')
...
<-- LOG-1.addMsg(..)
<-- SP-1.setRequestType(..)
Thread views based on thread ID
Semantic Trace Views
Execution Trace (and Thread View)
Method View for SP.setRequestType
--> STR-1.equals('text/html')
<-- STR-1.equals(..) ret=true
--> NUM-1.new(32, 127)
<-- NUM-1.new(..)
set SP-1._binConv = NUM-1
...
--> LOG-1.addMsg('Set req..')
<-- LOG-1.addMsg(..)
--> LOG-1.addMsg('Handling..')
...
<-- LOG-1.addMsg(..)
--> SP-1.setRequestType('text/html')
--> STR-1.equals('text/html')
<-- STR-1.equals(..) ret=true
--> NUM-1.new(32, 127)
set NUM-1._minCharRange = 32
set NUM-1._maxCharRange = 127
<-- NUM-1.new(..)
set SP-1._binConv = NUM-1
...
--> LOG-1.addMsg('Set req..')
...
<-- LOG-1.addMsg(..)
<-- SP-1.setRequestType(..)
Method views based on top of call stack
Semantic Trace Views
Method views based on top of call stack
Execution Trace (and Thread View)
Method View for LOG.addMsg
...
...
--> LOG-1.addMsg('Handling..')
...
<-- LOG-1.addMsg(..)
--> SP-1.setRequestType('text/html')
--> STR-1.equals('text/html')
<-- STR-1.equals(..) ret=true
--> NUM-1.new(32, 127)
set NUM-1._minCharRange = 32
set NUM-1._maxCharRange = 127
<-- NUM-1.new(..)
set SP-1._binConv = NUM-1
...
--> LOG-1.addMsg('Set req..')
...
<-- LOG-1.addMsg(..)
<-- SP-1.setRequestType(..)
Method View for NUM.new
set NUM-1._minCharRange = 32
set NUM-1._maxCharRange = 127
Semantic Trace Views
Active object views based on top of call stack
Execution Trace (and Thread View)
Active Object View for LOG-1.addMsg
...
...
--> LOG-1.addMsg('Handling..')
...
<-- LOG-1.addMsg(..)
--> SP-1.setRequestType('text/html')
--> STR-1.equals('text/html')
<-- STR-1.equals(..) ret=true
--> NUM-1.new(32, 127)
set NUM-1._minCharRange = 32
set NUM-1._maxCharRange = 127
<-- NUM-1.new(..)
set SP-1._binConv = NUM-1
...
--> LOG-1.addMsg('Set req..')
...
<-- LOG-1.addMsg(..)
<-- SP-1.setRequestType(..)
Active Object View for NUM-1.new
set NUM-1._minCharRange = 32
set NUM-1._maxCharRange = 127
Semantic Trace Views
Target object views
Execution Trace (and Thread View)
Target Object View for LOG-1
--> LOG-1.addMsg('Handling..')
<-- LOG-1.addMsg(..)
--> LOG-1.addMsg('Set req..')
<-- LOG-1.addMsg(..)
--> LOG-1.addMsg('Handling..')
...
<-- LOG-1.addMsg(..)
--> SP-1.setRequestType('text/html')
--> STR-1.equals('text/html')
<-- STR-1.equals(..) ret=true
--> NUM-1.new(32, 127)
set NUM-1._minCharRange = 32
set NUM-1._maxCharRange = 127
<-- NUM-1.new(..)
set SP-1._binConv = NUM-1
...
--> LOG-1.addMsg('Set req..')
...
<-- LOG-1.addMsg(..)
<-- SP-1.setRequestType(..)
Target Object View for NUM-1
--> NUM-1.new(32, 127)
set NUM-1._minCharRange = 32
set NUM-1._maxCharRange = 127
<-- NUM-1.new(..)
Semantic Trace Views
Views are linked allowing for multilevel analysis
Execution Trace (and Thread View)
Target Object View for LOG-1
--> LOG-1.addMsg('Handling..')
<-- LOG-1.addMsg(..)
--> LOG-1.addMsg('Set req..')
<-- LOG-1.addMsg(..)
--> LOG-1.addMsg('Handling..')
...
<-- LOG-1.addMsg(..)
--> SP-1.setRequestType('text/html')
--> STR-1.equals('text/html')
<-- STR-1.equals(..) ret=true
--> NUM-1.new(32, 127)
set NUM-1._minCharRange = 32
set NUM-1._maxCharRange = 127
<-- NUM-1.new(..)
set SP-1._binConv = NUM-1
...
--> LOG-1.addMsg('Set req..')
...
<-- LOG-1.addMsg(..)
<-- SP-1.setRequestType(..)
Target Object View for NUM-1
--> NUM-1.new(32, 127)
set NUM-1._minCharRange = 32
set NUM-1._maxCharRange = 127
<-- NUM-1.new(..)
Method View for SP.setRequestType
--> STR-1.equals('text/html')
<-- STR-1.equals(..) ret=true
--> NUM-1.new(32, 127)
<-- NUM-1.new(..)
set SP-1._binConv = NUM-1
...
--> LOG-1.addMsg('Set req..')
<-- LOG-1.addMsg(..)
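The view construction shown on these slides can be sketched in a few lines of Python. This is a minimal illustration, not RPrism's implementation: the `Entry` record, the `build_views` function, the thread ID `T1`, and the `<top>` key for events outside any traced method are all assumptions made for the example, and the entries elided as `...` on the slides are left out. Call and return events are attributed to the caller's method view, which is how the slide's method view for SP.setRequestType reads.

```python
from collections import defaultdict
from typing import NamedTuple, Optional

class Entry(NamedTuple):
    thread: str            # hypothetical thread ID
    kind: str              # "call", "return", or "set"
    target: str            # object the event acts on, e.g. "NUM-1"
    method: Optional[str]  # method name for call/return events, else None
    text: str              # the trace line as shown on the slides

TRACE = [
    Entry("T1", "call", "LOG-1", "LOG.addMsg", "--> LOG-1.addMsg('Handling..')"),
    Entry("T1", "return", "LOG-1", "LOG.addMsg", "<-- LOG-1.addMsg(..)"),
    Entry("T1", "call", "SP-1", "SP.setRequestType", "--> SP-1.setRequestType('text/html')"),
    Entry("T1", "call", "STR-1", "STR.equals", "--> STR-1.equals('text/html')"),
    Entry("T1", "return", "STR-1", "STR.equals", "<-- STR-1.equals(..) ret=true"),
    Entry("T1", "call", "NUM-1", "NUM.new", "--> NUM-1.new(32, 127)"),
    Entry("T1", "set", "NUM-1", None, "set NUM-1._minCharRange = 32"),
    Entry("T1", "set", "NUM-1", None, "set NUM-1._maxCharRange = 127"),
    Entry("T1", "return", "NUM-1", "NUM.new", "<-- NUM-1.new(..)"),
    Entry("T1", "set", "SP-1", None, "set SP-1._binConv = NUM-1"),
    Entry("T1", "call", "LOG-1", "LOG.addMsg", "--> LOG-1.addMsg('Set req..')"),
    Entry("T1", "return", "LOG-1", "LOG.addMsg", "<-- LOG-1.addMsg(..)"),
    Entry("T1", "return", "SP-1", "SP.setRequestType", "<-- SP-1.setRequestType(..)"),
]

def build_views(trace):
    """Group one flat trace into thread, method, and target object views."""
    thread_views = defaultdict(list)
    method_views = defaultdict(list)
    target_views = defaultdict(list)
    stacks = defaultdict(list)  # one call stack of method names per thread
    for e in trace:
        thread_views[e.thread].append(e.text)
        target_views[e.target].append(e.text)
        stack = stacks[e.thread]
        if e.kind == "return":
            stack.pop()  # a return is recorded in the caller's view
        top = stack[-1] if stack else "<top>"
        method_views[top].append(e.text)
        if e.kind == "call":
            stack.append(e.method)  # a call is recorded before the callee starts
    return thread_views, method_views, target_views

threads, methods, targets = build_views(TRACE)
for line in methods["SP.setRequestType"]:
    print(line)
```

Because every view stores the same entries, an entry found in one view can be looked up in the others, which is the linking the last slide of this group refers to.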
Applying SATA to Regression Analysis
What if we just used diff?
Collect dynamic traces:
2.4.1: sssdyntracer … xslt.Process -xsl case2.xsl -in test.xml
2.5.1: sssdyntracer … xslt.Process -xsl case2.xsl -in test.xml
Traces are about 48K entries
Run “diff” tool on traces:
Requires 25 minutes on a 1.8 GHz x64 CPU
Requires 27 GB of RAM
Produces 1594 differences (3.3% of trace)
Challenges of diff / LCS
[Figure: Old and New traces aligned by LCS]
diff based on LCS algorithm:
Intractable on large traces: Ω(n²)
Can't detect moved sequences
Is not semantic-aware
diff produces too many differences
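To make these points concrete, here is a textbook dynamic-programming LCS diff in Python, applied to the two short sequences used on the differencing slides further on. It is a generic sketch rather than the code of any particular diff tool; the function names are made up for the example, and treating the first sequence as Old and the second as New is an assumption. The table has (n+1) x (m+1) cells, which is where the quadratic time and memory come from.

```python
def lcs_table(a, b):
    """table[i][j] is the LCS length of a[i:] and b[j:]."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table

def lcs_diff(a, b):
    """Return (common, removed, added) by walking the LCS table."""
    table = lcs_table(a, b)
    common, removed, added = [], [], []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            removed.append(a[i])
            i += 1
        else:
            added.append(b[j])
            j += 1
    removed.extend(a[i:])
    added.extend(b[j:])
    return common, removed, added

OLD = "C B D H X Y F E F Z".split()
NEW = "C A D F E F X Y Z".split()

common, removed, added = lcs_diff(OLD, NEW)
print("common :", " ".join(common))   # C D F E F Z
print("removed:", " ".join(removed))  # B H X Y
print("added  :", " ".join(added))    # A X Y
```

The pair X Y only moved, yet it shows up once as removed and once as added. That is the "can't detect moved sequences" problem, and it inflates the number of reported differences.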
Leveraging Semantic Views
Use secondary views (method/object) to find correlations in primary view (thread)
Robust against reorderings in other views
Correlations are semantically sound
Apply LCS/diff over fixed-size windows in primary view to find 'best overall correlation' in primary view
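The last point, LCS restricted to a fixed-size window, can be sketched on its own. The code below is an assumption-laden simplification, not the algorithm from the paper: it scans both sequences in lock-step and, on a mismatch, runs LCS only over the next `window` entries of each side to resynchronize. The names `lcs_pairs` and `windowed_match`, the window size, and the rule of skipping one entry on each side when the window has no match are all choices made for the illustration. Each LCS call touches at most `window` x `window` cells, so the cost per resynchronization depends on the window size rather than on the trace length.

```python
def lcs_pairs(a, b):
    """Index pairs (i, j) of one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs

def windowed_match(old, new, window):
    """Lock-step scan with window-bounded LCS on mismatches."""
    matches = []
    i = j = 0
    while i < len(old) and j < len(new):
        if old[i] == new[j]:  # lock-step scanning while entries agree
            matches.append((i, j))
            i += 1
            j += 1
            continue
        # Mismatch: look for the next correlation inside a bounded window.
        pairs = lcs_pairs(old[i:i + window], new[j:j + window])
        if not pairs:  # nothing to resynchronize on; skip one entry per side
            i += 1
            j += 1
            continue
        matches.extend((i + di, j + dj) for di, dj in pairs)
        last_di, last_dj = pairs[-1]
        i += last_di + 1
        j += last_dj + 1
    return matches

OLD = "C B D H X Y F E F Z".split()
NEW = "C A D F E F X Y Z".split()

for i, j in windowed_match(OLD, NEW, window=4):
    print(f"old[{i}] = new[{j}] = {OLD[i]}")
```

A window alone does not recognize moved sequences. In the approach described on this slide, that part comes from the correlations found in the secondary views.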
Recall: What LCS would produce
[Figure: LCS alignment of the Old and New traces]
View-based Semantic Differencing
Main View
Old: C B D H X Y F E F Z
New: C A D F E F X Y Z
Secondary View (only one of many secondary views displayed here; layout lost in extraction): H X Y Z / D X Y Z / D
[Animated figure: the slides step through the two views in the following order]
View construction
Lock-step scanning of main view
Discovery of correlating secondary views
Exploration of correlating secondary views
Lock-step scanning of main view; exploration of secondary views
Apply LCS over fixed-size window in main view to find the next correlation
Lock-step scanning of main view
Apply LCS over fixed-size window in main view to find the next correlation
Lock-step scanning of main view
View-based differencing identified moved sequences properly
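The intuition behind the last statement can be shown with a small Python sketch. It is not the RPrism algorithm: instead of linked views with lock-step scanning, it simply splits each sequence into views and differences every view on its own. The assignment of entries to `view-1` and `view-2` is hypothetical (the slides do not say which entries share a view), treating the first sequence as Old is an assumption, and `difflib.SequenceMatcher` from the standard library stands in for the per-view differencer.

```python
from collections import defaultdict
from difflib import SequenceMatcher

OLD = "C B D H X Y F E F Z".split()
NEW = "C A D F E F X Y Z".split()

# Hypothetical assignment: X, Y, Z act on one object, everything else on another.
VIEW_OF = {
    "X": "view-2", "Y": "view-2", "Z": "view-2",
    "A": "view-1", "B": "view-1", "C": "view-1", "D": "view-1",
    "E": "view-1", "F": "view-1", "H": "view-1",
}

def split_into_views(trace, view_of):
    """Project the main view onto its secondary views, keeping entry order."""
    views = defaultdict(list)
    for entry in trace:
        views[view_of[entry]].append(entry)
    return views

def diff_view(old, new):
    """Entries removed from and added to a single view."""
    removed, added = [], []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend(old[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])
    return removed, added

def view_based_diff(old, new, view_of):
    """Difference each secondary view independently of the others."""
    old_views = split_into_views(old, view_of)
    new_views = split_into_views(new, view_of)
    return {
        key: diff_view(old_views.get(key, []), new_views.get(key, []))
        for key in sorted(set(old_views) | set(new_views))
    }

for key, (removed, added) in view_based_diff(OLD, NEW, VIEW_OF).items():
    print(key, "removed:", removed, "added:", added)
```

Under this assignment, `view-2` contains X Y Z in both versions, so the moved pair X Y is no longer reported. Only B, H (removed) and A (added) remain as differences.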
View-Based Differencing vs. LCS
Collect dynamic traces:
2.4.1: sssdyntracer … xslt.Process -xsl case2.xsl -in test.xml
2.5.1: sssdyntracer … xslt.Process -xsl case2.xsl -in test.xml
Traces are about 48K entries
Run view-based differencing tool on traces:
Requires 0.3 minutes instead of 25 minutes
Requires 0.1 GB instead of 27 GB of RAM
Produces 598 differences (1.2% of trace)
○ vs 1594 differences (3.3% of trace) for LCS
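The percentages and ratios on this slide can be checked with a few lines of Python. The slide only says "about 48K entries", so the exact count of 48,000 used here is an assumption, and the ratios come out as round figures for the same reason.

```python
TRACE_ENTRIES = 48_000  # assumption: the slide says "about 48K entries"

def percent_of_trace(differences, entries=TRACE_ENTRIES):
    """Share of trace entries reported as differences, in percent."""
    return round(100 * differences / entries, 1)

lcs_percent = percent_of_trace(1594)   # differences reported by LCS diff
view_percent = percent_of_trace(598)   # differences reported by view-based diff
time_ratio = round(25 / 0.3)           # minutes: LCS vs view-based
memory_ratio = round(27 / 0.1)         # GB of RAM: LCS vs view-based

print(lcs_percent, view_percent, time_ratio, memory_ratio)
```

With these inputs, the view-based tool reports roughly 3/8 as many differences (598 of 1594) while running about 83 times faster and using about 270 times less memory.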
Regression Analysis Process
[Figure: pipeline from the two program versions to the likely regression causes]
Old Program and New Program are instrumented by the AspectJ Load-time Weaver with Tracing Aspects
Old Program w/ Instrumentation and New Program w/ Instrumentation each run the Regressing Test Case and the Working Test Case(s) (but similar)
Traces for 4 Cases
View Trace Differencing
Likely Regression Causes
RPrism Analysis Algorithm
Suspected differences set: Old Program, Regressing Test Case VS New Program, Regressing Test Case
Expected differences set: Old Program, Working Test Case VS New Program, Working Test Case
Regression differences set: New Program, Working Test Case VS New Program, Regressing Test Case
RPrism Analysis Algorithm
[Figure: the suspected, expected, and regression differences sets are combined into the Results]
Evaluation
4 Regressions on 3 Projects
Daikon
Dynamic invariant detector from MIT
Used as a test subject in 11 other publications
Apache XalanJ
Implements XML XPath and XSLT
Interprets XSLT or compiles XSLT to Java bytecode
Used in Sun JDK to implement javax.xml.* classes
Apache Derby (720 KLOC)
Embedded or client/server relational DB
AKA Sun Java DB, included in JDK 6
Daikon Regression
About Daikon
169 KLOC, 1100 classes
Dynamic invariant detector from MIT
Used as a test subject in 11 other publications
About the Regression
Regression first studied by JUnit/CIA [FSE '06]
○ 1 week of differences
Execution traces about 15K entries in length
Daikon Regression
42 differences before, 3 after analysis
Same accuracy as LCS
12.9x speedup
12.1 times less memory
XalanJ-1725 Regression
About XalanJ
365 KLOC, 1500 classes
Implements XPath and XSLT for XML
Used by Sun to implement javax.xml.* classes
About the Regression
Regression from version 2.5.1 to 2.5.2
○ 4 months of code changes, 84 major changes
Execution traces about 98K entries in length
Regressing behavior exhibited within dynamically generated code
XalanJ-1725 Regression
296 differences before, 1 after analysis
LCS failed to find the regression cause
82.8x speedup
269 times less memory
XalanJ-1802 Regression
About XalanJ
365 KLOC, 1500 classes
Implements XPath and XSLT for XML
Used by Sun to implement javax.xml.* classes
About the Regression
Regression from version 2.4.1 to 2.5.1
○ 79K new or changed lines of code over 12 months
○ 97 bugfixes and feature enhancements
Execution traces about 44K entries in length
Regressing behavior exhibited within a completely rearchitected module
XalanJ-1802 Regression
184 differences before, 10 after analysis
Same accuracy as LCS
9.4x speedup
35.4 times less memory
Derby-1633 Regression
About Derby
720K lines of code
Embedded or client/server relational DB
AKA Sun Java DB, included in JDK 6
About the Regression
Regression from version 10.1.2.1 to 10.1.3.1
○ 7 months of changes, 9 enhancements, 97 bugfixes
Execution traces about 335K entries in length
Involves multiple threads, larger code base (2x), and longer running traces (3x)
Derby-1633 Regression
2663 differences before, 6 after analysis
LCS completely failed (out of memory failure at 32 GB)
Conclusions
Summary / Future Directions
New view-based model for traces
Facilitates semantics-aware dynamic analyses
One application is efficient trace differencing
Full formal framework in paper
Other potential applications:
Race detection
Object-protocol enforcement
Data-mining from traces
Malware detection
Download RPrism, try it out!
http://cs.purdue.edu/homes/kjhoffma/rprism/
Contact Information:
Kevin Hoffman
kjhoffma@cs.purdue.edu
View-based Diff vs LCS
Regression Cause Analysis
Factors affecting false negatives:
Dynamic traces are complete, so set A must contain the cause
Differences in set B produced correct output, so they are not likely to contain the direct regression cause
Intersecting with set C can introduce false negatives (e.g., regression caused by code removal)
Factors affecting false positives:
Choice of similar test case affects quality of set B
Intersecting/subtracting set C also helps
Set A is the suspected differences set
Set B is the expected differences set
Set C is the regression differences set
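One way to read these statements is as plain set operations. The Python sketch below assumes the combination (A − B) ∩ C: subtract the expected differences from the suspected ones, then keep only those that also appear in the regression differences set. The slides do not spell out the exact formula, so this combination, the function name, and the difference labels `d1` to `d9` are assumptions for illustration only.

```python
def likely_regression_causes(suspected, expected, regression):
    """Assumed combination (A - B) & C of the three difference sets."""
    # A - B: drop differences that also occur on the working test case,
    # since those produced correct output.
    candidates = suspected - expected
    # & C: keep only differences that also separate the working run from
    # the regressing run on the new program.
    return candidates & regression

# Hypothetical difference labels.
A = {"d1", "d2", "d3", "d4"}   # suspected differences set
B = {"d2"}                     # expected differences set
C = {"d1", "d3", "d9"}         # regression differences set

print(sorted(likely_regression_causes(A, B, C)))  # ['d1', 'd3']

# False-negative risk named on the slide: if the cause (say d4) is removed
# code, it may leave no entry in C and is then filtered out by the intersection.
print("d4" in likely_regression_causes(A, B, C))  # False
```

Subtracting B and intersecting with C both shrink the candidate set, which reduces false positives. Only the intersection with C can discard the real cause.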
Lock-step Scanning of Main View
Exploration of Secondary Views with LCS
Apply LCS over Fixed-size Window in Main View to Find the Next Correlation