Building a Better Backtrace: Techniques for Postmortem Program Analysis

Ben Liblit & Alex Aiken

A Few Grim Realities

• Programs fail post-deployment
  – Ship with known bugs
  – Users discover new bugs

• Users are lousy testers
  – Never do the same thing twice
  – Wild variation in execution environment
  – Poor bug reporting, if any

• Users’ bugs are the ones that really matter

Program Analysis for Pessimists

• Assume & prepare for postmortem analysis
  – Compile-time analysis, stashed away for later
  – Lightweight (deployable) instrumentation

• Analyze failed program instances
  – Mix of automated / interactive tools
  – Not quite static analysis, not quite dynamic

• Help humans find and fix bugs that matter

This Talk: Reconstructing Execution Chronologies

• Control flow decision history captures important properties

• Fundamental questions
  – “How in the world did it get here?”
  – “What happened just before this point?”
  – “How can I make this happen again?”

• Broader interest than just crashes

Striking a Compromise

• Heavyweight approaches
  – Replay debugging
  – Program tracing

• Lightweight approaches
  – Examine stack trace in debugger
  – printf() debugging

• Middleweight (our) approach
  – “How might we have gotten here, given …?”

The Big Idea: “Gotten Here” is Control Flow Reachability

• Interested in paths
  – “How”, not just “yes/no”

• Transitive paths within one function

• Multiple functions?
  – Matched call/return paths
  – This is a form of context-free language reachability

[Figure: flow graph with “?” path queries and call/return edge pairs labeled ( ) and [ ]]

Global Control Flow Graph

[Figure: control flow graph with entry and exit nodes joined by call and return edges]

Variations in Matching Grammar

• Complete execution
  – All calls & returns must be matched

{()(){()}[{}(())]}

Variations in Matching Grammar

• Aborted execution
  – Some calls without returns
  – We use a variant of this

{()(){()}[{}(())]}

CFL Reachability Algorithm

• Similar to transitive graph search
  – Use a work list to incrementally extend frontier
  – Forward from α or backward from ω
  – Transitively adding flow edges is one case

• Several additional cases for calls/returns

• Complexity
  – O(N³) for arbitrary grammar and graph
  – O(E) for our analyses (and many others)

Reconstruction With Crash Site Only

• Work backward from crash site

• Remember why each path is extended
  – Record justifications in route map
  – route(x, z) = { r_1, …, r_n }
  – r_i = cross from x to y, then see route(y, z)
    • x and y must be “adjacent”: one of four cases

• route(α, ω) defines possible chronologies
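
The route map can be illustrated with a minimal sketch that handles only one of the four adjacency cases, the plain flow edge. Everything here is an assumption made for illustration: the function name `build_route_map`, the edge-list representation, and the choice to store each justification as just the adjacent node y. Calls and returns are deliberately left out.

```python
from collections import defaultdict, deque

def build_route_map(edges, crash):
    """Backward work-list search over flow edges only (hypothetical sketch).

    route[x] holds every y such that x -> y is an edge and y can reach
    the crash site, i.e. "cross from x to y, then see route(y, crash)".
    """
    preds = defaultdict(list)
    for x, y in edges:
        preds[y].append(x)

    route = defaultdict(set)
    seen = {crash}
    work = deque([crash])          # FIFO work list, extended incrementally
    while work:
        y = work.popleft()
        for x in preds[y]:
            route[x].add(y)        # remember why x is on a path to the crash
            if x not in seen:
                seen.add(x)
                work.append(x)
    return dict(route)

# Toy graph: c and d never reach the crash site "w", so they get no entry.
edges = [("a", "b"), ("a", "c"), ("b", "b"), ("b", "w"), ("c", "d")]
route = build_route_map(edges, "w")
```

Following `route` forward from the program entry then enumerates candidate chronologies; each edge is visited once, which is the intuition behind linear cost for this flow-edge case.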


• One case, unmatched call, determines stack
  – Unmatched parens: {()(){()}[{}(())]}
  – Stack trace: {[(

• But we probably have a specific stack trace in mind…
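
The link between unmatched calls and the stack can be sketched with ordinary bracket matching. The function name `unmatched_calls` is hypothetical, and the truncated input used below is an assumption: it is the prefix of the slide's example string that leaves exactly the calls {[( open, since the transcript does not show where the aborted run stops.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}   # each return and the call it must match

def unmatched_calls(trace):
    """Return the calls still open at the end of a call/return string."""
    stack = []
    for ch in trace:
        if ch in PAIRS.values():          # a call: push it
            stack.append(ch)
        elif ch in PAIRS:                 # a return: must match the latest call
            if not stack or stack[-1] != PAIRS[ch]:
                raise ValueError(f"return {ch!r} does not match any open call")
            stack.pop()
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
    return "".join(stack)

complete = unmatched_calls("{()(){()}[{}(())]}")   # complete execution
aborted = unmatched_calls("{()(){()}[{}(()")       # assumed aborted prefix
```

A complete execution leaves nothing open, while the aborted prefix leaves the open calls that form the stack trace.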

Reconstruction With Crash Site + Stack Trace

• S ::= vector of call edges

• Build |S| + 1 clones of global flow graph

• Two types of call edge
  – (_i must match )_i
    • Stays on same layer
  – c_i must be unmatched
    • Only way to next layer
    • Determined by S

[Figure: layered clones of the global flow graph, joined by unmatched call edges c6, c3, c14]


• Possible histories
  – Start at α on top layer
  – End at ω on bottom layer
  – route(α, 0, ω, |S|)

• Backward, not forward
  – More deterministic

• Complexity
  – O(E) work, |S| + 1 times
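
The layered construction can be sketched as a search over (node, layer) pairs. This is a simplified illustration, not the full algorithm: the name `layered_backward_reach` and the data layout are assumptions, and matched call/return pairs are treated as ordinary same-layer edges without enforcing the matching.

```python
from collections import deque

def layered_backward_reach(flow_edges, stack_calls, crash):
    """Return every (node, layer) that can reach (crash, len(stack_calls)).

    flow_edges:  same-layer edges (x, y), cloned onto each layer
    stack_calls: the stack trace S as (call_site, callee_entry) edges,
                 outermost first; S[i] is the only way from layer i to i + 1
    """
    depth = len(stack_calls)
    preds = {}
    for layer in range(depth + 1):                 # |S| + 1 clones
        for x, y in flow_edges:
            preds.setdefault((y, layer), []).append((x, layer))
    for i, (site, entry) in enumerate(stack_calls):
        preds.setdefault((entry, i + 1), []).append((site, i))

    goal = (crash, depth)                          # crash site on bottom layer
    seen = {goal}
    work = deque([goal])
    while work:                                    # backward from the crash
        state = work.popleft()
        for p in preds.get(state, []):
            if p not in seen:
                seen.add(p)
                work.append(p)
    return seen

# Toy program: main runs m0 -> m1, calls f at m1; f runs f0 -> f1 and crashes.
flow = [("m0", "m1"), ("f0", "f1")]
reach = layered_backward_reach(flow, [("m1", "f0")], "f1")
```

In the toy run, `m0` is reachable only on the top layer and `f0` only on the bottom layer, so the stack trace has pruned every history that does not pass through the recorded call.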

Reconstruction With Crash Site + Event Trace

• V ::= vector of trace nodes

• Use |V| + 1 layered clones, as before

• Must report event when crossing trace node
  – On each layer, knock out all trace nodes but one
    • On bottommost layer, no trace nodes at all!
  – Further restricts set of possible paths

• Complexity: O(E|V|)
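
One way to picture the event-trace layering is to let the layer index count how many events have been reported so far. The sketch below makes several assumptions: the name `reachable_with_events` is hypothetical, only plain flow edges are modeled, an event is reported on entering a trace node, and the start node α is not itself a trace node.

```python
from collections import deque

def reachable_with_events(edges, trace_nodes, events, start, crash):
    """Is there a path start -> crash whose trace-node visits equal `events`?

    State (node, k) means "at node, having reported the first k events".
    The search runs backward from (crash, len(events)) to (start, 0).
    """
    preds = {}
    for x, y in edges:
        preds.setdefault(y, []).append(x)

    goal = (crash, len(events))
    seen = {goal}
    work = deque([goal])
    while work:
        node, k = work.popleft()
        if node in trace_nodes:
            # Entering this trace node must have reported event k - 1;
            # any other trace node is knocked out on this layer.
            if k == 0 or events[k - 1] != node:
                continue
            k_before = k - 1
        else:
            k_before = k
        for x in preds.get(node, []):
            state = (x, k_before)
            if state not in seen:
                seen.add(state)
                work.append(state)
    return (start, 0) in seen

# Toy graph: a -> t1 -> t2 -> w, plus a shortcut a -> t2.
edges = [("a", "t1"), ("t1", "t2"), ("a", "t2"), ("t2", "w")]
tn = {"t1", "t2"}
```

With this graph, the traces `["t1", "t2"]` and `["t2"]` are both consistent with a crash at `w`, while `["t1"]`, `["t2", "t1"]`, and the empty trace are not. Each of the |V| + 1 layers touches every edge at most once, in line with the O(E|V|) bound on the slide.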

Reconstruction With …

• Stack trace + event trace

• Multiple event traces

• Ambiguous traces

• Incomplete event trace
  – Recent-branch registers

• Program counter sampling

• Finite state machine of your choosing…

Practical Considerations

• Dynamic dispatch / function pointers
  – Usual static techniques (points-to, receiver-class, etc.)
  – Event tracing can help
  – Note: stack trace is never dynamic

• Interactivity
  – Backward analysis is best: most bugs are close to crash
  – FIFO work list, demand-driven search
  – Deterministic versus non-deterministic state machines

Areas For Future Exploration

• Sparsity of trace information
  – Identify state-preserving regions
  – Explore such regions only once

• Summarization / visualization
  – Basis: dominator tree walk-back
  – Opportunity for novel algorithms here

Areas For Future Exploration

• Adaptive Gap Reduction
  – Programmer inquiries guide future annotation
    • “Which way did this branch really go?”
    • “How many times did this loop really execute?”
  – Identification of key inflection points
  – Insert lightweight event tracing nodes

• Related work in efficient path profiling
  – More evidence for future reconstructions

Summary and Conclusions

• Program analysis in an imperfect world
  – Post-crash: unique challenges / leverage points

• CFL path recovery as basis for analysis
  – Efficient, demand-driven, adaptable

• Future work
  – Adaptive annotation to fill in gaps
  – Leveraging multiple runs
  – Data value modeling
