Abstract domains for the static analysis
of programs manipulating complex data-structures
Xavier Rival
INRIA
August 26th, 2011
Xavier Rival (INRIA), Abstract domains for the static analysis of programs manipulating complex data-structures, August 26th, 2011
Introduction
Software verification
My research topic: software verification
Safety (absence of runtime errors and of undesired behaviors), functional properties...
Those are all undecidable properties
Abstract interpretation approach: sound, automatic but incomplete
Abstraction:
use of conservative approximations
[Figure: a set of concrete elements and its abstraction α]
Abstract transfer functions,
preserving conservative approximations
[Figure: a concrete function f and its abstract counterpart f♯]
Widening: terminating over-approximation of concrete join
Abstract domain: abstraction + transfer function + widening
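The three ingredients above (abstraction, transfer function, widening) can be sketched on the simplest numerical domain, intervals. This is only an illustration under stated assumptions: the names itv, itv_join, itv_widen, itv_add_const and analyze_counter_loop are hypothetical and do not come from Apron or Astrée, and LONG_MIN / LONG_MAX are used to stand for infinite bounds.

```c
#include <limits.h>

/* An interval [lo, hi]; LONG_MIN and LONG_MAX stand for -inf and +inf.
   All names here are illustrative, not the API of an existing library. */
typedef struct { long lo; long hi; } itv;

/* Abstract join: smallest interval containing both arguments. */
itv itv_join(itv a, itv b) {
    itv r;
    r.lo = a.lo < b.lo ? a.lo : b.lo;
    r.hi = a.hi > b.hi ? a.hi : b.hi;
    return r;
}

/* Standard interval widening: a bound of the new iterate b that escapes
   the previous iterate a is pushed to infinity, which forces termination. */
itv itv_widen(itv a, itv b) {
    itv r;
    r.lo = b.lo < a.lo ? LONG_MIN : a.lo;
    r.hi = b.hi > a.hi ? LONG_MAX : a.hi;
    return r;
}

/* Addition of a constant to a bound, keeping infinite bounds infinite
   and saturating instead of overflowing. */
static long sat_add(long v, long k) {
    if (v == LONG_MIN || v == LONG_MAX) return v;
    if (k > 0 && v > LONG_MAX - k) return LONG_MAX;
    if (k < 0 && v < LONG_MIN - k) return LONG_MIN;
    return v + k;
}

/* Abstract transfer function for the assignment "x = x + k". */
itv itv_add_const(itv a, long k) {
    itv r;
    r.lo = sat_add(a.lo, k);
    r.hi = sat_add(a.hi, k);
    return r;
}

/* Loop head invariant for "x = 0; while (...) x = x + 1;",
   computed by iteration with widening. */
itv analyze_counter_loop(void) {
    itv x = {0, 0};
    for (;;) {
        itv next = itv_widen(x, itv_join(x, itv_add_const(x, 1)));
        if (next.lo == x.lo && next.hi == x.hi) return x;
        x = next;
    }
}
```

In this sketch the iteration stabilizes after the upper bound has been widened to infinity, so the loop head invariant is sound but says nothing about an upper bound: an instance of "sound, automatic but incomplete".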
Verification of numerical properties
Numerical abstraction
Abstraction of P(X → Z) or of P(X → F)
Transfer functions: assignments, numerical conditions...
A great variety of abstractions available:
[Figure: three panels: domain of intervals, octagons, convex polyhedra]
Available as libraries of domains: Apron...
Astrée analyzer: verification of highly critical software (avionics)
◮ C applications of up to 1 million lines
◮ modular numerical abstract domain: reduced product of simpler abstractions, tailored to applications
Memory abstraction difficulties
Memory abstraction
Abstraction of P( Environment × Stack × Heap )
Operations to analyze: C statements, including pointer arithmetic, memory management...
Operations to verify: memory safety, structure preservation...
Difficulties:
1 Complex concrete semantics: memory states, pointers...
2 Huge variety of properties to abstract:
◮ linked structures: lists, trees...
◮ arrays
◮ structures involving relations (sorted lists, balanced trees) or sharing
◮ combination of numerical and memory properties
3 Expensive analysis algorithms to infer such abstractions
Main contributions
Long term goal
Set up a general purpose library of memory abstract domains
1 Choice of a concrete semantics
should not limit the scope of the analysis
2 Abstraction based on inductive structural properties
expresses a wide range of properties, incl. shape + numerical
3 Static analysis algorithms to infer complex properties
instantiations for low level programs and recursive procedures
prototype implementation
Collaboration with Bor-Yuh Evan Chang (U. of Colorado, USA)
Internships of Vincent Laviron, Suzanne Renard and Antoine Toubhans
A parametric abstraction
Outline
1 Introduction
2 A parametric abstraction
3 Static analysis algorithms
4 Instantiation to the analysis of low-level programs
5 Instantiation to interprocedural analysis
6 Perspectives and implementation
Overview of the abstraction
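Memory partitioning into regions
[Figure: concrete memory cells reachable from &t, ending with 0x0]
Graph abstraction:
values, addresses → nodes
cells → edges
[Figure: shape graph from &t with next and data points-to edges]
Region summarization:
[Figure: shape graph from &t with next and data points-to edges and a list summary edge]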
◮ abstraction parameterized by a set of inductive definitions
Defines a concretization relation
Concrete semantics
[Figure: stack holding variables x, y, z; heap divided into blocks, either allocated (with sizes n0, n1) or free]
A very concrete concrete semantics:
division of the heap into blocks, either allocated or free
memory cells have a numeric address and a numeric size
pointers are numeric values: base + offset
Contiguous regions
Shape graphs
Edges: denote memory regions
Nodes: denote values, i.e. addresses or cell contents
Points-to edges denote contiguous memory regions
Notation: α · f ↦ β
Abstract and concrete views:
[Figure: abstract edge from α to β labeled f; concrete cell at address ν(α) + offset(f) containing ν(β)]
Concretization:
γS(α · f ↦ β) = {([ν(α) + offset(f) ↦ ν(β)], ν) | ν : {α, β, . . .} → N}
◮ ν: bridge between memory and values
Separating conjunction
Shape graphs and separation
Distinct edges denote disjoint heap regions (∗ conjunction)
Advantage: allows local reasoning
e.g., easier analysis of updates
Disadvantage: maintaining separation makes some analysis operations harder to design
We use a field splitting model
i.e., separation impacts edges / fields, not pointers
e.g., α · f ↦ β0 ∗ α · g ↦ β1 stands for stores where the cells at ν(α) + offset(f) and ν(α) + offset(g) are distinct, including stores such that ν(β0) = ν(β1)
[Figure: the graph with edges f and g from α to β0 and β1, and two matching stores, one of them with ν(β0) = ν(β1)]
Inductive structures I: need for summarization
Infinitely many concrete states
A global structure invariant: x points to a list
Concrete: all lists pointed to by x
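[Figure: examples of concrete lists pointed to by &x, each terminated by 0x0]
Abstraction:
x ↦ α ∗ α · list
Graph:
[Figure: node &x with a points-to edge to α, and a list edge from α]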
Inductive structures II: inductive definitions in separation logic
List definition
α · list ::= (emp, α = 0)
           | (α · next ↦ β0 ∗ α · data ↦ β1 ∗ β0 · list, α ≠ 0)
where emp denotes the empty heap
List structures abstracted by α · list
Summarization: a finite formula describes infinitely many heaps
Practical implementation in verification/analysis tools
Verification: hand-written definitions
Analysis: either built-in (Smallfoot, Space Invader) or user-supplied (TVLA, Xisa, [POPL’08]), or partly inferred (Xisa, [POPL’11])
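To make the list definition concrete, the following sketch checks whether a concrete pointer satisfies α · list by following the two cases of the definition. It is an illustration only, not how Xisa represents definitions: struct list, is_list and the fuel bound are hypothetical names, and the bound is an assumption used here to reject cyclic structures, which the separating conjunction excludes.

```c
#include <stddef.h>

/* Concrete list cell with the two fields used in the list definition. */
struct list {
    struct list *next;
    int data;
};

/* Returns 1 if the cells reachable from a form a list according to
     list ::= (emp, a = 0) | (a.next |-> b0 * a.data |-> b1 * b0.list, a != 0)
   using at most fuel unfoldings of the second case, and 0 otherwise.
   The fuel bound is an assumption of this sketch: a cyclic chain would
   require reusing a cell, which separation forbids, so it never
   satisfies the definition and is rejected once the fuel runs out. */
int is_list(const struct list *a, int fuel) {
    if (a == NULL) return 1;          /* case (emp, a = 0) */
    if (fuel == 0) return 0;          /* no unfolding left */
    return is_list(a->next, fuel - 1); /* case a != 0: unfold once */
}
```

A single finite definition accepts lists of every length, which is the summarization property stated above.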
Inductive structures III: concretization
Concretization as a least fixpoint
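Given an inductive definition ι:
\[ \gamma_S(\alpha \cdot \iota) = \bigcup \{ \gamma_S(F) \mid \alpha \cdot \iota \xrightarrow{U} F \} \]
where α · ι −U→ F means that F is obtained from α · ι by unfolding
Alternate approach: index inductive applications with induction depth
allows reasoning on the length of structures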
Inductive structures IV: a few instances
More complex shapes: trees
α · tree −U→ι α · left ↦ β0 ∗ α · right ↦ β1 ∗ β0 · tree ∗ β1 · tree
Relations among pointers: doubly-linked lists
α · dll(δ) −U→ι α · next ↦ β ∗ α · prev ↦ δ ∗ β · dll(α)
Relations between pointers and numerical values: sorted lists
α · lsort(δ) −U→ι α · next ↦ β0 ∗ α · data ↦ β1 ∗ β0 · lsort(β1), δ ≤ β1
All together: red-black trees with parent pointers [POPL’08]
Inductive segments
A frequent pattern
[Figure: a list from &x to 0x0, with &y pointing to an element at address ν(π)]
Could be expressed directly as an inductive with a parameter:
α · list_endp(π) ::= (emp, α = π)
                   | (α · next ↦ β0 ∗ α · data ↦ β1 ∗ β0 · list_endp(π), α ≠ 0)
This definition would derive from list
Thus, we make segments part of the fundamental predicates of the domain
[Figure: shape graph over &x and &y with a list segment edge followed by a list edge]
Multi-segments: possible, but harder for analysis
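A concrete reading of the segment predicate can be sketched in the same way as for full lists: walk from α until the end point π is met. The function name list_seg_length and the max_len bound are hypothetical; returning the number of unfoldings is an assumption of this sketch that mirrors the idea of indexing inductive applications with their induction depth.

```c
#include <stddef.h>

/* Concrete list cell with the two fields used in the list definition. */
struct list {
    struct list *next;
    int data;
};

/* Number of cells in the segment from a to end, following
     list_endp(end) ::= (emp, a = end) | (a.next |-> b0 * ..., a != 0)
   or -1 if end is not reached within max_len cells (including the case
   where the chain hits NULL before reaching end). */
int list_seg_length(const struct list *a, const struct list *end, int max_len) {
    int n = 0;
    while (a != end) {                 /* not the empty-segment case */
        if (a == NULL || n == max_len) /* cannot unfold any further */
            return -1;
        a = a->next;                   /* unfold one cell */
        n++;
    }
    return n;
}
```

With end set to NULL the segment is a full list; splitting a list at an inner cell gives two segments whose lengths add up to the length of the whole list.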
Product abstraction
How to abstract both memory and contents properties?
We need a form of product abstraction
Example: partly unfolded sorted list
[Figure: sorted list with two unfolded elements (next and data edges, data nodes β0 and β1), followed by an lsort(β1) edge]
Numerical domain variables: (some) nodes
Cofibered domain structure [Venet, 1996]
◮ a lattice D♯S of shape graphs
◮ for each shape graph S ∈ D♯S, a distinct numerical lattice D♯N〈S〉
◮ abstract values are pairs (S, N) where N ∈ D♯N〈S〉
◮ analysis should use conversion functions
[Figure: shape graphs S0, S1, S2, each with its own numerical lattice D♯N〈S0〉, D♯N〈S1〉, D♯N〈S2〉]
Static analysis algorithms
Static analysis overview
A list insertion function:
list *l;  // assumed to point to a list
list *t;  // assumed to point to a list element
list *c = l;
while (c != NULL && c->next != NULL && (...)) {
    c = c->next;
}
t->next = c->next;
c->next = t;
list inductive structure definition
Abstract precondition:
[Figure: shape graph over &l, &c, &t with a list edge and next and data points-to edges]
Result of the (interprocedural) analysis
Over-approximations of reachable concrete states
e.g., at the loop exit:
[Figure: shape graph over &l, &c, &t with two list edges and next and data points-to edges]
The algorithms for static analysis
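Unfolding: case analysis on summaries
[Figure: graph over x, y with list edges ⇒ disjunction of a graph where a list edge is unfolded into next and data edges followed by a list edge, and a graph where the pointer equals 0x0]
Abstract postconditions, on “exact” regions, e.g. insertion
[Figure: graph over x, y with unfolded next and data edges, before and after the insertion]
Widening: builds summaries and ensures termination
[Figure: widening (▽) of two graphs over x, y, one with list edges only and one with unfolded next and data edges; the result has list edges only]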
Unfolding as a local case analysis
Unfolding principle
Case analysis, based on the inductive definition
Generates symbolic disjunctions
analysis performed in a disjunction domain
Example, for lists:
α · list −U→ (emp, α = 0)
α · list −U→ (α · next ↦ α′ ∗ α · data ↦ β ∗ α′ · list, α ≠ 0)
Numeric predicates: approximated in the numerical domain
Soundness: by definition of the concretization of inductive structures
\[ \gamma_S(S) \subseteq \bigcup \{ \gamma_S(S_0) \mid S \xrightarrow{U} S_0 \} \]
Local reasoning
Before the assignment c = c->next;:
[Figure: l and c both point to α, with α · list and α ≠ 0]
1 Result of the unfolding: two rules to consider
◮ empty list case: does not need to be considered,
since it contradicts the numerical invariant α ≠ 0
◮ non-empty list case:
[Figure: l and c point to α, with α · next ↦ α′ ∗ α · data ↦ β ∗ α′ · list]
2 Result of the assignment:
[Figure: l points to α and c points to α′, with α · next ↦ α′ ∗ α · data ↦ β ∗ α′ · list]
note: sound analysis of the assignment in itself is trivial (frame rule)
Unfolding and degenerated cases
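assume(l points to a dll)
c = l;
① while (c != NULL && condition)
      c = c->next;
② if (c != 0 && c->prev != 0)
      c = c->prev->prev;
at ①:
[Figure: l and c both point to α0, with α0 · dll(δ1)]
at ②:
[Figure: l points to α0 and c points to α1, with a dll segment from α0 to α1 followed by a dll edge from α1]
⇒ non trivial unfolding
Materialization of c->prev:
[Figure: dll segment from α to α′, then next and prev edges between α′ and α′′, then a dll edge from α′′]
Segment splitting lemma: basis for segment unfolding
a segment of length i + j from α (definition ι) to α′ (definition ι′)
describes the same set of stores as
a segment of length i from α (ι) to α′′ (ι′′) followed by a segment of length j from α′′ (ι′′) to α′ (ι′)
Materialization of c->prev->prev:
[Figure: dll segment from α, then two consecutive elements linked by next and prev edges, then a dll edge]
Implementation issue: discovering which inductive edge to unfold is not decidable!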
Widening I: need for a folding operation
Back to the list traversal example...
assume(l points to a list)
c = l;
while (c != NULL) {
    c = c->next;
}
First iterates in the loop:
◮ at iteration 0 (before entering the loop):
[Figure: l and c both point to α0, with α0 · list]
◮ at iteration 1:
[Figure: l points to α0 and c points to α1, with α0 · next ↦ α1 ∗ α0 · data ↦ β1 ∗ α1 · list]
◮ at iteration 2:
[Figure: l points to α0 and c points to α2, with next and data edges unfolded at α0 and α1 (data nodes β1, β2), and α2 · list]
How to guarantee termination of the analysis?
How to introduce segment edges / perform abstraction?
Widening II: algorithm overview
Takes two abstract values as inputs
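[Figure: widening (▽) of a graph over x, y with list edges and a graph over x, y with list edges and unfolded next and data edges]
Region matching (non unique choice: use of strategies)
[Figure: the same two graphs, partitioned into matching regions]
Semantic rules for per region weakening
[Figure: a list edge widened (▽) with a region made of next and data edges and a list edge yields a list edge]
Widening:
[Figure: resulting graph over x, y with list edges only]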
Widening III: an example of a widening meta-rule
Segment introduction meta rule (for all ι)
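if Slft = emp (over a single node α)
and Srgh ⊑ a segment of ι between β0 and β1
then Slft ▽ Srgh = a segment of ι between δ0 and δ1
Application to list traversal, at the end of iteration 1:
◮ before iteration 0:
[Figure: l and c both point to α0, with α0 · list]
◮ end of iteration 0:
[Figure: l points to β0 and c points to β1, with β0 · next ↦ β1 ∗ β0 · data ↦ β2 ∗ β1 · list]
◮ join, before iteration 1:
[Figure: l points to δ0 and c points to δ1, with a list segment from δ0 to δ1 and δ1 · list]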
Widening IV: properties
Rewrite system properties
A set of sound pair-wise weakening rules
Weakening terminates: widening is computable
Weakening rules are not confluent: need for adequate strategies
otherwise: risk of very imprecise results
Widening properties
Sound: returns an over-approximation of concrete join
Terminating:
◮ introduction of segments / inductive edges consumes points-to edges
◮ after finitely many iterates, the size of graphs decreases
Soundness and termination also hold in the cofibered domain
assumption: use of a widening in the numeric domain
Widening and other folding operators
Canonicalization (TVLA, SmallFoot)
Another kind of widening operator:
Ignores its first argument: unary weakening operator
thus, does not depend on history
Returns values in a finite lattice
Applies not only to fixpoint computation: incremental weakening
Widening (Xisa) exploits the analysis history
◮ quick convergence, few disjuncts
◮ applies mainly to fixpoint computation
Ideally, a local abstraction would be useful together with history-dependent widening
◮ e.g., to weaken abstract elements locally
◮ no requirement to return values in a finite lattice
◮ still benefits from the increased precision of widening at loop heads
Instantiation to the analysis of low-level programs
Issues inherent in the analysis of C programs
Toward the analysis of a more complex language
So far, we assumed a simple, Java-like memory model
Yet, we are interested in complex C applications...
An abstract syntax tree type
typedef struct arith {
    char op;
    union {
        struct { double v; } cst;
        struct {
            struct arith *l;
            struct arith *r;
        } bin;
    } n;
} arith;
[Figure: memory layouts of a constant node (fields op and n · cst · v) and of a binary node (fields op, n · bin · l and n · bin · r)]
Nested aggregates
Size of fields to take into account
Existence of pointers to fields
Memory management
Vincent Laviron’s master internship, [ESOP’10]
Abstraction of contiguous regions (II)
Points-to edges are of the form α · f 7→ β
How to instantiate our framework, i.e. how to choose f?
typedef struct {
    int a;
    struct {
        int x;
        int y;
    } b;
} tt;
[Figure: three cells of a tt value, with offsets a, b · x and b · y]
Offsets as sequences
o ::= ε      null offset
    | o · f  field dereference
int[]
[Figure: three cells of an int array, with offsets 0, 4 and 8]
Numeric offsets
o ∈ N numerical value
Only offsets need be modified (instead of field names)
Changes impact points-to edges only
Inductive structures and analysis algorithms are unchanged
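The relation between the two kinds of offsets can be illustrated on the tt type: a symbolic offset is a sequence of field names, and a numeric offset is what the compiler computes for that sequence. The sketch below is an assumption-laden illustration, not part of Xisa: tt_numeric_offset is a hypothetical helper, the dotted string encoding of field sequences is chosen for brevity, and offsetof with a nested member designator relies on the behavior of common compilers such as GCC and Clang.

```c
#include <stddef.h>
#include <string.h>

/* The aggregate from the slide. */
typedef struct {
    int a;
    struct {
        int x;
        int y;
    } b;
} tt;

/* Resolves a symbolic offset of tt, written as a dotted field sequence
   such as "b.y", to its numeric offset in bytes.
   Returns (size_t)-1 when the sequence does not denote a cell of tt. */
size_t tt_numeric_offset(const char *path) {
    if (strcmp(path, "a") == 0)   return offsetof(tt, a);
    if (strcmp(path, "b.x") == 0) return offsetof(tt, b.x);
    if (strcmp(path, "b.y") == 0) return offsetof(tt, b.y);
    return (size_t)-1;
}
```

Either representation labels the same points-to edges, which is why only the offset component of edges has to change between the two models.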
Pointer models
How to model pointers to fields precisely and concisely?
Solution 1 (poor): rely on the numerical domain for offset relations
Our solution: label edge destinations with offsets
α · f ↦ β · g: the cell at address ν(α) + offset(f) contains the address ν(β) + offset(g)
[Figure: edge from α to β with source offset f and destination offset g]
Similar techniques used by Kreiker for an analysis based on TVLA
Classification, with Pascal Sotin and Bertrand Jeannet [NSAD’10]
Two independent choices, four models:
◮ numeric or symbolic offsets
◮ pointers to fields allowed/disallowed
Xisa allows all four models
Abstracting multiple views
Analysis of unions/pointer casts in real applications
Distinct accesses may not invalidate other views in user semantics
How to maintain several views on a memory chunk?
struct {
    char c;
    short s;
} a;
[Figure: memory chunk viewed as two cells a · c and a · s]
struct {
    int i;
} b;
[Figure: memory chunk viewed as a single cell b · i]
Miné [LCTES’06], Astrée: analysis maintains dynamic sets of views
Extends to our abstraction: local non separating conjunction
i.e., ∧ applies only to points-to edges
Limited fragment of separation logic with inductive structures
◮ local conjunctions preserve analysis efficiency
[Figure: formula structure with ∗ at the top level over points-to (pt), inductive (ind) and ∧ nodes, where ∧ nodes only combine points-to predicates]
Dynamic memory management
How to verify the safety of free(p)?
Only base addresses of allocated regions can be freed safely
Concrete states:
[Figure: allocation table with entries (x0, n0) and (x1, n1); heap with allocated blocks at x0 of size n0 and at x1 of size n1, and free blocks]
Static analysis:
Abstract the allocation table
Track the location of nodes representing valid addresses
Track the size and base address of heap allocated regions
Inductive structures also need to summarize such properties
Benefit from the very concrete base semantics
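The concrete safety condition for free(p) can be sketched directly on an allocation table. This is an illustration of the concrete property only, under assumed names: block, free_is_safe and points_into_block are hypothetical, and the abstract domain itself (nodes, edges, inductive summaries) is not modelled here.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of a concrete allocation table: base address and size of a
   heap block, and whether the block is currently allocated. */
typedef struct {
    uintptr_t base;
    size_t size;
    int allocated;
} block;

/* free(p) is safe only if p is the base address of an allocated block. */
int free_is_safe(const block *table, size_t n, uintptr_t p) {
    for (size_t i = 0; i < n; i++)
        if (table[i].allocated && table[i].base == p)
            return 1;
    return 0;
}

/* A valid address may still point inside an allocated block without
   being its base; such a pointer can be dereferenced but not freed. */
int points_into_block(const block *table, size_t n, uintptr_t p) {
    for (size_t i = 0; i < n; i++)
        if (table[i].allocated && p >= table[i].base
            && p - table[i].base < table[i].size)
            return 1;
    return 0;
}
```

The gap between the two predicates is the reason the analysis has to track base addresses and sizes, and not only which nodes denote valid addresses.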
Instantiation to interprocedural analysis
Approaches to interprocedural analysis
“relational” approach                        “inlining” approach
analyze each definition                      analyze each call
abstracts P(S̄ → S̄)                           abstracts P(S)
+ modularity                                 - not modular
+ reuse of invariants                        - re-analysis in ≠ contexts
- deals with state relations                 + deals with states
- complex higher order iteration strategy    + straightforward iteration
challenge: frame problem                     challenge: unbounded calls
Challenges in interprocedural analysis
void main() {
    dll *l;  // assume l points to a sll
    l = fix(l, NULL);
}
dll *fix(dll *c, dll *p) {
    dll *ret = NULL;  // initialized here so that the empty list case returns NULL
    if (c != NULL) {
        c->prev = p;
        c->next = fix(c->next, c);  // ◮
        if (check(c->data)) {
            ret = c->next;
            remove(c);
        } else ret = c;  // ⊲
    }
    return ret;
}
turns a linked list into a doubly linked list
removes some elements
[Figure: call stack with the activation record of main (l) and three activation records of fix (ret, c, p) linked by frame pointers (fp), and the heap list with data 3, 8, 11, 2, shown at two points of the execution]
Heap is unbounded, needs abstraction (shape analysis)
But stack may also grow unbounded, needs abstraction
Complex relations between both stack and heap
Calling contexts as shape graphs
[Figure: concrete call stack (main, three activation records of fix with fields ret, c, p, frame pointers fp) and heap list, next to the shape graph that represents both stack and heap: nodes for main and the fix records, fp edges, fields l, c, p, prev and next edges, and a list edge]
Concrete assembly call stack modelled in a separating shape graph together with the heap
◮ one node per activation record address
◮ explicit edges for frame pointers
◮ local variables turn into activation record fields
Inference of a call-stack inductive structure
Second and third iterates: a repeating pattern
[Figure: shape graphs of the second iterate (main and two activation records of fix) and of the third iterate (main and three activation records of fix), with fp edges, fields l, c, p, prev and next edges, and a list edge]
Computing an inductive rule for summarization: subtraction
◮ subtract top-most activation record
◮ subtract common stack region
◮ gather relations with next activation records: additional parameters
◮ collect numerical constraints
Inferred inductive rule
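[Figure: a stk(β1, β2) summary edge for the calling context fix::ctx unfolds (−U→) into an activation record of fix, with fields c and p and prev and next edges over β0, β1, β2, on top of a stk(β0, β1) summary edge for ctx]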
Inference of a call-stack summary: widening iterates
Fixpoint at function entry:
first iterate:
[Figure: main and two activation records of fix, with fp edges, fields l, c, p, prev and next edges, and a list edge]
second iterate:
[Figure: main and three activation records of fix, with fp edges, fields l, c, p, prev and next edges, and a list edge]
widened iterate:
[Figure: main and activation records of fix, with summary edges stk(β2, β3) and stk(β0, β1) and a summarized fix⋆ record; nodes β0 to β3; fields l, c, p; prev and next edges; a list edge]
Fixpoint reached
Fixpoint upon function return:
◮ function return involves unfolding of stack summaries
◮ simpler widening sequence: no rule to infer
Perspectives and implementation
Foundations of our analysis
Concrete level choices have a huge impact on the analysis
Components that do not actively contribute to abstraction (summarization)
◮ separation: allows simpler analysis algorithms
◮ contiguous regions: very concrete model
High expressive power of inductive structures
◮ wide range of structures with or without pointer/numerical relations
◮ though, not adapted to everything: arrays, graphs...
Induction in the abstraction and in the analysis
◮ unfolding (aka focus, local abstraction): case analysis
◮ widening (aka folding): reverse operation and support for induction
References
Abstraction: [SAS’07], [POPL’08]
Analysis algorithms: [SAS’07], [POPL’08]
C memory model: [ESOP’10], [NSAD’10]
Call stack abstraction: [POPL’11]
Implementation
Xisa prototype (eXtensible Inductive Shape Analyzer)
takes user-supplied inductive definitions [SAS’07,POPL’08,ESOP’10]
infers inductive definitions for call-stack [POPL’11]
Example                              size (locs)  time (ms)  peak ∨  iters
list reverse                                  19          7       1      3
list remove element                           27         16       4      6
list insertion sort                           56         21       4      7
dll copy                                      50         53       2      3
dll insert                                    40         38       2      4
binary search tree find                       23         10       2      4
binary search tree insert                    150         83       5      5
distribution over arithmetic trees            41        144      14      2
move negations up in arith. trees            120        488      38      2
scull driver                                 894      9 710      16      4
Inferring appropriate inductive definitions
Choice of inductive definitions
User-supplied in Xisa
Ideally: it should be automatic
Solution 1: scan type definitions and assume no sharing
◮ e.g., type definition t, with one pointer to type t: assumed list...
◮ sound yet imprecise; the analysis will fail in presence of sharing
Solution 2: infer inductive definitions
◮ generalize the call-stack inference algorithm
◮ should apply well to constructor code, e.g. in object oriented programs
◮ instance of a cofibered abstract domain [Venet, SAS, 1996]
two levels: one for inductive rules, and one for numerical values
Ongoing works and extensions
Decide implication between different, folded inductive predicates
an abstract interpretation algorithm
Internal reduction: convert equivalent abstract predicates
Generalize inductive definitions inference scheme: e.g., for object languages, use constructor code
Longer term: towards modular heap abstraction
Notion of reduced product (non separating conjunction):
Ongoing Master internship by Antoine Toubhans
Splitting combination (separating conjunction)...
[Figure: an actual memory state containing a sorted list (entries “high” 1, “middle” 8, “low” 12), cells of the form (old msg) marked invalid, and other data-structures; the store is split into sub-regions (contiguous region, sub-store), each with its own abstraction (sorted list specific abstraction, abstraction of cells of the form (old msg), independent abstraction); the abstract domain is a combination of simple abstract domains]
MemCAD project: five years contract, Post-Doc positions