abstraction and approximation via abstract interpretation:
DESCRIPTION
a systematic approach to program analysis and verification. Giorgio Levi Dipartimento di Informatica, Università di Pisa [email protected] http://www.di.unipi.it/~levi.html. Abstraction and Approximation via Abstract Interpretation:. Abstract interpretation. - PowerPoint PPT PresentationTRANSCRIPT
1
Abstraction and Approximation via Abstract
Interpretation:a systematic approach to program analysis
and verification
Giorgio Levi
Dipartimento di Informatica, Università di Pisa
http://www.di.unipi.it/~levi.html
2
Abstraction and approximation
two relevant concepts in several areas of computer science (and engineering)
– to reason about complex systems– to make reasoning computationally feasible
3
Abstract Interpretation(Cousot & Cousot, POPL 77 & 79)
a 20-years old technique to systematically handle abstraction and approximation
– born to describe (and prove correct) static analyses (for imperative programs)
– popular mainly in declarative paradigms
– viewed today as a general technique to reason about semantics at different levels of abstraction
– successfully applied to distributed and mobile systems and to model checking
– recently applied to program verification
4
Abstract Interpretation, Semantics, Analysis
Algorithms how abstract interpretation is often used in
static program analysis
– a semantics– an analysis algorithm developed by ad-hoc
techniques– the A.I. Theory (definition of an abstract domain) is
used to prove that the algorithm is correct, i.e., that its results are an approximation of the property to be analyzed
5
Abstract Interpretation, Semantics, Analysis
Algorithms the abstract interpretation I like
– a semantics– an abstract domain designed to model the
property to be analyzed– the A.I. Theory is used to systematically derive the
abstract semantics– the analysis algorithm is exactly the computation
of the abstract semantics and is correct by construction
6
Abstract InterpretationTheory in 4 Steps
concrete and abstract domain the Galois insertion abstract operations from the concrete to the abstract semantics
7
Concrete and Abstract Domains
two complete partial orders– the partial orders reflect precision
smaller is better concrete domain (C(C, ,
– CC has the structure of a powerset abstract domain (A(A, ,
– each abstract value is a description of “a set of” concrete values
8
The Sign Abstract Domain
concrete domain (P(Z)(P(Z), , – sets of integers
abstract domain (Sign(Sign, ,
0-
top
0 - +
bot
0+
9
Galois insertions(C, (A, : A C (concretization)
:C A (abstraction)
monotonic
xCx xyAyy
mutually determine each other
C A
10
The sign examplesign(x)
– , if x= bot– {y|y>0}, if x= +– {y|y0}, if x= 0+– {0}, if x= 0– {y|y0}, if x= 0-– {y|y<0}, if x= -– Z, if x= top
signy) = glb of– bot , if y= – - , if y {y|y<0}– 0- , if y {y|y0}– 0 , if y {0}– 0+ , if y {y|y 0}– + , if y {y|y0}– top , if y Z
0-
top
0 - +
bot
0+
11
Abstract Operations
the concrete semantic evaluation function is defined in terms of primitive semantic operations fi on C
for each fi we need to provide a corresponding fi
defined on A
fi must be locally correct, i.e. x1,..,xn
Cfi x1,..,xn) fix1,..,xn
the optimal (most precise) abstract operator is
fiy1,..,yn)= fi y1,.., yn the operator is complete (precise) if x1,..,xn Cfi
x1,..,xn)) fix1,.., xn
12
Times Sign
bot - 0- 0 0+ + top
bot bot bot bot bot bot bot bot - bot + 0+ 0 0- - top 0- bot 0+ 0+ 0 0- 0- top 0 bot 0 0 0 0 0 0 0+ bot 0- 0- 0 0+ 0+ top + bot - 0- 0 0+ + toptop bot top top 0 top top top
13
Plus Sign
bot - 0- 0 0+ + top
bot bot bot bot bot bot bot bot - bot - - 0- top top top 0- bot - 0- 0- top top top 0 bot 0- 0- 0 0+ 0+ top 0+ bot top top 0+ 0+ + top + bot top top 0+ + + toptop bot top top top top top top
14
The Sign example Times and Plus are the usual operations lifted to
P(Z) both Timessign and Plussign are optimal (hence
correct) Timessign is also complete (no approximation) Plussign is necessarily incomplete
sign(Times({2},{-3})) = Timessign(sign({2}),sign({-3}))
sign(Plus({2},{-3}))
Plussign(sign({2}),sign({-3}))
15
The Abstract Semantics F = concrete semantic evaluation function
– if we start from a standard semantic definition, the lifting to the powerset (collecting semantics) is simply a conceptual operation
lfp F = concrete semantics F= abstract semantic evaluation function
– obtained by replacing in F every concrete semantic operation by a corresponding (locally correct) abstract operation
lfp F = abstract semantics
global correctness (lfp F) lfp F
– the abstract semantics is less precise than the abstraction of the concrete semantics
16
Where does the approximation come from?
incomplete abstract operations more execution paths in the abstract control flow
– the abstract state has not enough information to make deterministic choices
– conditionals, pattern matching, etc.
the set of resulting abstract states is turned into a single abstract state, by performing an abstract lub operation
17
Approximation in abstract Sign computations
concrete state [x={3}] if x>2 then y:=3 else y:=-5; concrete state [x={3}, y={3}]
abstract state [x=+] if x>2 then y:=3 else y:=-5;
– the abstract guard “can be both true and false”
– both paths need to be abstractly evaluated
– the two resulting abstract states are merged by performing a lub in Sign
abstract state [x=+,y=top]
0-
top
0 - +
bot
0+
18
(lfp F) lfp F
why computing lfp F?
lfp F cannot be computed in finitely many steps – steps are in general
required lfp Fcan be computed in
finitely many steps, if the abstract domain is finite or at least noetherian– no infinite increasing
chains
static analysis 1– noetherian abstract domain
– termination, approximation
static analysis 2– non-noetherian domain
– termination via widening– further approximation
comparative semantics– non-noetherian domain
– abstraction without approximation
(completeness) (lfp F) lfp F
19
Static Analysis
abstract domain and Galois connection to model the property
(possibly optimal) correct abstract operations
F
the analysis is the
computation of lfp F
if the abstract domain is non-noetherian, or
if the complexity of lfp Fis too high– use a widening operator
– which effectively computes an (upper) approximation of
lfp F
one example later
20
Comparative Semantics (lfp F) lfp F
none of the two fixpoints is finitely computable
useful to reason about different semantics and to systematically derive more abstract semantics
– choice of the most adequate reference semantics for analysis and verification
Fis less expensive than F in computing the observable property modeled by – no junk
hierarchy of transition systems semantics (P. Cousot, MFPS 97)
– trace, big-step operational, denotational, relational, predicate transformer, axiomatic, etc.
systematic reconstruction of several fixpoint (TP-like) semantics
for (positive) logic programs (Comini, Levi & Meo, Info. & Comp. 00)
– applied in Pisa also to finite failure & infinite computations, CLP, CCP, Prolog, -Prolog, sequent calculi
21
Polymorphic type inference
in ML-like functional languages
the ad-hoc solution– Milner’s algorithm, specified
by a set of inference rules an elegant, well-understood,
universally accepted semantic formalization
the systematic derivation via abstract interpretation– provides a better insight– shows how to improve
precision
inference rules mimic the concrete semantics
– in the structure of the semantic evaluation function
– in the semantic domains (environment)
semantics to well-typed programs only introduces approximation
– if true then 2 else false the most general polymorphic type
for recursive functions is not computable
– the inferred type may not be the most general
– some type-correct programs cannot be typed
22
Polymorphic type inference via Abstract
Interpretation abstract values = pairs of
– a term (with variables) type expression
– a constraint (on variables) set of term equalities in
solved form
partial order (on terms only)– top is “no type”– bottom is “any type”
– t1 t2, if t2 is an instance of t1
the domain is non-noetherian– there exist infinite increasing
chains
an optimal abstract operation
– +((t1,c1),(t2,c2)) = (int, c1c2 {t1=int ,t2=int})
abstracting functional values
– the concrete semantics E x.e = v. E e (bind x v)
– the abstract value let v1 = newvar() in let (v2,c2) = E e (bind x (v1,{})) in (v1c2 -> v2,c2)
23
Recursion and Widening
the abstraction of recursive functions is similar to the one of regular functions, but– a fixpoint computation is
required– the first approximation of
the abstract value of the function is bottom
since the abstract domain is non-noetherian the fixpoint computation may diverge
the solution in Milner’s algorithm– take the results of the first two
iterations and compute their lub (most general common instantiation, computed through unification)
– if the lub is top (unification fails), the program is not typable (type error)
this is exactly a widening operator, which returns a (correct) upper approximation of the lfp (Furiesi, Master Thesis Pisa. 99)
24
How to improve precision straightforward!
– perform at most k iterations of the fixpoint computation– if we reach a fixpoint, it is the most general type– otherwise, we apply Milner’s widening to the last two results
we succeed in typing more functions we get more precise types
one example (due to Cousot) CaML
– # let rec f f1 g n x = if n=0 then (g x) else (((((f f1)(fun x -> (fun h -> (g(h x)))))(n - 1))(x))(f1));; This expression has type ('a -> 'a) -> 'b but is here used with type 'b
our answer (the fixpoint is reached in 3 iterations)– val f : ('a -> 'a) -> ('a -> 'b) -> int -> 'a -> 'b = <fun>
25
Abstract Interpretation vs. Type Systems
Patrick Cousot has reconstructed a hierarchy of type systems for ML-like
languages by using abstract interpretation (Cousot, POPL 97)
type systems have been proposed to cope with other static analyses
(strictness, various properties related to security)
type systems need to be proved correct wrt a semantics abstract semantics are systematically derived from the semantics and
are correct by construction two related open interesting problems
– comparison of the two approaches from the viewpoint of expressive power and analysis precision (and complexity)
– definition of methods to automatically translate formalizations from one approach to the other
26
Static Analysis of Logic Programs
abstract Interpretation is very popular in logic languages– the computational model has several opportunities for optimization, based on analysis
results– it is (relatively) easy to define, because the standard semantics is collecting and the
concrete domain (sets of substitutions) is quite simple
several important properties (groundness, freeness, sharing, depth(k)) for some properties (i.e., groundness and sharing) a lot of different
abstract domains– techniques to compare the relative precision of abstract domains– important results on techniques for the systematic design of abstract domains, which
can probably be applied to other paradigms as well abstract compilation in CLP (Giacobazzi, Debray & Levi, JLP 95)
– the program is transformed by syntactically replacing concrete constraints by abstract constraints
– the abstract computation is a standard CLP computation on a different constraint system
27
Groundness in Logic Programs
CLP version concrete domain
– (P(Eqns), ), sets of sets of term equations in solved form concrete semantics
– the CLP version of the s-semantics (answer constraints) 3 abstract domains
– G: the property of being ground – DEF: functional groundness dependencies– POS: DEF + some disjunctive information
lattices shown in the 2-variables case
28
An example the program
p(X,Y) :- X=a.p(X,Y) :- Y=b.q(X,Y) :- X=Y.r(X,Y) :- p(X,Y),q(X,Y).
the concrete semanticsp(X,Y) -> {{X=a},{Y=b}}q(X,Y) -> {{X=Y}}r(X,Y) -> {{X=a,Y=a},{X=b,Y=b}}
in the concrete semantics of r– both the arguments are bound
to ground terms (in all the answer constraints)
29
The domain G
the programp(X,Y) :- X=a.p(X,Y) :- Y=b.q(X,Y) :- X=Y.r(X,Y) :- p(X,Y),q(X,Y).
the concrete semanticsp(X,Y) -> {{X=a},{Y=b}}q(X,Y) -> {{X=Y}}r(X,Y) -> {{X=a,Y=a},{X=b,Y=b}}
G(v) =– , if v= bot– {e Eqns | X is bound to a ground term in e }, if v= X
X is always ground– Eqns, if v= true no groundness information
X
true
X&Y
bot
Y
the abstraction of the concrete semanticsp(X,Y) -> trueq(X,Y) -> truer(X,Y) -> X & Y
the abstract programp(X,Y) :- lubG (X,Y).q(X,Y) :- true.r(X,Y) :- glbG (p(X,Y),q(X,Y)).
the abstract semantics
p(X,Y) -> trueq(X,Y) -> truer(X,Y) -> true
30
The domain Def
the programp(X,Y) :- X=a.p(X,Y) :- Y=b.q(X,Y) :- X=Y.r(X,Y) :- p(X,Y),q(X,Y).
the concrete semanticsp(X,Y) -> {{X=a},{Y=b}}q(X,Y) -> {{X=Y}}r(X,Y) -> {{X=a,Y=a},{X=b,Y=b}}
Def(v) =– {e Eqns | X = Y e},
if v= X Y X is ground if and only if Y is ground
– {e Eqns | X = t e and Y occurs in t}, if v= X Y if X is ground then Y is ground
– …..
X
true
X&Y
bot
Y
Y→ X
X↔Y
X→ Y
the abstraction of the concrete semanticsp(X,Y) -> trueq(X,Y) -> X Yr(X,Y) -> X & Y
the abstract programp(X,Y) :- lubDef (X,Y).q(X,Y) :- X Y.r(X,Y) :- glbDef (p(X,Y),q(X,Y)).
the abstract semantics
p(X,Y) -> trueq(X,Y) -> X Yr(X,Y) -> X Y
31
The domain Pos
the programp(X,Y) :- X=a.p(X,Y) :- Y=b.q(X,Y) :- X=Y.r(X,Y) :- p(X,Y),q(X,Y).
the concrete semanticsp(X,Y) -> {{X=a},{Y=b}}q(X,Y) -> {{X=Y}}r(X,Y) -> {{X=a,Y=a},{X=b,Y=b}}
pos(v) =– {e Eqns | either X or Y is bound to a ground
term in e }, if v= X Y either X or Y is ground
– ….X
true
X&Y
bot
Y
Y→ X
X↔Y
X→ YX∨Y
the abstraction of the concrete semantics
p(X,Y) -> X Yq(X,Y) -> X Yr(X,Y) -> X & Y
the abstract programp(X,Y) :- lubpos (X,Y).q(X,Y) :- X Y.r(X,Y) :- glbpos (p(X,Y),q(X,Y)).
the abstract semantics
p(X,Y) -> X Yq(X,Y) -> X Yr(X,Y) -> X & Y
32
Program Verification byAbstract Interpretation
F = concrete semantic evaluation function– concrete enough to observe the property
the property is modeled by an abstract domain
(A, and a Galois insertion , F= abstract semantic evaluation function S = specification of the property, i.e., abstraction
of the intended concrete semantics partial correctness: (lfp F) S
sufficient partial correctness condition:
F( S)S(Comini, Levi, Meo & Vitiello, JLP 99)
– if F(S)S
– then S is a prefixpoint of
F
– hence
(lfp F) lfp F S
33
Analysis and Verification F = concrete semantic evaluation function F= abstract semantic evaluation function
analysis: compute lfp F
– we need to compute a fixpoint– noetherian domain or widening
S = specification of the property verification: prove F(S)S
– no fixpoint computation and no need for noetherian domains
– finite representation of the specification
– decidability of
34
Completeness of the proof method
assume the program to be partially correct wrt the specification S, i.e., (lfp F) S
then there exists another specification T, stronger than S, such that the sufficient condition F(T) Tholds
we have shown that the proof method is complete if and only if the abstraction is complete (precise) (Levi & Volpe, PLILP 98)
35
Proof methods and the reference semantics
one can be interested in establishing different kinds of properties
– of the final state– of the relation between initial and final state– of the relation between specific pairs of
intermediate states, e.g., procedure calls– ….
there exist different corresponding proof methods
all the proof methods are instances of F(S)Sfor different choices of the concrete semantic evaluation function F
F can be derived by abstract interpretation (comparative semantics) from the most concrete semantics, i.e., a trace semantics
first step of abstraction = choice of the “right” semantics
in (positive) logic programming, all the known verification methods have been reconstructed (Levi & Volpe, PLILP 98)
36
Making F(S)S effective
extensional specifications– typical analysis properties described by
noetherian abstract domains– properties such as polimorphic types which
lead to finite abstract semantics, even with non-noetherian domains
intensional specifications, specified by means of assertions
assertions are abstract domains
– a formula describes the set of all the concrete states which “satisfy” it (concretization)
– if the specification language is closed under conjunction, it is easy to define the abstraction function
we can derive an abstract function F, which computes on the domain of assertions and instantiate the verification condition (Comini, Gori & Levi, MFCSIT 00)
the relation on the domain of assertions must be decidable
an open problem: completeness of the abstract semantics associated to a specific language of assertions
37
Specification Languages decidable specification languages have been proposed for functional
programming and logic programming– one example: a powerful language which allows one to express several properties of
logic programs, including types, freeness and groundness (Volpe, SCP 00)
experiments using Horn Clause Logic as specification language (Comini, Gori & Levi, AGP 00)
– it is not decidable– most of the verification conditions can be proved without using a theorem
prover simple logic program transformation techniques, which can be
partially supported by an automatic tool
38
Systematic abstract domain design
once we have the abstract domain, the design of the abstract semantics is systematic abstract interpretation theory provides results which can be exploited to make the design
of abstract domains (more) systematic– to compare and combine domains– to refine domains so as to improve their precision
reduced product (of domains A and B)– allows one to analyze (together) the
properties modeled by A and B– often delivers better results than the
separate analyses because of domain interaction
lifting to the powerset (and disjunctive completion )– roughly speaking, transform A
into P(A)– better precision
no loss of information in computing lub’s
39
Operations on Abstract Domains
several useful operators on abstract domains (refinements)– a survey in (File’, Giacobazzi & Ranzato, ACM Comput. Surv. 96)
linear completion (Giacobazzi, Ranzato & Scozzari, SAS 98)
– functional dependencies modeled by linear implication
reconstruction of all the known domains for groundness analysis (Scozzari, SAS 97)– DEF = G -> G – POS = DEF -> DEF– POS = POS -> POS
optimality of POS
successfully applied to other domains for logic programs
– types (Levi & Spoto, PLILP 98)
– sharing and freeness (Levi & Spoto, PEPM 00)
open problems
– do the same refinements apply to other programming paradigms?
– can refinements be extended to domains of assertions and to type systems?
40
Abstract Interpretation
a mathematically simple and solid foundation for– comparative semantics– static analysis– verification
a methodology for the systematic derivation of– abstract domains from the property
complexity issues? quantitative analyses?
– abstract semantics from the concrete semantics and the abstract domain