The Benefits of Exposing Calls and Returns Rajeev Alur University of Pennsylvania CONCUR/SPIN, August 2005


Page 1: The Benefits of Exposing Calls and Returns Rajeev Alur University of Pennsylvania CONCUR/SPIN, August 2005

The Benefits of Exposing Calls and Returns

Rajeev Alur, University of Pennsylvania

CONCUR/SPIN, August 2005

Page 2

Software Model Checking

[Diagram: Code → Abstractor → Model → Verifier → Yes / Counter-example; the Specification is a second input to the Verifier]

Abstractor: Predicate abstraction
Model: Control flow graph + Boolean vars (Pushdown automata)
Verifier: On-the-fly explicit state; Symbolic fixpoint evaluation
Specification: Temporal logics/Automata (Regular!)
Observables

Page 3

Regular specifications not expressive enough

Classical Hoare-style pre/post conditions
  If p holds when procedure A is invoked, q holds upon return
  Total correctness: every invocation of A terminates
  Integral part of emerging standard JML

Stack inspection properties: security/access control
  If a setuid bit is being set, process root must be in the call stack

Inter-procedural data-flow analysis
  An expression e is very busy at a control point p if on all paths from p, e will be used before any of its variables (possibly local in current procedure) are modified

Need matching of calls with returns, or finding pending calls, or local paths: Context-free properties!

Page 4

Checking Context-free Specifications

Obstacles
  Context-free languages are not closed under intersection
  Checking context-free properties against context-free models is undecidable

However, many such properties are verifiable
  Existing work in security that handles some stack inspection properties [JMT99, JKS03]
  Adding assert statements in the program (with additional local variables, if needed), and then checking regular properties (e.g. reachability), amounts to checking context-free properties
  Inter-procedural data-flow analysis algorithms [RHS95]

Page 5

Exposing Calls and Returns

What’s common to the checkable properties?
  Both model and property have their own stack, but the two stacks are synchronized and grow/shrink together!

As a generator, program exposes its calls and returns, and as an acceptor, property pushes on calls and pops on returns

Formalization of this intuition: Visibly Pushdown Languages

A surprisingly robust class of languages with properties like the regular languages and potentially many applications

Page 6

Talk Outline

Visibly Pushdown Languages

Temporal Logic CaRet and Model Checking

Ongoing Work

References:
  Visibly pushdown languages. Alur, Madhusudan; STOC’04
  A temporal logic of nested calls and returns. Alur, Etessami, Madhusudan; TACAS’04
  Congruences for visibly pushdown languages. Alur, Kumar, Madhusudan, Viswanathan; ICALP’05

Page 7

Context-free Languages: Recap (1/2)

Given an alphabet Σ, a language L is a set of finite words over Σ

A pushdown automaton (PDA) has a finite control and a stack, and while reading a word, it can push/pop stack symbols while updating control state

Configuration of a PDA: control state + a string of stack symbols

Acceptance defined by empty stack or final state

A language L is a context-free language (CFL) if there is a pushdown automaton that accepts it

Sample CFLs
  All regular languages
  Set of words of the form a^n b^n, for some n
  Set of words with equal number of a and b symbols

Non-CFL: Set of words of the form a^n b^n c^n

Page 8

Context-free Languages: Recap (2/2)

Alternative characterization: Context-free grammars
  Natural and popular for defining syntax
Nondeterministic PDAs are more expressive than deterministic ones
Emptiness of a PDA solvable in polynomial-time
Closed under union, but not closed under intersection or complementation
Language inclusion, emptiness of intersection undecidable
Applications: Parsing, Natural language processing, Program analysis…

Page 9

Exposing Calls and Returns

Pushdown alphabet: partitioned into 3 disjoint sets

Σ = push ∪ pop ∪ local

Pushdown words: finite words over a pushdown alphabet Σ

A visibly pushdown automaton over a pushdown alphabet Σ is a pushdown automaton that
  pushes a symbol onto the stack on a symbol in push
  pops the stack on a symbol in pop
  cannot change the stack on a symbol in local

Key: Stack size at any time is determined by the input word, but not by control state or stack content
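
The key observation above can be sketched in a few lines of Python. This is only an illustration: the symbol names `call`, `ret` and `skip` and the function `stack_heights` are hypothetical, and a pop on an empty stack is assumed to leave the height at 0.

```python
from typing import Iterable, List

# Hypothetical partition of the alphabet, chosen only for this illustration.
PUSH = {"call"}
POP = {"ret"}
LOCAL = {"skip"}


def stack_heights(word: Iterable[str]) -> List[int]:
    """Return the stack height after each symbol of a pushdown word.

    Only the word and the partition are consulted: no control state and
    no stack content. A pop on an empty stack keeps the height at 0.
    """
    heights = []
    height = 0
    for sym in word:
        if sym in PUSH:
            height += 1
        elif sym in POP:
            height = max(height - 1, 0)
        elif sym not in LOCAL:
            raise ValueError(f"symbol {sym!r} is not in the pushdown alphabet")
        heights.append(height)
    return heights
```

Any two visibly pushdown automata over the same partition therefore have stacks of equal height after reading the same prefix, whatever their states and stack symbols are.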

Page 10

Visibly pushdown languages (VPL)

A language L is a VPL over a pushdown alphabet Σ, if there is a visibly pushdown automaton that accepts it (acceptance by final state)

The language {a^n b^n | for some n}
  VPL if a is in push and b is in pop
  Not a VPL for other partitions

The language of words with equal number of a and b symbols is not a VPL (independent of partition)

Every regular language L is a VPL independent of partitioning

Dyck language (words with well-balanced parentheses) is a VPL provided left/right parentheses are in push/pop resp.
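
As a small illustration of the first example, here is a hand-written deterministic VPA for a^n b^n with a in push and b in pop, accepting by final state. It is a sketch under assumptions: n ≥ 1 is assumed, and the state names and the stack symbols `first` / `rest` are invented for this example. The bottom-most a is pushed with a distinct symbol so that the automaton can tell when the last b has been read.

```python
def accepts_anbn(word: str) -> bool:
    """Run a deterministic VPA for {a^n b^n | n >= 1} (a in push, b in pop)."""
    state = "start"   # states: start, pushing, popping, accept, reject
    stack = []        # stack symbols: "first" marks the bottom-most a
    for sym in word:
        if sym == "a":
            # Push symbol: the automaton must push exactly one stack symbol.
            if state == "start":
                stack.append("first")
                state = "pushing"
            elif state == "pushing":
                stack.append("rest")
            else:
                stack.append("rest")
                state = "reject"
        elif sym == "b":
            # Pop symbol: the automaton must pop (nothing to pop if empty).
            top = stack.pop() if stack else None
            if state in ("pushing", "popping") and top == "rest":
                state = "popping"
            elif state in ("pushing", "popping") and top == "first":
                state = "accept"
            else:
                state = "reject"
        else:
            raise ValueError(f"symbol {sym!r} is not in the alphabet")
    return state == "accept"
```

Note that the stack action depends only on the input symbol; only the choice of stack symbol and next state depends on the control state.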

Page 11

VPLs in Program Analysis

Program

bool P(u:int) {
  global int x; local int y;
  …
  a: if Q { x = (x+y) };
  …
}

bool Q {
  local int y;
  if { …. y++; return 1; }
  else return P(x)
}

Analysis

To figure out whether the expression e=(x+y) is very busy at program point a,

push = {call-p, call-q}
pop = {ret-p, ret-q}
local = {used-e, mod-x, mod-y, skip}

Executions are pushdown words, e.g.

call-q, skip, mod-y, ret-q, used-e, mod-x, skip, ret-p

Set of executions starting at a location a is a VPL: La
Set of executions in which e is very-busy is also a VPL: Le
e is very busy at a if La is included in Le

Page 12

VPLs for Document Processing

XML Document

<conference>
  <name> CONCUR 2005 </name>
  <location>
    <city> San Francisco </city>
    <hotel> Stanford Court </hotel>
  </location>
  <sponsor> CISCO </sponsor>
  <sponsor> Microsoft </sponsor>
  …
</conference>

Pushdown alphabet

push = {<name>, <location>, …}
pop = {</name>, </location>, …}
local = {San Francisco, Microsoft, …}

A document d is a pushdown word

Query Processing

Sample Query: Find documents related to conferences sponsored by Microsoft in San Francisco

Specify query as a VPL: L
Analysis: Membership question. Does document d satisfy query L?

Use VPAs instead of tree automata! (typically, no recursion, but only hierarchy)
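
The membership question can be sketched as a single left-to-right pass over the document. The code below is an assumption-laden illustration, not the query engine from the talk: the tokenizer, the function names, and the reading of the sample query (root element `conference`, some `sponsor` with text Microsoft, some `city` with text San Francisco) are all choices made here.

```python
import re
from typing import List, Tuple

# Assumed token shapes: <tag>, </tag>, or a run of text.
TOKEN = re.compile(r"<(/?)([A-Za-z]+)>|([^<>]+)")


def to_pushdown_word(doc: str) -> List[Tuple[str, str]]:
    """Turn a document into a pushdown word of (kind, value) symbols."""
    word = []
    for m in TOKEN.finditer(doc):
        closing, tag, text = m.group(1), m.group(2), m.group(3)
        if tag:
            word.append(("pop" if closing else "push", tag))
        elif text.strip():
            word.append(("local", text.strip()))
    return word


def satisfies_query(word: List[Tuple[str, str]]) -> bool:
    """Check the sample query with a stack that follows the open tags."""
    if word[:1] != [("push", "conference")]:
        return False
    stack = []
    sponsor_ok = city_ok = False
    for kind, value in word:
        if kind == "push":
            stack.append(value)
        elif kind == "pop":
            if not stack or stack.pop() != value:
                return False  # closing tag does not match the open tag
        else:
            # Text content: look only at the innermost open tag.
            if stack[-1:] == ["sponsor"] and value == "Microsoft":
                sponsor_ok = True
            if stack[-1:] == ["city"] and value == "San Francisco":
                city_ok = True
    return not stack and sponsor_ok and city_ok
```

The checker pushes on opening tags and pops on closing tags, exactly as a visibly pushdown automaton must, and keeps only a fixed amount of extra state (two flags).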

Page 13

Closure Properties

Closed under intersection: Given two VPAs A and B, build a product C accepting intersection of L(A) and L(B)
  State of C: (state of A, state of B)
  Stack symbol of C: (stack symbol of A, stack symbol of B)
  C can simulate the stacks of A and B together

Closed under union
Closed under complementation
Closed under concatenation and Kleene-*
Closed under partition-preserving homomorphisms

Note: can’t combine languages wrt different partitions
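
A minimal sketch of the product construction, for deterministic VPAs given as partial transition maps. The class `DVPA`, the marker `BOTTOM` for a return on an empty stack, and the two example automata are all assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Iterable, Set, Tuple

BOTTOM = "$"  # assumed marker that a return sees when the stack is empty


@dataclass
class DVPA:
    """Deterministic VPA with partial transition maps (missing entry = reject)."""
    init: Hashable
    finals: Set[Hashable]
    call: Dict[Tuple[Hashable, str], Tuple[Hashable, Hashable]]  # (q, a) -> (q', pushed)
    ret: Dict[Tuple[Hashable, Hashable, str], Hashable]          # (q, top, a) -> q'
    local: Dict[Tuple[Hashable, str], Hashable]                  # (q, a) -> q'
    push: Set[str]
    pop: Set[str]

    def accepts(self, word: Iterable[str]) -> bool:
        q, stack = self.init, []
        for a in word:
            if a in self.push:
                if (q, a) not in self.call:
                    return False
                q, pushed = self.call[(q, a)]
                stack.append(pushed)
            elif a in self.pop:
                top = stack.pop() if stack else BOTTOM
                if (q, top, a) not in self.ret:
                    return False
                q = self.ret[(q, top, a)]
            else:
                if (q, a) not in self.local:
                    return False
                q = self.local[(q, a)]
        return q in self.finals


def product(A: DVPA, B: DVPA) -> DVPA:
    """Build C with L(C) = L(A) ∩ L(B); both must use the same partition."""
    assert A.push == B.push and A.pop == B.pop, "different partitions"
    call = {((p, q), a): ((p2, q2), (g, h))
            for (p, a), (p2, g) in A.call.items()
            for (q, b), (q2, h) in B.call.items() if a == b}
    local = {((p, q), a): (p2, q2)
             for (p, a), p2 in A.local.items()
             for (q, b), q2 in B.local.items() if a == b}
    ret = {}
    for (p, g, a), p2 in A.ret.items():
        for (q, h, b), q2 in B.ret.items():
            # The two stacks have equal height, so both are empty or neither is.
            if a == b and (g == BOTTOM) == (h == BOTTOM):
                top = BOTTOM if g == BOTTOM else (g, h)
                ret[((p, q), top, a)] = (p2, q2)
    finals = {(p, q) for p in A.finals for q in B.finals}
    return DVPA((A.init, B.init), finals, call, ret, local, A.push, A.pop)


PUSH, POP = {"("}, {")"}
# Example A: even number of local symbols x (the stack is ignored).
EVEN_X = DVPA("e", {"e"},
              call={("e", "("): ("e", "g"), ("o", "("): ("o", "g")},
              ret={("e", "g", ")"): "e", ("e", BOTTOM, ")"): "e",
                   ("o", "g", ")"): "o", ("o", BOTTOM, ")"): "o"},
              local={("e", "x"): "o", ("o", "x"): "e"}, push=PUSH, pop=POP)
# Example B: no return on an empty stack.
NO_UNMATCHED = DVPA("ok", {"ok"},
                    call={("ok", "("): ("ok", "g")},
                    ret={("ok", "g", ")"): "ok"},
                    local={("ok", "x"): "ok"}, push=PUSH, pop=POP)
BOTH = product(EVEN_X, NO_UNMATCHED)
```

The single stack of pairs works only because both automata push and pop on the same symbols; with different partitions the two stacks would drift apart, which is why the assertion in `product` rejects that case.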

Page 14

Determinization

Given a nondeterministic VPA A, we can construct a deterministic VPA B that accepts the same language and whose size is exponential in the size of A

Potentially useful for building runtime monitors for checking program executions, and online algorithms for XML query processing

VPLs are a subclass of DCFLs (languages defined by deterministic PDAs)
  DCFLs not closed under union
  Equivalence problem for DCFLs decidable, but complex

Page 15

Determinization: Sketch of the construction

Determinization of nondeterministic automata uses subset construction: a state R of B is a set of states of A (the states that A can be in, having read the word w so far)

Subset construction does not apply to stack

But we can do subsets of summaries: if w is a well-matched word, (q,q’) is a summary of A on w, if A can go from (q,$) to (q’,$), where $ is stack bottom

More precisely, if w = w_1 c_1 w_2 c_2 … c_n w_{n+1}, where c_i’s are calls and w_i’s are well-matched words, then after reading w, determinized automaton B has
  Stack is (S_n, R_n, c_n), …, (S_1, R_1, c_1) $
  Control state is (S_{n+1}, R_{n+1})

R_i = Set of all states A can be in after w_1 c_1 … w_i

S_i = Set of all summaries of A on the segment w_i
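
The summary component can be computed deterministically while reading a well-matched word. The sketch below covers only that part (the S sets, not the R sets), and the dictionary encoding of the nondeterministic VPA is an assumption made here.

```python
from typing import Dict, Set, Tuple

State = str


def summaries(states: Set[State],
              call: Dict[Tuple[State, str], Set[Tuple[State, str]]],
              ret: Dict[Tuple[State, str, str], Set[State]],
              local: Dict[Tuple[State, str], Set[State]],
              word: str, push: Set[str], pop: Set[str]) -> Set[Tuple[State, State]]:
    """Return all (q, q2) such that the VPA can go from q to q2 on `word`.

    `word` must be well-matched. `call` maps (q, a) to pairs
    (next state, pushed symbol); `ret` maps (q, top symbol, a) to next states.
    """
    identity = {(q, q) for q in states}
    current = set(identity)
    pending = []  # one (summary before the call, call symbol) per open call
    for a in word:
        if a in push:
            pending.append((current, a))
            current = set(identity)  # start summarising the body of the call
        elif a in pop:
            if not pending:
                raise ValueError("word is not well-matched")
            before, c = pending.pop()
            body = current
            # Compose: segment before the call, the call itself, the body,
            # and the return that pops what the call pushed.
            current = {(q, q3)
                       for (q, q1) in before
                       for (entry, g) in call.get((q1, c), ())
                       for (p, q2) in body if p == entry
                       for q3 in ret.get((q2, g, a), ())}
        else:
            current = {(q, q2)
                       for (q, q1) in current
                       for q2 in local.get((q1, a), ())}
    if pending:
        raise ValueError("word is not well-matched")
    return current
```

The stack of `pending` entries plays the role of the stacked S components: on a call the current summary is saved, and on the matching return it is combined with the summary of the call body.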

Page 16

Decision Problems

Emptiness: Given a VPA A, is its language empty?
  Same as for PDAs: Polynomial-time complete (cubic)

Language inclusion (or equivalence): Given VPAs A and B, is language of A contained in that of B?
  Determinize B, take its complement, take product with A, and test for emptiness
  Exponential-time complete
  Recall: Inclusion is PSPACE-complete for (nondeterministic) finite automata, and undecidable for PDAs

Page 17

VPL Properties Summary

          Closed under                         Decision problems
          Union   Intersection   Complement    Emptiness   Inclusion
Regular   Yes     Yes            Yes           NLOG        Pspace
CFL       Yes     No             No            PTIME       Undec
DCFL      No      No             Yes           PTIME       Undec
VPL       Yes     Yes            Yes           PTIME       Exptime

Page 18

Pushdown Words as Binary Trees

Let w = i5 c1 i1 c2 i4 i3 i3 r2 c1 i1 r1 r1 i5 i3

[Figure: the stack tree corresponding to w]

Stack trees

Page 19

VPL: Connection to tree languages

Tree-language characterization:

Let L be a set of pushdown words and let ST(L) be the set of stack trees that correspond to L.

Theorem: L is a VPL iff ST(L) is a regular tree language

Note: It is well-known that the set of parse trees corresponding to a context-free grammar is a regular tree language

Finite word automata that can jump

Let w = i5 c1 i1 c2 i4 i3 i3 r2 c1 i1 r1 r1 i5 i3

Summary Automata Finite-state automaton that reads pushdown word While reading a call, can send a copy to matching return (q,a) is a set of pairs of states if a is in push

Nondeterministic summary automata are expressively equivalent to VPAs

Deterministic VPA (= VPL)> Deterministic summary automata > Deterministic tree automata (on stack trees)

Page 21

Robustness: Alternative Characterizations

Monadic second order logic with matching predicate
  μ(x,y) means x is a call and y is matching return
  Sample formula:
    forall x. if p(x) then exists y,z. ( q(y) and x<y<z and μ(x,z) )

Thm. MSO + matching predicate interpreted over pushdown words is expressively equivalent to VPLs
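
To make the matching predicate concrete, the sketch below computes it for a finite pushdown word and evaluates the sample formula directly. It assumes that p(x) and q(y) mean "the symbol at that position is p (resp. q)"; the function names are invented for this example.

```python
from typing import Dict, Sequence, Set


def matching(word: Sequence[str], push: Set[str], pop: Set[str]) -> Dict[int, int]:
    """Return the matching relation as {call position: return position}."""
    stack, mu = [], {}
    for i, a in enumerate(word):
        if a in push:
            stack.append(i)
        elif a in pop and stack:
            mu[stack.pop()] = i
    return mu


def sample_formula_holds(word: Sequence[str], push: Set[str], pop: Set[str],
                         p: str, q: str) -> bool:
    """forall x. p(x) -> exists y, z. q(y) and x < y < z and mu(x, z)."""
    mu = matching(word, push, pop)
    for x, a in enumerate(word):
        if a == p:
            z = mu.get(x)  # the only candidate for z is the matching return
            if z is None or not any(word[y] == q for y in range(x + 1, z)):
                return False
    return True
```

Because each call has at most one matching return, the quantifier over z collapses to a lookup, and the formula says that every p-labelled call contains a q before it returns.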

Thm: Every CFL is a homomorphic image of a VPL

Context-free grammar based characterization
  Two types of non-terminals V0 (matched words) and V1
  All productions are of the form
    X → a          if X is in V0 then a must be local
    X → a Y b Z    a is a call, b is a return, Y is in V0; if X is in V0 then Z must be in V0

Page 22

“Regular-like” properties, continued

Congruences and minimization (Myhill-Nerode Theorem) central to theory of regular languages

Given a language L, for well-matched words u and v, define u ~L v iff for all words x and y, xuy in L iff xvy in L

Theorem: A language L of well-matched words is a VPL iff the congruence ~L is of finite index

Minimization
  No unique minimal deterministic VPA in general, but…
  Minimization of (single-entry) RSMs (i.e. procedural boolean programs) possible. Partitioning into k procedures/modules is adequate to get canonicity!

Page 23

ω-VPL - Extension to Infinite Words

A Büchi VPA: VPA over infinite pushdown words
  A word is accepted if along a run, the set F is seen infinitely often

ω-VPL – class of languages accepted by Büchi VPAs

ω-VPL is closed under all Boolean operations

Characterization using regular trees and MSO characterization hold.

However, ω-VPLs are not determinizable!

Let L be set of all words such that the stack is repeatedly bounded, i.e. for some n, the stack depth is n infinitely often.
  L is an ω-VPL but there is no deterministic (Muller) VPA for it
  Language inclusion and equivalence are still decidable

Page 24

Talk Outline

Visibly Pushdown Languages

Temporal Logic CaRet and Model Checking

Ongoing Work

Page 25

Software Model Checking

[Diagram: Code → Abstractor → Model → Verifier → Yes / Counter-example; the Specification is a second input to the Verifier]

Abstractor: Predicate abstraction
Model: Control flow graph + Boolean vars (Pushdown automata)
Specification: CaRet/VPAs
Observables

Page 26

Abstracting Software

Program

int x, y;

if x>0 { ……. y=x+1 .…… }
else { …… y=x+1 …… }

Predicates

bx: x>0
by: y>0

Boolean Program

bool bx, by;

if bx { ……… by=true ……… }
else { ………… by={true,false} ………. }
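
The two branches of the Boolean program can be sanity-checked by brute force. The sketch below samples a small range of integers, which is an assumption made for illustration (it is evidence, not a proof; a real abstractor would discharge these facts with a decision procedure). The function name is hypothetical.

```python
from typing import Dict, Iterable, Set


def abstract_outcomes(xs: Iterable[int] = range(-5, 6)) -> Dict[bool, Set[bool]]:
    """Map each value of bx (x > 0) to the values by (y > 0) can take after y = x + 1."""
    outcomes: Dict[bool, Set[bool]] = {True: set(), False: set()}
    for x in xs:
        bx = x > 0
        y = x + 1
        outcomes[bx].add(y > 0)
    return outcomes
```

On the sampled range, bx true always yields by true, while bx false yields both values (x = 0 gives y = 1, x = -1 gives y = 0), which is why the else branch assigns the nondeterministic value {true,false}.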

Page 27

Abstracting Modular Programs

Program

main() {
  bool y;
  … x = P(y);
  … z = P(x);
  …
}
bool P(u: bool) {
  … return Q(u);
}
bool Q(w: bool) {
  if … else return P(~w)
}

Recursive State Machine (RSM) / Pushdown automaton

[Figure: RSM with modules A1, A2, A3; each module has entry/inputs and exit/outputs, and boxes (function calls) that refer to modules]

Page 28

Linear-time Propositional Temporal Logic

Q ::= p | not Q | Q or Q’ | Next Q | Always Q | Eventually Q | Q Until Q’

Interpreted over (infinite) sequences.
The models of an LTL formula form an ω-regular language.
Useful for stating sequencing properties:

If req happens, then req holds until it is granted: Always ( req → (req Until grant) )

An exception is never raised: Always ( not Exception )

Pages 29-34

CARET

CARET: A temporal logic for Calls and Returns
Expresses context-free properties

[Figure: an execution passing through modules A, B, C, D, with global-successor, local-successor and caller edges drawn between positions]

Global successor used by LTL

Local successor (abstract successor): Jump from calls to returns
  Otherwise global successor at the same level
  Local path

Caller modality: Jump to the caller of the current module
  Defined for every position except top-level ones
  Caller path gives the stack content!
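
The two new successor notions can be computed for a finite pushdown word in one pass. This is a sketch under assumptions: positions are list indices, a return is taken to sit at the level of its matching call, and a position whose next symbol is a return has no local successor. The function name is hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple


def caret_successors(word: Sequence[str], push: Set[str],
                     pop: Set[str]) -> Tuple[List[Optional[int]], List[Optional[int]]]:
    """Return (local_succ, caller), each indexed by position (None = undefined)."""
    n = len(word)
    match = {}                               # call position -> matching return
    caller: List[Optional[int]] = [None] * n
    stack: List[int] = []                    # positions of pending calls
    for i, a in enumerate(word):
        if a in pop and stack:
            match[stack.pop()] = i           # the return leaves the callee first
        caller[i] = stack[-1] if stack else None
        if a in push:
            stack.append(i)                  # later positions are inside this call

    local_succ: List[Optional[int]] = [None] * n
    for i, a in enumerate(word):
        if a in push:
            local_succ[i] = match.get(i)     # jump from the call to its return
        elif i + 1 < n and word[i + 1] not in pop:
            local_succ[i] = i + 1            # global successor at the same level
    return local_succ, caller
```

Following `caller` repeatedly from a position lists the pending calls from innermost to outermost, which is the stack content at that position.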

Page 35

CARET Definition

Syntax:
Q ::= p | not Q | Q or Q’
    | Next Q | Always Q | Eventually Q | Q Until Q’
    | Local-Next Q | Local-Always Q | Local-Eventually Q | Q Local-Until Q’
    | Caller Q | CallerPath-Always Q | CallerPath-Eventually Q | Q CallerPath-Until Q’

Local- and Caller- versions of all temporal operators
All these operators can be nested

Page 36

Expressing properties in Caret

Pre-post conditions: If P holds when A is called, then Q must hold when the call returns

Always ( (P and call-to-A) → Local-Next Q )

[Figure: a call to A, with P holding at the call and Q holding at the matching return]

Integrating Manna/Pnueli-style reasoning for reactive computations with Hoare-style reasoning for structured programs
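
On a finite trace, the pre/post property can be monitored with a stack that remembers, for each pending call, whether Q is owed at its return. This is a sketch under assumptions: a trace is a list of (kind, propositions) pairs with kind in call/ret/int, and a call that never returns is counted as a violation because its local successor is undefined. The function name is hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def check_pre_post(trace: Sequence[Tuple[str, Set[str]]]) -> bool:
    """Check Always((P and call-to-A) -> Local-Next Q) on a finite trace."""
    owed: List[bool] = []  # per pending call: must Q hold at its return?
    for kind, props in trace:
        if kind == "call":
            owed.append({"P", "call-to-A"} <= props)
        elif kind == "ret" and owed:
            if owed.pop() and "Q" not in props:
                return False
    # A pending call with P has no matching return, so Local-Next Q fails.
    return not any(owed)
```

The monitor pushes on calls and pops on returns, so it is itself a visibly pushdown automaton over the same partition as the program.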

Page 37

Expressing properties in Caret

If A is called with low priority, then it cannot access the file

Always ( (call-to-A and low-priority) → Local-Always ( not access-file ) )

[Figure: a call to A with low-priority, and a call to A with high-priority in which access-file occurs]

Page 38

Expressing properties in Caret

Stack inspection properties

If a variable x is accessed, then A must be on the call stack

Always ( access-to-x → CallerPath-Eventually call-to-A )

[Figure: a position labelled access-to-x nested inside a pending call to A]
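
The stack inspection property can be monitored on a finite trace in the same style. The trace encoding (a list of (kind, propositions) pairs) and the function name are assumptions made here, and CallerPath-Eventually is read as including the current position.

```python
from typing import List, Sequence, Set, Tuple


def check_stack_inspection(trace: Sequence[Tuple[str, Set[str]]]) -> bool:
    """Check Always(access-to-x -> CallerPath-Eventually call-to-A) on a finite trace."""
    pending: List[bool] = []  # per pending call: was it a call to A?
    for kind, props in trace:
        if kind == "ret" and pending:
            pending.pop()     # a return sits at the level of its call
        on_caller_path = "call-to-A" in props or any(pending)
        if "access-to-x" in props and not on_caller_path:
            return False
        if kind == "call":
            pending.append("call-to-A" in props)
    return True
```

The stack of flags is the caller path of the current position, so the check never needs to look at anything other than the exposed calls and returns.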

Page 39

Model checking CARET

Given:
  A (boolean) recursive state machine / visibly pushdown automaton M
  A CARET formula Q
Model-checking: Do all runs of M satisfy the specification Q?

CARET can be model-checked in time that is polynomial in M and exponential in Q.

|M|^3 · 2^O(|Q|)

Complexity class same as that for LTL!
Generalization of Vardi-Wolper construction

Page 40

Model-checking CARET: Intuition

The specification matches calls and returns of the program, so the runs of the program and models of the formula are both visibly pushdown languages

Given M and formula Q,
  Build a Büchi pushdown automaton that accepts words exhibited by M that satisfy (not Q)
  Check this pushdown automaton for emptiness
  Construction builds on the classical tableaux for LTL

[Figure: at a call in state s with obligation Local-Next Q1, push s and Q1; at the matching return, pop s and Q1 and check Q1]

Page 41

Conclusions and Ongoing Work

Exposing calls and returns lets you hide the stack!

VPLs seem robust and adequate to model software analysis problems

VPL-triggered research
  Dynamic logic with VPL (Loding, Serre)
  Visibly pushdown games (Loding, Madhusudan, Serre)
  XML query processing (Pitcher)
  Third-order Algol with iteration (Murawski, Walukiewicz)

Active area of current research
  DTDs, XML, and query languages
  Branching-time logics, Fixpoint calculus, and visibly pushdown tree automata (Alur, Chaudhuri, Madhusudan)
  Expressive completeness of temporal operators
  Implementing a model checker for VPL monitors