Acta Informatica 18, 265-287 (1982)

© Springer-Verlag 1982

A Denotational Framework for Data Flow Analysis

Flemming Nielson

Department of Computer Science, James Clerk Maxwell Building, The King's Buildings, Mayfield Road, Edinburgh EH9 3JZ, GB

Summary. It is shown how to express data flow analysis in a denotational framework by means of abstract interpretation. A continuation style formulation naturally leads to the MOP (Meet Over all Paths) solution, whereas a direct style formulation leads to the MFP (Maximal Fixed Point) solution.

0. Introduction

In this paper data flow analysis is treated from a semantic point of view. Data flow analysis is formulated in a denotational framework and in doing so the method of abstract interpretation is used.

Data flow analysis ([1, 7, 15]) associates properties (data flow information) with program points. The intention is that a property associated with some point must be satisfied in every execution of the program. This is because data flow information is usually used for applications like transforming a program to improve its efficiency. Therefore semantic considerations are needed to ensure the safety of using the data flow information, e.g. to assure that the program transformation is meaning preserving. Data flow analysis is usually treated in an operational approach that is not syntax-directed, although there are exceptions (e.g. [13]). In some papers, e.g. [3] and [2], the semantics of data flow analysis is considered in this setting.

Here a denotational approach will be used where semantic functions are "homomorphisms" from syntax to denotations. The motivation for doing so is two-fold. First, the relative merits of an operational versus a denotational approach are not obvious. Previous formulations of data flow analysis in a denotational setting ([5, 4]) have been more ad hoc than the existing operational methods. Also the connection with the usual solutions (MFP and MOP) has not been established. In this paper these two issues are addressed. A second motivation for the denotational approach is to use the data flow information to validate program transformations. The literature on data flow analysis and "program optimization" hardly considers the issue at all. It turns out ([12, 11]) that the approach to data flow analysis developed in this paper forms a suitable platform for doing so.

In Sect. 2 it is described how to systematically develop non-standard denotational semantics specifying data flow information. This is achieved by a series of non-standard semantics leading to a formulation in continuation style. In Sect. 3 it is shown that this formulation yields the MOP ("meet over all paths") solution. Also the MFP ("maximal fixed point") solution is considered and it is shown how it can be obtained using a direct style formulation. Section 4 contains the conclusions.

1. Preliminaries

This paper builds on concepts from denotational semantics and data flow analysis (including abstract interpretation). The presupposed knowledge of data flow analysis is modest and some of the essential concepts will be reviewed before they are used. In defining semantic equations the notation of [14] and [10] is used, although the domains are not complete lattices but cpo's (as in [9]). Complete lattices are used when data flow analysis is specified. Below some fundamental notions and non-standard notation (⊑, ⊔, ⊓, -t>, -m>, -c>, -a>) are explained.

A partially ordered set (S, ⊑) is a set S with partial order ⊑, i.e. ⊑ is a reflexive, antisymmetric and transitive relation on S. For S' ⊆ S there may exist a (necessarily unique) least upper bound ⊔S' in S such that ∀s ∈ S: (s ⊒ ⊔S' ⟺ ∀s' ∈ S': s ⊒ s'). When S' = {s1, s2} one often writes s1 ⊔ s2 instead of ⊔S'. Dually a greatest lower bound ⊓S' may exist. A non-empty subset S' ⊆ S is a chain if S' is countable and s1, s2 ∈ S' ⟹ (s1 ⊑ s2 ∨ s2 ⊑ s1). An element s ∈ S is maximal if ∀s' ∈ S: (s' ⊒ s ⟹ s' = s). A partially ordered set is a cpo if it has a least element ⊥ and any chain has a least upper bound. It is a complete lattice if all ⊔S' and ⊓S' exist; then there also is a greatest element (⊤ = ⊔S). The word domain will be used both for cpo's and complete lattices, and elements of some domain S are denoted s, s', s1 etc. A domain is flat if any chain contains at most 2 elements, and is of finite height if any chain is finite.

Domains N, Q and T are flat cpo's of natural numbers, quotations and truth values. From cpo's S1, ..., Sn one can construct the separated sum S1 + ... + Sn. This is a cpo with a new least element and injection functions "in Si", enquiry functions "E Si" and projection functions "| Si". The cartesian product S1 × ... × Sn is a cpo with selection functions ↓i. The cpo S* of lists is {⟨⟩} + S + (S × S) + .... Function # yields the length of a list, function †i removes the first i elements and § concatenates lists. The complete lattice S° is obtained from a set S by adjoining least and greatest elements. By ℘(S) is meant the power-set of S with set inclusion as partial order. Sometimes a set is regarded as a partially ordered set whose partial order is equality.

All functions are assumed to be total. For partially ordered sets S and S' the set of (total) functions from S to S' is denoted S -t> S'. The set of monotone (isotone) functions from S to S' is denoted S -m> S' and consists of those f ∈ S -t> S' that satisfy s1 ⊑ s2 ⟹ f(s1) ⊑ f(s2). A function f ∈ S -t> S' is continuous if f(⊔S'') = ⊔{f(s) | s ∈ S''} holds for any chain S'' ⊆ S whose least upper bound exists. The set of continuous functions from S to S' is denoted by S -c> S'. A function f ∈ S -t> S' is additive (a complete-⊔-morphism) if f(⊔S'') = ⊔{f(s) | s ∈ S''} for any subset S'' ⊆ S whose least upper bound exists. The set of additive functions from S to S' is denoted by S -a> S'. Any subset of S -t> S' is partially ordered by f1 ⊑ f2 ⟺ ∀s ∈ S: f1(s) ⊑ f2(s), and if S' is a cpo or complete lattice the same holds for S -t> S', S -m> S', S -c> S', S -a> S'.

An element s ∈ S is a fixed point of f ∈ S -t> S if f(s) = s. When S is partially ordered it is the least fixed point provided it is a fixed point and s' = f(s') ⟹ s ⊑ s'. For S a complete lattice and f ∈ S -m> S the least fixed point always exists and is given by LFP(f) = ⊓{s | f(s) ⊑ s}. If f ∈ S -c> S then LFP(f) = FIX(f) where FIX(f) = ⊔{fⁿ(⊥) | n ≥ 0}. If S is only a cpo but f ∈ S -c> S then FIX(f) still is the least fixed point of f.
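For lattices of finite height this characterisation also gives a way of computing least fixed points: iterate f from ⊥ until the value stabilises. The following is a minimal sketch of that iteration, not taken from the paper; the function name lfp and the three-point example lattice are illustrative assumptions.

```haskell
-- Minimal sketch: Kleene iteration computing FIX(f) = ⊔ { f^n(⊥) | n ≥ 0 }
-- for a monotone f on a poset of finite height.
module Lfp where

-- Iterate f from the least element until a fixed point is reached.
lfp :: Eq a => a -> (a -> a) -> a
lfp bottom f = go bottom
  where
    go x = let x' = f x in if x' == x then x else go x'

-- Example: a three-point lattice Bot ⊑ Mid ⊑ Top.
data L = Bot | Mid | Top deriving (Eq, Show)

-- A monotone function on L; its least fixed point is Mid.
step :: L -> L
step Bot = Mid
step Mid = Mid
step Top = Top

main :: IO ()
main = print (lfp Bot step)   -- prints Mid
```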

The symbol == denotes a continuous equality predicate (S × S -c> T), whereas = is reserved for true equality. So true == ⊥ is ⊥ whereas true = ⊥ is false. When S is of finite height it is assumed that s1 == s2 is ⊥ if one of s1, s2 is non-maximal and equals s1 = s2 otherwise. Similarly >> is the continuous extension of ≥ (the predicate 'greater than or equal to' on the integers). The conditional t → s1, s2 is s1, s2 or ⊥ depending on whether t is true, false or ⊥. By f[y/x] is meant λz. z == x → y, f(z). Braces { and } are used to construct sets and to enclose continuations but the context should make clear which is intended.

2. The Framework

In this section it is shown how data flow analysis can be expressed in a denotational framework. The development is performed for a toy language and consists of defining a series of non-standard semantics:

Fig. 1. (The figure shows the series of non-standard semantics sto, col, sts and ind.)

All the semantics are in continuation style. Semantics ind is the desired formulation of data flow analysis. Semantics sto is a store semantics [10] that is taken to be the canonical definition of the language considered. It will later be motivated why store semantics and continuation style are used.

The toy language consists of commands (syntactic category Cmd), expressions (Exp), identifiers (Ide), operators (Ope) and basic values (Bas). The syntax of commands and expressions can be deduced from Table 1, which will be explained shortly. The syntax of the remaining syntactic categories is left unspecified. The syntactic categories are viewed as being sets or partially ordered sets whose partial order is equality.

Store Semantics

The store semantics (sto) is given by Tables 1, 2 and 3 together. The semantic functions 𝒞 (for commands) and ℰ (for expressions) are defined in Table 1. The domains and auxiliary functions needed in Table 1 are defined in Tables 2 and 3.

Table 1. Semantic functions

𝒞 ∈ Cmd -c> C -c> C

𝒞⟦cmd1; cmd2⟧ c =
  𝒞⟦cmd1⟧ {𝒞⟦cmd2⟧ {c}}

𝒞⟦ide := exp⟧ c =
  ℰ⟦exp⟧ {assign ⟦ide⟧ {c}}

𝒞⟦if exp then cmd1 else cmd2 fi⟧ c =
  ℰ⟦exp⟧ {cond(𝒞⟦cmd1⟧ {c}, 𝒞⟦cmd2⟧ {c})}

𝒞⟦while exp do cmd od⟧ c =
  FIX(λc'. ℰ⟦exp⟧ {cond(𝒞⟦cmd⟧ {c'}, c)})

𝒞⟦write exp⟧ c =
  ℰ⟦exp⟧ {write {c}}

𝒞⟦read ide⟧ c =
  read {assign ⟦ide⟧ {c}}

ℰ ∈ Exp -c> C -c> C

ℰ⟦exp1 ope exp2⟧ c =
  ℰ⟦exp1⟧ {ℰ⟦exp2⟧ {apply ⟦ope⟧ {c}}}

ℰ⟦ide⟧ c =
  content ⟦ide⟧ {c}

ℰ⟦bas⟧ c =
  push ⟦bas⟧ {c}

Table 2. Some domains and auxiliary functions

Domains
Val = T + N + ... + {"nil"}      values
Env = Ide -c> Val                environments
Inp = Val*                       inputs
Out = Val*                       outputs
Tem = Val*                       temporary result stacks
Sta = Env × Inp × Out × Tem      states
Occ = N*                         occurrences
Pla = Occ × Q                    places

Auxiliary functions
apply ⟦ope⟧ ∈ C -c> C
apply ⟦ope⟧ = do(V apply ⟦ope⟧, B apply ⟦ope⟧)

V apply ⟦ope⟧ ∈ Sta -c> T   ("verify")
V apply ⟦ope⟧ = λ⟨env, inp, out, tem⟩. #tem >> 2

B apply ⟦ope⟧ ∈ Sta -c> Sta   ("body")
B apply ⟦ope⟧ = λ⟨env, inp, out, tem⟩. ⟨env, inp, out, ⟨𝒪⟦ope⟧(tem↓2, tem↓1)⟩ § tem†2⟩

assign ⟦ide⟧, content ⟦ide⟧, push ⟦bas⟧, read, write defined similarly

Auxiliary functions used in the conditional
V cond ∈ Sta -c> T   ("verify")
V cond = λ⟨env, inp, out, tem⟩. #tem >> 1 → tem↓1 E T, false

S cond ∈ Sta -c> T   ("select")
S cond = λ⟨env, inp, out, tem⟩. (tem↓1) | T

B cond ∈ Sta -c> Sta   ("body")
B cond = λ⟨env, inp, out, tem⟩. ⟨env, inp, out, tem†1⟩

Table 3. Interpretation sto

Domains
S = Sta               states
A = Out + {"error"}   answers
C = S -c> A           continuations

Constant
fin ∈ C
fin = λ⟨env, inp, out, tem⟩. out in A

Auxiliary functions
cond ∈ C × C -c> C
cond(c1, c2) = λsta. V cond(sta) → [S cond(sta) → c1, c2](B cond sta), "error" in A

attach ∈ Pla -c> C -c> C
attach(pla) c = c

do ∈ (Sta -c> T) × (Sta -c> Sta) -c> C -c> C
do(Vg, Bg) c = λsta. Vg(sta) → c(Bg sta), "error" in A

(For the present ignore domains Occ and Pla and function attach.) The definition of apply (and push) makes use of the unspecified semantic function 𝒪 for operators (and ℬ for basic values). Domain Sta is the domain of states. The connection between identifiers and their values is established without using locations but this is not important for the development to go through. The component Tem is used to hold temporary results arising during evaluation of expressions. This is done by the ordinary method of evaluating expressions on a stack. As an example consider apply⟦ope⟧(c)⟨env, inp, out, tem⟩. If two temporary results are on top of tem they are replaced by the result of applying the operator and the new state is supplied to c. If tem is not of the expected form an error occurs. The reader acquainted with store semantics [10] should find it straightforward to supply the omitted definitions.

Table 3 is labelled "interpretation sto" because it is essentially by supplying replacements for Table 3 that the remaining semantics (col, sts and ind) are defined. To indicate which semantics is meant, a suffix may be used, e.g. 𝒞 sto. So 𝒞 sto⟦cmd⟧ (fin-sto) ⟨λide. "nil" in Val, inp, ⟨⟩, ⟨⟩⟩ specifies the effect of executing program cmd with input inp. For Tables 1, 2 and 3 it is presumably obvious that the functionalities shown are correct. For the remaining semantics the proofs of correctness of functionalities are straightforward and therefore omitted.
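To make the continuation-style pattern of Tables 1-3 concrete, here is a small executable sketch in Haskell. It is an illustration under simplifying assumptions, not a transcription of the paper's equations: expressions are evaluated directly rather than via a temporary-result stack, there is no input or output component, and all names (Cmd, State, cmd, ...) are chosen for the example.

```haskell
-- Sketch of a continuation-style store semantics for a toy fragment
-- (assignment, sequencing, while), in the spirit of Tables 1-3.
module StoreSem where

import qualified Data.Map as M

type Ide   = String
data Exp   = Lit Int | Var Ide | Add Exp Exp
data Cmd   = Assign Ide Exp | Seq Cmd Cmd | While Exp Cmd

type State = M.Map Ide Int          -- plays the role of Sta (no Inp/Out/Tem)
data Ans   = Ok State | Error       -- plays the role of A
             deriving Show
type Cont  = State -> Ans           -- continuations, C = S -c> A

eval :: Exp -> State -> Int
eval (Lit n)     _ = n
eval (Var x)     s = M.findWithDefault 0 x s
eval (Add e1 e2) s = eval e1 s + eval e2 s

-- Semantic function for commands, in continuation style.
cmd :: Cmd -> Cont -> Cont
cmd (Assign x e) c = \s -> c (M.insert x (eval e s) s)
cmd (Seq c1 c2)  c = cmd c1 (cmd c2 c)
cmd (While e b)  c = fix (\c' s -> if eval e s /= 0
                                   then cmd b c' s
                                   else c s)
  where fix f = f (fix f)           -- FIX, well defined by laziness

main :: IO ()
main = print (cmd prog Ok M.empty)  -- Ok (fromList [("i",0),("x",3)])
  where prog = Seq (Assign "i" (Lit 3))
                   (Seq (Assign "x" (Lit 0))
                        (While (Var "i")
                               (Seq (Assign "x" (Add (Var "x") (Lit 1)))
                                    (Assign "i" (Add (Var "i") (Lit (-1)))))))
```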

Collecting Semantics

For data flow analysis purposes an important concept is that of associating information with a program point. This concept is not expressed in the store semantics and it is therefore convenient to add it to sto, yielding a collecting semantics (col).


A program point can be specified by a tuple ⟨occ, q⟩ ∈ Pla. All the tuples to be considered will be maximal (with respect to the partial ordering). Occurrences (like occ) are used to label nodes of the parse tree. The root is labelled ⟨⟩ and the i'th son of a node labelled occ is labelled occ § ⟨i⟩. The quotation q is useful for specifying whether the program point is to the left or to the right of the node:

Fig. 2. (The figure shows the program points around a parse-tree node labelled occ, e.g. ⟨occ, "L"⟩, ⟨occ, "R"⟩ and ⟨occ § ⟨i⟩, "R"⟩.)

In the usual view of parse-trees the nodes are not labelled by occurrences. To be able to let the semantic equations associate information with program points (represented by tuples like ⟨occ, q⟩) it is necessary to supply the appropriate occurrence as an additional argument to the semantic functions 𝒞 and ℰ. In this way occurrences are used much like the positions of [6]. Furthermore the semantic equations must be augmented with functions associating information with the program points. The function attach ⟨occ, q⟩ is used for associating information with the program point specified by ⟨occ, q⟩. The result of performing these changes is sketched in Table 4. The proofs of Theorems 4 and 5 benefit from the chosen placement of attach. Because of attach in Table 3 the store semantics can be defined using Table 4 instead of Table 1.

In the collecting semantics (Tables 4, 2 and 5) the information to be associated with a program point is the set of states the program can be in whenever control reaches that point. Domain A-col = Pla -c> ℘(Sta) is used for that and attach-col(pla) is defined accordingly. This choice of A is not the only possibility, e.g. A = (Pla × Sta)* of [5] can be used instead. While this may give more information it is not needed for a large class of data flow analyses (including those usually handled by means of abstract interpretation). It will later be discussed why the continuations (C-col) are not continuous.
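As a minimal sketch of this choice of answer domain, the code below lets continuations return a map from places to sets of states, and lets attach join the current state into the entry for its place. The place and state representations are simplified stand-ins, not the paper's domains.

```haskell
-- Sketch of the collecting idea: A = Pla -c> P(Sta), cf. Table 5.
module Collecting where

import qualified Data.Map as M
import qualified Data.Set as S

type Pla   = String                 -- e.g. "<(),L>", "<(),R>"
type State = Int
type Ans   = M.Map Pla (S.Set State)
type Cont  = State -> Ans

-- attach(pla) c = \sta -> c(sta) ⊔ ⊥[{sta}/pla]
attach :: Pla -> Cont -> Cont
attach pla c = \sta -> M.insertWith S.union pla (S.singleton sta) (c sta)

-- do(Vg, Bg) c = \sta -> if Vg sta then c (Bg sta) else ⊥
doF :: (State -> Bool) -> (State -> State) -> Cont -> Cont
doF vg bg c = \sta -> if vg sta then c (bg sta) else M.empty

fin :: Cont
fin = const M.empty

main :: IO ()
main = print (attach "entry" (doF (const True) (+ 1) (attach "exit" fin)) 41)
-- fromList [("entry",fromList [41]),("exit",fromList [42])]
```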

Intuitively sto and col are closely related, because essentially only domain A is different. A connection is formally expressed by:

Theorem 1. For any sta ∈ Sta, cmd ∈ Cmd and out ∈ Out:

𝒞 sto⟦cmd⟧ ⟨⟩ fin-sto sta = out in A
⟺ ∃sta' ∈ Sta: sta'↓4 = out ∧ 𝒞 col⟦cmd⟧ ⟨⟩ fin-col sta ⟨⟨⟩, "R"⟩ = {sta'}

Table 4. Modified semantic functions

𝒞 ∈ Cmd -c> Occ -c> C -c> C

𝒞⟦if exp then cmd1 else cmd2 fi⟧ occ c =
  attach ⟨occ, "L"⟩ {
    ℰ⟦exp⟧ occ § ⟨1⟩ {
      cond(𝒞⟦cmd1⟧ occ § ⟨2⟩ {attach ⟨occ, "R"⟩ {c}}
          , 𝒞⟦cmd2⟧ occ § ⟨3⟩ {attach ⟨occ, "R"⟩ {c}})}}

𝒞⟦while exp do cmd od⟧ occ c =
  attach ⟨occ, "L"⟩ {
    FIX(λc'. ℰ⟦exp⟧ occ § ⟨1⟩ {
      cond(𝒞⟦cmd⟧ occ § ⟨2⟩ {c'}
          , attach ⟨occ, "R"⟩ {c})})}

the remaining clauses are changed similarly to the one for exp1 ope exp2

ℰ ∈ Exp -c> Occ -c> C -c> C

ℰ⟦exp1 ope exp2⟧ occ c =
  attach ⟨occ, "L"⟩ {
    ℰ⟦exp1⟧ occ § ⟨1⟩ {
      ℰ⟦exp2⟧ occ § ⟨3⟩ {
        apply ⟦ope⟧ {
          attach ⟨occ, "R"⟩ {c}}}}}

the remaining clauses are changed similarly to the one for exp1 ope exp2

Table 5. Interpretation col

Domains
S = Sta
A = Pla -c> ℘(Sta)
C = S -t> A

Constant
fin ∈ C
fin = ⊥

Auxiliary functions
cond ∈ C × C -c> C
cond(c1, c2) = λsta. V cond(sta) → [S cond(sta) → c1, c2](B cond sta), ⊥

attach ∈ Pla -c> C -c> C
attach(pla) c = λsta. c(sta) ⊔ ⊥[{sta}/pla]

do ∈ (Sta -c> T) × (Sta -c> Sta) -t> C -c> C
do(Vg, Bg) c = λsta. Vg(sta) → c(Bg sta), ⊥

Proof. The proof is by structural induction [14] making use of the predicate P-C ∈ C-sto × C-col -t> {true, false} defined by

P-C(c-sto, c-col) = ∀sta ∈ Sta: ∀out ∈ Out:
[c-sto(sta) = out in A ⟺ ∃sta' ∈ Sta: sta'↓4 = out ∧ c-col(sta)(⟨⟨⟩, "R"⟩) = {sta'}].

By structural induction on cmd (and likewise exp) it is shown that

P-C(c-sto, c-col) ∧ occ ≠ ⟨⟩
⟹ P-C(𝒞 sto⟦cmd⟧ occ c-sto, 𝒞 col⟦cmd⟧ occ c-col).

Similarly, by case analysis of cmd it can be shown that

P-C(𝒞 sto⟦cmd⟧ ⟨⟩ fin-sto, 𝒞 col⟦cmd⟧ ⟨⟩ fin-col)

using that

P-C(attach-sto(⟨⟨⟩, "R"⟩) {fin-sto}, attach-col(⟨⟨⟩, "R"⟩) {fin-col}).

The interesting case of the proof is that of the while loop. Abbreviate (for typographical reasons g_{c1} is written g[c1])

g[c1] = λc2. ℰ⟦exp⟧ occ § ⟨1⟩ {cond(𝒞⟦cmd⟧ occ § ⟨2⟩ {c2}, c1)}

and assume P-C(c1-sto, c1-col). It is to be shown that

P-C(FIX(g-sto[c1-sto]), FIX(g-col[c1-col])).

It is easy to establish ∀n ≥ 0: P-C((g-sto[c1-sto])ⁿ ⊥, (g-col[c1-col])ⁿ ⊥) but this does not immediately yield the result. The result follows from

∀sta ∃n0 ∀n ≥ n0: (g-sto[c1-sto])ⁿ ⊥ sta = (⊔{(g-sto[c1-sto])ⁿ ⊥ | n ≥ 0}) sta

and a similar equation for col. To establish the above equation it suffices to show (dropping the suffix sto)

∀sta: [(g[c1])ⁿ ⊥ sta ≠ ⊥ ⟹ ∀c2: (g[c1])ⁿ c2 sta = (g[c1])ⁿ ⊥ sta].

This is because c2 = g[c1] ⊥ implies that (g[c1])ⁿ c2 is (g[c1])ⁿ⁺¹ ⊥. The proof is by induction in n and the case n = 0 is trivial so consider the induction step. It is straightforward to see that (g[c1])ⁿ⁺¹(c2)(sta), independently of c2, is either ⊥, "error" in A, c1(sta') or (g[c1])ⁿ(c2)(sta'') where sta' and sta'' are independent of c2. In the first 3 cases the equation is immediate and in the latter case it follows from the induction hypothesis for n. □

Example. Consider analysing the program x := 1 with some input inp ∈ Inp. Abbreviate env = λide. "nil" in Val and a-col = 𝒞 col⟦x := 1⟧ ⟨⟩ fin-col ⟨env, inp, ⟨⟩, ⟨⟩⟩. The result of analysing x := 1 is a-col, and

a-col = attach ⟨⟨⟩, "L"⟩ {
  attach ⟨⟨1⟩, "L"⟩ {push ⟦1⟧ {attach ⟨⟨1⟩, "R"⟩ {
    assign ⟦x⟧ {attach ⟨⟨⟩, "R"⟩ {fin}}}}}} ⟨env, inp, ⟨⟩, ⟨⟩⟩

This means that

a-col ⟨⟨⟩, "L"⟩ = a-col ⟨⟨1⟩, "L"⟩ = {⟨env, inp, ⟨⟩, ⟨⟩⟩}
a-col ⟨⟨1⟩, "R"⟩ = {⟨env, inp, ⟨⟩, ⟨1 in Val⟩⟩}
a-col ⟨⟨⟩, "R"⟩ = {⟨env[1 in Val/x], inp, ⟨⟩, ⟨⟩⟩}

and a-col(pla) = ∅ otherwise. □


Static Semantics

There are two ways in which the collecting semantics is not the desired formulation of data flow analysis. One is that the program is only executed for one particular input rather than a set of inputs (e.g. Inp). This is remedied by the static semantics (sts) to be considered below. Another is that sets of states are considered rather than approximate descriptions of sets of states. The latter will be remedied by the induced semantics.

In the static semantics (Tables 4, 2 and 6) domain S and functions do and cond have been changed. The intention with do(Vg, Bg)(c) is to supply to c a set of transformed states. Excluded from consideration are states that would not be supplied to the continuation in the collecting semantics. The conditional cond is modified so as to "traverse" both the true and false branch with an appropriate set of states. The partial answers from the two branches are then combined. The connection between col and sts is expressed by the theorem below. A more or less similar result appears in [2] but there for an operational semantics.

Theorem 2. For all cmd ∈ Cmd and s ∈ ℘(Sta):

𝒞 sts⟦cmd⟧ ⟨⟩ fin-sts s = ⊔{𝒞 col⟦cmd⟧ ⟨⟩ fin-col sta | sta ∈ s}

Proof. The proof is by structural induction making use of the predicate P-C ∈ C-col × C-sts -t> {true, false} defined by P-C(c-col, c-sts) = ∀s ∈ ℘(Sta): c-sts(s) = ⊔{c-col(sta) | sta ∈ s}. By structural induction on cmd (and likewise exp) it is shown that P-C(c-col, c-sts) ⟹ P-C(𝒞 col⟦cmd⟧ occ c-col, 𝒞 sts⟦cmd⟧ occ c-sts). The proof of the while case makes use of the fact that for a complete lattice ⊔{⊔{x_{i,j} | i ∈ I} | j ∈ J} = ⊔{⊔{x_{i,j} | j ∈ J} | i ∈ I} which follows from the fact [14] that ⊔{⊔{x_{i,j} | i ∈ I} | j ∈ J} = ⊔(∪{{x_{i,j} | i ∈ I} | j ∈ J}).

A more detailed proof of this theorem, as well as Theorems 3 and 4 later, can be found in [11] in a slightly different notation. □

Table 6. Interpretation sts

Domains
S = ℘(Sta)
A = Pla -c> ℘(Sta)
C = S -a> A

Constant
fin ∈ C
fin = ⊥

Auxiliary functions
cond ∈ C × C -c> C
cond(c1, c2) = λs. c1({B cond(sta) | V cond(sta) = true ∧ S cond(sta) = true ∧ sta ∈ s})
               ⊔ c2({B cond(sta) | V cond(sta) = true ∧ S cond(sta) = false ∧ sta ∈ s})

attach ∈ Pla -c> C -c> C
attach(pla) c = λs. c(s) ⊔ ⊥[s/pla]

do ∈ (Sta -c> T) × (Sta -c> Sta) -t> C -c> C
do(Vg, Bg) c = λs. c({Bg(sta) | Vg(sta) = true ∧ sta ∈ s})


It appears to be mandatory to work from a store semantics in order for this theorem to hold. (In an approach based on a standard semantics presumably only ⊒ holds [11].) The techniques of [10] can be used to transform a standard semantics into a store semantics.

The Method of Abstract Interpretation

Data flow analysis is almost always expressed in terms of approximate descriptions of sets (of states or values) rather than the sets themselves. It is necessary to work with approximate information if data flow analysis is to be carried out automatically, because otherwise the data flow information might not be computable. The approximate data flow information must be "safe", i.e. describe a set that is not smaller than the precise set.

Example (preparing for constant propagation). Define the complete lattice Ṽal = {val ∈ Val | val is maximal}°. An element of Ṽal is to describe a set of values, i.e. an element of ℘(Val). The method of abstract interpretation ([3, 2]) makes use of a concretization function γv ∈ Ṽal -m> ℘(Val) to express the subset of Val that some element of Ṽal describes. A natural choice is γv(⊥) = ∅, γv(⊤) = Val and γv(val) = {val} otherwise.

The abstraction function αv ∈ ℘(Val) -m> Ṽal can be used to approximate sets of values. It is natural to define αv(v) = ⊓{val | γv(val) ⊇ v} because then αv(v) is the "best" approximation of the set v of values. As an example αv({true}) = true and αv({true, false}) = ⊤. □
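This value lattice and the pair (αv, γv) can be sketched executably. The code below assumes, purely for illustration, that basic values are integers, and represents γv by a membership test since γv(⊤) = Val is infinite; all names are chosen for the example.

```haskell
-- Sketch of the flat description lattice Ṽal and its abstraction function,
-- with values taken to be Ints.
module ConstLattice where

import qualified Data.Set as S

data AbsVal = Bot | Known Int | Top deriving (Eq, Show)

-- Least upper bound in the flat lattice.
lub :: AbsVal -> AbsVal -> AbsVal
lub Bot v   = v
lub v   Bot = v
lub (Known a) (Known b) | a == b = Known a
lub _ _     = Top

-- αv(v) = ⊓ { val | γv(val) ⊇ v }: the best description of a finite set of values.
alphaV :: S.Set Int -> AbsVal
alphaV = S.foldr (lub . Known) Bot

-- γv: Bot describes no value, Known n describes {n}, Top describes every value.
inGammaV :: Int -> AbsVal -> Bool
inGammaV _ Bot       = False
inGammaV n (Known m) = n == m
inGammaV _ Top       = True

main :: IO ()
main = do
  print (alphaV (S.fromList [1]))       -- Known 1
  print (alphaV (S.fromList [1, 2]))    -- Top
  print (inGammaV 7 Top)                -- True
```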

It is useful when the concretization function (γ) and the abstraction function (α) are related as described by one of the two concepts defined below.

Definition. (α, γ) is a pair of semi-adjoined functions between partially ordered sets L and M iff

(i) α ∈ L -m> M and γ ∈ M -m> L,
(ii) γ ∘ α ⊒ λl. l.

Furthermore, (α, γ) is a pair of adjoined functions [3] between L and M iff in addition to (i) and (ii) also

(iii) α ∘ γ ⊑ λm. m. □

The pair (αv, γv) of the example is a pair of adjoined functions between complete lattices ℘(Val) and Ṽal. Condition (ii) expresses the intention that a set of values is approximated by a not smaller set of values so that one can only make "errors on the conservative side" [1]. Condition (i) relates the partial orderings so that obtaining more information in ℘(Val) = L corresponds to obtaining more information in Ṽal = M and vice versa. Condition (iii) is usually satisfied but it is not needed for the substance of this development. An analysis of the concepts semi-adjoined and adjoined is given in [11]. Whenever γ ∈ M -m> L satisfies ∀M' ⊆ M: γ(⊓M') = ⊓{γ(m) | m ∈ M'} there is a unique α ∈ L -m> M so that (α, γ) is a pair of adjoined functions. If (α, γ) is a pair of adjoined functions (between complete lattices L and M) then α = λl. ⊓{m | γ(m) ⊒ l}, α is additive and γ satisfies the condition displayed above [3].


There is an important difference between Val on the one hand and Ṽal and ℘(Val) on the other. The partial order of Val means "less defined than" (in the sense of Scott), so a natural condition to impose is that the domains must be cpo's. The partial orders of Ṽal and ℘(Val) mean something like "logically implies", e.g. ṽal1 ⊑ ṽal2 means that if a set of values is approximately described by ṽal1 then ṽal2 is also a safe description of the set. The condition that will be imposed is that the domains are complete lattices. While this can be weakened it is important that a greatest element is present. This is contrary to Val where a greatest element would be artificial. It is also this difference in partial orders that accounts for why the continuations of col are not continuous.

Induced Semantics

The static semantics is not the desired formulation of data flow analysis because it does not work with approximate description elements. Let S̃ta be a complete lattice deemed to be more approximate than ℘(Sta). Also let there be a concretization function γ ∈ S̃ta -m> ℘(Sta) and an abstraction function α ∈ ℘(Sta) -m> S̃ta such that (α, γ) is a pair of semi-adjoined functions. The induced semantics (ind(α, γ)) is obtained by modifying the static semantics in essentially two ways. One is to use S̃ta instead of ℘(Sta). The other is to redefine the auxiliary functions so that their effect upon some s̃ta is obtained by first applying the analogous mapping of the static semantics to γ(s̃ta) and then applying α to the result.
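For instance, inducing the transfer function of an addition operator from the static semantics amounts to computing α{x + y | x ∈ γ(a), y ∈ γ(b)}; on the flat constant-propagation lattice this collapses to the case analysis sketched below. The function name and the choice of Int values are assumptions made for the illustration.

```haskell
-- Sketch: the transfer function for '+' induced by (α, γ) on the flat
-- constant-propagation lattice.  absAdd a b equals α{x+y | x∈γ(a), y∈γ(b)};
-- the infinite γ(Top) never has to be enumerated because the case analysis
-- collapses it.
module Induced where

data AbsVal = Bot | Known Int | Top deriving (Eq, Show)

absAdd :: AbsVal -> AbsVal -> AbsVal
absAdd Bot _               = Bot        -- γ(Bot) = ∅, so the set of sums is empty
absAdd _   Bot             = Bot
absAdd (Known a) (Known b) = Known (a + b)
absAdd _   _               = Top        -- at least one argument describes many values

main :: IO ()
main = do
  print (absAdd (Known 2) (Known 3))  -- Known 5
  print (absAdd (Known 2) Top)        -- Top
  print (absAdd Bot Top)              -- Bot
```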

Table 7. Interpretation ind(α, γ)

Domains
S = S̃ta           (α, γ) is a pair of semi-adjoined functions
A = Pla -c> S      between complete lattices ℘(Sta) and S̃ta
C = S -m> A

Constant
fin ∈ C
fin = ⊥

Auxiliary functions
cond ∈ C × C -c> C
cond(c1, c2) = λs. c1(α{B cond(sta) | V cond(sta) = true ∧ S cond(sta) = true ∧ sta ∈ γ(s)})
               ⊔ c2(α{B cond(sta) | V cond(sta) = true ∧ S cond(sta) = false ∧ sta ∈ γ(s)})

attach ∈ Pla -c> C -c> C
attach(pla) c = λs. c(s) ⊔ ⊥[s/pla]

do ∈ (Sta -c> T) × (Sta -c> Sta) -t> C -c> C
do(Vg, Bg) c = λs. c(α{Bg(sta) | Vg(sta) = true ∧ sta ∈ γ(s)})

Tables 4, 2 and 7 contain the details. That the induced semantics is "safe" follows from:

Theorem 3. For all cmd ∈ Cmd and s̃ta ∈ S̃ta:

𝒞 sts⟦cmd⟧ ⟨⟩ fin-sts (γ(s̃ta)) ⊑ γ ∘ (𝒞 ind⟦cmd⟧ ⟨⟩ fin-ind s̃ta)


whenever (α, γ) is a pair of semi-adjoined functions (between complete lattices ℘(Sta) and S̃ta) and ind is ind(α, γ).

Proof. The proof is by structural induction making use of the predicate P-C ∈ C-sts × C-ind -t> {true, false} defined by P-C(c-sts, c-ind) = [c-sts ∘ γ ⊑ λs̃ta. γ ∘ (c-ind(s̃ta))]. By structural induction on cmd (and likewise exp) it is shown that P-C(c-sts, c-ind) ⟹ P-C(𝒞 sts⟦cmd⟧ occ c-sts, 𝒞 ind⟦cmd⟧ occ c-ind). The proof makes use of λf. γ ∘ f being monotone, that γ ∘ α ⊒ λstates. states and that continuations of sts are monotone. □

It is also possible to specify data flow analysis by means of approximate semantics other than ind(α, γ). Then an analogue of Theorem 3 can be used to relate such semantics [11].

Example (constant propagation). One way to specify the data flow analysis "constant propagation" is by the induced semantics ind(αs, γs). Domain S̃ta is (Ide -c> Ṽal) × Ṽal*° where Ṽal*° is Ṽal* augmented with a greatest element (⊤). The concretization function γs ∈ S̃ta -m> ℘(Sta) is given by:

γs⟨ẽnv, t̃em⟩ = {⟨env, inp, out, tem⟩ ∈ Sta |
   ∀ide ∈ Ide: env⟦ide⟧ ∈ γv(ẽnv⟦ide⟧) ∧                       (i)
   [t̃em = ⊤ ∨ [t̃em ∉ {⊥, ⊤} ∧ #tem = #t̃em ∧                   (ii)
   ∀j ∈ {1, ..., #tem}: tem↓j ∈ γv(t̃em↓j)]]}                    (iii)

For a state ⟨env, inp, out, tem⟩ to be one of those described by ⟨ẽnv, t̃em⟩ the environment env must be one of those described by ẽnv (condition (i)) and the temporary result stack tem must be one of those described by t̃em (conditions (ii) and (iii)). Conditions (i) and (iii) are reasonably straightforward. Condition (ii) is somewhat more "technical": the greatest element of Ṽal*° describes any stack in Val*, the least element of Ṽal*° describes no stacks in Val*, and ⟨ṽal1, ..., ṽaln⟩ ∈ Ṽal*° describes some stacks of n elements. The abstraction function αs ∈ ℘(Sta) -m> S̃ta is defined by αs(states) = ⊓{s̃ta | γs(s̃ta) ⊇ states}. Then (αs, γs) is a pair of adjoined functions between complete lattices ℘(Sta) and S̃ta.

In data flow analysis it is usually assumed that the description lattices are of finite height (to ensure computability of the data flow information). This is the case when Ide is finite. For an example constant propagation analysis consider the program x := 1. Abbreviate ẽnv = λide. "nil" in Ṽal and a-ind = 𝒞 ind⟦x := 1⟧ ⟨⟩ fin-ind ⟨ẽnv, ⟨⟩⟩. The result of analysing x := 1 then is a-ind, and

a-ind ⟨⟨⟩, "L"⟩ = a-ind ⟨⟨1⟩, "L"⟩ = ⟨ẽnv, ⟨⟩⟩
a-ind ⟨⟨1⟩, "R"⟩ = ⟨ẽnv, ⟨1 in Ṽal⟩⟩
a-ind ⟨⟨⟩, "R"⟩ = ⟨ẽnv[1 in Ṽal/x], ⟨⟩⟩

and a-ind(pla) = ⊥ otherwise. Note that information is obtained about the evaluation of expressions (here 1) without the need for converting the program to a sequence of one-operator assignments. This is because attach has been placed in the semantic clauses for expressions. □
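The abstract computation behind this example can be mimicked by hand: thread an abstract state ⟨ẽnv, t̃em⟩ through the four program points, pushing the description of 1 and then assigning it. The sketch below is only an illustration of that thread (it starts, for simplicity, from an environment mapping x to ⊥ rather than to the description of "nil", and all names are assumptions); it is not the induced semantics itself.

```haskell
-- Sketch: constant propagation on "x := 1", stepping an abstract state
-- (env^, tem^) through the four program points of the example.
module ConstProp where

import qualified Data.Map as M

data AbsVal = Bot | Known Int | Top deriving (Eq, Show)

type Ide    = String
type AbsSta = (M.Map Ide AbsVal, [AbsVal])    -- (env^, tem^); ⊤ stacks not needed here

-- Transfer function of "push bas": push the constant's description.
absPush :: Int -> AbsSta -> AbsSta
absPush n (env, tem) = (env, Known n : tem)

-- Transfer function of "assign ide": pop the top of the stack into env^.
absAssign :: Ide -> AbsSta -> AbsSta
absAssign x (env, v : tem) = (M.insert x v env, tem)
absAssign _ (env, [])      = (env, [])        -- cannot happen for "x := 1"

main :: IO ()
main = do
  let s0 = (M.fromList [("x", Bot)], [])      -- at <(),"L"> and <<1>,"L">
      s1 = absPush 1 s0                       -- at <<1>,"R">
      s2 = absAssign "x" s1                   -- at <(),"R">
  mapM_ print [s0, s1, s2]
-- (fromList [("x",Bot)],[])
-- (fromList [("x",Bot)],[Known 1])
-- (fromList [("x",Known 1)],[])
```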

In summary, the development performed has succeeded in formally validating the data flow information (specified by the induced semantics) with respect to the collecting semantics (Theorems 2 and 3). This is essentially similar to what is done in [5], although the present development is more systematic: a data flow analysis (ind(α, γ)) is obtained merely by specifying an abstraction function (α) and a concretization function (γ), such that the pair (α, γ) of functions is a pair of semi-adjoined functions. However, the semantics of a program is really given by the store semantics. None of the theorems (including Theorem 1) can be used to validate data flow information with respect to the store semantics. (A similar defect holds for [5].) It seems impossible to do so directly because two programs may have the same denotation in the store semantics and yet different denotations in the induced semantics. An indirect way of relating the collecting semantics (and by Theorems 2 and 3 also the induced semantics) to the store semantics is considered in [12] and [11] where the collecting semantics is used to validate program transformations.

3. The MOP and MFP Solutions

In this section the data flow information specified in Sect. 2 is related to the traditionally considered MOP and MFP solutions. "It appears generally true that for data flow analysis problems, we search for the [MOP] solution" [7], and Theorem 4 will show that 𝒞 ind⟦cmd⟧ (⟨⟩)(fin-ind)(s) yields the MOP solution. By using a direct style formulation instead of the continuation style formulation it is possible to obtain the MFP solution (Theorem 5).

To compare the traditional (operational) approach to data flow analysis with that of Sect. 2 it is necessary to superimpose a flow chart view upon programs. The flowcharts to be considered will have arcs that correspond to "places". The flowchart constructed from program cmd has unique entry arc ⟨⟨⟩, "E"⟩, unique exit arc ⟨⟨⟩, "R"⟩ and is represented by a set of tuples of the form ⟨pla1, pla2, tf⟩. Such a tuple is intended to express that when "going" from pla1 to pla2 the data flow information is changed as specified by tf ∈ TF = S -m> S. Thus tf is a transfer function [1] associated with the basic block (node) that has pla1 leading in and pla2 leading out. The basic blocks are smaller than is usual (in fact many are "empty") because attach functions have been placed as they have; while this can be remedied, the placement used makes the proofs of Theorems 4 and 5 less involved and simplifies the presentation below.

To construct a flowchart from a program the semantic functions F𝒞 and Fℰ are used (Tables 8, 2, 7 and 9). As an example consider the flowchart in Fig. 3 (ignoring the dotted rectangle). It has arcs labelled pla0, ..., pla4 and basic blocks with transfer functions tf0, ..., tf3 associated and is represented by fc = {⟨plai, plai+1, tfi⟩ | 0 ≤ i ≤ 3}. This representation is obtained from the program x := 1 using the function F𝒞. Since F𝒞⟦x := 1⟧ ⟨⟩ is

F attach(pla1) * Fℰ⟦1⟧ ⟨1⟩ * F do(V assign⟦x⟧, B assign⟦x⟧) * F attach(pla4)

and Fℰ⟦1⟧ ⟨1⟩ is

F attach(pla2) * F do(V push⟦1⟧, B push⟦1⟧) * F attach(pla3)

the result of F𝒞 ind⟦x := 1⟧ ⟨⟩ ⟨pla0, tf0⟩ is ⟨⟨pla4, tf4⟩, fc⟩.

Table 8. Semantic functions specifying the flowchart

F𝒞 ∈ Cmd -t> Occ -t> FG

F𝒞⟦if exp then cmd1 else cmd2 fi⟧ occ =
  F attach ⟨occ, "L"⟩ * Fℰ⟦exp⟧ occ § ⟨1⟩ *
  F cond(F𝒞⟦cmd1⟧ occ § ⟨2⟩ * F attach ⟨occ, "R"⟩
        , F𝒞⟦cmd2⟧ occ § ⟨3⟩ * F attach ⟨occ, "R"⟩)

F𝒞⟦while exp do cmd od⟧ occ =
  F attach ⟨occ, "L"⟩ *
  [Fℰ⟦exp⟧ occ § ⟨1⟩ * F cond(F𝒞⟦cmd⟧ occ § ⟨2⟩, F attach ⟨occ, "R"⟩)
   F fix (⟨occ § ⟨2⟩, "R"⟩, ⟨occ § ⟨1⟩, "L"⟩, λs. s)]

the remaining clauses are similar to that of exp1 ope exp2

Fℰ ∈ Exp -t> Occ -t> FG

Fℰ⟦exp1 ope exp2⟧ occ =
  F attach ⟨occ, "L"⟩ * Fℰ⟦exp1⟧ occ § ⟨1⟩ * Fℰ⟦exp2⟧ occ § ⟨3⟩ *
  F do(V apply ⟦ope⟧, B apply ⟦ope⟧) * F attach ⟨occ, "R"⟩

the remaining clauses are similar to that of exp1 ope exp2

Table 9. Additions to interpretation ind (for F𝒞 and Fℰ)

Domains
TF = S -m> S                   transfer functions
FC = ℘(Pla × Pla × TF)         (representation of) flowcharts
PI = Pla × TF                  partial information
FG = PI -t> PI × FC            flowchart generators

Combinator
fg1 * fg2 = λpi. ⟨fg2(fg1(pi)↓1)↓1, fg2(fg1(pi)↓1)↓2 ∪ fg1(pi)↓2⟩

Auxiliary functions
F cond ∈ FG × FG -t> FG
F cond(fg1, fg2) = λ⟨pla, tf⟩. ⟨fg2⟨pla, tff ∘ tf⟩↓1
                  , fg1⟨pla, tft ∘ tf⟩↓2 ∪ fg2⟨pla, tff ∘ tf⟩↓2⟩
  tft = λs. α{B cond(sta) | V cond(sta) = true ∧ S cond(sta) = true ∧ sta ∈ γ(s)}
  tff = λs. α{B cond(sta) | V cond(sta) = true ∧ S cond(sta) = false ∧ sta ∈ γ(s)}

F attach ∈ Pla -t> FG
F attach(pla) = λ⟨pla', tf⟩. ⟨⟨pla, λs. s⟩, {⟨pla', pla, tf⟩}⟩

F do ∈ (Sta -c> T) × (Sta -c> Sta) -t> FG
F do(Vg, Bg) = λ⟨pla, tf⟩. ⟨⟨pla, λs. α{Bg(sta) | Vg(sta) = true ∧ sta ∈ γ(tf(s))}⟩, ∅⟩

F fix ∈ FG × (Pla × Pla × TF) -t> FG   (infixed)
fg F fix (pla1, pla2, tf) = λpi. ⟨fg(pi)↓1, fg(pi)↓2 ∪ {⟨pla1, pla2, tf⟩}⟩

Here the ⟨⟩ is the occurrence that is needed as usual. The ⟨pla0, tf0⟩ is needed because the representation of the basic block to which tf0 is associated can only be constructed once pla1 is known. Therefore F𝒞 is defined so as to take ⟨pla0, tf0⟩

pla0 = ⟨⟨⟩, "E"⟩   pla1 = ⟨⟨⟩, "L"⟩   pla2 = ⟨⟨1⟩, "L"⟩   pla3 = ⟨⟨1⟩, "R"⟩   pla4 = ⟨⟨⟩, "R"⟩

tf0 = tf1 = tf4 = λs. s
tf2 = λs. α{B push⟦1⟧ sta | V push⟦1⟧ sta = true ∧ sta ∈ γ(s)}
tf3 = λs. α{B assign⟦x⟧ sta | V assign⟦x⟧ sta = true ∧ sta ∈ γ(s)}

Fig. 3. (The figure shows the flowchart generated from x := 1: the arcs pla0, ..., pla4 in sequence, separated by the basic blocks with transfer functions tf0, ..., tf3; a dotted rectangle encloses all of it except the entry arc pla0 and the first basic block.)

as a parameter and construct the relevant tuple (⟨pla0, pla1, tf0⟩), although it is only the part of the flowchart enclosed in the dotted rectangle that intuitively "corresponds" to x := 1. In order for sequencing (by means of *) to work it is necessary to let F𝒞 produce not only a flowchart but also a tuple like the ⟨pla4, tf4⟩ above; indeed ⟨pla0, tf0⟩ could in principle have been produced by some F𝒞⟦...⟧ if x := 1 had occurred in some context. One way to read ⟨pla0, tf0⟩, and similarly ⟨pla4, tf4⟩, is that since arc pla0 was traversed the (as yet unrecorded) transfer function tf0 has been encountered.

In general the flowchart associated with program cmd is specified by F𝒞 ind⟦cmd⟧ ⟨⟩ ⟨pla0, tf0⟩↓2. As a further explanation of F𝒞 consider the clause for while exp do cmd od that is illustrated in Fig. 4.

pla1 = ⟨occ, "L"⟩   pla2 = ⟨occ § ⟨1⟩, "L"⟩   pla3 = ⟨occ § ⟨1⟩, "R"⟩
pla4 = ⟨occ § ⟨2⟩, "L"⟩   pla5 = ⟨occ § ⟨2⟩, "R"⟩   pla6 = ⟨occ, "R"⟩

tft, tff as in Table 9

Fig. 4. (The figure shows the flowchart generated from while exp do cmd od: from pla0 through pla1 into the flowchart for exp, from pla3 via tft into the flowchart for cmd and from pla5 back to pla2 via λs. s, or from pla3 via tff out to pla6; the dotted rectangles mark the sub-flowcharts for exp and cmd.)


The dotted rectangles correspond to syntactic subphrases. The flowcharts for exp and cmd have not been shown in detail. The entire flowchart F𝒞 ind⟦while exp do cmd od⟧ occ ⟨pla0, tf0⟩↓2 is

{⟨pla0, pla1, tf0⟩} ∪ Fℰ ind⟦exp⟧ occ § ⟨1⟩ ⟨pla1, λs. s⟩↓2 ∪
{⟨pla3, pla6, tff⟩} ∪ F𝒞 ind⟦cmd⟧ occ § ⟨2⟩ ⟨pla3, tft⟩↓2 ∪
{⟨pla5, pla2, λs. s⟩}

It is the tuple ⟨pla5, pla2, λs. s⟩ that accounts for the "iterative nature" of the while construct.

Some properties of F𝒞 and Fℰ are mentioned below. They are needed in the proofs of Theorems 4 and 5 and may help in giving a better understanding of F𝒞 and Fℰ. The proofs of these properties are omitted since they are straightforward (e.g. by structural induction). The formulation uses the phrase "pla is a descendant of occ", which means that pla↓1 = occ § occ' for some maximal occ', and that pla↓2 ∈ {"L", "R"}. Consider F𝒞 ind⟦cmd⟧ occ ⟨pla0, tf0⟩ = ⟨pi, fc⟩ or similarly Fℰ ind⟦exp⟧ occ ⟨pla0, tf0⟩ = ⟨pi, fc⟩. Let ⟨pla, pla', tf⟩ be an arbitrary element of fc, and assume (as will always be the case) that occ and pla0 are maximal and that pla0 is not a descendant of occ. Then,

• pi = ⟨⟨occ, "R"⟩, λs. s⟩
• pla' is maximal and is a descendant of occ
• pla is maximal and is either pla0 or a descendant of occ, but it is not ⟨occ, "R"⟩. Intuitively this means that the flowchart fc is immediately left after traversing ⟨occ, "R"⟩.
• ⟨pla0, ⟨occ, "L"⟩, tf0⟩ is the only tuple of fc with pla0 as the first component. This means that the first arc of fc to be traversed is ⟨occ, "L"⟩ and also that fc\{⟨pla0, ⟨occ, "L"⟩, tf0⟩} is independent of ⟨pla0, tf0⟩ (i.e. independent of "how the flowchart is entered").

The MOP Solution

For a flowchart fc with entry arc pla and description element s holding there let MOPS(fc, pla, s) ∈ Pla -t> S denote the MOP ("meet over all paths") solution. It is defined by

MOPS(fc, pla, s) = λpla'. ⊔{tfn(...(tf1(s))) | ⟨pla0, pla1, tf1⟩, ..., ⟨plan-1, plan, tfn⟩ ∈ fc ∧ pla0 = pla ∧ plan = pla'}.

This formulation essentially is that of [7]. One difference is that [7] uses ⊓ instead of ⊔, i.e. uses the dual lattice, but this is not crucial [13]. Another is that MOPS(fc, pla, s)(pla) need not be s but is likely to be ⊥. The similarity between the approach of Sect. 2 and the traditional approach is expressed by:

Theorem 4. For all cmd ∈ Cmd, s ∈ S and pla ∈ Pla:

𝒞 ind⟦cmd⟧ ⟨⟩ fin-ind s pla = MOPS(F𝒞 ind⟦cmd⟧ (⟨⟩)(⟨⟨⟨⟩, "E"⟩, λs. s⟩)↓2, ⟨⟨⟩, "E"⟩, s)(pla)

where ind = ind(α, γ) for (α, γ) a pair of semi-adjoined functions between complete lattices ℘(Sta) and S.

Proof. See Appendix 1. □

Note that both sides of the equality sign yield ⊥ when pla is not maximal or not a descendant of occ.
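To make the MOPS definition concrete, the sketch below computes it for a small flowchart represented, as above, by tuples ⟨pla1, pla2, tf⟩, by enumerating all paths from the entry arc. It terminates only for acyclic flowcharts and uses the flat constant-propagation lattice of the earlier examples; all names, the lattice and the example flowchart are assumptions made for the illustration.

```haskell
-- Sketch: MOP(fc, pla, s)(pla') = ⊔ { tf_n(...(tf_1(s))) | path from pla to pla' },
-- computed by explicit path enumeration over an acyclic flowchart.
module Mop where

import qualified Data.Map as M

data AbsVal = Bot | Known Int | Top deriving (Eq, Show)

lub :: AbsVal -> AbsVal -> AbsVal
lub Bot v = v
lub v Bot = v
lub (Known a) (Known b) | a == b = Known a
lub _ _ = Top

type Pla  = String
type TF   = AbsVal -> AbsVal
type Edge = (Pla, Pla, TF)

mop :: [Edge] -> Pla -> AbsVal -> M.Map Pla AbsVal
mop fc entry s0 = go entry s0 M.empty
  where
    -- propagate the value s reaching arc pla along every outgoing block
    go pla s acc =
      foldr step acc [ (pla', tf s) | (p, pla', tf) <- fc, p == pla ]
    step (pla', s') acc =
      go pla' s' (M.insertWith lub pla' s' acc)

-- Example: entry --(x:=1)--> a --(id)--> b   and   entry --(x:=2)--> b
example :: [Edge]
example = [ ("entry", "a", const (Known 1))
          , ("a",     "b", id)
          , ("entry", "b", const (Known 2)) ]

main :: IO ()
main = print (mop example "entry" Bot)
-- fromList [("a",Known 1),("b",Top)]
```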

The MFP Solution

Denote by MFPS(fc, pla, s) ∈ Pla -t> S the MFP ("maximal fixed point") solution for flowchart fc with entry arc pla and description element s holding there. It is defined by MFPS(fc, pla, s) = LFP(step(fc, pla, s)) where step(fc, pla, s) ∈ (Pla -t> S) -m> (Pla -t> S) is the "data flow equations", i.e. specifies the effect of advancing data flow information along one basic block. It is defined by

step(fc, pla, s) = λa. λpla'. ⊔{tf(a[s/pla] pla'') | ⟨pla'', pla', tf⟩ ∈ fc}.

Here pla, pla' and pla'' are going to be maximal so that a[s/pla] pla is s and a[s/pla] pla'' is a(pla'') otherwise. Again, this formulation is essentially the one given in [7] (where greatest fixed points and ⊓ are used instead of least fixed points and ⊔).
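Whenever S is of finite height, MFPS can be computed by iterating step from the everywhere-⊥ assignment until it stabilises; a minimal sketch under the same illustrative flowchart representation and lattice as the MOP sketch:

```haskell
-- Sketch: MFP(fc, pla, s) = LFP(step(fc, pla, s)), by Kleene iteration.
module Mfp where

import qualified Data.Map as M

data AbsVal = Bot | Known Int | Top deriving (Eq, Show)

lub :: AbsVal -> AbsVal -> AbsVal
lub Bot v = v
lub v Bot = v
lub (Known a) (Known b) | a == b = Known a
lub _ _ = Top

type Pla  = String
type TF   = AbsVal -> AbsVal
type Edge = (Pla, Pla, TF)
type Asg  = M.Map Pla AbsVal        -- plays the role of Pla -t> S

-- step(fc, pla, s) = λa. λpla'. ⊔ { tf(a[s/pla] pla'') | (pla'', pla', tf) ∈ fc }
step :: [Edge] -> Pla -> AbsVal -> Asg -> Asg
step fc entry s0 a =
  M.fromListWith lub [ (pla', tf (at pla'')) | (pla'', pla', tf) <- fc ]
  where at p | p == entry = s0
             | otherwise  = M.findWithDefault Bot p a

mfp :: [Edge] -> Pla -> AbsVal -> Asg
mfp fc entry s0 = go M.empty
  where go a = let a' = step fc entry s0 a
               in if a' == a then a else go a'

-- Loop: entry -> a (x:=1), a -> b (id), b -> a (id): iteration is needed.
example :: [Edge]
example = [ ("entry", "a", const (Known 1))
          , ("a",     "b", id)
          , ("b",     "a", id) ]

main :: IO ()
main = print (mfp example "entry" Bot)
-- fromList [("a",Known 1),("b",Known 1)]
```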

To obtain a denotational semantics specifying the MFP solution, new semantic functions D𝒞 ind and Dℰ ind are defined (Tables 10, 2, 7 and 11). They are in "direct style" and the functionality of D𝒞 ind⟦cmd⟧ (occ) is S -m> S × A. From description element s ∈ S is obtained the tuple ⟨s', a⟩ where the component a specifies partial data flow information for cmd. When occ is maximal it is easy to show that s' = a⟨occ, "R"⟩ (see Appendix 2, Lemma 3).

In the clause for D𝒞 ind⟦if exp then cmd1 else cmd2 fi⟧ use is made of the function D cond. The approximate description element supplied to "the rest of the program" is s1 ⊔ s2, where s1 is produced along one branch and s2 along the other. This differs from 𝒞 ind where "the rest of the program" is analysed with s1 and s2 separately and only the resulting data flow information is combined.

Note in the clause for D𝒞 ind⟦while exp do cmd od⟧ that λs'. Dg(s' ⊔ s) rather than e.g. λs'. Dg(s') is used. Consider some iteration of the while construct where s holds at pla2 (see Fig. 4). Then some s' will be computed to hold at pla5. It is then s' ⊔ s (not s') that holds at pla2 and must be used in the next iteration if the MFP solution is to be obtained.

The connection between the MFP solution and D𝒞 ind is given by:

Theorem 5. For all cmd ∈ Cmd, s ∈ S and pla ∈ Pla:

D𝒞 ind⟦cmd⟧ ⟨⟩ s↓2 pla = MFPS(F𝒞 ind⟦cmd⟧ (⟨⟩)(⟨⟨⟨⟩, "E"⟩, λs. s⟩)↓2, ⟨⟨⟩, "E"⟩, s) pla

where ind = ind(α, γ) for (α, γ) a pair of adjoined functions between complete lattices ℘(Sta) and S, where S is of finite height.

Proof. When (α, γ) is as above then α is additive and hence continuous. Also γ is continuous when S is of finite height. The only properties of (α, γ) to be used in the proof are that (α, γ) is a pair of semi-adjoined functions between complete

Table 10. Semantic functions in direct style

D𝒞 ∈ Cmd -c> Occ -c> S -m> S × A

D𝒞⟦if exp then cmd1 else cmd2 fi⟧ occ =
  D attach ⟨occ, "L"⟩ *
  Dℰ⟦exp⟧ occ § ⟨1⟩ *
  D cond(D𝒞⟦cmd1⟧ occ § ⟨2⟩ * D attach ⟨occ, "R"⟩
        , D𝒞⟦cmd2⟧ occ § ⟨3⟩ * D attach ⟨occ, "R"⟩)

D𝒞⟦while exp do cmd od⟧ occ =
  D attach ⟨occ, "L"⟩ *
  FIX(λDg. λs. [Dℰ⟦exp⟧ occ § ⟨1⟩ *
    D cond(D𝒞⟦cmd⟧ occ § ⟨2⟩ * (λs'. Dg(s' ⊔ s))
          , D attach ⟨occ, "R"⟩)] s)

the remaining clauses are similar to that of exp1 ope exp2

Dℰ ∈ Exp -c> Occ -c> S -m> S × A

Dℰ⟦exp1 ope exp2⟧ occ =
  D attach ⟨occ, "L"⟩ *
  Dℰ⟦exp1⟧ occ § ⟨1⟩ *
  Dℰ⟦exp2⟧ occ § ⟨3⟩ *
  D do(V apply ⟦ope⟧, B apply ⟦ope⟧) *
  D attach ⟨occ, "R"⟩

the remaining clauses are similar to that of exp1 ope exp2

Table 11. Additions to interpretation ind (for D𝒞 and Dℰ)

Auxiliary functions
D cond ∈ (S -m> S × A) × (S -m> S × A) -c> (S -m> S × A)
D cond(dg1, dg2) = λs. dg1(tft(s)) ⊔ dg2(tff(s))
  tft, tff as in Table 9

D attach ∈ Pla -c> (S -m> S × A)
D attach(pla) = λs. ⟨s, ⊥[s/pla]⟩

D do ∈ (Sta -c> T) × (Sta -c> Sta) -t> (S -m> S × A)
D do(Vg, Bg) = λs. ⟨α{Bg(sta) | Vg(sta) = true ∧ sta ∈ γ(s)}, ⊥⟩

Combinator * as in Table 9.

lattices ℘(Sta) and S such that α and γ are continuous. For the proof proper see Appendix 2. □

Both sides of the equality are ⊥ when pla is not maximal. In the literature it is usually assumed that S is of finite height because then the MFP solution is computable (provided the transfer functions are). This holds even if S is infinite (as in the constant propagation analysis of Sect. 2). In contrast, the MOP solution need not be computable even when S is of finite height [8]. It follows from a theorem of J.B. Kam [7] that the constant propagation analysis is an example of this.

That 𝒞 ind⟦cmd⟧ (⟨⟩)(fin-ind)(s) ⊑ D𝒞 ind⟦cmd⟧ (⟨⟩)(s)↓2 follows from Theorems 4 and 5 when (α, γ) satisfies the conditions of (the proof of) Theorem 5. This is because MOPS(fc, pla, s) ⊑ MFPS(fc, pla, s) (see e.g. [3]). Since sts is a special case of ind (with S = ℘(Sta) and α = γ = λstates. states) it is meaningful to consider D𝒞 sts. The transfer functions of sts are additive and it therefore follows from [3] that 𝒞 sts⟦cmd⟧ (⟨⟩)(fin-sts)(s) = D𝒞 sts⟦cmd⟧ (⟨⟩)(s)↓2.

4. Conclusion

It has been shown how data flow analysis can be specified in a denotational approach, by systematically transforming a store semantics to an "induced semantics" parameterized by a pair of semi-adjoined functions. It is claimed that the approach is no less systematic than existing operational methods. Semantic characterizations are with respect to the collecting semantics. To give a "semantic characterization" of the collecting semantics in terms of a store semantics it seems to be necessary to consider program transformations (as is done in [12, 11]).

The data flow analyses considered in this paper could be called "history-insensitive". In [11] a similar development yields a semantic characterization of "available expressions" (a "history-sensitive" analysis). Unfortunately the machinery required to establish the semantic characterization is somewhat complex. More research is needed to handle "live variables" (a "future-sensitive" analysis) and languages with arbitrary jumps and procedures.

The relationship between continuation-style and MOP and between direct-style and MFP shows that for data flow analysis purposes continuation-style is inherently more accurate than direct style. However, it should be kept in mind that the MOP solution can always be specified as the MFP solution to a different data flow analysis problem (whose MFP solution need not be computable) [3].

Acknowledgement. I should like to thank Neil Jones for his continuing interest in and helpful comments upon this work, and Patrick Cousot, Gordon Plotkin and Hanne Riis for helpful comments.

Appendix 1: Proof of Theorem 4

In this appendix the suffix ind is omitted. For the proof the function Close ∈ FC × Pla × ℘(S) -t> Pla -t> ℘(S) is useful. It is defined by

Close(fc, pla', S')(pla) = {tfn(...(tf1(s))) | s ∈ S' ∧ ⟨pla0, pla1, tf1⟩, ..., ⟨plan-1, plan, tfn⟩ ∈ fc ∧ pla0 = pla' ∧ plan = pla}

Then MOPS(fc, pla0, s0) = λpla. ⊔[Close(fc, pla0, {s0})(pla)] = ⊔ ∘ Close(fc, pla0, {s0}).

The proof of the theorem is by a structural induction showing P-Cmd(cmd) and P-Exp(exp). Predicate P-Cmd ∈ Cmd -t> {true, false} is defined by

P-Cmd(cmd) = ∀c ∈ C: ∀S' ⊆ S:
  pla' maximal and occ maximal and pla' not a descendant of occ
  ⟹ ⊔{𝒞⟦cmd⟧(occ)(c)(s) | s ∈ S'} = ⊔{c(s) | s ∈ cl⟨occ, "R"⟩} ⊔ (⊔ ∘ cl)
  where cl = Close(F𝒞⟦cmd⟧ occ ⟨pla', λs. s⟩↓2, pla', S').

Predicate P-Exp is defined similarly. Clearly the theorem follows from P-Cmd(cmd) because fin = ⊥. Since this proof is long, only the most difficult case will be considered. It amounts to showing P-Cmd(while exp do cmd od) assuming P-Cmd(cmd) and P-Exp(exp). So let occ be maximal and abbreviate pla0, ..., pla6 and tft, tff as in Fig. 4. Further abbreviate g[c] = λc'. g1{cond(g2 c', g3 c)} where g1 = ℰ⟦exp⟧ occ § ⟨1⟩ and g2 = 𝒞⟦cmd⟧ occ § ⟨2⟩ and g3 = attach(pla6). Similarly fg = fg1 * F cond(fg2, fg3) where fg1 = Fℰ⟦exp⟧ occ § ⟨1⟩ and fg2 = F𝒞⟦cmd⟧ occ § ⟨2⟩ and fg3 = F attach(pla6).

Lemma 1. For arbitrary c, c' ∈ C and S' ⊆ S:

⊔{g[c] c' s | s ∈ S'} = ⊔{c(s) | s ∈ cl(pla6)} ⊔ ⊔{c'(s) | s ∈ cl(pla5)} ⊔ (⊔ ∘ cl)

where cl = Close(fg⟨pla1, λs. s⟩↓2, pla1, S').

Proof. Abbreviate cl2[S''] = Close(fg2⟨pla3, λs. s⟩↓2, pla3, S''). Then by P-Cmd(cmd), ⊔{g2(c')(s) | s ∈ S''} = ⊔{c'(s) | s ∈ cl2[S''] pla5} ⊔ (⊔ ∘ cl2[S'']). Abbreviate cl3[S''] = Close(fg3⟨pla3, λs. s⟩↓2, pla3, S'') so that ⊔{g3(c)(s) | s ∈ S''} = ⊔{c(s) | s ∈ cl3[S''] pla6} ⊔ (⊔ ∘ cl3[S'']). Then

⊔{cond(g2 c', g3 c) s | s ∈ S''}
= ⊔{c'(s) | s ∈ cl2[{tft(s) | s ∈ S''}] pla5} ⊔ (⊔ ∘ cl2[{tft(s) | s ∈ S''}]) ⊔
  ⊔{c(s) | s ∈ cl3[{tff(s) | s ∈ S''}] pla6} ⊔ (⊔ ∘ cl3[{tff(s) | s ∈ S''}])
= ⊔{c'(s) | s ∈ (cl2'[S''] ⊔ cl3'[S'']) pla5} ⊔
  ⊔{c(s) | s ∈ (cl2'[S''] ⊔ cl3'[S'']) pla6} ⊔ (⊔ ∘ (cl2'[S''] ⊔ cl3'[S'']))

where cl2'[S''] = Close(fg2⟨pla3, tft⟩↓2, pla3, S'') = cl2[{tft(s) | s ∈ S''}] and cl3'[S''] = Close(fg3⟨pla3, tff⟩↓2, pla3, S'') = cl3[{tff(s) | s ∈ S''}].

The last step is by the definition of Close and the properties of F𝒞 mentioned in Sect. 3. It will now be shown that cl2'[S''] ⊔ cl3'[S''] = cl'[S''] where cl'[S''] = Close(F cond(fg2, fg3)⟨pla3, λs. s⟩↓2, pla3, S'') = Close(fg2⟨pla3, tft⟩↓2 ∪ fg3⟨pla3, tff⟩↓2, pla3, S''). The inequality ⊑ is immediate since Close is monotone in its first argument. To establish the converse inequality consider any sequence ⟨pla'0, pla'1, tf'1⟩, ..., ⟨pla'n-1, pla'n, tf'n⟩ that is in fg2⟨pla3, tft⟩↓2 ∪ fg3⟨pla3, tff⟩↓2 and has pla'0 = pla3. Such a sequence is said to be "mentioned" in cl'[S''](pla'n). It is not difficult to see (using properties of F𝒞) that the sequence is entirely either in fg2⟨pla3, tft⟩↓2 (hence "mentioned" in cl2'[S''](pla'n)) or in fg3⟨pla3, tff⟩↓2 (hence "mentioned" in cl3'[S''](pla'n)). Thus cl'[S''](pla'n) ⊆ cl2'[S''](pla'n) ∪ cl3'[S''](pla'n).

Next, abbreviate cl1[S'] = Close(fg1⟨pla1, λs. s⟩↓2, pla1, S'). Then P-Exp(exp) asserts (with the S'' above being cl1[S'](pla3)) that

⊔{g[c] c'(s) | s ∈ S'}
= ⊔{c'(s) | s ∈ (cl1[S'] ⊔ cl'[cl1[S'] pla3]) pla5} ⊔
  ⊔{c(s) | s ∈ (cl1[S'] ⊔ cl'[cl1[S'] pla3]) pla6} ⊔
  (⊔ ∘ (cl1[S'] ⊔ cl'[cl1[S'](pla3)]))

where properties of Fℰ and Close assure cl1[S'](pla5) = ∅ = cl1[S'](pla6). Finally, it must be shown that cl1[S'] ⊔ cl'[cl1[S'] pla3] = cl[S']. Consider the inequality ⊒.

Again the key to the proof is to consider a sequence ⟨pla'0, pla'1, tf'1⟩, ..., ⟨pla'n-1, pla'n, tf'n⟩ that is "mentioned" in cl[S'](pla'n) and has pla'0 = pla1. If pla'n is a descendant of occ § ⟨1⟩ the properties of F do and Fℰ assert that all of pla'1, ..., pla'n are, so the entire sequence is "mentioned" in cl1[S'](pla'n). Otherwise there is some m < n so that pla'1, ..., pla'm are all the descendants of occ § ⟨1⟩ (again by properties of Fℰ and F𝒞). Then ⟨pla'0, pla'1, tf'1⟩, ..., ⟨pla'm-1, pla'm, tf'm⟩ is "mentioned" in cl1[S'](pla'm). Also pla'm = pla3 (because ⟨pla'm, pla'm+1, tf'm+1⟩ is in fg2⟨pla3, tft⟩↓2 ∪ fg3⟨pla3, tff⟩↓2) so exp is left through pla3 and not entered again. Also ⟨pla'm, pla'm+1, tf'm+1⟩, ..., ⟨pla'n-1, pla'n, tf'n⟩ is "mentioned" in cl'[cl1[S'](pla3)](pla'n). From this cl1[S'](pla'n) ∪ cl'[cl1[S'](pla3)](pla'n) ⊇ cl[S'](pla'n). The converse inequality is similar. □

In the next lemma the iterative nature of while is dealt with. The proof makes use of Luk ∈ FC × Pla × ℘(S) × Integer × FC -t> Pla -t> ℘(S) that is like Close but constrains the number of times some part of the flowchart is traversed. It is defined by

Luk(fc1, pla', S', k, fc2)(pla) = {tf'n(...(tf'1(s))) | s ∈ S' ∧ ⟨pla'0, pla'1, tf'1⟩, ..., ⟨pla'n-1, pla'n, tf'n⟩ ∈ fc1 ∪ fc2 ∧ pla'0 = pla' ∧ pla'n = pla ∧ |{i | ⟨pla'i-1, pla'i, tf'i⟩ ∈ fc2}| = k}.


Lemma 2. For any c ∈ C and S' ⊆ S:

⊔{FIX(g[c]) s | s ∈ S'} = ⊔{c(s) | s ∈ cl(pla6)} ⊔ (⊔ ∘ cl)

where cl = Close((fg F fix (pla5, pla2, λs. s))⟨pla1, λs. s⟩↓2, pla1, S').

Proof. Abbreviate lu[k] = Luk(fg⟨pla1, λs. s⟩↓2, pla1, S', k, {⟨pla5, pla2, λs. s⟩}). It corresponds to executing the body of the while k + 1 times. Since cl = ⊔{lu[k] | k ≥ 0} it suffices to show by induction in k that ⊔{(g[c])ᵏ⁺¹ ⊥ s | s ∈ S'} = ⊔{c(s) | s ∈ lu[k](pla6)} ⊔ (⊔ ∘ lu[k]) ⊔ ... ⊔ (⊔ ∘ lu[0]).

The case k = 0 is by Lemma 1. For the inductive step the hypothesis and Lemma 1 establish

⊔{(g[c])ᵏ⁺² ⊥ s | s ∈ S'} = ⊔{c(s) | s ∈ cl'(pla6)} ⊔ (⊔ ∘ cl') ⊔ (⊔ ∘ lu[k]) ⊔ ... ⊔ (⊔ ∘ lu[0])

for cl' = Close(fg⟨pla1, λs. s⟩↓2, pla1, lu[k](pla6)). The result follows from cl' = lu[k + 1] which can be shown by the methods used in the proof of Lemma 1. □

From Lemma 2, P-Cmd(while exp do cmd od) easily follows.

Appendix 2: Proof of Theorem 5

In this appendix the suffix ind is omitted. The proof is by structural induction showing P-Cmd(cmd) and P-Exp(exp). Predicate P-Cmd ∈ Cmd -t> {true, false} is defined by

P-Cmd(cmd) = ∀s ∈ S:
  pla0 maximal and occ maximal and pla0 not a descendant of occ ⟹
  (i) D𝒞⟦cmd⟧(occ)(s)↓1 = D𝒞⟦cmd⟧(occ)(s)↓2 ⟨occ, "R"⟩
  (ii) D𝒞⟦cmd⟧(occ)(s)↓2 = MFPS(F𝒞⟦cmd⟧(occ)⟨pla0, λs. s⟩↓2, pla0, s).

Predicate P-Exp is defined similarly. Since the proof is long only the most difficult case will be considered. It amounts to showing P-Cmd(while exp do cmd od) assuming P-Cmd(cmd) and P-Exp(exp). Assume that occ is maximal and let pla1, ..., pla6, tft, tff, fg1, fg2, fg3 and fg be as in Appendix 1. Also abbreviate dg1 = Dℰ⟦exp⟧ occ § ⟨1⟩ and dg2 = D𝒞⟦cmd⟧ occ § ⟨2⟩ and dg3 = D attach(pla6) and dg[dg'] = λs. [dg1 * D cond(dg2 * (λs'. dg'(s' ⊔ s)), dg3)] s. The following fact is frequently used without explicit mention.

Fact. step(fc, ..., ...)(a)(pla) = ⊥ if there is no pla' and tf so that ⟨pla', pla, tf⟩ ∈ fc.

Lemma 3. V s ES: FIX (2 dg ' . dg [dg']) s,~ 1 = FIX (4 dg ' . dg [dg']) s$2 pla 6

Proof. The lemma follows from

V s ~S: (4 dg ' . dg [dg'])" 3- s~ 1 = (4 dg ' . dg [dg'])" Z s J,2 pla 6

which is proved by induction in n. The case n = 0 is trivial so consider the inductive step. For arbitrary s ∈ S it follows from P-Cmd(cmd) and P-Exp(exp) that (λdg'. dg[dg'])^{n+1} ⊥ s = (λdg'. dg[dg'])^n ⊥ (a2(pla5) ∪ s) ⊔ dg3(tf^f(a1(pla3))) ⊔ (⊥, a1 ⊔ a2) where a1 = dg1(s)↓2 and a2 = dg2(tf^t(a1(pla3)))↓2. It is easy to see that a1(pla6) = ⊥ = a2(pla6) because of the fact mentioned above. Hence the result follows. □
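To give the functional λdg'. dg[dg'] being iterated here a concrete shape, the following Python sketch is one assumed reading (based on the unfolding used in this proof and in Lemma 4 below, not on the paper's text). A denotation maps a property s to a pair (result, cache), a cache being a dictionary from places to properties; dg1, dg2, dg3, tft, tff and the places are parameters, properties are modelled as frozensets with union as join, and all identifiers are assumptions.

def dg_functional(dg1, dg2, dg3, tft, tff, pla3, pla5):
    # Returns the functional whose least fixed point is (this reading of) the
    # denotation of 'while exp do cmd od': analyse exp (dg1), then join the
    # exit branch (dg3) with the contribution of the body (dg2) followed by
    # the remaining iterations (dg_prime).
    bot = frozenset()

    def join_cache(*caches):
        out = {}
        for cache in caches:
            for pla, v in cache.items():
                out[pla] = out.get(pla, bot) | v
        return out

    def dg(dg_prime):
        def run(s):
            _, a1 = dg1(s)                            # cache of exp from s
            _, a2 = dg2(tft(a1.get(pla3, bot)))       # cache of cmd on the true branch
            v3, a3 = dg3(tff(a1.get(pla3, bot)))      # exit through attach
            vr, ar = dg_prime(a2.get(pla5, bot) | s)  # the remaining iterations
            return vr | v3, join_cache(a1, a2, a3, ar)
        return run
    return dg

Iterating dg starting from the denotation λs. (⊥, empty cache) gives the approximations (λdg'. dg[dg'])^n ⊥ used above; Lemma 5 below identifies the cache component of the fixed point with M∞(s).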

From Lemma 3 condition (i) of P-Cmd(while exp do cmd od) easily follows. For condition (ii) it is useful to abbreviate fc1 = fg1(pla1, λs.s)↓2, fc2 = fg2(pla3, tf^t)↓2, fc3 = fg3(pla3, tf^f)↓2 and fc4 = {(pla5, pla2, λs.s)}. Then define

M0(s) = MFPS(fg(pla1, λs.s)↓2, pla1, s) = MFPS(fc1 ∪ fc2 ∪ fc3, pla1, s).

Intuitively M0(s) is the effect of the while construct when no iterations are performed.
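For orientation, here is a minimal Python sketch (an assumption, not the paper's definition) of MFPS as the familiar iterative MFP computation. Properties are modelled as frozensets ordered by inclusion with union as join and the empty set as ⊥; the entry place is seeded with s, and every other place receives the join of tf(a(pla')) over its incoming edges (pla', pla, tf), which is consistent with the Fact above. All identifiers are assumptions.

def mfps(fc, pla_entry, s, places):
    # Naive Kleene iteration to the least fixed point; assumes finitely many
    # places, monotone transfer functions and a finite powerset of properties.
    bot = frozenset()
    a = {pla: bot for pla in places}
    a[pla_entry] = s
    changed = True
    while changed:
        changed = False
        for pla in places:
            incoming = [tf(a[src]) for (src, dst, tf) in fc if dst == pla]
            new = frozenset().union(*incoming) if incoming else bot
            if pla == pla_entry:
                new = new | s
            if not new <= a[pla]:
                a[pla] = a[pla] | new
                changed = True
    return a

# In this reading, M0(s) corresponds to mfps(fc1 | fc2 | fc3, pla1, s, places)
# and the M∞(s) introduced below to mfps(fc1 | fc2 | fc3 | fc4, pla1, s, places).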

Lemma 4. For arbitrary dg' ∈ S →m (S × (Pla →c S)) and s ∈ S: (dg[dg'](s))↓2 = dg'(M0(s)(pla5) ∪ s)↓2 ⊔ M0(s).

Proof. From P-Exp(exp) follows dg1(s) = (a1(pla3), a1) where a1 = MFPS(fc1, pla1, s). From P-Cmd(cmd) follows dg2(tf^t(a1(pla3))) = (a2(pla5), a2) where a2 = MFPS(fc2, pla3, a1(pla3)). It is easy to see dg3(tf^f(a1(pla3))) = (a3(pla6), a3) where a3 = MFPS(fc3, pla3, a1(pla3)). Since a1(pla5) = ⊥ = a3(pla5) this yields (dg[dg'](s))↓2 = a1 ⊔ a2 ⊔ a3 ⊔ (dg'((a1 ⊔ a2 ⊔ a3)(pla5) ∪ s)↓2).


It remains to be shown that a1 ⊔ a2 ⊔ a3 = M0(s). First it will be shown that a2 ⊔ a3 = a', where a' = MFPS(fc2 ∪ fc3, pla3, a1(pla3)). Then it will be shown that a1 ⊔ a' = M0(s).

That a2 ⊔ a3 ⊒ a' follows from (a2 ⊔ a3) pla ⊒ step(fc2 ∪ fc3, pla3, a1(pla3))(a2 ⊔ a3) pla which is shown by cases of pla. Consider the case where pla is a descendant of occ⌢⟨2⟩. Then (using the fact and properties of F^cg)

(a2 ⊔ a3) pla = step(fc2, pla3, a1(pla3)) a2 pla

= step(fc2, pla3, a1(pla3))(a2 ⊔ a3) pla = step(fc2 ∪ fc3, pla3, a1(pla3))(a2 ⊔ a3) pla.

The remaining cases are similar. The converse inequality a2 ⊔ a3 ⊑ a' follows from a2 ⊑ a' and a3 ⊑ a', which are easy to establish because step is monotone in its first argument.

That a1 ⊔ a' ⊒ M0(s) follows from (a1 ⊔ a') pla ⊒ step(fc1 ∪ fc2 ∪ fc3, pla1, s)(a1 ⊔ a') pla which is shown by cases of pla. Consider the case where pla is a descendant of occ⌢⟨2⟩. Then

(a1 ⊔ a') pla = step(fc2 ∪ fc3, pla3, a1(pla3)) a' pla = step(fc2 ∪ fc3, pla1, s)(a1 ⊔ a') pla = step(fc1 ∪ fc2 ∪ fc3, pla1, s)(a1 ⊔ a') pla.

The remaining cases are similar. That a1 ⊔ a' ⊑ M0(s) is by a1 ⊑ M0(s) (which is easy) and a' ⊑ M0(s). The latter follows from M0(s)(pla) ⊒ step(fc2 ∪ fc3, pla3, a1(pla3))(M0(s))(pla). This is proved by cases of pla using a1 ⊑ M0(s) and monotonicity of the transfer functions. □

To express the effect of the while loop when it is iterated an arbitrary number of times it is useful to define

M∞(s) = MFPS((fg F fix(pla5, pla2, λs.s))(pla1, λs.s)↓2, pla1, s) = MFPS(fc1 ∪ fc2 ∪ fc3 ∪ fc4, pla1, s).

Lemma 5. For arbitrary s ∈ S: FIX(λdg'. dg[dg']) s↓2 = M∞(s).

Proof. It simplifies the proof to omit the first component of dg[dg'] s from consideration. To this end define M[M'] s = M'(M0(s)(pla5) ∪ s) ⊔ M0(s). Then using Lemma 4 it is not difficult to show FIX(λdg'. dg[dg']) s↓2 = FIX(λM'. M[M']) s = LFP(λM'. M[M']) s.

That FIX(λM'. M[M']) s ⊑ M∞(s) is obtained from M∞(s) ⊒ M∞(M0(s)(pla5) ∪ s) ⊔ M0(s). Since step(fc1 ∪ fc2 ∪ fc3 ∪ fc4, pla1, s) ⊒ step(fc1 ∪ fc2 ∪ fc3, pla1, s) it follows that M∞(s) ⊒ M0(s). By the methods used in Lemma 4 it can be shown that

M∞(s) ⊒ step(fc1 ∪ fc2 ∪ fc3 ∪ fc4, pla1, M∞(s)(pla5) ∪ s)(M∞(s))

so that M∞(s) ⊒ M∞(M∞(s)(pla5) ∪ s) ⊒ M∞(M0(s)(pla5) ∪ s). The converse inclusion FIX(λM'. M[M']) s ⊒ M∞(s) follows from

FIX(λM'. M[M']) s ⊒ step(fc1 ∪ fc2 ∪ fc3 ∪ fc4, pla1, s)(FIX(λM'. M[M']) s).

By the continuity of α and γ it follows that step(…, …, …) is continuous. It then suffices to show

(λM'. M[M'])^{n+1} ⊥ s ⊒ step(fc1 ∪ fc2 ∪ fc3 ∪ fc4, pla1, s)((λM'. M[M'])^n ⊥ s).

Define b1(s) = M0(s) and b_{n+1}(s) = M0(b_n(s)(pla5) ∪ s). Since it is easily seen that b_n is monotone and b_{n+1} ⊒ b_n it can be shown (by induction in n) that ∀s: b_n(M0(s)(pla5) ∪ s) = M0(b_n(s)(pla5) ∪ s) so that ∀s: (λM'. M[M'])^n ⊥ s = b_n(s) follows. Finally b_{n+1}(s)(pla) ⊒ step(fc1 ∪ fc2 ∪ fc3 ∪ fc4, pla1, s)(b_n(s))(pla) is shown by cases of pla. □
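Under the same modelling assumptions as the mfps sketch above (frozensets with union as join), the iterates b_n of this proof can be read as repeatedly re-running the loop-free analysis and feeding the property reaching pla5 back into the entry state; a minimal sketch:

def b(n, s, fc_no_back, pla1, pla5, places):
    # b_1(s) = M0(s) and b_{k+1}(s) = M0(b_k(s)(pla5) ∪ s); reuses the assumed
    # mfps sketch given earlier, with fc_no_back playing the role of
    # fc1 ∪ fc2 ∪ fc3.
    a = mfps(fc_no_back, pla1, s, places)
    for _ in range(n - 1):
        a = mfps(fc_no_back, pla1, a[pla5] | s, places)
    return a

By the proof above these iterates coincide with (λM'. M[M'])^n ⊥ s, and their least upper bound is FIX(λM'. M[M']) s, which Lemma 5 identifies with M∞(s).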

From Lemma 5 condition (ii) of P-Cmd(while exp do cmd od) easily follows.

References

1. Aho, A.V., Ullman, J.D.: Principles of Compiler Design. London: Addison-Wesley, 1977
2. Cousot, P., Cousot, R.: Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. Proc. 4th ACM Symp. Principles Progr. Lang. 238-252 (1977)
3. Cousot, P., Cousot, R.: Systematic design of program analysis frameworks. Proc. 6th ACM Symp. Principles Progr. Lang. 269-282 (1979)
4. Donzeau-Gouge, V.: Utilisation de la sémantique dénotationnelle pour l'étude d'interprétations non-standard. Report no. 273, INRIA, France, 1978
5. Donzeau-Gouge, V.: Denotational definition of properties of program computations. In: Program Flow Analysis: Theory and Applications. Muchnick, S.S., Jones, N.D. (eds.). New Jersey: Prentice-Hall, pp. 343-379, 1981
6. Gordon, M.J.C.: The Denotational Description of Programming Languages: An Introduction. Berlin, Heidelberg, New York: Springer, 1979
7. Hecht, M.S.: Flow Analysis of Computer Programs. New York: North-Holland, 1977
8. Kam, J.B., Ullman, J.D.: Monotone data flow analysis frameworks. Acta Informat. 7, 305-317 (1977)
9. Milner, R.: Program semantics and mechanized proof. In: Foundations of Computer Science II. K.R. Apt, J.W. de Bakker (eds.). Amsterdam: Mathematical Centre Tracts 82, pp. 3-44, 1976
10. Milne, R., Strachey, C.: A Theory of Programming Language Semantics. London: Chapman and Hall, 1976
11. Nielson, F.: Semantic foundations of data flow analysis. M.Sc. Thesis, Report no. PB-131. Denmark: Aarhus University, 1981
12. Nielson, F.: Program transformations in a denotational setting. Report no. PB-140. Denmark: Aarhus University, 1981
13. Rosen, B.K.: Monoids for rapid data flow analysis. SIAM J. Comput. 9, 159-196 (1980)
14. Stoy, J.E.: Denotational Semantics: The Scott-Strachey Approach to Programming Language Theory. Cambridge, MA: MIT Press, 1977
15. Ullman, J.D.: A survey of data flow analysis techniques. 2nd USA-Japan Comput. Conf. pp. 335-342, 1975

Received February 22, 1982/June 1, 1982