
Pergamon Information Systems Vol. 18, No. 8, pp. 581-595, 1993

Copyright © 1994 Elsevier Science Ltd. Printed in Great Britain. All rights reserved

0306-4379/94 $6.00 + 0.00

A FUNCTIONAL APPROACH TO DATABASE UPDATES

CAROL SMALL

Department of Computer Science, Birkbeck College, Malet St, London WC1E 7HX

(Received 25 May 1992; in revised form 28 February 1993)

Abstract. PFL is a functional database language in which functions are defined equationally and bulk data is stored using a special class of functions called selectors. It is a lazy language, supports higher-order functions, has a strong polymorphic type inference system, and allows new user-defined data types and values to be declared. All functions, types and values persist in a database. Functions can be written which update all aspects of the database: by adding data to selectors, by defining new equations, and by introducing new data types and values. PFL is “semi-referentially transparent”, in the sense that whilst updates are referentially opaque and are executed destructively, all evaluation is referentially transparent. Similarly, type checking is “semi-static” in the sense that whilst updates are dynamically type checked at run time, expressions are type checked before they are evaluated and no type errors can occur during their evaluation.

In this paper we examine the expressiveness of PFL with respect to updates, and illustrate the language by developing a number of general-purpose update functions, including functions for restructuring selectors, for memoisation, and for generating unique system identifiers. We also provide a translation mechanism between Datalog programs and equations, and show how different Datalog evaluation strategies can be supported.

1. INTRODUCTION

1.1. Background

A deductive database can be viewed as comprising a conventional database containing factual data, a knowledge base containing rules, and an inference engine which allows the derivation of information implied by the rules and facts. Most research over the past decade has been focussed on deductive databases where the knowledge base is expressed in a subset of first-order logic [1, 2] and either an SLDNF [3] or Datalog [4] inference engine is used. The limitations of this approach have recently led to a number of extensions to these systems, such as sets and functions [5], higher-order syntactic features [6], nondeterminism [7] and type systems [8].

Alternative approaches to deductive databases based upon production rules [9], procedural extensions [10] and functions [11] are also possible, although less well explored. In particular, we are concerned with deductive databases in which rules are expressed as functions, and in which function evaluation is used for inference [12]. PFL [13, 12] is a functional database language which supports higher-order functions and lazy evaluation, and has a polymorphic type inference system which provides strong type checking [15]. In addition, functions are defined through the incremental insertion and deletion of equations, and new types and values can be declared. An integrated model of data and computation is provided via functions called selectors, which support the associative retrieval of tuples from a non-first-normal-form relation. All functions, types and values persist in a database.

Functions can be written to update all aspects of the database: to insert and remove types, values and equations, and to add and delete tuples from selectors. Although operations which modify the database are executed destructively, and hence are referentially opaque, referential transparency is maintained within evaluation. Similarly, although update operations are type checked at run time, expressions are type checked before evaluation and no type error can occur during evaluation.

In this paper we examine the expressiveness of PFL as an update language. We do not consider its expressiveness as a query language since this is discussed elsewhere at some length [12, 13]. Although our presentation is deliberately informal, a more rigorous treatment (including the denotational semantics of the language) can be found in Ref. [14]. The remainder of the paper is organized as follows. Section 2 describes PFL's type system, its support for user-defined types and the definition of functions and selectors. The operational semantics of expression evaluation, an understanding of which is necessary for the definition of update functions, is also described in some detail. Section 3 provides a number of examples which show how selectors can be restructured, functions memoised, and object identifiers generated. Section 4 examines the expressiveness of PFL, showing how an arbitrarily complex transformation on a group of selectors can be achieved. It also gives a mechanism for translating Datalog rules into equations: any Datalog evaluation strategy can be implemented in PFL, and we illustrate this by showing how naive evaluation [16] can be defined. Comparison with related work is described in Section 5, and finally in Section 6 we present our conclusions. Since the salient features of PFL are that it is a functional language, and that it maintains to a considerable extent both static type checking and referential transparency, the remainder of this introductory section motivates the importance of these aspects.

1.2. Comparison of logic and functional computation

Research over the past 15 years into deductive databases has primarily been concentrated upon logic and imperative database languages, although more recently there has also been considerable interest in the development of functional and object-oriented languages. In contrast to imperative languages, which specify how to compute the desired output from the stored data, functional and logic languages are declarative and simply specify the relationship between the desired output and the stored data: it is a matter for the DBMS to develop an execution plan to obtain the output. The key concepts of object-oriented languages (encapsulation, inheritance and object identity) are largely orthogonal to whether the language is imperative or declarative. We are interested in exploring declarative as opposed to imperative languages for the reasons given in Section 1.4 below. Since a key feature of PFL is its foundation on functional as opposed to logic programming, in this section we review the comparative advantages of the two paradigms.

The main advantage of logic languages is that predicates are theoretically “invertible”, in the sense that they can be used with any combination of their arguments uninstantiated, although in practice invertibility may be limited to satisfy the instantiation patterns expected by built-in predicates. A second advantage is that facts and rules are more naturally represented as Horn clauses than as functions and their inverses. Nevertheless, we believe that the advantages of functional computation (outlined below) are significant, and that consequently it is worthwhile exploring deductive databases based upon this form of computation. Furthermore, equations and selector functions (see Sections 2.2-2.4) provide a natural representation for rules and facts, and furnish PFL with the ability to pose invertible queries over these rules and facts.

With respect to computation, the most significant advantage of functional languages is that they are higher order. Consequently, functions can be written which abstract out recursion patterns, thus allowing further functions to avoid explicit use of recursion. For example, the fold function defined below, when applied to a 2-ary function f, an end element e and a list [a1, a2, ..., an], yields the expression f a1 (f a2 (... (f an e))). Thus, fold can be used to sum a list of numbers, to compute the logical or of a list of Booleans, and to find the maximal element of a list of arbitrary type:

fold f end [] = end
fold f end (h:t) = f h (fold f end t)
sum (h:t) = fold (+) h t
or (h:t) = fold (∨) h t
max (h:t) = fold greater h t

(assuming that greater is a function which returns the larger of its two arguments). A further advantage is that functional languages do not need to communicate the results of interim computations via intermediate variables, and consequently the resulting programs are in general more succinct and understandable. Finally, lazy evaluation can be used to model infinite processes and to avoid unnecessary computations. Further motivation can be found in the paper by Hughes [17].
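To make the recursion pattern concrete, here is a minimal Python sketch of fold and the three functions defined with it. It assumes Python lists stand in for PFL lists, and the names fold, greater, pfl_sum, pfl_or and pfl_max are illustrative only. Unlike PFL, this version is strict, so it illustrates higher-order abstraction but not lazy evaluation.

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def fold(f: Callable[[A, B], B], end: B, xs: list[A]) -> B:
    """Right fold: fold f e [a1, ..., an] = f a1 (f a2 (... (f an e)))."""
    result = end
    for x in reversed(xs):  # start from the right so the calls nest as above
        result = f(x, result)
    return result


def greater(a, b):
    """Return the larger of its two arguments."""
    return a if a > b else b


# Like the PFL equations, these only cover non-empty lists (the (h:t) case).
def pfl_sum(xs: list) -> int:
    h, *t = xs  # sum (h:t) = fold (+) h t
    return fold(lambda a, b: a + b, h, t)


def pfl_or(xs: list) -> bool:
    h, *t = xs  # or (h:t) = fold (∨) h t
    return fold(lambda a, b: a or b, h, t)


def pfl_max(xs: list):
    h, *t = xs  # max (h:t) = fold greater h t
    return fold(greater, h, t)
```

None of the three list functions mentions recursion explicitly: the recursion pattern lives entirely inside fold.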


With respect to data manipulation, the deterministic semantics of functional evaluation can be exploited to provide a natural representation of defaults. For example, given the equations:

taxcode Jim = 449M
taxcode x = 351L

the taxcode of anyone other than Jim is 351L. In contrast, logic languages, being based upon first-order logic, which is monotonic, must represent default knowledge extra-logically (e.g. by negation as failure [18] or default rules [19]).

1.3. Type checking

A language is strongly typed if it prevents the application of a function to a value of an inappropriate type. A language is dynamically typed if type errors are detected at run time, whilst it is statically typed if type errors are detected by the compiler. For example, the λ-calculus, Lisp and Miranda are respectively examples of typeless, dynamically typed and statically typed languages. Whilst type systems evolved as a means of specifying implementation details, such as the layout and optimization of storage, their use for ensuring database schema well-formedness, protecting long-term data from corruption, and enforcing the correct utilization of values has become increasingly important. Although most languages are strongly typed, many incorporate at least some dynamic type checking. Indeed, in the context of deductive database systems, strong static type checking seems to be in conflict with the support of inheritance and with updates which modify the type system. Nevertheless, as much static type checking as possible is clearly desirable, since it is less costly and detects errors more quickly than dynamic type checking.

1.4. Referential transparency

One of the most attractive features of declarative languages is referential transparency, through which each expression denotes a single value which cannot be changed either by evaluation or by allowing different parts of a program to share the expression [20]. In practice, however, the need to provide update facilities leads to many deductive DBMSs being referentially opaque. For example, deductive DBMSs based upon Prolog [21] can “prove” q1 but not q2, despite the commutativity of ∧:

q1 ← assert(p) ∧ p
q2 ← p ∧ assert(p)

Referential transparency allows freedom in the order of execution of sub-expressions, and hence programs need not contain sequencing information, they cannot contain assignment statements, and procedures cannot have side-effects. Consequently, programs are easier to write and reason about [17], and program debugging and verification are simplified [22]. Furthermore, there are enhanced opportunities for program optimization, since common subexpressions can be shared [23], parallel evaluation strategies can be adopted [24], memoisation [25] can be used to store previously computed results, program transformation techniques [26] can be used to develop more efficient algorithms, and strictness analysis [27] can be used to implement algorithms more efficiently.

Referential transparency is, however, only achieved at a price. Updates must be nondestructive and effected by copying structures, and consequently high overheads are incurred in terms of both time and space [28]. It therefore seems desirable to reach a compromise in which updates are executed destructively and yet a maximal degree of referential transparency is preserved.

1.5. Result continuations

Referential transparency also raises problems for declarative languages in the areas of I-O and in the provision of nondeterministic constructs, which are required for applications such as operating systems. Recently, “result continuations” have been used in Hope+C to provide a form of referentially transparent I-O [29]. In Hope+C a program is a function of type (α → Result), where Result is of type:

((operation request) × (β → Result)).


In other words, a program is a function which takes an argument of type α and returns a Result comprising an operation request and a continuation function. Execution proceeds as follows:

(i) the function is applied to a value of type α, and returns a Result, (op, cf);

(ii) the requested operation op (e.g. to write a string to a file) is performed and gives a value v of type β (e.g. the number of characters successfully written);

(iii) the continuation function, cf, is applied to v to produce a new result, (op1, cf1), and control passes back to step (ii).

The execution of a program thus effectively results in a sequence of interleaved evaluation/I-O requests: evaluation requests are referentially transparent since no I-O occurs within such a request; whilst the I-O requests are referentially opaque. The work described in this paper can be viewed as an extension of this concept so that the (operation request) allows the declaration of new types and values, and the definition of equations.
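The three-step cycle above can be sketched in Python as follows. This is a hypothetical model, not the Hope+C interface: requests are tuples, only a "write" request is supported (it appends to a list and replies with the number of characters), and the names Result, HALT, perform, run and greet are invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Result:
    """A Result pairs an operation request with a continuation for its reply."""
    request: Optional[tuple]          # e.g. ("write", "hi"); None means "halt"
    cont: Callable[[Any], "Result"]   # applied to the value the request returns


HALT = Result(None, lambda _: HALT)


def perform(request: tuple, log: list) -> Any:
    """The referentially opaque step: actually carry out an I-O request."""
    kind, arg = request
    if kind == "write":
        log.append(arg)
        return len(arg)  # reply: number of characters "written"
    raise ValueError(f"unknown request: {kind!r}")


def run(program: Callable[[Any], Result], initial: Any) -> list:
    log: list = []
    result = program(initial)                 # step (i): pure evaluation
    while result.request is not None:
        reply = perform(result.request, log)  # step (ii): perform the I-O
        result = result.cont(reply)           # step (iii): pure evaluation
    return log


def greet(name: str) -> Result:
    # Write a greeting, then write how many characters the first write reported.
    return Result(("write", "hello " + name),
                  lambda n: Result(("write", str(n)), lambda _: HALT))
```

Here run(greet, "world") returns ["hello world", "11"]: the only side-effects happen inside perform, while the program and its continuations are ordinary functions.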

2. AN OVERVIEW OF PFL

Conceptually a database† is a 3-tuple comprising a schema, functions and selectors:

Database == (Schema × Functions × Selectors)

Expressions of type Operation can be used to interrogate and update the database:

Operation ::= Done | Commit | Restore
    | Print a Operation
    | Declare {Str, Type} Operation Operation
    | Delete a Operation Operation
    | Define {a, a} Operation Operation
    | Selector {a, Type} Operation Operation
    | Include {a→[a], a} Operation Operation
    | Exclude {a→[a], a} Operation Operation

Throughout the paper we refer to expressions of type Operation as operations. As we shall see in our description of the type system (Section 2.1), Str and Type are types, → is the function space type constructor, and a is a type variable. Thus one example of an operation is Print (1 + 2) Done. Again looking forward (in this case to Section 2.2), it is possible to define a function which denotes an operation. For example, if we define the function print-sum:

print-sum x y c = Print (x + y) c

then print-sum 1 2 Done is also an operation. Operations are reduced to Weak Head Normal Form (WHNF)‡ in order to determine the form of the operation to be executed:

1. Done is a “null” operation which has no effect on the database state.

2. Commit (respectively, Restore) simply causes the database state to be committed (restored).

3. Print exp c causes exp to be evaluated and displayed, following which the continuation operation c is processed.

4. Declare dec s f introduces a new type or value, dec (see Section 2.1). dec is reduced to Normal Form§ and various checks are carried out (e.g. to ensure it has not previously been declared). If the checks are passed then it is added to the schema component of the database and processing resumes with the success continuation s; otherwise processing resumes with the fail continuation f.

† PFL syntax and the structure of a PFL database are distinguished using courier and italic fonts, respectively.
‡ Weak Head Normal Form is defined in Section 2.6 below, but informally this means that the expression is reduced only as much as necessary to determine whether it is the operation Done, Commit, Print, etc.; sub-expressions, such as e and c in the case of Print e c, are not evaluated.
§ Normal Form is formally defined in Section 2.6, but informally this means that the expression is fully reduced.

5. Define eqn s f introduces a new equation, eqn (see Section 2.2). Various checks are carried out (e.g. to ensure it is type correct); if these checks are passed then eqn is added to the functions component of the database and processing resumes with s; otherwise processing resumes with f.

6. Delete exp s f removes an equation or type declaration, exp (see Section 2.5). If exp does not exist then processing resumes with f; otherwise exp is removed from the database, and processing resumes with s.

The remaining operations concern selectors, which are a special class of functions designed for the incremental storage and associative retrieval of bulk data:

7. Selector {name, type} s f declares the type of a new selector (see Section 2.3). {name, type} is reduced to Normal Form and checks are carried out on the declaration (e.g. to ensure that a selector of the same name has not previously been declared). If these checks are passed then the declaration is added to the database and processing resumes with s; otherwise processing resumes with f.

8. Include {sel, val} s f and Exclude {sel, val} s f include and exclude tuples from selectors (see Section 2.4). {sel, val} is reduced to Normal Form and checks are carried out on the inclusion (e.g. to ensure the value is not already present in the selector) or exclusion (e.g. to ensure the value is present). If these checks are passed then the database is updated and processing resumes with s; otherwise it resumes with f.

For example, consider the evaluation of e1 = Define {succ x, x + 1} (print-sum (succ 5) (succ 6) Done) Done:

1. e1 is already in WHNF;
2. The equation succ x = x + 1 is added to the database;
3. e2 = print-sum (succ 5) (succ 6) Done is reduced to WHNF, i.e. to Print ((succ 5) + (succ 6)) Done;
4. The sub-expression (succ 5) + (succ 6) is evaluated and the result displayed;
5. e3 = Done is already in WHNF; processing terminates.

It is important to note that processing an operation results in an interleaving of evaluation steps which are referentially transparent (the odd-numbered steps) and execution steps which may modify the database by side-effect (the even-numbered steps). Consequently all of the advantages of referential transparency outlined in Section 1.4 (the ability to share common sub-expressions, the reuse of previously computed results, etc.) are maintained within evaluation, whilst the disadvantages (such as the high overheads incurred in terms of both time and space) are avoided. Furthermore, all expressions can be statically type checked before evaluation, and no run-time type checking need be undertaken during evaluation.
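The interleaving of evaluation and execution steps can be modelled with a small Python interpreter, sketched below under simplifying assumptions: only Done, Print and Define are covered, a function is defined by a single equation keyed on its name, and sub-expressions are represented as zero-argument functions (thunks) so that an operation is inspected only as far as its constructor. None of these Python names are part of PFL.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical Python model of three PFL operations. Sub-expressions are
# thunks, so an operation is only examined to "WHNF": which constructor it
# is, without evaluating its components.
@dataclass
class Done:
    pass


@dataclass
class Print:
    exp: Callable[[], object]
    cont: Callable[[], object]


@dataclass
class Define:
    name: str                      # simplified: one equation per function name
    rhs: Callable                  # body of the equation  name x = rhs(x)
    success: Callable[[], object]
    fail: Callable[[], object]


def run(op, functions: dict, output: list) -> list:
    """Alternate pure evaluation (forcing thunks) with database-modifying execution."""
    while not isinstance(op, Done):
        if isinstance(op, Print):
            output.append(op.exp())          # evaluate exp, then display it
            op = op.cont()
        elif isinstance(op, Define):
            if op.name in functions:         # check: no existing equation
                op = op.fail()
            else:
                functions[op.name] = op.rhs  # side-effect on the database
                op = op.success()
        else:
            raise TypeError(f"not an operation: {op!r}")
    return output


db: dict = {}


def print_sum(x, y, c):
    # print-sum x y c = Print (x + y) c
    return Print(lambda: x() + y(), lambda: c)


# e1 = Define {succ x, x + 1} (print-sum (succ 5) (succ 6) Done) Done
e1 = Define("succ", lambda x: x + 1,
            success=lambda: print_sum(lambda: db["succ"](5),
                                      lambda: db["succ"](6), Done()),
            fail=lambda: Done())
out = run(e1, db, [])
```

Running e1 stores succ and then displays 13, matching steps 1 to 5 above; running e1 a second time takes the fail continuation because succ is already defined.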

2.1. The type system

PFL is a strongly typed language with a three-level type system [30]. The first level is a set of meta types, which is fixed to comprise the single element Type, the set of all object types. The second is a set of object types, each being a member of the unique meta type Type. The third is a set of values, each value being a member of a unique object type. Thus, the Schema component of a database conceptually comprises these three sets, together with the types which are automatically inferred for functions and the types declared by the user for selectors:

Schema == (MetaTypes × ObjectTypes × Values × FunctionTypes × SelectorTypes)
ObjectTypes == {(constructor × type)}
Values == {(constructor × type)}
FunctionTypes == {(function-name × type)}
SelectorTypes == {(selector-name × type)}


A number of pre-defined object types (namely Str, Num, Char and Operation) and their values are provided. In addition, the pre-defined value Any is a member of every object type. New object types and values are introduced by an operation Declare {s, t} c1 c2, where s and t are of type Str and Type, respectively. The declaration causes the pair {s', t} to be added to either ObjectTypes or Values as appropriate, where s' is syntactically identical to s except that it is stripped of any quotation characters. The declaration fails if s' has already been declared.

For example, the following operations declare the (object-) types Bool, Person, College and Department to be members of the (meta-) type Type, List t to be a Type for all types t, and Prod2 t1 t2 to be a type for all types t1 and t2 (the constants Bool, List, etc. are called type constructors since they construct instances of a type):

Declare {“Bool”, Type} Done Done
Declare {“List”, a→Type} Done Done
Declare {“Prod2”, a→b→Type} Done Done
Declare {“Person”, Type} Done Done
Declare {“College”, Type} Done Done
Declare {“Department”, Type} Done Done

Similarly, the following operations declare True to be a value of (object-) type Bool, ((:) a b) to be of type List t whenever a is of type t and b is of type List t, and Tuple2 a b to be of type Prod2 t1 t2 whenever a and b are of type t1 and t2, respectively (the constants True, (:) and Tuple2 are called value constructors since they construct instances of a value):

Declare {“True”, Bool} Done Done
Declare {“(:)”, a→(List a)→(List a)} Done Done
Declare {“Tuple2”, a→b→(Prod2 a b)} Done Done
Declare {“Alex”, Person} Done Done
Declare {“King’s”, College} Done Done
Declare {“ComputerScience”, Department} Done Done

A shorthand notation can be used for certain commonly used types and values, namely:

Tuple_n v1 ... vn = {v1, ..., vn}
Prod_n t1 ... tn = {t1, ..., tn}
(:) v1 v2 = v1:v2
List t = [t]
v1:[v2, ..., vn] = [v1, ..., vn]

2.2. Function definition

The Functions component of a database conceptually comprises a set of equations, each equation being a pair comprising a left-hand side and a right-hand side:

Functions == {Equation}
Equation == (lhs × rhs)

A function is defined by a number of equations, each equation being specified by an operation Define {lhs, rhs} c1 c2. The type of the function is automatically inferred as its equations are defined. The operation succeeds if there is no existing equation with the same lhs, in which case the pair (lhs, rhs) is added to Functions; otherwise the operation fails. The meaning of a function is independent of the order of definition of its equations, and equations can be defined which reference functions which have yet to be either wholly or partly defined: we discuss these points further in Section 2.6.

An equation is constructed from function names, variables and constructors. A pattern is either a variable or an application (C p1 ... pn), where C is an n-ary constructor and each pi a pattern. An expression is either a variable or an application (CF e1 ... en), where CF is either a constructor or a function and each ei an expression. It should be noted that constants such as numbers, characters and strings are regarded as 0-ary constructors. An equation is represented as a pair {f p1 ... pn, e}, where f is a function name, each pi a pattern, and e an expression. For example, the following operations define the function if, which takes three arguments and returns either the second or the third depending upon whether the first is the Boolean True or not:

Define {if True x y, x} Done Done
Define {if False x y, y} Done Done

Following these definitions the type of if, which is automatically inferred to be Bool→a→a→a, is stored in FunctionTypes. Of course, both recursive and higher-order functions can be defined:

Define {filter p [], []} Done Done
Define {filter p (h:t), if (p h) (h:(filter p t)) (filter p t)} Done Done

A further form of expression which is not covered by the basic syntax given above is the list abstraction, [e | q1; q2; ...; qn], which is read as “the list of values e such that q1 and q2 and ... and qn”. Each qi is either a generator pi ← li, which is read as “the pattern pi is matched against each element in the list li in turn”, or a Boolean-valued expression which must be satisfied. List abstractions can be translated into a series of higher-order function applications, as described by Peyton Jones [23]. For example, the following operation defines a function which gives all the even numbers in a list:

Define {evennums list, [n | n ← list; (n mod 2) == 0]} Done Done
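One way to see how a list abstraction reduces to higher-order function applications is the following Python sketch of a standard translation scheme built on a concatMap-style function. It is an illustration under assumptions: only variable patterns are handled (a refutable pattern would also need a case yielding []), and concat_map, evennums and pairs are hypothetical names rather than PFL built-ins.

```python
from typing import Callable, Iterable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def concat_map(f: Callable[[A], List[B]], xs: Iterable[A]) -> List[B]:
    """Apply f to every element and concatenate the resulting lists."""
    out: List[B] = []
    for x in xs:
        out.extend(f(x))
    return out


# Translation scheme (one common formulation):
#   [e | ]           ==> [e]
#   [e | b; Q]       ==> if b then [e | Q] else []
#   [e | p <- l; Q]  ==> concat_map (\p -> [e | Q]) l
def evennums(lst: List[int]) -> List[int]:
    # [n | n <- list; (n mod 2) == 0]
    return concat_map(lambda n: [n] if n % 2 == 0 else [], lst)


def pairs(xs: List[int], ys: List[int]) -> List[tuple]:
    # [{x, y} | x <- xs; y <- ys; x < y]   (a hypothetical two-generator example)
    return concat_map(
        lambda x: concat_map(lambda y: [(x, y)] if x < y else [], ys), xs)
```

Each generator becomes one nested concat_map and each Boolean qualifier becomes a choice between a singleton and the empty list.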

2.3. Selector definition

The Selectors component of a database is conceptually a set of pairs comprising a selector name and a relation, where a relation is a set of n-tuples:

Selectors == {(selector-name × relation)}
relation == {(expression-1 × ... × expression-n)}

Unlike ordinary functions, the type of a selector must be declared, using an operation of the form:

Selector {s, t→[t]} c1 c2

where s is the name of the selector and t→[t] is its type, for some monomorphic first-order type t. The operation fails if s has already been declared; otherwise the declaration is stored in SelectorTypes, and an empty relation is associated with s and stored in Selectors. Before describing the semantics of a selector we must first describe the pre-defined function ≃, whose purpose is to allow partial matching of values. When given values p and v, the expression (p ≃ v) yields either True or False depending upon whether p is identical to v except with respect to any occurrences of Any in p. More precisely, the semantics of ≃ is as follows, where there is one equation:

(C x1 ... xn) ≃ (C y1 ... yn) = True ∧ (x1 ≃ y1) ∧ ... ∧ (xn ≃ yn)

for each n-ary constructor C, together with the default equation:

x ≃ y = x == Any

We recall that constants (such as 1, ‘a’ and True) are regarded as 0-ary constructors, and thus 1 ≃ 1 ⇝ True (by the first equation)†, whilst 1 ≃ 2 ⇝ 1 == Any (by the second equation) ⇝ False. As further examples we have that [1, Any, 3] ≃ [1, 2, 3] ⇝ True and [1, 2, 3] ≃ [1, Any, 3] ⇝ False. If s is a selector and r the relation associated with s, then s may be assumed to be defined by the following equation, where r' is a list containing the tuples of r in some system-defined order:

Define {s p, [x | x ← r'; p ≃ x]} Done Done

In other words, a selector s takes as argument a search pattern, p, and returns all the elements of its relation which match p. The reason for distinguishing between selectors and other functions is that, unlike other functions, selectors can be used for the incremental storage and associative retrieval of bulk data.
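The behaviour of ≃ and of a selector can be approximated in Python as below. This is a simplified model under stated assumptions: ANY is a sentinel object standing in for Any, PFL tuples and lists become Python tuples and lists that must have equal length (so a pattern such as (1:Any) has no counterpart here), and every other value is treated as a 0-ary constructor. The Selector class is hypothetical.

```python
ANY = object()  # hypothetical stand-in for PFL's Any value


def matches(p, v) -> bool:
    """p ≃ v: p is identical to v except where p contains Any.

    PFL lists and tuples are modelled as Python lists and tuples of equal
    length; any other value is treated as a 0-ary constructor (a constant).
    """
    if p is ANY:
        return True  # Any matches everything (by either equation)
    if isinstance(p, (list, tuple)) and type(p) is type(v):
        # same constructor: match the components pairwise
        return len(p) == len(v) and all(matches(a, b) for a, b in zip(p, v))
    # constants: True only for the same 0-ary constructor
    return type(p) is type(v) and p == v


class Selector:
    """A selector: a relation queried by search patterns (simplified model)."""

    def __init__(self, tuples=()):
        self.relation = list(tuples)  # kept in a system-defined order

    def __call__(self, pattern) -> list:
        # s p = [x | x <- r'; p ≃ x]
        return [x for x in self.relation if matches(pattern, x)]
```

With emp = Selector([("Mir", "Birkbeck", (10, 1987)), ("Alex", "King's", (9, 1991))]), the call emp((ANY, "King's", (ANY, 1991))) returns only Alex's tuple, and matches is asymmetric exactly as in the examples above.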

† The symbol ⇝ may be read as “reduces to”.


2.4. Updating selectors

The relation associated with a selector is updated using the operations Include {s, v} c1 c2 and Exclude {s, v} c1 c2. In both cases s is the name of a selector, v is a value, and the type of {s, v} is {t→[t], t} for some type t. In the case of Include, v is added to the relation of s; in the case of Exclude, any value v' of the relation of s such that (v ≃ v') is removed. The operation fails if v was already present in the relation of s (in the case of an Include operation), or if no tuples were removed from the relation of s (in the case of an Exclude operation). For example, an employee selector can be defined and updated as follows:

Selector {emp, {Person, College, {Num, Num}}→[{Person, College, {Num, Num}}]} Done Done
Include {emp, {Mir, Birkbeck, {10, 1987}}} Done Done    || Add new employees ||
Include {emp, {Alex, King’s, {09, 1991}}} Done Done
Exclude {emp, {Any, UCL, Any}} Done Done    || Fire UCL employees ||

A selector can be queried associatively. For example, the following operation prints details of all King’s employees hired in 1991, followed by details of all Birkbeck employees:

Print (emp {Any, King’s, {Any, 1991}}) (Print (emp {Any, Birkbeck, Any}) Done)

[..., {Alex, King’s, {09, 1991}}, ...]
[..., {Mir, Birkbeck, {10, 1987}}, ...]
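A minimal Python sketch of the Include and Exclude rules follows, assuming a relation is a plain list and ANY is a sentinel standing in for Any. The functions return True for success and False for failure instead of taking continuations, and the third (UCL) employee is invented so that the Exclude has something to remove.

```python
ANY = object()  # hypothetical stand-in for PFL's Any value


def matches(p, v) -> bool:
    """p ≃ v for constants and (nested) tuples; Any in p matches anything."""
    if p is ANY:
        return True
    if isinstance(p, tuple) and isinstance(v, tuple):
        return len(p) == len(v) and all(map(matches, p, v))
    return type(p) is type(v) and p == v


def include(relation: list, v) -> bool:
    """Include {s, v}: add v; fail (return False) if v is already present."""
    if v in relation:
        return False
    relation.append(v)
    return True


def exclude(relation: list, v) -> bool:
    """Exclude {s, v}: remove every v' with v ≃ v'; fail if nothing was removed."""
    kept = [x for x in relation if not matches(v, x)]
    removed_any = len(kept) < len(relation)
    relation[:] = kept  # update the relation in place
    return removed_any


emp: list = []
include(emp, ("Mir", "Birkbeck", (10, 1987)))  # Add new employees
include(emp, ("Alex", "King's", (9, 1991)))
include(emp, ("Sam", "UCL", (1, 1990)))        # hypothetical UCL employee
fired = exclude(emp, (ANY, "UCL", ANY))        # Fire UCL employees
```

A second exclude(emp, (ANY, "UCL", ANY)) would fail, since no tuple matches any longer; likewise re-including Alex's tuple fails because it is already present.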

2.5. Deletion of equations, types and values

Equations are deleted from the database by an operation of the form Delete (f p1 ... pn) c1 c2, where f is a function name and p1 to pn are patterns. If there is no equation {f p1 ... pn, rhs} in Functions then the deletion fails; otherwise this (unique) equation is removed and the deletion succeeds. For example, the following operations delete the equations for if:

Delete (if True x y) Done Done
Delete (if False x y) Done Done

As the equations are deleted the type of if (and of any function defined in terms of if) is re-inferred. It is interesting to note that the operations Define and Delete obviate the need for a “redefine” operation since this is simply:

Define {redefine {lhs, rhs} c1 c2, Delete lhs (Define {lhs, rhs} c1 c2) c2} Done Done
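The redefine equation can be mirrored in Python as a sketch, assuming the Functions component is a dictionary from left-hand sides (written as strings) to right-hand sides, and that each operation runs immediately and then calls its success or failure continuation (a zero-argument function). One consequence of the definition becomes visible: because Delete fails when there is no existing equation, redefine also fails (via c2) when lhs is new.

```python
from typing import Callable, Dict

# Simplified model: the Functions component maps an equation's left-hand side
# to its right-hand side. All names here are hypothetical.
Functions = Dict[str, str]


def define(db: Functions, lhs: str, rhs: str, c1: Callable, c2: Callable):
    if lhs in db:  # an equation with this lhs already exists
        return c2()
    db[lhs] = rhs
    return c1()


def delete(db: Functions, lhs: str, c1: Callable, c2: Callable):
    if lhs not in db:  # nothing to delete
        return c2()
    del db[lhs]
    return c1()


def redefine(db: Functions, lhs: str, rhs: str, c1: Callable, c2: Callable):
    # redefine {lhs, rhs} c1 c2 = Delete lhs (Define {lhs, rhs} c1 c2) c2
    return delete(db, lhs, lambda: define(db, lhs, rhs, c1, c2), c2)


def ok():
    return "ok"


def err():
    return "err"
```

For example, starting from {"taxcode Jim": "449M"}, redefining "taxcode Jim" to "351L" succeeds, whereas redefining a left-hand side that was never defined returns "err" and leaves the dictionary unchanged.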

The operation Delete s c1 c2 can also be used to delete the declarations for selectors, object types and values. There are three cases to consider:

1. s is a selector name. If the relation of s is not empty or if s appears in some equation then the deletion fails; otherwise the declaration is removed from SelectorTypes.

2. s is a user-defined value constructor. If there is no declaration for s or if s appears in some equation then the deletion fails; otherwise the declaration is removed from Values.

3. s is a user-defined type constructor. If there is no declaration for s or if s appears in some value declaration then the deletion fails; otherwise the declaration is removed from ObjectTypes.

For example, on the assumption that Alex is the only value constructor for the object-type Person, the following operation deletes both Alex and the type Person:

Delete Alex (Delete Person Done Done) Done

2.6. Evaluation

A redex is an expression of the form (fn e1 ... en), where fn is an n-ary function. An expression is in Normal Form (NF) if it does not contain a sub-expression which is a redex. An expression is in Weak Head Normal Form (WHNF) if it is of the form fcn e1 ... em, where fcn is either an n-ary constructor and m ≤ n, or an n-ary function and m < n. Thus any expression in NF is also in WHNF, although the converse does not hold. Some expressions do not have an NF; the classic example is the expression (f f), where the function f is defined as:

f x = x x


Fortunately, a consequence of the Church-Rosser theorem [23] is that if an expression has an NF then there is a normal-order reduction sequence to that NF. A normal-order [20] reduction sequence specifies that when there is a choice of redex the leftmost outermost redex should be reduced first, and it is this reduction sequence which PFL uses. When reducing a redex one of three cases applies:

1. fn is a pre-defined function. The code for fn is executed and the redex is replaced by the result. (Note that it may be necessary to evaluate some of the ei's themselves before executing the code.)

2. fn is a user-defined function. The redex is replaced by the right-hand side of an equation, after replacing any variables in the equation by the corresponding arguments. The equation is selected by applying a best-fit pattern-matching algorithm [20]: the equations defining fn are compared with the arguments ei from left to right in turn, and at each ei only those equations which contain the most specific match for this argument are considered for ei+1.

3. fn is a selector. In this case the redex is of the form fn e1. e1 is recursively evaluated to normal form, say e1', and a list is formed of all elements e of the relation of fn such that e1' ≃ e; the redex is then replaced with this list.
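The best-fit selection in case 2 can be sketched in Python for the simple case where every pattern is either a variable or a constant; nested constructor patterns, and the laziness of PFL's own matching, are deliberately left out. VAR, best_fit and apply are hypothetical names, and each equation is a pair of an argument-pattern tuple and a Python function for its right-hand side.

```python
from typing import Callable, List, Optional, Sequence, Tuple

VAR = object()  # hypothetical marker for a variable pattern

Equation = Tuple[Tuple[object, ...], Callable]  # (argument patterns, rhs)


def best_fit(equations: Sequence[Equation], args: Sequence) -> Optional[Equation]:
    """Pick an equation by comparing the arguments from left to right.

    Simplification: every pattern is either VAR or a constant. At argument i,
    equations with a constant equal to the argument (the most specific match)
    are kept if there are any; otherwise only those with a variable survive.
    """
    candidates: List[Equation] = list(equations)
    for i, arg in enumerate(args):
        exact = [eq for eq in candidates
                 if eq[0][i] is not VAR
                 and type(eq[0][i]) is type(arg) and eq[0][i] == arg]
        candidates = exact or [eq for eq in candidates if eq[0][i] is VAR]
        if not candidates:
            return None
    # Left-hand sides are unique, so at most one equation can survive.
    return candidates[0] if candidates else None


def apply(equations: Sequence[Equation], *args):
    eq = best_fit(equations, args)
    if eq is None:
        raise ValueError("no equation matches")
    return eq[1](*args)


# taxcode Jim = 449M ; taxcode x = 351L
taxcode = [(("Jim",), lambda x: "449M"), ((VAR,), lambda x: "351L")]
```

Because a constant is preferred over a variable at each argument position, apply(taxcode, "Jim") yields "449M" and any other name yields "351L", whatever the order of the list; this is the default behaviour described for taxcode in Section 1.2.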

For example:

Define {nums n, n:(nums (n+1))} Done Done
Print (nums 0) Done

cause the infinite list of numbers [0, 1, 2, ...] to be printed. The intermediate steps in the computation can be shown as follows:

0:(nums (0+1)) ⇝ 0:((0+1):(nums ((0+1)+1))) ⇝ 0:1:(nums ((0+1)+1)) ⇝ ...
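Since Python evaluates eagerly, the lazy list nums can be imitated with thunks. The sketch below assumes a hypothetical Cons class whose tail is computed on first access and then cached, which models both laziness and the sharing of evaluated sub-expressions; Cons, nums and take are illustrative names only.

```python
from typing import Callable, List, Optional


class Cons:
    """A cons cell whose tail is a thunk, forced at most once (call-by-need)."""

    def __init__(self, head, tail_thunk: Callable[[], Optional["Cons"]]):
        self.head = head
        self._tail_thunk = tail_thunk
        self._tail: Optional["Cons"] = None
        self._forced = False

    @property
    def tail(self) -> Optional["Cons"]:
        if not self._forced:  # evaluate only when first needed
            self._tail = self._tail_thunk()
            self._forced = True
            self._tail_thunk = None  # the result is now shared
        return self._tail


def nums(n: int) -> Cons:
    # nums n = n : (nums (n+1))
    return Cons(n, lambda: nums(n + 1))


def take(k: int, xs: Optional[Cons]) -> List:
    """Return the first k elements, forcing no more of the list than that."""
    out = []
    while k > 0 and xs is not None:
        out.append(xs.head)
        k -= 1
        if k > 0:
            xs = xs.tail
    return out
```

take(5, nums(0)) gives [0, 1, 2, 3, 4] without ever building the rest of the infinite list.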

There are a number of important observations which must be made at this point. Firstly, a consequence of the pattern-matching strategy adopted (in case 2 above) is that there cannot be more than one equation matching the expression being reduced, and that the equation which is selected is independent of the order in which the equations are considered. Consequently the order in which the equations for a function are defined is immaterial. Secondly, we note that (except for the strict evaluation of selectors) a lazy evaluation strategy is used: in other words, arguments to functions are only reduced as and when they are required. The consequences of this are considered further in Section 3 below. Finally, we note that operations appearing within an expression being reduced will not cause any modification of the database: they will simply be reduced to NF. Thus, all evaluation is referentially transparent. For example, the operation (Print Commit Done) simply results in the constructor Commit being printed; it will not result in the current database state being committed.

3. EXAMPLES

Since PFL adopts a lazy evaluation strategy, it is sometimes necessary to force the evaluation of a subexpression: Section 3.1 shows how this can be achieved using the predefined function strict. Three examples are then given (Sections 3.2-3.4) which illustrate various features of PFL. In the remainder of the paper, to aid readability, we use:

f p1 ... pn = e

to stand for the operation (Define {f p1 ... pn, e} ok err), where the functions ok and err terminate a computation and show whether or not it was successful:

ok = Print True Done
err = Print False Done

We will also make use of the function update which successively causes the application of a number of operations to the database:

update [] c = c
update (h:t) c = h (update t c) c


(For convenience we have defined update on the assumption that we do not care if any of the operations on the database fail.) Thus, for example:

update [Define {lhs1, rhs1}, ..., Define {lhsn, rhsn}] ok
=> Define {lhs1, rhs1} (... (Define {lhsn, rhsn} ok ok) ...) ok
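The continuation-passing shape of update can be sketched in Python. Everything here is hypothetical: the database is a plain dict, an operation is a function of a success and a failure continuation, and make_define stands in for PFL's built-in Define (failing, as one possible reason, when the left-hand side is already defined). Because the failure continuation of each step is c itself, a failing operation skips all the remaining operations.

```python
from typing import Callable, Dict, List

Cont = Callable[[], str]            # a continuation: run the rest of the program
Op = Callable[[Cont, Cont], str]    # an operation: (success, failure) -> result


def make_define(db: Dict[str, object]) -> Callable[[str, object], Op]:
    """Hypothetical stand-in for Define {lhs, rhs} c1 c2 over a dict."""
    def define(lhs: str, rhs: object) -> Op:
        def op(success: Cont, failure: Cont) -> str:
            if lhs in db:        # e.g. an equation with this lhs already exists
                return failure()
            db[lhs] = rhs        # destructive update of the database
            return success()
        return op
    return define


def update(ops: List[Op], c: Cont) -> str:
    """update [] c = c ;  update (h:t) c = h (update t c) c"""
    if not ops:
        return c()
    head, tail = ops[0], ops[1:]
    return head(lambda: update(tail, c), c)


db: Dict[str, object] = {}
define = make_define(db)
ok: Cont = lambda: "True"           # ok = Print True Done, modelled as a value
print(update([define("a", 1), define("b", 2)], ok), db)  # True {'a': 1, 'b': 2}
```

A second call such as update([define("a", 3), define("c", 4)], ok) still returns "True", but since the first operation fails, "c" is never defined.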

3.1. Strict evaluation

The operation (Define {salary Bill, salary Ben} ok err) adds to the database an equation equating Bill’s salary with Ben’s. If a change is later made to Ben’s salary then (implicitly) the same change is made to Bill’s salary. However, the question arises as to how Bill’s salary can be defined to be the salary which Ben earned at the time of definition. A way of forcing the evaluation of Ben’s salary before the insertion of the equation is required. PFL provides the predefined function strict which, when applied to arguments e1 and e2, reduces e2 to normal form and returns the expression (e1 e2), with e2 now in normal form. Using strict, the function define constructs the desired update:

define {lhs, rhs} c1 c2 = strict (as-of-now c1 c2 lhs) rhs
as-of-now c1 c2 lhs rhs = Define {lhs, rhs} c1 c2

Thus, if Ben earns 30,000:

define {(salary Bill), (salary Ben)} ok err
=> strict (as-of-now ok err (salary Bill)) (salary Ben)
=> as-of-now ok err (salary Bill) 30000
=> Define {(salary Bill), 30000} ok err
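The difference between Define and define can be reproduced with thunks. In this hypothetical Python sketch the equation store maps a name to a zero-argument function (its unevaluated right-hand side): define_lazy stores the right-hand side as given, as Define does, while define_strict first reduces it to a value, as strict does inside define. The employee "Bob" is an invented second example.

```python
from typing import Callable, Dict

Thunk = Callable[[], int]
salary: Dict[str, Thunk] = {"Ben": lambda: 30000}   # hypothetical equation store


def lookup(name: str) -> int:
    return salary[name]()                 # evaluate the stored right-hand side


def define_lazy(lhs: str, rhs: Thunk) -> None:
    salary[lhs] = rhs                     # like Define: rhs stays unevaluated


def define_strict(lhs: str, rhs: Thunk) -> None:
    value = rhs()                         # like strict: reduce rhs to normal form first
    salary[lhs] = lambda: value           # then store the resulting constant


define_lazy("Bill", lambda: lookup("Ben"))   # Bill's salary tracks Ben's
define_strict("Bob", lambda: lookup("Ben"))  # Bob gets Ben's salary as of now
salary["Ben"] = lambda: 35000                # Ben's salary later changes
print(lookup("Bill"), lookup("Bob"))         # 35000 30000
```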

3.2. Selector restructuring

Suppose that a selector s1 is to be transformed (perhaps by adding, deleting or recalculating various fields) to obtain a new selector s2. Let f be a function which, when applied to the list of tuples of s1, returns a list (possibly empty) of tuples to be inserted into s2. Then the transformation of s1 to s2 can be accomplished using transform, which is defined as follows:

transform f s1 s2 = update [Include {s2, t} | t <- f (s1 Any)] ok

In particular, if id is the identity function then transform id s s’ simply copies s to s’.
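As a rough Python analogue, assuming a hypothetical database that maps selector names to sets of tuples, transform applies f to the whole list of s1's tuples and includes each resulting tuple in s2. The example f, which appends a computed tax field, is purely illustrative.

```python
from typing import Callable, Dict, List, Set, Tuple

Row = Tuple
Db = Dict[str, Set[Row]]


def transform(f: Callable[[List[Row]], List[Row]], db: Db, s1: str, s2: str) -> None:
    """transform f s1 s2 = update [Include {s2, t} | t <- f (s1 Any)] ok"""
    for t in f(list(db[s1])):             # s1 Any: all tuples of s1
        # In PFL s2 would already be declared; here it is created on demand.
        db.setdefault(s2, set()).add(t)   # Include {s2, t}


db: Db = {"emp": {("Bill", 30000), ("Ben", 32000)}}
transform(lambda rows: [(n, s, s // 10) for (n, s) in rows], db, "emp", "emp_tax")
transform(lambda rows: rows, db, "emp", "emp_copy")   # transform id s s'
print(sorted(db["emp_tax"]))  # [('Ben', 32000, 3200), ('Bill', 30000, 3000)]
```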

3.3. Counters and system-generated identifiers

Consider the task of defining a function new which implements a counter. new initially gives the number 1, and on each subsequent occasion it gives the successor of the previously generated number: it should never generate the same number twice. Surprisingly, new cannot be defined in a referentially transparent language, since any two occurrences of an expression must denote the same value. The severity of this limitation can be seen by considering two application areas. Firstly, we may want a function to “remember” the number of times it has been evaluated: for example, a “log-on” function may need to ensure that only a limited number of attempts are made to log on to a user account. Secondly, object-oriented database systems usually provide a facility to create new object identifiers; such systems cannot be implemented using a referentially transparent language, since there would be no way of guaranteeing that each object identifier is generated only once.

Since PFL is referentially transparent with regard to evaluation, new cannot be defined directly. However, it can be defined indirectly using a number of operations. The information regarding the state of the counter can be stored in a selector, counter, which is initialized to 1:

update [Selector {counter, Num -> [Num]}, Include {counter, 1}] ok

The current state of the counter is accessed by read (where maximum gives the largest element of a list):

read = maximum (counter Any)

The function new, defined below, takes as argument a continuation function c; it updates the current state of the counter value, and applies c to the result of reading the (updated) counter:

new c = Include {counter, read + 1} (c read) err
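The following hypothetical Python class mirrors the three definitions above: the counter selector is a set of numbers, read takes its maximum, and new includes read + 1 before passing the re-read value to the continuation c. With the selector initialised to 1, as in the update above, the first call of new yields 2.

```python
from typing import Callable, Set, TypeVar

R = TypeVar("R")


class Counter:
    """Hypothetical model of the counter selector and its operations."""

    def __init__(self) -> None:
        self.counter: Set[int] = {1}        # Include {counter, 1}

    def read(self) -> int:
        return max(self.counter)            # read = maximum (counter Any)

    def new(self, c: Callable[[int], R]) -> R:
        self.counter.add(self.read() + 1)   # Include {counter, read + 1}
        return c(self.read())               # c applied to the updated counter


cnt = Counter()
ids = [cnt.new(lambda n: n) for _ in range(3)]
print(ids)  # [2, 3, 4]
```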


3.4. Memoization

Memoization [25] operates by replacing nonlinear function definitions by corresponding “memo” functions, which are the same as the original function except that they remember the arguments they have been applied to, together with the corresponding results computed from them. If a memo function is reapplied to an argument it does not recompute the result; it simply reuses the previously computed result.

A general-purpose memoization function memo is defined below. memo takes two functions, f and g, a list of values from the domain of f, and a continuation c. f is “memoized” as g for the specified values from the domain of f, and the continuation c is returned:

memo f g domain c = update [define {g x, f x} | x <- domain] c

Using memo the standard definition of the Fibonacci function, fib:

fib 0 = 1
fib 1 = 1
fib x = (fib (x-1)) + (fib (x-2))

can be memoized as fastfib, where [0 .. 20] denotes the list of integers from 0 to 20 inclusive:

memo fib fastfib [0 .. 20] Done
=> update [define {fastfib x, fib x} | x <- [0 .. 20]] Done
=> define {fastfib 0, fib 0} (... (define {fastfib 20, fib 20} Done Done) ...) Done
=> Define {fastfib 0, 1} (... (Define {fastfib 20, 10946} Done Done) ...) Done
=> ...

It is interesting to note that memo allows a function either to be copied and memoized (as in the last example), or to be memoized as itself:

memo fib fib [2 .. 20] (Print (fib 20) ok)

Our approach to memoisation allows the user not only to decide which functions are to be memoised but also to specify for which values from their domain; furthermore, the user decides when the function is to be memoised, and can “de-memoise” a function by removing equations.
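A Python sketch of the idea, under hypothetical assumptions: a function is modelled as a table of equations with literal left-hand sides plus an optional general equation, a literal equation being the best fit whenever one exists. memo then adds, for each x in the domain, an equation whose right-hand side is f x evaluated strictly (as define does). De-memoising corresponds to deleting table entries.

```python
from typing import Callable, Dict, Iterable, Optional


class Function:
    """Hypothetical model: equations with literal lhs, plus an optional
    general equation used when no literal equation matches."""

    def __init__(self, general: Optional[Callable[["Function", int], int]] = None):
        self.literal: Dict[int, int] = {}
        self.general = general

    def __call__(self, x: int) -> int:
        if x in self.literal:               # a literal equation is the best fit
            return self.literal[x]
        if self.general is None:
            raise LookupError(f"no equation matches {x}")
        return self.general(self, x)


def memo(f: Function, g: Function, domain: Iterable[int]) -> None:
    """memo f g domain c = update [define {g x, f x} | x <- domain] c"""
    for x in domain:
        g.literal[x] = f(x)                 # define: f x is evaluated before insertion


def make_fib() -> Function:
    fib = Function(lambda self, x: self(x - 1) + self(x - 2))
    fib.literal.update({0: 1, 1: 1})        # fib 0 = 1; fib 1 = 1
    return fib


fib = make_fib()
memo(fib, fib, range(2, 21))                # memo fib fib [2 .. 20]
fastfib = Function()
memo(make_fib(), fastfib, range(0, 21))     # memo fib fastfib [0 .. 20]
print(fib(20), fastfib(20))                 # 10946 10946
```

Memoising fib as itself in increasing order means each new equation is computed from two already-stored ones, which is why the domain is given as [2 .. 20] rather than in arbitrary order.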

4. EXPRESSIVENESS ISSUES

4.1. Arbitrary transformations of selectors

Selectors provide a mechanism for the storage and retrieval of bulk data. Thus, in examining the expressiveness of PFL with respect to updates we are primarily concerned with the ability to transform selectors. We recall that a schema is a 5-tuple of the form:

(MetaTypes x ObjectTypes x Values x FunctionTypes x SelectorTypes)

where SelectorTypes is a set of pairs (selector-name x type). An instance of a schema DS is a function which assigns to each $(s, t \rightarrow [t]) \in SelectorTypes$ a finite set of expressions, each of type t. The set of all instances of DS is denoted I(DS). An update is a partial recursive function from I(DS) to I(DS') for some database schemas DS and DS'. Two schemas, DS = (M,O,V,F,S) and DS' = (M',O',V',F',S'), are type compatible if:

$(tc, t) \in O \wedge (tc, t') \in O' \Rightarrow t = t'$

$(vc, t) \in V \wedge (vc, t') \in V' \Rightarrow t = t'$

$(s, t) \in S \wedge (s, t') \in S' \Rightarrow t = t'$

In other words, type compatible schemas do not redefine each other's type constructors, value constructors or selectors. Given two type compatible schemas, DS and DS', any update from I(DS) to I(DS') can be expressed as follows, where ++ is the list concatenation operation. Firstly, the relations associated with the selectors of DS are retrieved:


step1 = strict step2 {s1 Any, ..., sn Any}

Any new type constructors [i.e. where $(tc', t) \in O'$ and $(tc', t) \notin O$], value constructors [$(vc', t) \in V'$ and $(vc', t) \notin V$] and selectors [$(s', t) \in S'$ and $(s', t) \notin S$] which need to be introduced are declared:

step2 old = [Declare {"tc1'", t1'}, ...] ++ [Declare {"vc1'", t1'}, ...] ++ [Selector {s1', t1'}, ...] ++ (step3 old)

The relations of the selectors of DS are then emptied:

step3 old = [Exclude {s1, Any}, ...] ++ (step4 old)

Since PFL is computationally complete, a function can be defined (step4) which takes the instances of the old selectors and generates a list of Include operations to create the instances of the new selectors:

[Include {s1', v11}, ..., Include {s1', v1p}, ..., Include {sm', vm1}, ..., Include {sm', vmq}] ++ step5

The final phase removes unwanted selectors [i.e. where $(s, t) \in S$ and $(s, t) \notin S'$], value constructors [$(vc, t) \in V$ and $(vc, t) \notin V'$], and type constructors [$(tc, t) \in O$ and $(tc, t) \notin O'$]:

step5 = [Delete s1, ...] ++ [Delete vc1, ...] ++ [Delete tc1, ...]

Thus, the entire update is accomplished by the expression (update step1 ok).
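The five steps can be simulated in Python to show why step1 must be strict: the old relations are copied before step3 empties them, so step4 still sees the original data. The sketch is hypothetical throughout: a database is a dict from selector names to sets of tuples, type and value constructors are not modelled, and the example step4 splits an invented emp selector into works and pay.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Db = Dict[str, Set[tuple]]


def restructure(db: Db, new: Iterable[str],
                step4: Callable[[Db], List[Tuple[str, tuple]]],
                unwanted: Iterable[str]) -> None:
    old = {s: set(rel) for s, rel in db.items()}   # step1: strict copy of each s Any
    for s in new:
        db.setdefault(s, set())                    # step2: Selector {s', t'}
    for s in old:
        db[s].clear()                              # step3: Exclude {s, Any}
    for s, v in step4(old):
        db[s].add(v)                               # step4: Include {s', v}
    for s in unwanted:
        del db[s]                                  # step5: Delete s


def split_emp(old: Db) -> List[Tuple[str, tuple]]:
    """Hypothetical step4: split emp(name, dept, salary) into works and pay."""
    return ([("works", (n, d)) for n, d, _ in old["emp"]]
            + [("pay", (n, s)) for n, _, s in old["emp"]])


db: Db = {"emp": {("Bill", "sales", 30000), ("Ben", "admin", 32000)}}
restructure(db, ["works", "pay"], split_emp, ["emp"])
print(sorted(db))  # ['pay', 'works']
```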

4.2. Comparison with Datalog

A Datalog program comprises an extensional database EDB, which is a set of ground positive literals (or “facts”), an intensional database IDB, which is a set of Horn clauses (or “rules”), and a set of “built-in” predicates for equality, arithmetic, etc. The archetypal example of such a program is one defining the ancestor relation:

parent (elizabeth,charles).                          % fact 1
parent (charles,harry).                              % fact 2
ancestor (X,Y) ← ancestor (X,Z) ∧ ancestor (Z,Y).    % rule 1
ancestor (X,Y) ← parent (X,Y).                       % rule 2

Without loss of generality, we assume that the predicates of the EDB and of the IDB are disjoint, and that the rules for a particular IDB predicate are rectified [4]. It is easy to see that an EDB fact, $p(\bar{X})$, can be stored as a tuple in a selector p of type $\{\bar{\tau}\} \rightarrow [\{\bar{\tau}\}]$. The representation of an IDB rule is a little more problematic. In general, an IDB predicate $p(\bar{X})$ may be defined by m rules:

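(1) $p(\bar{X}) \leftarrow q_{11}(\bar{Y}_{11}) \wedge \dots \wedge q_{1n_1}(\bar{Y}_{1n_1})$
    $\vdots$
    $p(\bar{X}) \leftarrow q_{m1}(\bar{Y}_{m1}) \wedge \dots \wedge q_{mn_m}(\bar{Y}_{mn_m})$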

To represent such an IDB predicate we use both a selector p of type $\{\bar{\tau}\} \rightarrow [\{\bar{\tau}\}]$, in which the tuples of the IDB predicate will be stored as they are computed, and a function p' of the same type which says how to find these tuples. p' is defined by a single equation:

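(2) $p'\{\bar{X}\} = [\{\bar{X}\} \mid W_{11} \leftarrow [Any]; \dots; W_{1k_1} \leftarrow [Any]; Q_{11}; \dots; Q_{1n_1}] \mathbin{+\!\!+} \dots \mathbin{+\!\!+} [\{\bar{X}\} \mid W_{m1} \leftarrow [Any]; \dots; W_{mk_m} \leftarrow [Any]; Q_{m1}; \dots; Q_{mn_m}]$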

In this equation:

(i) the $W_{ij}$ are the variables appearing in the body of the i-th rule but not in its head (that is, they are the existentially quantified variables of the rule, such as Z in rule 1 above);

(ii) each $Q_{ij}$ corresponding to an EDB or IDB predicate is a generator $\{\bar{Y}_{ij}\} \leftarrow q_{ij}\{\bar{Y}_{ij}\}$;

(iii) each $Q_{ij}$ corresponding to a built-in predicate is a Boolean expression $q_{ij}\{\bar{Y}_{ij}\}$.


For example, rules 1 and 2, which define the “ancestor” predicate, would be represented as†:

ancestor' {x,y} =
  [{x,y} | z <- [Any]; {x,z} <- ancestor {x,z}; {z,y} <- ancestor {z,y}] ++
  [{x,y} | {x,y} <- parent {x,y}]
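One application of ancestor' can be mimicked with Python list comprehensions over the current contents of the ancestor and parent selectors (modelled, hypothetically, as sets of pairs). Python patterns cannot repeat a variable, so the join on z, which ancestor' performs by applying the ancestor selector to {z,y} with z already bound, is written here as an explicit equality test.

```python
from typing import List, Set, Tuple

Pair = Tuple[str, str]


def ancestor_prime(ancestor: Set[Pair], parent: Set[Pair]) -> List[Pair]:
    """One evaluation of ancestor' {Any,Any} against the stored selectors."""
    rule1 = [(x, y)
             for (x, z) in ancestor             # {x,z} <- ancestor {x,z}
             for (z2, y) in ancestor            # {z,y} <- ancestor {z,y}
             if z2 == z]                        # ... with z already bound
    rule2 = [(x, y) for (x, y) in parent]       # {x,y} <- parent {x,y}
    return rule1 + rule2                        # ++


parent = {("elizabeth", "charles"), ("charles", "harry")}
print(sorted(ancestor_prime(set(), parent)))    # only rule 2 contributes
print(sorted(ancestor_prime(parent, parent)))   # rule 1 now adds (elizabeth, harry)
```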

To compute the tuples of the selectors corresponding to IDB predicates, all that need be done is to repeatedly evaluate these functions and add any newly derivable tuples to the corresponding selectors until a fixpoint is reached (i.e. until every tuple which can be derived using the functions is already stored in a selector). One way in which this can be achieved is by the function naive, which simulates a naive evaluation strategy over the Datalog rules [16] (where -- denotes the list difference operation):

naive [] = Done
naive ops = update ops (naive changes)

changes = [Include {p1, x} | x <- (p1' {Any}) -- (p1 {Any})] ++ ... ++ [Include {pp, x} | x <- (pp' {Any}) -- (pp {Any})]

In this function, p1 to pp (respectively, p1' to pp') are the selectors (respectively, equations) corresponding to the IDB predicates. naive is a “bottom-up” strategy which takes a list of updates and applies them to the database; as long as tuples not yet present in the selectors pi can be “inferred” for the IDB predicates using the equations pi', naive is called recursively to continue processing. Thus, processing is initiated by calling naive changes.
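The naive strategy amounts to a fixpoint loop. The Python sketch below is a hypothetical analogue rather than a transcription: selectors are sets in a dict, each IDB predicate p is paired with a Python function standing for p', and each round includes every derivable tuple not yet stored, stopping when a round produces no changes (naive [] = Done).

```python
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[str, str]
Db = Dict[str, Set[Pair]]


def ancestor_prime(db: Db) -> List[Pair]:
    """ancestor' from the translation: rule 1 ++ rule 2."""
    rule1 = [(x, y) for (x, z) in db["ancestor"]
             for (z2, y) in db["ancestor"] if z2 == z]
    return rule1 + list(db["parent"])


def naive(idb: Dict[str, Callable[[Db], List[Pair]]], db: Db) -> Db:
    while True:
        # changes: Include {p, x} for every x in (p' {Any}) -- (p {Any})
        changes = [(p, x) for p, p_prime in idb.items()
                   for x in p_prime(db) if x not in db[p]]
        if not changes:                  # naive [] = Done
            return db
        for p, x in changes:             # update ops (naive changes)
            db[p].add(x)


db: Db = {"parent": {("elizabeth", "charles"), ("charles", "harry")},
          "ancestor": set()}
naive({"ancestor": ancestor_prime}, db)
print(sorted(db["ancestor"]))
```

On the ancestor example the first round copies the parent facts, the second derives (elizabeth, harry), and the third finds nothing new.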

We could, of course, have chosen other Datalog evaluation strategies to compute the extensions of the IDB predicates, such as Semi-Naive evaluation, Henschen and Naqvi’s method, or magic sets (see Refs [16] or [4] for a review of these methods). Not only can these methods be defined in PFL, but the methods can be mixed so that different methods are used to compute different IDB predicates. This may be advantageous since no single method is best for all rule types [16].

5. COMPARISON WITH RELATED WORK

Space precludes a comprehensive comparison of PFL with related work. Instead we briefly review the update facilities of a representative sample of other systems.

O*FDL [31] blends object-oriented concepts such as inheritance and encapsulation with concepts from functional programming languages such as a Milner-style type system and an equational programming style. A database state is perceived to be a finite function mapping object identifiers to values, and an update is a function which creates a new database state from an old state. A prototype version of O*FDL is implemented in Miranda [32], in which database states are held as main-memory data structures. Thus, although updating is referentially transparent (since database states are copied during update), performance may be poor since potentially large data structures must be copied. Furthermore, only certain aspects of the database can be updated: in particular, no updates can be made to the type system or the equations defining functions. Other languages which treat databases as data structures which are updated by copying include Machiavelli [11] and persistent programming languages such as Staple [33].

Glue [10] is a procedural language which extends the purely declarative (logic) language NAIL! [34]. Glue assignment statements add tuples to NAIL! extensional database predicates. A number of aggregation operators are provided, and these, together with repetition constructs, allow Glue update procedures to be written. Glue statements have a different syntax and semantics from NAIL! statements, and are referentially opaque. Other languages which extend a declarative language with procedural constructs include IPL [35] and EFDM [36].

FAD [37] is a deductive database language which uses a functional computational model. Updates are functions which modify the database by side-effect, and hence FAD is referentially opaque. This approach is similar to that of Prolog [21], which modifies its “database” by side-effect using the “assert” and “retract” predicates. Other languages which adopt this approach include Educe [38], Prolog/FDM [39] and an earlier version of PFL [13].

†ancestor' is undoubtedly less readable than the Datalog rules for ancestor. This is in part due to our desire to simplify the description of the translation process; in fact the following definition, which bears a much closer relationship to the Datalog rules, is equivalent to ancestor':

ancestor' {x,y} = [{x,y} | {x,z} <- ancestor {x,Any}; {z,y} <- ancestor {z,y}] ++ parent {x,y}

The closest work to ours is that reported by Manchanda and Warren [40,41] who provide extensions to Datalog which support hypothetical reasoning and update procedures. However, whilst the resulting language is declarative it is not referentially transparent; furthermore, the extensions only allow extensional database predicates to be modified.

Finally, in a series of seminal papers culminating in Ref. [42], Abiteboul and Vianu introduce the notion of update completeness, and give a family of update languages of varying expressive power. However, since the authors are concerned with theoretical issues, the languages are very low level and again are referentially opaque.

6. CONCLUSIONS

In this paper we have described a persistent functional database language called PFL, in which: functions are defined through the insertion and deletion of individual equations; new data types and values can be declared; a class of functions called selectors support bulk data; and all information (equations, selectors, types and values) is stored in a database. The most important features of the language are that:

• functions can be defined to update all aspects of the database, thus obviating the need for the user to learn a further “update” language, and avoiding the so-called “impedance mismatch”;
• no run-time type checking is undertaken during evaluation;
• referential transparency is maintained during evaluation, thus allowing greater opportunities for compile-time optimization and parallel evaluation; and
• updates are executed destructively.

The update facilities provided by PFL are very low level and unsuitable for use by many programmers. Consequently a library of higher-level “update” functions (which includes functions such as those given in Section 3 above for the transformation of selectors and for memoisation) is currently being developed. One other avenue which we intend to explore relates to error handling. At present two continuation operations must be supplied, one as a “success” branch and the other as a “failure” branch. Clearly, for some operations there may be many possible reasons for failure: for example, in the case of adding an equation, the type of the equation may be incorrect, or there may already exist an equation with the same left-hand side. An alternative to our approach is for a single continuation to be supplied which takes as argument a result code indicating the success or (reason for) failure of the preceding operation.
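The single-continuation alternative might look roughly as follows. This is a hypothetical Python sketch, not part of PFL: a Result enumeration lists the two failure reasons mentioned above, the boolean type_ok stands in for the type checker, and the continuation k receives the result code and decides how to proceed.

```python
from enum import Enum
from typing import Callable, Dict, TypeVar

R = TypeVar("R")


class Result(Enum):
    OK = "ok"
    TYPE_ERROR = "equation has the wrong type"
    DUPLICATE_LHS = "an equation with this left-hand side exists"


def add_equation(equations: Dict[str, object], lhs: str, rhs: object,
                 type_ok: bool, k: Callable[[Result], R]) -> R:
    """One continuation k, told why the operation succeeded or failed."""
    if not type_ok:
        return k(Result.TYPE_ERROR)
    if lhs in equations:
        return k(Result.DUPLICATE_LHS)
    equations[lhs] = rhs
    return k(Result.OK)


eqs: Dict[str, object] = {}
report = lambda r: r.name               # a continuation that just reports the code
print(add_equation(eqs, "fib 0", 1, True, report))   # OK
print(add_equation(eqs, "fib 0", 2, True, report))   # DUPLICATE_LHS
```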

There are several other areas which we are investigating. The performance and ease of use of PFL are being evaluated with respect to an application involving road transport accident data: the application records details of approximately 100,000 accidents, with about 40 items of data relating to each accident. This application has already highlighted the need to extend PFL with the ability to enforce semantic integrity constraints (cf. Refs [43,44]), rather than to encode these constraints within update functions. A further requirement identified by the application is the need to use data from diverse sources, and hence we intend to extend PFL to be able to access external databases. Whilst, with respect to querying, PFL can be used in the same way as most other functional languages, writing update functions in PFL can be more difficult: for example, the user needs to consider the use of strict carefully to ensure that updates to the database are correctly evaluated before they are executed. We therefore intend to provide an “environment” of suitable functions to simplify this task. Finally, we intend to investigate the extension of PFL with features from object-oriented database systems to simplify schema design and increase code re-use. We believe that much of this work can be done by defining suitable functions in PFL itself.

Acknowledgements-We would like to thank Alexandra Poulovassilis, Mark Levene, Nigel Martin, Paul Meredith, Swarup Reddi and Spiros Soukeras for their comments on an earlier draft of this paper. The work described in this paper is supported by the UK Science and Engineering Research Council (Grant No: GR/G 19596).


REFERENCES

[1] S. Ceri, G. Gottlob and L. Tanca. Logic Programming and Databases, Surveys in Computer Science. Springer-Verlag, New York (1990).
[2] H. Gallaire, J. Minker and J-M. Nicolas. Logic and databases: a deductive approach. ACM Comput. Surv. 16, 153-185 (1984).
[3] J. Lloyd. Foundations of Logic Programming (2nd Edn). Springer-Verlag, New York (1987).
[4] J. D. Ullman. Principles of Database and Knowledge-Base Systems. Computer Science Press, Palo Alto, CA (1988).
[5] S. Abiteboul and S. Grumbach. A rule-based language with functions and sets. ACM Trans. Database Syst. 16, 1-30 (1991).
[6] W. Chen, M. Kifer and D. S. Warren. HiLog as a platform for database languages. 2nd Int. Workshop on Database Programming Languages, Oregon (1989).
[7] Y-H. Sheng. IDLOG: extending the expressive power of deductive database languages. ACM SIGMOD Conf., pp. 54-63 (1990).
[8] J. Xu and D. S. Warren. A type inference system for Prolog. 5th Int. Logic Programming Conf., pp. 604-619 (1989).
[9] J. Widom, R. J. Cochrane and B. G. Lindsay. Implementing set-oriented production rules as an extension to Starburst. 17th Int. Conf. on Very Large Databases, Barcelona, pp. 275-286 (1991).
[10] G. Phipps, M. A. Derr and K. A. Ross. Glue-Nail: a deductive database system. ACM SIGMOD Conf. (1991).
[11] A. Ohori, P. Buneman and V. Breazu-Tannen. Database programming in Machiavelli: a polymorphic language with static type inference. Proc. ACM SIGMOD Conf., pp. 46-57 (1989).
[12] A. Poulovassilis and C. Small. A functional programming approach to deductive databases. 17th Int. Conf. on Very Large Databases, Barcelona, pp. 491-500 (1991).
[13] C. Small and A. Poulovassilis. An overview of PFL. 3rd Int. Workshop on Database Programming Languages, Nafplion, pp. 96-110 (1991).
[14] A. Poulovassilis and C. Small. A domain-theoretic approach to extending a functional database language with sets. 19th Int. Conf. on Very Large Databases, Dublin (1993).
[15] M. P. Atkinson and O. P. Buneman. Types and persistence in database programming languages. ACM Comput. Surv. 19, 105-190 (1987).
[16] F. Bancilhon and R. Ramakrishnan. An amateur's introduction to recursive query processing strategies. ACM SIGMOD Conf., pp. 16-52 (1986).
[17] R. J. M. Hughes. Why functional programming matters. The Comput. J. 32, 98-107 (1989).
[18] K. L. Clark. Negation as failure. In Logic and Databases (Edited by H. Gallaire and J. Minker), pp. 293-322. Plenum Press, New York (1978).
[19] R. Reiter. A logic for default reasoning. Artif. Intell. 13, 81-132 (1980).
[20] A. J. Field and P. G. Harrison. Functional Programming. Addison-Wesley, Reading, MA (1988).
[21] W. F. Clocksin and C. S. Mellish. Programming in Prolog. Springer-Verlag, New York (1983).
[22] R. Bird and P. Wadler. An Introduction to Functional Programming. Prentice Hall, New Jersey (1988).
[23] S. Peyton-Jones. The Implementation of Functional Programming Languages. Prentice-Hall, New Jersey (1987).
[24] S. Peyton-Jones. Parallel implementations of functional programming languages. The Comput. J. 32, 175-186 (1989).
[25] R. J. M. Hughes. Lazy memo functions. Functional Programming Languages and Computer Architecture, LNCS 201, pp. 129-146. Springer-Verlag (1985).
[26] R. M. Burstall and J. Darlington. A transformation system for developing recursive programs. J. ACM 24, 44-67 (1977).
[27] C. D. Clack and S. L. Peyton-Jones. Strictness analysis: a practical approach. In Functional Programming Languages and Computer Architecture, LNCS 201, pp. 35-49. Springer-Verlag (1985).
[28] P. W. Trinder. A functional database. D.Phil. Thesis, Oxford University Computing Laboratory (1989).
[29] N. Perry. The implementation of practical functional programming languages. Ph.D. Thesis, Imperial College of Science and Technology, London (1990).
[30] L. Cardelli. Types for data-oriented languages. Advances in Database Technology (EDBT 88), LNCS 303, pp. 1-15. Springer-Verlag (1988).
[31] M. V. Mannino, I. J. Choi and D. S. Batory. The object-oriented functional data language. IEEE Trans. Software Engng 16, 1258-1272 (1990).
[32] D. A. Turner. Miranda: a non-strict functional language with polymorphic types. Functional Programming Languages and Computer Architecture, LNCS 201, pp. 1-16. Springer-Verlag (1985).
[33] D. J. McNally and A. J. T. Davie. Two models for integrating persistence and lazy functional languages. ACM SIGPLAN Notices 26, 43-52 (1991).
[34] K. Morris, J. D. Ullman and A. van Gelder. Design overview of the NAIL! system. 3rd Int. Conf. on Logic Programming, LNCS 225, pp. 554-568. Springer-Verlag (1986).
[35] J. Annevelink. Database programming languages: a functional approach. ACM SIGMOD Conf. (1989).
[36] K. G. Kulkarni and M. P. Atkinson. EFDM: extended functional data model. The Comput. J. 29, 38-46 (1986).
[37] F. Bancilhon, T. Briggs, S. Khoshafian and P. Valduriez. FAD, a powerful and simple database language. 13th Int. Conf. on Very Large Databases, Brighton, pp. 97-106 (1987).
[38] J. Bocca. On the evaluation strategy of Educe. ACM SIGMOD Conf., pp. 368-378 (1986).
[39] P. M. D. Gray, D. S. Moffat and N. W. Paton. A Prolog interface to a functional data model database. Advances in Database Technology (EDBT 88), LNCS 303, pp. 34-48. Springer-Verlag (1988).
[40] S. Manchanda and D. S. Warren. A logic-based language for database updates. In Foundations of Deductive Databases and Logic Programming (Edited by J. Minker), pp. 363-394. Morgan-Kaufmann, CA (1987).
[41] S. Manchanda. Declarative expression of deductive database updates. ACM Symp. on Principles of Database Systems, pp. 93-100 (1989).
[42] S. Abiteboul and V. Vianu. Procedural languages for database queries and updates. J. Comput. Syst. Sci. 41, 181-229 (1990).
[43] J-M. Nicolas. Logic for improving integrity checking in relational databases. Acta Inform. 18, 227-253 (1982).
[44] H. Decker. Integrity enforcement on deductive databases. 1st Int. Conf. on Expert Database Systems (1986).
