dbtr-1

Upload: boitre

Post on 29-May-2018

229 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/9/2019 DBTR-1

    1/19

    Bringing Order to Query Optimization

    Giedrius Slivinskas, Christian S. Jensen, Richard T. Snodgrass

    September 12, 2002

    TR-1

    A DB Technical Report

  • 8/9/2019 DBTR-1

    2/19

    Bringing Order to Query Optimization

    Copyright c

    2002 Giedrius Slivinskas, Christian S. Jensen, Richard T.Snodgrass. All rights reserved.

    " !

    Giedrius Slivinskas, Christian S. Jensen, Richard T. Snodgrass

    #$ % & ') ( 0 1 3 2 5 46 1 7 8 @ 9

    ACM SIGMOD Record , Vol. 31, No. 2, June 2002, pp. 514.September 2002. A DB Technical Report.

    For additional information, see the DB T EC H R EPORTS homepage: A www.cs.auc.dk/DBTR B .

    Any software made available via DB T EC H R EPORTS is provided as is and without any express or implied warranties, including, without limitation, the implied warranty of merchantability and tness for a particular purpose.

    The DB T EC H R EPORTS icon is made from two letters in an early version of the Rune alphabet, whichwas used by the Vikings, among others. Runes have angular shapes and lack horizontal lines because theprimary storage medium was wood, although they may also be found on jewelry, tools, and weapons. Runeswere perceived as having magic, hidden powers. The rst letter in the logo is Dagaz, the rune for dayor daylight and the phonetic equivalent of d. Its meanings include happiness, activity, and satisfaction.The second letter is Berkano, which is associated with the birch tree. Its divinatory meanings includehealth, new beginnings, growth, plenty, and clearance. It is associated with Idun, goddess of Spring, andwith fertility. It is the phonetic equivalent of b.

  • 8/9/2019 DBTR-1

    3/19

    Abstract

    A variety of developments combine to highlight the need for respecting order when manipulating re-lations. For example, new functionality is being added to SQL to support OLAP-style querying in whichorder is frequently an important aspect. The set- or multiset-based frameworks for query optimizationthat are currently being taught to database students are increasingly inadequate.

    This paper presents a foundation for query optimization that extends existing frameworks to alsocapture ordering. A list-based relational algebra is provided along with three progressively stronger

    types of algebraic equivalences, concrete query transformation rules that obey the different equivalences,and a procedure for determining which types of transformation rules are applicable for optimizing aquery. The exposition follows the style chosen by many textbooks, making it relatively easy to teachthis material in continuation of the material covered in the textbooks, and to integrate this material intothe textbooks.

    1 Introduction

    The relational model was originally conceived as a set-based modelrelations were sets of tuples. Over thepast three decades, this property has been proclaimed a strength as well as a shortcoming of the relationalmodel.

    As a reection of this controversy, the user-level relational query language of choice, SQL, has longoffered a peculiar mix of orderedness and unorderedness. To illustrate, an ORDER BY clause permits tosorting of the tuples resulting from query based on any combination of their attributes, using ascending anddescending orderings. However, this clause is far from a rst class citizen in SQL. Rather, this clause is anadd-on that may be used only at the outermost level of a query, for ordering the result. For example, it isimpossible to create a view that includes the ORDER BY clause.

    With the more recent advent of on-line analytical processing, the ordering of query results had gainednew interest and prominence. For example, ordered top- C lists are often of interest in OLAP-style querying.Web search has also led to proposals for exploiting order. Specically, it is desirable to compute the resultsof a search in an ordered, page-by-page fashion. This often affords fast computation of the rst results andoften avoids computation of the entire results. In addition, XML documents are inherently ordered, and OR-DER BY clauses are used prominently when querying for XML documents in relational databases [CS01].

    Developments such as these have led to additions to SQL. For example, Microsofts SQL Server [Mic]offers a TOP D clause that, when specied with an integer argument D in the SELECT clause of a query,limits the number of tuples returned by the query to at most N. When used in conjunction with the ORDERBY clause, the rst N tuples of the result according to the specied order are returned. The Oracle DBMSenables TOP D queries by providing a pseudo-column ROWNUM that assigns rank values to rows accordingto a given ORDER BY clause [OraDev]. IBMs DB2 supports the clauses FETCH FIRST D ROWSONLY and OPTIMIZE FOR D ROWS; the rst returns the rst D rows, while the second asks theoptimizer to deliver the rst D rows faster than the rest [IBM].

    Whether to allow duplicates in query results, or to insist on relations indeed being sets, has generatedmuch discussion. Informed scientists and practitioners have conducted heated debates on this topic in trademagazines (not unlike in nature to the debates on null values!). This debate has been resolved in the sense

    that SQL does allow duplicates, and query optimization frameworks have emerged that consider relationsas multisets and thus afford a systematic treatment of duplicates.

    However, SQL remains mainly a set-oriented (or multiset-oriented!) language, with order being an add-on. This is perhaps the reason why the handling of order in query optimization is also in some sense anadd-on. Query optimization frameworks formalize relations as either sets or multisets, making it difcult tocapture, and formally reason about, order.

    1

  • 8/9/2019 DBTR-1

    4/19

    We believe that, like duplicates, order should be afforded fully integrated treatment in query optimiza-tion. The reasons are several. First, order is inherent to the physical representation of dataorder thusoccurs at the bottom of query plans, which may be exploited to produce better query plans. Second, sys-tematic, unconstrained reasoning about order throughout query plans, e.g., when the queries involve TOP-Nlike clauses, may lead to better plans.

    This paper offers a foundation for relational query optimization that offers comprehensive, sound, andintegrated coverage of duplicates and ordering. The foundation is enabled by a relational algebra on relationsthat are dened as lists and thus can be equivalent as sets, multisets, or lists. These types of equivalencescome into play because queries specify different types of results. For example, an SQL query not includingORDER BY and DISTINCT at the outermost level species a result of type multiset, thus rendering theapplication of transformations that need not preserve list equivalence.

    The paper provides transformation rules that satisfy the different equivalences and go beyond the exist-ing sets of rules known to the authors. In addition, a practical procedure is offered for determining when atype of transformation rule is applicable to a query.

    Some work has been reported on relational algebras for multisets [Alb91, DGK82, GUW00], with themost recent of these, by Garcia-Molina et al., being also the most extensive. This book offers compre-hensive coverage of query transformations that preserve set as well as multiset equivalences. Formalizingrelations as multisets, sorting is permitted only at the outermost level. However, pushing down sorting in

    a query plan can improve performance. Moreover, in some cases, the sorting must be performed early inthe query evaluation. For example, DBMSs such as Microsoft SQL Server allow the ORDER BY clause incombination with the TOP predicate in subqueries, thus requiring intermediate results to be sorted.

    Recent work by Pirahesh et al. [PLH97] emphasizes the importance of considering duplicates in DB2squery rewrite rules. However, duplicates are addressed as special cases when dening rewrite rules, andno formal foundation for reasoning about these is offered. Query optimizers such as Volcano [GMc93]initially generate search spaces of query plans without considering ordering, then take order into accountwhen considering the specic operator algorithms to use when transforming a (logical) query plan into aconcrete plan that may be executed by the query processor.

    Some research has been conducted on algebraic frameworks for queries on lists. Richardson [Ric92]uses an approach based on temporal logic to incorporate lists into an object-oriented data model. Seshadri

    et al. [SLR94, SLR95] propose a sequence data model and optimization techniques for sequence queries;while the model is relationally complete, the focus is on the processing of operators specic to sequence datasuch as time series. Our work aims to simplify and minimize the extensions to the conventional relationalalgebra, as well as permit the treatment of relations as multisets or sets, when order is not important.

    Carey and Kossmann [CK97] discuss how to efciently process TOP D and BOTTOM D queries byextending existing relational query processing architectures, and they propose a number of possible opti-mizations for such queries. These optimizations t into this papers foundation as specic transformationrules.

    Our earlier work [SJS01] presented a foundation for temporal query optimization including conven-tional query optimization that covered duplicates and order, as well as different types of transformationrules. All denitions omitted from this paper are included in that paper, which also covers some additionalrelated work. The present paper considers only conventional query optimization, adds the TOP D opera-tion and consequent transformation rules, and makes the argument that ordered relations should be treatedsystematically in query optimizers and textbooks.

    Section 2 proceeds to dene the extended relational algebra. The different types of algebraic equiv-alences are described in Section 3, and the concrete transformation rules that obey these are provided inSection 4. Section 5 gives a procedure for determining when a transformation rule is applicable, and Sec-tion 6 summarizes the paper. Two appendices provide precise denitions of auxiliary operations used in themain body of the paper.

    2

  • 8/9/2019 DBTR-1

    5/19

    2 An Extended Algebra

    To formally capture duplicates and ordering, the algebra to be dened must be based on relations thatare lists. Because it is also necessary to treat relations as sets or multisets, the semantics of the algebraoperations must follow the conventional relational algebra.

    It is also desirable that the operations be minimal and orthogonaleach operation should perform onesingle function and should minimally affect its argument(s) in doing so. This way, replication of functional-ity is avoided, and it is easier to combine operations in queries. Combinations of operations, termed idioms ,may be included for efciency, but should be identied as idioms.

    We proceed to dene the algebra, then exemplify the algebra and discuss pertinent properties.

    2.1 Database Structures

    We dene relation schemas, tuples, and relation schema instances in turn. The denitions are the standardones, but adapted to address duplicates and order.

    Denition 2.1 A relation schema is a four-tuple EG FI HQ PS R7 TU R VX W) Y` R8 ac b , where P is a nite set of attributes,T is a nite set of domains, VX Wd Yf eg Pi hp T is a function that associates a domain with each attribute, and

    a is a set of sets of attributes from P . q

    Consider relation rt sv u wt xv y in Figure 1. Relation schema r sv u wt x yv consists of the attributes xd 3 andv )

    and is formally a four-tuple HQ PS R7 TU R VX W) Y` R8 a b , where P F x) t g Rv )

    , T F 3 j0 k ,VX Wd Y F v Hl xd 3 Rm v n o 3 j0 k8 b8 R H

    v )

    Rm 3 j0 k8 b , and a F ) xd 3)

    ; a is essentially a set of keys forthe schema.

    Denition 2.2 A tuple over schema E F HQ P R7 TU R VX Wd Y` R8 ac b is a function ev P h $ 7 z | { , such that for everyattribute } of P , 0 H@ }S b ~i VX Wd Y H@ }S b . A relation schema instance over E is a nite sequence of tuples over

    E such that for any tuples R and for any set of attributes } R0 0 0 Rm } in a , 8 H@ }o bo Fi X H@ } b 0 0 )H@ } b6 F H@ } b . q

    r sv uv w x yv

    xd

    v X v

    1 100K

    2 80K

    3 130K

    4 110K

    5 110K

    Figure 1: Relation rt s uv w x y

    Note that the denition of a relation schema instance (relation, for short)

    corresponds to the denition of a list. A relation can thus contain dupli-cate tuples, and the ordering of the tuples is signicant. The rt sv u wt xv yrelation from Figure 1 is then the list Al R R d R ) R 0 B , where F

    v Hl x) t g R

    1b8 R H

    v )

    R 100K b and tuples correspond to the other tu-ples of the gure.

    2.2 Algebra Operations

    We proceed to dene the algebra operations. In the denitions, we use tobe the set of all tuples of any schema and to be the set of all relations, andlet ` ~G ` R F Al R R0 0 0 R B . We use -calculus for the denitions. Thedenitions do not imply actual implementation algorithms. The schema of the result relation is the same as the schema of the argument relation unlessnoted otherwise.

    Selection The selection operation eg S h corresponds to the well-known selection operation inthe relational algebra [GUW00]. The argument predicate from the set of all possible selection predicates

    is expressed as a subscript, i.e., | Hl b .

    3

  • 8/9/2019 DBTR-1

    6/19

    z Rm | Hl F b h R

    H ) Hl b F b h H@ - H t m X V Hl b b h t 7 X V Hl b8 R8 b8 R

    H@ - H t m X V Hl b b h t 7 X V Hl b8 R8 b n H d l Hl b b

    The arguments of an operation are given before the dot, and the denition is given after the dot. In thisdenition, the rst line says that if is empty (we denote an empty relation by ), the operation returns it.Otherwise, the second line is processed, which says that if contains only one tuple (the remaining part of the relation, ) Hl b , is empty), we test the predicate on the rst tuple ( t 7 X V Hl b ). If the predicate holds,the operation returns the tuple; otherwise, it returns an empty relation. If the second-line condition does nothold, the operation returns the rst tuple or an empty relation (depending on the predicate), with the resultof the operation applied to the remaining part of appended ( ).

    Standard auxiliary functions such as t 7 X V , d l , , and tuple concatenation ( )as well as the otherauxiliary functions used beloware dened in Appendices A and B.

    Projection In the projection operation e 0 0 3 h , is a set of arithmetic expressionsz

    e h , which includes any possible attribute names and which return single-attribute tuples. For thert s uv w x y relation, one possible expression

    is v v X

    s

    6

    . Functions

    R0 0 0 R

    are expressed as asubscript, i.e., 3 7 1 1 1 n Hl b .

    z R

    R0 0 0 R

    Hl o F o b hf R

    H m X V Hl X b b 6 0 0 z

    H m X V Hl X b b g 7 1 1 1 H d m Hl b b

    The schema of the result relation follows from the denition of tuple concatenation. The projection opera-tion can be used to add new attributes to the schema. If a new attribute is added, its value is set to y t foreach tuple of the argument relation.

    We also dene a foreign key below (for simplicity, foreign keys are dened at the instance level).

    Denition 2.3 A set of attributes } R0 0 0 Rm } of relation schema instance constitute a foreign key of relation schema instance with respect to a key R0 0 0 Rm of relation schema instance if and onlyif g 1 1 1 Hl b 1 1 1 Hl b . q

    Union-all Operation e h returns the union of two argument relations, retaining duplicates .

    The operation appends the second relation to the rst one.

    i ) 0 R d Hl X F b h R" m X V Hl ) b H d l Hl ) b b

    Cartesian Product Operation eg n h computes the Cartesian product of two argument relationsin nested loop fashion. The denition uses the auxiliary function W) 7 W W8 e h . The schemasresulting from and Wd 8 W W8 follow from the denition of tuple concatenation.

    i R H Hl F b Hl F b b h R W) 7 W W8 $ H t m V Hl b8 R b H d l Hl b b

    Wd 3 7 W W8 c R z Hl F b h R Hl m X V Hl X b b Wd 7 W W8 Hl R d l Hl b b

    Informally, nested-loop join is a nested-loop Cartesian product followed by a selection involving attributesfrom both arguments of the Cartesian product, and, possibly, followed by a projection.

    Difference Operation e h returns all tuples of the rst argument relation that are not in thesecond argument relation.

    i R H Hl F b Hl Fo b b hf R

    m 8 H t 7 X V Hl b8 R b h H d l m Hl b 6 m " Y Wd ) t H t m X V Hl b8 R b b8 R" t 7 X V Hl b H d l m Hl b b

    4

  • 8/9/2019 DBTR-1

    7/19

    Function 8 returns True if the argument tuple exists in the argument relation, and function m " Y Wd ) removes the rst occurrence of the argument tuple from the argument relation.The functions are dened inAppendix A.

    Duplicate Elimination Operation Vd e h removes duplicates from the argument relation. Thisoperation retains the rst occurrence of each tuple and removes all subsequent occurrences, if any.

    Vd i z Hl F b h5 z R

    8 H t 7 X V Hl b8 R d l Hl X b b h V) H t 7 X V Hl b m " Y Wd ) t H t m V Hl b8 R d l Hl b b b8 R

    t m V Hl b Vd H d l m Hl b b

    If the rst tuple of the argument relation can be found in the remaining part of the relation, the operationremoves that found tuple. Otherwise, the operation returns the rst tuple concatenated with the result of theoperation applied to the remaining part of the relation.

    Aggregation Operation eg PG - 0 0 X PG o - 0 hf performs aggregation according to givengrouping attributes and aggregation functions. The set of attributes in the schema of the argument relationsis denoted by P , and the set of all aggregation functions is denoted by ; an aggregate function

    e h

    takes a relation as argument and returns a single-attribute tuple containing the aggregate value. An exampleof an aggregate function is sv t H

    v X v

    b s .The operation returns one tuple for each unique sequence of grouping attributes. The schema of the

    result relation follows from the denition of concatenation. Our denition corresponds to that provided byKlug [Klu82] and Garcia-Molina et al. [GUW00].

    o R R0 0 0 R Rm R0 0 0 Rm Hl F b h5 z R

    H t 7 X V Hl b8 0 0 d m X V Hl X b8

    HQ v m W)

    1 1 1

    Hl z R" t 7 X V Hl X b b b 0 0

    HQ 0 m Wd

    1 1 1

    Hl R" m X V Hl b b b b

    1 1 1

    1 1 1 )

    Hl | v m Wd

    1 1 1

    Hl z R" t m X V Hl b b b

    The denition uses the auxiliary function v m W) eg P U 0 0 d P hf , which returns all tuples

    from the argument relation that have grouping-attribute values equal to those of the argument tuple. If thereare no grouping attributes, the function returns a list with all tuples of the relation.

    0 m Wd z R 8 R R0 0 0 R Hl Fo b h t g V Q R

    Hl Fi m X V Hl X b8 "

    0 0

    8 F t m X V Hl b8 b h

    H t 7 X V Hl b 0 m Wd

    1 1 1

    H d l Hl b8 R m b b8 R

    0 m Wd 3

    1 1 1

    H ) m Hl b8 R b

    Sorting Operation " W) e| f o h sorts its argument relation according to an order , whichbelongs to , the set of all possible orders for relations in . Order is expressed as a subscript, i.e.,

    " W) d Hl b .An order for a relation with schema E F HQ PS R7 TU R V Wd Y R8 a b is a relation with schema HQ P R7 T R VX W) Y R8 a b ,

    where P| | F z S Q R8 , T consists of the set of attributes in E , P , and the set of the two possible orderingsof an attribute, s

    R t x

    , V Wd YU S F v H Q 8 R7 P b8 R H R" s

    R x

    b , and a S F ) z S Q)

    . For example,A Hl x) t Rm s

    b8 R H

    v v X

    R t x

    b B is an example of an order for relation rt s uv w x y .To dene the sorting operation, we rst dene auxiliary function l 0 0 g - e ` 3 h ,

    which inserts a tuple into a sorted argument relation, maintaining its order. We denote the argument orderby .

    5

  • 8/9/2019 DBTR-1

    8/19

    l 0 0 g R z Rm v Hl Fo b h Al m B8 R

    Y m m V Hl 8 R" t 7 X V Hl b8 Rm X b h 5 z R

    t m X V Hl b 0 " " t Hl R d l Hl b8 Rm X b

    Function Y | 8 m Vv e G $ $ h Boolean returns True if the rst argument tuple precedes thesecond argument tuple according to the argument order.

    Y | 8 m Vv R Rm v H@ o Fo b h True RH m X V H@ ) b8 & Q 7 b H t 7 X V H@ ) b8 & F s

    b h R

    H m X V H@ ) b8 & Q 7 b

    Y | 8 m Vv t Hl R R d l H@ X b b

    Operation 0 Wd invokes l 0 0 g for each of its tuples.

    0 Wd i z Rm Hl F b h R0 l " " g t H t m V Hl b8 R" " Wd " H d l Hl X b b8 Rm X b

    Top Operation W8 eg 3 hf returns the rst C tuples of its argument relation , where C belongs tothe set of natural numbers, . Operation invocation uses the notation W8 Hl b .

    W8 ` R C Hl Fo - C` F! b h R" t m X V Hl b W8

    "

    H d l Hl X b b

    2.3 Example Query

    Having dened these operations, we exemplify their use in query plans, as well as indicate what kinds of transformations may be applied during optimization.

    Let us consider two relations, r sv uv w x yv (recall Figure 1) and xv w r $ #) u xv x (see Figure 2), and a query that

    x wt r % #X uv x x

    Result

    xd y ' & xd 3 y ' &

    v v X

    1 John 3 Peter 130K2 Tom 4 Anna 110K

    3 Peter 5 Suzanne 110K

    4 Anna 1 John 100K

    5 Suzanne

    Figure 2: Relation xv w r % #X u xv x and the Result Relation

    asks for a list of all employees (their IDs andnames) with salaries that are among the topthree highest salaries in the company, requir-ing the list to be sorted on the

    v X v

    at-

    tribute in descending order. Note that the re-sult (given in Figure 2) contains more thanthree tuples because several employees re-ceive the same salary.

    Figure 3(a) shows one possible initialquery plan. First, the r sv uv w x yv relation isprojected on its

    v )

    attribute, then dupli-cates are removed, and the top three salaries

    are selected. The Cartesian product and the subsequent selection then nd the IDs of all employees thatreceive a top three salary, and another Cartesian product with a selection is performed on the result andthe x wt rv $ #X uv x x relation in order to obtain the employees names. Finally, the result is projected on required

    attributes (for brevity, we do not specify from which relation the common attributes come) and sorted onthev X v

    attribute.Transformation rules that preserve different types of equivalences are applicable to different parts of a

    query. This is illustrated by the regions in Figure 3(a). Transformations below the top " W) operation andabove the W8 operation need not preserve order (indicated by the lighter shading). The top " W) operationensures that the result is correctly ordered. Transformations performed below the Vd operation need notpreserve duplicates, which is indicated by the darker shading.

    6

  • 8/9/2019 DBTR-1

    9/19

    top3

    PAYMENT

    Order needs not be preserved Duplicates are not relevant

    EMPLOYEE

    Salary DESC

    EmpID, Name, Salary

    sort

    1.EmpID = 2.EmpID

    (a)

    1.Salary = 2.Salary

    3PAYMENT top

    rdup

    sort Salary DESC

    Salary

    PAYMENT

    EMPLOYEE

    1.EmpID = 2.EmpID

    EmpID, Name, Salary

    sort Salary DESC

    rdup

    Salary

    PAYMENT

    (b)

    1.Salary = 2.Salary

    Figure 3: Initial Query Plan (a) and Resulting Query Plan (b)

    By systematically exploiting transformation rules preserving different types of equivalences, we are ableto achieve an optimized query tree such as the one shown in Figure 3(b). In this tree, the orders of theCartesian products have been switched, so that the left-most relation is the r sv u wt x yv relation projected onthe top three salaries. Since the Cartesian product is dened in nested-loop fashion, the order of its leftargument is preserved, and, consequently, the top 0 Wd operation is no longer necessary.

    Note that the Vd , " Wd " , and W8 operations do not have to be separate operations. Since they could beefciently implemented using a priority heap in main memory, an idiom involving the three operations maybe dened and used in query-plan generation.

    2.4 Operation Properties

    Section 2.2 dened only fundamental operations. The addition of derived operations (idioms), e.g., join(Cartesian product followed by selection and projection) and regular SQL union (union-all followed byduplicate elimination), would not introduce any new issues in the framework. However, idioms should beincluded in an implementation of the algebra.

    The algebra differs fundamentally from the algebra presented in [GUW00], in that this latter algebra

    7

  • 8/9/2019 DBTR-1

    10/19

    works on multisets, not lists. However, all our operations except W8 are list-insensitive, i.e., if their ar-gument relations are identical as multisets (but different as lists), their result relations are also identical asmultisets. When we treat relations as multisets, our algebra is at least as expressive as the one presentedin [GUW00] because each operation dened there may be expressed by combinations of the rst sevenoperations dened in Section 2.2.

    Most operationssuch as selection, Cartesian product, difference, duplicate elimination, and W8 retain the order of their (left) argument. Since the operation denitions constrain the orders of their results,an operation from the conventional relational algebra with several implementation algorithms may result inseveral operations being added to our algebra. For example, separate denitions are needed for nested-loop join and sort-merge join, since both return differently ordered results.

    The projection result is ordered on the largest prex of its argument order that contains the projectedattributes. For example, if we project relation , which is sorted on A H@ sg Rm s

    b8 R H) (g Rm s

    b8 R H

    R x

    b B , on s and

    , the result would be sorted on s . Similarly, the result of aggregation is ordered by the largest prex of its argument order that contains the grouping attributes. The result of sorting is the order specied by thesorting parameter if the latter is not a prex of the arguments order, and the arguments order otherwise.The result of union-all is unordered,

    An operation may (1) eliminate duplicates so that the result would only have distinct tuples, (2) retainduplicates, i.e., the result would have distinct tuples only if the argument relation(s) contains only distinct

    tuples, or (3) may generate duplicates in the result even if duplicates do not exist in the argument relation(s).Duplicate elimination and aggregation eliminate duplicates; and selection, Cartesian product, difference,sorting, and W8 retain duplicates. Projection generates duplicates only if the projection attributes do notcontain a key of the argument relation, and union-all always generates duplicates.

    3 Relation Equivalences

    The query optimizer does not always need to consider relations as lists. For example, if ORDER BY isnot specied in a query, it is enough to consider relations as multisets. To enable this type of treatmentof relations, three types of equivalences between relations are introduced: list equivalence ( 02 1 ), multisetequivalence ( 04 3 ), and set equivalence ( 0 5 ). Two relations are list equivalent if they are identical; multisetequivalent if they are identical as multisets, taking into account duplicates, but not order; and set equivalentif they are identical as sets, ignoring duplicates and order.

    Denition 3.1 Let functions 0 1 , 06 3 , and 07 5 be given, all with signature h Boolean. Rela-tions and are list equivalent ( 06 1 ), multiset equivalent ( 06 3 ), and set equivalent ( 06 5$ )if and only if function 04 1 , 06 3 , and 07 5 return True, respectively.

    06 1 R Hl F

    F b h True RHl F 85 Fo b h False R

    H t 7 X V Hl b F t m V Hl b b h ) Hl b9 06 1 d Hl b8 R

    False

    06 3 R Hl F

    F b h True RHl F @ 85 F o b h False R

    m 8 H m X V Hl b8 R b h ) Hl b9 06 3 0 Y Wd d t H t 7 X V Hl b8 R b8 R

    False

    8

  • 8/9/2019 DBTR-1

    11/19

    05

    i ) 0 R d Hl X F Fo b h True RHl F 85 F o b h False R

    m 8 H m X V Hl b8 R b h Yo 6 l H t 7 X V Hl b8 R bA 06 56 Y S H t m X V Hl b8 R b8 R

    False q

    Auxiliary function 8 Yo 6 l eg o hf removes all occurrences of the argument tuple from the argumentrelation and returns the resulting relation.

    Yo 6 8 R z Hl F b h R

    Hl F m X V Hl X b b h 8 Yo 6 l Hl R d l m Hl X b b8 R

    t m V Hl b Y S Hl 8 R d l m Hl b b

    We can exemplify different types of equivalences using different variations of the rt s uv w x y relation(Figures 1 and 4). Relations r sv uv w x yv and rt s uv wt xv y t are not equivalent as lists because the tuple orderingis different, but they are equivalent as multisets and sets. Relations rt s uv wt xv y and r sv u wt x yv are equivalentonly as sets, because the tuple for employee ID 3 is repeated twice in rt s uv w x y g .

    BD CF EF GF HF ID PR Q BF CF EF GD HF IF PR S

    HU T V% WY X `F ac bD ae dF f H T V% Wg X `F aF bD aR dF f

    2 80K 1 100K1 100K 2 80K3 130K 3 130K4 110K 3 130K5 110K 4 110K

    5 110K

    Figure 4: Variations of the r sv u wt x yv Relation

    The examples illustrate that we have an ordering be-tween the types of equivalences. Two relations being equiv-

    alent as multisets implies that they are also equivalent assets, and two relations being equivalent as lists implies thatthey are equivalent as both multisets and sets.

    The different types of equivalences can be exploited inheuristics-based query optimization. Transformation rules(to be discussed in detail shortly) can be divided into threecategories, one for each type of equivalence. For example,we may have a rule h hi 1 h , which says that afterthe replacement of expression h in the original queryplan by expression h , the result relation produced bythe new plan will be list equivalent to the result relation

    produced by the original plan, when evaluated on the same argument relation(s). That said, the resultrelations will also be multiset and set equivalent.

    Another rule h hi 3 h" g says that if we replace h" g by h g , the new plan will yield a resultrelation that may only be multiset equivalent to the result relation produced by the original plan, because theapplication of this rule does not preserve the order. This may be acceptable though, if the result needs to bea multiset. For example, query q ps r tu ru vs w Hl r sv u wt x yv b can return tuples in any order. In general, the type of theresult specied by a query determines which transformation rules can be exploited. The next two sectionslist transformation rules and describe when they are applicable.

    4 Transformation Rules

    In this section, we provide an extensive set of transformation rules for the algebra. First, we provide rulesthat derive from the conventional relational algebra. Then we discuss rules involving the duplicate elimina-tion, sorting, and W8 operations.

    The rules are given as equivalences that express that two algebraic expressions are equivalent accord-ing to one of the three equivalence types from Section 3; we always give the strongest equivalence typethat holds. An algebraic equivalence represents both a left-to-right and a right-to-left transformation rule.If necessary, we mark pre-conditions that apply only for the left-to-right transformation by

    X

    and pre-conditions that apply only for the right-to-left transformation by . Pre-conditions with no such marks

    9

  • 8/9/2019 DBTR-1

    12/19

    apply to both directions. All rules can be veried formally, as the operations and equivalence types haveformal denitions. We believe the transformations are correct; reference [SJS99] provides an example proof of one transformation rule.

    In transformation rules, can be a base relation or an operation tree. We denote the attribute domain of the schema of relation by Py x . Function ) Q @ returns the set of attributes present in a selection predicate,projection functions, or a sorting list.

    4.1 Conventional Rules

    The conventional transformation rules derive from the rules for multisets given by [GUW00]; we list themin Figure 5. The rules are ordered based on the operation they concern, e.g., rules C1C4 concern selection.

    (C1) n z Hl bA 06 1 n H@ n d Hl b b(C2) n z Hl bA 06 5 g Hl b g d Hl X b(C3) H@ Hl b b9 0 1 H@ Hl b b(C4) ' | Hl X bA 07 1 n Hl b

    (C5) g 7 1 1 1 H@ d 1 1 1 6 Hl X b b 06 1| g 7 1 1 1 Hl bX

    g ) Q @ H

    0 R0 0 0 R

    t b P x

    g ) Q @ H R0 0 0 R b P x

    (C6) g 7 1 1 1 n H@ n | Hl b b9 06 1| g | H@ g 7 1 1 1 n Hl b bX

    g ) Q @ H@ b i ) Q @ H

    R0 0 0 R

    b

    (C7) g 7 1 1 1 H@ n | Hl b b9 06 1| 7 1 1 1 H@ g H@ ' d 1 1 1 Hl b b b ,where

    Fi e g $ ~ d R0 0 0 Rs H

    ~

    R0 0 0 R

    ~G ) Q @ H@ b b g ) Q @ H@ b P x

    (C8) 07 3(C9) n | Hl b9 06 1| g Hl b|

    X

    g ) Q @ H@ b| P x

    (C10) g | Hl b9 06 1 g | Hl bX

    g ) Q @ H@ b| P x

    (C11) 1 1 1 n Hl ) b9 0 1 Hl X b q Hl b , where} F

    g ~ d R0 0 0 R C

    ) Q @ H

    b P x ,} F

    g ~ d R0 0 0 R C ) Q @ H

    b P x

    X

    F ' ~ d R0 0 0 R C ) Q @ H

    b| P x

    ) Q @ H

    b P xu

    g ) Q @ H@ } bA X Q @ H@ } b F

    (C12) 1 1 1 n Hl ) b9 0 1 g 1 1 1 g H@ Hl X 7 b| q Hl b b ,where } F e Y - ~` P x ~G ) Q @ H

    R0 0 0 R

    b ,}6 F e Y - ~` P

    x U ~ ) Q @ H

    R0 0 0 R

    t b g ) Q @ H

    0 R0 0 0 R

    b Px

    xu

    (C13) Hl b| 06 1| Hl b

    (C14) g | Hl $ bA 06 1 g Hl b $(C15) g | Hl $ bA 06 1 g Hl b n | Hl b(C16) 07 3 (C17) g | Hl bA 06 1 g Hl b n | Hl b(C18) 7 1 1 1 Hl b9 07 1| g 7 1 1 1 Hl b - g 7 1 1 1 Hl b

    (C19) g | Hl 1 1 1 1 1 1 ) Hl b b9 07 1 1 1 1 1 1 1 ) H@ g Hl b b X Q @ H@ b 0 R0 0 0 R (C20) 1 1 1 ) 1 1 1 Hl b9 06 1| 1 1 1 ) 1 1 1 H@ ' j Hl b b ) Q @ H R0 0 0 R Rm R0 0 0 Rm b l k

    Figure 5: Conventional Rules

    Most rules satisfy the list equivalence, but the commutativity rules, e.g., for Cartesian product andunion-all, satisfy only the 04 3 equivalence because the result relations produced by the left- and right-

    10

  • 8/9/2019 DBTR-1

    13/19

    side expressions have differently ordered tuples (see rules C8 and C16). Finally, rule C2 only satises06 5 equivalence because if both predicates and are satised for a tuple of , the right-hand side of the

    transformation would return two instances of the same tuple.

    4.2 Duplicate Elimination Rules

    Figure 6 lists duplicate elimination rules. Rules D1D2 indicate when duplicate elimination is not necessary.

    Rule D6 follows because aggregations involving only functions w3 y and wt s

    are insensitive to duplicates.

    (D1) Vd Hl bA 0 1 does not have duplicates(D2) Vd Hl bA 06 5(D3) Vd H@ g Hl b bA 06 1 n H V) $ Hl b b(D4) Vd H@ m 1 1 1 H Vd Hl b b b 07 1S V) H@ g 7 1 1 1 Hl b b(D5) Vd Hl b9 06 1S Vd Hl b V) Hl b(D6) 1 1 1 1 1 1 ) H Vd Hl b b9 0 1 1 1 1 1 1 1 ) Hl b X s m v H@ R0 0 0 Rm b n 0 w z yg R w s

    Figure 6: Duplicate Elimination Rules

    Duplicate elimination cannot be pushed before union-all because the latter may generate duplicates evenif its arguments do not contain any. Also, duplicate elimination cannot be pushed down before difference,because difference is sensitive to the number of duplicates in both arguments. If tuple occurs o times inthe rst argument and times in the second argument ( i o ), it occurs o times in the result. However,if we were to remove duplicates rst, tuple would occur only once in each argument to the difference, andit would be absent from the result.

    If duplication elimination is applied after an operation that does not manufacture duplicates, we can re-move the duplicate elimination using rule D1. Thus, duplicate elimination can be removed if it is performedon top of duplicate elimination or aggregation.

    4.3 Sorting Rules

    Sorting can be eliminated if performed on a relation that already satises the sorting, if we can treat therelation as multiset, or if there is a subsequent sorting operation. Predicate s | h$ 3 , dened formallyin Appendix A, takes two lists as argument and returns True is the rst if a prex of the second. Predicate

    Wd V " n Hl z Rm X b , also dened formally in Appendix A, takes a relation and an order and returns true if relationhas order . The sorting rules are given in Figure 7.

    If we wish to sort the result of some operation, the sorting can be performed on the argument relation(s)for that operation if the operation preserves the ordering. All operations except fully or partially preservethe ordering of their rst argument.

    4.4 TOP Rules

    Rules for the W operation are given in Figure 8. Several rules have applicability conditions involving thecardinality of the argument relations. These rules can only be applied if the exact cardinality is known, i.e.,if the cardinality is only estimated, these rules are not applicable.

    11

  • 8/9/2019 DBTR-1

    14/19

    (S1) " Wd " $ Hl b9 07 1 s m z { h$ 3 H@ }o Rm X b W) V 0 n Hl Rm X b(S2) " Wd " Hl b9 0 3(S3) " Wd " $ H 0 Wd " m Hl b b9 06 16 0 Wd Hl b s m z { h$ 3 H@ - Rm } b(S4) " Wd " $ H@ g | Hl X b b 06 1| n | H 0 Wd Hl b b(S5) " Wd " $ H@ 7 1 1 1 Hl b bA 06 1 g 7 1 1 1 H 0 Wd " m $ Hl b b

    X

    X Q @ H@ } b P| x

    X Q @ H@ } b i X Q @ H

    R0 0 0 R

    b

    (S6) " Wd " Hl X bA 0 1 0 Wd " Hl ) b X

    X Q @ H@ } b Px

    (S7) " Wd " $ Hl b9 06 16 0 Wd " $ Hl b (S8) " Wd " $ Hl 1 1 1 ) 1 1 1 Hl b b9 06 1 1 1 1 ) 1 1 1 H " W) Hl X b b ) Q @ H@ } b| 0 R0 0 0 R (S9) " Wd " $ H Vd $ Hl b b9 06 1S Vd H 0 Wd " $ Hl b b

    Figure 7: Sorting Rules

    (T1) W8 Hl b9 06 1 C Hl b} C(T2) W8 H@ g 7 1 1 1 g Hl b bA 06 1 7 1 1 1 n H W8 Hl b b(T3) W8 Hl bA 06 1 W8 H W8 Hl b b(T4) W8 Hl bA 06 1 W8 Hl G W8 Hl b b(T5) W8 H@ n | Hl b b9 06 1| g | H W8 Hl b b H H@ } F 0 0 } F b| ~ b

    } R0 0 0 Rm } is a foreign key of

    R0 0 0 Rm 6 is a key of

    (T6) W8 Hl bA 06 1 W8 Hl b C Hl b ~ C(T7) W8 Hl bA 06 1 ` W

    D

    Hl b C Hl b9 5 CA F C

    Figure 8: TOP D Rules

    5 Applicability of Transformation Rules

    Queries expressed in SQL are mapped to an initial algebraic expression, to which the optimizer then applies

    transformation rules according to some strategy. The resulting, new algebraic expressions must, when evalu-ated, return relations that are equivalent to the relation returned by the original expression, which we assumecorrectly computes the users query. The type of equivalence required between result relations depends onthe actual query statement; we name the required equivalence between results the outer equivalence andassign it to the root of the query tree.

    For SQL queries, the outer equivalence is 0 3 or 06 1c 1 , depending on whether the query given in-cludes ORDER BY A . The presence of ORDER BY species a list; otherwise, the query species a mul-tiset, rendering order of the result tuples immaterial. Intuitively, we can apply transformation rules to aquery evaluation plan if the result relations produced by the new plan and the original plan are equivalent asmultisets or lists, depending on whether or not ORDER BY was specied.

    Having the outer equivalence, we can derive the required equivalence for each operation in the query

    tree. Due to the different characteristics of operations, an operation somewhere in the query tree mayrequire an equivalence that is not the same as the outer equivalence. For example, in the query tree shownin Figure 3(a), the outer equivalence is 0 1 , but the operations between the top 0 Wd operation and the W8 operation do not need to preserve order; hence, 0 3 rules are applicable.

    The required equivalences constrain the types of transformation rules that can be applied during query1 Two relations are R equivalent if they are equivalent and their projections on attribute list are equivalent. The

    R equivalence is slightly less restrictive than ; the equivalence implies the R equivalence.

    12

  • 8/9/2019 DBTR-1

    15/19

    plan enumeration. There are no restrictions on rules of type 0 1 these can always be applied safelybecause a transformed expression evaluates to a result identical as a list to that obtained from evaluating theoriginal expression.

    To enable the formal procedure of determining when a transformation rule is applicable to a query plan,we introduce properties for the operations in an operation tree.

    5.1 Denitions of Properties

    Table 1 introduces two Boolean properties of operations of a query tree. For each combination of the prop-erty values, Table 2 gives an equivalence type that should hold for results of that operation. A transformationrule of some type can be applied at some location in a query tree if the result produced by its right-hand sideis equivalent to the result produced by its left-hand side according to the required equivalence type, as spec-ied by the properties for the top-most operation at that location. For example, rule G8 guarantees only the

    06 3 equivalence between its right- and left-hand side, but it can be applied to the query plan in Figure 3(a)to both Cartesian products because the required equivalence at each location is 0 3 (the W) V 0 s propertyvalue is False).

    Property Name DescriptionWd V "

    True the result of the operation must preserve some orderingVd % 0 0 z d g True the operation cannot arbitrarily add or remove duplicates

    Table 1: Properties of an Operation in an Operation Tree

    Wd V 0 s g H@ 3 b V) $ " " d d n H@ 3 b Type

    True True or False 07 1cFalse True 04 3False False 07 5

    Table 2: Combinations of Property Values and Corresponding Equivalence Types

    During query optimization, the properties are rst set for the initial query evaluation plan. For the root,the W) V 0 s property is set to True only if the ORDER BY clause is specied at the outer-most level of the user query, and the Vd % " " z ) n property is always set to True. Then, the two properties are propagateddown the tree from the root.

    Table 3 denes the Vd % " " d d n property values for a non-root operation . This property dependsalmost entirely on the parent of the operation, denoted W8 , and it is independent of the specic W8 . Forbinary operations, keywords @ " and ) v denote the location of relative to its parent. If this propertyholds at the parent, it also holds at a child, except: (1) when the parent operation is difference, the operationin question is located at the right child, and the relation produced by the left child does not contain dupli-cates; (2) when the parent operation is duplicate elimination, because then the child operation may deal withduplicates in any way, since they will later be removed; and (3) when the parent operation is aggregationwith only the duplicate-insensitive aggregation functions ( w3 y and wt s

    ).To set the property for the right child of difference, an auxiliary property Y F e d d o g is used, which

    tells if the relation produced by the child operation may contain duplicates. This property is propagatedbottom-up from the base relations using the duplicate-preservation properties of operations as described inSection 2.4.

    13

  • 8/9/2019 DBTR-1

    16/19

    W8 Vd % " " d d n H@ 3 b

    n , g 7 1 1 1 , ( @ " and 8 d ), Vd % 0 0 z d n H W b( @ " and 8 d ), 0 Wd " , W8

    ( @ " ) True ( ) v ) Y F e d d o g v H@ z 7 H W8 b b

    V) False

    1 1 1

    d 1 1 1 False ) X s m H@ R0 0 0 Rm S b n 0 w3 z yg R w s

    True $ D e e

    Table 3: The Vd % 0 0 z d g Property

    The next case to consider is when the property does not hold at the parent. Then, the property holds ata child in the following situations: (1) when the parent operation is difference, the operation in question islocated at the left child, or it is located at the right child, and the relation produced at the left child does notcontain duplicates; and (2) when the parent operation is aggregation with at least one duplicate-sensitivefunction ( sv , w , or

    #z yv ).Table 4 describes the propagation of the orderReq property. This property also depends almost entirely

    on the parent of the operation. Most often, the Wd Vv " property holds for an operation at a child nodewhen it holds for the operation at the parent node and the parent node operation preserves the order of itsargument. For example, if order is required for a select operation ( ), then order will be required of theimmediate child of that operation. However, if the parent operation is " Wd " , the property does not hold forits immediate child because the order of the argument is immaterial. In contrast, if the parent operation is

    W8 , the property holds for its immediate child because the order of the argument is important.

    W8 Wd Vv " g H@ b

    n , g 7 1 1 1 , ( @ " ), ( Q ), Vd , 1 1 1 1 1 1 ) Wd Vv " g H W8 b ( @ " and 8 d v ), ( 8 d v ), ( ) v ), 0 Wd " False

    W8 True

    Table 4: The Wd V 0 s Property

    When a transformation rule is applied during query optimization, the properties must be adjusted. Thetop-down nature of property denitions ensures that adjustments for most of the rules are local, i.e., it is notnecessary to scan the whole operation tree [SJS01].

    6 Summary

    With the advent of on-line analytical processing and the use of database technology in Internet search,the ordering of query results has gained new interest and prominence. Thus, TOP-N like queries havereceived increased attention in the user community, and major DBMS vendors have included support forsuch queries into their products over the past few years. However, order is far from a rst-class citizen inquery optimization, where relations are often viewed as sets or multisets. In contrast, we believe that, likeduplicates, order should be afforded fully integrated treatment in query optimization.

    This paper presents a foundation for relational query optimization that offers comprehensive and pre-cise handling of duplicates and order. This is enabled by a list-based algebra where relations thus can beequivalent as sets, multisets, or lists. This leads to three types of transformation rules that can be exploited

    14

  • 8/9/2019 DBTR-1

    17/19

    during query optimization, depending on whether the ORDER BY or DISTINCT clauses are specied inan SQL query. In addition, a procedure is offered for determining when a rule of some type is applicableto a query tree. This foundation proposes to handle the sorting and W8 operations as all the other algebraoperations during the search-space generation.

    While the foundation proposed here may readily be integrated into database textbooks so that studentsget exposed to the issues related to duplicates and order, much research and engineering remains to be doneto reect the foundation in an efcient query optimizer.

    References

    [Alb91] J. Albert. Algebraic Properties of Bag Data Types. In Proc. VLDB, pp. 211219 (1991).

    [CK97] M. J. Carey and D. Kossmann. Processing Top N and Bottom N Queries. Data Engineering Bulletin , 20(3):1219 (1997).

    [CS01] S. Chaudhuri and K. Shim. Storage and Retrieval of XML Data using Relational Databases.Tutorial presented at VLDB 2001.

    [DGK82] U. Dayal, N. Goodman, and R. H. Katz. An Extended Relational Algebra with Control overDuplicate Elimination. In Proc. PODS , pp. 117123 (1982).

    [GMc93] G. Graefe and W. J. McKenna. The Volcano Optimizer Generator: Extensibility and EfcientSearch. In Proc. IEEE ICDE , pp. 209218 (1993).

    [GUW00] H. Garcia-Molina, J. D. Ullman, and J. Widom. Database System Implementation . Prentice Hall(2000).

    [IBM] DB2 Universal Database and DB2 Connect for Windows, OS/2 and Unix. AdministrationGuide. www-4.ibm.com/cgi-bin/db2www/data/db2/udb/winos2unix/support/document.d2w/report?fn=db2v7d0frm3toc.htm , current as of August 2, 2001.

    [Kie85] W. Kiessling. On Semantic Reefs and Efcient Processing of Correlation Queries with Aggre-gates. In Proc. VLDB, pp. 241249 (1985).

    [Klu82] A. Klug. Equivalence of Relational Algebra and Relational Calculus Query Languages HavingAggregate Functions. JACM , 29(3): 699717 (1982).

    [PLH97] H. Pirahesh, T. Y. C. Leung, and W. Hasan. A Rule Engine for Query Transformation in Starburstand IBM DB2 C/S DBMS. In Proc. IEEE ICDE, pp. 391400 (1997).

    [Mic] Microsoft SQL Server Product Documentation. www.microsoft.com/sql/techinfo/productdoc/2000/ , current as of July 27, 2001.

    [OraDev] Oracle8i Application Developers Guide - Fundamentals. technet.oracle.com/doc/server.815/a68003/toc.htm , current as of July 27, 2001.

    [Ric92] J. Richardson. Supporting Lists in a Data Model (A Timely Approach). In Proc. VLDB,pp. 127138 (1992).

    [SJS99] G. Slivinskas, C. S. Jensen, and R. T. Snodgrass. Query Plans for Conventional andTemporal Queries Involving Duplicates and Ordering. T IME C ENTER TR-49 (1999).

    www.cs.auc.dk/TimeCenter , current as of July 27, 2001.

    15

  • 8/9/2019 DBTR-1

    18/19

    [SJS01] G. Slivinskas, C. S. Jensen, and R. T. Snodgrass. A Foundation for Conventional and TemporalQuery Optimization Addressing Duplicates and Ordering. IEEE TKDE , 13(1):2149 (2001).

    [SLR94] P. Seshadri, M. Livny, and R. Ramakrishnan. Sequence Query Processing. In Proc. ACM SIGMOD, pp. 430441 (1994).

    [SLR95] P. Seshadri, M. Livny, and R. Ramakrishnan. SEQ: A Model for Sequence Databases. In Proc. IEEE ICDE, pp. 232239 (1994).

    A Auxiliary Operations on Relations

    Head Function m X V eX h returns the rst tuple of the argument relation.

    t 7 X V- i z Hl F b h t g Vv @ R

    Tail Function d e) h returns the argument relation without its rst tuple.

    d l z Hl F b h V @ R Al d R0 0 0 R B

    According to the denition,d l

    applied to a relation with one tuple returns an empty relation.

    Append Function en hf prepends the argument tuple to the argument relation.

    i R Hl Fo b h Al m B8 R Al 8 R R0 0 0 R B

    isIn Function m 8 ` eg h Boolean returns True if the argument tuple exists in the argument relationand False, otherwise.

    8 8 R z Hl F b h False RHl F m X V Hl X b b h True R

    m 8 Hl R d l Hl b b

    Remove Function 0 Y W) d U e h removes the rst occurrence of the argument tuple from theargument relation. The schema of the argument relation is retained for the result relation.

    m " Y Wd ) R z Hl F b h R

    Hl F t 7 X V Hl b b h ) Hl b8 R

    m X V Hl X b 0 Y W) d t Hl R d m Hl b b

    isPrexOf Function s | h$ 3 e f h Boolean returns True if one relation is a prex of otherrelation.

    s | h$ 3 R Hl F b h True RH t m X V Hl b F m X V Hl b b s m z { h$ 3 H d Hl b8 R d l m Hl b b

    order Function Wd V " e 5 6 h Boolean returns True if the argument relation has the argumentorder.

    Wd V 0 z Rm H t m V H d l m Hl b b F b h True RY | 8 7 V t H t m V Hl b8 R" t m V H d l Hl X b b8 Rm X b

    Wd V " H d l m Hl X b8 Rm X b

    16

  • 8/9/2019 DBTR-1

    19/19

    B Auxiliary Operations on Tuples

    Concatenation Function e o h concatenates two tuples. Let two tuples be and and theircorresponding schemas be E F HQ P R7 T R D b and E F HQ P R7 T R v D b . We dene the result tuple s xand its schema Eq x F HQ P xz R7 T xz R D x b as follows. An attribute name of the schema of the result tuple isprexed by 1 and 2 only if the attribute appears in the schemas of both argument tuples.

    x| v Hz z Rs ) z b t H Hz v z Rs q b ~

    ~ P b H Hz z Rs ) z b| ~

    ~ P b

    v Hu d - Rs ) z b H Hz z Rs ) z b| ~ 8

    ~` P| b

    v H - Rs ) z b H Hz z Rs ) z b| ~ ~` P b

    P x e Hz U ~ P

    ~ P b Hz U ~ P

    ~ P b

    d - Hz ~ P ~ P b

    R - Hz ~ P|

    ~ P6 8 b

    T x T T

    v R v Hz z R e z b t H Hz v z R R b ~ D

    v ~ P b H Hz v z R R b ~ D

    v ~ P b

    v Hu d - R e q b Hz z R e z b ~ v R

    v ~ P b

    v H - R e q b Hz z R e z b ~ v R v ~ P6 8 b

    For example, the concatenation of tuples F v H@ D ) Rm z b8 R H E z v F g R F b8 R| H d R U b8 R H R F band F v H@ D ) Rm z b8 R Hz u d 0 C R8 E R z b leads to tuple x F v Hu d D ) Rm z b8 R H E ) v F g R F b8 R

    H d R U b8 Rn H R F b8 Rn H D ) Rm z b8 R Hz s ' d 0 C R8 E z R z b .

    17