database management sys1

8/14/2019 Database Management Sys1

1/12

Access Path Selectionin a Relational Database Management SystemP. Griffiths SelingerM. M. AstrahanD. D. Chamberlin

, It. A. Lorie.: T. G. Price4:IBM Research Division, San Jose, California 95193

ABSTRACT: In a high level query and datamanipulation language such as SQL, requestsare stated non-procedurally, withoutreference to access paths. This paperdescribes how System R chooses access pathsfor both simple (single relation) andcomplex queries (such as joins), given auser specification of desired data as aboolean expression of predicates. System Ris an experimental database managementsystem developed to carry out research onthe relational model of data. System R wasdesigned and built by members of the IBMSan Jose Research'Laboratory.1. Introduction

System' R is an experimental databasemanagement system based on the relationalmodel of data which has been under develop-ment at the IBM San Jose Research Laborato-ry since 1975 Cl>. The software was. developed as a research vehicle in rela-tional database, and is not generallyavailable outside the IBM Research Divi-sion.This paper assumes familiarity withrelational data model terminology asdescribed in Codd and Date . Theuser interface in System R is the unifiedquery, data definition, and manipulationlanguage SQL . Statements in SQL can beissued both from an on-line casual-user-or-iented terminal interface and from program-ming languages such as PL/I and COBOL.In System R a user need not know howthe tuples are physically stored and whataccess paths are available (e.g. whichcolumns have indexes). SQL statements do

not require the user to specify anythingabout the access path to be used for tuplePermission to copy without fee all or part o f thismaterial is granted provided that the copies arenot made or distributed for direct couunercial ad-vantage, the ACMcopyright notice and the title ofthe publication and its date appear, and notice isgiven that copying is by permission of the Associa-tion for Computing Machinery. To copy otherwise,or to republish, requires a fee end/or specificpermission.01979 ACM 0-89791-001-X/79/0500-0023 $00.75

retrieval. Nor does a user specify in whatorder joins are to be performed. TheSystem R optimizer .chooses both join orderand an access path for each table in theSQL statement. Of the many possiblechoices, the optimizer chooses the onewhich minimizes "total access cost" forperforming the entire statement.This paper will address the issues ofaccess path selection for queries.Retrieval for data manipulation (UPDATE,DELETE) is treated similarly. Section 2will describe the place of the optimizer inthe processing of a SQL statement, andsection 3 will describe the storage compo-nent access paths that are available on asingle physically stored table. In section4 the optimizer cost formulas are intro-duced for single table queries, and section5 discusses the joining of two or moretables, and their corresponding costs.Nested queries (queries in predicates) arecovered in section 6.

2. processi.Bg & B.B u statementA SQL statement is sub jected to fourphases of processing. Depending on 'theorigin and contents of the statement., thesephases may be separated by arbitraryintervals. of time. In System RI thesearbitrary time intervals are transparent tothe system components which process a SQLstatement. These mechanisms and a descrip-tion of the processing of SQL statementsfrom both programs and terminals arefurther discussed in . Only an overviewof those processing steps that are relevantto access path selection will be discussedhere.

The four phases of statement processingare-parsing, optimization. code generation.and execution. Each SQL statement is sentto , the parser. where it is checked forcorrect syntax. A guery block is repre-sented by a SELECT list, a FROM list, and aWHERE tree, containing, respectively thelist of .items to be retrieved, the table(s)referenced, and the boolean combination ofsimple predicates specified by the user. Asingle SQL statement may have many queryblocks because a predicate may have one

23


2/12

operand which is itself a query.If the parser returns without anyerrors detected, the OPTIMIZER component iscalled. The OPTIMIZER accumulates thenames of tables and columns referenced inthe query and looks them up in the System Rcatalogs to verify their existence and toretrieve information about them.The catalog lookup portion of the

OPTIMIZER also obtains statistics about thereferenced relations, and the access pathsavailable on each of them. These will beused later in access path selection. Aftercatalog lookup has obtained the datatypeand length of each column, the OPTIMIZERrescans the SELECT-list and WHERE-tree tocheck for semantic errors and type compati-bility in both expressions and predicatecomparisons.

Finally the OPTIMIZER performs accesspath selection. It first determines theevaluation order among the query blocks inthe statement. Then for each query block,the relations in the FROM list areprocessed. If there is more than onerelation in a block, permutations of thejoin order and of the method of joining areevaluated. The access paths that minimizetotal cost for the block are chosen from atree of alternate path choices. Thisminimum cost .solution is represented by astructural modification of the parse tree.The result is an execution plan in theAccess Specification Language (ASLI .

After a plan is chosen for each queryblock and represented in the parse tree,the CODE GENERATOR is called. The CODEGENERATOR s a table-driven program whichtranslates ASL trees into machine languagecode to execute the plan chosen by theOPTIMIZER. In doing this it uses a rela-tively small number of code templates, onefor each type of join method (including nojoin). Query blocks for nested queries aretreated as "subroutines" which returnvalues to the predicates in which theyoccur. The CODE GENERATOR is furtherdescribed in .During code generation, the parse treeis replaced by executable machine code andits associated data structures. Eithercontrol is immediately transfered to thiscode or the code is stored away in thedatabase for later execution, depending onthe origin of the statement (program orterminal). In either case, when the codeis ultimately enecuted, it calls upon theSystem R internal storage system (RSS) viathe storage system interface (RSII to scaneach of the physically stored relations inthe query. These scans are along theaccess paths chosen by the OPTIMIZER. TheRSI commands that may be used by generatedcode are described in the next section.

3. _T'he Research Storaae SystemThe Research Storage System (RSSI isthe storage subsystem of System R. It isresponsible for maintaining Physicalstorage of relations, access paths on theserelations, locking (in a multi-user envi-ronment), and logging and recovery facili-ties. The RSS presents a tuple-orientedinterface (RSII to its users. Although theRSS may be used independently of System R,we are concerned here with its use forexecuting the code generated by the proces-sing of SQL statements in System R, asdescribed in the previous section. For acomplete description of the RSS, see .

Relations are stored in the RSS as acollection of tuples whose columns arephysically contiguous. These tuples arestored on 4K byte pages; no tuple spans apage. Pages are organized into logicalunits called segments. Segments maycontain one or more relations, but norelation rn%y span a segment. Tuples fromtwo or more relations may occur on the samepage. Each tuple is tagged with theidentification of the relation to which itbelongs.

The'primary way of accessing tuples ina relation is via an RSS scan. A scanreturns a tuple at a time along a givenaccess path. OPEN, NEXT, and CLOSE are theprincipal commands on a scan.Two types of scans are currentlyavailable for SQL statements. The firsttype is a segment scan to find all thetuples of a given relation. A series ofNEXTs on a segment scan simply examines allpages of the segment which contain tuples,from any relation, and returns those tuplesbelonging to the given relation.The second type of scan is an indexscan. An index may be created by a.Sy.stemR user on one or more columns of a rela-tion, and a relation may have any number(including zero1 of indexes on it. Theseindexes are stored on separate pages fromthose containing the relation tuples.Indexes are implemented as B-trees ,whose leaves are pages containing sets of(key, identifiers of tuples .. which containthat key). Therefore a series of NEXTs onan index scan does a sequential read alongthe leaf Pages of the index, obtaining thetuple identifiers matching a key, and usingthem to find and return the data tuples to

the user in key value order. Index leafpages are chained together so that NEXTsneedinot reference any upper level Pages Ofthe i,ndex.In a segment scan, all the non-emptypage5 of a segment will be touched. regard-less of whether there are any tuples fromthe desired relation on them. However,each page is touched only once. When anentire relation is enamined via an indexscan, each page of the index is touched

24


3/12

only once, but 'a data page may be examinedmore than once if it has two tuples on itwhich are not "close" in the index order-ing. If the tuples are inserted intosegment pages in the index 'ordering, and ifthis physical proximity corresponding toindex key value is maintained, we say thatthe index is clustered. A clustered indexhas the property that not only each indexpaw P but also each data page containing atuple from that relation will be touchedonly once in a scan on that index. ,...:An index scan need not scan the *entirerelation. Starting and stopping key valuesmay be specified in order to scan onlythose tuples which have a key in a range ofindex values. Both index and segment scansmay optionally take a set of predicates,called search arguments (or SARGS), whichare applied to a tuple before it isreturned to the RSI caller. If the tuplesatisfies the predicates, it is returned;otherwise the scan continues until iteither finds a tuple which satisfies theSARGS or exhausts the segment or thespecified index value range. This reducescost by eliminating the overhead of makingRSI calls for tuples which can be effi-ciently rejected within the RSS. Not allpredicates are of the form that can becomeSARGS. A sm predicate is one of theform (or which can be put into the form)ncolumn comparison-operator value". SARGSare expressed as a boolean expression ofsuch predicates in disjunctive normal form.

f 4. fox sinqleosts relation access pathsIn the next 'several sections we willdescribe the process of choosing a plan forevaluating a query. We will first describethe simplest case, accessing a single

relation, and show how it extends andgeneralizes to t-way joins of relations,n-way joins, and finally multiple queryblocks (nested queries).The OPTIMIZER examines both the predi-cates in the query and the access pathsavailable on the relations referenced bythe queryI and.formu1ate.s a cost predictionfor each access plan, using the followingcost formula:COST = PAGE -FETCHES + W * (RSI CALLS).This cost is' a weighted measure of I/O(pages fetched) and CPU utilization(instructions executed). W is an adjusta-ble weighting factor between I/O and CPU.RSI CALLS is the predicted number 0.f tuples

returned from the RSS. Since most ofSystem R's CPU time is spent in the RSS,the number of RSI calls is a good approxi-mation for CPU utilization. Thus thechoice of a minimum cost path to process aquery attempts to minimize total resourcesrequired.During execution of the type-compati-bility and semantic checking portion of theOPTIMIZER, each query block's WHERE tree ofpredicates is examined. The WHERE tree is

considered to be in conjunctive normalform, and every conjunct is called aboolean factoy. Boolean factors arenotable because every tuple returned to theuser must satisfy every boolean factor. Anindex is said to match a boolean factor i,fthe boolean factor iswhose referenced a sargable predicatecolumn is the index key;e.g., an index on SALARY matches thepredicate 'SALARY = 20000'. More precise-ly, we say that a predicate or set ofpredicates matches an index access pathwhen the predicates are sargable and thecolumns mentioned in the predicate(s) arean initial substring of the set of columnsof the index key. For example. a NAME,LOCATION index matches NAME = 'SMITH' ANDLOCATION = 'SAN JOSE'. If an index matchesa boolean factor, an access using thatindex is an efficient way to satisfy theboolean factor. Sargable boolean factorscan also be efficiently satisfied if theyare expressed as search arguments. Notethat a boolean factor may be an entire treeof predicates headed by an OR.During catalog lookup, the OPTIIlIZER

retrieves statistics on the relations inthe query and on the access paths availableon each relation. The statistics kept arethe following:For each relation T.- NCARD(TIr the cardinality of relation T.- TCARDfT). the number of pages in thesegment that hold tuples of relation T.- P(T), the fraction of data pages in thesegment that hold tuples of relation T.P(T) = TCARD(T1 / (no. of non-emptypages in the segment).

For each index I on relation T,- ICARD( number of distinct keys inindex I.- NINDXfIlr the number of pages in index I.

These statistics are maintained in theSystem R catalogs, and come from severalsources. Initial relation loading andindex creation ihitialize these statistics.They are then updated periodically by anUPDATE STATISTICS command, which can be runby any user. System R does not updatethese statistics at every INSERT, DELETE,or UPDATE because of the extra databaseoperations and the locking bottleneck th,iswould create at the system catalogs.Dynamic updating of statistics would tendto serialize accesses that modify therelation contents.

Using these statistics, the OPTIMIZERassigns a selectivity factor 'F' for eachboolean factor in the predicate list. Thisselectivity factor very roughly correspondstb the of tuples whichwk expected fraction11 satisfy the predicate. TABLE 1 givesthe selectivity factors for different kindsof predicates. We assume that a lack ofstatistics implies that the relation issmall, so an arbitrary factor is chosen.

25


4/12

TABLE 1 SELECTIVITY FACTORScolumn = value. F = 1 / ICARDfcolumn index) if there is an index on columnThis assumes an even distribution of tuples among the index keyvalues.F = l/10 otherwisecolumn1 = column2F = l/MAX(ICARD(columnl index), ICARD(column2 index))

if there are indexes on both column1 and column2This assumes that each key value in the index with the smallercardinality has a matching value in the other index.F = l/ICARD(col.umn-i index) if there is only an index on Column-iF = l/l0 otherwisecolumn > value (or any other open-ended comparison)F = (high key value - value) / (high key value - low key value)Linear interpolation of the value within the range of key valuesyields F if the column is an arithmetic type and value is kno,wn ataccess path selection time.F = l/3 otherwise (i.e. column not arithmetic)There is no significance to this number, other than the fact thatit is less selective than the guesses for equal predicates forwhich there are no indexes, and that it is less than l/2. Wehypothesize that few queries use predicates that are satisfied by

more than half the tuples.column BETWEEN value1 AND value2F = (value? -. value11 / (high key value - low key value)A ratio of the BETWEEN value range to the entire key value range isused as the selectivity factor if column is arithmetic and bothvalue1 and value2 are known at access path selection.F = l/4 otherwiseAgain there is no significance to this choice except that it isbetween the default selectivity factors for an equal predicate anda range p.redicate.

column IN (list of values)F = (number of items in list) * (selectivity factor for column =value 1This is allowed to be no more than l/2.

columnA IN subqueryF = (expected cardinality of the subquery result) /(product of the cardinalities of all the relations in thesubquerys FROM-list).The computation of query cardinality will be discussed below.This formula is derived by the following argument:Consider the simplest case, where subquery is of the form SELECTcolumnB FROM relationc . ... Assume that the set of all columnBvalues in relationc contains the set of all columnA values. If allthe tuples of relationc are selected by the subquery, then thepredicate is always TRUE and F = 1. If the tuples of the subqueryare restricted by a selectivity factor F, then assume that the setof unique values in the subquery result that match columnA valuesis proportionately restricted, i.e. the selectivity factor for thepredicate should be F. F is the product of all the subquerysselectivity factors. namely (subquery cardinality) / (cardinalityof all possible subquery answers). With a little optimism, we canextend this reasoning to include sifbqueries which are joins andsubqueries in which columnB is replabed by an arithmetic expressioninvolving column names. This leads to the formula given above.

fpred expression11 OR (pred expression2)F = F(pred1) + F(pred2) - Ffpredll * F(pred21

26


5/12

(predll AND (predtlF = Ffpredl) * F(pred27Note that this assumes that column values are independent.NOT pred F = 1 - Ffpredl

Query cardinality (QCARD) is the product ofthe cardinalities of every relation in thequery block's FROM list times the productof all the selectivity factors of thatquery block's boolean factors. The numberof expected RSI calls (RSICARD) i* theproduct of the relation cardinalities timesthe selectivity factors of the s_arsablqboolean factors, since the sargable booleanfactors will be put into search argumentswhich will filter out tuples withoutreturning across the RSS interface.Choosing,an optimal access path for asingle relation consists of using theseselectivity factors in formulas togetherwith the statistics on available accesspaths. Before this process is described, adefinition is needed . Using an indexaccess path or sorting tuples produces

tuples in the index value or sort keyorder. We say that a tuple order is anjnterestinq order if that order is onespecified by the query block's GROUP BY orORDER BY clauses.For single relations, the cheapestaccess path is obtained by evaluating thecost for each available access path (eachindex on the relation, plus a segmentscan). The costs will bedescribed below.For each such access path, a predicted costis computed along with the ordering of thetuples it will produce. Scanning along theSALARY index in ascending order, forexample, will produce some cost C and a

tuple order of SALARY (ascending). To findthe cheapest access plan fo; aTABLE 2

SITUATIORUnique index matchingan equal predicateClustered index I matchingone or more boolean factorsNon-clustered index I matchingone or more boolean factors

Clustered index I notmatching any boolean factorsNon-clustered index I notmatching any boolean factors

Segment scan

single

relation query, we need only to examine thecheapest access path which produces tuplesin each "interesting" order and the cheap-est "unordered" access path. Note that an"unordered" access path may in fact producetuples in some order, but the order is not"interesting". If there are no GROUP BY orORDER BY clauses on the query, then therewill be no interesting orderings, and thecheapest access path is the one chosen. Ifthere are GROUP BY or ORDER BY clauses,then the cost for producing that interest-ing ordering must be compared to the costof the cheapest unordered path J&U thecost of sorting QCARD tuples into theproper order. The cheapest of thesealternatives is chosen as the plan for thequery block.The cost formulas for

access pathssingle'relation

are given in TABLE 2. Theseformulas give index pages fetched plus datapages fetched plus the weighting factortimes RSI tuple retrieval calls. W is theweighting factor between page fetches andRSI calls. Some situations give severalalternative formulas depending on whetherthe set of tuples retrieved will fitentirely in the RSS buffer pool for effec-tive buffer pool per user). We assume forclustered indexes that a page remains. inthe buffer long enough for every tup1e tobe retrieved from it. For non-clusteredindexes, it is assumed that for thoserelations not fitting in the buffer, the.relation is sufficiently large with respectto the buffer size that a page fetch isrequired for every tuple retrieval.

COST FORMULASlxGlc(inaases)

1+1+w

F(predsI * (NINDX(Il + TCARD) + W x RSICARDF(predsl * (NINDXfI) + NCARD) + W * RSICARD

or F(predsl * (NINDXfI) + TCARD) + W * RSICARD ifthis number fits in the System R buffer(NINDX(Il + TCARDI + W * RSICARD

(NINDX(Il + NCARDI + W * RSICARDor (NINDX(II + TCARD) + W * RSICARD ifthis number fits in the System R buffer

TCARD/P + W * RSICARD

27


6/12

5. pccess & selecti- .&E joinsIn 1976, Blasgen and Eswaran examined a number of methods for performing2-way joins. The performance of each ofthese methods was analyzed under a varietyof relation cardinalities. Their evidenceindicates that for other than very smallrelations, one of two join methods werealways optimal or near optimal. The System

R optimizer chooses between these twomethods. We first describe these methods,and then discuss how they are extended forn-way joins. Finally we specify how thejoin order (the order in which the rela-tions are joined) is chosen. For joinsinvolving two relations, the two relationsare called the outer relation, from which atuple will be retrieved first, and them relation, from which tuples will beretrieved, possibly depending on the valuesobtained in the outer relation tuple. Apredicate which relates columns of twotables to be joined is called a &&IPredicate. The columns referenced in ajoin predicate are called && golumnq.The first join method, called thenested loops. method, uses scans, in anyorder, on the outer and inner relations.The scan on the outer relation is openedand the first tuple is retrieved. For eachouter relation tuple obtained, a scan isopened on the inner relation to retrieve,one at a time, all the tuples of the innerrelation which satisfy the join predicate.The composite tuples formed by theouter-relation-tuple / inner-relation-tuplepairs comprise the result of this join.The second join method, called mereinqscans. requires the outer and inner rela-tions to be scanned in join column order.

This implies that, along with the columnsmentioned in ORDER BY and GROUP BY, columnsof equi-join predicates (those of the formTable1 .columnl = TableZ.column2) alsodefine interesting orders. If there ismore than one join pred icate, one of themis used as the join predicate and theothers are treated as ordinary predicates.The merging scans method is only applied toequi-joins, although in principle it couldbe applied to other types of joins. If oneor both of the relations to be joined hasno indexes on the join column, it must besorted into a temporary list which isordered by the join column.The more complex logic of the merging

scan join method takes advantage of theordering on join columns to avoid rescan-ning the entire inner relation (looking fora match 1 for each tuple of the outerrelation. It does this by synchronizingthe inner and outer scans by reference tomatching join column values and by remem-ber ing where matching join groups .arelocated. Further savings occur if theinner relation is clustered on the joincolumn (as would be true if it is theoutput of a sort on the join column).

Clustering on a columnwhich have the same means that tuplesvalue in that columnare physically stored close to each otherso that one Page access will retrieveseveral tuples.N-way joins can be visualized as asequence of a-way joins. In this visuali-zation, two relations are joined together,the resulting composite relation is joined

with the third relation, etc. At each stepof the n-way join it is possible to identi-fy the outer relation (which in general iscomposite.1 and the inner relation (therelation being added to the join). Thusthe methods described above for two wayjoins are easily generalized to n-wayjoins. However, it should be emphasizedthat the first 2-way join does not have tobe completed before the second t-way joinis started. As soon as we get a compositetuple for the first t-way join, it can bejoined with tuples of the third relation toform result tuples for the 3-way join, etc.Nested ioop joins and merge scan joins maybe mixed in the same query, e.g. the firsttwo relations of a three-way join may bejoined using merge scans and the compositeresult may be joined with the third rela-tion using a nested loop join. Theintermediate composite relations arephysically stored only if a sort isrequired for the next join step. When asort of the composite relation is notspecified, the composite relation will bematerialized one tuple at a time to parti-cipate in the next join.We now consider the order in which therelations are chosen to be joined. Itshould be noted that although the cardinal-ity of the join of n relations is the sameregardless of join order, the cost ofjoining in different orders can be substan-tially different. If a query block has nrelations in its FROM list, then there aren factorial permutations of relation joinorders. The search space can be reduced byobserving that that once the first krelations are joined, the method to jointhe composite to the k+l-st relation isindependent of the order of joining thefirst k; i.e. the applicable predicates arethe same, the set of interesting orderingsis the same, the possible join methods arethe same, etc. Using this property, anefficient way to organize the search is tofind the best join order for successivelylarger subsets of tables.A heuristic is used to reduce the joinorder permutations which are considered.Wheh possible, the search is reduced byconsideration only of join orders whichhave join predicates relating the innerrelation to the other relations alreadyparticipating in the join. This means thatin joining relations tl,tt,...,tn onlythose orderings til,ti2,...,tin areexamined .in which for all j (j=2,...,n)either(1) tij has a t least one join predicate

28


7/12

with some relation tik, where k < j, or(2) for all k > j, tik has no join predi-cate with til,tit,...,or ti(j-1).This means that all joins requiring Carte-sian products are performed as late in thejoin sequence as possible. For example, ifTl.T2,T3 are the three relations in a queryblocks FROM list, and there are joinpredicates between Tl and T2 and between T2and T3 on different columns than the Tl-T2join, then the followingnot considered: permutations are.T l-T3-T2 *,T3-T l-T2

To find the optimal plan for joining nrelations, a tree of possible solutions isconstructed. As discussed above, thesearch is performed by finding the best wayto join subsets of the relations. For eachset of relations joined, the cardinality ofthe composite relation is estimated andsaved. In addition, for the unorderedjoin, and for each interesting orderobtained by the join thus far, the cheapestsolution for achieving that order and thecost of that solution are saved. A solu-tion consists of an ordered list of therelations to be joined, the join methodused for each join, and a plan indicatinghow each relation is to be accessed. Ifeither the outer composite relation or theinner relation needs to be sorted beforethe join, then that is also included in theplan. As in the single relation case,interesting orders are those listed inthe query blocks GROUP BY or ORDER BYclause. if any. Also every join columndefines an interesting order. To mini-nimize the number of different interestingorders and hence the number of solutions inthe tree, equivalence clns,ses for interest-ing orders are computed and only the bestsolution for each equivalence class issaved. For example, if there is a joinpredicate E. DNO = D.DNO and another joinpredicate D.DNO = F.DNOe then all three ofthese columns belong to the same orderequivalence class.

The search tree is constructed byiteration on the number of relations joinedso far. First, the best way is found toaccess each single relation for eachinteresting tuple ordering and for theunordered case. Next, the best way ofjoining any relation to these is found,subject to the heuristics for join order.This produces solutions for joining pairsof relations. Then the best way to joinsets of three relations is found by COnSid-eration of all sets of two relations andjoining in each third relation permitted bythe join order heuristic. For each plan tojoin a set of relations, the order of thecomposite result is kept in the tree. Thisallows consideration of a merge scan joinwhich would not require sorting the compo-site. After the complete solutions (all Ofthe relations joined together) have beenfound, the optimizer chooses the cheapestsolution which gives the required order, if

any was specified. Note that if a solutionexists with the correct order, no sort isperformed for ORDER BY or GROUP BY, unlessthe ordered solution is more expensive thanthe cheapest unordered solution plus theCOSt of. sorting into the required order.The number of solutions which must bestored is at most 2X*n (the number ofsubsets of n tables) t i.mes the number ofinteresting result orders. The computationtime to generate the tree is approximatelyproportional to the same number. Thisnumber is frequently reduced substantiallyby the join order heuristic. Our experi-ence is that typical cases require only afew thousand bytes of storage and a fewtenths of a second of 3701158 CPU time.Joins of 8 tables have been optimized in afew seconds. '

Gomputation DJ costsThe costs for joins are computed fromthe costs of the scans on each of therelations and the cardinalities. The costsof the scans on each of the relations arecomputed using the cost formulas for singlerelation access paths presented in sectionb.Let C-outerfpath 1) be the cost of scanningthe outer relation via pathl, and N be thecardinality of the outer relation tupleswhich satisfy the applicable predicates. Nis computed by:N = (product of the cardinalities of allrelations T of the join so far) *(product of the selectivity factors ofal 1 applicable predicates).Let C-innercpatht) be the cost of scanningthe inner relation, applying all applicablepredicates. Note that in the merge scanjoin this means scanning the contiguousgroup of the inner relation which corres-ponds to one join column value in the outerrelation. Then the cost of a nested loopjoin isc-nested-loop-join(pathl.path2)EC-outerfpathll + N * C-inner(path21

The cost of a merge scan join can bebroken up into the cost of actually doingthe merge plus the cost of sorting theouter or inner relations, if required. Thecost of doing the merge isC-merge(pathlspath2)=C-outer(path1) + N * C-inner(path2)For the case where the inner relationis sorted into a temporary relation none ofthe single relation access path formulas insection 4 apply. In this case the innerscan is like a segment scan except that themerging scans method makes use of the fact .that the inner relation is sorted so thatit is not necessary to scan the entireinner relation looking for a match. Forthis case we use the following formula forthe cost of the inner scan.C-innercsorted list) =TEHPPAGES/N + W*RSICARDwhere TEMPPAGES is the number of pages

29


8/12

required to hold the inner relation. Thisformula assumes that during the merge eachpage of the inner relation is fetched once.It is interesting to observe that thecost formula for nested loop joins and thecost formula for merging scans are essen-tially the same. The reason that mergingscans is sometimes better than nested loopsis that the cost of the inner scan may be

much less. After sorting, the innerrelation is clustered on the join columnwhich tends to minimize the number of pagesfetched, and it is not necessary to scanthe entire inner relation (looking for amatch) for each tuple of the outer rela-tion.The cost of sorting a relation,C-sort(path), includes the cost of retriev-ing the data using the specified accessPath, sorting the data, which may involveseveral passes, and putting the resultsinto a temporary list. Note that prior tosorting the inner table, only the localpredicates can be applied. Also, if it is

necessary to sort a composite result, theentire composite relation must be stored ina temporary relation before it can besorted. The cost of inserting the compo-site tuples into a temporary relationbefore sorting is included in C-sortfpath).

We now show how the search is done forthe example join shown in Fig. 1. First wefind all o f the reasonable access paths forsingle relations with only their localpredicates applied. The results for thisexample are shown in Fig. 2. There arethree access paths for the EHP table: anindex on DNO, an index on JOB, and asegment .scan. The interesting orders areDNO and JOB. The index on DNO provides thetuples in DNO order and the index on JOBprovides the tuples in JOB order. Thesegment 'scan access path is, for ourpurposes, unordered. For this example weassume that the index on JOB is the cheap-est path, so the segment scan path ispruned. For the DEPT relation there aretwo access paths, an index on DNO and asegment scan. We assume that the index onDNO is cheaper so the segment scan path ispruned. For the JOB relation there are twoaccess paths, an index on JOB and a segmentscan. We assume that the segment scan pathis cheaper, so both paths are saved. Theresults just described are saved in thesearch tree as shown in Fig. 3. In thefigures, the notation C(EMP.DNOl orC(E.DNOl means the cost of scanning EMP viathe DNO index, applying all predicateswhich are applicable given that tuples fromthe specified set of relations have alreadybeen fetched. The notation Ni is used torepresen t the cardinalities of the differ-ent partial results.

Next, solutions for pairs of relationsare found by joining a second relation to

JoB pzqxr--JOB5 CLERK6 TYPIST9 SALES12 MECHANIC

SELECT NAME, TITLE, SAL, DNAMEFROM EMP, DEPT, JOBWHERE TITLE=CLERKAND LOCYDENVERAND EMP.DNO=DEPT.DNOAND EMP.JOB=JOB.JOBRetrieve the name, salary, job title, and departmentname of employees who are clerks and work fordepartments in Denver.

Figure 1. JOIN exampleAccess Path for Single Relations

l Eligibl e Predicates: Local Predicates Onlyl Interesting Orderings: DNO,JOB

EM 1 %:DNO

$EMP.DNO,

1 :I., 1 Ftt

CIEMP seg. scan)

N2C(DEPT.DNO) N2C(DEPT reg. scan)X prunedJOB: index

I I

segmentJOBJOB scan onJOB

$JOB.JOB)Ndoe sag. scan)

: Figure 2.the.$esults for single relations shown inFig. 3. For each single rela tion, we findaccess paths for joining in each secondrelation for which there exists a predicateconnecting it to the first relation. Firstwe consider access path selection fornested loop joins. In this example weassume that the EMP-JOB join is cheapest byaccessing JOB on the JOB index. This is

30


9/12

likely since it can fetch directly thetuples with matching JOB, (without having toscan the entire relation). In practice thecost o f joining is estimated using theformulas given earlier and the cheapestpath is chosen. For joining the EMPrelation to the DEPT ,relation we assumethat the DHO index is cheapest. The bestaccess path for each second-level relationis combined with each of the plans in Fig.3 to form the nested loop solutions shownin Fig. 4. ,:'*,

C(EMP.DNOI C(EMP.JOBION0 order JOB orderUDEPT.ONO~ CIJOB.JOBi CIJOB rsg. scanIDNO order JOB ordef unordered

Figure 3. Search tree for single relationsNext we generate solutions using themerging scans method. As we see on theleft side of Fig. 3, there is a scan on theEHP relation in DHO order, so it is possi-ble to use this scan and the DHO scan onthe DEPT relation to do a merging scansjoin, without any sorting. Although it ispossible to do the merging join withoutsorting as just described, it might becheaper to use the JOB index on EMP, sorton DHO. and then merge. Note that we neverconsider sorting the DEPT table because thecheapest scan on that table is already inDHO order.

For merging JOB with EMP, ueconsider the JOB onlyindex on EMP since it isthe cheapest access path for EHP regardlessof order. Using the JOB index on JOB, wecan merge without any sorting. However,it might be cheapter to sort JOB using arelation scan as input to the sort and thendo the merge.Referring to Fig. 3, we see that theaccess path chosen for the' the DEPT rela-tion is the DHO index. After accessingDEPT via this index, we can merge with EMPusing the DHO index on EMP, again withoutany sorting. However, it might be cheaperto sort EMP first using the JOB index asinput to the sort and then do the merge.Both of these cases are shown in Fig. 5.As each of the costs shown in Figs. 4and 5 are computed they are compared withthe cheapest equivalent solution (sametables and same result order) found so far,and the cheapest solution is saved. Afterthis pruning. solutions for all threerelations are found. For each pair ofrelations, we find access paths for joiningin the remaining third relation. As beforewe will extend the tree using nested loopjoins and merging scans to join the thirdrelation. The search tree for threerelations is shown in Fig. 6 . Note that inone case both the composite relation andthe table being added (JOB1 are sorted.

Note also that for some of the cases. nosorts are performed at all. In thesecases, the composite result is materializedone tuple at a time and the intermediatecomposite relation is never stored. As'before, as each of the costs are computedthey are compared with the cheapest solu-

(EMP. OEPT)

h&x IndexEMP.DNO EMP.JOB

Nt

n

Nl

Index IndexDEPT.DNO DEPT.DNO

N4 N4

1 (DEPT. EMP) (JOB, EMP)Index segmentJOB.JOB 3

L N3

n

N3

index IndexEMPJOB EMP.JOB

1 % NSC(E.DNO) C(E.JOBl C(E.DNO) CULJOBI C(D.DNO) IXJJOB) C(J seg scan)+ + + +N&(D.DNO) N,C,(D.DNO) N,&J.JOB) N&J.JOBI N,C,(E.DNO) N,C,(E.JOBI NIL&JOB)DNO order JOE order DNO order JOE order DNO order JOB order unordered

Figure 4. Extended search tree for second relation (nested loop join)

31


10/12

NI

MergeEON0withDDNO

4sonE.JOBly DNOintoLl 0MerpaLlwithD.LlNO

N, l

*JOB, EMP)fIEMP, JOB1

indexE.JOB IndexD.DNO Be!lmmtSG.JCi

N3

DNO order ON0 order

M-C MergeE.JO8 EJOBwith withJ.JOB LZNSJOB order

N5JOB order

JISort JOB reg scan

\ tq JOB into L2\

Figure 5. Extended search tree for second relation (merge join)

MergeD.DNOwithLl

N3Sort JOIs.?g. scanhy JOEinto L2

MergeJ.JOBwithE.JOB

MewLZwithE.JOB

N5JOB order NsJOB order

,EMP. DEW fi

M-9I I

MergeL5 withwith D.DNOD.DNO

Figure 6. Extended search tree for third relation

32


11/12

6. Nested QueriesA query may appear as an operand of apredicate of the form "expression operatorquery". Such a query is called a NestedQuery or -a Subquery. If the operator isone of the six scalar comparisons (=, -1,>t >=, level 2 (SELECT SALARYFROM EMPLOYEEWHERE EMPLOYEE-NUMBER =level 3 (SELECT MANAGERFROM ERPLOYEEWHERE EMPLOYEE-NUMBER =X.MANAGERllThis selects names of EMPLOYEE's that earnmore than their MANAGER's MANAGER. Asbefore, for each candidate tuple of thelevel-l query block, the EMPLOYEE.MANAGERvalue is used for evaluation of the level-3query block. In this case, because thelevel 3 subquery references a level 1 value

but does not reference level 2 values, itis evaluated once for every new level 1candidate tuple. but not for every level 2candidate tuple.

33

If the value referenced by a correla-tion subquery (X.MANAGER above) is notunique in the set of candidate tuples(e.g., many employees have the same manag-er), the procedure given above will stillcause the subquery to be re-evaluated foreach occurrence of a replicated value.However, if the referenced relation isordered on the referenced column. there-evaluation can be made conditional,depending. on a test of whether or not thecurrent referenced value is the same as theone in the previous candidate tuple. Ifthey are the same, the previous evaluationresult can be used again. In some cases,it might even pay to sort the referencedrelation on the referenced column in orderto avoid re-evaluating subqueries unneces-sarily. In order to determine whether ornot the referenced column values areunique. the OPTIHIZER can use clues likeNCARD > ICARD. where NCARD is the relationcardinality and ICARD.is the index cardi-nality of an index on the referencedcolumn.

the query:SELECT NAMEFROM EMPLOYEE XWHERE SALARY > (SELECT SALARYFROM EMPLOYEEWHERE EMPLOYEE-NUMBER=X.MANAGERlThis selects names of EMPLOYEE's that earnmore than their HANAGER. Here X identifiesthe query block and relation which furnish-es the candidate tuple for the correlation.For each candidate tuple of the top levelquery block, the MANAGER value is used forevaluation of the subquery. The subqueryresult is then returned to the "SALARY >"predicate for testing acceptance of thecandidate tuple.


12/12

7. ConclusionThe System R access path selection hasbeen described for single table queries,joins, and nested queries. Evaluation workon comparing the choices made to the"right" choice is in progress, and will bedescribed in a forthcoming paper. Prelimi-nary results indicate that, although thecosts p.redicted by the optimizer are often

not accurate in absolute value, the trueoptimal path is selected in a large m ajori-ty of cases. In many cases, the orderingamong the estimated costs for 'all pathsconsidered is precisely the same as thatamong the actual measured costs.Furthermore. the cost of path selectionis not overwhelming. For a two-way join,the cost of optimization is approximatelyequivalent to between 5 and 20 databaseretrie,vals. This number becomes even moreinsignificant when such a path selector isplaced in an environment such as System R,where application programs are compiledonce and run many times. The cost of

optimization is amortized over many runs.The key contributions of this pathselector over other work in this area arethe expanded use of statistics (indexcardinality, for example), the inclusion ofCPU utilization into the cost formulas, andthe method of determining join order. Manyqueries are CPU-bound, particularly mergejoins for which temporary relations arecreated and sorts performed. The conceptof "selectivity factor" permits the optim-izer to take advantage of as many of thequery's restriction predicates as possiblein the RSS search arguments and accesspaths. By remembering "interesting order-ing" equivalence classes for joins andORDER or GROUP specifications, the optimiz-er does more bookkeeping than most pathselectors, but this additional work in manycases results in avoiding the storage andsorting of intermediate query results.Tree pruning and tree searching techniquesallow this additional bookkeeping to beperformed efficiently.

More work on validation of the optimiz-er cost formulas needs to be done, but wecan conclude from this preliminary workthat database management systems cansupport non-procedura l query languages withperformance comparable to those supporting

the current more procedural languages.Cited and General References Astrahan, M. M. et al. System R:Relational Approach to Database Management.ACM Transactions on Database Systems, Vol.1, No. 2, June 1936, pp. 97-137. Astrahan, M. M. et al. System R: ARelational Database Management System. Toappear in Computer. Bayer, R. and McCreight, E. Organiza-tion and Maintenance of Large OrderedIndices. Acta Infornatica, Vol. 1, 1972. Blasgen, M.W. and Eswaran, K.P. On theEvaluation of Queries in a Relational DataBase System. IBM Research Report RJl745.April, 1976. Chamberlin, D.D., et al. SEQUELZ: AUnified Approach to Data Definition,Manipulation, and Control. IBH Journal ofResearch and Development, Vol. 20, No. 6,Nov. 1976, pp. 560-575. Chamberlin, D.D., Gray, J.N., andTraiger, 1.1. Views, Authorization andLocking in a Relational Data Base System.ACM National Computer Conference Proceed-ings, 1975, pp. 425-430. Codd, E.F. A Relational Model of Datafor Large Shared Data Banks. ACM Communi-cations, Vol. 13. No. 6, June, 1970, pp.377-387. Date, C.J. An Introduction to Data BaseSystems, Addison-Wesley, 1975.CS> Lorie. R.A. and Wade, B.W. The Compila-tion of a Very High Level Data .Language.IBM Research Report RJ2008, May, 1977. Lorie, R.A. and Nilsson, J.F. AnAccess Specification Language for ,a Rela-tional Data Base System. IBH ResearchReport RJ2218. April, 1978.(11) Stonebraker, M.R., Wang, E., Kreps.P., and Held, G-D. The Design and Implemen-tation of INGRES. ACM Trans. on DatabaseSystems, Vol. 1, No. 3, September, 1976,PP. 189-222.(12) Toad, s. PRTV: An Efficient Implemen-tation for Large Relational Data Bases.Proc. International Conf. on. Very LargeData Bases, Framingham. Mass., September,1975. Wong, E., and Youssefi, K. Decomposi-tion - A Strategy for Query Processing. ACHTransactions on Database Systems, Vol. 1,No. 3 (Sept. 1976) pp. 223-241.(19) ZlOOf, M.H. Query by Example. Proc.AFIPS 1975 NCC, Vol. 94, AFIPS Press,Montvale, N.J., pp. 431-437. I)

34

database management sys1

Documents