Query Processing

Upload: uttam-kesri

Post on 29-May-2018


  • 8/9/2019 Querry Processing

    1/112

    1

    Chapter 21

    Query Processing

    Transparencies

    Pearson Education Limited 1995, 2005


    Chapter 21 - Objectives

    Objectives of query processing and optimization.

    Static versus dynamic query optimization.

    How a query is decomposed and semantically analyzed.

    How to create a relational algebra tree (R.A.T.) to represent a query.

    Rules of equivalence for RA operations.

    How to apply heuristic transformation rules to improve efficiency of a query.


    Chapter 21 - Objectives

    Types of database statistics required to estimate cost of operations.

    Different strategies for implementing selection.

    How to evaluate cost and size of selection.

    Different strategies for implementing join.

    How to evaluate cost and size of join.

    Different strategies for implementing projection.

    How to evaluate cost and size of projection.


    Chapter 21 - Objectives

    How to evaluate the cost and size of other RA operations.

    How pipelining can be used to improve efficiency of queries.

    Difference between materialization and pipelining.

    Advantages of left-deep trees.

    Approaches to finding optimal execution strategy.

    How Oracle handles QO.


    Introduction

    In network and hierarchical DBMSs, low-level procedural query language is generally embedded in high-level programming language.

    Programmer's responsibility to select most appropriate execution strategy.

    With declarative languages such as SQL, user specifies what data is required rather than how it is to be retrieved.

    Relieves user of knowing what constitutes good execution strategy.


    Introduction

    Also gives DBMS more control over system performance.

    Two main techniques for query optimization:

    heuristic rules that order operations in a query;

    comparing different strategies based on relative costs, and selecting one that minimizes resource usage.

    Disk access tends to be dominant cost in query processing for centralized DBMS.


    Query Processing

    Activities involved in retrieving data from the database.

    Aims of QP:

    transform query written in high-level language (e.g. SQL) into correct and efficient execution strategy expressed in low-level language (implementing RA);

    execute strategy to retrieve required data.


    Query Optimization

    Activity of choosing an efficient execution strategy for processing query.

    As there are many equivalent transformations of same high-level query, aim of QO is to choose one that minimizes resource usage.

    Generally, reduce total execution time of query.

    May also reduce response time of query.

    Problem computationally intractable with large number of relations, so strategy adopted is reduced to finding near optimum solution.


    Example 21.1 - Different Strategies

    Three equivalent RA queries are:

    (1) $\sigma_{(position='Manager') \land (city='London') \land (Staff.branchNo=Branch.branchNo)}(Staff \times Branch)$

    (2) $\sigma_{(position='Manager') \land (city='London')}(Staff \bowtie_{Staff.branchNo=Branch.branchNo} Branch)$

    (3) $(\sigma_{position='Manager'}(Staff)) \bowtie_{Staff.branchNo=Branch.branchNo} (\sigma_{city='London'}(Branch))$


    Example 21.1 - Different Strategies

    Assume:

    1000 tuples in Staff; 50 tuples in Branch;

    50 Managers; 5 London branches;

    no indexes or sort keys;

    results of any intermediate operations stored on disk;

    cost of the final write is ignored;

    tuples are accessed one at a time.


    Example 21.1 - Cost Comparison

    Costs (in disk accesses) are:

    (1) (1000 + 50) + 2*(1000 * 50) = 101 050

    (2) 2*1000 + (1000 + 50) = 3 050

    (3) 1000 + 2*50 + 5 + (50 + 5) = 1 160

    Cartesian product and join operations are much more expensive than selection, and third option significantly reduces size of relations being joined together.
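The three totals can be reproduced term by term. Below is a minimal Python sketch under the slide's assumptions (one disk access per tuple read or written, intermediate results written to disk and read back, final write ignored); splitting each formula into reads and writes is our own reading of the slide, not something the slide states.

```python
# Example 21.1 cost model: one disk access per tuple read or written.
N_STAFF, N_BRANCH = 1000, 50
N_MANAGERS, N_LONDON = 50, 5


def cost_strategy_1() -> int:
    # Read both relations for the Cartesian product, write the
    # 1000 * 50 product tuples, then read them back for the Selection.
    product = N_STAFF * N_BRANCH
    return (N_STAFF + N_BRANCH) + 2 * product


def cost_strategy_2() -> int:
    # Read both relations for the join; each Staff tuple matches exactly
    # one Branch tuple, so the join writes 1000 tuples, re-read for the Selection.
    join_size = N_STAFF
    return 2 * join_size + (N_STAFF + N_BRANCH)


def cost_strategy_3() -> int:
    # Select managers: read 1000 Staff tuples, write 50.
    # Select London branches: read 50 Branch tuples, write 5.
    # Join the two reduced relations: read 50 + 5.
    select_staff = N_STAFF + N_MANAGERS
    select_branch = N_BRANCH + N_LONDON
    join = N_MANAGERS + N_LONDON
    return select_staff + select_branch + join


for name, cost in [("(1)", cost_strategy_1()),
                   ("(2)", cost_strategy_2()),
                   ("(3)", cost_strategy_3())]:
    print(name, cost)  # (1) 101050, (2) 3050, (3) 1160
```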


    Phases of Query Processing

    QP has four main phases:

    decomposition (consisting of parsing and validation);

    optimization;

    code generation;

    execution.


    Phases of Query Processing


    Dynamic versus Static Optimization

    Two times when first three phases of QP can be carried out:

    dynamically every time query is run;

    statically when query is first submitted.

    Advantages of dynamic QO arise from fact that information is up to date.

    Disadvantages are that performance of query is affected, and time may limit finding optimum strategy.


    Dynamic versus Static Optimization

    Advantages of static QO are removal of runtime overhead, and more time to find optimum strategy.

    Disadvantages arise from fact that chosen execution strategy may no longer be optimal when query is run.

    Could use a hybrid approach to overcome this.


    Query Decomposition

    Aims are to transform high-level query into RA query and check that query is syntactically and semantically correct.

    Typical stages are:

    analysis,

    normalization,

    semantic analysis,

    simplification,

    query restructuring.


    Analysis

    Analyze query lexically and syntactically using compiler techniques.

    Verify relations and attributes exist.

    Verify operations are appropriate for object type.


    Analysis - Example

    SELECT staff_no

    FROM Staff

    WHERE position > 10;

    This query would be rejected on two grounds:

    staff_no is not defined for Staff relation (should be staffNo).

    Comparison > 10 is incompatible with type of position, which is variable character string.


    Analysis

    Finally, query transformed into some internal representation more suitable for processing.

    Some kind of query tree is typically chosen, constructed as follows:

    Leaf node created for each base relation.

    Non-leaf node created for each intermediate relation produced by RA operation.

    Root of tree represents query result.

    Sequence is directed from leaves to root.


    Example 21.1 - R.A.T.


    Normalization

    Converts query into a normalized form for easier manipulation.

    Predicate can be converted into one of two forms:

    Conjunctive normal form:

    $(position = 'Manager' \lor salary > 20000) \land (branchNo = 'B003')$

    Disjunctive normal form:

    $(position = 'Manager' \land branchNo = 'B003') \lor (salary > 20000 \land branchNo = 'B003')$
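One way to confirm that the two normal forms express the same predicate is to compare them on every truth assignment. The sketch below treats the three atomic conditions as independent booleans, an assumption made only for this check.

```python
from itertools import product

# Each atomic condition is modelled as an independent boolean:
#   pos -> position = 'Manager'
#   sal -> salary > 20000
#   br  -> branchNo = 'B003'


def cnf(pos: bool, sal: bool, br: bool) -> bool:
    return (pos or sal) and br


def dnf(pos: bool, sal: bool, br: bool) -> bool:
    return (pos and br) or (sal and br)


def equivalent() -> bool:
    """True if CNF and DNF agree on all 2**3 truth assignments."""
    return all(cnf(*v) == dnf(*v) for v in product([False, True], repeat=3))


print(equivalent())  # True
```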


    Semantic Analysis

    Rejects normalized queries that are incorrectly formulated or contradictory.

    Query is incorrectly formulated if components do not contribute to generation of result.

    Query is contradictory if its predicate cannot be satisfied by any tuple.

    Algorithms to determine correctness exist only for queries that do not contain disjunction and negation.


    Semantic Analysis

    For these queries, could construct:

    A relation connection graph.

    Normalized attribute connection graph.

    Relation connection graph

    Create node for each relation and node for result. Create edges between two nodes that represent a join, and edges between nodes that represent the source of projection operations.

    If not connected, query is incorrectly formulated.


    Semantic Analysis - Normalized Attribute Connection Graph

    Create node for each reference to an attribute, or constant 0.

    Create directed edge between nodes that represent a join, and directed edge between attribute node and 0 node that represents selection.

    Weight edges $a \rightarrow b$ with value $c$ if it represents inequality condition $(a \le b + c)$; weight edges $0 \rightarrow a$ with $-c$ if it represents inequality condition $(a \ge c)$.

    If graph has cycle for which valuation sum is negative, query is contradictory.


    Example 21.2 - Checking Semantic Correctness

    SELECT p.propertyNo, p.street

    FROM Client c, Viewing v, PropertyForRent p

    WHERE c.clientNo = v.clientNo AND

    c.maxRent >= 500 AND c.prefType = 'Flat' AND p.ownerNo = 'CO93';

    Relation connection graph not fully connected, so query is not correctly formulated.

    Have omitted the join condition (v.propertyNo = p.propertyNo).
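The connectivity test can be sketched with a breadth-first search. This is a minimal illustration, assuming the graph is built by hand from the WHERE and SELECT clauses; the node labels c, v, p and result are our own names for the three tuple variables and the result node.

```python
from collections import defaultdict, deque


def is_connected(nodes, edges):
    """True if the undirected graph with these nodes and edges is connected."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    start = nodes[0]
    seen = {start}
    queue = deque([start])
    while queue:                      # breadth-first search from start
        n = queue.popleft()
        for m in adj[n]:
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return seen == set(nodes)


# Relation connection graph for the query above.
nodes = ["c", "v", "p", "result"]
edges = [
    ("c", "v"),        # join: c.clientNo = v.clientNo
    ("p", "result"),   # p is the source of the projected attributes
]
print(is_connected(nodes, edges))                 # False
print(is_connected(nodes, edges + [("v", "p")]))  # True with v.propertyNo = p.propertyNo
```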


    Example 21.2 - Checking Semantic Correctness

    [Figures: Relation connection graph; Normalized attribute connection graph]


    Example 21.2 - Checking Semantic Correctness

    SELECT p.propertyNo, p.street

    FROM Client c, Viewing v, PropertyForRent p

    WHERE c.maxRent > 500 AND

    c.clientNo = v.clientNo AND

    v.propertyNo = p.propertyNo AND

    c.prefType = 'Flat' AND c.maxRent < 200;

    Normalized attribute connection graph has cycle between nodes c.maxRent and 0 with negative valuation sum, so query is contradictory.
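Detecting a negative-valuation cycle is a standard negative-cycle search. The sketch below uses a Bellman-Ford style relaxation (a technique we swap in; the slides do not name an algorithm) and a hypothetical helper, build_edges, that turns predicate terms into weighted edges using the weighting rule for the normalized attribute connection graph. It assumes strict inequalities are treated as non-strict and models each join a = b as the pair a ≤ b + 0, b ≤ a + 0; non-numeric terms such as c.prefType = 'Flat' are left out.

```python
def has_negative_cycle(nodes, edges):
    """True if some cycle in the weighted digraph has a negative weight sum.

    Every distance starts at 0, which is equivalent to adding a virtual
    source with a 0-weight edge to each node (Bellman-Ford).
    """
    dist = {n: 0 for n in nodes}
    for _ in range(len(nodes) - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # If any edge can still be relaxed, a negative cycle exists.
    return any(dist[u] + w < dist[v] for u, v, w in edges)


def build_edges(joins, upper_bounds, lower_bounds):
    """Hypothetical helper mapping predicate terms to weighted edges.

    joins:        pairs (a, b) for a = b   -> edges a->b and b->a, weight 0
    upper_bounds: pairs (a, c) for a <= c  -> edge a->0, weight c
    lower_bounds: pairs (a, c) for a >= c  -> edge 0->a, weight -c
    """
    edges = []
    for a, b in joins:
        edges += [(a, b, 0), (b, a, 0)]
    edges += [(a, "0", c) for a, c in upper_bounds]
    edges += [("0", a, -c) for a, c in lower_bounds]
    return edges


joins = [("c.clientNo", "v.clientNo"), ("v.propertyNo", "p.propertyNo")]
nodes = {"0", "c.maxRent"} | {x for pair in joins for x in pair}
# c.maxRent > 500 AND c.maxRent < 200, strict bounds treated as non-strict.
edges = build_edges(joins, [("c.maxRent", 200)], [("c.maxRent", 500)])
print(has_negative_cycle(nodes, edges))  # True: the cycle sums to 200 - 500
```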


    Simplification

    Detects redundant qualifications, eliminates common sub-expressions, and transforms query to semantically equivalent but more easily and efficiently computed form.

    Typically, access restrictions, view definitions, and integrity constraints are considered.

    Assuming user has appropriate access privileges, first apply well-known idempotency rules of boolean algebra.


    Transformation Rules for RA Operations

    Conjunctive Selection operations can cascade into individual Selection operations (and vice versa).

    $$\sigma_{p \land q \land r}(R) = \sigma_p(\sigma_q(\sigma_r(R)))$$

    Sometimes referred to as cascade of Selection.

    $$\sigma_{branchNo='B003' \land salary>15000}(Staff) = \sigma_{branchNo='B003'}(\sigma_{salary>15000}(Staff))$$


    Transformation Rules for RA Operations

    Commutativity of Selection.

    $$\sigma_p(\sigma_q(R)) = \sigma_q(\sigma_p(R))$$

    For example:

    $$\sigma_{branchNo='B003'}(\sigma_{salary>15000}(Staff)) = \sigma_{salary>15000}(\sigma_{branchNo='B003'}(Staff))$$
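The cascade and commutativity rules for Selection can be illustrated (not proved) on a small relation. The Staff tuples below are hypothetical values invented for the example, and select is a minimal list-based stand-in for the Selection operator.

```python
# Hypothetical toy Staff relation: (staffNo, branchNo, salary).
staff = [
    ("SG37", "B003", 12000),
    ("SG14", "B003", 18000),
    ("SL21", "B005", 30000),
    ("SG5", "B003", 24000),
]


def select(pred, rel):
    """Selection: keep the tuples of rel that satisfy pred, in order."""
    return [t for t in rel if pred(t)]


def p(t):
    return t[1] == "B003"      # branchNo = 'B003'


def q(t):
    return t[2] > 15000        # salary > 15000


conjunctive = select(lambda t: p(t) and q(t), staff)   # sigma_{p AND q}
cascaded = select(p, select(q, staff))                 # sigma_p(sigma_q(...))
commuted = select(q, select(p, staff))                 # sigma_q(sigma_p(...))

print(conjunctive == cascaded == commuted)  # True
print([t[0] for t in conjunctive])          # ['SG14', 'SG5']
```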


    Transformation Rules for RA Operations

    In a sequence of Projection operations, only the last in the sequence is required.

    $$\Pi_L \Pi_M \cdots \Pi_N(R) = \Pi_L(R)$$

    For example:

    $$\Pi_{lName}\Pi_{branchNo, lName}(Staff) = \Pi_{lName}(Staff)$$


    Transformation Rules for RA Operations

    Commutativity of Selection and Projection.

    If predicate p involves only attributes in projection list, Selection and Projection operations commute:

    $$\Pi_{A_1, \ldots, A_m}(\sigma_p(R)) = \sigma_p(\Pi_{A_1, \ldots, A_m}(R)) \quad \text{where } p \in \{A_1, A_2, \ldots, A_m\}$$

    For example:

    $$\Pi_{fName, lName}(\sigma_{lName='Beech'}(Staff)) = \sigma_{lName='Beech'}(\Pi_{fName, lName}(Staff))$$


    Transformation Rules for RA Operations

    Commutativity of Theta join (and Cartesian product).

    $$R \bowtie_p S = S \bowtie_p R$$

    $$R \times S = S \times R$$

    Rule also applies to Equijoin and Natural join.

    For example:

    $$Staff \bowtie_{Staff.branchNo=Branch.branchNo} Branch = Branch \bowtie_{Staff.branchNo=Branch.branchNo} Staff$$


    Transformation Rules for RA Operations

    Commutativity of Selection and Theta join (or Cartesian product).

    If selection predicate involves only attributes of one of join relations, Selection and Join (or Cartesian product) operations commute:

    $$\sigma_p(R \bowtie_r S) = (\sigma_p(R)) \bowtie_r S$$

    $$\sigma_p(R \times S) = (\sigma_p(R)) \times S$$

    where $p \in \{A_1, A_2, \ldots, A_n\}$, the attributes of R.


    Transformation Rules for RA Operations

    If selection predicate is conjunctive predicate having form $(p \land q)$, where p only involves attributes of R, and q only attributes of S, Selection and Theta join operations commute as:

    $$\sigma_{p \land q}(R \bowtie_r S) = (\sigma_p(R)) \bowtie_r (\sigma_q(S))$$

    $$\sigma_{p \land q}(R \times S) = (\sigma_p(R)) \times (\sigma_q(S))$$


    Transformation Rules for RA Operations

    For example:

    $$\sigma_{position='Manager' \land city='London'}(Staff \bowtie_{Staff.branchNo=Branch.branchNo} Branch) = (\sigma_{position='Manager'}(Staff)) \bowtie_{Staff.branchNo=Branch.branchNo} (\sigma_{city='London'}(Branch))$$
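Pushing Selections below a join can be checked the same way on toy data. This sketch uses hypothetical Staff and Branch tuples and a naive theta join built as a filtered Cartesian product; it only shows that both forms return the same tuples on this data, with the second form joining much smaller inputs.

```python
# Hypothetical toy relations (values invented for illustration).
staff = [                       # (staffNo, position, branchNo)
    ("SL21", "Manager", "B005"),
    ("SG37", "Assistant", "B003"),
    ("SG5", "Manager", "B003"),
    ("SA9", "Assistant", "B007"),
]
branch = [                      # (branchNo, city)
    ("B005", "London"),
    ("B003", "Glasgow"),
    ("B007", "Aberdeen"),
]


def theta_join(r, s, cond):
    """Theta join as a filtered Cartesian product; result tuples are r + s."""
    return [a + b for a in r for b in s if cond(a, b)]


def same_branch(a, b):
    return a[2] == b[0]         # Staff.branchNo = Branch.branchNo


# Selection applied after the join (join result: position at 1, city at 4).
late = [t for t in theta_join(staff, branch, same_branch)
        if t[1] == "Manager" and t[4] == "London"]

# Selections pushed down to each operand before the join.
early = theta_join([t for t in staff if t[1] == "Manager"],
                   [t for t in branch if t[1] == "London"],
                   same_branch)

print(late == early)  # True
print(late)           # [('SL21', 'Manager', 'B005', 'B005', 'London')]
```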


    Transformation Rules for RA Operations

    Commutativity of Projection and Theta join (or Cartesian product).

    If projection list is of form $L = L_1 \cup L_2$, where $L_1$ only has attributes of R, and $L_2$ only has attributes of S, provided join condition only contains attributes of L, Projection and Theta join commute:

    $$\Pi_{L_1 \cup L_2}(R \bowtie_r S) = (\Pi_{L_1}(R)) \bowtie_r (\Pi_{L_2}(S))$$


    Transformation Rules for RA Operations

    If join condition contains additional attributes not in L ($M = M_1 \cup M_2$ where $M_1$ only has attributes of R, and $M_2$ only has attributes of S), a final projection operation is required:

    $$\Pi_{L_1 \cup L_2}(R \bowtie_r S) = \Pi_{L_1 \cup L_2}((\Pi_{L_1 \cup M_1}(R)) \bowtie_r (\Pi_{L_2 \cup M_2}(S)))$$


    Transformation Rules for RA Operations

    For example:

    $$\Pi_{position, city, branchNo}(Staff \bowtie_{Staff.branchNo=Branch.branchNo} Branch) = (\Pi_{position, branchNo}(Staff)) \bowtie_{Staff.branchNo=Branch.branchNo} (\Pi_{city, branchNo}(Branch))$$

    and using the latter rule:

    $$\Pi_{position, city}(Staff \bowtie_{Staff.branchNo=Branch.branchNo} Branch) = \Pi_{position, city}((\Pi_{position, branchNo}(Staff)) \bowtie_{Staff.branchNo=Branch.branchNo} (\Pi_{city, branchNo}(Branch)))$$


    Transformation Rules for RA Operations

    Commutativity of Union and Intersection (but not set difference).

    $$R \cup S = S \cup R$$

    $$R \cap S = S \cap R$$


    Transformation Rules for RA Operations

    Commutativity of Selection and set operations (Union, Intersection, and Set difference).

    $$\sigma_p(R \cup S) = \sigma_p(S) \cup \sigma_p(R)$$

    $$\sigma_p(R \cap S) = \sigma_p(S) \cap \sigma_p(R)$$

    $$\sigma_p(R - S) = \sigma_p(R) - \sigma_p(S)$$


    Transformation Rules for RA Operations

    Commutativity of Projection and Union.

    $$\Pi_L(R \cup S) = \Pi_L(S) \cup \Pi_L(R)$$

    Associativity of Union and Intersection (but not Set difference).

    $$(R \cup S) \cup T = S \cup (R \cup T)$$

    $$(R \cap S) \cap T = S \cap (R \cap T)$$


    Transformation Rules for RA Operations

    Associativity of Theta join (and Cartesian product).

    Cartesian product and Natural join are always associative:

    $$(R \bowtie S) \bowtie T = R \bowtie (S \bowtie T)$$

    $$(R \times S) \times T = R \times (S \times T)$$

    If join condition q involves attributes only from S and T, then Theta join is associative:

    $$(R \bowtie_p S) \bowtie_{q \land r} T = R \bowtie_{p \land r} (S \bowtie_q T)$$


    Transformation Rules for RA Operations

    For example:

    $$(Staff \bowtie_{Staff.staffNo=PropertyForRent.staffNo} PropertyForRent) \bowtie_{ownerNo=Owner.ownerNo \land Staff.lName=Owner.lName} Owner = Staff \bowtie_{Staff.staffNo=PropertyForRent.staffNo \land Staff.lName=lName} (PropertyForRent \bowtie_{ownerNo} Owner)$$


    Example 21.3 Use of Transformation Rules

    For prospective renters of flats, find properties that match requirements and are owned by CO93.

    SELECT p.propertyNo, p.street

    FROM Client c, Viewing v, PropertyForRent p

    WHERE c.prefType = 'Flat' AND

    c.clientNo = v.clientNo AND

    v.propertyNo = p.propertyNo AND

    c.maxRent >= p.rent AND

    c.prefType = p.type AND

    p.ownerNo = 'CO93';


    Example 21.3 Use of Transformation Rules


    Example 21.3 Use of Transformation Rules


    Example 21.3 Use of Transformation Rules


    Heuristical Processing Strategies

    Perform Selection operations as early as possible.

    Keep predicates on same relation together.

    Combine Cartesian product with subsequent Selection whose predicate represents join condition into a Join operation.

    Use associativity of binary operations to rearrange leaf nodes so leaf nodes with most restrictive Selection operations executed first.


    Heuristical Processing Strategies

    Perform Projection as early as possible.

    Keep projection attributes on same relation together.

    Compute common expressions once.

    If common expression appears more than once, and result not too large, store result and reuse it when required.

    Useful when querying views, as same expression is used to construct view each time.


    Cost Estimation for RA Operations

    Many different ways of implementing RA operations.

    Aim of QO is to choose most efficient one.

    Use formulae that estimate costs for a number of options, and select one with lowest cost.

    Consider only cost of disk access, which is usually dominant cost in QP.

    Many estimates are based on cardinality of the relation, so need to be able to estimate this.


    Database Statistics

    Success of estimation depends on amount and currency of statistical information DBMS holds.

    Keeping statistics current can be problematic.

    If statistics updated every time tuple is changed, this would impact performance.

    DBMS could update statistics on a periodic basis, for example nightly, or whenever the system is idle.


    Typical Statistics for Relation R

    nTuples(R) - number of tuples in R.

    bFactor(R) - blocking factor of R.

    nBlocks(R) - number of blocks required to store R:

    $$nBlocks(R) = \lceil nTuples(R)/bFactor(R) \rceil$$


    Typical Statistics for Attribute A of Relation R

    $nDistinct_A(R)$ - number of distinct values that appear for attribute A in R.

    $min_A(R)$, $max_A(R)$ - minimum and maximum possible values for attribute A in R.

    $SC_A(R)$ - selection cardinality of attribute A in R. Average number of tuples that satisfy an equality condition on attribute A.


    Statistics for Multilevel Index I on Attribute A

    $nLevels_A(I)$ - number of levels in I.

    $nLfBlocks_A(I)$ - number of leaf blocks in I.


    Selection Operation

    Predicate may be simple or composite.

    Number of different implementations, depending on file structure, and whether attribute(s) involved are indexed/hashed.

    Main strategies are:

    Linear Search (Unordered file, no index).

    Binary Search (Ordered file, no index).

    Equality on hash key.

    Equality condition on primary key.


    Selection Operation

    Inequality condition on primary key.

    Equality condition on clustering (secondary) index.

    Equality condition on a non-clustering (secondary) index.

    Inequality condition on a secondary B+-tree index.


    Estimating Cardinality of Selection

    Assume attribute values are uniformly distributed within their domain and attributes are independent.

    For equality selection $S = \sigma_{A=x}(R)$:

    $$nTuples(S) = SC_A(R)$$

    For any attribute $B \neq A$ of S:

    $$nDistinct_B(S) = \begin{cases} nTuples(S) & \text{if } nTuples(S) < nDistinct_B(R)/2 \\ nDistinct_B(R) & \text{if } nTuples(S) > 2 \cdot nDistinct_B(R) \\ \lceil (nTuples(S) + nDistinct_B(R))/3 \rceil & \text{otherwise} \end{cases}$$
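The piecewise estimate of distinct values after a selection translates directly into code. A minimal sketch, assuming the bracket notation on the slide means rounding up (ceiling) and that S is the result of an equality selection on A; the function name is our own.

```python
import math


def estimate_ndistinct_b(n_tuples_s: int, n_distinct_b_r: int) -> int:
    """Estimate nDistinct_B(S) for an attribute B != A after S = sigma_{A=x}(R)."""
    if n_tuples_s < n_distinct_b_r / 2:
        return n_tuples_s                  # few tuples: assume all values differ
    if n_tuples_s > 2 * n_distinct_b_r:
        return n_distinct_b_r              # many tuples: assume every value survives
    return math.ceil((n_tuples_s + n_distinct_b_r) / 3)


print(estimate_ndistinct_b(10, 50))   # 10
print(estimate_ndistinct_b(200, 50))  # 50
print(estimate_ndistinct_b(60, 50))   # 37
```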


    Linear Search (Unordered File, No Index)

    May need to scan each tuple in each block to check whether it satisfies predicate.

    For equality condition on key attribute, cost estimate is:

    $$\lceil nBlocks(R)/2 \rceil$$

    For any other condition, entire file may need to be searched, so more general cost estimate is:

    $$nBlocks(R)$$


    Binary Search (Ordered File, No Index)

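    If predicate is of form A = x, and file is ordered on key attribute A, cost estimate:

    $$\lceil \log_2(nBlocks(R)) \rceil$$

    Generally, cost estimate is:

    $$\lceil \log_2(nBlocks(R)) \rceil + \lceil SC_A(R)/bFactor(R) \rceil - 1$$

    First term represents cost of finding first tuple using binary search.

    Expect there to be $SC_A(R)$ tuples satisfying predicate, occupying $\lceil SC_A(R)/bFactor(R) \rceil$ blocks, one of which was already read when finding the first tuple (hence the $-1$).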


    Equality on Hash Key

    If attribute A is hash key, apply hashing algorithm to calculate target address for tuple.

    If there is no overflow, expected cost is 1.

    If there is overflow, additional accesses may be necessary.


    Equality Condition on Primary Key

    Can use primary index to retrieve single record satisfying condition.

    Need to read one more block than number of index accesses, equivalent to number of levels in index, so estimated cost is:

    $$nLevels_A(I) + 1$$


    Inequality Condition on Primary Key

    Can first use index to locate record satisfying predicate (A = x).

    Provided index is sorted, records can be found by accessing all records before/after this one.

    Assuming uniform distribution, would expect half the records to satisfy inequality, so estimated cost is:

    $$nLevels_A(I) + \lceil nBlocks(R)/2 \rceil$$


    Equality Condition on Clustering Index

    Can use index to retrieve required records.

    Estimated cost is:

    $$nLevels_A(I) + \lceil SC_A(R)/bFactor(R) \rceil$$

    Second term is estimate of number of blocks that will be required to store number of tuples that satisfy equality condition, represented as $SC_A(R)$.


    Equality Condition on Non-Clustering Index

    Can use index to retrieve required records.

    Have to assume that tuples are on different blocks (index is not clustered this time), so estimated cost becomes:

    $$nLevels_A(I) + \lceil SC_A(R) \rceil$$


    Inequality Condition on a Secondary B+-Tree Index

    From leaf nodes of tree, can scan keys from smallest value up to x (for $<$ or $\le$ conditions), or from x up to largest value (for $>$ or $\ge$ conditions).

    Assuming uniform distribution, would expect half the leaf node blocks to be accessed and, via index, half the file records to be accessed.

    Estimated cost is:

    $$nLevels_A(I) + \lceil nLfBlocks_A(I)/2 + nTuples(R)/2 \rceil$$
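The selection cost formulas can be collected into small functions for side-by-side comparison. This is a sketch under the same assumptions as the slides; brackets are read as ceilings, the function names are our own, and the statistics in the usage lines are hypothetical. The hash-key case (cost 1 with no overflow) is left out because it is a constant.

```python
import math

# Selection cost estimates in block accesses, one function per strategy.


def n_blocks(n_tuples: int, b_factor: int) -> int:
    return math.ceil(n_tuples / b_factor)


def linear_search(n_blocks_r: int, equality_on_key: bool = False) -> int:
    return math.ceil(n_blocks_r / 2) if equality_on_key else n_blocks_r


def binary_search(n_blocks_r: int, sc_a: int, b_factor: int) -> int:
    return math.ceil(math.log2(n_blocks_r)) + math.ceil(sc_a / b_factor) - 1


def primary_key_equality(n_levels: int) -> int:
    return n_levels + 1


def primary_key_inequality(n_levels: int, n_blocks_r: int) -> int:
    return n_levels + math.ceil(n_blocks_r / 2)


def clustering_equality(n_levels: int, sc_a: int, b_factor: int) -> int:
    return n_levels + math.ceil(sc_a / b_factor)


def non_clustering_equality(n_levels: int, sc_a: int) -> int:
    return n_levels + math.ceil(sc_a)


def secondary_btree_inequality(n_levels: int, n_lf_blocks: int, n_tuples_r: int) -> int:
    return n_levels + math.ceil(n_lf_blocks / 2 + n_tuples_r / 2)


# Hypothetical statistics, chosen only for illustration.
blocks = n_blocks(3000, 30)                          # 100 blocks
print(linear_search(blocks))                         # 100
print(linear_search(blocks, equality_on_key=True))   # 50
print(binary_search(blocks, sc_a=1, b_factor=30))    # 7
print(primary_key_equality(2))                       # 3
print(primary_key_inequality(2, blocks))             # 52
print(clustering_equality(2, sc_a=10, b_factor=30))  # 3
print(non_clustering_equality(2, sc_a=10))           # 12
print(secondary_btree_inequality(2, 50, 3000))       # 1527
```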


    Composite Predicates - Conjunction without Disjunction

    May consider following approaches:

    - If one attribute has index or is ordered, can use one of above selection strategies. Can then check each retrieved record.

    - For equality on two or more attributes, with composite index (or hash key) on combined attributes, can search index directly.

    - With secondary indexes on one or more attributes (involved only in equality conditions in predicate), could use record pointers if they exist.


    Composite Predicates - Selections with Disjunction

    If one term contains an $\lor$ (OR), and term requires linear search, entire selection requires linear search.

    Only if index or sort order exists on every term can selection be optimized by retrieving records that satisfy each condition and applying union operator.

    Again, record pointers can be used if they exist.


    Join Operation

    Main strategies for implementing join:

    Block Nested Loop Join.

    Indexed Nested Loop Join.

    Sort-Merge Join.

    Hash Join.


    Estimating Cardinality of Join

    If assume uniform distribution, can estimate for Equijoins with a predicate (R.A = S.B) as follows:

    If A is key of R: $nTuples(T) \le nTuples(S)$

    If B is key of S: $nTuples(T) \le nTuples(R)$

    Otherwise, could estimate cardinality of join as:

    $$nTuples(T) = SC_A(R) \cdot nTuples(S) \quad \text{or} \quad nTuples(T) = SC_B(S) \cdot nTuples(R)$$
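A minimal sketch of the equijoin cardinality estimate, assuming uniform distribution as on the slide. For the key cases the slide only gives an upper bound, so the function returns that bound; the alternative $SC_B(S) \cdot nTuples(R)$ form would be computed symmetrically. The function name and the numbers in the usage lines are hypothetical.

```python
def equijoin_cardinality(n_tuples_r: int, n_tuples_s: int, sc_a_r: int,
                         a_is_key_of_r: bool = False,
                         b_is_key_of_s: bool = False) -> int:
    """Estimate nTuples(T) for T = R join_{R.A = S.B} S under uniformity."""
    if a_is_key_of_r:
        return n_tuples_s          # each S tuple matches at most one R tuple
    if b_is_key_of_s:
        return n_tuples_r          # each R tuple matches at most one S tuple
    return sc_a_r * n_tuples_s     # about SC_A(R) R tuples match each S tuple


print(equijoin_cardinality(50, 1000, 1, a_is_key_of_r=True))  # 1000
print(equijoin_cardinality(200, 1000, 4))                     # 4000
```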


    Block Nested Loop Join

    Simplest join algorithm is nested loop that joins two relations together a tuple at a time.

    Outer loop iterates over each tuple in R, and inner loop iterates over each tuple in S.

    As basic unit of reading/writing is a disk block, better to have two extra loops that process blocks.

    Estimated cost of this approach is:

    $$nBlocks(R) + (nBlocks(R) \cdot nBlocks(S))$$
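The four-loop structure of block nested loop join can be sketched over relations held as lists of blocks. This is an in-memory illustration with hypothetical data: a block is just a Python list of tuples, and the block_reads counter is added to show that the work matches the basic cost formula.

```python
def block_nested_loop_join(r_blocks, s_blocks, cond):
    """Join R and S stored as lists of blocks (each block a list of tuples).

    Returns the joined tuples and the number of block reads performed.
    """
    result = []
    block_reads = 0
    for r_block in r_blocks:            # outer loop over blocks of R
        block_reads += 1
        for s_block in s_blocks:        # S is re-read for every block of R
            block_reads += 1
            for r in r_block:           # tuple loops run on in-memory blocks
                for s in s_block:
                    if cond(r, s):
                        result.append(r + s)
    return result, block_reads


# Hypothetical data: R has 3 blocks, S has 2 blocks; join on first attribute.
r_blocks = [[(1,), (2,)], [(3,), (4,)], [(5,)]]
s_blocks = [[(2, "x"), (4, "y")], [(5, "z")]]
result, reads = block_nested_loop_join(r_blocks, s_blocks,
                                       lambda r, s: r[0] == s[0])
print(result)  # [(2, 2, 'x'), (4, 4, 'y'), (5, 5, 'z')]
print(reads)   # 9, i.e. nBlocks(R) + nBlocks(R) * nBlocks(S) = 3 + 3 * 2
```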


    Block Nested Loop Join

    Could read as many blocks as possible of smaller relation, R say, into database buffer, saving one block for inner relation and one for result.

    New cost estimate becomes:

    $$nBlocks(R) + \lceil nBlocks(S) \cdot (nBlocks(R)/(nBuffer - 2)) \rceil$$

    If can read all blocks of R into the buffer, this reduces to:

    $$nBlocks(R) + nBlocks(S)$$
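The buffered variant of block nested loop join can be written as a cost function. A sketch, assuming R is the smaller (outer) relation, nBuffer counts whole blocks, and the bracket is a ceiling; with nBuffer = 3 only one block of R fits at a time, so the formula falls back to the basic nBlocks(R) + nBlocks(R) · nBlocks(S). The function name is hypothetical.

```python
import math


def bnlj_cost(n_blocks_r: int, n_blocks_s: int, n_buffer=None) -> int:
    """Block accesses for block nested loop join, R (smaller) as outer relation.

    n_buffer=None models the basic scheme with one block of R in memory.
    """
    if n_buffer is None:
        return n_blocks_r + n_blocks_r * n_blocks_s
    usable = n_buffer - 2              # one block kept for S, one for the result
    if n_blocks_r <= usable:
        return n_blocks_r + n_blocks_s  # all of R fits: S is read once
    return n_blocks_r + math.ceil(n_blocks_s * (n_blocks_r / usable))


print(bnlj_cost(100, 500))                # 50100
print(bnlj_cost(100, 500, n_buffer=12))   # 5100
print(bnlj_cost(100, 500, n_buffer=102))  # 600
```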


    Indexed Nested Loop Join

    If have index (or hash function) on join attributes of inner relation, can use index lookup.

    For each tuple in R, use index to retrieve matching tuples of S.

    Cost of scanning R is nBlocks(R), as before.

    Cost of retrieving matching tuples in S depends on type of index and number of matching tuples.

    If join attribute A in S is PK, cost estimate is:

    $$nBlocks(R) + nTuples(R) \cdot (nLevels_A(I) + 1)$$
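An indexed nested loop join replaces the inner scan with an index probe. In this sketch a Python dictionary built by a hypothetical build_index helper stands in for the index (or hash function) on S, and inlj_cost_pk restates the primary-key estimate from the slide. Data values are invented for illustration.

```python
from collections import defaultdict


def build_index(relation, pos):
    """In-memory stand-in for an index on S: join value -> matching tuples."""
    index = defaultdict(list)
    for t in relation:
        index[t[pos]].append(t)
    return index


def indexed_nested_loop_join(r, r_pos, s_index):
    """For each tuple of R, probe the index on S rather than scanning S."""
    result = []
    for rt in r:
        for st in s_index.get(rt[r_pos], []):
            result.append(rt + st)
    return result


def inlj_cost_pk(n_blocks_r, n_tuples_r, n_levels):
    """Slide estimate when the join attribute of S is its primary key."""
    return n_blocks_r + n_tuples_r * (n_levels + 1)


# Hypothetical data: Staff(staffNo, branchNo), Branch(branchNo, city).
staff = [("SG37", "B003"), ("SL21", "B005"), ("SX1", "B099")]
branch = [("B003", "Glasgow"), ("B005", "London")]
print(indexed_nested_loop_join(staff, 1, build_index(branch, 0)))
# [('SG37', 'B003', 'B003', 'Glasgow'), ('SL21', 'B005', 'B005', 'London')]
```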


    Sort-Merge Join

    For Equijoins, most efficient join is when both relations are sorted on join attributes.

    Can look for qualifying tuples by merging relations.

    May need to sort relations first.

    Now tuples with same join value are in order.

    If assume join is many-to-many (*:*) and each set of tuples with same join value can be held in database buffer at same time, then each block of each relation need only be read once.


    Sort-Merge Join

    Cost estimate for the sort-merge join is:

    $$nBlocks(R) + nBlocks(S)$$

    If a relation has to be sorted, R say, add:

    $$nBlocks(R) \cdot \lceil \log_2(nBlocks(R)) \rceil$$
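The merge step of sort-merge join can be sketched as follows. This is an in-memory illustration, assuming both inputs fit in memory after sorting (the slides' assumption is weaker: only each group of tuples sharing a join value must fit in the buffer). The cost helper follows the slide, reading brackets as ceilings and applying the sort term to whichever relation needs sorting; both function names are hypothetical.

```python
import math


def sort_merge_join(r, s, r_pos, s_pos):
    """Equijoin R.r_pos = S.s_pos by sorting both inputs and merging."""
    r = sorted(r, key=lambda t: t[r_pos])
    s = sorted(s, key=lambda t: t[s_pos])
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        rv, sv = r[i][r_pos], s[j][s_pos]
        if rv < sv:
            i += 1
        elif rv > sv:
            j += 1
        else:
            # Find the group of S tuples sharing this join value, then pair
            # it with every R tuple that has the same value.
            j_end = j
            while j_end < len(s) and s[j_end][s_pos] == rv:
                j_end += 1
            while i < len(r) and r[i][r_pos] == rv:
                result.extend(r[i] + st for st in s[j:j_end])
                i += 1
            j = j_end
    return result


def sort_merge_cost(n_blocks_r, n_blocks_s, sort_r=False, sort_s=False):
    cost = n_blocks_r + n_blocks_s
    if sort_r:
        cost += n_blocks_r * math.ceil(math.log2(n_blocks_r))
    if sort_s:
        cost += n_blocks_s * math.ceil(math.log2(n_blocks_s))
    return cost


r = [(3, "a"), (1, "b"), (3, "c")]
s = [(3, "x"), (2, "y"), (3, "z"), (1, "w")]
print(sort_merge_join(r, s, 0, 0))
# [(1, 'b', 1, 'w'), (3, 'a', 3, 'x'), (3, 'a', 3, 'z'), (3, 'c', 3, 'x'), (3, 'c', 3, 'z')]
```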


    Hash Join

    For Natural or Equijoin, hash join may be used.

    Idea is to partition relations according to some hash function that provides uniformity and randomness.

    Each equivalent partition should hold same value for join attributes, although it may hold more than one value.

    Cost estimate of hash join is:

    $$3(nBlocks(R) + nBlocks(S))$$
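A hash join can be sketched in two steps: partition both relations with the same hash function, then join each pair of corresponding partitions. This is an in-memory sketch; Python's built-in hash and a fixed number of partitions are illustrative choices, and the data is hypothetical. Our reading of the factor 3 in the cost formula is that each relation is read and written once during partitioning and read once more during the join, with no partition overflow.

```python
from collections import defaultdict


def hash_join(r, s, r_pos, s_pos, n_partitions=4):
    """Equijoin R.r_pos = S.s_pos: hash-partition both inputs, then join
    each pair of matching partitions with an in-memory table."""
    def partition(rel, pos):
        parts = defaultdict(list)
        for t in rel:
            parts[hash(t[pos]) % n_partitions].append(t)
        return parts

    r_parts, s_parts = partition(r, r_pos), partition(s, s_pos)
    result = []
    for p in range(n_partitions):
        # Only tuples in the same partition can match, but one partition may
        # hold several join values, so index it on the exact value.
        table = defaultdict(list)
        for rt in r_parts.get(p, []):
            table[rt[r_pos]].append(rt)
        for st in s_parts.get(p, []):
            for rt in table.get(st[s_pos], []):
                result.append(rt + st)
    return result


def hash_join_cost(n_blocks_r, n_blocks_s):
    return 3 * (n_blocks_r + n_blocks_s)


staff = [("SG37", "B003"), ("SL21", "B005"), ("SG5", "B003"), ("SX1", "B099")]
branch = [("B003", "Glasgow"), ("B005", "London")]
print(sorted(hash_join(staff, branch, 1, 0)))
# [('SG37', 'B003', 'B003', 'Glasgow'), ('SG5', 'B003', 'B003', 'Glasgow'), ('SL21', 'B005', 'B005', 'London')]
```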


    Estimating Cardinality of Projection

    When projection contains key, cardinality is:

    $$nTuples(S) = nTuples(R)$$

    If projection consists of a single non-key attribute A, estimate is:

    $$nTuples(S) = nDistinct_A(R)$$

    Otherwise, could estimate cardinality as:

    $$nTuples(S) \le \min\left(nTuples(R), \prod_{i=1}^{m} nDistinct_{a_i}(R)\right)$$
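The projection estimates can be combined into one function. A sketch, assuming the caller knows whether the projection list contains a key and supplies nDistinct for each projected attribute; with a single attribute it returns min(nTuples(R), nDistinct_A(R)). The function name and usage numbers are hypothetical, and math.prod requires Python 3.8 or later.

```python
import math


def projection_cardinality(n_tuples_r, n_distinct, contains_key=False):
    """Estimated size of Pi_{a_1..a_m}(R) after duplicate elimination.

    n_distinct lists nDistinct_{a_i}(R) for each projected attribute.
    """
    if contains_key:
        return n_tuples_r            # a key keeps every tuple distinct
    # Cannot exceed the input size, nor the number of value combinations.
    return min(n_tuples_r, math.prod(n_distinct))


print(projection_cardinality(1000, [5, 10]))   # 50
print(projection_cardinality(1000, [50, 30]))  # 1000
```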


    Duplicate Elimination using Sorting

    Sort tuples of reduced relation using all

    remaining attributes as sort key.

    Duplicates will now be adjacent and can be

    removed easily. Estimated cost of sorting is:

    nBlocks(R)*[log2(nBlocks(R))].

    Combined cost is:

    nBlocks(R) + nBlocks(R)*[log2(nBlocks(R))]
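An in-memory sketch of sort-based duplicate elimination follows. Tuples are assumed to be Python tuples whose projected attributes are mutually comparable, and `distinct_by_sorting` is a hypothetical name.

```python
def distinct_by_sorting(r, positions):
    """Project tuples of r onto `positions` and remove duplicates by sorting."""
    # Remove unwanted attributes to form the reduced relation.
    reduced = [tuple(t[p] for p in positions) for t in r]
    # Sort using all remaining attributes as the sort key.
    reduced.sort()
    result = []
    for t in reduced:
        # Duplicates are now adjacent, so compare only with the last tuple kept.
        if not result or result[-1] != t:
            result.append(t)
    return result
```

The single pass after the sort corresponds to the nBlocks(R) term of the combined cost.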

    Duplicate Elimination using Hashing

    Two phases: partitioning and duplicate

    elimination.

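    In the partitioning phase, for each tuple in R,

    remove unwanted attributes, apply a hash function to the combination of remaining attributes,

    and write the reduced tuple to the partition given by the hashed value.

    In the duplicate elimination phase, each partition is read in turn and duplicates within it are removed.

    Two tuples that belong to different partitions are

    guaranteed not to be duplicates.

    Estimated cost is: nBlocks(R) + nB, where nB is the number of blocks required for the temporary relation produced by the projection on R before duplicate elimination.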
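Both phases can be sketched in memory as follows. Python's built-in `hash` stands in for the partitioning function and a `set` does the per-partition elimination; `distinct_by_hashing` and `n_partitions` are hypothetical names.

```python
def distinct_by_hashing(r, positions, n_partitions=4):
    """Project tuples of r onto `positions` and remove duplicates by hashing."""
    # Partitioning phase: reduce each tuple, then hash the remaining attributes.
    partitions = [[] for _ in range(n_partitions)]
    for t in r:
        reduced = tuple(t[p] for p in positions)
        partitions[hash(reduced) % n_partitions].append(reduced)
    # Duplicate elimination phase: equal tuples always land in the same
    # partition, so each partition can be de-duplicated on its own.
    result = []
    for part in partitions:
        seen = set()
        for t in part:
            if t not in seen:
                seen.add(t)
                result.append(t)
    return result
```

Because no tuple is compared with tuples in other partitions, each partition only needs to fit in memory on its own.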

    Set Operations

    Can be implemented by sorting both relations on

    same attributes, and scanning through each of

    sorted relations once to obtain desired result.

    Could use sort-merge join as basis. Estimated cost in all cases is:

    nBlocks(R) + nBlocks(S) +

    nBlocks(R)*[log2(nBlocks(R))] + nBlocks(S)*[log2(nBlocks(S))]

    Could also use hashing algorithm.
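A sketch of the sort-then-scan approach for union, intersection, and difference (R - S) follows. Both inputs are assumed to be in-memory lists of comparable values; `merge_set_op` and its `op` strings are hypothetical.

```python
def _sorted_distinct(xs):
    """Sort and drop adjacent duplicates (set semantics)."""
    out = []
    for x in sorted(xs):
        if not out or out[-1] != x:
            out.append(x)
    return out


def merge_set_op(r, s, op):
    """op is 'union', 'intersection' or 'difference' (R - S)."""
    if op not in ("union", "intersection", "difference"):
        raise ValueError(f"unknown set operation: {op}")
    a, b = _sorted_distinct(r), _sorted_distinct(s)
    i = j = 0
    out = []
    # One scan through each sorted relation, as in the merge phase of sort-merge join.
    while i < len(a) and j < len(b):
        if a[i] < b[j]:  # value only in R
            if op in ("union", "difference"):
                out.append(a[i])
            i += 1
        elif a[i] > b[j]:  # value only in S
            if op == "union":
                out.append(b[j])
            j += 1
        else:  # value in both
            if op in ("union", "intersection"):
                out.append(a[i])
            i += 1
            j += 1
    # Leftovers appear in only one input.
    if op in ("union", "difference"):
        out.extend(a[i:])
    if op == "union":
        out.extend(b[j:])
    return out
```

For R = [1, 3, 5, 5, 7] and S = [3, 4, 5], union gives [1, 3, 4, 5, 7], intersection gives [3, 5], and difference gives [1, 7].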

    Estimating Cardinality of Set Operations

    As duplicates are eliminated when performing

    Union, difficult to estimate cardinality, but can

    give an upper and lower bound as:

    $\max(nTuples(R), nTuples(S)) \le nTuples(T) \le nTuples(R) + nTuples(S)$

    For Set Difference, can also give upper and lower

    bound:

    $0 \le nTuples(T) \le nTuples(R)$

    Aggregate Operations

    SELECT AVG(salary)

    FROM Staff;

    To implement query, could scan entire Staff

    relation and maintain running count of number

    of tuples read and sum of all salaries.

    Easy to compute average from these two running

    counts.
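The single-scan strategy can be sketched as follows, assuming (hypothetically) that Staff rows are Python dicts with a `salary` key.

```python
def average_salary(staff):
    """Compute AVG(salary) in one scan with a running count and a running sum."""
    count = 0
    total = 0
    for row in staff:
        count += 1
        total += row["salary"]
    # SQL's AVG over zero rows is NULL; None plays that role here.
    return total / count if count else None
```

Only two counters are kept, so memory use does not grow with the size of Staff.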

    Aggregate Operations

    SELECT AVG(salary)

    FROM Staff

    GROUP BY branchNo;

    For grouping queries, can use sorting or hashing

    algorithms similar to duplicate elimination.

    Can estimate cardinality of result using

    estimates derived earlier for selection.
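A hash-based version of the grouping query is sketched below. It uses a Python dict as the in-memory hash table and assumes all groups fit in memory; a disk-based version would first partition the tuples as in hash-based duplicate elimination. Rows as dicts with `branchNo` and `salary` keys are a hypothetical representation.

```python
def average_salary_by_branch(staff):
    """AVG(salary) ... GROUP BY branchNo, using a hash table of running totals."""
    totals = {}  # branchNo -> [running sum, running count]
    for row in staff:
        acc = totals.setdefault(row["branchNo"], [0, 0])
        acc[0] += row["salary"]
        acc[1] += 1
    # One output tuple per group, so the result size is the number of distinct branchNo values.
    return {branch: s / c for branch, (s, c) in totals.items()}
```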

    Enumeration of Alternative Strategies

    Fundamental to efficiency of QO is the search space of possible execution strategies and the enumeration algorithm used to search this space.

    Query with 2 joins gives 12 join orderings:

    R ⋈ (S ⋈ T)   R ⋈ (T ⋈ S)   (S ⋈ T) ⋈ R   (T ⋈ S) ⋈ R

    S ⋈ (R ⋈ T)   S ⋈ (T ⋈ R)   (R ⋈ T) ⋈ S   (T ⋈ R) ⋈ S

    T ⋈ (R ⋈ S)   T ⋈ (S ⋈ R)   (R ⋈ S) ⋈ T   (S ⋈ R) ⋈ T

    With n relations, there are (2(n - 1))!/(n - 1)! orderings.

    If n = 4 this is 120; if n = 10 this is > 17.6 billion.

    Compounded by different selection/join methods.
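The count can be checked by brute force. The sketch below (hypothetical function names) enumerates every join tree as a nested pair by splitting the relation set into a non-empty left and right part in all possible ways, and compares the total with (2(n - 1))!/(n - 1)!.

```python
from math import factorial


def count_join_orders(n):
    """Number of join orderings over n relations: (2(n - 1))! / (n - 1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)


def enumerate_join_orders(relations):
    """Yield every join tree over `relations` as a nested (left, right) tuple."""
    if len(relations) == 1:
        yield relations[0]
        return
    n = len(relations)
    # Each bitmask picks a non-empty, proper subset of relations for the left operand.
    for mask in range(1, 2 ** n - 1):
        left = [r for k, r in enumerate(relations) if mask >> k & 1]
        right = [r for k, r in enumerate(relations) if not mask >> k & 1]
        for lt in enumerate_join_orders(left):
            for rt in enumerate_join_orders(right):
                yield (lt, rt)
```

`count_join_orders(3)` is 12, matching the list above, and `count_join_orders(10)` is 17,643,225,600. Enumerating trees directly is only practical for very small n, which is why optimizers prune the space instead.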

    Pipelining

    Materialization: output of one operation is stored in a temporary relation for processing by the next.

    Could also pipeline results of one operation to another without creating a temporary relation.

    Known as pipelining or on-the-fly processing.

    Pipelining can save on the cost of creating temporary relations and reading results back in again.

    Generally, a pipeline is implemented as a separate process or thread.
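The contrast between materialization and pipelining can be sketched with Python generators. Note the substitution: generators give a single-threaded pull (iterator) model rather than the separate process or thread mentioned above, but tuples still flow from operator to operator without a temporary relation. All function names here are hypothetical.

```python
def scan(relation):
    """Yield base-relation tuples one at a time."""
    for t in relation:
        yield t


def select(tuples, predicate):
    """Pass on only the tuples satisfying the predicate, as they arrive."""
    for t in tuples:
        if predicate(t):
            yield t


def project(tuples, positions):
    """Keep only the attributes at the given positions."""
    for t in tuples:
        yield tuple(t[p] for p in positions)


def materialized(relation, predicate, positions):
    temp = list(select(scan(relation), predicate))  # temporary relation built in full
    return list(project(temp, positions))  # next operation reads it back


def pipelined(relation, predicate, positions):
    # Each tuple goes through select and project before the next one is scanned.
    return list(project(select(scan(relation), predicate), positions))
```

Both functions return the same tuples; the pipelined form simply never holds the whole selection result at once.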

    Types of Trees

    Pipelining

    With linear trees, the relation on one side of each operator is always a base relation.

    However, as the entire inner relation must be examined for each tuple of the outer relation, inner relations must always be materialized.

    This makes left-deep trees appealing, as their inner relations are always base relations.

    Reduces the search space for the optimum strategy, and allows the QO to use dynamic processing.

    Not all execution strategies are considered.

    Physical Operators & Strategies

    Term physical operator refers to specific

    algorithm that implements a logical operation,

    such as selection or join.

    For example, can use sort-merge join to implement the join operation.

    Replacing logical operations in a R.A.T. with

    physical operators produces an execution strategy

    (or query evaluation plan or access plan).

    Physical Operators & Strategies

    Reducing the Search Space

    Restriction 1: Unary operations are processed on-the-fly: selections are processed as relations are accessed for the first time; projections are processed as results of other operations are generated.

    Restriction 2: Cartesian products are never formed unless the query itself specifies one.

    Restriction 3: The inner operand of each join is a base relation, never an intermediate result. This uses the fact that with left-deep trees the inner operand is a base relation and so already materialized.

    Restriction 3 excludes many alternative strategies but significantly reduces the number to be considered.

    Dynamic Programming

    Enumeration of left-deep trees using dynamic

    programming was first proposed for the System R QO.

    Algorithm is based on the assumption that the cost

    model satisfies the principle of optimality. Thus, to obtain the optimal strategy for a query with n

    joins, only need to consider optimal strategies for

    subexpressions with (n - 1) joins and extend those

    strategies with an additional join. Remaining suboptimal strategies can be discarded.

    Dynamic Programming

    To ensure some potentially useful strategies are

    not discarded, the algorithm retains strategies with

    interesting orders: an intermediate result has an

    interesting order if it is sorted by a final ORDER

    BY attribute, GROUP BY attribute, or any

    attributes that participate in subsequent joins.

    Dynamic Programming

    SELECT p.propertyNo, p.street

    FROM Client c, Viewing v, PropertyForRent p

    WHERE c.maxRent < 500 AND

    c.clientNo = v.clientNo AND v.propertyNo = p.propertyNo;

    Attributes c.clientNo, v.clientNo, v.propertyNo, and p.propertyNo are interesting.

    If any intermediate result is sorted on any of these attributes, then the corresponding partial strategy must be included in the search.

    Dynamic Programming

    Algorithm proceeds from the bottom up and constructs all alternative join trees that satisfy the restrictions above, as follows:

    Pass 1: Enumerate the strategies for each base relation using a linear search and all available indexes on the relation. These partial strategies are partitioned into equivalence classes based on any interesting orders. An additional equivalence class is created for the partial strategies with no interesting order.

    Dynamic Programming

    For each equivalence class, the strategy with the lowest cost is retained for consideration in the next pass.

    Do not retain the equivalence class with no interesting order if its lowest-cost strategy is not lower than all other strategies.

    For a given relation R, any selections involving only attributes of R are processed on-the-fly. Similarly, any attributes of R that are not part of the SELECT clause and do not contribute to any subsequent join can be projected out at this stage (restriction 1 above).

    Dynamic Programming

    Pass 2: Generate all 2-relation strategies by considering each strategy retained after Pass 1 as the outer relation, discarding any Cartesian products generated (restriction 2 above). Again, any on-the-fly processing is performed and the lowest-cost strategy in each equivalence class is retained.

    Pass n: Generate all n-relation strategies by considering each strategy retained after Pass (n - 1) as the outer relation, discarding any Cartesian products generated. After pruning, we now have the lowest-cost overall strategy for processing the query.
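The passes above can be sketched as a small dynamic program over sets of relations. This is a simplified illustration, not System R's actual optimizer. The cost model is a hypothetical one (cost = sum of the sizes of all intermediate join results), base-relation access paths and interesting orders are ignored, and `best_left_deep_plan`, `card`, and `selectivity` are hypothetical names.

```python
from itertools import combinations


def best_left_deep_plan(card, selectivity):
    """Return (cost, result_size, plan) for the cheapest left-deep join tree.

    card: relation name -> estimated tuple count after on-the-fly selections.
    selectivity: frozenset({A, B}) -> selectivity of the join predicate on A and B.
    Hypothetical cost model: sum of the sizes of all intermediate join results.
    """
    rels = list(card)
    # Pass 1: one retained strategy per base relation (access cost ignored here).
    best = {frozenset([r]): (0.0, float(card[r]), r) for r in rels}
    # Pass k: extend each retained (k - 1)-relation strategy by one base relation.
    for k in range(2, len(rels) + 1):
        for subset in combinations(rels, k):
            target = frozenset(subset)
            for inner in subset:  # restriction 3: inner operand is a base relation
                outer = target - {inner}
                if outer not in best:
                    continue  # outer set has no Cartesian-product-free plan
                sels = [selectivity[frozenset((o, inner))] for o in outer
                        if frozenset((o, inner)) in selectivity]
                if not sels:
                    continue  # restriction 2: never form a Cartesian product
                outer_cost, outer_size, outer_plan = best[outer]
                size = outer_size * card[inner]
                for sel in sels:
                    size *= sel
                cost = outer_cost + size
                # Principle of optimality: keep only the cheapest plan per relation set.
                if target not in best or cost < best[target][0]:
                    best[target] = (cost, size, (outer_plan, inner))
    return best.get(frozenset(rels))
```

For the Client/Viewing/PropertyForRent query, with invented estimates of 50, 1000, and 200 tuples and join selectivities of 1/100 (clientNo) and 1/200 (propertyNo), the sketch picks (Client ⋈ Viewing) ⋈ PropertyForRent and never builds Client × PropertyForRent.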

    Dynamic Programming

    Although the algorithm is still exponential, there are query forms for which it only generates O(n^3) strategies, so for n = 10 the number is 1,000, which is significantly better than the more than 17.6 billion different

    join orders possible for n = 10.

    Semantic Query Optimization

    Based on constraints specified on the database schema to reduce the search space.

    For example, if a constraint states that staff cannot supervise more than 100 properties, any query

    searching for staff who supervise more than 100 properties will produce zero rows. Now consider:

    CREATE ASSERTION ManagerSalary

    CHECK (salary > 20000 AND position = 'Manager')

    SELECT s.staffNo, fName, lName, propertyNo

    FROM Staff s, PropertyForRent p

    WHERE s.staffNo = p.staffNo AND

    position = 'Manager';

    Semantic Query Optimization

    Can rewrite this query as:

    SELECT s.staffNo, fName, lName, propertyNo

    FROM Staff s, PropertyForRent p

    WHERE s.staffNo = p.staffNo AND

    salary > 20000 AND position = 'Manager';

    Additional predicate may be very useful if the only

    index for Staff is a B+-tree on the salary attribute.

    However, the additional predicate would complicate the query if no such index existed.

    Query Optimization in Oracle

    Oracle supports two approaches to query optimization: rule-based and cost-based.

    Rule-based

    15 rules, ranked in order of efficiency. A particular access path for a table is only chosen if the statement contains a predicate or other construct that makes that access path available.

    Score assigned to each execution strategy using these rankings, and the strategy with the best (lowest) score is selected.

    QO in Oracle: Rule-Based

    When 2 strategies have same score, tie-break

    resolved by making decision based on order in

    which tables occur in the SQL statement.

    QO in Oracle: Rule-Based Example

    SELECT propertyNo

    FROM PropertyForRent

    WHERE rooms > 7 AND city = 'London';

    Single-column access path using index on city from

    WHERE condition (city = 'London'): rank 9.

    Unbounded range scan using index on rooms from

    WHERE condition (rooms > 7): rank 11.

    Full table scan: rank 15.

    Although there is an index on propertyNo, the column does not appear in the WHERE clause and so is not considered by the optimizer.

    Based on these paths, the rule-based optimizer will choose to use the index on the city column.

    QO in Oracle: Cost-Based

    To improve QO, Oracle introduced the cost-based optimizer in Oracle 7, which selects the strategy that requires the minimal resource use necessary to process all rows accessed by the query (avoiding the above tie-break anomaly).

    User can select whether minimal resource usage is based on throughput or on response time, by setting the OPTIMIZER_MODE initialization parameter.

    Cost-based optimizer also takes into consideration hints that the user may provide.

    QO in Oracle: Statistics

    Cost-based optimizer depends on statistics for all tables, clusters, and indexes accessed by the query.

    User's responsibility to generate these statistics and keep them current.

    Package DBMS_STATS can be used to generate and manage statistics.

    Whenever possible, Oracle uses a parallel method to gather statistics, although index statistics are collected serially.

    EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS('Manager');

    QO in Oracle: Histograms

    Previously made assumption that data values

    within columns of a table are uniformly

    distributed.

    Histogram of values and their relative frequencies gives optimizer improved selectivity

    estimates in presence of non-uniform

    distribution.

    QO in Oracle: Histograms

    (a) uniform distribution of rooms; (b) actual non-uniform distribution.

    (a) can be stored compactly as low value (1) and high value

    (10), and as total count of all frequencies (in this case, 100).

    QO in Oracle: Histograms

    Histogram is a data structure that can improve estimates of the number of tuples in a result.

    Two types of histogram:

    width-balanced histogram, which divides data into a

    fixed number of equal-width ranges (called buckets), each containing a count of the number of values falling within that bucket;

    height-balanced histogram, which places approximately the same number of values in each bucket, so that the end points of each bucket are determined by how many values are in that bucket.
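The two histogram types can be sketched as follows. This is an illustration only, not Oracle's internal representation. Integer column values are assumed, and the function names and sample data are hypothetical.

```python
def width_balanced(values, low, high, n_buckets):
    """Count of values in each of n_buckets equal-width ranges over [low, high]."""
    width = (high - low + 1) / n_buckets
    counts = [0] * n_buckets
    for v in values:
        # min() keeps the top value inside the last bucket.
        counts[min(int((v - low) / width), n_buckets - 1)] += 1
    return counts


def height_balanced(values, n_buckets):
    """End point of each bucket when every bucket holds about the same number of values."""
    ordered = sorted(values)
    per_bucket = len(ordered) / n_buckets
    return [ordered[round(per_bucket * (k + 1)) - 1] for k in range(n_buckets)]
```

For 100 invented `rooms` values skewed toward 4 and 5 rooms, the width-balanced buckets 1-2, 3-4, ..., 9-10 have very different counts. The height-balanced version instead gives each bucket 20 values and records where each bucket ends, so narrow buckets mark the frequent values.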

    QO in Oracle: Histograms

    (a) width-balanced for rooms with 5 buckets. Each bucket

    of equal width with 2 values (1-2, 3-4, etc.)

    (b) height-balanced: height of each column is 20 (100/5).

    QO in Oracle: Viewing Execution Plan