Chapter 15 Algorithms for Query Processing and Optimization

ICS 424 - 01 (072) Query Processing and Optimization 1 Chapter 15 Algorithms for Query Processing and Optimization ICS 424 Advanced Database Systems Dr. Muhammad Shafique


Post on 04-Jan-2016



Page 1: Chapter 15 Algorithms for Query Processing  and Optimization

ICS 424 - 01 (072) Query Processing and Optimization

1

Chapter 15

Algorithms for Query Processing and Optimization

ICS 424 Advanced Database Systems

Dr. Muhammad Shafique


Outline

• Introduction
• Processing a query
• SQL queries and relational algebra
• Implementing basic query operations
• Heuristics-based query optimization
• Overview of query optimization in Oracle


Material Covered from Chapter 15

• Pages 537, 538, 539
• Section 15.1
• Section 15.2
• Section 15.6
• Section 15.7
• Section 15.9


Introduction to Query Processing

• Query optimization
  • The process of choosing a suitable execution strategy for processing a query.
• Two internal representations of a query:
  • Query Tree
  • Query Graph


Background Review

• DDL compiler

• DML compiler

• Runtime database processor

• System catalog


Processing a Query

• Tasks in processing a high-level query

1. Scanner scans the query and identifies the language tokens

2. Parser checks syntax of the query

3. The query is validated by checking that all attribute names and relation names are valid

4. An intermediate internal representation for the query is created (query tree or query graph)

5. Query execution strategy is developed

6. Query optimizer produces an execution plan

7. Code generator generates the object code

8. Runtime database processor executes the code

• Query processing and query optimization
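The scanning and validation tasks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CATALOG` contents, the token pattern, and the function names `scan` and `validate` are hypothetical, and the parser (syntax check) is left out.

```python
import re

# Hypothetical system catalog: relation name -> set of attribute names.
CATALOG = {"EMPLOYEE": {"LNAME", "FNAME", "SALARY", "DNO"}}

KEYWORDS = {"SELECT", "FROM", "WHERE", "AND"}
# One token per match: a number, a word, or an operator/punctuation symbol.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(<=|>=|<>|[=<>,()*;]))")


def scan(query):
    """Scanner: split the query text into (kind, text) tokens."""
    tokens, pos = [], 0
    query = query.rstrip()
    while pos < len(query):
        m = TOKEN_RE.match(query, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        number, word, symbol = m.groups()
        if number:
            tokens.append(("NUMBER", number))
        elif word:
            kind = "KEYWORD" if word.upper() in KEYWORDS else "IDENT"
            tokens.append((kind, word.upper()))
        else:
            tokens.append(("SYMBOL", symbol))
        pos = m.end()
    return tokens


def validate(tokens, catalog):
    """Validation: every identifier must name a known relation or attribute."""
    relations = [text for kind, text in tokens if kind == "IDENT" and text in catalog]
    known = set(relations)
    for relation in relations:
        known |= catalog[relation]
    unknown = [text for kind, text in tokens if kind == "IDENT" and text not in known]
    if unknown:
        raise ValueError(f"unknown names: {unknown}")
    return True


tokens = scan("SELECT LNAME FROM EMPLOYEE WHERE SALARY > 30000")
validate(tokens, CATALOG)
```

A query that mentions an attribute missing from the catalog, such as `SELECT AGE FROM EMPLOYEE`, makes `validate` raise an error before any plan is built.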


Processing a Query

• Typical steps in processing a high-level query

1. Query in a high-level query language like SQL

2. Scanning, parsing, and validation

3. Intermediate-form of query like query tree

4. Query optimizer

5. Execution plan

6. Query code generator

7. Object-code for the query

8. Run-time database processor

9. Results of query


SQL Queries and Relational Algebra

• An SQL query is translated into an equivalent extended relational algebra expression, represented as a query tree

• To transform a given query into a query tree, the query is decomposed into query blocks
  • Query block: the basic unit that can be translated into the algebraic operators and optimized.
  • A query block contains a single SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clauses if these are part of the block.

• The query optimizer chooses an execution plan for each block


COMPANY Relational Database Schema (1)


COMPANY Relational Database Schema (2)


SQL Queries and Relational Algebra (1)

• Example

SELECT Lname, Fname

FROM EMPLOYEE

WHERE Salary > ( SELECT MAX(Salary)

FROM EMPLOYEE

WHERE Dno = 5 )

• Inner block and outer block


Translating SQL Queries into Relational Algebra

SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > ( SELECT MAX (SALARY)
                 FROM EMPLOYEE
                 WHERE DNO = 5 );

• Inner block:

SELECT MAX (SALARY)
FROM EMPLOYEE
WHERE DNO = 5

ℱ MAX SALARY (σ DNO=5 (EMPLOYEE))

• Outer block, where C stands for the result of the inner block:

SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > C

π LNAME, FNAME (σ SALARY>C (EMPLOYEE))
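A minimal sketch of the block decomposition, assuming a tiny made-up EMPLOYEE table (the rows and the function names `inner_block` and `outer_block` are hypothetical). The inner block is evaluated once to a constant C, which the outer block then uses as an ordinary selection constant.

```python
# Hypothetical sample rows; the values are invented for illustration.
EMPLOYEE = [
    {"LNAME": "Smith", "FNAME": "John", "SALARY": 30000, "DNO": 5},
    {"LNAME": "Wong", "FNAME": "Franklin", "SALARY": 40000, "DNO": 5},
    {"LNAME": "Wallace", "FNAME": "Jennifer", "SALARY": 43000, "DNO": 4},
    {"LNAME": "Borg", "FNAME": "James", "SALARY": 55000, "DNO": 1},
]


def inner_block(employee):
    """Aggregate MAX(SALARY) over the selection DNO = 5; yields the constant C."""
    return max(row["SALARY"] for row in employee if row["DNO"] == 5)


def outer_block(employee, c):
    """Project LNAME, FNAME over the selection SALARY > C."""
    return [(row["LNAME"], row["FNAME"]) for row in employee if row["SALARY"] > c]


c = inner_block(EMPLOYEE)          # evaluated once, independently of the outer block
result = outer_block(EMPLOYEE, c)  # the outer block treats C as a plain constant
```

Because the inner block does not reference the outer tuple, each block can be optimized on its own.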


SQL Queries and Relational Algebra (2)

• Uncorrelated nested queries vs. correlated nested queries
• Example: Retrieve the name of each employee who works on all the projects controlled by department number 5.

SELECT FNAME, LNAME
FROM EMPLOYEE
WHERE ( (SELECT PNO FROM WORKS_ON WHERE SSN=ESSN)
        CONTAINS
        (SELECT PNUMBER FROM PROJECT WHERE DNUM=5) )
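The difference between the two inner blocks in this query can be sketched as follows. The sample rows and the function name `employees_on_all_dept_projects` are hypothetical. The block over PROJECT is uncorrelated and is evaluated once, while the block over WORKS_ON references SSN from the outer tuple and must be re-evaluated for each employee. CONTAINS is modelled here as a superset test.

```python
# Hypothetical sample data; only the attributes used by the query are included.
EMPLOYEE = [
    {"SSN": "1", "FNAME": "John", "LNAME": "Smith"},
    {"SSN": "2", "FNAME": "Franklin", "LNAME": "Wong"},
]
WORKS_ON = [
    {"ESSN": "1", "PNO": 1},
    {"ESSN": "1", "PNO": 2},
    {"ESSN": "2", "PNO": 1},
    {"ESSN": "2", "PNO": 3},
]
PROJECT = [
    {"PNUMBER": 1, "DNUM": 5},
    {"PNUMBER": 2, "DNUM": 5},
    {"PNUMBER": 3, "DNUM": 4},
]


def employees_on_all_dept_projects(dnum):
    # Uncorrelated block: independent of the outer tuple, so evaluate it once.
    dept_projects = {p["PNUMBER"] for p in PROJECT if p["DNUM"] == dnum}
    result = []
    for e in EMPLOYEE:
        # Correlated block: depends on e["SSN"], so evaluate it per employee.
        emp_projects = {w["PNO"] for w in WORKS_ON if w["ESSN"] == e["SSN"]}
        if emp_projects >= dept_projects:  # CONTAINS as a superset test
            result.append((e["FNAME"], e["LNAME"]))
    return result


names = employees_on_all_dept_projects(5)
```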


SQL Queries and Relational Algebra (3)

• Example: For every project located in ‘Stafford’, retrieve the project number, the controlling department number, and the department manager’s last name, address, and birthdate.

• SQL query:
SELECT P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.DNUM=D.DNUMBER AND D.MGRSSN=E.SSN AND P.PLOCATION=‘STAFFORD’;

• Relational algebra:

π PNUMBER, DNUM, LNAME, ADDRESS, BDATE (((σ PLOCATION=‘STAFFORD’ (PROJECT)) ⋈ DNUM=DNUMBER (DEPARTMENT)) ⋈ MGRSSN=SSN (EMPLOYEE))


SQL Queries and Relational Algebra (4)


Implementing Basic Query Operations

• An RDBMS must provide implementation(s) for all the required operations including relational operators and more

• External sorting
  • Sort-merge strategy
• Sorting phase
  • Number of file blocks (b)
  • Number of available buffers (nB)
  • Number of runs: ⌈b / nB⌉
• Merging phase (one or more passes)
  • Degree of merging: the number of runs that are merged together in each pass


Algorithms for External Sorting (1)

• External sorting:
  • Refers to sorting algorithms that are suitable for large files of records stored on disk that do not fit entirely in main memory, such as most database files.
• Sort-merge strategy:
  • Starts by sorting small subfiles (runs) of the main file and then merges the sorted runs, creating larger sorted subfiles that are merged in turn.
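The sort-merge strategy can be sketched in memory as follows. This is a simplified illustration, not a disk-based implementation: the "file" is a list of blocks, each block is a list of keys, and the function name `external_sort` is hypothetical. One buffer is assumed to be reserved for output during merging, so at most `n_buffers - 1` runs are merged at a time.

```python
import heapq


def external_sort(blocks, n_buffers):
    """Sort a 'file' given as a list of blocks; return (sorted keys, merge passes)."""
    if n_buffers < 3:
        raise ValueError("need at least 3 buffers to merge two runs into one output")

    # Sorting phase: read n_buffers blocks at a time, sort them, write one run.
    runs = []
    for i in range(0, len(blocks), n_buffers):
        chunk = [key for block in blocks[i:i + n_buffers] for key in block]
        runs.append(sorted(chunk))

    # Merging phase: each pass merges groups of up to (n_buffers - 1) runs.
    degree = n_buffers - 1
    passes = 0
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + degree]))
                for i in range(0, len(runs), degree)]
        passes += 1
    return (runs[0] if runs else []), passes


blocks = [[9, 3], [7, 1], [8, 2], [6, 4], [5, 0], [11, 10]]
sorted_keys, merge_passes = external_sort(blocks, n_buffers=3)
```

With six blocks and three buffers the sorting phase produces two runs, which a single merge pass combines.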


Algorithms for External Sorting (2)


Algorithms for External Sorting (3)

• Analysis

Number of file blocks = b

Number of initial runs = nR

Available buffer space = nB

Sorting phase: nR = ⌈b / nB⌉

Degree of merging: dM = min(nB − 1, nR)

Number of passes: nP = ⌈log_dM(nR)⌉

Number of block accesses: (2 * b) + (2 * b * log_dM(nR))

• Example done in the class
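The analysis can be turned into a small calculator. The values b = 1024 and nB = 5 below are illustrative assumptions, not the example worked in class, and the function name `external_sort_cost` is hypothetical. The sketch counts whole merge passes, so it uses nP in place of the logarithm in the block-access formula.

```python
def external_sort_cost(b, n_b):
    """Return (nR, dM, nP, block accesses) for b file blocks and n_b buffers."""
    if n_b < 3:
        raise ValueError("need at least 3 buffers")
    n_r = -(-b // n_b)            # initial runs: ceil(b / nB)
    d_m = min(n_b - 1, n_r)       # degree of merging
    # Count passes by repeated ceiling division; equals ceil(log_dM(nR)).
    n_p, runs = 0, n_r
    while runs > 1:
        runs = -(-runs // d_m)
        n_p += 1
    # Sorting phase reads and writes every block once; so does each merge pass.
    accesses = 2 * b + 2 * b * n_p
    return n_r, d_m, n_p, accesses


n_r, d_m, n_p, accesses = external_sort_cost(b=1024, n_b=5)
```

For these assumed values the sorting phase creates 205 runs, which are merged four at a time: 205 runs become 52, then 13, then 4, then 1, for four passes in total.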


Implementing Basic Query Operations (cont.)

• Estimates of selectivity
  • Selectivity is the ratio of the number of tuples that satisfy the condition to the total number of tuples in the relation.
• SELECT (σ) operator implementation

1. Linear search

2. Binary search

3. Using a primary index (or hash key)

4. Using primary index to retrieve multiple records

5. Using clustering index to retrieve multiple records

6. Using a secondary index on an equality comparison

7. Conjunctive selection using an individual index

8. Conjunctive selection using a composite index

9. Conjunctive selection by intersection of record pointers
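As an illustration of the last method in the list, the sketch below answers a conjunctive equality selection by intersecting record pointers, and then computes the selectivity of the condition. The sample rows, the in-memory indexes, and the names `build_index` and `conjunctive_select` are assumptions made for the example; list positions stand in for record pointers.

```python
from collections import defaultdict

# Hypothetical EMPLOYEE rows; a row's list position serves as its record pointer.
EMPLOYEE = [
    {"LNAME": "Smith", "DNO": 5, "SEX": "M"},
    {"LNAME": "Wong", "DNO": 5, "SEX": "M"},
    {"LNAME": "Zelaya", "DNO": 4, "SEX": "F"},
    {"LNAME": "English", "DNO": 5, "SEX": "F"},
]


def build_index(rows, attr):
    """Secondary index: attribute value -> set of record pointers."""
    index = defaultdict(set)
    for rid, row in enumerate(rows):
        index[row[attr]].add(rid)
    return index


def conjunctive_select(rows, indexes, conditions):
    """Evaluate attr1 = v1 AND attr2 = v2 ... by intersecting pointer sets."""
    if not conditions:
        raise ValueError("at least one condition is required")
    pointer_sets = [indexes[attr].get(value, set())
                    for attr, value in conditions.items()]
    rids = set.intersection(*pointer_sets)
    # Only the records that satisfy every condition are fetched.
    return [rows[rid] for rid in sorted(rids)]


INDEXES = {attr: build_index(EMPLOYEE, attr) for attr in ("DNO", "SEX")}
matches = conjunctive_select(EMPLOYEE, INDEXES, {"DNO": 5, "SEX": "F"})
selectivity = len(matches) / len(EMPLOYEE)
```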


Implementing Basic Query Operations (cont.)

• JOIN operator implementation

1. Nested-loop join

2. Sort-merge join

3. Hash join
  • Partition hash join
  • Hybrid hash join

• PROJECT operator implementation
• Set operator implementation
• Implementing aggregate operators/functions
• Implementing OUTER JOIN
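Two of the join methods listed above can be contrasted in a short in-memory sketch. The sample relations and the function names are hypothetical, and the hash join shown is the simple single-partition case in which the build input fits in memory (not the partition or hybrid variants).

```python
from collections import defaultdict

# Hypothetical sample relations.
DEPARTMENT = [{"DNUMBER": 5, "DNAME": "Research"}, {"DNUMBER": 4, "DNAME": "Administration"}]
EMPLOYEE = [
    {"LNAME": "Smith", "DNO": 5},
    {"LNAME": "Wallace", "DNO": 4},
    {"LNAME": "Wong", "DNO": 5},
    {"LNAME": "Borg", "DNO": 1},
]


def nested_loop_join(r, s, r_attr, s_attr):
    """Compare every tuple of r with every tuple of s."""
    return [{**t, **u} for t in r for u in s if t[r_attr] == u[s_attr]]


def hash_join(r, s, r_attr, s_attr):
    """Build a hash table on r (ideally the smaller input), then probe it with s."""
    table = defaultdict(list)
    for t in r:                      # build phase
        table[t[r_attr]].append(t)
    return [{**t, **u}               # probe phase
            for u in s
            for t in table.get(u[s_attr], [])]


def canonical(rows):
    """Order-independent form of a result, for comparing the two methods."""
    return sorted(tuple(sorted(row.items())) for row in rows)


nl = nested_loop_join(DEPARTMENT, EMPLOYEE, "DNUMBER", "DNO")
hj = hash_join(DEPARTMENT, EMPLOYEE, "DNUMBER", "DNO")
```

Both functions return the same set of joined tuples, possibly in a different order. The nested-loop version performs one comparison per pair of tuples, while the hash join reads each input once.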


Buffer Space and Join Performance

In the nested-loop join, it makes a difference which file is chosen for the outer loop and which for the inner loop. If EMPLOYEE is used for the outer loop, each block of EMPLOYEE is read once, and the entire DEPARTMENT file (each of its blocks) is read once for each time we read in (nB − 2) blocks of the EMPLOYEE file. With bE = 2000 blocks, bD = 10 blocks, and nB = 7 buffers, we get the following:

Total number of blocks accessed for outer file = bE

Number of times (nB − 2) blocks of outer file are loaded = ⌈bE / (nB − 2)⌉

Total number of blocks accessed for inner file = bD * ⌈bE / (nB − 2)⌉

Hence, we get the following total number of block accesses:

bE + (⌈bE / (nB − 2)⌉ * bD) = 2000 + (⌈2000/5⌉ * 10) = 6000 blocks

On the other hand, if we use the DEPARTMENT records in the outer loop, by symmetry we get the following total number of block accesses:

bD + (⌈bD / (nB − 2)⌉ * bE) = 10 + (⌈10/5⌉ * 2000) = 4010 blocks
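The two totals can be reproduced with a one-line cost function. The function name `nested_loop_block_accesses` is hypothetical, and nB = 7 is inferred from the divisor 5 = nB − 2 in the figures above.

```python
def nested_loop_block_accesses(b_outer, b_inner, n_b):
    """Block accesses for a nested-loop join with n_b buffers.

    (n_b - 2) buffers hold outer-file blocks; the inner file is scanned once
    for every such group of outer blocks.
    """
    outer_loads = -(-b_outer // (n_b - 2))   # ceil(b_outer / (nB - 2))
    return b_outer + outer_loads * b_inner


b_e, b_d, n_b = 2000, 10, 7
employee_outer = nested_loop_block_accesses(b_e, b_d, n_b)
department_outer = nested_loop_block_accesses(b_d, b_e, n_b)
```

Using the smaller file in the outer loop gives the lower count, because the number of scans of the inner file depends on the size of the outer file.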


Implementing Basic Query Operations (cont.)

• Combining operations using pipelining
  • Temporary files based processing
  • Pipelining or stream-based processing
• Example: consider the execution of the following query

π list of attributes ((σ c1 (R)) ⋈ (σ c2 (S)))
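Stream-based processing of a query of this shape (two selections, a join, and a projection) can be sketched with Python generators, where each operator pulls tuples from its input on demand instead of writing a temporary file. The relations R and S, the conditions c1 and c2, and the operator names are hypothetical. The join below still has to consume its build input fully before it can emit anything, and the projection does not eliminate duplicates.

```python
# Hypothetical inputs standing in for R and S.
R = [
    {"LNAME": "Smith", "SALARY": 30000, "DNO": 5},
    {"LNAME": "Wong", "SALARY": 40000, "DNO": 5},
    {"LNAME": "Wallace", "SALARY": 43000, "DNO": 4},
]
S = [
    {"DNUMBER": 5, "DNAME": "Research"},
    {"DNUMBER": 4, "DNAME": "Administration"},
]


def select(rows, predicate):
    for row in rows:
        if predicate(row):
            yield row                     # passed on as soon as it qualifies


def join(left, right, left_attr, right_attr):
    table = {}
    for row in right:                     # build side: consumed completely first
        table.setdefault(row[right_attr], []).append(row)
    for row in left:                      # probe side: streamed tuple by tuple
        for match in table.get(row[left_attr], []):
            yield {**row, **match}


def project(rows, attrs):
    for row in rows:
        yield {a: row[a] for a in attrs}


plan = project(
    join(
        select(R, lambda r: r["SALARY"] > 30000),       # c1, an assumed condition
        select(S, lambda s: s["DNAME"] == "Research"),  # c2, an assumed condition
        "DNO", "DNUMBER",
    ),
    ["LNAME", "DNAME"],
)
result = list(plan)   # nothing is computed until the plan is consumed
```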


General Transformation Rules for Relational Algebra Operations

1. Cascade of σ: A conjunctive selection condition can be broken up into a cascade (that is, a sequence) of individual σ operations: σ C1 AND C2 AND … AND Cn (R) ≡ σ C1 (σ C2 (… (σ Cn (R)) …))

2. Commutativity of σ: The σ operation is commutative: σ C1 (σ C2 (R)) ≡ σ C2 (σ C1 (R))

3. Cascade of π: In a cascade (sequence) of π operations, all but the last one can be ignored

4. Commuting σ with π: If the selection condition c involves only those attributes A1, ..., An in the projection list, the two operations can be commuted

• And more …
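The four rules listed above can be checked on a small example. The relation R, the conditions c1 and c2, and the helper names are made up for illustration; checking one instance demonstrates the rules but does not prove them.

```python
# Hypothetical relation R(A, B, C).
R = [
    {"A": 1, "B": 10, "C": "x"},
    {"A": 2, "B": 20, "C": "y"},
    {"A": 3, "B": 20, "C": "x"},
    {"A": 4, "B": 30, "C": "x"},
]


def select(rows, predicate):
    return [row for row in rows if predicate(row)]


def project(rows, attrs):
    """Projection with duplicate elimination, keeping first-seen order."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def c1(row):
    return row["B"] >= 20


def c2(row):
    return row["C"] == "x"


# Rule 1: a conjunctive selection equals a cascade of selections.
rule1 = select(R, lambda r: c1(r) and c2(r)) == select(select(R, c2), c1)
# Rule 2: selections commute.
rule2 = select(select(R, c2), c1) == select(select(R, c1), c2)
# Rule 3: in a cascade of projections only the last (outermost) one matters.
rule3 = project(project(R, ["B", "C"]), ["B"]) == project(R, ["B"])
# Rule 4: selection commutes with projection when c1 uses only projected attributes.
rule4 = project(select(R, c1), ["B"]) == select(project(R, ["B"]), c1)
```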


Heuristic-Based Query Optimization

• Outline of heuristic algebraic optimization algorithm

1. Break up SELECT operations with conjunctive conditions into a cascade of SELECT operations

2. Using the commutativity of SELECT with other operations, move each SELECT operation as far down the query tree as is permitted by the attributes involved in the select condition

3. Using commutativity and associativity of binary operations, rearrange the leaf nodes of the tree

4. Combine a CARTESIAN PRODUCT operation with a subsequent SELECT operation in the tree into a JOIN operation, if the condition represents a join condition

5. Using the cascading of PROJECT and the commuting of PROJECT with other operations, break down and move lists of projection attributes down the tree as far as possible by creating new PROJECT operations as needed

6. Identify sub-trees that represent groups of operations that can be executed by a single algorithm


Heuristic-Based Query Optimization: Example

• Query

"Find the last names of employees born after 1957 who work on a project named ‘Aquarius’."

• SQL

SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME=‘Aquarius’ AND PNUMBER=PNO AND ESSN=SSN AND BDATE > ‘1957-12-31’;
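To see why moving selections down the tree helps, the sketch below evaluates this query in two ways on a few made-up rows: a canonical plan that forms the full Cartesian product before selecting, and a plan that applies the selections first and then joins. The sample data and the function names are hypothetical, and the reported number is simply the largest intermediate result each plan builds.

```python
from itertools import product

# Hypothetical sample rows; dates are ISO strings, so string comparison orders them.
EMPLOYEE = [
    {"SSN": "1", "LNAME": "Smith", "BDATE": "1965-01-09"},
    {"SSN": "2", "LNAME": "Wong", "BDATE": "1955-12-08"},
    {"SSN": "3", "LNAME": "English", "BDATE": "1972-07-31"},
]
WORKS_ON = [
    {"ESSN": "1", "PNO": 10},
    {"ESSN": "2", "PNO": 10},
    {"ESSN": "3", "PNO": 1},
    {"ESSN": "1", "PNO": 1},
]
PROJECT = [
    {"PNUMBER": 1, "PNAME": "ProductX"},
    {"PNUMBER": 10, "PNAME": "Aquarius"},
]


def canonical_plan():
    """Cartesian product of all three relations, then one big selection."""
    rows = [{**e, **w, **p} for e, w, p in product(EMPLOYEE, WORKS_ON, PROJECT)]
    result = [r["LNAME"] for r in rows
              if r["PNAME"] == "Aquarius" and r["PNUMBER"] == r["PNO"]
              and r["ESSN"] == r["SSN"] and r["BDATE"] > "1957-12-31"]
    return result, len(rows)


def optimized_plan():
    """Selections pushed down; each product plus condition becomes a join."""
    projects = [p for p in PROJECT if p["PNAME"] == "Aquarius"]
    assigned = [{**p, **w} for p in projects for w in WORKS_ON
                if p["PNUMBER"] == w["PNO"]]
    employees = [e for e in EMPLOYEE if e["BDATE"] > "1957-12-31"]
    rows = [{**a, **e} for a in assigned for e in employees
            if a["ESSN"] == e["SSN"]]
    largest = max(len(projects), len(assigned), len(employees), len(rows))
    return [r["LNAME"] for r in rows], largest


slow_result, slow_size = canonical_plan()
fast_result, fast_size = optimized_plan()
```

Both plans return the same answer, but on this data the canonical plan materializes 24 combined tuples while the optimized plan never holds more than 2.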


Overview of Query Optimization in Oracle

• Rule-based query optimization: the optimizer chooses execution plans based on heuristically ranked operations.
  • May be phased out
• Cost-based query optimization: the optimizer examines alternative access paths and operator algorithms and chooses the execution plan with the lowest estimated cost.
  • The query cost is calculated based on the estimated usage of resources such as I/O, CPU, and memory.
• Application developers can specify hints to the Oracle query optimizer.
  • The application developer might know more about the data than the optimizer does.
  • SELECT /*+ ...hint... */ [rest of query]
  • SELECT /*+ index(t1 t1_abc) index(t2 t2_abc) */ COUNT(*)
    FROM t1, t2
    WHERE t1.col1 = t2.col1;


Summary

• Background review
• Processing a query
• SQL queries and relational algebra
• Implementing basic query operations
• Heuristics-based query optimization
• Overview of query optimization in Oracle