Query Processing

Query processing is the procedure of transforming a high-level query into a correct and efficient execution plan, expressed in a low-level language, that performs the required retrievals and manipulations in the database.

Upload: zankhana

Post on 18-Nov-2014


Page 1: Query Processing

Query Processing

Query processing is the procedure of transforming a high-level query into a correct and efficient execution plan, expressed in a low-level language, that performs the required retrievals and manipulations in the database.

Page 2: Query Processing

[Figure: Steps in query processing]

General query: transformed into a high-level query language in standard form, for example SQL.

Scanning, parsing, validating: syntax checking and verification, by the parser portion of the query processor, of whether the relations and attributes used in the query are defined in the database. Output: correct query.

Query decomposer: translation of the relational calculus query into a relational algebra query using equivalency rules, idempotency rules, transformation rules, etc., from the global database dictionary. Output: algebraic expression.

Query optimizer: performing optimization by substituting equivalent expressions for those in the query. Output: execution plan.

Query code generator: generating code for the queries. Output: code to execute the query.

Runtime database processor: estimation of each access plan, selection of the optimal plan, and execution. Output: query result.

Other components shown: main database, database catalog, statistical data, estimation formulas, cost module, join manager.

Page 3: Query Processing

Query Processing

As shown in the figure:

The user submits a query request, which may be in QBE or another form.

This is first transformed into a standard high-level query language, such as SQL.

The SQL query is read by the syntax analyzer, which checks it for correctness.

The correct query is then passed to the query decomposer, which produces the algebraic expression of the query.

This expression is then passed to the query optimiser.

Page 4: Query Processing

Query Processing

After optimization, the query optimiser generates an action plan.

These action plans are converted into query code that is finally executed by the runtime database processor.

The runtime database processor estimates the cost of each access plan and chooses the optimal one for execution.

Page 5: Query Processing

1. Syntax Analyzer

The syntax analyzer takes the query from the user, parses it into tokens, and analyses the tokens and their order to make sure they comply with the rules of the language grammar.

If an error is found in the query submitted by the user, the query is rejected, and an error code together with an explanation of why the query was rejected is returned to the user.
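The tokenize-then-check behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the token pattern, the keyword set, and the three ordering checks are hypothetical and cover a tiny SQL subset, not the grammar of any real DBMS.

```python
import re

# Hypothetical token pattern for a tiny SQL subset (an assumption, not a real grammar).
TOKEN_RE = re.compile(r"(\d+)|([A-Za-z_][\w.]*)|('[^']*')|(<=|>=|<>|[=<>,*()])")
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR"}


def tokenize(query):
    """Split the query into (kind, text) tokens, or raise SyntaxError."""
    tokens, pos = [], 0
    while pos < len(query):
        if query[pos].isspace():          # skip whitespace between tokens
            pos += 1
            continue
        match = TOKEN_RE.match(query, pos)
        if not match:
            raise SyntaxError(f"unexpected character {query[pos]!r} at position {pos}")
        number, word, string, symbol = match.groups()
        if number:
            tokens.append(("NUMBER", number))
        elif word:
            if word.upper() in KEYWORDS:
                tokens.append(("KEYWORD", word.upper()))
            else:
                tokens.append(("NAME", word))
        elif string:
            tokens.append(("STRING", string))
        else:
            tokens.append(("SYMBOL", symbol))
        pos = match.end()
    return tokens


def check_syntax(query):
    """Return (True, tokens) if accepted, else (False, explanation)."""
    try:
        tokens = tokenize(query)
    except SyntaxError as err:
        return False, str(err)
    keywords = [text for kind, text in tokens if kind == "KEYWORD"]
    if not tokens or tokens[0] != ("KEYWORD", "SELECT"):
        return False, "query must start with SELECT"
    if "FROM" not in keywords:
        return False, "missing FROM clause"
    if "WHERE" in keywords and keywords.index("WHERE") < keywords.index("FROM"):
        return False, "WHERE must follow FROM"
    return True, tokens


accepted, _ = check_syntax("SELECT ename FROM Employee WHERE salary > 40000")
rejected, reason = check_syntax("SELECT ename WHERE salary > 40000")
```

Here the second query is rejected with an explanation, mirroring the error code and message that the syntax analyzer returns to the user.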

Page 6: Query Processing

2. Query Decomposition

The aims of query decomposition are:

To transform a high-level query into a relational algebra query.

To check whether the query is syntactically and semantically correct.

To transform the high-level query into a query graph of low-level operations (an algebraic expression).

Page 7: Query Processing

Query Decomposition

The query decomposer goes through five stages of processing to decompose a query into a low-level algebraic expression:

Query analysis

Query normalization

Semantic analysis

Query simplifier

Query restructuring

Page 8: Query Processing

[Figure: Query decomposition. Input: SQL query. Supporting inputs: equivalence rules, idempotency rules, transformation rules, and the data dictionary. Output: algebraic expression.]

Page 9: Query Processing

2.1 Query Analysis

At the end of the analysis phase, the high-level query (SQL) is transformed into an internal representation that is more suitable for processing.

This internal representation is a kind of query tree.

It is a tree data structure that corresponds to a relational algebra expression.

It is also called a relational algebra tree.

Page 10: Query Processing

Relational Algebra Tree

The leaf nodes of the tree represent the base input relations of the query.

The internal nodes of the tree represent intermediate relations, each the result of applying an operation in the algebra.

The root of the tree represents the result of the query.

The sequence of operations is directed from the leaves to the root.

Page 11: Query Processing

Relational Algebra Tree

Mumbai_proj ← σ_{proj_loc="Mumbai"}(project)

Control_dept ← Mumbai_proj ⋈_{deptno=dno} (department)

Proj_de_mgr ← Control_dept ⋈_{mgrid=empid} (employee)

Result ← ∏_{proj_no,deptno,name,add,dob}(Proj_de_mgr)

Page 12: Query Processing

Relational Algebra Tree

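[Figure: relational algebra tree for the query above. Leaf nodes: project, department, employee. σ_{proj_loc="Mumbai"} is applied to project; its result is joined with department on deptno=dno; that result is joined with employee on mgrid=empid; the root is ∏_{proj_no,deptno,name,add,dob}.]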
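A relational algebra tree of this shape can be evaluated from the leaves to the root. The sketch below is a minimal illustration, not the slides' own implementation: the `Node` class, the `evaluate` function, and all sample rows are made up, and it assumes that `deptno` belongs to project and `dno` to department.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    op: str                       # "relation", "select", "join" or "project"
    children: tuple = ()
    rows: Optional[list] = None   # only used by leaf ("relation") nodes
    arg: Any = None               # predicate for select/join, attribute list for project


def evaluate(node):
    """Evaluate bottom-up: children (towards the leaves) first, then this node."""
    if node.op == "relation":
        return node.rows
    inputs = [evaluate(child) for child in node.children]
    if node.op == "select":
        return [row for row in inputs[0] if node.arg(row)]
    if node.op == "join":
        left, right = inputs
        return [{**l, **r} for l in left for r in right if node.arg(l, r)]
    if node.op == "project":
        # Duplicates are not removed in this sketch.
        return [{a: row[a] for a in node.arg} for row in inputs[0]]
    raise ValueError(f"unknown operation {node.op!r}")


# Made-up sample rows (assumptions for illustration only).
project = Node("relation", rows=[
    {"proj_no": 1, "proj_loc": "Mumbai", "deptno": 10},
    {"proj_no": 2, "proj_loc": "Pune", "deptno": 20},
])
department = Node("relation", rows=[{"dno": 10, "mgrid": 100}, {"dno": 20, "mgrid": 200}])
employee = Node("relation", rows=[
    {"empid": 100, "name": "Asha", "add": "Mumbai", "dob": "1980-01-01"},
    {"empid": 200, "name": "Ravi", "add": "Pune", "dob": "1975-06-15"},
])

mumbai_proj = Node("select", (project,), arg=lambda r: r["proj_loc"] == "Mumbai")
control_dept = Node("join", (mumbai_proj, department), arg=lambda l, r: l["deptno"] == r["dno"])
proj_de_mgr = Node("join", (control_dept, employee), arg=lambda l, r: l["mgrid"] == r["empid"])
root = Node("project", (proj_de_mgr,), arg=["proj_no", "deptno", "name", "add", "dob"])

result = evaluate(root)
```

With these sample rows, `result` holds a single tuple for the Mumbai project and its department manager.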

Page 13: Query Processing

Query Graph Notation

In the query graph representation, the relations in the query are represented by relation nodes.

These relation nodes are displayed as single circles.

The constant values from the query selection are represented by constant nodes, displayed as double circles.

The selection and join conditions are represented by the graph edges.

The attributes to be retrieved from each relation are displayed in square brackets above each relation.

Page 14: Query Processing

Query Graph Notation

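[Figure: query graph. Relation nodes: P, D, E. Constant node: "Mumbai". Edges: p.proj_loc="Mumbai" between P and the constant node, p.deptno=d.deptno between P and D, d.mgrid=e.empid between D and E. Attributes to retrieve: [p.proj_no, p.deptno] above P and [e.ename, e.add, e.dob] above E.]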

Page 15: Query Processing

Disadvantages of query graph notation

It corresponds to a relational calculus expression.

Unlike a query tree, it does not indicate the order in which the operations are to be performed.

Page 16: Query Processing

Query Normalization

The primary goal of normalization is to avoid redundancy.

In the normalization phase, a set of equivalency rules is applied so that the projection and selection operations included in the query are simplified to avoid redundancy.

Page 17: Query Processing

Query Normalization

Conjunctive normal form: a sequence of Boolean expressions connected by conjunction (AND). Each expression contains comparison terms connected by disjunction (OR).

(emp_desig="programmer" ∨ empsal>40000) ∧ loc="mumbai"

Page 18: Query Processing

Query Normalization

Disjunctive normal form: a sequence of Boolean expressions connected by disjunction (OR). Each expression contains comparison terms connected by conjunction (AND).

(emp_desig="programmer" ∧ loc="mumbai") ∨ (empsal>40000 ∧ loc="mumbai")

Page 19: Query Processing

Example

Let us consider the following two relations stored in a distributed database:

Employee (empid, ename, salary, designation, deptno)

Department (deptno, dname, location)

and the following query: "Retrieve the names of all employees whose designation is Manager and department name is Production or Printing".

In SQL, the above query can be represented as:

Select ename from Employee, Department where designation = 'Manager' and Employee.deptno = Department.deptno and (dname = 'Production' or dname = 'Printing')
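In SQL, AND binds more tightly than OR, so the two dname alternatives have to be grouped with parentheses for the query to match its intended meaning. The sketch below uses Python's built-in sqlite3 module to compare the grouped and ungrouped conditions; the sample rows and the helper `names` are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (empid INTEGER, ename TEXT, salary INTEGER,
                           designation TEXT, deptno INTEGER);
    CREATE TABLE Department (deptno INTEGER, dname TEXT, location TEXT);
    -- Made-up sample rows, not taken from the source.
    INSERT INTO Employee VALUES (1, 'Asha', 50000, 'Manager', 10),
                                (2, 'Ravi', 30000, 'Clerk', 20);
    INSERT INTO Department VALUES (10, 'Production', 'Pune'),
                                  (20, 'Printing', 'Mumbai');
""")

BASE = ("SELECT ename FROM Employee, Department "
        "WHERE designation = 'Manager' "
        "AND Employee.deptno = Department.deptno AND ")


def names(condition):
    """Run the query with the given dname condition and return the sorted names."""
    return sorted(row[0] for row in conn.execute(BASE + condition))


# Intended meaning: managers in Production or Printing.
grouped = names("(dname = 'Production' OR dname = 'Printing')")

# Without parentheses, "OR dname = 'Printing'" stands alone, so every
# employee is paired with the Printing department regardless of the join.
ungrouped = names("dname = 'Production' OR dname = 'Printing'")
```

With these rows, the grouped condition returns only the manager, while the ungrouped one also returns rows produced by pairing every employee with the Printing department.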

Page 20: Query Processing

Example

The conjunctive normal form of the query is as follows:

designation = "Manager" ∧ Employee.deptno = Department.deptno ∧ (dname = "Production" ∨ dname = "Printing")

The disjunctive normal form of the same query is:

(designation = "Manager" ∧ Employee.deptno = Department.deptno ∧ dname = "Production") ∨ (designation = "Manager" ∧ Employee.deptno = Department.deptno ∧ dname = "Printing")

Hence, in the above disjunctive normal form, each disjunct connected by the ∨ (OR) operator can be processed as an independent conjunctive subquery.
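The step from conjunctive to disjunctive normal form amounts to distributing AND over OR: pick one term from every OR-clause in every possible way. A minimal sketch, assuming predicates are kept as plain strings and using a hypothetical helper `cnf_to_dnf` (it does no simplification or duplicate removal):

```python
from itertools import product


def cnf_to_dnf(cnf):
    """Convert CNF to DNF by distribution.

    cnf: list of clauses; each clause is a list of predicates joined by OR,
         and the clauses themselves are joined by AND.
    Returns a list of conjunctions; each is a list of predicates joined by
    AND, and the conjunctions themselves are joined by OR.
    """
    return [list(choice) for choice in product(*cnf)]


cnf = [
    ['designation = "Manager"'],
    ["Employee.deptno = Department.deptno"],
    ['dname = "Production"', 'dname = "Printing"'],
]

dnf = cnf_to_dnf(cnf)

# Each conjunction can be processed as an independent conjunctive subquery.
subqueries = [" AND ".join(conjunction) for conjunction in dnf]
```

For the example query this yields two conjunctions, one for Production and one for Printing, each repeating the designation and join predicates.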

Page 21: Query Processing

Equivalency Rules

An equivalence rule says that expressions of two forms are equivalent.

By applying these rules we can transform the RE into an equivalent CNF or DNF.

CNF: only tuples that satisfy all expressions.

DNF: tuples that are the result of the union of tuples that satisfy the expressions.

Page 22: Query Processing

Equivalency Rules

1. Commutativity of UNARY operations:

UNARYOP1 UNARYOP2 REL <-> UNARYOP2 UNARYOP1 REL

σ_{θ1}(σ_{θ2}(E)) = σ_{θ2}(σ_{θ1}(E))

2. Associativity of BINARY operations:

REL1 BINOP (REL2 BINOP REL3) <-> (REL1 BINOP REL2) BINOP REL3

(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

Page 23: Query Processing

Equivalency Rules

3. Idempotency of UNARY operations:

UNARYOP1 UNARYOP2 REL <-> UNARYOP REL

4. Distributivity of UNARY operations with respect to BINARY operations:

UNARYOP (REL1 BINOP REL2) <-> UNARYOP (REL1) BINOP UNARYOP (REL2)

∏_L(E1 ∪ E2) = (∏_L(E1)) ∪ (∏_L(E2))

5. Factorisation of UNARY operations:

UNARYOP (REL1) BINOP UNARYOP (REL2) <-> UNARYOP (REL1 BINOP REL2)
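Some of these rules can be checked on small sample relations. The sketch below models relations as Python sets of tuples and takes selection and projection as the unary operations and union as the binary operation. The helpers `select` and `project`, the relations, and the predicates are all made up for illustration. Such a check on sample data illustrates the rules but does not prove them.

```python
def select(predicate, relation):
    """Selection: keep the rows that satisfy the predicate."""
    return {row for row in relation if predicate(row)}


def project(positions, relation):
    """Projection: keep only the given column positions (duplicates collapse)."""
    return {tuple(row[i] for i in positions) for row in relation}


# Made-up relations; each row is (empid, deptno, salary).
E1 = {(1, 10, 50000), (2, 20, 30000)}
E2 = {(3, 10, 45000), (2, 20, 30000)}
E3 = {(4, 30, 60000)}


def high_paid(row):
    return row[2] > 40000


def in_dept_10(row):
    return row[1] == 10


# Commutativity of unary operations: the order of two selections does not matter.
commutes = (select(high_paid, select(in_dept_10, E1))
            == select(in_dept_10, select(high_paid, E1)))

# Idempotency of unary operations: repeating the same selection changes nothing.
idempotent = select(high_paid, select(high_paid, E1)) == select(high_paid, E1)

# Distributivity of projection over union (read right to left: factorisation).
distributes = (project((1,), E1 | E2)
               == project((1,), E1) | project((1,), E2))

# Regrouping a binary operation: union gives the same result either way.
regroups = ((E1 | E2) | E3) == (E1 | (E2 | E3))
```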

Page 24: Query Processing

Semantic Analyser

Applied to normalized queries.

Rejects incorrectly formulated queries: condition components do not contribute to the generation of the result.

Rejects contradictory queries: the qualification condition cannot be satisfied by any tuple.

Incorrectness and contradiction in a query are detected from the corresponding query graph or relation connection graph.

Page 25: Query Processing

Relation Connection Graph for Incorrectness

A node is created in the query graph for the result and for each base relation specified in the query.

An edge between two nodes is drawn in the query graph for each join operation and for each project operation in the query. An edge between two nodes that are not result nodes represents a join operation, while an edge whose destination node is the result node represents a project operation.

A node in the query graph that is not the result node is labeled by a select operation or a self-join operation specified in the query.

A join graph for a query is a subgraph of the relation connection graph that represents only the join operations specified in the query; it can be derived from the corresponding query graph.

Page 26: Query Processing

Example

Let us consider the following two relations:

Student (s-id, sname, address, course-id, year)

Course (course-id, course-name, duration, course-fee, intake-no, coordinator)

and the query "Retrieve the names, addresses and course names of all those students whose year of admission is 2008 and course duration is 4 years".

Using SQL, the above query can be represented as:

Select sname, address, course-name from Student, Course where year = 2008 and duration = 4 and Student.course-id = Course.course-id

Page 27: Query Processing

Example

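[Figure 11.4(a) Query Graph: nodes Student, Course, and Result. Student is labeled year = 2008 and Course is labeled duration = 4. The edge between Student and Course is Student.course-id = Course.course-id. The edge from Student to Result carries sname, address; the edge from Course to Result carries course-name.]

[Figure 11.4(b) Join Graph: nodes Student and Course, connected by the edge Student.course-id = Course.course-id.]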

Page 28: Query Processing

Example

In the above SQL query, if the join condition between the two relations (that is, Student.course-id = Course.course-id) is missing, then there is no line between the nodes representing the relations Student and Course in the corresponding query graph (Figure 11.4(a)). Hence, the SQL query is semantically incorrect, since the relation connection graph is disconnected. In this case, either the query is rejected or an implicit Cartesian product between the relations is assumed.
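The disconnectedness test described here can be sketched as a graph traversal over the join graph. This is a minimal illustration: the function `is_connected` and its representation of join edges as pairs of relation names are assumptions, not part of the source.

```python
def is_connected(relations, join_edges):
    """Return True if every relation is reachable from the first one
    by following join edges (treated as undirected)."""
    if not relations:
        return True
    adjacency = {name: set() for name in relations}
    for left, right in join_edges:
        adjacency[left].add(right)
        adjacency[right].add(left)
    seen, stack = set(), [relations[0]]
    while stack:                      # depth-first traversal
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(adjacency[current] - seen)
    return seen == set(relations)


relations = ["Student", "Course"]

# With the join condition Student.course-id = Course.course-id present.
with_join = is_connected(relations, [("Student", "Course")])

# With the join condition missing, the graph is disconnected.
without_join = is_connected(relations, [])
```

A semantic analyser built along these lines would reject the query, or assume an implicit Cartesian product, whenever the check returns False.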