Query Processing

Query processing is the procedure of transforming a high-level query into a correct and efficient execution plan, expressed in a low-level language, that performs the required retrievals and manipulations in the database.

[Figure: Query processing. High-level query (standard form, e.g., SQL) → scanning, parsing, validating (syntax check; relations and attributes verified against the database) → correct query → query decomposer (translation to a relational algebra query using equivalence, idempotency, and transformation rules from the global data dictionary) → algebraic expression → query optimizer (substitutes equivalent expressions; estimates access plans using the database catalog, statistical data, estimation formulas, and a cost module) → execution plan → query code generator → code to execute query → runtime database processor (main database) → query result.]

As shown in the figure:

The user gives the query request, which may be in QBE or another form.

This is first transformed into a standard high-level query language, such as SQL.

The SQL query is read by the syntax analyzer, which checks it for correctness.

The correct query is then passed to the query decomposer, which produces the algebraic expression of the query.

This expression is then passed to the query optimiser.

After optimization, the query optimiser generates action plans. It estimates the cost of each access plan and chooses the optimal one. The chosen plan is converted into query code, which is finally executed by the runtime database processor.
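Cost-based plan selection can be sketched as: enumerate candidate plans, estimate each one's cost from catalog statistics, and keep the cheapest. The sketch below makes several assumptions that are not from the slides: plans are left-deep join orders, cost is the sum of estimated intermediate result sizes, and the names `CARDINALITY`, `JOIN_SELECTIVITY`, `estimate_cost`, and `choose_plan` (and all their numbers) are hypothetical. Real optimizers use much richer cost models.

```python
from itertools import permutations

# Hypothetical catalog statistics: estimated cardinality of each input.
# "sel_project" stands for sigma_{proj_loc="Mumbai"}(project), already filtered.
CARDINALITY = {"sel_project": 50, "department": 20, "employee": 1000}

# Hypothetical join selectivities for relation pairs linked by a join predicate.
JOIN_SELECTIVITY = {
    frozenset({"sel_project", "department"}): 1 / 20,  # deptno = dno
    frozenset({"department", "employee"}): 1 / 1000,   # mgrid = empid
}


def estimate_cost(order):
    """Cost of a left-deep join order = sum of estimated intermediate result sizes."""
    joined = [order[0]]
    size = CARDINALITY[order[0]]
    cost = 0.0
    for rel in order[1:]:
        selectivity = 1.0  # no join predicate at all means a Cartesian product
        for prev in joined:
            selectivity *= JOIN_SELECTIVITY.get(frozenset({prev, rel}), 1.0)
        size = size * CARDINALITY[rel] * selectivity
        cost += size
        joined.append(rel)
    return cost


def choose_plan(relations):
    """Estimate every join order and return the cheapest one with its cost."""
    costs = {order: estimate_cost(order) for order in permutations(relations)}
    best = min(costs, key=costs.get)
    return best, costs[best]


best, cost = choose_plan(["sel_project", "department", "employee"])
print(best, cost)
```

With these invented statistics, any order that joins `sel_project` with `employee` first pays for a Cartesian product and is estimated thousands of times more expensive than the chosen plan.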

1. Syntax Analyzer

The syntax analyzer takes the query from the user, parses it into tokens, and analyses the tokens and their order to make sure they comply with the rules of the language grammar.

If an error is found in the query, it is rejected, and an error code together with an explanation of why the query was rejected is returned to the user.
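A minimal sketch of the two steps described above (tokenizing, then checking token order against a grammar) for a tiny, hypothetical SQL subset: `SELECT` list, `FROM` list, and an optional `WHERE` of comparisons joined by `AND`/`OR`. The grammar, the token classes, and the names `QuerySyntaxError`, `tokenize`, and `check_syntax` are assumptions for illustration; it does not support parentheses or hyphenated identifiers, and real DBMS parsers are generated from the full SQL grammar.

```python
import re


class QuerySyntaxError(Exception):
    """Raised when a query violates the (toy) grammar."""


# Token classes for a tiny, hypothetical SQL subset.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:SELECT|FROM|WHERE|AND|OR)\b"),
    ("STRING", r"'[^']*'"),
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_.]*"),
    ("OP", r"<=|>=|<>|[=<>]"),
    ("COMMA", r","),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC), re.IGNORECASE)


def tokenize(query):
    """Split the query into (kind, value) tokens, rejecting unknown characters."""
    tokens, pos = [], 0
    while pos < len(query):
        m = TOKEN_RE.match(query, pos)
        if m is None:
            raise QuerySyntaxError(f"unexpected character {query[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            value = m.group().upper() if m.lastgroup == "KEYWORD" else m.group()
            tokens.append((m.lastgroup, value))
        pos = m.end()
    return tokens


def check_syntax(query):
    """Accept: SELECT ident {, ident} FROM ident {, ident} [WHERE cmp {AND|OR cmp}]."""
    tokens = tokenize(query)
    pos = 0

    def accept(kinds, value=None):
        # Consume the next token if its kind (and, optionally, value) matches.
        nonlocal pos
        if pos < len(tokens) and tokens[pos][0] in kinds and value in (None, tokens[pos][1]):
            pos += 1
            return True
        return False

    def expect(kinds, value=None):
        if not accept(kinds, value):
            found = tokens[pos][1] if pos < len(tokens) else "end of query"
            raise QuerySyntaxError(f"expected {value or '/'.join(kinds)}, found {found!r}")

    def ident_list():
        expect(("IDENT",))
        while accept(("COMMA",)):
            expect(("IDENT",))

    def comparison():
        expect(("IDENT",))
        expect(("OP",))
        expect(("IDENT", "STRING", "NUMBER"))

    expect(("KEYWORD",), "SELECT")
    ident_list()
    expect(("KEYWORD",), "FROM")
    ident_list()
    if accept(("KEYWORD",), "WHERE"):
        comparison()
        while accept(("KEYWORD",), "AND") or accept(("KEYWORD",), "OR"):
            comparison()
    if pos < len(tokens):
        raise QuerySyntaxError(f"unexpected token {tokens[pos][1]!r}")
    return tokens
```

For example, `check_syntax("SELECT ename Employee")` raises `QuerySyntaxError` with the message `expected FROM, found 'Employee'`, which plays the role of the error explanation returned to the user.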

2. Query Decomposition

The aims of query decomposition are:

To transform a high-level query into a relational algebra query.

To check whether the query is syntactically and semantically correct.

To transform the high-level query into a query graph of low-level operations (algebraic expression).

The query decomposer goes through five stages of processing to decompose the query into a low-level algebraic expression:

Query analysis

Query normalization

Semantic analysis

Query simplifier

Query restructuring

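[Figure: Query decomposer. An SQL query passes through the five stages, drawing on equivalence rules, idempotency rules, and transformation rules from the data dictionary, and is output as an algebraic expression.]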

2.1 Query Analysis

At the end of the analysis phase, the high-level query (SQL) is transformed into an internal representation that is more suitable for processing.

This internal representation is a kind of query tree: a tree data structure that corresponds to a relational algebra expression. It is also called a relational algebra tree.

Relational Algebra Tree

Leaf nodes of the tree represent the base input relations of the query.

Internal nodes of the tree represent intermediate relations, each the result of applying an operation in the algebra.

The root of the tree represents the result of the query.

The sequence of operations is directed from the leaves to the root.

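Example

$$
\begin{aligned}
\text{Mumbai\_proj} &\leftarrow \sigma_{\text{proj\_loc} = \text{"Mumbai"}}(\text{project}) \\
\text{Control\_dept} &\leftarrow \text{Mumbai\_proj} \bowtie_{\text{deptno} = \text{dno}} \text{department} \\
\text{Proj\_de\_mgr} &\leftarrow \text{Control\_dept} \bowtie_{\text{mgrid} = \text{empid}} \text{employee} \\
\text{Result} &\leftarrow \pi_{\text{proj\_no},\, \text{deptno},\, \text{name},\, \text{add},\, \text{dob}}(\text{Proj\_de\_mgr})
\end{aligned}
$$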

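[Figure: Relational algebra tree. Leaves: project, department, employee. σ_{proj_loc = "Mumbai"} is applied to project; its result is joined with department on deptno = dno; that result is joined with employee on mgrid = empid; the root is ∏_{proj_no, deptno, name, add, dob}(Proj_de_mgr).]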
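The tree above can be sketched as a small data structure in which leaf nodes are base relations, internal nodes are operations, and evaluation recurses to the leaves so that results flow up to the root. The class names (`Relation`, `Select`, `Join`, `Project`), the reduced column sets, and the sample rows are hypothetical; the joins are plain nested loops and assume that attribute names do not clash across relations.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Relation:          # leaf node: a base input relation
    name: str

    def evaluate(self, db):
        return db[self.name]


@dataclass
class Select:            # internal node: sigma_predicate(child)
    predicate: Callable[[dict], bool]
    child: object

    def evaluate(self, db):
        return [row for row in self.child.evaluate(db) if self.predicate(row)]


@dataclass
class Join:              # internal node: left JOIN right ON left_attr = right_attr
    left_attr: str
    right_attr: str
    left: object
    right: object

    def evaluate(self, db):
        right_rows = self.right.evaluate(db)
        # Nested-loop equi-join; attribute names are assumed not to clash.
        return [{**lrow, **rrow} for lrow in self.left.evaluate(db) for rrow in right_rows
                if lrow[self.left_attr] == rrow[self.right_attr]]


@dataclass
class Project:           # root here: pi_attrs(child), duplicates removed
    attrs: tuple
    child: object

    def evaluate(self, db):
        seen, result = set(), []
        for row in self.child.evaluate(db):
            key = tuple(row[a] for a in self.attrs)
            if key not in seen:
                seen.add(key)
                result.append(dict(zip(self.attrs, key)))
        return result


# Tree for the Mumbai query: evaluation recurses down to the leaves, so
# intermediate relations are produced bottom-up, ending at the root.
tree = Project(("proj_no", "deptno", "name", "add", "dob"),
               Join("mgrid", "empid",
                    Join("deptno", "dno",
                         Select(lambda r: r["proj_loc"] == "Mumbai", Relation("project")),
                         Relation("department")),
                    Relation("employee")))

db = {  # hypothetical sample data
    "project": [{"proj_no": 1, "proj_loc": "Mumbai", "deptno": 10},
                {"proj_no": 2, "proj_loc": "Pune", "deptno": 20}],
    "department": [{"dno": 10, "mgrid": 100}, {"dno": 20, "mgrid": 200}],
    "employee": [{"empid": 100, "name": "Asha", "add": "Andheri", "dob": "1980-01-01"},
                 {"empid": 200, "name": "Ravi", "add": "Kothrud", "dob": "1975-05-05"}],
}
print(tree.evaluate(db))
```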

Query Graph Notation

In the query graph representation, the relations in the query are represented by relation nodes, displayed as single circles.

The constant values from the query selections are represented by constant nodes, displayed as double circles.

The selection and join conditions are represented by the graph edges.

The attributes to be retrieved from each relation are displayed in square brackets above each relation.

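[Figure: Query graph for the Mumbai query. Relation nodes P (project), D (department), and E (employee); constant node "Mumbai". Edges: p.proj_loc = "Mumbai", p.deptno = d.deptno, d.mgrid = e.empid. Attributes retrieved: [p.proj_no, p.deptno] above P and [e.ename, e.add, e.dob] above E.]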
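The notation can be mirrored in a small data structure: relation nodes carry the bracketed attribute lists, constant nodes hold selection constants, and edges carry conditions. The `QueryGraph` class and its methods are hypothetical names used only for illustration. An edge that touches a constant node is read as a selection, and an edge between two relation nodes as a join.

```python
from dataclasses import dataclass, field


@dataclass
class QueryGraph:
    relations: dict = field(default_factory=dict)  # relation node -> [attributes to retrieve]
    constants: set = field(default_factory=set)    # constant nodes (double circles)
    edges: list = field(default_factory=list)      # (node, node, condition); no order implied

    def add_relation(self, name, retrieve=()):
        self.relations[name] = list(retrieve)

    def add_constant(self, value):
        self.constants.add(value)

    def add_edge(self, a, b, condition):
        self.edges.append((a, b, condition))

    def selection_edges(self):
        # An edge touching a constant node is a selection condition.
        return [e for e in self.edges if e[0] in self.constants or e[1] in self.constants]

    def join_edges(self):
        # An edge between two relation nodes is a join condition.
        return [e for e in self.edges if e[0] in self.relations and e[1] in self.relations]


g = QueryGraph()
g.add_relation("P", ["p.proj_no", "p.deptno"])
g.add_relation("D")
g.add_relation("E", ["e.ename", "e.add", "e.dob"])
g.add_constant('"Mumbai"')
g.add_edge("P", '"Mumbai"', 'p.proj_loc = "Mumbai"')
g.add_edge("P", "D", "p.deptno = d.deptno")
g.add_edge("D", "E", "d.mgrid = e.empid")
print([cond for _, _, cond in g.join_edges()])
```

Note that the edge list records which conditions exist but not which join to perform first; that missing ordering is the limitation the slides list next.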

Disadvantages of query graph notation

It corresponds to a relational calculus expression.

Unlike a query tree, it does not indicate the order in which operations are performed.

Query Normalization

The primary goal of normalization is to avoid redundancy. In the normalization phase, a set of equivalency rules is applied so that the projection and selection operations included in the query are simplified to avoid redundancy.

Conjunctive normal form: a sequence of Boolean expressions connected by conjunction (AND), where each expression contains comparison terms connected by disjunction (OR). For example:

$$(\text{emp\_desig} = \text{"programmer"} \lor \text{empsal} > 40000) \land \text{loc} = \text{"mumbai"}$$

Disjunctive normal form: a sequence of Boolean expressions connected by disjunction (OR), where each expression contains comparison terms connected by conjunction (AND). For example:

$$(\text{emp\_desig} = \text{"programmer"} \land \text{loc} = \text{"mumbai"}) \lor (\text{empsal} > 40000 \land \text{loc} = \text{"mumbai"})$$
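Going from the CNF example to the DNF example is a matter of distributing AND over OR: each DNF conjunct picks one term from every CNF clause. A minimal sketch, with predicates kept as opaque strings; the function name `cnf_to_dnf` is hypothetical, and it performs no further simplification (such as absorption) beyond dropping repeated terms.

```python
from itertools import product


def cnf_to_dnf(cnf):
    """Convert CNF to DNF by distributing AND over OR.

    cnf: list of clauses ANDed together; each clause is a list of atomic
         predicates ORed together.
    Returns a list of conjuncts ORed together; each conjunct is a list of
    atoms ANDed together.
    """
    dnf = []
    for choice in product(*cnf):                # pick one atom from every clause
        conjunct = list(dict.fromkeys(choice))  # drop repeated atoms, keep order
        if conjunct not in dnf:
            dnf.append(conjunct)
    return dnf


cnf = [['emp_desig = "programmer"', "empsal > 40000"], ['loc = "mumbai"']]
for conjunct in cnf_to_dnf(cnf):
    print(" AND ".join(conjunct))
```

The number of conjuncts is the product of the clause sizes, so this conversion can grow exponentially for queries with many OR clauses.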

Example

Let us consider the following two relations stored in a distributed database:

Employee (empid, ename, salary, designation, deptno)

Department (deptno, dname, location)

and the following query: "Retrieve the names of all employees whose designation is Manager and whose department name is Production or Printing."

In SQL, the above query can be represented as:

SELECT ename FROM Employee, Department WHERE designation = 'Manager' AND Employee.deptno = Department.deptno AND (dname = 'Production' OR dname = 'Printing');

The parentheses are required because AND binds more tightly than OR in SQL.

The conjunctive normal form of the query is:

$$\text{designation} = \text{"Manager"} \land \text{Employee.deptno} = \text{Department.deptno} \land (\text{dname} = \text{"Production"} \lor \text{dname} = \text{"Printing"})$$

The disjunctive normal form of the same query is:

$$(\text{designation} = \text{"Manager"} \land \text{Employee.deptno} = \text{Department.deptno} \land \text{dname} = \text{"Production"}) \lor (\text{designation} = \text{"Manager"} \land \text{Employee.deptno} = \text{Department.deptno} \land \text{dname} = \text{"Printing"})$$

Hence, in the disjunctive normal form above, each disjunct connected by the ∨ (OR) operator can be processed as an independent conjunctive subquery.
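To see why the disjuncts can be processed independently, the sketch below runs each conjunctive subquery on its own and unions the results, then compares that with evaluating the CNF predicate directly. The sample rows are invented. It also evaluates the predicate without parentheses: Python, like SQL, binds `and` more tightly than `or`, so that version pairs every employee with the Printing department.

```python
# Hypothetical sample data for Employee and Department.
employees = [
    {"empid": 1, "ename": "Meera", "designation": "Manager", "deptno": 10},
    {"empid": 2, "ename": "Arjun", "designation": "Clerk", "deptno": 10},
    {"empid": 3, "ename": "Kiran", "designation": "Manager", "deptno": 20},
    {"empid": 4, "ename": "Sara", "designation": "Manager", "deptno": 30},
]
departments = [
    {"deptno": 10, "dname": "Production", "location": "Mumbai"},
    {"deptno": 20, "dname": "Printing", "location": "Pune"},
    {"deptno": 30, "dname": "Sales", "location": "Delhi"},
]


def subquery(dname):
    """One DNF conjunct: designation = Manager AND join condition AND dname = <dname>."""
    return {e["ename"] for e in employees for d in departments
            if e["designation"] == "Manager" and e["deptno"] == d["deptno"]
            and d["dname"] == dname}


# DNF: evaluate each conjunctive subquery independently, then take the union.
dnf_result = subquery("Production") | subquery("Printing")

# CNF: evaluate the whole predicate at once.
cnf_result = {e["ename"] for e in employees for d in departments
              if e["designation"] == "Manager" and e["deptno"] == d["deptno"]
              and (d["dname"] == "Production" or d["dname"] == "Printing")}

# Without parentheses: parsed as (... AND dname = 'Production') OR dname = 'Printing'.
unparenthesized = {e["ename"] for e in employees for d in departments
                   if e["designation"] == "Manager" and e["deptno"] == d["deptno"]
                   and d["dname"] == "Production" or d["dname"] == "Printing"}

print(sorted(dnf_result), sorted(cnf_result), sorted(unparenthesized))
```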

Equivalency Rules

An equivalence rule says that expressions of two forms are equivalent. By applying these rules, we can transform the qualification of the query into an equivalent CNF or DNF.

CNF: only tuples that satisfy all the expressions qualify.

DNF: the result is the union of the tuples that satisfy each of the expressions.

1. Commutativity of UNARY operations:

UNARYOP1 UNARYOP2 REL <-> UNARYOP2 UNARYOP1 REL

$$\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$$

2. Associativity of BINARY operations:

REL1 BINOP (REL2 BINOP REL3) <-> (REL1 BINOP REL2) BINOP REL3

$$(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$$

3. Idempotency of UNARY operations:

UNARYOP1 UNARYOP2 REL <-> UNARYOP REL

For example, a cascade of selections can be combined into a single selection: $\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_1 \land \theta_2}(E)$.

4. Distributivity of UNARY operations with respect to BINARY operations:

UNARYOP (REL1 BINOP REL2) <-> UNARYOP (REL1) BINOP UNARYOP (REL2)

$$\pi_L(E_1 \cup E_2) = \pi_L(E_1) \cup \pi_L(E_2)$$

5. Factorisation of UNARY operations:

UNARYOP (REL1) BINOP UNARYOP (REL2) <-> UNARYOP (REL1 BINOP REL2)
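The rules can be checked on concrete data by modelling relations as sets of tuples. The sketch below uses a hypothetical schema and rows, with helper names `select` and `project` chosen for illustration. Agreement on one dataset illustrates a rule rather than proving it.

```python
# Relations modeled as sets of tuples over a fixed, hypothetical schema.
SCHEMA = ("empid", "ename", "salary", "deptno")
E1 = {(1, "Meera", 52000, 10), (2, "Arjun", 31000, 10), (3, "Kiran", 47000, 20)}
E2 = {(4, "Sara", 45000, 30), (2, "Arjun", 31000, 10)}


def select(pred, rel):
    """sigma_pred(rel): keep the tuples whose attribute dict satisfies pred."""
    return {t for t in rel if pred(dict(zip(SCHEMA, t)))}


def project(attrs, rel):
    """pi_attrs(rel): keep only the listed attributes (sets remove duplicates)."""
    idx = [SCHEMA.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in rel}


def theta1(r):
    return r["salary"] > 40000


def theta2(r):
    return r["deptno"] == 10


# Rule 1: selections commute.
rule1 = select(theta1, select(theta2, E1)) == select(theta2, select(theta1, E1))
# Rule 3: a cascade of selections collapses into one selection on the conjunction.
rule3 = select(theta1, select(theta2, E1)) == select(lambda r: theta1(r) and theta2(r), E1)
# Rules 4 and 5: projection distributes over union (read right-to-left, it factorises).
rule4 = project(("ename",), E1 | E2) == project(("ename",), E1) | project(("ename",), E2)
print(rule1, rule3, rule4)
```

Distributivity does not hold for every pairing of operators; for example, projection does not distribute over set difference in general.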

Semantic Analyser

The semantic analyser is applied to normalized queries. It rejects:

Incorrectly formulated queries: components of the condition do not contribute to the generation of the result.

Contradictory queries: the qualification condition cannot be satisfied by any tuple.

Incorrectness and contradiction in a query are detected from the corresponding query graph or relation connection graph.
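The slides detect contradictions through a graph. The sketch below swaps in a simpler technique, per-attribute interval intersection, which only covers conjunctions of comparisons between an attribute and a numeric constant. The function name `is_contradictory` and the input format are hypothetical, and the domain is treated as dense (real-valued), so integer-only gaps are not detected.

```python
import math


def is_contradictory(conjuncts):
    """Return True if an AND of comparisons can never be satisfied.

    conjuncts: list of (attribute, op, number) with op in '<', '<=', '>', '>=', '='.
    Each attribute's domain is treated as dense (real-valued), so integer-only
    gaps such as x > 1 AND x < 2 are not detected.
    """
    bounds = {}  # attribute -> (low, low_inclusive, high, high_inclusive)
    for attr, op, value in conjuncts:
        low, low_inc, high, high_inc = bounds.get(attr, (-math.inf, False, math.inf, False))
        if op in (">", ">=", "="):                 # tighten the lower bound
            inc = op != ">"
            if value > low or (value == low and not inc):
                low, low_inc = value, inc
        if op in ("<", "<=", "="):                 # tighten the upper bound
            inc = op != "<"
            if value < high or (value == high and not inc):
                high, high_inc = value, inc
        bounds[attr] = (low, low_inc, high, high_inc)
    # An attribute whose interval is empty makes the whole conjunction unsatisfiable.
    return any(low > high or (low == high and not (low_inc and high_inc))
               for low, low_inc, high, high_inc in bounds.values())


print(is_contradictory([("empsal", ">", 50000), ("empsal", "<", 40000)]))
```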

Relation Connection Graph for Incorrectness

A node is created in the query graph for the result and for each base relation specified in the query.

An edge between two nodes is drawn in the query graph for each join operation and for each project operation in the query. An edge between two nodes that are not result nodes represents a join operation, while an edge whose destination is the result node represents a project operation.

A node in the query graph that is not the result node is labeled by a select operation or a self-join operation specified in the query.

A join graph for a query is a subgraph of the relation connection graph that represents only the join operations specified in the query; it can be derived from the corresponding query graph.

Example

Let us consider the following two relations:

Student (s-id, sname, address, course-id, year)

Course (course-id, course-name, duration, course-fee, intake-no, coordinator)

and the query "Retrieve the names, addresses, and course names of all those students whose year of admission is 2008 and whose course duration is 4 years."

Using SQL, the above query can be represented as:

SELECT sname, address, course-name FROM Student, Course WHERE year = 2008 AND duration = 4 AND Student.course-id = Course.course-id;

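[Figure 11.4(a) Query Graph: nodes Student, Course, and Result. Student is labeled year = 2008 and Course is labeled duration = 4; the edge Student.course-id = Course.course-id joins Student and Course; projection edges carry sname, address (from Student) and course-name (from Course) to Result.]

[Figure 11.4(b) Join Graph: nodes Student and Course linked by the edge Student.course-id = Course.course-id.]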

In the above SQL query, if the join condition between the two relations (that is, Student.course-id = Course.course-id) is missing, then there is no edge between the nodes representing Student and Course in the corresponding query graph (Figure 11.4(a)). The two relations are then not linked by any join, so the join graph (Figure 11.4(b)) is disconnected and the SQL query is semantically incorrect. In this case, either the query is rejected or an implicit Cartesian product between the relations is assumed.
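The check described above amounts to a graph-connectivity test over the join edges. A minimal breadth-first-search sketch, where the function name `join_graph_connected` and its input format (relation names plus one pair per join predicate) are hypothetical:

```python
from collections import defaultdict, deque


def join_graph_connected(relations, join_edges):
    """Return True if every relation is reachable from every other through joins.

    relations: base relations named in the FROM clause.
    join_edges: (rel_a, rel_b) pairs, one per join predicate in the WHERE clause.
    """
    adjacency = defaultdict(set)
    for a, b in join_edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    relations = list(relations)
    if not relations:
        return True
    seen = {relations[0]}
    queue = deque([relations[0]])
    while queue:                      # breadth-first search over join edges
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return set(relations) <= seen


# With the join predicate: connected. Without it: disconnected, so the query is
# rejected or treated as a Cartesian product.
print(join_graph_connected(["Student", "Course"], [("Student", "Course")]))
print(join_graph_connected(["Student", "Course"], []))
```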
