query processing

Query Processing
Query processing is the procedure of transforming a high-level query into a correct and efficient execution plan, expressed in a low-level language, that performs the required retrievals and manipulations in the database.

[Figure: stages of query processing, from general query to query result]
Input: general query in a high-level query language (standard form), for example, SQL.
Scanning, parsing, validating: syntax checking and verification by the parser portion of the query processor of whether the relations and attributes used in the query are defined in the database. Output: correct query.
Query decomposer: translation of the relational calculus query into a relational algebra query using equivalency rules, idempotency rules, transformation rules, etc. from the global database dictionary. Output: algebraic expression.
Query optimizer: performing optimization by substituting equivalent expressions for those in the query. Output: execution plan.
Query code generator: generating code for the queries. Output: code to execute the query.
Runtime database processor: estimation of each access plan, selection of the optimal plan, and execution. Output: query result.
Other components shown: join manager, main database, database catalog, statistical data, estimation formulas, cost module.

Query Processing
As shown in the figure:
The user submits a query request, which may be in QBE or another form.
The request is first transformed into a standard high-level query language, such as SQL.
The SQL query is read by the syntax analyzer, which checks it for correctness.
The correct query is then passed to the query decomposer, which produces the algebraic expression of the query.
This expression is then passed to the query optimizer.

Query Processing
After optimization, the query optimizer generates an action plan.
These action plans are converted into query code that is finally executed by the runtime database processor.
The runtime database processor estimates the cost of each access plan and chooses the optimal one for execution.

1. Syntax Analyzer
The syntax analyzer takes the query from the user, parses it into tokens, and analyzes the tokens and their order to make sure they comply with the rules of the language grammar.
If an error is found in the query submitted by the user, the query is rejected and an error code, together with an explanation of why the query was rejected, is returned to the user.
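The tokenizing and order-checking steps can be illustrated with a minimal Python sketch. The token classes, the tiny SQL subset, and the function names `tokenize` and `check_clause_order` are assumptions made for illustration; a real syntax analyzer works from the full grammar of the query language.

```python
import re

# Hypothetical token classes for a tiny SQL subset (illustration only).
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:SELECT|FROM|WHERE|AND|OR)\b"),
    ("NUMBER", r"\d+"),
    ("STRING", r"'[^']*'"),
    ("IDENT", r"[A-Za-z_][A-Za-z_0-9]*(?:\.[A-Za-z_][A-Za-z_0-9]*)?"),
    ("OP", r"<=|>=|<>|=|<|>"),
    ("COMMA", r","),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.IGNORECASE,
)


def tokenize(query):
    """Split the query into (kind, text) tokens; reject unknown characters."""
    tokens = []
    pos = 0
    while pos < len(query):
        match = TOKEN_RE.match(query, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {query[pos]!r} at position {pos}"
            )
        if match.lastgroup != "SKIP":  # whitespace is not a token
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def check_clause_order(tokens):
    """Check only the coarse clause order SELECT ... FROM ... [WHERE ...]."""
    keywords = [text.upper() for kind, text in tokens if kind == "KEYWORD"]
    first_is_select = (
        bool(tokens)
        and tokens[0][0] == "KEYWORD"
        and tokens[0][1].upper() == "SELECT"
    )
    if not first_is_select:
        raise SyntaxError("query must start with SELECT")
    if "FROM" not in keywords:
        raise SyntaxError("missing FROM clause")
    if "WHERE" in keywords and keywords.index("WHERE") < keywords.index("FROM"):
        raise SyntaxError("WHERE must follow FROM")
    return True


tokens = tokenize("SELECT ename FROM Employee WHERE salary > 40000")
check_clause_order(tokens)
```

A query containing a character outside the assumed token classes, or one that does not start with SELECT, raises `SyntaxError` with an explanatory message, mirroring the rejection described above.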

2. Query Decomposition
The aim is to transform a high-level query into a relational algebra query:
To check whether the query is syntactically and semantically correct.
To transform the high-level query into a query graph of low-level operations (an algebraic expression).

Query Decomposition
The query decomposer goes through five stages of processing to decompose the query into a low-level algebraic expression:
Query analysis
Query normalization
Semantic analysis
Query simplifier
Query restructuring

[Figure: the query decomposer takes an SQL query and, using equivalence rules, idempotency rules, transformation rules and the data dictionary, produces an algebraic expression.]

2.1 Query Analysis
At the end of this analysis phase, the high-level query (SQL) is transformed into an internal representation that is more suitable for processing.
This internal representation is a kind of query tree.
It is a tree data structure that corresponds to a relational algebra expression.
It is also called a relational algebra tree.

Relational Algebra Tree
Leaf nodes of the tree represent the base input relations of the query.
Internal nodes of the tree represent intermediate relations, each the result of applying an operation in the algebra.
The root of the tree represents the result of the query.
The sequence of operations is directed from the leaves to the root.

Relational Algebra Tree
Mumbai_proj ← σ proj_loc=“Mumbai” (project)
Control_dept ← Mumbai_proj ⋈ deptno=dno (department)
Proj_de_mgr ← Control_dept ⋈ mgrid=empid (employee)
Result ← ∏ proj_no,deptno,name,add,dob (Proj_de_mgr)

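Relational Algebra Tree
[Figure: relational algebra tree. Leaves: project, department, employee. σ proj_loc=“Mumbai” (project) is joined with department on deptno=dno; that result is joined with employee on mgrid=empid; the root is ∏ proj_no,deptno,name,add,dob (Proj_de_mgr).]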
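A relational algebra tree of this shape can be sketched in Python as follows. The node classes, the `evaluate` function, and the sample rows are all hypothetical, introduced only to show leaves, internal nodes and root being evaluated from the leaves upward. Projection here keeps duplicate rows, which is a simplification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Relation:  # leaf node: a base input relation
    name: str
    rows: List[Row]


@dataclass
class Select:  # internal node: selection
    predicate: Callable[[Row], bool]
    child: object


@dataclass
class Join:  # internal node: equi-join on left_attr = right_attr
    left_attr: str
    right_attr: str
    left: object
    right: object


@dataclass
class Project:  # internal node (here also the root): projection
    attrs: List[str]
    child: object


def evaluate(node):
    """Evaluate the tree bottom-up, from the leaves to the root."""
    if isinstance(node, Relation):
        return node.rows
    if isinstance(node, Select):
        return [r for r in evaluate(node.child) if node.predicate(r)]
    if isinstance(node, Join):
        left, right = evaluate(node.left), evaluate(node.right)
        return [
            {**l, **r}
            for l in left
            for r in right
            if l[node.left_attr] == r[node.right_attr]
        ]
    if isinstance(node, Project):
        return [{a: r[a] for a in node.attrs} for r in evaluate(node.child)]
    raise TypeError(f"unknown node type: {type(node).__name__}")


# Made-up sample data, for illustration only.
project = Relation("project", [
    {"proj_no": 1, "proj_loc": "Mumbai", "deptno": 10},
    {"proj_no": 2, "proj_loc": "Delhi", "deptno": 20},
])
department = Relation("department", [
    {"dno": 10, "mgrid": 100},
    {"dno": 20, "mgrid": 200},
])
employee = Relation("employee", [
    {"empid": 100, "name": "Asha", "add": "Pune", "dob": "1980-01-01"},
    {"empid": 200, "name": "Ravi", "add": "Delhi", "dob": "1975-05-05"},
])

mumbai_proj = Select(lambda r: r["proj_loc"] == "Mumbai", project)
control_dept = Join("deptno", "dno", mumbai_proj, department)
proj_de_mgr = Join("mgrid", "empid", control_dept, employee)
root = Project(["proj_no", "deptno", "name", "add", "dob"], proj_de_mgr)

result = evaluate(root)
```

Because each node only calls `evaluate` on its children, the order of operations is fixed by the tree itself: selection first, then the two joins, then the projection at the root.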

Query Graph Notation
In the query graph representation, the relations in the query are represented by relation nodes.
These relation nodes are displayed as single circles.
The constant values from the query selections are represented by constant nodes, displayed as double circles.
The selection and join conditions are represented by the graph edges.
The attributes to be retrieved from each relation are displayed in square brackets above each relation.

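Query Graph Notation
[Figure: query graph. Relation nodes P, D and E; constant node “Mumbai”. Edge P to “Mumbai”: p.proj_loc=“Mumbai”. Edge P to D: p.deptno=d.deptno. Edge D to E: d.mgrid=e.empid. Attributes retrieved: [p.proj_no, p.deptno] above P and [e.ename, e.add, e.dob] above E.]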
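One possible way to hold this query graph in memory is sketched below in Python. The dictionary layout and the helper names `join_edges` and `selection_edges` are assumptions for illustration, not a standard representation.

```python
# Relation nodes (single circles) and constant nodes (double circles).
relation_nodes = {"P": "project", "D": "department", "E": "employee"}
constant_nodes = {"Mumbai"}

# Each edge is (endpoint, endpoint, condition).
edges = [
    ("P", "Mumbai", "p.proj_loc = 'Mumbai'"),  # selection condition
    ("P", "D", "p.deptno = d.deptno"),         # join condition
    ("D", "E", "d.mgrid = e.empid"),           # join condition
]

# Attributes to be retrieved from each relation (the bracketed lists).
retrieved = {
    "P": ["proj_no", "deptno"],
    "E": ["ename", "add", "dob"],
}


def join_edges(edges, relation_nodes):
    """Edges whose endpoints are both relation nodes."""
    return [e for e in edges if e[0] in relation_nodes and e[1] in relation_nodes]


def selection_edges(edges, constant_nodes):
    """Edges that connect a relation node to a constant node."""
    return [e for e in edges if e[0] in constant_nodes or e[1] in constant_nodes]


joins = join_edges(edges, relation_nodes)
selections = selection_edges(edges, constant_nodes)
```

Nothing in this structure says whether the selection or the joins should run first, which is the ordering information a query tree carries and a query graph does not.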

Disadvantages of query graph notation
It corresponds to a relational calculus expression.
It does not indicate the order in which the operations are to be performed, as a query tree does.

Query Normalization
The primary goal of normalization is to avoid redundancy.
In the normalization phase, a set of equivalency rules is applied so that the projection and selection operations included in the query are simplified to avoid redundancy.

Query Normalization
Conjunctive normal form: a sequence of boolean expressions connected by conjunction (AND).
Each expression contains terms of comparison operators connected by disjunction (OR).
(emp_desig=“programmer” ∨ empsal>40000) ∧ loc=“mumbai”

Query Normalization
Disjunctive normal form: a sequence of boolean expressions connected by disjunction (OR).
Each expression contains terms of comparison operators connected by conjunction (AND).
(emp_desig=“programmer” ∧ loc=“mumbai”) ∨ (empsal>40000 ∧ loc=“mumbai”)

Example
Let us consider the following two relations stored in a distributed database:
Employee (empid, ename, salary, designation, deptno)
Department (deptno, dname, location)
and the following query:
“Retrieve the names of all employees whose designation is Manager and department name is Production or Printing”.
In SQL, the above query can be represented as
Select ename from Employee, Department where designation = “Manager” and Employee.deptno = Department.deptno and (dname = “Production” or dname = “Printing”).

Example
The conjunctive normal form of the query is as follows:
designation = “Manager” ∧ Employee.deptno = Department.deptno ∧ (dname = “Production” ∨ dname = “Printing”)
The disjunctive normal form of the same query is
(designation = “Manager” ∧ Employee.deptno = Department.deptno ∧ dname = “Production”) ∨ (designation = “Manager” ∧ Employee.deptno = Department.deptno ∧ dname = “Printing”)
Hence, in the above disjunctive normal form, each disjunct connected by the ∨ (OR) operator can be processed as an independent conjunctive subquery.
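The step from conjunctive to disjunctive normal form is a distribution of AND over OR. A minimal Python sketch follows, assuming a qualification is stored as a list of clauses where each clause is a list of atomic predicates written as strings. The function name `cnf_to_dnf` and this list encoding are illustrative choices, not part of any particular system.

```python
from itertools import product


def cnf_to_dnf(cnf):
    """Distribute AND over OR.

    cnf: list of clauses AND-ed together; each clause is a list of atoms
         OR-ed together.
    Returns a list of conjunctions OR-ed together; each conjunction is a
    list of atoms AND-ed together.
    """
    dnf = []
    for choice in product(*cnf):  # pick one atom from every clause
        conjunction = []
        for atom in choice:
            if atom not in conjunction:  # p AND p is just p
                conjunction.append(atom)
        dnf.append(conjunction)
    return dnf


# The qualification of the Employee/Department query in CNF.
cnf = [
    ["designation = 'Manager'"],
    ["Employee.deptno = Department.deptno"],
    ["dname = 'Production'", "dname = 'Printing'"],
]
dnf = cnf_to_dnf(cnf)
```

Each element of `dnf` is one conjunctive subquery that could be evaluated on its own, with the final answer being the union of their results.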

Equivalency Rules
An equivalence rule says that expressions of two forms are equivalent.
By applying these rules we can transform the RE into an equivalent CNF or DNF.
CNF: only tuples that satisfy all expressions.
DNF: tuples that are the result of the union of tuples that satisfy the expressions.

Equivalency Rules
1. Commutativity of UNARY operations:
UNARYOP1 UNARYOP2 REL <-> UNARYOP2 UNARYOP1 REL
σθ1(σθ2(E)) = σθ2(σθ1(E))
2. Associativity of BINARY operations:
REL1 BINOP (REL2 BINOP REL3) <-> (REL1 BINOP REL2) BINOP REL3
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)

Equivalency Rules
3. Idempotency of UNARY operations:
UNARYOP1 UNARYOP2 REL <-> UNARYOP REL
4. Distributivity of UNARY operations with respect to BINARY operations:
UNARYOP (REL1 BINOP REL2) <-> UNARYOP (REL1) BINOP UNARYOP (REL2)
∏L(E1 ∪ E2) = (∏L(E1)) ∪ (∏L(E2))
5. Factorisation of UNARY operations:
UNARYOP (REL1) BINOP UNARYOP (REL2) <-> UNARYOP (REL1 BINOP REL2)
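Some of these rules can be spot-checked on small sample relations. The Python sketch below treats a relation as a set of rows, with each row stored as a sorted tuple of (attribute, value) pairs so that it is hashable. The helpers `make_row`, `select` and `project` and the sample data are assumptions for illustration. A check on one data set illustrates a rule; it does not prove it.

```python
def make_row(**attrs):
    """A row as a hashable, sorted tuple of (attribute, value) pairs."""
    return tuple(sorted(attrs.items()))


def select(predicate, relation):
    """Selection: keep the rows that satisfy the predicate."""
    return {r for r in relation if predicate(dict(r))}


def project(attrs, relation):
    """Projection with set semantics: duplicate rows collapse."""
    return {tuple((a, v) for a, v in r if a in attrs) for r in relation}


# Made-up sample relations with the same schema.
E1 = {make_row(name="A", sal=10), make_row(name="B", sal=50)}
E2 = {make_row(name="C", sal=50), make_row(name="A", sal=70)}
E = E1 | E2


def high_salary(r):
    return r["sal"] > 20


def not_c(r):
    return r["name"] != "C"


# Rule 1: two selections can be applied in either order.
commutes = select(high_salary, select(not_c, E)) == select(not_c, select(high_salary, E))

# Rule 3: repeating the same projection changes nothing.
idempotent = project(["name"], project(["name"], E)) == project(["name"], E)

# Rules 4 and 5: projection distributes over union (and factors back out).
distributes = project(["name"], E1 | E2) == project(["name"], E1) | project(["name"], E2)
```

An optimizer applies such rules symbolically to the expression; the data-level comparison here only shows that both sides of each rule produce the same tuples on this sample.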

Semantic Analyser
Applied to normalized queries.
Rejects incorrectly formulated queries: condition components do not contribute to the generation of the result.
Rejects contradictory queries: the qualification condition cannot be satisfied by any tuple.
Incorrectness and contradiction in a query are detected based on the corresponding query graph or relation connection graph.

Relation Connection Graph for Incorrectness
A node is created in the query graph for the result and for each base relation specified in the query.
An edge between two nodes is drawn in the query graph for each join operation and for each project operation in the query. An edge between two nodes that are not result nodes represents a join operation, while an edge whose destination node is the result node represents a project operation.
A node in the query graph that is not the result node is labeled by a select operation or a self-join operation specified in the query.
A join graph for a query is a subgraph of the relation connection graph that represents only the join operations specified in the query; it can be derived from the corresponding query graph.

Example
Let us consider the following two relations
Student (s-id, sname, address, course-id, year) and
Course (course-id, course-name, duration, course-fee, intake-no, coordinator)
and the query “Retrieve the names, addresses and course names of all those students whose year of admission is 2008 and course duration is 4 years”.
Using SQL, the above query can be represented as:
Select sname, address, course-name from Student, Course where year = 2008 and duration = 4 and Student.course-id = Course.course-id.

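Example
[Figure 11.4(a) Query Graph: nodes Student, Course and Result. Student is labeled year = 2008 and Course is labeled duration = 4. Edge Student to Course: Student.course-id = Course.course-id. Edge Student to Result: sname, address. Edge Course to Result: course-name.]
[Figure 11.4(b) Join Graph: nodes Student and Course, connected by the edge Student.course-id = Course.course-id.]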

Example
In the above SQL query, if the join condition between the two relations (that is, Student.course-id = Course.course-id) is missing, then there is no line between the nodes representing the relations Student and Course in the corresponding query graph (figure 11.4(a)). Hence, the SQL query is semantically incorrect, since the relation connection graph is disconnected. In this case, either the query is rejected or an implicit Cartesian product between the relations is assumed.
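The disconnection test can be sketched as a breadth-first search over the join graph. This Python sketch assumes that only the base relations are nodes and only join conditions are edges, as in the join graph of figure 11.4(b). The function name `is_connected` is hypothetical.

```python
from collections import deque


def is_connected(relations, join_edges):
    """Return True if every relation is reachable from the first one
    by following join edges (breadth-first search)."""
    relations = list(relations)
    if not relations:
        return True
    neighbours = {r: set() for r in relations}
    for a, b in join_edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = {relations[0]}
    queue = deque([relations[0]])
    while queue:
        current = queue.popleft()
        for nxt in neighbours[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(relations)


relations = ["Student", "Course"]

# With the join condition Student.course-id = Course.course-id.
with_join = is_connected(relations, [("Student", "Course")])

# With the join condition missing from the query.
without_join = is_connected(relations, [])
```

When the check fails, the analyser can either reject the query or fall back to a Cartesian product between the unconnected relations, as described above.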