phases of distributed query processing

PHASES OF DISTRIBUTED QUERY PROCESSING -I PRESENTATION MEMBERS -ANU ISSAC -CRYSTAL CUTHINHO -LEON D’SOUZA -NEVIL D’SOUZA -ANDREA FURTADO

Upload: nevil-dsouza

Post on 26-Jan-2017


TRANSCRIPT

Page 1: Phases of distributed query processing

PHASES OF DISTRIBUTED QUERY PROCESSING - I

PRESENTATION MEMBERS

-ANU ISSAC

-CRYSTAL CUTHINHO

-LEON D’SOUZA

-NEVIL D’SOUZA

-ANDREA FURTADO

Page 2: Phases of distributed query processing

• SQL, parsing: the SQL query determines what data is to be found, but does not define the method by which the data manager searches the database.

• Indexing, hashing, views, ...

Page 3: Phases of distributed query processing
Page 4: Phases of distributed query processing

OBJECTIVES OF QUERY PROCESSING

• The first and main objective of query processing is to convert a query in a high-level language (SQL) into a low-level language (relational algebra).

• In a distributed system, a query is written on a single machine but is actually executed on several local databases, because the local databases hold the data needed to execute the query.

• A query can be processed using various techniques; the technique chosen should execute the query efficiently in a distributed environment.

• To minimize the overall cost of execution (I/O cost + CPU cost + communication cost).

• To minimize the time required to execute the query (I/O time + CPU time + communication time).

• The first phase, normalization, brings the query into a normalized form so that it is easier to process.

• It focuses specifically on the WHERE clause of the query.
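One common normalized form for a WHERE clause is conjunctive normal form (an AND of ORs). The sketch below is a minimal illustration, not the presenters' method: it assumes the predicate is already parsed into nested tuples, and it handles only AND and OR (no NOT). The function name `to_cnf` and the tuple encoding are hypothetical.

```python
def to_cnf(p):
    """Convert a predicate tree of 'and'/'or' nodes into conjunctive normal form."""
    if isinstance(p, str):              # atomic predicate such as "DUR=12"
        return p
    op, left, right = p
    left, right = to_cnf(left), to_cnf(right)
    if op == "and":
        return ("and", left, right)
    # op == "or": distribute OR over an AND found in either operand
    if isinstance(left, tuple) and left[0] == "and":
        return ("and", to_cnf(("or", left[1], right)), to_cnf(("or", left[2], right)))
    if isinstance(right, tuple) and right[0] == "and":
        return ("and", to_cnf(("or", left, right[1])), to_cnf(("or", left, right[2])))
    return ("or", left, right)


# (DUR=12 AND PNAME='CAD/CAM') OR DUR=24
predicate = ("or", ("and", "DUR=12", "PNAME='CAD/CAM'"), "DUR=24")
normalized = to_cnf(predicate)
```

Here `normalized` is an AND of two OR clauses, which is the shape the later phases can inspect clause by clause.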

Page 5: Phases of distributed query processing

2. ANALYSIS

• Lexical & Syntactical analysis

• Verification of relations & attributes

• Conflict between operations

• Checks if result is possible
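The verification of relations and attributes can be sketched as a lookup against the global schema. This is only an illustration: the EMP and ASG attributes come from the example schemas used later in these slides, while the PROJ attributes and the function name `check_references` are assumptions.

```python
SCHEMA = {
    "EMP": {"ENO", "ENAME", "TITLE"},
    "ASG": {"ENO", "PNO", "RESP", "DUR"},
    "PROJ": {"PNO", "PNAME"},  # assumed: only the attributes the example query uses
}


def check_references(relations, attributes, schema=SCHEMA):
    """Return a list of error messages for unknown or ambiguous names."""
    errors = [f"unknown relation {r}" for r in relations if r not in schema]
    known = [r for r in relations if r in schema]
    for attr in attributes:
        if "." in attr:                      # qualified name such as ASG.ENO
            rel, name = attr.split(".", 1)
            if rel not in known or name not in schema[rel]:
                errors.append(f"unknown attribute {attr}")
        else:                                # unqualified: must belong to exactly one relation
            owners = [r for r in known if attr in schema[r]]
            if not owners:
                errors.append(f"unknown attribute {attr}")
            elif len(owners) > 1:
                errors.append(f"ambiguous attribute {attr}")
    return errors
```

A query that passes returns an empty list; for example, `check_references(["EMP", "ASG"], ["ENO"])` reports that `ENO` is ambiguous.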

Page 6: Phases of distributed query processing

2. ANALYSIS

SEMANTIC ANALYSIS:

1) Connection Graph:

aka Query Graph

Page 7: Phases of distributed query processing

2. ANALYSIS

SEMANTIC ANALYSIS:

2) Join Graph:

-Subgraph of connection graph

-Only join operations considered
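A join graph can be represented as an adjacency map whose nodes are relations and whose edges are join predicates. The sketch below also tests connectivity, on the assumption (common in textbook treatments, not stated on the slides) that a disconnected graph signals a missing join predicate. The function names are hypothetical.

```python
from collections import defaultdict


def join_graph(join_predicates):
    """Build an undirected graph: nodes are relations, edges are join predicates."""
    graph = defaultdict(set)
    for left, right in join_predicates:
        graph[left].add(right)
        graph[right].add(left)
    return graph


def is_connected(relations, graph):
    """Return True if every relation is reachable from the first one."""
    if not relations:
        return True
    seen, stack = set(), [relations[0]]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, set()) - seen)
    return seen >= set(relations)


relations = ["PROJ", "ASG", "EMP"]
graph = join_graph([("ASG", "EMP"), ("ASG", "PROJ")])
connected = is_connected(relations, graph)
```

With both join predicates present the graph is connected; removing the ASG-PROJ edge leaves PROJ isolated.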

Page 8: Phases of distributed query processing

3. SIMPLIFICATION

• Detects redundant predicates

• Transforms queries – makes them simple & efficient

• Not at the cost of semantic correctness

• Checks for factors responsible for redundancy
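Redundant predicates can be removed with simple logical identities. The sketch below is a minimal illustration that applies only two of them, p AND p = p and p AND NOT p = FALSE, to a conjunction given as a list of strings. The string encoding and the name `simplify_conjunction` are assumptions.

```python
def simplify_conjunction(predicates):
    """Simplify p1 AND p2 AND ... using p AND p = p and p AND NOT p = FALSE."""
    kept = []
    for p in predicates:
        negated = p[4:] if p.startswith("NOT ") else "NOT " + p
        if negated in kept:
            return ["FALSE"]   # contradiction: the query result is empty
        if p not in kept:
            kept.append(p)     # drop exact duplicates
    return kept
```

For instance, `simplify_conjunction(["DUR=12", "ENAME='Raj'", "DUR=12"])` keeps each predicate once, so the meaning of the query is unchanged.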

Page 9: Phases of distributed query processing

QUERY RESTRUCTURING

1. Rewrites Query into equivalent Relational Algebra

2. Makes use of a Query Tree or Operator Tree

• A leaf node for every relation in the query

• A non-leaf node for every intermediate relation generated

• A root node for the result of the query

3. The sequence of operations runs from the leaves towards the root.

4. Transformation rules are applied.

Page 10: Phases of distributed query processing

EXAMPLE

• Q) Find the names of employees other than Raj who worked on the CAD/CAM project for either one or two years.

• Query:

SELECT ENAME
FROM PROJ, ASG, EMP
WHERE ASG.ENO = EMP.ENO
AND ASG.PNO = PROJ.PNO
AND ENAME <> 'Raj'
AND PROJ.PNAME = 'CAD/CAM'
AND (DUR = 12 OR DUR = 24);

Page 11: Phases of distributed query processing

ΠENAME (Project)
   |
σDUR=12 OR DUR=24 (Select)
   |
σPNAME="CAD/CAM" (Select)
   |
σENAME≠"Raj" (Select)
   |
⋈PNO (Join)
  /    \
PROJ   ⋈ENO (Join)
        /   \
      ASG   EMP
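The leaf-to-root evaluation order of the operator tree for this query can be sketched with three small helper functions. The rows below are illustrative only and do not come from the slides; `join`, `select` and `project` are hypothetical helpers, not a real library API.

```python
# Illustrative rows only; they are not from the slides.
EMP = [{"ENO": "E1", "ENAME": "Raj"}, {"ENO": "E2", "ENAME": "Mia"}]
ASG = [{"ENO": "E1", "PNO": "P1", "DUR": 12}, {"ENO": "E2", "PNO": "P1", "DUR": 24}]
PROJ = [{"PNO": "P1", "PNAME": "CAD/CAM"}]


def join(left, right, attr):
    """Join two row lists on one shared attribute."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]


def select(rows, predicate):
    """Keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, attrs):
    """Keep only the listed attributes of each row."""
    return [{a: row[a] for a in attrs} for row in rows]


# Evaluate from the leaves towards the root.
joined = join(PROJ, join(ASG, EMP, "ENO"), "PNO")
not_raj = select(joined, lambda r: r["ENAME"] != "Raj")
cad_cam = select(not_raj, lambda r: r["PNAME"] == "CAD/CAM")
one_or_two_years = select(cad_cam, lambda r: r["DUR"] in (12, 24))
result = project(one_or_two_years, ["ENAME"])
```

Each intermediate variable corresponds to one non-leaf node of the tree, and `result` corresponds to the root.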

Page 12: Phases of distributed query processing

RESTRUCTURING

Page 13: Phases of distributed query processing

FRAGMENTATION:

• Forms relational algebra queries that operate on fragmented relations.

• The fragmented query is generated by replacing each global relation in the query tree of the distributed query with its fragments.

• The resulting generic tree still has scope for restructuring and simplification.

• The generic tree is turned into a simpler, optimized query by applying reduction techniques.

• The type of fragmentation determines which reduction technique is used.

Page 14: Phases of distributed query processing

REDUCTION FOR HORIZONTAL FRAGMENTATION:

• The generic tree is reduced using either the selection operation or the join operation.

• Selection operation: a selection on a fragment produces an empty intermediate relation when the selection predicate contradicts the definition of the fragment, so that fragment can be dropped.

• Join operation: useless joins are detected by commuting joins with the union operation.

Page 15: Phases of distributed query processing

EXAMPLE

• Consider the schemas: EMP(ENO, ENAME, TITLE), ASG(ENO, PNO, RESP, DUR)

• Consider the following query and fragmentation:

Query: SELECT * FROM EMP, ASG WHERE EMP.ENO = ASG.ENO

• Horizontal fragmentation:

EMP1 = σENO ≤ "E3"(EMP)          ASG1 = σENO ≤ "E3"(ASG)

EMP2 = σ"E3" < ENO ≤ "E6"(EMP)   ASG2 = σENO > "E3"(ASG)

EMP3 = σENO > "E6"(EMP)

Page 16: Phases of distributed query processing

USING SELECTION OPERATION

• Consider the query: SELECT * FROM EMP WHERE ENO = 'E5'

• The leaf node that corresponds to the EMP relation in the generic tree is replaced by its reconstruction rule, EMP1 ∪ EMP2 ∪ EMP3.

• The selection predicate ENO = 'E5' contradicts the definitions of fragments EMP1 and EMP3, so the selections on those fragments produce empty results and only EMP2 needs to be accessed.
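The contradiction test can be sketched by checking the value in the selection predicate against each fragment definition. This is a minimal illustration: it assumes that the upper bound of EMP2 is inclusive, so that every ENO value falls in exactly one fragment, and that ENO values compare correctly as strings. The name `relevant_fragments` is hypothetical.

```python
# Fragment definitions for EMP as predicates on ENO (string comparison assumed).
FRAGMENTS = {
    "EMP1": lambda eno: eno <= "E3",
    "EMP2": lambda eno: "E3" < eno <= "E6",
    "EMP3": lambda eno: eno > "E6",
}


def relevant_fragments(eno_value, fragments=FRAGMENTS):
    """Keep only the fragments whose definition is compatible with ENO = eno_value."""
    return [name for name, holds in fragments.items() if holds(eno_value)]


needed = relevant_fragments("E5")
```

For ENO = 'E5' only EMP2 survives, so the reduced query touches a single fragment.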

Page 17: Phases of distributed query processing
Page 18: Phases of distributed query processing

Department(deptno,dname,location)

• DEPT1 = σdeptno ≤ 10(Department)

• DEPT2 = σdeptno > 10(Department)

• Assume that the fragmentation of the Employee relation is derived from Department:

• EMPi = Employee ⋉deptno DEPTi, i = 1, 2

SELECT * FROM Employee, Department WHERE Department.deptno > 10 AND Employee.deptno = Department.deptno;

Page 19: Phases of distributed query processing

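Generic query tree:

(EMP1 ∪ EMP2) ⋈Employee.deptno=Department.deptno σdeptno>10(DEPT1 ∪ DEPT2)

After reduction with selection:

(EMP1 ∪ EMP2) ⋈Employee.deptno=Department.deptno DEPT2

After reduction with join:

EMP2 ⋈Employee.deptno=Department.deptno DEPT2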
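Reduction with join can be sketched as distributing the join over both unions and then discarding the pairs that must be empty. The sketch assumes, as in the derived fragmentation above, that EMPi can only match tuples of DEPTi. The (name, index) encoding and the name `reduce_join` are hypothetical.

```python
from itertools import product


def reduce_join(emp_fragments, dept_fragments):
    """Distribute the join over both unions and drop joins that must be empty.

    Fragments are (name, index) pairs. EMPi joins only with DEPTi because
    EMPi was derived from DEPTi by a semijoin on deptno.
    """
    return [
        (e, d)
        for (e, i), (d, j) in product(emp_fragments, dept_fragments)
        if i == j
    ]


# After the selection deptno > 10 has already removed DEPT1:
remaining = reduce_join([("EMP1", 1), ("EMP2", 2)], [("DEPT2", 2)])
```

Only the pair (EMP2, DEPT2) remains, which matches the reduced query EMP2 ⋈ DEPT2.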

Page 20: Phases of distributed query processing