query processing in distributed database

Post on 26-Nov-2015

DESCRIPTION

Adapted from: Connolly, T. and Begg, C. Database Systems: A Practical Approach to Design, Implementation, and Management, 4th Ed., Pearson Education Limited, 2005.

TRANSCRIPT

Distributed Query Optimization

Bayu Adhi Tama, ST, MTI
bayu@unsri.ac.id

• Query decomposition: takes a query expressed on global relations and performs partial optimization using centralized query optimization (QO) techniques. Output is some form of relational algebra tree (RAT) based on global relations.

• Data localization: takes into account how the data has been distributed. Replaces global relations at the leaves of the RAT with their reconstruction algorithms.

• Global optimization: uses statistical information to find a near-optimal execution plan. Output is an execution strategy based on fragments, with communication primitives added.

• Local optimization: each local DBMS performs its own local optimization using centralized QO techniques.

Data Localization

• In query processing (QP), represent query as a relational algebra tree (RAT) and, using transformation rules, restructure tree into an equivalent form that improves processing.

• In distributed query processing (DQP), need to consider data distribution.

• Replace global relations at leaves of tree with their reconstruction algorithms, i.e. relational algebra (RA) operations that reconstruct global relations from fragments:
– For horizontal fragmentation, reconstruction algorithm is Union;
– For vertical fragmentation, it is Join.
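
The two reconstruction algorithms can be sketched on toy data. This is only an illustration: relations are modelled as lists of dicts, the helper names `union` and `natural_join` are hypothetical, and every row is invented sample data.

```python
# Toy relations as lists of dicts; all rows below are invented sample data.

def union(*fragments):
    """Reconstruct a horizontally fragmented relation: union of its fragments."""
    rows = []
    for fragment in fragments:
        for row in fragment:
            if row not in rows:  # set semantics: drop duplicate tuples
                rows.append(row)
    return rows


def natural_join(left, right, key):
    """Reconstruct a vertically fragmented relation: join fragments on the key."""
    index = {row[key]: row for row in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


# Horizontal fragments of Branch, split on branchNo.
b1 = [{"branchNo": "B003", "city": "Glasgow"}]
b2 = [{"branchNo": "B005", "city": "London"}]
branch = union(b1, b2)

# Vertical fragments of Staff; both keep the key staffNo.
s1 = [{"staffNo": "SG37", "position": "Assistant"}]
s2 = [{"staffNo": "SG37", "fName": "Ann", "lName": "Beech"}]
staff = natural_join(s1, s2, "staffNo")
```

Here `branch` holds the rows of both horizontal fragments, and `staff` holds one row per staffNo with the attributes of both vertical fragments merged.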

• Then use reduction techniques to generate simpler and optimized query.

• Consider reduction techniques for following types of fragmentation:
– Primary horizontal fragmentation.
– Vertical fragmentation.
– Derived fragmentation.

Reduction for Primary Horizontal Fragmentation

• If selection predicate contradicts definition of fragment, this produces empty intermediate relation and operations can be eliminated.

• For join, commute join with union.

• Then examine each individual join to determine whether there are any useless joins that can be eliminated from result.

• A useless join exists if fragment predicates do not overlap.

Example 23.2 Reduction for PHF

SELECT *
FROM Branch b, PropertyForRent p
WHERE b.branchNo = p.branchNo AND p.type = 'Flat';

P1: σ branchNo='B003' ∧ type='House' (PropertyForRent)

P2: σ branchNo='B003' ∧ type='Flat' (PropertyForRent)

P3: σ branchNo!='B003' (PropertyForRent)

B1: σ branchNo='B003' (Branch)

B2: σ branchNo!='B003' (Branch)
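
The reduction for this example can be traced in a short sketch. It assumes a deliberately simple predicate model in which each fragment maps an attribute to an equality or inequality constraint; the names `conflict` and `contradicts` are hypothetical, and the model does not cover ranges or compound conditions.

```python
from itertools import product

# A fragment predicate maps attribute -> ("eq", value) or ("ne", value).
P = {
    "P1": {"branchNo": ("eq", "B003"), "type": ("eq", "House")},
    "P2": {"branchNo": ("eq", "B003"), "type": ("eq", "Flat")},
    "P3": {"branchNo": ("ne", "B003")},
}
B = {
    "B1": {"branchNo": ("eq", "B003")},
    "B2": {"branchNo": ("ne", "B003")},
}


def conflict(c1, c2):
    """True if two constraints on the same attribute can never both hold."""
    (op1, v1), (op2, v2) = c1, c2
    if op1 == "eq" and op2 == "eq":
        return v1 != v2
    if op1 == "ne" and op2 == "ne":
        return False
    return v1 == v2  # one side is "eq v", the other "ne v"


def contradicts(pred_a, pred_b):
    """True if the predicates disagree on any attribute they share."""
    return any(conflict(pred_a[a], pred_b[a]) for a in pred_a.keys() & pred_b.keys())


# Step 1: push the selection type = 'Flat' down; drop fragments it contradicts.
selection = {"type": ("eq", "Flat")}
live_p = [name for name, pred in P.items() if not contradicts(pred, selection)]

# Step 2: commute join with union, then drop joins whose fragment predicates
# do not overlap on the join attribute branchNo.
joins = [
    (p, b)
    for p, b in product(live_p, B)
    if not contradicts({"branchNo": P[p]["branchNo"]}, B[b])
]
```

Under this model P1 is eliminated by the selection, and only the joins of P2 with B1 and of P3 with B2 remain; the other two partial joins are useless.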

Reduction for Vertical Fragmentation

• Reduction for vertical fragmentation involves removing those vertical fragments that have no attributes in common with projection attributes, except the key of the relation.

Example 23.3 Reduction for Vertical Fragmentation

SELECT fName, lName
FROM Staff;

S1: Π staffNo, position, sex, DOB, salary (Staff)

S2: Π staffNo, fName, lName, branchNo (Staff)
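
A minimal sketch of this reduction, assuming each vertical fragment is described only by its attribute list; `reduce_vertical` is a hypothetical helper name.

```python
def reduce_vertical(fragments, projection, key):
    """Keep fragments that supply at least one non-key projected attribute."""
    wanted = set(projection) - {key}
    return [
        name
        for name, attrs in fragments.items()
        if wanted & (set(attrs) - {key})
    ]


fragments = {
    "S1": ["staffNo", "position", "sex", "DOB", "salary"],
    "S2": ["staffNo", "fName", "lName", "branchNo"],
}
needed = reduce_vertical(fragments, ["fName", "lName"], key="staffNo")
```

For the query above only S2 is needed, so the join with S1 can be dropped. If a projection asked for the key alone, this sketch would return no fragment, so a fuller version would have to keep one fragment in that case.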

Reduction for Derived Fragmentation

• Use transformation rule that allows join and union to be commuted.

• Use knowledge that fragmentation of one relation is based on the other; after commuting, some of the partial joins are redundant.

Example 23.4 Reduction for Derived Fragmentation

SELECT *
FROM Branch b, Client c
WHERE b.branchNo = c.branchNo AND b.branchNo = 'B003';

B1 = σ branchNo='B003' (Branch)

B2 = σ branchNo!='B003' (Branch)

Ci = Client ⋉ branchNo Bi, i = 1, 2 (semijoin on branchNo)
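
The redundancy can be checked on toy data. The rows below are invented, and `semijoin` and `join` are hypothetical helpers written only for this sketch.

```python
def semijoin(left, right, attr):
    """Rows of `left` that have a match in `right` on `attr`."""
    values = {row[attr] for row in right}
    return [row for row in left if row[attr] in values]


def join(left, right, attr):
    """Equijoin of two lists of dicts on a shared attribute."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]


branch = [{"branchNo": "B003", "city": "Glasgow"}, {"branchNo": "B005", "city": "London"}]
client = [{"clientNo": "CR74", "branchNo": "B003"}, {"clientNo": "CR56", "branchNo": "B005"}]

# Primary fragments of Branch, and Client fragments derived from them.
b1 = [b for b in branch if b["branchNo"] == "B003"]
b2 = [b for b in branch if b["branchNo"] != "B003"]
c1 = semijoin(client, b1, "branchNo")
c2 = semijoin(client, b2, "branchNo")

# The selection branchNo = 'B003' contradicts B2, so only B1 survives.
# After commuting join with union, the partial joins are B1 with C1 and B1 with C2.
partial = {
    "B1-C1": join(b1, c1, "branchNo"),
    "B1-C2": join(b1, c2, "branchNo"),
}
```

Because C2 was derived from B2, the join of B1 with C2 is empty and can be dropped, leaving only the join of B1 with C1.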

Global Optimization

• Objective of this layer is to take the reduced query plan from the data localization layer and find a near-optimal execution strategy.

• In distributed environment, speed of network has to be considered when comparing strategies.

• If topology is known to be that of a WAN, could ignore all costs other than network costs.

• LAN typically much faster than WAN, but still slower than disk access.
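
How network speed can change the preferred strategy is sketched below. The cost model (fixed access delay plus bits divided by transmission rate, added to local processing time) is an assumption made for illustration, and every number is hypothetical.

```python
def comm_time(bits, rate_bps, delay_s):
    """Assumed model: fixed access delay plus transfer time."""
    return delay_s + bits / rate_bps


def strategy_cost(messages, local_s, rate_bps, delay_s):
    """Total time = local processing + one comm_time per message sent."""
    return local_s + sum(comm_time(bits, rate_bps, delay_s) for bits in messages)


# Two hypothetical strategies for the same query.
ship_all = {"messages": [8_000_000], "local_s": 0.5}    # move whole relation, filter locally
ship_filtered = {"messages": [80_000], "local_s": 2.0}  # filter remotely, move the result

# Hypothetical network parameters.
WAN = {"rate_bps": 10_000, "delay_s": 1.0}
LAN = {"rate_bps": 100_000_000, "delay_s": 0.001}


def best(network):
    """Return the cheaper strategy name and both costs for a given network."""
    costs = {
        "ship_all": strategy_cost(**ship_all, **network),
        "ship_filtered": strategy_cost(**ship_filtered, **network),
    }
    return min(costs, key=costs.get), costs
```

With these made-up figures, `best(WAN)` picks `ship_filtered` because transfer time dominates, while `best(LAN)` picks `ship_all` because local processing time dominates.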

Oracle’s DDBMS Functionality

• Oracle does not support type of fragmentation discussed previously, although DBA can distribute data to achieve similar effect.

• Thus, fragmentation transparency is not supported although location transparency is.

• Discuss:
– connectivity
– global database names and database links
– transactions
– referential integrity
– heterogeneous distributed databases
– distributed QO.

Database Links

• Used to build distributed databases.

• Defines a communication path from one Oracle database to another (possibly non-Oracle) database.

• Acts as a type of remote login to remote database.

CREATE PUBLIC DATABASE LINK RENTALS.GLASGOW.NORTH.COM;

SELECT * FROM Staff@RENTALS.GLASGOW.NORTH.COM;

UPDATE Staff@RENTALS.GLASGOW.NORTH.COM
SET salary = salary*1.05;

Heterogeneous Distributed Databases

• Here one of the local DBMSs is not Oracle.

• Oracle Heterogeneous Services and a non-Oracle system-specific agent can hide distribution and heterogeneity.

• Can be accessed through:
– transparent gateways
– generic connectivity.

Transparent Gateways

Generic Connectivity

Oracle Distributed Query Optimization

• A distributed query is decomposed by the local Oracle DBMS into a number of remote queries, which are sent to the remote DBMSs for execution.

• Remote DBMSs execute queries and send results back to local node.

• Local node then performs any necessary postprocessing and returns results to user.

• Only necessary data from remote tables are extracted, thereby reducing amount of data that needs to be transferred.
