CHAPTER - IV
RULE-BASED APPROACH FOR DISTRIBUTED QUERY OPTIMIZATION
Database systems are computer systems that store a large
number of facts about some subject. Design and implementation of
database systems is a developing and most challenging field of
computer science. Many techniques have been developed to enable
the efficient representation, storage and retrieval of a number of
facts. In a centralised or distributed database system, query
optimization is the most complex task. The subject of query
processing becomes interesting when it requires deductive reasoning
with facts in the database while retrieving answers to given queries.

There are several problems in designing such intelligent
information retrieval systems. First, it is very difficult to
build a system that can understand queries stated in a natural
language like English. The language understanding problem is
solved by specifying some formal machine-understandable query
languages, but the second problem, deducing answers from stored
facts, remains as it is. Third, understanding a query and deducing
an answer to it may require more knowledge than is
represented in the database. This type of common knowledge is often
required for solving such problems. Artificial intelligence
methods are required to represent such common knowledge and also
in designing systems that use common knowledge.
Many researchers developed intelligent extensible database
systems such as EXODUS [12], PROBE [26], Starburst [45] and
POSTGRES [101]. Freytag [33] and Graefe [40] proposed rule-based
query optimization. In [6] an approach to automatic rule derivation
for semantic query optimization was proposed. All these papers
deal with query processing in centralised database systems. From
the available literature, it appears that no intelligent system has
been developed for distributed query processing.

There are many papers [18,35,36,80,92,97,110] related to
distributed query processing. Some authors [13,82,97] consider the
fragmentation technique as part of the optimization problem and some
others [14,102] separate the fragmentation problem completely from
the query optimization problem.

There is no single approach which contains algebraic
transformations, query translation (a query over global relations
into a query over fragments) and processing of the query as part of
the optimization problem.

Most distributed query processing algorithms consider only
some operations in query qualifications, such as only joins in
join queries. Algorithms for general queries, which
allow more of the available operations, are very rare.

Different algorithms are proposed depending on the basic
assumptions made and also on the objective function. This chapter
presents an architecture of a distributed query processing system
with a rule-based approach. This model contains all three stages:
algebraic transformation, translation and processing of the query in
a distributed environment. The qualification of a query in this model
may include union and some other relational operations.
4.1. OVERVIEW OF ARTIFICIAL INTELLIGENCE:
Many human mental activities such as writing computer
programs, solving mathematical problems, engaging in common sense
reasoning, understanding language, and even driving an automobile
require 'intelligence'. Over the past few decades, several
computer systems were built that can perform the tasks described
above. Some computer systems are designed specifically for
solving problems such as diagnosis of diseases, planning the synthesis
of complex organic chemical compounds, solving differential equations
in symbolic form, analysing electronic circuits and natural language
processing. Such systems possess some degree of artificial
intelligence. These systems are considered under the field of
Artificial Intelligence (AI), since they use some AI methods and
techniques.
4.1.1. EARLY WORKS IN AI:
During the 1950s several events marked this period as the real
beginning of AI. During this period researchers like Claude
Shannon at MIT and Allen Newell at RAND Corporation developed chess
playing programs. Other types of game playing and simulation
programs were also developed during this period. Language
translation programs were developed during 1955. During the mid
1950s a summer workshop on AI was sponsored by IBM, which is
considered as the date of birth of AI. In a seminar on AI during
1972, much discussion was focused on automatic theorem proving
and new programming languages.

Some significant AI events of the 1960s include the following:
1962 --- G. W. Ernst made the first AI computer controlled
robot
1964 --- B. Raphael worked on question-answering systems
given trivial databases
1961-65 --- A. L. Samuel developed a program which learned to
play checkers at master's level
1965 --- J. A. Robinson introduced resolution as an inference method in logic
1965 --- Work on the expert system DENDRAL was begun
1968 --- Work on MACSYMA was initiated at MIT

During the 1970s AI systems were developed which had large
domain-specific knowledge bases and extensive procedural
knowledge. By the end of the 1970s, the fields of expert systems and
natural language processing had begun to emerge. By this time AI
research had already affected many fields, including program
techniques in Mathematics, Chemistry, Psychology, Geology, Oil
exploration etc. In the 1980s, special AI hardware and software
development tools began to emerge, and commercial
applications increased. At the same time research began
on fundamental issues of learning, memory structure, knowledge
representation and knowledge acquisition.
4.1.2. APPLICATIONS OF ARTIFICIAL INTELLIGENCE:
A real-time application of AI closely related to human beings
is Natural Language Processing. Developing a computer system
capable of generating and understanding fragments of a natural
language such as English is considered as natural language
processing. Grosz [44] presents a good survey of current
techniques and problems in natural language processing.

A business application of AI is Intelligent Retrieval from
Databases, that is, retrieving answers for given queries from a
database using deductive reasoning with facts in the database. Papers
describing various applications of AI and logic to database
organization and retrieval are contained in a book edited by
Gallaire and Minker [34].

AI techniques have been applied in the development of automatic
consulting systems. Many such systems employ the AI technique of
rule-based deduction. Expert consulting systems have been
developed for a variety of domains. Expert consulting systems
have been built that can diagnose diseases [104], evaluate ore
deposits [29], suggest structures for complex organic chemicals
[11] etc.

In general, finding a proof for a theorem in mathematics is
considered as an intellectual task. In AI, formalization of the
deductive process uses predicate logic. Early applications of AI
in theorem proving are plane geometry, propositional logic,
analysis, topology, set theory etc. Theorem proving techniques
are applied in building excellent resolution based systems.

Much of the theoretical research in robotics was conducted
through robot projects in the late 1960s and early 1970s. More
research on practical applications of robotics in industrial areas
was conducted during the late 1970s. Research on robotics in
theoretical as well as practical applications has helped to
develop many AI ideas. There are plenty of applications of AI in
different fields. In automatic programming, Manna and Waldinger
[74] describe a logic based method for program verification.
Lauriere [64] presents a computer language and a system for
solving combinatorial problems using AI methods. Many papers on
the problems of visual perception by machine are contained in
volumes edited by Hanson and Riseman [48]. Much research is going
on in developing AI techniques in different fields such as
automatic design, education, organic chemical synthesis etc.

In this chapter the technique of rule-based deduction for
intelligent retrieval from databases is used.
4.2. EXPERT SYSTEMS:
In [84] Newell surveyed several organizational alternatives
for problem solvers. He was concerned with how one should
proceed in designing problem-solving systems. Many techniques
have been developed in AI research and many expert systems have
been built. Expert systems are problem-solving programs that
solve substantial problems, generally considered as difficult and
requiring expertise. They are knowledge based because their
performance depends critically on the use of facts and heuristics
used by experts. Recently some textbooks on AI [79,81] present
examples of advanced programming techniques in AI. A textbook on
expert systems, containing a number of papers related to this
field, was edited by Amar Gupta and Prasad [84].

An expert system's knowledge is obtained from expert sources
and coded in a form suitable for the system to use in its
reasoning process. The expert knowledge is generally obtained
from specialists or other sources of expertise, such as texts,
journal articles and databases. This knowledge is encoded in the
required form, loaded into a knowledge base, then tested and
refined continuously throughout the life of the system.
4.2.1. DEFINITION AND CHARACTERISTICS OF EXPERT SYSTEMS:
Expert systems are non-conventional knowledge intensive
programs that solve problems normally requiring human expertise.
They perform many of the secondary functions that an expert does. By
examining the functioning of expert systems, the common
characteristics of expert systems can be summarised as follows:
---They solve very difficult problems as well as or better than
experts
---They reason heuristically, using rules which experts consider
effective
---They interact with humans in appropriate ways, including the
use of natural language
---They manipulate and reason about symbolic descriptions
---They function with erroneous data and uncertain judgmental
rules
---They process multiple hypotheses simultaneously
---They explain why they are asking a question
---They justify their conclusions
4.2.2. EVOLUTION AND BACKGROUND HISTORY:
Expert systems first emerged from the research laboratories of
a few US universities during the 1960s and 1970s. They were developed
as specialised problem solvers which emphasized the use of
knowledge rather than algorithms and general search methods.
Figure 4.1 positions expert systems around 1980-81, when efforts
began to commercialise the technology. The first company formed
exclusively to promote expert systems in the field of genetic
engineering was 'INTELLIGENETICS'. The first completed expert
system, 'DENDRAL', was developed at Stanford University in the late
1960s. The Stanford group developed the first successful learning
system, 'META-DENDRAL', and also a variety of applications in
medicine such as 'MYCIN'. Now the range and depth of applications of
expert systems are expanding in many areas.
[Figure 4.1]
4.2.3. IMPORTANCE AND APPLICATIONS:
The value of expert systems was established by the early
1980s. These systems proved to be cost effective in most
practical applications. In an application, difficult problems are
solved by experts. These experts may not always be available, or
they may retire from their services. Their knowledge and
experience can be utilised indefinitely to solve such stereotype
problems by developing an expert system with the experts' experience.
There are many applications in almost all areas of business and
government. They include areas such as:
---Different types of medical diagnoses
---Diagnosis of complex electronic and electromechanical systems
---Diagnosis of diesel electric locomotion systems
---Diagnosis of software development projects
---Planning experiments in biology, chemistry and molecular
genetics
---Forecasting crop damage
---Identification of chemical compound structures and chemical
compounds
---Evaluation of loan applicants for lending institutions
---Design of VLSI systems
---Military applications ranging from battlefield assessment to
ocean surveillance
---Assessment of geologic structures from dip meter logs
---Teaching students specialized tasks
---Planning curricula for students
4.2.4. RULE-BASED SYSTEM ARCHITECTURE:
Expert system architectures are categorised into two models
based on rule representation schemes: production rule systems, also
known as rule-based systems, and non-production rule systems.
The most common form of architecture is the rule-based system. This
type of system uses knowledge encoded in the form of production rules,
that is, if ... then ... rules. The main components of a rule-based
system are shown in Figure 4.2.
[Diagram labels: USER, EXPERT SYSTEM, INFERENCE, WORKING MEMORY]
Figure 4.2. Components of Expert System
KNOWLEDGE BASE:
The knowledge base contains facts and rules about some
specialised knowledge domain. For developing an expert system, much
domain knowledge is required. The system may then access much knowledge
to give intelligent advice related to that domain. This element
of the system is the most important while constructing an expert
system, and such systems are also known as knowledge base systems.
The knowledge base contains both declarative knowledge and
procedural knowledge.
INFERENCE PROCESS:
Simply having access to a great deal of knowledge does not
make an expert system intelligent. The component that directs the
implementation of knowledge is known as the inference engine. This
inference engine accepts user input queries and responses to
questions through the I/O interface and uses this information with
the knowledge stored in the knowledge base. This process is carried
out recursively in three stages: 1. Match, 2. Select and 3. Execute.
During the match stage, the contents of working memory are
compared to the facts and rules contained in the knowledge base. When
consistent matches are found, the corresponding rules are placed in
the conflict set. One of the rules from the conflict set is selected
for execution. The selected rule is executed and the action part of
the rule is carried out.
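The match, select and execute stages can be illustrated with a small forward-chaining loop. This is only a sketch under stated assumptions: the `Rule` class, the priority-based conflict resolution and the example facts are hypothetical and are not taken from any system cited in this chapter.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Rule:
    name: str
    conditions: FrozenSet[str]  # facts that must all be in working memory
    conclusion: str             # fact asserted when the rule fires
    priority: int = 0           # assumed conflict-resolution criterion


def run(rules: List[Rule], facts: Set[str]) -> Set[str]:
    memory = set(facts)  # working memory
    while True:
        # 1. Match: rules whose conditions hold and that add a new fact
        conflict_set = [r for r in rules
                        if r.conditions <= memory and r.conclusion not in memory]
        if not conflict_set:
            return memory
        # 2. Select: choose one rule from the conflict set
        chosen = max(conflict_set, key=lambda r: r.priority)
        # 3. Execute: carry out the action part of the chosen rule
        memory.add(chosen.conclusion)


# Hypothetical rules, for illustration only
RULES = [
    Rule("R1", frozenset({"fever", "cough"}), "flu_suspected"),
    Rule("R2", frozenset({"flu_suspected"}), "recommend_rest"),
]
print(sorted(run(RULES, {"fever", "cough"})))
# ['cough', 'fever', 'flu_suspected', 'recommend_rest']
```

The loop stops when the conflict set is empty, which corresponds to the point at which no rule in the knowledge base matches the working memory any further.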
I/O INTERFACE:
The I/O interface permits the user to communicate with the
system in a more natural way, using simple selection menus or a
language close to natural language. The communication
performed by the I/O interface is bidirectional. The AI technique of
natural language processing can be used in this interface to
communicate with the system in ordinary English and enable the
computer to respond in the same language. This type of user interface
is called a natural language front end.

WORKING MEMORY:
This is an area of memory used for storing a description of
the problem constructed by the system from the facts supplied by
the user or inferred from the knowledge base during a
consultation.

EXPLANATION MODULE:
This module provides the user with an explanation of the
reasoning process when requested. It explains the
responses to 'How the query is processed' and 'Why the query needs
certain information while processing'.

LEARNING MODULE AND HISTORY FILE:
These are not common components of expert systems. They are
provided to assist in building and refining the knowledge base.
4.2.5. RULE-BASED QUERY OPTIMIZATION:
To minimize the changes needed to build an optimizer for a
new database system, recent researchers developed extensible query
optimizers such as EXODUS [12], PROBE [26] and POSTGRES [101]. Both
Freytag [33] and Graefe [40] have proposed a rule-based view of
query optimization. This approach allows a database implementer
to specify algebraic transformations as a set of rewrite rules.
This specification is used to generate an executable conventional
query optimizer. Freytag [33] describes a rule-based approach to
generate different query plans, given an initial query
specification. He describes an approach that selects the optimal
set of algebraic transformations which can be applied to a given
query.

Graefe's system [40] is based on an optimizer generator,
which uses a set of algebraic transformations to derive an
executable conventional query optimizer. Graefe considers problems
similar to those found in semantic query optimization, such as
identification and selection of transformations based on
approximate methods. These methods use cost formulae and past
database performance to evaluate the worth of a given
transformation.

In [98] the authors describe the architecture of a system having
two interrelated components: a combined conventional/semantic query
optimizer, and an automatic rule deriver. The semantic query
optimizer is a simple generalisation of the conventional rule-based
optimizer, in which semantic transformation heuristics are used in
place of algebraic transformation heuristics.

But from the knowledge available, it is obvious that there
is very little research on a rule-based approach for distributed
query optimization. In this chapter an architecture for a
rule-based approach for distributed query processing is proposed.
4.3. HEURISTIC RULE-BASED DISTRIBUTED QUERY OPTIMIZATION:
In general, distributed query processing is executed in
two major steps: (1) translation of a query over global relations
into an equivalent query over fragments and (2)
transformation of the query into an optimally equivalent form by
applying algebraic and semantic equivalence transformations. In a
distributed environment, local and global optimizers exist at each
node. The global optimizer translates the given query into an optimal
fragmented query, and non-local segments are transmitted to the
corresponding sites, where the local optimization procedure is applied
to each segment to execute it optimally. Therefore the local optimizer
is analogous to an optimizer in a centralised system with slight
modifications. In centralised systems the goal of optimization is to
reduce processing and I/O costs, whereas in a distributed environment
the goal of optimization is to reduce communication cost.

A rule-based approach for query processing [33] and
also for semantic transformation [98] in centralised systems has
already been proposed. The same procedure can be adopted for the
second step of distributed query optimization to generate an optimal
plan by adding more transformation heuristics. Therefore the main
problem in developing a rule-based approach for distributed query
optimization is the translation of a global query into a fragmented
query. In this section the architecture of a heuristic based query
optimizer is presented and the functioning of the translation
processor is discussed in detail.

The distributed query optimizer is decomposed into two
independent components: (1) Translation Processor and (2)
Algebraic/Semantic optimizer, as shown in Figure 4.3. A query
over global relations is input to the 'Translation Processor',
and the output of this component is an optimal fragmented query,
which is the input to the second component, the 'Algebraic/Semantic
optimizer'. The output of the second component is an optimal query
plan, which is sent to the query processor to execute according to
minimization of transmission cost. In most distributed query
processing algorithms [18,80,97,110] the authors assume that the
given query is algebraically and semantically in optimal form over
fragments. Those algorithms can be applied directly to the query
plan output by this heuristic based optimizer.
[Diagram: QUERY OVER GLOBAL RELATIONS -> TRANSLATION PROCESSOR -> QUERY OVER FRAGMENTS -> ALGEBRAIC/SEMANTIC OPTIMIZER -> APPLICATION OF ANY DQP ALGORITHM -> RESULT OF QUERY]
Figure 4.3. Distributed Query Processor
4.3.1. TRANSLATION PROCESSOR:
This processor accepts a query over global relations as input.
A set of translation rules is applied to get an optimally
equivalent query over fragments. Fragment transformation
heuristics are used to identify the most promising transformations.
The size of the transformation set reduces on using these heuristics.

4.3.1.1. RULES AND FRAGMENTED TRANSFORMATIONS:
The following database scheme is used to illustrate query
examples in this section.
DOCTOR (DNUM, NAME, DEPT)
PATIENT (PNUM, NAME, DEPT, TREAT, DNUM, AMOUNT-DUE)
DEPARTMENT (DEPT, LOCATION, DIRECTOR)
These global relations are fragmented and are allocated to
different sites. Each fragment is written in the form F_{ij}, representing the ith fragment of relation F allocated at site j. All
fragments are disjoint and satisfy the completeness and reconstruction
conditions [14,102]. Let the fragments of the above relations be:
DOCTOR11 = SL_{DEPT = 'SURGERY'} (DOCTOR)
DOCTOR22 = SL_{DEPT = 'PEDIATRICS'} (DOCTOR)
DOCTOR33 = SL_{DEPT <> 'SURGERY' AND DEPT <> 'PEDIATRICS'} (DOCTOR)
PATIENT14 = PJ_{PNUM, NAME, AMOUNT-DUE} (PATIENT)
PATIENT21 = SL_{DEPT = 'SURGERY'} (PJ_{PNUM, DEPT, TREAT, DNUM} (PATIENT))
PATIENT32 = SL_{DEPT = 'PEDIATRICS'} (PJ_{PNUM, DEPT, TREAT, DNUM} (PATIENT))
PATIENT43 = SL_{DEPT <> 'SURGERY' AND DEPT <> 'PEDIATRICS'} (PJ_{PNUM, DEPT, TREAT, DNUM} (PATIENT))
DEPARTMENT11 = SL_{LOCATION = 'SVIMS'} (DEPARTMENT)
DEPARTMENT22 = SL_{LOCATION = 'SVRR'} (DEPARTMENT)
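As an illustration of the disjointness, completeness and reconstruction conditions, the horizontal fragmentation of DOCTOR can be checked on a handful of tuples. This is a minimal sketch: the sample rows are invented for the example, DOCTOR11 is assumed to be the fragment selected by DEPT = 'SURGERY', and the function names are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]

# Hypothetical sample tuples of DOCTOR(DNUM, NAME, DEPT); not data from the text.
DOCTOR: List[Row] = [
    {"DNUM": "D1", "NAME": "A", "DEPT": "SURGERY"},
    {"DNUM": "D2", "NAME": "B", "DEPT": "PEDIATRICS"},
    {"DNUM": "D3", "NAME": "C", "DEPT": "ENT"},
]

# Selection predicates of the three horizontal fragments of DOCTOR.
PREDICATES: Dict[str, Callable[[Row], bool]] = {
    "DOCTOR11": lambda r: r["DEPT"] == "SURGERY",
    "DOCTOR22": lambda r: r["DEPT"] == "PEDIATRICS",
    "DOCTOR33": lambda r: r["DEPT"] not in ("SURGERY", "PEDIATRICS"),
}


def fragment(rows: List[Row]) -> Dict[str, List[Row]]:
    """Apply each fragment's selection to the global relation."""
    return {name: [r for r in rows if pred(r)] for name, pred in PREDICATES.items()}


def disjoint_and_complete(rows: List[Row]) -> bool:
    """Every tuple must satisfy exactly one fragment predicate."""
    return all(sum(pred(r) for pred in PREDICATES.values()) == 1 for r in rows)


def reconstruct(fragments: Dict[str, List[Row]]) -> List[Row]:
    """Reconstruction of a horizontally fragmented relation is the union."""
    return [r for rows in fragments.values() for r in rows]


fragments = fragment(DOCTOR)
print(disjoint_and_complete(DOCTOR))                      # True
print(sorted(r["DNUM"] for r in reconstruct(fragments)))  # ['D1', 'D2', 'D3']
```

The same check does not apply unchanged to PATIENT, whose reconstruction combines a vertical fragment with horizontal ones through a join on PNUM.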
Reconstruction of fragments into a global relation is stored
in the fragmentation rule-base in the form of equivalence rules. In
addition to these reconstruction rules, some inference rules are
also stored in the rule-base. These rules are provided by the system
to characterise the database. For example, in this database only
the SURGERY department exists at SVIMS. This can be framed as rule
FR4. Rules are represented as shown in Figure 4.4. Each row
represents a rule in four columns: Rule-name, Left part of the
rule, Right part of the rule and the operator.
RULE-NAME  OPERATOR  LHS         RHS
FR1        =         DOCTOR      DOCTOR11_{Q1} UN DOCTOR22_{Q2} UN DOCTOR33_{Q3}
FR2        =         PATIENT     PATIENT14_{Q5} JN_{PNUM=PNUM}
                                 (PATIENT21_{Q4 AND Q1} UN PATIENT32_{Q4 AND Q2}
                                 UN PATIENT43_{Q4 AND Q3})
FR3        =         DEPARTMENT  DEPARTMENT11_{Q6} UN DEPARTMENT22_{Q7}
Figure 4.4. Sample Fragmented Rule-Base
Where
Q1 : DEPT = 'SURGERY'
Q2 : DEPT = 'PEDIATRICS'
Q3 : DEPT <> 'SURGERY' AND DEPT <> 'PEDIATRICS'
Q4 : PNUM, DEPT, TREAT, DNUM
Q5 : PNUM, NAME, AMOUNT-DUE
Q6 : LOCATION = 'SVIMS'
Q7 : LOCATION = 'SVRR'
From here onwards we use qualification identifiers (Q1, Q2, etc.) in
place of qualifications.
4.3.1.2. HEURISTIC BASED FRAGMENT TRANSFORMATIONS:
The transformation set for a query may be large and it is
impractical to consider all possible transformations. Moreover,
only a small percentage of the possible transformations are
useful. Therefore, to identify the most useful transformations,
transformation heuristics are developed. These are the same as
algebraic transformations according to the translation process.
The proposed architecture for the translation processor is shown in
Figure 4.5.
[Diagram labels: RULE-BASE, TRANSDUCER, CANONICAL QUERY TREE, GLOBAL TO FRAGMENT TRANSFORMATION MODULE, OPTIMAL FRAGMENTED QUERY PLAN]
Figure 4.5. Translation Processor
The translation process is decomposed into a series of modules,
each of which is straightforward. In this section the first three
modules of this processor are described and the fourth module is
discussed in the next section.

The process begins with a query in the form of a relational
algebraic expression. The TRANSDUCER module translates this query
into a tree representation. Succeeding modules operate on
this tree structure.

The next module, SUBSTITUTOR, substitutes all global relations with
the corresponding fragment expressions using the fragment rule-base.
For example, let the given query be
PJ_{DNUM,NAME} (SL_{Q6} (DOCTOR JN_{DEPT=DEPT} DEPARTMENT)) ....(Q1)
The canonical form Q1a obtained from rules FR1 and FR3 of the
fragmented rule-base (Figure 4.4) is
PJ_{DNUM,NAME} (SL_{Q6} ((DOCTOR11_{Q1} UN DOCTOR22_{Q2} UN DOCTOR33_{Q3})
JN_{DEPT=DEPT} (DEPARTMENT11_{Q6} UN DEPARTMENT22_{Q7}))) ....(Q1a)
For each global relation in the query, the Substitutor module
searches for a match in the LHS column of the rule-base. If a correct
match is found with '=' as the operator in the Operator column, then
the global relation in the query is replaced with the corresponding
RHS part of the rule in the rule-base. In tree format these
global relations are generally at leaf nodes and are substituted by
the corresponding equivalent expressions.
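The substitution step can be sketched as a recursive walk over the query tree. The tuple encoding of trees and the dictionary encoding of the rule-base below are assumptions made for this illustration, and qualifications on the fragments are omitted for brevity.

```python
from typing import Dict, Tuple, Union

# A tree is either a leaf (relation name) or a node:
# (operator, subscript, child, child, ...). This encoding is hypothetical.
Tree = Union[str, tuple]

# Assumed encoding of rules FR1 and FR3: LHS -> (operator, RHS tree)
RULE_BASE: Dict[str, Tuple[str, Tree]] = {
    "DOCTOR": ("=", ("UN", None, "DOCTOR11", "DOCTOR22", "DOCTOR33")),
    "DEPARTMENT": ("=", ("UN", None, "DEPARTMENT11", "DEPARTMENT22")),
}


def substitute(tree: Tree) -> Tree:
    """Replace every global relation at a leaf by the RHS of its '=' rule."""
    if isinstance(tree, str):  # leaf node
        rule = RULE_BASE.get(tree)
        if rule is not None and rule[0] == "=":
            return rule[1]
        return tree            # no matching rule: leave the leaf unchanged
    op, subscript, *children = tree
    return (op, subscript, *(substitute(child) for child in children))


# Query Q1: PJ_{DNUM,NAME} (SL_{Q6} (DOCTOR JN_{DEPT=DEPT} DEPARTMENT))
q1 = ("PJ", "DNUM,NAME",
      ("SL", "Q6",
       ("JN", "DEPT=DEPT", "DOCTOR", "DEPARTMENT")))
print(substitute(q1))
```

Only the leaves change: the operators above them are copied as they are, which is why the later transformation heuristics can work on the same tree shape.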
This fragmentation transformation module takes the canonical
form of the tree query as input, and tries to find a transformation
heuristic that applies to some subtree. Some fragmented
transformation heuristics used in the examples in this
chapter are given below:

FTH1: Push-Selection Heuristic
Push selections down to the leaves of the canonical tree and
then apply on them the relevant qualified algebraic rules. Substitute
the selection result with the empty relation if the qualification of
the result is contradictory.

FTH2: Join-Distribution Heuristic
Qualifications of the operands of a join are evaluated using the
required qualified algebraic rules. If the qualification of the result
of the join is contradictory, then replace the corresponding subtree
with the empty relation.

FTH3: Union-Distribution Heuristic
Unions must be pushed up beyond the joins to distribute joins over
unions.

FTH4: Selection-Qualification Heuristic
A selection on a qualified relation results in a qualified relation
with predicates containing the selection formula and the qualification
of the original relation.

FTH5: Projection-Qualification Heuristic
The result of a projection on a qualified relation is the original
relation having the same qualification with the selected attributes.

FTH6: Binary-Empty Heuristic
The result of any of the binary operations CP, SJ or JN is empty if
one of the operands is an empty relation.

FTH7: Union-Empty Heuristic
The union operation UN with one operand as the empty relation results
in the second operand.

In addition to the above heuristics, more
heuristics can be created according to fragmentation translation
formulas while implementing the system. The following examples
describe how a global query is translated into an equivalent
optimal fragmented query using fragment transformation heuristics.
Let the given query be
SL_{DEPT = 'ENT'} (DOCTOR) ....(Q2)
The canonical form of Q2 from the substitutor module using the
fragment rule-base is
SL_{DEPT = 'ENT'} (DOCTOR11_{Q1} UN DOCTOR22_{Q2} UN DOCTOR33_{Q3}) ....(Q2a)
By applying transformation heuristics FTH1 and FTH4, the
above query transforms into
DOCTOR11_{DEPT = 'ENT' AND Q1} UN DOCTOR22_{DEPT = 'ENT' AND Q2}
UN DOCTOR33_{DEPT = 'ENT' AND Q3} ....(Q2b)
The first two operands of the union operation of Q2b result in
empty relations because of the contradiction in qualifications.
Therefore, by applying FTH7, Q2b transforms into the fragmented form
DOCTOR33_{DEPT = 'ENT' AND Q3}
which is equivalent to SL_{DEPT = 'ENT'} (DOCTOR33) ....(Q2c)
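The elimination of fragments for query Q2 can be reproduced with a small contradiction test on qualifications. This is a sketch under assumptions: qualifications are restricted to conjunctions of `=` and `<>` comparisons with constants, and the data structures and function names are hypothetical.

```python
from typing import Dict, List, Tuple

Pred = Tuple[str, str, str]  # (attribute, operator, constant); operator is "=" or "<>"
Qual = List[Pred]            # a conjunction of predicates

# Qualifications Q1, Q2 and Q3 of the DOCTOR fragments
FRAGMENTS: Dict[str, Qual] = {
    "DOCTOR11": [("DEPT", "=", "SURGERY")],
    "DOCTOR22": [("DEPT", "=", "PEDIATRICS")],
    "DOCTOR33": [("DEPT", "<>", "SURGERY"), ("DEPT", "<>", "PEDIATRICS")],
}


def contradictory(qual: Qual) -> bool:
    """True if the conjunction can never be satisfied."""
    equals: Dict[str, str] = {}
    for attr, op, const in qual:
        if op == "=" and equals.setdefault(attr, const) != const:
            return True  # attr = a AND attr = b with a different from b
    # attr = a AND attr <> a
    return any(op == "<>" and equals.get(attr) == const
               for attr, op, const in qual)


def select_over_fragments(selection: Pred) -> List[str]:
    survivors = []
    for name, qual in FRAGMENTS.items():
        new_qual = qual + [selection]    # push the selection down and conjoin it
        if not contradictory(new_qual):  # a contradictory result is an empty relation
            survivors.append(name)       # empty operands drop out of the union
    return survivors


print(select_over_fragments(("DEPT", "=", "ENT")))  # ['DOCTOR33']
```

With the selection DEPT = 'ENT' only DOCTOR33 survives, which matches the fragmented form Q2c; with DEPT = 'SURGERY' the same test would keep only DOCTOR11.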
Consider another query
PJ_{PNUM,DNUM} (DOCTOR JN_{DNUM=DNUM} PATIENT) ....(Q3)
The canonical form of Q3 is
PJ_{PNUM,DNUM} ((DOCTOR11_{Q1} UN DOCTOR22_{Q2} UN DOCTOR33_{Q3}) JN_{DNUM=DNUM}
(PATIENT14_{Q5} JN_{PNUM=PNUM} (PATIENT21_{Q4 AND Q1} UN PATIENT32_{Q4 AND Q2}
UN PATIENT43_{Q4 AND Q3}))) ....(Q3a)
From FTH2 and FTH5, the join operation over PATIENT14 results in
empty, since it does not contain the join attribute DNUM.
Query Q3a transforms into
PJ_{PNUM,DNUM} ((DOCTOR11_{Q1} UN DOCTOR22_{Q2} UN DOCTOR33_{Q3}) JN_{DNUM=DNUM}
(PATIENT21_{Q4 AND Q1} UN PATIENT32_{Q4 AND Q2}
UN PATIENT43_{Q4 AND Q3})) ....(Q3b)
Using heuristics FTH2, FTH3 and FTH7, query Q3b results in
Q3c having only three joins. All remaining joins result in
empty relations because of the contradiction in the qualifications
of their operands.
PJ_{PNUM,DNUM} ((DOCTOR11_{Q1} JN_{DNUM=DNUM} PATIENT21_{Q4 AND Q1}) UN
(DOCTOR22_{Q2} JN_{DNUM=DNUM} PATIENT32_{Q4 AND Q2}) UN
(DOCTOR33_{Q3} JN_{DNUM=DNUM} PATIENT43_{Q4 AND Q3})) ....(Q3c)
This can be translated into the final fragmented form Q3d by
distributing the projection over the unions:
PJ_{PNUM,DNUM} (DOCTOR11_{Q1} JN_{DNUM=DNUM} PATIENT21_{Q4 AND Q1}) UN
PJ_{PNUM,DNUM} (DOCTOR22_{Q2} JN_{DNUM=DNUM} PATIENT32_{Q4 AND Q2}) UN
PJ_{PNUM,DNUM} (DOCTOR33_{Q3} JN_{DNUM=DNUM} PATIENT43_{Q4 AND Q3}) ....(Q3d)
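The reduction from nine candidate joins to three can be sketched by distributing the join over both unions and discarding pairs with contradictory qualifications. The sketch assumes, as the result of the example implies, that the DEPT value of a patient tuple agrees with the DEPT value of the doctor tuple it joins with; the dictionaries and the function name are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

# DEPT part of each fragment's qualification. Q1, Q2 and Q3 are mutually
# exclusive, so two different labels can never hold for the same DEPT value.
DOCTOR_FRAGMENTS: Dict[str, str] = {
    "DOCTOR11": "Q1", "DOCTOR22": "Q2", "DOCTOR33": "Q3",
}
PATIENT_FRAGMENTS: Dict[str, str] = {
    "PATIENT21": "Q1", "PATIENT32": "Q2", "PATIENT43": "Q3",
}


def distribute_join() -> List[Tuple[str, str]]:
    """Distribute the join over both unions and drop contradictory pairs."""
    kept = []
    for (doctor, dq), (patient, pq) in product(DOCTOR_FRAGMENTS.items(),
                                               PATIENT_FRAGMENTS.items()):
        if dq == pq:  # differing DEPT qualifications give an empty join
            kept.append((doctor, patient))
    return kept


joins = distribute_join()
print(len(joins))  # 3
print(joins)
# [('DOCTOR11', 'PATIENT21'), ('DOCTOR22', 'PATIENT32'), ('DOCTOR33', 'PATIENT43')]
```

Each surviving pair corresponds to one join of Q3c; the six discarded pairs are the joins that become empty relations and disappear from the union.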
Let us consider another query
PJ_{PNUM,AMOUNT-DUE} (PATIENT) ....(Q4)
The canonical form of Q4 is
PJ_{PNUM,AMOUNT-DUE} (PATIENT14_{Q5} JN_{PNUM=PNUM}
(PATIENT21_{Q4 AND Q1} UN PATIENT32_{Q4 AND Q2}
UN PATIENT43_{Q4 AND Q3})) ....(Q4a)
In the second operand of the join operation, the required projected
attribute AMOUNT-DUE is not available. Both attributes are
available in the first operand. Therefore, from heuristics FTH2 and
FTH5, Q4a is translated into the fragmented form
PJ_{PNUM,AMOUNT-DUE} (PATIENT14_{Q5}) ....(Q4b)
This fragment transformation module finds a rule which is
useful to translate the query. These transformations replace the
relevant subtrees with the corresponding transformed nodes. All
applicable transformations are stored in a set FR-SET. The
Transformation Selection module determines which transformations
of this list FR-SET are to be used to produce the least cost query.
4.3.1.3. TRANSFORMATION SELECTION MODULE:
As in the case of a rule-based conventional query optimizer, here
also the fragment transformations which are applicable to the query
are included in the list FR-SET. A set of applicable transformations
is created for each query. This module of the processor has to select
the subset of transformations that produces an optimal fragmented
query. A successive refinement approach to the selection process
is adopted, which is similar to that described by Graefe [40] for
selection of algebraic transformations in conventional query
optimization. Transformations are selected in order of greatest
expected cost savings. The expected cost saving is an estimate
of the saving that results from applying a transformation rule.
For example, while applying the FTH1 and FTH4 heuristics, some
relations result in empty because of contradiction in
qualification. Those empty relations are ignored so that
processing cost as well as communication cost (if it is a non-local
operation) can be saved. The cost saving estimate for each
transformation can be evaluated using the cost formulae given in
section 2.1, by calculating the cost of the query plan before
application of the transformation and the cost of the query after
the transformation heuristic.

Application of a selected transformation on the query results
in a new query tree. If this new query contains a single
operation on a single fragment, as in the case of query Q4b, then the
translation process ends. Otherwise, the new query tree is
examined again for additional transformations that may be
applicable, and these additional transformations are added to
FR-SET. This process is continued until no more transformations
are selected from FR-SET. As shown in Figure 4.5, the translation
process is organized as a feedback loop, which terminates when there
are no more transformations to be performed on the query. The
transformation selection process simulates a hill climbing
technique for finding the minimum.
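The feedback loop can be sketched as a greedy hill-climbing procedure over query plans. This is a simplified illustration and not the optimizer described by Graefe [40]: a plan is reduced to the tuple of fragments it still touches, the per-fragment costs are invented, and the loop stops as soon as no applicable transformation saves cost.

```python
from typing import Callable, Dict, List, Optional, Tuple

Query = Tuple[str, ...]                              # hypothetical plan encoding
Transformation = Callable[[Query], Optional[Query]]  # None means "not applicable"


def optimize(query: Query,
             transformations: List[Transformation],
             cost: Callable[[Query], float]) -> Query:
    while True:
        # FR-SET: results of all transformations applicable to the current query
        candidates = (t(query) for t in transformations)
        fr_set = [q for q in candidates if q is not None]
        # Expected saving = cost before the transformation - cost after it
        best = max(fr_set, key=lambda q: cost(query) - cost(q), default=None)
        if best is None or cost(best) >= cost(query):
            return query  # no transformation saves cost: a local minimum
        query = best      # apply it and feed the new query back into the loop


# Made-up per-fragment costs, for illustration only
FRAGMENT_COST: Dict[str, float] = {"DOCTOR11": 5.0, "DOCTOR22": 4.0, "DOCTOR33": 2.0}


def cost(query: Query) -> float:
    return sum(FRAGMENT_COST[f] for f in query)


def drop(name: str) -> Transformation:
    """Transformation that removes a fragment known to yield an empty relation."""
    def apply(query: Query) -> Optional[Query]:
        return tuple(f for f in query if f != name) if name in query else None
    return apply


plan = optimize(("DOCTOR11", "DOCTOR22", "DOCTOR33"),
                [drop("DOCTOR11"), drop("DOCTOR22")], cost)
print(plan)  # ('DOCTOR33',)
```

In the run above DOCTOR11 is dropped first because it gives the larger saving, then DOCTOR22, after which FR-SET is empty and the loop terminates.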
4.3.2. ALGEBRAIC-SEMANTIC QUERY OPTIMIZER:
This is similar to the rule-based optimizer given by Graefe [40].
The architecture of this optimizer is given in Figure 4.6. It is
divided into a series of modules. Input to the semantic/algebraic
transformations module is the fragmented query tree resulting from
the translation processor. This module is an integration of semantic
and conventional optimizers.
[Diagram labels: SEMANTIC/ALGEBRAIC TRANSFORMATION, QUERY, QUERY PLAN, SELECTION, QUERY ANSWER]
Figure 4.6. Algebraic/Semantic Query Optimizer
Algebraic transformations and semantic transformations are
identical and are similar to fragment transformations. But while
applying a semantic transformation, proposed rules [98] are generated
and sent to the rule-match module. This module finds a match from
the rule set. If a matching rule exists, the corresponding
transformation is added to the set OPEN; otherwise the transformation
is ignored.

When a query enters the optimizer, all algebraic
transformations are tested. Applicable transformations are
added directly to OPEN. Semantic transformations are tested and
then sent to the matching module. Only if a match occurs is the
transformation added to the set OPEN, as in [96]. Therefore the
resulting transformation set will contain both algebraic and
semantic transformations. Either type of transformation is
selected from OPEN based on estimated cost saving, and applied to
the query tree as in the previous section. As changes are made to the
tree representation, additional semantic/algebraic transformations
may become applicable. Those are again added to OPEN and
sent to the selection module. There is feedback from the selection
module to the algebraic/semantic transformation module. This process
is continued until there is no transformation that can apply to the
query or the result of the query can be obtained without accessing
the database [98].

Therefore the output of the second component is either the result of
the query or an optimal fragmented query plan which can be evaluated
using any simple distributed query processing algorithm.
In both components of the rule-based optimizer, the problem
is when and how the heuristic transformations are applied to the
query. Each heuristic is assigned a phase or set of phases
defining when the heuristic is active. For example, the fragment
transformation that pushes selection downwards to execute on all
fragments of a global relation can be applied in all phases. But the
join distribution heuristic and the corresponding qualified relational
algebraic rules can be applied only when the join over union phase
occurs. Such ordering of transformations helps reduce the
transformation rules that must be applied to the query at any given
time. The use of phases can also help to order transformations so
that early changes made to the query will produce new queries in
which other transformations can be applied. Transformations which
are active in a given phase are applied to the query. Such
ordering of transformations occurs in the EXODUS [12] optimizer.
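Phase-based activation can be sketched as a table from heuristics to the phases in which they are active. The phase names and the placement of the Union-Empty heuristic in a separate phase are assumptions made for this illustration; the text only states that selection pushing is active in all phases and that join distribution is active in the join over union phase.

```python
from typing import Dict, FrozenSet, List

# Hypothetical phase names
ALL_PHASES: FrozenSet[str] = frozenset({"selection", "join_over_union", "cleanup"})

ACTIVE_PHASES: Dict[str, FrozenSet[str]] = {
    "FTH1 push-selection": ALL_PHASES,                          # active in all phases
    "FTH2 join-distribution": frozenset({"join_over_union"}),   # only in this phase
    "FTH7 union-empty": frozenset({"cleanup"}),                 # assumed placement
}


def active_heuristics(phase: str) -> List[str]:
    """Heuristics the optimizer needs to try in the given phase."""
    return sorted(h for h, phases in ACTIVE_PHASES.items() if phase in phases)


print(active_heuristics("selection"))
# ['FTH1 push-selection']
print(active_heuristics("join_over_union"))
# ['FTH1 push-selection', 'FTH2 join-distribution']
```

Restricting the candidate list per phase is what keeps the number of rules tested against the query small at any given time.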
Another way to control the application of heuristics is to
use system performance statistics. That is, the system can maintain
information that estimates the probability of a heuristic
transformation being worthwhile. Heuristics whose estimate
is sufficiently high would be tried. A similar control was
demonstrated in [47], where a list of heuristic/subquery pairs
was ordered according to estimated cost savings. As pairs are
selected, the heuristic is applied to the subquery, and any
resulting transformation is used to change the query. This
process is continued until further selection of these pairs is not
useful to query optimization.
In this chapter a brief survey of artificial intelligence and
expert systems is presented. A rule-based approach for query
processing in distributed databases is proposed. The architecture of
the rule-based approach for distributed query processing is presented
and the functioning of the different modules is explained using simple
queries as examples.