Research ArticleDistributed Query Plan Generation UsingMultiobjective Genetic Algorithm
Shina Panicker1 and T V Vijay Kumar2
1 SFIO-NIC Division National Informatics Center Ministry of Information Technology New Delhi 110003 India2 School of Computer and Systems Sciences Jawaharlal Nehru University New Delhi 110067 India
Correspondence should be addressed to T V Vijay Kumar tvvijaykumarhotmailcom
Received 23 August 2013 Accepted 23 January 2014 Published 14 May 2014
Academic Editors L D S Coelho Y Jiang W Su and G A Trunfio
Copyright copy 2014 S Panicker and T V Vijay Kumar This is an open access article distributed under the Creative CommonsAttribution License which permits unrestricted use distribution and reproduction in any medium provided the original work isproperly cited
A distributed query processing strategy which is a key performance determinant in accessing distributed databases aims tominimize the total query processing cost One way to achieve this is by generating efficient distributed query plans that involvefewer sites for processing a query In the case of distributed relational databases the number of possible query plans increasesexponentially with respect to the number of relations accessed by the query and the number of sites where these relations resideConsequently computing optimal distributed query plans becomes a complex problem This distributed query plan generation(DQPG) problem has already been addressed using single objective genetic algorithm where the objective is to minimize the totalquery processing cost comprising the local processing cost (LPC) and the site-to-site communication cost (CC) In this paperthis DQPG problem is formulated and solved as a biobjective optimization problem with the two objectives being minimize totalLPC and minimize total CC These objectives are simultaneously optimized using a multiobjective genetic algorithm NSGA-IIExperimental comparison of the proposed NSGA-II based DQPG algorithm with the single objective genetic algorithm shows thatthe former performs comparatively better and converges quickly towards optimal solutions for an observed crossover andmutationprobability
1 Introduction
Advancement in technology has made it possible today togather timely and effective information from vast sourcesof data (sites) distributed geographically across a networkThe users at local sites can work independently as well ascommunicate with other sites to retrieve data for answeringglobal queries Such a setup is referred to as a DistributedDatabase System (DDS) [1 2] Query posed on a DDS isgenerally decomposed into subqueries which are processedat the respective local sites where the data resides beforebeing transmitted to another site for cumulative processingof distributed data fragments At the user end an integratedresult is displayed A distributed query processing strategyaims to minimize the overall cost of query processing insuch systems [3] The cost of query processing in a DDScomprises of two costs the local processing cost and the site-to-site communication or the transmission cost of relation
fragmentsThe total cost incurred in processing a distributedquery can thus be taken as the sum of the local processingcost at the individual participating sites and the cost ofdata communication among these sites Local processing costcomprises of the cost of join operation on relations accessedby the user query and the communication cost is proportionalto the size of relation fragments being transmitted amongsites and its cost of transmission These costs need to beminimized in order to minimize the total query processingcost
In todayrsquos scenario with a multifold increase in the size ofDDS the communication cost asserts a major impact on theoverall cost of query processing The cost incurred in com-municating data through a congested network path or thecommunication of large data units between sites with highercommunication costs can highly influence the cost of queryprocessing and thus the sequence of sites through which thedata fragments get processed has a significant impact on
Hindawi Publishing Corporatione Scientific World JournalVolume 2014 Article ID 628471 17 pageshttpdxdoiorg1011552014628471
2 The Scientific World Journal
the overall query processing cost It thus also plays a key rolein determining the overall performance of a DDS There canbe a number of possible ways to process and communicaterelation fragments involved in the query A distributed queryprocessing strategy evaluates all possible sequence of sitescorresponding to relations accessed in the query referredto as a query plan and determines the most optimal queryplan that minimizes the total cost that is local processing(CPU IO) cost and communication cost [5ndash11]The numberof possible query plans grows at least exponentially withthe increase in number of relations accessed by the query[12 13]This number increases further if relations accessed bythe query are replicated across multiple sites Performing anexhaustive search on all possible combinations of query plansis not feasible due to a vast search space Therefore in largeDDS devising a query processing strategy that optimizes thetotal query processing cost is shown to be a combinatorialoptimization problem [10]
Over the last three decades many algorithms and tech-niques have been devised to solve the class of combinatorialoptimization problems Initially the rigorous mathemati-cal and search based techniques like simulated annealingrandom search algorithms dynamic programming and soforth were used to solve such problems which thoughworked well with moderate sized problems on cost heuristiccould not succeed with complex multiobjective problemsThese mechanisms suffered from a drawback at certaininstances where they converged to local optima withoutexploring the entire search space [8 13 14] However inthe last two decades evolutionary techniques have gainedimmense popularity due to their applicability in solving thesecomplex scientific and engineering optimization problemsThese algorithms are inspired by the Darwinian evolutionthat accentuates the concept of ldquoSurvival of the Fittestrdquo [15]It is thus metaphorical to the natural social behavior andbiological evolution of species The evolutionary techniquesare now proved to be themost proficientmethod of choice forsolving such problems Genetic algorithm based techniqueswhich belong to the class of evolutionary algorithms havealso been widely used in solving complex real life science andengineering problemsThe strength of GA as a metaheuristiccomes from its ability to combine the good features fromseveral solutions to create new and better solutions [16 17]over generations
Most real world scientific and engineering problemshave often conflicting and competing objectives that needto be optimized The evolutionary strategies are proved tobe best suited for this class of problems as they can simul-taneously optimize the different objectives and find efficienttradeoffs unlike the classic techniques where the objectiveswere separately optimized and weighed based on the priorknowledge about the problem in hand The first pioneeringstudy on multiobjective evolutionary optimization came outin mideighties [18] In subsequent years several differentevolutionary algorithms (VEGA [19] MOGA [20] NPGA[21] NSGA [22] NSGA-II [4] SPEA [23] SPEA-II [19] PAES[24] PESA [25]) have successfully been implemented to solvethe classic optimization problems for example the singlesource shortest path problem [26] the all-pairs shortest path
problem [27] the multiobjective shortest path problem [28]the travelling salesman problem [29] the knapsack problem[30] and so forth Recently new evolutionary techniques forexample particle swarm optimization [31] artificial immunesystems [32] frog leaping algorithm [33] ant colony opti-mization [34] and so forth have been successfully appliedto the multiobjective optimization paradigm
This paper addresses the distributed query plan gen-eration (DQPG) problem given in [3] This problem isbased on a heuristic that favors query plans involving lessnumber of sites participating to retrieve the results Furtherquery plans involving smaller relations transmitted over lesscostly communication channels would incur less communi-cation costs and are thus favored over others Query plansgenerated based on this heuristic would result in efficientquery processing This DQPG problem was formulated andsolved as a single objective optimization problem in [3]Since this DQPG heuristic comprises minimization of boththe local processing cost and the communication cost anattempt has been made in this paper to minimize these costssimultaneouslyThat is theDQPGproblem is formulated as abiobjective optimization problem comprising two objectivesnamely minimization of the total local processing cost andminimization of the total communication cost In this paperthis problem has been solved using themultiobjective geneticalgorithm NSGA-II (nondominated sorting genetic algo-rithm) [4] The proposed NSGA-II based DQPG algorithmattempts to simultaneously minimize the two objectives withthe aim of achieving an acceptable tradeoff amongst them Itis shown that the optimization of total query processing costusing the proposed algorithm gives considerable improve-ment with respect to the time taken to converge and thequality of solutions with respect to total query processingcost when compared to the single objective GA based DQPGalgorithm given in [3]
This paper is organized as follows Section 2 discussesthe DQPG problem and its solution using the simple geneticalgorithm (SGA) given in [3] Section 3 discusses DQPGusing the multiobjective genetic algorithm An exampleillustrating the use of the proposed NSGA-II based DQPGalgorithm for generating optimal query plans for a distributedquery is given in Section 4The experimental results are givenin Section 5 Section 6 is the conclusion
2 DQPG Using SGA
This paper addresses the DQPG problem given in [3] solvedusing SGATheDQPGproblem is discussed next followed bya brief example describing the underlying methodology
21 The DQPG Problem Query plan generation is a keydeterminant for the efficient processing of a distributed queryThis necessitates devising a query plan generation strategythat would result in efficient query processing This strategywould require minimizing the total cost of query processingThe total cost incurred comprises the joint cost that is the costincurred in processing the query locally at the individual sitesand the cost of communicating the relation fragments among
The Scientific World Journal 3
the sites A distributed query processing strategy is given in[3] which aims to minimize the total query processing cost(TC) given below [3]
TC =
119898
sum
119894=1
LPC119894times 119886119894+
119894=119898minus1 119895=119898
sum
119894=1 119895=119894+1
CC119894119895times 119887119894 (1)
where LPC119894is the local processing cost per byte at site 119894 CC
119894119895
is the communication cost per byte between sites 119894 and 119895119886119894is the bytes to be processed at site 119894 119887
119894is the bytes to be
communicated from site 119894 and119898 is the total number of sitesFor each relation 119877
119905 Card(119877
119905) represents its cardinality and
Size(119877119905) represents the size of a single tuple in bytes At each
site the relations are integrated on common attributes usingthe equijoin operator to arrive at a single relation [3]
For relations 119877119905 with cardinality Card(119877
119905) and 119877
119904with
cardinality Card(119877119904) at site 119894 the cardinality Card
119894of the
resultant relation 119877119894is given as [3]
Card119894=
Card (119877119905) times Card (119877
119904)
Dist119905119904timesmin (Card (119877
119905) Card (119877
119904))
(2)
where Dist119905119904is the number of distinct tuples in the smaller
relation among 119877119905and 119877
119904
The size of the resultant relation119877119894at site 119894 is given as [1 3]
Size119894= Size (119877
119905) + Size (119877
119904) (3)
For a given query plan the communication between sitesoccurs in the order starting from the site having a relationwith lower cardinality to the site having a relation withhigher cardinality [3] The communication cost CC
119894119895and
local processing cost LPC119894are known a priori
The number of bytes to be processed locally at site 119896 isgiven by 119886
119896[3]
119886119896= Card
119894times Size
119894 (4)
The number of bytes to be communicated from site 119895 tosite 119896 is given by 119887
119895[3]
119887119895= Card
119895times Size
119895 (5)
Distributed query plans based on the above heuristic isgenerated using simple GA (SGA) in [3] This SGA basedDQPG as given in [3] is discussed next
22 SGA Based DQPG As discussed above it is a very com-plex task to generate efficient query plans from among a largeset of possible query plans An SGA based DQPG strategybased on the heuristic defined above is given in [3] whichaims to minimize the total cost of query processing (TC)indicating the fitness of a particular solution as compared toothers in the population The algorithm considers relationsaccessed by the query crossover and mutation probabilityand the prespecified number of generations (119866) as inputand produces the Top-119870 query plans as output First thealgorithm randomly generates an initial population of validquery plans (chromosomes) where the size of a query plan
is equal to the number of relations accessed by the queryEach gene in a chromosome represents a relation and theordering of relations in a chromosome is in increasing orderof their cardinality The value of a gene is the site where thecorresponding relation resides As an example for a queryaccessing four relations (1198771 1198772 1198773 and 1198774) arranged in theincreasing order of cardinalities one of the encoding schemesfor the chromosome representation can be (1 1 4 3) implyingthat 1198771 and 1198772 are in site 1 1198773 is in site 4 and 1198774 is insite 3 The fitness (TC) value is computed for each of thequery plans and thereafter the query plans are selected forcrossover using the binary tournament selection technique[35] These selected query plans undergo random single-point crossover [15 36] with probability 119875
119888 and mutation
[15 36] with probability 119875119898 The resultant new population
replaces the old population and the above process is repeatedfor the prespecified number of generations 119866 Thereafter theTop-119870 query plans are produced as output In this paperthe above single objective DQPG problem is formulated andsolved as amultiobjectiveDQPGproblem aswill be discussednext
3 DQPG Using MultiobjectiveGenetic Algorithm
In this paper the single objective DQPG problem discussedabove is formulated as a biobjective DQPG problem Thisformulation is given next
31 Multiobjective DQPG Problem Formulation In the GAbased DQPG algorithm given in [3] there is a single objec-tive that is Minimize TC It can be observed that TC com-prises two costs namely the local processing cost incurredat participating sites that is total processing cost (TPC)and communication cost between the participating sites thatis total communication cost (TCC) Since minimizing TCwould require minimizing TPC and minimizing TCC thissingle objective (Minimize TC)DQPGproblem is formulatedas a biobjective DQPG problem comprising two objectives asMinimize TPC andMinimize TCC Consider
TCC =
119894=119906minus1 119895=119906
sum
119894=1 119895=119894+1
CC119894119895times 119887119894
TPC =
119906
sum
119894=1
LPC119894times 119886119894
(6)
where 119906 is the number of sites accessed by the queryplan in ascending order of cardinality per site CC
119894119895is the
communication cost per byte between sites 119894 and 119895 LPC119894is
the local processing cost per byte at site 119894 119887119894is the bytes to be
communicated from site 119894 and 119886119894is the bytes to be processed
at site 119894 CC119894119895 LPC
119894 119887119894 and 119886
119894are as discussed in Section 21
If a site contains a single relation its LPC is consideredzero TCC and TPC need to be minimized simultaneously toachieve an acceptable tradeoff
4 The Scientific World Journal
The abovemultiobjectiveDQPGproblemhas been solvedusing the multiobjective genetic algorithm which is dis-cussed next
32 Multiobjective Genetic Algorithms Conceptualization ofmultiobjective problems using veridical models has a greatresemblance to many real world engineering and designproblems that involve more than one coextensive and oftencompeting objectives that is maximize profit maximizethroughput minimize cost minimize response time and soforth In such a scenario no single solution can be termedas optimal as in the case of single objective optimizationproblems but rather a set of alternative solutions can bevisualized as a tradeoff between the different objectives underconsideration This set of solutions is regarded superior toothers in the search space as no other recordedavailablesolution can better optimize all the objectives consideredtogether [37ndash39]
Multiobjective optimization approaches can be broadlyclassified into three categories [37] The approaches in thefirst two categories can be termed as the classical optimiza-tion approaches which combine all objectives into a singlecomposite function using some combination of arithmeticoperators ormove all but one objective into the constraint setThe approaches in the first category have limitations in regardto appropriate selection of weights and designing functions inaccordance to the problem It wouldmandate the user to havea priori knowledge of the behavior of each objective functionto some extent for providing the range of values to objectivesso that none of them dominate the others which is not alwayspossible [17] This approach is generally denominated asaggregating functions and it has been implemented at severaloccasionswith relative success in situationswhere behavior ofthe objective function ismore or lesswell-known Someof theaggregating functions include the weighted sum approachgoal programming 120576-constraintmethod and so forth [40] Inthe second approach moving the objectives into a constraintset requires that the boundary values for each of the objectivesbe known a priori which is almost impossible In either of thetwo cases the optimization method returns a single solutionrather than a set of solutions giving possible tradeoffs andtherefore the quality of solution in these approaches greatlydepends upon the correct problem formulation If feasiblethese would be the most efficient and simplest approacheswhich would give atleast sub optimal results in most cases
The third approach overcomes the problems faced inthe classical optimization approaches and emphasizes thedevelopment of alternative techniques based on exploringthe complete set of nondominated solutions and therebyenabling the decision maker to choose among the differentalternatives This set of solutions is referred to as the Paretooptimal set [13] A Pareto optimal set can be formally definedas a set of solutions that are nondominated with respectto each other that is replacing one solution with anotherwithin the Pareto optimal set will invariably lead to a lossto one objective against a gain obtained in another objective[41] Pareto optimal sets can have varied sizes but usuallythe size increases with increase in the number of objectives
[37 40] They are more preferred over single solutionsas they closely resemble real world problems where thedecision maker makes a decision based on tradeoffs betweenmultiple objectives A number of techniques were formulatedto generate the Pareto optimal set for example simulatedannealing [14] Tabu search [42] ant colony optimization[34] and so forth The problem with these algorithms wasthatmost often they get struck at local optima and thus renderit infeasible to venture out for identifying new tradeoffsEvolutionary algorithms such as GA on the other hand seemto be especially suited for this task as they enable parallelexploration of different areas in the search space eventuallyexploiting the solutions attained using operators such ascrossover and mutation [13] It would enable determiningmore members of a Pareto optimal set in a single run insteadof a series of runs required in other blind search strategiesAlso the evolutionary algorithms require very little a prioriknowledge of the problem at hand and therefore are lesssusceptible to the typical shape and continuity of the ParetofrontThe Pareto front can be defined as the points that lie onthe boundary of the Pareto optimal region These algorithmsthus avoid convergence to a suboptimal solution [43]
Mathematically a multiobjective optimization problemwith 119898 decision variables and 119899 objectives can be definedwithout any loss of generality as amaximization orminimiza-tion problem given by [13 38]
minimize = 119891 (119909) = (1198911 (119909) 1198912 (119909) 119891119899 (119909))
where119909 = (1199091 1199092 119909119898) isin 119883
119910 = (1199101 1199102 119910119899) isin 119884
(7)
Here 119909 is the decision vector 119883 refers to the parameterspace 119910 is the objective vector and 119884 defines the objectivespace These objectives may be conflicting in nature that isimprovement in one may lead to deterioration in anotherSo it may become impossible to optimize all objectivessimultaneously in a single solution Instead the best tradeoffsolution would be of interest to a decision maker Thesesolutions form a Pareto optimal set which was initially coinedby Edgeworth and Pareto and is formally defined as [13 38]
ldquoA decision vector 119886 is said to be Pareto optimal if andonly if 119886 is nondominated regarding 119883 A decision vector 119886is said to be nondominated regarding a set 1198831015840 sube 119883 if andonly if there is no vector in 119883
1015840 which dominates 119886 Formallyit can be defined aslt1198861015840 isin 119883
1015840
119886
1015840
≺ 119886rdquoAlso a decision vector 119886 isin 119883 is said to dominate a
decision vector 119887 isin 119883 (also written as 119886 ≺ 119887) if and onlyif
forall119894 isin 1 2 119899 119891119894(119886) le 119891
119894(119887)
forall119895 isin 1 2 119899 119891119895(119886) le 119891
119895(119887)
(8)
Several multiobjective algorithms exist in the literature[4 18ndash25 37 40 41 44 45] of whichGA basedmultiobjectiveoptimization algorithms have been widely used for solvingmultiobjective optimization problems In this paper NSGA-II has been used to solve the DQPG problem NSGA-II willbe discussed next
The Scientific World Journal 5
f(x
2)
f(x1)
minus 1
+ 1
Figure 1 Crowding distance for solution ldquoVrdquo [4]
321 NSGA-II The basis of NSGA-II [4] lies in the non-dominated sorting genetic algorithm (NSGA) introduced bySrinivas and Deb [22] As the name suggests NSGA usesnondominated ranking for each individual in the populationand assigns them accordingly into nondominated frontsTheindividuals in the first front or the nondominated individualsare then assigned large dummy fitness values All individualsin the front shared this fitness value based on a sharing func-tion Next the individuals in the second nondominated frontare considered and similarly assigned a dummy fitness lowerthan the fitness assigned in the previous front This processcontinues till the entire population is classified into frontsSince the solutions in the first front have themaximumfitnessvalue their chances of selection increase and eventually morecopies of such solutions get passed on to the next generationHowever NSGA suffered from some drawbacks such as highcomputational complexity119874(119898119873
3
) nonelitist approach andthe requirement of specifying a shared parameter [4] Theselimitations were addressed in NSGA-II proposed by Deb etal [4] as an improved version of NSGA [22] It alleviatesthe drawbacks in NSGA by reducing the computationalcomplexity to 119874(119898119873
2
) Further it uses a parameter-lesssharing approach by using a crowding distance measurefor selection The crowding distance is an estimate of thedensity of solutions surrounding a particular solution in theobjective space In Figure 1 the crowding distance of solutionrepresented as point V is computed as the average distancebetween the two closest solutions represented as points V minus 1
and V + 1 on either side of the points V along each of theobjectives 119891(1199091) and 119891(1199092)
NSGA-II uses a crowded-comparison operator for selec-tion which takes into account both the nondomination rankof a query plan in the population and its crowding distanceThe nondominated solutions are preferred over dominatedsolutions and between two solutions having the same ranka solution that resides in the less crowded region is preferredthat is a solution for which the crowding distance is higherTheNSGA-II does not use any externalmemory but it ensureselitism by combining the best parents with the best offspringobtained [19] In this paper an NSGA-II based multiobjective
DQPG algorithm is used to compute optimal query plansfor a given distributed query This algorithm is discussednext
33 NSGA-II Based DQPG Algorithm The proposed NSGA-II based DQPG algorithm takes the relations given in theFROM clause of the distributed query as input It arrangesthese relations in increasing order of their cardinalities Itthen generates a fixed set of feasible query plans (chromo-somes) based on the possible combinations of sites in whichthese relations are residing Each gene in a chromosomerepresents a relation and is arranged in increasing orderof the corresponding relationrsquos cardinality The value of agene represents the site in which the corresponding relationresides For example suppose that a query posed by the userhas 4 relations (1198771 1198772 1198773 and 1198774) arranged in ascendingorderThe relation 1198771 is stored in sites 1198781 and 1198783 1198772 is storedin 11987811198773 is stored in 1198781 and 1198782 and1198774 is stored in 1198781Then theinitial population of feasible query plans (chromosomes) canbe (1 1 1 1) (3 1 1 1) (3 1 2 1) and (1 1 2 1) This definesthe encoding scheme for the given problem The proposedDQPG algorithm based on NSGA-II is given in Algorithm 1The steps involved in this algorithm are discussed as follows
Step 1 (Initialize the Population [4 46]) A random popula-tion of query plans is generated as per the encoding schemediscussed above
Step 2 (EvaluateQuery Plans on theObjective Functions)For each of the query plans in the population the TCC andTPC values are computed as given below
TCC =
119894=119906minus1 119895=119906
sum
119894=1 119895=119894+1
CC119894119895times 119887119894
TPC =
119906
sum
119894=1
LPC119894times 119886119894
(9)
where 119906 is the number of sites accessed by the queryplan in ascending order of cardinality per site CC
119894119895is the
communication cost per byte between sites 119894 and 119895 LPC119894
is the local processing cost per byte at site 119894 119887119894is the bytes
to be communicated from site 119894 and 119886119894is the bytes to be
processed at site 119894 The procedure to compute CC119894119895 LPC
119894
119887119894 and 119886
119894is given in Section 21 If a site contains a single
relation its LPC is considered zero TCC and TPC needto be minimized simultaneously to achieve an acceptabletradeoff
Step 3 (Perform Nondominated Sort [4 46]) On the givenpopulation a fast nondominated sorting is performed in thefollowing manner
Twoobjective functions are consideredThefirst objectiveis to minimize the total processing cost (TPC) and thesecond objective is to minimize the total communicationcost (TCC) NSGA-II attempts to find a tradeoff betweenthese two objectives that can result in minimum total queryprocessing cost (TC)
6 The Scientific World Journal
Input 119877119894 Relations participating in the query 119875
119888 Probability of crossover
119875119898 Probability of mutation 119866 Pre-defined number of generations 119875size Population Size
Output Top-n query plansMethodInitialize a random parent population of query plans PPwhere chromosome length ldquolenrdquo is the number of relations accessed in the query and the gene at the 119896th position in thechromosome represents the site of the 119896th relationWHILE generation le 119866 DOStep 1 Evaluate each query plan in PP on the following objective functions
1198981 Minimize TCC =
119894=119906minus1119895=119906
sum
119894=1119895=119894+1
CC119894119895times 119887119894
1198982 Minimize TPC =
119906
sum
119894=1
LPC119894times 119886119894
where 119906 is the number of sites accessed by the query plan in ascending order of cardinality per site CC119894119895is the communication
cost per byte between sites 119894 and 119895 LPC119894is the local processing cost per byte at site 119894 119887
119894is the bytes to be communicated from
site 119894 and 119886119894is the bytes to be processed at site 119894
Step 2 Perform Non-Dominated (ND) Sort on PP for ldquo1198981rdquo and ldquo119898
2rdquo separately and place each query plan (QP)
into corresponding ND fronts ldquo119894rdquo and sort the QPs within each ldquo
119894rdquo
Step 3 Evaluate Crowding Distance Function 119868(119889) for each objective functionAssign 119868(119889) = infin for smallest and highest values in each front ldquo
119894rdquo
For the remaining QPs 119868(119889) is calculated as
119868 (119889119896) = 119868 (119889
119896) +
119868(119896 + 1)119898minus 119868(119896 minus 1)
119898
119891
max119898
minus 119891
min119898
where 119868(119896)119898is the value of119898th objective function of 119896th query plan in Front
119894and 119891
max119898
and 119891
min119898
are themaximum and minimum values obtained for the objective function119898Step 4 Perform Selection from PP using binary tournament selection using crowded comparison operator (≺
119899)
Step 5 Perform random single point crossover on selected chromosomes with crossover probability 119875119888
Step 6 Apply mutation on resulting population with mutation probability 119875119898
Let the resulting child population be CPStep 7 Append CP into PP and let the resulting intermediate population be IPStep 8 Repeat Step 1 and Step 2 for population IPStep 9 Form the population PP for the next generation by picking query plans Front-wise from IP till thepopulation size = 119875sizeStep 10 Increment Generation by 1ENDDOReturn Top-119870 Query Plans from population PP
Algorithm 1 NSGA-II based DQPG algorithm
In order to performanondominated sort each query planis compared with every other query plan in the population tofind if it is dominated For each query plan ldquo119894rdquo the followingtwo entities are considered
(i) 119899119894 The number of query plans that dominate the
query plan 119894(ii) 119878119894 The set of query plans that query plan 119894 dominates
All query plans that have 119899119894= 0 are added to the set
1
Set = 1where is called the current front For each
element 119909 in the current front visit each member 119895 in the set119878119909and reduce the count of 119899
119895by 1 Now if 119899
119895gets reduced to
zero for some 119895 add it to the set 2 After evaluating all the
members of in a similar manner set = 2 This process
continues till all the query plans are assigned some frontThe fast nondominated sorting procedure takes the currentpopulation as input and produces a list of nondominatedfronts
119894as output
Step 4 (Density Estimation Using Crowding Distance [446]) After the nondominated sort the crowding distanceis computed for each query plan in
119894 Crowding distance
[4] is an estimate of the density of solutions surrounding aparticular solution point in the population It is defined asthe average distance of the two closest points on either sidesof the given point along each of the objectives The crowdingdistance 119868(119889119894119904119905119886119899119888119890) is computed in the following manner[4 46]
For each front 119894 let 119873 be the number of query plans
in front 119894 Initially the crowding distance for each query
plan in the front 119894is zero That is 119868(119889
119895) = 0 119895 =
1 119873 Next for each objective function the query plansin the front
119894are sorted based on their value of TPC (ie
the first objective function) and similarly also with respectto TCC (ie the second objective function) and placed in119868(119889119894119904119905119886119899119888119890) The query plans having the smallest and thehighest 119868(119889119894119904119905119886119899119888119890) values in both sets are assigned an infinite
The Scientific World Journal 7
Rejected
Crowding distance sort PP
CP
IP
PP
Nondominatedsort
Ŧ1
Ŧ2
Ŧ3
Figure 2 NSGA-II procedure preserving elitism [4]
value for 119868(119889119894119904119905119886119899119888119890) that is 119868(1198891) = infin and 119868(119889
119873) = infin For
remaining query plans that is 119896 = 2 119873 minus 1 119868(119889119894119904119905119886119899119888119890)is computed as follows [4 46]
119868 (119889119896) = 119868 (119889
119896) +
119868(119896 + 1)119898minus 119868(119896 minus 1)
119898
119891
max119898
minus 119891
min119898
(10)
where 119868(119896)119898is the value of 119898th objective function of 119896th
query plan in front 119894and 119891
max119898
and 119891
min119898
are the maximumand minimum values obtained for the objective function119898
Step 5 (Binary Tournament Selection) After assigning thecrowding distance to the query plans in each front a selectionprocess is carried outThe selection scheme used is the binarytournament selection and it is carried out using the crowdedcomparison operator (≺
119899) [4 46] It uses two parameters as
given below(i) 120588rank (nondomination rank)The query plans in front
119894will have 120588rank = 119894
(ii) 119868119894(119889119895) (crowding distance in front 119894) 119895 = 1 119873
The crowded comparison is performed as described below [446]
For any two query plans 1199021199011and 119902119901
2 1199021199011is selected if
(1199021199011≺1198991199021199012) 1199021199011≺1198991199021199012is true if either one of the following
holds(i) 120588rank(1199021199011) lt 120588rank(1199021199012)(ii) if 119902119901
1and 119902119901
2belong to the same front (ie if
120588rank(1199021199011) = 120588rank(1199021199012)) then 119868119894(1198891199021199011
) gt 119868119894(1198891199021199012
)
Step 6 (Crossover andMutation) Crossover is performed onthe selected query plans with a given crossover probability119875119888 It ensures proper exploration of the search space by
combining the best features of the parent query plans (chro-mosomes) Mutation is performed on the given populationwith a given probability 119875
119898 It randomly changes the site
(gene) in which the corresponding relation resides within aquery plan (chromosome) The mutated gene always takes arandom value from a set of valid sites for a particular relationAfter going through the above steps the first generationpopulation is formed NSGA-II follows a different methodto produce subsequent generations in order to incorporateelitism as described next
Table 1 Initial parent population PP
[2 1 5 1] [3 5 4 5]
[3 3 1 1] [3 1 1 1]
[3 3 3 3] [3 4 4 4]
[2 4 1 5] [2 3 3 3]
[3 1 1 3] [2 1 3 4]
Step 7 (Preserving Good Solutions (Elitism) [4]) In subse-quent generations the new population after each generationis combined with the parent population and a new interme-diate population IP is created of size PP + CP where PP is theparent population and CP is the child population as shown inFigure 2
The non-dominated sort is applied to this intermediatepopulation and fronts are formed as described in Step 3Finally the population for the next generation is formedby adding solutions from each front till the population sizeexceeds 119875 If the last front to be included was
119888 which led
to the population overflow then query plans in Front 119888are
selected based on their crowding distance measure (Step 4)in descending order until the population size exceeds 119875
The above steps are repeated for ldquo119866rdquo generations and theTop-119870 query plans are produced as output
An example illustrating the use of the above NSGA-IIbased DQPG algorithm to generate query plans for a givendistributed query is given next
4 An Example
Consider the site relation matrix the communication-cost matrix the local processing cost matrix the distinct-tuple matrix and the size matrix used to compute thefitness of query plans given in [3] and shown below inFigure 3 Suppose the initial parent population PP com-prises of 10 query plans given in Table 1 Consider aquery that accesses four relations (1198771 1198772 1198773 and 1198774)which are distributed among five sites (1198781 1198782 1198783 1198784 and1198785)
8 The Scientific World Journal
R1 R2 R3 R4300200 500 700
Cardinality
R1 R2 R3 R430 40 50 20
Size
S1 S2 S3 S4 S52 2 3 3 2
S1 S2 S3 S4 S5
S1 150 180 150 160
S2 165 185 170 170
S3 160 160 170 150
S4 150 150 180 160
S5 170 160 190 150
Communication costmatrix
R1 R2 R3 R4
R1 04 05 04 04
R2 05 02 02 03
R3 04 02 02 01
R4 04 03 01 03
matrix for join
R1 R2 R3 R4S1 0 1 1 1S2 1 0 0 0S3 1 1 1 1S4 0 1 1 1S5 0 0 1 1
matrixLPCi
Distinct-tuple- Relation-site
mdash
mdashmdash
mdash
mdash
Figure 3 Matrices used for computing fitness [3]
Table 2 Nondominated sort on PP
Index [119894] Population PP 1198981= TCC 119898
2= TPC 119878
119894119899119894
Front[1] [2 1 5 1] 18020000 3746700 [4 10] 1
2
[2] [3 3 1 1] 6720000 6006000 [4 10] 1 2
[3] [3 3 3 3] 0 4200000 [2 4 7 8 9 10] mdash 1
[4] [2 4 1 5] 43440000 70793333 [10] 6 3
[5] [3 1 1 3] 14000000 2112000 [1 4 6 10] mdash 1
[6] [3 5 4 5] 17020000 3847000 [4 10] 1 2
[7] [3 1 1 1] 960000 6500000 [4 8 9 10] 1 2
[8] [3 4 4 4] 1020000 9750000 [10] 2 3
[9] [2 3 3 3] 1110000 9750000 [10] 2 3
[10] [2 1 3 4] 45090000 10514000 mdash 9 4
Table 3 Sorted fronts based on TCC and TPC
Query plans sorted on1198981= TCC
Query plans sorted on1198982= TPC
1
3 5 1
5 32
7 2 6 1 2
1 6 2 73
8 9 4 3
4 8 94
10 4
10
The computations of TPC and TCC for the query plan[2 4 1 5] are given as follows
TPC4= (LPC
2times 1198862) + (LPC
4times 1198864)
+ (LPC5times 1198865) + (LPC
1times 1198861)
TCC4= (CC
24times 11988724
) + (CC45
times 11988745
) + (CC51
times 11988751
)
(11)
whereLPC2times 1198862= 0
CC24
times 11988724
= CC24
times (Card (1198771) times Size (1198771))
= 170 times (200 times 30) = 1020000
LPC4times 1198864= LPC
4times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times (Size (1198771) + Size (1198772)) )
= 2 times (
200 times 300
05 times 200
times (70)) = 126000
CC45
times 11988745
= CC45
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times (Size (1198771) + Size (1198772)))
= 170 times (
200 times 300
05 times 200
times (70)) = 6720000
LPC5times 1198865
= LPC5
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times (Size (1198771) + Size (1198772) + Size (1198774)) )
= 2 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
)
times 90 = 420000
The Scientific World Journal 9
Table 4 Results for the objective functions on PP
Index [119894] Population PP 1198981= TCC 119898
2= TPC Front Crowding distance (CD) = 119868[119894]
[1] [2 1 5 1] 18020000 3746700 2
infin
[2] [3 3 1 1] 6720000 6006000 2
0672[3] [3 3 3 3] 0 4200000
1infin
[4] [2 4 1 5] 43440000 70793333 3
infin
[5] [3 1 1 3] 14000000 2112000 1
infin
[6] [3 5 4 5] 17020000 3847000 2
05195[7] [3 1 1 1] 960000 6500000
2infin
[8] [3 4 4 4] 1020000 9750000 3
infin
[9] [2 3 3 3] 1110000 9750000 3
infin
[10] [2 1 3 4] 45090000 10514000 4
infin
Table 5 Selection of query plans using binary tournament selection technique
IndexRandomly generated
indexes[119894] and [119895]
Tournament betweenquery plans
[119875(119894)] and [119875(119895)]
Front comparison Query plan selected (lowerfront or higher CD)
[1] [1] and [4] [2 1 5 1] and [2 4 1 5] [2] and [
3] [2 1 5 1]
[2] [2] and [3] [3 3 1 1] and [3 3 3 3] [2] and [
1] [3 3 3 3]
[3] [3] and [4] [3 3 3 3] and [2 4 1 5] [1] and [
3] [3 3 3 3]
[4] [2] and [6] [3 3 1 1] and [3 5 4 5] [2] and [
2] [3 3 1 1]
[5] [2] and [4] [3 3 1 1] and [2 4 1 5] [2] and [
3] [3 3 1 1]
[6] [1] and [3] [2 1 5 1] and [3 3 3 3] [2] and [
1] [3 3 3 3]
[7] [2] and [5] [3 3 1 1] and [3 1 1 3] [2] and [
1] [3 1 1 3]
[8] [2] and [8] [3 3 1 1] and [3 1 1 3] [2] and [
3] [3 3 1 1]
[9] [7] and [8] [3 1 1 1] and [3 1 1 3] [2] and [
3] [3 1 1 1]
[10] [2] and [9] [3 3 1 1] and [2 4 1 5] [2] and [
3] [3 3 1 1]
Table 6 Population CP after crossover and mutation
Index Population after crossover Population aftermutation (CP)
[1] [2 1 1 1] [2 1 1 1]
[2] [3 3 1 1] [3 3 1 1]
[3] [3 3 5 1] [3 3 4 1][4] [3 3 1 1] [3 3 1 1]
[5] [3 3 3 1] [3 3 3 1]
[6] [3 3 3 3] [3 3 3 3]
[7] [3 1 1 3] [3 1 1 3]
[8] [3 3 1 3] [3 3 1 3]
[9] [3 1 3 3] [3 3 3 3]
[10] [3 3 1 1] [3 3 1 1]
CC51
times 11988751
= CC51
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times (Size (1198771) + Size (1198772) + Size (1198774)) )
170 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
)
times 90 = 35700000
LPC1times 1198861
= LPC1times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times
Card (1198773)
Dist43
times Card (1198773)
times (
4
sum
119894=1
Size (119877119894)))
10 The Scientific World Journal
Table 7 Front assignment for query plans in IP
Index Intermediate population IP TCC TPC Front[1] [2 1 5 1] 18020000 3747000
2
[2] [3 3 1 1] 6720000 6006000 3
[3] [3 3 3 3] 0 4200000 1
[4] [2 4 1 5] 43440000 7079000 4
[5] [3 1 1 3] 14000000 2112000 1
[6] [3 5 4 5] 17020000 3847000 2
[7] [3 1 1 1] 960000 6500000 2
[8] [3 1 1 3] 1020000 9750000 3
[9] [2 3 3 3] 1110000 9750000 3
[10] [2 1 3 4] 45090000 10514000 5
[11] [2 1 1 1] 990000 6500000 2
[12] [3 3 1 1] 6720000 6006000 3
[13] [3 3 4 1] 90300000 8946000 5
[14] [3 3 1 1] 6720000 6006000 3
[15] [3 3 3 1] 2520000 4230000 2
[16] [3 3 3 3] 0 4200000 1
[17] [3 1 1 3] 14000000 2112000 1
[18] [3 3 1 3] 4500000 2640000 1
[19] [3 3 3 3] 0 4200000 1
[20] [3 3 1 1] 6720000 6006000 3
Table 8 Nondominated sorting for query plans in IP
Index Intermediatepopulation IP Front Crowding
distance[3] [3 3 3 3]
1
[5] [3 1 1 3] 1
[16] [3 3 3 3] 1
[17] [3 1 1 3] 1
[18] [3 3 1 3] 1
[19] [3 3 3 3] 1
[1] [2 1 5 1] 2 infin
[7] [3 1 1 1] 2 infin
[11] [2 1 1 1] 2 infin
[15] [3 3 3 1] 2 04933
[6] [3 5 4 5] 2 02292
[2] [3 3 1 1] 3
[9] [2 3 3 3] 3
[8] [3 1 1 3] 3
[12] [3 3 1 1] 3
[14] [3 3 1 1] 3
[20] [3 3 1 1] 3
[4] [2 4 1 5] 4
[10] [2 1 3 4] 5
[13] [3 3 4 1] 4
= 2 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
times
500
01 times 500
) times 140 = 65333333
(12)
Table 9 Population PP in the 2nd generation
Index Population PP[1] [3 3 3 3]
[2] [3 1 1 3]
[3] [3 3 3 3]
[4] [3 1 1 3]
[5] [3 3 1 3]
[6] [3 3 3 3]
[7] [2 1 5 1]
[8] [3 1 1 1]
[9] [2 1 1 1]
[10] [3 3 3 1]
So
TPC4= 126000 + 420000 + 65333333 = 70793333
TCC4= 1020000 + 6720000 + 35700000 = 43440000
(13)
Similarly TCC and TPC values of the other nine queryplans are computed Consider TCC and TPC of the 10 queryplans are given in Table 2 The population is then sorted intodifferent nondominated fronts as described in Step 3 of theproposed algorithm For example for query plan 1 that is[2 1 5 1] the set 119878
1= number of query plans that dominate
query plan 1 Since TCC [1] lt TCC [4] TPC [1] lt TPC [4]TCC [1] lt TCC [10] and TPC [1] lt TPC [10] the elementsof 1198781
= 4 10 Similarly the sets 1198782 119878
10are computed
and are given in Table 2 119899119894stores the count of query plans
that dominate 119894 So using the values in 119878119894 1198991
= 1 as onlyquery plan 5 is dominating 1 and 119899
2= 1 as only query plan 3
is dominating 2 Similarly 1198992 119899
10are computed and are
The Scientific World Journal 11
given in Table 2 From Table 2 it can be noted that queryplans 3 and 5 are not dominated as 119899
3= 0 119899
5= 0 So they
are assigned to the first nondominated front1The elementsin the next front are computed by reducing the count in 119899
119894
for each 119894 isin 1198783and 119894 isin 119878
5 So 119899
119894= 0 0 minus 4 minus 0 0 1 1 7
So the second front has query plans 1 2 6 and 7 Thisprocess continues till all the query plans in the population areassigned to their respective nondominated fronts The fronts1 2 3 and
4are formed and are given in Table 2
Finally the query plans are sorted separately on the valuesof TCC and TPC within each front as shown in Table 3
After the population is sorted into different fronts thecrowding distance (119868[119889119894119904119905119886119899119888119890]) computation is performedfor each query plan using the formula given in Step 4 ofthe proposed algorithm Query plans having the maximumand minimum values in each front are assigned infin distancevalues that is 119868[3] 119868[5] 119868[7] 119868[1] 119868[8] 119868[4] 119868[9] and119868[10] are assigned infin For the rest of the query plans theircrowding distance (CD) values are computed based on thesorted order of query plans in each front Initially they areassigned 119868[119889119894119904119905119886119899119888119890] = 0 For query plan 2 in front
2 the
CD computation is performed as follows
1198681198981[2] = 119868 [2] +
TCC [6] minus TCC [7]
max (TCC) minusmin (TCC)
= 0 +
17020000 minus 960000
45090000 minus 0
= 03562
119868 [2] = 1198681198981[2] +
TPC [7] minus TPC [6]
max (TPC) minusmin (TPC)
= 03562 +
6500000 minus 3847000
10514000 minus 2112000
= 06720
(14)
CD of the query plans in the given population is given inTable 4
Next binary tournament selection is performed on thepopulation on the basis of crowded comparison operator ≺
119899
This selection process is shown in Table 5The selected query plans undergo random single point
crossover with crossover probability 119875119888
= 05 Mutation isperformed on the selected population with mutation proba-bility 119875
119898= 002 The child population CP after crossover and
mutation is shown in Table 6Now in accordance with NSGA-II algorithm the popu-
lations from the second generation onward have to ensureelitism For this purpose the child population CP is com-bined with the parent population PP to generate intermediatepopulation IP This population is subjected to nondominatedsort and fronts are formed as given in Table 7
The population for the second generation is arrived atby selecting query plans based on front and within it basedon crowding distance-wise as described in Step 7 from theintermediate population IP till the actual population size 10is exceeded This selection is shown in Table 8
The population PP for the second generation is given inTable 9
The above process is repeated till a predefined number ofgenerations119866have elapsedThereafter Top-119870 query plans areproduced as output
0
01
02
03
04
05
06
07
08
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-5 query plans)
Figure 4
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
DQPGNSGA(top-10 query plans)
Figure 5
5 Experimental Results
The proposed NSGA-II based algorithm is implemented inMATLAB 77 in Windows 7 professional 64 bit OS with Intelcore i3CPUat 213GHzhaving 4GBRAMExperimentswerecarried out for a population of 100 query plans with eachquery plan involving 10 relations distributed over 50 sitesThese were performed on four datasets each comprising adifferent relation-site matrix Graphs were plotted to observechange in average TC (ATC) with respect to generations
12 The Scientific World Journal
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-15 query plans)
Figure 6
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-20 query plans)
Figure 7
and Top-119870 query plans for different pairs of crossover andmutation rates
First graphs showing the ATC values Top-5 Top-10Top-15 and Top-20 query plans averaged over four datasetsover 1000 generations for distinct pairs of crossover andmutation probability 119875
119888 119875119898 = 07 001 07 0005
075 001 075 0005 08 001 08 0005 085 001085 0005 using NSGA-II based DQPG algorithm(DQPGNSGA) were plotted These are shown in Figures 45 6 and 7 It can be observed from these graphs that the
0
005
01
015
02
025
03
035
04
045
05
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
(after 1000 generations)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGNSGA
Top-K query plans
Figure 8
0
01
02
03
04
05
06
07
08
09
1
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-5 query plans)
Figure 9
convergence to ATC is lowest in the case of 119875119888 119875119898 =
085 001 Furthermore the graph showing ATC valuesaveraged over four datasets versus Top-119870 query plans after1000 generations (Figure 8) also shows that the lowest ATCvalues achieved are for 119875
119888 119875119898 = 085 001 Thus it
can be said that DQPGNSGA performs reasonably well for119875119888 119875119898 = 085 001 In order to compare DQPGNSGA with
SGA based DQPG algorithm (DQPGSGA) similar graphswere plotted for DQPGSGA These are shown in Figures 9 1011 12 and 13 It is noted from these graphs that DQPGSGA
The Scientific World Journal 13
0
02
04
06
08
1
12
14
1618
21 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-10 query plans)
Figure 10
0
02
04
06
08
1
12
14
16
18
2
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-15 query plans)
Figure 11
also achieves convergence to the lowest ATC values for119875119888 119875119898 = 085 001
Since the two algorithms DQPGNSGA and DQPGSGAconverge to a lower ATC value for the same crossover andmutation probabilities that is 119875
119888 119875119898 = 085 001 the
comparisons of the two algorithms can be carried out forthese observed probabilities
First the two algorithmsDQPGNSGA andDQPGSGA werecompared for each of the four datasets (Dataset-1 Dataset-2 Dataset-3 and Dataset-4) on the ATC values of Top-5Top-10 Top-15 and Top-20 query plans generated over 1000
0
04
08
12
16
2
24
28
32
36
4
1 54 107
160
213
266
319
372
425
478
531
584
637
690
743
796
849
902
955
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-20 query plans)
Figure 12
0
01
02
03
04
05
06
07
08
09
1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
(after 1000 generations)
Aver
age t
otal
cost
(ATC
)
DQPGSGA
Top-K query plans
Figure 13
generations for 119875119888 119875119898 = 085 001 The corresponding
graphs for each dataset are shown in Figures 14 15 16 and 17It can be observed from the graphs thatDQPGNSGA convergesto lower ATC values for Top-5 Top-10 Top-15 and Top-20query plans generated for all datasets
Furthermore graphs comparing the ATC values of Top-119870 query plans produced by DQPGNSGA and DQPGSGA after1000 generations for the four datasets were plotted andare shown in Figures 18 19 20 and 21 These graphs alsoshow average TCC (ATCC) and average TPC (ATPC) valuesof Top-119870 query plans generated by DQPGNSGA It can beobserved from these graphs thatDQPGNSGA is able to achieve
14 The Scientific World Journal
0
05
1
15
2
25
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-1)
Figure 14
0
03
06
09
12
15
18
21
24
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-2)
Figure 15
an acceptable tradeoff between ATPC and ATCC which inturn leads to a comparatively lower ATC for the Top-119870 queryplans generated by it
Next a graph comparing the ATC values of Top-119870 queryplans generated by DQPGNSGA and DQPGSGA on all fourdatasets (DS-1 DS-2 DS-3 and DS-4) after 1000 generationsfor observed probabilities 119875
119888 119875119898 = 085 001were plotted
and is shown in Figure 22 It is noted from the graph thatDQPGNSGA performs better than DQPGSGA on the ATCvalues of Top-119870 query plans generated by the two algorithmsfor each of the four data sets
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
0
015
03
045
06
075
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-3)
Figure 16
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
0
01
02
03
04
05
06
07
08
09
1
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-4)
Figure 17
It can be reasonably inferred from all the above graphsthat DQPGNSGA is able to generate Top-119870 query plans withlower ATC when compared to those generated byDQPGSGAThis may be attributed to acceptable tradeoffs achieved whilesimultaneously optimizing TPC and TCC which results inlower TC in case of DQPGNSGA
6 Conclusion
In this paper DQPGproblemgiven in [3] has been addressedwhere query plans are generated for a distributed relational
The Scientific World Journal 15
0
005
01
015
02
025
03
035
04
045
1 3 5 7 9 11 13 15 17 19
Cos
t
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-1)
Figure 18
0
01
02
03
04
05
06
07
Cos
t
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
Top-K query plans
(Dataset-2)
Figure 19
query that incurs minimum total query processing costGenetic algorithms have been used to generate these queryplansThe total query processing cost TC in [3] can be viewedas comprising broadly of TPC and TCC and thereforeminimizing TPC and TCC would result in minimizing TCThus in this paper the single-objective DQPG problemin [3] has been formulated and solved as a biobjectiveDQPG problem with the two objectives being minimizingTPC and minimizing TCC These objectives are minimizedsimultaneously using the multiobjective genetic algorithmNSGA-II
Experiments were performed and DQPGNSGA is com-pared with DQPGSGA given in [3] It was observed thatboth the algorithms individually gave good results for thecrossover and mutation probabilities 085 and 001 respec-tively The two algorithms were then compared on the ATC
0
002
004
006
008
01
012
014
016
Cos
t
1 3 5 7 9 11 13 15 17 19
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-3)
Figure 20
0
005
01
015
02
025
Cos
t
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-4)
Figure 21
values of the Top-119870 query plans generated by them for theobserved crossover and mutation probabilities The resultsshowed that DQPGNSGA performed better than DQPGSGAAlso the performance of the former was better when the twoalgorithmswere compared on theATCvalues of Top-119870 queryplansThe better performance of DQPGNSGA over DQPGSGAmay be attributed to DQPGNSGA achieving acceptable trade-offs between TPC and TCC while minimizing TPC and TCCof Top-119870 query plans simultaneously
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
16 The Scientific World Journal
Top-K query plans
0
01
02
03
04
05
06
07
Aver
age t
otal
cost
(ATC
)
SGA(DS-1)SGA(DS-2)SGA(DS-3)SGA(DS-4)
NSGA(DS-1)NSGA(DS-2)NSGA(DS-3)NSGA(DS-4)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(after 1000 generations)
Figure 22
References
[1] S Ceri and G Pelgatti Distributed Databases Principles ampSystems International Edition McGraw-Hill Computer ScienceSeries 1985
[2] A R Hevner and S B Yao ldquoQuery processing in distributeddatabase systemsrdquo IEEE Transactions on Software Engineeringvol 5 no 3 pp 177ndash187 1979
[3] T V Vijay Kumar and S Panicker ldquoGenerating query plansfor distributed query processing using genetic algorithmrdquo inInformation Computing and Applications vol 7030 of LectureNotes in Computer Science pp 765ndash772 2011
[4] K Deb A Pratap S Agarwal and T Meyarivan ldquoA fastand elitist multiobjective genetic algorithm NSGA-IIrdquo IEEETransactions on Evolutionary Computation vol 6 no 2 pp 182ndash197 2002
[5] P Black and W Luk ldquoA new heuristic for generating semi-joinprograms for distributed query processingrdquo in Proceedings ofthe IEEE 6th International Computer Software and ApplicationConference pp 581ndash588 IEEE Chicago Ill USA November1982
[6] P Bodorik and J S Riordon ldquoA threshold mechanism fordistributed query processingrdquo in Proceedings of the 16th AnnualConference on Computer Science pp 616ndash625 Atlanta GaUnited States February 1988
[7] J Chang ldquoA heuristic approach to distributed query process-ingrdquo in Proceedings of the 8th International Conference on VeryLarge Data Bases (VLDB rsquo82) pp 54ndash61 Saratoga Calif USA1982
[8] D Kossmann ldquoThe state of the art in distributed queryprocessingrdquo ACM Computing Surveys vol 32 no 4 pp 422ndash469 2000
[9] L Stiphane and W Eugene ldquoA state transition model fordistributed query processingrdquo ACM Transactions on DatabaseSystems vol 11 no 3 pp 249ndash322 1986
[10] T V Vijay Kumar V Singh and A K Verma ldquoDistributedquery processing plans generation using genetic algorithmrdquo
International Journal of Computer Theory and Engineering vol3 no 1 pp 38ndash45 2011
[11] C T Yu and C C Chang ldquoDistributed query processingrdquoACMComputing Surveys vol 16 no 4 pp 399ndash433 1984
[12] K Bennett M C Ferris and Y E Ioannidis ldquoA Geneticalgorithm for database query optimizationrdquo in Proceedings ofthe 4th International Conference onGenetic Algorithms pp 400ndash407 1991
[13] Y E Ioannidis and Y Cha Kang ldquoRandomized algorithms foroptimizing large join queriesrdquo in Proceedings of the 1990 ACMSIGMOD International Conference on Management of Data pp312ndash321 May 1990
[14] Y Ioannidis and E Wong ldquoQuery optimization by simulatedannealingrdquo in Proceedings of the ACM SIGMOD InternationalConference on Management of Data (SIGMOD rsquo87) pp 9ndash221987
[15] A A R Townsend Genetic Algorithms A Tutorial 2003[16] M Gregory Genetic Algorithm Optimization of Distributed
Database Queries IEEE 1998[17] M Mitchell An Introduction to Genetic Algorithms The MIT
Press Cambridge Mass USA 1999[18] J D Schaffer ldquoMultiple objective optimization with vector
evaluated genetic algorithmsrdquo inProceedings of the InternationalConference on Genetic Algorithm andTheir Applications pp 93ndash100 July 1985
[19] V Guliashki H Toshev andC Korsemov ldquoSurvey of evolution-ary algorithms used in multiobjective optimizationrdquo Problemsof EngineeringCybernetics andRobotics vol 60 pp 42ndash54 2009
[20] C M Fonseca and P J Fleming ldquoGenetic algorithms formultiobjective optimization formulation discussion and gen-eralizationrdquo inProceedings of the 5th International Conference onGenetic Algorithms S Forrest Ed pp 416ndash423 San FranciscoCalif USA 1993
[21] J Horn N Nafpliotis and D E Goldberg ldquoA niched Paretogenetic algorithm for multiobjective optimizationrdquo in Proceed-ings of the 1st IEEE Conference on Evolutionary ComputationIEEE World Congress on Computational Intelligence pp 82ndash87IEEE Orlando Fla USA June 1994
[22] N Srinivas and K Deb ldquoMultiobjective optimization usingnondominated sorting in genetic algorithmsrdquo EvolutionaryComputation vol 2 no 3 pp 221ndash248 1994
[23] E Zitzler and L Thiele ldquoMultiobjective evolutionary algo-rithms a comparative case study and the strength Paretoapproachrdquo IEEETransactions onEvolutionaryComputation vol3 no 4 pp 257ndash271 1999
[24] J D Knowles and D W Corne ldquoApproximating the non-dominated front using the Pareto archived evolution strategyrdquoEvolutionary computation vol 8 no 2 pp 149ndash172 2000
[25] D W Corne J D Knowles and M J Oates ldquoThe Paretoenvelope-based selection algorithm for multiobjective opti-mizationrdquo in Proceedings of the 6th International Conference onParallel Problem Solving from Nature Springer Paris FranceSeptember 2000
[26] J Scharnow K Tinnefeld and I Wegener ldquoThe analysis of evo-lutionary algorithms on sorting and shortest paths problemsrdquoJournal of Mathematical Modelling and Algorithms vol 3 no 4pp 349ndash366 2004
[27] B Doerr E Happ and C Klein ldquoCrossover can provablybe useful in evolutionary computationrdquo in Proceedings of the10th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo08) pp 539ndash546 ACM Press July 2008
The Scientific World Journal 17
[28] C Horoba ldquoAnalysis of a simple evolutionary algorithm for themultiobjective shortest path problemrdquo in Proceedings of the 10thACM SIGEVO Workshop on Foundations of Genetic Algorithms(FOGA rsquo09) pp 113ndash120 ACM Press New York NY USAJanuary 2009
[29] MTheile ldquoExact solutions to the traveling salesperson problemby a population-based evolutionary algorithmrdquo in Proceedingsof the 9th European Conference on Evolutionary Computation inCombinatorial Optimisation (EvoCOP rsquo09) vol 5482 of LectureNotes in Computer Science pp 145ndash155 Springer April 2009
[30] A V Eremeev On Linking Dynamic Programming and Multi-Objective Evolutionary Algorithms Omsk StateUniversity 2008
[31] A V Eremeev ldquoA fully polynomial randomized approximationscheme based on an evolutionary algorithmrdquo Diskretnyi Analizi Issledovanie Operatsii vol 17 no 4 pp 3ndash17 2010
[32] T Friedrich N Hebbinghaus F Neumann J He and CWitt ldquoApproximating covering problems by randomized searchheuristics using multi-objective modelsrdquo in Proceedings of the9th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo07) pp 797ndash804 ACM Press July 2007
[33] E Elbeltagi T Hegazy and D Grierson ldquoA modified shuffledfrog-leaping optimization algorithm applications to projectmanagementrdquo Structure and Infrastructure Engineering vol 3no 1 pp 53ndash60 2007
[34] M Dorigo E Bonabeau and GTheraulaz ldquoAnt algorithms andstigmergyrdquo Future Generation Computer Systems vol 16 no 8pp 851ndash871 2000
[35] D E Goldberg and K Deb ldquoA comparative analysis of selectionschemes used in genetic algorithmsrdquo in Foundations of GeneticAlgorithms pp 69ndash93 Morgan Kaufmann San Mateo CalifUSA 1991
[36] D E GoldbergGenetic Algorithms in Search Optimization andMachine Learning Addison-Wesley Reading Mass USA 1989
[37] H Dong and Y Liang ldquoGenetic algorithms for large join queryoptimizationrdquo in Proceedings of the 9th Annual Genetic andEvolutionary Computation Conference (GECCO rsquo07) pp 1211ndash1218 London UK July 2007
[38] I F Sbalzariniy M Sibylle and P Koumoutsakosyz ldquoMulti-objective optimization using evolutionary algorithmsrdquo Centerfor Turbulence Research Proceedings of the Summer Program2000
[39] E Zitzler and L Thiele ldquoMultiobjective optimization usingevolutionary algorithmsmdasha comparative case studyrdquo in ParallelProblem Solving from Nature pp 292ndash301 Springer Amster-dam The Netherlands 1998
[40] C A C Coello ldquoAn updated survey of GA-basedmultiobjectiveoptimization techniquesrdquo ACM Computing Surveys vol 32 no2 pp 137ndash143 2000
[41] D A Van Veldhuizen and G B Lamont ldquoMultiobjective evolu-tionary algorithms analyzing the state-of-the-artrdquo Evolutionarycomputation vol 8 no 2 pp 125ndash147 2000
[42] F Glover and S Hanafi ldquoTabu search and finite convergencerdquoDiscrete Applied Mathematics vol 119 no 1-2 pp 3ndash36 2002
[43] A C Nearchou ldquoThe effect of various operators on the geneticsearch for large scheduling problemsrdquo International Journal ofProduction Economics vol 88 no 2 pp 191ndash203 2004
[44] W W Chu and P Hurley ldquoOptimal query processing fordistributed database systemsrdquo IEEE Transactions on Computersvol 31 no 9 pp 835ndash850 1982
[45] M Serpell and J E Smith ldquoSelf-adaptation ofmutation operatorand probability for permutation representations in genetic
algorithmsrdquo Evolutionary Computation vol 18 no 3 pp 491ndash514 2010
[46] A Seshadri A Fast Elitist Multiobjective Genetic AlgorithmNSGA-II MATLAB Central 2006
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
2 The Scientific World Journal
the overall query processing cost It thus also plays a key rolein determining the overall performance of a DDS There canbe a number of possible ways to process and communicaterelation fragments involved in the query A distributed queryprocessing strategy evaluates all possible sequence of sitescorresponding to relations accessed in the query referredto as a query plan and determines the most optimal queryplan that minimizes the total cost that is local processing(CPU IO) cost and communication cost [5ndash11]The numberof possible query plans grows at least exponentially withthe increase in number of relations accessed by the query[12 13]This number increases further if relations accessed bythe query are replicated across multiple sites Performing anexhaustive search on all possible combinations of query plansis not feasible due to a vast search space Therefore in largeDDS devising a query processing strategy that optimizes thetotal query processing cost is shown to be a combinatorialoptimization problem [10]
Over the last three decades many algorithms and tech-niques have been devised to solve the class of combinatorialoptimization problems Initially the rigorous mathemati-cal and search based techniques like simulated annealingrandom search algorithms dynamic programming and soforth were used to solve such problems which thoughworked well with moderate sized problems on cost heuristiccould not succeed with complex multiobjective problemsThese mechanisms suffered from a drawback at certaininstances where they converged to local optima withoutexploring the entire search space [8 13 14] However inthe last two decades evolutionary techniques have gainedimmense popularity due to their applicability in solving thesecomplex scientific and engineering optimization problemsThese algorithms are inspired by the Darwinian evolutionthat accentuates the concept of ldquoSurvival of the Fittestrdquo [15]It is thus metaphorical to the natural social behavior andbiological evolution of species The evolutionary techniquesare now proved to be themost proficientmethod of choice forsolving such problems Genetic algorithm based techniqueswhich belong to the class of evolutionary algorithms havealso been widely used in solving complex real life science andengineering problemsThe strength of GA as a metaheuristiccomes from its ability to combine the good features fromseveral solutions to create new and better solutions [16 17]over generations
Most real world scientific and engineering problemshave often conflicting and competing objectives that needto be optimized The evolutionary strategies are proved tobe best suited for this class of problems as they can simul-taneously optimize the different objectives and find efficienttradeoffs unlike the classic techniques where the objectiveswere separately optimized and weighed based on the priorknowledge about the problem in hand The first pioneeringstudy on multiobjective evolutionary optimization came outin mideighties [18] In subsequent years several differentevolutionary algorithms (VEGA [19] MOGA [20] NPGA[21] NSGA [22] NSGA-II [4] SPEA [23] SPEA-II [19] PAES[24] PESA [25]) have successfully been implemented to solvethe classic optimization problems for example the singlesource shortest path problem [26] the all-pairs shortest path
problem [27] the multiobjective shortest path problem [28]the travelling salesman problem [29] the knapsack problem[30] and so forth Recently new evolutionary techniques forexample particle swarm optimization [31] artificial immunesystems [32] frog leaping algorithm [33] ant colony opti-mization [34] and so forth have been successfully appliedto the multiobjective optimization paradigm
This paper addresses the distributed query plan gen-eration (DQPG) problem given in [3] This problem isbased on a heuristic that favors query plans involving lessnumber of sites participating to retrieve the results Furtherquery plans involving smaller relations transmitted over lesscostly communication channels would incur less communi-cation costs and are thus favored over others Query plansgenerated based on this heuristic would result in efficientquery processing This DQPG problem was formulated andsolved as a single objective optimization problem in [3]Since this DQPG heuristic comprises minimization of boththe local processing cost and the communication cost anattempt has been made in this paper to minimize these costssimultaneouslyThat is theDQPGproblem is formulated as abiobjective optimization problem comprising two objectivesnamely minimization of the total local processing cost andminimization of the total communication cost In this paperthis problem has been solved using themultiobjective geneticalgorithm NSGA-II (nondominated sorting genetic algo-rithm) [4] The proposed NSGA-II based DQPG algorithmattempts to simultaneously minimize the two objectives withthe aim of achieving an acceptable tradeoff amongst them Itis shown that the optimization of total query processing costusing the proposed algorithm gives considerable improve-ment with respect to the time taken to converge and thequality of solutions with respect to total query processingcost when compared to the single objective GA based DQPGalgorithm given in [3]
This paper is organized as follows Section 2 discussesthe DQPG problem and its solution using the simple geneticalgorithm (SGA) given in [3] Section 3 discusses DQPGusing the multiobjective genetic algorithm An exampleillustrating the use of the proposed NSGA-II based DQPGalgorithm for generating optimal query plans for a distributedquery is given in Section 4The experimental results are givenin Section 5 Section 6 is the conclusion
2 DQPG Using SGA
This paper addresses the DQPG problem given in [3] solvedusing SGATheDQPGproblem is discussed next followed bya brief example describing the underlying methodology
21 The DQPG Problem Query plan generation is a keydeterminant for the efficient processing of a distributed queryThis necessitates devising a query plan generation strategythat would result in efficient query processing This strategywould require minimizing the total cost of query processingThe total cost incurred comprises the joint cost that is the costincurred in processing the query locally at the individual sitesand the cost of communicating the relation fragments among
The Scientific World Journal 3
the sites A distributed query processing strategy is given in[3] which aims to minimize the total query processing cost(TC) given below [3]
TC =
119898
sum
119894=1
LPC119894times 119886119894+
119894=119898minus1 119895=119898
sum
119894=1 119895=119894+1
CC119894119895times 119887119894 (1)
where LPC119894is the local processing cost per byte at site 119894 CC
119894119895
is the communication cost per byte between sites 119894 and 119895119886119894is the bytes to be processed at site 119894 119887
119894is the bytes to be
communicated from site 119894 and119898 is the total number of sitesFor each relation 119877
119905 Card(119877
119905) represents its cardinality and
Size(119877119905) represents the size of a single tuple in bytes At each
site the relations are integrated on common attributes usingthe equijoin operator to arrive at a single relation [3]
For relations 119877119905 with cardinality Card(119877
119905) and 119877
119904with
cardinality Card(119877119904) at site 119894 the cardinality Card
119894of the
resultant relation 119877119894is given as [3]
Card119894=
Card (119877119905) times Card (119877
119904)
Dist119905119904timesmin (Card (119877
119905) Card (119877
119904))
(2)
where Dist119905119904is the number of distinct tuples in the smaller
relation among 119877119905and 119877
119904
The size of the resultant relation119877119894at site 119894 is given as [1 3]
Size119894= Size (119877
119905) + Size (119877
119904) (3)
For a given query plan the communication between sitesoccurs in the order starting from the site having a relationwith lower cardinality to the site having a relation withhigher cardinality [3] The communication cost CC
119894119895and
local processing cost LPC119894are known a priori
The number of bytes to be processed locally at site 119896 isgiven by 119886
119896[3]
119886119896= Card
119894times Size
119894 (4)
The number of bytes to be communicated from site 119895 tosite 119896 is given by 119887
119895[3]
119887119895= Card
119895times Size
119895 (5)
Distributed query plans based on the above heuristic isgenerated using simple GA (SGA) in [3] This SGA basedDQPG as given in [3] is discussed next
22 SGA Based DQPG As discussed above it is a very com-plex task to generate efficient query plans from among a largeset of possible query plans An SGA based DQPG strategybased on the heuristic defined above is given in [3] whichaims to minimize the total cost of query processing (TC)indicating the fitness of a particular solution as compared toothers in the population The algorithm considers relationsaccessed by the query crossover and mutation probabilityand the prespecified number of generations (119866) as inputand produces the Top-119870 query plans as output First thealgorithm randomly generates an initial population of validquery plans (chromosomes) where the size of a query plan
is equal to the number of relations accessed by the queryEach gene in a chromosome represents a relation and theordering of relations in a chromosome is in increasing orderof their cardinality The value of a gene is the site where thecorresponding relation resides As an example for a queryaccessing four relations (1198771 1198772 1198773 and 1198774) arranged in theincreasing order of cardinalities one of the encoding schemesfor the chromosome representation can be (1 1 4 3) implyingthat 1198771 and 1198772 are in site 1 1198773 is in site 4 and 1198774 is insite 3 The fitness (TC) value is computed for each of thequery plans and thereafter the query plans are selected forcrossover using the binary tournament selection technique[35] These selected query plans undergo random single-point crossover [15 36] with probability 119875
119888 and mutation
[15 36] with probability 119875119898 The resultant new population
replaces the old population and the above process is repeatedfor the prespecified number of generations 119866 Thereafter theTop-119870 query plans are produced as output In this paperthe above single objective DQPG problem is formulated andsolved as amultiobjectiveDQPGproblem aswill be discussednext
3 DQPG Using MultiobjectiveGenetic Algorithm
In this paper the single objective DQPG problem discussedabove is formulated as a biobjective DQPG problem Thisformulation is given next
31 Multiobjective DQPG Problem Formulation In the GAbased DQPG algorithm given in [3] there is a single objec-tive that is Minimize TC It can be observed that TC com-prises two costs namely the local processing cost incurredat participating sites that is total processing cost (TPC)and communication cost between the participating sites thatis total communication cost (TCC) Since minimizing TCwould require minimizing TPC and minimizing TCC thissingle objective (Minimize TC)DQPGproblem is formulatedas a biobjective DQPG problem comprising two objectives asMinimize TPC andMinimize TCC Consider
TCC =
119894=119906minus1 119895=119906
sum
119894=1 119895=119894+1
CC119894119895times 119887119894
TPC =
119906
sum
119894=1
LPC119894times 119886119894
(6)
where 119906 is the number of sites accessed by the queryplan in ascending order of cardinality per site CC
119894119895is the
communication cost per byte between sites 119894 and 119895 LPC119894is
the local processing cost per byte at site 119894 119887119894is the bytes to be
communicated from site 119894 and 119886119894is the bytes to be processed
at site 119894 CC119894119895 LPC
119894 119887119894 and 119886
119894are as discussed in Section 21
If a site contains a single relation its LPC is consideredzero TCC and TPC need to be minimized simultaneously toachieve an acceptable tradeoff
4 The Scientific World Journal
The abovemultiobjectiveDQPGproblemhas been solvedusing the multiobjective genetic algorithm which is dis-cussed next
32 Multiobjective Genetic Algorithms Conceptualization ofmultiobjective problems using veridical models has a greatresemblance to many real world engineering and designproblems that involve more than one coextensive and oftencompeting objectives that is maximize profit maximizethroughput minimize cost minimize response time and soforth In such a scenario no single solution can be termedas optimal as in the case of single objective optimizationproblems but rather a set of alternative solutions can bevisualized as a tradeoff between the different objectives underconsideration This set of solutions is regarded superior toothers in the search space as no other recordedavailablesolution can better optimize all the objectives consideredtogether [37ndash39]
Multiobjective optimization approaches can be broadlyclassified into three categories [37] The approaches in thefirst two categories can be termed as the classical optimiza-tion approaches which combine all objectives into a singlecomposite function using some combination of arithmeticoperators ormove all but one objective into the constraint setThe approaches in the first category have limitations in regardto appropriate selection of weights and designing functions inaccordance to the problem It wouldmandate the user to havea priori knowledge of the behavior of each objective functionto some extent for providing the range of values to objectivesso that none of them dominate the others which is not alwayspossible [17] This approach is generally denominated asaggregating functions and it has been implemented at severaloccasionswith relative success in situationswhere behavior ofthe objective function ismore or lesswell-known Someof theaggregating functions include the weighted sum approachgoal programming 120576-constraintmethod and so forth [40] Inthe second approach moving the objectives into a constraintset requires that the boundary values for each of the objectivesbe known a priori which is almost impossible In either of thetwo cases the optimization method returns a single solutionrather than a set of solutions giving possible tradeoffs andtherefore the quality of solution in these approaches greatlydepends upon the correct problem formulation If feasiblethese would be the most efficient and simplest approacheswhich would give atleast sub optimal results in most cases
The third approach overcomes the problems faced inthe classical optimization approaches and emphasizes thedevelopment of alternative techniques based on exploringthe complete set of nondominated solutions and therebyenabling the decision maker to choose among the differentalternatives This set of solutions is referred to as the Paretooptimal set [13] A Pareto optimal set can be formally definedas a set of solutions that are nondominated with respectto each other that is replacing one solution with anotherwithin the Pareto optimal set will invariably lead to a lossto one objective against a gain obtained in another objective[41] Pareto optimal sets can have varied sizes but usuallythe size increases with increase in the number of objectives
[37 40] They are more preferred over single solutionsas they closely resemble real world problems where thedecision maker makes a decision based on tradeoffs betweenmultiple objectives A number of techniques were formulatedto generate the Pareto optimal set for example simulatedannealing [14] Tabu search [42] ant colony optimization[34] and so forth The problem with these algorithms wasthatmost often they get struck at local optima and thus renderit infeasible to venture out for identifying new tradeoffsEvolutionary algorithms such as GA on the other hand seemto be especially suited for this task as they enable parallelexploration of different areas in the search space eventuallyexploiting the solutions attained using operators such ascrossover and mutation [13] It would enable determiningmore members of a Pareto optimal set in a single run insteadof a series of runs required in other blind search strategiesAlso the evolutionary algorithms require very little a prioriknowledge of the problem at hand and therefore are lesssusceptible to the typical shape and continuity of the ParetofrontThe Pareto front can be defined as the points that lie onthe boundary of the Pareto optimal region These algorithmsthus avoid convergence to a suboptimal solution [43]
Mathematically a multiobjective optimization problemwith 119898 decision variables and 119899 objectives can be definedwithout any loss of generality as amaximization orminimiza-tion problem given by [13 38]
minimize = 119891 (119909) = (1198911 (119909) 1198912 (119909) 119891119899 (119909))
where119909 = (1199091 1199092 119909119898) isin 119883
119910 = (1199101 1199102 119910119899) isin 119884
(7)
Here 119909 is the decision vector 119883 refers to the parameterspace 119910 is the objective vector and 119884 defines the objectivespace These objectives may be conflicting in nature that isimprovement in one may lead to deterioration in anotherSo it may become impossible to optimize all objectivessimultaneously in a single solution Instead the best tradeoffsolution would be of interest to a decision maker Thesesolutions form a Pareto optimal set which was initially coinedby Edgeworth and Pareto and is formally defined as [13 38]
ldquoA decision vector 119886 is said to be Pareto optimal if andonly if 119886 is nondominated regarding 119883 A decision vector 119886is said to be nondominated regarding a set 1198831015840 sube 119883 if andonly if there is no vector in 119883
1015840 which dominates 119886 Formallyit can be defined aslt1198861015840 isin 119883
1015840
119886
1015840
≺ 119886rdquoAlso a decision vector 119886 isin 119883 is said to dominate a
decision vector 119887 isin 119883 (also written as 119886 ≺ 119887) if and onlyif
forall119894 isin 1 2 119899 119891119894(119886) le 119891
119894(119887)
forall119895 isin 1 2 119899 119891119895(119886) le 119891
119895(119887)
(8)
Several multiobjective algorithms exist in the literature[4 18ndash25 37 40 41 44 45] of whichGA basedmultiobjectiveoptimization algorithms have been widely used for solvingmultiobjective optimization problems In this paper NSGA-II has been used to solve the DQPG problem NSGA-II willbe discussed next
The Scientific World Journal 5
f(x
2)
f(x1)
minus 1
+ 1
Figure 1 Crowding distance for solution ldquoVrdquo [4]
321 NSGA-II The basis of NSGA-II [4] lies in the non-dominated sorting genetic algorithm (NSGA) introduced bySrinivas and Deb [22] As the name suggests NSGA usesnondominated ranking for each individual in the populationand assigns them accordingly into nondominated frontsTheindividuals in the first front or the nondominated individualsare then assigned large dummy fitness values All individualsin the front shared this fitness value based on a sharing func-tion Next the individuals in the second nondominated frontare considered and similarly assigned a dummy fitness lowerthan the fitness assigned in the previous front This processcontinues till the entire population is classified into frontsSince the solutions in the first front have themaximumfitnessvalue their chances of selection increase and eventually morecopies of such solutions get passed on to the next generationHowever NSGA suffered from some drawbacks such as highcomputational complexity119874(119898119873
3
) nonelitist approach andthe requirement of specifying a shared parameter [4] Theselimitations were addressed in NSGA-II proposed by Deb etal [4] as an improved version of NSGA [22] It alleviatesthe drawbacks in NSGA by reducing the computationalcomplexity to 119874(119898119873
2
) Further it uses a parameter-lesssharing approach by using a crowding distance measurefor selection The crowding distance is an estimate of thedensity of solutions surrounding a particular solution in theobjective space In Figure 1 the crowding distance of solutionrepresented as point V is computed as the average distancebetween the two closest solutions represented as points V minus 1
and V + 1 on either side of the points V along each of theobjectives 119891(1199091) and 119891(1199092)
NSGA-II uses a crowded-comparison operator for selec-tion which takes into account both the nondomination rankof a query plan in the population and its crowding distanceThe nondominated solutions are preferred over dominatedsolutions and between two solutions having the same ranka solution that resides in the less crowded region is preferredthat is a solution for which the crowding distance is higherTheNSGA-II does not use any externalmemory but it ensureselitism by combining the best parents with the best offspringobtained [19] In this paper an NSGA-II based multiobjective
DQPG algorithm is used to compute optimal query plansfor a given distributed query This algorithm is discussednext
33 NSGA-II Based DQPG Algorithm The proposed NSGA-II based DQPG algorithm takes the relations given in theFROM clause of the distributed query as input It arrangesthese relations in increasing order of their cardinalities Itthen generates a fixed set of feasible query plans (chromo-somes) based on the possible combinations of sites in whichthese relations are residing Each gene in a chromosomerepresents a relation and is arranged in increasing orderof the corresponding relationrsquos cardinality The value of agene represents the site in which the corresponding relationresides For example suppose that a query posed by the userhas 4 relations (1198771 1198772 1198773 and 1198774) arranged in ascendingorderThe relation 1198771 is stored in sites 1198781 and 1198783 1198772 is storedin 11987811198773 is stored in 1198781 and 1198782 and1198774 is stored in 1198781Then theinitial population of feasible query plans (chromosomes) canbe (1 1 1 1) (3 1 1 1) (3 1 2 1) and (1 1 2 1) This definesthe encoding scheme for the given problem The proposedDQPG algorithm based on NSGA-II is given in Algorithm 1The steps involved in this algorithm are discussed as follows
Step 1 (Initialize the Population [4 46]) A random popula-tion of query plans is generated as per the encoding schemediscussed above
Step 2 (EvaluateQuery Plans on theObjective Functions)For each of the query plans in the population the TCC andTPC values are computed as given below
TCC =
119894=119906minus1 119895=119906
sum
119894=1 119895=119894+1
CC119894119895times 119887119894
TPC =
119906
sum
119894=1
LPC119894times 119886119894
(9)
where 119906 is the number of sites accessed by the queryplan in ascending order of cardinality per site CC
119894119895is the
communication cost per byte between sites 119894 and 119895 LPC119894
is the local processing cost per byte at site 119894 119887119894is the bytes
to be communicated from site 119894 and 119886119894is the bytes to be
processed at site 119894 The procedure to compute CC119894119895 LPC
119894
119887119894 and 119886
119894is given in Section 21 If a site contains a single
relation its LPC is considered zero TCC and TPC needto be minimized simultaneously to achieve an acceptabletradeoff
Step 3 (Perform Nondominated Sort [4 46]) On the givenpopulation a fast nondominated sorting is performed in thefollowing manner
Twoobjective functions are consideredThefirst objectiveis to minimize the total processing cost (TPC) and thesecond objective is to minimize the total communicationcost (TCC) NSGA-II attempts to find a tradeoff betweenthese two objectives that can result in minimum total queryprocessing cost (TC)
6 The Scientific World Journal
Input 119877119894 Relations participating in the query 119875
119888 Probability of crossover
119875119898 Probability of mutation 119866 Pre-defined number of generations 119875size Population Size
Output Top-n query plansMethodInitialize a random parent population of query plans PPwhere chromosome length ldquolenrdquo is the number of relations accessed in the query and the gene at the 119896th position in thechromosome represents the site of the 119896th relationWHILE generation le 119866 DOStep 1 Evaluate each query plan in PP on the following objective functions
1198981 Minimize TCC =
119894=119906minus1119895=119906
sum
119894=1119895=119894+1
CC119894119895times 119887119894
1198982 Minimize TPC =
119906
sum
119894=1
LPC119894times 119886119894
where 119906 is the number of sites accessed by the query plan in ascending order of cardinality per site CC119894119895is the communication
cost per byte between sites 119894 and 119895 LPC119894is the local processing cost per byte at site 119894 119887
119894is the bytes to be communicated from
site 119894 and 119886119894is the bytes to be processed at site 119894
Step 2 Perform Non-Dominated (ND) Sort on PP for ldquo1198981rdquo and ldquo119898
2rdquo separately and place each query plan (QP)
into corresponding ND fronts ldquo119894rdquo and sort the QPs within each ldquo
119894rdquo
Step 3 Evaluate Crowding Distance Function 119868(119889) for each objective functionAssign 119868(119889) = infin for smallest and highest values in each front ldquo
119894rdquo
For the remaining QPs 119868(119889) is calculated as
119868 (119889119896) = 119868 (119889
119896) +
119868(119896 + 1)119898minus 119868(119896 minus 1)
119898
119891
max119898
minus 119891
min119898
where 119868(119896)119898is the value of119898th objective function of 119896th query plan in Front
119894and 119891
max119898
and 119891
min119898
are themaximum and minimum values obtained for the objective function119898Step 4 Perform Selection from PP using binary tournament selection using crowded comparison operator (≺
119899)
Step 5 Perform random single point crossover on selected chromosomes with crossover probability 119875119888
Step 6 Apply mutation on resulting population with mutation probability 119875119898
Let the resulting child population be CPStep 7 Append CP into PP and let the resulting intermediate population be IPStep 8 Repeat Step 1 and Step 2 for population IPStep 9 Form the population PP for the next generation by picking query plans Front-wise from IP till thepopulation size = 119875sizeStep 10 Increment Generation by 1ENDDOReturn Top-119870 Query Plans from population PP
Algorithm 1 NSGA-II based DQPG algorithm
In order to performanondominated sort each query planis compared with every other query plan in the population tofind if it is dominated For each query plan ldquo119894rdquo the followingtwo entities are considered
(i) 119899119894 The number of query plans that dominate the
query plan 119894(ii) 119878119894 The set of query plans that query plan 119894 dominates
All query plans that have 119899119894= 0 are added to the set
1
Set = 1where is called the current front For each
element 119909 in the current front visit each member 119895 in the set119878119909and reduce the count of 119899
119895by 1 Now if 119899
119895gets reduced to
zero for some 119895 add it to the set 2 After evaluating all the
members of in a similar manner set = 2 This process
continues till all the query plans are assigned some frontThe fast nondominated sorting procedure takes the currentpopulation as input and produces a list of nondominatedfronts
119894as output
Step 4 (Density Estimation Using Crowding Distance [446]) After the nondominated sort the crowding distanceis computed for each query plan in
119894 Crowding distance
[4] is an estimate of the density of solutions surrounding aparticular solution point in the population It is defined asthe average distance of the two closest points on either sidesof the given point along each of the objectives The crowdingdistance 119868(119889119894119904119905119886119899119888119890) is computed in the following manner[4 46]
For each front 119894 let 119873 be the number of query plans
in front 119894 Initially the crowding distance for each query
plan in the front 119894is zero That is 119868(119889
119895) = 0 119895 =
1 119873 Next for each objective function the query plansin the front
119894are sorted based on their value of TPC (ie
the first objective function) and similarly also with respectto TCC (ie the second objective function) and placed in119868(119889119894119904119905119886119899119888119890) The query plans having the smallest and thehighest 119868(119889119894119904119905119886119899119888119890) values in both sets are assigned an infinite
The Scientific World Journal 7
Rejected
Crowding distance sort PP
CP
IP
PP
Nondominatedsort
Ŧ1
Ŧ2
Ŧ3
Figure 2 NSGA-II procedure preserving elitism [4]
value for 119868(119889119894119904119905119886119899119888119890) that is 119868(1198891) = infin and 119868(119889
119873) = infin For
remaining query plans that is 119896 = 2 119873 minus 1 119868(119889119894119904119905119886119899119888119890)is computed as follows [4 46]
119868 (119889119896) = 119868 (119889
119896) +
119868(119896 + 1)119898minus 119868(119896 minus 1)
119898
119891
max119898
minus 119891
min119898
(10)
where 119868(119896)119898is the value of 119898th objective function of 119896th
query plan in front 119894and 119891
max119898
and 119891
min119898
are the maximumand minimum values obtained for the objective function119898
Step 5 (Binary Tournament Selection) After assigning thecrowding distance to the query plans in each front a selectionprocess is carried outThe selection scheme used is the binarytournament selection and it is carried out using the crowdedcomparison operator (≺
119899) [4 46] It uses two parameters as
given below(i) 120588rank (nondomination rank)The query plans in front
119894will have 120588rank = 119894
(ii) 119868119894(119889119895) (crowding distance in front 119894) 119895 = 1 119873
The crowded comparison is performed as described below [446]
For any two query plans 1199021199011and 119902119901
2 1199021199011is selected if
(1199021199011≺1198991199021199012) 1199021199011≺1198991199021199012is true if either one of the following
holds(i) 120588rank(1199021199011) lt 120588rank(1199021199012)(ii) if 119902119901
1and 119902119901
2belong to the same front (ie if
120588rank(1199021199011) = 120588rank(1199021199012)) then 119868119894(1198891199021199011
) gt 119868119894(1198891199021199012
)
Step 6 (Crossover andMutation) Crossover is performed onthe selected query plans with a given crossover probability119875119888 It ensures proper exploration of the search space by
combining the best features of the parent query plans (chro-mosomes) Mutation is performed on the given populationwith a given probability 119875
119898 It randomly changes the site
(gene) in which the corresponding relation resides within aquery plan (chromosome) The mutated gene always takes arandom value from a set of valid sites for a particular relationAfter going through the above steps the first generationpopulation is formed NSGA-II follows a different methodto produce subsequent generations in order to incorporateelitism as described next
Table 1 Initial parent population PP
[2 1 5 1] [3 5 4 5]
[3 3 1 1] [3 1 1 1]
[3 3 3 3] [3 4 4 4]
[2 4 1 5] [2 3 3 3]
[3 1 1 3] [2 1 3 4]
Step 7 (Preserving Good Solutions (Elitism) [4]) In subse-quent generations the new population after each generationis combined with the parent population and a new interme-diate population IP is created of size PP + CP where PP is theparent population and CP is the child population as shown inFigure 2
The non-dominated sort is applied to this intermediatepopulation and fronts are formed as described in Step 3Finally the population for the next generation is formedby adding solutions from each front till the population sizeexceeds 119875 If the last front to be included was
119888 which led
to the population overflow then query plans in Front 119888are
selected based on their crowding distance measure (Step 4)in descending order until the population size exceeds 119875
The above steps are repeated for ldquo119866rdquo generations and theTop-119870 query plans are produced as output
An example illustrating the use of the above NSGA-IIbased DQPG algorithm to generate query plans for a givendistributed query is given next
4 An Example
Consider the site relation matrix the communication-cost matrix the local processing cost matrix the distinct-tuple matrix and the size matrix used to compute thefitness of query plans given in [3] and shown below inFigure 3 Suppose the initial parent population PP com-prises of 10 query plans given in Table 1 Consider aquery that accesses four relations (1198771 1198772 1198773 and 1198774)which are distributed among five sites (1198781 1198782 1198783 1198784 and1198785)
8 The Scientific World Journal
R1 R2 R3 R4300200 500 700
Cardinality
R1 R2 R3 R430 40 50 20
Size
S1 S2 S3 S4 S52 2 3 3 2
S1 S2 S3 S4 S5
S1 150 180 150 160
S2 165 185 170 170
S3 160 160 170 150
S4 150 150 180 160
S5 170 160 190 150
Communication costmatrix
R1 R2 R3 R4
R1 04 05 04 04
R2 05 02 02 03
R3 04 02 02 01
R4 04 03 01 03
matrix for join
R1 R2 R3 R4S1 0 1 1 1S2 1 0 0 0S3 1 1 1 1S4 0 1 1 1S5 0 0 1 1
matrixLPCi
Distinct-tuple- Relation-site
mdash
mdashmdash
mdash
mdash
Figure 3 Matrices used for computing fitness [3]
Table 2 Nondominated sort on PP
Index [119894] Population PP 1198981= TCC 119898
2= TPC 119878
119894119899119894
Front[1] [2 1 5 1] 18020000 3746700 [4 10] 1
2
[2] [3 3 1 1] 6720000 6006000 [4 10] 1 2
[3] [3 3 3 3] 0 4200000 [2 4 7 8 9 10] mdash 1
[4] [2 4 1 5] 43440000 70793333 [10] 6 3
[5] [3 1 1 3] 14000000 2112000 [1 4 6 10] mdash 1
[6] [3 5 4 5] 17020000 3847000 [4 10] 1 2
[7] [3 1 1 1] 960000 6500000 [4 8 9 10] 1 2
[8] [3 4 4 4] 1020000 9750000 [10] 2 3
[9] [2 3 3 3] 1110000 9750000 [10] 2 3
[10] [2 1 3 4] 45090000 10514000 mdash 9 4
Table 3 Sorted fronts based on TCC and TPC
Query plans sorted on1198981= TCC
Query plans sorted on1198982= TPC
1
3 5 1
5 32
7 2 6 1 2
1 6 2 73
8 9 4 3
4 8 94
10 4
10
The computations of TPC and TCC for the query plan[2 4 1 5] are given as follows
TPC4= (LPC
2times 1198862) + (LPC
4times 1198864)
+ (LPC5times 1198865) + (LPC
1times 1198861)
TCC4= (CC
24times 11988724
) + (CC45
times 11988745
) + (CC51
times 11988751
)
(11)
whereLPC2times 1198862= 0
CC24
times 11988724
= CC24
times (Card (1198771) times Size (1198771))
= 170 times (200 times 30) = 1020000
LPC4times 1198864= LPC
4times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times (Size (1198771) + Size (1198772)) )
= 2 times (
200 times 300
05 times 200
times (70)) = 126000
CC45
times 11988745
= CC45
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times (Size (1198771) + Size (1198772)))
= 170 times (
200 times 300
05 times 200
times (70)) = 6720000
LPC5times 1198865
= LPC5
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times (Size (1198771) + Size (1198772) + Size (1198774)) )
= 2 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
)
times 90 = 420000
The Scientific World Journal 9
Table 4 Results for the objective functions on PP
Index [119894] Population PP 1198981= TCC 119898
2= TPC Front Crowding distance (CD) = 119868[119894]
[1] [2 1 5 1] 18020000 3746700 2
infin
[2] [3 3 1 1] 6720000 6006000 2
0672[3] [3 3 3 3] 0 4200000
1infin
[4] [2 4 1 5] 43440000 70793333 3
infin
[5] [3 1 1 3] 14000000 2112000 1
infin
[6] [3 5 4 5] 17020000 3847000 2
05195[7] [3 1 1 1] 960000 6500000
2infin
[8] [3 4 4 4] 1020000 9750000 3
infin
[9] [2 3 3 3] 1110000 9750000 3
infin
[10] [2 1 3 4] 45090000 10514000 4
infin
Table 5 Selection of query plans using binary tournament selection technique
IndexRandomly generated
indexes[119894] and [119895]
Tournament betweenquery plans
[119875(119894)] and [119875(119895)]
Front comparison Query plan selected (lowerfront or higher CD)
[1] [1] and [4] [2 1 5 1] and [2 4 1 5] [2] and [
3] [2 1 5 1]
[2] [2] and [3] [3 3 1 1] and [3 3 3 3] [2] and [
1] [3 3 3 3]
[3] [3] and [4] [3 3 3 3] and [2 4 1 5] [1] and [
3] [3 3 3 3]
[4] [2] and [6] [3 3 1 1] and [3 5 4 5] [2] and [
2] [3 3 1 1]
[5] [2] and [4] [3 3 1 1] and [2 4 1 5] [2] and [
3] [3 3 1 1]
[6] [1] and [3] [2 1 5 1] and [3 3 3 3] [2] and [
1] [3 3 3 3]
[7] [2] and [5] [3 3 1 1] and [3 1 1 3] [2] and [
1] [3 1 1 3]
[8] [2] and [8] [3 3 1 1] and [3 1 1 3] [2] and [
3] [3 3 1 1]
[9] [7] and [8] [3 1 1 1] and [3 1 1 3] [2] and [
3] [3 1 1 1]
[10] [2] and [9] [3 3 1 1] and [2 4 1 5] [2] and [
3] [3 3 1 1]
Table 6 Population CP after crossover and mutation
Index Population after crossover Population aftermutation (CP)
[1] [2 1 1 1] [2 1 1 1]
[2] [3 3 1 1] [3 3 1 1]
[3] [3 3 5 1] [3 3 4 1][4] [3 3 1 1] [3 3 1 1]
[5] [3 3 3 1] [3 3 3 1]
[6] [3 3 3 3] [3 3 3 3]
[7] [3 1 1 3] [3 1 1 3]
[8] [3 3 1 3] [3 3 1 3]
[9] [3 1 3 3] [3 3 3 3]
[10] [3 3 1 1] [3 3 1 1]
CC51
times 11988751
= CC51
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times (Size (1198771) + Size (1198772) + Size (1198774)) )
170 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
)
times 90 = 35700000
LPC1times 1198861
= LPC1times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times
Card (1198773)
Dist43
times Card (1198773)
times (
4
sum
119894=1
Size (119877119894)))
10 The Scientific World Journal
Table 7 Front assignment for query plans in IP
Index Intermediate population IP TCC TPC Front[1] [2 1 5 1] 18020000 3747000
2
[2] [3 3 1 1] 6720000 6006000 3
[3] [3 3 3 3] 0 4200000 1
[4] [2 4 1 5] 43440000 7079000 4
[5] [3 1 1 3] 14000000 2112000 1
[6] [3 5 4 5] 17020000 3847000 2
[7] [3 1 1 1] 960000 6500000 2
[8] [3 1 1 3] 1020000 9750000 3
[9] [2 3 3 3] 1110000 9750000 3
[10] [2 1 3 4] 45090000 10514000 5
[11] [2 1 1 1] 990000 6500000 2
[12] [3 3 1 1] 6720000 6006000 3
[13] [3 3 4 1] 90300000 8946000 5
[14] [3 3 1 1] 6720000 6006000 3
[15] [3 3 3 1] 2520000 4230000 2
[16] [3 3 3 3] 0 4200000 1
[17] [3 1 1 3] 14000000 2112000 1
[18] [3 3 1 3] 4500000 2640000 1
[19] [3 3 3 3] 0 4200000 1
[20] [3 3 1 1] 6720000 6006000 3
Table 8 Nondominated sorting for query plans in IP
Index Intermediatepopulation IP Front Crowding
distance[3] [3 3 3 3]
1
[5] [3 1 1 3] 1
[16] [3 3 3 3] 1
[17] [3 1 1 3] 1
[18] [3 3 1 3] 1
[19] [3 3 3 3] 1
[1] [2 1 5 1] 2 infin
[7] [3 1 1 1] 2 infin
[11] [2 1 1 1] 2 infin
[15] [3 3 3 1] 2 04933
[6] [3 5 4 5] 2 02292
[2] [3 3 1 1] 3
[9] [2 3 3 3] 3
[8] [3 1 1 3] 3
[12] [3 3 1 1] 3
[14] [3 3 1 1] 3
[20] [3 3 1 1] 3
[4] [2 4 1 5] 4
[10] [2 1 3 4] 5
[13] [3 3 4 1] 4
= 2 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
times
500
01 times 500
) times 140 = 65333333
(12)
Table 9 Population PP in the 2nd generation
Index Population PP[1] [3 3 3 3]
[2] [3 1 1 3]
[3] [3 3 3 3]
[4] [3 1 1 3]
[5] [3 3 1 3]
[6] [3 3 3 3]
[7] [2 1 5 1]
[8] [3 1 1 1]
[9] [2 1 1 1]
[10] [3 3 3 1]
So
TPC4= 126000 + 420000 + 65333333 = 70793333
TCC4= 1020000 + 6720000 + 35700000 = 43440000
(13)
Similarly TCC and TPC values of the other nine queryplans are computed Consider TCC and TPC of the 10 queryplans are given in Table 2 The population is then sorted intodifferent nondominated fronts as described in Step 3 of theproposed algorithm For example for query plan 1 that is[2 1 5 1] the set 119878
1= number of query plans that dominate
query plan 1 Since TCC [1] lt TCC [4] TPC [1] lt TPC [4]TCC [1] lt TCC [10] and TPC [1] lt TPC [10] the elementsof 1198781
= 4 10 Similarly the sets 1198782 119878
10are computed
and are given in Table 2 119899119894stores the count of query plans
that dominate 119894 So using the values in 119878119894 1198991
= 1 as onlyquery plan 5 is dominating 1 and 119899
2= 1 as only query plan 3
is dominating 2 Similarly 1198992 119899
10are computed and are
The Scientific World Journal 11
given in Table 2 From Table 2 it can be noted that queryplans 3 and 5 are not dominated as 119899
3= 0 119899
5= 0 So they
are assigned to the first nondominated front1The elementsin the next front are computed by reducing the count in 119899
119894
for each 119894 isin 1198783and 119894 isin 119878
5 So 119899
119894= 0 0 minus 4 minus 0 0 1 1 7
So the second front has query plans 1 2 6 and 7 Thisprocess continues till all the query plans in the population areassigned to their respective nondominated fronts The fronts1 2 3 and
4are formed and are given in Table 2
Finally the query plans are sorted separately on the valuesof TCC and TPC within each front as shown in Table 3
After the population is sorted into different fronts thecrowding distance (119868[119889119894119904119905119886119899119888119890]) computation is performedfor each query plan using the formula given in Step 4 ofthe proposed algorithm Query plans having the maximumand minimum values in each front are assigned infin distancevalues that is 119868[3] 119868[5] 119868[7] 119868[1] 119868[8] 119868[4] 119868[9] and119868[10] are assigned infin For the rest of the query plans theircrowding distance (CD) values are computed based on thesorted order of query plans in each front Initially they areassigned 119868[119889119894119904119905119886119899119888119890] = 0 For query plan 2 in front
2 the
CD computation is performed as follows
1198681198981[2] = 119868 [2] +
TCC [6] minus TCC [7]
max (TCC) minusmin (TCC)
= 0 +
17020000 minus 960000
45090000 minus 0
= 03562
119868 [2] = 1198681198981[2] +
TPC [7] minus TPC [6]
max (TPC) minusmin (TPC)
= 03562 +
6500000 minus 3847000
10514000 minus 2112000
= 06720
(14)
CD of the query plans in the given population is given inTable 4
Next binary tournament selection is performed on thepopulation on the basis of crowded comparison operator ≺
119899
This selection process is shown in Table 5The selected query plans undergo random single point
crossover with crossover probability 119875119888
= 05 Mutation isperformed on the selected population with mutation proba-bility 119875
119898= 002 The child population CP after crossover and
mutation is shown in Table 6Now in accordance with NSGA-II algorithm the popu-
lations from the second generation onward have to ensureelitism For this purpose the child population CP is com-bined with the parent population PP to generate intermediatepopulation IP This population is subjected to nondominatedsort and fronts are formed as given in Table 7
The population for the second generation is arrived atby selecting query plans based on front and within it basedon crowding distance-wise as described in Step 7 from theintermediate population IP till the actual population size 10is exceeded This selection is shown in Table 8
The population PP for the second generation is given inTable 9
The above process is repeated till a predefined number ofgenerations119866have elapsedThereafter Top-119870 query plans areproduced as output
0
01
02
03
04
05
06
07
08
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-5 query plans)
Figure 4
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
DQPGNSGA(top-10 query plans)
Figure 5
5 Experimental Results
The proposed NSGA-II based algorithm is implemented inMATLAB 77 in Windows 7 professional 64 bit OS with Intelcore i3CPUat 213GHzhaving 4GBRAMExperimentswerecarried out for a population of 100 query plans with eachquery plan involving 10 relations distributed over 50 sitesThese were performed on four datasets each comprising adifferent relation-site matrix Graphs were plotted to observechange in average TC (ATC) with respect to generations
12 The Scientific World Journal
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-15 query plans)
Figure 6
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-20 query plans)
Figure 7
and Top-119870 query plans for different pairs of crossover andmutation rates
First graphs showing the ATC values Top-5 Top-10Top-15 and Top-20 query plans averaged over four datasetsover 1000 generations for distinct pairs of crossover andmutation probability 119875
119888 119875119898 = 07 001 07 0005
075 001 075 0005 08 001 08 0005 085 001085 0005 using NSGA-II based DQPG algorithm(DQPGNSGA) were plotted These are shown in Figures 45 6 and 7 It can be observed from these graphs that the
0
005
01
015
02
025
03
035
04
045
05
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
(after 1000 generations)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGNSGA
Top-K query plans
Figure 8
0
01
02
03
04
05
06
07
08
09
1
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-5 query plans)
Figure 9
convergence to ATC is lowest in the case of 119875119888 119875119898 =
085 001 Furthermore the graph showing ATC valuesaveraged over four datasets versus Top-119870 query plans after1000 generations (Figure 8) also shows that the lowest ATCvalues achieved are for 119875
119888 119875119898 = 085 001 Thus it
can be said that DQPGNSGA performs reasonably well for119875119888 119875119898 = 085 001 In order to compare DQPGNSGA with
SGA based DQPG algorithm (DQPGSGA) similar graphswere plotted for DQPGSGA These are shown in Figures 9 1011 12 and 13 It is noted from these graphs that DQPGSGA
The Scientific World Journal 13
0
02
04
06
08
1
12
14
1618
21 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-10 query plans)
Figure 10
0
02
04
06
08
1
12
14
16
18
2
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-15 query plans)
Figure 11
also achieves convergence to the lowest ATC values for119875119888 119875119898 = 085 001
Since the two algorithms DQPGNSGA and DQPGSGAconverge to a lower ATC value for the same crossover andmutation probabilities that is 119875
119888 119875119898 = 085 001 the
comparisons of the two algorithms can be carried out forthese observed probabilities
First the two algorithmsDQPGNSGA andDQPGSGA werecompared for each of the four datasets (Dataset-1 Dataset-2 Dataset-3 and Dataset-4) on the ATC values of Top-5Top-10 Top-15 and Top-20 query plans generated over 1000
0
04
08
12
16
2
24
28
32
36
4
1 54 107
160
213
266
319
372
425
478
531
584
637
690
743
796
849
902
955
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-20 query plans)
Figure 12
0
01
02
03
04
05
06
07
08
09
1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
(after 1000 generations)
Aver
age t
otal
cost
(ATC
)
DQPGSGA
Top-K query plans
Figure 13
generations for 119875119888 119875119898 = 085 001 The corresponding
graphs for each dataset are shown in Figures 14 15 16 and 17It can be observed from the graphs thatDQPGNSGA convergesto lower ATC values for Top-5 Top-10 Top-15 and Top-20query plans generated for all datasets
Furthermore graphs comparing the ATC values of Top-119870 query plans produced by DQPGNSGA and DQPGSGA after1000 generations for the four datasets were plotted andare shown in Figures 18 19 20 and 21 These graphs alsoshow average TCC (ATCC) and average TPC (ATPC) valuesof Top-119870 query plans generated by DQPGNSGA It can beobserved from these graphs thatDQPGNSGA is able to achieve
14 The Scientific World Journal
0
05
1
15
2
25
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-1)
Figure 14
0
03
06
09
12
15
18
21
24
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-2)
Figure 15
an acceptable tradeoff between ATPC and ATCC which inturn leads to a comparatively lower ATC for the Top-119870 queryplans generated by it
Next a graph comparing the ATC values of Top-119870 queryplans generated by DQPGNSGA and DQPGSGA on all fourdatasets (DS-1 DS-2 DS-3 and DS-4) after 1000 generationsfor observed probabilities 119875
119888 119875119898 = 085 001were plotted
and is shown in Figure 22 It is noted from the graph thatDQPGNSGA performs better than DQPGSGA on the ATCvalues of Top-119870 query plans generated by the two algorithmsfor each of the four data sets
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
0
015
03
045
06
075
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-3)
Figure 16
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
0
01
02
03
04
05
06
07
08
09
1
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-4)
Figure 17
It can be reasonably inferred from all the above graphsthat DQPGNSGA is able to generate Top-119870 query plans withlower ATC when compared to those generated byDQPGSGAThis may be attributed to acceptable tradeoffs achieved whilesimultaneously optimizing TPC and TCC which results inlower TC in case of DQPGNSGA
6 Conclusion
In this paper DQPGproblemgiven in [3] has been addressedwhere query plans are generated for a distributed relational
The Scientific World Journal 15
0
005
01
015
02
025
03
035
04
045
1 3 5 7 9 11 13 15 17 19
Cos
t
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-1)
Figure 18
0
01
02
03
04
05
06
07
Cos
t
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
Top-K query plans
(Dataset-2)
Figure 19
query that incurs minimum total query processing costGenetic algorithms have been used to generate these queryplansThe total query processing cost TC in [3] can be viewedas comprising broadly of TPC and TCC and thereforeminimizing TPC and TCC would result in minimizing TCThus in this paper the single-objective DQPG problemin [3] has been formulated and solved as a biobjectiveDQPG problem with the two objectives being minimizingTPC and minimizing TCC These objectives are minimizedsimultaneously using the multiobjective genetic algorithmNSGA-II
Experiments were performed and DQPGNSGA is com-pared with DQPGSGA given in [3] It was observed thatboth the algorithms individually gave good results for thecrossover and mutation probabilities 085 and 001 respec-tively The two algorithms were then compared on the ATC
0
002
004
006
008
01
012
014
016
Cos
t
1 3 5 7 9 11 13 15 17 19
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-3)
Figure 20
0
005
01
015
02
025
Cos
t
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-4)
Figure 21
values of the Top-119870 query plans generated by them for theobserved crossover and mutation probabilities The resultsshowed that DQPGNSGA performed better than DQPGSGAAlso the performance of the former was better when the twoalgorithmswere compared on theATCvalues of Top-119870 queryplansThe better performance of DQPGNSGA over DQPGSGAmay be attributed to DQPGNSGA achieving acceptable trade-offs between TPC and TCC while minimizing TPC and TCCof Top-119870 query plans simultaneously
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
16 The Scientific World Journal
Top-K query plans
0
01
02
03
04
05
06
07
Aver
age t
otal
cost
(ATC
)
SGA(DS-1)SGA(DS-2)SGA(DS-3)SGA(DS-4)
NSGA(DS-1)NSGA(DS-2)NSGA(DS-3)NSGA(DS-4)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(after 1000 generations)
Figure 22
References
[1] S Ceri and G Pelgatti Distributed Databases Principles ampSystems International Edition McGraw-Hill Computer ScienceSeries 1985
[2] A R Hevner and S B Yao ldquoQuery processing in distributeddatabase systemsrdquo IEEE Transactions on Software Engineeringvol 5 no 3 pp 177ndash187 1979
[3] T V Vijay Kumar and S Panicker ldquoGenerating query plansfor distributed query processing using genetic algorithmrdquo inInformation Computing and Applications vol 7030 of LectureNotes in Computer Science pp 765ndash772 2011
[4] K Deb A Pratap S Agarwal and T Meyarivan ldquoA fastand elitist multiobjective genetic algorithm NSGA-IIrdquo IEEETransactions on Evolutionary Computation vol 6 no 2 pp 182ndash197 2002
[5] P Black and W Luk ldquoA new heuristic for generating semi-joinprograms for distributed query processingrdquo in Proceedings ofthe IEEE 6th International Computer Software and ApplicationConference pp 581ndash588 IEEE Chicago Ill USA November1982
[6] P Bodorik and J S Riordon ldquoA threshold mechanism fordistributed query processingrdquo in Proceedings of the 16th AnnualConference on Computer Science pp 616ndash625 Atlanta GaUnited States February 1988
[7] J Chang ldquoA heuristic approach to distributed query process-ingrdquo in Proceedings of the 8th International Conference on VeryLarge Data Bases (VLDB rsquo82) pp 54ndash61 Saratoga Calif USA1982
[8] D Kossmann ldquoThe state of the art in distributed queryprocessingrdquo ACM Computing Surveys vol 32 no 4 pp 422ndash469 2000
[9] L Stiphane and W Eugene ldquoA state transition model fordistributed query processingrdquo ACM Transactions on DatabaseSystems vol 11 no 3 pp 249ndash322 1986
[10] T V Vijay Kumar V Singh and A K Verma ldquoDistributedquery processing plans generation using genetic algorithmrdquo
International Journal of Computer Theory and Engineering vol3 no 1 pp 38ndash45 2011
[11] C T Yu and C C Chang ldquoDistributed query processingrdquoACMComputing Surveys vol 16 no 4 pp 399ndash433 1984
[12] K Bennett M C Ferris and Y E Ioannidis ldquoA Geneticalgorithm for database query optimizationrdquo in Proceedings ofthe 4th International Conference onGenetic Algorithms pp 400ndash407 1991
[13] Y E Ioannidis and Y Cha Kang ldquoRandomized algorithms foroptimizing large join queriesrdquo in Proceedings of the 1990 ACMSIGMOD International Conference on Management of Data pp312ndash321 May 1990
[14] Y Ioannidis and E Wong ldquoQuery optimization by simulatedannealingrdquo in Proceedings of the ACM SIGMOD InternationalConference on Management of Data (SIGMOD rsquo87) pp 9ndash221987
[15] A A R Townsend Genetic Algorithms A Tutorial 2003[16] M Gregory Genetic Algorithm Optimization of Distributed
Database Queries IEEE 1998[17] M Mitchell An Introduction to Genetic Algorithms The MIT
Press Cambridge Mass USA 1999[18] J D Schaffer ldquoMultiple objective optimization with vector
evaluated genetic algorithmsrdquo inProceedings of the InternationalConference on Genetic Algorithm andTheir Applications pp 93ndash100 July 1985
[19] V Guliashki H Toshev andC Korsemov ldquoSurvey of evolution-ary algorithms used in multiobjective optimizationrdquo Problemsof EngineeringCybernetics andRobotics vol 60 pp 42ndash54 2009
[20] C M Fonseca and P J Fleming ldquoGenetic algorithms formultiobjective optimization formulation discussion and gen-eralizationrdquo inProceedings of the 5th International Conference onGenetic Algorithms S Forrest Ed pp 416ndash423 San FranciscoCalif USA 1993
[21] J Horn N Nafpliotis and D E Goldberg ldquoA niched Paretogenetic algorithm for multiobjective optimizationrdquo in Proceed-ings of the 1st IEEE Conference on Evolutionary ComputationIEEE World Congress on Computational Intelligence pp 82ndash87IEEE Orlando Fla USA June 1994
[22] N Srinivas and K Deb ldquoMultiobjective optimization usingnondominated sorting in genetic algorithmsrdquo EvolutionaryComputation vol 2 no 3 pp 221ndash248 1994
[23] E Zitzler and L Thiele ldquoMultiobjective evolutionary algo-rithms a comparative case study and the strength Paretoapproachrdquo IEEETransactions onEvolutionaryComputation vol3 no 4 pp 257ndash271 1999
[24] J D Knowles and D W Corne ldquoApproximating the non-dominated front using the Pareto archived evolution strategyrdquoEvolutionary computation vol 8 no 2 pp 149ndash172 2000
[25] D W Corne J D Knowles and M J Oates ldquoThe Paretoenvelope-based selection algorithm for multiobjective opti-mizationrdquo in Proceedings of the 6th International Conference onParallel Problem Solving from Nature Springer Paris FranceSeptember 2000
[26] J Scharnow K Tinnefeld and I Wegener ldquoThe analysis of evo-lutionary algorithms on sorting and shortest paths problemsrdquoJournal of Mathematical Modelling and Algorithms vol 3 no 4pp 349ndash366 2004
[27] B Doerr E Happ and C Klein ldquoCrossover can provablybe useful in evolutionary computationrdquo in Proceedings of the10th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo08) pp 539ndash546 ACM Press July 2008
The Scientific World Journal 17
[28] C Horoba ldquoAnalysis of a simple evolutionary algorithm for themultiobjective shortest path problemrdquo in Proceedings of the 10thACM SIGEVO Workshop on Foundations of Genetic Algorithms(FOGA rsquo09) pp 113ndash120 ACM Press New York NY USAJanuary 2009
[29] MTheile ldquoExact solutions to the traveling salesperson problemby a population-based evolutionary algorithmrdquo in Proceedingsof the 9th European Conference on Evolutionary Computation inCombinatorial Optimisation (EvoCOP rsquo09) vol 5482 of LectureNotes in Computer Science pp 145ndash155 Springer April 2009
[30] A V Eremeev On Linking Dynamic Programming and Multi-Objective Evolutionary Algorithms Omsk StateUniversity 2008
[31] A V Eremeev ldquoA fully polynomial randomized approximationscheme based on an evolutionary algorithmrdquo Diskretnyi Analizi Issledovanie Operatsii vol 17 no 4 pp 3ndash17 2010
[32] T Friedrich N Hebbinghaus F Neumann J He and CWitt ldquoApproximating covering problems by randomized searchheuristics using multi-objective modelsrdquo in Proceedings of the9th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo07) pp 797ndash804 ACM Press July 2007
[33] E Elbeltagi T Hegazy and D Grierson ldquoA modified shuffledfrog-leaping optimization algorithm applications to projectmanagementrdquo Structure and Infrastructure Engineering vol 3no 1 pp 53ndash60 2007
[34] M Dorigo E Bonabeau and GTheraulaz ldquoAnt algorithms andstigmergyrdquo Future Generation Computer Systems vol 16 no 8pp 851ndash871 2000
[35] D E Goldberg and K Deb ldquoA comparative analysis of selectionschemes used in genetic algorithmsrdquo in Foundations of GeneticAlgorithms pp 69ndash93 Morgan Kaufmann San Mateo CalifUSA 1991
[36] D E GoldbergGenetic Algorithms in Search Optimization andMachine Learning Addison-Wesley Reading Mass USA 1989
[37] H Dong and Y Liang ldquoGenetic algorithms for large join queryoptimizationrdquo in Proceedings of the 9th Annual Genetic andEvolutionary Computation Conference (GECCO rsquo07) pp 1211ndash1218 London UK July 2007
[38] I F Sbalzariniy M Sibylle and P Koumoutsakosyz ldquoMulti-objective optimization using evolutionary algorithmsrdquo Centerfor Turbulence Research Proceedings of the Summer Program2000
[39] E Zitzler and L Thiele ldquoMultiobjective optimization usingevolutionary algorithmsmdasha comparative case studyrdquo in ParallelProblem Solving from Nature pp 292ndash301 Springer Amster-dam The Netherlands 1998
[40] C A C Coello ldquoAn updated survey of GA-basedmultiobjectiveoptimization techniquesrdquo ACM Computing Surveys vol 32 no2 pp 137ndash143 2000
[41] D A Van Veldhuizen and G B Lamont ldquoMultiobjective evolu-tionary algorithms analyzing the state-of-the-artrdquo Evolutionarycomputation vol 8 no 2 pp 125ndash147 2000
[42] F Glover and S Hanafi ldquoTabu search and finite convergencerdquoDiscrete Applied Mathematics vol 119 no 1-2 pp 3ndash36 2002
[43] A C Nearchou ldquoThe effect of various operators on the geneticsearch for large scheduling problemsrdquo International Journal ofProduction Economics vol 88 no 2 pp 191ndash203 2004
[44] W W Chu and P Hurley ldquoOptimal query processing fordistributed database systemsrdquo IEEE Transactions on Computersvol 31 no 9 pp 835ndash850 1982
[45] M Serpell and J E Smith ldquoSelf-adaptation ofmutation operatorand probability for permutation representations in genetic
algorithmsrdquo Evolutionary Computation vol 18 no 3 pp 491ndash514 2010
[46] A Seshadri A Fast Elitist Multiobjective Genetic AlgorithmNSGA-II MATLAB Central 2006
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World Journal 3
the sites A distributed query processing strategy is given in[3] which aims to minimize the total query processing cost(TC) given below [3]
TC =
119898
sum
119894=1
LPC119894times 119886119894+
119894=119898minus1 119895=119898
sum
119894=1 119895=119894+1
CC119894119895times 119887119894 (1)
where LPC119894is the local processing cost per byte at site 119894 CC
119894119895
is the communication cost per byte between sites 119894 and 119895119886119894is the bytes to be processed at site 119894 119887
119894is the bytes to be
communicated from site 119894 and119898 is the total number of sitesFor each relation 119877
119905 Card(119877
119905) represents its cardinality and
Size(119877119905) represents the size of a single tuple in bytes At each
site the relations are integrated on common attributes usingthe equijoin operator to arrive at a single relation [3]
For relations 119877119905 with cardinality Card(119877
119905) and 119877
119904with
cardinality Card(119877119904) at site 119894 the cardinality Card
119894of the
resultant relation 119877119894is given as [3]
Card119894=
Card (119877119905) times Card (119877
119904)
Dist119905119904timesmin (Card (119877
119905) Card (119877
119904))
(2)
where Dist119905119904is the number of distinct tuples in the smaller
relation among 119877119905and 119877
119904
The size of the resultant relation119877119894at site 119894 is given as [1 3]
Size119894= Size (119877
119905) + Size (119877
119904) (3)
For a given query plan the communication between sitesoccurs in the order starting from the site having a relationwith lower cardinality to the site having a relation withhigher cardinality [3] The communication cost CC
119894119895and
local processing cost LPC119894are known a priori
The number of bytes to be processed locally at site 119896 isgiven by 119886
119896[3]
119886119896= Card
119894times Size
119894 (4)
The number of bytes to be communicated from site 119895 tosite 119896 is given by 119887
119895[3]
119887119895= Card
119895times Size
119895 (5)
Distributed query plans based on the above heuristic isgenerated using simple GA (SGA) in [3] This SGA basedDQPG as given in [3] is discussed next
22 SGA Based DQPG As discussed above it is a very com-plex task to generate efficient query plans from among a largeset of possible query plans An SGA based DQPG strategybased on the heuristic defined above is given in [3] whichaims to minimize the total cost of query processing (TC)indicating the fitness of a particular solution as compared toothers in the population The algorithm considers relationsaccessed by the query crossover and mutation probabilityand the prespecified number of generations (119866) as inputand produces the Top-119870 query plans as output First thealgorithm randomly generates an initial population of validquery plans (chromosomes) where the size of a query plan
is equal to the number of relations accessed by the queryEach gene in a chromosome represents a relation and theordering of relations in a chromosome is in increasing orderof their cardinality The value of a gene is the site where thecorresponding relation resides As an example for a queryaccessing four relations (1198771 1198772 1198773 and 1198774) arranged in theincreasing order of cardinalities one of the encoding schemesfor the chromosome representation can be (1 1 4 3) implyingthat 1198771 and 1198772 are in site 1 1198773 is in site 4 and 1198774 is insite 3 The fitness (TC) value is computed for each of thequery plans and thereafter the query plans are selected forcrossover using the binary tournament selection technique[35] These selected query plans undergo random single-point crossover [15 36] with probability 119875
119888 and mutation
[15 36] with probability 119875119898 The resultant new population
replaces the old population and the above process is repeatedfor the prespecified number of generations 119866 Thereafter theTop-119870 query plans are produced as output In this paperthe above single objective DQPG problem is formulated andsolved as amultiobjectiveDQPGproblem aswill be discussednext
3 DQPG Using MultiobjectiveGenetic Algorithm
In this paper the single objective DQPG problem discussedabove is formulated as a biobjective DQPG problem Thisformulation is given next
31 Multiobjective DQPG Problem Formulation In the GAbased DQPG algorithm given in [3] there is a single objec-tive that is Minimize TC It can be observed that TC com-prises two costs namely the local processing cost incurredat participating sites that is total processing cost (TPC)and communication cost between the participating sites thatis total communication cost (TCC) Since minimizing TCwould require minimizing TPC and minimizing TCC thissingle objective (Minimize TC)DQPGproblem is formulatedas a biobjective DQPG problem comprising two objectives asMinimize TPC andMinimize TCC Consider
TCC =
119894=119906minus1 119895=119906
sum
119894=1 119895=119894+1
CC119894119895times 119887119894
TPC =
119906
sum
119894=1
LPC119894times 119886119894
(6)
where 119906 is the number of sites accessed by the queryplan in ascending order of cardinality per site CC
119894119895is the
communication cost per byte between sites 119894 and 119895 LPC119894is
the local processing cost per byte at site 119894 119887119894is the bytes to be
communicated from site 119894 and 119886119894is the bytes to be processed
at site 119894 CC119894119895 LPC
119894 119887119894 and 119886
119894are as discussed in Section 21
If a site contains a single relation its LPC is consideredzero TCC and TPC need to be minimized simultaneously toachieve an acceptable tradeoff
4 The Scientific World Journal
The abovemultiobjectiveDQPGproblemhas been solvedusing the multiobjective genetic algorithm which is dis-cussed next
32 Multiobjective Genetic Algorithms Conceptualization ofmultiobjective problems using veridical models has a greatresemblance to many real world engineering and designproblems that involve more than one coextensive and oftencompeting objectives that is maximize profit maximizethroughput minimize cost minimize response time and soforth In such a scenario no single solution can be termedas optimal as in the case of single objective optimizationproblems but rather a set of alternative solutions can bevisualized as a tradeoff between the different objectives underconsideration This set of solutions is regarded superior toothers in the search space as no other recordedavailablesolution can better optimize all the objectives consideredtogether [37ndash39]
Multiobjective optimization approaches can be broadlyclassified into three categories [37] The approaches in thefirst two categories can be termed as the classical optimiza-tion approaches which combine all objectives into a singlecomposite function using some combination of arithmeticoperators ormove all but one objective into the constraint setThe approaches in the first category have limitations in regardto appropriate selection of weights and designing functions inaccordance to the problem It wouldmandate the user to havea priori knowledge of the behavior of each objective functionto some extent for providing the range of values to objectivesso that none of them dominate the others which is not alwayspossible [17] This approach is generally denominated asaggregating functions and it has been implemented at severaloccasionswith relative success in situationswhere behavior ofthe objective function ismore or lesswell-known Someof theaggregating functions include the weighted sum approachgoal programming 120576-constraintmethod and so forth [40] Inthe second approach moving the objectives into a constraintset requires that the boundary values for each of the objectivesbe known a priori which is almost impossible In either of thetwo cases the optimization method returns a single solutionrather than a set of solutions giving possible tradeoffs andtherefore the quality of solution in these approaches greatlydepends upon the correct problem formulation If feasiblethese would be the most efficient and simplest approacheswhich would give atleast sub optimal results in most cases
The third approach overcomes the problems faced inthe classical optimization approaches and emphasizes thedevelopment of alternative techniques based on exploringthe complete set of nondominated solutions and therebyenabling the decision maker to choose among the differentalternatives This set of solutions is referred to as the Paretooptimal set [13] A Pareto optimal set can be formally definedas a set of solutions that are nondominated with respectto each other that is replacing one solution with anotherwithin the Pareto optimal set will invariably lead to a lossto one objective against a gain obtained in another objective[41] Pareto optimal sets can have varied sizes but usuallythe size increases with increase in the number of objectives
[37 40] They are more preferred over single solutionsas they closely resemble real world problems where thedecision maker makes a decision based on tradeoffs betweenmultiple objectives A number of techniques were formulatedto generate the Pareto optimal set for example simulatedannealing [14] Tabu search [42] ant colony optimization[34] and so forth The problem with these algorithms wasthatmost often they get struck at local optima and thus renderit infeasible to venture out for identifying new tradeoffsEvolutionary algorithms such as GA on the other hand seemto be especially suited for this task as they enable parallelexploration of different areas in the search space eventuallyexploiting the solutions attained using operators such ascrossover and mutation [13] It would enable determiningmore members of a Pareto optimal set in a single run insteadof a series of runs required in other blind search strategiesAlso the evolutionary algorithms require very little a prioriknowledge of the problem at hand and therefore are lesssusceptible to the typical shape and continuity of the ParetofrontThe Pareto front can be defined as the points that lie onthe boundary of the Pareto optimal region These algorithmsthus avoid convergence to a suboptimal solution [43]
Mathematically a multiobjective optimization problemwith 119898 decision variables and 119899 objectives can be definedwithout any loss of generality as amaximization orminimiza-tion problem given by [13 38]
minimize = 119891 (119909) = (1198911 (119909) 1198912 (119909) 119891119899 (119909))
where119909 = (1199091 1199092 119909119898) isin 119883
119910 = (1199101 1199102 119910119899) isin 119884
(7)
Here 119909 is the decision vector 119883 refers to the parameterspace 119910 is the objective vector and 119884 defines the objectivespace These objectives may be conflicting in nature that isimprovement in one may lead to deterioration in anotherSo it may become impossible to optimize all objectivessimultaneously in a single solution Instead the best tradeoffsolution would be of interest to a decision maker Thesesolutions form a Pareto optimal set which was initially coinedby Edgeworth and Pareto and is formally defined as [13 38]
ldquoA decision vector 119886 is said to be Pareto optimal if andonly if 119886 is nondominated regarding 119883 A decision vector 119886is said to be nondominated regarding a set 1198831015840 sube 119883 if andonly if there is no vector in 119883
1015840 which dominates 119886 Formallyit can be defined aslt1198861015840 isin 119883
1015840
119886
1015840
≺ 119886rdquoAlso a decision vector 119886 isin 119883 is said to dominate a
decision vector 119887 isin 119883 (also written as 119886 ≺ 119887) if and onlyif
forall119894 isin 1 2 119899 119891119894(119886) le 119891
119894(119887)
forall119895 isin 1 2 119899 119891119895(119886) le 119891
119895(119887)
(8)
Several multiobjective algorithms exist in the literature[4 18ndash25 37 40 41 44 45] of whichGA basedmultiobjectiveoptimization algorithms have been widely used for solvingmultiobjective optimization problems In this paper NSGA-II has been used to solve the DQPG problem NSGA-II willbe discussed next
The Scientific World Journal 5
f(x
2)
f(x1)
minus 1
+ 1
Figure 1 Crowding distance for solution ldquoVrdquo [4]
321 NSGA-II The basis of NSGA-II [4] lies in the non-dominated sorting genetic algorithm (NSGA) introduced bySrinivas and Deb [22] As the name suggests NSGA usesnondominated ranking for each individual in the populationand assigns them accordingly into nondominated frontsTheindividuals in the first front or the nondominated individualsare then assigned large dummy fitness values All individualsin the front shared this fitness value based on a sharing func-tion Next the individuals in the second nondominated frontare considered and similarly assigned a dummy fitness lowerthan the fitness assigned in the previous front This processcontinues till the entire population is classified into frontsSince the solutions in the first front have themaximumfitnessvalue their chances of selection increase and eventually morecopies of such solutions get passed on to the next generationHowever NSGA suffered from some drawbacks such as highcomputational complexity119874(119898119873
3
) nonelitist approach andthe requirement of specifying a shared parameter [4] Theselimitations were addressed in NSGA-II proposed by Deb etal [4] as an improved version of NSGA [22] It alleviatesthe drawbacks in NSGA by reducing the computationalcomplexity to 119874(119898119873
2
) Further it uses a parameter-lesssharing approach by using a crowding distance measurefor selection The crowding distance is an estimate of thedensity of solutions surrounding a particular solution in theobjective space In Figure 1 the crowding distance of solutionrepresented as point V is computed as the average distancebetween the two closest solutions represented as points V minus 1
and V + 1 on either side of the points V along each of theobjectives 119891(1199091) and 119891(1199092)
NSGA-II uses a crowded-comparison operator for selec-tion which takes into account both the nondomination rankof a query plan in the population and its crowding distanceThe nondominated solutions are preferred over dominatedsolutions and between two solutions having the same ranka solution that resides in the less crowded region is preferredthat is a solution for which the crowding distance is higherTheNSGA-II does not use any externalmemory but it ensureselitism by combining the best parents with the best offspringobtained [19] In this paper an NSGA-II based multiobjective
DQPG algorithm is used to compute optimal query plansfor a given distributed query This algorithm is discussednext
33 NSGA-II Based DQPG Algorithm The proposed NSGA-II based DQPG algorithm takes the relations given in theFROM clause of the distributed query as input It arrangesthese relations in increasing order of their cardinalities Itthen generates a fixed set of feasible query plans (chromo-somes) based on the possible combinations of sites in whichthese relations are residing Each gene in a chromosomerepresents a relation and is arranged in increasing orderof the corresponding relationrsquos cardinality The value of agene represents the site in which the corresponding relationresides For example suppose that a query posed by the userhas 4 relations (1198771 1198772 1198773 and 1198774) arranged in ascendingorderThe relation 1198771 is stored in sites 1198781 and 1198783 1198772 is storedin 11987811198773 is stored in 1198781 and 1198782 and1198774 is stored in 1198781Then theinitial population of feasible query plans (chromosomes) canbe (1 1 1 1) (3 1 1 1) (3 1 2 1) and (1 1 2 1) This definesthe encoding scheme for the given problem The proposedDQPG algorithm based on NSGA-II is given in Algorithm 1The steps involved in this algorithm are discussed as follows
Step 1 (Initialize the Population [4 46]) A random popula-tion of query plans is generated as per the encoding schemediscussed above
Step 2 (EvaluateQuery Plans on theObjective Functions)For each of the query plans in the population the TCC andTPC values are computed as given below
TCC =
119894=119906minus1 119895=119906
sum
119894=1 119895=119894+1
CC119894119895times 119887119894
TPC =
119906
sum
119894=1
LPC119894times 119886119894
(9)
where 119906 is the number of sites accessed by the queryplan in ascending order of cardinality per site CC
119894119895is the
communication cost per byte between sites 119894 and 119895 LPC119894
is the local processing cost per byte at site 119894 119887119894is the bytes
to be communicated from site 119894 and 119886119894is the bytes to be
processed at site 119894 The procedure to compute CC119894119895 LPC
119894
119887119894 and 119886
119894is given in Section 21 If a site contains a single
relation its LPC is considered zero TCC and TPC needto be minimized simultaneously to achieve an acceptabletradeoff
Step 3 (Perform Nondominated Sort [4 46]) On the givenpopulation a fast nondominated sorting is performed in thefollowing manner
Twoobjective functions are consideredThefirst objectiveis to minimize the total processing cost (TPC) and thesecond objective is to minimize the total communicationcost (TCC) NSGA-II attempts to find a tradeoff betweenthese two objectives that can result in minimum total queryprocessing cost (TC)
6 The Scientific World Journal
Input 119877119894 Relations participating in the query 119875
119888 Probability of crossover
119875119898 Probability of mutation 119866 Pre-defined number of generations 119875size Population Size
Output Top-n query plansMethodInitialize a random parent population of query plans PPwhere chromosome length ldquolenrdquo is the number of relations accessed in the query and the gene at the 119896th position in thechromosome represents the site of the 119896th relationWHILE generation le 119866 DOStep 1 Evaluate each query plan in PP on the following objective functions
1198981 Minimize TCC =
119894=119906minus1119895=119906
sum
119894=1119895=119894+1
CC119894119895times 119887119894
1198982 Minimize TPC =
119906
sum
119894=1
LPC119894times 119886119894
where 119906 is the number of sites accessed by the query plan in ascending order of cardinality per site CC119894119895is the communication
cost per byte between sites 119894 and 119895 LPC119894is the local processing cost per byte at site 119894 119887
119894is the bytes to be communicated from
site 119894 and 119886119894is the bytes to be processed at site 119894
Step 2 Perform Non-Dominated (ND) Sort on PP for ldquo1198981rdquo and ldquo119898
2rdquo separately and place each query plan (QP)
into corresponding ND fronts ldquo119894rdquo and sort the QPs within each ldquo
119894rdquo
Step 3 Evaluate Crowding Distance Function 119868(119889) for each objective functionAssign 119868(119889) = infin for smallest and highest values in each front ldquo
119894rdquo
For the remaining QPs 119868(119889) is calculated as
119868 (119889119896) = 119868 (119889
119896) +
119868(119896 + 1)119898minus 119868(119896 minus 1)
119898
119891
max119898
minus 119891
min119898
where 119868(119896)119898is the value of119898th objective function of 119896th query plan in Front
119894and 119891
max119898
and 119891
min119898
are themaximum and minimum values obtained for the objective function119898Step 4 Perform Selection from PP using binary tournament selection using crowded comparison operator (≺
119899)
Step 5 Perform random single point crossover on selected chromosomes with crossover probability 119875119888
Step 6 Apply mutation on resulting population with mutation probability 119875119898
Let the resulting child population be CPStep 7 Append CP into PP and let the resulting intermediate population be IPStep 8 Repeat Step 1 and Step 2 for population IPStep 9 Form the population PP for the next generation by picking query plans Front-wise from IP till thepopulation size = 119875sizeStep 10 Increment Generation by 1ENDDOReturn Top-119870 Query Plans from population PP
Algorithm 1 NSGA-II based DQPG algorithm
In order to performanondominated sort each query planis compared with every other query plan in the population tofind if it is dominated For each query plan ldquo119894rdquo the followingtwo entities are considered
(i) 119899119894 The number of query plans that dominate the
query plan 119894(ii) 119878119894 The set of query plans that query plan 119894 dominates
All query plans that have 119899119894= 0 are added to the set
1
Set = 1where is called the current front For each
element 119909 in the current front visit each member 119895 in the set119878119909and reduce the count of 119899
119895by 1 Now if 119899
119895gets reduced to
zero for some 119895 add it to the set 2 After evaluating all the
members of in a similar manner set = 2 This process
continues till all the query plans are assigned some frontThe fast nondominated sorting procedure takes the currentpopulation as input and produces a list of nondominatedfronts
119894as output
Step 4 (Density Estimation Using Crowding Distance [446]) After the nondominated sort the crowding distanceis computed for each query plan in
119894 Crowding distance
[4] is an estimate of the density of solutions surrounding aparticular solution point in the population It is defined asthe average distance of the two closest points on either sidesof the given point along each of the objectives The crowdingdistance 119868(119889119894119904119905119886119899119888119890) is computed in the following manner[4 46]
For each front 119894 let 119873 be the number of query plans
in front 119894 Initially the crowding distance for each query
plan in the front 119894is zero That is 119868(119889
119895) = 0 119895 =
1 119873 Next for each objective function the query plansin the front
119894are sorted based on their value of TPC (ie
the first objective function) and similarly also with respectto TCC (ie the second objective function) and placed in119868(119889119894119904119905119886119899119888119890) The query plans having the smallest and thehighest 119868(119889119894119904119905119886119899119888119890) values in both sets are assigned an infinite
The Scientific World Journal 7
Rejected
Crowding distance sort PP
CP
IP
PP
Nondominatedsort
Ŧ1
Ŧ2
Ŧ3
Figure 2 NSGA-II procedure preserving elitism [4]
value for 119868(119889119894119904119905119886119899119888119890) that is 119868(1198891) = infin and 119868(119889
119873) = infin For
remaining query plans that is 119896 = 2 119873 minus 1 119868(119889119894119904119905119886119899119888119890)is computed as follows [4 46]
119868 (119889119896) = 119868 (119889
119896) +
119868(119896 + 1)119898minus 119868(119896 minus 1)
119898
119891
max119898
minus 119891
min119898
(10)
where 119868(119896)119898is the value of 119898th objective function of 119896th
query plan in front 119894and 119891
max119898
and 119891
min119898
are the maximumand minimum values obtained for the objective function119898
Step 5 (Binary Tournament Selection) After assigning thecrowding distance to the query plans in each front a selectionprocess is carried outThe selection scheme used is the binarytournament selection and it is carried out using the crowdedcomparison operator (≺
119899) [4 46] It uses two parameters as
given below(i) 120588rank (nondomination rank)The query plans in front
119894will have 120588rank = 119894
(ii) 119868119894(119889119895) (crowding distance in front 119894) 119895 = 1 119873
The crowded comparison is performed as described below [446]
For any two query plans 1199021199011and 119902119901
2 1199021199011is selected if
(1199021199011≺1198991199021199012) 1199021199011≺1198991199021199012is true if either one of the following
holds(i) 120588rank(1199021199011) lt 120588rank(1199021199012)(ii) if 119902119901
1and 119902119901
2belong to the same front (ie if
120588rank(1199021199011) = 120588rank(1199021199012)) then 119868119894(1198891199021199011
) gt 119868119894(1198891199021199012
)
Step 6 (Crossover andMutation) Crossover is performed onthe selected query plans with a given crossover probability119875119888 It ensures proper exploration of the search space by
combining the best features of the parent query plans (chro-mosomes) Mutation is performed on the given populationwith a given probability 119875
119898 It randomly changes the site
(gene) in which the corresponding relation resides within aquery plan (chromosome) The mutated gene always takes arandom value from a set of valid sites for a particular relationAfter going through the above steps the first generationpopulation is formed NSGA-II follows a different methodto produce subsequent generations in order to incorporateelitism as described next
Table 1 Initial parent population PP
[2 1 5 1] [3 5 4 5]
[3 3 1 1] [3 1 1 1]
[3 3 3 3] [3 4 4 4]
[2 4 1 5] [2 3 3 3]
[3 1 1 3] [2 1 3 4]
Step 7 (Preserving Good Solutions (Elitism) [4]) In subse-quent generations the new population after each generationis combined with the parent population and a new interme-diate population IP is created of size PP + CP where PP is theparent population and CP is the child population as shown inFigure 2
The non-dominated sort is applied to this intermediatepopulation and fronts are formed as described in Step 3Finally the population for the next generation is formedby adding solutions from each front till the population sizeexceeds 119875 If the last front to be included was
119888 which led
to the population overflow then query plans in Front 119888are
selected based on their crowding distance measure (Step 4)in descending order until the population size exceeds 119875
The above steps are repeated for ldquo119866rdquo generations and theTop-119870 query plans are produced as output
An example illustrating the use of the above NSGA-IIbased DQPG algorithm to generate query plans for a givendistributed query is given next
4 An Example
Consider the site relation matrix the communication-cost matrix the local processing cost matrix the distinct-tuple matrix and the size matrix used to compute thefitness of query plans given in [3] and shown below inFigure 3 Suppose the initial parent population PP com-prises of 10 query plans given in Table 1 Consider aquery that accesses four relations (1198771 1198772 1198773 and 1198774)which are distributed among five sites (1198781 1198782 1198783 1198784 and1198785)
8 The Scientific World Journal
R1 R2 R3 R4300200 500 700
Cardinality
R1 R2 R3 R430 40 50 20
Size
S1 S2 S3 S4 S52 2 3 3 2
S1 S2 S3 S4 S5
S1 150 180 150 160
S2 165 185 170 170
S3 160 160 170 150
S4 150 150 180 160
S5 170 160 190 150
Communication costmatrix
R1 R2 R3 R4
R1 04 05 04 04
R2 05 02 02 03
R3 04 02 02 01
R4 04 03 01 03
matrix for join
R1 R2 R3 R4S1 0 1 1 1S2 1 0 0 0S3 1 1 1 1S4 0 1 1 1S5 0 0 1 1
matrixLPCi
Distinct-tuple- Relation-site
mdash
mdashmdash
mdash
mdash
Figure 3 Matrices used for computing fitness [3]
Table 2 Nondominated sort on PP
Index [119894] Population PP 1198981= TCC 119898
2= TPC 119878
119894119899119894
Front[1] [2 1 5 1] 18020000 3746700 [4 10] 1
2
[2] [3 3 1 1] 6720000 6006000 [4 10] 1 2
[3] [3 3 3 3] 0 4200000 [2 4 7 8 9 10] mdash 1
[4] [2 4 1 5] 43440000 70793333 [10] 6 3
[5] [3 1 1 3] 14000000 2112000 [1 4 6 10] mdash 1
[6] [3 5 4 5] 17020000 3847000 [4 10] 1 2
[7] [3 1 1 1] 960000 6500000 [4 8 9 10] 1 2
[8] [3 4 4 4] 1020000 9750000 [10] 2 3
[9] [2 3 3 3] 1110000 9750000 [10] 2 3
[10] [2 1 3 4] 45090000 10514000 mdash 9 4
Table 3 Sorted fronts based on TCC and TPC
Query plans sorted on1198981= TCC
Query plans sorted on1198982= TPC
1
3 5 1
5 32
7 2 6 1 2
1 6 2 73
8 9 4 3
4 8 94
10 4
10
The computations of TPC and TCC for the query plan[2 4 1 5] are given as follows
TPC4= (LPC
2times 1198862) + (LPC
4times 1198864)
+ (LPC5times 1198865) + (LPC
1times 1198861)
TCC4= (CC
24times 11988724
) + (CC45
times 11988745
) + (CC51
times 11988751
)
(11)
whereLPC2times 1198862= 0
CC24
times 11988724
= CC24
times (Card (1198771) times Size (1198771))
= 170 times (200 times 30) = 1020000
LPC4times 1198864= LPC
4times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times (Size (1198771) + Size (1198772)) )
= 2 times (
200 times 300
05 times 200
times (70)) = 126000
CC45
times 11988745
= CC45
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times (Size (1198771) + Size (1198772)))
= 170 times (
200 times 300
05 times 200
times (70)) = 6720000
LPC5times 1198865
= LPC5
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times (Size (1198771) + Size (1198772) + Size (1198774)) )
= 2 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
)
times 90 = 420000
The Scientific World Journal 9
Table 4 Results for the objective functions on PP
Index [119894] Population PP 1198981= TCC 119898
2= TPC Front Crowding distance (CD) = 119868[119894]
[1] [2 1 5 1] 18020000 3746700 2
infin
[2] [3 3 1 1] 6720000 6006000 2
0672[3] [3 3 3 3] 0 4200000
1infin
[4] [2 4 1 5] 43440000 70793333 3
infin
[5] [3 1 1 3] 14000000 2112000 1
infin
[6] [3 5 4 5] 17020000 3847000 2
05195[7] [3 1 1 1] 960000 6500000
2infin
[8] [3 4 4 4] 1020000 9750000 3
infin
[9] [2 3 3 3] 1110000 9750000 3
infin
[10] [2 1 3 4] 45090000 10514000 4
infin
Table 5 Selection of query plans using binary tournament selection technique
IndexRandomly generated
indexes[119894] and [119895]
Tournament betweenquery plans
[119875(119894)] and [119875(119895)]
Front comparison Query plan selected (lowerfront or higher CD)
[1] [1] and [4] [2 1 5 1] and [2 4 1 5] [2] and [
3] [2 1 5 1]
[2] [2] and [3] [3 3 1 1] and [3 3 3 3] [2] and [
1] [3 3 3 3]
[3] [3] and [4] [3 3 3 3] and [2 4 1 5] [1] and [
3] [3 3 3 3]
[4] [2] and [6] [3 3 1 1] and [3 5 4 5] [2] and [
2] [3 3 1 1]
[5] [2] and [4] [3 3 1 1] and [2 4 1 5] [2] and [
3] [3 3 1 1]
[6] [1] and [3] [2 1 5 1] and [3 3 3 3] [2] and [
1] [3 3 3 3]
[7] [2] and [5] [3 3 1 1] and [3 1 1 3] [2] and [
1] [3 1 1 3]
[8] [2] and [8] [3 3 1 1] and [3 1 1 3] [2] and [
3] [3 3 1 1]
[9] [7] and [8] [3 1 1 1] and [3 1 1 3] [2] and [
3] [3 1 1 1]
[10] [2] and [9] [3 3 1 1] and [2 4 1 5] [2] and [
3] [3 3 1 1]
Table 6 Population CP after crossover and mutation
Index Population after crossover Population aftermutation (CP)
[1] [2 1 1 1] [2 1 1 1]
[2] [3 3 1 1] [3 3 1 1]
[3] [3 3 5 1] [3 3 4 1][4] [3 3 1 1] [3 3 1 1]
[5] [3 3 3 1] [3 3 3 1]
[6] [3 3 3 3] [3 3 3 3]
[7] [3 1 1 3] [3 1 1 3]
[8] [3 3 1 3] [3 3 1 3]
[9] [3 1 3 3] [3 3 3 3]
[10] [3 3 1 1] [3 3 1 1]
CC51
times 11988751
= CC51
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times (Size (1198771) + Size (1198772) + Size (1198774)) )
170 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
)
times 90 = 35700000
LPC1times 1198861
= LPC1times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times
Card (1198773)
Dist43
times Card (1198773)
times (
4
sum
119894=1
Size (119877119894)))
10 The Scientific World Journal
Table 7 Front assignment for query plans in IP
Index Intermediate population IP TCC TPC Front[1] [2 1 5 1] 18020000 3747000
2
[2] [3 3 1 1] 6720000 6006000 3
[3] [3 3 3 3] 0 4200000 1
[4] [2 4 1 5] 43440000 7079000 4
[5] [3 1 1 3] 14000000 2112000 1
[6] [3 5 4 5] 17020000 3847000 2
[7] [3 1 1 1] 960000 6500000 2
[8] [3 1 1 3] 1020000 9750000 3
[9] [2 3 3 3] 1110000 9750000 3
[10] [2 1 3 4] 45090000 10514000 5
[11] [2 1 1 1] 990000 6500000 2
[12] [3 3 1 1] 6720000 6006000 3
[13] [3 3 4 1] 90300000 8946000 5
[14] [3 3 1 1] 6720000 6006000 3
[15] [3 3 3 1] 2520000 4230000 2
[16] [3 3 3 3] 0 4200000 1
[17] [3 1 1 3] 14000000 2112000 1
[18] [3 3 1 3] 4500000 2640000 1
[19] [3 3 3 3] 0 4200000 1
[20] [3 3 1 1] 6720000 6006000 3
Table 8 Nondominated sorting for query plans in IP
Index Intermediatepopulation IP Front Crowding
distance[3] [3 3 3 3]
1
[5] [3 1 1 3] 1
[16] [3 3 3 3] 1
[17] [3 1 1 3] 1
[18] [3 3 1 3] 1
[19] [3 3 3 3] 1
[1] [2 1 5 1] 2 infin
[7] [3 1 1 1] 2 infin
[11] [2 1 1 1] 2 infin
[15] [3 3 3 1] 2 04933
[6] [3 5 4 5] 2 02292
[2] [3 3 1 1] 3
[9] [2 3 3 3] 3
[8] [3 1 1 3] 3
[12] [3 3 1 1] 3
[14] [3 3 1 1] 3
[20] [3 3 1 1] 3
[4] [2 4 1 5] 4
[10] [2 1 3 4] 5
[13] [3 3 4 1] 4
= 2 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
times
500
01 times 500
) times 140 = 65333333
(12)
Table 9 Population PP in the 2nd generation
Index Population PP[1] [3 3 3 3]
[2] [3 1 1 3]
[3] [3 3 3 3]
[4] [3 1 1 3]
[5] [3 3 1 3]
[6] [3 3 3 3]
[7] [2 1 5 1]
[8] [3 1 1 1]
[9] [2 1 1 1]
[10] [3 3 3 1]
So
TPC4= 126000 + 420000 + 65333333 = 70793333
TCC4= 1020000 + 6720000 + 35700000 = 43440000
(13)
Similarly TCC and TPC values of the other nine queryplans are computed Consider TCC and TPC of the 10 queryplans are given in Table 2 The population is then sorted intodifferent nondominated fronts as described in Step 3 of theproposed algorithm For example for query plan 1 that is[2 1 5 1] the set 119878
1= number of query plans that dominate
query plan 1 Since TCC [1] lt TCC [4] TPC [1] lt TPC [4]TCC [1] lt TCC [10] and TPC [1] lt TPC [10] the elementsof 1198781
= 4 10 Similarly the sets 1198782 119878
10are computed
and are given in Table 2 119899119894stores the count of query plans
that dominate 119894 So using the values in 119878119894 1198991
= 1 as onlyquery plan 5 is dominating 1 and 119899
2= 1 as only query plan 3
is dominating 2 Similarly 1198992 119899
10are computed and are
The Scientific World Journal 11
given in Table 2 From Table 2 it can be noted that queryplans 3 and 5 are not dominated as 119899
3= 0 119899
5= 0 So they
are assigned to the first nondominated front1The elementsin the next front are computed by reducing the count in 119899
119894
for each 119894 isin 1198783and 119894 isin 119878
5 So 119899
119894= 0 0 minus 4 minus 0 0 1 1 7
So the second front has query plans 1 2 6 and 7 Thisprocess continues till all the query plans in the population areassigned to their respective nondominated fronts The fronts1 2 3 and
4are formed and are given in Table 2
Finally the query plans are sorted separately on the valuesof TCC and TPC within each front as shown in Table 3
After the population is sorted into different fronts thecrowding distance (119868[119889119894119904119905119886119899119888119890]) computation is performedfor each query plan using the formula given in Step 4 ofthe proposed algorithm Query plans having the maximumand minimum values in each front are assigned infin distancevalues that is 119868[3] 119868[5] 119868[7] 119868[1] 119868[8] 119868[4] 119868[9] and119868[10] are assigned infin For the rest of the query plans theircrowding distance (CD) values are computed based on thesorted order of query plans in each front Initially they areassigned 119868[119889119894119904119905119886119899119888119890] = 0 For query plan 2 in front
2 the
CD computation is performed as follows
1198681198981[2] = 119868 [2] +
TCC [6] minus TCC [7]
max (TCC) minusmin (TCC)
= 0 +
17020000 minus 960000
45090000 minus 0
= 03562
119868 [2] = 1198681198981[2] +
TPC [7] minus TPC [6]
max (TPC) minusmin (TPC)
= 03562 +
6500000 minus 3847000
10514000 minus 2112000
= 06720
(14)
CD of the query plans in the given population is given inTable 4
Next binary tournament selection is performed on thepopulation on the basis of crowded comparison operator ≺
119899
This selection process is shown in Table 5The selected query plans undergo random single point
crossover with crossover probability 119875119888
= 05 Mutation isperformed on the selected population with mutation proba-bility 119875
119898= 002 The child population CP after crossover and
mutation is shown in Table 6Now in accordance with NSGA-II algorithm the popu-
lations from the second generation onward have to ensureelitism For this purpose the child population CP is com-bined with the parent population PP to generate intermediatepopulation IP This population is subjected to nondominatedsort and fronts are formed as given in Table 7
The population for the second generation is arrived atby selecting query plans based on front and within it basedon crowding distance-wise as described in Step 7 from theintermediate population IP till the actual population size 10is exceeded This selection is shown in Table 8
The population PP for the second generation is given inTable 9
The above process is repeated till a predefined number ofgenerations119866have elapsedThereafter Top-119870 query plans areproduced as output
0
01
02
03
04
05
06
07
08
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-5 query plans)
Figure 4
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
DQPGNSGA(top-10 query plans)
Figure 5
5 Experimental Results
The proposed NSGA-II based algorithm is implemented inMATLAB 77 in Windows 7 professional 64 bit OS with Intelcore i3CPUat 213GHzhaving 4GBRAMExperimentswerecarried out for a population of 100 query plans with eachquery plan involving 10 relations distributed over 50 sitesThese were performed on four datasets each comprising adifferent relation-site matrix Graphs were plotted to observechange in average TC (ATC) with respect to generations
12 The Scientific World Journal
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-15 query plans)
Figure 6
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-20 query plans)
Figure 7
and Top-119870 query plans for different pairs of crossover andmutation rates
First graphs showing the ATC values Top-5 Top-10Top-15 and Top-20 query plans averaged over four datasetsover 1000 generations for distinct pairs of crossover andmutation probability 119875
119888 119875119898 = 07 001 07 0005
075 001 075 0005 08 001 08 0005 085 001085 0005 using NSGA-II based DQPG algorithm(DQPGNSGA) were plotted These are shown in Figures 45 6 and 7 It can be observed from these graphs that the
0
005
01
015
02
025
03
035
04
045
05
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
(after 1000 generations)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGNSGA
Top-K query plans
Figure 8
0
01
02
03
04
05
06
07
08
09
1
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-5 query plans)
Figure 9
convergence to ATC is lowest in the case of 119875119888 119875119898 =
085 001 Furthermore the graph showing ATC valuesaveraged over four datasets versus Top-119870 query plans after1000 generations (Figure 8) also shows that the lowest ATCvalues achieved are for 119875
119888 119875119898 = 085 001 Thus it
can be said that DQPGNSGA performs reasonably well for119875119888 119875119898 = 085 001 In order to compare DQPGNSGA with
SGA based DQPG algorithm (DQPGSGA) similar graphswere plotted for DQPGSGA These are shown in Figures 9 1011 12 and 13 It is noted from these graphs that DQPGSGA
The Scientific World Journal 13
0
02
04
06
08
1
12
14
1618
21 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-10 query plans)
Figure 10
0
02
04
06
08
1
12
14
16
18
2
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-15 query plans)
Figure 11
also achieves convergence to the lowest ATC values for119875119888 119875119898 = 085 001
Since the two algorithms DQPGNSGA and DQPGSGAconverge to a lower ATC value for the same crossover andmutation probabilities that is 119875
119888 119875119898 = 085 001 the
comparisons of the two algorithms can be carried out forthese observed probabilities
First the two algorithmsDQPGNSGA andDQPGSGA werecompared for each of the four datasets (Dataset-1 Dataset-2 Dataset-3 and Dataset-4) on the ATC values of Top-5Top-10 Top-15 and Top-20 query plans generated over 1000
0
04
08
12
16
2
24
28
32
36
4
1 54 107
160
213
266
319
372
425
478
531
584
637
690
743
796
849
902
955
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-20 query plans)
Figure 12
0
01
02
03
04
05
06
07
08
09
1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
(after 1000 generations)
Aver
age t
otal
cost
(ATC
)
DQPGSGA
Top-K query plans
Figure 13
generations for 119875119888 119875119898 = 085 001 The corresponding
graphs for each dataset are shown in Figures 14 15 16 and 17It can be observed from the graphs thatDQPGNSGA convergesto lower ATC values for Top-5 Top-10 Top-15 and Top-20query plans generated for all datasets
Furthermore graphs comparing the ATC values of Top-119870 query plans produced by DQPGNSGA and DQPGSGA after1000 generations for the four datasets were plotted andare shown in Figures 18 19 20 and 21 These graphs alsoshow average TCC (ATCC) and average TPC (ATPC) valuesof Top-119870 query plans generated by DQPGNSGA It can beobserved from these graphs thatDQPGNSGA is able to achieve
14 The Scientific World Journal
0
05
1
15
2
25
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-1)
Figure 14
0
03
06
09
12
15
18
21
24
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-2)
Figure 15
an acceptable tradeoff between ATPC and ATCC which inturn leads to a comparatively lower ATC for the Top-119870 queryplans generated by it
Next a graph comparing the ATC values of Top-119870 queryplans generated by DQPGNSGA and DQPGSGA on all fourdatasets (DS-1 DS-2 DS-3 and DS-4) after 1000 generationsfor observed probabilities 119875
119888 119875119898 = 085 001were plotted
and is shown in Figure 22 It is noted from the graph thatDQPGNSGA performs better than DQPGSGA on the ATCvalues of Top-119870 query plans generated by the two algorithmsfor each of the four data sets
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
0
015
03
045
06
075
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-3)
Figure 16
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
0
01
02
03
04
05
06
07
08
09
1
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-4)
Figure 17
It can be reasonably inferred from all the above graphsthat DQPGNSGA is able to generate Top-119870 query plans withlower ATC when compared to those generated byDQPGSGAThis may be attributed to acceptable tradeoffs achieved whilesimultaneously optimizing TPC and TCC which results inlower TC in case of DQPGNSGA
6 Conclusion
In this paper DQPGproblemgiven in [3] has been addressedwhere query plans are generated for a distributed relational
The Scientific World Journal 15
0
005
01
015
02
025
03
035
04
045
1 3 5 7 9 11 13 15 17 19
Cos
t
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-1)
Figure 18
0
01
02
03
04
05
06
07
Cos
t
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
Top-K query plans
(Dataset-2)
Figure 19
query that incurs minimum total query processing costGenetic algorithms have been used to generate these queryplansThe total query processing cost TC in [3] can be viewedas comprising broadly of TPC and TCC and thereforeminimizing TPC and TCC would result in minimizing TCThus in this paper the single-objective DQPG problemin [3] has been formulated and solved as a biobjectiveDQPG problem with the two objectives being minimizingTPC and minimizing TCC These objectives are minimizedsimultaneously using the multiobjective genetic algorithmNSGA-II
Experiments were performed and DQPGNSGA is com-pared with DQPGSGA given in [3] It was observed thatboth the algorithms individually gave good results for thecrossover and mutation probabilities 085 and 001 respec-tively The two algorithms were then compared on the ATC
0
002
004
006
008
01
012
014
016
Cos
t
1 3 5 7 9 11 13 15 17 19
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-3)
Figure 20
0
005
01
015
02
025
Cos
t
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-4)
Figure 21
values of the Top-119870 query plans generated by them for theobserved crossover and mutation probabilities The resultsshowed that DQPGNSGA performed better than DQPGSGAAlso the performance of the former was better when the twoalgorithmswere compared on theATCvalues of Top-119870 queryplansThe better performance of DQPGNSGA over DQPGSGAmay be attributed to DQPGNSGA achieving acceptable trade-offs between TPC and TCC while minimizing TPC and TCCof Top-119870 query plans simultaneously
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
16 The Scientific World Journal
Top-K query plans
0
01
02
03
04
05
06
07
Aver
age t
otal
cost
(ATC
)
SGA(DS-1)SGA(DS-2)SGA(DS-3)SGA(DS-4)
NSGA(DS-1)NSGA(DS-2)NSGA(DS-3)NSGA(DS-4)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(after 1000 generations)
Figure 22
References
[1] S Ceri and G Pelgatti Distributed Databases Principles ampSystems International Edition McGraw-Hill Computer ScienceSeries 1985
[2] A R Hevner and S B Yao ldquoQuery processing in distributeddatabase systemsrdquo IEEE Transactions on Software Engineeringvol 5 no 3 pp 177ndash187 1979
[3] T V Vijay Kumar and S Panicker ldquoGenerating query plansfor distributed query processing using genetic algorithmrdquo inInformation Computing and Applications vol 7030 of LectureNotes in Computer Science pp 765ndash772 2011
[4] K Deb A Pratap S Agarwal and T Meyarivan ldquoA fastand elitist multiobjective genetic algorithm NSGA-IIrdquo IEEETransactions on Evolutionary Computation vol 6 no 2 pp 182ndash197 2002
[5] P Black and W Luk ldquoA new heuristic for generating semi-joinprograms for distributed query processingrdquo in Proceedings ofthe IEEE 6th International Computer Software and ApplicationConference pp 581ndash588 IEEE Chicago Ill USA November1982
[6] P Bodorik and J S Riordon ldquoA threshold mechanism fordistributed query processingrdquo in Proceedings of the 16th AnnualConference on Computer Science pp 616ndash625 Atlanta GaUnited States February 1988
[7] J Chang ldquoA heuristic approach to distributed query process-ingrdquo in Proceedings of the 8th International Conference on VeryLarge Data Bases (VLDB rsquo82) pp 54ndash61 Saratoga Calif USA1982
[8] D Kossmann ldquoThe state of the art in distributed queryprocessingrdquo ACM Computing Surveys vol 32 no 4 pp 422ndash469 2000
[9] L Stiphane and W Eugene ldquoA state transition model fordistributed query processingrdquo ACM Transactions on DatabaseSystems vol 11 no 3 pp 249ndash322 1986
[10] T V Vijay Kumar V Singh and A K Verma ldquoDistributedquery processing plans generation using genetic algorithmrdquo
International Journal of Computer Theory and Engineering vol3 no 1 pp 38ndash45 2011
[11] C T Yu and C C Chang ldquoDistributed query processingrdquoACMComputing Surveys vol 16 no 4 pp 399ndash433 1984
[12] K Bennett M C Ferris and Y E Ioannidis ldquoA Geneticalgorithm for database query optimizationrdquo in Proceedings ofthe 4th International Conference onGenetic Algorithms pp 400ndash407 1991
[13] Y E Ioannidis and Y Cha Kang ldquoRandomized algorithms foroptimizing large join queriesrdquo in Proceedings of the 1990 ACMSIGMOD International Conference on Management of Data pp312ndash321 May 1990
[14] Y Ioannidis and E Wong ldquoQuery optimization by simulatedannealingrdquo in Proceedings of the ACM SIGMOD InternationalConference on Management of Data (SIGMOD rsquo87) pp 9ndash221987
[15] A A R Townsend Genetic Algorithms A Tutorial 2003[16] M Gregory Genetic Algorithm Optimization of Distributed
Database Queries IEEE 1998[17] M Mitchell An Introduction to Genetic Algorithms The MIT
Press Cambridge Mass USA 1999[18] J D Schaffer ldquoMultiple objective optimization with vector
evaluated genetic algorithmsrdquo inProceedings of the InternationalConference on Genetic Algorithm andTheir Applications pp 93ndash100 July 1985
[19] V Guliashki H Toshev andC Korsemov ldquoSurvey of evolution-ary algorithms used in multiobjective optimizationrdquo Problemsof EngineeringCybernetics andRobotics vol 60 pp 42ndash54 2009
[20] C M Fonseca and P J Fleming ldquoGenetic algorithms formultiobjective optimization formulation discussion and gen-eralizationrdquo inProceedings of the 5th International Conference onGenetic Algorithms S Forrest Ed pp 416ndash423 San FranciscoCalif USA 1993
[21] J Horn N Nafpliotis and D E Goldberg ldquoA niched Paretogenetic algorithm for multiobjective optimizationrdquo in Proceed-ings of the 1st IEEE Conference on Evolutionary ComputationIEEE World Congress on Computational Intelligence pp 82ndash87IEEE Orlando Fla USA June 1994
[22] N Srinivas and K Deb ldquoMultiobjective optimization usingnondominated sorting in genetic algorithmsrdquo EvolutionaryComputation vol 2 no 3 pp 221ndash248 1994
[23] E Zitzler and L Thiele ldquoMultiobjective evolutionary algo-rithms a comparative case study and the strength Paretoapproachrdquo IEEETransactions onEvolutionaryComputation vol3 no 4 pp 257ndash271 1999
[24] J D Knowles and D W Corne ldquoApproximating the non-dominated front using the Pareto archived evolution strategyrdquoEvolutionary computation vol 8 no 2 pp 149ndash172 2000
[25] D W Corne J D Knowles and M J Oates ldquoThe Paretoenvelope-based selection algorithm for multiobjective opti-mizationrdquo in Proceedings of the 6th International Conference onParallel Problem Solving from Nature Springer Paris FranceSeptember 2000
[26] J Scharnow K Tinnefeld and I Wegener ldquoThe analysis of evo-lutionary algorithms on sorting and shortest paths problemsrdquoJournal of Mathematical Modelling and Algorithms vol 3 no 4pp 349ndash366 2004
[27] B Doerr E Happ and C Klein ldquoCrossover can provablybe useful in evolutionary computationrdquo in Proceedings of the10th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo08) pp 539ndash546 ACM Press July 2008
The Scientific World Journal 17
[28] C Horoba ldquoAnalysis of a simple evolutionary algorithm for themultiobjective shortest path problemrdquo in Proceedings of the 10thACM SIGEVO Workshop on Foundations of Genetic Algorithms(FOGA rsquo09) pp 113ndash120 ACM Press New York NY USAJanuary 2009
[29] MTheile ldquoExact solutions to the traveling salesperson problemby a population-based evolutionary algorithmrdquo in Proceedingsof the 9th European Conference on Evolutionary Computation inCombinatorial Optimisation (EvoCOP rsquo09) vol 5482 of LectureNotes in Computer Science pp 145ndash155 Springer April 2009
[30] A V Eremeev On Linking Dynamic Programming and Multi-Objective Evolutionary Algorithms Omsk StateUniversity 2008
[31] A V Eremeev ldquoA fully polynomial randomized approximationscheme based on an evolutionary algorithmrdquo Diskretnyi Analizi Issledovanie Operatsii vol 17 no 4 pp 3ndash17 2010
[32] T Friedrich N Hebbinghaus F Neumann J He and CWitt ldquoApproximating covering problems by randomized searchheuristics using multi-objective modelsrdquo in Proceedings of the9th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo07) pp 797ndash804 ACM Press July 2007
[33] E Elbeltagi T Hegazy and D Grierson ldquoA modified shuffledfrog-leaping optimization algorithm applications to projectmanagementrdquo Structure and Infrastructure Engineering vol 3no 1 pp 53ndash60 2007
[34] M Dorigo E Bonabeau and GTheraulaz ldquoAnt algorithms andstigmergyrdquo Future Generation Computer Systems vol 16 no 8pp 851ndash871 2000
[35] D E Goldberg and K Deb ldquoA comparative analysis of selectionschemes used in genetic algorithmsrdquo in Foundations of GeneticAlgorithms pp 69ndash93 Morgan Kaufmann San Mateo CalifUSA 1991
[36] D E GoldbergGenetic Algorithms in Search Optimization andMachine Learning Addison-Wesley Reading Mass USA 1989
[37] H Dong and Y Liang ldquoGenetic algorithms for large join queryoptimizationrdquo in Proceedings of the 9th Annual Genetic andEvolutionary Computation Conference (GECCO rsquo07) pp 1211ndash1218 London UK July 2007
[38] I F Sbalzariniy M Sibylle and P Koumoutsakosyz ldquoMulti-objective optimization using evolutionary algorithmsrdquo Centerfor Turbulence Research Proceedings of the Summer Program2000
[39] E Zitzler and L Thiele ldquoMultiobjective optimization usingevolutionary algorithmsmdasha comparative case studyrdquo in ParallelProblem Solving from Nature pp 292ndash301 Springer Amster-dam The Netherlands 1998
[40] C A C Coello ldquoAn updated survey of GA-basedmultiobjectiveoptimization techniquesrdquo ACM Computing Surveys vol 32 no2 pp 137ndash143 2000
[41] D A Van Veldhuizen and G B Lamont ldquoMultiobjective evolu-tionary algorithms analyzing the state-of-the-artrdquo Evolutionarycomputation vol 8 no 2 pp 125ndash147 2000
[42] F Glover and S Hanafi ldquoTabu search and finite convergencerdquoDiscrete Applied Mathematics vol 119 no 1-2 pp 3ndash36 2002
[43] A C Nearchou ldquoThe effect of various operators on the geneticsearch for large scheduling problemsrdquo International Journal ofProduction Economics vol 88 no 2 pp 191ndash203 2004
[44] W W Chu and P Hurley ldquoOptimal query processing fordistributed database systemsrdquo IEEE Transactions on Computersvol 31 no 9 pp 835ndash850 1982
[45] M Serpell and J E Smith ldquoSelf-adaptation ofmutation operatorand probability for permutation representations in genetic
algorithmsrdquo Evolutionary Computation vol 18 no 3 pp 491ndash514 2010
[46] A Seshadri A Fast Elitist Multiobjective Genetic AlgorithmNSGA-II MATLAB Central 2006
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
4 The Scientific World Journal
The abovemultiobjectiveDQPGproblemhas been solvedusing the multiobjective genetic algorithm which is dis-cussed next
32 Multiobjective Genetic Algorithms Conceptualization ofmultiobjective problems using veridical models has a greatresemblance to many real world engineering and designproblems that involve more than one coextensive and oftencompeting objectives that is maximize profit maximizethroughput minimize cost minimize response time and soforth In such a scenario no single solution can be termedas optimal as in the case of single objective optimizationproblems but rather a set of alternative solutions can bevisualized as a tradeoff between the different objectives underconsideration This set of solutions is regarded superior toothers in the search space as no other recordedavailablesolution can better optimize all the objectives consideredtogether [37ndash39]
Multiobjective optimization approaches can be broadlyclassified into three categories [37] The approaches in thefirst two categories can be termed as the classical optimiza-tion approaches which combine all objectives into a singlecomposite function using some combination of arithmeticoperators ormove all but one objective into the constraint setThe approaches in the first category have limitations in regardto appropriate selection of weights and designing functions inaccordance to the problem It wouldmandate the user to havea priori knowledge of the behavior of each objective functionto some extent for providing the range of values to objectivesso that none of them dominate the others which is not alwayspossible [17] This approach is generally denominated asaggregating functions and it has been implemented at severaloccasionswith relative success in situationswhere behavior ofthe objective function ismore or lesswell-known Someof theaggregating functions include the weighted sum approachgoal programming 120576-constraintmethod and so forth [40] Inthe second approach moving the objectives into a constraintset requires that the boundary values for each of the objectivesbe known a priori which is almost impossible In either of thetwo cases the optimization method returns a single solutionrather than a set of solutions giving possible tradeoffs andtherefore the quality of solution in these approaches greatlydepends upon the correct problem formulation If feasiblethese would be the most efficient and simplest approacheswhich would give atleast sub optimal results in most cases
The third approach overcomes the problems faced inthe classical optimization approaches and emphasizes thedevelopment of alternative techniques based on exploringthe complete set of nondominated solutions and therebyenabling the decision maker to choose among the differentalternatives This set of solutions is referred to as the Paretooptimal set [13] A Pareto optimal set can be formally definedas a set of solutions that are nondominated with respectto each other that is replacing one solution with anotherwithin the Pareto optimal set will invariably lead to a lossto one objective against a gain obtained in another objective[41] Pareto optimal sets can have varied sizes but usuallythe size increases with increase in the number of objectives
[37 40] They are more preferred over single solutionsas they closely resemble real world problems where thedecision maker makes a decision based on tradeoffs betweenmultiple objectives A number of techniques were formulatedto generate the Pareto optimal set for example simulatedannealing [14] Tabu search [42] ant colony optimization[34] and so forth The problem with these algorithms wasthatmost often they get struck at local optima and thus renderit infeasible to venture out for identifying new tradeoffsEvolutionary algorithms such as GA on the other hand seemto be especially suited for this task as they enable parallelexploration of different areas in the search space eventuallyexploiting the solutions attained using operators such ascrossover and mutation [13] It would enable determiningmore members of a Pareto optimal set in a single run insteadof a series of runs required in other blind search strategiesAlso the evolutionary algorithms require very little a prioriknowledge of the problem at hand and therefore are lesssusceptible to the typical shape and continuity of the ParetofrontThe Pareto front can be defined as the points that lie onthe boundary of the Pareto optimal region These algorithmsthus avoid convergence to a suboptimal solution [43]
Mathematically a multiobjective optimization problemwith 119898 decision variables and 119899 objectives can be definedwithout any loss of generality as amaximization orminimiza-tion problem given by [13 38]
minimize = 119891 (119909) = (1198911 (119909) 1198912 (119909) 119891119899 (119909))
where119909 = (1199091 1199092 119909119898) isin 119883
119910 = (1199101 1199102 119910119899) isin 119884
(7)
Here 119909 is the decision vector 119883 refers to the parameterspace 119910 is the objective vector and 119884 defines the objectivespace These objectives may be conflicting in nature that isimprovement in one may lead to deterioration in anotherSo it may become impossible to optimize all objectivessimultaneously in a single solution Instead the best tradeoffsolution would be of interest to a decision maker Thesesolutions form a Pareto optimal set which was initially coinedby Edgeworth and Pareto and is formally defined as [13 38]
ldquoA decision vector 119886 is said to be Pareto optimal if andonly if 119886 is nondominated regarding 119883 A decision vector 119886is said to be nondominated regarding a set 1198831015840 sube 119883 if andonly if there is no vector in 119883
1015840 which dominates 119886 Formallyit can be defined aslt1198861015840 isin 119883
1015840
119886
1015840
≺ 119886rdquoAlso a decision vector 119886 isin 119883 is said to dominate a
decision vector 119887 isin 119883 (also written as 119886 ≺ 119887) if and onlyif
forall119894 isin 1 2 119899 119891119894(119886) le 119891
119894(119887)
forall119895 isin 1 2 119899 119891119895(119886) le 119891
119895(119887)
(8)
Several multiobjective algorithms exist in the literature[4 18ndash25 37 40 41 44 45] of whichGA basedmultiobjectiveoptimization algorithms have been widely used for solvingmultiobjective optimization problems In this paper NSGA-II has been used to solve the DQPG problem NSGA-II willbe discussed next
The Scientific World Journal 5
f(x
2)
f(x1)
minus 1
+ 1
Figure 1 Crowding distance for solution ldquoVrdquo [4]
321 NSGA-II The basis of NSGA-II [4] lies in the non-dominated sorting genetic algorithm (NSGA) introduced bySrinivas and Deb [22] As the name suggests NSGA usesnondominated ranking for each individual in the populationand assigns them accordingly into nondominated frontsTheindividuals in the first front or the nondominated individualsare then assigned large dummy fitness values All individualsin the front shared this fitness value based on a sharing func-tion Next the individuals in the second nondominated frontare considered and similarly assigned a dummy fitness lowerthan the fitness assigned in the previous front This processcontinues till the entire population is classified into frontsSince the solutions in the first front have themaximumfitnessvalue their chances of selection increase and eventually morecopies of such solutions get passed on to the next generationHowever NSGA suffered from some drawbacks such as highcomputational complexity119874(119898119873
3
) nonelitist approach andthe requirement of specifying a shared parameter [4] Theselimitations were addressed in NSGA-II proposed by Deb etal [4] as an improved version of NSGA [22] It alleviatesthe drawbacks in NSGA by reducing the computationalcomplexity to 119874(119898119873
2
) Further it uses a parameter-lesssharing approach by using a crowding distance measurefor selection The crowding distance is an estimate of thedensity of solutions surrounding a particular solution in theobjective space In Figure 1 the crowding distance of solutionrepresented as point V is computed as the average distancebetween the two closest solutions represented as points V minus 1
and V + 1 on either side of the points V along each of theobjectives 119891(1199091) and 119891(1199092)
NSGA-II uses a crowded-comparison operator for selec-tion which takes into account both the nondomination rankof a query plan in the population and its crowding distanceThe nondominated solutions are preferred over dominatedsolutions and between two solutions having the same ranka solution that resides in the less crowded region is preferredthat is a solution for which the crowding distance is higherTheNSGA-II does not use any externalmemory but it ensureselitism by combining the best parents with the best offspringobtained [19] In this paper an NSGA-II based multiobjective
DQPG algorithm is used to compute optimal query plansfor a given distributed query This algorithm is discussednext
33 NSGA-II Based DQPG Algorithm The proposed NSGA-II based DQPG algorithm takes the relations given in theFROM clause of the distributed query as input It arrangesthese relations in increasing order of their cardinalities Itthen generates a fixed set of feasible query plans (chromo-somes) based on the possible combinations of sites in whichthese relations are residing Each gene in a chromosomerepresents a relation and is arranged in increasing orderof the corresponding relationrsquos cardinality The value of agene represents the site in which the corresponding relationresides For example suppose that a query posed by the userhas 4 relations (1198771 1198772 1198773 and 1198774) arranged in ascendingorderThe relation 1198771 is stored in sites 1198781 and 1198783 1198772 is storedin 11987811198773 is stored in 1198781 and 1198782 and1198774 is stored in 1198781Then theinitial population of feasible query plans (chromosomes) canbe (1 1 1 1) (3 1 1 1) (3 1 2 1) and (1 1 2 1) This definesthe encoding scheme for the given problem The proposedDQPG algorithm based on NSGA-II is given in Algorithm 1The steps involved in this algorithm are discussed as follows
Step 1 (Initialize the Population [4 46]) A random popula-tion of query plans is generated as per the encoding schemediscussed above
Step 2 (EvaluateQuery Plans on theObjective Functions)For each of the query plans in the population the TCC andTPC values are computed as given below
TCC =
119894=119906minus1 119895=119906
sum
119894=1 119895=119894+1
CC119894119895times 119887119894
TPC =
119906
sum
119894=1
LPC119894times 119886119894
(9)
where 119906 is the number of sites accessed by the queryplan in ascending order of cardinality per site CC
119894119895is the
communication cost per byte between sites 119894 and 119895 LPC119894
is the local processing cost per byte at site 119894 119887119894is the bytes
to be communicated from site 119894 and 119886119894is the bytes to be
processed at site 119894 The procedure to compute CC119894119895 LPC
119894
119887119894 and 119886
119894is given in Section 21 If a site contains a single
relation its LPC is considered zero TCC and TPC needto be minimized simultaneously to achieve an acceptabletradeoff
Step 3 (Perform Nondominated Sort [4 46]) On the givenpopulation a fast nondominated sorting is performed in thefollowing manner
Twoobjective functions are consideredThefirst objectiveis to minimize the total processing cost (TPC) and thesecond objective is to minimize the total communicationcost (TCC) NSGA-II attempts to find a tradeoff betweenthese two objectives that can result in minimum total queryprocessing cost (TC)
6 The Scientific World Journal
Input 119877119894 Relations participating in the query 119875
119888 Probability of crossover
119875119898 Probability of mutation 119866 Pre-defined number of generations 119875size Population Size
Output Top-n query plansMethodInitialize a random parent population of query plans PPwhere chromosome length ldquolenrdquo is the number of relations accessed in the query and the gene at the 119896th position in thechromosome represents the site of the 119896th relationWHILE generation le 119866 DOStep 1 Evaluate each query plan in PP on the following objective functions
1198981 Minimize TCC =
119894=119906minus1119895=119906
sum
119894=1119895=119894+1
CC119894119895times 119887119894
1198982 Minimize TPC =
119906
sum
119894=1
LPC119894times 119886119894
where 119906 is the number of sites accessed by the query plan in ascending order of cardinality per site CC119894119895is the communication
cost per byte between sites 119894 and 119895 LPC119894is the local processing cost per byte at site 119894 119887
119894is the bytes to be communicated from
site 119894 and 119886119894is the bytes to be processed at site 119894
Step 2 Perform Non-Dominated (ND) Sort on PP for ldquo1198981rdquo and ldquo119898
2rdquo separately and place each query plan (QP)
into corresponding ND fronts ldquo119894rdquo and sort the QPs within each ldquo
119894rdquo
Step 3 Evaluate Crowding Distance Function 119868(119889) for each objective functionAssign 119868(119889) = infin for smallest and highest values in each front ldquo
119894rdquo
For the remaining QPs 119868(119889) is calculated as
119868 (119889119896) = 119868 (119889
119896) +
119868(119896 + 1)119898minus 119868(119896 minus 1)
119898
119891
max119898
minus 119891
min119898
where 119868(119896)119898is the value of119898th objective function of 119896th query plan in Front
119894and 119891
max119898
and 119891
min119898
are themaximum and minimum values obtained for the objective function119898Step 4 Perform Selection from PP using binary tournament selection using crowded comparison operator (≺
119899)
Step 5 Perform random single point crossover on selected chromosomes with crossover probability 119875119888
Step 6 Apply mutation on resulting population with mutation probability 119875119898
Let the resulting child population be CPStep 7 Append CP into PP and let the resulting intermediate population be IPStep 8 Repeat Step 1 and Step 2 for population IPStep 9 Form the population PP for the next generation by picking query plans Front-wise from IP till thepopulation size = 119875sizeStep 10 Increment Generation by 1ENDDOReturn Top-119870 Query Plans from population PP
Algorithm 1 NSGA-II based DQPG algorithm
In order to performanondominated sort each query planis compared with every other query plan in the population tofind if it is dominated For each query plan ldquo119894rdquo the followingtwo entities are considered
(i) 119899119894 The number of query plans that dominate the
query plan 119894(ii) 119878119894 The set of query plans that query plan 119894 dominates
All query plans that have 119899119894= 0 are added to the set
1
Set = 1where is called the current front For each
element 119909 in the current front visit each member 119895 in the set119878119909and reduce the count of 119899
119895by 1 Now if 119899
119895gets reduced to
zero for some 119895 add it to the set 2 After evaluating all the
members of in a similar manner set = 2 This process
continues till all the query plans are assigned some frontThe fast nondominated sorting procedure takes the currentpopulation as input and produces a list of nondominatedfronts
119894as output
Step 4 (Density Estimation Using Crowding Distance [446]) After the nondominated sort the crowding distanceis computed for each query plan in
119894 Crowding distance
[4] is an estimate of the density of solutions surrounding aparticular solution point in the population It is defined asthe average distance of the two closest points on either sidesof the given point along each of the objectives The crowdingdistance 119868(119889119894119904119905119886119899119888119890) is computed in the following manner[4 46]
For each front 119894 let 119873 be the number of query plans
in front 119894 Initially the crowding distance for each query
plan in the front 119894is zero That is 119868(119889
119895) = 0 119895 =
1 119873 Next for each objective function the query plansin the front
119894are sorted based on their value of TPC (ie
the first objective function) and similarly also with respectto TCC (ie the second objective function) and placed in119868(119889119894119904119905119886119899119888119890) The query plans having the smallest and thehighest 119868(119889119894119904119905119886119899119888119890) values in both sets are assigned an infinite
The Scientific World Journal 7
Rejected
Crowding distance sort PP
CP
IP
PP
Nondominatedsort
Ŧ1
Ŧ2
Ŧ3
Figure 2 NSGA-II procedure preserving elitism [4]
value for 119868(119889119894119904119905119886119899119888119890) that is 119868(1198891) = infin and 119868(119889
119873) = infin For
remaining query plans that is 119896 = 2 119873 minus 1 119868(119889119894119904119905119886119899119888119890)is computed as follows [4 46]
119868 (119889119896) = 119868 (119889
119896) +
119868(119896 + 1)119898minus 119868(119896 minus 1)
119898
119891
max119898
minus 119891
min119898
(10)
where 119868(119896)119898is the value of 119898th objective function of 119896th
query plan in front 119894and 119891
max119898
and 119891
min119898
are the maximumand minimum values obtained for the objective function119898
Step 5 (Binary Tournament Selection) After assigning thecrowding distance to the query plans in each front a selectionprocess is carried outThe selection scheme used is the binarytournament selection and it is carried out using the crowdedcomparison operator (≺
119899) [4 46] It uses two parameters as
given below(i) 120588rank (nondomination rank)The query plans in front
119894will have 120588rank = 119894
(ii) 119868119894(119889119895) (crowding distance in front 119894) 119895 = 1 119873
The crowded comparison is performed as described below [446]
For any two query plans 1199021199011and 119902119901
2 1199021199011is selected if
(1199021199011≺1198991199021199012) 1199021199011≺1198991199021199012is true if either one of the following
holds(i) 120588rank(1199021199011) lt 120588rank(1199021199012)(ii) if 119902119901
1and 119902119901
2belong to the same front (ie if
120588rank(1199021199011) = 120588rank(1199021199012)) then 119868119894(1198891199021199011
) gt 119868119894(1198891199021199012
)
Step 6 (Crossover andMutation) Crossover is performed onthe selected query plans with a given crossover probability119875119888 It ensures proper exploration of the search space by
combining the best features of the parent query plans (chro-mosomes) Mutation is performed on the given populationwith a given probability 119875
119898 It randomly changes the site
(gene) in which the corresponding relation resides within aquery plan (chromosome) The mutated gene always takes arandom value from a set of valid sites for a particular relationAfter going through the above steps the first generationpopulation is formed NSGA-II follows a different methodto produce subsequent generations in order to incorporateelitism as described next
Table 1 Initial parent population PP
[2 1 5 1] [3 5 4 5]
[3 3 1 1] [3 1 1 1]
[3 3 3 3] [3 4 4 4]
[2 4 1 5] [2 3 3 3]
[3 1 1 3] [2 1 3 4]
Step 7 (Preserving Good Solutions (Elitism) [4]) In subse-quent generations the new population after each generationis combined with the parent population and a new interme-diate population IP is created of size PP + CP where PP is theparent population and CP is the child population as shown inFigure 2
The non-dominated sort is applied to this intermediatepopulation and fronts are formed as described in Step 3Finally the population for the next generation is formedby adding solutions from each front till the population sizeexceeds 119875 If the last front to be included was
119888 which led
to the population overflow then query plans in Front 119888are
selected based on their crowding distance measure (Step 4)in descending order until the population size exceeds 119875
The above steps are repeated for ldquo119866rdquo generations and theTop-119870 query plans are produced as output
An example illustrating the use of the above NSGA-IIbased DQPG algorithm to generate query plans for a givendistributed query is given next
4 An Example
Consider the site relation matrix the communication-cost matrix the local processing cost matrix the distinct-tuple matrix and the size matrix used to compute thefitness of query plans given in [3] and shown below inFigure 3 Suppose the initial parent population PP com-prises of 10 query plans given in Table 1 Consider aquery that accesses four relations (1198771 1198772 1198773 and 1198774)which are distributed among five sites (1198781 1198782 1198783 1198784 and1198785)
8 The Scientific World Journal
R1 R2 R3 R4300200 500 700
Cardinality
R1 R2 R3 R430 40 50 20
Size
S1 S2 S3 S4 S52 2 3 3 2
S1 S2 S3 S4 S5
S1 150 180 150 160
S2 165 185 170 170
S3 160 160 170 150
S4 150 150 180 160
S5 170 160 190 150
Communication costmatrix
R1 R2 R3 R4
R1 04 05 04 04
R2 05 02 02 03
R3 04 02 02 01
R4 04 03 01 03
matrix for join
R1 R2 R3 R4S1 0 1 1 1S2 1 0 0 0S3 1 1 1 1S4 0 1 1 1S5 0 0 1 1
matrixLPCi
Distinct-tuple- Relation-site
mdash
mdashmdash
mdash
mdash
Figure 3 Matrices used for computing fitness [3]
Table 2 Nondominated sort on PP
Index [119894] Population PP 1198981= TCC 119898
2= TPC 119878
119894119899119894
Front[1] [2 1 5 1] 18020000 3746700 [4 10] 1
2
[2] [3 3 1 1] 6720000 6006000 [4 10] 1 2
[3] [3 3 3 3] 0 4200000 [2 4 7 8 9 10] mdash 1
[4] [2 4 1 5] 43440000 70793333 [10] 6 3
[5] [3 1 1 3] 14000000 2112000 [1 4 6 10] mdash 1
[6] [3 5 4 5] 17020000 3847000 [4 10] 1 2
[7] [3 1 1 1] 960000 6500000 [4 8 9 10] 1 2
[8] [3 4 4 4] 1020000 9750000 [10] 2 3
[9] [2 3 3 3] 1110000 9750000 [10] 2 3
[10] [2 1 3 4] 45090000 10514000 mdash 9 4
Table 3 Sorted fronts based on TCC and TPC
Query plans sorted on1198981= TCC
Query plans sorted on1198982= TPC
1
3 5 1
5 32
7 2 6 1 2
1 6 2 73
8 9 4 3
4 8 94
10 4
10
The computations of TPC and TCC for the query plan[2 4 1 5] are given as follows
TPC4= (LPC
2times 1198862) + (LPC
4times 1198864)
+ (LPC5times 1198865) + (LPC
1times 1198861)
TCC4= (CC
24times 11988724
) + (CC45
times 11988745
) + (CC51
times 11988751
)
(11)
whereLPC2times 1198862= 0
CC24
times 11988724
= CC24
times (Card (1198771) times Size (1198771))
= 170 times (200 times 30) = 1020000
LPC4times 1198864= LPC
4times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times (Size (1198771) + Size (1198772)) )
= 2 times (
200 times 300
05 times 200
times (70)) = 126000
CC45
times 11988745
= CC45
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times (Size (1198771) + Size (1198772)))
= 170 times (
200 times 300
05 times 200
times (70)) = 6720000
LPC5times 1198865
= LPC5
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times (Size (1198771) + Size (1198772) + Size (1198774)) )
= 2 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
)
times 90 = 420000
The Scientific World Journal 9
Table 4 Results for the objective functions on PP
Index [119894] Population PP 1198981= TCC 119898
2= TPC Front Crowding distance (CD) = 119868[119894]
[1] [2 1 5 1] 18020000 3746700 2
infin
[2] [3 3 1 1] 6720000 6006000 2
0672[3] [3 3 3 3] 0 4200000
1infin
[4] [2 4 1 5] 43440000 70793333 3
infin
[5] [3 1 1 3] 14000000 2112000 1
infin
[6] [3 5 4 5] 17020000 3847000 2
05195[7] [3 1 1 1] 960000 6500000
2infin
[8] [3 4 4 4] 1020000 9750000 3
infin
[9] [2 3 3 3] 1110000 9750000 3
infin
[10] [2 1 3 4] 45090000 10514000 4
infin
Table 5 Selection of query plans using binary tournament selection technique
IndexRandomly generated
indexes[119894] and [119895]
Tournament betweenquery plans
[119875(119894)] and [119875(119895)]
Front comparison Query plan selected (lowerfront or higher CD)
[1] [1] and [4] [2 1 5 1] and [2 4 1 5] [2] and [
3] [2 1 5 1]
[2] [2] and [3] [3 3 1 1] and [3 3 3 3] [2] and [
1] [3 3 3 3]
[3] [3] and [4] [3 3 3 3] and [2 4 1 5] [1] and [
3] [3 3 3 3]
[4] [2] and [6] [3 3 1 1] and [3 5 4 5] [2] and [
2] [3 3 1 1]
[5] [2] and [4] [3 3 1 1] and [2 4 1 5] [2] and [
3] [3 3 1 1]
[6] [1] and [3] [2 1 5 1] and [3 3 3 3] [2] and [
1] [3 3 3 3]
[7] [2] and [5] [3 3 1 1] and [3 1 1 3] [2] and [
1] [3 1 1 3]
[8] [2] and [8] [3 3 1 1] and [3 1 1 3] [2] and [
3] [3 3 1 1]
[9] [7] and [8] [3 1 1 1] and [3 1 1 3] [2] and [
3] [3 1 1 1]
[10] [2] and [9] [3 3 1 1] and [2 4 1 5] [2] and [
3] [3 3 1 1]
Table 6 Population CP after crossover and mutation
Index Population after crossover Population aftermutation (CP)
[1] [2 1 1 1] [2 1 1 1]
[2] [3 3 1 1] [3 3 1 1]
[3] [3 3 5 1] [3 3 4 1][4] [3 3 1 1] [3 3 1 1]
[5] [3 3 3 1] [3 3 3 1]
[6] [3 3 3 3] [3 3 3 3]
[7] [3 1 1 3] [3 1 1 3]
[8] [3 3 1 3] [3 3 1 3]
[9] [3 1 3 3] [3 3 3 3]
[10] [3 3 1 1] [3 3 1 1]
CC51
times 11988751
= CC51
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times (Size (1198771) + Size (1198772) + Size (1198774)) )
170 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
)
times 90 = 35700000
LPC1times 1198861
= LPC1times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times
Card (1198773)
Dist43
times Card (1198773)
times (
4
sum
119894=1
Size (119877119894)))
10 The Scientific World Journal
Table 7 Front assignment for query plans in IP
Index Intermediate population IP TCC TPC Front[1] [2 1 5 1] 18020000 3747000
2
[2] [3 3 1 1] 6720000 6006000 3
[3] [3 3 3 3] 0 4200000 1
[4] [2 4 1 5] 43440000 7079000 4
[5] [3 1 1 3] 14000000 2112000 1
[6] [3 5 4 5] 17020000 3847000 2
[7] [3 1 1 1] 960000 6500000 2
[8] [3 1 1 3] 1020000 9750000 3
[9] [2 3 3 3] 1110000 9750000 3
[10] [2 1 3 4] 45090000 10514000 5
[11] [2 1 1 1] 990000 6500000 2
[12] [3 3 1 1] 6720000 6006000 3
[13] [3 3 4 1] 90300000 8946000 5
[14] [3 3 1 1] 6720000 6006000 3
[15] [3 3 3 1] 2520000 4230000 2
[16] [3 3 3 3] 0 4200000 1
[17] [3 1 1 3] 14000000 2112000 1
[18] [3 3 1 3] 4500000 2640000 1
[19] [3 3 3 3] 0 4200000 1
[20] [3 3 1 1] 6720000 6006000 3
Table 8 Nondominated sorting for query plans in IP
Index Intermediatepopulation IP Front Crowding
distance[3] [3 3 3 3]
1
[5] [3 1 1 3] 1
[16] [3 3 3 3] 1
[17] [3 1 1 3] 1
[18] [3 3 1 3] 1
[19] [3 3 3 3] 1
[1] [2 1 5 1] 2 infin
[7] [3 1 1 1] 2 infin
[11] [2 1 1 1] 2 infin
[15] [3 3 3 1] 2 04933
[6] [3 5 4 5] 2 02292
[2] [3 3 1 1] 3
[9] [2 3 3 3] 3
[8] [3 1 1 3] 3
[12] [3 3 1 1] 3
[14] [3 3 1 1] 3
[20] [3 3 1 1] 3
[4] [2 4 1 5] 4
[10] [2 1 3 4] 5
[13] [3 3 4 1] 4
= 2 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
times
500
01 times 500
) times 140 = 65333333
(12)
Table 9 Population PP in the 2nd generation
Index Population PP[1] [3 3 3 3]
[2] [3 1 1 3]
[3] [3 3 3 3]
[4] [3 1 1 3]
[5] [3 3 1 3]
[6] [3 3 3 3]
[7] [2 1 5 1]
[8] [3 1 1 1]
[9] [2 1 1 1]
[10] [3 3 3 1]
So
TPC4= 126000 + 420000 + 65333333 = 70793333
TCC4= 1020000 + 6720000 + 35700000 = 43440000
(13)
Similarly TCC and TPC values of the other nine queryplans are computed Consider TCC and TPC of the 10 queryplans are given in Table 2 The population is then sorted intodifferent nondominated fronts as described in Step 3 of theproposed algorithm For example for query plan 1 that is[2 1 5 1] the set 119878
1= number of query plans that dominate
query plan 1 Since TCC [1] lt TCC [4] TPC [1] lt TPC [4]TCC [1] lt TCC [10] and TPC [1] lt TPC [10] the elementsof 1198781
= 4 10 Similarly the sets 1198782 119878
10are computed
and are given in Table 2 119899119894stores the count of query plans
that dominate 119894 So using the values in 119878119894 1198991
= 1 as onlyquery plan 5 is dominating 1 and 119899
2= 1 as only query plan 3
is dominating 2 Similarly 1198992 119899
10are computed and are
The Scientific World Journal 11
given in Table 2 From Table 2 it can be noted that queryplans 3 and 5 are not dominated as 119899
3= 0 119899
5= 0 So they
are assigned to the first nondominated front1The elementsin the next front are computed by reducing the count in 119899
119894
for each 119894 isin 1198783and 119894 isin 119878
5 So 119899
119894= 0 0 minus 4 minus 0 0 1 1 7
So the second front has query plans 1 2 6 and 7 Thisprocess continues till all the query plans in the population areassigned to their respective nondominated fronts The fronts1 2 3 and
4are formed and are given in Table 2
Finally the query plans are sorted separately on the valuesof TCC and TPC within each front as shown in Table 3
After the population is sorted into different fronts thecrowding distance (119868[119889119894119904119905119886119899119888119890]) computation is performedfor each query plan using the formula given in Step 4 ofthe proposed algorithm Query plans having the maximumand minimum values in each front are assigned infin distancevalues that is 119868[3] 119868[5] 119868[7] 119868[1] 119868[8] 119868[4] 119868[9] and119868[10] are assigned infin For the rest of the query plans theircrowding distance (CD) values are computed based on thesorted order of query plans in each front Initially they areassigned 119868[119889119894119904119905119886119899119888119890] = 0 For query plan 2 in front
2 the
CD computation is performed as follows
1198681198981[2] = 119868 [2] +
TCC [6] minus TCC [7]
max (TCC) minusmin (TCC)
= 0 +
17020000 minus 960000
45090000 minus 0
= 03562
119868 [2] = 1198681198981[2] +
TPC [7] minus TPC [6]
max (TPC) minusmin (TPC)
= 03562 +
6500000 minus 3847000
10514000 minus 2112000
= 06720
(14)
CD of the query plans in the given population is given inTable 4
Next binary tournament selection is performed on thepopulation on the basis of crowded comparison operator ≺
119899
This selection process is shown in Table 5The selected query plans undergo random single point
crossover with crossover probability 119875119888
= 05 Mutation isperformed on the selected population with mutation proba-bility 119875
119898= 002 The child population CP after crossover and
mutation is shown in Table 6Now in accordance with NSGA-II algorithm the popu-
lations from the second generation onward have to ensureelitism For this purpose the child population CP is com-bined with the parent population PP to generate intermediatepopulation IP This population is subjected to nondominatedsort and fronts are formed as given in Table 7
The population for the second generation is arrived atby selecting query plans based on front and within it basedon crowding distance-wise as described in Step 7 from theintermediate population IP till the actual population size 10is exceeded This selection is shown in Table 8
The population PP for the second generation is given inTable 9
The above process is repeated till a predefined number ofgenerations119866have elapsedThereafter Top-119870 query plans areproduced as output
0
01
02
03
04
05
06
07
08
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-5 query plans)
Figure 4
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
DQPGNSGA(top-10 query plans)
Figure 5
5 Experimental Results
The proposed NSGA-II based algorithm is implemented inMATLAB 77 in Windows 7 professional 64 bit OS with Intelcore i3CPUat 213GHzhaving 4GBRAMExperimentswerecarried out for a population of 100 query plans with eachquery plan involving 10 relations distributed over 50 sitesThese were performed on four datasets each comprising adifferent relation-site matrix Graphs were plotted to observechange in average TC (ATC) with respect to generations
12 The Scientific World Journal
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-15 query plans)
Figure 6
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-20 query plans)
Figure 7
and Top-119870 query plans for different pairs of crossover andmutation rates
First graphs showing the ATC values Top-5 Top-10Top-15 and Top-20 query plans averaged over four datasetsover 1000 generations for distinct pairs of crossover andmutation probability 119875
119888 119875119898 = 07 001 07 0005
075 001 075 0005 08 001 08 0005 085 001085 0005 using NSGA-II based DQPG algorithm(DQPGNSGA) were plotted These are shown in Figures 45 6 and 7 It can be observed from these graphs that the
0
005
01
015
02
025
03
035
04
045
05
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
(after 1000 generations)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGNSGA
Top-K query plans
Figure 8
0
01
02
03
04
05
06
07
08
09
1
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-5 query plans)
Figure 9
convergence to ATC is lowest in the case of 119875119888 119875119898 =
085 001 Furthermore the graph showing ATC valuesaveraged over four datasets versus Top-119870 query plans after1000 generations (Figure 8) also shows that the lowest ATCvalues achieved are for 119875
119888 119875119898 = 085 001 Thus it
can be said that DQPGNSGA performs reasonably well for119875119888 119875119898 = 085 001 In order to compare DQPGNSGA with
SGA based DQPG algorithm (DQPGSGA) similar graphswere plotted for DQPGSGA These are shown in Figures 9 1011 12 and 13 It is noted from these graphs that DQPGSGA
The Scientific World Journal 13
0
02
04
06
08
1
12
14
1618
21 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-10 query plans)
Figure 10
0
02
04
06
08
1
12
14
16
18
2
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-15 query plans)
Figure 11
also achieves convergence to the lowest ATC values for119875119888 119875119898 = 085 001
Since the two algorithms DQPGNSGA and DQPGSGAconverge to a lower ATC value for the same crossover andmutation probabilities that is 119875
119888 119875119898 = 085 001 the
comparisons of the two algorithms can be carried out forthese observed probabilities
First the two algorithmsDQPGNSGA andDQPGSGA werecompared for each of the four datasets (Dataset-1 Dataset-2 Dataset-3 and Dataset-4) on the ATC values of Top-5Top-10 Top-15 and Top-20 query plans generated over 1000
0
04
08
12
16
2
24
28
32
36
4
1 54 107
160
213
266
319
372
425
478
531
584
637
690
743
796
849
902
955
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-20 query plans)
Figure 12
0
01
02
03
04
05
06
07
08
09
1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
(after 1000 generations)
Aver
age t
otal
cost
(ATC
)
DQPGSGA
Top-K query plans
Figure 13
generations for 119875119888 119875119898 = 085 001 The corresponding
graphs for each dataset are shown in Figures 14 15 16 and 17It can be observed from the graphs thatDQPGNSGA convergesto lower ATC values for Top-5 Top-10 Top-15 and Top-20query plans generated for all datasets
Furthermore graphs comparing the ATC values of Top-119870 query plans produced by DQPGNSGA and DQPGSGA after1000 generations for the four datasets were plotted andare shown in Figures 18 19 20 and 21 These graphs alsoshow average TCC (ATCC) and average TPC (ATPC) valuesof Top-119870 query plans generated by DQPGNSGA It can beobserved from these graphs thatDQPGNSGA is able to achieve
14 The Scientific World Journal
0
05
1
15
2
25
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-1)
Figure 14
0
03
06
09
12
15
18
21
24
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-2)
Figure 15
an acceptable tradeoff between ATPC and ATCC which inturn leads to a comparatively lower ATC for the Top-119870 queryplans generated by it
Next a graph comparing the ATC values of Top-119870 queryplans generated by DQPGNSGA and DQPGSGA on all fourdatasets (DS-1 DS-2 DS-3 and DS-4) after 1000 generationsfor observed probabilities 119875
119888 119875119898 = 085 001were plotted
and is shown in Figure 22 It is noted from the graph thatDQPGNSGA performs better than DQPGSGA on the ATCvalues of Top-119870 query plans generated by the two algorithmsfor each of the four data sets
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
0
015
03
045
06
075
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-3)
Figure 16
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
0
01
02
03
04
05
06
07
08
09
1
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-4)
Figure 17
It can be reasonably inferred from all the above graphsthat DQPGNSGA is able to generate Top-119870 query plans withlower ATC when compared to those generated byDQPGSGAThis may be attributed to acceptable tradeoffs achieved whilesimultaneously optimizing TPC and TCC which results inlower TC in case of DQPGNSGA
6 Conclusion
In this paper DQPGproblemgiven in [3] has been addressedwhere query plans are generated for a distributed relational
The Scientific World Journal 15
0
005
01
015
02
025
03
035
04
045
1 3 5 7 9 11 13 15 17 19
Cos
t
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-1)
Figure 18
0
01
02
03
04
05
06
07
Cos
t
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
Top-K query plans
(Dataset-2)
Figure 19
query that incurs minimum total query processing costGenetic algorithms have been used to generate these queryplansThe total query processing cost TC in [3] can be viewedas comprising broadly of TPC and TCC and thereforeminimizing TPC and TCC would result in minimizing TCThus in this paper the single-objective DQPG problemin [3] has been formulated and solved as a biobjectiveDQPG problem with the two objectives being minimizingTPC and minimizing TCC These objectives are minimizedsimultaneously using the multiobjective genetic algorithmNSGA-II
Experiments were performed and DQPGNSGA is com-pared with DQPGSGA given in [3] It was observed thatboth the algorithms individually gave good results for thecrossover and mutation probabilities 085 and 001 respec-tively The two algorithms were then compared on the ATC
0
002
004
006
008
01
012
014
016
Cos
t
1 3 5 7 9 11 13 15 17 19
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-3)
Figure 20
0
005
01
015
02
025
Cos
t
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-4)
Figure 21
values of the Top-119870 query plans generated by them for theobserved crossover and mutation probabilities The resultsshowed that DQPGNSGA performed better than DQPGSGAAlso the performance of the former was better when the twoalgorithmswere compared on theATCvalues of Top-119870 queryplansThe better performance of DQPGNSGA over DQPGSGAmay be attributed to DQPGNSGA achieving acceptable trade-offs between TPC and TCC while minimizing TPC and TCCof Top-119870 query plans simultaneously
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
16 The Scientific World Journal
Top-K query plans
0
01
02
03
04
05
06
07
Aver
age t
otal
cost
(ATC
)
SGA(DS-1)SGA(DS-2)SGA(DS-3)SGA(DS-4)
NSGA(DS-1)NSGA(DS-2)NSGA(DS-3)NSGA(DS-4)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(after 1000 generations)
Figure 22
References
[1] S Ceri and G Pelgatti Distributed Databases Principles ampSystems International Edition McGraw-Hill Computer ScienceSeries 1985
[2] A R Hevner and S B Yao ldquoQuery processing in distributeddatabase systemsrdquo IEEE Transactions on Software Engineeringvol 5 no 3 pp 177ndash187 1979
[3] T V Vijay Kumar and S Panicker ldquoGenerating query plansfor distributed query processing using genetic algorithmrdquo inInformation Computing and Applications vol 7030 of LectureNotes in Computer Science pp 765ndash772 2011
[4] K Deb A Pratap S Agarwal and T Meyarivan ldquoA fastand elitist multiobjective genetic algorithm NSGA-IIrdquo IEEETransactions on Evolutionary Computation vol 6 no 2 pp 182ndash197 2002
[5] P Black and W Luk ldquoA new heuristic for generating semi-joinprograms for distributed query processingrdquo in Proceedings ofthe IEEE 6th International Computer Software and ApplicationConference pp 581ndash588 IEEE Chicago Ill USA November1982
[6] P Bodorik and J S Riordon ldquoA threshold mechanism fordistributed query processingrdquo in Proceedings of the 16th AnnualConference on Computer Science pp 616ndash625 Atlanta GaUnited States February 1988
[7] J Chang ldquoA heuristic approach to distributed query process-ingrdquo in Proceedings of the 8th International Conference on VeryLarge Data Bases (VLDB rsquo82) pp 54ndash61 Saratoga Calif USA1982
[8] D Kossmann ldquoThe state of the art in distributed queryprocessingrdquo ACM Computing Surveys vol 32 no 4 pp 422ndash469 2000
[9] L Stiphane and W Eugene ldquoA state transition model fordistributed query processingrdquo ACM Transactions on DatabaseSystems vol 11 no 3 pp 249ndash322 1986
[10] T V Vijay Kumar V Singh and A K Verma ldquoDistributedquery processing plans generation using genetic algorithmrdquo
International Journal of Computer Theory and Engineering vol3 no 1 pp 38ndash45 2011
[11] C T Yu and C C Chang ldquoDistributed query processingrdquoACMComputing Surveys vol 16 no 4 pp 399ndash433 1984
[12] K Bennett M C Ferris and Y E Ioannidis ldquoA Geneticalgorithm for database query optimizationrdquo in Proceedings ofthe 4th International Conference onGenetic Algorithms pp 400ndash407 1991
[13] Y E Ioannidis and Y Cha Kang ldquoRandomized algorithms foroptimizing large join queriesrdquo in Proceedings of the 1990 ACMSIGMOD International Conference on Management of Data pp312ndash321 May 1990
[14] Y Ioannidis and E Wong ldquoQuery optimization by simulatedannealingrdquo in Proceedings of the ACM SIGMOD InternationalConference on Management of Data (SIGMOD rsquo87) pp 9ndash221987
[15] A A R Townsend Genetic Algorithms A Tutorial 2003[16] M Gregory Genetic Algorithm Optimization of Distributed
Database Queries IEEE 1998[17] M Mitchell An Introduction to Genetic Algorithms The MIT
Press Cambridge Mass USA 1999[18] J D Schaffer ldquoMultiple objective optimization with vector
evaluated genetic algorithmsrdquo inProceedings of the InternationalConference on Genetic Algorithm andTheir Applications pp 93ndash100 July 1985
[19] V Guliashki H Toshev andC Korsemov ldquoSurvey of evolution-ary algorithms used in multiobjective optimizationrdquo Problemsof EngineeringCybernetics andRobotics vol 60 pp 42ndash54 2009
[20] C M Fonseca and P J Fleming ldquoGenetic algorithms formultiobjective optimization formulation discussion and gen-eralizationrdquo inProceedings of the 5th International Conference onGenetic Algorithms S Forrest Ed pp 416ndash423 San FranciscoCalif USA 1993
[21] J Horn N Nafpliotis and D E Goldberg ldquoA niched Paretogenetic algorithm for multiobjective optimizationrdquo in Proceed-ings of the 1st IEEE Conference on Evolutionary ComputationIEEE World Congress on Computational Intelligence pp 82ndash87IEEE Orlando Fla USA June 1994
[22] N Srinivas and K Deb ldquoMultiobjective optimization usingnondominated sorting in genetic algorithmsrdquo EvolutionaryComputation vol 2 no 3 pp 221ndash248 1994
[23] E Zitzler and L Thiele ldquoMultiobjective evolutionary algo-rithms a comparative case study and the strength Paretoapproachrdquo IEEETransactions onEvolutionaryComputation vol3 no 4 pp 257ndash271 1999
[24] J D Knowles and D W Corne ldquoApproximating the non-dominated front using the Pareto archived evolution strategyrdquoEvolutionary computation vol 8 no 2 pp 149ndash172 2000
[25] D W Corne J D Knowles and M J Oates ldquoThe Paretoenvelope-based selection algorithm for multiobjective opti-mizationrdquo in Proceedings of the 6th International Conference onParallel Problem Solving from Nature Springer Paris FranceSeptember 2000
[26] J Scharnow K Tinnefeld and I Wegener ldquoThe analysis of evo-lutionary algorithms on sorting and shortest paths problemsrdquoJournal of Mathematical Modelling and Algorithms vol 3 no 4pp 349ndash366 2004
[27] B Doerr E Happ and C Klein ldquoCrossover can provablybe useful in evolutionary computationrdquo in Proceedings of the10th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo08) pp 539ndash546 ACM Press July 2008
The Scientific World Journal 17
[28] C Horoba ldquoAnalysis of a simple evolutionary algorithm for themultiobjective shortest path problemrdquo in Proceedings of the 10thACM SIGEVO Workshop on Foundations of Genetic Algorithms(FOGA rsquo09) pp 113ndash120 ACM Press New York NY USAJanuary 2009
[29] MTheile ldquoExact solutions to the traveling salesperson problemby a population-based evolutionary algorithmrdquo in Proceedingsof the 9th European Conference on Evolutionary Computation inCombinatorial Optimisation (EvoCOP rsquo09) vol 5482 of LectureNotes in Computer Science pp 145ndash155 Springer April 2009
[30] A V Eremeev On Linking Dynamic Programming and Multi-Objective Evolutionary Algorithms Omsk StateUniversity 2008
[31] A V Eremeev ldquoA fully polynomial randomized approximationscheme based on an evolutionary algorithmrdquo Diskretnyi Analizi Issledovanie Operatsii vol 17 no 4 pp 3ndash17 2010
[32] T Friedrich N Hebbinghaus F Neumann J He and CWitt ldquoApproximating covering problems by randomized searchheuristics using multi-objective modelsrdquo in Proceedings of the9th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo07) pp 797ndash804 ACM Press July 2007
[33] E Elbeltagi T Hegazy and D Grierson ldquoA modified shuffledfrog-leaping optimization algorithm applications to projectmanagementrdquo Structure and Infrastructure Engineering vol 3no 1 pp 53ndash60 2007
[34] M Dorigo E Bonabeau and GTheraulaz ldquoAnt algorithms andstigmergyrdquo Future Generation Computer Systems vol 16 no 8pp 851ndash871 2000
[35] D E Goldberg and K Deb ldquoA comparative analysis of selectionschemes used in genetic algorithmsrdquo in Foundations of GeneticAlgorithms pp 69ndash93 Morgan Kaufmann San Mateo CalifUSA 1991
[36] D E GoldbergGenetic Algorithms in Search Optimization andMachine Learning Addison-Wesley Reading Mass USA 1989
[37] H Dong and Y Liang ldquoGenetic algorithms for large join queryoptimizationrdquo in Proceedings of the 9th Annual Genetic andEvolutionary Computation Conference (GECCO rsquo07) pp 1211ndash1218 London UK July 2007
[38] I F Sbalzariniy M Sibylle and P Koumoutsakosyz ldquoMulti-objective optimization using evolutionary algorithmsrdquo Centerfor Turbulence Research Proceedings of the Summer Program2000
[39] E Zitzler and L Thiele ldquoMultiobjective optimization usingevolutionary algorithmsmdasha comparative case studyrdquo in ParallelProblem Solving from Nature pp 292ndash301 Springer Amster-dam The Netherlands 1998
[40] C A C Coello ldquoAn updated survey of GA-basedmultiobjectiveoptimization techniquesrdquo ACM Computing Surveys vol 32 no2 pp 137ndash143 2000
[41] D A Van Veldhuizen and G B Lamont ldquoMultiobjective evolu-tionary algorithms analyzing the state-of-the-artrdquo Evolutionarycomputation vol 8 no 2 pp 125ndash147 2000
[42] F Glover and S Hanafi ldquoTabu search and finite convergencerdquoDiscrete Applied Mathematics vol 119 no 1-2 pp 3ndash36 2002
[43] A C Nearchou ldquoThe effect of various operators on the geneticsearch for large scheduling problemsrdquo International Journal ofProduction Economics vol 88 no 2 pp 191ndash203 2004
[44] W W Chu and P Hurley ldquoOptimal query processing fordistributed database systemsrdquo IEEE Transactions on Computersvol 31 no 9 pp 835ndash850 1982
[45] M Serpell and J E Smith ldquoSelf-adaptation ofmutation operatorand probability for permutation representations in genetic
algorithmsrdquo Evolutionary Computation vol 18 no 3 pp 491ndash514 2010
[46] A Seshadri A Fast Elitist Multiobjective Genetic AlgorithmNSGA-II MATLAB Central 2006
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World Journal 5
f(x
2)
f(x1)
minus 1
+ 1
Figure 1 Crowding distance for solution ldquoVrdquo [4]
321 NSGA-II The basis of NSGA-II [4] lies in the non-dominated sorting genetic algorithm (NSGA) introduced bySrinivas and Deb [22] As the name suggests NSGA usesnondominated ranking for each individual in the populationand assigns them accordingly into nondominated frontsTheindividuals in the first front or the nondominated individualsare then assigned large dummy fitness values All individualsin the front shared this fitness value based on a sharing func-tion Next the individuals in the second nondominated frontare considered and similarly assigned a dummy fitness lowerthan the fitness assigned in the previous front This processcontinues till the entire population is classified into frontsSince the solutions in the first front have themaximumfitnessvalue their chances of selection increase and eventually morecopies of such solutions get passed on to the next generationHowever NSGA suffered from some drawbacks such as highcomputational complexity119874(119898119873
3
) nonelitist approach andthe requirement of specifying a shared parameter [4] Theselimitations were addressed in NSGA-II proposed by Deb etal [4] as an improved version of NSGA [22] It alleviatesthe drawbacks in NSGA by reducing the computationalcomplexity to 119874(119898119873
2
) Further it uses a parameter-lesssharing approach by using a crowding distance measurefor selection The crowding distance is an estimate of thedensity of solutions surrounding a particular solution in theobjective space In Figure 1 the crowding distance of solutionrepresented as point V is computed as the average distancebetween the two closest solutions represented as points V minus 1
and V + 1 on either side of the points V along each of theobjectives 119891(1199091) and 119891(1199092)
NSGA-II uses a crowded-comparison operator for selec-tion which takes into account both the nondomination rankof a query plan in the population and its crowding distanceThe nondominated solutions are preferred over dominatedsolutions and between two solutions having the same ranka solution that resides in the less crowded region is preferredthat is a solution for which the crowding distance is higherTheNSGA-II does not use any externalmemory but it ensureselitism by combining the best parents with the best offspringobtained [19] In this paper an NSGA-II based multiobjective
DQPG algorithm is used to compute optimal query plansfor a given distributed query This algorithm is discussednext
33 NSGA-II Based DQPG Algorithm The proposed NSGA-II based DQPG algorithm takes the relations given in theFROM clause of the distributed query as input It arrangesthese relations in increasing order of their cardinalities Itthen generates a fixed set of feasible query plans (chromo-somes) based on the possible combinations of sites in whichthese relations are residing Each gene in a chromosomerepresents a relation and is arranged in increasing orderof the corresponding relationrsquos cardinality The value of agene represents the site in which the corresponding relationresides For example suppose that a query posed by the userhas 4 relations (1198771 1198772 1198773 and 1198774) arranged in ascendingorderThe relation 1198771 is stored in sites 1198781 and 1198783 1198772 is storedin 11987811198773 is stored in 1198781 and 1198782 and1198774 is stored in 1198781Then theinitial population of feasible query plans (chromosomes) canbe (1 1 1 1) (3 1 1 1) (3 1 2 1) and (1 1 2 1) This definesthe encoding scheme for the given problem The proposedDQPG algorithm based on NSGA-II is given in Algorithm 1The steps involved in this algorithm are discussed as follows
Step 1 (Initialize the Population [4 46]) A random popula-tion of query plans is generated as per the encoding schemediscussed above
Step 2 (EvaluateQuery Plans on theObjective Functions)For each of the query plans in the population the TCC andTPC values are computed as given below
TCC =
119894=119906minus1 119895=119906
sum
119894=1 119895=119894+1
CC119894119895times 119887119894
TPC =
119906
sum
119894=1
LPC119894times 119886119894
(9)
where 119906 is the number of sites accessed by the queryplan in ascending order of cardinality per site CC
119894119895is the
communication cost per byte between sites 119894 and 119895 LPC119894
is the local processing cost per byte at site 119894 119887119894is the bytes
to be communicated from site 119894 and 119886119894is the bytes to be
processed at site 119894 The procedure to compute CC119894119895 LPC
119894
119887119894 and 119886
119894is given in Section 21 If a site contains a single
relation its LPC is considered zero TCC and TPC needto be minimized simultaneously to achieve an acceptabletradeoff
Step 3 (Perform Nondominated Sort [4 46]) On the givenpopulation a fast nondominated sorting is performed in thefollowing manner
Twoobjective functions are consideredThefirst objectiveis to minimize the total processing cost (TPC) and thesecond objective is to minimize the total communicationcost (TCC) NSGA-II attempts to find a tradeoff betweenthese two objectives that can result in minimum total queryprocessing cost (TC)
6 The Scientific World Journal
Input 119877119894 Relations participating in the query 119875
119888 Probability of crossover
119875119898 Probability of mutation 119866 Pre-defined number of generations 119875size Population Size
Output Top-n query plansMethodInitialize a random parent population of query plans PPwhere chromosome length ldquolenrdquo is the number of relations accessed in the query and the gene at the 119896th position in thechromosome represents the site of the 119896th relationWHILE generation le 119866 DOStep 1 Evaluate each query plan in PP on the following objective functions
1198981 Minimize TCC =
119894=119906minus1119895=119906
sum
119894=1119895=119894+1
CC119894119895times 119887119894
1198982 Minimize TPC =
119906
sum
119894=1
LPC119894times 119886119894
where 119906 is the number of sites accessed by the query plan in ascending order of cardinality per site CC119894119895is the communication
cost per byte between sites 119894 and 119895 LPC119894is the local processing cost per byte at site 119894 119887
119894is the bytes to be communicated from
site 119894 and 119886119894is the bytes to be processed at site 119894
Step 2 Perform Non-Dominated (ND) Sort on PP for ldquo1198981rdquo and ldquo119898
2rdquo separately and place each query plan (QP)
into corresponding ND fronts ldquo119894rdquo and sort the QPs within each ldquo
119894rdquo
Step 3 Evaluate Crowding Distance Function 119868(119889) for each objective functionAssign 119868(119889) = infin for smallest and highest values in each front ldquo
119894rdquo
For the remaining QPs 119868(119889) is calculated as
119868 (119889119896) = 119868 (119889
119896) +
119868(119896 + 1)119898minus 119868(119896 minus 1)
119898
119891
max119898
minus 119891
min119898
where 119868(119896)119898is the value of119898th objective function of 119896th query plan in Front
119894and 119891
max119898
and 119891
min119898
are themaximum and minimum values obtained for the objective function119898Step 4 Perform Selection from PP using binary tournament selection using crowded comparison operator (≺
119899)
Step 5 Perform random single point crossover on selected chromosomes with crossover probability 119875119888
Step 6 Apply mutation on resulting population with mutation probability 119875119898
Let the resulting child population be CPStep 7 Append CP into PP and let the resulting intermediate population be IPStep 8 Repeat Step 1 and Step 2 for population IPStep 9 Form the population PP for the next generation by picking query plans Front-wise from IP till thepopulation size = 119875sizeStep 10 Increment Generation by 1ENDDOReturn Top-119870 Query Plans from population PP
Algorithm 1 NSGA-II based DQPG algorithm
In order to performanondominated sort each query planis compared with every other query plan in the population tofind if it is dominated For each query plan ldquo119894rdquo the followingtwo entities are considered
(i) 119899119894 The number of query plans that dominate the
query plan 119894(ii) 119878119894 The set of query plans that query plan 119894 dominates
All query plans that have 119899119894= 0 are added to the set
1
Set = 1where is called the current front For each
element 119909 in the current front visit each member 119895 in the set119878119909and reduce the count of 119899
119895by 1 Now if 119899
119895gets reduced to
zero for some 119895 add it to the set 2 After evaluating all the
members of in a similar manner set = 2 This process
continues till all the query plans are assigned some frontThe fast nondominated sorting procedure takes the currentpopulation as input and produces a list of nondominatedfronts
119894as output
Step 4 (Density Estimation Using Crowding Distance [446]) After the nondominated sort the crowding distanceis computed for each query plan in
119894 Crowding distance
[4] is an estimate of the density of solutions surrounding aparticular solution point in the population It is defined asthe average distance of the two closest points on either sidesof the given point along each of the objectives The crowdingdistance 119868(119889119894119904119905119886119899119888119890) is computed in the following manner[4 46]
For each front 119894 let 119873 be the number of query plans
in front 119894 Initially the crowding distance for each query
plan in the front 119894is zero That is 119868(119889
119895) = 0 119895 =
1 119873 Next for each objective function the query plansin the front
119894are sorted based on their value of TPC (ie
the first objective function) and similarly also with respectto TCC (ie the second objective function) and placed in119868(119889119894119904119905119886119899119888119890) The query plans having the smallest and thehighest 119868(119889119894119904119905119886119899119888119890) values in both sets are assigned an infinite
The Scientific World Journal 7
Rejected
Crowding distance sort PP
CP
IP
PP
Nondominatedsort
Ŧ1
Ŧ2
Ŧ3
Figure 2 NSGA-II procedure preserving elitism [4]
value for 119868(119889119894119904119905119886119899119888119890) that is 119868(1198891) = infin and 119868(119889
119873) = infin For
remaining query plans that is 119896 = 2 119873 minus 1 119868(119889119894119904119905119886119899119888119890)is computed as follows [4 46]
119868 (119889119896) = 119868 (119889
119896) +
119868(119896 + 1)119898minus 119868(119896 minus 1)
119898
119891
max119898
minus 119891
min119898
(10)
where 119868(119896)119898is the value of 119898th objective function of 119896th
query plan in front 119894and 119891
max119898
and 119891
min119898
are the maximumand minimum values obtained for the objective function119898
Step 5 (Binary Tournament Selection) After assigning thecrowding distance to the query plans in each front a selectionprocess is carried outThe selection scheme used is the binarytournament selection and it is carried out using the crowdedcomparison operator (≺
119899) [4 46] It uses two parameters as
given below(i) 120588rank (nondomination rank)The query plans in front
119894will have 120588rank = 119894
(ii) 119868119894(119889119895) (crowding distance in front 119894) 119895 = 1 119873
The crowded comparison is performed as described below [446]
For any two query plans 1199021199011and 119902119901
2 1199021199011is selected if
(1199021199011≺1198991199021199012) 1199021199011≺1198991199021199012is true if either one of the following
holds(i) 120588rank(1199021199011) lt 120588rank(1199021199012)(ii) if 119902119901
1and 119902119901
2belong to the same front (ie if
120588rank(1199021199011) = 120588rank(1199021199012)) then 119868119894(1198891199021199011
) gt 119868119894(1198891199021199012
)
Step 6 (Crossover andMutation) Crossover is performed onthe selected query plans with a given crossover probability119875119888 It ensures proper exploration of the search space by
combining the best features of the parent query plans (chro-mosomes) Mutation is performed on the given populationwith a given probability 119875
119898 It randomly changes the site
(gene) in which the corresponding relation resides within aquery plan (chromosome) The mutated gene always takes arandom value from a set of valid sites for a particular relationAfter going through the above steps the first generationpopulation is formed NSGA-II follows a different methodto produce subsequent generations in order to incorporateelitism as described next
Table 1 Initial parent population PP
[2 1 5 1] [3 5 4 5]
[3 3 1 1] [3 1 1 1]
[3 3 3 3] [3 4 4 4]
[2 4 1 5] [2 3 3 3]
[3 1 1 3] [2 1 3 4]
Step 7 (Preserving Good Solutions (Elitism) [4]) In subse-quent generations the new population after each generationis combined with the parent population and a new interme-diate population IP is created of size PP + CP where PP is theparent population and CP is the child population as shown inFigure 2
The non-dominated sort is applied to this intermediatepopulation and fronts are formed as described in Step 3Finally the population for the next generation is formedby adding solutions from each front till the population sizeexceeds 119875 If the last front to be included was
119888 which led
to the population overflow then query plans in Front 119888are
selected based on their crowding distance measure (Step 4)in descending order until the population size exceeds 119875
The above steps are repeated for ldquo119866rdquo generations and theTop-119870 query plans are produced as output
An example illustrating the use of the above NSGA-IIbased DQPG algorithm to generate query plans for a givendistributed query is given next
4 An Example
Consider the site relation matrix the communication-cost matrix the local processing cost matrix the distinct-tuple matrix and the size matrix used to compute thefitness of query plans given in [3] and shown below inFigure 3 Suppose the initial parent population PP com-prises of 10 query plans given in Table 1 Consider aquery that accesses four relations (1198771 1198772 1198773 and 1198774)which are distributed among five sites (1198781 1198782 1198783 1198784 and1198785)
8 The Scientific World Journal
R1 R2 R3 R4300200 500 700
Cardinality
R1 R2 R3 R430 40 50 20
Size
S1 S2 S3 S4 S52 2 3 3 2
S1 S2 S3 S4 S5
S1 150 180 150 160
S2 165 185 170 170
S3 160 160 170 150
S4 150 150 180 160
S5 170 160 190 150
Communication costmatrix
R1 R2 R3 R4
R1 04 05 04 04
R2 05 02 02 03
R3 04 02 02 01
R4 04 03 01 03
matrix for join
R1 R2 R3 R4S1 0 1 1 1S2 1 0 0 0S3 1 1 1 1S4 0 1 1 1S5 0 0 1 1
matrixLPCi
Distinct-tuple- Relation-site
mdash
mdashmdash
mdash
mdash
Figure 3 Matrices used for computing fitness [3]
Table 2 Nondominated sort on PP
Index [119894] Population PP 1198981= TCC 119898
2= TPC 119878
119894119899119894
Front[1] [2 1 5 1] 18020000 3746700 [4 10] 1
2
[2] [3 3 1 1] 6720000 6006000 [4 10] 1 2
[3] [3 3 3 3] 0 4200000 [2 4 7 8 9 10] mdash 1
[4] [2 4 1 5] 43440000 70793333 [10] 6 3
[5] [3 1 1 3] 14000000 2112000 [1 4 6 10] mdash 1
[6] [3 5 4 5] 17020000 3847000 [4 10] 1 2
[7] [3 1 1 1] 960000 6500000 [4 8 9 10] 1 2
[8] [3 4 4 4] 1020000 9750000 [10] 2 3
[9] [2 3 3 3] 1110000 9750000 [10] 2 3
[10] [2 1 3 4] 45090000 10514000 mdash 9 4
Table 3 Sorted fronts based on TCC and TPC
Query plans sorted on1198981= TCC
Query plans sorted on1198982= TPC
1
3 5 1
5 32
7 2 6 1 2
1 6 2 73
8 9 4 3
4 8 94
10 4
10
The computations of TPC and TCC for the query plan[2 4 1 5] are given as follows
TPC4= (LPC
2times 1198862) + (LPC
4times 1198864)
+ (LPC5times 1198865) + (LPC
1times 1198861)
TCC4= (CC
24times 11988724
) + (CC45
times 11988745
) + (CC51
times 11988751
)
(11)
whereLPC2times 1198862= 0
CC24
times 11988724
= CC24
times (Card (1198771) times Size (1198771))
= 170 times (200 times 30) = 1020000
LPC4times 1198864= LPC
4times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times (Size (1198771) + Size (1198772)) )
= 2 times (
200 times 300
05 times 200
times (70)) = 126000
CC45
times 11988745
= CC45
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times (Size (1198771) + Size (1198772)))
= 170 times (
200 times 300
05 times 200
times (70)) = 6720000
LPC5times 1198865
= LPC5
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times (Size (1198771) + Size (1198772) + Size (1198774)) )
= 2 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
)
times 90 = 420000
The Scientific World Journal 9
Table 4 Results for the objective functions on PP
Index [119894] Population PP 1198981= TCC 119898
2= TPC Front Crowding distance (CD) = 119868[119894]
[1] [2 1 5 1] 18020000 3746700 2
infin
[2] [3 3 1 1] 6720000 6006000 2
0672[3] [3 3 3 3] 0 4200000
1infin
[4] [2 4 1 5] 43440000 70793333 3
infin
[5] [3 1 1 3] 14000000 2112000 1
infin
[6] [3 5 4 5] 17020000 3847000 2
05195[7] [3 1 1 1] 960000 6500000
2infin
[8] [3 4 4 4] 1020000 9750000 3
infin
[9] [2 3 3 3] 1110000 9750000 3
infin
[10] [2 1 3 4] 45090000 10514000 4
infin
Table 5 Selection of query plans using binary tournament selection technique
IndexRandomly generated
indexes[119894] and [119895]
Tournament betweenquery plans
[119875(119894)] and [119875(119895)]
Front comparison Query plan selected (lowerfront or higher CD)
[1] [1] and [4] [2 1 5 1] and [2 4 1 5] [2] and [
3] [2 1 5 1]
[2] [2] and [3] [3 3 1 1] and [3 3 3 3] [2] and [
1] [3 3 3 3]
[3] [3] and [4] [3 3 3 3] and [2 4 1 5] [1] and [
3] [3 3 3 3]
[4] [2] and [6] [3 3 1 1] and [3 5 4 5] [2] and [
2] [3 3 1 1]
[5] [2] and [4] [3 3 1 1] and [2 4 1 5] [2] and [
3] [3 3 1 1]
[6] [1] and [3] [2 1 5 1] and [3 3 3 3] [2] and [
1] [3 3 3 3]
[7] [2] and [5] [3 3 1 1] and [3 1 1 3] [2] and [
1] [3 1 1 3]
[8] [2] and [8] [3 3 1 1] and [3 1 1 3] [2] and [
3] [3 3 1 1]
[9] [7] and [8] [3 1 1 1] and [3 1 1 3] [2] and [
3] [3 1 1 1]
[10] [2] and [9] [3 3 1 1] and [2 4 1 5] [2] and [
3] [3 3 1 1]
Table 6 Population CP after crossover and mutation
Index Population after crossover Population aftermutation (CP)
[1] [2 1 1 1] [2 1 1 1]
[2] [3 3 1 1] [3 3 1 1]
[3] [3 3 5 1] [3 3 4 1][4] [3 3 1 1] [3 3 1 1]
[5] [3 3 3 1] [3 3 3 1]
[6] [3 3 3 3] [3 3 3 3]
[7] [3 1 1 3] [3 1 1 3]
[8] [3 3 1 3] [3 3 1 3]
[9] [3 1 3 3] [3 3 3 3]
[10] [3 3 1 1] [3 3 1 1]
CC51
times 11988751
= CC51
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times (Size (1198771) + Size (1198772) + Size (1198774)) )
170 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
)
times 90 = 35700000
LPC1times 1198861
= LPC1times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times
Card (1198773)
Dist43
times Card (1198773)
times (
4
sum
119894=1
Size (119877119894)))
10 The Scientific World Journal
Table 7 Front assignment for query plans in IP
Index Intermediate population IP TCC TPC Front[1] [2 1 5 1] 18020000 3747000
2
[2] [3 3 1 1] 6720000 6006000 3
[3] [3 3 3 3] 0 4200000 1
[4] [2 4 1 5] 43440000 7079000 4
[5] [3 1 1 3] 14000000 2112000 1
[6] [3 5 4 5] 17020000 3847000 2
[7] [3 1 1 1] 960000 6500000 2
[8] [3 1 1 3] 1020000 9750000 3
[9] [2 3 3 3] 1110000 9750000 3
[10] [2 1 3 4] 45090000 10514000 5
[11] [2 1 1 1] 990000 6500000 2
[12] [3 3 1 1] 6720000 6006000 3
[13] [3 3 4 1] 90300000 8946000 5
[14] [3 3 1 1] 6720000 6006000 3
[15] [3 3 3 1] 2520000 4230000 2
[16] [3 3 3 3] 0 4200000 1
[17] [3 1 1 3] 14000000 2112000 1
[18] [3 3 1 3] 4500000 2640000 1
[19] [3 3 3 3] 0 4200000 1
[20] [3 3 1 1] 6720000 6006000 3
Table 8 Nondominated sorting for query plans in IP
Index Intermediatepopulation IP Front Crowding
distance[3] [3 3 3 3]
1
[5] [3 1 1 3] 1
[16] [3 3 3 3] 1
[17] [3 1 1 3] 1
[18] [3 3 1 3] 1
[19] [3 3 3 3] 1
[1] [2 1 5 1] 2 infin
[7] [3 1 1 1] 2 infin
[11] [2 1 1 1] 2 infin
[15] [3 3 3 1] 2 04933
[6] [3 5 4 5] 2 02292
[2] [3 3 1 1] 3
[9] [2 3 3 3] 3
[8] [3 1 1 3] 3
[12] [3 3 1 1] 3
[14] [3 3 1 1] 3
[20] [3 3 1 1] 3
[4] [2 4 1 5] 4
[10] [2 1 3 4] 5
[13] [3 3 4 1] 4
= 2 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
times
500
01 times 500
) times 140 = 65333333
(12)
Table 9 Population PP in the 2nd generation
Index Population PP[1] [3 3 3 3]
[2] [3 1 1 3]
[3] [3 3 3 3]
[4] [3 1 1 3]
[5] [3 3 1 3]
[6] [3 3 3 3]
[7] [2 1 5 1]
[8] [3 1 1 1]
[9] [2 1 1 1]
[10] [3 3 3 1]
So
TPC4= 126000 + 420000 + 65333333 = 70793333
TCC4= 1020000 + 6720000 + 35700000 = 43440000
(13)
Similarly TCC and TPC values of the other nine queryplans are computed Consider TCC and TPC of the 10 queryplans are given in Table 2 The population is then sorted intodifferent nondominated fronts as described in Step 3 of theproposed algorithm For example for query plan 1 that is[2 1 5 1] the set 119878
1= number of query plans that dominate
query plan 1 Since TCC [1] lt TCC [4] TPC [1] lt TPC [4]TCC [1] lt TCC [10] and TPC [1] lt TPC [10] the elementsof 1198781
= 4 10 Similarly the sets 1198782 119878
10are computed
and are given in Table 2 119899119894stores the count of query plans
that dominate 119894 So using the values in 119878119894 1198991
= 1 as onlyquery plan 5 is dominating 1 and 119899
2= 1 as only query plan 3
is dominating 2 Similarly 1198992 119899
10are computed and are
The Scientific World Journal 11
given in Table 2 From Table 2 it can be noted that queryplans 3 and 5 are not dominated as 119899
3= 0 119899
5= 0 So they
are assigned to the first nondominated front1The elementsin the next front are computed by reducing the count in 119899
119894
for each 119894 isin 1198783and 119894 isin 119878
5 So 119899
119894= 0 0 minus 4 minus 0 0 1 1 7
So the second front has query plans 1 2 6 and 7 Thisprocess continues till all the query plans in the population areassigned to their respective nondominated fronts The fronts1 2 3 and
4are formed and are given in Table 2
Finally the query plans are sorted separately on the valuesof TCC and TPC within each front as shown in Table 3
After the population is sorted into different fronts thecrowding distance (119868[119889119894119904119905119886119899119888119890]) computation is performedfor each query plan using the formula given in Step 4 ofthe proposed algorithm Query plans having the maximumand minimum values in each front are assigned infin distancevalues that is 119868[3] 119868[5] 119868[7] 119868[1] 119868[8] 119868[4] 119868[9] and119868[10] are assigned infin For the rest of the query plans theircrowding distance (CD) values are computed based on thesorted order of query plans in each front Initially they areassigned 119868[119889119894119904119905119886119899119888119890] = 0 For query plan 2 in front
2 the
CD computation is performed as follows
1198681198981[2] = 119868 [2] +
TCC [6] minus TCC [7]
max (TCC) minusmin (TCC)
= 0 +
17020000 minus 960000
45090000 minus 0
= 03562
119868 [2] = 1198681198981[2] +
TPC [7] minus TPC [6]
max (TPC) minusmin (TPC)
= 03562 +
6500000 minus 3847000
10514000 minus 2112000
= 06720
(14)
CD of the query plans in the given population is given inTable 4
Next binary tournament selection is performed on thepopulation on the basis of crowded comparison operator ≺
119899
This selection process is shown in Table 5The selected query plans undergo random single point
crossover with crossover probability 119875119888
= 05 Mutation isperformed on the selected population with mutation proba-bility 119875
119898= 002 The child population CP after crossover and
mutation is shown in Table 6Now in accordance with NSGA-II algorithm the popu-
lations from the second generation onward have to ensureelitism For this purpose the child population CP is com-bined with the parent population PP to generate intermediatepopulation IP This population is subjected to nondominatedsort and fronts are formed as given in Table 7
The population for the second generation is arrived atby selecting query plans based on front and within it basedon crowding distance-wise as described in Step 7 from theintermediate population IP till the actual population size 10is exceeded This selection is shown in Table 8
The population PP for the second generation is given inTable 9
The above process is repeated till a predefined number ofgenerations119866have elapsedThereafter Top-119870 query plans areproduced as output
0
01
02
03
04
05
06
07
08
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-5 query plans)
Figure 4
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
DQPGNSGA(top-10 query plans)
Figure 5
5 Experimental Results
The proposed NSGA-II based algorithm is implemented inMATLAB 77 in Windows 7 professional 64 bit OS with Intelcore i3CPUat 213GHzhaving 4GBRAMExperimentswerecarried out for a population of 100 query plans with eachquery plan involving 10 relations distributed over 50 sitesThese were performed on four datasets each comprising adifferent relation-site matrix Graphs were plotted to observechange in average TC (ATC) with respect to generations
12 The Scientific World Journal
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-15 query plans)
Figure 6
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-20 query plans)
Figure 7
and Top-119870 query plans for different pairs of crossover andmutation rates
First graphs showing the ATC values Top-5 Top-10Top-15 and Top-20 query plans averaged over four datasetsover 1000 generations for distinct pairs of crossover andmutation probability 119875
119888 119875119898 = 07 001 07 0005
075 001 075 0005 08 001 08 0005 085 001085 0005 using NSGA-II based DQPG algorithm(DQPGNSGA) were plotted These are shown in Figures 45 6 and 7 It can be observed from these graphs that the
0
005
01
015
02
025
03
035
04
045
05
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
(after 1000 generations)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGNSGA
Top-K query plans
Figure 8
0
01
02
03
04
05
06
07
08
09
1
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-5 query plans)
Figure 9
convergence to ATC is lowest in the case of 119875119888 119875119898 =
085 001 Furthermore the graph showing ATC valuesaveraged over four datasets versus Top-119870 query plans after1000 generations (Figure 8) also shows that the lowest ATCvalues achieved are for 119875
119888 119875119898 = 085 001 Thus it
can be said that DQPGNSGA performs reasonably well for119875119888 119875119898 = 085 001 In order to compare DQPGNSGA with
SGA based DQPG algorithm (DQPGSGA) similar graphswere plotted for DQPGSGA These are shown in Figures 9 1011 12 and 13 It is noted from these graphs that DQPGSGA
The Scientific World Journal 13
0
02
04
06
08
1
12
14
1618
21 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-10 query plans)
Figure 10
0
02
04
06
08
1
12
14
16
18
2
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-15 query plans)
Figure 11
also achieves convergence to the lowest ATC values for119875119888 119875119898 = 085 001
Since the two algorithms DQPGNSGA and DQPGSGAconverge to a lower ATC value for the same crossover andmutation probabilities that is 119875
119888 119875119898 = 085 001 the
comparisons of the two algorithms can be carried out forthese observed probabilities
First the two algorithmsDQPGNSGA andDQPGSGA werecompared for each of the four datasets (Dataset-1 Dataset-2 Dataset-3 and Dataset-4) on the ATC values of Top-5Top-10 Top-15 and Top-20 query plans generated over 1000
0
04
08
12
16
2
24
28
32
36
4
1 54 107
160
213
266
319
372
425
478
531
584
637
690
743
796
849
902
955
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-20 query plans)
Figure 12
0
01
02
03
04
05
06
07
08
09
1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
(after 1000 generations)
Aver
age t
otal
cost
(ATC
)
DQPGSGA
Top-K query plans
Figure 13
generations for 119875119888 119875119898 = 085 001 The corresponding
graphs for each dataset are shown in Figures 14 15 16 and 17It can be observed from the graphs thatDQPGNSGA convergesto lower ATC values for Top-5 Top-10 Top-15 and Top-20query plans generated for all datasets
Furthermore graphs comparing the ATC values of Top-119870 query plans produced by DQPGNSGA and DQPGSGA after1000 generations for the four datasets were plotted andare shown in Figures 18 19 20 and 21 These graphs alsoshow average TCC (ATCC) and average TPC (ATPC) valuesof Top-119870 query plans generated by DQPGNSGA It can beobserved from these graphs thatDQPGNSGA is able to achieve
14 The Scientific World Journal
0
05
1
15
2
25
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-1)
Figure 14
0
03
06
09
12
15
18
21
24
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-2)
Figure 15
an acceptable tradeoff between ATPC and ATCC which inturn leads to a comparatively lower ATC for the Top-119870 queryplans generated by it
Next a graph comparing the ATC values of Top-119870 queryplans generated by DQPGNSGA and DQPGSGA on all fourdatasets (DS-1 DS-2 DS-3 and DS-4) after 1000 generationsfor observed probabilities 119875
119888 119875119898 = 085 001were plotted
and is shown in Figure 22 It is noted from the graph thatDQPGNSGA performs better than DQPGSGA on the ATCvalues of Top-119870 query plans generated by the two algorithmsfor each of the four data sets
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
0
015
03
045
06
075
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-3)
Figure 16
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
0
01
02
03
04
05
06
07
08
09
1
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-4)
Figure 17
It can be reasonably inferred from all the above graphsthat DQPGNSGA is able to generate Top-119870 query plans withlower ATC when compared to those generated byDQPGSGAThis may be attributed to acceptable tradeoffs achieved whilesimultaneously optimizing TPC and TCC which results inlower TC in case of DQPGNSGA
6 Conclusion
In this paper DQPGproblemgiven in [3] has been addressedwhere query plans are generated for a distributed relational
The Scientific World Journal 15
0
005
01
015
02
025
03
035
04
045
1 3 5 7 9 11 13 15 17 19
Cos
t
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-1)
Figure 18
0
01
02
03
04
05
06
07
Cos
t
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
Top-K query plans
(Dataset-2)
Figure 19
query that incurs minimum total query processing costGenetic algorithms have been used to generate these queryplansThe total query processing cost TC in [3] can be viewedas comprising broadly of TPC and TCC and thereforeminimizing TPC and TCC would result in minimizing TCThus in this paper the single-objective DQPG problemin [3] has been formulated and solved as a biobjectiveDQPG problem with the two objectives being minimizingTPC and minimizing TCC These objectives are minimizedsimultaneously using the multiobjective genetic algorithmNSGA-II
Experiments were performed and DQPGNSGA is com-pared with DQPGSGA given in [3] It was observed thatboth the algorithms individually gave good results for thecrossover and mutation probabilities 085 and 001 respec-tively The two algorithms were then compared on the ATC
0
002
004
006
008
01
012
014
016
Cos
t
1 3 5 7 9 11 13 15 17 19
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-3)
Figure 20
0
005
01
015
02
025
Cos
t
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-4)
Figure 21
values of the Top-119870 query plans generated by them for theobserved crossover and mutation probabilities The resultsshowed that DQPGNSGA performed better than DQPGSGAAlso the performance of the former was better when the twoalgorithmswere compared on theATCvalues of Top-119870 queryplansThe better performance of DQPGNSGA over DQPGSGAmay be attributed to DQPGNSGA achieving acceptable trade-offs between TPC and TCC while minimizing TPC and TCCof Top-119870 query plans simultaneously
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
16 The Scientific World Journal
Top-K query plans
0
01
02
03
04
05
06
07
Aver
age t
otal
cost
(ATC
)
SGA(DS-1)SGA(DS-2)SGA(DS-3)SGA(DS-4)
NSGA(DS-1)NSGA(DS-2)NSGA(DS-3)NSGA(DS-4)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(after 1000 generations)
Figure 22
References
[1] S Ceri and G Pelgatti Distributed Databases Principles ampSystems International Edition McGraw-Hill Computer ScienceSeries 1985
[2] A R Hevner and S B Yao ldquoQuery processing in distributeddatabase systemsrdquo IEEE Transactions on Software Engineeringvol 5 no 3 pp 177ndash187 1979
[3] T V Vijay Kumar and S Panicker ldquoGenerating query plansfor distributed query processing using genetic algorithmrdquo inInformation Computing and Applications vol 7030 of LectureNotes in Computer Science pp 765ndash772 2011
[4] K Deb A Pratap S Agarwal and T Meyarivan ldquoA fastand elitist multiobjective genetic algorithm NSGA-IIrdquo IEEETransactions on Evolutionary Computation vol 6 no 2 pp 182ndash197 2002
[5] P Black and W Luk ldquoA new heuristic for generating semi-joinprograms for distributed query processingrdquo in Proceedings ofthe IEEE 6th International Computer Software and ApplicationConference pp 581ndash588 IEEE Chicago Ill USA November1982
[6] P Bodorik and J S Riordon ldquoA threshold mechanism fordistributed query processingrdquo in Proceedings of the 16th AnnualConference on Computer Science pp 616ndash625 Atlanta GaUnited States February 1988
[7] J Chang ldquoA heuristic approach to distributed query process-ingrdquo in Proceedings of the 8th International Conference on VeryLarge Data Bases (VLDB rsquo82) pp 54ndash61 Saratoga Calif USA1982
[8] D Kossmann ldquoThe state of the art in distributed queryprocessingrdquo ACM Computing Surveys vol 32 no 4 pp 422ndash469 2000
[9] L Stiphane and W Eugene ldquoA state transition model fordistributed query processingrdquo ACM Transactions on DatabaseSystems vol 11 no 3 pp 249ndash322 1986
[10] T V Vijay Kumar V Singh and A K Verma ldquoDistributedquery processing plans generation using genetic algorithmrdquo
International Journal of Computer Theory and Engineering vol3 no 1 pp 38ndash45 2011
[11] C T Yu and C C Chang ldquoDistributed query processingrdquoACMComputing Surveys vol 16 no 4 pp 399ndash433 1984
[12] K Bennett M C Ferris and Y E Ioannidis ldquoA Geneticalgorithm for database query optimizationrdquo in Proceedings ofthe 4th International Conference onGenetic Algorithms pp 400ndash407 1991
[13] Y E Ioannidis and Y Cha Kang ldquoRandomized algorithms foroptimizing large join queriesrdquo in Proceedings of the 1990 ACMSIGMOD International Conference on Management of Data pp312ndash321 May 1990
[14] Y Ioannidis and E Wong ldquoQuery optimization by simulatedannealingrdquo in Proceedings of the ACM SIGMOD InternationalConference on Management of Data (SIGMOD rsquo87) pp 9ndash221987
[15] A A R Townsend Genetic Algorithms A Tutorial 2003[16] M Gregory Genetic Algorithm Optimization of Distributed
Database Queries IEEE 1998[17] M Mitchell An Introduction to Genetic Algorithms The MIT
Press Cambridge Mass USA 1999[18] J D Schaffer ldquoMultiple objective optimization with vector
evaluated genetic algorithmsrdquo inProceedings of the InternationalConference on Genetic Algorithm andTheir Applications pp 93ndash100 July 1985
[19] V Guliashki H Toshev andC Korsemov ldquoSurvey of evolution-ary algorithms used in multiobjective optimizationrdquo Problemsof EngineeringCybernetics andRobotics vol 60 pp 42ndash54 2009
[20] C M Fonseca and P J Fleming ldquoGenetic algorithms formultiobjective optimization formulation discussion and gen-eralizationrdquo inProceedings of the 5th International Conference onGenetic Algorithms S Forrest Ed pp 416ndash423 San FranciscoCalif USA 1993
[21] J Horn N Nafpliotis and D E Goldberg ldquoA niched Paretogenetic algorithm for multiobjective optimizationrdquo in Proceed-ings of the 1st IEEE Conference on Evolutionary ComputationIEEE World Congress on Computational Intelligence pp 82ndash87IEEE Orlando Fla USA June 1994
[22] N Srinivas and K Deb ldquoMultiobjective optimization usingnondominated sorting in genetic algorithmsrdquo EvolutionaryComputation vol 2 no 3 pp 221ndash248 1994
[23] E Zitzler and L Thiele ldquoMultiobjective evolutionary algo-rithms a comparative case study and the strength Paretoapproachrdquo IEEETransactions onEvolutionaryComputation vol3 no 4 pp 257ndash271 1999
[24] J D Knowles and D W Corne ldquoApproximating the non-dominated front using the Pareto archived evolution strategyrdquoEvolutionary computation vol 8 no 2 pp 149ndash172 2000
[25] D W Corne J D Knowles and M J Oates ldquoThe Paretoenvelope-based selection algorithm for multiobjective opti-mizationrdquo in Proceedings of the 6th International Conference onParallel Problem Solving from Nature Springer Paris FranceSeptember 2000
[26] J Scharnow K Tinnefeld and I Wegener ldquoThe analysis of evo-lutionary algorithms on sorting and shortest paths problemsrdquoJournal of Mathematical Modelling and Algorithms vol 3 no 4pp 349ndash366 2004
[27] B Doerr E Happ and C Klein ldquoCrossover can provablybe useful in evolutionary computationrdquo in Proceedings of the10th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo08) pp 539ndash546 ACM Press July 2008
The Scientific World Journal 17
[28] C Horoba ldquoAnalysis of a simple evolutionary algorithm for themultiobjective shortest path problemrdquo in Proceedings of the 10thACM SIGEVO Workshop on Foundations of Genetic Algorithms(FOGA rsquo09) pp 113ndash120 ACM Press New York NY USAJanuary 2009
[29] MTheile ldquoExact solutions to the traveling salesperson problemby a population-based evolutionary algorithmrdquo in Proceedingsof the 9th European Conference on Evolutionary Computation inCombinatorial Optimisation (EvoCOP rsquo09) vol 5482 of LectureNotes in Computer Science pp 145ndash155 Springer April 2009
[30] A V Eremeev On Linking Dynamic Programming and Multi-Objective Evolutionary Algorithms Omsk StateUniversity 2008
[31] A V Eremeev ldquoA fully polynomial randomized approximationscheme based on an evolutionary algorithmrdquo Diskretnyi Analizi Issledovanie Operatsii vol 17 no 4 pp 3ndash17 2010
[32] T Friedrich N Hebbinghaus F Neumann J He and CWitt ldquoApproximating covering problems by randomized searchheuristics using multi-objective modelsrdquo in Proceedings of the9th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo07) pp 797ndash804 ACM Press July 2007
[33] E Elbeltagi T Hegazy and D Grierson ldquoA modified shuffledfrog-leaping optimization algorithm applications to projectmanagementrdquo Structure and Infrastructure Engineering vol 3no 1 pp 53ndash60 2007
[34] M Dorigo E Bonabeau and GTheraulaz ldquoAnt algorithms andstigmergyrdquo Future Generation Computer Systems vol 16 no 8pp 851ndash871 2000
[35] D E Goldberg and K Deb ldquoA comparative analysis of selectionschemes used in genetic algorithmsrdquo in Foundations of GeneticAlgorithms pp 69ndash93 Morgan Kaufmann San Mateo CalifUSA 1991
[36] D E GoldbergGenetic Algorithms in Search Optimization andMachine Learning Addison-Wesley Reading Mass USA 1989
[37] H Dong and Y Liang ldquoGenetic algorithms for large join queryoptimizationrdquo in Proceedings of the 9th Annual Genetic andEvolutionary Computation Conference (GECCO rsquo07) pp 1211ndash1218 London UK July 2007
[38] I F Sbalzariniy M Sibylle and P Koumoutsakosyz ldquoMulti-objective optimization using evolutionary algorithmsrdquo Centerfor Turbulence Research Proceedings of the Summer Program2000
[39] E Zitzler and L Thiele ldquoMultiobjective optimization usingevolutionary algorithmsmdasha comparative case studyrdquo in ParallelProblem Solving from Nature pp 292ndash301 Springer Amster-dam The Netherlands 1998
[40] C A C Coello ldquoAn updated survey of GA-basedmultiobjectiveoptimization techniquesrdquo ACM Computing Surveys vol 32 no2 pp 137ndash143 2000
[41] D A Van Veldhuizen and G B Lamont ldquoMultiobjective evolu-tionary algorithms analyzing the state-of-the-artrdquo Evolutionarycomputation vol 8 no 2 pp 125ndash147 2000
[42] F Glover and S Hanafi ldquoTabu search and finite convergencerdquoDiscrete Applied Mathematics vol 119 no 1-2 pp 3ndash36 2002
[43] A C Nearchou ldquoThe effect of various operators on the geneticsearch for large scheduling problemsrdquo International Journal ofProduction Economics vol 88 no 2 pp 191ndash203 2004
[44] W W Chu and P Hurley ldquoOptimal query processing fordistributed database systemsrdquo IEEE Transactions on Computersvol 31 no 9 pp 835ndash850 1982
[45] M Serpell and J E Smith ldquoSelf-adaptation ofmutation operatorand probability for permutation representations in genetic
algorithmsrdquo Evolutionary Computation vol 18 no 3 pp 491ndash514 2010
[46] A Seshadri A Fast Elitist Multiobjective Genetic AlgorithmNSGA-II MATLAB Central 2006
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
6 The Scientific World Journal
Input 119877119894 Relations participating in the query 119875
119888 Probability of crossover
119875119898 Probability of mutation 119866 Pre-defined number of generations 119875size Population Size
Output Top-n query plansMethodInitialize a random parent population of query plans PPwhere chromosome length ldquolenrdquo is the number of relations accessed in the query and the gene at the 119896th position in thechromosome represents the site of the 119896th relationWHILE generation le 119866 DOStep 1 Evaluate each query plan in PP on the following objective functions
1198981 Minimize TCC =
119894=119906minus1119895=119906
sum
119894=1119895=119894+1
CC119894119895times 119887119894
1198982 Minimize TPC =
119906
sum
119894=1
LPC119894times 119886119894
where 119906 is the number of sites accessed by the query plan in ascending order of cardinality per site CC119894119895is the communication
cost per byte between sites 119894 and 119895 LPC119894is the local processing cost per byte at site 119894 119887
119894is the bytes to be communicated from
site 119894 and 119886119894is the bytes to be processed at site 119894
Step 2 Perform Non-Dominated (ND) Sort on PP for ldquo1198981rdquo and ldquo119898
2rdquo separately and place each query plan (QP)
into corresponding ND fronts ldquo119894rdquo and sort the QPs within each ldquo
119894rdquo
Step 3 Evaluate Crowding Distance Function 119868(119889) for each objective functionAssign 119868(119889) = infin for smallest and highest values in each front ldquo
119894rdquo
For the remaining QPs 119868(119889) is calculated as
119868 (119889119896) = 119868 (119889
119896) +
119868(119896 + 1)119898minus 119868(119896 minus 1)
119898
119891
max119898
minus 119891
min119898
where 119868(119896)119898is the value of119898th objective function of 119896th query plan in Front
119894and 119891
max119898
and 119891
min119898
are themaximum and minimum values obtained for the objective function119898Step 4 Perform Selection from PP using binary tournament selection using crowded comparison operator (≺
119899)
Step 5 Perform random single point crossover on selected chromosomes with crossover probability 119875119888
Step 6 Apply mutation on resulting population with mutation probability 119875119898
Let the resulting child population be CPStep 7 Append CP into PP and let the resulting intermediate population be IPStep 8 Repeat Step 1 and Step 2 for population IPStep 9 Form the population PP for the next generation by picking query plans Front-wise from IP till thepopulation size = 119875sizeStep 10 Increment Generation by 1ENDDOReturn Top-119870 Query Plans from population PP
Algorithm 1 NSGA-II based DQPG algorithm
In order to performanondominated sort each query planis compared with every other query plan in the population tofind if it is dominated For each query plan ldquo119894rdquo the followingtwo entities are considered
(i) 119899119894 The number of query plans that dominate the
query plan 119894(ii) 119878119894 The set of query plans that query plan 119894 dominates
All query plans that have 119899119894= 0 are added to the set
1
Set = 1where is called the current front For each
element 119909 in the current front visit each member 119895 in the set119878119909and reduce the count of 119899
119895by 1 Now if 119899
119895gets reduced to
zero for some 119895 add it to the set 2 After evaluating all the
members of in a similar manner set = 2 This process
continues till all the query plans are assigned some frontThe fast nondominated sorting procedure takes the currentpopulation as input and produces a list of nondominatedfronts
119894as output
Step 4 (Density Estimation Using Crowding Distance [446]) After the nondominated sort the crowding distanceis computed for each query plan in
119894 Crowding distance
[4] is an estimate of the density of solutions surrounding aparticular solution point in the population It is defined asthe average distance of the two closest points on either sidesof the given point along each of the objectives The crowdingdistance 119868(119889119894119904119905119886119899119888119890) is computed in the following manner[4 46]
For each front 119894 let 119873 be the number of query plans
in front 119894 Initially the crowding distance for each query
plan in the front 119894is zero That is 119868(119889
119895) = 0 119895 =
1 119873 Next for each objective function the query plansin the front
119894are sorted based on their value of TPC (ie
the first objective function) and similarly also with respectto TCC (ie the second objective function) and placed in119868(119889119894119904119905119886119899119888119890) The query plans having the smallest and thehighest 119868(119889119894119904119905119886119899119888119890) values in both sets are assigned an infinite
The Scientific World Journal 7
Rejected
Crowding distance sort PP
CP
IP
PP
Nondominatedsort
Ŧ1
Ŧ2
Ŧ3
Figure 2 NSGA-II procedure preserving elitism [4]
value for 119868(119889119894119904119905119886119899119888119890) that is 119868(1198891) = infin and 119868(119889
119873) = infin For
remaining query plans that is 119896 = 2 119873 minus 1 119868(119889119894119904119905119886119899119888119890)is computed as follows [4 46]
119868 (119889119896) = 119868 (119889
119896) +
119868(119896 + 1)119898minus 119868(119896 minus 1)
119898
119891
max119898
minus 119891
min119898
(10)
where 119868(119896)119898is the value of 119898th objective function of 119896th
query plan in front 119894and 119891
max119898
and 119891
min119898
are the maximumand minimum values obtained for the objective function119898
Step 5 (Binary Tournament Selection) After assigning thecrowding distance to the query plans in each front a selectionprocess is carried outThe selection scheme used is the binarytournament selection and it is carried out using the crowdedcomparison operator (≺
119899) [4 46] It uses two parameters as
given below(i) 120588rank (nondomination rank)The query plans in front
119894will have 120588rank = 119894
(ii) 119868119894(119889119895) (crowding distance in front 119894) 119895 = 1 119873
The crowded comparison is performed as described below [446]
For any two query plans 1199021199011and 119902119901
2 1199021199011is selected if
(1199021199011≺1198991199021199012) 1199021199011≺1198991199021199012is true if either one of the following
holds(i) 120588rank(1199021199011) lt 120588rank(1199021199012)(ii) if 119902119901
1and 119902119901
2belong to the same front (ie if
120588rank(1199021199011) = 120588rank(1199021199012)) then 119868119894(1198891199021199011
) gt 119868119894(1198891199021199012
)
Step 6 (Crossover andMutation) Crossover is performed onthe selected query plans with a given crossover probability119875119888 It ensures proper exploration of the search space by
combining the best features of the parent query plans (chro-mosomes) Mutation is performed on the given populationwith a given probability 119875
119898 It randomly changes the site
(gene) in which the corresponding relation resides within aquery plan (chromosome) The mutated gene always takes arandom value from a set of valid sites for a particular relationAfter going through the above steps the first generationpopulation is formed NSGA-II follows a different methodto produce subsequent generations in order to incorporateelitism as described next
Table 1 Initial parent population PP
[2 1 5 1] [3 5 4 5]
[3 3 1 1] [3 1 1 1]
[3 3 3 3] [3 4 4 4]
[2 4 1 5] [2 3 3 3]
[3 1 1 3] [2 1 3 4]
Step 7 (Preserving Good Solutions (Elitism) [4]) In subse-quent generations the new population after each generationis combined with the parent population and a new interme-diate population IP is created of size PP + CP where PP is theparent population and CP is the child population as shown inFigure 2
The non-dominated sort is applied to this intermediatepopulation and fronts are formed as described in Step 3Finally the population for the next generation is formedby adding solutions from each front till the population sizeexceeds 119875 If the last front to be included was
119888 which led
to the population overflow then query plans in Front 119888are
selected based on their crowding distance measure (Step 4)in descending order until the population size exceeds 119875
The above steps are repeated for ldquo119866rdquo generations and theTop-119870 query plans are produced as output
An example illustrating the use of the above NSGA-IIbased DQPG algorithm to generate query plans for a givendistributed query is given next
4 An Example
Consider the site relation matrix the communication-cost matrix the local processing cost matrix the distinct-tuple matrix and the size matrix used to compute thefitness of query plans given in [3] and shown below inFigure 3 Suppose the initial parent population PP com-prises of 10 query plans given in Table 1 Consider aquery that accesses four relations (1198771 1198772 1198773 and 1198774)which are distributed among five sites (1198781 1198782 1198783 1198784 and1198785)
8 The Scientific World Journal
R1 R2 R3 R4300200 500 700
Cardinality
R1 R2 R3 R430 40 50 20
Size
S1 S2 S3 S4 S52 2 3 3 2
S1 S2 S3 S4 S5
S1 150 180 150 160
S2 165 185 170 170
S3 160 160 170 150
S4 150 150 180 160
S5 170 160 190 150
Communication costmatrix
R1 R2 R3 R4
R1 04 05 04 04
R2 05 02 02 03
R3 04 02 02 01
R4 04 03 01 03
matrix for join
R1 R2 R3 R4S1 0 1 1 1S2 1 0 0 0S3 1 1 1 1S4 0 1 1 1S5 0 0 1 1
matrixLPCi
Distinct-tuple- Relation-site
mdash
mdashmdash
mdash
mdash
Figure 3 Matrices used for computing fitness [3]
Table 2 Nondominated sort on PP
Index [119894] Population PP 1198981= TCC 119898
2= TPC 119878
119894119899119894
Front[1] [2 1 5 1] 18020000 3746700 [4 10] 1
2
[2] [3 3 1 1] 6720000 6006000 [4 10] 1 2
[3] [3 3 3 3] 0 4200000 [2 4 7 8 9 10] mdash 1
[4] [2 4 1 5] 43440000 70793333 [10] 6 3
[5] [3 1 1 3] 14000000 2112000 [1 4 6 10] mdash 1
[6] [3 5 4 5] 17020000 3847000 [4 10] 1 2
[7] [3 1 1 1] 960000 6500000 [4 8 9 10] 1 2
[8] [3 4 4 4] 1020000 9750000 [10] 2 3
[9] [2 3 3 3] 1110000 9750000 [10] 2 3
[10] [2 1 3 4] 45090000 10514000 mdash 9 4
Table 3 Sorted fronts based on TCC and TPC
Query plans sorted on1198981= TCC
Query plans sorted on1198982= TPC
1
3 5 1
5 32
7 2 6 1 2
1 6 2 73
8 9 4 3
4 8 94
10 4
10
The computations of TPC and TCC for the query plan[2 4 1 5] are given as follows
TPC4= (LPC
2times 1198862) + (LPC
4times 1198864)
+ (LPC5times 1198865) + (LPC
1times 1198861)
TCC4= (CC
24times 11988724
) + (CC45
times 11988745
) + (CC51
times 11988751
)
(11)
whereLPC2times 1198862= 0
CC24
times 11988724
= CC24
times (Card (1198771) times Size (1198771))
= 170 times (200 times 30) = 1020000
LPC4times 1198864= LPC
4times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times (Size (1198771) + Size (1198772)) )
= 2 times (
200 times 300
05 times 200
times (70)) = 126000
CC45
times 11988745
= CC45
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times (Size (1198771) + Size (1198772)))
= 170 times (
200 times 300
05 times 200
times (70)) = 6720000
LPC5times 1198865
= LPC5
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times (Size (1198771) + Size (1198772) + Size (1198774)) )
= 2 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
)
times 90 = 420000
The Scientific World Journal 9
Table 4 Results for the objective functions on PP
Index [119894] Population PP 1198981= TCC 119898
2= TPC Front Crowding distance (CD) = 119868[119894]
[1] [2 1 5 1] 18020000 3746700 2
infin
[2] [3 3 1 1] 6720000 6006000 2
0672[3] [3 3 3 3] 0 4200000
1infin
[4] [2 4 1 5] 43440000 70793333 3
infin
[5] [3 1 1 3] 14000000 2112000 1
infin
[6] [3 5 4 5] 17020000 3847000 2
05195[7] [3 1 1 1] 960000 6500000
2infin
[8] [3 4 4 4] 1020000 9750000 3
infin
[9] [2 3 3 3] 1110000 9750000 3
infin
[10] [2 1 3 4] 45090000 10514000 4
infin
Table 5 Selection of query plans using binary tournament selection technique
IndexRandomly generated
indexes[119894] and [119895]
Tournament betweenquery plans
[119875(119894)] and [119875(119895)]
Front comparison Query plan selected (lowerfront or higher CD)
[1] [1] and [4] [2 1 5 1] and [2 4 1 5] [2] and [
3] [2 1 5 1]
[2] [2] and [3] [3 3 1 1] and [3 3 3 3] [2] and [
1] [3 3 3 3]
[3] [3] and [4] [3 3 3 3] and [2 4 1 5] [1] and [
3] [3 3 3 3]
[4] [2] and [6] [3 3 1 1] and [3 5 4 5] [2] and [
2] [3 3 1 1]
[5] [2] and [4] [3 3 1 1] and [2 4 1 5] [2] and [
3] [3 3 1 1]
[6] [1] and [3] [2 1 5 1] and [3 3 3 3] [2] and [
1] [3 3 3 3]
[7] [2] and [5] [3 3 1 1] and [3 1 1 3] [2] and [
1] [3 1 1 3]
[8] [2] and [8] [3 3 1 1] and [3 1 1 3] [2] and [
3] [3 3 1 1]
[9] [7] and [8] [3 1 1 1] and [3 1 1 3] [2] and [
3] [3 1 1 1]
[10] [2] and [9] [3 3 1 1] and [2 4 1 5] [2] and [
3] [3 3 1 1]
Table 6 Population CP after crossover and mutation
Index Population after crossover Population aftermutation (CP)
[1] [2 1 1 1] [2 1 1 1]
[2] [3 3 1 1] [3 3 1 1]
[3] [3 3 5 1] [3 3 4 1][4] [3 3 1 1] [3 3 1 1]
[5] [3 3 3 1] [3 3 3 1]
[6] [3 3 3 3] [3 3 3 3]
[7] [3 1 1 3] [3 1 1 3]
[8] [3 3 1 3] [3 3 1 3]
[9] [3 1 3 3] [3 3 3 3]
[10] [3 3 1 1] [3 3 1 1]
CC51
times 11988751
= CC51
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times (Size (1198771) + Size (1198772) + Size (1198774)) )
170 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
)
times 90 = 35700000
LPC1times 1198861
= LPC1times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times
Card (1198773)
Dist43
times Card (1198773)
times (
4
sum
119894=1
Size (119877119894)))
10 The Scientific World Journal
Table 7 Front assignment for query plans in IP
Index Intermediate population IP TCC TPC Front[1] [2 1 5 1] 18020000 3747000
2
[2] [3 3 1 1] 6720000 6006000 3
[3] [3 3 3 3] 0 4200000 1
[4] [2 4 1 5] 43440000 7079000 4
[5] [3 1 1 3] 14000000 2112000 1
[6] [3 5 4 5] 17020000 3847000 2
[7] [3 1 1 1] 960000 6500000 2
[8] [3 1 1 3] 1020000 9750000 3
[9] [2 3 3 3] 1110000 9750000 3
[10] [2 1 3 4] 45090000 10514000 5
[11] [2 1 1 1] 990000 6500000 2
[12] [3 3 1 1] 6720000 6006000 3
[13] [3 3 4 1] 90300000 8946000 5
[14] [3 3 1 1] 6720000 6006000 3
[15] [3 3 3 1] 2520000 4230000 2
[16] [3 3 3 3] 0 4200000 1
[17] [3 1 1 3] 14000000 2112000 1
[18] [3 3 1 3] 4500000 2640000 1
[19] [3 3 3 3] 0 4200000 1
[20] [3 3 1 1] 6720000 6006000 3
Table 8 Nondominated sorting for query plans in IP
Index Intermediatepopulation IP Front Crowding
distance[3] [3 3 3 3]
1
[5] [3 1 1 3] 1
[16] [3 3 3 3] 1
[17] [3 1 1 3] 1
[18] [3 3 1 3] 1
[19] [3 3 3 3] 1
[1] [2 1 5 1] 2 infin
[7] [3 1 1 1] 2 infin
[11] [2 1 1 1] 2 infin
[15] [3 3 3 1] 2 04933
[6] [3 5 4 5] 2 02292
[2] [3 3 1 1] 3
[9] [2 3 3 3] 3
[8] [3 1 1 3] 3
[12] [3 3 1 1] 3
[14] [3 3 1 1] 3
[20] [3 3 1 1] 3
[4] [2 4 1 5] 4
[10] [2 1 3 4] 5
[13] [3 3 4 1] 4
= 2 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
times
500
01 times 500
) times 140 = 65333333
(12)
Table 9 Population PP in the 2nd generation
Index Population PP[1] [3 3 3 3]
[2] [3 1 1 3]
[3] [3 3 3 3]
[4] [3 1 1 3]
[5] [3 3 1 3]
[6] [3 3 3 3]
[7] [2 1 5 1]
[8] [3 1 1 1]
[9] [2 1 1 1]
[10] [3 3 3 1]
So
TPC4= 126000 + 420000 + 65333333 = 70793333
TCC4= 1020000 + 6720000 + 35700000 = 43440000
(13)
Similarly TCC and TPC values of the other nine queryplans are computed Consider TCC and TPC of the 10 queryplans are given in Table 2 The population is then sorted intodifferent nondominated fronts as described in Step 3 of theproposed algorithm For example for query plan 1 that is[2 1 5 1] the set 119878
1= number of query plans that dominate
query plan 1 Since TCC [1] lt TCC [4] TPC [1] lt TPC [4]TCC [1] lt TCC [10] and TPC [1] lt TPC [10] the elementsof 1198781
= 4 10 Similarly the sets 1198782 119878
10are computed
and are given in Table 2 119899119894stores the count of query plans
that dominate 119894 So using the values in 119878119894 1198991
= 1 as onlyquery plan 5 is dominating 1 and 119899
2= 1 as only query plan 3
is dominating 2 Similarly 1198992 119899
10are computed and are
The Scientific World Journal 11
given in Table 2 From Table 2 it can be noted that queryplans 3 and 5 are not dominated as 119899
3= 0 119899
5= 0 So they
are assigned to the first nondominated front1The elementsin the next front are computed by reducing the count in 119899
119894
for each 119894 isin 1198783and 119894 isin 119878
5 So 119899
119894= 0 0 minus 4 minus 0 0 1 1 7
So the second front has query plans 1 2 6 and 7 Thisprocess continues till all the query plans in the population areassigned to their respective nondominated fronts The fronts1 2 3 and
4are formed and are given in Table 2
Finally the query plans are sorted separately on the valuesof TCC and TPC within each front as shown in Table 3
After the population is sorted into different fronts thecrowding distance (119868[119889119894119904119905119886119899119888119890]) computation is performedfor each query plan using the formula given in Step 4 ofthe proposed algorithm Query plans having the maximumand minimum values in each front are assigned infin distancevalues that is 119868[3] 119868[5] 119868[7] 119868[1] 119868[8] 119868[4] 119868[9] and119868[10] are assigned infin For the rest of the query plans theircrowding distance (CD) values are computed based on thesorted order of query plans in each front Initially they areassigned 119868[119889119894119904119905119886119899119888119890] = 0 For query plan 2 in front
2 the
CD computation is performed as follows
1198681198981[2] = 119868 [2] +
TCC [6] minus TCC [7]
max (TCC) minusmin (TCC)
= 0 +
17020000 minus 960000
45090000 minus 0
= 03562
119868 [2] = 1198681198981[2] +
TPC [7] minus TPC [6]
max (TPC) minusmin (TPC)
= 03562 +
6500000 minus 3847000
10514000 minus 2112000
= 06720
(14)
CD of the query plans in the given population is given inTable 4
Next binary tournament selection is performed on thepopulation on the basis of crowded comparison operator ≺
119899
This selection process is shown in Table 5The selected query plans undergo random single point
crossover with crossover probability 119875119888
= 05 Mutation isperformed on the selected population with mutation proba-bility 119875
119898= 002 The child population CP after crossover and
mutation is shown in Table 6Now in accordance with NSGA-II algorithm the popu-
lations from the second generation onward have to ensureelitism For this purpose the child population CP is com-bined with the parent population PP to generate intermediatepopulation IP This population is subjected to nondominatedsort and fronts are formed as given in Table 7
The population for the second generation is arrived atby selecting query plans based on front and within it basedon crowding distance-wise as described in Step 7 from theintermediate population IP till the actual population size 10is exceeded This selection is shown in Table 8
The population PP for the second generation is given inTable 9
The above process is repeated till a predefined number ofgenerations119866have elapsedThereafter Top-119870 query plans areproduced as output
0
01
02
03
04
05
06
07
08
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-5 query plans)
Figure 4
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
DQPGNSGA(top-10 query plans)
Figure 5
5 Experimental Results
The proposed NSGA-II based algorithm is implemented inMATLAB 77 in Windows 7 professional 64 bit OS with Intelcore i3CPUat 213GHzhaving 4GBRAMExperimentswerecarried out for a population of 100 query plans with eachquery plan involving 10 relations distributed over 50 sitesThese were performed on four datasets each comprising adifferent relation-site matrix Graphs were plotted to observechange in average TC (ATC) with respect to generations
12 The Scientific World Journal
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-15 query plans)
Figure 6
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-20 query plans)
Figure 7
and Top-119870 query plans for different pairs of crossover andmutation rates
First graphs showing the ATC values Top-5 Top-10Top-15 and Top-20 query plans averaged over four datasetsover 1000 generations for distinct pairs of crossover andmutation probability 119875
119888 119875119898 = 07 001 07 0005
075 001 075 0005 08 001 08 0005 085 001085 0005 using NSGA-II based DQPG algorithm(DQPGNSGA) were plotted These are shown in Figures 45 6 and 7 It can be observed from these graphs that the
0
005
01
015
02
025
03
035
04
045
05
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
(after 1000 generations)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGNSGA
Top-K query plans
Figure 8
0
01
02
03
04
05
06
07
08
09
1
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-5 query plans)
Figure 9
convergence to ATC is lowest in the case of 119875119888 119875119898 =
085 001 Furthermore the graph showing ATC valuesaveraged over four datasets versus Top-119870 query plans after1000 generations (Figure 8) also shows that the lowest ATCvalues achieved are for 119875
119888 119875119898 = 085 001 Thus it
can be said that DQPGNSGA performs reasonably well for119875119888 119875119898 = 085 001 In order to compare DQPGNSGA with
SGA based DQPG algorithm (DQPGSGA) similar graphswere plotted for DQPGSGA These are shown in Figures 9 1011 12 and 13 It is noted from these graphs that DQPGSGA
The Scientific World Journal 13
0
02
04
06
08
1
12
14
1618
21 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-10 query plans)
Figure 10
0
02
04
06
08
1
12
14
16
18
2
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-15 query plans)
Figure 11
also achieves convergence to the lowest ATC values for119875119888 119875119898 = 085 001
Since the two algorithms DQPGNSGA and DQPGSGAconverge to a lower ATC value for the same crossover andmutation probabilities that is 119875
119888 119875119898 = 085 001 the
comparisons of the two algorithms can be carried out forthese observed probabilities
First the two algorithmsDQPGNSGA andDQPGSGA werecompared for each of the four datasets (Dataset-1 Dataset-2 Dataset-3 and Dataset-4) on the ATC values of Top-5Top-10 Top-15 and Top-20 query plans generated over 1000
0
04
08
12
16
2
24
28
32
36
4
1 54 107
160
213
266
319
372
425
478
531
584
637
690
743
796
849
902
955
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-20 query plans)
Figure 12
0
01
02
03
04
05
06
07
08
09
1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
(after 1000 generations)
Aver
age t
otal
cost
(ATC
)
DQPGSGA
Top-K query plans
Figure 13
generations for 119875119888 119875119898 = 085 001 The corresponding
graphs for each dataset are shown in Figures 14 15 16 and 17It can be observed from the graphs thatDQPGNSGA convergesto lower ATC values for Top-5 Top-10 Top-15 and Top-20query plans generated for all datasets
Furthermore graphs comparing the ATC values of Top-119870 query plans produced by DQPGNSGA and DQPGSGA after1000 generations for the four datasets were plotted andare shown in Figures 18 19 20 and 21 These graphs alsoshow average TCC (ATCC) and average TPC (ATPC) valuesof Top-119870 query plans generated by DQPGNSGA It can beobserved from these graphs thatDQPGNSGA is able to achieve
14 The Scientific World Journal
0
05
1
15
2
25
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-1)
Figure 14
0
03
06
09
12
15
18
21
24
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-2)
Figure 15
an acceptable tradeoff between ATPC and ATCC which inturn leads to a comparatively lower ATC for the Top-119870 queryplans generated by it
Next a graph comparing the ATC values of Top-119870 queryplans generated by DQPGNSGA and DQPGSGA on all fourdatasets (DS-1 DS-2 DS-3 and DS-4) after 1000 generationsfor observed probabilities 119875
119888 119875119898 = 085 001were plotted
and is shown in Figure 22 It is noted from the graph thatDQPGNSGA performs better than DQPGSGA on the ATCvalues of Top-119870 query plans generated by the two algorithmsfor each of the four data sets
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
0
015
03
045
06
075
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-3)
Figure 16
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
0
01
02
03
04
05
06
07
08
09
1
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-4)
Figure 17
It can be reasonably inferred from all the above graphsthat DQPGNSGA is able to generate Top-119870 query plans withlower ATC when compared to those generated byDQPGSGAThis may be attributed to acceptable tradeoffs achieved whilesimultaneously optimizing TPC and TCC which results inlower TC in case of DQPGNSGA
6 Conclusion
In this paper DQPGproblemgiven in [3] has been addressedwhere query plans are generated for a distributed relational
The Scientific World Journal 15
0
005
01
015
02
025
03
035
04
045
1 3 5 7 9 11 13 15 17 19
Cos
t
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-1)
Figure 18
0
01
02
03
04
05
06
07
Cos
t
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
Top-K query plans
(Dataset-2)
Figure 19
query that incurs minimum total query processing costGenetic algorithms have been used to generate these queryplansThe total query processing cost TC in [3] can be viewedas comprising broadly of TPC and TCC and thereforeminimizing TPC and TCC would result in minimizing TCThus in this paper the single-objective DQPG problemin [3] has been formulated and solved as a biobjectiveDQPG problem with the two objectives being minimizingTPC and minimizing TCC These objectives are minimizedsimultaneously using the multiobjective genetic algorithmNSGA-II
Experiments were performed and DQPGNSGA is com-pared with DQPGSGA given in [3] It was observed thatboth the algorithms individually gave good results for thecrossover and mutation probabilities 085 and 001 respec-tively The two algorithms were then compared on the ATC
0
002
004
006
008
01
012
014
016
Cos
t
1 3 5 7 9 11 13 15 17 19
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-3)
Figure 20
0
005
01
015
02
025
Cos
t
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-4)
Figure 21
values of the Top-119870 query plans generated by them for theobserved crossover and mutation probabilities The resultsshowed that DQPGNSGA performed better than DQPGSGAAlso the performance of the former was better when the twoalgorithmswere compared on theATCvalues of Top-119870 queryplansThe better performance of DQPGNSGA over DQPGSGAmay be attributed to DQPGNSGA achieving acceptable trade-offs between TPC and TCC while minimizing TPC and TCCof Top-119870 query plans simultaneously
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
16 The Scientific World Journal
Top-K query plans
0
01
02
03
04
05
06
07
Aver
age t
otal
cost
(ATC
)
SGA(DS-1)SGA(DS-2)SGA(DS-3)SGA(DS-4)
NSGA(DS-1)NSGA(DS-2)NSGA(DS-3)NSGA(DS-4)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(after 1000 generations)
Figure 22
References
[1] S Ceri and G Pelgatti Distributed Databases Principles ampSystems International Edition McGraw-Hill Computer ScienceSeries 1985
[2] A R Hevner and S B Yao ldquoQuery processing in distributeddatabase systemsrdquo IEEE Transactions on Software Engineeringvol 5 no 3 pp 177ndash187 1979
[3] T V Vijay Kumar and S Panicker ldquoGenerating query plansfor distributed query processing using genetic algorithmrdquo inInformation Computing and Applications vol 7030 of LectureNotes in Computer Science pp 765ndash772 2011
[4] K Deb A Pratap S Agarwal and T Meyarivan ldquoA fastand elitist multiobjective genetic algorithm NSGA-IIrdquo IEEETransactions on Evolutionary Computation vol 6 no 2 pp 182ndash197 2002
[5] P Black and W Luk ldquoA new heuristic for generating semi-joinprograms for distributed query processingrdquo in Proceedings ofthe IEEE 6th International Computer Software and ApplicationConference pp 581ndash588 IEEE Chicago Ill USA November1982
[6] P Bodorik and J S Riordon ldquoA threshold mechanism fordistributed query processingrdquo in Proceedings of the 16th AnnualConference on Computer Science pp 616ndash625 Atlanta GaUnited States February 1988
[7] J Chang ldquoA heuristic approach to distributed query process-ingrdquo in Proceedings of the 8th International Conference on VeryLarge Data Bases (VLDB rsquo82) pp 54ndash61 Saratoga Calif USA1982
[8] D Kossmann ldquoThe state of the art in distributed queryprocessingrdquo ACM Computing Surveys vol 32 no 4 pp 422ndash469 2000
[9] L Stiphane and W Eugene ldquoA state transition model fordistributed query processingrdquo ACM Transactions on DatabaseSystems vol 11 no 3 pp 249ndash322 1986
[10] T V Vijay Kumar V Singh and A K Verma ldquoDistributedquery processing plans generation using genetic algorithmrdquo
International Journal of Computer Theory and Engineering vol3 no 1 pp 38ndash45 2011
[11] C T Yu and C C Chang ldquoDistributed query processingrdquoACMComputing Surveys vol 16 no 4 pp 399ndash433 1984
[12] K Bennett M C Ferris and Y E Ioannidis ldquoA Geneticalgorithm for database query optimizationrdquo in Proceedings ofthe 4th International Conference onGenetic Algorithms pp 400ndash407 1991
[13] Y E Ioannidis and Y Cha Kang ldquoRandomized algorithms foroptimizing large join queriesrdquo in Proceedings of the 1990 ACMSIGMOD International Conference on Management of Data pp312ndash321 May 1990
[14] Y Ioannidis and E Wong ldquoQuery optimization by simulatedannealingrdquo in Proceedings of the ACM SIGMOD InternationalConference on Management of Data (SIGMOD rsquo87) pp 9ndash221987
[15] A A R Townsend Genetic Algorithms A Tutorial 2003[16] M Gregory Genetic Algorithm Optimization of Distributed
Database Queries IEEE 1998[17] M Mitchell An Introduction to Genetic Algorithms The MIT
Press Cambridge Mass USA 1999[18] J D Schaffer ldquoMultiple objective optimization with vector
evaluated genetic algorithmsrdquo inProceedings of the InternationalConference on Genetic Algorithm andTheir Applications pp 93ndash100 July 1985
[19] V Guliashki H Toshev andC Korsemov ldquoSurvey of evolution-ary algorithms used in multiobjective optimizationrdquo Problemsof EngineeringCybernetics andRobotics vol 60 pp 42ndash54 2009
[20] C M Fonseca and P J Fleming ldquoGenetic algorithms formultiobjective optimization formulation discussion and gen-eralizationrdquo inProceedings of the 5th International Conference onGenetic Algorithms S Forrest Ed pp 416ndash423 San FranciscoCalif USA 1993
[21] J Horn N Nafpliotis and D E Goldberg ldquoA niched Paretogenetic algorithm for multiobjective optimizationrdquo in Proceed-ings of the 1st IEEE Conference on Evolutionary ComputationIEEE World Congress on Computational Intelligence pp 82ndash87IEEE Orlando Fla USA June 1994
[22] N Srinivas and K Deb ldquoMultiobjective optimization usingnondominated sorting in genetic algorithmsrdquo EvolutionaryComputation vol 2 no 3 pp 221ndash248 1994
[23] E Zitzler and L Thiele ldquoMultiobjective evolutionary algo-rithms a comparative case study and the strength Paretoapproachrdquo IEEETransactions onEvolutionaryComputation vol3 no 4 pp 257ndash271 1999
[24] J D Knowles and D W Corne ldquoApproximating the non-dominated front using the Pareto archived evolution strategyrdquoEvolutionary computation vol 8 no 2 pp 149ndash172 2000
[25] D W Corne J D Knowles and M J Oates ldquoThe Paretoenvelope-based selection algorithm for multiobjective opti-mizationrdquo in Proceedings of the 6th International Conference onParallel Problem Solving from Nature Springer Paris FranceSeptember 2000
[26] J Scharnow K Tinnefeld and I Wegener ldquoThe analysis of evo-lutionary algorithms on sorting and shortest paths problemsrdquoJournal of Mathematical Modelling and Algorithms vol 3 no 4pp 349ndash366 2004
[27] B Doerr E Happ and C Klein ldquoCrossover can provablybe useful in evolutionary computationrdquo in Proceedings of the10th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo08) pp 539ndash546 ACM Press July 2008
The Scientific World Journal 17
[28] C Horoba ldquoAnalysis of a simple evolutionary algorithm for themultiobjective shortest path problemrdquo in Proceedings of the 10thACM SIGEVO Workshop on Foundations of Genetic Algorithms(FOGA rsquo09) pp 113ndash120 ACM Press New York NY USAJanuary 2009
[29] MTheile ldquoExact solutions to the traveling salesperson problemby a population-based evolutionary algorithmrdquo in Proceedingsof the 9th European Conference on Evolutionary Computation inCombinatorial Optimisation (EvoCOP rsquo09) vol 5482 of LectureNotes in Computer Science pp 145ndash155 Springer April 2009
[30] A V Eremeev On Linking Dynamic Programming and Multi-Objective Evolutionary Algorithms Omsk StateUniversity 2008
[31] A V Eremeev ldquoA fully polynomial randomized approximationscheme based on an evolutionary algorithmrdquo Diskretnyi Analizi Issledovanie Operatsii vol 17 no 4 pp 3ndash17 2010
[32] T Friedrich N Hebbinghaus F Neumann J He and CWitt ldquoApproximating covering problems by randomized searchheuristics using multi-objective modelsrdquo in Proceedings of the9th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo07) pp 797ndash804 ACM Press July 2007
[33] E Elbeltagi T Hegazy and D Grierson ldquoA modified shuffledfrog-leaping optimization algorithm applications to projectmanagementrdquo Structure and Infrastructure Engineering vol 3no 1 pp 53ndash60 2007
[34] M Dorigo E Bonabeau and GTheraulaz ldquoAnt algorithms andstigmergyrdquo Future Generation Computer Systems vol 16 no 8pp 851ndash871 2000
[35] D E Goldberg and K Deb ldquoA comparative analysis of selectionschemes used in genetic algorithmsrdquo in Foundations of GeneticAlgorithms pp 69ndash93 Morgan Kaufmann San Mateo CalifUSA 1991
[36] D E GoldbergGenetic Algorithms in Search Optimization andMachine Learning Addison-Wesley Reading Mass USA 1989
[37] H Dong and Y Liang ldquoGenetic algorithms for large join queryoptimizationrdquo in Proceedings of the 9th Annual Genetic andEvolutionary Computation Conference (GECCO rsquo07) pp 1211ndash1218 London UK July 2007
[38] I F Sbalzariniy M Sibylle and P Koumoutsakosyz ldquoMulti-objective optimization using evolutionary algorithmsrdquo Centerfor Turbulence Research Proceedings of the Summer Program2000
[39] E Zitzler and L Thiele ldquoMultiobjective optimization usingevolutionary algorithmsmdasha comparative case studyrdquo in ParallelProblem Solving from Nature pp 292ndash301 Springer Amster-dam The Netherlands 1998
[40] C A C Coello ldquoAn updated survey of GA-basedmultiobjectiveoptimization techniquesrdquo ACM Computing Surveys vol 32 no2 pp 137ndash143 2000
[41] D A Van Veldhuizen and G B Lamont ldquoMultiobjective evolu-tionary algorithms analyzing the state-of-the-artrdquo Evolutionarycomputation vol 8 no 2 pp 125ndash147 2000
[42] F Glover and S Hanafi ldquoTabu search and finite convergencerdquoDiscrete Applied Mathematics vol 119 no 1-2 pp 3ndash36 2002
[43] A C Nearchou ldquoThe effect of various operators on the geneticsearch for large scheduling problemsrdquo International Journal ofProduction Economics vol 88 no 2 pp 191ndash203 2004
[44] W W Chu and P Hurley ldquoOptimal query processing fordistributed database systemsrdquo IEEE Transactions on Computersvol 31 no 9 pp 835ndash850 1982
[45] M Serpell and J E Smith ldquoSelf-adaptation ofmutation operatorand probability for permutation representations in genetic
algorithmsrdquo Evolutionary Computation vol 18 no 3 pp 491ndash514 2010
[46] A Seshadri A Fast Elitist Multiobjective Genetic AlgorithmNSGA-II MATLAB Central 2006
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World Journal 7
Rejected
Crowding distance sort PP
CP
IP
PP
Nondominatedsort
Ŧ1
Ŧ2
Ŧ3
Figure 2 NSGA-II procedure preserving elitism [4]
value for 119868(119889119894119904119905119886119899119888119890) that is 119868(1198891) = infin and 119868(119889
119873) = infin For
remaining query plans that is 119896 = 2 119873 minus 1 119868(119889119894119904119905119886119899119888119890)is computed as follows [4 46]
119868 (119889119896) = 119868 (119889
119896) +
119868(119896 + 1)119898minus 119868(119896 minus 1)
119898
119891
max119898
minus 119891
min119898
(10)
where 119868(119896)119898is the value of 119898th objective function of 119896th
query plan in front 119894and 119891
max119898
and 119891
min119898
are the maximumand minimum values obtained for the objective function119898
Step 5 (Binary Tournament Selection) After assigning thecrowding distance to the query plans in each front a selectionprocess is carried outThe selection scheme used is the binarytournament selection and it is carried out using the crowdedcomparison operator (≺
119899) [4 46] It uses two parameters as
given below(i) 120588rank (nondomination rank)The query plans in front
119894will have 120588rank = 119894
(ii) 119868119894(119889119895) (crowding distance in front 119894) 119895 = 1 119873
The crowded comparison is performed as described below [446]
For any two query plans 1199021199011and 119902119901
2 1199021199011is selected if
(1199021199011≺1198991199021199012) 1199021199011≺1198991199021199012is true if either one of the following
holds(i) 120588rank(1199021199011) lt 120588rank(1199021199012)(ii) if 119902119901
1and 119902119901
2belong to the same front (ie if
120588rank(1199021199011) = 120588rank(1199021199012)) then 119868119894(1198891199021199011
) gt 119868119894(1198891199021199012
)
Step 6 (Crossover andMutation) Crossover is performed onthe selected query plans with a given crossover probability119875119888 It ensures proper exploration of the search space by
combining the best features of the parent query plans (chro-mosomes) Mutation is performed on the given populationwith a given probability 119875
119898 It randomly changes the site
(gene) in which the corresponding relation resides within aquery plan (chromosome) The mutated gene always takes arandom value from a set of valid sites for a particular relationAfter going through the above steps the first generationpopulation is formed NSGA-II follows a different methodto produce subsequent generations in order to incorporateelitism as described next
Table 1 Initial parent population PP
[2 1 5 1] [3 5 4 5]
[3 3 1 1] [3 1 1 1]
[3 3 3 3] [3 4 4 4]
[2 4 1 5] [2 3 3 3]
[3 1 1 3] [2 1 3 4]
Step 7 (Preserving Good Solutions (Elitism) [4]) In subse-quent generations the new population after each generationis combined with the parent population and a new interme-diate population IP is created of size PP + CP where PP is theparent population and CP is the child population as shown inFigure 2
The non-dominated sort is applied to this intermediatepopulation and fronts are formed as described in Step 3Finally the population for the next generation is formedby adding solutions from each front till the population sizeexceeds 119875 If the last front to be included was
119888 which led
to the population overflow then query plans in Front 119888are
selected based on their crowding distance measure (Step 4)in descending order until the population size exceeds 119875
The above steps are repeated for ldquo119866rdquo generations and theTop-119870 query plans are produced as output
An example illustrating the use of the above NSGA-IIbased DQPG algorithm to generate query plans for a givendistributed query is given next
4 An Example
Consider the site relation matrix the communication-cost matrix the local processing cost matrix the distinct-tuple matrix and the size matrix used to compute thefitness of query plans given in [3] and shown below inFigure 3 Suppose the initial parent population PP com-prises of 10 query plans given in Table 1 Consider aquery that accesses four relations (1198771 1198772 1198773 and 1198774)which are distributed among five sites (1198781 1198782 1198783 1198784 and1198785)
8 The Scientific World Journal
R1 R2 R3 R4300200 500 700
Cardinality
R1 R2 R3 R430 40 50 20
Size
S1 S2 S3 S4 S52 2 3 3 2
S1 S2 S3 S4 S5
S1 150 180 150 160
S2 165 185 170 170
S3 160 160 170 150
S4 150 150 180 160
S5 170 160 190 150
Communication costmatrix
R1 R2 R3 R4
R1 04 05 04 04
R2 05 02 02 03
R3 04 02 02 01
R4 04 03 01 03
matrix for join
R1 R2 R3 R4S1 0 1 1 1S2 1 0 0 0S3 1 1 1 1S4 0 1 1 1S5 0 0 1 1
matrixLPCi
Distinct-tuple- Relation-site
mdash
mdashmdash
mdash
mdash
Figure 3 Matrices used for computing fitness [3]
Table 2 Nondominated sort on PP
Index [119894] Population PP 1198981= TCC 119898
2= TPC 119878
119894119899119894
Front[1] [2 1 5 1] 18020000 3746700 [4 10] 1
2
[2] [3 3 1 1] 6720000 6006000 [4 10] 1 2
[3] [3 3 3 3] 0 4200000 [2 4 7 8 9 10] mdash 1
[4] [2 4 1 5] 43440000 70793333 [10] 6 3
[5] [3 1 1 3] 14000000 2112000 [1 4 6 10] mdash 1
[6] [3 5 4 5] 17020000 3847000 [4 10] 1 2
[7] [3 1 1 1] 960000 6500000 [4 8 9 10] 1 2
[8] [3 4 4 4] 1020000 9750000 [10] 2 3
[9] [2 3 3 3] 1110000 9750000 [10] 2 3
[10] [2 1 3 4] 45090000 10514000 mdash 9 4
Table 3 Sorted fronts based on TCC and TPC
Query plans sorted on1198981= TCC
Query plans sorted on1198982= TPC
1
3 5 1
5 32
7 2 6 1 2
1 6 2 73
8 9 4 3
4 8 94
10 4
10
The computations of TPC and TCC for the query plan[2 4 1 5] are given as follows
TPC4= (LPC
2times 1198862) + (LPC
4times 1198864)
+ (LPC5times 1198865) + (LPC
1times 1198861)
TCC4= (CC
24times 11988724
) + (CC45
times 11988745
) + (CC51
times 11988751
)
(11)
whereLPC2times 1198862= 0
CC24
times 11988724
= CC24
times (Card (1198771) times Size (1198771))
= 170 times (200 times 30) = 1020000
LPC4times 1198864= LPC
4times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times (Size (1198771) + Size (1198772)) )
= 2 times (
200 times 300
05 times 200
times (70)) = 126000
CC45
times 11988745
= CC45
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times (Size (1198771) + Size (1198772)))
= 170 times (
200 times 300
05 times 200
times (70)) = 6720000
LPC5times 1198865
= LPC5
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times (Size (1198771) + Size (1198772) + Size (1198774)) )
= 2 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
)
times 90 = 420000
The Scientific World Journal 9
Table 4 Results for the objective functions on PP
Index [119894] Population PP 1198981= TCC 119898
2= TPC Front Crowding distance (CD) = 119868[119894]
[1] [2 1 5 1] 18020000 3746700 2
infin
[2] [3 3 1 1] 6720000 6006000 2
0672[3] [3 3 3 3] 0 4200000
1infin
[4] [2 4 1 5] 43440000 70793333 3
infin
[5] [3 1 1 3] 14000000 2112000 1
infin
[6] [3 5 4 5] 17020000 3847000 2
05195[7] [3 1 1 1] 960000 6500000
2infin
[8] [3 4 4 4] 1020000 9750000 3
infin
[9] [2 3 3 3] 1110000 9750000 3
infin
[10] [2 1 3 4] 45090000 10514000 4
infin
Table 5 Selection of query plans using binary tournament selection technique
IndexRandomly generated
indexes[119894] and [119895]
Tournament betweenquery plans
[119875(119894)] and [119875(119895)]
Front comparison Query plan selected (lowerfront or higher CD)
[1] [1] and [4] [2 1 5 1] and [2 4 1 5] [2] and [
3] [2 1 5 1]
[2] [2] and [3] [3 3 1 1] and [3 3 3 3] [2] and [
1] [3 3 3 3]
[3] [3] and [4] [3 3 3 3] and [2 4 1 5] [1] and [
3] [3 3 3 3]
[4] [2] and [6] [3 3 1 1] and [3 5 4 5] [2] and [
2] [3 3 1 1]
[5] [2] and [4] [3 3 1 1] and [2 4 1 5] [2] and [
3] [3 3 1 1]
[6] [1] and [3] [2 1 5 1] and [3 3 3 3] [2] and [
1] [3 3 3 3]
[7] [2] and [5] [3 3 1 1] and [3 1 1 3] [2] and [
1] [3 1 1 3]
[8] [2] and [8] [3 3 1 1] and [3 1 1 3] [2] and [
3] [3 3 1 1]
[9] [7] and [8] [3 1 1 1] and [3 1 1 3] [2] and [
3] [3 1 1 1]
[10] [2] and [9] [3 3 1 1] and [2 4 1 5] [2] and [
3] [3 3 1 1]
Table 6 Population CP after crossover and mutation
Index Population after crossover Population aftermutation (CP)
[1] [2 1 1 1] [2 1 1 1]
[2] [3 3 1 1] [3 3 1 1]
[3] [3 3 5 1] [3 3 4 1][4] [3 3 1 1] [3 3 1 1]
[5] [3 3 3 1] [3 3 3 1]
[6] [3 3 3 3] [3 3 3 3]
[7] [3 1 1 3] [3 1 1 3]
[8] [3 3 1 3] [3 3 1 3]
[9] [3 1 3 3] [3 3 3 3]
[10] [3 3 1 1] [3 3 1 1]
CC51
times 11988751
= CC51
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times (Size (1198771) + Size (1198772) + Size (1198774)) )
170 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
)
times 90 = 35700000
LPC1times 1198861
= LPC1times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times
Card (1198773)
Dist43
times Card (1198773)
times (
4
sum
119894=1
Size (119877119894)))
10 The Scientific World Journal
Table 7 Front assignment for query plans in IP
Index Intermediate population IP TCC TPC Front[1] [2 1 5 1] 18020000 3747000
2
[2] [3 3 1 1] 6720000 6006000 3
[3] [3 3 3 3] 0 4200000 1
[4] [2 4 1 5] 43440000 7079000 4
[5] [3 1 1 3] 14000000 2112000 1
[6] [3 5 4 5] 17020000 3847000 2
[7] [3 1 1 1] 960000 6500000 2
[8] [3 1 1 3] 1020000 9750000 3
[9] [2 3 3 3] 1110000 9750000 3
[10] [2 1 3 4] 45090000 10514000 5
[11] [2 1 1 1] 990000 6500000 2
[12] [3 3 1 1] 6720000 6006000 3
[13] [3 3 4 1] 90300000 8946000 5
[14] [3 3 1 1] 6720000 6006000 3
[15] [3 3 3 1] 2520000 4230000 2
[16] [3 3 3 3] 0 4200000 1
[17] [3 1 1 3] 14000000 2112000 1
[18] [3 3 1 3] 4500000 2640000 1
[19] [3 3 3 3] 0 4200000 1
[20] [3 3 1 1] 6720000 6006000 3
Table 8 Nondominated sorting for query plans in IP
Index Intermediatepopulation IP Front Crowding
distance[3] [3 3 3 3]
1
[5] [3 1 1 3] 1
[16] [3 3 3 3] 1
[17] [3 1 1 3] 1
[18] [3 3 1 3] 1
[19] [3 3 3 3] 1
[1] [2 1 5 1] 2 infin
[7] [3 1 1 1] 2 infin
[11] [2 1 1 1] 2 infin
[15] [3 3 3 1] 2 04933
[6] [3 5 4 5] 2 02292
[2] [3 3 1 1] 3
[9] [2 3 3 3] 3
[8] [3 1 1 3] 3
[12] [3 3 1 1] 3
[14] [3 3 1 1] 3
[20] [3 3 1 1] 3
[4] [2 4 1 5] 4
[10] [2 1 3 4] 5
[13] [3 3 4 1] 4
= 2 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
times
500
01 times 500
) times 140 = 65333333
(12)
Table 9 Population PP in the 2nd generation
Index Population PP[1] [3 3 3 3]
[2] [3 1 1 3]
[3] [3 3 3 3]
[4] [3 1 1 3]
[5] [3 3 1 3]
[6] [3 3 3 3]
[7] [2 1 5 1]
[8] [3 1 1 1]
[9] [2 1 1 1]
[10] [3 3 3 1]
So
TPC4= 126000 + 420000 + 65333333 = 70793333
TCC4= 1020000 + 6720000 + 35700000 = 43440000
(13)
Similarly TCC and TPC values of the other nine queryplans are computed Consider TCC and TPC of the 10 queryplans are given in Table 2 The population is then sorted intodifferent nondominated fronts as described in Step 3 of theproposed algorithm For example for query plan 1 that is[2 1 5 1] the set 119878
1= number of query plans that dominate
query plan 1 Since TCC [1] lt TCC [4] TPC [1] lt TPC [4]TCC [1] lt TCC [10] and TPC [1] lt TPC [10] the elementsof 1198781
= 4 10 Similarly the sets 1198782 119878
10are computed
and are given in Table 2 119899119894stores the count of query plans
that dominate 119894 So using the values in 119878119894 1198991
= 1 as onlyquery plan 5 is dominating 1 and 119899
2= 1 as only query plan 3
is dominating 2 Similarly 1198992 119899
10are computed and are
The Scientific World Journal 11
given in Table 2 From Table 2 it can be noted that queryplans 3 and 5 are not dominated as 119899
3= 0 119899
5= 0 So they
are assigned to the first nondominated front1The elementsin the next front are computed by reducing the count in 119899
119894
for each 119894 isin 1198783and 119894 isin 119878
5 So 119899
119894= 0 0 minus 4 minus 0 0 1 1 7
So the second front has query plans 1 2 6 and 7 Thisprocess continues till all the query plans in the population areassigned to their respective nondominated fronts The fronts1 2 3 and
4are formed and are given in Table 2
Finally the query plans are sorted separately on the valuesof TCC and TPC within each front as shown in Table 3
After the population is sorted into different fronts thecrowding distance (119868[119889119894119904119905119886119899119888119890]) computation is performedfor each query plan using the formula given in Step 4 ofthe proposed algorithm Query plans having the maximumand minimum values in each front are assigned infin distancevalues that is 119868[3] 119868[5] 119868[7] 119868[1] 119868[8] 119868[4] 119868[9] and119868[10] are assigned infin For the rest of the query plans theircrowding distance (CD) values are computed based on thesorted order of query plans in each front Initially they areassigned 119868[119889119894119904119905119886119899119888119890] = 0 For query plan 2 in front
2 the
CD computation is performed as follows
1198681198981[2] = 119868 [2] +
TCC [6] minus TCC [7]
max (TCC) minusmin (TCC)
= 0 +
17020000 minus 960000
45090000 minus 0
= 03562
119868 [2] = 1198681198981[2] +
TPC [7] minus TPC [6]
max (TPC) minusmin (TPC)
= 03562 +
6500000 minus 3847000
10514000 minus 2112000
= 06720
(14)
CD of the query plans in the given population is given inTable 4
Next binary tournament selection is performed on thepopulation on the basis of crowded comparison operator ≺
119899
This selection process is shown in Table 5The selected query plans undergo random single point
crossover with crossover probability 119875119888
= 05 Mutation isperformed on the selected population with mutation proba-bility 119875
119898= 002 The child population CP after crossover and
mutation is shown in Table 6Now in accordance with NSGA-II algorithm the popu-
lations from the second generation onward have to ensureelitism For this purpose the child population CP is com-bined with the parent population PP to generate intermediatepopulation IP This population is subjected to nondominatedsort and fronts are formed as given in Table 7
The population for the second generation is arrived atby selecting query plans based on front and within it basedon crowding distance-wise as described in Step 7 from theintermediate population IP till the actual population size 10is exceeded This selection is shown in Table 8
The population PP for the second generation is given inTable 9
The above process is repeated till a predefined number ofgenerations119866have elapsedThereafter Top-119870 query plans areproduced as output
0
01
02
03
04
05
06
07
08
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-5 query plans)
Figure 4
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
DQPGNSGA(top-10 query plans)
Figure 5
5 Experimental Results
The proposed NSGA-II based algorithm is implemented inMATLAB 77 in Windows 7 professional 64 bit OS with Intelcore i3CPUat 213GHzhaving 4GBRAMExperimentswerecarried out for a population of 100 query plans with eachquery plan involving 10 relations distributed over 50 sitesThese were performed on four datasets each comprising adifferent relation-site matrix Graphs were plotted to observechange in average TC (ATC) with respect to generations
12 The Scientific World Journal
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-15 query plans)
Figure 6
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-20 query plans)
Figure 7
and Top-119870 query plans for different pairs of crossover andmutation rates
First graphs showing the ATC values Top-5 Top-10Top-15 and Top-20 query plans averaged over four datasetsover 1000 generations for distinct pairs of crossover andmutation probability 119875
119888 119875119898 = 07 001 07 0005
075 001 075 0005 08 001 08 0005 085 001085 0005 using NSGA-II based DQPG algorithm(DQPGNSGA) were plotted These are shown in Figures 45 6 and 7 It can be observed from these graphs that the
0
005
01
015
02
025
03
035
04
045
05
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
(after 1000 generations)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGNSGA
Top-K query plans
Figure 8
0
01
02
03
04
05
06
07
08
09
1
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-5 query plans)
Figure 9
convergence to ATC is lowest in the case of 119875119888 119875119898 =
085 001 Furthermore the graph showing ATC valuesaveraged over four datasets versus Top-119870 query plans after1000 generations (Figure 8) also shows that the lowest ATCvalues achieved are for 119875
119888 119875119898 = 085 001 Thus it
can be said that DQPGNSGA performs reasonably well for119875119888 119875119898 = 085 001 In order to compare DQPGNSGA with
SGA based DQPG algorithm (DQPGSGA) similar graphswere plotted for DQPGSGA These are shown in Figures 9 1011 12 and 13 It is noted from these graphs that DQPGSGA
The Scientific World Journal 13
0
02
04
06
08
1
12
14
1618
21 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-10 query plans)
Figure 10
0
02
04
06
08
1
12
14
16
18
2
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-15 query plans)
Figure 11
also achieves convergence to the lowest ATC values for119875119888 119875119898 = 085 001
Since the two algorithms DQPGNSGA and DQPGSGAconverge to a lower ATC value for the same crossover andmutation probabilities that is 119875
119888 119875119898 = 085 001 the
comparisons of the two algorithms can be carried out forthese observed probabilities
First the two algorithmsDQPGNSGA andDQPGSGA werecompared for each of the four datasets (Dataset-1 Dataset-2 Dataset-3 and Dataset-4) on the ATC values of Top-5Top-10 Top-15 and Top-20 query plans generated over 1000
0
04
08
12
16
2
24
28
32
36
4
1 54 107
160
213
266
319
372
425
478
531
584
637
690
743
796
849
902
955
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-20 query plans)
Figure 12
0
01
02
03
04
05
06
07
08
09
1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
(after 1000 generations)
Aver
age t
otal
cost
(ATC
)
DQPGSGA
Top-K query plans
Figure 13
generations for 119875119888 119875119898 = 085 001 The corresponding
graphs for each dataset are shown in Figures 14 15 16 and 17It can be observed from the graphs thatDQPGNSGA convergesto lower ATC values for Top-5 Top-10 Top-15 and Top-20query plans generated for all datasets
Furthermore graphs comparing the ATC values of Top-119870 query plans produced by DQPGNSGA and DQPGSGA after1000 generations for the four datasets were plotted andare shown in Figures 18 19 20 and 21 These graphs alsoshow average TCC (ATCC) and average TPC (ATPC) valuesof Top-119870 query plans generated by DQPGNSGA It can beobserved from these graphs thatDQPGNSGA is able to achieve
14 The Scientific World Journal
0
05
1
15
2
25
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-1)
Figure 14
0
03
06
09
12
15
18
21
24
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-2)
Figure 15
an acceptable tradeoff between ATPC and ATCC which inturn leads to a comparatively lower ATC for the Top-119870 queryplans generated by it
Next a graph comparing the ATC values of Top-119870 queryplans generated by DQPGNSGA and DQPGSGA on all fourdatasets (DS-1 DS-2 DS-3 and DS-4) after 1000 generationsfor observed probabilities 119875
119888 119875119898 = 085 001were plotted
and is shown in Figure 22 It is noted from the graph thatDQPGNSGA performs better than DQPGSGA on the ATCvalues of Top-119870 query plans generated by the two algorithmsfor each of the four data sets
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
0
015
03
045
06
075
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-3)
Figure 16
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
0
01
02
03
04
05
06
07
08
09
1
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-4)
Figure 17
It can be reasonably inferred from all the above graphsthat DQPGNSGA is able to generate Top-119870 query plans withlower ATC when compared to those generated byDQPGSGAThis may be attributed to acceptable tradeoffs achieved whilesimultaneously optimizing TPC and TCC which results inlower TC in case of DQPGNSGA
6 Conclusion
In this paper DQPGproblemgiven in [3] has been addressedwhere query plans are generated for a distributed relational
The Scientific World Journal 15
0
005
01
015
02
025
03
035
04
045
1 3 5 7 9 11 13 15 17 19
Cos
t
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-1)
Figure 18
0
01
02
03
04
05
06
07
Cos
t
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
Top-K query plans
(Dataset-2)
Figure 19
query that incurs minimum total query processing costGenetic algorithms have been used to generate these queryplansThe total query processing cost TC in [3] can be viewedas comprising broadly of TPC and TCC and thereforeminimizing TPC and TCC would result in minimizing TCThus in this paper the single-objective DQPG problemin [3] has been formulated and solved as a biobjectiveDQPG problem with the two objectives being minimizingTPC and minimizing TCC These objectives are minimizedsimultaneously using the multiobjective genetic algorithmNSGA-II
Experiments were performed and DQPGNSGA is com-pared with DQPGSGA given in [3] It was observed thatboth the algorithms individually gave good results for thecrossover and mutation probabilities 085 and 001 respec-tively The two algorithms were then compared on the ATC
0
002
004
006
008
01
012
014
016
Cos
t
1 3 5 7 9 11 13 15 17 19
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-3)
Figure 20
0
005
01
015
02
025
Cos
t
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-4)
Figure 21
values of the Top-119870 query plans generated by them for theobserved crossover and mutation probabilities The resultsshowed that DQPGNSGA performed better than DQPGSGAAlso the performance of the former was better when the twoalgorithmswere compared on theATCvalues of Top-119870 queryplansThe better performance of DQPGNSGA over DQPGSGAmay be attributed to DQPGNSGA achieving acceptable trade-offs between TPC and TCC while minimizing TPC and TCCof Top-119870 query plans simultaneously
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
16 The Scientific World Journal
Top-K query plans
0
01
02
03
04
05
06
07
Aver
age t
otal
cost
(ATC
)
SGA(DS-1)SGA(DS-2)SGA(DS-3)SGA(DS-4)
NSGA(DS-1)NSGA(DS-2)NSGA(DS-3)NSGA(DS-4)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(after 1000 generations)
Figure 22
References
[1] S Ceri and G Pelgatti Distributed Databases Principles ampSystems International Edition McGraw-Hill Computer ScienceSeries 1985
[2] A R Hevner and S B Yao ldquoQuery processing in distributeddatabase systemsrdquo IEEE Transactions on Software Engineeringvol 5 no 3 pp 177ndash187 1979
[3] T V Vijay Kumar and S Panicker ldquoGenerating query plansfor distributed query processing using genetic algorithmrdquo inInformation Computing and Applications vol 7030 of LectureNotes in Computer Science pp 765ndash772 2011
[4] K Deb A Pratap S Agarwal and T Meyarivan ldquoA fastand elitist multiobjective genetic algorithm NSGA-IIrdquo IEEETransactions on Evolutionary Computation vol 6 no 2 pp 182ndash197 2002
[5] P Black and W Luk ldquoA new heuristic for generating semi-joinprograms for distributed query processingrdquo in Proceedings ofthe IEEE 6th International Computer Software and ApplicationConference pp 581ndash588 IEEE Chicago Ill USA November1982
[6] P Bodorik and J S Riordon ldquoA threshold mechanism fordistributed query processingrdquo in Proceedings of the 16th AnnualConference on Computer Science pp 616ndash625 Atlanta GaUnited States February 1988
[7] J Chang ldquoA heuristic approach to distributed query process-ingrdquo in Proceedings of the 8th International Conference on VeryLarge Data Bases (VLDB rsquo82) pp 54ndash61 Saratoga Calif USA1982
[8] D Kossmann ldquoThe state of the art in distributed queryprocessingrdquo ACM Computing Surveys vol 32 no 4 pp 422ndash469 2000
[9] L Stiphane and W Eugene ldquoA state transition model fordistributed query processingrdquo ACM Transactions on DatabaseSystems vol 11 no 3 pp 249ndash322 1986
[10] T V Vijay Kumar V Singh and A K Verma ldquoDistributedquery processing plans generation using genetic algorithmrdquo
International Journal of Computer Theory and Engineering vol3 no 1 pp 38ndash45 2011
[11] C T Yu and C C Chang ldquoDistributed query processingrdquoACMComputing Surveys vol 16 no 4 pp 399ndash433 1984
[12] K Bennett M C Ferris and Y E Ioannidis ldquoA Geneticalgorithm for database query optimizationrdquo in Proceedings ofthe 4th International Conference onGenetic Algorithms pp 400ndash407 1991
[13] Y E Ioannidis and Y Cha Kang ldquoRandomized algorithms foroptimizing large join queriesrdquo in Proceedings of the 1990 ACMSIGMOD International Conference on Management of Data pp312ndash321 May 1990
[14] Y Ioannidis and E Wong ldquoQuery optimization by simulatedannealingrdquo in Proceedings of the ACM SIGMOD InternationalConference on Management of Data (SIGMOD rsquo87) pp 9ndash221987
[15] A A R Townsend Genetic Algorithms A Tutorial 2003[16] M Gregory Genetic Algorithm Optimization of Distributed
Database Queries IEEE 1998[17] M Mitchell An Introduction to Genetic Algorithms The MIT
Press Cambridge Mass USA 1999[18] J D Schaffer ldquoMultiple objective optimization with vector
evaluated genetic algorithmsrdquo inProceedings of the InternationalConference on Genetic Algorithm andTheir Applications pp 93ndash100 July 1985
[19] V Guliashki H Toshev andC Korsemov ldquoSurvey of evolution-ary algorithms used in multiobjective optimizationrdquo Problemsof EngineeringCybernetics andRobotics vol 60 pp 42ndash54 2009
[20] C M Fonseca and P J Fleming ldquoGenetic algorithms formultiobjective optimization formulation discussion and gen-eralizationrdquo inProceedings of the 5th International Conference onGenetic Algorithms S Forrest Ed pp 416ndash423 San FranciscoCalif USA 1993
[21] J Horn N Nafpliotis and D E Goldberg ldquoA niched Paretogenetic algorithm for multiobjective optimizationrdquo in Proceed-ings of the 1st IEEE Conference on Evolutionary ComputationIEEE World Congress on Computational Intelligence pp 82ndash87IEEE Orlando Fla USA June 1994
[22] N Srinivas and K Deb ldquoMultiobjective optimization usingnondominated sorting in genetic algorithmsrdquo EvolutionaryComputation vol 2 no 3 pp 221ndash248 1994
[23] E Zitzler and L Thiele ldquoMultiobjective evolutionary algo-rithms a comparative case study and the strength Paretoapproachrdquo IEEETransactions onEvolutionaryComputation vol3 no 4 pp 257ndash271 1999
[24] J D Knowles and D W Corne ldquoApproximating the non-dominated front using the Pareto archived evolution strategyrdquoEvolutionary computation vol 8 no 2 pp 149ndash172 2000
[25] D W Corne J D Knowles and M J Oates ldquoThe Paretoenvelope-based selection algorithm for multiobjective opti-mizationrdquo in Proceedings of the 6th International Conference onParallel Problem Solving from Nature Springer Paris FranceSeptember 2000
[26] J Scharnow K Tinnefeld and I Wegener ldquoThe analysis of evo-lutionary algorithms on sorting and shortest paths problemsrdquoJournal of Mathematical Modelling and Algorithms vol 3 no 4pp 349ndash366 2004
[27] B Doerr E Happ and C Klein ldquoCrossover can provablybe useful in evolutionary computationrdquo in Proceedings of the10th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo08) pp 539ndash546 ACM Press July 2008
The Scientific World Journal 17
[28] C Horoba ldquoAnalysis of a simple evolutionary algorithm for themultiobjective shortest path problemrdquo in Proceedings of the 10thACM SIGEVO Workshop on Foundations of Genetic Algorithms(FOGA rsquo09) pp 113ndash120 ACM Press New York NY USAJanuary 2009
[29] MTheile ldquoExact solutions to the traveling salesperson problemby a population-based evolutionary algorithmrdquo in Proceedingsof the 9th European Conference on Evolutionary Computation inCombinatorial Optimisation (EvoCOP rsquo09) vol 5482 of LectureNotes in Computer Science pp 145ndash155 Springer April 2009
[30] A V Eremeev On Linking Dynamic Programming and Multi-Objective Evolutionary Algorithms Omsk StateUniversity 2008
[31] A V Eremeev ldquoA fully polynomial randomized approximationscheme based on an evolutionary algorithmrdquo Diskretnyi Analizi Issledovanie Operatsii vol 17 no 4 pp 3ndash17 2010
[32] T Friedrich N Hebbinghaus F Neumann J He and CWitt ldquoApproximating covering problems by randomized searchheuristics using multi-objective modelsrdquo in Proceedings of the9th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo07) pp 797ndash804 ACM Press July 2007
[33] E Elbeltagi T Hegazy and D Grierson ldquoA modified shuffledfrog-leaping optimization algorithm applications to projectmanagementrdquo Structure and Infrastructure Engineering vol 3no 1 pp 53ndash60 2007
[34] M Dorigo E Bonabeau and GTheraulaz ldquoAnt algorithms andstigmergyrdquo Future Generation Computer Systems vol 16 no 8pp 851ndash871 2000
[35] D E Goldberg and K Deb ldquoA comparative analysis of selectionschemes used in genetic algorithmsrdquo in Foundations of GeneticAlgorithms pp 69ndash93 Morgan Kaufmann San Mateo CalifUSA 1991
[36] D E GoldbergGenetic Algorithms in Search Optimization andMachine Learning Addison-Wesley Reading Mass USA 1989
[37] H Dong and Y Liang ldquoGenetic algorithms for large join queryoptimizationrdquo in Proceedings of the 9th Annual Genetic andEvolutionary Computation Conference (GECCO rsquo07) pp 1211ndash1218 London UK July 2007
[38] I F Sbalzariniy M Sibylle and P Koumoutsakosyz ldquoMulti-objective optimization using evolutionary algorithmsrdquo Centerfor Turbulence Research Proceedings of the Summer Program2000
[39] E Zitzler and L Thiele ldquoMultiobjective optimization usingevolutionary algorithmsmdasha comparative case studyrdquo in ParallelProblem Solving from Nature pp 292ndash301 Springer Amster-dam The Netherlands 1998
[40] C A C Coello ldquoAn updated survey of GA-basedmultiobjectiveoptimization techniquesrdquo ACM Computing Surveys vol 32 no2 pp 137ndash143 2000
[41] D A Van Veldhuizen and G B Lamont ldquoMultiobjective evolu-tionary algorithms analyzing the state-of-the-artrdquo Evolutionarycomputation vol 8 no 2 pp 125ndash147 2000
[42] F Glover and S Hanafi ldquoTabu search and finite convergencerdquoDiscrete Applied Mathematics vol 119 no 1-2 pp 3ndash36 2002
[43] A C Nearchou ldquoThe effect of various operators on the geneticsearch for large scheduling problemsrdquo International Journal ofProduction Economics vol 88 no 2 pp 191ndash203 2004
[44] W W Chu and P Hurley ldquoOptimal query processing fordistributed database systemsrdquo IEEE Transactions on Computersvol 31 no 9 pp 835ndash850 1982
[45] M Serpell and J E Smith ldquoSelf-adaptation ofmutation operatorand probability for permutation representations in genetic
algorithmsrdquo Evolutionary Computation vol 18 no 3 pp 491ndash514 2010
[46] A Seshadri A Fast Elitist Multiobjective Genetic AlgorithmNSGA-II MATLAB Central 2006
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
8 The Scientific World Journal
R1 R2 R3 R4300200 500 700
Cardinality
R1 R2 R3 R430 40 50 20
Size
S1 S2 S3 S4 S52 2 3 3 2
S1 S2 S3 S4 S5
S1 150 180 150 160
S2 165 185 170 170
S3 160 160 170 150
S4 150 150 180 160
S5 170 160 190 150
Communication costmatrix
R1 R2 R3 R4
R1 04 05 04 04
R2 05 02 02 03
R3 04 02 02 01
R4 04 03 01 03
matrix for join
R1 R2 R3 R4S1 0 1 1 1S2 1 0 0 0S3 1 1 1 1S4 0 1 1 1S5 0 0 1 1
matrixLPCi
Distinct-tuple- Relation-site
mdash
mdashmdash
mdash
mdash
Figure 3 Matrices used for computing fitness [3]
Table 2 Nondominated sort on PP
Index [119894] Population PP 1198981= TCC 119898
2= TPC 119878
119894119899119894
Front[1] [2 1 5 1] 18020000 3746700 [4 10] 1
2
[2] [3 3 1 1] 6720000 6006000 [4 10] 1 2
[3] [3 3 3 3] 0 4200000 [2 4 7 8 9 10] mdash 1
[4] [2 4 1 5] 43440000 70793333 [10] 6 3
[5] [3 1 1 3] 14000000 2112000 [1 4 6 10] mdash 1
[6] [3 5 4 5] 17020000 3847000 [4 10] 1 2
[7] [3 1 1 1] 960000 6500000 [4 8 9 10] 1 2
[8] [3 4 4 4] 1020000 9750000 [10] 2 3
[9] [2 3 3 3] 1110000 9750000 [10] 2 3
[10] [2 1 3 4] 45090000 10514000 mdash 9 4
Table 3 Sorted fronts based on TCC and TPC
Query plans sorted on1198981= TCC
Query plans sorted on1198982= TPC
1
3 5 1
5 32
7 2 6 1 2
1 6 2 73
8 9 4 3
4 8 94
10 4
10
The computations of TPC and TCC for the query plan[2 4 1 5] are given as follows
TPC4= (LPC
2times 1198862) + (LPC
4times 1198864)
+ (LPC5times 1198865) + (LPC
1times 1198861)
TCC4= (CC
24times 11988724
) + (CC45
times 11988745
) + (CC51
times 11988751
)
(11)
whereLPC2times 1198862= 0
CC24
times 11988724
= CC24
times (Card (1198771) times Size (1198771))
= 170 times (200 times 30) = 1020000
LPC4times 1198864= LPC
4times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times (Size (1198771) + Size (1198772)) )
= 2 times (
200 times 300
05 times 200
times (70)) = 126000
CC45
times 11988745
= CC45
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times (Size (1198771) + Size (1198772)))
= 170 times (
200 times 300
05 times 200
times (70)) = 6720000
LPC5times 1198865
= LPC5
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times (Size (1198771) + Size (1198772) + Size (1198774)) )
= 2 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
)
times 90 = 420000
The Scientific World Journal 9
Table 4 Results for the objective functions on PP
Index [119894] Population PP 1198981= TCC 119898
2= TPC Front Crowding distance (CD) = 119868[119894]
[1] [2 1 5 1] 18020000 3746700 2
infin
[2] [3 3 1 1] 6720000 6006000 2
0672[3] [3 3 3 3] 0 4200000
1infin
[4] [2 4 1 5] 43440000 70793333 3
infin
[5] [3 1 1 3] 14000000 2112000 1
infin
[6] [3 5 4 5] 17020000 3847000 2
05195[7] [3 1 1 1] 960000 6500000
2infin
[8] [3 4 4 4] 1020000 9750000 3
infin
[9] [2 3 3 3] 1110000 9750000 3
infin
[10] [2 1 3 4] 45090000 10514000 4
infin
Table 5 Selection of query plans using binary tournament selection technique
IndexRandomly generated
indexes[119894] and [119895]
Tournament betweenquery plans
[119875(119894)] and [119875(119895)]
Front comparison Query plan selected (lowerfront or higher CD)
[1] [1] and [4] [2 1 5 1] and [2 4 1 5] [2] and [
3] [2 1 5 1]
[2] [2] and [3] [3 3 1 1] and [3 3 3 3] [2] and [
1] [3 3 3 3]
[3] [3] and [4] [3 3 3 3] and [2 4 1 5] [1] and [
3] [3 3 3 3]
[4] [2] and [6] [3 3 1 1] and [3 5 4 5] [2] and [
2] [3 3 1 1]
[5] [2] and [4] [3 3 1 1] and [2 4 1 5] [2] and [
3] [3 3 1 1]
[6] [1] and [3] [2 1 5 1] and [3 3 3 3] [2] and [
1] [3 3 3 3]
[7] [2] and [5] [3 3 1 1] and [3 1 1 3] [2] and [
1] [3 1 1 3]
[8] [2] and [8] [3 3 1 1] and [3 1 1 3] [2] and [
3] [3 3 1 1]
[9] [7] and [8] [3 1 1 1] and [3 1 1 3] [2] and [
3] [3 1 1 1]
[10] [2] and [9] [3 3 1 1] and [2 4 1 5] [2] and [
3] [3 3 1 1]
Table 6 Population CP after crossover and mutation
Index Population after crossover Population aftermutation (CP)
[1] [2 1 1 1] [2 1 1 1]
[2] [3 3 1 1] [3 3 1 1]
[3] [3 3 5 1] [3 3 4 1][4] [3 3 1 1] [3 3 1 1]
[5] [3 3 3 1] [3 3 3 1]
[6] [3 3 3 3] [3 3 3 3]
[7] [3 1 1 3] [3 1 1 3]
[8] [3 3 1 3] [3 3 1 3]
[9] [3 1 3 3] [3 3 3 3]
[10] [3 3 1 1] [3 3 1 1]
CC51
times 11988751
= CC51
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times (Size (1198771) + Size (1198772) + Size (1198774)) )
170 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
)
times 90 = 35700000
LPC1times 1198861
= LPC1times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times
Card (1198773)
Dist43
times Card (1198773)
times (
4
sum
119894=1
Size (119877119894)))
10 The Scientific World Journal
Table 7 Front assignment for query plans in IP
Index Intermediate population IP TCC TPC Front[1] [2 1 5 1] 18020000 3747000
2
[2] [3 3 1 1] 6720000 6006000 3
[3] [3 3 3 3] 0 4200000 1
[4] [2 4 1 5] 43440000 7079000 4
[5] [3 1 1 3] 14000000 2112000 1
[6] [3 5 4 5] 17020000 3847000 2
[7] [3 1 1 1] 960000 6500000 2
[8] [3 1 1 3] 1020000 9750000 3
[9] [2 3 3 3] 1110000 9750000 3
[10] [2 1 3 4] 45090000 10514000 5
[11] [2 1 1 1] 990000 6500000 2
[12] [3 3 1 1] 6720000 6006000 3
[13] [3 3 4 1] 90300000 8946000 5
[14] [3 3 1 1] 6720000 6006000 3
[15] [3 3 3 1] 2520000 4230000 2
[16] [3 3 3 3] 0 4200000 1
[17] [3 1 1 3] 14000000 2112000 1
[18] [3 3 1 3] 4500000 2640000 1
[19] [3 3 3 3] 0 4200000 1
[20] [3 3 1 1] 6720000 6006000 3
Table 8 Nondominated sorting for query plans in IP
Index Intermediatepopulation IP Front Crowding
distance[3] [3 3 3 3]
1
[5] [3 1 1 3] 1
[16] [3 3 3 3] 1
[17] [3 1 1 3] 1
[18] [3 3 1 3] 1
[19] [3 3 3 3] 1
[1] [2 1 5 1] 2 infin
[7] [3 1 1 1] 2 infin
[11] [2 1 1 1] 2 infin
[15] [3 3 3 1] 2 04933
[6] [3 5 4 5] 2 02292
[2] [3 3 1 1] 3
[9] [2 3 3 3] 3
[8] [3 1 1 3] 3
[12] [3 3 1 1] 3
[14] [3 3 1 1] 3
[20] [3 3 1 1] 3
[4] [2 4 1 5] 4
[10] [2 1 3 4] 5
[13] [3 3 4 1] 4
= 2 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
times
500
01 times 500
) times 140 = 65333333
(12)
Table 9 Population PP in the 2nd generation
Index Population PP[1] [3 3 3 3]
[2] [3 1 1 3]
[3] [3 3 3 3]
[4] [3 1 1 3]
[5] [3 3 1 3]
[6] [3 3 3 3]
[7] [2 1 5 1]
[8] [3 1 1 1]
[9] [2 1 1 1]
[10] [3 3 3 1]
So
TPC4= 126000 + 420000 + 65333333 = 70793333
TCC4= 1020000 + 6720000 + 35700000 = 43440000
(13)
Similarly TCC and TPC values of the other nine queryplans are computed Consider TCC and TPC of the 10 queryplans are given in Table 2 The population is then sorted intodifferent nondominated fronts as described in Step 3 of theproposed algorithm For example for query plan 1 that is[2 1 5 1] the set 119878
1= number of query plans that dominate
query plan 1 Since TCC [1] lt TCC [4] TPC [1] lt TPC [4]TCC [1] lt TCC [10] and TPC [1] lt TPC [10] the elementsof 1198781
= 4 10 Similarly the sets 1198782 119878
10are computed
and are given in Table 2 119899119894stores the count of query plans
that dominate 119894 So using the values in 119878119894 1198991
= 1 as onlyquery plan 5 is dominating 1 and 119899
2= 1 as only query plan 3
is dominating 2 Similarly 1198992 119899
10are computed and are
The Scientific World Journal 11
given in Table 2 From Table 2 it can be noted that queryplans 3 and 5 are not dominated as 119899
3= 0 119899
5= 0 So they
are assigned to the first nondominated front1The elementsin the next front are computed by reducing the count in 119899
119894
for each 119894 isin 1198783and 119894 isin 119878
5 So 119899
119894= 0 0 minus 4 minus 0 0 1 1 7
So the second front has query plans 1 2 6 and 7 Thisprocess continues till all the query plans in the population areassigned to their respective nondominated fronts The fronts1 2 3 and
4are formed and are given in Table 2
Finally the query plans are sorted separately on the valuesof TCC and TPC within each front as shown in Table 3
After the population is sorted into different fronts thecrowding distance (119868[119889119894119904119905119886119899119888119890]) computation is performedfor each query plan using the formula given in Step 4 ofthe proposed algorithm Query plans having the maximumand minimum values in each front are assigned infin distancevalues that is 119868[3] 119868[5] 119868[7] 119868[1] 119868[8] 119868[4] 119868[9] and119868[10] are assigned infin For the rest of the query plans theircrowding distance (CD) values are computed based on thesorted order of query plans in each front Initially they areassigned 119868[119889119894119904119905119886119899119888119890] = 0 For query plan 2 in front
2 the
CD computation is performed as follows
1198681198981[2] = 119868 [2] +
TCC [6] minus TCC [7]
max (TCC) minusmin (TCC)
= 0 +
17020000 minus 960000
45090000 minus 0
= 03562
119868 [2] = 1198681198981[2] +
TPC [7] minus TPC [6]
max (TPC) minusmin (TPC)
= 03562 +
6500000 minus 3847000
10514000 minus 2112000
= 06720
(14)
CD of the query plans in the given population is given inTable 4
Next binary tournament selection is performed on thepopulation on the basis of crowded comparison operator ≺
119899
This selection process is shown in Table 5The selected query plans undergo random single point
crossover with crossover probability 119875119888
= 05 Mutation isperformed on the selected population with mutation proba-bility 119875
119898= 002 The child population CP after crossover and
mutation is shown in Table 6Now in accordance with NSGA-II algorithm the popu-
lations from the second generation onward have to ensureelitism For this purpose the child population CP is com-bined with the parent population PP to generate intermediatepopulation IP This population is subjected to nondominatedsort and fronts are formed as given in Table 7
The population for the second generation is arrived atby selecting query plans based on front and within it basedon crowding distance-wise as described in Step 7 from theintermediate population IP till the actual population size 10is exceeded This selection is shown in Table 8
The population PP for the second generation is given inTable 9
The above process is repeated till a predefined number ofgenerations119866have elapsedThereafter Top-119870 query plans areproduced as output
0
01
02
03
04
05
06
07
08
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-5 query plans)
Figure 4
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
DQPGNSGA(top-10 query plans)
Figure 5
5 Experimental Results
The proposed NSGA-II based algorithm is implemented inMATLAB 77 in Windows 7 professional 64 bit OS with Intelcore i3CPUat 213GHzhaving 4GBRAMExperimentswerecarried out for a population of 100 query plans with eachquery plan involving 10 relations distributed over 50 sitesThese were performed on four datasets each comprising adifferent relation-site matrix Graphs were plotted to observechange in average TC (ATC) with respect to generations
12 The Scientific World Journal
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-15 query plans)
Figure 6
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-20 query plans)
Figure 7
and Top-119870 query plans for different pairs of crossover andmutation rates
First graphs showing the ATC values Top-5 Top-10Top-15 and Top-20 query plans averaged over four datasetsover 1000 generations for distinct pairs of crossover andmutation probability 119875
119888 119875119898 = 07 001 07 0005
075 001 075 0005 08 001 08 0005 085 001085 0005 using NSGA-II based DQPG algorithm(DQPGNSGA) were plotted These are shown in Figures 45 6 and 7 It can be observed from these graphs that the
0
005
01
015
02
025
03
035
04
045
05
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
(after 1000 generations)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGNSGA
Top-K query plans
Figure 8
0
01
02
03
04
05
06
07
08
09
1
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-5 query plans)
Figure 9
convergence to ATC is lowest in the case of 119875119888 119875119898 =
085 001 Furthermore the graph showing ATC valuesaveraged over four datasets versus Top-119870 query plans after1000 generations (Figure 8) also shows that the lowest ATCvalues achieved are for 119875
119888 119875119898 = 085 001 Thus it
can be said that DQPGNSGA performs reasonably well for119875119888 119875119898 = 085 001 In order to compare DQPGNSGA with
SGA based DQPG algorithm (DQPGSGA) similar graphswere plotted for DQPGSGA These are shown in Figures 9 1011 12 and 13 It is noted from these graphs that DQPGSGA
The Scientific World Journal 13
0
02
04
06
08
1
12
14
1618
21 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-10 query plans)
Figure 10
0
02
04
06
08
1
12
14
16
18
2
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-15 query plans)
Figure 11
also achieves convergence to the lowest ATC values for119875119888 119875119898 = 085 001
Since the two algorithms DQPGNSGA and DQPGSGAconverge to a lower ATC value for the same crossover andmutation probabilities that is 119875
119888 119875119898 = 085 001 the
comparisons of the two algorithms can be carried out forthese observed probabilities
First the two algorithmsDQPGNSGA andDQPGSGA werecompared for each of the four datasets (Dataset-1 Dataset-2 Dataset-3 and Dataset-4) on the ATC values of Top-5Top-10 Top-15 and Top-20 query plans generated over 1000
0
04
08
12
16
2
24
28
32
36
4
1 54 107
160
213
266
319
372
425
478
531
584
637
690
743
796
849
902
955
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-20 query plans)
Figure 12
0
01
02
03
04
05
06
07
08
09
1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
(after 1000 generations)
Aver
age t
otal
cost
(ATC
)
DQPGSGA
Top-K query plans
Figure 13
generations for 119875119888 119875119898 = 085 001 The corresponding
graphs for each dataset are shown in Figures 14 15 16 and 17It can be observed from the graphs thatDQPGNSGA convergesto lower ATC values for Top-5 Top-10 Top-15 and Top-20query plans generated for all datasets
Furthermore graphs comparing the ATC values of Top-119870 query plans produced by DQPGNSGA and DQPGSGA after1000 generations for the four datasets were plotted andare shown in Figures 18 19 20 and 21 These graphs alsoshow average TCC (ATCC) and average TPC (ATPC) valuesof Top-119870 query plans generated by DQPGNSGA It can beobserved from these graphs thatDQPGNSGA is able to achieve
14 The Scientific World Journal
0
05
1
15
2
25
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-1)
Figure 14
0
03
06
09
12
15
18
21
24
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-2)
Figure 15
an acceptable tradeoff between ATPC and ATCC which inturn leads to a comparatively lower ATC for the Top-119870 queryplans generated by it
Next a graph comparing the ATC values of Top-119870 queryplans generated by DQPGNSGA and DQPGSGA on all fourdatasets (DS-1 DS-2 DS-3 and DS-4) after 1000 generationsfor observed probabilities 119875
119888 119875119898 = 085 001were plotted
and is shown in Figure 22 It is noted from the graph thatDQPGNSGA performs better than DQPGSGA on the ATCvalues of Top-119870 query plans generated by the two algorithmsfor each of the four data sets
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
0
015
03
045
06
075
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-3)
Figure 16
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
0
01
02
03
04
05
06
07
08
09
1
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-4)
Figure 17
It can be reasonably inferred from all the above graphsthat DQPGNSGA is able to generate Top-119870 query plans withlower ATC when compared to those generated byDQPGSGAThis may be attributed to acceptable tradeoffs achieved whilesimultaneously optimizing TPC and TCC which results inlower TC in case of DQPGNSGA
6 Conclusion
In this paper DQPGproblemgiven in [3] has been addressedwhere query plans are generated for a distributed relational
The Scientific World Journal 15
0
005
01
015
02
025
03
035
04
045
1 3 5 7 9 11 13 15 17 19
Cos
t
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-1)
Figure 18
0
01
02
03
04
05
06
07
Cos
t
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
Top-K query plans
(Dataset-2)
Figure 19
query that incurs minimum total query processing costGenetic algorithms have been used to generate these queryplansThe total query processing cost TC in [3] can be viewedas comprising broadly of TPC and TCC and thereforeminimizing TPC and TCC would result in minimizing TCThus in this paper the single-objective DQPG problemin [3] has been formulated and solved as a biobjectiveDQPG problem with the two objectives being minimizingTPC and minimizing TCC These objectives are minimizedsimultaneously using the multiobjective genetic algorithmNSGA-II
Experiments were performed and DQPGNSGA is com-pared with DQPGSGA given in [3] It was observed thatboth the algorithms individually gave good results for thecrossover and mutation probabilities 085 and 001 respec-tively The two algorithms were then compared on the ATC
0
002
004
006
008
01
012
014
016
Cos
t
1 3 5 7 9 11 13 15 17 19
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-3)
Figure 20
0
005
01
015
02
025
Cos
t
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-4)
Figure 21
values of the Top-119870 query plans generated by them for theobserved crossover and mutation probabilities The resultsshowed that DQPGNSGA performed better than DQPGSGAAlso the performance of the former was better when the twoalgorithmswere compared on theATCvalues of Top-119870 queryplansThe better performance of DQPGNSGA over DQPGSGAmay be attributed to DQPGNSGA achieving acceptable trade-offs between TPC and TCC while minimizing TPC and TCCof Top-119870 query plans simultaneously
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
16 The Scientific World Journal
Top-K query plans
0
01
02
03
04
05
06
07
Aver
age t
otal
cost
(ATC
)
SGA(DS-1)SGA(DS-2)SGA(DS-3)SGA(DS-4)
NSGA(DS-1)NSGA(DS-2)NSGA(DS-3)NSGA(DS-4)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(after 1000 generations)
Figure 22
References
[1] S Ceri and G Pelgatti Distributed Databases Principles ampSystems International Edition McGraw-Hill Computer ScienceSeries 1985
[2] A R Hevner and S B Yao ldquoQuery processing in distributeddatabase systemsrdquo IEEE Transactions on Software Engineeringvol 5 no 3 pp 177ndash187 1979
[3] T V Vijay Kumar and S Panicker ldquoGenerating query plansfor distributed query processing using genetic algorithmrdquo inInformation Computing and Applications vol 7030 of LectureNotes in Computer Science pp 765ndash772 2011
[4] K Deb A Pratap S Agarwal and T Meyarivan ldquoA fastand elitist multiobjective genetic algorithm NSGA-IIrdquo IEEETransactions on Evolutionary Computation vol 6 no 2 pp 182ndash197 2002
[5] P Black and W Luk ldquoA new heuristic for generating semi-joinprograms for distributed query processingrdquo in Proceedings ofthe IEEE 6th International Computer Software and ApplicationConference pp 581ndash588 IEEE Chicago Ill USA November1982
[6] P Bodorik and J S Riordon ldquoA threshold mechanism fordistributed query processingrdquo in Proceedings of the 16th AnnualConference on Computer Science pp 616ndash625 Atlanta GaUnited States February 1988
[7] J Chang ldquoA heuristic approach to distributed query process-ingrdquo in Proceedings of the 8th International Conference on VeryLarge Data Bases (VLDB rsquo82) pp 54ndash61 Saratoga Calif USA1982
[8] D Kossmann ldquoThe state of the art in distributed queryprocessingrdquo ACM Computing Surveys vol 32 no 4 pp 422ndash469 2000
[9] L Stiphane and W Eugene ldquoA state transition model fordistributed query processingrdquo ACM Transactions on DatabaseSystems vol 11 no 3 pp 249ndash322 1986
[10] T V Vijay Kumar V Singh and A K Verma ldquoDistributedquery processing plans generation using genetic algorithmrdquo
International Journal of Computer Theory and Engineering vol3 no 1 pp 38ndash45 2011
[11] C T Yu and C C Chang ldquoDistributed query processingrdquoACMComputing Surveys vol 16 no 4 pp 399ndash433 1984
[12] K Bennett M C Ferris and Y E Ioannidis ldquoA Geneticalgorithm for database query optimizationrdquo in Proceedings ofthe 4th International Conference onGenetic Algorithms pp 400ndash407 1991
[13] Y E Ioannidis and Y Cha Kang ldquoRandomized algorithms foroptimizing large join queriesrdquo in Proceedings of the 1990 ACMSIGMOD International Conference on Management of Data pp312ndash321 May 1990
[14] Y Ioannidis and E Wong ldquoQuery optimization by simulatedannealingrdquo in Proceedings of the ACM SIGMOD InternationalConference on Management of Data (SIGMOD rsquo87) pp 9ndash221987
[15] A A R Townsend Genetic Algorithms A Tutorial 2003[16] M Gregory Genetic Algorithm Optimization of Distributed
Database Queries IEEE 1998[17] M Mitchell An Introduction to Genetic Algorithms The MIT
Press Cambridge Mass USA 1999[18] J D Schaffer ldquoMultiple objective optimization with vector
evaluated genetic algorithmsrdquo inProceedings of the InternationalConference on Genetic Algorithm andTheir Applications pp 93ndash100 July 1985
[19] V Guliashki H Toshev andC Korsemov ldquoSurvey of evolution-ary algorithms used in multiobjective optimizationrdquo Problemsof EngineeringCybernetics andRobotics vol 60 pp 42ndash54 2009
[20] C M Fonseca and P J Fleming ldquoGenetic algorithms formultiobjective optimization formulation discussion and gen-eralizationrdquo inProceedings of the 5th International Conference onGenetic Algorithms S Forrest Ed pp 416ndash423 San FranciscoCalif USA 1993
[21] J Horn N Nafpliotis and D E Goldberg ldquoA niched Paretogenetic algorithm for multiobjective optimizationrdquo in Proceed-ings of the 1st IEEE Conference on Evolutionary ComputationIEEE World Congress on Computational Intelligence pp 82ndash87IEEE Orlando Fla USA June 1994
[22] N Srinivas and K Deb ldquoMultiobjective optimization usingnondominated sorting in genetic algorithmsrdquo EvolutionaryComputation vol 2 no 3 pp 221ndash248 1994
[23] E Zitzler and L Thiele ldquoMultiobjective evolutionary algo-rithms a comparative case study and the strength Paretoapproachrdquo IEEETransactions onEvolutionaryComputation vol3 no 4 pp 257ndash271 1999
[24] J D Knowles and D W Corne ldquoApproximating the non-dominated front using the Pareto archived evolution strategyrdquoEvolutionary computation vol 8 no 2 pp 149ndash172 2000
[25] D W Corne J D Knowles and M J Oates ldquoThe Paretoenvelope-based selection algorithm for multiobjective opti-mizationrdquo in Proceedings of the 6th International Conference onParallel Problem Solving from Nature Springer Paris FranceSeptember 2000
[26] J Scharnow K Tinnefeld and I Wegener ldquoThe analysis of evo-lutionary algorithms on sorting and shortest paths problemsrdquoJournal of Mathematical Modelling and Algorithms vol 3 no 4pp 349ndash366 2004
[27] B Doerr E Happ and C Klein ldquoCrossover can provablybe useful in evolutionary computationrdquo in Proceedings of the10th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo08) pp 539ndash546 ACM Press July 2008
The Scientific World Journal 17
[28] C Horoba ldquoAnalysis of a simple evolutionary algorithm for themultiobjective shortest path problemrdquo in Proceedings of the 10thACM SIGEVO Workshop on Foundations of Genetic Algorithms(FOGA rsquo09) pp 113ndash120 ACM Press New York NY USAJanuary 2009
[29] MTheile ldquoExact solutions to the traveling salesperson problemby a population-based evolutionary algorithmrdquo in Proceedingsof the 9th European Conference on Evolutionary Computation inCombinatorial Optimisation (EvoCOP rsquo09) vol 5482 of LectureNotes in Computer Science pp 145ndash155 Springer April 2009
[30] A V Eremeev On Linking Dynamic Programming and Multi-Objective Evolutionary Algorithms Omsk StateUniversity 2008
[31] A V Eremeev ldquoA fully polynomial randomized approximationscheme based on an evolutionary algorithmrdquo Diskretnyi Analizi Issledovanie Operatsii vol 17 no 4 pp 3ndash17 2010
[32] T Friedrich N Hebbinghaus F Neumann J He and CWitt ldquoApproximating covering problems by randomized searchheuristics using multi-objective modelsrdquo in Proceedings of the9th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo07) pp 797ndash804 ACM Press July 2007
[33] E Elbeltagi T Hegazy and D Grierson ldquoA modified shuffledfrog-leaping optimization algorithm applications to projectmanagementrdquo Structure and Infrastructure Engineering vol 3no 1 pp 53ndash60 2007
[34] M Dorigo E Bonabeau and GTheraulaz ldquoAnt algorithms andstigmergyrdquo Future Generation Computer Systems vol 16 no 8pp 851ndash871 2000
[35] D E Goldberg and K Deb ldquoA comparative analysis of selectionschemes used in genetic algorithmsrdquo in Foundations of GeneticAlgorithms pp 69ndash93 Morgan Kaufmann San Mateo CalifUSA 1991
[36] D E GoldbergGenetic Algorithms in Search Optimization andMachine Learning Addison-Wesley Reading Mass USA 1989
[37] H Dong and Y Liang ldquoGenetic algorithms for large join queryoptimizationrdquo in Proceedings of the 9th Annual Genetic andEvolutionary Computation Conference (GECCO rsquo07) pp 1211ndash1218 London UK July 2007
[38] I F Sbalzariniy M Sibylle and P Koumoutsakosyz ldquoMulti-objective optimization using evolutionary algorithmsrdquo Centerfor Turbulence Research Proceedings of the Summer Program2000
[39] E Zitzler and L Thiele ldquoMultiobjective optimization usingevolutionary algorithmsmdasha comparative case studyrdquo in ParallelProblem Solving from Nature pp 292ndash301 Springer Amster-dam The Netherlands 1998
[40] C A C Coello ldquoAn updated survey of GA-basedmultiobjectiveoptimization techniquesrdquo ACM Computing Surveys vol 32 no2 pp 137ndash143 2000
[41] D A Van Veldhuizen and G B Lamont ldquoMultiobjective evolu-tionary algorithms analyzing the state-of-the-artrdquo Evolutionarycomputation vol 8 no 2 pp 125ndash147 2000
[42] F Glover and S Hanafi ldquoTabu search and finite convergencerdquoDiscrete Applied Mathematics vol 119 no 1-2 pp 3ndash36 2002
[43] A C Nearchou ldquoThe effect of various operators on the geneticsearch for large scheduling problemsrdquo International Journal ofProduction Economics vol 88 no 2 pp 191ndash203 2004
[44] W W Chu and P Hurley ldquoOptimal query processing fordistributed database systemsrdquo IEEE Transactions on Computersvol 31 no 9 pp 835ndash850 1982
[45] M Serpell and J E Smith ldquoSelf-adaptation ofmutation operatorand probability for permutation representations in genetic
algorithmsrdquo Evolutionary Computation vol 18 no 3 pp 491ndash514 2010
[46] A Seshadri A Fast Elitist Multiobjective Genetic AlgorithmNSGA-II MATLAB Central 2006
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World Journal 9
Table 4 Results for the objective functions on PP
Index [119894] Population PP 1198981= TCC 119898
2= TPC Front Crowding distance (CD) = 119868[119894]
[1] [2 1 5 1] 18020000 3746700 2
infin
[2] [3 3 1 1] 6720000 6006000 2
0672[3] [3 3 3 3] 0 4200000
1infin
[4] [2 4 1 5] 43440000 70793333 3
infin
[5] [3 1 1 3] 14000000 2112000 1
infin
[6] [3 5 4 5] 17020000 3847000 2
05195[7] [3 1 1 1] 960000 6500000
2infin
[8] [3 4 4 4] 1020000 9750000 3
infin
[9] [2 3 3 3] 1110000 9750000 3
infin
[10] [2 1 3 4] 45090000 10514000 4
infin
Table 5 Selection of query plans using binary tournament selection technique
IndexRandomly generated
indexes[119894] and [119895]
Tournament betweenquery plans
[119875(119894)] and [119875(119895)]
Front comparison Query plan selected (lowerfront or higher CD)
[1] [1] and [4] [2 1 5 1] and [2 4 1 5] [2] and [
3] [2 1 5 1]
[2] [2] and [3] [3 3 1 1] and [3 3 3 3] [2] and [
1] [3 3 3 3]
[3] [3] and [4] [3 3 3 3] and [2 4 1 5] [1] and [
3] [3 3 3 3]
[4] [2] and [6] [3 3 1 1] and [3 5 4 5] [2] and [
2] [3 3 1 1]
[5] [2] and [4] [3 3 1 1] and [2 4 1 5] [2] and [
3] [3 3 1 1]
[6] [1] and [3] [2 1 5 1] and [3 3 3 3] [2] and [
1] [3 3 3 3]
[7] [2] and [5] [3 3 1 1] and [3 1 1 3] [2] and [
1] [3 1 1 3]
[8] [2] and [8] [3 3 1 1] and [3 1 1 3] [2] and [
3] [3 3 1 1]
[9] [7] and [8] [3 1 1 1] and [3 1 1 3] [2] and [
3] [3 1 1 1]
[10] [2] and [9] [3 3 1 1] and [2 4 1 5] [2] and [
3] [3 3 1 1]
Table 6 Population CP after crossover and mutation
Index Population after crossover Population aftermutation (CP)
[1] [2 1 1 1] [2 1 1 1]
[2] [3 3 1 1] [3 3 1 1]
[3] [3 3 5 1] [3 3 4 1][4] [3 3 1 1] [3 3 1 1]
[5] [3 3 3 1] [3 3 3 1]
[6] [3 3 3 3] [3 3 3 3]
[7] [3 1 1 3] [3 1 1 3]
[8] [3 3 1 3] [3 3 1 3]
[9] [3 1 3 3] [3 3 3 3]
[10] [3 3 1 1] [3 3 1 1]
CC51
times 11988751
= CC51
times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times (Size (1198771) + Size (1198772) + Size (1198774)) )
170 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
)
times 90 = 35700000
LPC1times 1198861
= LPC1times (
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
times Card (1198774)
times (Dist24
times
Card (1198771) times Card (1198772)
Dist12
times Card (1198771)
)
minus1
times
Card (1198773)
Dist43
times Card (1198773)
times (
4
sum
119894=1
Size (119877119894)))
10 The Scientific World Journal
Table 7 Front assignment for query plans in IP
Index Intermediate population IP TCC TPC Front[1] [2 1 5 1] 18020000 3747000
2
[2] [3 3 1 1] 6720000 6006000 3
[3] [3 3 3 3] 0 4200000 1
[4] [2 4 1 5] 43440000 7079000 4
[5] [3 1 1 3] 14000000 2112000 1
[6] [3 5 4 5] 17020000 3847000 2
[7] [3 1 1 1] 960000 6500000 2
[8] [3 1 1 3] 1020000 9750000 3
[9] [2 3 3 3] 1110000 9750000 3
[10] [2 1 3 4] 45090000 10514000 5
[11] [2 1 1 1] 990000 6500000 2
[12] [3 3 1 1] 6720000 6006000 3
[13] [3 3 4 1] 90300000 8946000 5
[14] [3 3 1 1] 6720000 6006000 3
[15] [3 3 3 1] 2520000 4230000 2
[16] [3 3 3 3] 0 4200000 1
[17] [3 1 1 3] 14000000 2112000 1
[18] [3 3 1 3] 4500000 2640000 1
[19] [3 3 3 3] 0 4200000 1
[20] [3 3 1 1] 6720000 6006000 3
Table 8 Nondominated sorting for query plans in IP
Index Intermediatepopulation IP Front Crowding
distance[3] [3 3 3 3]
1
[5] [3 1 1 3] 1
[16] [3 3 3 3] 1
[17] [3 1 1 3] 1
[18] [3 3 1 3] 1
[19] [3 3 3 3] 1
[1] [2 1 5 1] 2 infin
[7] [3 1 1 1] 2 infin
[11] [2 1 1 1] 2 infin
[15] [3 3 3 1] 2 04933
[6] [3 5 4 5] 2 02292
[2] [3 3 1 1] 3
[9] [2 3 3 3] 3
[8] [3 1 1 3] 3
[12] [3 3 1 1] 3
[14] [3 3 1 1] 3
[20] [3 3 1 1] 3
[4] [2 4 1 5] 4
[10] [2 1 3 4] 5
[13] [3 3 4 1] 4
= 2 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
times
500
01 times 500
) times 140 = 65333333
(12)
Table 9 Population PP in the 2nd generation
Index Population PP[1] [3 3 3 3]
[2] [3 1 1 3]
[3] [3 3 3 3]
[4] [3 1 1 3]
[5] [3 3 1 3]
[6] [3 3 3 3]
[7] [2 1 5 1]
[8] [3 1 1 1]
[9] [2 1 1 1]
[10] [3 3 3 1]
So
TPC4= 126000 + 420000 + 65333333 = 70793333
TCC4= 1020000 + 6720000 + 35700000 = 43440000
(13)
Similarly TCC and TPC values of the other nine queryplans are computed Consider TCC and TPC of the 10 queryplans are given in Table 2 The population is then sorted intodifferent nondominated fronts as described in Step 3 of theproposed algorithm For example for query plan 1 that is[2 1 5 1] the set 119878
1= number of query plans that dominate
query plan 1 Since TCC [1] lt TCC [4] TPC [1] lt TPC [4]TCC [1] lt TCC [10] and TPC [1] lt TPC [10] the elementsof 1198781
= 4 10 Similarly the sets 1198782 119878
10are computed
and are given in Table 2 119899119894stores the count of query plans
that dominate 119894 So using the values in 119878119894 1198991
= 1 as onlyquery plan 5 is dominating 1 and 119899
2= 1 as only query plan 3
is dominating 2 Similarly 1198992 119899
10are computed and are
The Scientific World Journal 11
given in Table 2 From Table 2 it can be noted that queryplans 3 and 5 are not dominated as 119899
3= 0 119899
5= 0 So they
are assigned to the first nondominated front1The elementsin the next front are computed by reducing the count in 119899
119894
for each 119894 isin 1198783and 119894 isin 119878
5 So 119899
119894= 0 0 minus 4 minus 0 0 1 1 7
So the second front has query plans 1 2 6 and 7 Thisprocess continues till all the query plans in the population areassigned to their respective nondominated fronts The fronts1 2 3 and
4are formed and are given in Table 2
Finally the query plans are sorted separately on the valuesof TCC and TPC within each front as shown in Table 3
After the population is sorted into different fronts thecrowding distance (119868[119889119894119904119905119886119899119888119890]) computation is performedfor each query plan using the formula given in Step 4 ofthe proposed algorithm Query plans having the maximumand minimum values in each front are assigned infin distancevalues that is 119868[3] 119868[5] 119868[7] 119868[1] 119868[8] 119868[4] 119868[9] and119868[10] are assigned infin For the rest of the query plans theircrowding distance (CD) values are computed based on thesorted order of query plans in each front Initially they areassigned 119868[119889119894119904119905119886119899119888119890] = 0 For query plan 2 in front
2 the
CD computation is performed as follows
1198681198981[2] = 119868 [2] +
TCC [6] minus TCC [7]
max (TCC) minusmin (TCC)
= 0 +
17020000 minus 960000
45090000 minus 0
= 03562
119868 [2] = 1198681198981[2] +
TPC [7] minus TPC [6]
max (TPC) minusmin (TPC)
= 03562 +
6500000 minus 3847000
10514000 minus 2112000
= 06720
(14)
CD of the query plans in the given population is given inTable 4
Next binary tournament selection is performed on thepopulation on the basis of crowded comparison operator ≺
119899
This selection process is shown in Table 5The selected query plans undergo random single point
crossover with crossover probability 119875119888
= 05 Mutation isperformed on the selected population with mutation proba-bility 119875
119898= 002 The child population CP after crossover and
mutation is shown in Table 6Now in accordance with NSGA-II algorithm the popu-
lations from the second generation onward have to ensureelitism For this purpose the child population CP is com-bined with the parent population PP to generate intermediatepopulation IP This population is subjected to nondominatedsort and fronts are formed as given in Table 7
The population for the second generation is arrived atby selecting query plans based on front and within it basedon crowding distance-wise as described in Step 7 from theintermediate population IP till the actual population size 10is exceeded This selection is shown in Table 8
The population PP for the second generation is given inTable 9
The above process is repeated till a predefined number ofgenerations119866have elapsedThereafter Top-119870 query plans areproduced as output
0
01
02
03
04
05
06
07
08
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-5 query plans)
Figure 4
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
DQPGNSGA(top-10 query plans)
Figure 5
5 Experimental Results
The proposed NSGA-II based algorithm is implemented inMATLAB 77 in Windows 7 professional 64 bit OS with Intelcore i3CPUat 213GHzhaving 4GBRAMExperimentswerecarried out for a population of 100 query plans with eachquery plan involving 10 relations distributed over 50 sitesThese were performed on four datasets each comprising adifferent relation-site matrix Graphs were plotted to observechange in average TC (ATC) with respect to generations
12 The Scientific World Journal
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-15 query plans)
Figure 6
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-20 query plans)
Figure 7
and Top-119870 query plans for different pairs of crossover andmutation rates
First graphs showing the ATC values Top-5 Top-10Top-15 and Top-20 query plans averaged over four datasetsover 1000 generations for distinct pairs of crossover andmutation probability 119875
119888 119875119898 = 07 001 07 0005
075 001 075 0005 08 001 08 0005 085 001085 0005 using NSGA-II based DQPG algorithm(DQPGNSGA) were plotted These are shown in Figures 45 6 and 7 It can be observed from these graphs that the
0
005
01
015
02
025
03
035
04
045
05
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
(after 1000 generations)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGNSGA
Top-K query plans
Figure 8
0
01
02
03
04
05
06
07
08
09
1
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-5 query plans)
Figure 9
convergence to ATC is lowest in the case of 119875119888 119875119898 =
085 001 Furthermore the graph showing ATC valuesaveraged over four datasets versus Top-119870 query plans after1000 generations (Figure 8) also shows that the lowest ATCvalues achieved are for 119875
119888 119875119898 = 085 001 Thus it
can be said that DQPGNSGA performs reasonably well for119875119888 119875119898 = 085 001 In order to compare DQPGNSGA with
SGA based DQPG algorithm (DQPGSGA) similar graphswere plotted for DQPGSGA These are shown in Figures 9 1011 12 and 13 It is noted from these graphs that DQPGSGA
The Scientific World Journal 13
0
02
04
06
08
1
12
14
1618
21 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-10 query plans)
Figure 10
0
02
04
06
08
1
12
14
16
18
2
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-15 query plans)
Figure 11
also achieves convergence to the lowest ATC values for119875119888 119875119898 = 085 001
Since the two algorithms DQPGNSGA and DQPGSGAconverge to a lower ATC value for the same crossover andmutation probabilities that is 119875
119888 119875119898 = 085 001 the
comparisons of the two algorithms can be carried out forthese observed probabilities
First the two algorithmsDQPGNSGA andDQPGSGA werecompared for each of the four datasets (Dataset-1 Dataset-2 Dataset-3 and Dataset-4) on the ATC values of Top-5Top-10 Top-15 and Top-20 query plans generated over 1000
0
04
08
12
16
2
24
28
32
36
4
1 54 107
160
213
266
319
372
425
478
531
584
637
690
743
796
849
902
955
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-20 query plans)
Figure 12
0
01
02
03
04
05
06
07
08
09
1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
(after 1000 generations)
Aver
age t
otal
cost
(ATC
)
DQPGSGA
Top-K query plans
Figure 13
generations for 119875119888 119875119898 = 085 001 The corresponding
graphs for each dataset are shown in Figures 14 15 16 and 17It can be observed from the graphs thatDQPGNSGA convergesto lower ATC values for Top-5 Top-10 Top-15 and Top-20query plans generated for all datasets
Furthermore graphs comparing the ATC values of Top-119870 query plans produced by DQPGNSGA and DQPGSGA after1000 generations for the four datasets were plotted andare shown in Figures 18 19 20 and 21 These graphs alsoshow average TCC (ATCC) and average TPC (ATPC) valuesof Top-119870 query plans generated by DQPGNSGA It can beobserved from these graphs thatDQPGNSGA is able to achieve
14 The Scientific World Journal
0
05
1
15
2
25
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-1)
Figure 14
0
03
06
09
12
15
18
21
24
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-2)
Figure 15
an acceptable tradeoff between ATPC and ATCC which inturn leads to a comparatively lower ATC for the Top-119870 queryplans generated by it
Next a graph comparing the ATC values of Top-119870 queryplans generated by DQPGNSGA and DQPGSGA on all fourdatasets (DS-1 DS-2 DS-3 and DS-4) after 1000 generationsfor observed probabilities 119875
119888 119875119898 = 085 001were plotted
and is shown in Figure 22 It is noted from the graph thatDQPGNSGA performs better than DQPGSGA on the ATCvalues of Top-119870 query plans generated by the two algorithmsfor each of the four data sets
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
0
015
03
045
06
075
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-3)
Figure 16
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
0
01
02
03
04
05
06
07
08
09
1
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-4)
Figure 17
It can be reasonably inferred from all the above graphsthat DQPGNSGA is able to generate Top-119870 query plans withlower ATC when compared to those generated byDQPGSGAThis may be attributed to acceptable tradeoffs achieved whilesimultaneously optimizing TPC and TCC which results inlower TC in case of DQPGNSGA
6 Conclusion
In this paper DQPGproblemgiven in [3] has been addressedwhere query plans are generated for a distributed relational
The Scientific World Journal 15
0
005
01
015
02
025
03
035
04
045
1 3 5 7 9 11 13 15 17 19
Cos
t
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-1)
Figure 18
0
01
02
03
04
05
06
07
Cos
t
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
Top-K query plans
(Dataset-2)
Figure 19
query that incurs minimum total query processing costGenetic algorithms have been used to generate these queryplansThe total query processing cost TC in [3] can be viewedas comprising broadly of TPC and TCC and thereforeminimizing TPC and TCC would result in minimizing TCThus in this paper the single-objective DQPG problemin [3] has been formulated and solved as a biobjectiveDQPG problem with the two objectives being minimizingTPC and minimizing TCC These objectives are minimizedsimultaneously using the multiobjective genetic algorithmNSGA-II
Experiments were performed and DQPGNSGA is com-pared with DQPGSGA given in [3] It was observed thatboth the algorithms individually gave good results for thecrossover and mutation probabilities 085 and 001 respec-tively The two algorithms were then compared on the ATC
0
002
004
006
008
01
012
014
016
Cos
t
1 3 5 7 9 11 13 15 17 19
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-3)
Figure 20
0
005
01
015
02
025
Cos
t
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-4)
Figure 21
values of the Top-119870 query plans generated by them for theobserved crossover and mutation probabilities The resultsshowed that DQPGNSGA performed better than DQPGSGAAlso the performance of the former was better when the twoalgorithmswere compared on theATCvalues of Top-119870 queryplansThe better performance of DQPGNSGA over DQPGSGAmay be attributed to DQPGNSGA achieving acceptable trade-offs between TPC and TCC while minimizing TPC and TCCof Top-119870 query plans simultaneously
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
16 The Scientific World Journal
Top-K query plans
0
01
02
03
04
05
06
07
Aver
age t
otal
cost
(ATC
)
SGA(DS-1)SGA(DS-2)SGA(DS-3)SGA(DS-4)
NSGA(DS-1)NSGA(DS-2)NSGA(DS-3)NSGA(DS-4)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(after 1000 generations)
Figure 22
References
[1] S Ceri and G Pelgatti Distributed Databases Principles ampSystems International Edition McGraw-Hill Computer ScienceSeries 1985
[2] A R Hevner and S B Yao ldquoQuery processing in distributeddatabase systemsrdquo IEEE Transactions on Software Engineeringvol 5 no 3 pp 177ndash187 1979
[3] T V Vijay Kumar and S Panicker ldquoGenerating query plansfor distributed query processing using genetic algorithmrdquo inInformation Computing and Applications vol 7030 of LectureNotes in Computer Science pp 765ndash772 2011
[4] K Deb A Pratap S Agarwal and T Meyarivan ldquoA fastand elitist multiobjective genetic algorithm NSGA-IIrdquo IEEETransactions on Evolutionary Computation vol 6 no 2 pp 182ndash197 2002
[5] P Black and W Luk ldquoA new heuristic for generating semi-joinprograms for distributed query processingrdquo in Proceedings ofthe IEEE 6th International Computer Software and ApplicationConference pp 581ndash588 IEEE Chicago Ill USA November1982
[6] P Bodorik and J S Riordon ldquoA threshold mechanism fordistributed query processingrdquo in Proceedings of the 16th AnnualConference on Computer Science pp 616ndash625 Atlanta GaUnited States February 1988
[7] J Chang ldquoA heuristic approach to distributed query process-ingrdquo in Proceedings of the 8th International Conference on VeryLarge Data Bases (VLDB rsquo82) pp 54ndash61 Saratoga Calif USA1982
[8] D Kossmann ldquoThe state of the art in distributed queryprocessingrdquo ACM Computing Surveys vol 32 no 4 pp 422ndash469 2000
[9] L Stiphane and W Eugene ldquoA state transition model fordistributed query processingrdquo ACM Transactions on DatabaseSystems vol 11 no 3 pp 249ndash322 1986
[10] T V Vijay Kumar V Singh and A K Verma ldquoDistributedquery processing plans generation using genetic algorithmrdquo
International Journal of Computer Theory and Engineering vol3 no 1 pp 38ndash45 2011
[11] C T Yu and C C Chang ldquoDistributed query processingrdquoACMComputing Surveys vol 16 no 4 pp 399ndash433 1984
[12] K Bennett M C Ferris and Y E Ioannidis ldquoA Geneticalgorithm for database query optimizationrdquo in Proceedings ofthe 4th International Conference onGenetic Algorithms pp 400ndash407 1991
[13] Y E Ioannidis and Y Cha Kang ldquoRandomized algorithms foroptimizing large join queriesrdquo in Proceedings of the 1990 ACMSIGMOD International Conference on Management of Data pp312ndash321 May 1990
[14] Y Ioannidis and E Wong ldquoQuery optimization by simulatedannealingrdquo in Proceedings of the ACM SIGMOD InternationalConference on Management of Data (SIGMOD rsquo87) pp 9ndash221987
[15] A A R Townsend Genetic Algorithms A Tutorial 2003[16] M Gregory Genetic Algorithm Optimization of Distributed
Database Queries IEEE 1998[17] M Mitchell An Introduction to Genetic Algorithms The MIT
Press Cambridge Mass USA 1999[18] J D Schaffer ldquoMultiple objective optimization with vector
evaluated genetic algorithmsrdquo inProceedings of the InternationalConference on Genetic Algorithm andTheir Applications pp 93ndash100 July 1985
[19] V Guliashki H Toshev andC Korsemov ldquoSurvey of evolution-ary algorithms used in multiobjective optimizationrdquo Problemsof EngineeringCybernetics andRobotics vol 60 pp 42ndash54 2009
[20] C M Fonseca and P J Fleming ldquoGenetic algorithms formultiobjective optimization formulation discussion and gen-eralizationrdquo inProceedings of the 5th International Conference onGenetic Algorithms S Forrest Ed pp 416ndash423 San FranciscoCalif USA 1993
[21] J Horn N Nafpliotis and D E Goldberg ldquoA niched Paretogenetic algorithm for multiobjective optimizationrdquo in Proceed-ings of the 1st IEEE Conference on Evolutionary ComputationIEEE World Congress on Computational Intelligence pp 82ndash87IEEE Orlando Fla USA June 1994
[22] N Srinivas and K Deb ldquoMultiobjective optimization usingnondominated sorting in genetic algorithmsrdquo EvolutionaryComputation vol 2 no 3 pp 221ndash248 1994
[23] E Zitzler and L Thiele ldquoMultiobjective evolutionary algo-rithms a comparative case study and the strength Paretoapproachrdquo IEEETransactions onEvolutionaryComputation vol3 no 4 pp 257ndash271 1999
[24] J D Knowles and D W Corne ldquoApproximating the non-dominated front using the Pareto archived evolution strategyrdquoEvolutionary computation vol 8 no 2 pp 149ndash172 2000
[25] D W Corne J D Knowles and M J Oates ldquoThe Paretoenvelope-based selection algorithm for multiobjective opti-mizationrdquo in Proceedings of the 6th International Conference onParallel Problem Solving from Nature Springer Paris FranceSeptember 2000
[26] J Scharnow K Tinnefeld and I Wegener ldquoThe analysis of evo-lutionary algorithms on sorting and shortest paths problemsrdquoJournal of Mathematical Modelling and Algorithms vol 3 no 4pp 349ndash366 2004
[27] B Doerr E Happ and C Klein ldquoCrossover can provablybe useful in evolutionary computationrdquo in Proceedings of the10th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo08) pp 539ndash546 ACM Press July 2008
The Scientific World Journal 17
[28] C Horoba ldquoAnalysis of a simple evolutionary algorithm for themultiobjective shortest path problemrdquo in Proceedings of the 10thACM SIGEVO Workshop on Foundations of Genetic Algorithms(FOGA rsquo09) pp 113ndash120 ACM Press New York NY USAJanuary 2009
[29] MTheile ldquoExact solutions to the traveling salesperson problemby a population-based evolutionary algorithmrdquo in Proceedingsof the 9th European Conference on Evolutionary Computation inCombinatorial Optimisation (EvoCOP rsquo09) vol 5482 of LectureNotes in Computer Science pp 145ndash155 Springer April 2009
[30] A V Eremeev On Linking Dynamic Programming and Multi-Objective Evolutionary Algorithms Omsk StateUniversity 2008
[31] A V Eremeev ldquoA fully polynomial randomized approximationscheme based on an evolutionary algorithmrdquo Diskretnyi Analizi Issledovanie Operatsii vol 17 no 4 pp 3ndash17 2010
[32] T Friedrich N Hebbinghaus F Neumann J He and CWitt ldquoApproximating covering problems by randomized searchheuristics using multi-objective modelsrdquo in Proceedings of the9th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo07) pp 797ndash804 ACM Press July 2007
[33] E Elbeltagi T Hegazy and D Grierson ldquoA modified shuffledfrog-leaping optimization algorithm applications to projectmanagementrdquo Structure and Infrastructure Engineering vol 3no 1 pp 53ndash60 2007
[34] M Dorigo E Bonabeau and GTheraulaz ldquoAnt algorithms andstigmergyrdquo Future Generation Computer Systems vol 16 no 8pp 851ndash871 2000
[35] D E Goldberg and K Deb ldquoA comparative analysis of selectionschemes used in genetic algorithmsrdquo in Foundations of GeneticAlgorithms pp 69ndash93 Morgan Kaufmann San Mateo CalifUSA 1991
[36] D E GoldbergGenetic Algorithms in Search Optimization andMachine Learning Addison-Wesley Reading Mass USA 1989
[37] H Dong and Y Liang ldquoGenetic algorithms for large join queryoptimizationrdquo in Proceedings of the 9th Annual Genetic andEvolutionary Computation Conference (GECCO rsquo07) pp 1211ndash1218 London UK July 2007
[38] I F Sbalzariniy M Sibylle and P Koumoutsakosyz ldquoMulti-objective optimization using evolutionary algorithmsrdquo Centerfor Turbulence Research Proceedings of the Summer Program2000
[39] E Zitzler and L Thiele ldquoMultiobjective optimization usingevolutionary algorithmsmdasha comparative case studyrdquo in ParallelProblem Solving from Nature pp 292ndash301 Springer Amster-dam The Netherlands 1998
[40] C A C Coello ldquoAn updated survey of GA-basedmultiobjectiveoptimization techniquesrdquo ACM Computing Surveys vol 32 no2 pp 137ndash143 2000
[41] D A Van Veldhuizen and G B Lamont ldquoMultiobjective evolu-tionary algorithms analyzing the state-of-the-artrdquo Evolutionarycomputation vol 8 no 2 pp 125ndash147 2000
[42] F Glover and S Hanafi ldquoTabu search and finite convergencerdquoDiscrete Applied Mathematics vol 119 no 1-2 pp 3ndash36 2002
[43] A C Nearchou ldquoThe effect of various operators on the geneticsearch for large scheduling problemsrdquo International Journal ofProduction Economics vol 88 no 2 pp 191ndash203 2004
[44] W W Chu and P Hurley ldquoOptimal query processing fordistributed database systemsrdquo IEEE Transactions on Computersvol 31 no 9 pp 835ndash850 1982
[45] M Serpell and J E Smith ldquoSelf-adaptation ofmutation operatorand probability for permutation representations in genetic
algorithmsrdquo Evolutionary Computation vol 18 no 3 pp 491ndash514 2010
[46] A Seshadri A Fast Elitist Multiobjective Genetic AlgorithmNSGA-II MATLAB Central 2006
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
10 The Scientific World Journal
Table 7 Front assignment for query plans in IP
Index Intermediate population IP TCC TPC Front[1] [2 1 5 1] 18020000 3747000
2
[2] [3 3 1 1] 6720000 6006000 3
[3] [3 3 3 3] 0 4200000 1
[4] [2 4 1 5] 43440000 7079000 4
[5] [3 1 1 3] 14000000 2112000 1
[6] [3 5 4 5] 17020000 3847000 2
[7] [3 1 1 1] 960000 6500000 2
[8] [3 1 1 3] 1020000 9750000 3
[9] [2 3 3 3] 1110000 9750000 3
[10] [2 1 3 4] 45090000 10514000 5
[11] [2 1 1 1] 990000 6500000 2
[12] [3 3 1 1] 6720000 6006000 3
[13] [3 3 4 1] 90300000 8946000 5
[14] [3 3 1 1] 6720000 6006000 3
[15] [3 3 3 1] 2520000 4230000 2
[16] [3 3 3 3] 0 4200000 1
[17] [3 1 1 3] 14000000 2112000 1
[18] [3 3 1 3] 4500000 2640000 1
[19] [3 3 3 3] 0 4200000 1
[20] [3 3 1 1] 6720000 6006000 3
Table 8 Nondominated sorting for query plans in IP
Index Intermediatepopulation IP Front Crowding
distance[3] [3 3 3 3]
1
[5] [3 1 1 3] 1
[16] [3 3 3 3] 1
[17] [3 1 1 3] 1
[18] [3 3 1 3] 1
[19] [3 3 3 3] 1
[1] [2 1 5 1] 2 infin
[7] [3 1 1 1] 2 infin
[11] [2 1 1 1] 2 infin
[15] [3 3 3 1] 2 04933
[6] [3 5 4 5] 2 02292
[2] [3 3 1 1] 3
[9] [2 3 3 3] 3
[8] [3 1 1 3] 3
[12] [3 3 1 1] 3
[14] [3 3 1 1] 3
[20] [3 3 1 1] 3
[4] [2 4 1 5] 4
[10] [2 1 3 4] 5
[13] [3 3 4 1] 4
= 2 times (
200 times 300
05 times 200
times
700
03 times (200 times 300) (05 times 200)
times
500
01 times 500
) times 140 = 65333333
(12)
Table 9 Population PP in the 2nd generation
Index Population PP[1] [3 3 3 3]
[2] [3 1 1 3]
[3] [3 3 3 3]
[4] [3 1 1 3]
[5] [3 3 1 3]
[6] [3 3 3 3]
[7] [2 1 5 1]
[8] [3 1 1 1]
[9] [2 1 1 1]
[10] [3 3 3 1]
So
TPC4= 126000 + 420000 + 65333333 = 70793333
TCC4= 1020000 + 6720000 + 35700000 = 43440000
(13)
Similarly TCC and TPC values of the other nine queryplans are computed Consider TCC and TPC of the 10 queryplans are given in Table 2 The population is then sorted intodifferent nondominated fronts as described in Step 3 of theproposed algorithm For example for query plan 1 that is[2 1 5 1] the set 119878
1= number of query plans that dominate
query plan 1 Since TCC [1] lt TCC [4] TPC [1] lt TPC [4]TCC [1] lt TCC [10] and TPC [1] lt TPC [10] the elementsof 1198781
= 4 10 Similarly the sets 1198782 119878
10are computed
and are given in Table 2 119899119894stores the count of query plans
that dominate 119894 So using the values in 119878119894 1198991
= 1 as onlyquery plan 5 is dominating 1 and 119899
2= 1 as only query plan 3
is dominating 2 Similarly 1198992 119899
10are computed and are
The Scientific World Journal 11
given in Table 2 From Table 2 it can be noted that queryplans 3 and 5 are not dominated as 119899
3= 0 119899
5= 0 So they
are assigned to the first nondominated front1The elementsin the next front are computed by reducing the count in 119899
119894
for each 119894 isin 1198783and 119894 isin 119878
5 So 119899
119894= 0 0 minus 4 minus 0 0 1 1 7
So the second front has query plans 1 2 6 and 7 Thisprocess continues till all the query plans in the population areassigned to their respective nondominated fronts The fronts1 2 3 and
4are formed and are given in Table 2
Finally the query plans are sorted separately on the valuesof TCC and TPC within each front as shown in Table 3
After the population is sorted into different fronts thecrowding distance (119868[119889119894119904119905119886119899119888119890]) computation is performedfor each query plan using the formula given in Step 4 ofthe proposed algorithm Query plans having the maximumand minimum values in each front are assigned infin distancevalues that is 119868[3] 119868[5] 119868[7] 119868[1] 119868[8] 119868[4] 119868[9] and119868[10] are assigned infin For the rest of the query plans theircrowding distance (CD) values are computed based on thesorted order of query plans in each front Initially they areassigned 119868[119889119894119904119905119886119899119888119890] = 0 For query plan 2 in front
2 the
CD computation is performed as follows
1198681198981[2] = 119868 [2] +
TCC [6] minus TCC [7]
max (TCC) minusmin (TCC)
= 0 +
17020000 minus 960000
45090000 minus 0
= 03562
119868 [2] = 1198681198981[2] +
TPC [7] minus TPC [6]
max (TPC) minusmin (TPC)
= 03562 +
6500000 minus 3847000
10514000 minus 2112000
= 06720
(14)
CD of the query plans in the given population is given inTable 4
Next binary tournament selection is performed on thepopulation on the basis of crowded comparison operator ≺
119899
This selection process is shown in Table 5The selected query plans undergo random single point
crossover with crossover probability 119875119888
= 05 Mutation isperformed on the selected population with mutation proba-bility 119875
119898= 002 The child population CP after crossover and
mutation is shown in Table 6Now in accordance with NSGA-II algorithm the popu-
lations from the second generation onward have to ensureelitism For this purpose the child population CP is com-bined with the parent population PP to generate intermediatepopulation IP This population is subjected to nondominatedsort and fronts are formed as given in Table 7
The population for the second generation is arrived atby selecting query plans based on front and within it basedon crowding distance-wise as described in Step 7 from theintermediate population IP till the actual population size 10is exceeded This selection is shown in Table 8
The population PP for the second generation is given inTable 9
The above process is repeated till a predefined number ofgenerations119866have elapsedThereafter Top-119870 query plans areproduced as output
0
01
02
03
04
05
06
07
08
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-5 query plans)
Figure 4
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
DQPGNSGA(top-10 query plans)
Figure 5
5 Experimental Results
The proposed NSGA-II based algorithm is implemented inMATLAB 77 in Windows 7 professional 64 bit OS with Intelcore i3CPUat 213GHzhaving 4GBRAMExperimentswerecarried out for a population of 100 query plans with eachquery plan involving 10 relations distributed over 50 sitesThese were performed on four datasets each comprising adifferent relation-site matrix Graphs were plotted to observechange in average TC (ATC) with respect to generations
12 The Scientific World Journal
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-15 query plans)
Figure 6
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-20 query plans)
Figure 7
and Top-119870 query plans for different pairs of crossover andmutation rates
First graphs showing the ATC values Top-5 Top-10Top-15 and Top-20 query plans averaged over four datasetsover 1000 generations for distinct pairs of crossover andmutation probability 119875
119888 119875119898 = 07 001 07 0005
075 001 075 0005 08 001 08 0005 085 001085 0005 using NSGA-II based DQPG algorithm(DQPGNSGA) were plotted These are shown in Figures 45 6 and 7 It can be observed from these graphs that the
0
005
01
015
02
025
03
035
04
045
05
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
(after 1000 generations)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGNSGA
Top-K query plans
Figure 8
0
01
02
03
04
05
06
07
08
09
1
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-5 query plans)
Figure 9
convergence to ATC is lowest in the case of 119875119888 119875119898 =
085 001 Furthermore the graph showing ATC valuesaveraged over four datasets versus Top-119870 query plans after1000 generations (Figure 8) also shows that the lowest ATCvalues achieved are for 119875
119888 119875119898 = 085 001 Thus it
can be said that DQPGNSGA performs reasonably well for119875119888 119875119898 = 085 001 In order to compare DQPGNSGA with
SGA based DQPG algorithm (DQPGSGA) similar graphswere plotted for DQPGSGA These are shown in Figures 9 1011 12 and 13 It is noted from these graphs that DQPGSGA
The Scientific World Journal 13
0
02
04
06
08
1
12
14
1618
21 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-10 query plans)
Figure 10
0
02
04
06
08
1
12
14
16
18
2
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-15 query plans)
Figure 11
also achieves convergence to the lowest ATC values for119875119888 119875119898 = 085 001
Since the two algorithms DQPGNSGA and DQPGSGAconverge to a lower ATC value for the same crossover andmutation probabilities that is 119875
119888 119875119898 = 085 001 the
comparisons of the two algorithms can be carried out forthese observed probabilities
First the two algorithmsDQPGNSGA andDQPGSGA werecompared for each of the four datasets (Dataset-1 Dataset-2 Dataset-3 and Dataset-4) on the ATC values of Top-5Top-10 Top-15 and Top-20 query plans generated over 1000
0
04
08
12
16
2
24
28
32
36
4
1 54 107
160
213
266
319
372
425
478
531
584
637
690
743
796
849
902
955
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-20 query plans)
Figure 12
0
01
02
03
04
05
06
07
08
09
1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
(after 1000 generations)
Aver
age t
otal
cost
(ATC
)
DQPGSGA
Top-K query plans
Figure 13
generations for 119875119888 119875119898 = 085 001 The corresponding
graphs for each dataset are shown in Figures 14 15 16 and 17It can be observed from the graphs thatDQPGNSGA convergesto lower ATC values for Top-5 Top-10 Top-15 and Top-20query plans generated for all datasets
Furthermore graphs comparing the ATC values of Top-119870 query plans produced by DQPGNSGA and DQPGSGA after1000 generations for the four datasets were plotted andare shown in Figures 18 19 20 and 21 These graphs alsoshow average TCC (ATCC) and average TPC (ATPC) valuesof Top-119870 query plans generated by DQPGNSGA It can beobserved from these graphs thatDQPGNSGA is able to achieve
14 The Scientific World Journal
0
05
1
15
2
25
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-1)
Figure 14
0
03
06
09
12
15
18
21
24
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-2)
Figure 15
an acceptable tradeoff between ATPC and ATCC which inturn leads to a comparatively lower ATC for the Top-119870 queryplans generated by it
Next a graph comparing the ATC values of Top-119870 queryplans generated by DQPGNSGA and DQPGSGA on all fourdatasets (DS-1 DS-2 DS-3 and DS-4) after 1000 generationsfor observed probabilities 119875
119888 119875119898 = 085 001were plotted
and is shown in Figure 22 It is noted from the graph thatDQPGNSGA performs better than DQPGSGA on the ATCvalues of Top-119870 query plans generated by the two algorithmsfor each of the four data sets
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
0
015
03
045
06
075
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-3)
Figure 16
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
0
01
02
03
04
05
06
07
08
09
1
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-4)
Figure 17
It can be reasonably inferred from all the above graphsthat DQPGNSGA is able to generate Top-119870 query plans withlower ATC when compared to those generated byDQPGSGAThis may be attributed to acceptable tradeoffs achieved whilesimultaneously optimizing TPC and TCC which results inlower TC in case of DQPGNSGA
6 Conclusion
In this paper DQPGproblemgiven in [3] has been addressedwhere query plans are generated for a distributed relational
The Scientific World Journal 15
0
005
01
015
02
025
03
035
04
045
1 3 5 7 9 11 13 15 17 19
Cos
t
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-1)
Figure 18
0
01
02
03
04
05
06
07
Cos
t
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
Top-K query plans
(Dataset-2)
Figure 19
query that incurs minimum total query processing costGenetic algorithms have been used to generate these queryplansThe total query processing cost TC in [3] can be viewedas comprising broadly of TPC and TCC and thereforeminimizing TPC and TCC would result in minimizing TCThus in this paper the single-objective DQPG problemin [3] has been formulated and solved as a biobjectiveDQPG problem with the two objectives being minimizingTPC and minimizing TCC These objectives are minimizedsimultaneously using the multiobjective genetic algorithmNSGA-II
Experiments were performed and DQPGNSGA is com-pared with DQPGSGA given in [3] It was observed thatboth the algorithms individually gave good results for thecrossover and mutation probabilities 085 and 001 respec-tively The two algorithms were then compared on the ATC
0
002
004
006
008
01
012
014
016
Cos
t
1 3 5 7 9 11 13 15 17 19
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-3)
Figure 20
0
005
01
015
02
025
Cos
t
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-4)
Figure 21
values of the Top-119870 query plans generated by them for theobserved crossover and mutation probabilities The resultsshowed that DQPGNSGA performed better than DQPGSGAAlso the performance of the former was better when the twoalgorithmswere compared on theATCvalues of Top-119870 queryplansThe better performance of DQPGNSGA over DQPGSGAmay be attributed to DQPGNSGA achieving acceptable trade-offs between TPC and TCC while minimizing TPC and TCCof Top-119870 query plans simultaneously
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
16 The Scientific World Journal
Top-K query plans
0
01
02
03
04
05
06
07
Aver
age t
otal
cost
(ATC
)
SGA(DS-1)SGA(DS-2)SGA(DS-3)SGA(DS-4)
NSGA(DS-1)NSGA(DS-2)NSGA(DS-3)NSGA(DS-4)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(after 1000 generations)
Figure 22
References
[1] S Ceri and G Pelgatti Distributed Databases Principles ampSystems International Edition McGraw-Hill Computer ScienceSeries 1985
[2] A R Hevner and S B Yao ldquoQuery processing in distributeddatabase systemsrdquo IEEE Transactions on Software Engineeringvol 5 no 3 pp 177ndash187 1979
[3] T V Vijay Kumar and S Panicker ldquoGenerating query plansfor distributed query processing using genetic algorithmrdquo inInformation Computing and Applications vol 7030 of LectureNotes in Computer Science pp 765ndash772 2011
[4] K Deb A Pratap S Agarwal and T Meyarivan ldquoA fastand elitist multiobjective genetic algorithm NSGA-IIrdquo IEEETransactions on Evolutionary Computation vol 6 no 2 pp 182ndash197 2002
[5] P Black and W Luk ldquoA new heuristic for generating semi-joinprograms for distributed query processingrdquo in Proceedings ofthe IEEE 6th International Computer Software and ApplicationConference pp 581ndash588 IEEE Chicago Ill USA November1982
[6] P Bodorik and J S Riordon ldquoA threshold mechanism fordistributed query processingrdquo in Proceedings of the 16th AnnualConference on Computer Science pp 616ndash625 Atlanta GaUnited States February 1988
[7] J Chang ldquoA heuristic approach to distributed query process-ingrdquo in Proceedings of the 8th International Conference on VeryLarge Data Bases (VLDB rsquo82) pp 54ndash61 Saratoga Calif USA1982
[8] D Kossmann ldquoThe state of the art in distributed queryprocessingrdquo ACM Computing Surveys vol 32 no 4 pp 422ndash469 2000
[9] L Stiphane and W Eugene ldquoA state transition model fordistributed query processingrdquo ACM Transactions on DatabaseSystems vol 11 no 3 pp 249ndash322 1986
[10] T V Vijay Kumar V Singh and A K Verma ldquoDistributedquery processing plans generation using genetic algorithmrdquo
International Journal of Computer Theory and Engineering vol3 no 1 pp 38ndash45 2011
[11] C T Yu and C C Chang ldquoDistributed query processingrdquoACMComputing Surveys vol 16 no 4 pp 399ndash433 1984
[12] K Bennett M C Ferris and Y E Ioannidis ldquoA Geneticalgorithm for database query optimizationrdquo in Proceedings ofthe 4th International Conference onGenetic Algorithms pp 400ndash407 1991
[13] Y E Ioannidis and Y Cha Kang ldquoRandomized algorithms foroptimizing large join queriesrdquo in Proceedings of the 1990 ACMSIGMOD International Conference on Management of Data pp312ndash321 May 1990
[14] Y Ioannidis and E Wong ldquoQuery optimization by simulatedannealingrdquo in Proceedings of the ACM SIGMOD InternationalConference on Management of Data (SIGMOD rsquo87) pp 9ndash221987
[15] A A R Townsend Genetic Algorithms A Tutorial 2003[16] M Gregory Genetic Algorithm Optimization of Distributed
Database Queries IEEE 1998[17] M Mitchell An Introduction to Genetic Algorithms The MIT
Press Cambridge Mass USA 1999[18] J D Schaffer ldquoMultiple objective optimization with vector
evaluated genetic algorithmsrdquo inProceedings of the InternationalConference on Genetic Algorithm andTheir Applications pp 93ndash100 July 1985
[19] V Guliashki H Toshev andC Korsemov ldquoSurvey of evolution-ary algorithms used in multiobjective optimizationrdquo Problemsof EngineeringCybernetics andRobotics vol 60 pp 42ndash54 2009
[20] C M Fonseca and P J Fleming ldquoGenetic algorithms formultiobjective optimization formulation discussion and gen-eralizationrdquo inProceedings of the 5th International Conference onGenetic Algorithms S Forrest Ed pp 416ndash423 San FranciscoCalif USA 1993
[21] J Horn N Nafpliotis and D E Goldberg ldquoA niched Paretogenetic algorithm for multiobjective optimizationrdquo in Proceed-ings of the 1st IEEE Conference on Evolutionary ComputationIEEE World Congress on Computational Intelligence pp 82ndash87IEEE Orlando Fla USA June 1994
[22] N Srinivas and K Deb ldquoMultiobjective optimization usingnondominated sorting in genetic algorithmsrdquo EvolutionaryComputation vol 2 no 3 pp 221ndash248 1994
[23] E Zitzler and L Thiele ldquoMultiobjective evolutionary algo-rithms a comparative case study and the strength Paretoapproachrdquo IEEETransactions onEvolutionaryComputation vol3 no 4 pp 257ndash271 1999
[24] J D Knowles and D W Corne ldquoApproximating the non-dominated front using the Pareto archived evolution strategyrdquoEvolutionary computation vol 8 no 2 pp 149ndash172 2000
[25] D W Corne J D Knowles and M J Oates ldquoThe Paretoenvelope-based selection algorithm for multiobjective opti-mizationrdquo in Proceedings of the 6th International Conference onParallel Problem Solving from Nature Springer Paris FranceSeptember 2000
[26] J Scharnow K Tinnefeld and I Wegener ldquoThe analysis of evo-lutionary algorithms on sorting and shortest paths problemsrdquoJournal of Mathematical Modelling and Algorithms vol 3 no 4pp 349ndash366 2004
[27] B Doerr E Happ and C Klein ldquoCrossover can provablybe useful in evolutionary computationrdquo in Proceedings of the10th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo08) pp 539ndash546 ACM Press July 2008
The Scientific World Journal 17
[28] C Horoba ldquoAnalysis of a simple evolutionary algorithm for themultiobjective shortest path problemrdquo in Proceedings of the 10thACM SIGEVO Workshop on Foundations of Genetic Algorithms(FOGA rsquo09) pp 113ndash120 ACM Press New York NY USAJanuary 2009
[29] MTheile ldquoExact solutions to the traveling salesperson problemby a population-based evolutionary algorithmrdquo in Proceedingsof the 9th European Conference on Evolutionary Computation inCombinatorial Optimisation (EvoCOP rsquo09) vol 5482 of LectureNotes in Computer Science pp 145ndash155 Springer April 2009
[30] A V Eremeev On Linking Dynamic Programming and Multi-Objective Evolutionary Algorithms Omsk StateUniversity 2008
[31] A V Eremeev ldquoA fully polynomial randomized approximationscheme based on an evolutionary algorithmrdquo Diskretnyi Analizi Issledovanie Operatsii vol 17 no 4 pp 3ndash17 2010
[32] T Friedrich N Hebbinghaus F Neumann J He and CWitt ldquoApproximating covering problems by randomized searchheuristics using multi-objective modelsrdquo in Proceedings of the9th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo07) pp 797ndash804 ACM Press July 2007
[33] E Elbeltagi T Hegazy and D Grierson ldquoA modified shuffledfrog-leaping optimization algorithm applications to projectmanagementrdquo Structure and Infrastructure Engineering vol 3no 1 pp 53ndash60 2007
[34] M Dorigo E Bonabeau and GTheraulaz ldquoAnt algorithms andstigmergyrdquo Future Generation Computer Systems vol 16 no 8pp 851ndash871 2000
[35] D E Goldberg and K Deb ldquoA comparative analysis of selectionschemes used in genetic algorithmsrdquo in Foundations of GeneticAlgorithms pp 69ndash93 Morgan Kaufmann San Mateo CalifUSA 1991
[36] D E GoldbergGenetic Algorithms in Search Optimization andMachine Learning Addison-Wesley Reading Mass USA 1989
[37] H Dong and Y Liang ldquoGenetic algorithms for large join queryoptimizationrdquo in Proceedings of the 9th Annual Genetic andEvolutionary Computation Conference (GECCO rsquo07) pp 1211ndash1218 London UK July 2007
[38] I F Sbalzariniy M Sibylle and P Koumoutsakosyz ldquoMulti-objective optimization using evolutionary algorithmsrdquo Centerfor Turbulence Research Proceedings of the Summer Program2000
[39] E Zitzler and L Thiele ldquoMultiobjective optimization usingevolutionary algorithmsmdasha comparative case studyrdquo in ParallelProblem Solving from Nature pp 292ndash301 Springer Amster-dam The Netherlands 1998
[40] C A C Coello ldquoAn updated survey of GA-basedmultiobjectiveoptimization techniquesrdquo ACM Computing Surveys vol 32 no2 pp 137ndash143 2000
[41] D A Van Veldhuizen and G B Lamont ldquoMultiobjective evolu-tionary algorithms analyzing the state-of-the-artrdquo Evolutionarycomputation vol 8 no 2 pp 125ndash147 2000
[42] F Glover and S Hanafi ldquoTabu search and finite convergencerdquoDiscrete Applied Mathematics vol 119 no 1-2 pp 3ndash36 2002
[43] A C Nearchou ldquoThe effect of various operators on the geneticsearch for large scheduling problemsrdquo International Journal ofProduction Economics vol 88 no 2 pp 191ndash203 2004
[44] W W Chu and P Hurley ldquoOptimal query processing fordistributed database systemsrdquo IEEE Transactions on Computersvol 31 no 9 pp 835ndash850 1982
[45] M Serpell and J E Smith ldquoSelf-adaptation ofmutation operatorand probability for permutation representations in genetic
algorithmsrdquo Evolutionary Computation vol 18 no 3 pp 491ndash514 2010
[46] A Seshadri A Fast Elitist Multiobjective Genetic AlgorithmNSGA-II MATLAB Central 2006
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World Journal 11
given in Table 2 From Table 2 it can be noted that queryplans 3 and 5 are not dominated as 119899
3= 0 119899
5= 0 So they
are assigned to the first nondominated front1The elementsin the next front are computed by reducing the count in 119899
119894
for each 119894 isin 1198783and 119894 isin 119878
5 So 119899
119894= 0 0 minus 4 minus 0 0 1 1 7
So the second front has query plans 1 2 6 and 7 Thisprocess continues till all the query plans in the population areassigned to their respective nondominated fronts The fronts1 2 3 and
4are formed and are given in Table 2
Finally the query plans are sorted separately on the valuesof TCC and TPC within each front as shown in Table 3
After the population is sorted into different fronts thecrowding distance (119868[119889119894119904119905119886119899119888119890]) computation is performedfor each query plan using the formula given in Step 4 ofthe proposed algorithm Query plans having the maximumand minimum values in each front are assigned infin distancevalues that is 119868[3] 119868[5] 119868[7] 119868[1] 119868[8] 119868[4] 119868[9] and119868[10] are assigned infin For the rest of the query plans theircrowding distance (CD) values are computed based on thesorted order of query plans in each front Initially they areassigned 119868[119889119894119904119905119886119899119888119890] = 0 For query plan 2 in front
2 the
CD computation is performed as follows
1198681198981[2] = 119868 [2] +
TCC [6] minus TCC [7]
max (TCC) minusmin (TCC)
= 0 +
17020000 minus 960000
45090000 minus 0
= 03562
119868 [2] = 1198681198981[2] +
TPC [7] minus TPC [6]
max (TPC) minusmin (TPC)
= 03562 +
6500000 minus 3847000
10514000 minus 2112000
= 06720
(14)
CD of the query plans in the given population is given inTable 4
Next binary tournament selection is performed on thepopulation on the basis of crowded comparison operator ≺
119899
This selection process is shown in Table 5The selected query plans undergo random single point
crossover with crossover probability 119875119888
= 05 Mutation isperformed on the selected population with mutation proba-bility 119875
119898= 002 The child population CP after crossover and
mutation is shown in Table 6Now in accordance with NSGA-II algorithm the popu-
lations from the second generation onward have to ensureelitism For this purpose the child population CP is com-bined with the parent population PP to generate intermediatepopulation IP This population is subjected to nondominatedsort and fronts are formed as given in Table 7
The population for the second generation is arrived atby selecting query plans based on front and within it basedon crowding distance-wise as described in Step 7 from theintermediate population IP till the actual population size 10is exceeded This selection is shown in Table 8
The population PP for the second generation is given inTable 9
The above process is repeated till a predefined number ofgenerations119866have elapsedThereafter Top-119870 query plans areproduced as output
0
01
02
03
04
05
06
07
08
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-5 query plans)
Figure 4
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
DQPGNSGA(top-10 query plans)
Figure 5
5 Experimental Results
The proposed NSGA-II based algorithm is implemented inMATLAB 77 in Windows 7 professional 64 bit OS with Intelcore i3CPUat 213GHzhaving 4GBRAMExperimentswerecarried out for a population of 100 query plans with eachquery plan involving 10 relations distributed over 50 sitesThese were performed on four datasets each comprising adifferent relation-site matrix Graphs were plotted to observechange in average TC (ATC) with respect to generations
12 The Scientific World Journal
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-15 query plans)
Figure 6
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-20 query plans)
Figure 7
and Top-119870 query plans for different pairs of crossover andmutation rates
First graphs showing the ATC values Top-5 Top-10Top-15 and Top-20 query plans averaged over four datasetsover 1000 generations for distinct pairs of crossover andmutation probability 119875
119888 119875119898 = 07 001 07 0005
075 001 075 0005 08 001 08 0005 085 001085 0005 using NSGA-II based DQPG algorithm(DQPGNSGA) were plotted These are shown in Figures 45 6 and 7 It can be observed from these graphs that the
0
005
01
015
02
025
03
035
04
045
05
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
(after 1000 generations)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGNSGA
Top-K query plans
Figure 8
0
01
02
03
04
05
06
07
08
09
1
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-5 query plans)
Figure 9
convergence to ATC is lowest in the case of 119875119888 119875119898 =
085 001 Furthermore the graph showing ATC valuesaveraged over four datasets versus Top-119870 query plans after1000 generations (Figure 8) also shows that the lowest ATCvalues achieved are for 119875
119888 119875119898 = 085 001 Thus it
can be said that DQPGNSGA performs reasonably well for119875119888 119875119898 = 085 001 In order to compare DQPGNSGA with
SGA based DQPG algorithm (DQPGSGA) similar graphswere plotted for DQPGSGA These are shown in Figures 9 1011 12 and 13 It is noted from these graphs that DQPGSGA
The Scientific World Journal 13
0
02
04
06
08
1
12
14
1618
21 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-10 query plans)
Figure 10
0
02
04
06
08
1
12
14
16
18
2
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-15 query plans)
Figure 11
also achieves convergence to the lowest ATC values for119875119888 119875119898 = 085 001
Since the two algorithms DQPGNSGA and DQPGSGAconverge to a lower ATC value for the same crossover andmutation probabilities that is 119875
119888 119875119898 = 085 001 the
comparisons of the two algorithms can be carried out forthese observed probabilities
First the two algorithmsDQPGNSGA andDQPGSGA werecompared for each of the four datasets (Dataset-1 Dataset-2 Dataset-3 and Dataset-4) on the ATC values of Top-5Top-10 Top-15 and Top-20 query plans generated over 1000
0
04
08
12
16
2
24
28
32
36
4
1 54 107
160
213
266
319
372
425
478
531
584
637
690
743
796
849
902
955
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-20 query plans)
Figure 12
0
01
02
03
04
05
06
07
08
09
1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
(after 1000 generations)
Aver
age t
otal
cost
(ATC
)
DQPGSGA
Top-K query plans
Figure 13
generations for 119875119888 119875119898 = 085 001 The corresponding
graphs for each dataset are shown in Figures 14 15 16 and 17It can be observed from the graphs thatDQPGNSGA convergesto lower ATC values for Top-5 Top-10 Top-15 and Top-20query plans generated for all datasets
Furthermore graphs comparing the ATC values of Top-119870 query plans produced by DQPGNSGA and DQPGSGA after1000 generations for the four datasets were plotted andare shown in Figures 18 19 20 and 21 These graphs alsoshow average TCC (ATCC) and average TPC (ATPC) valuesof Top-119870 query plans generated by DQPGNSGA It can beobserved from these graphs thatDQPGNSGA is able to achieve
14 The Scientific World Journal
0
05
1
15
2
25
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-1)
Figure 14
0
03
06
09
12
15
18
21
24
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-2)
Figure 15
an acceptable tradeoff between ATPC and ATCC which inturn leads to a comparatively lower ATC for the Top-119870 queryplans generated by it
Next a graph comparing the ATC values of Top-119870 queryplans generated by DQPGNSGA and DQPGSGA on all fourdatasets (DS-1 DS-2 DS-3 and DS-4) after 1000 generationsfor observed probabilities 119875
119888 119875119898 = 085 001were plotted
and is shown in Figure 22 It is noted from the graph thatDQPGNSGA performs better than DQPGSGA on the ATCvalues of Top-119870 query plans generated by the two algorithmsfor each of the four data sets
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
0
015
03
045
06
075
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-3)
Figure 16
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
0
01
02
03
04
05
06
07
08
09
1
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-4)
Figure 17
It can be reasonably inferred from all the above graphsthat DQPGNSGA is able to generate Top-119870 query plans withlower ATC when compared to those generated byDQPGSGAThis may be attributed to acceptable tradeoffs achieved whilesimultaneously optimizing TPC and TCC which results inlower TC in case of DQPGNSGA
6 Conclusion
In this paper DQPGproblemgiven in [3] has been addressedwhere query plans are generated for a distributed relational
The Scientific World Journal 15
0
005
01
015
02
025
03
035
04
045
1 3 5 7 9 11 13 15 17 19
Cos
t
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-1)
Figure 18
0
01
02
03
04
05
06
07
Cos
t
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
Top-K query plans
(Dataset-2)
Figure 19
query that incurs minimum total query processing costGenetic algorithms have been used to generate these queryplansThe total query processing cost TC in [3] can be viewedas comprising broadly of TPC and TCC and thereforeminimizing TPC and TCC would result in minimizing TCThus in this paper the single-objective DQPG problemin [3] has been formulated and solved as a biobjectiveDQPG problem with the two objectives being minimizingTPC and minimizing TCC These objectives are minimizedsimultaneously using the multiobjective genetic algorithmNSGA-II
Experiments were performed and DQPGNSGA is com-pared with DQPGSGA given in [3] It was observed thatboth the algorithms individually gave good results for thecrossover and mutation probabilities 085 and 001 respec-tively The two algorithms were then compared on the ATC
0
002
004
006
008
01
012
014
016
Cos
t
1 3 5 7 9 11 13 15 17 19
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-3)
Figure 20
0
005
01
015
02
025
Cos
t
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-4)
Figure 21
values of the Top-119870 query plans generated by them for theobserved crossover and mutation probabilities The resultsshowed that DQPGNSGA performed better than DQPGSGAAlso the performance of the former was better when the twoalgorithmswere compared on theATCvalues of Top-119870 queryplansThe better performance of DQPGNSGA over DQPGSGAmay be attributed to DQPGNSGA achieving acceptable trade-offs between TPC and TCC while minimizing TPC and TCCof Top-119870 query plans simultaneously
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
16 The Scientific World Journal
Top-K query plans
0
01
02
03
04
05
06
07
Aver
age t
otal
cost
(ATC
)
SGA(DS-1)SGA(DS-2)SGA(DS-3)SGA(DS-4)
NSGA(DS-1)NSGA(DS-2)NSGA(DS-3)NSGA(DS-4)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(after 1000 generations)
Figure 22
References
[1] S Ceri and G Pelgatti Distributed Databases Principles ampSystems International Edition McGraw-Hill Computer ScienceSeries 1985
[2] A R Hevner and S B Yao ldquoQuery processing in distributeddatabase systemsrdquo IEEE Transactions on Software Engineeringvol 5 no 3 pp 177ndash187 1979
[3] T V Vijay Kumar and S Panicker ldquoGenerating query plansfor distributed query processing using genetic algorithmrdquo inInformation Computing and Applications vol 7030 of LectureNotes in Computer Science pp 765ndash772 2011
[4] K Deb A Pratap S Agarwal and T Meyarivan ldquoA fastand elitist multiobjective genetic algorithm NSGA-IIrdquo IEEETransactions on Evolutionary Computation vol 6 no 2 pp 182ndash197 2002
[5] P Black and W Luk ldquoA new heuristic for generating semi-joinprograms for distributed query processingrdquo in Proceedings ofthe IEEE 6th International Computer Software and ApplicationConference pp 581ndash588 IEEE Chicago Ill USA November1982
[6] P Bodorik and J S Riordon ldquoA threshold mechanism fordistributed query processingrdquo in Proceedings of the 16th AnnualConference on Computer Science pp 616ndash625 Atlanta GaUnited States February 1988
[7] J Chang ldquoA heuristic approach to distributed query process-ingrdquo in Proceedings of the 8th International Conference on VeryLarge Data Bases (VLDB rsquo82) pp 54ndash61 Saratoga Calif USA1982
[8] D Kossmann ldquoThe state of the art in distributed queryprocessingrdquo ACM Computing Surveys vol 32 no 4 pp 422ndash469 2000
[9] L Stiphane and W Eugene ldquoA state transition model fordistributed query processingrdquo ACM Transactions on DatabaseSystems vol 11 no 3 pp 249ndash322 1986
[10] T V Vijay Kumar V Singh and A K Verma ldquoDistributedquery processing plans generation using genetic algorithmrdquo
International Journal of Computer Theory and Engineering vol3 no 1 pp 38ndash45 2011
[11] C T Yu and C C Chang ldquoDistributed query processingrdquoACMComputing Surveys vol 16 no 4 pp 399ndash433 1984
[12] K Bennett M C Ferris and Y E Ioannidis ldquoA Geneticalgorithm for database query optimizationrdquo in Proceedings ofthe 4th International Conference onGenetic Algorithms pp 400ndash407 1991
[13] Y E Ioannidis and Y Cha Kang ldquoRandomized algorithms foroptimizing large join queriesrdquo in Proceedings of the 1990 ACMSIGMOD International Conference on Management of Data pp312ndash321 May 1990
[14] Y Ioannidis and E Wong ldquoQuery optimization by simulatedannealingrdquo in Proceedings of the ACM SIGMOD InternationalConference on Management of Data (SIGMOD rsquo87) pp 9ndash221987
[15] A A R Townsend Genetic Algorithms A Tutorial 2003[16] M Gregory Genetic Algorithm Optimization of Distributed
Database Queries IEEE 1998[17] M Mitchell An Introduction to Genetic Algorithms The MIT
Press Cambridge Mass USA 1999[18] J D Schaffer ldquoMultiple objective optimization with vector
evaluated genetic algorithmsrdquo inProceedings of the InternationalConference on Genetic Algorithm andTheir Applications pp 93ndash100 July 1985
[19] V Guliashki H Toshev andC Korsemov ldquoSurvey of evolution-ary algorithms used in multiobjective optimizationrdquo Problemsof EngineeringCybernetics andRobotics vol 60 pp 42ndash54 2009
[20] C M Fonseca and P J Fleming ldquoGenetic algorithms formultiobjective optimization formulation discussion and gen-eralizationrdquo inProceedings of the 5th International Conference onGenetic Algorithms S Forrest Ed pp 416ndash423 San FranciscoCalif USA 1993
[21] J Horn N Nafpliotis and D E Goldberg ldquoA niched Paretogenetic algorithm for multiobjective optimizationrdquo in Proceed-ings of the 1st IEEE Conference on Evolutionary ComputationIEEE World Congress on Computational Intelligence pp 82ndash87IEEE Orlando Fla USA June 1994
[22] N Srinivas and K Deb ldquoMultiobjective optimization usingnondominated sorting in genetic algorithmsrdquo EvolutionaryComputation vol 2 no 3 pp 221ndash248 1994
[23] E Zitzler and L Thiele ldquoMultiobjective evolutionary algo-rithms a comparative case study and the strength Paretoapproachrdquo IEEETransactions onEvolutionaryComputation vol3 no 4 pp 257ndash271 1999
[24] J D Knowles and D W Corne ldquoApproximating the non-dominated front using the Pareto archived evolution strategyrdquoEvolutionary computation vol 8 no 2 pp 149ndash172 2000
[25] D W Corne J D Knowles and M J Oates ldquoThe Paretoenvelope-based selection algorithm for multiobjective opti-mizationrdquo in Proceedings of the 6th International Conference onParallel Problem Solving from Nature Springer Paris FranceSeptember 2000
[26] J Scharnow K Tinnefeld and I Wegener ldquoThe analysis of evo-lutionary algorithms on sorting and shortest paths problemsrdquoJournal of Mathematical Modelling and Algorithms vol 3 no 4pp 349ndash366 2004
[27] B Doerr E Happ and C Klein ldquoCrossover can provablybe useful in evolutionary computationrdquo in Proceedings of the10th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo08) pp 539ndash546 ACM Press July 2008
The Scientific World Journal 17
[28] C Horoba ldquoAnalysis of a simple evolutionary algorithm for themultiobjective shortest path problemrdquo in Proceedings of the 10thACM SIGEVO Workshop on Foundations of Genetic Algorithms(FOGA rsquo09) pp 113ndash120 ACM Press New York NY USAJanuary 2009
[29] MTheile ldquoExact solutions to the traveling salesperson problemby a population-based evolutionary algorithmrdquo in Proceedingsof the 9th European Conference on Evolutionary Computation inCombinatorial Optimisation (EvoCOP rsquo09) vol 5482 of LectureNotes in Computer Science pp 145ndash155 Springer April 2009
[30] A V Eremeev On Linking Dynamic Programming and Multi-Objective Evolutionary Algorithms Omsk StateUniversity 2008
[31] A V Eremeev ldquoA fully polynomial randomized approximationscheme based on an evolutionary algorithmrdquo Diskretnyi Analizi Issledovanie Operatsii vol 17 no 4 pp 3ndash17 2010
[32] T Friedrich N Hebbinghaus F Neumann J He and CWitt ldquoApproximating covering problems by randomized searchheuristics using multi-objective modelsrdquo in Proceedings of the9th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo07) pp 797ndash804 ACM Press July 2007
[33] E Elbeltagi T Hegazy and D Grierson ldquoA modified shuffledfrog-leaping optimization algorithm applications to projectmanagementrdquo Structure and Infrastructure Engineering vol 3no 1 pp 53ndash60 2007
[34] M Dorigo E Bonabeau and GTheraulaz ldquoAnt algorithms andstigmergyrdquo Future Generation Computer Systems vol 16 no 8pp 851ndash871 2000
[35] D E Goldberg and K Deb ldquoA comparative analysis of selectionschemes used in genetic algorithmsrdquo in Foundations of GeneticAlgorithms pp 69ndash93 Morgan Kaufmann San Mateo CalifUSA 1991
[36] D E GoldbergGenetic Algorithms in Search Optimization andMachine Learning Addison-Wesley Reading Mass USA 1989
[37] H Dong and Y Liang ldquoGenetic algorithms for large join queryoptimizationrdquo in Proceedings of the 9th Annual Genetic andEvolutionary Computation Conference (GECCO rsquo07) pp 1211ndash1218 London UK July 2007
[38] I F Sbalzariniy M Sibylle and P Koumoutsakosyz ldquoMulti-objective optimization using evolutionary algorithmsrdquo Centerfor Turbulence Research Proceedings of the Summer Program2000
[39] E Zitzler and L Thiele ldquoMultiobjective optimization usingevolutionary algorithmsmdasha comparative case studyrdquo in ParallelProblem Solving from Nature pp 292ndash301 Springer Amster-dam The Netherlands 1998
[40] C A C Coello ldquoAn updated survey of GA-basedmultiobjectiveoptimization techniquesrdquo ACM Computing Surveys vol 32 no2 pp 137ndash143 2000
[41] D A Van Veldhuizen and G B Lamont ldquoMultiobjective evolu-tionary algorithms analyzing the state-of-the-artrdquo Evolutionarycomputation vol 8 no 2 pp 125ndash147 2000
[42] F Glover and S Hanafi ldquoTabu search and finite convergencerdquoDiscrete Applied Mathematics vol 119 no 1-2 pp 3ndash36 2002
[43] A C Nearchou ldquoThe effect of various operators on the geneticsearch for large scheduling problemsrdquo International Journal ofProduction Economics vol 88 no 2 pp 191ndash203 2004
[44] W W Chu and P Hurley ldquoOptimal query processing fordistributed database systemsrdquo IEEE Transactions on Computersvol 31 no 9 pp 835ndash850 1982
[45] M Serpell and J E Smith ldquoSelf-adaptation ofmutation operatorand probability for permutation representations in genetic
algorithmsrdquo Evolutionary Computation vol 18 no 3 pp 491ndash514 2010
[46] A Seshadri A Fast Elitist Multiobjective Genetic AlgorithmNSGA-II MATLAB Central 2006
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
12 The Scientific World Journal
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-15 query plans)
Figure 6
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
0
01
02
03
04
05
06
07
08
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGNSGA(top-20 query plans)
Figure 7
and Top-119870 query plans for different pairs of crossover andmutation rates
First graphs showing the ATC values Top-5 Top-10Top-15 and Top-20 query plans averaged over four datasetsover 1000 generations for distinct pairs of crossover andmutation probability 119875
119888 119875119898 = 07 001 07 0005
075 001 075 0005 08 001 08 0005 085 001085 0005 using NSGA-II based DQPG algorithm(DQPGNSGA) were plotted These are shown in Figures 45 6 and 7 It can be observed from these graphs that the
0
005
01
015
02
025
03
035
04
045
05
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
(after 1000 generations)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGNSGA
Top-K query plans
Figure 8
0
01
02
03
04
05
06
07
08
09
1
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-5 query plans)
Figure 9
convergence to ATC is lowest in the case of 119875119888 119875119898 =
085 001 Furthermore the graph showing ATC valuesaveraged over four datasets versus Top-119870 query plans after1000 generations (Figure 8) also shows that the lowest ATCvalues achieved are for 119875
119888 119875119898 = 085 001 Thus it
can be said that DQPGNSGA performs reasonably well for119875119888 119875119898 = 085 001 In order to compare DQPGNSGA with
SGA based DQPG algorithm (DQPGSGA) similar graphswere plotted for DQPGSGA These are shown in Figures 9 1011 12 and 13 It is noted from these graphs that DQPGSGA
The Scientific World Journal 13
0
02
04
06
08
1
12
14
1618
21 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-10 query plans)
Figure 10
0
02
04
06
08
1
12
14
16
18
2
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-15 query plans)
Figure 11
also achieves convergence to the lowest ATC values for119875119888 119875119898 = 085 001
Since the two algorithms DQPGNSGA and DQPGSGAconverge to a lower ATC value for the same crossover andmutation probabilities that is 119875
119888 119875119898 = 085 001 the
comparisons of the two algorithms can be carried out forthese observed probabilities
First the two algorithmsDQPGNSGA andDQPGSGA werecompared for each of the four datasets (Dataset-1 Dataset-2 Dataset-3 and Dataset-4) on the ATC values of Top-5Top-10 Top-15 and Top-20 query plans generated over 1000
0
04
08
12
16
2
24
28
32
36
4
1 54 107
160
213
266
319
372
425
478
531
584
637
690
743
796
849
902
955
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-20 query plans)
Figure 12
0
01
02
03
04
05
06
07
08
09
1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
(after 1000 generations)
Aver
age t
otal
cost
(ATC
)
DQPGSGA
Top-K query plans
Figure 13
generations for 119875119888 119875119898 = 085 001 The corresponding
graphs for each dataset are shown in Figures 14 15 16 and 17It can be observed from the graphs thatDQPGNSGA convergesto lower ATC values for Top-5 Top-10 Top-15 and Top-20query plans generated for all datasets
Furthermore graphs comparing the ATC values of Top-119870 query plans produced by DQPGNSGA and DQPGSGA after1000 generations for the four datasets were plotted andare shown in Figures 18 19 20 and 21 These graphs alsoshow average TCC (ATCC) and average TPC (ATPC) valuesof Top-119870 query plans generated by DQPGNSGA It can beobserved from these graphs thatDQPGNSGA is able to achieve
14 The Scientific World Journal
0
05
1
15
2
25
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-1)
Figure 14
0
03
06
09
12
15
18
21
24
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-2)
Figure 15
an acceptable tradeoff between ATPC and ATCC which inturn leads to a comparatively lower ATC for the Top-119870 queryplans generated by it
Next a graph comparing the ATC values of Top-119870 queryplans generated by DQPGNSGA and DQPGSGA on all fourdatasets (DS-1 DS-2 DS-3 and DS-4) after 1000 generationsfor observed probabilities 119875
119888 119875119898 = 085 001were plotted
and is shown in Figure 22 It is noted from the graph thatDQPGNSGA performs better than DQPGSGA on the ATCvalues of Top-119870 query plans generated by the two algorithmsfor each of the four data sets
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
0
015
03
045
06
075
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-3)
Figure 16
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
0
01
02
03
04
05
06
07
08
09
1
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-4)
Figure 17
It can be reasonably inferred from all the above graphsthat DQPGNSGA is able to generate Top-119870 query plans withlower ATC when compared to those generated byDQPGSGAThis may be attributed to acceptable tradeoffs achieved whilesimultaneously optimizing TPC and TCC which results inlower TC in case of DQPGNSGA
6 Conclusion
In this paper DQPGproblemgiven in [3] has been addressedwhere query plans are generated for a distributed relational
The Scientific World Journal 15
0
005
01
015
02
025
03
035
04
045
1 3 5 7 9 11 13 15 17 19
Cos
t
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-1)
Figure 18
0
01
02
03
04
05
06
07
Cos
t
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
Top-K query plans
(Dataset-2)
Figure 19
query that incurs minimum total query processing costGenetic algorithms have been used to generate these queryplansThe total query processing cost TC in [3] can be viewedas comprising broadly of TPC and TCC and thereforeminimizing TPC and TCC would result in minimizing TCThus in this paper the single-objective DQPG problemin [3] has been formulated and solved as a biobjectiveDQPG problem with the two objectives being minimizingTPC and minimizing TCC These objectives are minimizedsimultaneously using the multiobjective genetic algorithmNSGA-II
Experiments were performed and DQPGNSGA is com-pared with DQPGSGA given in [3] It was observed thatboth the algorithms individually gave good results for thecrossover and mutation probabilities 085 and 001 respec-tively The two algorithms were then compared on the ATC
0
002
004
006
008
01
012
014
016
Cos
t
1 3 5 7 9 11 13 15 17 19
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-3)
Figure 20
0
005
01
015
02
025
Cos
t
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-4)
Figure 21
values of the Top-119870 query plans generated by them for theobserved crossover and mutation probabilities The resultsshowed that DQPGNSGA performed better than DQPGSGAAlso the performance of the former was better when the twoalgorithmswere compared on theATCvalues of Top-119870 queryplansThe better performance of DQPGNSGA over DQPGSGAmay be attributed to DQPGNSGA achieving acceptable trade-offs between TPC and TCC while minimizing TPC and TCCof Top-119870 query plans simultaneously
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
16 The Scientific World Journal
Top-K query plans
0
01
02
03
04
05
06
07
Aver
age t
otal
cost
(ATC
)
SGA(DS-1)SGA(DS-2)SGA(DS-3)SGA(DS-4)
NSGA(DS-1)NSGA(DS-2)NSGA(DS-3)NSGA(DS-4)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(after 1000 generations)
Figure 22
References
[1] S Ceri and G Pelgatti Distributed Databases Principles ampSystems International Edition McGraw-Hill Computer ScienceSeries 1985
[2] A R Hevner and S B Yao ldquoQuery processing in distributeddatabase systemsrdquo IEEE Transactions on Software Engineeringvol 5 no 3 pp 177ndash187 1979
[3] T V Vijay Kumar and S Panicker ldquoGenerating query plansfor distributed query processing using genetic algorithmrdquo inInformation Computing and Applications vol 7030 of LectureNotes in Computer Science pp 765ndash772 2011
[4] K Deb A Pratap S Agarwal and T Meyarivan ldquoA fastand elitist multiobjective genetic algorithm NSGA-IIrdquo IEEETransactions on Evolutionary Computation vol 6 no 2 pp 182ndash197 2002
[5] P Black and W Luk ldquoA new heuristic for generating semi-joinprograms for distributed query processingrdquo in Proceedings ofthe IEEE 6th International Computer Software and ApplicationConference pp 581ndash588 IEEE Chicago Ill USA November1982
[6] P Bodorik and J S Riordon ldquoA threshold mechanism fordistributed query processingrdquo in Proceedings of the 16th AnnualConference on Computer Science pp 616ndash625 Atlanta GaUnited States February 1988
[7] J Chang ldquoA heuristic approach to distributed query process-ingrdquo in Proceedings of the 8th International Conference on VeryLarge Data Bases (VLDB rsquo82) pp 54ndash61 Saratoga Calif USA1982
[8] D Kossmann ldquoThe state of the art in distributed queryprocessingrdquo ACM Computing Surveys vol 32 no 4 pp 422ndash469 2000
[9] L Stiphane and W Eugene ldquoA state transition model fordistributed query processingrdquo ACM Transactions on DatabaseSystems vol 11 no 3 pp 249ndash322 1986
[10] T V Vijay Kumar V Singh and A K Verma ldquoDistributedquery processing plans generation using genetic algorithmrdquo
International Journal of Computer Theory and Engineering vol3 no 1 pp 38ndash45 2011
[11] C T Yu and C C Chang ldquoDistributed query processingrdquoACMComputing Surveys vol 16 no 4 pp 399ndash433 1984
[12] K Bennett M C Ferris and Y E Ioannidis ldquoA Geneticalgorithm for database query optimizationrdquo in Proceedings ofthe 4th International Conference onGenetic Algorithms pp 400ndash407 1991
[13] Y E Ioannidis and Y Cha Kang ldquoRandomized algorithms foroptimizing large join queriesrdquo in Proceedings of the 1990 ACMSIGMOD International Conference on Management of Data pp312ndash321 May 1990
[14] Y Ioannidis and E Wong ldquoQuery optimization by simulatedannealingrdquo in Proceedings of the ACM SIGMOD InternationalConference on Management of Data (SIGMOD rsquo87) pp 9ndash221987
[15] A A R Townsend Genetic Algorithms A Tutorial 2003[16] M Gregory Genetic Algorithm Optimization of Distributed
Database Queries IEEE 1998[17] M Mitchell An Introduction to Genetic Algorithms The MIT
Press Cambridge Mass USA 1999[18] J D Schaffer ldquoMultiple objective optimization with vector
evaluated genetic algorithmsrdquo inProceedings of the InternationalConference on Genetic Algorithm andTheir Applications pp 93ndash100 July 1985
[19] V Guliashki H Toshev andC Korsemov ldquoSurvey of evolution-ary algorithms used in multiobjective optimizationrdquo Problemsof EngineeringCybernetics andRobotics vol 60 pp 42ndash54 2009
[20] C M Fonseca and P J Fleming ldquoGenetic algorithms formultiobjective optimization formulation discussion and gen-eralizationrdquo inProceedings of the 5th International Conference onGenetic Algorithms S Forrest Ed pp 416ndash423 San FranciscoCalif USA 1993
[21] J Horn N Nafpliotis and D E Goldberg ldquoA niched Paretogenetic algorithm for multiobjective optimizationrdquo in Proceed-ings of the 1st IEEE Conference on Evolutionary ComputationIEEE World Congress on Computational Intelligence pp 82ndash87IEEE Orlando Fla USA June 1994
[22] N Srinivas and K Deb ldquoMultiobjective optimization usingnondominated sorting in genetic algorithmsrdquo EvolutionaryComputation vol 2 no 3 pp 221ndash248 1994
[23] E Zitzler and L Thiele ldquoMultiobjective evolutionary algo-rithms a comparative case study and the strength Paretoapproachrdquo IEEETransactions onEvolutionaryComputation vol3 no 4 pp 257ndash271 1999
[24] J D Knowles and D W Corne ldquoApproximating the non-dominated front using the Pareto archived evolution strategyrdquoEvolutionary computation vol 8 no 2 pp 149ndash172 2000
[25] D W Corne J D Knowles and M J Oates ldquoThe Paretoenvelope-based selection algorithm for multiobjective opti-mizationrdquo in Proceedings of the 6th International Conference onParallel Problem Solving from Nature Springer Paris FranceSeptember 2000
[26] J Scharnow K Tinnefeld and I Wegener ldquoThe analysis of evo-lutionary algorithms on sorting and shortest paths problemsrdquoJournal of Mathematical Modelling and Algorithms vol 3 no 4pp 349ndash366 2004
[27] B Doerr E Happ and C Klein ldquoCrossover can provablybe useful in evolutionary computationrdquo in Proceedings of the10th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo08) pp 539ndash546 ACM Press July 2008
The Scientific World Journal 17
[28] C Horoba ldquoAnalysis of a simple evolutionary algorithm for themultiobjective shortest path problemrdquo in Proceedings of the 10thACM SIGEVO Workshop on Foundations of Genetic Algorithms(FOGA rsquo09) pp 113ndash120 ACM Press New York NY USAJanuary 2009
[29] MTheile ldquoExact solutions to the traveling salesperson problemby a population-based evolutionary algorithmrdquo in Proceedingsof the 9th European Conference on Evolutionary Computation inCombinatorial Optimisation (EvoCOP rsquo09) vol 5482 of LectureNotes in Computer Science pp 145ndash155 Springer April 2009
[30] A V Eremeev On Linking Dynamic Programming and Multi-Objective Evolutionary Algorithms Omsk StateUniversity 2008
[31] A V Eremeev ldquoA fully polynomial randomized approximationscheme based on an evolutionary algorithmrdquo Diskretnyi Analizi Issledovanie Operatsii vol 17 no 4 pp 3ndash17 2010
[32] T Friedrich N Hebbinghaus F Neumann J He and CWitt ldquoApproximating covering problems by randomized searchheuristics using multi-objective modelsrdquo in Proceedings of the9th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo07) pp 797ndash804 ACM Press July 2007
[33] E Elbeltagi T Hegazy and D Grierson ldquoA modified shuffledfrog-leaping optimization algorithm applications to projectmanagementrdquo Structure and Infrastructure Engineering vol 3no 1 pp 53ndash60 2007
[34] M Dorigo E Bonabeau and GTheraulaz ldquoAnt algorithms andstigmergyrdquo Future Generation Computer Systems vol 16 no 8pp 851ndash871 2000
[35] D E Goldberg and K Deb ldquoA comparative analysis of selectionschemes used in genetic algorithmsrdquo in Foundations of GeneticAlgorithms pp 69ndash93 Morgan Kaufmann San Mateo CalifUSA 1991
[36] D E GoldbergGenetic Algorithms in Search Optimization andMachine Learning Addison-Wesley Reading Mass USA 1989
[37] H Dong and Y Liang ldquoGenetic algorithms for large join queryoptimizationrdquo in Proceedings of the 9th Annual Genetic andEvolutionary Computation Conference (GECCO rsquo07) pp 1211ndash1218 London UK July 2007
[38] I F Sbalzariniy M Sibylle and P Koumoutsakosyz ldquoMulti-objective optimization using evolutionary algorithmsrdquo Centerfor Turbulence Research Proceedings of the Summer Program2000
[39] E Zitzler and L Thiele ldquoMultiobjective optimization usingevolutionary algorithmsmdasha comparative case studyrdquo in ParallelProblem Solving from Nature pp 292ndash301 Springer Amster-dam The Netherlands 1998
[40] C A C Coello ldquoAn updated survey of GA-basedmultiobjectiveoptimization techniquesrdquo ACM Computing Surveys vol 32 no2 pp 137ndash143 2000
[41] D A Van Veldhuizen and G B Lamont ldquoMultiobjective evolu-tionary algorithms analyzing the state-of-the-artrdquo Evolutionarycomputation vol 8 no 2 pp 125ndash147 2000
[42] F Glover and S Hanafi ldquoTabu search and finite convergencerdquoDiscrete Applied Mathematics vol 119 no 1-2 pp 3ndash36 2002
[43] A C Nearchou ldquoThe effect of various operators on the geneticsearch for large scheduling problemsrdquo International Journal ofProduction Economics vol 88 no 2 pp 191ndash203 2004
[44] W W Chu and P Hurley ldquoOptimal query processing fordistributed database systemsrdquo IEEE Transactions on Computersvol 31 no 9 pp 835ndash850 1982
[45] M Serpell and J E Smith ldquoSelf-adaptation ofmutation operatorand probability for permutation representations in genetic
algorithmsrdquo Evolutionary Computation vol 18 no 3 pp 491ndash514 2010
[46] A Seshadri A Fast Elitist Multiobjective Genetic AlgorithmNSGA-II MATLAB Central 2006
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World Journal 13
0
02
04
06
08
1
12
14
1618
21 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-10 query plans)
Figure 10
0
02
04
06
08
1
12
14
16
18
2
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Generation07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
Aver
age t
otal
cost
(ATC
)
DQPGSGA(top-15 query plans)
Figure 11
also achieves convergence to the lowest ATC values for119875119888 119875119898 = 085 001
Since the two algorithms DQPGNSGA and DQPGSGAconverge to a lower ATC value for the same crossover andmutation probabilities that is 119875
119888 119875119898 = 085 001 the
comparisons of the two algorithms can be carried out forthese observed probabilities
First the two algorithmsDQPGNSGA andDQPGSGA werecompared for each of the four datasets (Dataset-1 Dataset-2 Dataset-3 and Dataset-4) on the ATC values of Top-5Top-10 Top-15 and Top-20 query plans generated over 1000
0
04
08
12
16
2
24
28
32
36
4
1 54 107
160
213
266
319
372
425
478
531
584
637
690
743
796
849
902
955
Generation
Aver
age t
otal
cost
(ATC
)
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
DQPGSGA(top-20 query plans)
Figure 12
0
01
02
03
04
05
06
07
08
09
1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
07 00107 0005075 001075 0005
08 00108 0005085 001085 0005
(after 1000 generations)
Aver
age t
otal
cost
(ATC
)
DQPGSGA
Top-K query plans
Figure 13
generations for 119875119888 119875119898 = 085 001 The corresponding
graphs for each dataset are shown in Figures 14 15 16 and 17It can be observed from the graphs thatDQPGNSGA convergesto lower ATC values for Top-5 Top-10 Top-15 and Top-20query plans generated for all datasets
Furthermore graphs comparing the ATC values of Top-119870 query plans produced by DQPGNSGA and DQPGSGA after1000 generations for the four datasets were plotted andare shown in Figures 18 19 20 and 21 These graphs alsoshow average TCC (ATCC) and average TPC (ATPC) valuesof Top-119870 query plans generated by DQPGNSGA It can beobserved from these graphs thatDQPGNSGA is able to achieve
14 The Scientific World Journal
0
05
1
15
2
25
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-1)
Figure 14
0
03
06
09
12
15
18
21
24
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-2)
Figure 15
an acceptable tradeoff between ATPC and ATCC which inturn leads to a comparatively lower ATC for the Top-119870 queryplans generated by it
Next a graph comparing the ATC values of Top-119870 queryplans generated by DQPGNSGA and DQPGSGA on all fourdatasets (DS-1 DS-2 DS-3 and DS-4) after 1000 generationsfor observed probabilities 119875
119888 119875119898 = 085 001were plotted
and is shown in Figure 22 It is noted from the graph thatDQPGNSGA performs better than DQPGSGA on the ATCvalues of Top-119870 query plans generated by the two algorithmsfor each of the four data sets
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
0
015
03
045
06
075
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-3)
Figure 16
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
0
01
02
03
04
05
06
07
08
09
1
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-4)
Figure 17
It can be reasonably inferred from all the above graphsthat DQPGNSGA is able to generate Top-119870 query plans withlower ATC when compared to those generated byDQPGSGAThis may be attributed to acceptable tradeoffs achieved whilesimultaneously optimizing TPC and TCC which results inlower TC in case of DQPGNSGA
6 Conclusion
In this paper DQPGproblemgiven in [3] has been addressedwhere query plans are generated for a distributed relational
The Scientific World Journal 15
0
005
01
015
02
025
03
035
04
045
1 3 5 7 9 11 13 15 17 19
Cos
t
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-1)
Figure 18
0
01
02
03
04
05
06
07
Cos
t
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
Top-K query plans
(Dataset-2)
Figure 19
query that incurs minimum total query processing costGenetic algorithms have been used to generate these queryplansThe total query processing cost TC in [3] can be viewedas comprising broadly of TPC and TCC and thereforeminimizing TPC and TCC would result in minimizing TCThus in this paper the single-objective DQPG problemin [3] has been formulated and solved as a biobjectiveDQPG problem with the two objectives being minimizingTPC and minimizing TCC These objectives are minimizedsimultaneously using the multiobjective genetic algorithmNSGA-II
Experiments were performed and DQPGNSGA is com-pared with DQPGSGA given in [3] It was observed thatboth the algorithms individually gave good results for thecrossover and mutation probabilities 085 and 001 respec-tively The two algorithms were then compared on the ATC
0
002
004
006
008
01
012
014
016
Cos
t
1 3 5 7 9 11 13 15 17 19
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-3)
Figure 20
0
005
01
015
02
025
Cos
t
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-4)
Figure 21
values of the Top-119870 query plans generated by them for theobserved crossover and mutation probabilities The resultsshowed that DQPGNSGA performed better than DQPGSGAAlso the performance of the former was better when the twoalgorithmswere compared on theATCvalues of Top-119870 queryplansThe better performance of DQPGNSGA over DQPGSGAmay be attributed to DQPGNSGA achieving acceptable trade-offs between TPC and TCC while minimizing TPC and TCCof Top-119870 query plans simultaneously
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
16 The Scientific World Journal
Top-K query plans
0
01
02
03
04
05
06
07
Aver
age t
otal
cost
(ATC
)
SGA(DS-1)SGA(DS-2)SGA(DS-3)SGA(DS-4)
NSGA(DS-1)NSGA(DS-2)NSGA(DS-3)NSGA(DS-4)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(after 1000 generations)
Figure 22
References
[1] S Ceri and G Pelgatti Distributed Databases Principles ampSystems International Edition McGraw-Hill Computer ScienceSeries 1985
[2] A R Hevner and S B Yao ldquoQuery processing in distributeddatabase systemsrdquo IEEE Transactions on Software Engineeringvol 5 no 3 pp 177ndash187 1979
[3] T V Vijay Kumar and S Panicker ldquoGenerating query plansfor distributed query processing using genetic algorithmrdquo inInformation Computing and Applications vol 7030 of LectureNotes in Computer Science pp 765ndash772 2011
[4] K Deb A Pratap S Agarwal and T Meyarivan ldquoA fastand elitist multiobjective genetic algorithm NSGA-IIrdquo IEEETransactions on Evolutionary Computation vol 6 no 2 pp 182ndash197 2002
[5] P Black and W Luk ldquoA new heuristic for generating semi-joinprograms for distributed query processingrdquo in Proceedings ofthe IEEE 6th International Computer Software and ApplicationConference pp 581ndash588 IEEE Chicago Ill USA November1982
[6] P Bodorik and J S Riordon ldquoA threshold mechanism fordistributed query processingrdquo in Proceedings of the 16th AnnualConference on Computer Science pp 616ndash625 Atlanta GaUnited States February 1988
[7] J Chang ldquoA heuristic approach to distributed query process-ingrdquo in Proceedings of the 8th International Conference on VeryLarge Data Bases (VLDB rsquo82) pp 54ndash61 Saratoga Calif USA1982
[8] D Kossmann ldquoThe state of the art in distributed queryprocessingrdquo ACM Computing Surveys vol 32 no 4 pp 422ndash469 2000
[9] L Stiphane and W Eugene ldquoA state transition model fordistributed query processingrdquo ACM Transactions on DatabaseSystems vol 11 no 3 pp 249ndash322 1986
[10] T V Vijay Kumar V Singh and A K Verma ldquoDistributedquery processing plans generation using genetic algorithmrdquo
International Journal of Computer Theory and Engineering vol3 no 1 pp 38ndash45 2011
[11] C T Yu and C C Chang ldquoDistributed query processingrdquoACMComputing Surveys vol 16 no 4 pp 399ndash433 1984
[12] K Bennett M C Ferris and Y E Ioannidis ldquoA Geneticalgorithm for database query optimizationrdquo in Proceedings ofthe 4th International Conference onGenetic Algorithms pp 400ndash407 1991
[13] Y E Ioannidis and Y Cha Kang ldquoRandomized algorithms foroptimizing large join queriesrdquo in Proceedings of the 1990 ACMSIGMOD International Conference on Management of Data pp312ndash321 May 1990
[14] Y Ioannidis and E Wong ldquoQuery optimization by simulatedannealingrdquo in Proceedings of the ACM SIGMOD InternationalConference on Management of Data (SIGMOD rsquo87) pp 9ndash221987
[15] A A R Townsend Genetic Algorithms A Tutorial 2003[16] M Gregory Genetic Algorithm Optimization of Distributed
Database Queries IEEE 1998[17] M Mitchell An Introduction to Genetic Algorithms The MIT
Press Cambridge Mass USA 1999[18] J D Schaffer ldquoMultiple objective optimization with vector
evaluated genetic algorithmsrdquo inProceedings of the InternationalConference on Genetic Algorithm andTheir Applications pp 93ndash100 July 1985
[19] V Guliashki H Toshev andC Korsemov ldquoSurvey of evolution-ary algorithms used in multiobjective optimizationrdquo Problemsof EngineeringCybernetics andRobotics vol 60 pp 42ndash54 2009
[20] C M Fonseca and P J Fleming ldquoGenetic algorithms formultiobjective optimization formulation discussion and gen-eralizationrdquo inProceedings of the 5th International Conference onGenetic Algorithms S Forrest Ed pp 416ndash423 San FranciscoCalif USA 1993
[21] J Horn N Nafpliotis and D E Goldberg ldquoA niched Paretogenetic algorithm for multiobjective optimizationrdquo in Proceed-ings of the 1st IEEE Conference on Evolutionary ComputationIEEE World Congress on Computational Intelligence pp 82ndash87IEEE Orlando Fla USA June 1994
[22] N Srinivas and K Deb ldquoMultiobjective optimization usingnondominated sorting in genetic algorithmsrdquo EvolutionaryComputation vol 2 no 3 pp 221ndash248 1994
[23] E Zitzler and L Thiele ldquoMultiobjective evolutionary algo-rithms a comparative case study and the strength Paretoapproachrdquo IEEETransactions onEvolutionaryComputation vol3 no 4 pp 257ndash271 1999
[24] J D Knowles and D W Corne ldquoApproximating the non-dominated front using the Pareto archived evolution strategyrdquoEvolutionary computation vol 8 no 2 pp 149ndash172 2000
[25] D W Corne J D Knowles and M J Oates ldquoThe Paretoenvelope-based selection algorithm for multiobjective opti-mizationrdquo in Proceedings of the 6th International Conference onParallel Problem Solving from Nature Springer Paris FranceSeptember 2000
[26] J Scharnow K Tinnefeld and I Wegener ldquoThe analysis of evo-lutionary algorithms on sorting and shortest paths problemsrdquoJournal of Mathematical Modelling and Algorithms vol 3 no 4pp 349ndash366 2004
[27] B Doerr E Happ and C Klein ldquoCrossover can provablybe useful in evolutionary computationrdquo in Proceedings of the10th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo08) pp 539ndash546 ACM Press July 2008
The Scientific World Journal 17
[28] C Horoba ldquoAnalysis of a simple evolutionary algorithm for themultiobjective shortest path problemrdquo in Proceedings of the 10thACM SIGEVO Workshop on Foundations of Genetic Algorithms(FOGA rsquo09) pp 113ndash120 ACM Press New York NY USAJanuary 2009
[29] MTheile ldquoExact solutions to the traveling salesperson problemby a population-based evolutionary algorithmrdquo in Proceedingsof the 9th European Conference on Evolutionary Computation inCombinatorial Optimisation (EvoCOP rsquo09) vol 5482 of LectureNotes in Computer Science pp 145ndash155 Springer April 2009
[30] A V Eremeev On Linking Dynamic Programming and Multi-Objective Evolutionary Algorithms Omsk StateUniversity 2008
[31] A V Eremeev ldquoA fully polynomial randomized approximationscheme based on an evolutionary algorithmrdquo Diskretnyi Analizi Issledovanie Operatsii vol 17 no 4 pp 3ndash17 2010
[32] T Friedrich N Hebbinghaus F Neumann J He and CWitt ldquoApproximating covering problems by randomized searchheuristics using multi-objective modelsrdquo in Proceedings of the9th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo07) pp 797ndash804 ACM Press July 2007
[33] E Elbeltagi T Hegazy and D Grierson ldquoA modified shuffledfrog-leaping optimization algorithm applications to projectmanagementrdquo Structure and Infrastructure Engineering vol 3no 1 pp 53ndash60 2007
[34] M Dorigo E Bonabeau and GTheraulaz ldquoAnt algorithms andstigmergyrdquo Future Generation Computer Systems vol 16 no 8pp 851ndash871 2000
[35] D E Goldberg and K Deb ldquoA comparative analysis of selectionschemes used in genetic algorithmsrdquo in Foundations of GeneticAlgorithms pp 69ndash93 Morgan Kaufmann San Mateo CalifUSA 1991
[36] D E GoldbergGenetic Algorithms in Search Optimization andMachine Learning Addison-Wesley Reading Mass USA 1989
[37] H Dong and Y Liang ldquoGenetic algorithms for large join queryoptimizationrdquo in Proceedings of the 9th Annual Genetic andEvolutionary Computation Conference (GECCO rsquo07) pp 1211ndash1218 London UK July 2007
[38] I F Sbalzariniy M Sibylle and P Koumoutsakosyz ldquoMulti-objective optimization using evolutionary algorithmsrdquo Centerfor Turbulence Research Proceedings of the Summer Program2000
[39] E Zitzler and L Thiele ldquoMultiobjective optimization usingevolutionary algorithmsmdasha comparative case studyrdquo in ParallelProblem Solving from Nature pp 292ndash301 Springer Amster-dam The Netherlands 1998
[40] C A C Coello ldquoAn updated survey of GA-basedmultiobjectiveoptimization techniquesrdquo ACM Computing Surveys vol 32 no2 pp 137ndash143 2000
[41] D A Van Veldhuizen and G B Lamont ldquoMultiobjective evolu-tionary algorithms analyzing the state-of-the-artrdquo Evolutionarycomputation vol 8 no 2 pp 125ndash147 2000
[42] F Glover and S Hanafi ldquoTabu search and finite convergencerdquoDiscrete Applied Mathematics vol 119 no 1-2 pp 3ndash36 2002
[43] A C Nearchou ldquoThe effect of various operators on the geneticsearch for large scheduling problemsrdquo International Journal ofProduction Economics vol 88 no 2 pp 191ndash203 2004
[44] W W Chu and P Hurley ldquoOptimal query processing fordistributed database systemsrdquo IEEE Transactions on Computersvol 31 no 9 pp 835ndash850 1982
[45] M Serpell and J E Smith ldquoSelf-adaptation ofmutation operatorand probability for permutation representations in genetic
algorithmsrdquo Evolutionary Computation vol 18 no 3 pp 491ndash514 2010
[46] A Seshadri A Fast Elitist Multiobjective Genetic AlgorithmNSGA-II MATLAB Central 2006
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
14 The Scientific World Journal
0
05
1
15
2
25
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-1)
Figure 14
0
03
06
09
12
15
18
21
24
GenerationSGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-2)
Figure 15
an acceptable tradeoff between ATPC and ATCC which inturn leads to a comparatively lower ATC for the Top-119870 queryplans generated by it
Next a graph comparing the ATC values of Top-119870 queryplans generated by DQPGNSGA and DQPGSGA on all fourdatasets (DS-1 DS-2 DS-3 and DS-4) after 1000 generationsfor observed probabilities 119875
119888 119875119898 = 085 001were plotted
and is shown in Figure 22 It is noted from the graph thatDQPGNSGA performs better than DQPGSGA on the ATCvalues of Top-119870 query plans generated by the two algorithmsfor each of the four data sets
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
0
015
03
045
06
075
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-3)
Figure 16
SGA-top 5SGA-top 10SGA-top 15SGA-top 20
NSGA-top 5NSGA-top 10NSGA-top 15NSGA-top 20
Generation
1 38 75 112
149
186
223
260
297
334
371
408
445
482
519
556
593
630
667
704
741
778
815
852
889
926
963
1000
0
01
02
03
04
05
06
07
08
09
1
Aver
age t
otal
cost
(ATC
)
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(Dataset-4)
Figure 17
It can be reasonably inferred from all the above graphsthat DQPGNSGA is able to generate Top-119870 query plans withlower ATC when compared to those generated byDQPGSGAThis may be attributed to acceptable tradeoffs achieved whilesimultaneously optimizing TPC and TCC which results inlower TC in case of DQPGNSGA
6 Conclusion
In this paper DQPGproblemgiven in [3] has been addressedwhere query plans are generated for a distributed relational
The Scientific World Journal 15
0
005
01
015
02
025
03
035
04
045
1 3 5 7 9 11 13 15 17 19
Cos
t
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-1)
Figure 18
0
01
02
03
04
05
06
07
Cos
t
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
Top-K query plans
(Dataset-2)
Figure 19
query that incurs minimum total query processing costGenetic algorithms have been used to generate these queryplansThe total query processing cost TC in [3] can be viewedas comprising broadly of TPC and TCC and thereforeminimizing TPC and TCC would result in minimizing TCThus in this paper the single-objective DQPG problemin [3] has been formulated and solved as a biobjectiveDQPG problem with the two objectives being minimizingTPC and minimizing TCC These objectives are minimizedsimultaneously using the multiobjective genetic algorithmNSGA-II
Experiments were performed and DQPGNSGA is com-pared with DQPGSGA given in [3] It was observed thatboth the algorithms individually gave good results for thecrossover and mutation probabilities 085 and 001 respec-tively The two algorithms were then compared on the ATC
0
002
004
006
008
01
012
014
016
Cos
t
1 3 5 7 9 11 13 15 17 19
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-3)
Figure 20
0
005
01
015
02
025
Cos
t
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-4)
Figure 21
values of the Top-119870 query plans generated by them for theobserved crossover and mutation probabilities The resultsshowed that DQPGNSGA performed better than DQPGSGAAlso the performance of the former was better when the twoalgorithmswere compared on theATCvalues of Top-119870 queryplansThe better performance of DQPGNSGA over DQPGSGAmay be attributed to DQPGNSGA achieving acceptable trade-offs between TPC and TCC while minimizing TPC and TCCof Top-119870 query plans simultaneously
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
16 The Scientific World Journal
Top-K query plans
0
01
02
03
04
05
06
07
Aver
age t
otal
cost
(ATC
)
SGA(DS-1)SGA(DS-2)SGA(DS-3)SGA(DS-4)
NSGA(DS-1)NSGA(DS-2)NSGA(DS-3)NSGA(DS-4)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(after 1000 generations)
Figure 22
References
[1] S Ceri and G Pelgatti Distributed Databases Principles ampSystems International Edition McGraw-Hill Computer ScienceSeries 1985
[2] A R Hevner and S B Yao ldquoQuery processing in distributeddatabase systemsrdquo IEEE Transactions on Software Engineeringvol 5 no 3 pp 177ndash187 1979
[3] T V Vijay Kumar and S Panicker ldquoGenerating query plansfor distributed query processing using genetic algorithmrdquo inInformation Computing and Applications vol 7030 of LectureNotes in Computer Science pp 765ndash772 2011
[4] K Deb A Pratap S Agarwal and T Meyarivan ldquoA fastand elitist multiobjective genetic algorithm NSGA-IIrdquo IEEETransactions on Evolutionary Computation vol 6 no 2 pp 182ndash197 2002
[5] P Black and W Luk ldquoA new heuristic for generating semi-joinprograms for distributed query processingrdquo in Proceedings ofthe IEEE 6th International Computer Software and ApplicationConference pp 581ndash588 IEEE Chicago Ill USA November1982
[6] P Bodorik and J S Riordon ldquoA threshold mechanism fordistributed query processingrdquo in Proceedings of the 16th AnnualConference on Computer Science pp 616ndash625 Atlanta GaUnited States February 1988
[7] J Chang ldquoA heuristic approach to distributed query process-ingrdquo in Proceedings of the 8th International Conference on VeryLarge Data Bases (VLDB rsquo82) pp 54ndash61 Saratoga Calif USA1982
[8] D Kossmann ldquoThe state of the art in distributed queryprocessingrdquo ACM Computing Surveys vol 32 no 4 pp 422ndash469 2000
[9] L Stiphane and W Eugene ldquoA state transition model fordistributed query processingrdquo ACM Transactions on DatabaseSystems vol 11 no 3 pp 249ndash322 1986
[10] T V Vijay Kumar V Singh and A K Verma ldquoDistributedquery processing plans generation using genetic algorithmrdquo
International Journal of Computer Theory and Engineering vol3 no 1 pp 38ndash45 2011
[11] C T Yu and C C Chang ldquoDistributed query processingrdquoACMComputing Surveys vol 16 no 4 pp 399ndash433 1984
[12] K Bennett M C Ferris and Y E Ioannidis ldquoA Geneticalgorithm for database query optimizationrdquo in Proceedings ofthe 4th International Conference onGenetic Algorithms pp 400ndash407 1991
[13] Y E Ioannidis and Y Cha Kang ldquoRandomized algorithms foroptimizing large join queriesrdquo in Proceedings of the 1990 ACMSIGMOD International Conference on Management of Data pp312ndash321 May 1990
[14] Y Ioannidis and E Wong ldquoQuery optimization by simulatedannealingrdquo in Proceedings of the ACM SIGMOD InternationalConference on Management of Data (SIGMOD rsquo87) pp 9ndash221987
[15] A A R Townsend Genetic Algorithms A Tutorial 2003[16] M Gregory Genetic Algorithm Optimization of Distributed
Database Queries IEEE 1998[17] M Mitchell An Introduction to Genetic Algorithms The MIT
Press Cambridge Mass USA 1999[18] J D Schaffer ldquoMultiple objective optimization with vector
evaluated genetic algorithmsrdquo inProceedings of the InternationalConference on Genetic Algorithm andTheir Applications pp 93ndash100 July 1985
[19] V Guliashki H Toshev andC Korsemov ldquoSurvey of evolution-ary algorithms used in multiobjective optimizationrdquo Problemsof EngineeringCybernetics andRobotics vol 60 pp 42ndash54 2009
[20] C M Fonseca and P J Fleming ldquoGenetic algorithms formultiobjective optimization formulation discussion and gen-eralizationrdquo inProceedings of the 5th International Conference onGenetic Algorithms S Forrest Ed pp 416ndash423 San FranciscoCalif USA 1993
[21] J Horn N Nafpliotis and D E Goldberg ldquoA niched Paretogenetic algorithm for multiobjective optimizationrdquo in Proceed-ings of the 1st IEEE Conference on Evolutionary ComputationIEEE World Congress on Computational Intelligence pp 82ndash87IEEE Orlando Fla USA June 1994
[22] N Srinivas and K Deb ldquoMultiobjective optimization usingnondominated sorting in genetic algorithmsrdquo EvolutionaryComputation vol 2 no 3 pp 221ndash248 1994
[23] E Zitzler and L Thiele ldquoMultiobjective evolutionary algo-rithms a comparative case study and the strength Paretoapproachrdquo IEEETransactions onEvolutionaryComputation vol3 no 4 pp 257ndash271 1999
[24] J D Knowles and D W Corne ldquoApproximating the non-dominated front using the Pareto archived evolution strategyrdquoEvolutionary computation vol 8 no 2 pp 149ndash172 2000
[25] D W Corne J D Knowles and M J Oates ldquoThe Paretoenvelope-based selection algorithm for multiobjective opti-mizationrdquo in Proceedings of the 6th International Conference onParallel Problem Solving from Nature Springer Paris FranceSeptember 2000
[26] J Scharnow K Tinnefeld and I Wegener ldquoThe analysis of evo-lutionary algorithms on sorting and shortest paths problemsrdquoJournal of Mathematical Modelling and Algorithms vol 3 no 4pp 349ndash366 2004
[27] B Doerr E Happ and C Klein ldquoCrossover can provablybe useful in evolutionary computationrdquo in Proceedings of the10th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo08) pp 539ndash546 ACM Press July 2008
The Scientific World Journal 17
[28] C Horoba ldquoAnalysis of a simple evolutionary algorithm for themultiobjective shortest path problemrdquo in Proceedings of the 10thACM SIGEVO Workshop on Foundations of Genetic Algorithms(FOGA rsquo09) pp 113ndash120 ACM Press New York NY USAJanuary 2009
[29] MTheile ldquoExact solutions to the traveling salesperson problemby a population-based evolutionary algorithmrdquo in Proceedingsof the 9th European Conference on Evolutionary Computation inCombinatorial Optimisation (EvoCOP rsquo09) vol 5482 of LectureNotes in Computer Science pp 145ndash155 Springer April 2009
[30] A V Eremeev On Linking Dynamic Programming and Multi-Objective Evolutionary Algorithms Omsk StateUniversity 2008
[31] A V Eremeev ldquoA fully polynomial randomized approximationscheme based on an evolutionary algorithmrdquo Diskretnyi Analizi Issledovanie Operatsii vol 17 no 4 pp 3ndash17 2010
[32] T Friedrich N Hebbinghaus F Neumann J He and CWitt ldquoApproximating covering problems by randomized searchheuristics using multi-objective modelsrdquo in Proceedings of the9th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo07) pp 797ndash804 ACM Press July 2007
[33] E Elbeltagi T Hegazy and D Grierson ldquoA modified shuffledfrog-leaping optimization algorithm applications to projectmanagementrdquo Structure and Infrastructure Engineering vol 3no 1 pp 53ndash60 2007
[34] M Dorigo E Bonabeau and GTheraulaz ldquoAnt algorithms andstigmergyrdquo Future Generation Computer Systems vol 16 no 8pp 851ndash871 2000
[35] D E Goldberg and K Deb ldquoA comparative analysis of selectionschemes used in genetic algorithmsrdquo in Foundations of GeneticAlgorithms pp 69ndash93 Morgan Kaufmann San Mateo CalifUSA 1991
[36] D E GoldbergGenetic Algorithms in Search Optimization andMachine Learning Addison-Wesley Reading Mass USA 1989
[37] H Dong and Y Liang ldquoGenetic algorithms for large join queryoptimizationrdquo in Proceedings of the 9th Annual Genetic andEvolutionary Computation Conference (GECCO rsquo07) pp 1211ndash1218 London UK July 2007
[38] I F Sbalzariniy M Sibylle and P Koumoutsakosyz ldquoMulti-objective optimization using evolutionary algorithmsrdquo Centerfor Turbulence Research Proceedings of the Summer Program2000
[39] E Zitzler and L Thiele ldquoMultiobjective optimization usingevolutionary algorithmsmdasha comparative case studyrdquo in ParallelProblem Solving from Nature pp 292ndash301 Springer Amster-dam The Netherlands 1998
[40] C A C Coello ldquoAn updated survey of GA-basedmultiobjectiveoptimization techniquesrdquo ACM Computing Surveys vol 32 no2 pp 137ndash143 2000
[41] D A Van Veldhuizen and G B Lamont ldquoMultiobjective evolu-tionary algorithms analyzing the state-of-the-artrdquo Evolutionarycomputation vol 8 no 2 pp 125ndash147 2000
[42] F Glover and S Hanafi ldquoTabu search and finite convergencerdquoDiscrete Applied Mathematics vol 119 no 1-2 pp 3ndash36 2002
[43] A C Nearchou ldquoThe effect of various operators on the geneticsearch for large scheduling problemsrdquo International Journal ofProduction Economics vol 88 no 2 pp 191ndash203 2004
[44] W W Chu and P Hurley ldquoOptimal query processing fordistributed database systemsrdquo IEEE Transactions on Computersvol 31 no 9 pp 835ndash850 1982
[45] M Serpell and J E Smith ldquoSelf-adaptation ofmutation operatorand probability for permutation representations in genetic
algorithmsrdquo Evolutionary Computation vol 18 no 3 pp 491ndash514 2010
[46] A Seshadri A Fast Elitist Multiobjective Genetic AlgorithmNSGA-II MATLAB Central 2006
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World Journal 15
0
005
01
015
02
025
03
035
04
045
1 3 5 7 9 11 13 15 17 19
Cos
t
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-1)
Figure 18
0
01
02
03
04
05
06
07
Cos
t
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
Top-K query plans
(Dataset-2)
Figure 19
query that incurs minimum total query processing costGenetic algorithms have been used to generate these queryplansThe total query processing cost TC in [3] can be viewedas comprising broadly of TPC and TCC and thereforeminimizing TPC and TCC would result in minimizing TCThus in this paper the single-objective DQPG problemin [3] has been formulated and solved as a biobjectiveDQPG problem with the two objectives being minimizingTPC and minimizing TCC These objectives are minimizedsimultaneously using the multiobjective genetic algorithmNSGA-II
Experiments were performed and DQPGNSGA is com-pared with DQPGSGA given in [3] It was observed thatboth the algorithms individually gave good results for thecrossover and mutation probabilities 085 and 001 respec-tively The two algorithms were then compared on the ATC
0
002
004
006
008
01
012
014
016
Cos
t
1 3 5 7 9 11 13 15 17 19
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-3)
Figure 20
0
005
01
015
02
025
Cos
t
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
ATC-SGAATCC-NSGA-II
ATPC-NSGA-IIATC-NSGA-II
DQPGNSGA versus DQPGSGAPc Pm = 085 001 (after 1000 generations)
Top-K query plans
(Dataset-4)
Figure 21
values of the Top-119870 query plans generated by them for theobserved crossover and mutation probabilities The resultsshowed that DQPGNSGA performed better than DQPGSGAAlso the performance of the former was better when the twoalgorithmswere compared on theATCvalues of Top-119870 queryplansThe better performance of DQPGNSGA over DQPGSGAmay be attributed to DQPGNSGA achieving acceptable trade-offs between TPC and TCC while minimizing TPC and TCCof Top-119870 query plans simultaneously
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
16 The Scientific World Journal
Top-K query plans
0
01
02
03
04
05
06
07
Aver
age t
otal
cost
(ATC
)
SGA(DS-1)SGA(DS-2)SGA(DS-3)SGA(DS-4)
NSGA(DS-1)NSGA(DS-2)NSGA(DS-3)NSGA(DS-4)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(after 1000 generations)
Figure 22
References
[1] S Ceri and G Pelgatti Distributed Databases Principles ampSystems International Edition McGraw-Hill Computer ScienceSeries 1985
[2] A R Hevner and S B Yao ldquoQuery processing in distributeddatabase systemsrdquo IEEE Transactions on Software Engineeringvol 5 no 3 pp 177ndash187 1979
[3] T V Vijay Kumar and S Panicker ldquoGenerating query plansfor distributed query processing using genetic algorithmrdquo inInformation Computing and Applications vol 7030 of LectureNotes in Computer Science pp 765ndash772 2011
[4] K Deb A Pratap S Agarwal and T Meyarivan ldquoA fastand elitist multiobjective genetic algorithm NSGA-IIrdquo IEEETransactions on Evolutionary Computation vol 6 no 2 pp 182ndash197 2002
[5] P Black and W Luk ldquoA new heuristic for generating semi-joinprograms for distributed query processingrdquo in Proceedings ofthe IEEE 6th International Computer Software and ApplicationConference pp 581ndash588 IEEE Chicago Ill USA November1982
[6] P Bodorik and J S Riordon ldquoA threshold mechanism fordistributed query processingrdquo in Proceedings of the 16th AnnualConference on Computer Science pp 616ndash625 Atlanta GaUnited States February 1988
[7] J Chang ldquoA heuristic approach to distributed query process-ingrdquo in Proceedings of the 8th International Conference on VeryLarge Data Bases (VLDB rsquo82) pp 54ndash61 Saratoga Calif USA1982
[8] D Kossmann ldquoThe state of the art in distributed queryprocessingrdquo ACM Computing Surveys vol 32 no 4 pp 422ndash469 2000
[9] L Stiphane and W Eugene ldquoA state transition model fordistributed query processingrdquo ACM Transactions on DatabaseSystems vol 11 no 3 pp 249ndash322 1986
[10] T V Vijay Kumar V Singh and A K Verma ldquoDistributedquery processing plans generation using genetic algorithmrdquo
International Journal of Computer Theory and Engineering vol3 no 1 pp 38ndash45 2011
[11] C T Yu and C C Chang ldquoDistributed query processingrdquoACMComputing Surveys vol 16 no 4 pp 399ndash433 1984
[12] K Bennett M C Ferris and Y E Ioannidis ldquoA Geneticalgorithm for database query optimizationrdquo in Proceedings ofthe 4th International Conference onGenetic Algorithms pp 400ndash407 1991
[13] Y E Ioannidis and Y Cha Kang ldquoRandomized algorithms foroptimizing large join queriesrdquo in Proceedings of the 1990 ACMSIGMOD International Conference on Management of Data pp312ndash321 May 1990
[14] Y Ioannidis and E Wong ldquoQuery optimization by simulatedannealingrdquo in Proceedings of the ACM SIGMOD InternationalConference on Management of Data (SIGMOD rsquo87) pp 9ndash221987
[15] A A R Townsend Genetic Algorithms A Tutorial 2003[16] M Gregory Genetic Algorithm Optimization of Distributed
Database Queries IEEE 1998[17] M Mitchell An Introduction to Genetic Algorithms The MIT
Press Cambridge Mass USA 1999[18] J D Schaffer ldquoMultiple objective optimization with vector
evaluated genetic algorithmsrdquo inProceedings of the InternationalConference on Genetic Algorithm andTheir Applications pp 93ndash100 July 1985
[19] V Guliashki H Toshev andC Korsemov ldquoSurvey of evolution-ary algorithms used in multiobjective optimizationrdquo Problemsof EngineeringCybernetics andRobotics vol 60 pp 42ndash54 2009
[20] C M Fonseca and P J Fleming ldquoGenetic algorithms formultiobjective optimization formulation discussion and gen-eralizationrdquo inProceedings of the 5th International Conference onGenetic Algorithms S Forrest Ed pp 416ndash423 San FranciscoCalif USA 1993
[21] J Horn N Nafpliotis and D E Goldberg ldquoA niched Paretogenetic algorithm for multiobjective optimizationrdquo in Proceed-ings of the 1st IEEE Conference on Evolutionary ComputationIEEE World Congress on Computational Intelligence pp 82ndash87IEEE Orlando Fla USA June 1994
[22] N Srinivas and K Deb ldquoMultiobjective optimization usingnondominated sorting in genetic algorithmsrdquo EvolutionaryComputation vol 2 no 3 pp 221ndash248 1994
[23] E Zitzler and L Thiele ldquoMultiobjective evolutionary algo-rithms a comparative case study and the strength Paretoapproachrdquo IEEETransactions onEvolutionaryComputation vol3 no 4 pp 257ndash271 1999
[24] J D Knowles and D W Corne ldquoApproximating the non-dominated front using the Pareto archived evolution strategyrdquoEvolutionary computation vol 8 no 2 pp 149ndash172 2000
[25] D W Corne J D Knowles and M J Oates ldquoThe Paretoenvelope-based selection algorithm for multiobjective opti-mizationrdquo in Proceedings of the 6th International Conference onParallel Problem Solving from Nature Springer Paris FranceSeptember 2000
[26] J Scharnow K Tinnefeld and I Wegener ldquoThe analysis of evo-lutionary algorithms on sorting and shortest paths problemsrdquoJournal of Mathematical Modelling and Algorithms vol 3 no 4pp 349ndash366 2004
[27] B Doerr E Happ and C Klein ldquoCrossover can provablybe useful in evolutionary computationrdquo in Proceedings of the10th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo08) pp 539ndash546 ACM Press July 2008
The Scientific World Journal 17
[28] C Horoba ldquoAnalysis of a simple evolutionary algorithm for themultiobjective shortest path problemrdquo in Proceedings of the 10thACM SIGEVO Workshop on Foundations of Genetic Algorithms(FOGA rsquo09) pp 113ndash120 ACM Press New York NY USAJanuary 2009
[29] MTheile ldquoExact solutions to the traveling salesperson problemby a population-based evolutionary algorithmrdquo in Proceedingsof the 9th European Conference on Evolutionary Computation inCombinatorial Optimisation (EvoCOP rsquo09) vol 5482 of LectureNotes in Computer Science pp 145ndash155 Springer April 2009
[30] A V Eremeev On Linking Dynamic Programming and Multi-Objective Evolutionary Algorithms Omsk StateUniversity 2008
[31] A V Eremeev ldquoA fully polynomial randomized approximationscheme based on an evolutionary algorithmrdquo Diskretnyi Analizi Issledovanie Operatsii vol 17 no 4 pp 3ndash17 2010
[32] T Friedrich N Hebbinghaus F Neumann J He and CWitt ldquoApproximating covering problems by randomized searchheuristics using multi-objective modelsrdquo in Proceedings of the9th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo07) pp 797ndash804 ACM Press July 2007
[33] E Elbeltagi T Hegazy and D Grierson ldquoA modified shuffledfrog-leaping optimization algorithm applications to projectmanagementrdquo Structure and Infrastructure Engineering vol 3no 1 pp 53ndash60 2007
[34] M Dorigo E Bonabeau and GTheraulaz ldquoAnt algorithms andstigmergyrdquo Future Generation Computer Systems vol 16 no 8pp 851ndash871 2000
[35] D E Goldberg and K Deb ldquoA comparative analysis of selectionschemes used in genetic algorithmsrdquo in Foundations of GeneticAlgorithms pp 69ndash93 Morgan Kaufmann San Mateo CalifUSA 1991
[36] D E GoldbergGenetic Algorithms in Search Optimization andMachine Learning Addison-Wesley Reading Mass USA 1989
[37] H Dong and Y Liang ldquoGenetic algorithms for large join queryoptimizationrdquo in Proceedings of the 9th Annual Genetic andEvolutionary Computation Conference (GECCO rsquo07) pp 1211ndash1218 London UK July 2007
[38] I F Sbalzariniy M Sibylle and P Koumoutsakosyz ldquoMulti-objective optimization using evolutionary algorithmsrdquo Centerfor Turbulence Research Proceedings of the Summer Program2000
[39] E Zitzler and L Thiele ldquoMultiobjective optimization usingevolutionary algorithmsmdasha comparative case studyrdquo in ParallelProblem Solving from Nature pp 292ndash301 Springer Amster-dam The Netherlands 1998
[40] C A C Coello ldquoAn updated survey of GA-basedmultiobjectiveoptimization techniquesrdquo ACM Computing Surveys vol 32 no2 pp 137ndash143 2000
[41] D A Van Veldhuizen and G B Lamont ldquoMultiobjective evolu-tionary algorithms analyzing the state-of-the-artrdquo Evolutionarycomputation vol 8 no 2 pp 125ndash147 2000
[42] F Glover and S Hanafi ldquoTabu search and finite convergencerdquoDiscrete Applied Mathematics vol 119 no 1-2 pp 3ndash36 2002
[43] A C Nearchou ldquoThe effect of various operators on the geneticsearch for large scheduling problemsrdquo International Journal ofProduction Economics vol 88 no 2 pp 191ndash203 2004
[44] W W Chu and P Hurley ldquoOptimal query processing fordistributed database systemsrdquo IEEE Transactions on Computersvol 31 no 9 pp 835ndash850 1982
[45] M Serpell and J E Smith ldquoSelf-adaptation ofmutation operatorand probability for permutation representations in genetic
algorithmsrdquo Evolutionary Computation vol 18 no 3 pp 491ndash514 2010
[46] A Seshadri A Fast Elitist Multiobjective Genetic AlgorithmNSGA-II MATLAB Central 2006
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
16 The Scientific World Journal
Top-K query plans
0
01
02
03
04
05
06
07
Aver
age t
otal
cost
(ATC
)
SGA(DS-1)SGA(DS-2)SGA(DS-3)SGA(DS-4)
NSGA(DS-1)NSGA(DS-2)NSGA(DS-3)NSGA(DS-4)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
DQPGNSGA versus DQPGSGAPc Pm = 085 001
(after 1000 generations)
Figure 22
References
[1] S Ceri and G Pelgatti Distributed Databases Principles ampSystems International Edition McGraw-Hill Computer ScienceSeries 1985
[2] A R Hevner and S B Yao ldquoQuery processing in distributeddatabase systemsrdquo IEEE Transactions on Software Engineeringvol 5 no 3 pp 177ndash187 1979
[3] T V Vijay Kumar and S Panicker ldquoGenerating query plansfor distributed query processing using genetic algorithmrdquo inInformation Computing and Applications vol 7030 of LectureNotes in Computer Science pp 765ndash772 2011
[4] K Deb A Pratap S Agarwal and T Meyarivan ldquoA fastand elitist multiobjective genetic algorithm NSGA-IIrdquo IEEETransactions on Evolutionary Computation vol 6 no 2 pp 182ndash197 2002
[5] P Black and W Luk ldquoA new heuristic for generating semi-joinprograms for distributed query processingrdquo in Proceedings ofthe IEEE 6th International Computer Software and ApplicationConference pp 581ndash588 IEEE Chicago Ill USA November1982
[6] P Bodorik and J S Riordon ldquoA threshold mechanism fordistributed query processingrdquo in Proceedings of the 16th AnnualConference on Computer Science pp 616ndash625 Atlanta GaUnited States February 1988
[7] J Chang ldquoA heuristic approach to distributed query process-ingrdquo in Proceedings of the 8th International Conference on VeryLarge Data Bases (VLDB rsquo82) pp 54ndash61 Saratoga Calif USA1982
[8] D Kossmann ldquoThe state of the art in distributed queryprocessingrdquo ACM Computing Surveys vol 32 no 4 pp 422ndash469 2000
[9] L Stiphane and W Eugene ldquoA state transition model fordistributed query processingrdquo ACM Transactions on DatabaseSystems vol 11 no 3 pp 249ndash322 1986
[10] T V Vijay Kumar V Singh and A K Verma ldquoDistributedquery processing plans generation using genetic algorithmrdquo
International Journal of Computer Theory and Engineering vol3 no 1 pp 38ndash45 2011
[11] C T Yu and C C Chang ldquoDistributed query processingrdquoACMComputing Surveys vol 16 no 4 pp 399ndash433 1984
[12] K Bennett M C Ferris and Y E Ioannidis ldquoA Geneticalgorithm for database query optimizationrdquo in Proceedings ofthe 4th International Conference onGenetic Algorithms pp 400ndash407 1991
[13] Y E Ioannidis and Y Cha Kang ldquoRandomized algorithms foroptimizing large join queriesrdquo in Proceedings of the 1990 ACMSIGMOD International Conference on Management of Data pp312ndash321 May 1990
[14] Y Ioannidis and E Wong ldquoQuery optimization by simulatedannealingrdquo in Proceedings of the ACM SIGMOD InternationalConference on Management of Data (SIGMOD rsquo87) pp 9ndash221987
[15] A A R Townsend Genetic Algorithms A Tutorial 2003[16] M Gregory Genetic Algorithm Optimization of Distributed
Database Queries IEEE 1998[17] M Mitchell An Introduction to Genetic Algorithms The MIT
Press Cambridge Mass USA 1999[18] J D Schaffer ldquoMultiple objective optimization with vector
evaluated genetic algorithmsrdquo inProceedings of the InternationalConference on Genetic Algorithm andTheir Applications pp 93ndash100 July 1985
[19] V Guliashki H Toshev andC Korsemov ldquoSurvey of evolution-ary algorithms used in multiobjective optimizationrdquo Problemsof EngineeringCybernetics andRobotics vol 60 pp 42ndash54 2009
[20] C M Fonseca and P J Fleming ldquoGenetic algorithms formultiobjective optimization formulation discussion and gen-eralizationrdquo inProceedings of the 5th International Conference onGenetic Algorithms S Forrest Ed pp 416ndash423 San FranciscoCalif USA 1993
[21] J Horn N Nafpliotis and D E Goldberg ldquoA niched Paretogenetic algorithm for multiobjective optimizationrdquo in Proceed-ings of the 1st IEEE Conference on Evolutionary ComputationIEEE World Congress on Computational Intelligence pp 82ndash87IEEE Orlando Fla USA June 1994
[22] N Srinivas and K Deb ldquoMultiobjective optimization usingnondominated sorting in genetic algorithmsrdquo EvolutionaryComputation vol 2 no 3 pp 221ndash248 1994
[23] E Zitzler and L Thiele ldquoMultiobjective evolutionary algo-rithms a comparative case study and the strength Paretoapproachrdquo IEEETransactions onEvolutionaryComputation vol3 no 4 pp 257ndash271 1999
[24] J D Knowles and D W Corne ldquoApproximating the non-dominated front using the Pareto archived evolution strategyrdquoEvolutionary computation vol 8 no 2 pp 149ndash172 2000
[25] D W Corne J D Knowles and M J Oates ldquoThe Paretoenvelope-based selection algorithm for multiobjective opti-mizationrdquo in Proceedings of the 6th International Conference onParallel Problem Solving from Nature Springer Paris FranceSeptember 2000
[26] J Scharnow K Tinnefeld and I Wegener ldquoThe analysis of evo-lutionary algorithms on sorting and shortest paths problemsrdquoJournal of Mathematical Modelling and Algorithms vol 3 no 4pp 349ndash366 2004
[27] B Doerr E Happ and C Klein ldquoCrossover can provablybe useful in evolutionary computationrdquo in Proceedings of the10th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo08) pp 539ndash546 ACM Press July 2008
The Scientific World Journal 17
[28] C Horoba ldquoAnalysis of a simple evolutionary algorithm for themultiobjective shortest path problemrdquo in Proceedings of the 10thACM SIGEVO Workshop on Foundations of Genetic Algorithms(FOGA rsquo09) pp 113ndash120 ACM Press New York NY USAJanuary 2009
[29] MTheile ldquoExact solutions to the traveling salesperson problemby a population-based evolutionary algorithmrdquo in Proceedingsof the 9th European Conference on Evolutionary Computation inCombinatorial Optimisation (EvoCOP rsquo09) vol 5482 of LectureNotes in Computer Science pp 145ndash155 Springer April 2009
[30] A V Eremeev On Linking Dynamic Programming and Multi-Objective Evolutionary Algorithms Omsk StateUniversity 2008
[31] A V Eremeev ldquoA fully polynomial randomized approximationscheme based on an evolutionary algorithmrdquo Diskretnyi Analizi Issledovanie Operatsii vol 17 no 4 pp 3ndash17 2010
[32] T Friedrich N Hebbinghaus F Neumann J He and CWitt ldquoApproximating covering problems by randomized searchheuristics using multi-objective modelsrdquo in Proceedings of the9th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo07) pp 797ndash804 ACM Press July 2007
[33] E Elbeltagi T Hegazy and D Grierson ldquoA modified shuffledfrog-leaping optimization algorithm applications to projectmanagementrdquo Structure and Infrastructure Engineering vol 3no 1 pp 53ndash60 2007
[34] M Dorigo E Bonabeau and GTheraulaz ldquoAnt algorithms andstigmergyrdquo Future Generation Computer Systems vol 16 no 8pp 851ndash871 2000
[35] D E Goldberg and K Deb ldquoA comparative analysis of selectionschemes used in genetic algorithmsrdquo in Foundations of GeneticAlgorithms pp 69ndash93 Morgan Kaufmann San Mateo CalifUSA 1991
[36] D E GoldbergGenetic Algorithms in Search Optimization andMachine Learning Addison-Wesley Reading Mass USA 1989
[37] H Dong and Y Liang ldquoGenetic algorithms for large join queryoptimizationrdquo in Proceedings of the 9th Annual Genetic andEvolutionary Computation Conference (GECCO rsquo07) pp 1211ndash1218 London UK July 2007
[38] I F Sbalzariniy M Sibylle and P Koumoutsakosyz ldquoMulti-objective optimization using evolutionary algorithmsrdquo Centerfor Turbulence Research Proceedings of the Summer Program2000
[39] E Zitzler and L Thiele ldquoMultiobjective optimization usingevolutionary algorithmsmdasha comparative case studyrdquo in ParallelProblem Solving from Nature pp 292ndash301 Springer Amster-dam The Netherlands 1998
[40] C A C Coello ldquoAn updated survey of GA-basedmultiobjectiveoptimization techniquesrdquo ACM Computing Surveys vol 32 no2 pp 137ndash143 2000
[41] D A Van Veldhuizen and G B Lamont ldquoMultiobjective evolu-tionary algorithms analyzing the state-of-the-artrdquo Evolutionarycomputation vol 8 no 2 pp 125ndash147 2000
[42] F Glover and S Hanafi ldquoTabu search and finite convergencerdquoDiscrete Applied Mathematics vol 119 no 1-2 pp 3ndash36 2002
[43] A C Nearchou ldquoThe effect of various operators on the geneticsearch for large scheduling problemsrdquo International Journal ofProduction Economics vol 88 no 2 pp 191ndash203 2004
[44] W W Chu and P Hurley ldquoOptimal query processing fordistributed database systemsrdquo IEEE Transactions on Computersvol 31 no 9 pp 835ndash850 1982
[45] M Serpell and J E Smith ldquoSelf-adaptation ofmutation operatorand probability for permutation representations in genetic
algorithmsrdquo Evolutionary Computation vol 18 no 3 pp 491ndash514 2010
[46] A Seshadri A Fast Elitist Multiobjective Genetic AlgorithmNSGA-II MATLAB Central 2006
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World Journal 17
[28] C Horoba ldquoAnalysis of a simple evolutionary algorithm for themultiobjective shortest path problemrdquo in Proceedings of the 10thACM SIGEVO Workshop on Foundations of Genetic Algorithms(FOGA rsquo09) pp 113ndash120 ACM Press New York NY USAJanuary 2009
[29] MTheile ldquoExact solutions to the traveling salesperson problemby a population-based evolutionary algorithmrdquo in Proceedingsof the 9th European Conference on Evolutionary Computation inCombinatorial Optimisation (EvoCOP rsquo09) vol 5482 of LectureNotes in Computer Science pp 145ndash155 Springer April 2009
[30] A V Eremeev On Linking Dynamic Programming and Multi-Objective Evolutionary Algorithms Omsk StateUniversity 2008
[31] A V Eremeev ldquoA fully polynomial randomized approximationscheme based on an evolutionary algorithmrdquo Diskretnyi Analizi Issledovanie Operatsii vol 17 no 4 pp 3ndash17 2010
[32] T Friedrich N Hebbinghaus F Neumann J He and CWitt ldquoApproximating covering problems by randomized searchheuristics using multi-objective modelsrdquo in Proceedings of the9th Annual Genetic and Evolutionary Computation Conference(GECCO rsquo07) pp 797ndash804 ACM Press July 2007
[33] E Elbeltagi T Hegazy and D Grierson ldquoA modified shuffledfrog-leaping optimization algorithm applications to projectmanagementrdquo Structure and Infrastructure Engineering vol 3no 1 pp 53ndash60 2007
[34] M Dorigo E Bonabeau and GTheraulaz ldquoAnt algorithms andstigmergyrdquo Future Generation Computer Systems vol 16 no 8pp 851ndash871 2000
[35] D E Goldberg and K Deb ldquoA comparative analysis of selectionschemes used in genetic algorithmsrdquo in Foundations of GeneticAlgorithms pp 69ndash93 Morgan Kaufmann San Mateo CalifUSA 1991
[36] D E GoldbergGenetic Algorithms in Search Optimization andMachine Learning Addison-Wesley Reading Mass USA 1989
[37] H Dong and Y Liang ldquoGenetic algorithms for large join queryoptimizationrdquo in Proceedings of the 9th Annual Genetic andEvolutionary Computation Conference (GECCO rsquo07) pp 1211ndash1218 London UK July 2007
[38] I F Sbalzariniy M Sibylle and P Koumoutsakosyz ldquoMulti-objective optimization using evolutionary algorithmsrdquo Centerfor Turbulence Research Proceedings of the Summer Program2000
[39] E Zitzler and L Thiele ldquoMultiobjective optimization usingevolutionary algorithmsmdasha comparative case studyrdquo in ParallelProblem Solving from Nature pp 292ndash301 Springer Amster-dam The Netherlands 1998
[40] C A C Coello ldquoAn updated survey of GA-basedmultiobjectiveoptimization techniquesrdquo ACM Computing Surveys vol 32 no2 pp 137ndash143 2000
[41] D A Van Veldhuizen and G B Lamont ldquoMultiobjective evolu-tionary algorithms analyzing the state-of-the-artrdquo Evolutionarycomputation vol 8 no 2 pp 125ndash147 2000
[42] F Glover and S Hanafi ldquoTabu search and finite convergencerdquoDiscrete Applied Mathematics vol 119 no 1-2 pp 3ndash36 2002
[43] A C Nearchou ldquoThe effect of various operators on the geneticsearch for large scheduling problemsrdquo International Journal ofProduction Economics vol 88 no 2 pp 191ndash203 2004
[44] W W Chu and P Hurley ldquoOptimal query processing fordistributed database systemsrdquo IEEE Transactions on Computersvol 31 no 9 pp 835ndash850 1982
[45] M Serpell and J E Smith ldquoSelf-adaptation ofmutation operatorand probability for permutation representations in genetic
algorithmsrdquo Evolutionary Computation vol 18 no 3 pp 491ndash514 2010
[46] A Seshadri A Fast Elitist Multiobjective Genetic AlgorithmNSGA-II MATLAB Central 2006
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014