Query Optimization in Object Databases
Georges GARDARIN
Laboratoire PRiSM/UVSQ

1. Introduction
Object models provide ADTs, inheritance, complex structures, relationships and object identity
Query optimizers transform queries into query plans composed of low-level operations that are evaluated on the object collections
New techniques are required for supporting the object-oriented features

Outline
Object Query Languages
Complex Object Algebra
Operator Algorithms
Query Plan Transformations
Cost Models
Search Strategies
Open Problems

Overview
Presentation of the various topics of query processing in OODBMSs
Topics are not independent
    Operators depend on data structures (indexes)
    Search strategies depend on the cost model
An optimizer has to consider all aspects
    It is a complex piece of software
    It has to be extensible for additional features: data types, access methods, operators

Vocabulary
Collection: a set, list, array or bag of objects
Query: a user query in a high-level language
Predicate: a term of a query criterion
Qualification: a logical expression of predicates
Operator: a low-level accessor to one or several collections
Annotation: the algorithm selected for executing an operator
Query plan: a program of annotated operators
Cluster: a group of related objects stored together in a bucket
Index: an accelerator by value of an attribute
Path index: an accelerator by values along a path

2. Object Query Languages
Extension of SQL with:
    user-defined functions in predicates and results
    user-defined comparison predicates
    path expressions to traverse relationships
    flattening, grouping and degrouping operators
    automatic scan of inheritance hierarchies
Two "standards" are under construction:
    The object standard of ODMG (OQL)
    The object-relational standard of ISO/ANSI (SQL3)

Database Example (1)
[Figure: schema graph of three classes. Vehicle (Number, Color, Maker), Company (Name, City, President), Employee (Ssn, Name, BirthDate); Maker references Company and President references Employee. Attribute types shown are String, Int and Float.]

Query Example
Object identity:
    SELECT E.Name, C.Name
    FROM Employee E, Company C
    WHERE C.President == E
Paths and method:
    SELECT Number
    FROM Vehicle
    WHERE Color = "Red" AND Vehicle.Maker.City = "Paris" AND Vehicle.Maker.President.age() < 50

Database Example (2)
[Figure: schema graph. Company (Name: String, City: String) is linked by Employs to Person (Name: String, Age: Int), which is linked by Owns to Vehicle (Number, Power).]

Qualified Path Expression
OQL form:
    SELECT C.Name, P.Name, V.Number
    FROM C IN Companies, P IN C.Employs, V IN P.Owns
    WHERE C.City="Paris" AND P.Age<30 AND V.Pow>10
Direct form:
    SELECT C.Name, P.Name, V.Number
    FROM Companies C, Persons P, Vehicles V
    WHERE C[City="Paris"].Employs.P[Age<30].Owns.V[Pow>10]

Exercise: Queries
Express a set of given queries in OQL, then with qualified path expressions.

3. Complex Object Algebra
Generalization of relational algebra
    Set-oriented processing of objects
    A set of operations on collections of objects generating collections of objects
Different types of collections: class extent, set, bag, list, array
Any query can be expressed as a complex object algebra expression
The logical algebra is annotated for execution

The LORA Algebra (Finance and Gardarin 1991)
LoraOp
    SearchOp
        Filter
        Map
        SetOp: Union, Intersect, Difference
        GroupOp: RemoveDup, Aggregate, Nest, Unnest
        Join: RJoin, VJoin
        Sort
    UpdateOp
    TransactOp

Main Operator Signatures
JOIN: Col, Exp, Col => Col
OUTER_JOIN: Col, Exp, Col => Col
SORT: Col, Exp => Col
AGG: Col, Exp, Exp => Col
NEST: Col, Nest_Exp => Col
UNNEST: Col, Nest_Exp => Col
RDUPLICATE: Col, Exp => Col
FILTER: Col, Qual => Col
MAP: Col, Exp, Qual => Col
MINUS: Col, Col => Col
DIVIDE: Col, Col => Col
UNION: Col, Col => Col
OUTER_UNION: Col, Col => Col
INTERSECT: Col, Col => Col
UPDATE: Col, Col, Ident, Assignment => Col
INSERT: Col, Col => Col
DELETE: Col, Col, Ident => Col

Algebraic Tree
SELECT Number
FROM Vehicle
WHERE Color = "Red"
AND Vehicle.Maker.City = "Paris"
AND Vehicle.Maker.President.age() < 50
[Figure: algebraic tree for this query, built from Filter(Color="Red") on Vehicle, Filter(City="Paris",*) on Company, Filter(age()<50,*) on Employee, RJoin(Maker), RJoin(President) and a final Filter(*,Number).]

The ENCORE Algebra (1) (Shaw and Zdonik 1990)
Select(InputCollection, p) = {s | (s in InputCollection) and p(s)}
Image(InputCollection, f : T) = {f(s) | s in InputCollection}
Project(InputCollection, <(A1, f1), ..., (An, fn)>) = {<A1 : f1(s), ..., An : fn(s)> | s in InputCollection}

The ENCORE Algebra (2)
Nest(InputCollection, Ai) = {<A1 : s.A1, ..., Ai : t, ..., An : s.An> | for all r there exists s (r in t and s in InputCollection and s.Ai = r)}
UnNest(InputCollection, Ai) = {<A1 : s.A1, ..., Ai : t, ..., An : s.An> | s in InputCollection and t in Ai}
Flatten(InputCollection) = {r | t in InputCollection and r in t}
DupEliminate(InputCollection)
Coalesce(InputCollection, Ai)

The ENCORE Algebra (3)
OJoin(InputCollection1, InputCollection2, A1, A2, p) = {<A1 : s, A2 : r> | s in InputCollection1 and r in InputCollection2 and p(s,r)}
Set-oriented operations, with set membership based on object identity:
    Union
    Intersection
    Difference

The OFL Operators (Gardarin & Machucca 1995)
Navigational traversal is often interesting:
    Existential quantification
    Better control of query plans, smaller granularity
Mixing navigational and set-oriented traversal
    Based on Backus' functional approach
    Processing of collections of objects
    Side effects introduced through cursors

The OFL Language
Definition: Abstract Collection
    A container of objects encapsulated by a finite set of behavioral and traversal functions.
Constructions:
    Composition: f.g (x) = f(g(x))
    Path expressions: f0.f1....fn(x)
    Conditional: If_Then_Else (p, f1, f2) (x)
    Iteration: While (p, f)
    Sequence: Sequence(f1, f2, …, fn)

Collection Traversal in OFL
Quantified function "Apply to all"
    A second-order function of signature ForAll(C, p, f) that applies a function f to all objects of a collection C satisfying a predicate p.
Quantified function "Apply to any"
    A second-order function of signature ForAny(C, p, f) that applies a function f to any object of a collection C satisfying a predicate p.
Iterator and annotations
    Each quantified function works on an iterator
    Set-oriented or navigational traversal is selected
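
The two quantified functions can be pictured as ordinary higher-order functions over iterators. The sketch below is only an illustration in Python, not OFL itself: the names for_all and for_any, the dictionary-based objects and the sample data are assumptions, and ForAny is read here as "apply f to the first object that satisfies p, then stop".

```python
def for_all(collection, predicate, function):
    """Yield function(obj) for every object satisfying predicate (None means no predicate)."""
    for obj in collection:
        if predicate is None or predicate(obj):
            yield function(obj)


def for_any(collection, predicate, function):
    """Yield function(obj) for the first object satisfying predicate, then stop."""
    for obj in collection:
        if predicate is None or predicate(obj):
            yield function(obj)
            return


# Hypothetical sample data: persons owning vehicles.
persons = [
    {"lastname": "Durand", "owner": [{"color": "Red"}, {"color": "Blue"}]},
    {"lastname": "Martin", "owner": [{"color": "Green"}]},
    {"lastname": "Petit", "owner": [{"color": "Red"}, {"color": "Red"}]},
]

# Last names of persons owning at least one red vehicle.
names = [
    name
    for per_person in for_all(
        persons,
        None,
        lambda p: list(
            for_any(p["owner"], lambda v: v["color"] == "Red", lambda v: p["lastname"])
        ),
    )
    for name in per_person
]
```

Because for_any stops at the first match, a person owning two red vehicles is reported once, which is the point of the existential traversal.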

Database Example
[Figure: schema graph with classes Person, Vehicle and Part, linked by Owner (Person to Vehicle) and Composed (Vehicle to Part); attributes shown include LastName, Birthyear, Salary, Color, Price and PartLabel.]

Translating a Query into OFL
SELECT p.lastname
FROM p in Person
WHERE exists v in p.owner : v.color = "Red"

ForAll(Person P, null,
    ForAny(Owner(P) V, StringEqual(Color(V), "Red"), LastName(P)));

A More Complex Query
OQL query:
    SELECT tuple(p.lastname, v.price, c.partlabel)
    FROM p in Person, v in p.owner, c in v.composed
    WHERE p.age = 16 and v.price = c.price
OFL translation:
    ForAll(Find(AgeIndex, 16) P, null,
        ForAll(Owner(P) V, null,
            ForAll(Composed(V) T, IntegerEqual(Price(V), Price(T)),
                Tuple(LastName(P), Price(V), PartLabel(T)))));

Further Operators
Recursive operators:
    FixPoint(ResultCollection, InitializationExpression, RecursivePredicate, RecursiveExpression, FinalExpression)
gives the OFL program:
    Sequence(OFLInitializationExpression,
        While(OFLRecursivePredicate, OFLRecursiveExpression),
        OFLFinalExpression)

Exercise: Algebra
Write in OFL the definition of the LORA operations. Example:
    Join(InputCollection1, InputCollection2, ResultCollection, JoinPredicate, ProjectionExpression)
gives the OFL program:
    ForAll(InputCollection1, null,
        ForAll(InputCollection2, OFLJoinPredicate,
            InsertResultCollection(ResultCollection, OFLProjectionExpression)))

4. Algebraic Operator Algorithms
Classical relational operators are still valid ...
Filtering with a predicate (Restriction)
    Sequential scan
    Index scan, clustered or non-clustered
Value-based join
    Nested loop join: iterate on the outer collection and compare each outer object with each object in the inner collection
    Merge join: sort the two collections on the join fields and then merge
    Hash join: hash the outer collection on the join fields, scan the inner collection and probe the hashed collection
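
As a rough illustration of two of the value-based join algorithms listed above, here is a minimal in-memory Python sketch. The function names, the tuple layout and the sample data are assumptions made for the example; a real implementation works on pages and buffers rather than on Python lists.

```python
from collections import defaultdict


def nested_loop_join(outer, inner, key_outer, key_inner):
    """Compare each outer object with each inner object."""
    return [(o, i) for o in outer for i in inner if key_outer(o) == key_inner(i)]


def hash_join(outer, inner, key_outer, key_inner):
    """Hash the outer collection on the join field, then scan the inner one and probe."""
    table = defaultdict(list)
    for o in outer:
        table[key_outer(o)].append(o)
    return [(o, i) for i in inner for o in table.get(key_inner(i), [])]


# Hypothetical data: (vehicle number, city) joined with (company name, city).
vehicles = [("V1", "Paris"), ("V2", "Lyon"), ("V3", "Paris")]
companies = [("Renault", "Paris"), ("Berliet", "Lyon")]

nl = nested_loop_join(vehicles, companies, lambda v: v[1], lambda c: c[1])
hj = hash_join(vehicles, companies, lambda v: v[1], lambda c: c[1])
```

Both functions return the same pairs, possibly in a different order; only the number of comparisons differs.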

Path Traversals
Paths may involve multiple collections
Each collection can be qualified by predicates

Depth-First-Fetch
Depth-First-Fetch (DFF) is the natural algorithm for evaluating a path expression.
It follows the path from the root to the target collection, using a depth-first graph traversal algorithm.
The corresponding operator is an n-ary operator denoted DFF.
Advantages:
    no intermediate results, simple pointer chasing
    results are assembled one at a time, allowing pipelining
    efficient when the memory size is large enough to avoid swapping of objects
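
The traversal described above can be sketched as a recursive generator. This is a simplified in-memory illustration, not the DFF operator of any particular system: the function name depth_first_fetch, the links and predicates parameters and the sample objects are all assumptions.

```python
def depth_first_fetch(roots, links, predicates):
    """Yield one tuple of objects per complete path whose objects satisfy all predicates.

    links[i](obj) returns the objects of collection i+1 referenced by obj;
    predicates[i] qualifies the objects of collection i.
    """
    def descend(level, obj, prefix):
        if not predicates[level](obj):
            return
        path = prefix + (obj,)
        if level == len(links):      # target collection reached
            yield path
            return
        for child in links[level](obj):
            yield from descend(level + 1, child, path)

    for root in roots:
        yield from descend(0, root, ())


# Hypothetical objects following the Company / Person / Vehicle example.
companies = [
    {"name": "Renault", "city": "Paris", "employs": [
        {"name": "Ann", "age": 25, "owns": [{"number": "V1", "pow": 12},
                                            {"number": "V2", "pow": 8}]},
        {"name": "Bob", "age": 40, "owns": [{"number": "V3", "pow": 15}]},
    ]},
    {"name": "Berliet", "city": "Lyon", "employs": [
        {"name": "Cid", "age": 22, "owns": [{"number": "V4", "pow": 20}]},
    ]},
]
links = [lambda c: c["employs"], lambda p: p["owns"]]
predicates = [lambda c: c["city"] == "Paris",
              lambda p: p["age"] < 30,
              lambda v: v["pow"] > 10]

names = [(c["name"], p["name"], v["number"])
         for c, p, v in depth_first_fetch(companies, links, predicates)]
```

Results come out one path at a time, so a consumer can start working before the traversal finishes.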

Breadth-First-Fetch
Breadth-First-Fetch (BFF) traversal processes the tree of objects using a Forward Join (FJ) algorithm, which is based on pointer chasing between two collections.
Successive binary joins of collections are performed from the source collection to the target, following the path in forward order.
Advantage: no multiple fetch of objects
Requires the construction of a hashed support table to memorize the FJ results
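
A minimal sketch of the forward-join chain, assuming in-memory objects and a plain list standing in for the support table. The names forward_join and breadth_first_fetch and the sample data are hypothetical.

```python
def forward_join(support, link, predicate):
    """Extend every partial path by pointer chasing into the next collection."""
    return [path + (child,)
            for path in support
            for child in link(path[-1])
            if predicate(child)]


def breadth_first_fetch(roots, links, predicates):
    """Apply one forward join per link, materializing a support table at each step."""
    support = [(r,) for r in roots if predicates[0](r)]
    for link, predicate in zip(links, predicates[1:]):
        support = forward_join(support, link, predicate)
    return support


# Hypothetical objects following the Company / Person / Vehicle example.
companies = [
    {"name": "Renault", "city": "Paris", "employs": [
        {"name": "Ann", "age": 25, "owns": [{"number": "V1", "pow": 12}]},
        {"name": "Bob", "age": 40, "owns": [{"number": "V3", "pow": 15}]},
    ]},
    {"name": "Berliet", "city": "Lyon", "employs": []},
]
links = [lambda c: c["employs"], lambda p: p["owns"]]
predicates = [lambda c: c["city"] == "Paris",
              lambda p: p["age"] < 30,
              lambda v: v["pow"] > 10]

names = [(c["name"], p["name"], v["number"])
         for c, p, v in breadth_first_fetch(companies, links, predicates)]
```

Unlike a depth-first traversal, each level is fully processed before the next one starts, at the price of holding the intermediate support table.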

Reverse-Breadth-First-Fetch
Reverse-Breadth-First-Fetch (RBFF) performs a sequence of binary joins between two neighbor collections to traverse the path, but it proceeds in the reverse order of the path.
Thus, each join is called a Reverse Join (RJ). The join criterion is the membership of the second collection's object identifier in the first collection's pointer attribute values.
Advantage: efficient when the predicate(s) on the last collection(s) are selective
Requires supporting tables and value-based joins
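
The reverse join can be sketched as a value-based join on object identifiers, starting from the qualified objects of the last collection. The representation below (dictionaries with an oid field and list-valued pointer attributes) and the name reverse_join are assumptions made for the illustration.

```python
def reverse_join(upstream, ref_attr, support):
    """Join upstream objects with the partial paths whose first oid they reference.

    support is a list of oid tuples, each starting with an oid of the
    downstream collection; it plays the role of the supporting table.
    """
    by_head = {}
    for path in support:
        by_head.setdefault(path[0], []).append(path)
    return [(obj["oid"],) + path
            for obj in upstream
            for ref in obj[ref_attr]
            for path in by_head.get(ref, [])]


# Hypothetical path A -> B -> C with a selective predicate on C.
A = [{"oid": "a1", "b": ["b1", "b2"]}, {"oid": "a2", "b": ["b3"]}]
B = [{"oid": "b1", "c": ["c1"]}, {"oid": "b2", "c": ["c2"]}, {"oid": "b3", "c": ["c2"]}]
C = [{"oid": "c1", "val": 5}, {"oid": "c2", "val": 50}]

support = [(c["oid"],) for c in C if c["val"] > 10]   # qualify the last collection first
support = reverse_join(B, "c", support)               # RJ between B and C
result = reverse_join(A, "b", support)                # RJ between A and B
```

Only objects leading to a qualified C object survive each step, which is why the approach pays off when the last predicate is selective.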

Illustration of BFF & RBFF
[Figure: four panels (a) to (d) showing support tables of object identifiers built by forward joins (FJ) and reverse joins (RJ) over collections A, B, C, D and E: Tb with columns A(Oid), C(Oid); Tc with columns A(Oid), D(Oid); Te with columns E(Oid), D(Oid); Ta with columns A(Oid), E(Oid), obtained from Tc and Te.]

Further Algorithms
Various algorithms are available for each operator
Combinations of operators can be applied:
    to traverse long paths
    to derive new algorithms:
        hash both & sort buckets & merge buckets
        limited breadth-first-fetch
Cost depends on many factors:
    physical organization of objects
    size of collections
    selectivity of predicates
    available memory size
    possible degree of parallelism

Exercise: Algorithms
List all the possible annotated query plans to process the query:
    SELECT C.Name, P.Name, V.Number
    FROM C IN Companies, P IN C.Employs, V IN P.Owns
    WHERE C.City="Paris" AND P.Age<30 AND V.Pow>10
over the database schema:
[Figure: schema graph. Company (Name: String, City: String) is linked by Employs to Person (Name: String, Age: Int), which is linked by Owns to Vehicle (Number, Power).]

5. Query Plan Transformations
Query rewrite: algebraic rewrite of the query tree
    semantic transformations based on properties of data types and integrity constraints
    syntactic transformations based on properties of operators
Query planning: selection of the best algorithms
    annotation of logical operators with the selected algorithms
    the cost of an annotated algorithm often depends on the result of the previous algorithm, e.g., no sort is needed if the result is already sorted
Query rewrite and query planning are not independent

Extensible Optimizers
Closed optimizer
    set of operators and transformations fixed
    heuristic-based or cost-based selection of plans
    efficient but hard to modify and extend
    e.g., Oracle 7.3, SQL Server 10, ...
Extensible optimizer
    extensible set of operators and transformations
    rule-based generation of query plans
    selection of the "best" plan using a search strategy
    e.g., Exodus, Starburst and DB2 CS, Esprit EDS & IDEA, Illustra, ...

Rewrite Rule Base
[Figure, from Gardarin, Finance DKE 93: a modular rule base. A strategy module, guided by a cost model and heuristics, applies the rule modules Common Expression Detection, Syntactic Optimization, Semantic Optimization and Predicate Simplification to turn a query plan into an optimized query plan.]

Syntactic Rewrite Rules (1)
Restrict through Union pushing rule:
    Restrict(Union(C1,C2)) <=> Union(Restrict(C1),Restrict(C2))
Restrict through Super Class pushing rule:
    Restrict(Super(C1,C2)) <=> Super(Restrict(C1),Restrict(C2))

Syntactic Rewrite Rules (2)
Join commutativity:
    Join(C1,C2) <=> Join(C2,C1)
Join associativity:
    (C1 Join C2) Join C3 <=> C1 Join (C2 Join C3)
Restrict through Join pushing rule:
    Restrict(Join(C1,C2)) <=> Join(Restrict(C1),Restrict(C2))
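
The pushing rules can be checked on small collections. The sketch below uses Python sets as collections; the helper names restrict, union and join are hypothetical, and the join rule is illustrated under the assumption that the restriction splits into one predicate per input collection.

```python
def restrict(collection, predicate):
    return {x for x in collection if predicate(x)}


def union(c1, c2):
    return c1 | c2


def join(c1, c2, on):
    return {(x, y) for x in c1 for y in c2 if on(x, y)}


c1 = {1, 2, 3, 4}
c2 = {3, 4, 5, 6}
even = lambda x: x % 2 == 0
small = lambda x: x < 4
same_parity = lambda x, y: x % 2 == y % 2

# Restrict through Union: both sides of the rule give the same collection.
union_lhs = restrict(union(c1, c2), even)
union_rhs = union(restrict(c1, even), restrict(c2, even))

# Restrict through Join, with the restriction "small(x) and even(y)".
join_lhs = {(x, y) for (x, y) in join(c1, c2, same_parity) if small(x) and even(y)}
join_rhs = join(restrict(c1, small), restrict(c2, even), same_parity)
```

The right-hand sides filter before combining, so the union or join handles fewer objects, which is why an optimizer usually prefers that direction.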

Planning Rules
Join method choice:
    JoinNL (C1,C2) <=> JoinSM (C1,C2)
    JoinHP (C1,C2) <=> JoinSM (C1,C2)
Depth First Fetch introduction:
    DFF(C1,C2,C3) <=> Join(C1,Join(C2,C3))
Index Scan introduction:
    Scan(C1,P) <=> Scan(IScan(C1,I(P)),P~I(P))

Semantic Rules
Integrity constraints:
    Type(x) = Square <=> Type(x) = Polygon and large(x) = long(x)
User function properties:
    draw(x+y) = draw(x) + draw(y)

What Rule Language?
Rules are often complex to express
    Conditions on qualifications, operators, results, ...
Proposed rule languages:
    C rewriting procedure [Exodus, Starburst]
        if <C procedure> is true then <C procedure>
        Practical, but makes the optimizer hard to extend
    Rule language with side effects [Finance91]
        WHEN <Query Expression> IF <Condition> THEN <Query Expression'> UNDER <Action>
        Complex to implement for pattern matching
    OQL query equivalence [Florescu95]
        Parametrized Query ~ Parametrized Query
        Lack of generality (e.g., query planning not possible)

Choice of Best Query Plan
[Figure: the query plan generator takes the algebraic tree and the database schema and, driven by the search strategy, the cost model and the transformation rule base, produces candidate query plans (each with action: { }, cost: float, goal: boolean) from which the "best" query plan is chosen.]

Exercise: Rules
Given a linear path expression from collection C1 to collection Ci (C1, C2, ..., Ci-1, Ci), determine the number of distinct query plans to process the query, assuming that 3 algorithms are available to process any path expression (DFF, BFF, RBFF)
Give the rule base to generate those plans

6. Cost Models
Extension of the relational cost model to handle:
    Object identifiers
    Path indexes
    Object linking and embedding
    Clustering
Takes into account CPU cost and I/O cost:
    CPU cost = (CPU weight) * Number of examined objects
    I/O cost = (I/O weight) * Number of pages read

Collection Parameters
|C| = number of pages of collection C
||C|| = number of objects of collection C
|Ci| = number of pages of cluster i of collection C
||Ci|| = number of objects of cluster i of collection C
SC = average object size in collection C
SProj = average size of projection result
M = available memory size
Sel(Qual) = selectivity of qualification Qual
Sel(Pred) = selectivity of indexed predicate Pred

I/O Scan Formulas
Sequential scan
    I/O cost = I/OScan + I/OResult
    I/OScan = |C|
    I/OResult = Sel(Qual)*|C|*SProj/SC - M if this value is > 0, else 0
Unclustered index scan
    I/O cost = I/OIndex + I/OHit + I/OResult
    I/OIndex = Blevel(I)
    I/OHit = Yao(||C||, |C|, Sel(Pred)*||C||)
Clustered index scan
    I/O cost = I/OIndex + I/OHit + I/OResult
    I/OHit = Sel(Pred)*|C|
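
These formulas are easy to evaluate numerically. The sketch below assumes the standard form of Yao's function (expected number of pages touched when k of n objects, spread evenly over m pages, are selected); the function names and parameter names are hypothetical, and the index-scan helpers deliberately leave out the I/OResult term to keep the comparison of I/OHit visible.

```python
def yao(n_objects, n_pages, k):
    """Expected number of pages hit when k out of n_objects objects are selected."""
    if k <= 0:
        return 0.0
    per_page = n_objects / n_pages
    if k > n_objects - per_page:
        return float(n_pages)          # every page is necessarily hit
    miss = 1.0                         # probability that a given page is not hit
    for i in range(1, k + 1):
        miss *= (n_objects - per_page - i + 1) / (n_objects - i + 1)
    return n_pages * (1.0 - miss)


def sequential_scan_io(pages, sel_qual, s_proj, s_obj, memory):
    """I/OScan + I/OResult for a sequential scan."""
    result = max(0.0, sel_qual * pages * s_proj / s_obj - memory)
    return pages + result


def unclustered_index_scan_io(blevel, pages, objects, sel_pred):
    """I/OIndex + I/OHit (result term omitted in this sketch)."""
    return blevel + yao(objects, pages, round(sel_pred * objects))


def clustered_index_scan_io(blevel, pages, sel_pred):
    """I/OIndex + I/OHit (result term omitted in this sketch)."""
    return blevel + sel_pred * pages


# Hypothetical collection: 100 objects on 10 pages, index of depth 2, 10% selectivity.
unclustered = unclustered_index_scan_io(2, 10, 100, 0.1)
clustered = clustered_index_scan_io(2, 10, 0.1)
```

With these numbers the clustered scan reads about one data page while the unclustered scan touches several, since the ten selected objects are scattered over the pages.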

Object Clustering
Clustering by class
    All the instances of the same class are grouped in the same file
Clustering by composition
    An object of a class is grouped with one or more of its component objects. This placement suits path traversals
Random clustering
    Objects are placed in the order of their creation, in a single space.

Clustered Collection Cases
Cluster objects on disk
Reduce the number of I/Os
(Placement trees are represented by directed graphs)
[Figure: two placement trees. Default clustering: COMPANY alone. Simple clustering: COMPANY with PRODUCT.]

More Clustering Cases
[Figure: two placement trees over COMPANY, PRODUCT, COMMAND and PROPOSAL illustrating conjunctive clustering and disjunctive clustering, with edge labels 5 and 10.]

Clustering: Statistics on Partitions
How many pages will have to be loaded to scan the collection A?
||A||: cardinality
|A|: number of disk blocks
SA: average size of an object
Sp: available page size
DA,B: average number of distinct references
.....
• Can be maintained by the system
• Can be evaluated

Clustering: Statistics Example
[Equations garbled in extraction: page-count estimates for collections A and B under the clusterings ClA, ClB and ClAB, expressed with ||A||, ||B||, XA,B, ZA,B, DA,B, SA, SB and Sp, with SClAB = SA + ZA,B * SB; illustrated on |COMPANY| and |PRODUCT|.]

Yao': Number of Clustered Block Hits
Yao function: Yao(||C||, |C|, k), where k is the number of selected objects, returns the number of block hits
Yao' function: sums the Yao functions applied to each involved cluster
Given a clustered collection C and p the number of partitions to be scanned, we have:
$$\mathrm{Yao}'(C, k) = \sum_{i=1}^{p} \mathrm{Yao}(\|C_i\|, |C_i|, k_i)$$
where ki is the number of objects to be selected in cluster i

Yao': Example
• x in Companies, x.asset > 100 000 kF
• x in Companies, x.asset > 100 000 kF and x.product.year < 1980
• x in Companies, x.asset > 100 000 kF and x.product.year < 1980 and x.product.command.N° < 1000
[Figure: placement tree COMPANY with PRODUCT and COMMAND. The costs shown are Yao( ), Yao( ) + Yao( ), and Yao( ) + Yao( ) + Yao( ) + Yao( ); the arguments of each Yao term were drawn as cluster pictograms.]

I/O Join Formulas
Nested loops
    ||C1|| + ||C1||*||C2||
Merge join
    cost(sort(C1)) + cost(sort(C2)) + cost(merge(C1,C2)) + cost(Result)
    of the order of ||C1||*log||C1|| + ||C2||*log||C2|| + ||C1|| + ||C2|| + ... (up to a constant factor on the sort terms)
Hash join
    cost(hash(C1)) + cost(scan(C2)) + |C2|*cost(probe(C1)) + cost(Result)
    of the order of ||C1|| + ||C2|| + ...

Parameters for Links
fanC1,C2 = average number of references from a C1 object to C2 objects
DC1,C2 = average number of distinct references from a C1 object to C2 objects
XC1,C2 = number of C1 objects having no reference to a C2 object
ZC1,C2 = average number of distinct references from C1 objects having at least one reference to a C2 object
ZC1,C2 = DC1,C2 * ||C1|| / (||C1|| - XC1,C2)

I/O Path Traversal Formulas
Cost of DFF [Gardarin, Gruser, Tang 96]
    Large memory (no swap)
    Small memory (worst case)
[Two cost formulas garbled in extraction: one sums yao( , , ) terms over the collections Ci of the path, for i = 2 to n; the other combines the size of C1 with the fan-outs fan(Cj, Cj+1) and the selectivities Sel along the path.]
DFF is efficient with large memory

Exercise: Cost Model
Compare the I/O costs of DFF, BFF and RBFF
Discuss the advantages of each of them according to memory size and predicate selectivity

7. Search Strategies
[Figure, from Lanzelotte 1992: taxonomy of search strategies into enumerative and randomized families, covering Exhaustive Search, Augmentation Heuristic, Iterative Improvement, Simulated Annealing, Tabu Search, Genetic Algorithm and mixed (2 phases) strategies.]

Exhaustive Search
Function Exhaustive(Query) {
    p := Parse(Query) ;            // Set the initial plan
    S := {p} ;                     // S is the set of all investigated plans
    while not StopCond() {
        p' := Transform(p) ;       // Apply a transformation rule
        if p' not in S then {
            p := p' ;
            Insert(S, p') ;        // Maintain the set of investigated plans
        }
    }
    return Optimal(S) ;            // Select best plan among all generated plans
}
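
To make the idea concrete, here is a small Python sketch of exhaustive enumeration on a toy problem. Everything in it is an assumption made for illustration: plans are join orders over three hypothetical collections, the only transformation rule swaps two adjacent collections, and the cost is the sum of intermediate result sizes with an assumed selectivity of 1/100 per join.

```python
from collections import deque

CARD = {"A": 1000, "B": 10, "C": 100}   # hypothetical cardinalities


def cost(plan):
    """Toy cost: sum of intermediate result sizes of a left-deep join order."""
    size = CARD[plan[0]]
    total = 0
    for name in plan[1:]:
        size = size * CARD[name] // 100   # assumed join selectivity of 1/100
        total += size
    return total


def neighbors(plan):
    """Plans reachable by one transformation: swap two adjacent collections."""
    for i in range(len(plan) - 1):
        p = list(plan)
        p[i], p[i + 1] = p[i + 1], p[i]
        yield tuple(p)


def exhaustive(initial):
    """Generate every plan reachable from the initial one, then keep the cheapest."""
    seen = {initial}
    frontier = deque([initial])
    while frontier:
        p = frontier.popleft()
        for q in neighbors(p):
            if q not in seen:
                seen.add(q)
                frontier.append(q)
    return min(sorted(seen), key=cost), len(seen)


best, explored = exhaustive(("A", "B", "C"))
```

The set of investigated plans grows with the factorial of the number of collections, which is why the stopping condition matters in practice.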

Illustration of ES
[Figure: starting from Parse(Query), rules r1 to r11 are applied in turn until StopCond (time exhausted); the minimal-cost plan is then selected.]

Classical Improvements
Reduce the search space and control rule selection
    Select the profitable/best move at each step
    Introduce a gain estimator for each rule
    Apply only the rules with the best estimators
    Avoid loops and applying rules in both directions
Such approaches find only a local minimum
    Risk of falling into a hole
    Iterative improvement minimizes the risk

Iterative Improvement Scheme
II randomly chooses an initial processing tree. It then accepts only downhill moves.
This is called local optimization.
When the local condition is reached, II picks a new random state and performs local optimization from that state.
The process is repeated until a stopping condition is met.
The global minimum reported is the best local minimum found so far.

Iterative Improvement Procedure
Procedure II() {
    p = Initialize();                        // Set an initial state, i.e., pick a random PT for evaluating the query
    OptimalPlan = p;                         // Initialize optimal plan
    while not(stopping_condition) do {       // Loop for global optimization (on various initial states)
        while not(local_condition) do {      // Loop for local optimization
            p' = move(p);                    // Apply a valid transformation to p
            if (cost(p') < cost(p)) then p = p';   // Keep plan if less costly
        }
        if (cost(p) < cost(OptimalPlan)) then OptimalPlan = p;   // Select optimal plan
        p = RandomPlan();                    // Move to next randomly selected plan
    }
    return(OptimalPlan);
}
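
A possible Python rendering of the procedure on a toy problem is sketched below. The toy is an assumption: plans are join orders over three hypothetical collections, a move swaps two adjacent collections, the cost is the sum of intermediate result sizes with an assumed selectivity of 1/100, and the local and stopping conditions are simple iteration counts.

```python
import random

CARD = {"A": 1000, "B": 10, "C": 100}   # hypothetical cardinalities


def cost(plan):
    """Toy cost: sum of intermediate result sizes of a left-deep join order."""
    size = CARD[plan[0]]
    total = 0
    for name in plan[1:]:
        size = size * CARD[name] // 100   # assumed join selectivity of 1/100
        total += size
    return total


def move(plan, rng):
    """Apply one random transformation: swap two adjacent collections."""
    i = rng.randrange(len(plan) - 1)
    p = list(plan)
    p[i], p[i + 1] = p[i + 1], p[i]
    return tuple(p)


def random_plan(rng):
    p = list(CARD)
    rng.shuffle(p)
    return tuple(p)


def iterative_improvement(restarts=20, local_steps=50, seed=0):
    rng = random.Random(seed)
    optimal = random_plan(rng)
    for _ in range(restarts):              # global loop over random initial states
        p = random_plan(rng)
        for _ in range(local_steps):       # local optimization: downhill moves only
            q = move(p, rng)
            if cost(q) < cost(p):
                p = q
        if cost(p) < cost(optimal):
            optimal = p
    return optimal


best = iterative_improvement()
```

In this toy, the order ("A", "B", "C") is a local minimum of cost 200 that strict downhill moves cannot leave, so the random restarts are what lets the procedure reach the cheaper orders.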

Illustration of I.I.
[Figure: three descents made of profitable rule applications (r1, r2; r'1; r"1, r"2), starting respectively from Parse(Query), Rand(Parse(Query)) and Rand(Rand(Parse(Query))); the minimal-cost plan among the local minima is selected.]

Simulated Annealing Scheme
SA also starts at a random processing tree and generates the next state by applying a transformation rule to the current processing tree.
Unlike II, SA accepts both downhill and uphill moves. Uphill moves are allowed with the probability
$$P = e^{-(\mathrm{cost}(t_{i+1}) - \mathrm{cost}(t_i)) / \mathrm{temperature}}$$
where $t_i$ is the current processing tree and $t_{i+1}$ the candidate one.
The parameter temperature decreases when the inner block reaches an equilibrium point. Thus uphill moves are accepted with lower and lower probability.
When a stopping condition is satisfied, the best traversed plan is selected as optimal.

Simulated Annealing Procedure
Procedure SA() {
    p = Initialize();                        // Set an initial state, i.e., pick a random PT for evaluating the query
    OptimalPlan = p;                         // Initialize optimal plan
    T = T0;                                  // Initialize temperature
    while not(stopping_condition) do {       // Loop for global optimization
        while not(equilibrium) do {          // Loop for local optimization
            p' = move(p);                    // Apply a valid transformation to p
            delta = cost(p') - cost(p);      // Compute differential cost
            if (delta < 0) then p = p';      // If cost reduced, pick new plan
            if (delta > 0) then p = p' with probability e^(-delta/T);   // If cost increased, accept if hot
            if (cost(p) < cost(OptimalPlan)) then OptimalPlan = p;      // Maintain optimal plan
        }
        T = reduce(T);                       // Reduce temperature
    }
    return(OptimalPlan);
}
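
The same procedure can be sketched in Python on a toy problem. The toy and the tuning values are assumptions: plans are join orders over three hypothetical collections, a move swaps two adjacent collections, the temperature is reduced geometrically, and equilibrium is a fixed number of moves per temperature. Unlike the procedure above, this sketch also accepts moves of equal cost.

```python
import math
import random

CARD = {"A": 1000, "B": 10, "C": 100}   # hypothetical cardinalities


def cost(plan):
    """Toy cost: sum of intermediate result sizes of a left-deep join order."""
    size = CARD[plan[0]]
    total = 0
    for name in plan[1:]:
        size = size * CARD[name] // 100   # assumed join selectivity of 1/100
        total += size
    return total


def move(plan, rng):
    """Apply one random transformation: swap two adjacent collections."""
    i = rng.randrange(len(plan) - 1)
    p = list(plan)
    p[i], p[i + 1] = p[i + 1], p[i]
    return tuple(p)


def simulated_annealing(start, t0=100.0, alpha=0.9, steps_per_temp=20,
                        t_min=0.1, seed=0):
    rng = random.Random(seed)
    p = optimal = start
    t = t0
    while t > t_min:                          # stopping condition: system is cold
        for _ in range(steps_per_temp):       # equilibrium: fixed number of moves
            q = move(p, rng)
            delta = cost(q) - cost(p)
            # Downhill (or equal) moves are always taken, uphill ones if hot enough.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                p = q
            if cost(p) < cost(optimal):
                optimal = p
        t *= alpha                            # reduce temperature
    return optimal


start = ("A", "C", "B")
best = simulated_annealing(start)
```

Because the best traversed plan is recorded separately from the current plan, accepting uphill moves can never make the returned plan worse than the starting one.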

Illustration of S.A.
[Figure: cost versus moves. Starting from Parse(Query), a sequence of profitable moves (r1, r3, r4, r6) and non-profitable moves (r2, r5) leads to the selected plan.]

Tabu Search Scheme
TS is a general meta-heuristic procedure for global optimization, which performs an aggressive exploration of the state space (best possible move, with a restriction list).
TS starts from a randomly generated initial state, and repeatedly moves from a state to a neighboring one.
At each iteration the procedure generates a subset V* of the set N(S) of the neighbors of the current state S and selects the best.
The subset does not contain any state which is recorded in the Tabu list. This avoids cycling or at least reduces its probability. The Tabu list is updated each time the current state is updated. This forbids moves that would lead back to a previously explored state.

Tabu Search Procedure
Procedure TS() {
    p = Initialize();                        // Set an initial state, i.e., pick a random PT for evaluating the query
    OptimalPlan = p;                         // Initialize optimal plan
    T = {};                                  // Initialize Tabu list (empty)
    while not(stopping_condition) do {       // Global loop
        generate the set V*, a subset of N(p) - T, by applying move(p);   // All moves accepted except tabu
        choose the best solution p in V*;    // Pick best move
        T = (T - {oldest}) + {p};            // Update the Tabu list by removing the oldest plan and adding the picked one
        if (cost(p) < cost(OptimalPlan)) then OptimalPlan = p;   // Maintain optimal plan
    }
    return(OptimalPlan);
}
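
The procedure can be sketched in Python on a toy problem. The toy is an assumption: plans are join orders over three hypothetical collections, the neighborhood is obtained by swapping two adjacent collections, V* is taken to be the whole non-tabu neighborhood, and the Tabu list is a bounded queue that also holds the initial plan.

```python
from collections import deque

CARD = {"A": 1000, "B": 10, "C": 100}   # hypothetical cardinalities


def cost(plan):
    """Toy cost: sum of intermediate result sizes of a left-deep join order."""
    size = CARD[plan[0]]
    total = 0
    for name in plan[1:]:
        size = size * CARD[name] // 100   # assumed join selectivity of 1/100
        total += size
    return total


def neighbors(plan):
    """N(p): plans obtained by swapping two adjacent collections."""
    for i in range(len(plan) - 1):
        p = list(plan)
        p[i], p[i + 1] = p[i + 1], p[i]
        yield tuple(p)


def tabu_search(start, tabu_size=3, iterations=20):
    p = optimal = start
    tabu = deque([start], maxlen=tabu_size)   # the oldest entry is dropped automatically
    for _ in range(iterations):
        candidates = [q for q in neighbors(p) if q not in tabu]
        if not candidates:
            break
        p = min(candidates, key=cost)         # best move, even if it goes uphill
        tabu.append(p)
        if cost(p) < cost(optimal):
            optimal = p
    return optimal


best = tabu_search(("A", "B", "C"))
```

Starting from ("A", "B", "C"), the first move goes to a plan of equal cost, something a strict downhill search would refuse, and the next move reaches a cheaper order.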

Comparison of Strategies
[Figure: cost of the best plan versus number of moves (0 to 1400) for SA, II with swap, II with join exchange, and Tabu.]
II is the best with good random sampling
Tabu looks attractive

Genetic Algorithm
Genetic Algorithm (GA) is a non-gradient optimization algorithm used for the search of local extremes (minimum or maximum) of functions with many variables and functional extremes.
These functions are usually defined on very complex and discrete domains.
The basic idea of GA is to use principles of evolution of organisms in nature.
Instead of working on one particular solution at a time, it considers a population of solutions.

GA Principle
[Figure: flowchart looping through Initialisation, Mutation, Crossover, Evaluation, Sort and Selection, followed by a Terminate test (Yes: stop; No: loop again).]

GA Phases
Initialization: randomly generate an initial small population of solutions (i.e., processing trees) from the whole search space.
Mutation: choose one solution (i.e., processing tree) from the population, and apply transformation rules to it.
Crossover: randomly choose two solutions from the population, and exchange their common subtrees in order to generate two new processing trees.
Evaluation: for each solution, evaluate the value of its fitness function (i.e., cost function).
Sort: sort all solutions according to their cost values.
Selection: choose a certain number of the best solutions from the result of Sort as the parents of the next generation.
Termination: check the termination criteria for stopping the optimization.

Gene Base for 5 Collections
[Figure: gene base for a path over 5 collections numbered 0 to 4. Each gene is an operator applied to a pair of collections (i, j), in either direction: FJ/PI/RJ for adjacent collections, e.g. (0, 1) or (1, 0), and DFF/PI for longer sub-paths, e.g. (0, 2), (0, 4) or (4, 0).]
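
Mutation Operator
[Figure: examples of mutations on processing trees over collections A, B, C and D: replacing one operator by another among DFF, FJ, RJ and PI on the same collections, splitting a DFF into binary joins, and reversing a link.]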

Crossover Operator
[Figure: crossover of two processing trees over collections A to G, built from RJ, FJ and DFF operators, exchanging their common subtrees.]

Improved GA
[Figure: the GA flowchart (Initialisation, Mutation, Crossover, Evaluation, Sort, Selection, Terminate test) extended with a Random Gene Generator.]

GA Procedure
Procedure GA() {
    Generate the initial population : Popu[BasePopu] ;     // Initialize the base population at random
    Sort(Popu) ;                                           // Sort population of PTs on increasing cost
    OptimalPlan = Popu[0] ;                                // Keep the best traversed plan
    While not (stopping_condition) do {
        Percent = 0 ;
        While Percent < Part * BasePopu do {               // Apply Crossover to Part of the population
            p1 = Popu[Random(BasePopu)] ;                  // Randomly choose p1 and p2 from Popu
            p2 = Popu[Random(BasePopu)] ;
            Crossover(p1, p2) ;                            // Apply Crossover if possible
            Percent = Percent + 2 ; }
        For the rest in Popu do Mutation ;                 // Apply Mutation for the rest of the population
        For (i=0 ; i < NewPopu ; i++) do evaluate(Popu[i]) ;   // Compute cost for new population
        Sort(Popu) ;                                       // Sort population of PTs on increasing cost
        if (Popu[0] < OptimalPlan) then OptimalPlan = Popu[0] ;   // Keep the best traversed plan
        // Optional replacement of the worst elements
        Percent = 0 ; i = BasePopu ;                       // Initialize for replacement
        While Percent < Repl * BasePopu do {               // Apply replacement to Repl of the population
            p = RandomPlan() ;                             // Pick a random PT for insertion in Popu
            Popu[i] = p ;                                  // Replace worst plan by p
            i = i-1 ;                                      // Prepare for next replacement
            Percent = Percent + 1 ; } }
    return(OptimalPlan) ;
}

8. Open Problems
Efficient control of rule applications
    What is the best strategy (genetic?)
    Simple but accurate rule gain estimator (priorities?)
Estimation of method costs
    Statistics: keep the average and variations at each call
    Revelation: the user provides a cost estimate attribute
    Disencapsulation: understand the method code
    Problem: late binding complicates the estimation
Querying bulk types
    Collections may be lists, arrays, trees or matrices
    Ordered collections may require additional operators
    The cost model should be extended to capture aggregates

Exercise: Strategies
Discuss the advantages and drawbacks of each search strategy
Compare them in the case of a small rule base and of a large rule base

Conclusion
The query optimizer is a key component of a DBMS
Relational techniques can be generalized:
    Object algebra
    Cost model
    Search strategies
New techniques are required for:
    Extensibility of data types
    Optimizing path expressions
    Optimizing method calls
    Optimizing bulk data types
    Path index maintenance and access
Not much has been done on new features in OODBMSs

For More Information (1)
Finance, Gardarin, IEEE DE 91: LORA algebra and EDS extensible optimizer
Finance, Gardarin, DKE 93: Rule language for extensible optimizer
Gardarin, Gruser, Tang, VLDB 95: Cost model for OODBs: scan, DFF, validation on O2
Gardarin, Gruser, Tang, VLDB 96: Analytical & experimental comparisons of DFF, BFF, RBFF
Gardarin, in Advances in OO DB Systems, Springer 94: Object rule language, optimisation of recursive updates & extension of LORA to recursion

For More Information (2)
Mitchell, Zdonik, Dayal, in Advances in OO DB Systems, Springer V. 94: Optimization of OO Query Languages: Problems and Approaches
Cluet, Delobel, SIGMOD 92: A General Framework for the Optimization of Object-Oriented Queries
Kemper, Moerkotte, VLDB 90: Advanced Query Processing in Object Bases Using Access Support Relations
Lanzelotte, Valduriez, VLDB 91: Extending the Search Strategy in a Query Optimizer

Path Index
Multi-index [Gemstone]
    Paths are sequences of variables belonging to the structure of the objects.
    Indexes are defined for each link of the path.
    The indexes hold object identifiers or variable values.
    Implemented as B+ trees.
Nested index [Bertino 89]
    A single entry is defined for the whole length of a path.
    The start of the path can be accessed only by knowing its end.
    Useful only if the path is perfectly known and frequently used.
    Hard to maintain.
Path index [Bertino 89] [Kemper 90]
    Associates the end of the path with all the suffixes of the path.
    Works with sub-paths.
    Implemented as relations:
        each column of a tuple corresponds to a step of the path
        each field of the tuple contains an object identifier or a value.