efficient computation of trade-off skylines

37
Efficient Computation of Trade-Off Skylines In: 13 th International Conference on Extending Database Technology (EDBT). 2010. Christoph Lofi, [email protected] Ulrich GΓΌntzer, [email protected] Wolf-Tilo Balke , [email protected] Presented by Sarath 25/03/2013

Upload: questa

Post on 25-Feb-2016

57 views

Category:

Documents


1 download

DESCRIPTION

Efficient Computation of Trade-Off Skylines . In: 13 th International Conference on Extending Database Technology (EDBT). 2010. Christoph Lofi, [email protected] Ulrich GΓΌntzer, [email protected] Wolf-Tilo Balke , [email protected] - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Efficient Computation of Trade-Off Skylines

Efficient Computation of Trade-Off Skylines

In: 13th International Conference on Extending Database Technology (EDBT). 2010.

Christoph Lofi, [email protected] Ulrich GΓΌntzer, [email protected] Wolf-Tilo Balke , [email protected]

Presented by Sarath 25/03/2013

Page 2: Efficient Computation of Trade-Off Skylines

Outline

β€’ Introductionβ€’ Basic Conceptsβ€’ Trade-off Skylinesβ€’ Algorithmsβ€’ Experimental Evaluationβ€’ Conclusion

Page 3: Efficient Computation of Trade-Off Skylines

Introductionβ€’ Selecting alternatives from large amounts of data is primarily reflected by

the top-k retrieval paradigm. β€’ Providing meaningful scoring functions is almost impossible for complex

data. β€’ Subsequently skyline paradigm was adopted.β€’ Do not allow compensation(trade-off), weighting or ranking between

domains.β€’ Skyline result sets tend to be very large. β€’ Proposed an efficient method for computing skylines that allows trade-offs.β€’ Proposed algorithms to speed up retrieval by indexing and pruning the

trees.

Page 4: Efficient Computation of Trade-Off Skylines

Basic Concepts

β€’ Top-K Retrievalβ€’ Recalling Skylines from previous lectureβ€’ Full product orderβ€’ Pareto Skylines

Page 5: Efficient Computation of Trade-Off Skylines

Top-K Query ProcessingExample scenario:-β€’ Consider a user interested in finding a location (e.g., city) where the

combined cost of buying a house and paying school tuition for 10 years at that location is minimum.

β€’ The user is interested in the five least expensive places. β€’ Assume that there are two external sources (databases), Houses and

Schools, that can provide information on houses and schools, respectively

How this works:β€’ For the join results in this example, the scoring function obtains the total

cost of each house-school pair by adding the house price and the school tuition for 10 years.

β€’ The five cheapest pairs constitute the final answer to this query.

Page 6: Efficient Computation of Trade-Off Skylines

Top-K Query Processing

Page 7: Efficient Computation of Trade-Off Skylines

Skylines

Page 8: Efficient Computation of Trade-Off Skylines

β€’ In two dimensional Eucledian space,

β€’ Skyline consists of maximal vectors

Skyline in Eucledian Space

Page 9: Efficient Computation of Trade-Off Skylines

Full Product Order

Definition: full product order P

β€’ For ( 1×…× )Γ—( 1×…× ) and for any 1, 2 ( 1Γ—β€¦π‘ƒβŠ† 𝐷 𝐷𝑛 𝐷 𝐷𝑛 π‘œ π‘œ ∈ 𝐷× ), 1, 2 denoted by 1> 2 is defined by𝐷𝑛 π‘œ π‘œ βˆˆπ‘ƒ π‘œ π‘ƒπ‘œ

βˆ€ 𝑖 ∈ {1,…, } : o1,i o2,i {1,… }: o1,i > o2,i. 𝑛 ≳𝑖 ∧ βˆƒ 𝑖 ∈ 𝑛 𝑖

Page 10: Efficient Computation of Trade-Off Skylines

Pareto Skyline

Definition:β€’ Assume a database relation 1×…× on attributes. Then π‘…βŠ†π· 𝐷𝑛 𝑛

the Skyline of R is defined as.,

β€’ We test whether 1 > 2 holds, by comparing only necessary π‘œ 𝑃 π‘œobjects component wise using the individual attribute preferences.

Page 11: Efficient Computation of Trade-Off Skylines

β€’ Consider a sample Trade-off, β€’ ” I would prefer a car for $18000 with a metallic blue paint job over

a car for $16000 with a plain blue paint job ”. 𝑑1:= $18000, $16000, π‘π‘™π‘’π‘’π‘šπ‘’π‘‘π‘Žπ‘™π‘™π‘–π‘ ⊳ 𝑏𝑙𝑒𝑒‒ The car o1:($17000, , 100 , / ) previously π‘π‘™π‘’π‘’π‘šπ‘’π‘‘π‘Žπ‘™π‘™π‘–π‘ β„Žπ‘ π‘Ž 𝑐

incomparable with o2:($16000, , 100 , / )π‘€β„Žπ‘–π‘‘π‘’ β„Žπ‘ π‘›π‘œ π‘Ž 𝑐‒ After introducing t1 , Car o1 with above characteristics will

dominate the car o2.

Trade-off Skylines

Page 12: Efficient Computation of Trade-Off Skylines

Trade-off Skylines

Definition:β€’ Assuming a set of trade-offs , the resulting skyline with respect to 𝑇

the new domination relationships induced by all the given trade-offs is given by.,

β€’ This new criterion respects all trade-offs in , and also 𝑇 all original Pareto preferences.

Page 13: Efficient Computation of Trade-Off Skylines

Trade-Off Dominance Relationships

β€’ For a set { 1:= ( 1 2)} containing only a single trade-off 𝑇≔ 𝑑 π‘₯ βŠ³π‘¦and given two objects 1 and 2, 1 may either dominate 2 π‘œ π‘œ π‘œ π‘œaccording to the original Pareto semantics or by actually applying the trade-off using both ceteris paribus and Pareto semantics.

β€’ Proposition 1: constructing > with = π’πŸ π“π’πŸ 𝑻 𝟏

Page 14: Efficient Computation of Trade-Off Skylines

Trade-Off Dominance Relationships

Page 15: Efficient Computation of Trade-Off Skylines

β€’ Consider trade-off t1 as

β€’ o1: (($18000, blue metallic, a/c, 80hp))β€’ o2: (($16000, white, non a/c, 80hp))

β€’ Apply the rules! will o1 dominate o2?

Trade-Off Dominance Relationships

Page 16: Efficient Computation of Trade-Off Skylines

Trade-Off Dominance Relationships

Proposition 2: constructing > with = :π’πŸ π“π’πŸ 𝑻 πŸβ€’ Given 1 = 1 1 with 1 and 2 = 2 2 with 2. 𝑑 π‘₯ βŠ³π‘¦ πœ‡ 𝑑 π‘₯ βŠ³π‘¦ πœ‡β€’ Assume 1 and 2 can be applied in sequence 1 2, i.e. 1 𝑑 𝑑 𝑑 ∘ 𝑑 βˆ€π‘– ∈ πœ‡

∩ 2 : 1, 2, . Then, for any 1, 2 1,…, a domination πœ‡ 𝑦 𝑖 ≳𝑖 π‘₯ 𝑖 π‘œ π‘œ ∈𝐷 𝑛relationship 1 >T 2 via a sequence 1 2 holds, if .,π‘œ π‘œ 𝑑 ∘ 𝑑

Page 17: Efficient Computation of Trade-Off Skylines

Trade-Off Dominance Relationships

Page 18: Efficient Computation of Trade-Off Skylines

β€’ Consider two trade-offs,β€’ 𝑑1 = $18000, $16000, π‘π‘™π‘’π‘’π‘šπ‘’π‘‘π‘Žπ‘™π‘™π‘–π‘ ⊳ 𝑏𝑙𝑒𝑒‒ 𝑑2 = , / , / 𝑏𝑙𝑒𝑒 π‘Ž π‘βŠ³ π‘π‘™π‘’π‘’π‘šπ‘’π‘‘π‘Žπ‘™π‘™π‘–π‘ π‘›π‘œ π‘Ž 𝑐‒ o1= (($18000, blue metallic , a/c, 80hp))β€’ o2=(($16000,blue metallic , no a/c, 80hp))

β€’ o1 >T o2 ! Check it out !

Trade-Off Dominance Relationships

Page 19: Efficient Computation of Trade-Off Skylines

Merging Trade-Off Tuples

Definition: The Merge Operator : β†Άβ€’ Let x , z {1, … , }. For and , the tuple πœ‡ πœ‡ βŠ† 𝑛 π‘₯βˆˆπ·πœ‡π‘₯ π‘§βˆˆπ·πœ‡π‘§ π‘₯

is defined component-wise as: ( ) = : β†Ά β†Ά ≔ β„Ž βˆˆπ‘§ π‘₯ 𝑧 𝑒𝑀𝑖𝑑 𝑒𝑖 π‘₯𝑖 𝑖x : ( z )πœ‡ 𝑧𝑖 𝑖 ∈ πœ‡ βˆ– πœ‡π‘₯

Page 20: Efficient Computation of Trade-Off Skylines

Merging Trade-Off Tuples

Proposition 1 (cont.): constructing > with the merge operator for π’πŸ π“π’πŸ 𝑻= πŸβ€’ Consider trade-off t1 as

Page 21: Efficient Computation of Trade-Off Skylines

Merging Trade-Off Tuples

Proposition 2 (cont.): constructing > with the merge-operator π’πŸ π“π’πŸfor = 𝑻 𝟐

β€’ Consider again our example for two trade-offs 1 and 2 as follows: 𝑑 𝑑𝑑1 = $18000, $16000, π‘π‘™π‘’π‘’π‘šπ‘’π‘‘π‘Žπ‘™π‘™π‘–π‘ ⊳ 𝑏𝑙𝑒𝑒𝑑2 = , / , / 𝑏𝑙𝑒𝑒 π‘Ž π‘βŠ³ π‘π‘™π‘’π‘’π‘šπ‘’π‘‘π‘Žπ‘™π‘™π‘–π‘ π‘›π‘œ π‘Ž 𝑐

Page 22: Efficient Computation of Trade-Off Skylines

Merging Trade-Off Tuples

Page 23: Efficient Computation of Trade-Off Skylines

Integrating Trade-Off Chains into Single Trade-offs

Definition: trade-off integration β€’ Assume that 1 ( 1 y1) on 1, and 2 ( 2 y2) on 2, 𝑑 ≔ π‘₯ ⊳ πœ‡ 𝑑 ≔ π‘₯ ⊳ πœ‡ and 1 and 2 can be applied in sequence, i.e. ,𝑑 𝑑 βˆ€π‘–βˆˆ πœ‡1∩ 2 : 1, 2, . πœ‡ 𝑦 𝑖≳𝑖 π‘₯ 𝑖‒ This integrated trade-off is given by: If 1∩ 2 holds 1, 2, then βˆ€π‘–βˆˆ πœ‡ πœ‡ 𝑦 𝑖≳𝑖 π‘₯ 𝑖 𝑑1 2 ( 1 2 y2 1 ) ∘ ≔ π‘₯ β†Άπ‘₯ ⊳ ↢𝑦‒ Integrating trade-off sequences into a single trade-off does not

affect subsequent skyline computation.

Page 24: Efficient Computation of Trade-Off Skylines

A Basic Trade-Off Skyline Algorithm

Page 25: Efficient Computation of Trade-Off Skylines

A Basic Trade-Off Skyline Algorithm

Page 26: Efficient Computation of Trade-Off Skylines

Representing Trade-Off Sequences

Algorithm 3: Basic Algorithm for computing integrated trade-offs

Step 1:β€’ For each trade-off ( 1 y1), test with already included trade-offs ( 2 y2) iff.., π‘₯ ⊳ π‘₯ ⊳ βˆ€π‘–βˆˆ πœ‡1∩ 2 : 1 2 . πœ‡ 𝑦 ≳𝑖 π‘₯Step 2:β€’ If yes , β€’ Create integrated trade-off for the two trade-offs. β€’ For this newly constructed trade-offs, repeat from step 1 to check for further valid sequences.

Page 27: Efficient Computation of Trade-Off Skylines

Representing Trade-Off Sequences

Page 28: Efficient Computation of Trade-Off Skylines

Removing Redundancy from the Tree

Subsumption criterion for : π’•πŸ βŠ† π’•πŸβ€’ For two trade-offs 1 ( 1 1) and 2 ( 2 2), 𝑑 ≔ π‘₯ βŠ³π‘¦ 𝑑 ≔ π‘₯ βŠ³π‘¦ 𝑑1 2, iff 1 β‰₯ 2 1 2 1 β‰₯ 1 2 1 βŠ† 𝑑 π‘₯ 𝑃 π‘₯ β†Άπ‘₯ βˆ§π‘¦ ↢𝑦 𝑃𝑦 βˆ§πœ‡ βŠ†πœ‡

Page 29: Efficient Computation of Trade-Off Skylines

Removing Redundancy from the Tree

β€’ As an example, consider following trade-offs.,

β€’ Consider a blue-metallic car o1 for $18500 and a blue car for $15500 .β€’ Clearly o1 more desirable than o2.However o1 >T o2 via t4 not via t1.

Page 30: Efficient Computation of Trade-Off Skylines

Indexing for Improved Match-Making

Algorithm 5: Indexing for improved match makingβ€’ Idea is to check only those trade-off sequences which potentially

could establish a domination relationship.

β€’ Trade-offs that satisfy..,i. 𝑑≔( ) : 2, π‘₯βŠ³π‘¦ βˆˆπ‘‡βˆ€π‘–βˆˆπœ‡ 𝑦𝑖 ≳𝑖 π‘œ 𝑖ii. 𝑑≔( ) 1, > π‘₯βŠ³π‘¦ βˆˆπ‘‡βˆ€π‘–βˆˆπœ‡π‘–βˆΆπ‘œ 𝑖 𝑖 π‘₯𝑖

Page 31: Efficient Computation of Trade-Off Skylines

Experimental Evaluations

β€’ Randomly generated sets of consistent trade-offs to be integrated. β€’ The underlying database relation has 6 attributes with domains of

about 20 distinct values each (based on values often occurring in e-commerce settings).

β€’ Trade-offs are chosen randomly on two to four of those attributes. β€’ Up to 10 trade-offs are used per set

Page 32: Efficient Computation of Trade-Off Skylines

β€’ Measured the tree sizes (i.e. number of tree nodes) and analyzed them according to different percentage quantiles (e.g. β€œ2%-quantile = 139” means 2% of all trees are smaller than 139 nodes).

β€’ Below table shows the observed results

Trading Tree Size

Page 33: Efficient Computation of Trade-Off Skylines

β€’ For each trade-off set 10,000 pairs of skyline objects were generated and checked for dominance with respect to the trade-off set.

β€’ For each object pair, we used the basic algorithm (Algorithm 2), an algorithm with 1-way-indexing, and the algorithm for 2-way-indexing (Algorithm 5), each with trading trees with and without subsumption respectively. Below table shows the observed results.

Comparison Speed-Up Using Indexes

Page 34: Efficient Computation of Trade-Off Skylines

β€’ Computed some trade-off skylines using two typical real world E-commerce datasets with 998 notebook offers and 1,350 real estate offers.

β€’ For notebook dataset, the Pareto skyline still contains 205 items. β€’ Assuming the user prefers to buy a 15’’ screen size laptop and larger

or smaller screens are less desirable.β€’ Two new trade-offs can be generated i.e., (15’’,cluster avg. values) > (13.3’’,cluster avg. values) (15’’,cluster avg. values) > (13.3’’,cluster avg. values)

Computing Trade-off Skylines

Page 35: Efficient Computation of Trade-Off Skylines

β€’ Incorporating these two trade-offs decreased the skyline size on avg. by 15% to 176 items.

β€’ For real estate dataset of 1,350 offers ,initial skyline set has 81 items.

β€’ Incorporating two trade-offs with respect to space and price reduced skyline set by 30% to 57 items.

Computing Trade-off Skylines

Page 36: Efficient Computation of Trade-Off Skylines

EXPERIMENTAL EVALUATIONS Computing Trade-off Skylinesβ€’ The run time results of all the three algorithms ( naΓ―ve baseline, 1-way indexing, and 2-way indexing ) with subsumption are given in the below figure.

Page 37: Efficient Computation of Trade-Off Skylines

Questions..?

β€’ Conclusion and future work .

Thank you!