
Posted on 10-Jul-2020


Skyline Snippets

Markus Endres and Werner Kießling

Outline

1. Skyline and Preference Queries
2. Skyline Snippets
3. Performance Benchmarks
4. Summary and Outlook

1. Skyline Queries

Skyline Queries and Pareto Preferences

[Figure: drinks (Drink 1 to Drink 9) plotted by calories (Cal) and fat (Fat [mg])]

Beverages with lowest calories and lowest fat?

Literature:
• On Finding the Maxima of a Set of Vectors (Kung et al., 1975)
• The Skyline Operator (Börzsönyi et al., 2001)
• Foundations of Preferences in Database Systems (Kießling, 2002)

Skyline / Preference SQL query:

SELECT * FROM Beverage B
PREFERRING B.cal LOWEST AND B.fat LOWEST
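To make the semantics of "PREFERRING ... LOWEST AND ... LOWEST" concrete, here is a minimal Python sketch that evaluates such a query naively in memory by comparing every pair of tuples. The `Beverage` type, the drink names, and the (cal, fat) values are hypothetical and are not the data from the figure.

```python
from typing import NamedTuple


class Beverage(NamedTuple):
    name: str
    cal: float  # hypothetical calorie value
    fat: float  # hypothetical fat value


def dominates(a: Beverage, b: Beverage) -> bool:
    """a is not worse than b in cal and fat (both LOWEST) and strictly better in one."""
    return a.cal <= b.cal and a.fat <= b.fat and (a.cal < b.cal or a.fat < b.fat)


def skyline(rel: list[Beverage]) -> list[Beverage]:
    """PREFERRING cal LOWEST AND fat LOWEST: keep tuples that no other tuple dominates."""
    return [t for t in rel if not any(dominates(u, t) for u in rel)]


# Hypothetical sample data (not taken from the slides).
beverages = [
    Beverage("Drink A", 5.0, 1.5),
    Beverage("Drink B", 10.0, 0.5),
    Beverage("Drink C", 12.0, 1.0),  # dominated by Drink B
    Beverage("Drink D", 3.0, 2.0),
    Beverage("Drink E", 6.0, 1.8),   # dominated by Drink A
]

print([b.name for b in skyline(beverages)])  # ['Drink A', 'Drink B', 'Drink D']
```

Drink C and Drink E are dropped because Drink B and Drink A dominate them. The pairwise comparison costs O(n²) dominance tests.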

Skyline Queries: Motivation

‣ Skyline results become large for
  • high dimensionality (up to 10 dimensions are not uncommon)
  • large database relations
‣ Computing the full Skyline is time- and memory-consuming
‣ In many applications a fraction of the full Skyline is sufficient, e.g. Web services, mobile Internet
‣ State of the art:
  • Full Skyline: BNL, LESS, Hexagon / Lattice Skyline, ...; algorithms with and without indexes
  • Progressive Skyline: BBS, Bitmap, PDS, ...; highly specialized indexes necessary

Skyline Queries: Preference Background (Kießling)

‣ Skyline queries are a subset of Pareto preference queries
‣ Preference $P = (A, <_P)$: a strict partial order on dom(A); $x <_P y$ means "I like y more than x"
‣ Preference selection of a preference P:

$$\sigma[P](R) := \{\, t \in R \mid \neg \exists t' \in R : t <_P t' \,\}$$

‣ The result of the preference selection is also called Skyline, BMO-set, or Winnow
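The definition of σ[P](R) only needs a "worse than" test, so it can be written generically. A minimal Python sketch, assuming the caller supplies `worse(x, y)` implementing $x <_P y$ as a strict partial order; the cuisine values and the POS-style preference `pos_worse` are hypothetical illustrations, not Preference SQL's implementation.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def preference_selection(rel: Iterable[T], worse: Callable[[T, T], bool]) -> list[T]:
    """sigma[P](R): keep each t for which no t' in R satisfies t <_P t'."""
    rows = list(rel)
    return [t for t in rows if not any(worse(t, u) for u in rows)]


# Hypothetical POS-style preference: values in the favourite set are
# preferred to every value outside it; all other pairs are unranked.
FAVOURITES = {"Italian", "Mexican"}


def pos_worse(x: str, y: str) -> bool:
    """x <_P y: x is not a favourite, y is."""
    return x not in FAVOURITES and y in FAVOURITES


cuisines = ["German", "Italian", "Thai", "Mexican"]
print(preference_selection(cuisines, pos_worse))  # ['Italian', 'Mexican']
```

If no row holds a favourite value, no row is worse than another and all rows are returned: the selection yields the best matches available instead of an empty result.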

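Skyline Queries: Preference Background (Kießling)

‣ Weak Order Preference (WOP): the dominance test uses a numerical utility function $f_P$ that depends on the type of preference:

$$f_P : \mathrm{dom}(A) \to \mathbb{R}^{+}_{0}, \qquad x <_P y \iff f_P(x) > f_P(y)$$

‣ Base preference constructors: LOWEST, HIGHEST, POS, NEG, ...

$$P := \mathrm{LOWEST}_d(A): \quad f_P(x) := \left\lceil \frac{x - \min}{d} \right\rceil$$

$$P := \mathrm{HIGHEST}_d(A): \quad f_P(x) := \left\lceil \frac{\max - x}{d} \right\rceil$$

Here min and max are the smallest and largest values of dom(A). The d-parameter partitions the range of domain values: all values with the same utility $f_P(x)$ are considered equally good.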
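The two utility functions and the WOP dominance test can be transcribed directly. A minimal Python sketch; `lowest_d`, `highest_d`, `wop_worse` and the parameter values (min = 0, d = 10) are hypothetical names and numbers chosen for illustration.

```python
import math
from typing import Callable

Utility = Callable[[float], int]


def lowest_d(dom_min: float, d: float) -> Utility:
    """Utility of LOWEST_d(A): f(x) = ceil((x - min) / d); smaller is better."""
    return lambda x: math.ceil((x - dom_min) / d)


def highest_d(dom_max: float, d: float) -> Utility:
    """Utility of HIGHEST_d(A): f(x) = ceil((max - x) / d); smaller is better."""
    return lambda x: math.ceil((dom_max - x) / d)


def wop_worse(f: Utility, x: float, y: float) -> bool:
    """Weak order preference: x <_P y iff f_P(x) > f_P(y)."""
    return f(x) > f(y)


f = lowest_d(dom_min=0, d=10)
print([f(x) for x in (0, 5, 10, 11, 25)])        # [0, 1, 1, 2, 3]
print(wop_worse(f, 11, 5), wop_worse(f, 5, 10))  # True False
```

Here 5 and 10 receive the same utility, so neither is better than the other, while 11 falls into the next interval and is worse than 5. Because of the ceiling, the minimum itself forms a class of its own (utility 0).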

Skyline Queries: Preference Background (Kießling)

‣ Complex preference constructors, e.g. Pareto (Skyline)

For weak order preferences $P_1 = (A_1, <_{P_1}), \dots, P_m = (A_m, <_{P_m})$, a Pareto preference is defined as

$$P := \otimes(P_1, \dots, P_m) = (A_1 \times \dots \times A_m, <_P)$$

$$(x_1, \dots, x_m) <_P (y_1, \dots, y_m) \iff \exists i \in \{1, \dots, m\} : f_{P_i}(x_i) > f_{P_i}(y_i) \;\wedge\; \forall j \in \{1, \dots, m\}, j \neq i : f_{P_j}(x_j) \ge f_{P_j}(y_j)$$

A tuple is said to dominate another tuple if it is better in at least one dimension and not worse in all other dimensions.
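The Pareto dominance test can be written on top of per-dimension utility functions. A minimal Python sketch; `pareto_worse` is a hypothetical name, and the identity utilities in the usage lines are a simplification (equivalent to LOWEST_1 with min = 0 on non-negative integers).

```python
from typing import Callable, Sequence

Utility = Callable[[float], float]


def pareto_worse(fs: Sequence[Utility], x: Sequence[float], y: Sequence[float]) -> bool:
    """(x1..xm) <_P (y1..ym) for the Pareto preference of WOPs with utilities fs.

    y must be strictly better (smaller utility) in at least one dimension and
    not worse in any dimension; this matches the exists/forall definition,
    because f(x_i) > f(y_i) already implies f(x_i) >= f(y_i).
    """
    fx = [f(v) for f, v in zip(fs, x)]
    fy = [f(v) for f, v in zip(fs, y)]
    strictly_better_once = any(a > b for a, b in zip(fx, fy))
    never_worse = all(a >= b for a, b in zip(fx, fy))
    return strictly_better_once and never_worse


# Two LOWEST preferences with the identity as (hypothetical) utility.
fs = [lambda v: v, lambda v: v]
print(pareto_worse(fs, (2, 3), (1, 3)))  # True: better in dim 1, equal in dim 2
print(pareto_worse(fs, (2, 1), (1, 3)))  # False: the tuples are incomparable
print(pareto_worse(fs, (1, 3), (1, 3)))  # False: equal tuples do not dominate
```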

Skyline Queries: Overview of Preference Constructors

‣ Taxonomy of base preference constructors:
  • POS, NEG, LOWEST_d, HIGHEST_d, EXPLICIT, POS/POS, POS/NEG, AROUND_d, LAYERED_m, BETWEEN_d, SCORE_d, CONTAINS
  • Geo preferences: NEARBY_d, WITHIN_d, BUFFER_d, ONROUTE_d
‣ Complex preference constructors
  • Equal importance: Pareto
  • More important: Prioritization
  • Weighted importance: Rank, ...

Skyline Queries: High-Dimensional Preference Query

A demo of Preference SQL is available at www.trial.PreferenceSQL.com

A high-dimensional preference query:

SELECT r.id, r.name
FROM restaurant r, city_map c
PREFERRING c.location NEARBY <lat>, <lon>, 1000 AND
           c.ascent LESS THAN 200, 20 AND
           r.cuisine IN ('Italian', 'Mexican') NOT IN ('German') AND
           r.priceCategory NOT IN ('Expensive', 'Luxury') AND
           r.rating BETWEEN '2star' AND '3star' AND
           r.ambient IN ('pleasant') AND
           r.waitingTime LOWEST AND
           r.customerFriendly HIGHEST

2. Skyline Snippets

Skyline Snippets

Skyline Snippets are a general method to compute a fraction of the full Skyline without any index structure.

2.1 Pareto k-partition

Skyline Queries: Sub-Preferences

‣ Sub-preference: a lower-dimensional Pareto preference (similar to the concept of subspace Skylines)
‣ Example: sample sub-preferences of $P := \otimes(P_1, P_2, P_3)$ are
  • $P^{\{P_1,P_2\}} := \otimes(P_1, P_2)$
  • $P^{\{P_1,P_3\}} := \otimes(P_1, P_3)$
  • $P^{\{P_2,P_3\}} := \otimes(P_2, P_3)$

Skyline Snippets: Pareto k-Partition

‣ A k-partition of $P := \otimes(P_1, \dots, P_m)$ is a decomposition of P into k disjoint Pareto sub-preferences $P^{I_1}, \dots, P^{I_k}$ such that

$$\otimes(P_1, \dots, P_m) = \otimes(P^{I_1}, \dots, P^{I_k})$$

‣ Example: a few 2-partitions of $P = \otimes(P_1, \dots, P_4)$ are
  • $P = \otimes(P^{\{P_1,P_2\}}, P^{\{P_3,P_4\}})$
  • $P = \otimes(P^{\{P_1,P_3\}}, P^{\{P_2,P_4\}})$
  • $P = \otimes(P^{\{P_1\}}, P^{\{P_2,P_3,P_4\}})$
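What the k-partitions of a Pareto preference look like, and how many there are, can be made concrete by enumerating them. A minimal Python sketch; `k_partitions` is a hypothetical helper, and sub-preferences are represented simply by their names. The number of results is the Stirling number of the second kind S(m, k), which grows quickly with m.

```python
from typing import Iterator, Sequence


def k_partitions(items: Sequence[str], k: int) -> Iterator[list[list[str]]]:
    """Yield every split of `items` into exactly k non-empty, disjoint blocks.

    Each block stands for one sub-preference P^{I_j}; block order is ignored,
    so every k-partition is produced exactly once.
    """
    if k == 0:
        if not items:
            yield []
        return
    if len(items) < k:
        return
    first, rest = items[0], items[1:]
    # Case 1: `first` forms a block of its own.
    for part in k_partitions(rest, k - 1):
        yield [[first]] + part
    # Case 2: `first` joins one of the k blocks of a partition of the rest.
    for part in k_partitions(rest, k):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


for part in k_partitions(["P1", "P2", "P3", "P4"], 2):
    print(part)
```

For m = 4 and k = 2 this prints 7 partitions, including the three listed in the example.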

2.2 The Skyline Snippets Algorithm

Skyline Snippets

‣ The Skyline Snippets Theorem
  • Given a Pareto preference $P := \otimes(P_1, \dots, P_m)$ and a k-partition $\otimes(P^{I_1}, \dots, P^{I_k})$.
  • Let $S := \sigma[P](R)$ be the Skyline on a relation R.

1. Let $S_k = \bigcup_{i=1}^{k} \sigma[P^{I_i}](R)$. Then
  • $\sigma[P](S_k) \neq \emptyset$
  • $\sigma[P](S_k) \subseteq S$
  $\sigma[P](S_k)$ is called a k-snippet of the Skyline S.
2. Let $L_k = \bigcap_{i=1}^{k} \sigma[P^{I_i}](R)$. If $L_k \neq \emptyset$, then $L_k \subseteq S$.
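Both statements of the theorem can be spot-checked on random data. This is a minimal Python sketch, not a proof: it assumes LOWEST preferences on small integer domains (values 0 to 3, chosen so that ties are frequent) and the fixed 2-partition {A1, A2} | {A3, A4}; `dominates` and `skyline` are hypothetical helper names.

```python
import random
from typing import Sequence

Row = tuple[int, ...]


def dominates(a: Row, b: Row, dims: Sequence[int]) -> bool:
    """Pareto dominance restricted to `dims`; all preferences are LOWEST."""
    return all(a[i] <= b[i] for i in dims) and any(a[i] < b[i] for i in dims)


def skyline(rel: Sequence[Row], dims: Sequence[int]) -> set[Row]:
    """sigma[P](R) for the sub-preference on `dims` (naive nested loops)."""
    return {t for t in rel if not any(dominates(u, t, dims) for u in rel)}


rng = random.Random(42)
M = 4
ALL_DIMS = range(M)
PARTITION = [[0, 1], [2, 3]]  # the 2-partition P^{P1,P2}, P^{P3,P4}

for _ in range(200):
    R = [tuple(rng.randint(0, 3) for _ in range(M)) for _ in range(30)]
    S = skyline(R, ALL_DIMS)
    subs = [skyline(R, block) for block in PARTITION]
    S_k = set().union(*subs)                # union of the sub-Skylines
    snippet = skyline(list(S_k), ALL_DIMS)  # the 2-snippet
    assert snippet and snippet <= S         # statement 1
    L_k = set.intersection(*subs)           # intersection of the sub-Skylines
    assert not L_k or L_k <= S              # statement 2

print("both statements held on 200 random relations")
```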

Skyline Snippets: Example

‣ Example 1: $P := \otimes(P_1, \dots, P_4)$, only LOWEST preferences on R

Table 1: Sample data set R

ID   A1   A2   A3   A4
t1   0    0    1    0
t2   0    1    0    0
t3   1    0    0    1
t4   0    0    1    1
t5   1    0    1    0

• The Skyline is $S := \{t_1, t_2, t_3\}$
• 2-partition $P = \otimes(P^{\{P_1,P_2\}}, P^{\{P_3,P_4\}})$:
  – $\sigma[P^{\{P_1,P_2\}}](R) = \{t_1, t_4\}$
  – $\sigma[P^{\{P_3,P_4\}}](R) = \{t_2\}$
• The 2-snippet is $\sigma[P](\{t_1, t_4\} \cup \{t_2\}) = \{t_1, t_2\} \subseteq S$
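Example 1 can be recomputed mechanically. A minimal Python sketch using the data of Table 1; the helpers `dominates` and `sigma` are hypothetical names, and attributes are addressed by position (0 for A1, ..., 3 for A4).

```python
# Sample data set R from Table 1 (all preferences are LOWEST).
R = {
    "t1": (0, 0, 1, 0),
    "t2": (0, 1, 0, 0),
    "t3": (1, 0, 0, 1),
    "t4": (0, 0, 1, 1),
    "t5": (1, 0, 1, 0),
}


def dominates(a, b, dims):
    """a dominates b on `dims`: never worse and strictly better at least once."""
    return all(a[i] <= b[i] for i in dims) and any(a[i] < b[i] for i in dims)


def sigma(ids, dims):
    """Preference selection over the tuples named in `ids`, restricted to `dims`."""
    ids = list(ids)
    return {t for t in ids if not any(dominates(R[u], R[t], dims) for u in ids)}


S = sigma(R, range(4))                    # full Skyline
sub12 = sigma(R, [0, 1])                  # sigma[P^{P1,P2}](R)
sub34 = sigma(R, [2, 3])                  # sigma[P^{P3,P4}](R)
snippet = sigma(sub12 | sub34, range(4))  # 2-snippet
print(sorted(S), sorted(sub12), sorted(sub34), sorted(snippet))
# ['t1', 't2', 't3'] ['t1', 't4'] ['t2'] ['t1', 't2']
```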


Skyline Snippets: Example

‣ Example 2: $P := \otimes(P_1, \dots, P_4)$, only LOWEST preferences on the sample data set R of Table 1

• The Skyline is $S := \{t_1, t_2, t_3\}$
• 2-partition $P = \otimes(P^{\{P_1\}}, P^{\{P_2,P_3,P_4\}})$:
  – $\sigma[P^{\{P_1\}}](R) = \{t_1, t_2, t_4\}$
  – $\sigma[P^{\{P_2,P_3,P_4\}}](R) = \{t_1, t_2, t_3, t_5\}$
• $L_k = \{t_1, t_2, t_4\} \cap \{t_1, t_2, t_3, t_5\} = \{t_1, t_2\} \neq \emptyset \Rightarrow L_k \subseteq S$
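Example 2, which uses the intersection $L_k$ instead of the union, can be recomputed the same way. A minimal Python sketch on the data of Table 1; `dominates` and `sigma` are hypothetical helper names, and attributes are addressed by position (0 for A1, ..., 3 for A4).

```python
# Sample data set R from Table 1 (all preferences are LOWEST).
R = {
    "t1": (0, 0, 1, 0),
    "t2": (0, 1, 0, 0),
    "t3": (1, 0, 0, 1),
    "t4": (0, 0, 1, 1),
    "t5": (1, 0, 1, 0),
}


def dominates(a, b, dims):
    """a dominates b on `dims`: never worse and strictly better at least once."""
    return all(a[i] <= b[i] for i in dims) and any(a[i] < b[i] for i in dims)


def sigma(ids, dims):
    """Preference selection over the tuples named in `ids`, restricted to `dims`."""
    ids = list(ids)
    return {t for t in ids if not any(dominates(R[u], R[t], dims) for u in ids)}


S = sigma(R, range(4))        # full Skyline
sub1 = sigma(R, [0])          # sigma[P^{P1}](R)
sub234 = sigma(R, [1, 2, 3])  # sigma[P^{P2,P3,P4}](R)
L_k = sub1 & sub234           # intersection of the sub-Skylines
print(sorted(sub1), sorted(sub234), sorted(L_k))
# ['t1', 't2', 't4'] ['t1', 't2', 't3', 't5'] ['t1', 't2']
```

Note that t5 survives in the second sub-Skyline because it ties with t1 on A2 to A4; it is only eliminated by the intersection.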


Skyline Snippets: The Skyline Snippets Algorithm (SSA)

Note: Line 4 of the algorithm can be executed in parallel on multi-core architectures.
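A minimal Python sketch of the snippet computation, derived from part 1 of the Skyline Snippets Theorem rather than from the authors' listing: it computes the k sub-Skylines concurrently (assuming that this is the parallelizable step), unions them, and returns the Skyline of the union. It assumes LOWEST preferences only and uses a naive nested-loop Skyline; `skyline_snippet` and its helpers are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Sequence

Row = tuple[int, ...]


def dominates(a: Row, b: Row, dims: Sequence[int]) -> bool:
    """Pareto dominance on `dims`, assuming LOWEST preferences only."""
    return all(a[i] <= b[i] for i in dims) and any(a[i] < b[i] for i in dims)


def skyline(rel: Sequence[Row], dims: Sequence[int]) -> list[Row]:
    """Naive nested-loop Skyline; any Skyline algorithm could be plugged in here."""
    return [t for t in rel if not any(dominates(u, t, dims) for u in rel)]


def skyline_snippet(rel: Sequence[Row], partition: Sequence[Sequence[int]]) -> list[Row]:
    """k-snippet sigma[P](S_k) following part 1 of the Skyline Snippets Theorem."""
    all_dims = sorted(d for block in partition for d in block)
    # The sub-Skylines are independent of each other, so they can run concurrently.
    with ThreadPoolExecutor() as pool:
        subs = list(pool.map(lambda block: skyline(rel, block), partition))
    # S_k: union of the sub-Skylines (duplicates removed, order kept).
    s_k = list(dict.fromkeys(t for sub in subs for t in sub))
    return skyline(s_k, all_dims)


# Sample data set R from Table 1.
R = [(0, 0, 1, 0), (0, 1, 0, 0), (1, 0, 0, 1), (0, 0, 1, 1), (1, 0, 1, 0)]
print(skyline_snippet(R, [[0, 1], [2, 3]]))  # [(0, 0, 1, 0), (0, 1, 0, 0)]
```

Threads keep the sketch short; in CPython they do not speed up this CPU-bound pure-Python loop, so a process pool or a database-side parallel query would be needed for real parallelism.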

3. Performance Benchmarks

Performance Benchmarks

‣ SSA algorithm vs.
  • Hexagon (Lattice Skyline): Preisinger, Kießling: The Hexagon Algorithm for Pareto Preference Queries (2007)
  • Progressive Hexagon
‣ Implementation in Preference SQL
  • Java framework for preference queries on conventional databases
  • Oracle 11g database
‣ Experiments
  • Synthetic data sets: anti-correlated (ANTI), correlated (COR), and independent (IND), produced with the data generator of Börzsönyi et al. (2001)
  • Varying data cardinality, number of distinct values, and d-parameter

Benchmark 1: Computation time Hexagon vs. SSA

• Pareto preference, only LOWEST preferences (MIN)
• Hexagon computes the full Skyline, whereas SSA computes only a few Skyline points
• n = 500K tuples, domain size c = 100K, d-parameter d = 10K

[Figure: runtime in sec vs. dimension m (2 to 10) for Hexagon and SSA; two panels, ANTI and COR]

Benchmark 2: Progressive Hexagon vs. SSA

• Pareto preference $\otimes(P_1, \dots, P_8)$
• Progressive Hexagon is stopped after it has computed as many Skyline points as SSA
• k-partitions with k = 2, 4, 8 to evaluate the influence of the partitioning
• n = 500K tuples, domain size c = 100K, d-parameter d = 10K
• Full Skyline size: 5902

Table 1: Hexagon (prog.) vs. SSA (ANTI)

                  k = 2               k = 4               k = 8
                  #Skyline   sec      #Skyline   sec      #Skyline   sec
Hexagon (prog.)   3801       6.22     1075       5.95     419        5.29
SSA               3801       3.81     1075       0.812    419        0.198
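The table values translate into the following speed-ups of SSA over progressive Hexagon and into the share of the full Skyline (5902 points) that each run returns; these are plain ratios of the reported numbers.

$$\frac{t_{\mathrm{Hexagon,prog}}}{t_{\mathrm{SSA}}}:\quad \frac{6.22}{3.81} \approx 1.6\ (k = 2), \qquad \frac{5.95}{0.812} \approx 7.3\ (k = 4), \qquad \frac{5.29}{0.198} \approx 26.7\ (k = 8)$$

$$\frac{\#\text{Skyline points}}{5902}:\quad \frac{3801}{5902} \approx 64\,\%, \qquad \frac{1075}{5902} \approx 18\,\%, \qquad \frac{419}{5902} \approx 7.1\,\%$$

In this benchmark, a finer partition (larger k) returns fewer Skyline points but increases SSA's lead over progressive Hexagon.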

Benchmark 3: Number of Skyline points computed by Hexagon and SSA

• Pareto preference $\otimes(P_1, \dots, P_m)$, m = 4, 6, 8
• m/2-partitions
• Hexagon computes the full Skyline
• n = 500K tuples, domain size c = 100K, d-parameter d = 10K

Table 1: Skyline points computed by Hexagon and SSA (ANTI)

m   #Skyline points   σ[P](S_k)   P^{P1,P2}   P^{P3,P4}   P^{P5,P6}   P^{P7,P8}
4   12312             1348        1211        1394        -           -
6   18771             2851        1378        1631        1299        -
8   24432             5495        1812        1919        1058        1403

Table 2: Skyline points computed by Hexagon and SSA (COR)

m   #Skyline points   σ[P](S_k)   P^{P1,P2}   P^{P3,P4}   P^{P5,P6}   P^{P7,P8}
4   3126              982         706         703         -           -
6   8931              117         516         621         581         -
8   11026             1131        643         681         657         597
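Dividing the size of the k-snippet $\sigma[P](S_k)$ by the full Skyline size computed by Hexagon gives the share of the Skyline that SSA returns (plain ratios of the table values):

$$\text{ANTI:}\quad \frac{1348}{12312} \approx 10.9\,\%\ (m = 4), \qquad \frac{2851}{18771} \approx 15.2\,\%\ (m = 6), \qquad \frac{5495}{24432} \approx 22.5\,\%\ (m = 8)$$

$$\text{COR:}\quad \frac{982}{3126} \approx 31.4\,\%\ (m = 4), \qquad \frac{117}{8931} \approx 1.3\,\%\ (m = 6), \qquad \frac{1131}{11026} \approx 10.3\,\%\ (m = 8)$$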

4. Summary and Outlook

Summary and Outlook

Summary

‣ Too many Skyline points in high-dimensional spaces
‣ Skyline evaluation in high-dimensional spaces is time- and memory-consuming
‣ A few snippets of the full Skyline are often sufficient, e.g. for mobile Internet and Web services
‣ The Skyline Snippets algorithm (SSA) needs no specialized index structure
‣ Very fast computation of some Skyline points (e.g. 0.198 sec for SSA vs. 5.29 sec for progressive Hexagon with k = 8 on ANTI)

Outlook

‣ Extended performance benchmarks investigating
  • the influence of the different types of preference constructors
  • the performance impact of different k-partitions
‣ Development of heuristics for choosing k-partitions

Thank you for your attention!

Questions?

{endres,kiessling}@informatik.uni-augsburg.de
