6/15/2015 Top-k algorithms: Finding k objects that have the highest overall grades



Top-k query

Given
– a relation R (id, x1, x2, x3) and
– a query Q: sum(x1, x2, x3)

Find k tuples with highest grades according to Q.

R
id   x1    x2    x3    sum
a    0.3   0.6   0.7   1.6
b    0.2   0.3   0.4   0.9
c    0.4   0.5   0.9   1.8
d    0.7   0.6   0.1   1.4

Top-2 tuples: c (1.8) and a (1.6)
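As a baseline for the algorithms that follow, the query above can be answered by scoring every tuple and keeping the k best. The sketch below assumes the relation fits in memory as a dictionary; `naive_top_k` is a hypothetical helper name, not something defined in the slides.

```python
import heapq

# Relation R from the slide: id -> (x1, x2, x3)
R = {
    "a": (0.3, 0.6, 0.7),
    "b": (0.2, 0.3, 0.4),
    "c": (0.4, 0.5, 0.9),
    "d": (0.7, 0.6, 0.1),
}

def naive_top_k(relation, k, q=sum):
    """Score every tuple with q and return the ids of the k best (hypothetical helper)."""
    return heapq.nlargest(k, relation, key=lambda tid: q(relation[tid]))

top2 = naive_top_k(R, 2)
print(top2)
```

This reads every tuple in full, which is the cost that FA, TA, NRA and PREFER try to avoid.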

Problem formulation 1

• Given

– A relational table R (id, x1, x2, …, xm)

– A query Q (monotone function)

• Find top-k tuples according to Q
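For reference, a monotone query function is usually taken to mean the following. This is the standard definition and is stated here as an assumption, since the slide does not spell it out.

```latex
x_i \le y_i \ \text{for every } i \in \{1,\dots,m\}
\;\Longrightarrow\;
Q(x_1,\dots,x_m) \le Q(y_1,\dots,y_m)
```

Sum and min, the two functions used in the examples of this deck, both satisfy it.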

Problem formulation 2

• Given

– A relational table R (id, x1, x2, …, xm)

– A materialized view V (id, scorev) over R

– A query Q (monotone function)

• Find top-k tuples according to Q

Topics of Discussion

• Fagin’s algorithm (FA)
• Threshold algorithm (TA)
– No Random Accesses algorithm (NRA)
• PREFER

Finding top-k with FA

• Do sorted access (in parallel) to each of the lists Xi until at least k objects have been seen in every one of the lists
• For each object t seen, do random accesses to the other lists to fetch its missing grades
• Compute Q(t) for each object seen. Y is the set of the k seen objects with the highest grades
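The three steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `fa_top_k` and the `grades` dictionary (standing in for random access) are assumptions, and the data is limited to the four objects shown in the FA example of this deck.

```python
def fa_top_k(sorted_lists, grades, q, k):
    """Fagin's algorithm sketch.

    sorted_lists: one list per attribute of (id, grade), sorted by grade descending.
    grades: id -> tuple of all grades; a stand-in for random access (assumption).
    """
    seen_in = [set() for _ in sorted_lists]
    depth = 0
    shortest = min(len(lst) for lst in sorted_lists)
    # Step 1: sorted access in parallel until k objects were seen in every list.
    while depth < shortest and len(set.intersection(*seen_in)) < k:
        for i, lst in enumerate(sorted_lists):
            seen_in[i].add(lst[depth][0])
        depth += 1
    # Step 2: random access fills in the grades of every object seen so far.
    seen = set.union(*seen_in)
    scores = {t: q(grades[t]) for t in seen}
    # Step 3: keep the k objects with the highest overall grade.
    best = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return depth, best

# Data of the FA example (only the four objects listed on the slides).
X1 = [("a", 0.9), ("b", 0.8), ("c", 0.72), ("d", 0.6)]
X2 = [("d", 0.9), ("a", 0.85), ("b", 0.7), ("c", 0.2)]
GRADES = {"a": (0.9, 0.85), "b": (0.8, 0.7), "c": (0.72, 0.2), "d": (0.6, 0.9)}

fa_depth, fa_best = fa_top_k([X1, X2], GRADES, min, 2)
print(fa_depth, fa_best)
```

On this data the sorted phase stops at depth 3, once a and b have appeared in both lists.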

FA example

• Find top-2 with Q: min(x1, x2)

R
ID   X1     X2
a    0.9    0.85
b    0.8    0.7
c    0.72   0.2
…    …      …
d    0.6    0.9

Sorted X1      Sorted X2
(a, 0.9)       (d, 0.9)
(b, 0.8)       (a, 0.85)
(c, 0.72)      (b, 0.7)
(d, 0.6)       (c, 0.2)
…              …

FA example

• STEP 1
– Read attributes from every sorted list
– Stop when k objects have been seen in common from all lists

Sorted access reads three entries from each list: (a, 0.9), (b, 0.8), (c, 0.72) from X1 and (d, 0.9), (a, 0.85), (b, 0.7) from X2. Objects a and b have now been seen in both lists.

ID   X1     X2     min(x1,x2)
a    0.9    0.85
d    –      0.9
b    0.8    0.7
c    0.72   –

FA example

• STEP 2
– Random access to find missing grades

Random access fetches the two missing grades: X1 of d (0.6) and X2 of c (0.2).

ID   X1     X2     min(x1,x2)
a    0.9    0.85
d    0.6    0.9
b    0.8    0.7
c    0.72   0.2

FA example

• STEP 3
– Compute the grades of the seen objects.
– Return the k highest graded objects.

ID   X1     X2     min(x1,x2)
a    0.9    0.85   0.85
d    0.6    0.9    0.6
b    0.8    0.7    0.7
c    0.72   0.2    0.2

Top-2: a (0.85) and b (0.7)

Finding top-k with TA

• Do sorted access (in parallel) to each of the lists Xi and, for every object t seen, random accesses to the other lists. Compute Q(t) for every object t seen. Remember the k objects with the highest grades.
• For each list Xi let xi be the last grade seen under sorted access. Compute the threshold value τ = Q(x1, …, xm). Halt when at least k objects have grade ≥ τ.
• Y is the set of the k objects seen with the highest grades
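A minimal sketch of the loop described above, under the same assumptions as before: `ta_top_k` is a hypothetical name, `grades` stands in for random access, and the data is the four objects of the TA example. For simplicity the sketch remembers the score of every object seen instead of a bounded buffer of size k.

```python
def ta_top_k(sorted_lists, grades, q, k):
    """Threshold algorithm sketch: returns (depth reached, k best (id, grade) pairs)."""
    scores = {}   # overall grade of every object seen so far (simplification)
    best = []
    depth = 0
    shortest = min(len(lst) for lst in sorted_lists)
    while depth < shortest:
        last = []
        for lst in sorted_lists:
            obj, grade = lst[depth]          # sorted access
            last.append(grade)
            if obj not in scores:
                scores[obj] = q(grades[obj])  # random access to the other lists
        depth += 1
        tau = q(last)                         # threshold from the last grades seen
        best = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
        if len(best) == k and best[-1][1] >= tau:
            break                             # k objects reach the threshold
    return depth, best

# Data of the TA example (only the four objects listed on the slides).
X1 = [("a", 0.9), ("b", 0.8), ("c", 0.72), ("d", 0.6)]
X2 = [("d", 0.9), ("a", 0.85), ("b", 0.7), ("c", 0.2)]
GRADES = {"a": (0.9, 0.85), "b": (0.8, 0.7), "c": (0.72, 0.2), "d": (0.6, 0.9)}

ta_depth, ta_best = ta_top_k([X1, X2], GRADES, min, 2)
print(ta_depth, ta_best)
```

The thresholds along the way are 0.9, 0.8 and 0.7; the loop stops at depth 3, when both a (0.85) and b (0.7) reach τ = 0.7.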

TA example

• Find top-2 with Q: min(x1, x2)

R
ID   X1     X2
a    0.9    0.85
b    0.8    0.7
c    0.72   0.2
…    …      …
d    0.6    0.9

Sorted X1      Sorted X2
(a, 0.9)       (d, 0.9)
(b, 0.8)       (a, 0.85)
(c, 0.72)      (b, 0.7)
(d, 0.6)       (c, 0.2)
…              …

TA example

Step 1:
- parallel sorted access to each list: (a, 0.9) from X1 and (d, 0.9) from X2

For each object seen:
- get all grades by random access
- determine min(x1,x2)
- if amongst the 2 highest seen, keep in buffer

ID   X1    X2     min(x1,x2)
a    0.9   0.85   0.85
d    0.6   0.9    0.6

TA example

Step 2:
- Determine threshold value based on the last grades seen under sorted access: τ = min(x1, x2)
- If 2 objects have overall grade ≥ threshold value, stop; else go to the next entry position in the sorted lists and repeat step 1

ID   X1    X2     min(x1,x2)
a    0.9   0.85   0.85
d    0.6   0.9    0.6

τ = min(0.9, 0.9) = 0.9

Neither a nor d reaches τ, so continue.

TA example

Step 1 (Again):
- parallel sorted access to each list: (b, 0.8) from X1 and (a, 0.85) from X2

For each object seen:
- get all grades by random access
- determine min(x1,x2)
- if amongst the 2 highest seen, keep in buffer

ID   X1    X2     min(x1,x2)
a    0.9   0.85   0.85
d    0.6   0.9    0.6
b    0.8   0.7    0.7

TA example

Step 2 (Again):
- Determine threshold value based on the last grades seen under sorted access: τ = min(x1, x2)
- If 2 objects have overall grade ≥ threshold value, stop; else go to the next entry position in the sorted lists and repeat step 1

ID   X1    X2     min(x1,x2)
a    0.9   0.85   0.85
b    0.8   0.7    0.7

τ = min(0.8, 0.85) = 0.8

Only a reaches τ, so continue.

TA example

Situation at stopping condition: sorted access has reached (c, 0.72) in X1 and (b, 0.7) in X2.

ID   X1    X2     min(x1,x2)
a    0.9   0.85   0.85
b    0.8   0.7    0.7

τ = min(0.72, 0.7) = 0.7

Both a and b have overall grade ≥ τ, so the algorithm stops.

Finding top-k with NRA

• Do sorted access (in parallel) to each of the lists
– Maintain last grades seen xi
– For every object t compute Wt and Bt
– Topk = {k objects with highest W} and Mk = kth highest W
– Viable object when Bt > Mk, t belongs in R
• Halt when Bt ≤ Mk for all objects not in Topk

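Define W and B

• Lower bound W = f(x1, x2, …, xl, 0, …, 0): grades not yet seen count as 0
• Upper bound B = f(x1, x2, …, xl, xl+1, …, xm), where xl+1, …, xm are the last grades seen in the lists in which the object has not been seen yet
• E.g. f(x1, x2, x3) = x1 + x2 + x3

x1       x2       x3
a:0.7    a:0.8    d:0.9
…        …        …

• Wa = f(0.7, 0.8, 0) = 1.5
• Ba = f(0.7, 0.8, 0.9) = 2.4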
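The bounds and the halting rule can be put together in a short sketch. It is an illustration under stated assumptions: the aggregation is fixed to sum, `nra_top_k` is a hypothetical name, objects never seen so far are bounded by the sum of the last grades, and the lists are the ones from the NRA example in this deck (assuming the columns are X1, X2, X3 in that order).

```python
def nra_top_k(sorted_lists, k):
    """NRA sketch for Q = sum, using sorted access only."""
    m = len(sorted_lists)
    known = {}  # id -> {list index: grade seen so far}
    shortest = min(len(lst) for lst in sorted_lists)
    for depth in range(shortest):
        last = []
        for i, lst in enumerate(sorted_lists):
            obj, grade = lst[depth]
            known.setdefault(obj, {})[i] = grade
            last.append(grade)
        # W: missing grades count as 0. B: missing grades replaced by the last grade seen.
        W = {t: sum(g.values()) for t, g in known.items()}
        B = {t: W[t] + sum(last[i] for i in range(m) if i not in known[t])
             for t in known}
        ranked = sorted(known, key=lambda t: W[t], reverse=True)
        top, rest = ranked[:k], ranked[k:]
        if len(top) < k:
            continue
        m_k = W[top[-1]]  # k-th highest lower bound
        # Largest upper bound outside Topk, including objects not seen at all.
        n_k = max([B[t] for t in rest] + [sum(last)])
        if n_k <= m_k:
            return depth + 1, [(t, round(W[t], 2)) for t in top]
    return None

X1 = [("f", 0.6), ("n", 0.5), ("q", 0.4), ("d", 0.3), ("e", 0.2), ("r", 0.1)]
X2 = [("d", 0.6), ("g", 0.6), ("c", 0.6), ("a", 0.6), ("q", 0.5), ("e", 0.3)]
X3 = [("q", 0.9), ("d", 0.7), ("j", 0.3), ("p", 0.2), ("m", 0.1), ("b", 0.1)]

nra_depth, nra_top = nra_top_k([X1, X2, X3], 2)
print(nra_depth, nra_top)
```

On these lists the sketch halts after five rounds of sorted access with q and d as the top-2, without ever performing a random access.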

NRA example

• Find top-2 with Q: sum(x1, x2, x3)

Lists sorted by score:

X1       X2       X3
f:0.6    d:0.6    q:0.9
n:0.5    g:0.6    d:0.7
q:0.4    c:0.6    j:0.3
d:0.3    a:0.6    p:0.2
e:0.2    q:0.5    m:0.1
r:0.1    e:0.3    b:0.1
h:0.1

NRA example

Sorted access, depth 1: f:0.6 (X1), d:0.6 (X2), q:0.9 (X3)

ID   W     B
q    0.9   2.1   (Topk)
d    0.6   2.1   (Topk)
f    0.6   2.1

Halting test, with Nk the largest B among objects outside Topk: Nk = 2.1 ≤ Mk = 0.6 does not hold, so continue.

NRA example

Sorted access, depth 2: n:0.5 (X1), g:0.6 (X2), d:0.7 (X3)

ID   W     B
d    1.3   1.8   (Topk)
q    0.9   2.0   (Topk)
f    0.6   1.9
g    0.6   1.8
n    0.5   1.8

Halting test: Nk = 1.9 ≤ Mk = 0.9 does not hold, so continue.

NRA example

Sorted access, depth 3: q:0.4 (X1), c:0.6 (X2), j:0.3 (X3)

ID   W     B
q    1.3   1.9   (Topk)
d    1.3   1.7   (Topk)
f    0.6   1.5
g    0.6   1.3
c    0.6   1.3
n    0.5   1.4
j    0.3   1.3

Halting test: Nk = 1.5 ≤ Mk = 1.3 does not hold, so continue.

NRA example

Sorted access, depth 4: d:0.3 (X1), a:0.6 (X2), p:0.2 (X3)

ID   W     B
d    1.6   1.6   (Topk)
q    1.3   1.9   (Topk)
f    0.6   1.4
g    0.6   1.1
c    0.6   1.1
a    0.6   1.1
n    0.5   1.3
j    0.3   1.2
p    0.2   1.1

Halting test: Nk = 1.4 ≤ Mk = 1.3 does not hold, so continue.

NRA example

Sorted access, depth 5: e:0.2 (X1), q:0.5 (X2), m:0.1 (X3)

ID   W     B
q    1.8   1.8   (Topk)
d    1.6   1.6   (Topk)
f    0.6   1.2
g    0.6   0.9
c    0.6   0.9
a    0.6   0.9
n    0.5   1.1
j    0.3   1.0
p    0.2   0.9
e    0.2   0.8
m    0.1   0.8

Halting test: Nk = 1.2 ≤ Mk = 1.6 holds, so the algorithm halts with Topk = {q, d}.

Finding top-k with PREFER

• Step 1: View selection algorithm
– materializes a number of ranked views V of the relation R and uses them to efficiently answer preference queries Q.
• Step 2: Pipelined algorithm
– Define 1st watermark
– Output first tuples according to 1st watermark
– Define 2nd watermark
– Output second tuples according to 2nd watermark
– …

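Finding top-k with PREFER

• Determine watermark $T^1_{v,q}$
– How deep in V we must go to output the top result tuple $t_q^1$
• $T^1_{v,q}$ is such that $\forall t \in R:\ f_v(t) < T^1_{v,q} \Rightarrow f_q(t) < f_q(t_v^1)$
– if t in V is below $T^1_{v,q}$ then t can’t be $t_q^1$, since $t_v^1$ has higher score over Q

[Figure: view V with columns t and fv(t); the watermark $T^1_{v,q}$ is drawn as a cut-off line in V]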

Finding top-k with PREFER

• Determine $t_q^1$ according to $T^1_{v,q}$
– Scan V from top and retrieve prefix $[t_v^1, t_v^2, \ldots, t_v^w)$ where $t_v^w$ is the first tuple in V with score less than $T^1_{v,q}$
– Order prefix according to Q, $[t_q^1, \ldots, t_q^{w-1}]$. Let $t_q^s$ be the position of $t_v^1$ according to Q.

V
t    fv(t)
a    0.9    ($t_v^1$)
b    0.8    ($t_v^2$)
c    0.7    ($t_v^3$)
---- Watermark $T^1_{v,q}$ = 0.65 ----
d    0.5

Order according to Q: $t_q^1, t_q^2, t_q^3$ is a reordering of a = $t_v^1$, b = $t_v^2$, c = $t_v^3$, with $t_v^1$ at position $t_q^s$.

PREFER example

Find top-4 with:
fv(t)=0.2*X1+0.4*X2+0.4*X3
fq(t)=0.1*X1+0.6*X2+0.3*X3

View, sorted by fv(t):
ID   X1   X2   X3   fv(t)   fq(t)
a    10   17   20   16.8    17.2    ← t1
b    20   20   11   16.4    17.3
c    17   18   12   15.4    16.1
---- Watermark = 14.26 ----
d    15   10    8   10.2     9.9
e     5   10   12    9.8    10.1
f    15   10    5    9.0     9.0
g    12    5    5    6.4     5.7

1. Calculate Watermark for t1, which is 14.26
2. Find prefix of view with fv greater than watermark value and sort them by fq
3. Output tuples up to t1

Output:
ID   X1   X2   X3   fv(t)   fq(t)
b    20   20   11   16.4    17.3
a    10   17   20   16.8    17.2

PREFER example

Find top-4 with:
fv(t)=0.2*X1+0.4*X2+0.4*X3
fq(t)=0.1*X1+0.6*X2+0.3*X3

View, sorted by fv(t):
ID   X1   X2   X3   fv(t)   fq(t)
a    10   17   20   16.8    17.2
b    20   20   11   16.4    17.3
c    17   18   12   15.4    16.1    ← t1
---- Watermark = 13.1 ----
d    15   10    8   10.2     9.9
e     5   10   12    9.8    10.1
f    15   10    5    9.0     9.0
g    12    5    5    6.4     5.7

1. Calculate Watermark for t1, which is 13.1
2. Find prefix of view with fv greater than watermark value and sort them by fq
3. Output tuples up to t1
4. Repeat using first unprocessed as t1

Output so far: b, a

PREFER example

Find top-4 with:
fv(t)=0.2*X1+0.4*X2+0.4*X3
fq(t)=0.1*X1+0.6*X2+0.3*X3

View, sorted by fv(t):
ID   X1   X2   X3   fv(t)   fq(t)
a    10   17   20   16.8    17.2
b    20   20   11   16.4    17.3
c    17   18   12   15.4    16.1    ← t1
---- Watermark = 13.1 ----
d    15   10    8   10.2     9.9
e     5   10   12    9.8    10.1
f    15   10    5    9.0     9.0
g    12    5    5    6.4     5.7

1. Calculate Watermark for t1, which is 13.1
2. Find prefix of view with fv greater than watermark value and sort them by fq
3. Output tuples up to t1
4. Repeat using first unprocessed as t1

Output so far: b, a, c

PREFER example

Find top-4 with:
fv(t)=0.2*X1+0.4*X2+0.4*X3
fq(t)=0.1*X1+0.6*X2+0.3*X3

View, sorted by fv(t):
ID   X1   X2   X3   fv(t)   fq(t)
a    10   17   20   16.8    17.2
b    20   20   11   16.4    17.3
c    17   18   12   15.4    16.1
d    15   10    8   10.2     9.9    ← t1
e     5   10   12    9.8    10.1
f    15   10    5    9.0     9.0
---- Watermark = 8.3 ----
g    12    5    5    6.4     5.7

1. Calculate Watermark for t1, which is 8.3
2. Find prefix of view with fv greater than watermark value and sort them by fq
3. Output tuples up to t1
4. Repeat using first unprocessed as t1

Output so far: b, a, c

PREFER example

Find top-4 with:
fv(t)=0.2*X1+0.4*X2+0.4*X3
fq(t)=0.1*X1+0.6*X2+0.3*X3

View, sorted by fv(t):
ID   X1   X2   X3   fv(t)   fq(t)
a    10   17   20   16.8    17.2
b    20   20   11   16.4    17.3
c    17   18   12   15.4    16.1
d    15   10    8   10.2     9.9    ← t1
e     5   10   12    9.8    10.1
f    15   10    5    9.0     9.0
---- Watermark = 8.3 ----
g    12    5    5    6.4     5.7

1. Calculate Watermark for t1, which is 8.3
2. Find prefix of view with fv greater than watermark value and sort them by fq
3. Output tuples up to t1
4. Repeat using first unprocessed as t1

Output (ordered by fq): b, a, c, e, d. The first four, b, a, c, e, are the top-4.
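The pipelined steps of this example can be sketched as below. The slides do not show how a watermark is computed, so the three watermark values are copied from the slides and passed in as inputs; which tuple serves as t1 in each round is inferred from the rule "first unprocessed". The names `prefer_pipeline`, `ROWS` and `WATERMARKS` are hypothetical.

```python
# View rows from the example: id -> (X1, X2, X3)
ROWS = {
    "a": (10, 17, 20), "b": (20, 20, 11), "c": (17, 18, 12),
    "d": (15, 10, 8), "e": (5, 10, 12), "f": (15, 10, 5), "g": (12, 5, 5),
}

def fv(x):
    """View ranking function from the slide."""
    return 0.2 * x[0] + 0.4 * x[1] + 0.4 * x[2]

def fq(x):
    """Query ranking function from the slide."""
    return 0.1 * x[0] + 0.6 * x[1] + 0.3 * x[2]

# Watermarks copied from the slides, keyed by the tuple assumed to be t1.
# They are inputs here, not computed by this sketch.
WATERMARKS = {"a": 14.26, "c": 13.1, "d": 8.3}

def prefer_pipeline(rows, watermarks, k):
    """Emit tuples in fq order, round by round, until at least k are output."""
    view = sorted(rows, key=lambda t: fv(rows[t]), reverse=True)
    output = []
    while len(output) < k:
        unprocessed = [t for t in view if t not in output]
        if not unprocessed:
            break
        t1 = unprocessed[0]            # first unprocessed tuple of the view
        wm = watermarks[t1]            # step 1 (looked up, not calculated)
        # Step 2: prefix of the view above the watermark, sorted by fq.
        prefix = [t for t in unprocessed if t == t1 or fv(rows[t]) > wm]
        prefix.sort(key=lambda t: fq(rows[t]), reverse=True)
        # Step 3: output tuples up to and including t1.
        output.extend(prefix[: prefix.index(t1) + 1])
    return output

emitted = prefer_pipeline(ROWS, WATERMARKS, 4)
print(emitted[:4])
```

With these inputs the rounds emit b and a, then c, then e and d, so the first four emitted tuples are the top-4 under fq.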

Citations

• Ronald Fagin, Amnon Lotem, Moni Naor. Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66(4), pp. 614-656, 2003.

• Ronald Fagin. Combining fuzzy information from multiple systems. In Proc. of the 15th ACM Symposium on Principles of Database Systems, pp. 216-226, Montreal, Canada, 1996.

• Ronald Fagin. Fuzzy queries in multimedia database systems. In Proc. of the 17th ACM Symposium on Principles of Database Systems, pp. 1-10, Seattle, USA, 1998.

• Ulrich Güntzer, Wolf-Tilo Balke, Werner Kießling. Optimizing Multi-Feature Queries for Image Databases. In Proc. of the 26th VLDB Conference, pp. 419-428, Cairo, Egypt, 2000.

• Vagelis Hristidis, Nick Koudas, Yannis Papakonstantinou. PREFER: A system for the efficient execution of multi-parametric ranked queries. In Proc. of the ACM Special Interest Group on Management of Data Conference (SIGMOD), pp. 259-270, Santa Barbara, USA, 2001.

• Vagelis Hristidis, Yannis Papakonstantinou. Algorithms and applications for answering ranked queries using ranked views. VLDB Journal, 13(1), pp. 49-70, 2004.

• Surya Nepal, M. V. Ramakrishna. Query Processing Issues in Image (Multimedia) Databases. In Proc. of the 15th International Conference on Data Engineering (ICDE), pp. 22-29, Sydney, Australia, March 1999.

Questions