supporting top-k join queries in relational databases

35
Supporting top-k join queries in relational databases Ihab Ilyas, Walid Aref, Ahmed Elmagarmid Presented by Jing Chen

Upload: dexter

Post on 05-Feb-2016

44 views

Category:

Documents


0 download

DESCRIPTION

Supporting top-k join queries in relational databases. Ihab Ilyas, Walid Aref, Ahmed Elmagarmid Presented by Jing Chen. 1. Motivation. Select A.1, B.2 From A, B, C Where A.1=B.1 and B.2 = C.2 Order By (0.3*A.1 + 0.7*B.2) Stop After 5. Traditional Joins. Nested Loop Merge Join - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Supporting top-k join queries in relational databases

Supporting top-k join queries in relational databases

Ihab Ilyas, Walid Aref, Ahmed Elmagarmid

Presented by Jing Chen

Page 2: Supporting top-k join queries in relational databases

1. Motivation

• Select A.1, B.2

From A, B, C

Where A.1=B.1

and B.2 = C.2

Order By (0.3*A.1 + 0.7*B.2)

Stop After 5

Page 3: Supporting top-k join queries in relational databases

Traditional Joins

• Nested Loop

• Merge Join

• Hash Join

Page 4: Supporting top-k join queries in relational databases

2. Goals/Contribution

• Propose a pipe lined Rank-Join algorithm

• Analyze the I/O cost of the algorithm

• Implement the algorithm

• Propose a optimal join strategy

• Evaluate performance

Page 5: Supporting top-k join queries in relational databases

3. Ripple JoinJOIN : L.A = R.A L and R are descending ordered by B

(L1(1,1,5) R1(1,3,5))L

R

Page 6: Supporting top-k join queries in relational databases

3. Ripple Join --contd

JOIN: L.A = R.A

(L2,R2) {(2,2,4),(2,1,4)}

L

R

Page 7: Supporting top-k join queries in relational databases

Ripple Join --contd

JOIN: L.A = R.A

(L2,R2) {(2,2,4),(2,1,4)}

(L2,R1) {2,2,4), (1,3,5)}

(L1,R2) {(1,1,5), (2,1,4)}

L

R

Page 8: Supporting top-k join queries in relational databases

3. Variation Of Ripple Join

Rectangle

Block

Hash

Page 9: Supporting top-k join queries in relational databases

4. Proposed AlgorithmSelect * From L, R

Where L.A = R.A

Order By L.B + R.B

Stop After 2

(1). Get a valid combination using any certain algorithm

Ripple Select (L1, R1) => No Result

Page 10: Supporting top-k join queries in relational databases

4. Proposed AlgorithmSelect * From L, R

Where L.A = R.A

Order By L.B + R.B

Stop After 2

(1). Get a valid combination using any certain algorithm

Select (L2, R2)

(L2, R2), (L2, R1), (L1, R2) => (L1, R2)

Page 11: Supporting top-k join queries in relational databases

4. Proposed Algorithm -- ContdSelect * From L, R

Where L.A = R.A

Order By L.B + R.B

Stop After 2

(2). Compute the score (J) for the result

J1(L1, R2) => L.B + R.B = 5 + 4 = 9

Page 12: Supporting top-k join queries in relational databases

4. Proposed AlgorithmSelect * From L, R

Where L.A = R.A

Order By L.B + R.B

Stop After 2

(3). Compute a water mark score (T) by

Max {(Last L).B + (First R.B), (First L).B + (Last R).B}

Page 13: Supporting top-k join queries in relational databases

4. Proposed AlgorithmSelect * From L, R

Where L.A = R.A

Order By L.B + R.B

Stop After 2

(3). Compute a water mark (T) score by

Max {(Last L).B + (First R.B), (First L).B + (Last R).B}

Selection (L1, R1) , (L2, R2) =>

T = Max (L2.B + R1.B, L1.B + R2.B)

=Max (4+5, 5+4) = 9

Page 14: Supporting top-k join queries in relational databases

4. Proposed AlgorithmSelect * From L, R

Where L.A = R.A

Order By L.B + R.B

Stop After 2

J1= 9

T = 9

J1 >= T

Report J1

Since we need top 2 (k=2), continue until k=2 and Min(J1, J2, …Jk) > T

Page 15: Supporting top-k join queries in relational databases

4. Proposed AlgorithmSelect * From L, R

Where L.A = R.A

Order By L.B + R.B

Stop After 2

Select (L3, R3)

(L3, R3), (L3, R1), (L3, R2), (L1, R3), (L2, R3)

=> (L3, R3), (L2, R3)

J2(L2, R3) = 4 + 3 = 7

J3(L3, R3) = 3 + 3= 6

Page 16: Supporting top-k join queries in relational databases

4. Proposed AlgorithmSelect * From L, R

Where L.A = R.A

Order By L.B + R.B

Stop After 2

Calculate T

T = Max { (Last L).B + (First R).B, (First L).B+ (Last R).B}

= Max {L3.B + R1.B , L1.B + R3.B}

= Max(3 + 5, 5 + 3)

= 8

Page 17: Supporting top-k join queries in relational databases

4. Proposed AlgorithmSelect * From L, R

Where L.A = R.A

Order By L.B + R.B

Stop After 2

T = 8

J1(L1,R2) = 9 reported

J2( L2, R3) = 7

J3(L3, R3) = 6

Note, J’s are in descending order

Min (J) = 6 < T Continue

Comment: Calculate T before J is more efficient. Can stop after find first Jk >= T

Page 18: Supporting top-k join queries in relational databases

4. Proposed AlgorithmSelect * From L, R

Where L.A = R.A

Order By L.B + R.B

Stop After 2

Select (L4, R4) => (L4, R1), (L2, R4), (L3, R4)

J(L4, R4) = 7, J(L2, R4) = 6, J(L3, R4) = 5

T= Max(L4.B+R1.B, L1.B + R4.B) = Max(7, 7) = 7

Page 19: Supporting top-k join queries in relational databases

4. Proposed AlgorithmSelect * From L, R

Where L.A = R.A

Order By L.B + R.B

Stop After 2

T= 7J1(L1,R2) = 9, J2(L2, R3) = 7, J3(L4, R1) = 7, J3(L3, R3) = 6, J4(L2, R4) = 6, J5(L3, R4) = 5Min(J1, J2) = 7 >= T (k = 2)Comment: When reach all records, T does not need to be calculated, unless, calculate T first, and compare each J(i) with T immediately

Page 20: Supporting top-k join queries in relational databases

4. The Algorithm

Page 21: Supporting top-k join queries in relational databases

5. Implementation

• Hash rank join operator (HRJN)

- Use Hash Ripple Join

- Two hash table contain the two inputs

- A queue holds ordered join results

- Ltop, Rtop, Lbottom, Rbottom are used

to calculate T

Page 22: Supporting top-k join queries in relational databases

5. Implementation contd

• Issues with HRJN

Buffer problem

Local Ranking Problem

Page 23: Supporting top-k join queries in relational databases

5. Implementation contd

• Use Block Ripple Join to Solve Local Ranking Problem. (block size = 2)

Page 24: Supporting top-k join queries in relational databases

5. Implementation contd• HRJN* score-guided join strategy

- How to select next (block) tuple

T1 = Ltop + Rbottom , T2 = Lbottom + Rtop

T = Max(T1, T2)

If T1 > T2, need to reduce T1. How?

Page 25: Supporting top-k join queries in relational databases

5. Implementation contd• HRJN* score-guided join strategy

- How to select next (block) tuple

T1 = Ltop + Rbottom , T2 = Lbottom + Rtop

T = Max(T1, T2)

If T1 > T2, need to reduce T1. How?

Reduce Rbottom and not reduce Lbottom

(descending ordered), thus more tuples should be retrieved from R to reduce T1

Page 26: Supporting top-k join queries in relational databases

6. Join Order

• When more than two tables join, the join order matters. (A and C have high similarity)

Page 27: Supporting top-k join queries in relational databases

6. Join Order -- contd

• Rank-Join order heuristic

- Get a ranked sample, top S ranked list from L and R

- Calculate the similarity using footrule

Where L(i) and R(i) are the rank of object i in L and R

Page 28: Supporting top-k join queries in relational databases

6. Join Order -- contd

Page 29: Supporting top-k join queries in relational databases

7. Generalizing the rank-join

• Using indexes

• Eliminate duplications

• Faster termination

Page 30: Supporting top-k join queries in relational databases

8. Performance Evaluation

Selectivity = 0.2 and m= 4

Page 31: Supporting top-k join queries in relational databases

8. Performance Evaluation

Selectivity = 0.2 and m= 4

Page 32: Supporting top-k join queries in relational databases

8. Performance Evaluation

Selectivity = 0.2 and m= 4

Page 33: Supporting top-k join queries in relational databases

m = 4 and K=50

Page 34: Supporting top-k join queries in relational databases

m = 4 and K=50

Page 35: Supporting top-k join queries in relational databases

m = 4 and K=50