Supporting Top-k join Queries in Relational Databases
By: Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid
Presented by: Calvin R Noronha (1000578539), Deepak Anand (1000603813)
AGENDA
• Introduction
• Motivation
• Requirements
• Overview of Ripple Join
• Rank Join Algorithm
• Example
• HRJN
• HRJN*
• Performance
• Conclusion
• References
Introduction
What are top-k queries?
Ranking queries that order results by some computed score
What are top-k join queries?
Typically, these ranking queries involve joins
Usually, users are interested only in the top-k join results
Introduction ..
Searches are performed using multiple features
Every feature produces a different ranking for the query
The individual feature rankings are joined and aggregated into a global ranking
The answer to a top-k join query is an ordered set of join results, ranked by a provided function that combines the orders on each input
Example
Find a location for a house such that the combined cost of the house and 5 years of tuition at a nearby school is minimal.
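To make the example concrete, the sketch below answers it the naive way: compute the full join first and rank afterwards, which is the "join, then sort" plan the Motivation slide criticizes. The tables, their values, and the function name top_k_house_school are hypothetical, and "nearby" is simplified to "same location" purely for illustration.

```python
import heapq
from typing import List, Tuple

# Hypothetical data: (location, house price) and (location, annual tuition).
houses: List[Tuple[str, int]] = [("north", 250_000), ("south", 180_000), ("east", 210_000)]
schools: List[Tuple[str, int]] = [("north", 9_000), ("south", 30_000),
                                  ("east", 13_000), ("north", 4_000)]

def top_k_house_school(k: int, years: int = 5) -> List[Tuple[int, str, int, int]]:
    """Naive top-k join: join on location, score every pair, keep the k cheapest.

    'Nearby' is modeled as 'same location' (an assumption for this sketch).
    """
    joined = [
        (price + years * tuition, loc, price, tuition)
        for loc, price in houses
        for school_loc, tuition in schools
        if loc == school_loc  # join condition
    ]
    # Every join result is materialized and scored before any ranking happens.
    return heapq.nsmallest(k, joined)

print(top_k_house_school(2))
# [(270000, 'north', 250000, 4000), (275000, 'east', 210000, 13000)]
```

Even though only two answers are wanted, all four join results are built and scored first; avoiding that is the point of the rank-join operators.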
Motivation
Existing join operators decouple join and sorting (ranking) of results.
Sorting is expensive and is a blocking operation.
Sort-merge joins (MGJN) preserve only the order of the join column
Nested-loop joins (NLJN) preserve only the order of the outer relation
Hash joins (HSJN) do not preserve order if the hash tables do not fit in memory
Example Ranking Query
SELECT A.1, B.2
FROM A, B, C
WHERE A.1 = B.1 and B.2 = C.2
ORDER BY (0.3*A.1 + 0.7*B.2)
STOP AFTER 5
Requirements
• Perform the basic join operation
• Conform to the current query operator interface
• Use the orderings of its inputs
• Produce the top-ranked join results as early as possible
• Adapt quickly to input fluctuations
Overview of Ripple Join
JOIN: L.A = R.A; L and R are sorted in descending order of B, and each tuple is written as (ID, A, B)
Get one tuple from L and one tuple from R: L1(1,1,5) and R1(1,3,5)
No valid join result, since L1.A = 1 ≠ R1.A = 3
Ripple Join ..
Get a second tuple from L and a second tuple from R, and join each new tuple with the tuples already read from the other input, creating all new combinations
New tuples: L2(2,2,4) and R2(2,1,4)
Ripple Join ..
(L2, R2) = {(2,2,4), (2,1,4)} is an invalid join result (L2.A = 2 ≠ R2.A = 1)
(L2, R1) = {(2,2,4), (1,3,5)} is an invalid join result (L2.A = 2 ≠ R1.A = 3)
(L1, R2) = {(1,1,5), (2,1,4)} is a valid join result (L1.A = R2.A = 1)
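The square ripple join walked through above can be sketched in Python as follows. Tuples are modeled as (ID, A, B). L1, L2, R1 and R2 come from the slides; L3, L4, R3 and R4 are not listed in this transcript, so their values here are inferred from the scores and valid pairs of the rank-join example and should be treated as hypothetical. The function name ripple_join is also an illustrative choice.

```python
from typing import Iterator, List, Tuple

Row = Tuple[int, int, int]  # (ID, A, B); the join attribute A is at index 1

# L1, L2, R1, R2 are from the slides; L3, L4, R3, R4 are inferred (hypothetical).
L: List[Row] = [(1, 1, 5), (2, 2, 4), (3, 2, 3), (4, 3, 2)]
R: List[Row] = [(1, 3, 5), (2, 1, 4), (3, 2, 3), (4, 2, 2)]

def ripple_join(left: List[Row], right: List[Row]) -> Iterator[Tuple[int, Row, Row]]:
    """Square ripple join on L.A = R.A, yielding (step, l, r) for each valid pair.

    At each step one new tuple is read from each input, and each new tuple is
    combined with every tuple read so far from the other input.
    """
    seen_l: List[Row] = []
    seen_r: List[Row] = []
    for step, (l, r) in enumerate(zip(left, right), start=1):
        seen_l.append(l)
        seen_r.append(r)
        # New L tuple against all R tuples seen so far (including the new one).
        for r_old in seen_r:
            if l[1] == r_old[1]:
                yield step, l, r_old
        # New R tuple against the earlier L tuples (the new L was handled above).
        for l_old in seen_l[:-1]:
            if l_old[1] == r[1]:
                yield step, l_old, r

for step, l, r in ripple_join(L, R):
    print(step, l, r)  # the first line is step 2: L1 (1, 1, 5) with R2 (2, 1, 4)
```

As on the slides, step 1 yields nothing and step 2 finds only (L1, R2).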
Variations of the Ripple Join
• Rectangular version – obtain tuples from one source at a higher rate than from the other source
• Block Ripple Join – obtain data b tuples at a time; classic ripple join has b = 1
• Hash Ripple Join – maintain in-memory hash tables of the tuples obtained so far
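The three variations combine naturally in one small sketch: hash tables keyed on the join attribute (hash ripple join), tuples read in chunks (block ripple join when both chunk sizes equal b), and possibly unequal chunk sizes per input (the rectangular version). The parameter names rate_l and rate_r, the function name, and the inferred values of L3, L4, R3 and R4 are assumptions of this sketch, not details from the paper.

```python
from collections import defaultdict
from typing import DefaultDict, Iterator, List, Tuple

Row = Tuple[int, int, int]  # (ID, A, B); the join attribute A is at index 1

# L1, L2, R1, R2 are from the slides; L3, L4, R3, R4 are inferred (hypothetical).
L: List[Row] = [(1, 1, 5), (2, 2, 4), (3, 2, 3), (4, 3, 2)]
R: List[Row] = [(1, 3, 5), (2, 1, 4), (3, 2, 3), (4, 2, 2)]

def hash_ripple_join(left: List[Row], right: List[Row],
                     rate_l: int = 1, rate_r: int = 1) -> Iterator[Tuple[Row, Row]]:
    """Yield every (l, r) with l.A == r.A, reading rate_l L tuples and rate_r R tuples per step."""
    if rate_l < 1 or rate_r < 1:
        raise ValueError("rates must be positive")
    table_l: DefaultDict[int, List[Row]] = defaultdict(list)  # A -> L tuples seen so far
    table_r: DefaultDict[int, List[Row]] = defaultdict(list)  # A -> R tuples seen so far
    i = j = 0
    while i < len(left) or j < len(right):
        for l in left[i:i + rate_l]:
            for r in table_r.get(l[1], []):  # probe the R tuples already read
                yield l, r
            table_l[l[1]].append(l)          # remember l for R tuples read later
        i += rate_l
        for r in right[j:j + rate_r]:
            for l in table_l.get(r[1], []):  # probe the L tuples already read
                yield l, r
            table_r[r[1]].append(r)
        j += rate_r

print(list(hash_ripple_join(L, R, rate_l=1, rate_r=2)))
```

Each pair is produced exactly once, by whichever of its two tuples arrives later; the rates change only how early each result appears, not which results appear.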
Rank-Join Algorithm
1. Generate new valid join combinations
2. Compute the score of each valid combination
3. For each new input tuple, compute the threshold value T
4. Store the top-k results in priority queue Lk
5. Halt when the lowest score in the queue, score_k, is ≥ T
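A minimal Python sketch of the five steps above, assuming the setting of the example that follows: two inputs of equal length sorted by B in descending order, joined on A, scored by L.B + R.B, and read with a square ripple join. The function name rank_join, the (ID, A, B) tuple layout, and the values of L3, L4, R3 and R4 (inferred from the example's scores) are assumptions. For simplicity the queue keeps every unreported candidate rather than only k entries.

```python
import heapq
from typing import List, Tuple

Row = Tuple[int, int, int]  # (ID, A, B): join on A, score = L.B + R.B

# L1, L2, R1, R2 are from the slides; L3, L4, R3, R4 are inferred (hypothetical).
L: List[Row] = [(1, 1, 5), (2, 2, 4), (3, 2, 3), (4, 3, 2)]
R: List[Row] = [(1, 3, 5), (2, 1, 4), (3, 2, 3), (4, 2, 2)]

def rank_join(left: List[Row], right: List[Row], k: int) -> List[Tuple[int, Row, Row]]:
    """Rank-join sketch over inputs sorted by B descending, using square ripple join."""
    if len(left) != len(right):
        raise ValueError("this sketch assumes equal-length inputs")
    if k <= 0:
        return []
    seen_l: List[Row] = []
    seen_r: List[Row] = []
    queue: List[Tuple[int, int, Row, Row]] = []  # (-score, tiebreak, l, r): max-heap on score
    results: List[Tuple[int, Row, Row]] = []
    tiebreak = 0
    for l, r in zip(left, right):
        seen_l.append(l)
        seen_r.append(r)
        # Step 1: new valid combinations (new l with all seen R, new r with earlier L).
        pairs = [(l, x) for x in seen_r if x[1] == l[1]]
        pairs += [(x, r) for x in seen_l[:-1] if x[1] == r[1]]
        # Step 2: score each combination and buffer it in the priority queue.
        for pl, pr in pairs:
            heapq.heappush(queue, (-(pl[2] + pr[2]), tiebreak, pl, pr))
            tiebreak += 1
        # Step 3: T = Max(Last L.B + First R.B, First L.B + Last R.B).
        t = max(l[2] + seen_r[0][2], seen_l[0][2] + r[2])
        # Steps 4-5: a buffered result can be reported once its score is >= T.
        while queue and -queue[0][0] >= t:
            neg_score, _, pl, pr = heapq.heappop(queue)
            results.append((-neg_score, pl, pr))
            if len(results) == k:
                return results
    # Inputs exhausted: no unread tuples remain, so the queue order is final.
    while queue and len(results) < k:
        neg_score, _, pl, pr = heapq.heappop(queue)
        results.append((-neg_score, pl, pr))
    return results

print(rank_join(L, R, 2))  # [(9, (1, 1, 5), (2, 1, 4)), (7, (2, 2, 4), (3, 2, 3))]
```

With k = 2 it reports (L1, R2) after the second pull and (L2, R3) after the fourth, mirroring the walkthrough below.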
A Rank-Join Algorithm Example
Select * From L, R
Where L.A = R.A      (join condition)
Order By L.B + R.B   (score)
Stop After 2         (k)
Initial input:
(1). Get a valid join combination using some join strategy
Ripple Select (L1, R1) => not a valid join result
Next input:
(1). Get a valid join combination using some join strategy: Ripple Select (L2, R2)
(L2, R2), (L2, R1), (L1, R2) => (L1, R2) is a valid join result
Example Continued ..
(2). Compute the score (J) for the result
J1(L1, R2) => L.B + R.B = 5 + 4 = 9
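(3). Compute a threshold score T = Max(Last L.B + First R.B, First L.B + Last R.B), where First is the first (highest-B) tuple read from an input and Last is the most recently read tuple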
For Ripple Selection (L2, R2) => T = Max ( L2.B + R1.B, L1.B + R2.B ) = Max (4+5, 5+4) = 9
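The threshold is an upper bound on the score of any join result not yet produced. Because both inputs arrive in descending order of B, a result that uses an unread L tuple scores at most Last L.B + First R.B, and a result that uses an unread R tuple scores at most First L.B + Last R.B. A tiny hypothetical helper, named threshold here for illustration, reproduces the values used in this walkthrough:

```python
def threshold(first_l_b: int, last_l_b: int, first_r_b: int, last_r_b: int) -> int:
    """T = Max(Last L.B + First R.B, First L.B + Last R.B) for the score L.B + R.B."""
    return max(last_l_b + first_r_b,   # bound for results that use an unread L tuple
               first_l_b + last_r_b)   # bound for results that use an unread R tuple

# First L.B = First R.B = 5 in the example; Last L.B and Last R.B fall as tuples are read.
for last_b in (4, 3, 2):  # after Ripple Select (L2, R2), (L3, R3), (L4, R4)
    print(threshold(5, last_b, 5, last_b))  # prints 9, then 8, then 7
```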
Example Continued ..
J1 = 9
T = 9
(4). J1 >= T, so report J1 as a top-k result (i.e., add it to list Lk)
Since we need the top 2 (k = 2), continue until Lk holds k results and Min(J1, …, Jk) >= T
Example Continued ..
Next input:
(1). Get a valid join combination using some join strategy: Ripple Select (L3, R3)
(L3, R3), (L3, R1), (L3, R2), (L1, R3), (L2, R3)
=> (L3, R3), (L2, R3) are valid join results
(2). Compute the scores (J) for the results
J2(L2, R3) = 4 + 3 = 7
J3(L3, R3) = 3 + 3 = 6
Example Continued ..
(3). Calculate a NEW threshold T
T = Max ( Last L.B + First R.B, First L.B + Last R.B )
= Max ( L3.B + R1.B , L1.B + R3.B )
= Max(3 + 5, 5 + 3)
= 8
Example Continued ..
T = 8
J1(L1,R2) = 9 reported
J2( L2, R3) = 7
J3(L3, R3) = 6
Note: the J values are listed in descending order
(4). The current top 2 are J1 = 9 and J2 = 7; Min(J1, J2) = 7 < T = 8, so continue
Example Continued ..
Next input:
(1). Get a valid join combination using some join strategy: Ripple Select (L4, R4) => (L4, R1), (L2, R4), (L3, R4) are valid join results
(2). Compute the scores (J) for the results
J(L4, R1) = 7, J(L2, R4) = 6, J(L3, R4) = 5
(3). Calculate a NEW threshold T
T = Max( L4.B+R1.B, L1.B + R4.B ) = Max( 7, 7 ) = 7
Example Continued ..
T = 7
J1(L1, R2) = 9, J2(L2, R3) = 7, J3(L4, R1) = 7, J4(L3, R3) = 6, J5(L2, R4) = 6, J6(L3, R4) = 5
(4). Min(J1, J2) = 7 >= T (k = 2), so report J2 and STOP
Rank-Join Continued …
The choice of join strategy is crucial
Recommended strategy: Ripple Join
• Alternates between tuples of the two relations
• Is flexible in the way it sweeps out the space of combinations (rectangular, etc.)
• Retains the input ordering when considering samples
Variants of Rank-Join
• Hash Rank Join (HRJN)
• Block Ripple Join
Hash Rank Join (HRJN) Operator
• Built on the idea of hash ripple join
• Inputs are stored in two hash tables
• Maintains the highest (first) and lowest (last selected) objects from each relation
• Results are added to a priority queue
Advantages:
• Smaller space requirement
• Can be pipelined
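A minimal sketch of an HRJN-style operator written as a Python generator, under the same assumptions as the running example: (ID, A, B) tuples, join on A, score L.B + R.B, inputs sorted by B in descending order, and L3, L4, R3, R4 inferred rather than taken from the slides. The generator stands in for a pipelined operator: the consumer pulls results one at a time and can stop after k (here with itertools.islice). Alternating reads, using an infinite bound until both inputs have produced a tuple, and the handling of exhausted inputs are choices made for this sketch, not details from the slides.

```python
import heapq
import math
from collections import defaultdict
from itertools import islice
from typing import Iterable, Iterator, List, Optional, Tuple

Row = Tuple[int, int, int]  # (ID, A, B): join on A, score = L.B + R.B

# L1, L2, R1, R2 are from the slides; L3, L4, R3, R4 are inferred (hypothetical).
L: List[Row] = [(1, 1, 5), (2, 2, 4), (3, 2, 3), (4, 3, 2)]
R: List[Row] = [(1, 3, 5), (2, 1, 4), (3, 2, 3), (4, 2, 2)]

def hrjn(left: Iterable[Row], right: Iterable[Row]) -> Iterator[Tuple[int, Row, Row]]:
    """Pipelined hash rank-join sketch: yields join results in descending score order."""
    inputs = (iter(left), iter(right))
    tables = (defaultdict(list), defaultdict(list))  # per input: A -> tuples read so far
    first: List[Optional[Row]] = [None, None]        # highest-B tuple read from each input
    last: List[Optional[Row]] = [None, None]         # most recently read (lowest-B) tuple
    done = [False, False]
    queue: List[Tuple[int, int, Row, Row]] = []      # (-score, tiebreak, l, r)
    tiebreak, side = 0, 0
    while not all(done):
        if done[side]:
            side = 1 - side  # only the other input still has tuples
        row = next(inputs[side], None)
        if row is None:
            done[side] = True
        else:
            if first[side] is None:
                first[side] = row
            last[side] = row
            for other in tables[1 - side].get(row[1], []):  # probe the other hash table
                l, r = (row, other) if side == 0 else (other, row)
                heapq.heappush(queue, (-(l[2] + r[2]), tiebreak, l, r))
                tiebreak += 1
            tables[side][row[1]].append(row)
        # Threshold: best score of any result that still needs an unread tuple.
        bounds = []
        for s in (0, 1):
            if done[s]:
                continue  # input s has no unread tuples left
            if last[s] is None or first[1 - s] is None:
                bounds.append(math.inf)  # one side not read yet: no bound
            else:
                bounds.append(last[s][2] + first[1 - s][2])
        t = max(bounds, default=-math.inf)
        while queue and -queue[0][0] >= t:
            neg_score, _, l, r = heapq.heappop(queue)
            yield -neg_score, l, r
        side = 1 - side  # alternate inputs, as in square ripple join

# "Stop After 2": pull only two results; the operator reads no further input.
print(list(islice(hrjn(L, R), 2)))
```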
HRJN*
• Score-Guided Strategy
Consider L = 100, 50, 25, 10, … and R = 10, 9, 8, 5, …
With 3 tuples read from each input: T1 = First L + Last R = 100 + 8 = 108, T2 = Last L + First R = 25 + 10 = 35, so T = max(108, 35) = 108
With 2 tuples read from L and 4 from R: T1 = 100 + 5 = 105, T2 = 50 + 10 = 60, so T = max(105, 60) = 105
• If T1 > T2, more tuples should be read from R to reduce T1
• The threshold T therefore drops faster, so ranked join results are reported sooner
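The slide's arithmetic and the score-guided rule can be checked with two hypothetical helpers. The names thresholds and next_input are illustrative, and sending ties to L is an arbitrary choice of this sketch. T1 = First L + Last R and T2 = Last L + First R, as in the numbers above.

```python
from typing import List, Tuple

def thresholds(l_scores: List[int], r_scores: List[int], n_l: int, n_r: int) -> Tuple[int, int]:
    """(T1, T2) after reading n_l tuples of L and n_r tuples of R, both sorted descending."""
    t1 = l_scores[0] + r_scores[n_r - 1]  # First L + Last R: bound for unread R tuples
    t2 = l_scores[n_l - 1] + r_scores[0]  # Last L + First R: bound for unread L tuples
    return t1, t2

def next_input(t1: int, t2: int) -> str:
    """Score-guided choice: read from the input whose 'Last' value drives the larger term."""
    return "R" if t1 > t2 else "L"  # ties go to L (arbitrary)

L_SCORES = [100, 50, 25, 10]
R_SCORES = [10, 9, 8, 5]
for n_l, n_r in ((3, 3), (2, 4)):
    t1, t2 = thresholds(L_SCORES, R_SCORES, n_l, n_r)
    print(n_l, n_r, t1, t2, max(t1, t2), next_input(t1, t2))
# 3 3 108 35 108 R
# 2 4 105 60 105 R
```

Both configurations read six tuples in total, but skewing the reads toward R lowers T from 108 to 105.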
Performance
Conclusion
Integrates well into query plans and supports top-k queries
Produces results as early as possible
Minimizes space requirements
Eliminates the need for sorting after the join
References
• “Supporting top-k join queries in relational databases”, Ihab Ilyas, Walid Aref, Ahmed Elmagarmid (2004)
• Jing Chen: CSE6392 Spring 2005, CSE-UT Arlington, http://ranger.uta.edu/~gdas/Courses/Spring2005/DBIR/slides/top-k_join.ppt
• Zubin Joseph: CSE6392 Spring 2006, CSE-UT Arlington, http://crystal.uta.edu/~gdas/Courses/Courses/Spring2006/DBExploration/Zubin_Supporting_Top_k_join_Queries.ppt
TIME TO ASK QUESTIONS