Reverse Top-k Queries
Akrivi Vlachou*, Christos Doulkeridis*, Yannis Kotidis#, Kjetil Nørvåg*
*Norwegian University of Science and Technology (NTNU), Trondheim, Norway
#Athens University of Economics and Business (AUEB), Greece
Outline
Motivation & Preliminaries
Monochromatic Reverse Top-k Queries
Bichromatic Reverse Top-k Queries
  Threshold-based Algorithm
  Materialized Views
Experimental Evaluation
Conclusions & Future Work
Rank-aware Query Processing
Huge amount of available data
Users prefer to retrieve a limited set of k ranked data objects that best match their preferences (top-k queries)
Top-k Query
Given a scoring function f(), retrieve the k objects that best match the user preferences
Linear scoring function
  f_w(p) = Σ_i w[i]*p[i]
  Weight w[i]: relative importance of attribute i
Definition TOPk(w): Given a weighting vector w and a positive integer k, find the k data points p with the minimum f_w(p) scores
Query line of w at point p: defines the score of p
Query space of w defined by point p: number of enclosed points determines the rank of p
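The definition of TOPk(w) can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names `score` and `top_k` and the sample points are assumptions, and smaller scores are treated as better, as in the definition above.

```python
import heapq

def score(w, p):
    """Linear score f_w(p) = sum_i w[i] * p[i]; smaller is better."""
    return sum(wi * pi for wi, pi in zip(w, p))

def top_k(points, w, k):
    """Return the k points with the minimum score under w, best first."""
    return heapq.nsmallest(k, points, key=lambda p: score(w, p))

# Hypothetical 2-d data points (for instance price and age); not from the slides.
points = [(1, 6), (6, 1), (3, 3)]
print(top_k(points, (0.8, 0.2), 2))
print(top_k(points, (0.2, 0.8), 2))
```

Changing the weighting vector changes which points are returned, which is exactly what the reverse query later exploits.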
Reversing the Top-k Query
From the perspective of manufacturers: it is important that a product is returned in the highest ranked positions for as many user preferences as possible
  estimate the impact of a product compared to competitors' products
  advertise a product to potential customers
[Figure: a sales representative and four customers. Which customers would be interested?]
Reversing the Top-k Query
Reverse top-k query: Given a potential product q and a positive integer k, which are the weighting vectors w for which q is in the top-k query result set?
Two different versions
  Monochromatic: no knowledge of user preferences
  Bichromatic: a dataset with user preferences is given
[Figure: a sales representative and four customers]
Car Database Example
A database containing information about different cars
Different users have different preferences
Bob prefers a cheap car and does not care much about the age
  the best choice (top-1) for Bob is the car p1, with score 2.5
Tom prefers a newer car rather than a cheap car
  the best choice for Tom and Max is the car p2
Car Database Example
Query point q=p2, k=1:
  Bichromatic reverse top-k: {(0.2,0.8), (0.5,0.5)}
    advertise the product to Tom and Max
  Monochromatic reverse top-k: line segment w[price] in [1/7, 5/6]
    estimate the impact of p2 as 69% (segment length 5/6 - 1/7 = 29/42 ≈ 0.69 of the weight range [0,1])
Query point q=p3, k=1:
  empty result set for the bichromatic query
Monochromatic Reverse Top-k Query
mRTOPk(q): Given a point q, a positive number k and a dataset S, the result set of the monochromatic reverse top-k query is the locus of weighting vectors wi for which there exists p in TOPk(wi) such that f_wi(q) ≤ f_wi(p).
The solution space W can be split into a finite set of non-adjacent partitions such that query point q has the same rank for all the weighting vectors in a partition.
For the monochromatic case, we focus on the 2-d space.
[Figure: solution space split into partitions labeled with the rank of q (1 or 2); the rank-1 partition is marked mRTOP1(q)]
Geometric Interpretation (d=2, k=1)
If q belongs to the convex hull, then there exists exactly one partition in mRTOP1(q)
Weighting vectors that are perpendicular to pq and qr define the line segment
For weighting vectors with smaller and larger slopes than w1, the relative order of p and q changes
Monochromatic reverse top-k, k>1: the solution space may contain more than 1 partition
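The 2-d case can be illustrated with a small sweep over the weight of the first attribute. The sketch below is not the geometric algorithm from the talk. It assumes weighting vectors of the form (w1, 1 - w1), treats smaller scores as better, and uses the hypothetical names `score` and `mrtopk_2d`. It relies on the observation that the relative order of q and any point p can change only at the single w1 where their scores tie, so the rank of q is constant between consecutive tie values.

```python
from fractions import Fraction

def score(w1, p):
    """Score of the 2-d point p under the weighting vector (w1, 1 - w1)."""
    return w1 * p[0] + (1 - w1) * p[1]

def mrtopk_2d(points, q, k):
    """Closed intervals of w1 in [0, 1] for which fewer than k points beat q."""
    cuts = {Fraction(0), Fraction(1)}
    for p in points:
        dx, dy = p[0] - q[0], p[1] - q[1]
        # f_w(p) - f_w(q) = dy + w1 * (dx - dy), which is zero at w1 = dy / (dy - dx).
        if dx != dy:
            t = Fraction(dy) / Fraction(dy - dx)
            if 0 < t < 1:
                cuts.add(t)
    cuts = sorted(cuts)
    segments = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2  # the rank of q is constant strictly inside (lo, hi)
        better = sum(1 for p in points if score(mid, p) < score(mid, q))
        if better < k:
            if segments and segments[-1][1] == lo:
                segments[-1] = (segments[-1][0], hi)  # merge touching segments
            else:
                segments.append((lo, hi))
    return segments

# Hypothetical data: q = (2, 2) lies on the convex hull between (1, 6) and (6, 1).
print(mrtopk_2d([(1, 6), (6, 1)], (2, 2), 1))
```

For this hypothetical input and k=1 the result is a single segment, from w1 = 1/5 to w1 = 4/5, in line with the convex hull observation above. With a larger k and more points the returned list may contain several segments.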
Bichromatic Reverse Top-k Query
bRTOPk(q): Given a point q, a positive number k and two datasets S and W, where S represents data points and W is a dataset containing different weighting vectors, a weighting vector wi belongs to the result set if and only if there exists p in TOPk(wi) such that f_wi(q) ≤ f_wi(p).
Naïve approach: for each weighting vector
  process the top-k query
  test if query point q is in the top-k list
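The naïve approach can be written down directly. This is a minimal in-memory sketch with hypothetical names (`in_top_k`, `naive_brtopk`) and hypothetical data; it assumes smaller scores are better and that q qualifies when its score does not exceed the k-th best score in S.

```python
import heapq

def score(w, p):
    """Linear score f_w(p) = sum_i w[i] * p[i]; smaller is better."""
    return sum(wi * pi for wi, pi in zip(w, p))

def in_top_k(points, w, q, k):
    """Process the top-k query for w, then test q against the k-th best score."""
    best = heapq.nsmallest(k, (score(w, p) for p in points))
    return len(best) < k or score(w, q) <= best[-1]

def naive_brtopk(points, weights, q, k):
    """One top-k evaluation per weighting vector in W."""
    return [w for w in weights if in_top_k(points, w, q, k)]

# Hypothetical data points S, weighting vectors W and query point q.
S = [(1, 6), (6, 1)]
W = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1)]
print(naive_brtopk(S, W, (2, 2), 1))
```

The cost is always |W| top-k evaluations, regardless of how many weighting vectors end up in the result.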
Threshold-based Algorithm (RTA)
Goal: reduce the number of top-k evaluations by discarding weighting vectors
Threshold-based Algorithm (RTA):
  sort the weighting vectors based on pairwise similarity
    top-k queries defined by similar vectors have similar result sets
  evaluate the first top-k query, calculate a threshold
  for each weighting vector
    possibly prune based on threshold
    refine threshold
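The steps above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the name `rta` is hypothetical, a plain lexicographic sort stands in for the similarity-based ordering of the weighting vectors, and S is assumed to hold at least k points. The threshold is taken as the largest score, under the current weighting vector, of the k points buffered from the last evaluated top-k query. If q scores worse than all of them, k points already beat q and the vector can be discarded without a top-k evaluation.

```python
import heapq

def score(w, p):
    """Linear score f_w(p) = sum_i w[i] * p[i]; smaller is better."""
    return sum(wi * pi for wi, pi in zip(w, p))

def rta(points, weights, q, k):
    """Return (reverse top-k result, number of top-k evaluations performed)."""
    result, buffer, evaluations = [], [], 0
    # Assumption: lexicographic order as a stand-in for similarity ordering.
    for w in sorted(weights):
        if len(buffer) == k:
            threshold = max(score(w, p) for p in buffer)
            if score(w, q) > threshold:
                continue  # k buffered points beat q under w: discard w
        # Could not prune: evaluate the top-k query and refresh the buffer.
        buffer = heapq.nsmallest(k, points, key=lambda p: score(w, p))
        evaluations += 1
        if len(buffer) < k or score(w, q) <= score(w, buffer[-1]):
            result.append(w)
    return result, evaluations

# Hypothetical data points S, weighting vectors W and query point q.
S = [(1, 6), (6, 1)]
W = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1)]
print(rta(S, W, (2, 2), 1))
```

On this hypothetical input the last weighting vector is pruned by the threshold, so two top-k evaluations are performed instead of three. The pruning only ever discards vectors that cannot be in the result, so the output matches an exhaustive check of every vector.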
Example of RTA Algorithm (k=2)
Evaluate top-2 query for w1
Set threshold based on w2: the largest score under w2 among the buffered points
f_w2(q) > threshold: discard w2
Refine threshold for w3
W = [w1, w2, w3]
Buffer: p1, p2
[Figure: data points p1 to p10, query point q, and the weighting vectors w1, w2, w3]
Materialized Views
Threshold-based Algorithm (RTA)
  reduces the top-k evaluations by discarding some weighting vectors that are not in the reverse top-k result set
  processes at least as many top-k evaluations as the cardinality of the result set
Materialized Views
  find weighting vectors that definitely belong to the result without top-k evaluation
Materialized Views
Grid-based space partitioning
cell Ci
  lower left corner CiL
  upper right corner CiU
We store for each cell Ci the results of reverse top-k queries for corners CiL and CiU
[Figure: grid cell with a stored result set w1, w2, w3]
Materialized Views
Given a point q enclosed in cell Ci
  all weighting vectors in RTOPk(CiU) belong to the result set of q
  only weighting vectors in RTOPk(CiL) - RTOPk(CiU) have to be examined
Materialized views can be generalized for arbitrary k<K values
[Figure: grid cell with stored result sets w1, w2, w3 and w1, w2, w3, w4]
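A hedged sketch of how the two stored corner results might be used at query time. The names `rtopk_naive` and `rtopk_with_views` are hypothetical, the views are built here with the naïve method purely for illustration, and the sketch assumes non-negative weights with smaller scores being better. Under those assumptions q scores no worse than CiU and no better than CiL for every weighting vector, which is why RTOPk(CiU) can be accepted outright and nothing outside RTOPk(CiL) needs to be checked.

```python
import heapq

def score(w, p):
    """Linear score f_w(p) = sum_i w[i] * p[i]; smaller is better."""
    return sum(wi * pi for wi, pi in zip(w, p))

def in_top_k(points, w, q, k):
    """True if q's score does not exceed the k-th best score in points."""
    best = heapq.nsmallest(k, (score(w, p) for p in points))
    return len(best) < k or score(w, q) <= best[-1]

def rtopk_naive(points, weights, q, k):
    """Used here only to materialize the views for the two cell corners."""
    return [w for w in weights if in_top_k(points, w, q, k)]

def rtopk_with_views(points, q, k, view_lower, view_upper):
    """view_lower = RTOPk(CiL), view_upper = RTOPk(CiU) for the cell enclosing q."""
    result = list(view_upper)              # accepted without any top-k evaluation
    accepted = set(view_upper)
    evaluations = 0
    for w in view_lower:
        if w in accepted:
            continue
        evaluations += 1                   # only RTOPk(CiL) - RTOPk(CiU) is examined
        if in_top_k(points, w, q, k):
            result.append(w)
    return result, evaluations

# Hypothetical data: cell with corners (1, 1) and (3, 3) enclosing q = (2, 2).
S = [(1, 6), (6, 1)]
W = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1)]
lower = rtopk_naive(S, W, (1, 1), 1)
upper = rtopk_naive(S, W, (3, 3), 1)
print(rtopk_with_views(S, (2, 2), 1, lower, upper))
```

The benefit depends on the grid resolution: the closer the two corner results are to each other, the fewer weighting vectors remain to be examined.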
Experimental Setup
Comparison between Naïve and RTA (varying dimensionality, cardinality, data distribution; real data)
Queries: uniform and k-skyband points
Metrics:
  time
  I/Os
  number of top-k evaluations
RTA vs. Naïve
RTA outperforms naive by 1 to 2 orders of magnitude
as dimensionality increases, |RTOPk(q)| decreases, leading to fewer top-k evaluations
uniform distribution of S and uniform weights W
|S|=10K, |W|=10K, top-k=10, skyband query points
Scalability of RTA Algorithm
naive requires |W| top-k query evaluations
|W|=5K, correlated dataset:
  RTA needs only 544 out of 5000 top-k evaluations (saves 89.12% of the cost)
  the average size of the result set is 459
various distributions (UN, AC, CO) of S and uniform weights W
|S|=10K or |W|=10K, d=5, top-k=10, skyband query points
Performance of RTA on Real Data
uniform and clustered weights W (|W|=10K)
clustered weights lead to fewer top-k evaluations
NBA consists of 17265 tuples, d=5 (number of points scored, rebounds, assists, steals and blocks)
HOUSE consists of 127930 tuples, d=6 (income spent on gas, electricity, water, heating, insurance, and property tax)
Conclusions and Future Work
We introduced reverse top-k queries
  geometric interpretation of the solution space
  efficient algorithm for bichromatic reverse top-k query
  materialized reverse top-k views
Future Work
  interpretation of solution space for higher dimensions (monochromatic reverse top-k)
  improve the performance of the bichromatic reverse top-k computation
Thank you!
More information: http://www.idi.ntnu.no/~vlachou/
Related work:
Akrivi Vlachou, Christos Doulkeridis, Yannis Kotidis, Kjetil Nørvåg: "Reverse Top-k Queries"
Akrivi Vlachou, Christos Doulkeridis, Kjetil Nørvåg, Yannis Kotidis: "Identifying the Most Influential Data Objects with Reverse Top-k Queries"