Retrieving k-Nearest Neighboring Trajectories by a Set of Point Locations
Lu-An Tang, Yu Zheng, Xing Xie, Jing Yuan, Xiao Yu, Jiawei Han
University of Illinois at Urbana-Champaign; Microsoft Research Asia
Motivation: trajectory query by locations
Huge volume of spatial trajectories
Need to search trajectories by a set of point locations
Examples: geo-tagged photos, taxi trajectories, check-ins
k-Nearest Neighboring trajectory query
The trajectories may not pass exactly through those locations.
Query the top k trajectories with the minimum aggregated distance to the given locations.
k-NNT query
Task Definition: Given the trajectory dataset D and a set of query points Q, the k-NNT query retrieves a set K of k trajectories from D, K = {R1, R2, …, Rk}, such that for ∀ Ri ∈ K, ∀ Rj ∈ D − K, dist(Ri, Q) ≤ dist(Rj, Q).
Challenges
• Huge trajectory dataset: high I/O cost to scan all the trajectories
• Aggregated distance computation
• Non-uniform distribution: the trajectories are sparse or dense in different regions, and the user-given query locations may be far from all the trajectories
The aggregate distance in k-NNT query
1. Find the closest point on a trajectory to each query point (i.e., the shortest matching pairs)
2. Sum up the lengths of all matching pairs
• dist(R1, q1) = dist(p1,2, q1) = 20 m
• dist(R1, q2) = dist(p1,3, q2) = 50 m
• dist(R1, q3) = dist(p1,5, q3) = 15 m
• dist(R1, Q) = ∑ dist(R1, qi) = 85 m
• dist(R2, q1) = dist(p2,3, q1) = 30 m
• dist(R2, q2) = dist(p2,4, q2) = 5 m
• dist(R2, q3) = dist(p2,6, q3) = 40 m
• dist(R2, Q) = ∑ dist(R2, qi) = 75 m
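The aggregate distance and the naive k-NNT scan can be sketched in Python as follows. This is only an illustration: the coordinates are invented (the slide gives distances, not positions), the names `aggregate_distance` and `knnt_brute_force` are hypothetical, and the full scan is the costly baseline that the rest of the talk tries to avoid, not the proposed method.

```python
import heapq
import math

def aggregate_distance(trajectory, queries):
    """Sum, over the query points, of the distance to the closest trajectory point."""
    return sum(min(math.dist(p, q) for p in trajectory) for q in queries)

def knnt_brute_force(dataset, queries, k):
    """Score every trajectory and return the k smallest (distance, id) pairs."""
    scored = ((aggregate_distance(pts, queries), tid) for tid, pts in dataset.items())
    return heapq.nsmallest(k, scored)

# Hypothetical data: two short trajectories and three query points.
dataset = {
    "R1": [(0, 0), (3, 4), (10, 0)],
    "R2": [(0, 10), (5, 10), (10, 10)],
}
queries = [(0, 0), (3, 0), (10, 3)]

top1 = knnt_brute_force(dataset, queries, k=1)
print(top1)
```

Every trajectory point is touched once per query point, which is the I/O cost listed under the challenges.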
Related Work: k-BCT query
k-Best Connected Trajectory (k-BCT) query [SIGMOD2010]
The similarity function between a trajectory R and query locations Q is Sim(R, Q) = ∑ e^(−dist(R, qi))
Problem: the ranking given by this function changes with the distance unit (inconsistent)
An example:
If query Q has two points q1 and q2;
dist(R1, q1) = dist(R1, q2) = 2.4 km = 1.48 miles,
dist(R2, q1) = 1.5 km = 0.93 miles, dist(R2, q2) = 5 km = 3.1 miles
Use unit “mile”, Sim(R1, Q) = 0.45 > Sim(R2, Q) = 0.43
Use unit “km”, Sim(R1, Q) = 0.18 < Sim(R2, Q) = 0.22
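The unit dependence can be checked numerically. The sketch below assumes that the k-BCT similarity is the sum of e^(−dist) over the query points; `kbct_similarity` is a hypothetical name. Under that assumption the computed values agree with the slide up to rounding, and the preferred trajectory flips between the two units, whereas the plain sum of distances used by k-NNT only scales by a constant factor.

```python
import math

def kbct_similarity(dists):
    """Assumed k-BCT similarity: sum of exp(-distance) over the query points."""
    return sum(math.exp(-d) for d in dists)

# Distances from the slide, in kilometres and in miles.
r1_km, r2_km = [2.4, 2.4], [1.5, 5.0]
r1_mi, r2_mi = [1.48, 1.48], [0.93, 3.1]

for unit, r1, r2 in [("km", r1_km, r2_km), ("mile", r1_mi, r2_mi)]:
    s1, s2 = kbct_similarity(r1), kbct_similarity(r2)
    preferred = "R1" if s1 > s2 else "R2"
    print(f"{unit}: Sim(R1)={s1:.2f}, Sim(R2)={s2:.2f}, k-BCT prefers {preferred}")
    # The aggregate distance keeps the same ordering in both units.
    print(f"{unit}: dist(R1)={sum(r1):.2f}, dist(R2)={sum(r2):.2f}")
```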
Advantages of k-NNT over k-BCT
The distance function of k-BCT changes over units (inconsistent)
The distance function of k-BCT is sensitive to a query
[Figure: query points q1, q2, q3; legend: k-BCT & k-NNT, k-NNT, k-BCT]
Query framework: candidate-generation-and-verification
Candidate generation
• Best-first search based individual heaps
• Coordination by a global heap
Candidate verification
• Lower-bound estimation
• Efficient pruning with the global heap
Qualifier expectation-based method
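[Figure: trajectories R1 to R6 and query points q1, q2, q3]
Direct Computing (all trajectories):
• dist(R1, Q) = 5 + 2 + 2 = 9 m
• dist(R2, Q) = 25 + 20 + 30 = 75 m
• dist(R3, Q) = 80 + 25 + 30 = 135 m
• dist(R4, Q) = 90 + 5 + 3 = 98 m
• dist(R5, Q) = 55 + 8 + 70 = 133 m
• dist(R6, Q) = 120 + 80 + 40 = 240 m
Candidate Generation, then Candidate Verification (candidates R1, R4, R5 only):
• dist(R1, Q) = 5 + 2 + 2 = 9 m
• dist(R4, Q) = 90 + 5 + 3 = 98 m
• dist(R5, Q) = 55 + 8 + 70 = 133 m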
Candidate Generation
Given a query Q = {q1, q2, …, qm}, generate a trajectory candidate set including all the k-NNTs (i.e., complete set)
Step 1: searching k-NN points using a best-first-based individual heap
Step 2: generating the candidate trajectories by the global heap
Individual heaps:
h1: <p2,3, q1>, <p5,2, q1>, <p1,6, q1>, <p2,9, q1>, …
h2: <p6,2, q2>, <p5,3, q2>, <p7,4, q2>, <p4,8, q2>, …
h3: <p2,2, q3>, <p3,5, q3>, <p7,3, q3>, <p8,6, q3>, …
Step 2: generating candidate trajectories
Global heap
• A minimum heap sorting matching pairs by distance
• Retrieves new matching pairs from the individual heaps
• Pops the matching pairs to the candidate set
Individual Heaps:
h1: <p2,3, q1>, <p5,2, q1>, <p1,6, q1>, <p2,9, q1>, …
h2: <p6,2, q2>, <p5,3, q2>, <p7,4, q2>, <p4,8, q2>, …
h3: <p2,2, q3>, <p3,5, q3>, <p7,3, q3>, <p8,6, q3>, …
…
hm: <p5,1, qm>, <p2,3, qm>, <p5,7, qm>, <p9,2, qm>, …
Global Heap (Size = m): <p1,4, q1>, <p5,1, q3>, <p6,4, q4>, <p3,4, q2>, …
Candidate Set:
R1: <p1,2, q1>, <p1,5, q2>, <p1,3, q3>, …, <p1,9, qm>
R2: (none), <p2,2, q2>, <p2,4, q3>, …, (none)
R4: <p4,5, q1>, (none), <p4,3, q3>, …, <p4,7, qm>
…
Example: Search based on the global heap
[Figure, animated over several slides: trajectories R1 to R5 with points p1,2, p1,4, p1,6, p4,4, p4,5, p5,5; query points q1, q2, q3; individual heaps h1, h2, h3; global heap; candidate set]
Pairs shown at the start: <p1,2, q1>, <p1,4, q2>, <p1,6, q3>
Next: R1: (Partial Match); new pair <p5,5, q2>
Next: new pair <p4,5, q3>
Next: R5: (Partial Match); new pair <p4,4, q2>
Final state:
Candidate Set
R1: <p1,2, q1>, <p1,4, q2>, <p1,6, q3>. (Full Match)
R4: <p4,5, q3>. (Partial Match)
R5: <p5,5, q2>. (Partial Match)
Global Heap: <p1,2, q1>, <p4,4, q2>, <p1,5, q3>
Stop criterion: stop when there are k full-matching candidates
Property 1: The candidate set is complete if the global heap G has popped out k full-matching candidates (in this example, k = 1)
Advantages
• Guarantees that all k-NNTs are included in the candidate set
• Generates compact candidate sets
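The candidate generation loop can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: a pre-sorted list per query point stands in for the best-first R-tree search behind each individual heap, and `generate_candidates` with its data layout is hypothetical.

```python
import heapq
import math

def generate_candidates(trajectories, queries, k):
    """Return {trajectory id: {query index: shortest matching-pair distance}}."""
    m = len(queries)
    # Stand-in for the individual heaps: one ascending stream of
    # (distance, trajectory id) pairs per query point.
    streams = []
    for q in queries:
        pairs = sorted(
            (math.dist(p, q), tid)
            for tid, pts in trajectories.items()
            for p in pts
        )
        streams.append(iter(pairs))

    # Global heap: at most one pending pair per query point.
    global_heap = []
    for qi, stream in enumerate(streams):
        first = next(stream, None)
        if first is not None:
            heapq.heappush(global_heap, (first[0], first[1], qi))

    candidates = {}
    full_matches = 0
    while global_heap and full_matches < k:
        d, tid, qi = heapq.heappop(global_heap)
        matched = candidates.setdefault(tid, {})
        if qi not in matched:
            # The first pair popped for (tid, qi) is the shortest one.
            matched[qi] = d
            if len(matched) == m:
                full_matches += 1
        # Refill from the individual heap that supplied the popped pair.
        nxt = next(streams[qi], None)
        if nxt is not None:
            heapq.heappush(global_heap, (nxt[0], nxt[1], qi))
    return candidates

# Hypothetical data: R2 is very close to q1 only, R1 is fairly close to both.
trajs = {"R1": [(0, 3), (10, 3)], "R2": [(0, 1)]}
cands = generate_candidates(trajs, [(0, 0), (10, 0)], k=1)
print(cands)
```

In this run R2 ends up as a partial-matching candidate and R1 as the full match that triggers the stop criterion; a trajectory whose pairs were never popped does not appear in the candidate set at all.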
Candidate verification
The full-matching candidate may not be the final k-NNT.
The system has to retrieve the partial-matching trajectories (R4 and R5) to compute their aggregate distances (I/O cost).
Question: can we compute a lower bound for R4 and R5 without retrieving their details?
If LB(R4) or LB(R5) is larger than dist(R1, Q), that trajectory can be pruned directly.
Candidate Set
R1: <p1,2, q1>, <p1,4, q2>, <p1,6, q3>. (Full Match)
R4: <p4,5, q3>. (Partial Match)
R5: <p5,5, q2>. (Partial Match)
Candidate verification
The lower-bound of a partial-matching trajectory is
If LB(R) is larger than the distance of the full-matching candidate, R can be pruned directly.
Candidate Set
R1: <p1,2, q1>, <p1,4, q2>, <p1,6, q3>; dist(R1) = 95
R4: <p4,5, q3>; LB(R4) = 114 (pruned)
R5: <p5,5, q2>; LB(R5) = 90 (passed)
Global Heap: <p1,2, q1>, <p4,4, q2>, <p1,5, q3>
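The transcript does not preserve the lower-bound formula, so the sketch below uses one natural choice as an assumption: matched query points contribute their known pair distance, and each unmatched query point contributes the distance of its pair still waiting in the global heap, since pairs come out of each individual heap in ascending order. The per-pair distances are invented and were picked only so that the totals match the slide (95, 114, 90); `lower_bound` is a hypothetical name.

```python
def lower_bound(matched, heap_front, num_queries):
    """Known pair distance where matched, pending global-heap distance otherwise."""
    return sum(matched.get(qi, heap_front[qi]) for qi in range(num_queries))

# Hypothetical distances of the pairs pending in the global heap for q1, q2, q3.
heap_front = {0: 20, 1: 59, 2: 40}
partial = {
    "R4": {2: 35},   # R4 has matched q3 only
    "R5": {1: 30},   # R5 has matched q2 only
}
best_full = 95       # dist(R1), the full-matching candidate on the slide

bounds = {tid: lower_bound(m, heap_front, 3) for tid, m in partial.items()}
survivors = [tid for tid, lb in bounds.items() if lb <= best_full]
print(bounds, survivors)
```

Only the survivors need their points retrieved to compute an exact aggregate distance.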
Problem of Outlier Query Location
A query location is an outlier if it is far from all the trajectories
Too many partial-matching candidates will be generated before a full-matching candidate is found
Global Heap
Iteration 1: <p2,5, q2>, <p2,1, q1>, <p1,7, q3>
Iteration 2: <p2,1, q1>, <p1,4, q2>, <p1,7, q3>
Iteration 3: <p1,4, q2>, <p1,1, q1>, <p1,7, q3>
Iteration 4: <p1,1, q1>, <p4,4, q2>, <p1,7, q3>
…
<p1,7, q3> cannot be popped out
Candidate Set
R1: <p1,1, q1>, <p1,4, q2>, (none). (Partial Matching)
R2: <p2,1, q1>, <p2,5, q2>, (none). (Partial Matching)
R4: (none), <p4,4, q2>, (none). (Partial Matching)
Qualifier expectation based method
The system can make up the missing pairs of a partial-matching trajectory by retrieving all its points.
Two key issues:
• Guaranteeing the completeness of the candidate set. Property 2: If there are k made-up candidates (qualifiers) with distance smaller than the sum of the pairs in the global heap, the candidate set is complete.
• Which candidate should be selected to make up? The qualifier expectation measure.
Example of Qualifier Expectation
Candidate Set
R1: <p1,1, q1>, <p1,4, q2>, (none)
R2: <p2,1, q1>, <p2,5, q2>, (none)
R4: (none), <p4,4, q2>, (none)
Global Heap: <p2,1, q1>, <p4,4, q2>, <p1,7, q3>; total distance sum(G) = 200 m
Qualifier Expectation: R1: 40 m, R2: 30 m, R4: 15 m
R1 made up: <p1,1, q1>, <p1,4, q2>, <p1,7, q3>
dist(R1) = 160 m < sum(G), so R1 is a qualifier
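The completeness test of Property 2 amounts to a threshold check, sketched below with the numbers from this example. One way to read the property: a trajectory that has not yet reached the candidate set can be no closer to each query point than the pair pending for that point in the global heap, so its aggregate distance is at least sum(G). The function names are hypothetical, and the qualifier expectation measure itself is not reproduced because the transcript does not define it.

```python
def qualifiers(made_up, heap_sum):
    """Made-up candidates whose full aggregate distance is below sum(G)."""
    return [tid for tid, dist in made_up.items() if dist < heap_sum]

def candidate_set_complete(made_up, heap_sum, k):
    """Property 2: complete once at least k qualifiers exist."""
    return len(qualifiers(made_up, heap_sum)) >= k

made_up = {"R1": 160}   # dist(R1) after retrieving its points (from the slide)
heap_sum = 200          # sum(G) (from the slide)

print(qualifiers(made_up, heap_sum), candidate_set_complete(made_up, heap_sum, k=1))
```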
Experiment Setup
Real dataset: collected from the Microsoft GeoLife and T-Drive projects, with over 20,000 real trajectories
Synthetic datasets with both uniform and biased distributions
Randomly generated query Q
The proposed methods are compared with Fagin’s Algorithm (FA) and the Threshold Algorithm (TA) (used in k-BCT)
Evaluations on synthetic dataset (biased distribution)
GH (global heap) is faster than the baselines, with lower I/O cost
QE (global heap + qualifier expectation) is an order of magnitude faster than the others
[Figure: (a) Query Time (ms) vs. |Q|; (b) I/O Cost (accessed R-tree nodes) vs. |Q|; methods: GH, QE, TA, FA]
Evaluations on real dataset
When |Q| is small, the probability of an outlier location is low, and GH achieves the best performance
When |Q| is larger, the probability of an outlier location is high, and QE is more efficient
[Figure: (a) Query Time (ms) vs. |Q|; (b) I/O Cost (accessed R-tree nodes) vs. |Q|; methods: GH, QE, TA, FA]
Conclusion
k-Nearest Neighboring Trajectory (k-NNT) query
• Retrieve trajectories by a set of locations
Candidate-generation-and-verification framework
• Generate candidate trajectories with the global heap
• Efficient lower-bound computation
Outlier query location: qualifier expectation-based method
Thanks!
Yu Zheng
Released Datasets:
• T-Drive taxi trajectories
• GeoLife GPS trajectories