![Page 1: Searching Trajectories by Locations – An Efficiency Study Zaiben Chen 1, Heng Tao Shen 1, Xiaofang Zhou 1, Yu Zheng 2, Xing Xie 2 1 The University of Queensland](https://reader035.vdocument.in/reader035/viewer/2022070305/551475635503462d4e8b6286/html5/thumbnails/1.jpg)
Searching Trajectories by Locations
– An Efficiency Study
Zaiben Chen1, Heng Tao Shen1, Xiaofang Zhou1, Yu Zheng2, Xing Xie2
1 The University of Queensland, 2 Microsoft Research Asia
Outline
- Research problem & application scenarios
- Basic ideas
  - k-Best-Connected Trajectory (k-BCT) query
  - The Incremental k-NN Algorithm (IKNN)
- Performance study
  - Best-first
  - Depth-first
- Optimization & extension
- Experiments
- Conclusion
Research Problem: Searching Trajectory Databases
GPS trajectories collected by the GeoLife project, MSRA
How can we retrieve the trajectories we want?
Searching Trajectory Databases
Search by a location: Frentzos et al., GeoInformatica 2007; Pfoser et al., VLDB 2000 (R-tree variants)
Search by a sample trajectory: Chen et al., SIGMOD 2005; Vlachos et al., ICDE 2002; Yi et al., ICDE 1998, etc. (similarity measures)
Searching Trajectory Databases
The problem we study: Searching by multiple locations
Goal: find the trajectories that are 'close' to all the given locations.
Technically, this extends the single-location query, but it is more complicated. Practically, it offers a more general way to search trajectories.
Two extreme cases: one location and many locations.
Application motivations
The Microsoft GeoLife Project: http://research.microsoft.com/en-us/projects/geolife/
GeoLife is a location-based service built on Microsoft Virtual Earth.
Our work benefits the following two functions:
(1) Travel recommendation
E.g., helping a visitor plan a trip to multiple attractions by considering other people's travel trajectories.
(2) Sharing life experiences & friend recommendation
E.g., finding the users who share a similar daily route through Queens Plaza, Central Station, and Main St.
Application motivations
Geo-Coding: From Pictures to Coordinates
The recommended route
The first step: define the closeness (i.e., distance) between a trajectory and a set of locations.
Similarity Function
The similarity function reflects how close a trajectory is to the given locations; we call the most similar trajectory the best-connected trajectory.
Q = {q1, q2, …, qm}, R = {p1, p2, …, pn}
Step 1. Find the closest trajectory point on R to each location qi. Distq(qi, R) is the shortest distance from qi to R.
Step 2. Sum up the contribution of each matched pair (unordered query):
$\mathrm{Sim}(Q, R) = \sum_{i=1}^{m} e^{-\mathrm{Dist}_q(q_i, R)}$, where $\mathrm{Dist}_q(q_i, R) = \min_{p_j \in R} |q_i, p_j|$
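A minimal Python sketch of the unordered similarity computes Sim(Q, R) as the sum of e^(-Distq(qi, R)) over all query locations. It assumes 2-D points and Euclidean distance for |q, p|. The names `dist_q` and `sim` are illustrative, not taken from the paper.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def dist_q(q: Point, trajectory: Sequence[Point]) -> float:
    """Dist_q(q, R): shortest Euclidean distance from location q to any point of R."""
    return min(math.dist(q, p) for p in trajectory)


def sim(query: Sequence[Point], trajectory: Sequence[Point]) -> float:
    """Unordered similarity: each query location is matched to its closest
    trajectory point, and the contributions exp(-distance) are summed."""
    return sum(math.exp(-dist_q(q, trajectory)) for q in query)


Q = [(0.0, 0.0), (3.0, 0.0)]
R = [(0.0, 1.0), (3.0, 0.0), (10.0, 10.0)]
print(round(sim(Q, R), 4))  # prints 1.3679 (e^-1 + e^0)
```

Each query location contributes at most 1 (when it lies exactly on the trajectory), so Sim(Q, R) ranges over (0, m].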
Problem Definition
k-Best Connected Trajectory (k-BCT) query
Given a set of trajectories T = {R1, R2, …, Rn}, a set of query locations Q = {q1, q2, …, qm}, and the similarity function Sim(Q, R), the k-BCT query finds the k trajectories in T with the highest similarity to Q.
Assumption:
The number of query locations is small (m is a small constant).
Intuition:
The k-BCT result is the JOIN of m single-location based queries.
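The k-BCT query can also be answered by brute force: score every trajectory with Sim(Q, R) and keep the top k. The sketch below does exactly that and is useful only as a correctness baseline, since it compares every point of every trajectory with every query location. The function name `k_bct_brute_force` is hypothetical, and Euclidean distance is assumed.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def sim(query: Sequence[Point], trajectory: Sequence[Point]) -> float:
    # Sum of exp(-Dist_q(q_i, R)) with Euclidean distance.
    return sum(math.exp(-min(math.dist(q, p) for p in trajectory)) for q in query)


def k_bct_brute_force(trajectories: Dict[str, Sequence[Point]],
                      query: Sequence[Point], k: int) -> List[Tuple[str, float]]:
    """Score every trajectory and keep the k most similar ones."""
    scored = ((tid, sim(query, pts)) for tid, pts in trajectories.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


T = {
    "R1": [(0.0, 0.0), (5.0, 0.0)],
    "R2": [(0.0, 1.0), (5.0, 1.0)],
    "R3": [(20.0, 20.0)],
}
Q = [(0.0, 0.0), (5.0, 0.0)]
result = k_bct_brute_force(T, Q, k=2)
print([tid for tid, _ in result])  # ['R1', 'R2']
```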
Basic ideas
Incremental k-NN Algorithm (IKNN)
Step 1. Index all trajectory points in a single R-tree, so that the shortest distance from a query location to the trajectories can be obtained by nearest-neighbor search.
Step 2. Search for the λ nearest neighbors (λ-NN) of each query location (q1 to qm), using any traditional k-nearest-neighbor algorithm over the R-tree.
For any trajectory scanned by a λ-NN search, its shortest distance to that query location is known: neighbors are returned nearest first, so the first scanned point of the trajectory is its closest point to the query location.
Candidate set C = {all scanned trajectories}
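Steps 1 and 2 can be illustrated under one simplifying assumption. Instead of an R-tree k-NN search, the sketch fully sorts all points by distance, which yields the same nearest-first prefix. `lambda_nn` and `scan` are hypothetical helper names.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Entry = Tuple[str, Point]  # (trajectory id, trajectory point)


def lambda_nn(q: Point, points: Sequence[Entry], lam: int) -> List[Tuple[float, str]]:
    """The lam nearest points to q as (distance, trajectory id), nearest first.
    A full sort stands in for a k-NN search on an R-tree (simplifying assumption)."""
    return sorted((math.dist(q, p), tid) for tid, p in points)[:lam]


def scan(query: Sequence[Point], points: Sequence[Entry], lam: int):
    """Run one lam-NN search per query location.

    Returns (known, radii): known[tid][i] is Dist_q(q_i, R_tid) for every
    trajectory reached by the search of q_i, and radii[i] is the distance
    from q_i to its lam-th nearest neighbour (its search radius)."""
    known: Dict[str, Dict[int, float]] = {}
    radii: List[float] = []
    for i, q in enumerate(query):
        neighbours = lambda_nn(q, points, lam)
        for d, tid in neighbours:
            # Neighbours arrive nearest first, so the first hit on a trajectory
            # is its closest point to q_i; later hits are ignored.
            known.setdefault(tid, {}).setdefault(i, d)
        radii.append(neighbours[-1][0])
    return known, radii


points = [("R1", (0.0, 0.0)), ("R1", (5.0, 0.0)),
          ("R2", (0.0, 2.0)), ("R2", (5.0, 3.0))]
known, radii = scan([(0.0, 0.0), (5.0, 0.0)], points, lam=2)
candidates = set(known)  # C = {all scanned trajectories}
print(known, radii)  # {'R1': {0: 0.0, 1: 0.0}, 'R2': {0: 2.0, 1: 3.0}} [2.0, 3.0]
```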
IKNN algorithm
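Step 3. Construct lower bounds of similarity.
For a trajectory R1 in C, assume that three of its points, p1, p2 and p3, have been scanned by the λ-NN searches of q1 and q2, so that p1 and p2 are its closest points to q1 and q2. Its closest point to q3, p5, has not been scanned yet.
(Figure: trajectory R1 with points p1, p2, p3, p5 and query locations q1, q2, q3.)
$\mathrm{Sim}(Q, R_1) = e^{-|q_1, p_1|} + e^{-|q_2, p_2|} + e^{-|q_3, p_5|} \ge e^{-|q_1, p_1|} + e^{-|q_2, p_2|}$
Every term is positive, so the sum of the known terms is a lower bound of the similarity.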
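A minimal sketch of the lower bound sums only the known terms. It assumes that the known Distq values of a candidate are kept in a dict keyed by query-location index, a representation chosen here for illustration. The distances in the example are hypothetical.

```python
import math
from typing import Dict


def lower_bound(known: Dict[int, float]) -> float:
    """Lower bound of Sim(Q, R) for a candidate R.

    known maps the index i of each query location whose lambda-NN search reached R
    to Dist_q(q_i, R). Every missing term exp(-Dist_q) is positive, so leaving
    it out can only underestimate the true similarity."""
    return sum(math.exp(-d) for d in known.values())


# R1 in the example: distances to q1 and q2 are known, the term for q3 is not.
lb = lower_bound({0: 0.5, 1: 1.0})
exact = lb + math.exp(-2.0)  # hypothetical |q3, p5| = 2.0, found only later
print(round(lb, 4), lb <= exact)  # 0.9744 True
```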
The Incremental k-NN algorithm
Step 4. Construct upper bound of similarity.
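For any trajectory not covered by any λ-NN search, e.g. R5, its distance to qi must be at least radiusi, the search radius of qi (the distance from qi to its λ-th nearest neighbor). Since e^(-x) decreases as x grows, each term is at most e^(-radiusi):
(Figure: trajectories R1 and R5, query locations q1, q2, q3, and their search radii radius1, radius2, radius3.)
$\mathrm{Sim}(Q, R_5) = e^{-|q_1, R_5|} + e^{-|q_2, R_5|} + e^{-|q_3, R_5|} \le e^{-\mathrm{radius}_1} + e^{-\mathrm{radius}_2} + e^{-\mathrm{radius}_3}$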
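The upper bound for unscanned trajectories depends only on the m search radii. Here is a hedged sketch; the radii and distances are hypothetical values, not data from the study.

```python
import math
from typing import Sequence


def upper_bound_unscanned(radii: Sequence[float]) -> float:
    """Upper bound of Sim(Q, R) for any trajectory R that no lambda-NN search reached.

    For such an R, Dist_q(q_i, R) >= radius_i for every i, and exp(-x) decreases
    with x, so each term is at most exp(-radius_i)."""
    return sum(math.exp(-r) for r in radii)


radii = [1.0, 2.0, 1.5]  # hypothetical search radii of q1, q2, q3
ub = upper_bound_unscanned(radii)
# An unscanned trajectory at distances 1.2, 2.5 and 3.0 stays below the bound.
print(sum(math.exp(-d) for d in (1.2, 2.5, 3.0)) <= ub)  # True
```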
The Incremental k-NN algorithm
Step 5. Check the STOP condition (pruning condition).
For a k-BCT query, suppose we have k candidate trajectories whose lower bounds are not less than the upper bound of similarity for all unscanned trajectories. Then the k best-connected trajectories must be included in the candidate set.
If the condition is satisfied: go to the refinement step.
Else: increase λ by some Δ and repeat the search process.
As the search region of the λ-NN search enlarges, the k best-connected trajectories will eventually be found.
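The following compact, hypothetical Python sketch puts Steps 2 to 5 together into the IKNN loop. Its assumptions are:
- Euclidean distance.
- λ starts at k and grows by a fixed Δ.
- Each query location gets a pre-sorted point list in place of an R-tree.
- The refinement step computes the exact similarity of every candidate in C.

Each round rebuilds the first λ neighbors from scratch, mirroring the repeated search discussed next.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def iknn(trajectories: Dict[str, Sequence[Point]], query: Sequence[Point],
         k: int, delta: int) -> List[Tuple[str, float]]:
    """Sketch of the IKNN loop: lambda-NN scan, bounds, stop check, refinement."""
    entries = [(tid, p) for tid, pts in trajectories.items() for p in pts]
    # Simplification: each query location's points are fully pre-sorted; an
    # R-tree k-NN search would produce the same prefix of this order.
    ranked = [sorted((math.dist(q, p), tid) for tid, p in entries) for q in query]
    lam = k
    while True:
        lam = min(lam, len(entries))
        known: Dict[str, Dict[int, float]] = {}
        for i, order in enumerate(ranked):
            for d, tid in order[:lam]:
                known.setdefault(tid, {}).setdefault(i, d)  # first hit = Dist_q
        # Step 4: upper bound for every trajectory not yet scanned.
        upper = sum(math.exp(-order[lam - 1][0]) for order in ranked)
        # Step 3: lower bounds of all candidates, largest first.
        lowers = sorted((sum(math.exp(-d) for d in ds.values())
                         for ds in known.values()), reverse=True)
        # Step 5: stop when k lower bounds reach the upper bound (or all is scanned).
        if lam == len(entries) or (len(lowers) >= k and lowers[k - 1] >= upper):
            break
        lam += delta  # otherwise enlarge every search region and repeat

    def sim(pts: Sequence[Point]) -> float:
        return sum(math.exp(-min(math.dist(q, p) for p in pts)) for q in query)

    # Refinement: exact similarity of each candidate, keep the k best.
    scored = ((tid, sim(trajectories[tid])) for tid in known)
    return heapq.nlargest(k, scored, key=lambda item: item[1])


T = {
    "R1": [(0.0, 0.0), (5.0, 0.0)],
    "R2": [(0.0, 1.0), (5.0, 1.0)],
    "R3": [(20.0, 20.0), (21.0, 20.0)],
    "R4": [(0.0, 3.0), (6.0, 3.0)],
}
Q = [(0.0, 0.0), (5.0, 0.0)]
result = iknn(T, Q, k=2, delta=2)
print([tid for tid, _ in result])  # ['R1', 'R2']
```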
Problem
The problem: we may need to increase λ and recompute the lower/upper bounds for many rounds before we find the k-BCT results, and the λ-NN search is rerun in every round for every query location.
(Let λ initially be k, and let Δ also be k.)
round 1: 1st to k-th nearest neighbors
round 2: 1st to 2k-th nearest neighbors
…
round i: 1st to (i·k)-th nearest neighbors
Trajectory points are visited multiple times.
Normally λ ≫ k, so the total number of visits grows quadratically in λ (O(λ²) for a fixed k).
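A worked count supports the quadratic claim. It assumes that, for one query location, each round re-runs the k-NN search from scratch and visits every neighbor up to its current λ; V_total is a name introduced here for the total number of point visits.

```latex
V_{\text{total}} = \sum_{j=1}^{i} jk = \frac{k\,i(i+1)}{2},
\qquad i = \frac{\lambda}{k}
\quad\Longrightarrow\quad
V_{\text{total}} = \frac{\lambda(\lambda + k)}{2k} \approx \frac{\lambda^{2}}{2k}
\quad (\lambda \gg k)
```

For example, with k = 10 and λ = 1,000 (i = 100 rounds), this gives 1000 · 1010 / 20 = 50,500 point visits. A search that never revisits points needs only 1,000.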
Can we reduce the overlap between search regions?
Efficiency study of the IKNN
Adaptation of the λ-NN algorithm
The best-first nearest neighbor search [Hjaltason et al., TODS99]
A priority queue stores all the R-tree entries that have yet to be visited, keyed by MINDIST, so MBRs and objects are visited in ascending order of MINDIST.
The depth-first nearest neighbor search [Roussopoulos et al., SIGMOD95]
It recursively traverses the R-tree level by level in a depth-first manner, while maintaining a global list of the k nearest candidates found so far.
We estimate the performance of IKNN with each of these λ-NN algorithms.
Adaptation of the λ-NN algorithm
The best-first NN search: retrieve the λ, λ+Δ, λ+2Δ, … NN of each query location incrementally, until the k best-connected trajectories are included in the candidate set.
Benefit
The λ-NN results are returned incrementally.
I/O optimal: no overlap occurs, Vsum = λ.
Shortcoming
Memory consumption is NOT guaranteed. The priority queue stores all the R-tree entries that have yet to be visited, and in an extreme case it may be as large as the whole dataset.
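Here is a minimal best-first incremental NN search in the style of [Hjaltason et al., TODS99], run over a toy two-level tree. The `Node`, `leaf` and `inner` helpers are illustrative stand-ins, not an R-tree implementation. The search is written as a generator, so asking for Δ more neighbors resumes the same priority queue instead of restarting. That queue is also what may grow without a bound.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

Point = Tuple[float, float]
MBR = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass
class Node:
    """Toy R-tree node: internal nodes hold children, leaves hold (tid, point) entries."""
    mbr: MBR
    children: List["Node"] = field(default_factory=list)
    entries: List[Tuple[str, Point]] = field(default_factory=list)


def leaf(entries: List[Tuple[str, Point]]) -> Node:
    xs = [p[0] for _, p in entries]
    ys = [p[1] for _, p in entries]
    return Node((min(xs), min(ys), max(xs), max(ys)), entries=entries)


def inner(children: List[Node]) -> Node:
    return Node((min(c.mbr[0] for c in children), min(c.mbr[1] for c in children),
                 max(c.mbr[2] for c in children), max(c.mbr[3] for c in children)),
                children=children)


def mindist(q: Point, r: MBR) -> float:
    """Smallest possible distance from q to any point inside the rectangle r."""
    dx = max(r[0] - q[0], 0.0, q[0] - r[2])
    dy = max(r[1] - q[1], 0.0, q[1] - r[3])
    return math.hypot(dx, dy)


def best_first(root: Node, q: Point) -> Iterator[Tuple[float, str]]:
    """Incremental NN: yields (distance, trajectory id) in ascending distance order."""
    tie = itertools.count()  # tie-breaker so heapq never compares Node objects
    heap = [(mindist(q, root.mbr), next(tie), root)]
    while heap:
        key, _, item = heapq.heappop(heap)
        if isinstance(item, Node):
            for child in item.children:
                heapq.heappush(heap, (mindist(q, child.mbr), next(tie), child))
            for tid, p in item.entries:
                heapq.heappush(heap, (math.dist(q, p), next(tie), (tid, p)))
        else:
            # MINDIST never exceeds a real distance, so objects leave the
            # queue in ascending distance order.
            yield key, item[0]


root = inner([leaf([("R1", (0.0, 0.0)), ("R2", (1.0, 0.0))]),
              leaf([("R1", (5.0, 5.0)), ("R3", (6.0, 5.0))])])
stream = best_first(root, (0.0, 0.0))
first = list(itertools.islice(stream, 2))  # the lambda-NN, lambda = 2
more = list(itertools.islice(stream, 2))   # the next delta = 2, no restart
print([t for _, t in first], [t for _, t in more])  # ['R1', 'R2'] ['R1', 'R3']
```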
The best-first strategy
Performance (R-tree leaf access)
Estimate the radius r of the circle that contains λ points [Belussi et al. VLDB95].
Estimate the leaf access of a range query with radius r [Korn et al. TKDE2001].
The m λ-NN queries run independently, so their estimated costs add up.
(Figure: a circle of the given radius around q containing λ objects.)
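The cited estimators are not reproduced here. As a hedged stand-in, the sketch assumes uniformly distributed points in the unit square and square leaves that each hold `leaf_fill` points. Under those assumptions the two steps reduce to simple formulas:
- The λ-NN radius follows from πr²·N = λ.
- A leaf is accessed when the query center falls in the Minkowski sum of the leaf square and the query disk.

All function names are hypothetical.

```python
import math


def knn_radius_uniform(lam: int, n_points: int) -> float:
    """Radius r of the circle expected to hold lam of n_points uniform points
    in the unit square: pi * r^2 * n_points = lam (boundary effects ignored)."""
    return math.sqrt(lam / (math.pi * n_points))


def leaf_accesses_uniform(radius: float, n_points: int, leaf_fill: int) -> float:
    """Expected leaves hit by a circular range query, assuming n_points / leaf_fill
    square leaves of side a = sqrt(leaf_fill / n_points). A leaf is hit when the
    query centre lies in the Minkowski sum of the leaf and the disk, whose area
    is a^2 + 4*a*r + pi*r^2."""
    n_leaves = n_points / leaf_fill
    a = math.sqrt(leaf_fill / n_points)
    return n_leaves * (a * a + 4 * a * radius + math.pi * radius ** 2)


def best_first_leaf_accesses(m: int, lam: int, n_points: int, leaf_fill: int) -> float:
    """m independent lambda-NN queries, each costing one range query of radius r."""
    r = knn_radius_uniform(lam, n_points)
    return m * leaf_accesses_uniform(r, n_points, leaf_fill)


print(round(best_first_leaf_accesses(m=3, lam=100, n_points=10_000, leaf_fill=100), 2))  # 12.77
```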
Adaptation of the λ-NN algorithm
The depth-first NN search: every time we search for the λ+Δ NN, we have to revisit the search region of the λ-NN query.
Benefit: guaranteed memory usage, O(c log_c N) (c: node fan-out, N: number of points)
Drawback: too many overlaps
A simple improvement: double λ at each round to reduce the number of rounds and amortize the cost. With λ doubling, the total work over all rounds stays roughly within twice that of the final round.
Pruning: all MBRs whose MAXDIST is smaller than the current search range of the λ-NN search can be skipped in the search for the λ+Δ NN, because every point inside them has already been returned.
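Here is a hedged sketch of the modified depth-first strategy. It runs a branch-and-bound depth-first k-NN search in the spirit of [Roussopoulos et al., SIGMOD95], reruns it with λ doubled each round, and skips every MBR whose MAXDIST is below the previous round's search radius. The toy tree helpers, and recording already-returned points in `found`, are assumptions of this sketch rather than the paper's data structures.

```python
import heapq
import math
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Set, Tuple

Point = Tuple[float, float]
MBR = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Key = Tuple[str, Point]                  # (trajectory id, point)


@dataclass
class Node:
    mbr: MBR
    children: List["Node"] = field(default_factory=list)
    entries: List[Key] = field(default_factory=list)


def leaf(entries: List[Key]) -> Node:
    xs, ys = [p[0] for _, p in entries], [p[1] for _, p in entries]
    return Node((min(xs), min(ys), max(xs), max(ys)), entries=entries)


def inner(children: List[Node]) -> Node:
    return Node((min(c.mbr[0] for c in children), min(c.mbr[1] for c in children),
                 max(c.mbr[2] for c in children), max(c.mbr[3] for c in children)),
                children=children)


def mindist(q: Point, r: MBR) -> float:
    return math.hypot(max(r[0] - q[0], 0.0, q[0] - r[2]), max(r[1] - q[1], 0.0, q[1] - r[3]))


def maxdist(q: Point, r: MBR) -> float:
    """Distance from q to the farthest corner of r."""
    return math.hypot(max(abs(q[0] - r[0]), abs(q[0] - r[2])),
                      max(abs(q[1] - r[1]), abs(q[1] - r[3])))


def depth_first_knn(root: Node, q: Point, need: int, found: Set[Key],
                    prev_radius: float) -> List[Tuple[float, str, Point]]:
    """Depth-first branch-and-bound search for the `need` nearest points not in `found`.
    Memory: one sorted child list per recursion level plus the result heap."""
    best: List[Tuple[float, str, Point]] = []  # max-heap of (-distance, tid, point)

    def kth() -> float:
        return -best[0][0] if len(best) == need else math.inf

    def visit(node: Node) -> None:
        for tid, p in node.entries:
            d = math.dist(q, p)
            if (tid, p) not in found and d < kth():
                heapq.heappush(best, (-d, tid, p))
                if len(best) > need:
                    heapq.heappop(best)
        for child in sorted(node.children, key=lambda c: mindist(q, c.mbr)):
            if maxdist(q, child.mbr) < prev_radius:
                continue  # every point inside was already returned last round
            if mindist(q, child.mbr) < kth():
                visit(child)

    visit(root)
    return sorted((-nd, tid, p) for nd, tid, p in best)


def doubling_rounds(root: Node, q: Point, lam: int, rounds: int) -> Iterator[List[Tuple[float, str]]]:
    """Yield the lambda-NN of q for lambda, 2*lambda, 4*lambda, ..."""
    found: Dict[Key, float] = {}
    radius = 0.0  # nothing can be skipped in the first round
    for _ in range(rounds):
        for d, tid, p in depth_first_knn(root, q, lam - len(found), set(found), radius):
            found[(tid, p)] = d
        radius = max(found.values())  # distance of the current lambda-th neighbour
        yield sorted((d, tid) for (tid, _), d in found.items())
        lam *= 2


root = inner([leaf([("R1", (0.0, 0.0)), ("R2", (1.0, 0.0))]),
              leaf([("R1", (5.0, 5.0)), ("R3", (6.0, 5.0))]),
              leaf([("R2", (2.0, 2.0)), ("R4", (3.0, 0.0))])])
# In the 4th round the first leaf (MAXDIST 1.0 < radius 3.0) is skipped entirely.
results = list(doubling_rounds(root, (0.0, 0.0), lam=1, rounds=4))
for r in results:
    print([t for _, t in r])
```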
The depth-first strategy
Performance (R-tree leaf access)
The search region is not necessarily a circle, so the previous method cannot be used directly.
Estimate the size of the first visited MBR (at any level) that contains at least λ points.
Estimate the radius (MAXDIST) of the circular region around qi that contains that MBR.
(Figure: query location qi, MBR1, and the circle of radius MAXDIST around qi.)
R-tree nodes outside the circle with radius MAXDIST won't be visited.
The depth-first strategy (cont.)
Performance: estimate the leaf access of a range query with radius MAXDIST [Korn et al. TKDE2001].
Finally,
Summary
| IKNN algorithm | Memory usage | Object visits | Leaf access |
|---|---|---|---|
| The best-first strategy | no guarantee | m × O(λ) | |
| The depth-first strategy | O(c log_c N) | m × O(λ) | |
Although the best-first strategy has no guarantee on memory usage, it normally runs faster, and its priority queue can still easily fit in the main memory of a modern computer.
The modified depth-first strategy reaches nearly the same performance as the best-first strategy while still preserving low memory consumption.
Optimization & Extension
Consider the importance of each query location, and assign different weights when exploring objects.
Extend the query to locations with a specified order.
Experiments
12,653 trajectories (1,147,116 points) collected by the GeoLife project
Number of query locations: 2 to 10
Tests are conducted on a PC with a 2.1 GHz CPU and 1 GB of memory.
Experiments – Node Access
Experiments – Query Time
Experiments – Memory Usage
Thank you