Mining Interesting Locations and Travel Sequences From GPS Trajectories
Yu Zheng and Xing Xie
Microsoft Research Asia
March 16, 2009
Outline
• Introduction
• Our Solution
• Experiments
• Conclusion
Background
GPS-enabled devices have become prevalent
These devices enable us to record our location history with GPS trajectories
Given the large number of GPS phones, human location history is a large and valuable data resource
Motivation
When people come to an unfamiliar city:
• What are the top interesting locations in this city?
• How should I travel among these places (travel sequences)?
• A map alone does not help a newcomer much
Strategy
Mining interesting locations and travel sequences from multiple users’ location histories
http://geolife
Difficulty
• What is a location? (geographical scales)
• The interest level of a location depends not only on the number of users visiting it but also on these users' travel experiences
• How can a user's travel experience be determined?
• The location interest and the user travel experience
  – are region-related
  – are relative values (a ranking problem)
Solution – 1. Modeling Human Location History
GPS logs P and GPS trajectory
• Stay points S = {s1, s2, …, sn}
  – A stay point stands for a geo-region where a user has stayed for a while
  – It carries a semantic meaning beyond a raw GPS point
• Location history: represented by a sequence of stay points with transition intervals
[Figure: Raw GPS points p1, …, p7 of a trajectory, a group of which forms a stay point S]
Each GPS point in a log records latitude, longitude and time: p1: (Lat1, Lngt1, T1), p2: (Lat2, Lngt2, T2), …, pn: (Latn, Lngtn, Tn)
$$LocH = \left(s_1 \xrightarrow{\Delta t_1} s_2 \xrightarrow{\Delta t_2} \cdots \xrightarrow{\Delta t_{n-1}} s_n\right)$$
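The slides do not give the stay point detection rule, so the following is only a minimal sketch under stated assumptions: a stay point is taken to be a run of consecutive GPS points that stay within a distance threshold of the run's first point for at least a time threshold. The 200 m / 20 min thresholds, the `GPSPoint`/`StayPoint` types, and the choice of Δt as the gap between leaving one stay point and arriving at the next are all hypothetical, not values from this work.

```python
import math
from dataclasses import dataclass


@dataclass
class GPSPoint:
    lat: float  # latitude in degrees
    lng: float  # longitude in degrees
    t: float    # timestamp in seconds


@dataclass
class StayPoint:
    lat: float     # mean latitude of the grouped points
    lng: float     # mean longitude of the grouped points
    arrive: float  # time of the first grouped point
    leave: float   # time of the last grouped point


def haversine_m(p: GPSPoint, q: GPSPoint) -> float:
    """Great-circle distance between two GPS points, in metres."""
    r = 6_371_000.0
    phi1, phi2 = math.radians(p.lat), math.radians(q.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(q.lng - p.lng)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def detect_stay_points(traj, dist_thresh_m=200.0, time_thresh_s=20 * 60):
    """Hypothetical thresholds: points within dist_thresh_m of an anchor point
    for at least time_thresh_s are merged into one stay point."""
    stays = []
    i, n = 0, len(traj)
    while i < n:
        j = i + 1
        # Extend the run while points stay close to the anchor traj[i].
        while j < n and haversine_m(traj[i], traj[j]) <= dist_thresh_m:
            j += 1
        if traj[j - 1].t - traj[i].t >= time_thresh_s:
            group = traj[i:j]
            stays.append(StayPoint(
                lat=sum(p.lat for p in group) / len(group),
                lng=sum(p.lng for p in group) / len(group),
                arrive=traj[i].t,
                leave=traj[j - 1].t,
            ))
            i = j  # continue after the detected stay
        else:
            i += 1  # anchor did not stay long enough; try the next point
    return stays


def location_history(stays):
    """LocH as (s_k, delta_t_k) pairs; delta_t_k is assumed to be the time from
    leaving s_k to arriving at s_{k+1} (None for the last stay point)."""
    return [
        (s, stays[k + 1].arrive - s.leave if k + 1 < len(stays) else None)
        for k, s in enumerate(stays)
    ]
```

For example, a user who lingers at one spot for 30 minutes, moves about 7 km, and lingers for 25 minutes yields two stay points linked by a 1200 s transition interval.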
[Figure: Shared hierarchical framework. Stay points detected from the GPS logs of users 1 to n are clustered into a tree of stay point clusters cij, from a high level (c10) through c20, c21 down to c30 to c34; graphs G1, G2 and G3 are built over the levels l1, l2 and l3.]
1. Stay point detection
2. Hierarchical clustering
3. Graph building
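The slides name "graph building" without defining the edges, so this is a hedged sketch assuming that each level's graph has the clusters of that level as nodes and a directed edge weighted by how often users moved from one cluster to the next. The `histories` and `cluster_of` inputs and the cluster IDs in the example are hypothetical stand-ins for the output of the stay point detection and hierarchical clustering steps.

```python
from collections import Counter, defaultdict


def build_level_graph(histories, cluster_of):
    """Build a directed, weighted graph on one level of the hierarchy.

    histories:  {user_id: [stay_point_id, ...]} in visiting order (hypothetical)
    cluster_of: {stay_point_id: cluster_id} for this level (hypothetical)
    Returns {cluster_id: Counter({next_cluster_id: transition_count})}.
    """
    graph = defaultdict(Counter)
    for seq in histories.values():
        clusters = [cluster_of[s] for s in seq]
        # Collapse consecutive stay points in the same cluster, so moving
        # around inside one location does not create a self-loop.
        collapsed = [c for k, c in enumerate(clusters) if k == 0 or c != clusters[k - 1]]
        for a, b in zip(collapsed, collapsed[1:]):
            graph[a][b] += 1
    return graph


# Hypothetical example: the same stay points mapped onto two levels.
histories = {"u1": ["s1", "s2", "s3"], "u2": ["s4", "s5"]}
level2 = {"s1": "c20", "s2": "c20", "s3": "c21", "s4": "c20", "s5": "c20"}
level3 = {"s1": "c30", "s2": "c31", "s3": "c31", "s4": "c30", "s5": "c31"}
graphs = {name: build_level_graph(histories, m) for name, m in {"l2": level2, "l3": level3}.items()}
```

Because every level reuses the same stay points, the coarser level merges transitions that stay inside one parent cluster; here u2's move from s4 to s5 is an edge on l3 but disappears on l2.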
Solution – 2. The HITS-Based Inference
• Mutual reinforcement relationship
  – A user with rich travel knowledge is more likely to visit more interesting locations
  – An interesting location is likely to be accessed by many users with rich travel knowledge
• A HITS-based inference model
  – Users are hub nodes
  – Locations are authority nodes
  – The topic is the geo-region
[Figure: The HITS-based inference model, with users as hub nodes and locations as authority nodes]
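The mutual reinforcement above is the classic HITS iteration on a bipartite graph. The sketch below is a generic power iteration, not the authors' exact model: it assumes edges are weighted by visit counts, uses L2 normalisation and a fixed number of iterations, and treats the input as the users and locations of a single geo-region (the "topic"). The input format and names are hypothetical.

```python
import math


def hits(visits, iterations=50):
    """HITS on a user-location graph: users are hubs, locations are authorities.

    visits: {user: {location: visit_count}} for one region (hypothetical format)
    Returns (hub_scores, authority_scores), each L2-normalised.
    """
    users = sorted(visits)
    locations = sorted({loc for v in visits.values() for loc in v})
    hub = {u: 1.0 for u in users}
    auth = {l: 1.0 for l in locations}
    for _ in range(iterations):
        # A location is interesting if experienced users visit it.
        auth = {l: sum(visits[u].get(l, 0) * hub[u] for u in users) for l in locations}
        norm = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        auth = {l: x / norm for l, x in auth.items()}
        # A user is experienced if they visit interesting locations.
        hub = {u: sum(c * auth[l] for l, c in visits[u].items()) for u in users}
        norm = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        hub = {u: x / norm for u, x in hub.items()}
    return hub, auth


visits = {"alice": {"A": 2, "B": 1, "C": 1}, "bob": {"A": 1}, "carol": {"C": 1}}
hub, auth = hits(visits)
```

In this toy region, A is visited by both alice and bob, so it receives the highest authority score, and alice, who visits every location, receives the highest hub score.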
[Figure: A region specified by a user, shown over the stay point cluster hierarchy (ascendant to descendant), with clusters c11; c21, c22; c31 to c35, and the clusters covering the region highlighted. A) A region covering locations from a single parent cluster; B) A region covering locations from multiple parent clusters.]
Solution – 3. Detecting Classical Travel Sequences
Three factors determine the classical score of a sequence:
• The travel experiences (hub scores) of the users taking the sequence
• The location interests (authority scores)
• The probability that people would take the specific sequence, which weights the location interests
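The classical score of sequence A→C:
$$S_{AC} = \sum_{u_k \in U_{AC}} \left(a_A \cdot Out_{AC} + a_C \cdot In_{AC} + h_k\right)$$
where
• a_A: authority score of location A
• a_C: authority score of location C
• h_k: user k's hub score
• U_AC: the users who have taken the sequence A→C
• Out_AC, In_AC: the probability that people leaving A go to C, and the probability that people arriving at C come from A
[Figure: Example graph over locations A, B, C, D and E with weighted edges]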
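The formula for S_AC can be evaluated directly once the hub and authority scores are known. This is a sketch under assumptions: Out_AC and In_AC are estimated from observed hop counts, U_AC counts each user once even if they took A→C several times, and the `transitions` list format is hypothetical.

```python
def classical_score(a, c, transitions, authority, hub):
    """S_AC = sum over u_k in U_AC of (a_A * Out_AC + a_C * In_AC + h_k).

    transitions: list of (user, from_location, to_location) hops (hypothetical)
    authority:   {location: authority score}
    hub:         {user: hub score}
    """
    out_total = sum(1 for _, f, _ in transitions if f == a)  # hops leaving A
    in_total = sum(1 for _, _, t in transitions if t == c)   # hops entering C
    ac = [u for u, f, t in transitions if f == a and t == c]
    if not ac:
        return 0.0
    out_ac = len(ac) / out_total  # share of hops leaving A that go to C
    in_ac = len(ac) / in_total    # share of hops entering C that come from A
    users_ac = set(ac)            # U_AC, each user counted once (assumption)
    return sum(authority[a] * out_ac + authority[c] * in_ac + hub[u] for u in users_ac)


transitions = [("u1", "A", "C"), ("u2", "A", "C"), ("u1", "A", "B"), ("u3", "D", "C")]
authority = {"A": 0.6, "B": 0.2, "C": 0.3, "D": 0.1}
hub = {"u1": 0.5, "u2": 0.25, "u3": 0.1}
score = classical_score("A", "C", transitions, authority, hub)
```

Here two of the three hops leaving A go to C and two of the three hops entering C come from A, so each of u1 and u2 contributes 0.6·2/3 + 0.3·2/3 = 0.6 plus their own hub score, giving 1.95.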
Experiments
• Settings
• Evaluation Approach
• Results
GPS Devices and Users
60 devices and 138 users, from May 2007 to the present
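[Figure: Distribution of users by age (age<=22, 22<age<=25, 26<=age<29, age>=30; slices of 16%, 45%, 30% and 9%) and by occupation (Microsoft employees, employees of other companies, government staff, college students; slices of 18%, 14%, 10% and 58%)]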
• A large-scale GPS dataset (by Feb. 18, 2009)
  – 10+ million GPS points
  – 260+ million kilometers
  – 36 cities in China and a few cities in the USA, Korea and Japan
Evaluation Approach
• 29 subjects
  – 14 females and 15 males
  – have been in Beijing for more than 6 years
• The test region
  – the area bounded by the fourth ring road of Beijing
• Evaluated objects
  – the top 10 interesting locations
  – the top 5 classical travel sequences
• Presentation
  – The ability of the retrieved locations to represent a given region
  – Three aspects investigated:
    • Representative rating (0-10)
    • Comprehensive rating (1-5)
    • Novelty rating (0-10)
• Rank
  – The ranking performance of the retrieved locations based on their inferred interest
[Figure: Evaluation procedure. For a geospatial region, the top 10 interesting locations (C1, C2, …, C10) and the top 5 classical travel sequences (Sq1, Sq2, …, Sq5) are evaluated on presentation (representative rating 0~10, comprehensive rating 1~5, novelty rating 0~10) and on rank (nDCG and MAP computed from user desirability ratings of -1, 0, 1 or 2 on each location and each sequence).]
Desirability ratings for locations
Rating  Explanation
2       I'd like to plan a trip to that location.
1       I'd like to visit that location if passing by.
0       I have no feeling about this location, but don't oppose others visiting it.
-1      This location is not worth visiting.
Desirability ratings for travel sequences
Rating  Explanation
2       I'd like to plan a trip with this travel sequence.
1       I'd like to take that sequence if visiting the region.
0       I have no feeling about this sequence, but don't oppose others choosing it.
-1      This sequence is not a good choice.
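nDCG and MAP turn the -1 to 2 desirability ratings into ranking scores. The slides do not state the exact variants used, so this sketch makes its assumptions explicit: the gain of an item is its rating with negatives clipped to 0, the discount is log2(rank + 1), an item counts as relevant for MAP when rated at least 1, AP is normalised by the number of relevant retrieved items, and scores are averaged over subjects.

```python
import math


def dcg(gains, k):
    """Discounted cumulative gain of the first k gains (rank 1 undiscounted)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ratings, k):
    """nDCG@k for one ranked list of ratings in {-1, 0, 1, 2}.
    Assumption: negative ratings are clipped to a gain of 0."""
    gains = [max(r, 0) for r in ratings]
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


def average_precision(ratings, relevant_min=1):
    """AP for one ranked list; assumption: relevant means rating >= relevant_min."""
    hits, precisions = 0, []
    for i, r in enumerate(ratings, start=1):
        if r >= relevant_min:
            hits += 1
            precisions.append(hits / i)  # precision at each relevant rank
    return sum(precisions) / hits if hits else 0.0


def mean_over_subjects(metric, per_subject_ratings, *args):
    """Average a per-list metric (e.g. AP to obtain MAP) over all subjects."""
    return sum(metric(r, *args) for r in per_subject_ratings) / len(per_subject_ratings)
```

For instance, a subject who rates a ranked list as [2, 1, 0, 2, -1] gets an AP of (1/1 + 2/2 + 3/4) / 3 because the relevant items sit at ranks 1, 2 and 4.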
Results on Evaluating Interesting Locations
[Figure: Interesting locations retrieved by A) our method, B) Rank-by-count and C) Rank-by-frequency]
Metric     Ours    Rank-by-count    Rank-by-frequency
nDCG@5     0.823   0.714            0.598
nDCG@10    0.943   0.848            0.859
MAP        0.759   0.532            0.365
Ranking ability of different methods
Rating           Ours   Rank-by-count    Rank-by-frequency
Representative   5.4    4.5              3.1
Comprehensive    4      3.4              2.3
Novelty          3.4    2.4              2.2
Comparison on the presentation ability of different methods
Results on Evaluating Travel Sequences
Metric           Ours (Interest + Experience)   Rank-by-counts   Rank-by-interest   Rank-by-experience
Mean score       1.6                            1.2              1.4                1.5
Classical rate   0.6                            0.3              0.4                0.4
[Figure: Example travel sequences found by Rank-by-experience, Rank-by-counts, Rank-by-interest and our method. Annotated locations include a railway station, an ordinary hotel near the station, an ordinary café near an experienced user's home, an ordinary store close to her home, Tiananmen Square, the Summer Palace, the Bird's Nest and Houhai Bar Street.]
Investigating our method
[Figure: Top 10 interesting locations, numbered 1 to 10 on the map. A) Inferring the top 10 interesting locations without using hierarchy; B) ranking the locations using the authority scores of the region; C) ranking locations using their authority scores for the whole of Beijing]
[Figure: Comparison of A) our method using hierarchy and B) our method without using hierarchy]
• Why hierarchy?
  – It provides users with a comprehensive view of a large region (a city)
  – It helps users understand the region step by step (level by level)
  – It can be used to specify users' travel experiences in different regions
Conclusion
• Enables generic travel recommendation: top interesting locations, travel experts and classical travel sequences
• Mining interesting locations
  – Our method outperformed Rank-by-count and Rank-by-frequency
  – User travel experience is critical
  – The hierarchy of geo-spaces is important
• Classical travel sequences
  – Combining location interest and user travel experience works better than using either one alone