Mining Interesting Locations and Travel Sequences From GPS Trajectories

Yu Zheng and Xing Xie

Microsoft Research Asia

March 16, 2009

Outline

• Introduction
• Our Solution
• Experiments
• Conclusion

Background


GPS-enabled devices have become prevalent

These devices enable us to record our location history with GPS trajectories

Human location history is a rich resource, given the large number of GPS-enabled phones

Motivation

When people come to an unfamiliar city:
• What are the top interesting locations in this city?
• How should I travel among these places (travel sequences)?
• A map does not make much sense to a newcomer

Strategy

Mining interesting locations and travel sequences from multiple users’ location histories

http://geolife


Difficulty

• What is a location? (geographical scales)
• The interest level of a location depends not only on the number of users visiting it but also on these users’ travel experiences
• How to determine a user’s travel experience?
• Location interest and user travel experience
– are region-related
– are relative values (a ranking problem)

Solution – 1. Modeling Human Location History

• GPS logs P and GPS trajectory
• Stay points S = {s1, s2, …, sn}
– Each stands for a geo-region where a user has stayed for a while
– Each carries a semantic meaning beyond a raw GPS point
• Location history: represented by a sequence of stay points with transition intervals
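The modeling step above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the slides do not state the detection rule or its thresholds, so the anchor-based windowing, the 200 m distance threshold, the 20 minute time threshold, and the names `GpsPoint`, `StayPoint`, `detect_stay_points` and `location_history` are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class GpsPoint:
    lat: float  # latitude in degrees
    lng: float  # longitude in degrees
    t: float    # timestamp in seconds


@dataclass
class StayPoint:
    lat: float     # mean latitude of the grouped points
    lng: float     # mean longitude of the grouped points
    arrive: float  # timestamp of the first grouped point
    leave: float   # timestamp of the last grouped point


def haversine_m(a, b):
    """Great-circle distance between two GPS points, in meters."""
    r = 6371000.0  # mean Earth radius in meters
    dlat = radians(b.lat - a.lat)
    dlng = radians(b.lng - a.lng)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlng / 2) ** 2
    return 2 * r * asin(sqrt(h))


def detect_stay_points(points, dist_m=200.0, time_s=1200.0):
    """Group consecutive points that remain within dist_m of an anchor point
    for at least time_s seconds. Both thresholds are assumed values."""
    stays = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        # Extend the window while points stay close to the anchor points[i].
        while j < n and haversine_m(points[i], points[j]) <= dist_m:
            j += 1
        group = points[i:j]
        if group[-1].t - group[0].t >= time_s:
            stays.append(StayPoint(
                lat=sum(p.lat for p in group) / len(group),
                lng=sum(p.lng for p in group) / len(group),
                arrive=group[0].t,
                leave=group[-1].t,
            ))
            i = j   # continue after the detected stay
        else:
            i += 1  # move the anchor forward by one point
    return stays


def location_history(stays):
    """Pair each stay point with the transition interval to the next one
    (None for the last stay point)."""
    history = []
    for k, s in enumerate(stays):
        dt = stays[k + 1].arrive - s.leave if k + 1 < len(stays) else None
        history.append((s, dt))
    return history
```

With a trajectory sorted by time, `location_history(detect_stay_points(points))` yields the sequence of stay points and transition intervals described on this slide.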

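Figure: a GPS trajectory of points p1, p2, …, p7, part of which forms a stay point S

GPS log format: Latitude, Longitude, Time

p1: Lat1, Lngt1, T1
p2: Lat2, Lngt2, T2
…
pn: Latn, Lngtn, Tn

$$LocH = \left( s_1 \xrightarrow{\Delta t_1} s_2 \xrightarrow{\Delta t_2} \cdots \xrightarrow{\Delta t_{n-1}} s_n \right)$$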

Figure: the shared hierarchical framework

1. Stay point detection: stay points S are extracted from the GPS logs of users 1, …, n
2. Hierarchical clustering: stay points are grouped into clusters cij on levels l1 (c10), l2 (c20, c21) and l3 (c30, c31, c32, c33, c34), from high to low
3. Graph building: graphs G1, G2 and G3 are built over the clusters
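The graph-building step can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes every stay point has already been mapped to a cluster id on one level of the hierarchy, and the function name `build_location_graph` and the input layout are hypothetical.

```python
from collections import Counter, defaultdict


def build_location_graph(histories):
    """histories: {user_id: [cluster_id, ...]} in visiting order
    (an assumed input layout, one cluster id per stay point).

    Returns (edges, visits):
      edges[(a, b)][user] = times the user went directly from a to b
      visits[user][c]     = times the user visited cluster c
    """
    edges = defaultdict(Counter)
    visits = defaultdict(Counter)
    for user, seq in histories.items():
        for c in seq:
            visits[user][c] += 1
        for a, b in zip(seq, seq[1:]):
            if a != b:  # consecutive stays in one cluster are not a transition
                edges[(a, b)][user] += 1
    return edges, visits
```

The `visits` structure is the user-to-location link data that a HITS-style model needs, and `edges` records who took each location-to-location transition.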

Solution – 2. The HITS-Based Inference

• Mutual reinforcement relationship
– A user with rich travel knowledge is more likely to visit more interesting locations
– An interesting location would be accessed by many users with rich travel knowledge
• A HITS-based inference model
– Users are hub nodes
– Locations are authority nodes
– Topic is the geo-region
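The mutual reinforcement above is the standard hub/authority iteration of HITS. The sketch below shows that iteration for a single region, assuming a hypothetical `visits` mapping from users to per-location visit counts; it omits the region-specific, hierarchical handling of the full model, and the iteration count is an arbitrary choice.

```python
from math import sqrt


def hits(visits, iterations=50):
    """visits: {user: {location: visit_count}} (assumed input layout).

    Returns (hub, authority): hub[user] stands for travel experience,
    authority[location] for location interest.
    """
    users = list(visits)
    locations = sorted({loc for v in visits.values() for loc in v})
    hub = {u: 1.0 for u in users}
    auth = {loc: 1.0 for loc in locations}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the users visiting a location.
        auth = {loc: sum(visits[u].get(loc, 0) * hub[u] for u in users)
                for loc in locations}
        # Hub update: sum the authority scores of the locations a user visited.
        hub = {u: sum(c * auth[loc] for loc, c in visits[u].items())
               for u in users}
        # Normalize both score vectors to unit Euclidean length.
        a_norm = sqrt(sum(x * x for x in auth.values())) or 1.0
        h_norm = sqrt(sum(x * x for x in hub.values())) or 1.0
        auth = {k: x / a_norm for k, x in auth.items()}
        hub = {k: x / h_norm for k, x in hub.items()}
    return hub, auth
```

Ranking locations by `authority` and users by `hub` then gives candidate interesting locations and experienced users for the region.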

Figure: the HITS-based inference model (users as hub nodes, locations as authority nodes)

Figure: selecting, from the ascendant and descendant clusters {C} of the hierarchy (c11; c21, c22; c31, …, c35), the cluster that covers a region specified by a user

A) A region covering locations from a single parent cluster

B) A region covering locations from multiple parent clusters

Solution – 3. Detecting Classical Travel Sequence

Three factors determine the classical score of a sequence:
• Travel experiences (hub scores) of the users taking the sequence
• The location interests (authority scores) of the locations in the sequence
• The probability that people would take a specific sequence, which weights the authority scores

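The classical score of sequence A → C:

$$S_{AC} = \sum_{u_k \in U_{AC}} \left( a_A \cdot Out_{AC} + a_C \cdot In_{AC} + h_k \right)$$

$a_A$: Authority score of location A

$a_C$: Authority score of location C

$h_k$: User k’s hub score

Figure: an example graph over locations A, B, C, D and E with numeric edge labels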
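The score for a single sequence A → C can be computed as in the sketch below. The slides do not define Out_AC and In_AC, so this example assumes Out_AC is the share of all departures from A that go to C and In_AC is the share of all arrivals at C that come from A; the function name and the `edges` layout are hypothetical as well.

```python
def classical_score(a, c, edges, authority, hub):
    """Classical score of the sequence a -> c.

    edges: {(src, dst): {user: times_taken}} (assumed layout)
    authority: {location: authority score}
    hub: {user: hub score}
    """
    takers = edges.get((a, c), {})
    n_ac = sum(takers.values())
    if n_ac == 0:
        return 0.0  # nobody took this sequence
    # Total transitions leaving a and total transitions entering c.
    out_a = sum(sum(u.values()) for (s, _), u in edges.items() if s == a)
    in_c = sum(sum(u.values()) for (_, d), u in edges.items() if d == c)
    out_ac = n_ac / out_a  # assumed definition: share of departures from a going to c
    in_ac = n_ac / in_c    # assumed definition: share of arrivals at c coming from a
    # Sum over the distinct users who took the sequence (the set U_AC).
    return sum(authority[a] * out_ac + authority[c] * in_ac + hub[u]
               for u in takers)
```

Scoring every edge this way and sorting by the result gives a ranked list of two-location sequences; longer sequences would need a rule for combining edge scores, which this sketch does not cover.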

Experiments

• Settings
• Evaluation Approach
• Results

GPS Devices and Users

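• 60 devices and 138 users
• From May 2007 ~ present

Figure: pie chart of users by age
Legend: age<=22; 22<age<=25; 26<=age<29; age>=30
Slices: 16%, 45%, 30%, 9%

Figure: pie chart of users by occupation
Legend: Microsoft employees; employees of other companies; government staff; college students
Slices: 18%, 14%, 10%, 58%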

• A large-scale GPS dataset (by Feb. 18, 2009)
– 10+ million GPS points
– 260+ million kilometers
– 36 cities in China and a few cities in the USA, Korea and Japan

Evaluation Approach

• 29 subjects
– 14 females and 15 males
– have been in Beijing for more than 6 years
• The test region
– specified by the fourth ring road of Beijing
• Evaluated objects
– the top 10 interesting locations
– the top 5 classical travel sequences

• Presentation
– The ability of the retrieved locations to present a given region
– Investigated in three aspects:
  • Representative rating (0-10)
  • Comprehensive rating (1-5)
  • Novelty rating (0-10)
• Rank
– The ranking performance of the retrieved locations based on inferred interests

Figure: evaluation framework
Input: a geospatial region
Retrieved objects: top 10 interesting locations (C1, C2, …, C10); top 5 classical travel sequences (Sq1, Sq2, …, Sq5)
Presentation: representative rating (0~10), comprehensive rating (1~5), novelty rating (0~10)
Rank: user desirability rating on each location (-1, 0, 1, 2), scored with nDCG and MAP; user desirability rating on each sequence (-1, 0, 1, 2)
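nDCG and MAP can be computed from the desirability ratings roughly as sketched below. The slides do not say how ratings are turned into gains or which ratings count as relevant, so this example assumes the raw rating is the gain and that ratings of 1 or higher are relevant; all function names and the `relevant_from` parameter are hypothetical.

```python
from math import log2


def dcg(gains):
    """Discounted cumulative gain of gains listed in ranked order."""
    return sum(g / log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ratings, k):
    """ratings: one subject's ratings, in the order the method ranked the items.
    The raw rating is used as the gain (an assumption)."""
    ideal = dcg(sorted(ratings, reverse=True)[:k])
    return dcg(ratings[:k]) / ideal if ideal > 0 else 0.0


def average_precision(ratings, relevant_from=1):
    """Items rated >= relevant_from count as relevant (an assumed cutoff)."""
    found, total = 0, 0.0
    for rank, r in enumerate(ratings, start=1):
        if r >= relevant_from:
            found += 1
            total += found / rank  # precision at this relevant item
    return total / found if found else 0.0


def mean_average_precision(ratings_per_subject):
    """Average the per-subject average precision values."""
    aps = [average_precision(r) for r in ratings_per_subject]
    return sum(aps) / len(aps)
```

A ranking whose ratings are already in descending order gets an nDCG of 1.0; with negative ratings in the list the raw-gain DCG can go below zero, so a different gain mapping may be preferable in practice.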

User desirability ratings on locations:

Rating   Explanation
 2       I’d like to plan a trip to that location.
 1       I’d like to visit that location if passing by.
 0       I have no feeling about this location, but don’t object to others visiting it.
-1       This location does not deserve a visit.

User desirability ratings on travel sequences:

Rating   Explanation
 2       I’d like to plan a trip with this travel sequence.
 1       I’d like to take that sequence if visiting the region.
 0       I have no feeling about this sequence, but don’t object to others choosing it.
-1       This sequence is not a good choice.

Results on Evaluating Interesting Locations

Figure: results of A) our method, B) rank-by-count, C) rank-by-frequency

           Ours    Rank-by-count   Rank-by-frequency
nDCG@5     0.823   0.714           0.598
nDCG@10    0.943   0.848           0.859
MAP        0.759   0.532           0.365

Ranking ability of different methods

                 Ours   Rank-by-count   Rank-by-frequency
Representative   5.4    4.5             3.1
Comprehensive    4      3.4             2.3
Novelty          3.4    2.4             2.2

Comparison of the presentation ability of different methods

Results on Evaluating Travel Sequences

                 Ours (Interest + Experience)   Rank-by-counts   Rank-by-interest   Rank-by-experience
Mean score       1.6                            1.2              1.4                1.5
Classical rate   0.6                            0.3              0.4                0.4

Example sequences retrieved by each method:

Rank-by-counts: a railway station → an ordinary hotel near the station

Rank-by-experience: an ordinary café near an experienced user’s home → a normal store close to her home

Rank-by-interest: Tiananmen Square → The Summer Palace

Our method: The Bird’s Nest → Houhai Bar Street

Investigating our method

Figure: the top 10 interesting locations (numbered 1 to 10) under different settings

A) Inferring the top 10 interesting locations without using hierarchy

B) Ranking the locations using the authority scores of the region

C) Ranking locations using their authority scores for the whole of Beijing

Figure: A) Our method using hierarchy B) Our method without using hierarchy

• Why hierarchy
– Provides users with a comprehensive view of a large region (a city)
– Helps users understand the region step-by-step (level-by-level)
– The hierarchy can be used to specify users’ travel experiences in different regions

Conclusion

• Enable generic travel recommendation
– Top interesting locations, travel experts and classical travel sequences
• Regarding mining interesting locations
– Our method outperformed rank-by-count and rank-by-frequency
– User experience is critical
– The hierarchy of the geo-spaces is important
• Classical travel sequences
– Combining location interest and user travel experience works better than using either alone

Thanks!

