multiple location profiling for users and relationships from social network and content rui li,...

Post on 14-Jan-2016


Multiple Location Profiling for Users and Relationships

from Social Network and Content

Rui Li, Shengjie Wang, Kevin Chen-Chuan Chang

University of Illinois at Urbana-Champaign

2

Users’ Locations are important for many information services

and many others.

Lives in: Los Angeles

Carol

User

Social Network

Content Provider

Local Content Recommendation

Local Friends Recommendation

3

The community has explored social networks and content to profile users’ locations.

Profiling a User’s Home Location

Location: Los Angeles

Tweets

Terrible LA traffic!

Want to go to Honolulu for Spring vacation!

See Gaga in Hollywood.

Good Morning!

Mike

LA

Carol

?

Lucy

Austin

Gaga

NY

Bob

San Diego

Jean

?

Social Network

4

Problem 1 They only profile a single home location.

Locations of a user’s friends

Locational Word Frequencies

Paramount 1

Los Angeles 1

Hollywood 2

Austin 2

Tweeted Locational Words

Carol lives in Los Angeles and studied at the Univ. of Texas at Austin

o incomplete

o inaccurate

5

Problem 2 They totally miss profiling relationships.

Relationships Profiling

Carol follows Bob

Carol follows Lucy

Carol tweets Hollywood

both Carol and Lucy studied in Austin

Carol lives in Los Angeles

both Carol and Bob work in Los Angeles

o useful !

6

We focus on multiple location profiling for users and relationships.

Carol in Real-world

Location: Los Angeles

Education: Univ. of Texas at Austin

Terrible LA traffic!

Want to go to Honolulu for Spring vacation!

See Gaga in Hollywood.

Good Morning!

Mike

LA

Carol

?

Lucy

Austin

Gaga

NY

Bob

San Diego

Jean

?

Carol’s Location Profile: Los Angeles, Austin

Carol follows Lucy: Austin, Austin

 

  

7

Our approach is to build a model to connect known relationships with unknown locations.

Known Relationships

Following Relationships

Carol follows Lucy

Carol follows Mike

….

Tweeting Relationships

Carol tweets Hollywood

Carol tweets Honolulu

….

Users’ Locations

?

Unknown Locations

MLP Model

Generation Model

Inference Algorithm

8

Challenge 1 How to connect users’ locations with relationships?

A. from users’ locations to following relationships

B. from users’ locations to tweeting relationships

Challenge 2 How to model that the relationships are mixed?

A. some relationships are not based on locations.

B. each relationship is based on a different location.

Challenge 3 How to utilize home locations from labeled users?

There are three challenges for building MLP.

9

Challenge 1.A We need to connect following relationships with two users’ locations.

Even a user who has only one location follows others from different locations.

Following Probability

Carol at Los Angeles follows Bob in San Diego. 20%

Carol at Los Angeles follows Mike in Los Angeles. 30%

We define the following probability as the probability of generating a following relationship from one user to another based on their locations.

10

Observation We explore following probability via investigating a corpus

• It captures our intuition well.

• It fits a power law distribution.
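
A minimal sketch of how a power-law fit like the one observed above could be checked, assuming the following probability p is tabulated against some measure d of how far apart two users are (the slides do not say how the fit was performed). It uses ordinary least squares on log p = log(beta) + alpha * log(d); the names `alpha`, `beta`, and `fit_power_law` and the data points are hypothetical.

```python
import math

def fit_power_law(distances, probabilities):
    """Least-squares fit of log p = log(beta) + alpha * log(d).

    Returns (alpha, beta) for the curve p = beta * d ** alpha.
    """
    xs = [math.log(d) for d in distances]
    ys = [math.log(p) for p in probabilities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    beta = math.exp(mean_y - alpha * mean_x)
    return alpha, beta

# Synthetic points (not from the Twitter corpus) lying exactly on a power law.
distances = [1.0, 10.0, 100.0, 1000.0]
probabilities = [0.01 * d ** -0.5 for d in distances]
alpha, beta = fit_power_law(distances, probabilities)
```

On real data the points would scatter around the fitted line, and a straight line on a log-log plot is only suggestive evidence of a power law.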

11

Solution: We derive a location-based following model for the following probability.

The location-based following model
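
One plausible form of such a model, sketched under the assumption that the following probability decays as a power of the geographic distance between the two users' locations. The parameters `alpha` and `beta`, the `min_distance` floor, and the coordinates are illustrative values, not taken from the slides.

```python
import math

EARTH_RADIUS_MILES = 3958.8

def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

def following_probability(loc_a, loc_b, alpha=-0.5, beta=0.01, min_distance=1.0):
    """Power-law following probability: beta * d ** alpha.

    Distances below min_distance are clamped so that two users in the
    same city do not produce a division by zero (an assumption here).
    """
    d = max(haversine_miles(loc_a, loc_b), min_distance)
    return beta * d ** alpha

# Approximate city coordinates, for illustration only.
LOS_ANGELES = (34.05, -118.24)
SAN_DIEGO = (32.72, -117.16)
AUSTIN = (30.27, -97.74)

p_same_city = following_probability(LOS_ANGELES, LOS_ANGELES)
p_nearby = following_probability(LOS_ANGELES, SAN_DIEGO)
p_far = following_probability(LOS_ANGELES, AUSTIN)
```

With a negative `alpha`, nearer pairs get a higher probability than distant ones, which matches the intuition that Carol in Los Angeles is more likely to follow Mike in Los Angeles than Bob in San Diego.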

12

Challenge 1.B We need to connect tweeting relationships with a user’s location.

A user at one location tweets about different locations.

We define the tweeting probability as the probability of generating a tweeting relationship from a user to a venue based on a location

Probability of Tweeting

Carol at Los Angeles tweets about watching a show in Hollywood. 30%

Carol at Los Angeles tweets about traffic in Los Angeles. 40%

13

• They capture our intuition well.

• They can be modeled as a set of multinomial distributions.

Observation We explore tweeting probability via investigating a corpus.
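
The multinomial view above can be sketched as one distribution over venues per location, estimated from (location, tweeted venue) pairs. This is a minimal illustration, assuming additive smoothing; the function name, the `smoothing` parameter, and the toy observations are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_tweeting_model(observations, vocabulary, smoothing=1.0):
    """Estimate one multinomial over venues for each user location.

    observations: iterable of (user_location, tweeted_venue) pairs.
    vocabulary:   list of all venue names.
    Returns {location: {venue: probability}}.
    """
    counts = defaultdict(Counter)
    for location, venue in observations:
        counts[location][venue] += 1

    model = {}
    for location, venue_counts in counts.items():
        total = sum(venue_counts.values()) + smoothing * len(vocabulary)
        model[location] = {
            venue: (venue_counts[venue] + smoothing) / total
            for venue in vocabulary
        }
    return model

# Toy data, not from the corpus.
vocabulary = ["Los Angeles", "Hollywood", "Honolulu", "Austin"]
observations = [
    ("Los Angeles", "Los Angeles"),
    ("Los Angeles", "Los Angeles"),
    ("Los Angeles", "Hollywood"),
    ("Los Angeles", "Honolulu"),
    ("Austin", "Austin"),
]
model = estimate_tweeting_model(observations, vocabulary)
```

Each inner dictionary sums to 1, so a user at Los Angeles has some probability of tweeting every venue, with local venues weighted most heavily.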

14

Solution: We derive a location-based tweeting model for the tweeting probability.

The location-based tweeting model

[Figure: the location-based tweeting model; labels A, L, K]

15

Noisy relationships are not useful!

Noisy Relationships

Carol follows Lady Gaga

Carol tweets Honolulu

Location-based Relationships

Carol follows Lucy

Carol tweets Los Angeles

Challenge 2.A There are both noisy and location-based relationships.

16

Solution: We propose a mixture component for two types of relationships.

1. A relationship is generated based on either a location-based model or a random model.

2. A binary model selector μ indicates which model is used.

3. The selector is generated via a binomial distribution.
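
The three steps above can be sketched for a tweeting relationship as follows. This is an illustration under assumptions: the slides do not state which value of μ selects which model, so here μ = 1 is taken to mean the random (noisy) model, and `noise_rate` is a hypothetical name for the selector's success probability. A single binary draw is a binomial trial with n = 1.

```python
import random

def generate_tweeted_venue(location_model, random_model, noise_rate, rng):
    """Generate one tweeted venue from a two-component mixture.

    location_model: {venue: prob} given the user's location.
    random_model:   {venue: prob} independent of location.
    noise_rate:     probability that the selector picks the random model.
    Returns (venue, mu) where mu = 1 means the random model was used.
    """
    mu = 1 if rng.random() < noise_rate else 0
    distribution = random_model if mu == 1 else location_model
    venues = list(distribution)
    weights = [distribution[v] for v in venues]
    venue = rng.choices(venues, weights=weights, k=1)[0]
    return venue, mu

# Toy distributions, for illustration only.
location_model = {"Los Angeles": 0.6, "Hollywood": 0.4}
random_model = {"Honolulu": 0.5, "New York": 0.5}
venue, mu = generate_tweeted_venue(location_model, random_model, 0.3, random.Random(0))
```

During inference the selector is unobserved, so a relationship such as "Carol tweets Honolulu" can be explained by the random model instead of distorting Carol's location profile.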

17

Challenge 2.B Location-based relationships are related to multiple locations.

Location-based relationships

Carol follows Lucy

Carol tweets Hollywood

Accurate!

Complete!

both Carol and Lucy studied at Austin

Carol lives in Los Angeles

18

Solution: We fundamentally model users’ multiple locations in generating relationships.

Carol

{Los Angeles 0.1, Austin 0.1, … }

Location profile as a multinomial distribution over locations.

Each relationship is based on one particular location from the user’s profile.
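
A minimal sketch of this two-stage generation for tweeting relationships: first draw a location assignment from the user's profile, then draw a venue from that location's tweeting distribution. The profile weights and the per-location distributions below are hypothetical toy values.

```python
import random

def generate_tweeting_relationships(profile, tweeting_model, count, rng):
    """Generate (assigned_location, venue) pairs for one user.

    profile:        {location: prob}, the user's location profile.
    tweeting_model: {location: {venue: prob}}.
    """
    locations = list(profile)
    weights = [profile[loc] for loc in locations]
    relationships = []
    for _ in range(count):
        # Each relationship gets its own location assignment.
        assigned = rng.choices(locations, weights=weights, k=1)[0]
        venues = list(tweeting_model[assigned])
        venue_weights = [tweeting_model[assigned][v] for v in venues]
        venue = rng.choices(venues, weights=venue_weights, k=1)[0]
        relationships.append((assigned, venue))
    return relationships

carol_profile = {"Los Angeles": 0.6, "Austin": 0.4}
tweeting_model = {
    "Los Angeles": {"Los Angeles": 0.5, "Hollywood": 0.4, "Austin": 0.1},
    "Austin": {"Austin": 0.8, "Los Angeles": 0.2},
}
sample = generate_tweeting_relationships(
    carol_profile, tweeting_model, 5, random.Random(0)
)
```

Because the assignment is drawn per relationship, one user can have some relationships explained by Los Angeles and others by Austin, which is what allows both locations to appear in the profile.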

19

Challenge 3 We should utilize observed locations from some users’ profiles.

Mike

LA

Carol

?

Lucy

Austin

Gaga

NY

Bob

San Diego

Jean

?

They are useful for profiling locations!

We cannot use them directly to generate relationships!

20% of users provide their home locations in their profiles.

Solution: We utilize observed locations as priors to generate users’ profiles.

Bob

{San Diego 0.9, Los Angeles 0.05, …}

We assume users’ profiles are generated from prior distributions.

Observed home locations are more likely to be generated in these profiles.
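
One way to realize such a prior, sketched under the assumption that each profile is drawn from a Dirichlet distribution whose pseudo-count is boosted for the observed home location. The slides do not name the prior family, so the Dirichlet choice and the `base` and `boost` values are assumptions for illustration.

```python
import random

def build_prior(candidate_locations, home_location=None, base=0.5, boost=5.0):
    """Pseudo-counts per location; the observed home location gets extra mass."""
    return {
        loc: base + (boost if loc == home_location else 0.0)
        for loc in candidate_locations
    }

def sample_profile(prior, rng):
    """Draw a location profile from Dirichlet(prior) via normalized gamma draws."""
    draws = {loc: rng.gammavariate(a, 1.0) for loc, a in prior.items()}
    total = sum(draws.values())
    return {loc: g / total for loc, g in draws.items()}

locations = ["San Diego", "Los Angeles", "Austin"]
bob_prior = build_prior(locations, home_location="San Diego")  # labeled user
carol_prior = build_prior(locations)                           # unlabeled user
bob_profile = sample_profile(bob_prior, random.Random(0))
```

A labeled user like Bob tends to get a profile concentrated on San Diego while still leaving room for other locations, and an unlabeled user like Carol starts from a symmetric prior.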

21

Therefore, we arrive at a complete model.

22

We crawled a subset of Twitter. There are 139K users, 50 million tweets and 2 million following relationships.

We evaluate our model on a large Twitter corpus.

23

Task 1 In profiling users’ home locations, MLP performs accurately and improves on the baselines.

24

Task 2 In profiling users’ multiple locations, MLP performs accurately and completely.

Precision and Recall at Rank 2

Case Studies

Locations in a similar region

Locations in different areas

Accurately

Completely

25

Task 3 In profiling following relationships, MLP achieves 57% accuracy.

26

Thanks and Questions !

27

Backup for Questions

28

Experiments 1

• We use the home location provided in users’ profiles as ground truth.

• We compare two baseline methods proposed in literature.

29

Experiments 2

• We manually labeled multiple locations of 1000 users, and obtained 585 users who clearly have multiple locations.

• We compare the same baseline methods as in the previous task.

• We measure the performance in terms of “precision” and “recall”.
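
A minimal sketch of precision and recall at rank K for one user, assuming the usual set-based definitions over the top-K predicted locations. The slides put both terms in quotation marks, so the exact definitions used in the experiments may differ. The example prediction list is hypothetical.

```python
def precision_recall_at_k(ranked_predictions, true_locations, k=2):
    """Precision and recall of the top-k predicted locations for one user.

    ranked_predictions: locations ordered from most to least likely.
    true_locations:     set of manually labeled locations.
    """
    top_k = ranked_predictions[:k]
    hits = sum(1 for loc in top_k if loc in true_locations)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(true_locations) if true_locations else 0.0
    return precision, recall

# Hypothetical ranking for Carol, whose labeled locations are
# Los Angeles and Austin.
precision, recall = precision_recall_at_k(
    ["Los Angeles", "San Diego", "Austin"],
    {"Los Angeles", "Austin"},
    k=2,
)
```

Here only one of the top two predictions is correct, so both precision and recall at rank 2 are 0.5; corpus-level figures would average these values over all evaluated users.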

30

Experiments 3

• We manually labeled location assignments of 585 users, whose multiple locations are known to us, and obtained 4426 relationships.

• We design a meaningful baseline method, which profiles a relationship based on the users’ home locations.

31

MLP defines the joint probability of observations, parameters, and latent variables.

We infer users’ locations and location assignments with the observed relationships and the given parameters.

We develop our algorithm based on the Gibbs sampling method.

We infer users’ locations and location assignments for relationships as latent variables in the joint probability.
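
To illustrate the flavor of a Gibbs update for one latent location assignment, here is a simplified sketch restricted to a single tweeting relationship. It assumes a collapsed-style conditional proportional to (count of the user's other relationships assigned to a location + prior pseudo-count) times the probability of the tweeted venue under that location. It ignores the noise selector and following relationships, so it is not the full sampler; all names and numbers are hypothetical.

```python
import random

def assignment_posterior(venue, other_counts, prior, tweeting_model):
    """Conditional distribution over location assignments for one tweet.

    other_counts:   {location: count} over the user's other relationships.
    prior:          {location: pseudo-count}.
    tweeting_model: {location: {venue: prob}}.
    """
    weights = {
        loc: (other_counts.get(loc, 0) + prior[loc])
        * tweeting_model[loc].get(venue, 0.0)
        for loc in prior
    }
    total = sum(weights.values())
    if total == 0.0:
        # Venue impossible everywhere: fall back to a uniform distribution.
        return {loc: 1.0 / len(prior) for loc in prior}
    return {loc: w / total for loc, w in weights.items()}

def resample_assignment(venue, other_counts, prior, tweeting_model, rng):
    """Draw a new assignment from the conditional distribution."""
    posterior = assignment_posterior(venue, other_counts, prior, tweeting_model)
    locations = list(posterior)
    return rng.choices(locations, weights=[posterior[l] for l in locations], k=1)[0]

prior = {"Los Angeles": 0.5, "Austin": 0.5}
tweeting_model = {
    "Los Angeles": {"Hollywood": 0.3, "Austin": 0.01},
    "Austin": {"Hollywood": 0.01, "Austin": 0.5},
}
other_counts = {"Los Angeles": 3, "Austin": 1}
new_assignment = resample_assignment(
    "Hollywood", other_counts, prior, tweeting_model, random.Random(0)
)
```

Repeating such updates over all relationships and then normalizing each user's assignment counts would give an estimate of the location profile.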
