Linking Organizational Social Networking Profiles
PROJECT ID: H0791030
JEROME CHENG ZHI KAI (A0080860H)

Example: Holiday Inn (Twitter and Facebook profiles)

Motivation: Individuals

• Want to find an organization’s profiles, but no single place lists them

• Sometimes listed on company websites, but:
  • No standardized location
  • Not all companies bother

Motivation: Organizations

• Track competitors’ use of social media

• Find imposter profiles

Problem Definition

• Input: Organization Name

• System retrieves Social Profiles and labels each as Official, Affiliate, or Unrelated

Related Work

• Prior work focuses on deduplicating the profiles of individuals

• Relevant here: the profile characteristics that prior work focuses on

Related Work: Usernames

• Connecting Corresponding Identities across Communities (Zafarani & Liu, 2009)

• Connecting users across social media sites: a behavioral-modeling approach (Zafarani & Liu, 2013)

• Studying User Footprints in Different Online Social Networks (Malhotra et al., 2012)

Related Work: Created Content

• Identifying Users Across Social Tagging Systems (Iofciu, Fankhauser, Abel & Bischoff, 2011)

Methodology: System Design

1. Input: organization’s name (query)

2. Search Facebook/Twitter APIs, retrieve profiles

3. Convert profiles into feature vectors

4. Classify each profile vector as official, affiliate, or unrelated
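The four steps above can be sketched as a small pipeline. This is a minimal illustration only: `Profile`, `link_profiles`, and the three stand-in stage functions are hypothetical names, the search step is an in-memory stub rather than a real Facebook/Twitter API call, and the profile data is made up.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Profile:
    network: str       # e.g. "twitter" or "facebook"
    username: str
    display_name: str
    description: str


def link_profiles(
    query: str,
    search: Callable[[str], List[Profile]],
    featurize: Callable[[str, Profile], Sequence[float]],
    classify: Callable[[Sequence[float]], str],
) -> List[Tuple[Profile, str]]:
    """Steps 1-4: take a query, retrieve profiles, vectorize, classify."""
    results = []
    for profile in search(query):            # step 2: retrieve candidates
        vector = featurize(query, profile)   # step 3: profile -> feature vector
        results.append((profile, classify(vector)))  # step 4: assign a class
    return results


# Stand-in stages (assumptions, not the real system).
def fake_search(query: str) -> List[Profile]:
    return [
        Profile("twitter", "exampleorg", "Example Org", "Official account."),
        Profile("twitter", "random_fan", "Hotel Fan", "I like hotels."),
    ]


def toy_featurize(query: str, profile: Profile) -> Sequence[float]:
    return [1.0 if profile.display_name.lower() == query.lower() else 0.0]


def toy_classify(vector: Sequence[float]) -> str:
    return "official" if vector[0] == 1.0 else "unrelated"


labelled = link_profiles("Example Org", fake_search, toy_featurize, toy_classify)
for profile, label in labelled:
    print(profile.username, "->", label)
```

Passing the stages in as functions keeps the sketch independent of any particular API client or classifier.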

Classifier Choice

• Evaluated scikit-learn’s classifiers:
  • Decision Tree
  • Naïve Bayes
  • Support Vector Machine
  • Logistic Regression
  • Random Forest

• Features aren’t independent, so tree-based classifiers are well suited
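A comparison like the one above could be run roughly as follows. This is a sketch under assumptions: the data is synthetic (generated with correlated, redundant features and an imbalanced three-class split), `GaussianNB` and `SVC` are assumed variants of the Naïve Bayes and Support Vector classifiers, and macro-averaged F1 over 5 folds is an illustrative scoring choice, not the project's evaluation procedure.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for profile feature vectors: three imbalanced classes,
# with redundant features so that the features are not independent.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=4, n_redundant=4,
    n_classes=3, weights=[0.1, 0.2, 0.7], random_state=0,
)

candidates = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "Support Vector": SVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean macro-F1 across 5 cross-validation folds for each candidate.
scores = {
    name: cross_val_score(clf, X, y, cv=5, scoring="f1_macro").mean()
    for name, clf in candidates.items()
}

for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} macro-F1 = {score:.3f}")
```

The ranking on synthetic data says nothing about the ranking on real profiles; the sketch only shows the mechanics of comparing the five classifiers on equal terms.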

Feature Breakdown: Name-based

• Normalized Edit Distance
  • Query to Username
  • Query to Display Name

• Edit Distance
  • Query to Username
  • Query to Display Name
  • Length of Query
  • Length of Username
  • Length of Display Name
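The name-based features listed above can be computed along these lines. The sketch assumes Levenshtein distance as the edit distance, normalization by the longer string's length, and case-insensitive comparison; none of these choices is stated on the slide, and `name_features` is a hypothetical helper.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def name_features(query: str, username: str, display_name: str) -> dict:
    # Lowercasing before comparison is an assumption.
    q, u, d = query.lower(), username.lower(), display_name.lower()
    return {
        "ned_query_username": normalized_edit_distance(q, u),
        "ned_query_display": normalized_edit_distance(q, d),
        "ed_query_username": edit_distance(q, u),
        "ed_query_display": edit_distance(q, d),
        "len_query": len(q),
        "len_username": len(u),
        "len_display": len(d),
    }


print(name_features("Holiday Inn", "holidayinn", "Holiday Inn"))
```

Keeping the raw lengths alongside the raw distances lets a classifier judge whether a distance of, say, 3 is large or small for the strings involved.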

Feature Breakdown: Name-based Quirks

• Need to handle abbreviations and stopwords
  • Citigroup versus Citi, General Motors versus GM

• Compute two edit distances: one on the original string, one on the processed string

• Use the better-scoring of the two
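One way to picture the "two distances, keep the better" idea is sketched below. The processing steps are assumptions: the stopword list is a hypothetical example, and the processed forms (stopwords removed, initials joined into an acronym) are only two plausible choices. This covers a case like General Motors versus GM, but not a truncation like Citigroup versus Citi, which would need additional handling.

```python
STOPWORDS = {"the", "inc", "corp", "co", "ltd"}  # hypothetical list


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a rolling dynamic-programming row."""
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        new_row = [i]
        for j, cb in enumerate(b, start=1):
            new_row.append(min(row[j] + 1, new_row[j - 1] + 1,
                               row[j - 1] + (ca != cb)))
        row = new_row
    return row[-1]


def normalized(a: str, b: str) -> float:
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def strip_stopwords(name: str) -> str:
    tokens = [t for t in name.lower().split() if t.strip(".,") not in STOPWORDS]
    return " ".join(tokens)


def acronym(name: str) -> str:
    return "".join(token[0] for token in name.lower().split())


def best_distance(query: str, candidate: str) -> float:
    """Smaller is better: keep the lower of the original and processed distances."""
    target = candidate.lower()
    original = normalized(query.lower(), target)
    processed = min(normalized(form, target)
                    for form in (strip_stopwords(query), acronym(query)))
    return min(original, processed)


print(best_distance("General Motors", "GM"))       # acronym form matches exactly
print(best_distance("The Example Corp", "example"))  # stopword-stripped form matches
```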

Feature Breakdown: Description

• Occurrences of Query

• Cosine Similarity
  • Query and Description
  • DuckDuckGo Description and Profile Description
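The description features above might be computed as in this sketch. It assumes a simple bag-of-words representation with raw term counts and a lowercase alphanumeric tokenizer; the slide does not say which weighting (for example TF-IDF) was used, and the example strings are made up.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine of the angle between two term-count vectors."""
    va, vb = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(va[term] * vb[term] for term in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def count_occurrences(query: str, description: str) -> int:
    """Case-insensitive count of the query string inside the description."""
    if not query:
        return 0
    return description.lower().count(query.lower())


profile_description = "Acme hotels: news and offers from Acme worldwide."
reference_description = "Acme is a worldwide chain of hotels."

print(count_occurrences("Acme", profile_description))
print(cosine_similarity("Acme", profile_description))
print(cosine_similarity(reference_description, profile_description))
```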

Feature Breakdown: Language Models

• Construct a Bigram Language Model for each of:
  • Official profile descriptions
  • Affiliate profile descriptions
  • Unrelated profile descriptions

• Feature: probability that the candidate description belongs to each model
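A per-class bigram model of this kind can be sketched as follows. Add-one (Laplace) smoothing, whitespace tokenization, and sentence boundary markers are assumptions made for the example, and the training descriptions are toy data; `BigramModel` is a hypothetical class name.

```python
import math
from collections import Counter


class BigramModel:
    """Bigram language model with add-one smoothing (an assumed choice)."""

    def __init__(self, descriptions):
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        vocab = set()
        for text in descriptions:
            tokens = ["<s>"] + text.lower().split() + ["</s>"]
            vocab.update(tokens)
            for first, second in zip(tokens, tokens[1:]):
                self.context_counts[first] += 1
                self.bigram_counts[(first, second)] += 1
        self.vocab_size = len(vocab) + 1  # +1 reserves mass for unseen words

    def log_prob(self, text: str) -> float:
        """Log-probability of the text under this model."""
        tokens = ["<s>"] + text.lower().split() + ["</s>"]
        total = 0.0
        for first, second in zip(tokens, tokens[1:]):
            numerator = self.bigram_counts[(first, second)] + 1
            denominator = self.context_counts[first] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


# Toy training descriptions, one model per class.
models = {
    "official": BigramModel(["the official account of acme",
                             "official page of example corp"]),
    "affiliate": BigramModel(["acme singapore branch news",
                              "example corp careers team"]),
    "unrelated": BigramModel(["i love cats and coffee",
                              "fan of football"]),
}

candidate = "official account of widgets"
scores = {label: model.log_prob(candidate) for label, model in models.items()}
print(scores)
```

The three log-probabilities can be fed to the classifier as three separate features, one per class model.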

Evaluation: Ground Truth Creation

1. Retrieved organizations from Freebase

2. Searched for profiles on Twitter/Facebook

3. Manually labelled as official/affiliate/unrelated

Evaluation: Ground Truth Breakdown

Class      Twitter       Facebook
Official   232 (7%)      146 (4%)
Affiliate  675 (20%)     491 (14%)
Unrelated  2474 (73%)    2776 (81%)
Total      3381 labels   3413 labels

Evaluation: Process

• Mainly concerned with official and affiliate classes
  • Not interested in unrelated class

• Modified 10-fold Cross Validation

Evaluation: Modified Cross Validation

1. Generate folds as per normal

2. Train classifier on training set as per normal

3. For each affiliate/official profile in test set:
   1. Input organization’s name to system
   2. Count number of correct results

4. Calculate precision/recall/F1 from counts
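Step 4 can be illustrated with a short sketch. The interpretation of the counts is an assumption: for one class, `correct` is the number of returned profiles with the right label, `returned` is the number of profiles the system assigned to that class, and `expected` is the number of profiles of that class in the ground truth. Counts are summed over all queries before the metrics are computed.

```python
def precision_recall_f1(correct: int, returned: int, expected: int):
    """Precision, recall, and F1 for one class from aggregate counts."""
    precision = correct / returned if returned else 0.0
    recall = correct / expected if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def aggregate(per_query_counts):
    """Sum (correct, returned, expected) tuples across queries."""
    correct = sum(c for c, _, _ in per_query_counts)
    returned = sum(r for _, r, _ in per_query_counts)
    expected = sum(e for _, _, e in per_query_counts)
    return correct, returned, expected


# Made-up counts for two queries: (correct, returned, expected).
per_query = [(2, 3, 4), (1, 1, 2)]
precision, recall, f1 = precision_recall_f1(*aggregate(per_query))
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

F1 is the harmonic mean of precision and recall, so it always lies between the two.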

Evaluation: Baseline

• Uses only Normalized Edit Distance between Username/Display Name and Query

• Emulates searching the networks manually without examining each profile in detail

Results & Discussion: Twitter

Official (bar chart: Baseline vs. Final)

          F1     Precision  Recall
Baseline  0.559  0.716      0.458
Final     0.862  0.947      0.791

Affiliate (bar chart: Baseline vs. Final; axes F1, Precision, Recall)

Data labels as extracted: 0.713, 0.750, 0.559, 0.905, 0.884, 0.862

Results & Discussion: Facebook

Official (bar chart: Baseline vs. Final)

          F1     Precision  Recall
Baseline  0.750  0.792      0.711
Final     0.884  0.945      0.830

Affiliate (bar chart: Baseline vs. Final; axes F1, Precision, Recall)

Data labels as extracted: 0.559, 0.744, 0.480, 0.862, 0.816, 0.639

Discussion

• Baseline performs well for the official class on Facebook

• Username and display name alone are good indicators for this class
  • Other features still help, but not as much

Discussion: Facebook Characteristics

• Many profile types: people, pages, places, etc.

• Finding official pages is simplified

• But: finding affiliates requires more effort

Discussion: Facebook Characteristics

• Facebook doesn’t require a “username” to be specified for pages
  • It will just use an ID instead

• Auto-generated pages also have only IDs, and take their names from Wikipedia or other sources

Limitations

• Ground truth proportions: expand and/or balance

• Limited number of profiles retrieved for classification

Future Work

• Support additional networks

• Examine post content

• “Preferential” classification