Matching Profiles of Facebook and VK Users

The First International Conference on Social Network Analysis,
Higher School of Economics, Moscow, Russia

Alexander Panchenko (1, 2), Dmitry Babaev (1), Sergey Objedkov (3)

1 – Digital Society Laboratory, 2 – UCLouvain, 3 – HSE

November 21, 2014

Alexander Panchenko Matching Profiles of Facebook and VK Users

Outline

1 The Problem

2 The Data

3 The Method

1 The Problem

Users in Russia have 2.5 profiles on average

Facebook (FB) and VKontakte (VK)

Problem

Motivation

input: a user profile in one social network
output: the profile of the same person in another social network
immediate applications in marketing, search, security, etc.

Contribution

a user identity resolution approach
precision of 0.98 and recall of 0.54
the method is computationally efficient and easily parallelizable

Related Work

Several researchers have recently tried to tackle this problem:
Balduzzi et al. Abusing social networks for automated user profiling. Springer, 2010.
Bartunov et al. Joint link-attribute user identity resolution in online social networks. SNA-KDD Workshop at KDD, 2012.
P. Jain et al. I seek 'fb.me': identifying users across multiple online social networks. WWW, 2013.
Malhotra et al. Studying user footprints in different online social networks. IEEE Computer Society, 2012.
Sironi. Automatic alignment of user identities in heterogeneous social networks. 2012.
Veldman. Matching profiles from social network sites. 2009.

But: our experiment is the largest-scale one to date.

2 The Data

Types of data in Facebook (FB) and VKontakte (VK)

Profiles: a set of user attributes
categorical variables (region, city, profession, etc.)
integer variables (age, graduation year, etc.)
text variables (name, surname, etc.)

Network: a graph that relates users
friendship graph
followers graph
commenting graph, etc.

Texts:
posts
comments
group titles and descriptions

Multimedia content:
avatar
photos
videos
music

Dataset

                                  VK             Facebook
Number of users in our dataset    89,561,085     2,903,144
Number of users in Russia (1)     100,000,000    13,000,000
User overlap                      29%            88%

training set: 92,488 matched FB-VK profiles

(1) According to comScore and http://vk.com/about

How can training data be obtained?

Link to FB in the VK profile
Links to FB and VK in a third network, e.g. LJ or Foursquare
Linking by email
Linking by phone

. . . these sources are also valid for “cheap matching”!
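As an illustration of the “linking by email” option, here is a minimal Python sketch that turns two email lists into matched training pairs. It assumes each network's profiles are available as a hypothetical `{profile_id: email}` mapping; the normalization rule and the choice to drop emails shared by several profiles are our assumptions, not part of the described setup.

```python
from typing import Dict, List, Tuple


def normalize_email(email: str) -> str:
    # Trim whitespace and lowercase; no provider-specific rules (assumption).
    return email.strip().lower()


def unique_index(profiles: Dict[str, str]) -> Dict[str, str]:
    """Map normalized email -> profile id, dropping emails used by several profiles."""
    by_email: Dict[str, List[str]] = {}
    for pid, email in profiles.items():
        by_email.setdefault(normalize_email(email), []).append(pid)
    return {e: ids[0] for e, ids in by_email.items() if len(ids) == 1}


def link_by_email(vk: Dict[str, str], fb: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return sorted (vk_id, fb_id) pairs whose profiles share a unique email."""
    vk_idx, fb_idx = unique_index(vk), unique_index(fb)
    shared = vk_idx.keys() & fb_idx.keys()
    return sorted((vk_idx[e], fb_idx[e]) for e in shared)


# Toy data with hypothetical ids; "dup@x.ru" is ambiguous on the VK side.
vk = {"vk1": "Ivan@Mail.ru", "vk2": "olga@yandex.ru", "vk3": "dup@x.ru", "vk4": "dup@x.ru"}
fb = {"fb9": " ivan@mail.ru", "fb7": "petr@gmail.com", "fb5": "dup@x.ru"}
pairs = link_by_email(vk, fb)
```

Dropping ambiguous emails trades a little recall for cleaner training labels; the same join works for phone numbers with a different normalization function.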

Gathering VK and FB data

Big Data: VK data amounts to tens or even hundreds of TB.
Decide what you need (posts, profiles, etc.).
Download:

via the API
by scraping

Download limits and API restrictions are specific to each network.
Parallelization is very practical, especially horizontal parallelization:

Amazon EC2, distributed message queues
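A minimal sketch of the work-queue pattern behind horizontal parallelization, in Python. Here `queue.Queue` and threads stand in for a distributed message queue and separate machines (e.g. EC2 instances), and `fake_fetch` is a hypothetical placeholder for an API call or scraper; network-specific rate limits, retries, and error handling are deliberately left out.

```python
import queue
import threading
from typing import Callable, Dict, Iterable


def crawl(user_ids: Iterable[int], fetch: Callable[[int], dict],
          n_workers: int = 4) -> Dict[int, dict]:
    """Download profiles with n_workers threads that pull ids from a shared queue.

    queue.Queue stands in for a distributed message queue; in a horizontally
    scaled setup each worker would run on its own machine instead of a thread.
    """
    tasks: "queue.Queue[int]" = queue.Queue()
    for uid in user_ids:
        tasks.put(uid)

    results: Dict[int, dict] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                uid = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            profile = fetch(uid)  # hypothetical API call or scraper
            with lock:
                results[uid] = profile

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def fake_fetch(uid: int) -> dict:
    # Hypothetical stand-in for a real network request.
    return {"id": uid}


profiles = crawl(range(100), fake_fetch)
```

Because workers only share the task queue and the result store, adding capacity means starting more workers, which is what makes the horizontal variant attractive.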

Storing VK and FB data

Again, Big Data
NoSQL solutions are helpful
Raw data: Amazon S3
For analysis: HDFS
Efficient retrieval: Elasticsearch

3 The Method

Profile matching algorithm

1 Candidate generation. For each VK profile, we retrieve a set of FB profiles with similar first and last names.

2 Candidate ranking. The candidates are ranked by the similarity between their friends and the friends of the input VK profile.

3 Selection of the best candidate. The final step selects the best match from the list of candidates.
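A minimal Python skeleton of how the three steps fit together, assuming each step is passed in as a function (`generate`, `score`, `select` and `match_profile` are hypothetical names, not the authors' code). Since every VK profile is matched independently, a loop over profiles calling this function can be split across workers, which is one way to read the “easily parallelizable” claim.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

V = TypeVar("V")  # VK profile type
F = TypeVar("F")  # FB profile type


def match_profile(
    vk_profile: V,
    generate: Callable[[V], List[F]],
    score: Callable[[V, F], float],
    select: Callable[[List[Tuple[float, F]]], Optional[F]],
) -> Optional[F]:
    """Run the three steps for a single VK profile; returns None if no match."""
    # Step 1: candidate generation (e.g. FB profiles with similar names).
    candidates = generate(vk_profile)
    # Step 2: candidate ranking, highest score first.
    ranked = sorted(((score(vk_profile, c), c) for c in candidates),
                    key=lambda pair: pair[0], reverse=True)
    # Step 3: best candidate selection (may reject all candidates).
    return select(ranked)


# Toy usage with hypothetical stand-ins for the three steps.
best = match_profile(
    "vk_user",
    generate=lambda v: ["fb_a", "fb_bb", "fb_ccc"],
    score=lambda v, c: float(len(c)),
    select=lambda ranked: ranked[0][1] if ranked else None,
)
```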

Candidate generation

Retrieve FB users whose names are similar to those of the input VK profile. Two names are similar if:

the first letters are the same
the edit distance between the names is ≤ 2

Levenshtein automata are used to efficiently find names within this edit distance.
An automatically extracted dictionary of name synonyms is also used:

“Alexander”, “Sasha”, “Sanya”, “Sanek”, etc.
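The name-similarity test can be sketched in Python as follows. Two substitutions are worth flagging: plain dynamic-programming Levenshtein distance replaces Levenshtein automata (the automata serve to search a large name index efficiently, not to define the distance), and `SYNONYMS` is a tiny hypothetical stand-in for the automatically extracted dictionary. Mapping synonyms to a canonical form before the first-letter check is our assumption about how the dictionary is applied.

```python
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


# Hypothetical fragment of a name-synonym dictionary: variant -> canonical form.
SYNONYMS: Dict[str, str] = {"sasha": "alexander", "sanya": "alexander", "sanek": "alexander"}


def canonical(name: str) -> str:
    n = name.strip().lower()
    return SYNONYMS.get(n, n)


def names_similar(a: str, b: str, max_dist: int = 2) -> bool:
    """Same first letter and edit distance <= max_dist, after synonym mapping."""
    a, b = canonical(a), canonical(b)
    if not a or not b:
        return False
    return a[0] == b[0] and levenshtein(a, b) <= max_dist
```

For example, `names_similar("Sasha", "Alexander")` holds only because of the synonym mapping, while `names_similar("Alexandr", "Alexander")` holds through the edit distance alone.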

Candidate ranking

The more friends with similar names the VK and FB profiles have, the greater the similarity of these profiles. Two friends are considered to be similar if:

the first two letters of their last names match
the similarities $\mathrm{sim}_s$ between their first names and between their last names are greater than the thresholds α and β, respectively:

$$\mathrm{sim}_s(s_i, s_j) = 1 - \frac{\mathrm{lev}(s_i, s_j)}{\max(|s_i|, |s_j|)},$$

The contribution of each friend to the similarity $\mathrm{sim}_p$ of two profiles $p_{vk}$ and $p_{fb}$ is the inverse of the expected frequency of the friend's name:

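$$\mathrm{sim}_p(p_{vk}, p_{fb}) = \sum_{j \,:\, \mathrm{sim}_s(s^f_i, s^f_j) > \alpha \,\wedge\, \mathrm{sim}_s(s^s_i, s^s_j) > \beta} \min\left(1, \frac{N}{|s^f_j| \cdot |s^s_j|}\right).$$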

Here $s^f_i$ and $s^s_i$ are the first and second names of a VK profile, respectively, while $s^f_j$ and $s^s_j$ refer to an FB profile.
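A Python sketch of $\mathrm{sim}_s$ and $\mathrm{sim}_p$ under stated assumptions: the sum is read as running over pairs (i, j) of VK and FB friends; in the weight, |·| is read as the number of users with a given name and N as the total number of users, so N / (|s^f_j| · |s^s_j|) is the inverse of the expected count of that name combination (in $\mathrm{sim}_s$, by contrast, |·| is the string length). Names are assumed lowercased and missing frequencies default to 1. The defaults α = 0.8 and β = 0.6 are the values from the results table; `Friend`, `first_freq` and `last_freq` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


@dataclass(frozen=True)
class Friend:  # hypothetical record for one friend of a profile
    first: str  # first name, lowercased
    last: str   # last (second) name, lowercased


def sim_s(a: str, b: str) -> float:
    """1 - lev(a, b) / max(|a|, |b|), where |.| is the string length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def sim_p(vk_friends: List[Friend], fb_friends: List[Friend],
          first_freq: Dict[str, int], last_freq: Dict[str, int], n_users: int,
          alpha: float = 0.8, beta: float = 0.6) -> float:
    """Weighted count of similar friend pairs (i, j): VK friend i, FB friend j."""
    total = 0.0
    for vf in vk_friends:
        for ff in fb_friends:
            if vf.last[:2] != ff.last[:2]:
                continue  # first two letters of the last names must match
            if sim_s(vf.first, ff.first) > alpha and sim_s(vf.last, ff.last) > beta:
                # Assumption: |.| here is a name frequency, so N / (f * s) is the
                # inverse of the expected count of this name combination.
                f = max(1, first_freq.get(ff.first, 1))
                s = max(1, last_freq.get(ff.last, 1))
                total += min(1.0, n_users / (f * s))
    return total


vk_friends = [Friend("ivan", "petrov"), Friend("olga", "smirnova")]
fb_friends = [Friend("ivan", "petrov"), Friend("anna", "ivanova")]
first_freq = {"ivan": 1000, "olga": 300, "anna": 500}
last_freq = {"petrov": 100, "smirnova": 80, "ivanova": 50}
score = sim_p(vk_friends, fb_friends, first_freq, last_freq, n_users=1000)
```

The cap at 1 keeps a single rare name from dominating the score, while common names such as “Ivan Petrov” contribute only a small fraction of a point.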

Best candidate selection

FB candidates are ranked by their similarity $\mathrm{sim}_p$ to the input profile $p_{vk}$.

The best candidate $p_{fb}$ must pass two thresholds to be accepted as a match:

its score should be higher than the score threshold γ:

$$\mathrm{sim}_p(p_{vk}, p_{fb}) > \gamma.$$

it should either be the only candidate, or the ratio between its score and that of the next best candidate $p'_{fb}$ should be higher than the ratio threshold δ:

$$\frac{\mathrm{sim}_p(p_{vk}, p_{fb})}{\mathrm{sim}_p(p_{vk}, p'_{fb})} > \delta.$$
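The two-threshold rule can be sketched in Python as below, assuming candidates arrive as `(score, candidate)` pairs sorted best first. Treating a zero score for the runner-up as passing the ratio test is our assumption; the defaults γ = 3 and δ = 5 are the values reported in the results. `select_best` is a hypothetical name.

```python
from typing import List, Optional, Tuple, TypeVar

F = TypeVar("F")  # candidate (FB profile) type


def select_best(ranked: List[Tuple[float, F]],
                gamma: float = 3.0, delta: float = 5.0) -> Optional[F]:
    """Return the top candidate if it passes both thresholds, else None.

    `ranked` holds (sim_p score, candidate) pairs sorted best first.
    """
    if not ranked:
        return None
    best_score, best = ranked[0]
    if not best_score > gamma:
        return None  # fails the score threshold gamma
    if len(ranked) == 1:
        return best  # the only candidate: no ratio test needed
    runner_up = ranked[1][0]
    # Assumption: a runner-up score of 0 makes the ratio infinite, so it passes.
    if runner_up == 0 or best_score / runner_up > delta:
        return best
    return None  # too close to the next best candidate
```

Raising γ or δ rejects more uncertain matches, trading recall for precision, which is how the operating points on the precision-recall plot are obtained.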

Results

Figure: Precision-recall plot of the matching method. The bold line denotes the best precision at a given recall.

Results: matching VK and FB profiles

First name threshold, α           0.8
Second name threshold, β          0.6
Profile score threshold, γ        3
Profile ratio threshold, δ        5
Number of matched profiles        644,334 (22%)
Expected precision                0.98
Expected recall                   0.54
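As a quick consistency check, assuming the 22% is measured against the 2,903,144 Facebook users in the dataset (the slide does not state the base):

$$\frac{644{,}334}{2{,}903{,}144} \approx 0.2219 \approx 22\%$$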

Thank you! Questions?
