
Page 1: E6885 Network Science Lecture 6: Network Topology Inference

© 2013 Columbia University

E6885 Topics in Signal Processing -- Network Science

Ching-Yung Lin, Dept. of Electrical Engineering, Columbia University

October 14th, 2013

Page 2: Course Structure

Class Date Lecture Topics Covered

09/09/13 1 Overview of Network Science

09/16/13 2 Network Representation and Feature Extraction

09/23/13 3 Network Partitioning, Clustering and Visualization

09/30/13 4 Network Analysis Use Case

10/07/13 5 Network Sampling, Estimation, and Modeling

10/14/13 6 Network Topology Inference

10/21/13 7 Network Information Flow

10/28/13 8 Graph Database

11/11/13 9 Final Project Proposal Presentation

11/18/13 10 Dynamic and Probabilistic Networks

11/25/13 11 Information Diffusion in Networks

12/02/13 12 Impact of Network Analysis

12/09/13 13 Large-Scale Network Processing System

12/16/13 14 Final Project Presentation

Page 3: Correlation Networks

Pearson product-moment correlation between the attributes of two vertices:

$\rho_{ij} = \mathrm{corr}(X_i, X_j) = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\,\sigma_{jj}}}$

Empirical correlations:

$\hat{\rho}_{ij} = \frac{\hat{\sigma}_{ij}}{\sqrt{\hat{\sigma}_{ii}\,\hat{\sigma}_{jj}}}$

Edges are declared by testing $H_0: \rho_{ij} = 0$ against the evidence in $\hat{\rho}_{ij}$. If the pair of variables $(X_i, X_j)$ has a bivariate Gaussian distribution, the density of $\hat{\rho}_{ij}$ under $H_0$ has a concise closed-form expression, but it is somewhat complicated and requires numerical integration or tables to produce p-values.
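The construction above is easy to sketch in code. The following is a minimal illustration (not from the slides): it assumes an n x Nv attribute matrix X, and it uses the common Fisher z-transform approximation, rather than the exact null density, to obtain p-values for $H_0: \rho_{ij} = 0$.

    import numpy as np
    from scipy import stats

    def correlation_network(X, alpha=0.05):
        """Test H0: rho_ij = 0 for every vertex pair; return an adjacency matrix.

        X: (n, Nv) array whose columns are the vertex attributes.
        Uses the Fisher z-transform: z_ij ~ Normal(0, 1/(n-3)) under H0.
        """
        n = X.shape[0]
        R = np.corrcoef(X, rowvar=False)                 # empirical correlations
        Z = np.arctanh(np.clip(R, -0.999999, 0.999999))  # Fisher z-transform
        p = 2 * stats.norm.sf(np.abs(Z) * np.sqrt(n - 3))
        A = p < alpha                                    # declare edges
        np.fill_diagonal(A, False)
        return A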

Page 4: Partial Correlation Networks

Important -- ‘Correlation does not imply causation’

For instance:

–Two vertices may have highly correlated attributes because the vertices somehow strongly ‘influence’ each other in a direct fashion.

–Alternatively, their correlation may be high primarily because they each are strongly influenced by a third vertex.

Need more considerations!!

Page 5: Partial Correlation Networks (cont'd)

If it is felt desirable to construct a graph G where the inferred edges are more reflective of direct influence among vertices, rather than indirect influence, the notion of partial correlation becomes relevant.

The partial correlation of attributes $X_i$ and $X_j$ of vertices $i, j$, defined with respect to the attributes $X_{k_1}, \ldots, X_{k_m}$ of vertices $k_1, \ldots, k_m \in V \setminus \{i, j\}$, is the correlation between $X_i$ and $X_j$ left over after adjusting for those effects of $X_{k_1}, \ldots, X_{k_m}$ common to both.

Let $S_m = \{k_1, \ldots, k_m\}$ and $X_{S_m} = (X_{k_1}, \ldots, X_{k_m})^T$. We define the partial correlation of $X_i$ and $X_j$, adjusting for $X_{S_m}$, as

$\rho_{ij \mid S_m} = \frac{\sigma_{ij \mid S_m}}{\sqrt{\sigma_{ii \mid S_m}\,\sigma_{jj \mid S_m}}}$

Page 6: Partial Correlation Networks (cont'd)

Here, $\sigma_{ii \mid S_m}$, $\sigma_{jj \mid S_m}$, and $\sigma_{ij \mid S_m} = \sigma_{ji \mid S_m}$ are the diagonal and off-diagonal elements of the 2x2 partial covariance matrix, defined as follows. Let $W_1 = (X_i, X_j)^T$ and $W_2 = X_{S_m}$, and partition their joint covariance as

$\mathrm{Cov}\!\begin{pmatrix} W_1 \\ W_2 \end{pmatrix} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$

Then the 2x2 partial covariance matrix is

$\Sigma_{11 \mid 2} = \Sigma_{11} - \Sigma_{12}\,\Sigma_{22}^{-1}\,\Sigma_{21}$

For example, for a given choice of m, we may dictate that an edge be present only when there is correlation between $X_i$ and $X_j$ regardless of which m other vertices are conditioned upon:

$E = \left\{ \{i, j\} \in V^{(2)} : \rho_{ij \mid S_m} \neq 0,\ \forall\, S_m \in V^{(m)}_{\setminus \{i, j\}} \right\}$
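As a small sketch (not part of the lecture), the partial covariance above can be computed directly from an empirical covariance matrix via the Schur complement; the function name and arguments here are illustrative only.

    import numpy as np

    def partial_correlation(X, i, j, S):
        """Empirical partial correlation of columns i and j of X, adjusting
        for the columns listed in S, via the 2x2 partial covariance matrix."""
        idx = [i, j] + list(S)
        Sig = np.cov(X[:, idx], rowvar=False)
        S11, S12 = Sig[:2, :2], Sig[:2, 2:]
        S21, S22 = Sig[2:, :2], Sig[2:, 2:]
        P = S11 - S12 @ np.linalg.solve(S22, S21)   # Sigma_{11|2}
        return P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])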

Page 7: Gaussian Graphical Model Networks

A special and popular case of the use of partial correlation coefficients is when m=Nv-2 and the attributes are assumed to have a multivariate Gaussian joint distribution.

Here the partial correlation between attributes of two vertices is defined conditional upon the attribute information at all other vertices.

The graph with edge set

$E = \left\{ \{i, j\} \in V^{(2)} : \rho_{ij \mid V \setminus \{i, j\}} \neq 0 \right\}$

is called a conditional independence graph. The overall model, combining the multivariate Gaussian distribution with the graph G, is called a Gaussian graphical model.

The partial correlation coefficients may be expressed in the form

$\rho_{ij \mid V \setminus \{i, j\}} = \frac{-\,\omega_{ij}}{\sqrt{\omega_{ii}\,\omega_{jj}}}$

where $\omega_{ij}$ is the (i, j)-th entry of $\Omega = \Sigma^{-1}$, the inverse of the covariance matrix of the vertex attributes.
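A minimal sketch of this recipe (assuming n is much larger than Nv so the empirical covariance is invertible; in practice a regularized estimator such as the graphical lasso is often used instead):

    import numpy as np

    def ggm_partial_correlations(X):
        """Partial correlations of each pair given all other vertices,
        computed from the precision matrix.

        X: (n, Nv) attribute matrix. Thresholding (or properly testing)
        the result yields the edge set of the conditional independence graph.
        """
        Omega = np.linalg.inv(np.cov(X, rowvar=False))   # Omega = Sigma^{-1}
        d = np.sqrt(np.diag(Omega))
        R = -Omega / np.outer(d, d)       # -w_ij / sqrt(w_ii * w_jj)
        np.fill_diagonal(R, 1.0)
        return R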

Page 8: Inferring Interests from Social Graph

Page 9: On the Quality of Inferring Interests from Social Neighbors (Wen and Lin, KDD 2010)

Modeling user interests enables personalized services:
– More relevant search/recommendation results
– More targeted advertising

Data about users are sparse:
– Many user profiles are static, incomplete and/or outdated
– <10% of employees actively participate in social software [Brzozowski 2009]

Inferring user interests from neighbors can be a solution:
– It also raises the concern of exposing users' private information

How true are "You are who you know" and "Birds of a feather flock together"?

Page 10: Challenges in Observing Users

Diverse types of media

Public social media (friending, blogs, etc.)

• Data are public but limited (esp. in enterprises)

Private communication media (email, instant messaging, face-to-face meetings, etc.)

• Much more data

• Privacy is a major issue

Page 11: Example of Diverse Types of Media

Number of people who participated in the top 3 media in an enterprise with 400K employees.

Number of entries:
• Social bookmarking: 400K
• Electronic communication: 20M
• File sharing: 140K

Page 12: Our Goals

How well can a user's interests be inferred from his/her social neighbors?

Can the diverse types of media be combined to improve inferring user interests from social neighbors?

Can the quality of the inference be predicted based on features of social neighbors?

Only sufficiently accurate inference may help personalized services

Page 13: Our Approach

Infer user interests from social neighbors

Model user interests based on multiple types of information they accessed

Construct employee social network from communication data

Infer using social influence model

Study the relationship between inference quality and network characteristics

Identify effective factors to ensure high quality results for applications

Page 14: Dataset

25,315 users' contributed content

– 20M email/chats

– 400K social bookmarks

– 20K shared public files

– Profile information

• Job role, division, news categories of interest, etc.

Infer social network based on email/chats (X′: number of emails)

Page 15: User Interests Model – Implicit Interests

Model users' interests implicitly indicated by their contributed content.

Extract latent topics from the multiple types of content using LDA.

Select the top-N distinct topics as the implicit interests model of a user, recording:
– The degree to which the user is interested
– The similarity of topics

Page 16: User Interests Model – Explicit Interests

29% of users manually specify interests in their profile:
– A list of selected terms
• From a static 1120-term taxonomy related to work

Compare implicit and explicit interests:
– Explicit interests models are more limited
• Implicit interests cover 60.4% of explicit interests
• Explicit interests cover 2.2% of implicit interests

Page 17: Infer Interests Based on Social Influence

Social influence model:
– Network autocorrelation model [Leenders 2002]
• Social influence is represented as a weighted combination of neighbors' attributes
• The weight is an exponential function of the social distance (see the sketch below)
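A toy sketch of this idea (the names, data structures, and the specific weighting are illustrative assumptions, not the paper's exact model): each user's topic-interest vector is estimated as an exponentially distance-weighted average of the neighbors' vectors.

    import numpy as np

    def infer_interests(user, neighbors, topic_vecs, dist, lam=1.0):
        """Weighted combination of neighbors' interest vectors, with weights
        decaying exponentially in social distance (illustrative only).

        topic_vecs: dict user -> topic-interest vector
        dist: dict (user, neighbor) -> social distance
        """
        w = np.array([np.exp(-lam * dist[(user, v)]) for v in neighbors])
        w /= w.sum()                                  # normalize the weights
        V = np.vstack([topic_vecs[v] for v in neighbors])
        return w @ V          # weighted average of neighbors' interests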

Page 18: Inference Quality

Condition                         Max     Mean    St. Dev.
Using social bookmark data only   59.4%   19.2%   10.7%
Using file sharing data only      44.9%   12.7%    7.2%
Using email/IM data only          62.1%   29.6%   14.1%
Using all three data sources      100%    45.1%   21.7%

Implicit interests: how close the inferred top-20 topics are to the ground truth.

– Significant advantage in combining multiple sources

– The large variance can affect practical application, so we need to predict when to infer interests

– Much better recall than precision

Explicit interests: precision and recall of inferred terms.

Measure     Mean    St. Dev.
Precision   30.1%   26.9%
Recall      61.5%   27.6%

Page 19: Can Inference Quality be Predicted?

Hypothesis: inference quality can be predicted from social network properties

–User activeness: the amount of contribution

–In-degree

–Out-degree

–Betweenness

–User management role

Use Support Vector Regression to perform prediction

Evaluate prediction

–Precision/recall of the prediction (10-fold cross validation)

–Use prediction to improve inference
• Only infer when we predict the quality will be high

Page 20: Quality Prediction Results

Precision/recall of prediction:

[Figure: precision/recall of prediction, shown for implicit interests and explicit interests.]

Page 21: Quality Prediction Results (cont'd)

Improve inference:

Measure     Improved to   Improvement (%)
Precision   60.5%         101%
Recall      85.7%         39.3%

[Results shown for implicit interests and explicit interests.]

Page 22: Feature Comparison

"Leave-one-feature-out" comparisons of prediction results:

Most social influence comes from 1st- and 2nd-degree neighbors.

Your neighbors decide how well you can be inferred.

Page 23: Feature Comparison (cont'd)

"Leave-one-feature-out" comparisons of prediction results:

Your neighbors' network positions may be even more important than how active they are.

– Formal organizational properties
• Manager neighbors are more important in inference, i.e., they exert more social influence (about 5% more)

Page 24: Conclusion of the Wen-Lin 2010 KDD Paper

There’s large variance in the quality of inferring user interests from social neighbors

The “recall” of the inference is much better than “precision”

The inference quality may be predicted from social network properties

Page 25: Tomographic Inference

Page 26: Tomographic Network Topology Inference

Inference of ‘interior’ components of a network – both vertices and edges – from data obtained at some subset of ‘exterior’ vertices.

E.g., in computer networks, desktop and laptop computers are typical instances of 'exterior' vertices, while Internet routers to which we do not have access are effectively 'interior' vertices.

Network tomography describes problems in the context of computer network monitoring, in which aspects of the 'internal' workings of the network, such as the intensity of traffic flowing between vertices, are inferred from 'external' measurements.

This is an 'ill-posed inverse problem' in mathematics: the mapping from internal characteristics to external measurements is many-to-one.

For the tomographic inference of network topologies, a key structural simplification has been the restriction to inference of networks in the form of trees.

Page 27: Tomographic Inference of Tree Topologies

[Figure: a tree structure.]

Page 28: Tomographic Network Inference Problem

Given a set of Nl vertices, suppose we have n i.i.d. observations of some random variables {X1, ..., XNl}. We aim to find the tree, among all binary trees with Nl labeled leaves, that best explains the data.

Example: Multicast probes

Page 29: Two Most Popular Classes of Methods

Hierarchical clustering and related ideas

Likelihood-based methods

Page 30: Hierarchical Clustering-based Methods

We treat the Nl leaves as the ‘objects’ to be clustered and the tree corresponding to the resulting clustering as our inferred tree.

The tree corresponding to the entire set of partitions is our focus.

Example: Ratnasamy and McCanne (1999): the observed rate of shared losses of packets should be fairly indicative of how close two leaf vertices (i.e., destination addresses) are on a multicast tree T (see the sketch after this list).

– Two different types of shared loss between a pair of leaf vertices: 'true' and 'false' shared loss.

– True shared losses are due to loss of packets on the path common to the vertices i and j.

– False shared losses refer to cases where packets were lost separately on the two distinct paths leading down to i and j.
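A rough sketch of the clustering step (my own illustration, not Ratnasamy and McCanne's actual algorithm): treat the shared-loss rate as a similarity between leaves, convert it to a dissimilarity, and feed it to standard agglomerative clustering; the resulting linkage encodes the inferred tree.

    import numpy as np
    from scipy.cluster.hierarchy import linkage

    def tree_from_shared_loss(shared_loss):
        """shared_loss: (Nl, Nl) symmetric matrix of observed shared-loss
        rates between leaf pairs. Returns a SciPy linkage matrix encoding
        the inferred binary tree over the Nl leaves."""
        D = shared_loss.max() - shared_loss      # similarity -> dissimilarity
        iu = np.triu_indices(D.shape[0], k=1)    # condensed upper triangle
        return linkage(D[iu], method="average")  # agglomerative clustering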

Page 31: Likelihood-based Methods

If we are willing to specify probability models, then we have the potential for likelihood-based methods of inference.

In general, maximum likelihood inference of tree topologies may also be pursued through the use of Markov chain Monte Carlo (MCMC). MCMC is also central to applying Bayesian methods to the tomographic inference of trees.

Page 32: Bayesian Inference for Content Classification

Page 33: Application – Content Clustering

Goal – categorize the documents into topics.

Each document is a probability distribution over topics; each topic is a probability distribution over words. The topics are the mixture components, and the per-document topic proportions are the mixture weights.

[Figure: two example topics (TOPIC 1: money, loan, bank; TOPIC 2: river, stream, bank) and two documents whose words are tagged with the topic that generated them, with the document-topic mixture weights shown (0.8/0.2 and 0.3/0.7).]

Page 34: Content Analysis Prior Art I -- Latent Semantic Analysis

Latent Semantic Analysis (LSA) [Landauer, Dumais 1997]

– Descriptions:
• Captures the semantic concepts of documents by mapping words into the latent semantic space, which captures the possible synonymy and polysemy of words.
• Training is based on different levels of documents. Experiments show the synergy of the number of training documents and psychological studies of students at the 4th-grade, 10th-grade, and college levels. Used as an alternative to the TOEFL test.

– Based on the truncated SVD of the document-term matrix: the optimal least-squares projection to reduce dimensionality.

– Captures the concepts instead of the words:
• Synonymy
• Polysemy

[Figure: truncated SVD of the N x M term-document matrix, X ≈ T0 · S0 · D0, with T0 (N x K), S0 (K x K), and D0 (K x M).]
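A minimal numerical sketch of the truncated SVD (assuming X is a small dense N x M term-document matrix; for large sparse corpora a sparse or randomized SVD would be used instead):

    import numpy as np

    def lsa(X, K):
        """Rank-K LSA of an N x M term-document matrix X via truncated SVD:
        X ~ T0 @ S0 @ D0, with T0 (N x K), S0 (K x K), D0 (K x M)."""
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        T0 = U[:, :K]            # term -> concept mapping
        S0 = np.diag(s[:K])      # concept strengths
        D0 = Vt[:K, :]           # concept -> document mapping
        return T0, S0, D0

    # Documents can then be compared in the K-dimensional latent space
    # (the columns of D0) instead of the raw word space.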

Page 35: Traditional Content Clustering

fwj: the frequency of the word wj in a document.

Clustering: partition the feature space (fw1, fw2, fw3, ...) into segments based on training documents. Each segment represents a topic/category. (Topic detection.)

Hard clustering (e.g., K-means): $d = \{f_{w_1}, f_{w_2}, \ldots, f_{w_N}\} \rightarrow z$

Soft clustering (e.g., Fuzzy C-means): $P(Z \mid W = \mathbf{f}_w)$

[Figure: another representation of clustering, as a graph linking Documents (d1...d6) to Topics (z1...z5) to Words (w1...w6); the words are the observations.]

Page 36: Traditional Content Clustering (cont'd)

[Same content as the previous slide; the clustering graph is shown without the deterministic part.]

Page 37: Content Clustering based on Bayesian Network

Bayesian Network:
• Causality network – models the causal relationships of attributes/nodes
• Allows hidden/latent nodes

By Bayes' theorem:

$P(W \mid Z) = \frac{P(Z \mid W)\,P(W)}{P(Z)}$

Soft clustering: $P(Z \mid D)$

Hard clustering (via MLE): $h(D = d) = \arg\max_z P(Z = z \mid W = \mathbf{f}_w)$

[Figure: Bayesian network linking Documents (d1, d2, d3) to Topics (z1...z5) to Words (w1...w6); the words are the observations.]

Page 38: Content Clustering based on Bayesian Network – Hard Clustering

$P(W \mid Z) = \frac{P(Z \mid W)\,P(W)}{P(Z)} = \frac{P(Z \mid W)\,P(W)}{\int P(Z \mid W)\,P(W)\,dW}$

N: the number of words. (The number of topics, M, is pre-determined.)

Major Solution 1 -- Dirichlet Process:
• Models P(W | Z) as mixtures of Dirichlet probabilities.
• Before training, the prior of P(W | Z) can be an easy Dirichlet (uniform distribution); after training, P(W | Z) will still be Dirichlet. (The reason for using the Dirichlet.)

Major Solution 2 -- Gibbs Sampling:
• A Markov chain Monte Carlo (MCMC) method that integrates over large samples to calculate P(Z).

[Figure: plate diagram of Latent Dirichlet Allocation (LDA) (Blei 2003): a topic assignment z and observed word w per word position, repeated N times, with topic-word distributions φ governed by a Dirichlet prior β over M topics.]

Page 39: Latent Dirichlet Allocation (LDA) [Blei et al. 2003]

Goal – categorize the documents into topics.

Each document is a probability distribution over topics; each topic is a probability distribution over words.

The probability of the i-th word in a given document:

$P(w_i) = \sum_{j=1}^{T} P(w_i \mid z_i = j)\,P(z_i = j)$

$P(w_i \mid z_i = j)$ are the mixture components; $P(z_i = j)$ are the mixture weights.

[Figure: the same two-topic example as on Page 33, with each document's words tagged by their generating topic and the document-topic mixture weights shown.]

Page 40: LDA (cont.)

INPUT: document-word counts (D documents, W words)

OUTPUT: mixture components $\phi_w^{(j)}$ (topic-word distributions) and mixture weights $\theta_j^{(d)}$ (document-topic distributions), so that

$P(w_i) = \sum_{j=1}^{T} P(w_i \mid z_i = j)\,P(z_i = j)$

Bayesian approach -- use priors:
• Mixture weights $\theta \sim \mathrm{Dirichlet}(\alpha)$
• Mixture components $\phi \sim \mathrm{Dirichlet}(\beta)$

Parameters can be estimated by Gibbs sampling.

[Figure: LDA plate diagram with hyperparameters α and β, document-topic distribution θ, topic assignment z, and observed word w (the observations); plates over W words, D documents, and T topics, where T is the number of topics.]
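For concreteness, a small sketch of fitting LDA with an off-the-shelf library (note that scikit-learn estimates the model by variational Bayes rather than the Gibbs sampling discussed here; the two-document corpus is an invented toy example):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = ["money bank loan bank", "river stream bank river"]  # toy corpus
    X = CountVectorizer().fit_transform(docs)     # document-word counts

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    theta = lda.fit_transform(X)                  # mixture weights (D x T)
    phi = lda.components_                         # topic-word pseudo-counts
    phi = phi / phi.sum(axis=1, keepdims=True)    # mixture components (T x W)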

Page 41: Comparison of Dirichlet Distribution with Gaussian Mixture Models

Dirichlet Distribution:

$\rho(f_1, f_2, \ldots, f_r;\ a_1, a_2, \ldots, a_r) = \frac{\Gamma(N)}{\prod_{k=1}^{r} \Gamma(a_k)}\, f_1^{a_1 - 1} f_2^{a_2 - 1} \cdots f_r^{a_r - 1}$

where $N = \sum_{k=1}^{r} a_k$, $0 \le f_k \le 1$, and $\sum_{k=1}^{r} f_k = 1$.

Multivariate Gaussian:

$\rho(f_1, f_2, \ldots, f_{r-1};\ \mu_1, \sigma_1, \mu_2, \sigma_2, \ldots, \mu_{r-1}, \sigma_{r-1}) = \prod_{k=1}^{r-1} \frac{1}{\sqrt{2\pi}\,\sigma_k}\, e^{-\frac{(f_k - \mu_k)^2}{2\sigma_k^2}}$

Page 42: Beyond Gaussian: Examples of Dirichlet Distribution

Page 43: Importance of Dirichlet Distribution

In 1982, Sandy Zabell proved that, if we make certain assumptions about an individual’s beliefs, then that individual must use the Dirichlet density function to quantify any prior beliefs about a relative frequency.

Page 44: Use Dirichlet Distribution to model prior and posterior beliefs (I)

Prior beliefs: $\rho(f) = \mathrm{beta}(f; a, b)$

E.g.: is a coin *fair*?
– Flipping a coin, what is the probability of getting 'heads'?

beta(1,1): we have never tossed that coin -- no prior knowledge.

Page 45: Use Dirichlet Distribution to model prior and posterior beliefs (II)

Prior beliefs: $\rho(f) = \mathrm{beta}(f; a, b)$

Posterior beliefs: $\rho(f \mid d) = \mathrm{beta}(f; a + s, b + t)$

Data d: s = 2 heads, t = 2 tails.

beta(1,1): no prior knowledge. beta(3,3): posterior knowledge after 2 head trials (s) and 2 tail trials (t) -- this coin may be fair.

Page 46: Use Dirichlet Distribution to model prior and posterior beliefs (III)

Prior beliefs: $\rho(f) = \mathrm{beta}(f; a, b)$

Posterior beliefs: $\rho(f \mid d) = \mathrm{beta}(f; a + s, b + t)$

Data d: 8 heads, 2 tails.

beta(3,3): prior knowledge -- this coin may be fair. beta(11,5): posterior belief -- the coin may not be fair, after tossing 8 heads and 2 tails.

Page 47: Use Dirichlet Distribution to model prior and posterior beliefs (IV)

Another example:
– Our belief that an event, which in general has a 5% occurrence probability, may be true in our case.

Data d: 3 occurrences, 0 non-occurrences.

beta(1/360, 19/360): prior knowledge that a 5%-chance event may be true. beta(1/360 + 3, 19/360): posterior belief that the 5%-chance event may be true.
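The conjugate update on the preceding slides is one line with SciPy (a sketch; the numbers reproduce the coin example from slide III):

    from scipy.stats import beta

    a, b = 3, 3                       # prior beta(3,3): coin may be fair
    s, t = 8, 2                       # data: 8 heads, 2 tails
    posterior = beta(a + s, b + t)    # conjugate update -> beta(11, 5)
    print(posterior.mean())           # 0.6875: belief drifts away from fair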

Page 48: Gibbs Sampling (I)

Given a PDF $\rho(X)$, it is usually difficult to compute an integration $\int f(X)\,\rho(X)\,dX$ directly.

We can use Gibbs sampling to simulate *samples* $X_s = \{X_{s,1}, X_{s,2}, \ldots, X_{s,T}\}$ that follow the PDF. Thus, we can approximate the integration by summing $f(X_s)$ over those samples.

Page 49: Gibbs Sampling (II)

Gibbs sampling: find a Markov chain $\rho_s(X_i \mid X_{i-1})$ such that the outcome samples follow the PDF $\rho(X)$.

Example: for a 2-dimensional vector $X = (x_1, x_2)$, $\rho_s(X_i \mid X_{i-1})$ is determined by the two conditional draws

$x_{1,i} \sim \rho_s(x_{1,i} \mid x_{2,i-1})$
$x_{2,i} \sim \rho_s(x_{2,i} \mid x_{1,i})$

each derived from the PDF $\rho(X)$.

[Figure: the chain X1 → X2 → X3 → X4 tracing axis-aligned steps through the (x1, x2) plane.]
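A minimal worked example (my own, not from the slides): a Gibbs sampler for a standard bivariate normal with correlation rho, where both full conditionals are themselves Gaussian, so the chain is easy to run.

    import numpy as np

    def gibbs_bivariate_normal(rho, T=10000, burn=1000):
        """Gibbs sampler for a standard bivariate normal with correlation rho.
        Returns post-burn-in samples that follow the target PDF."""
        rng = np.random.default_rng(0)
        x1, x2 = 0.0, 0.0
        samples = []
        for i in range(T):
            # draws from rho_s(x1 | x2) and rho_s(x2 | x1),
            # both derived from the joint PDF
            x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))
            x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))
            if i >= burn:
                samples.append((x1, x2))
        return np.array(samples)

    # An expectation E[f(X)] is then approximated by averaging f over the draws.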

Page 50: Some Insight on BN-based Content Clustering

Content clustering:
• Because documents and words are dependent, only documents that are close in the word-frequency feature space (fw1, fw2, fw3, ..., where fwj is the frequency of the word wj in a document) can be clustered together as one topic.

Incorporating human factors can possibly *link* multiple clusters together.

Bayesian Network:
• Models the *practical* causal relationships.

Page 51: Why Do We Need Simultaneous Multimodality Clustering?

Multiple-step clustering:
• e.g., a naïve way to combine content filtering and collaborative filtering: independently cluster first, combine later.

[Figure: in the word-frequency feature space, independently formed clusters sometimes combine well (OK) and sometimes do not (Not-OK).]

Simultaneous multimodality clustering is important.