
Label propagation on graphs

Leonid E. Zhukov

School of Data Analysis and Artificial Intelligence, Department of Computer Science

National Research University Higher School of Economics

Structural Analysis and Visualization of Networks

Lecture 17, 19.05.2015


Lecture outline

1 Label propagation problem

2 Collective classification
   – Iterative classification

3 Semi-supervised learning
   – Random walk based methods
   – Graph regularization


Label propagation

Label propagation: labeling of all nodes in a graph structure

A subset of nodes is labeled with categorical/numeric/binary values

Extend the labeling to all nodes of the graph

Also known as: classification in networked data, network classification, structured inference


Label propagation problem

Structure can help only if labels/values of linked nodes are correlated

Social networks show assortative mixing, a bias in favor of connections between nodes with similar characteristics:
   – homophily: similar characteristics → connections
   – influence: connections → similar characteristics

Can also be applied to constructed (induced) similarity networks


Network classification

Supervised learning approach

Given graph nodes V = V_l ∪ V_u:
   – nodes V_l are given labels Y_l
   – nodes V_u do not have labels

Need to find Y_u

Labels can be binary, multi-class, or real-valued

Features (attributes) φ_i can be computed for every node:
   – local node features (if available)
   – link features (labels from neighbors, attributes from neighbors, node degrees, connectivity patterns)

Feature (design) matrix Φ = (Φ_l, Φ_u)
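As an illustration of how such a design matrix might be assembled, here is a minimal sketch using networkx and numpy; the feature scheme (node degree plus counts of already-labeled neighbor classes) and the function name `node_features` are illustrative assumptions, not taken from the slides.

```python
import numpy as np
import networkx as nx

def node_features(G, labels, num_classes):
    """Build one feature row per node: node degree plus the label
    distribution of its already-labeled neighbors (hypothetical scheme).
    G: networkx graph, labels: dict node -> class index."""
    nodes = list(G.nodes())
    Phi = np.zeros((len(nodes), 1 + num_classes))
    for idx, v in enumerate(nodes):
        Phi[idx, 0] = G.degree(v)                 # local structural feature
        for u in G.neighbors(v):                  # aggregate neighbor labels
            if u in labels:
                Phi[idx, 1 + labels[u]] += 1.0
    return nodes, Phi
```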


Network learning components

Local classifier. A locally learned model that predicts a node's label from its own attributes. Uses no network information.

Relational classifier. Takes into account the labels and attributes of a node's neighbors. Uses neighborhood network information.

Collective classifier. Estimates the unknown values jointly by applying the relational classifier iteratively. Strongly depends on the network structure.


Collective classification

Algorithm: Iterative classification method

Input: graph G(V, E), labels Y_l
Output: labels Y

Compute Φ^(0)
Train classifier on (Φ_l^(0), Y_l)
Predict Y_u^(0)
repeat
    Compute Φ_u^(t)
    Train classifier on (Φ^(t), Y^(t))
    Predict Y_u^(t+1) from Φ_u^(t)
until Y_u^(t) converges
Y ← Y^(t)
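A compact sketch of this loop in Python, reusing the hypothetical `node_features` helper above and a scikit-learn logistic regression as the base classifier; this is one possible instantiation under those assumptions, not the specific setup used in the lecture.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def iterative_classification(G, labels, num_classes, max_iter=50):
    """Iterative classification sketch: bootstrap from labeled nodes, then
    alternately recompute link features and re-predict unlabeled nodes
    until the predictions stop changing."""
    nodes, Phi = node_features(G, labels, num_classes)   # hypothetical helper
    labeled = [i for i, v in enumerate(nodes) if v in labels]
    unlabeled = [i for i, v in enumerate(nodes) if v not in labels]
    y = np.zeros(len(nodes), dtype=int)
    for i in labeled:
        y[i] = labels[nodes[i]]

    clf = LogisticRegression(max_iter=1000)
    clf.fit(Phi[labeled], y[labeled])                    # train on labeled nodes only
    y[unlabeled] = clf.predict(Phi[unlabeled])           # initial guess for Y_u

    for _ in range(max_iter):
        current = {v: y[i] for i, v in enumerate(nodes)}
        _, Phi = node_features(G, current, num_classes)  # recompute link features
        clf.fit(Phi, y)                                  # train on current labels
        y_new = clf.predict(Phi[unlabeled])              # re-predict unlabeled nodes
        if np.array_equal(y_new, y[unlabeled]):          # converged
            break
        y[unlabeled] = y_new
    return {v: int(y[i]) for i, v in enumerate(nodes)}
```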


Iterative classification (figure omitted)


Relational classifiers

Weighted-vote relational neighbor classifier:

P(y_i = c \mid N_i) = \frac{1}{Z} \sum_{j \in N_i} A_{ij}\, P(y_j = c \mid N_j)

Network-only Bayes classifier:

P(y_i = c \mid N_i) = \frac{P(N_i \mid c)\, P(c)}{P(N_i)}

where

P(N_i \mid c) = \frac{1}{Z} \prod_{j \in N_i} P(y_j = \tilde{y}_j \mid y_i = c)

and \tilde{y}_j is the observed label of neighbor j.
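A possible numpy implementation of the weighted-vote relational neighbor update, run as relaxation labeling with labeled nodes clamped; the dense-adjacency representation, the clamping scheme, and the fixed iteration count are assumptions made for this sketch.

```python
import numpy as np

def wvrn(A, Y_l, labeled_mask, num_iter=100):
    """Weighted-vote relational neighbor classifier via relaxation labeling:
    each node's class distribution is the adjacency-weighted average of its
    neighbors' current distributions; labeled nodes stay clamped.

    A            -- (n, n) adjacency matrix
    Y_l          -- (n, k) one-hot label rows (only labeled rows are used)
    labeled_mask -- boolean (n,) array marking labeled nodes
    """
    n, k = Y_l.shape
    P = np.full((n, k), 1.0 / k)                 # uniform start for unlabeled nodes
    P[labeled_mask] = Y_l[labeled_mask]
    for _ in range(num_iter):
        P_new = A @ P                            # weighted vote of neighbors
        Z = P_new.sum(axis=1, keepdims=True)     # per-node normalization constant
        isolated = Z[:, 0] == 0
        Z[isolated] = 1.0
        P_new /= Z
        P_new[isolated] = P[isolated]            # isolated nodes keep their estimate
        P_new[labeled_mask] = Y_l[labeled_mask]  # clamp labeled nodes
        P = P_new
    return P
```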


Semi-supervised learning

Graph-based semi-supervised learning

Given a partially labeled dataset

Data: X = X_l ∪ X_u
   – small set of labeled data (X_l, Y_l)
   – large set of unlabeled data X_u

Similarity graph G(V, E) over the data points, where every vertex v_i corresponds to a data point x_i

Transductive learning: learn a function that predicts labels Y_u for the unlabeled input X_u


Random walk methods

Consider a random walk with absorbing states: the labeled nodes V_l

Probability y_i[c] that node v_i ∈ V_u has label c:

y_i[c] = \sum_{j \in V_l} p_{ij}^{\infty}\, y_j[c]

where y_i[c] is the probability distribution over labels for node i and p_{ij} = P(i \to j) is the one-step transition probability matrix. If the output requires a single label per node, assign the most probable one.

In matrix form:

\hat{Y} = P^{\infty} Y

where Y = (Y_l, 0) and \hat{Y} = (Y_l, \hat{Y}_u)


Random walk methods

Random walk matrix: P = D^{-1} A

Random walk with absorbing states:

P = \begin{pmatrix} P_{ll} & P_{lu} \\ P_{ul} & P_{uu} \end{pmatrix}
  = \begin{pmatrix} I & 0 \\ P_{ul} & P_{uu} \end{pmatrix}

At the t → ∞ limit:

\lim_{t \to \infty} P^t
  = \begin{pmatrix} I & 0 \\ \big(\sum_{n=0}^{\infty} P_{uu}^{\,n}\big) P_{ul} & P_{uu}^{\infty} \end{pmatrix}
  = \begin{pmatrix} I & 0 \\ (I - P_{uu})^{-1} P_{ul} & 0 \end{pmatrix}


Random walk methods

Matrix equation:

\begin{pmatrix} Y_l \\ \hat{Y}_u \end{pmatrix}
  = \begin{pmatrix} I & 0 \\ (I - P_{uu})^{-1} P_{ul} & 0 \end{pmatrix}
    \begin{pmatrix} Y_l \\ Y_u \end{pmatrix}

Solution:

\hat{Y}_l = Y_l
\hat{Y}_u = (I - P_{uu})^{-1} P_{ul} Y_l

(I − P_uu) is non-singular for all label-connected graphs (it is always possible to reach a labeled node from any unlabeled node)
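A direct numpy translation of this closed-form solution; a sketch that assumes a label-connected graph (so I − P_uu is invertible) and dense matrices, with illustrative function and argument names.

```python
import numpy as np

def absorbing_rw_labels(A, Y_l, labeled_idx, unlabeled_idx):
    """Closed-form absorbing random walk solution
    Y_u = (I - P_uu)^{-1} P_ul Y_l.

    A             -- (n, n) adjacency matrix
    Y_l           -- (n_l, k) label matrix of the labeled nodes
    labeled_idx   -- indices of labeled nodes
    unlabeled_idx -- indices of unlabeled nodes
    """
    d = A.sum(axis=1)
    P = A / d[:, None]                                   # row-stochastic P = D^{-1} A
    P_uu = P[np.ix_(unlabeled_idx, unlabeled_idx)]
    P_ul = P[np.ix_(unlabeled_idx, labeled_idx)]
    n_u = len(unlabeled_idx)
    # solve (I - P_uu) Y_u = P_ul Y_l rather than forming the inverse explicitly
    Y_u = np.linalg.solve(np.eye(n_u) - P_uu, P_ul @ Y_l)
    return Y_u
```

Using `np.linalg.solve` matches the formula (I − P_uu)^{-1} P_ul Y_l while being numerically more stable than computing the inverse.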


Label propagation

Algorithm: Label propagation, Zhu et al., 2002

Input: graph G(V, E), labels Y_l
Output: labels Y

Compute D_ii = Σ_j A_ij
Compute P = D^{-1} A
Initialize Y^(0) = (Y_l, 0), t = 0
repeat
    Y^(t+1) ← P · Y^(t)
    Y_l^(t+1) ← Y_l^(t)
    t ← t + 1
until Y^(t) converges
Y ← Y^(t)
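The same procedure as a short numpy sketch; the matrix shapes, the convergence tolerance, and the iteration cap are illustrative choices rather than values from the slides.

```python
import numpy as np

def label_propagation(A, Y_l, labeled_mask, tol=1e-6, max_iter=1000):
    """Zhu-style label propagation: Y <- P Y, then reset the labeled rows.
    A: (n, n) adjacency, Y_l: (n, k) with one-hot rows on labeled nodes,
    labeled_mask: boolean (n,) array marking labeled nodes."""
    d = A.sum(axis=1)
    d[d == 0] = 1.0
    P = A / d[:, None]                               # P = D^{-1} A
    Y = np.where(labeled_mask[:, None], Y_l, 0.0)    # Y^(0) = (Y_l, 0)
    for _ in range(max_iter):
        Y_next = P @ Y
        Y_next[labeled_mask] = Y_l[labeled_mask]     # clamp labeled nodes
        if np.abs(Y_next - Y).max() < tol:
            return Y_next
        Y = Y_next
    return Y
```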


Label propagation

Iteration:

Y^{(t)} = P\, Y^{(t-1)}

Y_u^{(t)} = P_{ul} Y_l + P_{uu} Y_u^{(t-1)}

Y_u^{(t)} = \sum_{n=1}^{t} P_{uu}^{\,n-1} P_{ul} Y_l + P_{uu}^{\,t}\, Y_u^{(0)}

Convergence:

\hat{Y}_u = \lim_{t \to \infty} Y_u^{(t)} = (I - P_{uu})^{-1} P_{ul} Y_l


Label spreading

Algorithm: Label spreading, Zhou et al., 2004

Input: graph G(V, E), labels Y_l
Output: labels Y

Compute D_ii = Σ_j A_ij
Compute S = D^{-1/2} A D^{-1/2}
Initialize Y^(0) = (Y_l, 0), t = 0
repeat
    Y^(t+1) ← α S Y^(t) + (1 − α) Y^(0)
    t ← t + 1
until Y^(t) converges
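An analogous numpy sketch for label spreading; the value of α, the tolerance, and the iteration cap are tuning choices assumed for the example, not values from the slides.

```python
import numpy as np

def label_spreading(A, Y0, alpha=0.9, tol=1e-6, max_iter=1000):
    """Zhou-style label spreading: Y <- alpha S Y + (1 - alpha) Y^(0).
    A: (n, n) adjacency, Y0: (n, k) initial labels (zero rows for unlabeled)."""
    d = A.sum(axis=1)
    d[d == 0] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(d)
    S = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]   # S = D^{-1/2} A D^{-1/2}
    Y = Y0.copy()
    for _ in range(max_iter):
        Y_next = alpha * (S @ Y) + (1.0 - alpha) * Y0
        if np.abs(Y_next - Y).max() < tol:
            return Y_next
        Y = Y_next
    return Y
```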


Label spreading

Iterations:

Y^{(t)} = \alpha S\, Y^{(t-1)} + (1 - \alpha) Y^{(0)}

At the t → ∞ limit:

\lim_{t \to \infty} Y^{(t)} = \lim_{t \to \infty} \big( \alpha S\, Y^{(t-1)} + (1 - \alpha) Y^{(0)} \big)

Y = \alpha S Y + (1 - \alpha) Y^{(0)}

Solution:

Y = (1 - \alpha)(I - \alpha S)^{-1} Y^{(0)}


Regularization on graphs

Find a labeling \hat{Y} = (\hat{Y}_l, \hat{Y}_u) that is

Consistent with the initial labeling:

\sum_{i \in V_l} (\hat{y}_i - y_i)^2 = \|\hat{Y}_l - Y_l\|^2

Consistent with the graph structure (regression function smoothness):

\frac{1}{2} \sum_{i,j \in V} A_{ij} (\hat{y}_i - \hat{y}_j)^2 = \hat{Y}^T (D - A)\hat{Y} = \hat{Y}^T L \hat{Y}

Stable (additional regularization):

\varepsilon \sum_{i \in V} \hat{y}_i^2 = \varepsilon \|\hat{Y}\|^2


Regularization on graphs

Minimization with respect to \hat{Y}: \arg\min_{\hat{Y}} Q(\hat{Y})

Label propagation [Zhu, 2002]:

Q(\hat{Y}) = \frac{1}{2} \sum_{i,j \in V} A_{ij} (\hat{y}_i - \hat{y}_j)^2 = \hat{Y}^T L \hat{Y}, \quad \text{with } \hat{Y}_l = Y_l \text{ fixed}

Label spread [Zhou, 2003]:

Q(\hat{Y}) = \frac{1}{2} \sum_{i,j \in V} A_{ij} \left( \frac{\hat{y}_i}{\sqrt{d_i}} - \frac{\hat{y}_j}{\sqrt{d_j}} \right)^2 + \mu \sum_{i \in V} (\hat{y}_i - y_i)^2

Q(\hat{Y}) = \hat{Y}^T L \hat{Y} + \mu \|\hat{Y} - Y^{(0)}\|^2

L = I - S = I - D^{-1/2} A D^{-1/2}
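Setting the gradient of the label-spread objective to zero recovers the closed-form solution of the label spreading iteration (with α = 1/(1 + µ)); a short derivation:

\nabla_{\hat{Y}} Q = 2 L \hat{Y} + 2\mu (\hat{Y} - Y^{(0)}) = 0
\;\Rightarrow\; (I - S)\hat{Y} + \mu \hat{Y} = \mu Y^{(0)}
\;\Rightarrow\; \hat{Y} = \frac{\mu}{1+\mu} \Big( I - \tfrac{1}{1+\mu} S \Big)^{-1} Y^{(0)}
  = (1-\alpha)(I - \alpha S)^{-1} Y^{(0)}, \qquad \alpha = \frac{1}{1+\mu}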


Regularization on graphs

Laplacian regularization [Belkin, 2003]:

Q(\hat{Y}) = \frac{1}{2} \sum_{i,j \in V} A_{ij} (\hat{y}_i - \hat{y}_j)^2 + \mu \sum_{i \in V_l} (\hat{y}_i - y_i)^2

Q(\hat{Y}) = \hat{Y}^T L \hat{Y} + \mu \|\hat{Y}_l - Y_l\|^2

Use the eigenvectors e_1, ..., e_p corresponding to the smallest eigenvalues of L = D − A:

L e_j = \lambda_j e_j

Construct a classifier (regression function) on the eigenvectors:

\mathrm{Err}(a) = \sum_{i \in V_l} \Big( y_i - \sum_{j=1}^{p} a_j e_{ji} \Big)^2

Predict the value (classify): \hat{y}_i = \sum_{j=1}^{p} a_j e_{ji}, class c_i = \mathrm{sign}(\hat{y}_i)


Laplacian regularization

Algorithm: Laplacian regularization, Belkin and Niyogi, 2003

Input: graph G(V, E), labels Y_l

Output: labels Y

Compute D_ii = Σ_j A_ij
Compute L = D − A
Compute the p eigenvectors e_1, ..., e_p with the smallest eigenvalues of L, L e_j = λ_j e_j
Minimize over a_1, ..., a_p:
    arg min_{a_1,...,a_p} Σ_{i ∈ V_l} ( y_i − Σ_{j=1}^p a_j e_{ji} )², which gives a = (E^T E)^{-1} E^T Y_l
    (E is the matrix of eigenvector values e_{ji} on the labeled nodes)
Label v_i by sign( Σ_{j=1}^p a_j e_{ji} )
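A numpy sketch of this procedure: take the p eigenvectors of L = D − A with the smallest eigenvalues, least-squares fit the labeled values, and classify every node by the sign of the reconstruction. The value of p and the variable names are illustrative; binary labels in {−1, +1} are assumed.

```python
import numpy as np

def laplacian_regularization(A, y_l, labeled_idx, p=10):
    """Belkin-Niyogi-style classifier: regress labeled values on the p
    smallest-eigenvalue eigenvectors of L = D - A, classify by sign.
    A: (n, n) adjacency, y_l: (n_l,) labels in {-1, +1}."""
    D = np.diag(A.sum(axis=1))
    L = D - A
    eigvals, eigvecs = np.linalg.eigh(L)            # eigenvalues in ascending order
    E = eigvecs[:, :p]                              # n x p matrix of eigenvectors
    E_l = E[labeled_idx]                            # rows for labeled nodes
    a, *_ = np.linalg.lstsq(E_l, y_l, rcond=None)   # a = (E_l^T E_l)^{-1} E_l^T y_l
    y_hat = E @ a                                   # predicted real values on all nodes
    return np.sign(y_hat), y_hat
```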


Label propagation example

(example figures omitted)

References

S. A. Macskassy, F. Provost. Classification in networked data: a toolkit and a univariate case study. Journal of Machine Learning Research 8, 935-983, 2007.

Y. Bengio, O. Delalleau, N. Le Roux. Label propagation and quadratic criterion. In Semi-Supervised Learning, Eds. O. Chapelle, B. Scholkopf, A. Zien, MIT Press, 2006.

S. Bhagat, G. Cormode, S. Muthukrishnan. Node classification in social networks. In Social Network Data Analytics, Ed. C. Aggarwal, 2011, pp. 115-148.

D. Zhou, O. Bousquet, T. Lal, J. Weston, B. Scholkopf. Learning with local and global consistency. In NIPS, volume 16, 2004.

X. Zhu, Z. Ghahramani, J. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In ICML, 2003.

M. Belkin, P. Niyogi, V. Sindhwani. Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research 7, 2399-2434, 2006.
