
Basics and Advances of Semi-supervised Learning

Irwin King1 and Zenglin Xu2

1Computer Science and EngineeringThe Chinese University of Hong Kong

Shatin, N. T., Hong Kong

2Department of Computer SciencePurdue University

West Lafayette, IN 47906 US

ICONIP 2011

King & Xu (CUHK & Purdue) Semi-supervised Learning ICONIP 2011 1 / 88


Outline

1 Basics of Semi-supervised Learning

2 Advanced Topics

3 An Empirical Example

4 Conclusion


Basics of Semi-supervised Learning

Outline

1 Basics of Semi-supervised Learning

Semi-supervised Learning
Probabilistic Methods
Co-training
Graph-based Semi-supervised Learning
Semi-supervised Support Vector Machine

2 Advanced Topics

Theory of semi-supervised learning
Advanced algorithms of semi-supervised learning

Variational setting
Large scale learning

3 An Empirical Example

4 Conclusion


Basics of Semi-supervised Learning Semi-supervised Learning

A problem example


Basics of Semi-supervised Learning Semi-supervised Learning

What is semi-supervised learning?

Semi-supervised learning

Semi-supervised learning (SSL) is a class of machine learning techniques that make use of both labeled and unlabeled data for training.

Supervised learning

Unsupervised learning


Basics of Semi-supervised Learning Semi-supervised Learning

Learning paradigms


Basics of Semi-supervised Learning Semi-supervised Learning

Types of semi-supervised learning

Semi-supervised Classification

Given l labeled instances {(x_i, y_i)}_{i=1}^l and u unlabeled instances {x_i}_{i=l+1}^{l+u} for training

Constrained clustering

Given unlabeled instances {x_i}_{i=1}^n and "supervised information", e.g., must-links and cannot-links


Basics of Semi-supervised Learning Semi-supervised Learning

Concepts

labeled

unlabeled

semi-supervised learning

Drawn from the same distribution

Share the same label

Surveys: [Zhu, 2005], [Chapelleet al., 2006]


Basics of Semi-supervised Learning Semi-supervised Learning

Why do we need semi-supervised learning?

Unlabeled data are usually abundant

Unlabeled data are usually easy to get

Labeled data can be hard to get

Labels may require human effort
Labels may require special devices

Results can also be good


Basics of Semi-supervised Learning Semi-supervised Learning

Why do we need semi-supervised learning?

Some applications of SSL

Web page classification:

Easy to crawl web pages
Requires human experts to label them, e.g., DMOZ

Telephone conversation transcription

400 hours of annotation time for each hour of speech


Basics of Semi-supervised Learning Semi-supervised Learning

Semi-supervised learning vs. transductive learning

Inductive semi-supervised learning

Given {(x_i, y_i)}_{i=1}^l and {x_i}_{i=l+1}^{l+u}, learn a function f : X → Y so that f is expected to be a good predictor on future data.

Transductive learning

Given {(x_i, y_i)}_{i=1}^l and {x_i}_{i=l+1}^{l+u}, learn a function f : X^{l+u} → Y^{l+u} so that f is expected to be a good predictor on the unlabeled data {x_i}_{i=l+1}^{l+u}.


Basics of Semi-supervised Learning Semi-supervised Learning

How semi-supervised learning is helpful

SVM

SVM with unlabeled data

Semi-supervised SVM


Basics of Semi-supervised Learning Semi-supervised Learning

How semi-supervised learning is helpful

p(x) carries information that is helpful for the inference of p(y|x)


Basics of Semi-supervised Learning Semi-supervised Learning

Applications

Natural language processing

X: sentence
y: parse tree

Spam filtering

X: email
y: decision (spam or not spam)

Video surveillance

X: video frame
y: decision (suspicious event or not)

Protein 3D structure prediction

X: DNA sequence
y: structure


Basics of Semi-supervised Learning Semi-supervised Learning

How is semi-supervised learning possible?

Assumptions or intuitions?

Cluster assumption (similarity)
Manifold assumption (structural)
Others

Which one is correct?


Basics of Semi-supervised Learning Semi-supervised Learning

Models

Self-training

Co-training

Probabilistic generative models

Graph-based models

Large margin based methods

Which one is good?


Basics of Semi-supervised Learning Semi-supervised Learning

Self-training

Perhaps the simplest way of using unlabeled data

Initialize L = {(x_i, y_i)}_{i=1}^l and U = {x_j}_{j=l+1}^n

Repeat
1 Train f from L using supervised learning
2 Apply f to the unlabeled instances in U
3 Remove a subset S from U; add {(x, f(x)) | x ∈ S} to L

Until U = ∅
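The loop above can be sketched in a few lines of Python. The base learner (a nearest-class-mean classifier) and the distance-based confidence score are illustrative assumptions; the recipe leaves both open.

```python
import numpy as np

def nearest_mean_fit(X, y):
    # simple base learner: one mean vector per class
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, means

def nearest_mean_predict(model, X):
    classes, means = model
    d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
    # predicted label, and negative distance as a confidence score
    return classes[d.argmin(axis=1)], -d.min(axis=1)

def self_train(X_l, y_l, X_u, k=2):
    """Self-training: repeatedly train f on L, pseudo-label the k
    most-confident instances of U, and move them into L."""
    X_l, y_l, X_u = map(np.array, (X_l, y_l, X_u))
    while len(X_u) > 0:
        model = nearest_mean_fit(X_l, y_l)              # 1. train f from L
        yhat, conf = nearest_mean_predict(model, X_u)   # 2. apply f to U
        top = np.argsort(-conf)[:k]                     # 3. move subset S to L
        X_l = np.vstack([X_l, X_u[top]])
        y_l = np.concatenate([y_l, yhat[top]])
        X_u = np.delete(X_u, top, axis=0)
    return nearest_mean_fit(X_l, y_l)
```

Note how pseudo-labels enter `y_l` permanently; this is exactly why an early mistake can reinforce itself.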


Basics of Semi-supervised Learning Semi-supervised Learning

Self-training

A wrapper method

The choice of learner for f in step 1 is open

Good for many real world tasks, e.g., natural language processing

But mistakes of f can reinforce themselves


Basics of Semi-supervised Learning Probabilistic Methods

A simple example of generative model

Gaussian mixture model (GMM)

Model parameters: θ = {π_i, µ_i, Σ_i}_{i=1}^K, where π_i are the class priors, µ_i the Gaussian means, and Σ_i the covariance matrices

Joint distribution:

p(x, y = i|θ) = p(y = i|θ) p(x|y = i, θ) = π_i N(x; µ_i, Σ_i)

Classification:

p(y|x, θ) = p(x, y|θ) / ∑_{i=1}^K p(x, y_i|θ)


Basics of Semi-supervised Learning Probabilistic Methods

Effect of unlabeled data in GMM

Likelihood with labeled data only, P(X_l, Y_l|θ), vs. with unlabeled data included, P(X_l, Y_l, X_u|θ)


Basics of Semi-supervised Learning Probabilistic Methods

Generative model for semi-supervised learning

Assumption: knowledge of P(x, y|θ)

Joint and marginal distribution

p(X_l, Y_l, X_u|θ) = ∑_{Y_u} p(X_l, Y_l, X_u, Y_u|θ)

Objective: find the maximum likelihood estimate (MLE) of θ, themaximum a posteriori (MAP) estimate, or be Bayesian

Optimization: Expectation Maximization (EM)

Applications:

Mixture of Gaussian distributions (GMM): image classification
Mixture of multinomial distributions (Naïve Bayes): text categorization
Hidden Markov Models (HMM): speech recognition


Basics of Semi-supervised Learning Probabilistic Methods

Classification with GMM using MLE

With only labeled data (the supervised case)

log p(X_l, Y_l|θ) = ∑_{i=1}^l log p(y_i|θ) p(x_i|y_i, θ)

MLE for θ is trivial (sample mean and covariance)

With both labeled and unlabeled data (the semi-supervised case)

log p(X_l, Y_l, X_u|θ) = ∑_{i=1}^l log p(y_i|θ) p(x_i|y_i, θ) + ∑_{i=l+1}^{l+u} log ( ∑_y p(y|θ) p(x_i|y, θ) )

MLE for θ is not easy (hidden variables): EM


Basics of Semi-supervised Learning Probabilistic Methods

EM for GMM

1 Initialize θ_0 = {π, µ, Σ} on (X_l, Y_l)
2 The E-step: for all x ∈ X_u, compute the expected label

p(y|x, θ) = p(x, y|θ) / ∑_{i=1}^K p(x, y_i|θ)

and label all x ∈ X_u according to p(y|x, θ)

3 The M-step: update the MLE θ with both X_l and X_u
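The three steps above can be sketched for a two-class 1-D GMM. This is a minimal sketch assuming numpy; the initialization on the labeled data, the fixed iteration count, and the small variance floors are simplifying choices, not part of the slide.

```python
import numpy as np

def gauss(x, mu, var):
    # 1-D Gaussian density N(x; mu, var)
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def ssl_gmm_em(x_l, y_l, x_u, n_iter=50):
    """EM for a two-class 1-D GMM using labeled and unlabeled data."""
    x_l, y_l, x_u = map(np.asarray, (x_l, y_l, x_u))
    # 1. initialize theta on (X_l, Y_l)
    mu = np.array([x_l[y_l == c].mean() for c in (0, 1)])
    var = np.array([x_l[y_l == c].var() for c in (0, 1)]) + 1e-3
    pi = np.array([np.mean(y_l == c) for c in (0, 1)])
    x = np.concatenate([x_l, x_u])
    # labeled points keep their hard labels throughout
    r_l = np.stack([(y_l == c).astype(float) for c in (0, 1)])
    for _ in range(n_iter):
        # 2. E-step: expected labels p(y|x, theta) for the unlabeled points
        p = np.stack([pi[c] * gauss(x_u, mu[c], var[c]) for c in (0, 1)])
        r_u = p / (p.sum(axis=0) + 1e-300)
        r = np.concatenate([r_l, r_u], axis=1)   # responsibilities, shape (2, l+u)
        # 3. M-step: MLE of theta from X_l and the softly labeled X_u
        n = r.sum(axis=1)
        pi = n / n.sum()
        mu = (r * x).sum(axis=1) / n
        var = (r * (x - mu[:, None]) ** 2).sum(axis=1) / n + 1e-6
    return pi, mu, var
```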


Basics of Semi-supervised Learning Probabilistic Methods

The assumption of mixture models

The assumption of mixture models

Data actually comes from the mixture model, where the number of components, the prior p(y), and the conditional p(x|y) are all correct.

This assumption could be WRONG!

Which one is correct?


Basics of Semi-supervised Learning Probabilistic Methods

The assumption of mixture models

Heuristics

Carefully construct the generative model, e.g., multiple Gaussian distributions per class

Down-weight the unlabeled data by 0 ≤ λ < 1:

log p(X_l, Y_l, X_u|θ) = ∑_{i=1}^l log p(y_i|θ) p(x_i|y_i, θ) + λ ∑_{i=l+1}^{l+u} log ( ∑_y p(y|θ) p(x_i|y, θ) )


Basics of Semi-supervised Learning Probabilistic Methods

Summary

Assume a distribution for data

Unlabeled data are used to help identify the parameters in p(X_l, Y_l, X_u|θ)

An incorrect assumption can degrade performance

Prior knowledge of the data distribution is necessary

It can be helpful to combine generative with discriminative models


Basics of Semi-supervised Learning Co-training

Two views

Two views for email classification:

Title
Body


Basics of Semi-supervised Learning Co-training

Two views

Classify web pages into a category for students and a category for professors

Two views of a web page:

Content: "I am currently a professor of ..."
Hyperlinks: a link to the faculty list of a computer science department


Basics of Semi-supervised Learning Co-training

Why co-training?

Learners can learn from each other

Implied agreement between two learners


Basics of Semi-supervised Learning Co-training

Co-training algorithm

Input:

Labeled data (Xl ,Yl), unlabeled data Xu

Each instance has two views: x = [x^(1), x^(2)]

A learning speed k

Algorithm:

1 Let L1 = L2 = (X_l, Y_l)
2 Repeat until the unlabeled data U = ∅:

1 Train a view-1 classifier f^(1) from L1 and a view-2 classifier f^(2) from L2
2 Classify the unlabeled data with f^(1) and f^(2) separately
3 Add f^(1)'s top k most-confident predictions (x, f^(1)(x)) to L2
4 Add f^(2)'s top k most-confident predictions (x, f^(2)(x)) to L1
5 Remove these 2k instances from the unlabeled data U
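The algorithm can be sketched compactly. A nearest-class-mean learner per view is used here as a stand-in base classifier, an illustrative assumption; the algorithm itself is agnostic to the choice.

```python
import numpy as np

def fit_means(X, y):
    # base learner for one view: one mean vector per class
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_conf(means, X):
    classes = np.array(sorted(means))
    d = np.stack([np.linalg.norm(X - means[c], axis=1) for c in classes])
    return classes[d.argmin(axis=0)], -d.min(axis=0)  # labels, confidences

def co_train(X1, X2, y, labeled, k=1):
    """X1, X2: the two views of the same n instances; y holds labels
    for the rows where the boolean mask `labeled` is True."""
    L1 = {int(i): y[i] for i in np.where(labeled)[0]}   # view-1 training set
    L2 = dict(L1)                                       # view-2 training set
    U = [int(i) for i in np.where(~labeled)[0]]         # unlabeled pool
    while U:
        f1 = fit_means(X1[list(L1)], np.array([L1[i] for i in L1]))
        f2 = fit_means(X2[list(L2)], np.array([L2[i] for i in L2]))
        y1, c1 = predict_conf(f1, X1[U])
        y2, c2 = predict_conf(f2, X2[U])
        top1 = set(np.argsort(-c1)[:k].tolist())        # f1 teaches f2
        top2 = set(np.argsort(-c2)[:k].tolist())        # f2 teaches f1
        for j in top1: L2[U[j]] = y1[j]
        for j in top2: L1[U[j]] = y2[j]
        U = [u for j, u in enumerate(U) if j not in top1 | top2]
    # final classifiers, one per view
    return (fit_means(X1[list(L1)], np.array([L1[i] for i in L1])),
            fit_means(X2[list(L2)], np.array([L2[i] for i in L2])))
```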


Basics of Semi-supervised Learning Co-training

Schematic of a co-training algorithm


Basics of Semi-supervised Learning Co-training

Illustration of co-training

1 Train a content-based classifier using labeled examples

2 Label the unlabeled examples that are confidently classified

3 Train a hyperlink-based classifier

4 Label the unlabeled examples that are confidently classified

5 Next iteration


Basics of Semi-supervised Learning Co-training

Assumptions of co-training

Assumptions of co-training

Each view alone is sufficient to make good classifications

The two views are conditionally independent given the class label


Basics of Semi-supervised Learning Co-training

Summary

Key idea

Augment the training examples of one view by exploiting the classifier of the other view

Extension to multiple views

Problem: how to find equivalent views


Basics of Semi-supervised Learning Graph-based Semi-supervised Learning

Graph-based semi-supervised learning

Introduction

Label propagation

Graph partition

Harmonic function

Manifold regularization


Basics of Semi-supervised Learning Graph-based Semi-supervised Learning

Graph-based semi-supervised learning

Key idea

Construct a graph with nodes being instances and edges being similarity measures among instances

Look for some technique to cut the graph, guided by

Labeled instances
Some heuristics, e.g., minimum cut


Basics of Semi-supervised Learning Graph-based Semi-supervised Learning

Graph-based semi-supervised learning

Graph construction

G = ⟨V, E⟩, where V = {x_i}_{i=1}^n

Build the adjacency graph using a heuristic:

ε-NN: ε ∈ R+; nodes x_i and x_j are connected if dist(x_i, x_j) ≤ ε
k-NN: k ∈ N+; nodes x_i and x_j are connected if x_i is among the k nearest neighbors of x_j

Graph weighting:

Heat kernel: if x_i and x_j are connected, the weight is W_ij = exp(−dist(x_i, x_j)/t), where t ∈ R+
Simple-minded: W_ij = 1 if x_i and x_j are connected
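The k-NN construction and heat-kernel weighting above can be combined in one short sketch. Symmetrizing the k-NN connections and returning the unnormalized Laplacian are common conventions assumed here, not fixed by the slide.

```python
import numpy as np

def knn_heat_graph(X, k=3, t=1.0):
    """Build a symmetric k-NN adjacency matrix with heat-kernel weights
    W_ij = exp(-dist(x_i, x_j)/t), and its graph Laplacian L = D - W."""
    X = np.asarray(X, float)
    n = len(X)
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # pairwise distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist[i])[1:k + 1]     # skip self (distance 0)
        W[i, nbrs] = np.exp(-dist[i, nbrs] / t)
    W = np.maximum(W, W.T)                      # symmetrize: edge if either side is a k-NN
    D = np.diag(W.sum(axis=1))
    L = D - W                                   # unnormalized graph Laplacian
    return W, L
```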


Basics of Semi-supervised Learning Graph-based Semi-supervised Learning

Graph-based semi-supervised learning

G = ⟨V, E⟩, with W_ij the weight on edge (x_i, x_j)

D_ii = ∑_{j=1}^n W_ij

Graph Laplacian: L = D − W

Normalized graph Laplacian: L = D^{−1/2}(D − W)D^{−1/2}


Basics of Semi-supervised Learning Graph-based Semi-supervised Learning

Label propagation

1 Supervised case: does not consider the data distribution

2 How can unlabeled data be included in the prediction of class labels?

3 Connect the data points that are close to each other

4 Propagate the class labels over the connected graph


Basics of Semi-supervised Learning Graph-based Semi-supervised Learning

Label propagation

Input:

Given the adjacency matrix W and degree matrix D = diag(d_1, . . . , d_n), with d_i = ∑_{j≠i} W_ij, or the normalized adjacency matrix D^{−1/2}WD^{−1/2}

Labels Y_l

Decay parameter α


Basics of Semi-supervised Learning Graph-based Semi-supervised Learning

Label Propagation

Initial class assignments y ∈ {−1, 0, +1}^n:

y_i = ±1 for x_i ∈ X_l; y_i = 0 for x_i ∈ X_u

Predicted class assignments:
1 Predict the confidence scores f = (f_1, . . . , f_n)
2 Predict the class assignments y_i = sign(f_i)


Basics of Semi-supervised Learning Graph-based Semi-supervised Learning

Label propagation

One round of propagation

f_i = y_i for x_i ∈ X_l; f_i = α ∑_{j=1}^n W_ij y_j for x_i ∈ X_u

In matrix form: f^(1) = y + αWy


Basics of Semi-supervised Learning Graph-based Semi-supervised Learning

Label propagation

Two rounds of propagation

f^(2) = y + αW f^(1) = y + αWy + α²W²y


Basics of Semi-supervised Learning Graph-based Semi-supervised Learning

Label propagation

Any rounds of propagation

f^(t) = y + ∑_{k=1}^t α^k W^k y


Basics of Semi-supervised Learning Graph-based Semi-supervised Learning

Label propagation

Infinite rounds of propagation

f^(∞) = y + ∑_{k=1}^∞ α^k W^k y

Or equivalently

f^(∞) = (I − αW)^{−1} y
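The closed form is one linear solve. A small sketch, assuming numpy; the 4-node path graph and α = 0.2 are illustrative values chosen so that the geometric series converges.

```python
import numpy as np

def label_propagation(W, y, alpha):
    """Closed-form limit f = (I - alpha W)^{-1} y of the propagation
    f(t) = y + sum_{k=1}^t alpha^k W^k y (converges when alpha * rho(W) < 1)."""
    return np.linalg.solve(np.eye(len(y)) - alpha * W, y)

# a 4-node path graph: the two ends are labeled +1 / -1, middle unlabeled
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], float)
y = np.array([1.0, 0.0, 0.0, -1.0])
f = label_propagation(W, y, alpha=0.2)
```

Each middle node inherits the sign of its nearer labeled end, as expected.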


Basics of Semi-supervised Learning Graph-based Semi-supervised Learning

Graph partition

Key idea

Classification as graph partitioning

Search for a classification boundary

Consistent with labeled examplesPartition with small graph cut


Basics of Semi-supervised Learning Graph-based Semi-supervised Learning

Min-cuts

V+ : source, V−: sink

Infinite weights connecting sinks and sources


Basics of Semi-supervised Learning Graph-based Semi-supervised Learning

Min-cuts

Fix f_l and search for f_u to minimize ∑_{i=1}^n ∑_{j=1}^n W_ij (f_i − f_j)²

Equivalently, solve

C(f) = ∑_{i=1}^n ∑_{j=1}^n W_ij (f_i − f_j)²/4 + ∞ ∑_{i=1}^l (f_i − y_i)²

Loss function: ∞ ∑_{i=1}^l (f_i − y_i)² (a hard constraint)

A combinatorial problem, but it has a polynomial-time solution


Basics of Semi-supervised Learning Graph-based Semi-supervised Learning

Harmonic Function

Weight matrix W

Membership function:

f_i = +1 ∀x_i ∈ A; f_i = −1 ∀x_i ∈ B

Graph cut (energy function):

C(f) = ∑_{i=1}^n ∑_{j=1}^n W_ij (f_i − f_j)²/4 = (1/4) f^⊤(D − W)f = (1/4) f^⊤Lf

Graph Laplacian L = D − W captures

Pairwise relationships among data
Manifold geometry of data


Basics of Semi-supervised Learning Graph-based Semi-supervised Learning

Harmonic Function

min_{f ∈ {−1,+1}^n} C(f) = (1/4) f^⊤Lf
s.t. f_i = y_i, i = 1, . . . , l

Challenge: combinatorial optimization


Basics of Semi-supervised Learning Graph-based Semi-supervised Learning

Harmonic Function

Relaxation to continuous space:

min_{f ∈ R^n} C(f) = (1/4) f^⊤Lf
s.t. f_i = y_i, i = 1, . . . , l

f(x_i) = y_i for i = 1, . . . , l

f minimizes the energy function ∑_{i=1}^n ∑_{j=1}^n W_ij (f_i − f_j)²/4

f is the average of its neighbors:

f(x_i) = ∑_{j∼i} W_ij f(x_j) / ∑_{j∼i} W_ij


Basics of Semi-supervised Learning Graph-based Semi-supervised Learning

Harmonic Function

An alternative, iterative algorithm

1. Fix f(x_i) = y_i for x_i ∈ X_l and initialize f(x_i) = 0 for x_i ∈ X_u
2. Repeat until convergence: for each x_i ∈ X_u,

   f(x_i) = Σ_{j∼i} W_ij f(x_j) / Σ_{j∼i} W_ij
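The iterative algorithm can be sketched on a toy chain graph (the graph and labels below are invented for illustration):

```python
import numpy as np

# Toy chain graph x_0 - x_1 - x_2 - x_3 - x_4 with the two
# endpoints labeled (+1 and -1); everything else is unlabeled.
n = 5
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

labeled = {0: +1.0, 4: -1.0}
f = np.zeros(n)
for i, yi in labeled.items():
    f[i] = yi

# Repeat until convergence: clamp labeled points, replace each
# unlabeled value with the weighted average of its neighbors.
for _ in range(1000):
    for i in range(n):
        if i not in labeled:
            f[i] = W[i] @ f / W[i].sum()
```

On a chain the harmonic solution linearly interpolates between the two labeled endpoints.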


Analytical solution, from the optimization perspective:

f_u = −L_{uu}⁻¹ L_{ul} y_l

where

L = [ L_{ll}  L_{lu} ]
    [ L_{ul}  L_{uu} ]   and   f = (f_l, f_u)
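The closed form amounts to one linear solve. A minimal sketch on the same kind of toy chain graph (indices and labels are invented):

```python
import numpy as np

# Toy chain graph; closed form f_u = -L_uu^{-1} L_ul y_l.
n = 5
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

lab, unl = [0, 4], [1, 2, 3]          # labeled / unlabeled indices
y_l = np.array([+1.0, -1.0])

L_uu = L[np.ix_(unl, unl)]
L_ul = L[np.ix_(unl, lab)]
f_u = -np.linalg.solve(L_uu, L_ul @ y_l)
# The harmonic solution linearly interpolates along the chain.
```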


Connection to label propagation (learning with local and global consistency)



Manifold regularization

Manifold regularization is inductive

Define a function in an RKHS: f(x) = h(x) + b, h(x) ∈ H_k
Flexible loss function: e.g., the hinge loss
The regularizer prefers the low-energy fᵀLf

min_f Σ_{i=1}^l (1 − y_i f(x_i))₊ + λ1 ‖h‖_{H_k} + λ2 fᵀLf

where λ1 and λ2 are non-negative tradeoff constants
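A minimal numerical sketch of this objective, substituting the squared loss for the hinge so the solution is a closed-form linear solve (data, graph, and hyperparameters below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two blobs, one labeled example per blob.
X = np.vstack([rng.normal(+2.0, 0.3, (20, 2)),
               rng.normal(-2.0, 0.3, (20, 2))])
lab = [0, 20]
y_l = np.array([+1.0, -1.0])

# Fully connected RBF similarity graph and its Laplacian.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / 2.0)
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W

# Squared-loss variant with a linear model f(x) = w.x:
#   min_w ||X_l w - y_l||^2 + lam1 ||w||^2 + lam2 (Xw)^T L (Xw)
lam1, lam2 = 1e-2, 1e-2
X_l = X[lab]
w = np.linalg.solve(X_l.T @ X_l + lam1 * np.eye(2) + lam2 * X.T @ L @ X,
                    X_l.T @ y_l)
pred = np.sign(X @ w)
```

With only two labels, the Laplacian term pulls the predictions of graph-connected unlabeled points toward each other.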


Application

Label propagation (learning with local and global consistency)
[Zhou et al., NIPS 2003]

20-newsgroups: autos, motorcycles, baseball, and hockey under rec
Pre-processing: stemming, removal of stopwords & rare words, and skipping headers
#Docs: 3970, #Words: 8014


Label propagation (learning with local and global consistency)
[Wang et al., ACM MM 2004]

5,000 images
Relevance feedback for the top 20 ranked images
Classification problem: relevant or not? f(x) gives the degree of relevance
Learning: SVM vs. label propagation


Summary of graph-based methods

Construct a graph using pairwise similarity
Key quantity: the graph Laplacian
  Captures the geometry of the graph
The decision boundary is consistent with
  the graph structure
  the labeled examples
Parameters related to graph structure are important


Semi-supervised SVM

SVM
SVM with unlabeled data
Semi-supervised SVM (S3VM)


Assumptions of semi-supervised SVM

Low-Density Separation Assumption

The decision boundary should lie in a low-density region; that is, it should not cut through dense regions of unlabeled data.

Also known as the cluster assumption.


Semi-supervised SVM

S3VM: treat y_u for the unlabeled data as free variables

min_{w,b,ξ} min_{y_u ∈ {−1,+1}^{n−l}}  ‖w‖²₂ + C Σ_{i=1}^n ξ_i
s.t. y_i(wᵀx_i + b) ≥ 1 − ξ_i, i = 1, …, l
     y_i(wᵀx_i + b) ≥ 1 − ξ_i, i = l+1, …, n
     ξ_i ≥ 0, i = 1, …, n

No longer a convex optimization problem
Alternating optimization
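The alternating scheme can be sketched as follows, on synthetic two-cluster data and substituting a regularized least-squares fit for the SVM subproblem (a simplification to keep the sketch dependency-free; the real S3VM solves an SVM at each step):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-cluster data; only one example per cluster is labeled.
X = np.vstack([rng.normal(+2, 0.4, (30, 2)), rng.normal(-2, 0.4, (30, 2))])
y = np.full(60, np.nan)
y[0], y[30] = +1.0, -1.0
lab = ~np.isnan(y)

def fit(Xf, yf, lam=1e-2):
    """Regularized least-squares classifier, standing in for the SVM step."""
    return np.linalg.solve(Xf.T @ Xf + lam * np.eye(Xf.shape[1]), Xf.T @ yf)

# Alternate: fix y_u and fit f, then fix f and set y_u = sign(f(x_u)).
w = fit(X[lab], y[lab])
for _ in range(10):
    y_all = y.copy()
    y_all[~lab] = np.sign(X[~lab] @ w)
    w = fit(X, y_all)
pred = np.sign(X @ w)
```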


Equivalently, in unconstrained form:

min_f min_{y_u}  ‖w‖²₂ + C_l Σ_{i=1}^l (1 − y_i f(x_i))₊ + C_u Σ_{i=l+1}^{l+u} (1 − y_i f(x_i))₊

where (1 − y_i f(x_i))₊ = max(0, 1 − y_i f(x_i))

Optimizing over y_u = (y_{l+1}, …, y_n), we have

min_{y_i} (1 − y_i f(x_i))₊ = (1 − sign(f(x_i)) f(x_i))₊ = (1 − |f(x_i)|)₊
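This hat-loss identity is easy to verify numerically:

```python
import numpy as np

def hinge(z):
    """(1 - z)_+ = max(0, 1 - z)."""
    return np.maximum(0.0, 1.0 - z)

# Minimizing over the unknown label y in {-1, +1} picks y = sign(f),
# which turns the hinge into the non-convex "hat" loss (1 - |f|)_+.
f_vals = np.linspace(-3.0, 3.0, 121)
best_over_y = np.minimum(hinge(f_vals), hinge(-f_vals))
hat = hinge(np.abs(f_vals))
```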


S3VM objective

min_f  ‖w‖²₂ + C_l Σ_{i=1}^l (1 − y_i f(x_i))₊ + C_u Σ_{i=l+1}^{l+u} (1 − |f(x_i)|)₊

A non-convex problem. Which optimization methods apply?


Representative optimization methods for S3VM

Label-switch retraining [Joachims, 1999]
Gradient descent [Chapelle and Zien, 2005]
Continuation [Chapelle et al., 2006]
Concave-convex procedure (CCCP) [Collobert et al., 2006]
Semi-definite programming [Bie and Cristianini, 2004; Xu et al., 2004; Xu et al., 2007]
Deterministic annealing [Sindhwani et al., 2006]
Branch-and-bound [Chapelle et al., 2006]
Non-differentiable methods [Astorino and Fuduli, 2007]


Experiments

Experimental data

Figure: Data sets.

Data and results are from [Chapelle et al., 2008].


Quality of performance

Quality of minimization

Figure: Average objective values.

Quality of prediction

Figure: Errors on unlabeled data.


Combine with graph-based methods

Figure: Errors on unlabeled data.

The combined methods seem to have better performance


Summary

Semi-supervised SVM
  Based on the maximum-margin principle
  Low-density assumption
  Extends SVM by pushing the decision boundary to traverse low-density regions

The classification margin is decided by
  the class labels assigned to unlabeled data
  the labeled examples

Problem: non-convex optimization
  Solvers: ∇S3VM, SVMlight, CCCP, etc.
  No single solver is the best
  Performance is sensitive to the data


Advanced Topics

Outline

1 Basics of Semi-supervised Learning
  Semi-supervised Learning
  Probabilistic Methods
  Co-training
  Graph-based Semi-supervised Learning
  Semi-supervised Support Vector Machine

2 Advanced Topics
  Theory of semi-supervised learning
  Advanced algorithms of semi-supervised learning
    Variational setting
    Large scale learning

3 An Empirical Example

4 Conclusion


Theory of semi-supervised learning

Questions

Can unlabeled data help?
If yes, why do unlabeled data help?
If yes, how much unlabeled data do we need?
Which SSL assumption should we adopt?


Complexity analysis of SSL
  E.g., augmented PAC model (Balcan & Blum, 2008)
  Analyzes the compatibility between data distributions and learning functions
  Unlabeled data are potentially helpful in estimating the compatibility
  Bounds on the number of unlabeled examples are derived

Assumption analysis in SSL
  E.g., manifold or smoothness? (Lafferty & Wasserman, 2007)
  The manifold assumption is more important than the smoothness assumption

Whether, or to what extent, can unlabeled data help?
  E.g., finite-sample analysis (Singh, Nowak, & Zhu, 2008)
  Bridges the gap between theoretical analysis and practice
  Cases where SSL can help are identified based on an analysis of margins



New directions of semi-supervised learning

Probabilistic methods: hybrids of generative and discriminative models (Lasserre et al., 2006; Fujino et al., 2008)

Extensions of multiview learning: view disagreement, structured output, information-theoretic frameworks

Graph-based methods: how to construct the graph?

Semi-supervised SVM: new optimization methods?


Extensions

Variational settings of SSL (different distributions)
  Variance shift for test data
  Unlabeled data are irrelevant
  Mixture of relevant and irrelevant unlabeled data

Scalability
  Online learning, e.g., online manifold regularization
  Efficient optimization algorithms, such as CCCP


Variational settings of SSL (different distributions)

Labeled, unlabeled, and variance-shifted data
  Unlabeled data drawn from a variance-shifted distribution
  Share the same labels as the labeled data
  Learning under covariate shift, or sample-selection bias correction
  E.g., [Shimodaira et al., 2000], [Zadrozny et al., 2004]


Learning with irrelevant data

Unlabeled and irrelevant data
  Unlabeled data are irrelevant or background data
  Share no common labels with the labeled data
  Learning with the universum
  E.g., [Weston et al., 2006]


SSL with mixture of relevant and irrelevant data

Unlabeled data as a mixture
  Relevant data mixed with others
  Semi-supervised learning from a mixture
  E.g., [Zhang et al., 2008], [Huang et al., 2008]


Large scale semi-supervised learning

Perspectives:
  Efficient algorithms
  Online learning: examples arrive sequentially, so there is no need to store them all


Online semi-supervised learning

Online semi-supervised learning:

1. At time t, the adversary picks x_t ∈ X, y_t ∈ Y, and shows x_t
2. The learner builds a classifier f_t : X → R and predicts f_t(x_t)
3. With small probability, the adversary reveals y_t
4. The learner updates to f_{t+1} based on x_t and y_t (if given)


Online manifold regularization

Batch-mode manifold regularization:

J(f) = (1/l) Σ_{t=1}^T δ(y_t) ℓ(f(x_t), y_t) + (λ1/2) ‖f‖²_H + (λ2/(2T)) Σ_{i=1}^T Σ_{j=1}^T (f(x_i) − f(x_j))² W_ij

δ(y_t): indicator of whether x_t is labeled

Instantaneous risk:

J_t(f) = (T/l) δ(y_t) ℓ(f(x_t), y_t) + (λ1/2) ‖f‖²_H + λ2 Σ_{i=1}^{t−1} (f(x_i) − f(x_t))² W_it

Involves the graph edges between x_t and all previous examples
J(f) = (1/T) Σ_{t=1}^T J_t(f)


Use gradient descent to update:

f_{t+1} = f_t − η_t ∂J_t(f)/∂f |_{f_t},   with step size η_t = 1/√t

Iteratively update the kernel expansion f_t = Σ_{i=1}^{t−1} α_i^{(t)} K(x_i, ·):

1. For i = 1, …, t−1:
   α_i^{(t+1)} = (1 − η_t λ1) α_i^{(t)} − 2 η_t λ2 (f_t(x_i) − f_t(x_t)) W_it

2. For the new example:
   α_t^{(t+1)} = 2 η_t λ2 Σ_{i=1}^{t−1} (f_t(x_i) − f_t(x_t)) W_it − η_t (T/l) δ(y_t) ℓ′(f_t(x_t), y_t)

Space O(T): stores all previous examples
Time O(T²): each new instance connects to all previous ones
Both can be further reduced by approximation techniques
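A minimal sketch of the online update, assuming an RBF kernel, the hinge loss, and a synthetic stream in which only the first l rounds are labeled (all constants and data below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def K(a, b, gamma=1.0):
    """RBF kernel (an assumed choice; any kernel works)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

T, l, lam1, lam2 = 60, 10, 0.1, 0.1
xs, alphas = [], []            # support centers and coefficients of f_t

def f(x):
    return sum(a * K(xi, x) for a, xi in zip(alphas, xs))

for t in range(1, T + 1):
    label = 1.0 if t % 2 else -1.0
    x_t = rng.normal(2.0 * label, 0.5, size=2)   # two well-separated blobs
    y_t = label if t <= l else None              # only early rounds are labeled
    eta = 1.0 / np.sqrt(t)

    f_vals = [f(xi) for xi in xs]                # f_t at previous examples
    f_t = f(x_t)
    w = [K(xi, x_t) for xi in xs]                # graph edges to x_t

    # Coefficient updates from the instantaneous-risk gradient.
    for i in range(len(alphas)):
        alphas[i] = (1 - eta * lam1) * alphas[i] \
                    - 2 * eta * lam2 * (f_vals[i] - f_t) * w[i]
    a_t = 2 * eta * lam2 * sum((fv - f_t) * wi for fv, wi in zip(f_vals, w))
    if y_t is not None and y_t * f_t < 1:        # hinge subgradient is -y_t
        a_t -= eta * (T / l) * (-y_t)
    xs.append(x_t)
    alphas.append(a_t)
```

Each round touches all previous examples, which is exactly the O(T) space and O(T²) time cost noted above.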


An Empirical Example

Outline

1 Basics of Semi-supervised Learning
  Semi-supervised Learning
  Probabilistic Methods
  Co-training
  Graph-based Semi-supervised Learning
  Semi-supervised Support Vector Machine

2 Advanced Topics
  Theory of semi-supervised learning
  Advanced algorithms of semi-supervised learning
    Variational setting
    Large scale learning

3 An Empirical Example

4 Conclusion


Privacy exposure in social networks

Figure: number of users in Facebook.

User profiles are not complete



Friends (linked persons) may share similar properties
Information about a user's friends may expose the user's own information
⇓
How much of this context information can be exposed?
⇓
Semi-supervised methods seem to suit this scenario


Experiment

Objective: to expose which university a user comes from

Methods: SSL framework

Datasets: real-world data from Facebook and StudiVZ


Feature selection
  Users' profiles: top-3 completeness
  Relational information

Data translation
  Missing values: filled with the average value
  Similarity: cosine similarity
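The pre-processing steps above can be sketched as follows (the profile matrix below is invented for illustration):

```python
import numpy as np

# Hypothetical user-profile matrix: rows are users, columns are
# profile attributes, NaN marks a missing entry.
P = np.array([[1.0, 0.0, np.nan],
              [1.0, 0.2, 3.0],
              [0.0, 1.0, 2.0]])

# Missing values: fill with the column average of the observed entries.
col_mean = np.nanmean(P, axis=0)
filled = np.where(np.isnan(P), col_mean, P)

# Similarity: cosine similarity between user profiles, usable as
# edge weights W in the graph-based methods discussed earlier.
norms = np.linalg.norm(filled, axis=1)
S = filled @ filled.T / np.outer(norms, norms)
```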


Summary

Learn users' hidden attributes from relational information and profile similarity among users

SSL predicts sensitive information more accurately than supervised learning

Users' sensitive information is never fully secure, and protections are needed


Conclusion

Outline

1 Basics of Semi-supervised Learning
  Semi-supervised Learning
  Probabilistic Methods
  Co-training
  Graph-based Semi-supervised Learning
  Semi-supervised Support Vector Machine

2 Advanced Topics
  Theory of semi-supervised learning
  Advanced algorithms of semi-supervised learning
    Variational setting
    Large scale learning

3 An Empirical Example

4 Conclusion


Conclusion

Presented:

A brief introduction to semi-supervised learning
  Generative models
  Co-training
  Graph-based methods
  Semi-supervised support vector machines

Advanced topics in semi-supervised learning

An empirical evaluation of semi-supervised learning in online social network analysis


References

1. O. Chapelle, B. Schölkopf, and A. Zien. Semi-Supervised Learning. MIT Press, Cambridge, MA, 2006.
2. O. Chapelle and A. Zien. Semi-supervised classification by low density separation. Tenth International Workshop on Artificial Intelligence and Statistics, 2005.
3. R. Collobert, F. Sinz, J. Weston, and L. Bottou. Large scale transductive SVMs. Journal of Machine Learning Research, 2006.
4. T. Joachims. Transductive inference for text classification using support vector machines. ICML 1999.
5. X. Zhu. Semi-supervised learning literature survey. Technical report, Computer Sciences, University of Wisconsin-Madison, 2005.
6. X. Zhu, Z. Ghahramani, and J. D. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. ICML 2003.
7. X. Zhu and A. B. Goldberg. Introduction to Semi-Supervised Learning. Morgan & Claypool, 2009.
8. http://pages.cs.wisc.edu/~jerryzhu/pub/sslchicago09.pdf
9. http://www.cse.msu.edu/~cse847/slides/semisupervised-1.ppt


Q&A

Thanks for your attention!
