
Page 1: Relaxed Transfer of  Different Classes  via Spectral Partition

Relaxed Transfer of Different Classes via Spectral Partition

Xiaoxiao Shi (1)   Wei Fan (2)   Qiang Yang (3)   Jiangtao Ren (4)

(1) University of Illinois at Chicago   (2) IBM T. J. Watson Research Center
(3) Hong Kong University of Science and Technology   (4) Sun Yat-sen University

1. Unsupervised
2. Can use data with different classes to help. How so?

Page 2: Relaxed Transfer of  Different Classes  via Spectral Partition

What is Transfer Learning?

Standard Supervised Learning

[Figure: a classifier is trained on labeled New York Times articles and tested on unlabeled New York Times articles from the same domain, reaching 85.5% accuracy.]

Page 3: Relaxed Transfer of  Different Classes  via Spectral Partition

What is Transfer Learning?

In Reality...

[Figure: the same New York Times setup, but the labeled data are insufficient; accuracy drops to 47.3%. How can we improve the performance?]

Page 4: Relaxed Transfer of  Different Classes  via Spectral Partition

What is Transfer Learning?

[Figure: a transfer classifier is trained on a labeled source domain (Reuters) and applied to an unlabeled target domain (New York Times), reaching 82.6% accuracy.]

The source and target are not necessarily from the same domain and need not follow the same distribution.

Page 5: Relaxed Transfer of  Different Classes  via Spectral Partition

Transfer across Different Class Labels

[Figure: the same Reuters-to-New York Times transfer setup. Since the two corpora come from different domains, they may have different class labels.]

Reuters labels: Markets, Politics, Entertainment, Blogs, ...
New York Times labels: World, U.S., Fashion & Style, Travel, ...

How to transfer when the class labels are different in number and meaning?

Page 6: Relaxed Transfer of  Different Classes  via Spectral Partition

Two Main Categories of Transfer Learning

• Unsupervised Transfer Learning
  – Does not have any labeled data from the target domain.
  – Uses the source domain to help learning.
  – Question: is it better than clustering?

• Supervised Transfer Learning
  – Has a limited number of labeled examples from the target domain.
  – Question: is it better than not using any source data examples?

Page 7: Relaxed Transfer of  Different Classes  via Spectral Partition

Transfer across Different Class Labels

• Two sub-problems:
  – (1) What and how to transfer, since we cannot explicitly use P(x|y) or P(y|x) to build the similarity among tasks (the class labels 'y' have different meanings)?
  – (2) How to avoid negative transfer, since the tasks may be from very different domains?

Negative transfer: when the tasks are too different, transfer learning may hurt learning accuracy.

Page 8: Relaxed Transfer of  Different Classes  via Spectral Partition

The Proposed Solution

• (1) What and how to transfer?
  – Transfer the eigenspace.

Eigenspace: the space spanned by a set of eigenvectors.

[Figure, left: a dataset that exhibits complex cluster shapes. K-means performs very poorly in this space due to its bias toward dense spherical clusters.]

[Figure, right: in the eigenspace (the space given by the eigenvectors), the clusters are trivial to separate -- spectral clustering.]
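The contrast in the figure can be sketched in code. The following is a minimal numpy-only illustration of spectral clustering in general, not the authors' implementation: two concentric rings (complex cluster shapes that defeat k-means) become separable by a single-coordinate threshold once embedded into the eigenspace of a normalized graph Laplacian.

```python
import numpy as np

def spectral_embedding(X, sigma, k=2):
    """Embed points into the eigenspace of the normalized graph Laplacian."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    W = np.exp(-d2 / (2 * sigma ** 2))                   # RBF affinity graph
    np.fill_diagonal(W, 0.0)
    d_inv = 1.0 / np.sqrt(W.sum(1))
    # Normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(X)) - d_inv[:, None] * W * d_inv[None, :]
    _, vecs = np.linalg.eigh(L)       # eigh returns eigenvalues in ascending order
    return vecs[:, :k]                # eigenvectors of the k smallest eigenvalues

# Two concentric rings: non-spherical clusters on which k-means fails.
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
inner = np.c_[np.cos(t), np.sin(t)]
outer = np.c_[3 * np.cos(t), 3 * np.sin(t)]
X = np.vstack([inner, outer])

E = spectral_embedding(X, sigma=0.5)
# In the eigenspace, thresholding one coordinate separates the rings.
labels = (E[:, 1] > np.median(E[:, 1])).astype(int)
```

In the original space no spherical partition recovers the rings, but the second eigenvector (the Fiedler vector) is nearly constant within each ring and differs in sign across them, which is why the trivial threshold works.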

Page 9: Relaxed Transfer of  Different Classes  via Spectral Partition

[Slide content (figures/equations) not captured in this transcript.]

Page 10: Relaxed Transfer of  Different Classes  via Spectral Partition

The Proposed Solution

• (2) How to avoid negative transfer?
  – A new clustering-based KL divergence to reflect distribution differences.
  – If the distributions are too different (the KL divergence is large), automatically decrease the effect of the source domain.

Traditional KL divergence: one needs to estimate P(x) and Q(x) for every x, which is normally difficult to obtain.

To get the clustering-based KL divergence:
(1) Perform clustering on the combined dataset.
(2) Calculate the KL divergence from basic statistical properties of the clusters. (See the example on the next slide.)

Page 11: Relaxed Transfer of  Different Classes  via Spectral Partition

An Example

The combined dataset of P and Q is clustered into two clusters, C1 and C2. C1 contains 3 examples from P and 3 from Q; C2 contains 5 from P and 4 from Q.

S(P', C) denotes the share of cluster C's examples that come from P:
  S(P', C1) = 0.5    S(Q', C1) = 0.5
  S(P', C2) = 5/9    S(Q', C2) = 4/9

Overall proportions: E(P) = 8/15, E(Q) = 7/15.
Per-cluster proportions: P'(C1) = 3/15, Q'(C1) = 3/15, P'(C2) = 5/15, Q'(C2) = 4/15.

Resulting clustering-based KL divergence: KL = 0.0309.
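A minimal sketch of the idea, my own illustration rather than the paper's exact estimator (which may normalize differently, and whose value of 0.0309 this sketch does not reproduce): estimate each domain's discrete distribution over the shared clusters and compute the KL divergence between those two distributions.

```python
import numpy as np

def clustering_kl(labels_p, labels_q, n_clusters, eps=1e-12):
    """KL(P || Q) estimated from cluster assignments of the combined data."""
    p = np.bincount(labels_p, minlength=n_clusters).astype(float)
    q = np.bincount(labels_q, minlength=n_clusters).astype(float)
    p = (p + eps) / (p + eps).sum()   # smoothed per-cluster proportions of P
    q = (q + eps) / (q + eps).sum()   # smoothed per-cluster proportions of Q
    return float(np.sum(p * np.log(p / q)))

# Mirroring the slide: cluster C1 holds 3 P-examples and 3 Q-examples,
# cluster C2 holds 5 P-examples and 4 Q-examples.
labels_p = np.array([0, 0, 0, 1, 1, 1, 1, 1])
labels_q = np.array([0, 0, 0, 1, 1, 1, 1])
kl = clustering_kl(labels_p, labels_q, n_clusters=2)
# kl is small but positive: the two domains occupy the clusters similarly.
```

This captures the key property the method needs: when source and target spread over the clusters in the same proportions the divergence is near zero, and it grows as the cluster occupancies diverge.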

Page 12: Relaxed Transfer of  Different Classes  via Spectral Partition

Objective Function

• Objective: find an eigenspace that separates the target data well.
  – Intuition: if the source data is similar to the target data, make good use of the source eigenspace;
  – Otherwise, keep the original structure of the target data.

[Equation: a traditional normalized-cut term plus a penalty term. The trade-off between "prefer the source eigenspace" and "prefer the original structure" is balanced by R(L, U): the more similar the two distributions, the smaller R(L, U) is, and the more the objective relies on the source eigenspace TL.]

Page 13: Relaxed Transfer of  Different Classes  via Spectral Partition

How to Construct the Constraints TL and TU?

• Principle:
  – TL is directly derived from the "must-link" constraints: examples with the same label should be together.
  – TU: (1) perform standard spectral clustering (e.g., Ncut) on U; (2) examples in the same cluster should be together.

[Figure, left: six labeled examples; 1, 2, 4 should be together (blue); 3, 5, 6 should be together (red).]

[Figure, right: six unlabeled examples clustered by Ncut; 1, 2, 3 should be together; 4, 5, 6 should be together.]

Page 14: Relaxed Transfer of  Different Classes  via Spectral Partition

How to Construct the Constraints TL and TU?

• Construct the constraint matrix M = [m1, m2, ..., mr]'

For example, for the six examples above, each must-link pair yields one row with +1 and -1 in the linked positions:

  ML = [ 1, -1,  0,  0,  0,  0      (1 and 2)
         1,  0,  0, -1,  0,  0      (1 and 4)
         0,  0,  1,  0, -1,  0      (3 and 5)
         ... ]^T
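The construction above can be sketched directly (a generic illustration; the variable names are mine, not the paper's): each must-link pair (i, j) becomes a row with +1 at position i and -1 at position j, so an indicator vector satisfies all the constraints exactly when M v = 0.

```python
import numpy as np

def must_link_matrix(pairs, n):
    """One row per must-link pair (i, j): +1 in column i, -1 in column j."""
    M = np.zeros((len(pairs), n))
    for r, (i, j) in enumerate(pairs):
        M[r, i] = 1.0
        M[r, j] = -1.0
    return M

# The pairs from the slide (slide numbering is 1-based; arrays here are 0-based).
pairs = [(0, 1), (0, 3), (2, 4)]          # 1 & 2, 1 & 4, 3 & 5
M = must_link_matrix(pairs, n=6)

v_ok = np.array([1.0, 1.0, 0.0, 1.0, 0.0, 0.0])   # respects every must-link
v_bad = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0])  # splits examples 1 and 2
```

Because M v = 0 holds only for constraint-respecting partitions, ||M v||^2 can serve as a penalty term on candidate partition vectors.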

Page 15: Relaxed Transfer of  Different Classes  via Spectral Partition

Experiment Data Sets

[Table not captured in the transcript.]

Page 16: Relaxed Transfer of  Different Classes  via Spectral Partition

Experiment Data Sets (continued)

[Table not captured in the transcript.]

Page 17: Relaxed Transfer of  Different Classes  via Spectral Partition

Text Classification

Task 1 -- Comp1 vs Rec1, transferred to: (1) comp2 vs Rec2; (2) 4 classes (Graphics, etc.); (3) 3 classes (crypt, etc.)
Task 2 -- Org1 vs People1, transferred to: (1) org2 vs People2; (2) 3 classes (Places, etc.); (3) 3 classes (crypt, etc.)

[Bar charts: accuracy of Full Transfer, No Transfer, and RSP on settings 1-3 of each task.]

Page 18: Relaxed Transfer of  Different Classes  via Spectral Partition

Image Classification

Task 1 -- Homer vs Real Bear, transferred to: (1) Superman vs Teddy; (2) 3 classes (cartman, etc.); (3) 4 classes (laptop, etc.)
Task 2 -- Cartman vs Fern, transferred to: (1) Superman vs Bonsai; (2) 3 classes (homer, etc.); (3) 4 classes (laptop, etc.)

[Bar charts: accuracy of Full Transfer, No Transfer, and RSP on settings 1-3 of each task.]

Page 19: Relaxed Transfer of  Different Classes  via Spectral Partition

Parameter Sensitivity

[Figure not captured in the transcript.]

Page 20: Relaxed Transfer of  Different Classes  via Spectral Partition

Conclusions

• Problem: transfer across tasks with different class labels.
• Two sub-problems:
  – (1) What and how to transfer? Transfer the eigenspace.
  – (2) How to avoid negative transfer? Propose an effective clustering-based KL divergence; if the KL divergence is large, i.e., the distributions are too different, decrease the effect of the source domain.

Page 21: Relaxed Transfer of  Different Classes  via Spectral Partition


Thanks!

Datasets and codes: http://www.cs.columbia.edu/~wfan/software.htm

Page 22: Relaxed Transfer of  Different Classes  via Spectral Partition

#Clusters?

Condition for Lemma 1 to be valid: in each cluster, the expected values of the target and source data are about the same.

Adaptively control the number of clusters to guarantee that Lemma 1 stays valid:
-- stop bisecting clustering when there is only target (or only source) data in the cluster, or
-- when [the quantity in the slide's equation, lost in transcription] is close to 0.
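The adaptive control above can be sketched as bisecting clustering with the two stop tests. This is my own illustration with hypothetical helper names; in particular, the second test assumes the quantity lost from the transcript is the gap between the source and target means inside the cluster, consistent with the Lemma 1 condition just stated.

```python
import numpy as np

def two_means(X, iters=20, seed=0):
    """Minimal 2-means used for one bisection step."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), 2, replace=False)].astype(float)
    assign = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for k in (0, 1):
            if (assign == k).any():
                centers[k] = X[assign == k].mean(0)
    return assign

def adaptive_bisect(X, is_source, tol=0.1, min_size=4):
    """Bisect clusters until each is 'pure' (only source or only target data)
    or the source/target means inside it nearly coincide."""
    work, final = [np.arange(len(X))], []
    while work:
        idx = work.pop()
        src = is_source[idx]
        if len(idx) < min_size or src.all() or (~src).all():
            final.append(idx)                 # pure or tiny cluster: stop
            continue
        gap = np.linalg.norm(X[idx][src].mean(0) - X[idx][~src].mean(0))
        if gap < tol:
            final.append(idx)                 # means agree: condition satisfied
            continue
        assign = two_means(X[idx])
        if assign.min() == assign.max():      # degenerate split: stop
            final.append(idx)
            continue
        work.append(idx[assign == 0])
        work.append(idx[assign == 1])
    return final

# A region shared by both domains plus a source-only region:
rng = np.random.default_rng(1)
shared = rng.normal(0.0, 0.1, (40, 2))
src_only = rng.normal(5.0, 0.1, (20, 2))
X = np.vstack([shared[:20], src_only, shared[20:]])
is_source = np.array([True] * 40 + [False] * 20)
clusters = adaptive_bisect(X, is_source)
```

The source-only region ends up isolated in a pure cluster, while the shared region stops splitting once its source and target means agree.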

Page 23: Relaxed Transfer of  Different Classes  via Spectral Partition

Optimization

[Slide shows the optimization derivation as equations ("Let ... Then, ...") and the algorithm flow; not captured in the transcript.]