Relaxed Transfer of Different Classes via Spectral Partition
1
Relaxed Transfer of Different Classes
via Spectral Partition
Xiaoxiao Shi 1   Wei Fan 2   Qiang Yang 3   Jiangtao Ren 4
1 University of Illinois at Chicago   2 IBM T. J. Watson Research Center
3 Hong Kong University of Science and Technology   4 Sun Yat-sen University
1. Unsupervised. 2. Can use data with different classes to help. How so?
2
What is Transfer Learning?
[Figure: standard supervised learning. A classifier is trained on labeled New York Times articles and applied to unlabeled New York Times test articles, reaching 85.5% accuracy.]
3

What is Transfer Learning?

In reality, labeled data are insufficient! [Figure: with only a small labeled New York Times training set, the same classifier reaches just 47.3% accuracy on the unlabeled New York Times test set.] How to improve the performance?
4
What is Transfer Learning?
[Figure: a transfer classifier is trained on a labeled source domain (Reuters) and applied to an unlabeled target domain (New York Times), reaching 82.6% accuracy.]

The source data need not come from the same domain as the target, and the two need not follow the same distribution.
5
Transfer across Different Class Labels

[Figure: the same transfer setting (labeled Reuters source, unlabeled New York Times target, 82.6% accuracy), but now the two domains carry different label sets.]

Since they are from different domains, they may have different class labels!

Source (Reuters) labels: Markets, Politics, Entertainment, Blogs, …
Target (New York Times) labels: World, U.S., Fashion & Style, Travel, …

How to transfer when class labels are different in number and in meaning?
6
Two Main Categories of Transfer Learning

• Unsupervised Transfer Learning
  – No labeled data from the target domain.
  – Use the source domain to help learning.
  – Question: is it better than clustering?
• Supervised Transfer Learning
  – A limited number of labeled examples from the target domain.
  – Question: is it better than not using any source examples?
7
Transfer across Different Class Labels

• Two sub-problems:
  – (1) What and how to transfer? We cannot explicitly use P(x|y) or P(y|x) to build similarity among the tasks, because the class labels y have different meanings.
  – (2) How to avoid negative transfer, since the tasks may come from very different domains?

Negative transfer: when the tasks are too different, transfer learning may hurt learning accuracy.
8
The proposed solution
• (1) What and how to transfer?
  – Transfer the eigenspace.

Eigenspace: the space spanned by a set of eigenvectors.

[Figure, left: a 2-D dataset exhibiting complex cluster shapes; k-means performs very poorly in this space due to its bias toward dense spherical clusters.]
[Figure, right: in the eigenspace (the space given by the eigenvectors), the clusters are trivial to separate: spectral clustering.]
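To make the eigenspace idea concrete, here is a minimal spectral clustering sketch in Python. This is not the paper's code: the RBF affinity, scikit-learn, and all parameter choices are assumptions for illustration. It builds an affinity graph, takes the eigenvectors of the normalized Laplacian with the smallest eigenvalues (these span the eigenspace), and runs k-means there.

import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def spectral_clustering(X, k, gamma=1.0):
    """Cluster the rows of X into k groups via the normalized Laplacian."""
    W = rbf_kernel(X, gamma=gamma)          # affinity matrix (assumed RBF)
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt
    # Eigenvectors of the k smallest eigenvalues span the eigenspace.
    _, V = eigh(L_sym, subset_by_index=[0, k - 1])
    V /= np.linalg.norm(V, axis=1, keepdims=True) + 1e-12
    # Clusters that are non-convex in input space are simple to separate here.
    return KMeans(n_clusters=k, n_init=10).fit_predict(V)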
9
10
The proposed solution

• (2) How to avoid negative transfer?
  – A new clustering-based KL divergence reflects how different the two distributions are.
  – If the distributions are too different (the KL divergence is large), automatically decrease the influence of the source domain.
Traditional KL divergence: KL(P‖Q) = Σx P(x) log(P(x)/Q(x)). It requires P(x) and Q(x) for every x, which are normally difficult to obtain.

To get the clustering-based KL divergence:
(1) Perform clustering on the combined dataset.
(2) Calculate the KL divergence from basic statistical properties of the clusters. See the example on the next slide.
11
An Example
Cluster the combined dataset (source sample P plus target sample Q, 15 examples in total) into clusters C1 and C2, and read off simple per-cluster statistics. Here S(P′, C) denotes the portion of the examples in cluster C that come from P.

S(P′, C1) = 0.5   (portion of C1's examples coming from P)
S(Q′, C1) = 0.5   (portion of C1's examples coming from Q)
S(P′, C2) = 5/9   (portion of C2's examples coming from P)
S(Q′, C2) = 4/9   (portion of C2's examples coming from Q)

E(P) = 8/15,  E(Q) = 7/15
P′(C1) = 3/15,  Q′(C1) = 3/15,  P′(C2) = 5/15,  Q′(C2) = 4/15

KL = 0.0309
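A minimal Python sketch of this computation. This is my own instantiation rather than the paper's code: it estimates each sample's distribution by its cluster-occupancy histogram and returns a smoothed, symmetrized KL divergence, so the exact weighting (and hence the constant 0.0309 above) may differ from the paper's estimator.

import numpy as np
from sklearn.cluster import KMeans

def clustering_based_kl(X_src, X_tgt, n_clusters=2, eps=1e-9):
    """Approximate the divergence between two samples via shared clusters."""
    X = np.vstack([X_src, X_tgt])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    src, tgt = labels[:len(X_src)], labels[len(X_src):]
    # Cluster-occupancy histograms play the role of P(x) and Q(x).
    p = np.bincount(src, minlength=n_clusters) / len(X_src)
    q = np.bincount(tgt, minlength=n_clusters) / len(X_tgt)
    p, q = p + eps, q + eps                  # smooth away empty clusters
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

A large value signals very different distributions, which is what triggers the down-weighting of the source domain.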
12
Objective Function
• Objective: find an eigenspace that separates the target data well.
  – Intuition: if the source data is similar to the target data, make good use of the source eigenspace;
  – otherwise, keep the original structure of the target data.

[Equation figure: a traditional normalized-cut term plus a penalty term. One side prefers the source eigenspace, the other prefers the original target structure, and the trade-off is balanced by R(L; U): the more similar the distributions, the smaller R(L; U), and the more the objective relies on the source eigenspace constraint TL.]
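The equation itself is an image that did not survive transcription; the following LaTeX schematic is only a sketch of its shape, reconstructed from the labels above, with the weighting function λ(·) and the exact penalty forms being my assumptions:

\min_{f}\;
\underbrace{\mathrm{Ncut}(f)}_{\text{traditional normalized cut}}
\;+\;
\underbrace{\lambda\!\big(R(L;U)\big)\,\lVert T_L f\rVert^{2}
          \;+\; \mu\,\lVert T_U f\rVert^{2}}_{\text{penalty term}}

where λ(·) decreases as R(L; U) grows, so a large distribution gap weakens the pull toward the source eigenspace constraint TL and strengthens reliance on the target's own structure TU.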
13
How to construct the constraints TL and TU?

• Principle:
  – TL is derived directly from the "must-link" constraints given by the labeled source data: examples with the same label should be together.
  – TU: (1) perform standard spectral clustering (e.g., Ncut) on U; (2) examples in the same cluster should be together.

[Figure: two example graphs over nodes 1-6. Left: 1, 2, 4 should be together (blue) and 3, 5, 6 should be together (red). Right: 1, 2, 3 should be together and 4, 5, 6 should be together.]
14
How to construct the constraints TL and TU?

• Construct the constraint matrix M = [m1, m2, …, mr]′, one row per constrained pair.

For example, for the graph over nodes 1-6:

       | 1  -1   0   0   0   0 |   (1 and 2)
  ML = | 1   0   0  -1   0   0 |   (1 and 4)
       | 0   0   1   0  -1   0 |   (3 and 5)
       | …                     |
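A small Python sketch of this construction (my own illustration; the function name and the 0-indexed pair format are assumptions): each must-link pair (i, j) becomes a row with +1 in column i and -1 in column j, so for a cluster-indicator vector f the residual M f is zero exactly when every linked pair lands in the same cluster.

import numpy as np

def must_link_matrix(pairs, n):
    """One row per must-link pair (i, j): +1 at column i, -1 at column j."""
    M = np.zeros((len(pairs), n))
    for r, (i, j) in enumerate(pairs):
        M[r, i], M[r, j] = 1.0, -1.0
    return M

# The pairs from the slide's example, converted to 0-indexed nodes.
M_L = must_link_matrix([(0, 1), (0, 3), (2, 4)], n=6)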
15
Experiment Data Sets
16
Experiment Data Sets
17
Text Classification
Task A: Comp1 vs Rec1. Source settings: 1: Comp2 vs Rec2; 2: 4 classes (Graphics, etc.); 3: 3 classes (crypt, etc.).
Task B: Org1 vs People1. Source settings: 1: Org2 vs People2; 2: 3 classes (Places, etc.); 3: 3 classes (crypt, etc.).

[Bar charts: classification accuracy of Full Transfer, No Transfer, and RSP under source settings 1-3 for each task.]
18
Image Classification
Task A: Homer vs Real Bear. Source settings: 1: Superman vs Teddy; 2: 3 classes (Cartman, etc.); 3: 4 classes (laptop, etc.).
Task B: Cartman vs Fern. Source settings: 1: Superman vs Bonsai; 2: 3 classes (Homer, etc.); 3: 4 classes (laptop, etc.).

[Bar charts: classification accuracy of Full Transfer, No Transfer, and RSP under source settings 1-3 for each task.]
19
Parameter Sensitivity
20

Conclusions

• Problem: transfer across tasks with different class labels.
• Two sub-problems:
  – (1) What and how to transfer? Transfer the eigenspace.
  – (2) How to avoid negative transfer? Propose an effective clustering-based KL divergence; if the KL divergence is large (the distributions are too different), decrease the influence of the source domain.
21
Thanks!
Datasets and code: http://www.cs.columbia.edu/~wfan/software.htm
22
How many clusters? Condition for Lemma 1 to be valid: in each cluster, the expected values of the target and source data are about the same.

Adaptively control the number of clusters to keep Lemma 1 valid:
– stop the bisecting clustering when a cluster contains only target or only source data, or
– when the difference between the expected values of the target and source data inside the cluster is close to 0.
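A hedged Python sketch of this stopping rule (my own reading of the slide: the tolerance tol, the minimum cluster size, and the use of 2-means for bisection are all assumptions):

import numpy as np
from sklearn.cluster import KMeans

def bisect_adaptively(X, is_source, tol=1e-2, min_size=4):
    """Recursively bisect; stop when a cluster is single-domain or the
    source/target means inside it are approximately the same (Lemma 1)."""
    src, tgt = X[is_source], X[~is_source]
    if len(src) == 0 or len(tgt) == 0:       # only one domain left
        return [np.arange(len(X))]
    gap = np.linalg.norm(src.mean(axis=0) - tgt.mean(axis=0))
    if gap < tol or len(X) < min_size:       # expected values about the same
        return [np.arange(len(X))]
    halves = KMeans(n_clusters=2, n_init=10).fit_predict(X)
    clusters = []
    for h in (0, 1):
        idx = np.where(halves == h)[0]
        for sub in bisect_adaptively(X[idx], is_source[idx], tol, min_size):
            clusters.append(idx[sub])        # map back to this level's indices
    return clusters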
23
Optimization and Algorithm Flow

[Equation slides: the "Let …" substitution and the "Then …" solution steps were images and are not preserved in the transcript.]