Knowledge Transfer via Multiple Model Local Structure Mapping
Jing Gao†, Wei Fan‡, Jing Jiang†, Jiawei Han†
†University of Illinois at Urbana-Champaign  ‡IBM T. J. Watson Research Center
KDD’08 Las Vegas, NV
Outline
• Introduction to transfer learning
• Related work
  – Sample selection bias
  – Semi-supervised learning
  – Multi-task learning
  – Ensemble methods
• Learning from one or multiple source domains
  – Locally weighted ensemble framework
  – Graph-based heuristic
• Experiments
• Conclusions
Standard Supervised Learning
training (labeled): New York Times
test (unlabeled): New York Times
Classifier accuracy: 85.5%
Ack. From Jing Jiang’s slides
In Reality……
training (labeled): Reuters
test (unlabeled): New York Times
Classifier accuracy: 64.1%
Labeled data from the test domain not available!
Ack. From Jing Jiang’s slides
Domain Difference → Performance Drop
• Ideal setting: train on New York Times (NYT), test on NYT → Classifier accuracy 85.5%
• Realistic setting: train on Reuters, test on NYT → Classifier accuracy 64.1%
Ack. From Jing Jiang’s slides
Other Examples
• Spam filtering
  – Public email collection → personal inboxes
• Intrusion detection
  – Existing types of intrusions → unknown types of intrusions
• Sentiment analysis
  – Expert review articles → blog review articles
• The aim
  – To design learning methods that are aware of the difference between the training and test domains
• Transfer learning
  – Adapt the classifiers learnt from the source domain to the new domain
Outline
• Introduction to transfer learning
• Related work
  – Sample selection bias
  – Semi-supervised learning
  – Multi-task learning
  – Ensemble methods
• Learning from one or multiple source domains
  – Locally weighted ensemble framework
  – Graph-based heuristic
• Experiments
• Conclusions
Sample Selection Bias (Covariate Shift)
• Motivating examples
  – Loan approval
  – Drug testing
  – Training set: customers participating in the trials
  – Test set: the whole population
• Problems
  – Training and test distributions differ in P(x), but not in P(y|x)
  – But the difference in P(x) still affects learning performance
Sample Selection Bias (Covariate Shift)
Unbiased: 96.405%  Biased: 92.7%
Ack. From Wei Fan’s slides
Sample Selection Bias (Covariate Shift)
• Existing work
  – Reweight training examples according to the distribution difference and maximize the re-weighted likelihood
  – Estimate the probability of an observation being selected into the training set and use this probability to improve the model
  – Use P(x,y) to make predictions instead of using P(y|x)
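The reweighting idea in the first bullet can be sketched in a few lines. The density functions `p_train` and `p_test` below are hypothetical stand-ins: in practice the density ratio must itself be estimated (e.g. with a domain classifier), which the slide glosses over.

```python
# Sketch of importance weighting for covariate shift: each training
# example is weighted by p_test(x) / p_train(x) before maximizing the
# re-weighted likelihood. Densities here are toy, hypothetical stand-ins.

def importance_weights(xs, p_train, p_test):
    """Weight each training example by the density ratio p_test / p_train."""
    return [p_test(x) / p_train(x) for x in xs]

# Toy 1-D setting: training data uniform on [0, 1], test density skewed
# toward x = 1, so examples near 1 should count more.
p_train = lambda x: 1.0 if 0.0 <= x <= 1.0 else 0.0
p_test  = lambda x: 2.0 * x if 0.0 <= x <= 1.0 else 0.0

xs = [0.1, 0.5, 0.9]
ws = importance_weights(xs, p_train, p_test)
# Training examples that look like test examples receive larger weights.
```

Any downstream learner that accepts per-example weights (most likelihood-based classifiers do) can then be trained on `xs` with weights `ws`.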
Semi-supervised Learning (Transductive Learning)
Labeled Data + Unlabeled Data → Model → Test set (transductive)
• Applications and problems
  – Labeled examples are scarce but unlabeled data are abundant
  – Web page classification, review ratings prediction
Semi-supervised Learning (Transductive Learning)
• Existing work
  – Self-training
    • Give labels to unlabeled data
  – Generative models
    • Unlabeled data help get better estimates of the parameters
  – Transductive SVM
    • Maximize the unlabeled data margin
  – Graph-based algorithms
    • Construct a graph based on labeled and unlabeled data, propagate labels along the paths
  – Distance learning
    • Map the data into a different feature space where they could be better separated
Learning from Multiple Domains
• Multi-task learning
  – Learn several related tasks at the same time with shared representations
  – Single P(x) but multiple output variables
• Transfer learning
  – Two-stage domain adaptation: select generalizable features from training domains and specific features from the test domain
Ensemble Methods
• Improve over single models
  – Bayesian model averaging
  – Bagging, Boosting, Stacking
  – Our studies show their effectiveness in stream classification
• Model weights
  – Usually determined globally
  – Reflect the classification accuracy on the training set
Ensemble Methods
• Transfer learning
  – Generative models
    • Training and test data are generated from a mixture of different models
    • Use a Dirichlet Process prior to couple the parameters of several models from the same parameterized family of distributions
  – Non-parametric models
    • Boost the classifier with labeled examples which represent the true test distribution
Outline
• Introduction to transfer learning
• Related work
  – Sample selection bias
  – Semi-supervised learning
  – Multi-task learning
• Learning from one or multiple source domains
  – Locally weighted ensemble framework
  – Graph-based heuristic
• Experiments
• Conclusions
All Sources of Labeled Information
training (labeled): New York Times, Reuters, Newsgroup, ……
test (completely unlabeled)
Classifier: ?
A Synthetic Example
Training (have conflicting concepts)
Test
Partially overlapping
Goal
[Figure: multiple source domains feeding into one target domain]
• To unify the knowledge from multiple source domains (models) that is consistent with the test domain
Summary of Contributions
• Transfer from one or multiple source domains
  – Target domain has no labeled examples
• Do not need to re-train
  – Rely on base models trained from each domain
  – The base models are not necessarily developed for transfer learning applications
Locally Weighted Ensemble
• k base models M1, …, Mk, each trained on its own training set (x: feature value, y: class label)
• Per-model output: f_i(x, y) = P(y | x, M_i)
• Ensemble output: f_E(x, y) = Σ_{i=1}^k w_i(x) · f_i(x, y)
• Weights are local and normalized: Σ_{i=1}^k w_i(x) = 1
• Prediction on test example x: y|x = argmax_y f_E(x, y)
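The prediction rule above can be sketched in a few lines; the models and weight functions below are toy, illustrative stand-ins, not the paper's trained base models.

```python
# Minimal sketch of the locally weighted ensemble prediction rule:
#   f_E(x, y) = sum_i w_i(x) * f_i(x, y),  with sum_i w_i(x) = 1,
# where f_i(x, y) = P(y | x, M_i). Models and weights here are toy stand-ins.

def lwe_predict(x, models, weight_fns, labels):
    """Combine per-model posteriors with per-example weights; return argmax_y."""
    ws = [w(x) for w in weight_fns]
    total = sum(ws)
    ws = [w / total for w in ws]          # enforce sum_i w_i(x) = 1
    scores = {y: sum(w * m(x, y) for w, m in zip(ws, models)) for y in labels}
    return max(scores, key=scores.get)    # argmax_y f_E(x, y)

# Two toy "models" returning P(y | x, M_i) for labels {0, 1}.
m1 = lambda x, y: 0.9 if y == 1 else 0.1   # confident in class 1
m2 = lambda x, y: 0.3 if y == 1 else 0.7   # leans toward class 0
w1 = lambda x: 0.8                          # locally, m1 is trusted more
w2 = lambda x: 0.2

pred = lwe_predict(None, [m1, m2], [w1, w2], labels=[0, 1])
# With these local weights the ensemble follows m1 and predicts class 1.
```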
Modified Bayesian Model Averaging
• Bayesian Model Averaging: combine models M1, …, Mk on the test set using their posterior given the training data D:
  P(y | x) = Σ_{i=1}^k P(M_i | D) · P(y | x, M_i)
• Modified for Transfer Learning: replace the global model posterior P(M_i | D) with a per-example weight P(M_i | x):
  P(y | x) = Σ_{i=1}^k P(M_i | x) · P(y | x, M_i)
Global versus Local Weights
Training examples (M1, M2 give predicted scores for y; wg = global weight, wl = local weight):

| x | y | M1 | M1 wg | M1 wl | M2 | M2 wg | M2 wl |
|---|---|----|----|----|----|----|----|
| (2.40, 5.23) | 1 | 0.6 | 0.3 | 0.2 | 0.9 | 0.7 | 0.8 |
| (-2.69, 0.55) | 0 | 0.4 | 0.3 | 0.6 | 0.6 | 0.7 | 0.4 |
| (-3.97, -3.62) | 0 | 0.2 | 0.3 | 0.7 | 0.4 | 0.7 | 0.3 |
| (2.08, -3.73) | 0 | 0.1 | 0.3 | 0.5 | 0.1 | 0.7 | 0.5 |
| (5.08, 2.15) | 0 | 0.6 | 0.3 | 0.3 | 0.3 | 0.7 | 0.7 |
| (1.43, 4.48) | 1 | 1 | 0.3 | 1 | 0.2 | 0.7 | 0 |

• Locally weighting scheme
  – Weight of each model is computed per example
  – Weights are determined according to the models' performance on the test set, not the training set
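The contrast between global and local weighting can be made concrete with the first example in the slide's table, where M1 outputs 0.6 and M2 outputs 0.9 for the true class y = 1 (the combination rule is the generic weighted sum; the weights are the table's):

```python
# Toy contrast of global vs local weights on one example from the slide's
# table: M1 outputs 0.6 and M2 outputs 0.9 for the true class y = 1.
# Global weights (wg) are fixed per model; local weights (wl) vary with x.

def combine(p1, p2, w1, w2):
    """Weighted combination of two models' scores."""
    return w1 * p1 + w2 * p2

p_m1, p_m2 = 0.6, 0.9                          # model scores at this example
global_score = combine(p_m1, p_m2, 0.3, 0.7)   # wg from the table
local_score  = combine(p_m1, p_m2, 0.2, 0.8)   # wl at this example
# The local weights lean harder on the locally better model M2, so the
# combined score for the true class is higher than under global weights.
```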
Synthetic Example Revisited
Training (have conflicting concepts)
Test
Partially overlapping
[Figure: the synthetic example with the regions modeled by M1 and M2 marked on the training and test data]
Optimal Local Weights
• At test example x, the models output (P(C1 | x), P(C2 | x)): M1 → (0.9, 0.1), M2 → (0.4, 0.6); the true distribution is (0.8, 0.2), so M1 deserves the higher weight
• Optimal weights
  – Solution to a regression problem H w = f:
    [0.9 0.4; 0.1 0.6] · [w1; w2] = [0.8; 0.2], subject to Σ_{i=1}^k w_i(x) = 1
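Plugging the slide's numbers into H w = f gives a 2×2 system that can be solved directly; the Cramer's-rule helper below is just one way to do the solve.

```python
# The slide's optimal-weight problem: find w with H w = f, where column i
# of H holds model i's posterior at x and f is the true P(y | x).
# Plain exact 2x2 solve via Cramer's rule, using the slide's numbers.

def solve_2x2(a, b, c, d, e, f):
    """Solve [[a, b], [c, d]] @ [w1, w2] = [e, f]."""
    det = a * d - b * c
    return ((e * d - b * f) / det, (a * f - e * c) / det)

H = [[0.9, 0.4],   # P(C1 | x, M1), P(C1 | x, M2)
     [0.1, 0.6]]   # P(C2 | x, M1), P(C2 | x, M2)
f = [0.8, 0.2]     # true P(y | x)

w1, w2 = solve_2x2(H[0][0], H[0][1], H[1][0], H[1][1], f[0], f[1])
# The model closer to the truth (M1) receives the higher weight, and the
# solution already satisfies the constraint w1 + w2 = 1.
```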
Approximate Optimal Weights
• Optimal weights
  – Impossible to get, since the true f is unknown!
• How to approximate the optimal weights
  – M_i should be assigned a higher weight at x if P(y | M_i, x) is closer to the true P(y | x)
  – If some labeled examples are available in the target domain: use them to compute the weights
  – If none of the examples in the target domain are labeled: make assumptions about the relationship between feature values and class labels
Clustering-Manifold Assumption
Test examples that are closer in feature space are more likely to share the same class label.
Graph-based Heuristics
• Graph-based weights approximation
  – Map the structures of the models onto the test domain
[Figure: neighborhood graphs of M1 and M2 mapped onto the clustering structure of the test set, yielding a weight at x]
Graph-based Heuristics
• Local weights calculation
  – The weight of a model is proportional to the similarity between its neighborhood graph and the clustering structure around x
[Figure: the model whose neighborhood graph better matches the clustering structure around x receives the higher weight]
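One plausible reading of "similarity between the neighborhood graph and the clustering structure" is the overlap between the neighbors a model groups with x and the neighbors clustering groups with x. The Jaccard-style overlap below is our illustrative interpretation, not necessarily the paper's exact formula.

```python
# Hedged sketch of the graph-based local weight: a model's weight at x is
# taken proportional to the overlap between (a) x's neighbors that the
# model predicts into x's class and (b) x's neighbors that clustering puts
# into x's cluster. The Jaccard-style overlap is an illustrative choice.

def graph_weight(x, neighbors, model_label, cluster_id):
    """Overlap of model-agreeing and cluster-agreeing neighbors of x."""
    same_model = {n for n in neighbors if model_label[n] == model_label[x]}
    same_clust = {n for n in neighbors if cluster_id[n] == cluster_id[x]}
    union = same_model | same_clust
    if not union:
        return 0.0
    return len(same_model & same_clust) / len(union)

# Toy neighborhood of x = 0 with four neighbors 1..4.
model_label = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B"}  # model's predictions
cluster_id  = {0: 0,   1: 0,   2: 0,   3: 0,   4: 1}    # test-set clustering

w = graph_weight(0, [1, 2, 3, 4], model_label, cluster_id)
# same_model = {1, 2}, same_clust = {1, 2, 3} -> overlap 2 / 3
```

These per-model similarities would then be normalized across models so the weights at x sum to one.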
Local Structure Based Adjustment
• Why is adjustment needed?
  – It is possible that no model's structure is similar to the clustering structure at x
  – This simply means that the training information conflicts with the true target distribution at x
[Figure: both M1 and M2 disagree with the clustering structure at x, so both are in error there]
Local Structure Based Adjustment
• How to adjust?
  – Check whether the models' weights at x fall below a threshold
  – If so, ignore the training information and propagate the labels of x's neighbors in the test set to x
[Figure: labels propagated to x from its neighbors in the clustering structure, overriding M1 and M2]
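The adjustment step can be sketched as a simple fallback rule; the threshold value and the majority-vote propagation below are illustrative choices, not the paper's exact procedure.

```python
# Sketch of the adjustment step: if every model's local weight at x falls
# below a threshold delta, discard the models and predict by propagating
# labels from x's already-labeled test-set neighbors (a plain majority
# vote here; delta and the vote rule are illustrative choices).

from collections import Counter

def predict_with_adjustment(weights, model_pred, neighbor_labels, delta=0.1):
    """Use the weighted-model prediction unless all local weights are tiny."""
    if max(weights) >= delta:
        return model_pred            # trust the (weighted) models at x
    # otherwise fall back on the test set's own local structure
    return Counter(neighbor_labels).most_common(1)[0][0]

# All model weights are tiny at this x, so the neighbors decide.
label = predict_with_adjustment([0.02, 0.05], model_pred=1,
                                neighbor_labels=[0, 0, 1], delta=0.1)
```

When at least one weight clears the threshold, e.g. `predict_with_adjustment([0.5, 0.2], 1, [0, 0, 1])`, the ensemble's own prediction is kept.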
Verify the Assumption
• Need to check the validity of this assumption
  – Still, P(y|x) is unknown
  – How to choose the appropriate clustering algorithm?
• Findings from real data sets
  – This property is usually determined by the nature of the task
  – Positive cases: document categorization
  – Negative cases: sentiment classification
  – Could validate this assumption on the training set
Algorithm
1. Check Assumption
2. Neighborhood Graph Construction
3. Model Weight Computation
4. Weight Adjustment
Outline
• Introduction to transfer learning
• Related work
  – Sample selection bias
  – Semi-supervised learning
  – Multi-task learning
• Learning from one or multiple source domains
  – Locally weighted ensemble framework
  – Graph-based heuristic
• Experiments
• Conclusions
Data Sets
• Different applications
  – Synthetic data sets
  – Spam filtering: public email collection → personal inboxes (u01, u02, u03) (ECML/PKDD 2006)
  – Text classification: same top-level classification problems with different sub-fields in the training and test sets (Newsgroup, Reuters)
  – Intrusion detection data: different types of intrusions in training and test sets
Baseline Methods
• One source domain: single models
  – Winnow (WNN), Logistic Regression (LR), Support Vector Machine (SVM)
  – Transductive SVM (TSVM)
• Multiple source domains
  – SVM on each of the domains
  – TSVM on each of the domains
• Merge all source domains into one: ALL
  – SVM, TSVM
• Simple averaging ensemble: SMA
• Locally weighted ensemble without local structure based adjustment: pLWE
• Locally weighted ensemble: LWE
• Implementation
  – Classification: SNoW, BBR, LibSVM, SVMlight
  – Clustering: CLUTO package
Performance Measure
• Prediction Accuracy
  – 0-1 loss: accuracy
  – Squared loss: mean squared error
• Area Under ROC Curve (AUC)
  – Trade-off between true positive rate and false positive rate
  – Equals 1 for a perfect classifier
A Synthetic Example
Training (have conflicting concepts)
Test
Partially overlapping
Experiments on Synthetic Data
Spam Filtering
• Problems
  – Training set: public emails
  – Test set: personal emails from three users: U00, U01, U02

[Bar charts: Accuracy and MSE of WNN, LR, SVM, TSVM, SMA, pLWE, LWE on the three users]
20 Newsgroup
• Six binary classification tasks between top-level categories: C vs S, R vs T, R vs S, C vs T, C vs R, S vs T
[Bar charts: Accuracy and MSE of WNN, LR, SVM, TSVM, SMA, pLWE, LWE on the 20 Newsgroup tasks]
Reuters
• Problems
  – Orgs vs People (O vs Pe)
  – Orgs vs Places (O vs Pl)
  – People vs Places (Pe vs Pl)

[Bar charts: Accuracy and MSE of WNN, LR, SVM, TSVM, SMA, pLWE, LWE on the three Reuters tasks]
Intrusion Detection
• Problems (Normal vs Intrusions)
  – Normal vs R2L (1)
  – Normal vs Probing (2)
  – Normal vs DOS (3)
• Tasks (train on two, test on the third)
  – 2 + 1 → 3 (DOS)
  – 3 + 1 → 2 (Probing)
  – 3 + 2 → 1 (R2L)
Parameter Sensitivity
• Parameters
  – Selection threshold in local structure based adjustment
  – Number of clusters
Outline
• Introduction to transfer learning
• Related work
  – Sample selection bias
  – Semi-supervised learning
  – Multi-task learning
• Learning from one or multiple source domains
  – Locally weighted ensemble framework
  – Graph-based heuristic
• Experiments
• Conclusions
Conclusions
• Locally weighted ensemble framework
  – Transfer useful knowledge from multiple source domains
• Graph-based heuristics to compute weights
  – Make the framework practical and effective
Feedbacks
• Transfer learning is a real problem
  – Spam filtering
  – Sentiment analysis
• Learning from multiple source domains is useful
  – Relax the assumption
  – Determine parameters
Thanks!
• Any questions?
http://www.ews.uiuc.edu/~jinggao3/kdd08transfer.htm
Office: 2119B