Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models
DESCRIPTION
Transcript of the presentation "Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models" by Jing Gao (1), Feng Liang (2), Wei Fan (3), Yizhou Sun (1), Jiawei Han (1); (1) CS, UIUC; (2) STAT, UIUC; (3) IBM T. J. Watson.
Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models
Jing Gao1, Feng Liang2, Wei Fan3, Yizhou Sun1, Jiawei Han1
1 CS, UIUC  2 STAT, UIUC  3 IBM T. J. Watson
A Toy Example
[Figure: outputs of four base models on seven objects x1–x7: two classification models assigning class labels 1, 2, 3 and two clustering models assigning cluster IDs 1, 2, 3.]
Motivations
• Consensus maximization
  – Combine outputs of multiple supervised and unsupervised models on a set of objects for better label predictions
  – The predicted labels should agree with the base models as much as possible
• Motivations
  – Unsupervised models provide useful constraints for classification tasks
  – Model diversity improves prediction accuracy and robustness
  – Model combination at the output level is needed in distributed computing or privacy-preserving applications
Related Work (1)
• Single models
  – Supervised: SVM, logistic regression, ...
  – Unsupervised: K-means, spectral clustering, ...
  – Semi-supervised learning, collective inference
• Supervised ensembles
  – Require raw data and labels: bagging, boosting, Bayesian model averaging
  – Require labels: mixture of experts, stacked generalization
  – Majority voting works at the output level and does not require labels
Related Work (2)
• Unsupervised ensembles
  – Find a consensus clustering from multiple partitionings without accessing the features
• Multi-view learning
  – A joint model is learnt from both labeled and unlabeled data from multiple sources
  – It can be regarded as a semi-supervised ensemble requiring access to the raw data
Related Work (3)

|                 | Single Models                                  | Ensemble at Raw Data                                                                                           | Ensemble at Output Level |
|-----------------|------------------------------------------------|----------------------------------------------------------------------------------------------------------------|--------------------------|
| Supervised      | SVM, Logistic Regression, ...                  | Bagging, Boosting, Bayesian Model Averaging (raw data + labels); Mixture of Experts, Stacked Generalization (labels) | Majority Voting          |
| Unsupervised    | K-means, Spectral Clustering, ...              |                                                                                                                | Clustering Ensemble      |
| Semi-supervised | Semi-supervised Learning, Collective Inference | Multi-view Learning                                                                                            | Consensus Maximization   |
Groups-Objects
[Figure: the same four base models' outputs on objects x1–x7; each model's three classes/clusters are treated as groups, giving twelve groups g1–g12 (three groups per model).]
Bipartite Graph
[Figure: bipartite graph between object nodes and group nodes produced by models M1, M2, M3, M4, ...; groups from the classification models carry initial labels [1 0 0], [0 1 0], [0 0 1].]
• u_i = [u_{i1}, ..., u_{ic}]: conditional probability vector of object i
• q_j = [q_{j1}, ..., q_{jc}]: conditional probability vector of group j
• Adjacency: a_{ij} = 1 if object i is connected to group j (the model's output assigns object i to group j), 0 otherwise
• Initial probability: for a group g_j output by a classification model with predicted class z, y_j = [0 ... 0 1 0 ... 0] with the 1 in position z
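As a concrete sketch of this construction (our own illustrative code, not the authors'; the function and variable names are ours): each base model contributes one group per class or cluster ID, a_{ij} = 1 when the model's output places object i in group j, and groups from classifiers get one-hot initial vectors y_j.

```python
import numpy as np

def build_bipartite(model_outputs, n_classes):
    """Build the object-group adjacency matrix A (n x v), the initial
    group labels Y (v x c), and a mask of which groups came from
    classification models.
    model_outputs: list of (is_classifier, labels) pairs, one per model."""
    A_cols, Y_rows, supervised = [], [], []
    for is_classifier, labels in model_outputs:
        for z in range(n_classes):          # one group per class / cluster id
            A_cols.append(np.array([1.0 if lab == z else 0.0 for lab in labels]))
            # classifier groups get a one-hot initial label; cluster groups get none
            Y_rows.append(np.eye(n_classes)[z] if is_classifier
                          else np.zeros(n_classes))
            supervised.append(is_classifier)
    return np.column_stack(A_cols), np.vstack(Y_rows), np.array(supervised)
```

For the toy setting of four models (two classifiers, two clusterers) over seven objects, this yields A of shape (7, 12), matching groups g1–g12.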
Objective
[Figure: the same bipartite graph of objects and groups from models M1–M4.]
Minimize disagreement:

\min_{Q,U} \sum_{i=1}^{n} \sum_{j=1}^{v} a_{ij} \|u_i - q_j\|^2 + \alpha \sum_{j=1}^{s} \|q_j - y_j\|^2

• First term: an object's conditional probability vector should be similar to those of the groups it is connected to
• Second term: a group's conditional probability vector should not deviate much from its initial probability
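To make the two terms concrete, here is a small evaluation of the objective in the notation above (an illustrative sketch; the helper name and argument order are ours):

```python
import numpy as np

def consensus_objective(U, Q, A, Y, alpha, s):
    """sum_{i,j} a_ij ||u_i - q_j||^2 + alpha * sum_{j<=s} ||q_j - y_j||^2.
    U: n x c object vectors, Q: v x c group vectors, A: n x v adjacency,
    Y: initial group labels, s: number of groups from classifiers."""
    diff = U[:, None, :] - Q[None, :, :]              # n x v x c differences
    disagreement = (A * (diff ** 2).sum(axis=2)).sum()
    deviation = ((Q[:s] - Y[:s]) ** 2).sum()
    return disagreement + alpha * deviation
```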
Methodology
[Figure: the same bipartite graph of objects and groups from models M1–M4.]
Iterate until convergence:
• Update probability of an object:

u_i = \frac{\sum_{j=1}^{v} a_{ij} q_j}{\sum_{j=1}^{v} a_{ij}}

• Update probability of a group:

q_j = \frac{\sum_{i=1}^{n} a_{ij} u_i + \alpha y_j}{\sum_{i=1}^{n} a_{ij} + \alpha}
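The alternating updates above can be sketched end to end. This is our illustrative implementation (function name, α value, and iteration count are our choices, not from the slides): objects are set to the average of their groups, and groups to the average of their objects, with classifier groups pulled toward their initial labels.

```python
import numpy as np

def bgcm(A, Y, supervised, alpha=2.0, n_iter=100):
    """Alternately average group vectors into objects and object vectors
    into groups, pulling classifier groups toward their initial labels y_j
    with strength alpha; groups from clustering models are unconstrained."""
    c = Y.shape[1]
    lam = alpha * supervised.astype(float)            # per-group pull strength
    # start classifier groups at y_j, cluster groups at the uniform vector
    Q = np.where(supervised[:, None], Y, 1.0 / c)
    for _ in range(n_iter):
        U = (A @ Q) / A.sum(axis=1, keepdims=True)    # object update
        Q = (A.T @ U + lam[:, None] * Y) / (A.sum(axis=0) + lam)[:, None]  # group update
    return U, Q
```

Each row of U remains a probability vector, and the consensus prediction for object i is argmax_z u_{iz}.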
Constrained Embedding

The consensus objective

\min_{Q,U} \sum_{i=1}^{n} \sum_{j=1}^{v} a_{ij} \|u_i - q_j\|^2 + \alpha \sum_{j=1}^{s} \|q_j - y_j\|^2

can be viewed as a constrained embedding of objects and groups:

\min_{Q,U} \sum_{j=1}^{v} \sum_{z=1}^{c} \left| q_{jz} - \frac{\sum_{i=1}^{n} a_{ij} u_{iz}}{\sum_{i=1}^{n} a_{ij}} \right| \quad \text{s.t. } q_{jz} = 1 \text{ if } g_j\text{'s label is } z

(the constraints come from the groups produced by classification models).
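A small numeric sketch of this embedding view (our code and our reading of the slide: each group coordinate q_{jz} is compared with the adjacency-weighted average of its objects' coordinates):

```python
import numpy as np

def embedding_deviation(U, Q, A):
    """sum_j sum_z |q_jz - (sum_i a_ij u_iz) / (sum_i a_ij)|: how far each
    group's embedding is from the average embedding of its objects."""
    group_avg = (A.T @ U) / A.sum(axis=0)[:, None]
    return np.abs(Q - group_avg).sum()
```

Under this reading, enforcing the constraint q_{jz} = 1 for a classifier group labeled z amounts to fixing that row of Q to a one-hot vector before minimizing.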
Ranking on Consensus Structure
[Figure: the same bipartite graph of objects and groups from models M1–M4.]
Substituting the object update into the group update gives a fixed-point iteration of personalized-ranking form (shown per class column z):

q_{\cdot z} = (D_v + \alpha \tilde{I})^{-1} \left( A^{T} D_n^{-1} A \, q_{\cdot z} + \alpha \tilde{I} \, y_{\cdot z} \right)

• A: adjacency matrix of the bipartite graph
• y_{\cdot z}: the query (initial labels)
• (D_v + \alpha \tilde{I})^{-1}: personalized damping factors, where D_n, D_v are the degree matrices of object and group nodes and \tilde{I} indicates groups from classification models
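This matrix-form reading can be checked numerically against the elementwise updates (our sketch; names such as `ranking_step` and `lam` are ours):

```python
import numpy as np

def ranking_step(Q, A, Y, lam):
    """One step of Q <- (D_v + Lam)^(-1) (A^T D_n^(-1) A Q + Lam Y):
    a random-walk / personalized-ranking style iteration on the bipartite
    graph with A as adjacency, Y as the query, and Lam as per-group damping."""
    Dn_inv = 1.0 / A.sum(axis=1)                      # inverse object degrees
    U = Dn_inv[:, None] * (A @ Q)                     # implicit object update
    return (A.T @ U + lam[:, None] * Y) / (A.sum(axis=0) + lam)[:, None]
```

One call of `ranking_step` reproduces exactly one object update followed by one group update, which is why the iterative method and the ranking view coincide.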
Incorporating Labeled Information
[Figure: the same bipartite graph of objects and groups from models M1–M4.]
Objective: add a term pulling the l labeled objects toward their ground-truth label vectors f_i:

\min_{Q,U} \sum_{i=1}^{n} \sum_{j=1}^{v} a_{ij} \|u_i - q_j\|^2 + \alpha \sum_{j=1}^{s} \|q_j - y_j\|^2 + \beta \sum_{i=1}^{l} \|u_i - f_i\|^2

Update probability of a group:

q_j = \frac{\sum_{i=1}^{n} a_{ij} u_i + \alpha y_j}{\sum_{i=1}^{n} a_{ij} + \alpha}

Update probability of a labeled object (unlabeled objects keep the original update):

u_i = \frac{\sum_{j=1}^{v} a_{ij} q_j + \beta f_i}{\sum_{j=1}^{v} a_{ij} + \beta}
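The modified object update can be sketched as follows (the symbol β and the mask handling are our illustrative choices):

```python
import numpy as np

def update_objects_semisup(Q, A, F, labeled, beta=2.0):
    """Object update with ground truth: labeled objects are pulled toward
    their true label vectors f_i with strength beta; unlabeled objects
    keep the plain adjacency-weighted average of their groups."""
    b = beta * labeled.astype(float)                  # per-object pull strength
    return (A @ Q + b[:, None] * F) / (A.sum(axis=1) + b)[:, None]
```

Setting beta = 0 for every object recovers the unsupervised update, so one function covers both cases.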
Experiments-Data Sets
• 20 Newsgroups
  – newsgroup message categorization
  – only text information available
• Cora
  – research paper area categorization
  – paper abstracts and citation information available
• DBLP
  – researcher area prediction
  – publication and co-authorship network, and publication content
  – conferences' areas are known
Experiments-Baseline Methods (1)
• Single models
  – 20 Newsgroups: logistic regression, SVM, K-means, min-cut
  – Cora: abstracts, citations (with or without a labeled set)
  – DBLP: publication titles, links (with or without labels from conferences)
• Proposed method
  – BGCM
  – BGCM-L: semi-supervised version combining four models
  – 2-L: two models
  – 3-L: three models
Experiments-Baseline Methods (2)
• Ensemble approaches
  – clustering ensemble on all of the four models: MCLA, HBGF
Accuracy (1)
Accuracy (2)
Conclusions
• Summary
  – Combine the complementary predictive powers of multiple supervised and unsupervised models
  – Lossless summarization of base model outputs in a group-object bipartite graph
  – Propagate labeled information between group and object nodes iteratively
  – Two interpretations: constrained embedding and ranking on consensus structure
  – Results on various data sets show the benefits