Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models
DESCRIPTION
Transcript of the presentation "Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models" by Jing Gao (1), Feng Liang (2), Wei Fan (3), Yizhou Sun (1), Jiawei Han (1); (1) CS, UIUC; (2) STAT, UIUC; (3) IBM T. J. Watson.
Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models
Jing Gao1, Feng Liang2, Wei Fan3, Yizhou Sun1, Jiawei Han1
1 CS, UIUC  2 STAT, UIUC  3 IBM T. J. Watson
A Toy Example
[Figure: outputs of four base models on seven objects x1–x7: two classification models assigning class labels 1, 2, 3 and two clustering models assigning cluster IDs 1, 2, 3.]
Motivations
• Consensus maximization
  – Combine outputs of multiple supervised and unsupervised models on a set of objects for better label predictions
  – The predicted labels should agree with the base models as much as possible
• Motivations
  – Unsupervised models provide useful constraints for classification tasks
  – Model diversity improves prediction accuracy and robustness
  – Model combination at the output level is needed in distributed computing or privacy-preserving applications
Related Work (1)
• Single models
  – Supervised: SVM, logistic regression, ...
  – Unsupervised: K-means, spectral clustering, ...
  – Semi-supervised learning, collective inference
• Supervised ensembles
  – Require raw data and labels: bagging, boosting, Bayesian model averaging
  – Require labels: mixture of experts, stacked generalization
  – Majority voting works at the output level and does not require labels
Related Work (2)
• Unsupervised ensembles
  – Find a consensus clustering from multiple partitionings without accessing the features
• Multi-view learning
  – A joint model is learnt from both labeled and unlabeled data from multiple sources
  – It can be regarded as a semi-supervised ensemble requiring access to the raw data
Related Work (3)

|                 | Single Models                                  | Ensemble at Raw Data                                                                                           | Ensemble at Output Level |
|-----------------|------------------------------------------------|----------------------------------------------------------------------------------------------------------------|--------------------------|
| Supervised      | SVM, Logistic Regression, ...                  | Bagging, Boosting, Bayesian Model Averaging (raw data + labels); Mixture of Experts, Stacked Generalization (labels) | Majority Voting          |
| Unsupervised    | K-means, Spectral Clustering, ...              |                                                                                                                | Clustering Ensemble      |
| Semi-supervised | Semi-supervised Learning, Collective Inference | Multi-view Learning                                                                                            | Consensus Maximization   |
Groups-Objects
[Figure: the same four base models' outputs on objects x1–x7; each model's three classes/clusters are treated as groups, giving twelve groups g1–g12 (three groups per model).]
Bipartite Graph
[Figure: bipartite graph between object nodes and group nodes produced by models M1, M2, M3, M4, ...; groups from the classification models carry initial labels [1 0 0], [0 1 0], [0 0 1].]
• u_i = [u_{i1}, ..., u_{ic}]: conditional probability vector of object i
• q_j = [q_{j1}, ..., q_{jc}]: conditional probability vector of group j
• Adjacency: a_{ij} = 1 if object i is connected to group j (the model's output assigns object i to group j), 0 otherwise
• Initial probability: for a group g_j output by a classification model with predicted class z, y_j = [0 ... 0 1 0 ... 0] with the 1 in position z
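As a concrete sketch of this construction (our own illustrative code, not the authors'; the function and variable names are ours): each base model contributes one group per class or cluster ID, a_{ij} = 1 when the model's output places object i in group j, and groups from classifiers get one-hot initial vectors y_j.

```python
import numpy as np

def build_bipartite(model_outputs, n_classes):
    """Build the object-group adjacency matrix A (n x v), the initial
    group labels Y (v x c), and a mask of which groups came from
    classification models.
    model_outputs: list of (is_classifier, labels) pairs, one per model."""
    A_cols, Y_rows, supervised = [], [], []
    for is_classifier, labels in model_outputs:
        for z in range(n_classes):          # one group per class / cluster id
            A_cols.append(np.array([1.0 if lab == z else 0.0 for lab in labels]))
            # classifier groups get a one-hot initial label; cluster groups get none
            Y_rows.append(np.eye(n_classes)[z] if is_classifier
                          else np.zeros(n_classes))
            supervised.append(is_classifier)
    return np.column_stack(A_cols), np.vstack(Y_rows), np.array(supervised)
```

For the toy setting of four models (two classifiers, two clusterers) over seven objects, this yields A of shape (7, 12), matching groups g1–g12.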
Objective
[Figure: the same bipartite graph of objects and groups from models M1–M4.]
Minimize disagreement:

\min_{Q,U} \sum_{i=1}^{n} \sum_{j=1}^{v} a_{ij} \|u_i - q_j\|^2 + \alpha \sum_{j=1}^{s} \|q_j - y_j\|^2

• First term: an object's conditional probability vector should be similar to those of the groups it is connected to
• Second term: a group's conditional probability vector should not deviate much from its initial probability
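To make the two terms concrete, here is a small evaluation of the objective in the notation above (an illustrative sketch; the helper name and argument order are ours):

```python
import numpy as np

def consensus_objective(U, Q, A, Y, alpha, s):
    """sum_{i,j} a_ij ||u_i - q_j||^2 + alpha * sum_{j<=s} ||q_j - y_j||^2.
    U: n x c object vectors, Q: v x c group vectors, A: n x v adjacency,
    Y: initial group labels, s: number of groups from classifiers."""
    diff = U[:, None, :] - Q[None, :, :]              # n x v x c differences
    disagreement = (A * (diff ** 2).sum(axis=2)).sum()
    deviation = ((Q[:s] - Y[:s]) ** 2).sum()
    return disagreement + alpha * deviation
```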
Methodology
[Figure: the same bipartite graph of objects and groups from models M1–M4.]
Iterate until convergence:
• Update probability of an object:

u_i = \frac{\sum_{j=1}^{v} a_{ij} q_j}{\sum_{j=1}^{v} a_{ij}}

• Update probability of a group:

q_j = \frac{\sum_{i=1}^{n} a_{ij} u_i + \alpha y_j}{\sum_{i=1}^{n} a_{ij} + \alpha}
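The alternating updates above can be sketched end to end. This is our illustrative implementation (function name, α value, and iteration count are our choices, not from the slides): objects are set to the average of their groups, and groups to the average of their objects, with classifier groups pulled toward their initial labels.

```python
import numpy as np

def bgcm(A, Y, supervised, alpha=2.0, n_iter=100):
    """Alternately average group vectors into objects and object vectors
    into groups, pulling classifier groups toward their initial labels y_j
    with strength alpha; groups from clustering models are unconstrained."""
    c = Y.shape[1]
    lam = alpha * supervised.astype(float)            # per-group pull strength
    # start classifier groups at y_j, cluster groups at the uniform vector
    Q = np.where(supervised[:, None], Y, 1.0 / c)
    for _ in range(n_iter):
        U = (A @ Q) / A.sum(axis=1, keepdims=True)    # object update
        Q = (A.T @ U + lam[:, None] * Y) / (A.sum(axis=0) + lam)[:, None]  # group update
    return U, Q
```

Each row of U remains a probability vector, and the consensus prediction for object i is argmax_z u_{iz}.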
Constrained Embedding

The consensus objective

\min_{Q,U} \sum_{i=1}^{n} \sum_{j=1}^{v} a_{ij} \|u_i - q_j\|^2 + \alpha \sum_{j=1}^{s} \|q_j - y_j\|^2

can be viewed as a constrained embedding of objects and groups:

\min_{Q,U} \sum_{j=1}^{v} \sum_{z=1}^{c} \left| q_{jz} - \frac{\sum_{i=1}^{n} a_{ij} u_{iz}}{\sum_{i=1}^{n} a_{ij}} \right| \quad \text{s.t. } q_{jz} = 1 \text{ if } g_j\text{'s label is } z

(the constraints come from the groups produced by classification models).
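A small numeric sketch of this embedding view (our code and our reading of the slide: each group coordinate q_{jz} is compared with the adjacency-weighted average of its objects' coordinates):

```python
import numpy as np

def embedding_deviation(U, Q, A):
    """sum_j sum_z |q_jz - (sum_i a_ij u_iz) / (sum_i a_ij)|: how far each
    group's embedding is from the average embedding of its objects."""
    group_avg = (A.T @ U) / A.sum(axis=0)[:, None]
    return np.abs(Q - group_avg).sum()
```

Under this reading, enforcing the constraint q_{jz} = 1 for a classifier group labeled z amounts to fixing that row of Q to a one-hot vector before minimizing.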
Ranking on Consensus Structure
[Figure: the same bipartite graph of objects and groups from models M1–M4.]
Substituting the object update into the group update gives a fixed-point iteration of personalized-ranking form (shown per class column z):

q_{\cdot z} = (D_v + \alpha \tilde{I})^{-1} \left( A^{T} D_n^{-1} A \, q_{\cdot z} + \alpha \tilde{I} \, y_{\cdot z} \right)

• A: adjacency matrix of the bipartite graph
• y_{\cdot z}: the query (initial labels)
• (D_v + \alpha \tilde{I})^{-1}: personalized damping factors, where D_n, D_v are the degree matrices of object and group nodes and \tilde{I} indicates groups from classification models
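This matrix-form reading can be checked numerically against the elementwise updates (our sketch; names such as `ranking_step` and `lam` are ours):

```python
import numpy as np

def ranking_step(Q, A, Y, lam):
    """One step of Q <- (D_v + Lam)^(-1) (A^T D_n^(-1) A Q + Lam Y):
    a random-walk / personalized-ranking style iteration on the bipartite
    graph with A as adjacency, Y as the query, and Lam as per-group damping."""
    Dn_inv = 1.0 / A.sum(axis=1)                      # inverse object degrees
    U = Dn_inv[:, None] * (A @ Q)                     # implicit object update
    return (A.T @ U + lam[:, None] * Y) / (A.sum(axis=0) + lam)[:, None]
```

One call of `ranking_step` reproduces exactly one object update followed by one group update, which is why the iterative method and the ranking view coincide.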
Incorporating Labeled Information
[Figure: the same bipartite graph of objects and groups from models M1–M4.]
Objective: add a term pulling the l labeled objects toward their ground-truth label vectors f_i:

\min_{Q,U} \sum_{i=1}^{n} \sum_{j=1}^{v} a_{ij} \|u_i - q_j\|^2 + \alpha \sum_{j=1}^{s} \|q_j - y_j\|^2 + \beta \sum_{i=1}^{l} \|u_i - f_i\|^2

Update probability of a group:

q_j = \frac{\sum_{i=1}^{n} a_{ij} u_i + \alpha y_j}{\sum_{i=1}^{n} a_{ij} + \alpha}

Update probability of a labeled object (unlabeled objects keep the original update):

u_i = \frac{\sum_{j=1}^{v} a_{ij} q_j + \beta f_i}{\sum_{j=1}^{v} a_{ij} + \beta}
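The modified object update can be sketched as follows (the symbol β and the mask handling are our illustrative choices):

```python
import numpy as np

def update_objects_semisup(Q, A, F, labeled, beta=2.0):
    """Object update with ground truth: labeled objects are pulled toward
    their true label vectors f_i with strength beta; unlabeled objects
    keep the plain adjacency-weighted average of their groups."""
    b = beta * labeled.astype(float)                  # per-object pull strength
    return (A @ Q + b[:, None] * F) / (A.sum(axis=1) + b)[:, None]
```

Setting beta = 0 for every object recovers the unsupervised update, so one function covers both cases.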
Experiments-Data Sets
• 20 Newsgroups
  – newsgroup message categorization
  – only text information available
• Cora
  – research paper area categorization
  – paper abstracts and citation information available
• DBLP
  – researcher area prediction
  – publication and co-authorship network, and publication content
  – conferences' areas are known
Experiments-Baseline Methods (1)
• Single models
  – 20 Newsgroups: logistic regression, SVM, K-means, min-cut
  – Cora: abstracts, citations (with or without a labeled set)
  – DBLP: publication titles, links (with or without labels from conferences)
• Proposed method
  – BGCM
  – BGCM-L: semi-supervised version combining four models
  – 2-L: two models
  – 3-L: three models
Experiments-Baseline Methods (2)
• Ensemble approaches
  – clustering ensemble on all of the four models: MCLA, HBGF
Accuracy (1)
Accuracy (2)
Conclusions
• Summary
  – Combine the complementary predictive powers of multiple supervised and unsupervised models
  – Lossless summarization of base model outputs in a group-object bipartite graph
  – Propagate labeled information between group and object nodes iteratively
  – Two interpretations: constrained embedding and ranking on consensus structure
  – Results on various data sets show the benefits