Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models. Jing Gao (1), Feng Liang (2), Wei Fan (3), Yizhou Sun (1), Jiawei Han (1). (1) CS, UIUC; (2) STAT, UIUC; (3) IBM T.J. Watson. NIPS 2009.


Page 1:

Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models

Jing Gao1, Feng Liang2, Wei Fan3, Yizhou Sun1, Jiawei Han1

1 CS, UIUC    2 STAT, UIUC

3 IBM TJ Watson

NIPS’2009

Page 2:

Outline
• An overview of ensemble methods
  – Introduction
  – Supervised ensemble techniques
  – Unsupervised ensemble techniques
• Consensus among supervised and unsupervised models
  – Problem and motivation
  – Methodology
  – Interpretations
  – Experiments

Page 3:

Ensemble

[Figure: Data feeds model 1, model 2, …, model k; the models are combined into a single ensemble model.]

Applications: classification, clustering, collaborative filtering, anomaly detection, …

Combine multiple models into one!

Page 4:

Stories of Success
• Million-dollar prize
  – Improve the accuracy of Netflix's baseline movie-recommendation approach by 10%
  – The top submissions all combine several teams and algorithms into one ensemble
• Data mining competitions
  – Classification problems
  – Winning teams employ ensembles of classifiers

Page 5:

Why Ensembles Work (1)
• Intuition
  – Combining diverse, independent opinions in human decision-making acts as a protective mechanism (e.g., a stock portfolio)
• Uncorrelated error reduction
  – Suppose we have 5 completely independent classifiers combined by majority voting
  – If each classifier's accuracy is 70%, the majority is correct with probability
    10(0.7^3)(0.3^2) + 5(0.7^4)(0.3) + 0.7^5 ≈ 83.7%
  – With 101 such classifiers, majority-vote accuracy reaches 99.9%

from T. Holloway, Introduction to Ensemble Learning, 2007.
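The binomial arithmetic above is easy to verify; a minimal sketch:

```python
from math import comb

def majority_vote_accuracy(n, p):
    """Probability that a majority vote of n independent classifiers,
    each correct with probability p, gives the right answer."""
    k_min = n // 2 + 1  # smallest winning majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

print(round(majority_vote_accuracy(5, 0.7), 3))  # 0.837
print(majority_vote_accuracy(101, 0.7) > 0.999)  # True
```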

Page 6:

Why Ensembles Work (2)

[Figure: six models, each capturing a different part of some unknown distribution; combined, they recover it.]

Ensemble gives the global picture!

from W. Fan, Random Decision Tree.

Page 7:

Why Ensembles Work (3)

• Overcome the limitations of a single hypothesis
  – The target function may not be implementable by an individual classifier, but may be approximated by model averaging

[Figure: the boundary of a single decision tree vs. the model-averaged boundary.]

from I. Davidson et al., When Efficient Model Averaging Out-Performs Boosting and Bagging, ECML 2006.

Page 8:

Research Focus
• Base models
  – Improve diversity!
• Combination scheme
  – Consensus (unsupervised)
  – Learn to combine (supervised)
• Tasks
  – Classification (supervised ensemble)
  – Clustering (unsupervised ensemble)

Page 9:

Outline
• An overview of ensemble methods
  – Introduction
  – Supervised ensemble techniques
  – Unsupervised ensemble techniques
• Consensus among supervised and unsupervised models
  – Problem and motivation
  – Methodology
  – Interpretations
  – Experiments

Page 10:

Bagging
• Bootstrap
  – Sampling with replacement
  – Each sample contains around 63.2% of the original records
• Ensemble
  – Train a classifier on each bootstrap sample
  – Use majority voting to determine the class label of the ensemble classifier
• Discussion
  – Incorporates diversity through the bootstrap samples
  – Unstable base classifiers that are sensitive to the training data, such as decision trees, work better
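The procedure above can be sketched as follows; the `train` argument is a placeholder interface for any base learner, not something from the slides:

```python
import random
from collections import Counter

def bootstrap(data):
    """Sample len(data) records with replacement; on average a sample
    contains about 63.2% (1 - 1/e) of the distinct original records."""
    return [random.choice(data) for _ in data]

def bagging(data, train, k):
    """Train k base classifiers on bootstrap samples; predict by majority vote."""
    models = [train(bootstrap(data)) for _ in range(k)]
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]

# The 63.2% figure, checked empirically:
random.seed(0)
sample = bootstrap(list(range(100000)))
print(round(len(set(sample)) / 100000, 2))  # 0.63
```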

Page 11:

Boosting
• Principles
  – Boost a set of weak learners into a strong learner
  – Make currently misclassified records more important
• AdaBoost
  – Initially, set uniform weights on all the records
  – At each round:
    • Create a bootstrap sample based on the weights
    • Train a classifier on the sample and apply it to the original training set
    • Records that are wrongly classified have their weights increased
    • Records that are classified correctly have their weights decreased
    • If the error rate is higher than 50%, start over
  – The final prediction is a weighted vote of all the classifiers, with each weight reflecting that classifier's training accuracy
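The AdaBoost loop can be sketched as follows. This uses the direct reweighting variant rather than the weighted bootstrap resampling described above, and `train_weak` (here a hypothetical decision-stump trainer) is an assumed weak-learner interface:

```python
import math

def adaboost(X, y, train_weak, rounds):
    """AdaBoost sketch for labels in {-1, +1}. train_weak(X, y, w) must
    return a classifier h with h(x) in {-1, +1}."""
    n = len(X)
    w = [1.0 / n] * n                                # uniform initial weights
    ensemble = []                                    # (alpha, h) pairs
    for _ in range(rounds):
        h = train_weak(X, y, w)
        eps = sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
        if eps >= 0.5:                               # no better than chance: reset
            w = [1.0 / n] * n
            continue
        alpha = 0.5 * math.log((1 - eps) / max(eps, 1e-12))
        # wrong records gain weight, correct records lose weight
        w = [wi * math.exp(-alpha * yi * h(xi)) for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, h))
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

# Demo with a tiny hypothetical weak learner: weighted decision stumps in 1-D.
def stump_trainer(X, y, w):
    """Pick the threshold/sign stump with the lowest weighted error."""
    best = None
    for t in (0.5, 1.5, 2.5):
        for s in (1, -1):
            h = (lambda t, s: lambda x: s if x > t else -s)(t, s)
            err = sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
            if best is None or err < best[0]:
                best = (err, h)
    return best[1]

X, y = [0, 1, 2, 3], [-1, -1, 1, 1]
f = adaboost(X, y, stump_trainer, rounds=5)
print([f(x) for x in X])  # [-1, -1, 1, 1]
```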

Page 12:

[Figure: classifications (colors) and record weights (sizes) after 1, 3, and 20 iterations of AdaBoost.]

from Elder, John, From Trees to Forests and Rule Sets: A Unified Overview of Ensemble Methods, 2007.

Page 13:

Random Forest
• Algorithm
  – For each tree:
    • Choose a training set by sampling N times with replacement from the original training set
    • At each node, randomly choose m < M features and calculate the best split
    • Grow the tree fully; do not prune
  – Use majority voting among all the trees
• Discussion
  – Bagging + random features: improved diversity

Page 14:

Random Decision Tree

[Figure: a random decision tree over binary features B1, B2 ∈ {0,1} and a continuous feature B3. The split feature at each node is chosen randomly (B1 == 0?, B2 == 0?), and thresholds for continuous features are chosen randomly (B3 < 0.3?, B3 < 0.6?).]

from W. Fan, Random Decision Tree.

Page 15:

Outline
• An overview of ensemble methods
  – Introduction
  – Supervised ensemble techniques
  – Unsupervised ensemble techniques
• Consensus among supervised and unsupervised models
  – Problem and motivation
  – Methodology
  – Interpretations
  – Experiments

Page 16:

Clustering Ensemble
• Goal
  – Combine "weak" clusterings into a better one

from A. Topchy et. al. Clustering Ensembles: Models of Consensus and Weak Partitions. PAMI, 2005

Page 17:

Methods
• Base models
  – Bootstrap samples, different subsets of features
  – Different clustering algorithms
  – Random numbers of clusters
• Combination
  – Find the correspondence between the labels across partitions and fuse the clusters with the same labels
  – Treat each model's output as a categorical variable and cluster in the new feature space

Page 18:

Meta-Clustering (1)
• Cluster-based
  – Regard each cluster from a base model as a record
  – Define the similarity of two clusters as the percentage of examples they share
  – Conduct meta-clustering and assign each record to its associated meta-cluster
• Instance-based
  – Compute the similarity between two records as the percentage of models that put them into the same cluster

[Figure: objects v1–v6 linked to the clusters c1–c10 produced by the base models.]

from A. Gionis et. al. Clustering Aggregation. TKDD, 2007
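The instance-based similarity described above can be sketched as a co-association matrix, where each pair's similarity is the fraction of base clusterings that agree on it:

```python
def co_association(partitions):
    """Instance-based similarity: the fraction of base clusterings that put
    records i and j into the same cluster. `partitions` is a list of label
    lists, one per base model, all over the same records."""
    k, n = len(partitions), len(partitions[0])
    sim = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    sim[i][j] += 1.0 / k
    return sim

# Three base clusterings of four records:
sim = co_association([[0, 0, 1, 1],
                      [0, 0, 0, 1],
                      [1, 1, 0, 0]])
print(round(sim[0][1], 6))  # 1.0: all three models agree on records 0 and 1
```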

Page 19:

Meta-Clustering (2)
• Probability-based
  – Assume the outputs come from a mixture of models
  – Use the EM algorithm to learn the model
• Spectral clustering
  – Formulate the problem as a bipartite graph between objects and clusters
  – Use spectral clustering to partition the graph

[Figure: bipartite graph between objects v1–v6 and clusters c1–c10.]

from A. Gionis et. al. Clustering Aggregation. TKDD, 2007

Page 20:

Outline
• An overview of ensemble methods
  – Introduction
  – Supervised ensemble techniques
  – Unsupervised ensemble techniques
• Consensus among supervised and unsupervised models
  – Problem and motivation
  – Methodology
  – Interpretations
  – Experiments

Page 21:

Multiple Source Classification

Three example tasks, each with multiple information sources:
• Image categorization: images, descriptions, notes, comments, albums, tags, …
• Like? Dislike? (recommendation): movie genres, cast, director, plots, …; users' viewing history, movie ratings, …
• Research area: publication and co-authorship network, published papers, …

Page 22:

Model Combination Helps!

[Figure: a network linking researchers (Tom, Jim, Lucy, Mike, Jack, Tracy, Cindy, Bob, Mary, Alice) to the conferences they publish in (SIGMOD, VLDB, EDBT, KDD, ICDM, SDM, ICML, AAAI), combining supervised, unsupervised, and mixed sources of evidence.]

• Some areas share similar keywords
• People may publish in relevant but different areas
• There may be cross-discipline co-operations

Page 23:

Problem

Page 24:

Motivations
• Consensus maximization
  – Combine the outputs of multiple supervised and unsupervised models on a set of objects
  – The predicted labels should agree with the base models as much as possible
• Motivations
  – Unsupervised models provide useful constraints for classification tasks
  – Model diversity improves prediction accuracy and robustness
  – Combination at the output level is needed when privacy concerns or incompatible formats rule out sharing raw data

Page 25:

Related Work (1)

• Single models
  – Supervised: SVM, logistic regression, …
  – Unsupervised: k-means, spectral clustering, …
  – Semi-supervised learning, transductive learning
• Supervised ensembles
  – Require raw data and labels: bagging, boosting, Bayesian model averaging
  – Require labels: mixture of experts, stacked generalization
  – Majority voting works at the output level and does not require labels

Page 26:

Related Work (2)

• Unsupervised ensembles
  – Find a consensus clustering from multiple partitionings without accessing the features
• Multi-view learning
  – A joint model is learned from both labeled and unlabeled data drawn from multiple sources
  – It can be regarded as a semi-supervised ensemble that requires access to the raw data

Page 27:

Related Work (3)

Page 28:

Outline
• An overview of ensemble methods
  – Introduction
  – Supervised ensemble techniques
  – Unsupervised ensemble techniques
• Consensus among supervised and unsupervised models
  – Problem and motivation
  – Methodology
  – Interpretations
  – Experiments

Page 29:

A Toy Example

[Figure: seven objects x1–x7. Two classification models each assign them to classes 1, 2, 3, and two clustering models each partition them into three clusters.]

Page 30:

Groups-Objects

[Figure: the same seven objects x1–x7. Each class or cluster output by one of the four base models defines a group, yielding twelve groups g1–g12.]

Page 31:

Bipartite Graph

[Figure: bipartite graph between group nodes (from models M1–M4) and object nodes.]

• Object $i$ carries a conditional probability vector $u_i = [u_{i1}, \dots, u_{ic}]$; group $j$ carries $q_j = [q_{j1}, \dots, q_{jc}]$
• Adjacency: $a_{ij} = 1$ if object $i$ belongs to group $j$, and $0$ otherwise
• Initial probability: if group $j$ comes from a classification model that predicts class $z$, then $y_j$ is the indicator vector of $z$, e.g. $[1\ 0\ 0]$, $[0\ 1\ 0]$, $[0\ 0\ 1]$ for three classes

Page 32:

Objective

[Figure: the bipartite graph of group nodes (from M1–M4) and object nodes.]

Minimize disagreement:

$$\min_{Q,U} \left( \sum_{i=1}^{n} \sum_{j=1}^{v} a_{ij}\, \| u_i - q_j \|^2 \;+\; \alpha \sum_{j=1}^{s} \| q_j - y_j \|^2 \right)$$

• First term: an object's conditional probability vector should be similar to that of every group it is connected to
• Second term: the $s$ groups obtained from classification models should not deviate much from their initial probability vectors $y_j$

Page 33:

Methodology

[Figure: the bipartite graph of group nodes (from M1–M4) and object nodes.]

Iterate until convergence:

• Update the probability of an object:
$$u_i = \frac{\sum_{j=1}^{v} a_{ij}\, q_j}{\sum_{j=1}^{v} a_{ij}}$$

• Update the probability of a group:
$$q_j = \frac{\sum_{i=1}^{n} a_{ij}\, u_i + \alpha\, y_j}{\sum_{i=1}^{n} a_{ij} + \alpha} \ \ (j \le s), \qquad q_j = \frac{\sum_{i=1}^{n} a_{ij}\, u_i}{\sum_{i=1}^{n} a_{ij}} \ \ (j > s)$$
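The alternating object/group updates can be sketched as follows. This is an illustrative toy instance, not the authors' code; the adjacency structure and the value of `alpha` are assumptions:

```python
import numpy as np

def bgcm(A, Y, s, alpha=2.0, iters=100):
    """Alternating updates for consensus maximization (sketch).
    A: (n objects x v groups) 0/1 adjacency; Y: (v x c) initial group label
    vectors, with all-zero rows for the v - s groups from clustering models."""
    lam = np.zeros((A.shape[1], 1))
    lam[:s] = alpha                       # penalty applies only to classifier groups
    Q = Y.astype(float).copy()
    for _ in range(iters):
        # object update: average the probability vectors of its groups
        U = (A @ Q) / A.sum(axis=1, keepdims=True)
        # group update: average member objects, pulled toward the initial labels
        Q = (A.T @ U + lam * Y) / (A.sum(axis=0)[:, None] + lam)
    return U, Q

# Two objects; two classifier groups (g1 predicts class 0, g2 class 1) and two
# cluster groups (g3, g4). Object 0 sits in g1 and g3, object 1 in g2 and g4.
A = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.]])
Y = np.array([[1., 0.], [0., 1.], [0., 0.], [0., 0.]])
U, Q = bgcm(A, Y, s=2)
print(U.argmax(axis=1))  # [0 1]: consensus labels of the two objects
```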

Page 34:

Outline
• An overview of ensemble methods
  – Introduction
  – Supervised ensemble techniques
  – Unsupervised ensemble techniques
• Consensus among supervised and unsupervised models
  – Problem and motivation
  – Methodology
  – Interpretations
  – Experiments

Page 35:

Constrained Embedding

The consensus objective can equivalently be read as jointly embedding objects and groups into the space of conditional probability vectors, with hard constraints on the groups that come from classification models:

$$\min_{Q,U} \sum_{i=1}^{n} \sum_{j=1}^{v} a_{ij}\, \| u_i - q_j \|^2 \quad \text{s.t.} \ \ q_{jz} = 1 \ \text{if group } g_j\text{'s label is } z$$

• objects and groups are both embedded
• the constraints fix the coordinates of the groups produced by classification models

Page 36:

Ranking on Consensus Structure

Substituting the object update into the group update gives a single matrix-form iteration on the group probabilities, analogous to personalized ranking on the bipartite graph:

$$Q = (D_v + \Lambda)^{-1} \left( A^{\top} D_n^{-1} A\, Q + \Lambda\, Y \right)$$

where $A$ is the object-group adjacency matrix, $D_n$ and $D_v$ are the degree matrices of the object and group nodes, $Y$ plays the role of the query, and $\Lambda$ holds the personalized damping factors (nonzero only for groups from classification models).
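This fused iteration, Q ← (D_v + Λ)⁻¹(AᵀD_n⁻¹A Q + Λ Y), with A the object-group adjacency, D_n and D_v the degree matrices, Λ the per-group damping (nonzero only for classifier groups), and Y the query, can be checked against the two-step updates on a tiny hypothetical graph:

```python
import numpy as np

# Hypothetical toy graph: 2 objects, 4 groups (the first s = 2 from classifiers).
A = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.]])
Y = np.array([[1., 0.], [0., 1.], [0., 0.], [0., 0.]])
Lam = np.diag([2., 2., 0., 0.])          # personalized damping factors
Dn_inv = np.diag(1.0 / A.sum(axis=1))    # inverse object-degree matrix
Dv = np.diag(A.sum(axis=0))              # group-degree matrix
M = np.linalg.inv(Dv + Lam)

# Fused update on group probabilities:
Q = Y.copy()
for _ in range(100):
    Q = M @ (A.T @ Dn_inv @ A @ Q + Lam @ Y)

# The two-step object/group updates produce the same sequence:
Q2 = Y.copy()
for _ in range(100):
    U = Dn_inv @ A @ Q2                  # object update
    Q2 = M @ (A.T @ U + Lam @ Y)         # group update
assert np.allclose(Q, Q2)
```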

Page 37:

Incorporating Labeled Information

[Figure: the bipartite graph of group nodes (from M1–M4) and object nodes.]

Objective: add a term pulling the first $l$ (labeled) objects toward their observed label vectors $f_i$:

$$\min_{Q,U} \left( \sum_{i=1}^{n} \sum_{j=1}^{v} a_{ij}\, \| u_i - q_j \|^2 + \alpha \sum_{j=1}^{s} \| q_j - y_j \|^2 + \beta \sum_{i=1}^{l} \| u_i - f_i \|^2 \right)$$

• Update the probability of a labeled object:
$$u_i = \frac{\sum_{j=1}^{v} a_{ij}\, q_j + \beta\, f_i}{\sum_{j=1}^{v} a_{ij} + \beta}$$

• Update the probability of a group:
$$q_j = \frac{\sum_{i=1}^{n} a_{ij}\, u_i + \alpha\, y_j}{\sum_{i=1}^{n} a_{ij} + \alpha}$$

Page 38:

Outline
• An overview of ensemble methods
  – Introduction
  – Supervised ensemble techniques
  – Unsupervised ensemble techniques
• Consensus among supervised and unsupervised models
  – Problem and motivation
  – Methodology
  – Interpretations
  – Experiments

Page 39:

Experiments: Data Sets

• 20 Newsgroups
  – newsgroup message categorization
  – only text information available
• Cora
  – research-paper area categorization
  – paper abstracts and citation information available
• DBLP
  – researcher area prediction
  – publication and co-authorship network, and publication content
  – conferences' areas are known

Page 40:

Experiments: Baseline Methods (1)

• Single models
  – 20 Newsgroups: logistic regression, SVM, k-means, min-cut
  – Cora: abstracts, citations (with or without a labeled set)
  – DBLP: publication titles, links (with or without labels from conferences)
• Proposed method
  – BGCM
  – BGCM-L: semi-supervised version combining four models
  – 2-L: two models; 3-L: three models

Page 41:

Experiments: Baseline Methods (2)

• Ensemble approaches
  – clustering ensembles applied to all four models: MCLA, HBGF

[Diagram, rendered as a table:]

|                 | Single Models                                   | Ensemble at Raw Data                            | Ensemble at Output Level                                     |
| Supervised      | SVM, Logistic Regression, …                     | Bagging, Boosting, Bayesian model averaging, …  | Majority Voting; Mixture of Experts, Stacked Generalization  |
| Unsupervised    | K-means, Spectral Clustering, …                 |                                                 | Clustering Ensemble                                          |
| Semi-supervised | Semi-supervised Learning, Collective Inference  | Multi-view Learning                             | Consensus Maximization                                       |

Page 42:

Accuracy (1)

Page 43:

Accuracy (2)

Page 44:

Page 45:

Conclusions
• Ensembles
  – Combining independent, diversified models improves accuracy
  – Timely, given the information explosion and the many learning packages available
• Consensus maximization
  – Combines the complementary predictive power of multiple supervised and unsupervised models
  – Propagates label information between group and object nodes iteratively over a bipartite graph
  – Two interpretations: constrained embedding, and ranking on the consensus structure
• Applications
  – Multi-source learning, ranking, truth finding, …

Page 46:

Thanks!

• Any questions?

http://www.ews.uiuc.edu/~jinggao3/nips09bgcm.htm

[email protected]