Hybrids of generative and discriminative methods for machine learning


MSRC Summer School - 30/06/2009

Cambridge – UK

Hybrids of generative and discriminative methods for machine learning

Motivation

Generative models
• prior knowledge
• handle missing data such as labels

Discriminative models
• perform well at classification

However
• no straightforward way to combine them

Content

• Generative and discriminative methods
• A principled hybrid framework
• Study of the properties on a toy example
• Influence of the amount of labelled data

Content

• Generative and discriminative methods
• A principled hybrid framework
• Study of the properties on a toy example
• Influence of the amount of labelled data

Generative methods

Answer: “what does a cat look like? and a dog?” => model the joint distribution of data and labels

x : data

c : label

θ : parameters

Generative methods

Objective function:
G(θ) = p(θ) p(X, C|θ)
G(θ) = p(θ) ∏n p(xn, cn|θ)

1 reusable model per class, can deal with incomplete data

Example: GMMs

Example of generative model
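As an illustration of the generative objective above, here is a minimal sketch of a one-Gaussian-per-class classifier (a one-component GMM; all function names are illustrative, not from the talk):

```python
import numpy as np

def fit_generative(X, y):
    """Fit one Gaussian per class by maximum likelihood: a minimal
    generative classifier (the talk's GMM example is more general)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),                        # class prior p(c)
                     Xc.mean(axis=0),                         # class mean
                     np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1]))
    return params

def log_joint(x, prior, mu, cov):
    """log p(x, c) = log p(c) + log N(x | mu, cov)."""
    d = len(x)
    diff = x - mu
    _, logdet = np.linalg.slogdet(cov)
    return (np.log(prior)
            - 0.5 * (d * np.log(2 * np.pi) + logdet
                     + diff @ np.linalg.solve(cov, diff)))

def predict(params, x):
    # classify by the largest joint probability p(x, c)
    return max(params, key=lambda c: log_joint(x, *params[c]))
```

Because the model is of the joint p(x, c), the same per-class Gaussians are reusable and unlabelled x can still contribute through p(x).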

Discriminative methods

Answer: “is it a cat or a dog?” => labels posterior distribution

x : data

c : label

θ : parameters

Discriminative methods

The objective function is:
D(θ) = p(θ) p(C|X, θ)
D(θ) = p(θ) ∏n p(cn|xn, θ)

Focus on regions of ambiguity, make faster predictions

Example: neural networks, SVMs

Example of discriminative model

SVMs / NNs
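For comparison, a minimal discriminative model that maximises ∏n p(cn|xn, θ) directly, using logistic regression as a simple stand-in for the neural networks and SVMs named above (names and hyperparameters are illustrative):

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Gradient ascent on the discriminative log-likelihood
    sum_n log p(c_n | x_n, w) for binary labels."""
    Xb = np.hstack([X, np.ones((len(X), 1))])    # append bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))        # p(c=1 | x, w)
        w += lr * Xb.T @ (y - p) / len(y)        # log-likelihood gradient
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ w > 0).astype(int)
```

Nothing here models p(x): all capacity goes into the decision boundary, which is why such models classify well but cannot exploit unlabelled data on their own.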

Generative versus discriminative

No effect of the double mode on the decision boundary

Content

• Generative and discriminative methods
• A principled hybrid framework
• Study of the properties on a toy example
• Influence of the amount of labelled data

Semi-supervised learning

Few labelled data / lots of unlabelled data

Discriminative methods overfit, generative models only help classify if they are “good”

Need the modelling power of generative models while performing well at discrimination => hybrid models

Discriminative training (Bach et al., ICASSP 05)

Discriminative objective function:
D(θ) = p(θ) ∏n p(cn|xn, θ)

Using a generative model:
D(θ) = p(θ) ∏n [ p(xn, cn|θ) / p(xn|θ) ]
D(θ) = p(θ) ∏n [ p(xn, cn|θ) / Σc p(xn, c|θ) ]
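The ratio on this slide, the joint p(xn, cn|θ) normalised by the sum over classes, can be sketched for spherical-Gaussian class models (an illustrative model choice, not the talk's):

```python
import numpy as np

def log_joint(x, prior, mu, var):
    """log p(x, c | theta) for a spherical Gaussian class model."""
    d = len(x)
    return (np.log(prior)
            - 0.5 * d * np.log(2 * np.pi * var)
            - 0.5 * np.sum((x - mu) ** 2) / var)

def disc_log_lik(params, X, y):
    """Discriminative objective of a generative model:
    sum_n [ log p(x_n, c_n|theta) - log sum_c p(x_n, c|theta) ]."""
    total = 0.0
    for x, c in zip(X, y):
        lj = np.array([log_joint(x, *params[k]) for k in sorted(params)])
        m = lj.max()                              # log-sum-exp, stably
        total += lj[c] - (m + np.log(np.exp(lj - m).sum()))
    return total
```

Maximising this trains the generative parameters θ for classification accuracy rather than data fit.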

Convex combination (Bouchard et al., COMPSTAT 04)

Generative objective function:
G(θ) = p(θ) ∏n p(xn, cn|θ)

Discriminative objective function:
D(θ) = p(θ) ∏n p(cn|xn, θ)

Convex combination:
log L(θ) = λ log D(θ) + (1-λ) log G(θ), λ ∈ [0,1]
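The convex combination itself is a one-liner; a minimal sketch, with λ the only free parameter:

```python
def blended_log_objective(log_D, log_G, lam):
    """Convex combination of log-objectives: lam = 1 recovers the
    discriminative objective, lam = 0 the generative one."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * log_D + (1.0 - lam) * log_G
```

In practice λ is typically chosen by validation performance; the hybrid framework below instead makes the trade-off part of the model.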

A principled hybrid model


θ - posterior distribution of the labels

θ' - marginal distribution of the data

θ and θ' communicate through a prior

Hybrid objective function:
L(θ, θ') = p(θ, θ') ∏n p(cn|xn, θ) ∏n p(xn|θ')

A principled hybrid model

θ = θ' => p(θ, θ') = p(θ) δ(θ - θ')

L(θ, θ') = p(θ) δ(θ - θ') ∏n p(cn|xn, θ) ∏n p(xn|θ')

L(θ) = G(θ): generative case

θ ⊥ θ' => p(θ, θ') = p(θ) p(θ')
L(θ, θ') = [ p(θ) ∏n p(cn|xn, θ) ] [ p(θ') ∏n p(xn|θ') ]
L(θ, θ') = D(θ) f(θ'): discriminative case

A principled hybrid model

Anything in between – hybrid case

Choice of prior:
p(θ, θ') = p(θ) N(θ'|θ, σ²I)

σ → 0 => θ = θ' (generative case)

σ → ∞ => θ, θ' independent (discriminative case)
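A sketch of the hybrid objective under a Gaussian coupling prior tying θ to θ'; `log_cond` and `log_marg` are placeholders for whatever conditional and marginal models are used, and all names are illustrative:

```python
import numpy as np

def coupling_log_prior(theta, theta_p, sigma):
    """log N(theta' | theta, sigma^2 I): the prior through which the
    discriminative parameters theta and generative copy theta' talk."""
    d = len(theta)
    return (-0.5 * d * np.log(2 * np.pi * sigma ** 2)
            - 0.5 * np.sum((theta_p - theta) ** 2) / sigma ** 2)

def hybrid_log_lik(theta, theta_p, sigma, log_cond, log_marg, X, y):
    """log L(theta, theta') = log p(theta, theta')
       + sum_n log p(c_n|x_n, theta) + sum_n log p(x_n|theta')."""
    return (coupling_log_prior(theta, theta_p, sigma)
            + sum(log_cond(x, c, theta) for x, c in zip(X, y))
            + sum(log_marg(x, theta_p) for x in X))
```

Small σ penalises any gap between θ and θ' (pulling towards the generative case); large σ leaves them effectively decoupled (the discriminative case).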

Why principled?

Consistent with the likelihood of graphical models

=> one way to train a system

Everything can now be modelled => potential to be Bayesian

Potential to learn σ, i.e. the generative-discriminative trade-off

Learning

EM / Laplace approximation / MCMC: either intractable or too slow

Conjugate gradients: flexible, easy to check, BUT sensitive to initialisation and slow

Variational inference
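As a sketch of the conjugate-gradient route, SciPy's `minimize` with `method="CG"` can optimise a stacked parameter vector [θ, θ']; the quadratic objective below is only a stand-in for the real hybrid likelihood:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(z):
    """Stand-in negative log-likelihood over z = [theta, theta'];
    a simple quadratic coupling, not the actual hybrid objective."""
    theta, theta_p = z[:2], z[2:]
    return np.sum((theta - 1.0) ** 2) + np.sum((theta_p - theta) ** 2)

# conjugate gradients, as discussed on the slide above
result = minimize(neg_log_lik, np.zeros(4), method="CG")
```

The sensitivity to initialisation mentioned above shows up through the choice of starting point (`np.zeros(4)` here); in practice one would restart from several points.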

Content

• Generative and discriminative methods
• A principled hybrid framework
• Study of the properties on a toy example
• Influence of the amount of labelled data

Toy example

Toy example

2 elongated distributions

Only spherical Gaussians allowed => wrong model

2 labelled points per class => strong risk of overfitting
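The toy set-up above might be generated like this; the exact means and covariances are guesses, only the elongated shape and the 2-labelled-points-per-class constraint come from the slides:

```python
import numpy as np

def make_toy(n_per_class=100, n_labelled=2, seed=0):
    """Two elongated class distributions with only n_labelled labelled
    points per class; a spherical-Gaussian fit is thus mis-specified."""
    rng = np.random.default_rng(seed)
    cov = np.array([[4.0, 0.0], [0.0, 0.25]])     # elongated along x
    X0 = rng.multivariate_normal([0.0, -1.5], cov, n_per_class)
    X1 = rng.multivariate_normal([0.0, 1.5], cov, n_per_class)
    X_lab = np.vstack([X0[:n_labelled], X1[:n_labelled]])
    y_lab = np.array([0] * n_labelled + [1] * n_labelled)
    X_unlab = np.vstack([X0[n_labelled:], X1[n_labelled:]])
    return X_lab, y_lab, X_unlab
```

With so few labels, a purely discriminative fit can overfit badly, while the (wrong) generative model still benefits from the unlabelled points: the regime where the hybrid is interesting.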

Toy example

Decision boundaries

Content

• Generative and discriminative methods
• A principled hybrid framework
• Study of the properties on a toy example
• Influence of the amount of labelled data

A real example

Images are a special case, as they contain several features each

2 levels of supervision: at the image level, and at the feature level
• Image label only => weakly labelled
• Image label + segmentation => fully labelled

The underlying generative model

[Graphical-model figure with Gaussian and multinomial nodes]

The underlying generative model

[Figure: weakly vs. fully labelled cases]

Experimental set-up

3 classes: bikes, cows, sheep

1 Gaussian per class => poor generative model

75 training images for each category

HF framework

HF (hybrid framework) versus CC (convex combination)

Results

When increasing the proportion of fully labelled data, the trend is:

generative → hybrid → discriminative

Weakly labelled data has little influence on the trend

With sufficient fully labelled data, HF tends to perform better than CC

Experimental set-up

3 classes: lions, tigers and cheetahs

1 Gaussian per class => poor generative model

75 training images for each category

HF framework

HF versus CC

Results

Hybrid models consistently perform better

However, generative and discriminative models haven’t reached saturation

No clear difference between HF and CC

Conclusion

Principled hybrid framework

Possibility to learn the best trade-off

Helps for ambiguous datasets when labelled data is scarce

Problem of optimisation

Future avenues

Bayesian version (posterior distribution of σ) under study

Replace σ by a diagonal matrix to allow flexibility => need for the Bayesian version

Choice of priors

Thank you!
