Hybrids of generative and discriminative methods for machine learning


MSRC Summer School - 30/06/2009

Cambridge – UK

Hybrids of generative and discriminative methods for machine learning

Motivation

Generative models
• prior knowledge
• handle missing data such as labels

Discriminative models
• perform well at classification

However
• no straightforward way to combine them

Content

• Generative and discriminative methods
• A principled hybrid framework
• Study of the properties on a toy example
• Influence of the amount of labelled data

Content

• Generative and discriminative methods
• A principled hybrid framework
• Study of the properties on a toy example
• Influence of the amount of labelled data

Generative methods

Answer: “what does a cat look like? and a dog?” => model the joint distribution of data and labels

x : data

c : label

θ : parameters

Generative methods

Objective function:
G(θ) = p(θ) p(X, C|θ)
G(θ) = p(θ) ∏n p(xn, cn|θ)

1 reusable model per class, can deal with incomplete data

Example: GMMs

Example of generative model
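As an illustration of the generative objective above, here is a minimal sketch of a one-Gaussian-per-class classifier (a one-component GMM; all function names are illustrative, not from the talk):

```python
import numpy as np

def fit_generative(X, y):
    """Fit one Gaussian per class by maximum likelihood: a minimal
    generative classifier (the talk's GMM example is more general)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),                        # class prior p(c)
                     Xc.mean(axis=0),                         # class mean
                     np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1]))
    return params

def log_joint(x, prior, mu, cov):
    """log p(x, c) = log p(c) + log N(x | mu, cov)."""
    d = len(x)
    diff = x - mu
    _, logdet = np.linalg.slogdet(cov)
    return (np.log(prior)
            - 0.5 * (d * np.log(2 * np.pi) + logdet
                     + diff @ np.linalg.solve(cov, diff)))

def predict(params, x):
    # classify by the largest joint probability p(x, c)
    return max(params, key=lambda c: log_joint(x, *params[c]))
```

Because the model is of the joint p(x, c), the same per-class Gaussians are reusable and unlabelled x can still contribute through p(x).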

Discriminative methods

Answer: “is it a cat or a dog?” => labels posterior distribution

x : data

c : label

θ : parameters

Discriminative methods

The objective function is:
D(θ) = p(θ) p(C|X, θ)
D(θ) = p(θ) ∏n p(cn|xn, θ)

Focus on regions of ambiguity, make faster predictions

Example: neural networks, SVMs

Example of discriminative model

SVMs / NNs
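For comparison, a minimal discriminative model that maximises ∏n p(cn|xn, θ) directly, using logistic regression as a simple stand-in for the neural networks and SVMs named above (names and hyperparameters are illustrative):

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Gradient ascent on the discriminative log-likelihood
    sum_n log p(c_n | x_n, w) for binary labels."""
    Xb = np.hstack([X, np.ones((len(X), 1))])    # append bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))        # p(c=1 | x, w)
        w += lr * Xb.T @ (y - p) / len(y)        # log-likelihood gradient
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ w > 0).astype(int)
```

Nothing here models p(x): all capacity goes into the decision boundary, which is why such models classify well but cannot exploit unlabelled data on their own.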

Generative versus discriminative

No effect of the double mode on the decision boundary

Content

• Generative and discriminative methods
• A principled hybrid framework
• Study of the properties on a toy example
• Influence of the amount of labelled data

Semi-supervised learning

Few labelled data / lots of unlabelled data

Discriminative methods overfit, generative models only help classify if they are “good”

Need the modelling power of generative models while performing well at discrimination => hybrid models

Discriminative training (Bach et al., ICASSP 05)

Discriminative objective function:
D(θ) = p(θ) ∏n p(cn|xn, θ)

Using a generative model:
D(θ) = p(θ) ∏n [ p(xn, cn|θ) / p(xn|θ) ]
D(θ) = p(θ) ∏n [ p(xn, cn|θ) / Σc p(xn, c|θ) ]
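The ratio on this slide, the joint p(xn, cn|θ) normalised by the sum over classes, can be sketched for spherical-Gaussian class models (an illustrative model choice, not the talk's):

```python
import numpy as np

def log_joint(x, prior, mu, var):
    """log p(x, c | theta) for a spherical Gaussian class model."""
    d = len(x)
    return (np.log(prior)
            - 0.5 * d * np.log(2 * np.pi * var)
            - 0.5 * np.sum((x - mu) ** 2) / var)

def disc_log_lik(params, X, y):
    """Discriminative objective of a generative model:
    sum_n [ log p(x_n, c_n|theta) - log sum_c p(x_n, c|theta) ]."""
    total = 0.0
    for x, c in zip(X, y):
        lj = np.array([log_joint(x, *params[k]) for k in sorted(params)])
        m = lj.max()                              # log-sum-exp, stably
        total += lj[c] - (m + np.log(np.exp(lj - m).sum()))
    return total
```

Maximising this trains the generative parameters θ for classification accuracy rather than data fit.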

Convex combination (Bouchard et al., COMPSTAT 04)

Generative objective function:
G(θ) = p(θ) ∏n p(xn, cn|θ)

Discriminative objective function:
D(θ) = p(θ) ∏n p(cn|xn, θ)

Convex combination:
log L(θ) = λ log D(θ) + (1-λ) log G(θ), λ ∈ [0,1]
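The convex combination itself is a one-liner; a minimal sketch, with λ the only free parameter:

```python
def blended_log_objective(log_D, log_G, lam):
    """Convex combination of log-objectives: lam = 1 recovers the
    discriminative objective, lam = 0 the generative one."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * log_D + (1.0 - lam) * log_G
```

In practice λ is typically chosen by validation performance; the hybrid framework below instead makes the trade-off part of the model.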

A principled hybrid model


θ - posterior distribution of the labels

θ' - marginal distribution of the data

θ and θ' communicate through a prior

Hybrid objective function:
L(θ, θ') = p(θ, θ') ∏n p(cn|xn, θ) ∏n p(xn|θ')

A principled hybrid model

θ = θ' => p(θ, θ') = p(θ) δ(θ - θ')

L(θ, θ') = p(θ) δ(θ - θ') ∏n p(cn|xn, θ) ∏n p(xn|θ')

L(θ) = G(θ): generative case

θ ⊥ θ' => p(θ, θ') = p(θ) p(θ')
L(θ, θ') = [ p(θ) ∏n p(cn|xn, θ) ] [ p(θ') ∏n p(xn|θ') ]
L(θ, θ') = D(θ) f(θ'): discriminative case

A principled hybrid model

Anything in between – hybrid case

Choice of prior:
p(θ, θ') = p(θ) N(θ'|θ, σ²I)

σ → 0 => θ = θ' (generative case)

σ → ∞ => θ, θ' independent (discriminative case)
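A sketch of the hybrid objective under a Gaussian coupling prior tying θ to θ'; `log_cond` and `log_marg` are placeholders for whatever conditional and marginal models are used, and all names are illustrative:

```python
import numpy as np

def coupling_log_prior(theta, theta_p, sigma):
    """log N(theta' | theta, sigma^2 I): the prior through which the
    discriminative parameters theta and generative copy theta' talk."""
    d = len(theta)
    return (-0.5 * d * np.log(2 * np.pi * sigma ** 2)
            - 0.5 * np.sum((theta_p - theta) ** 2) / sigma ** 2)

def hybrid_log_lik(theta, theta_p, sigma, log_cond, log_marg, X, y):
    """log L(theta, theta') = log p(theta, theta')
       + sum_n log p(c_n|x_n, theta) + sum_n log p(x_n|theta')."""
    return (coupling_log_prior(theta, theta_p, sigma)
            + sum(log_cond(x, c, theta) for x, c in zip(X, y))
            + sum(log_marg(x, theta_p) for x in X))
```

Small σ penalises any gap between θ and θ' (pulling towards the generative case); large σ leaves them effectively decoupled (the discriminative case).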

Why principled?

Consistent with the likelihood of graphical models

=> one way to train a system

Everything can now be modelled => potential to be Bayesian

Potential to learn σ, i.e. the generative-discriminative trade-off

Learning

EM / Laplace approximation / MCMC: either intractable or too slow

Conjugate gradients: flexible, easy to check, BUT sensitive to initialisation and slow

Variational inference
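As a sketch of the conjugate-gradient route, SciPy's `minimize` with `method="CG"` can optimise a stacked parameter vector [θ, θ']; the quadratic objective below is only a stand-in for the real hybrid likelihood:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(z):
    """Stand-in negative log-likelihood over z = [theta, theta'];
    a simple quadratic coupling, not the actual hybrid objective."""
    theta, theta_p = z[:2], z[2:]
    return np.sum((theta - 1.0) ** 2) + np.sum((theta_p - theta) ** 2)

# conjugate gradients, as discussed on the slide above
result = minimize(neg_log_lik, np.zeros(4), method="CG")
```

The sensitivity to initialisation mentioned above shows up through the choice of starting point (`np.zeros(4)` here); in practice one would restart from several points.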

Content

• Generative and discriminative methods
• A principled hybrid framework
• Study of the properties on a toy example
• Influence of the amount of labelled data

Toy example

Toy example

2 elongated distributions

Only spherical Gaussians allowed => wrong model

2 labelled points per class => strong risk of overfitting
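The toy set-up above might be generated like this; the exact means and covariances are guesses, only the elongated shape and the 2-labelled-points-per-class constraint come from the slides:

```python
import numpy as np

def make_toy(n_per_class=100, n_labelled=2, seed=0):
    """Two elongated class distributions with only n_labelled labelled
    points per class; a spherical-Gaussian fit is thus mis-specified."""
    rng = np.random.default_rng(seed)
    cov = np.array([[4.0, 0.0], [0.0, 0.25]])     # elongated along x
    X0 = rng.multivariate_normal([0.0, -1.5], cov, n_per_class)
    X1 = rng.multivariate_normal([0.0, 1.5], cov, n_per_class)
    X_lab = np.vstack([X0[:n_labelled], X1[:n_labelled]])
    y_lab = np.array([0] * n_labelled + [1] * n_labelled)
    X_unlab = np.vstack([X0[n_labelled:], X1[n_labelled:]])
    return X_lab, y_lab, X_unlab
```

With so few labels, a purely discriminative fit can overfit badly, while the (wrong) generative model still benefits from the unlabelled points: the regime where the hybrid is interesting.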

Toy example

Decision boundaries

Content

• Generative and discriminative methods
• A principled hybrid framework
• Study of the properties on a toy example
• Influence of the amount of labelled data

A real example

Images are a special case, as they contain several features each

2 levels of supervision: at the image level, and at the feature level
• Image label only => weakly labelled
• Image label + segmentation => fully labelled

The underlying generative model

[Graphical-model figure with Gaussian and multinomial nodes]

The underlying generative model

[Figure: weakly vs. fully labelled cases]

Experimental set-up

3 classes: bikes, cows, sheep

1 Gaussian per class => poor generative model

75 training images for each category

HF framework

HF (hybrid framework) versus CC (convex combination)

Results

When increasing the proportion of fully labelled data, the trend is:

generative → hybrid → discriminative

Weakly labelled data has little influence on the trend

With sufficient fully labelled data, HF tends to perform better than CC

Experimental set-up

3 classes: lions, tigers and cheetahs

1 Gaussian per class => poor generative model

75 training images for each category

HF framework

HF versus CC

Results

Hybrid models consistently perform better

However, generative and discriminative models haven’t reached saturation

No clear difference between HF and CC

Conclusion

Principled hybrid framework

Possibility to learn the best trade-off

Helps for ambiguous datasets when labelled data is scarce

Problem of optimisation

Future avenues

Bayesian version (posterior distribution of σ) under study

Replace σ by a diagonal matrix to allow flexibility => need for the Bayesian version

Choice of priors

Thank you!
