EM and model selection


Page 1:

EM and model selection

Page 2:

Missing variable problems

• In many vision problems, if some variables were known, the maximum likelihood inference problem would be easy

– fitting; if we knew which line each token came from, it would be easy to determine line parameters

– segmentation; if we knew the segment each pixel came from, it would be easy to determine the segment parameters

– fundamental matrix estimation; if we knew which feature corresponded to which, it would be easy to determine the fundamental matrix

– etc.

• This sort of thing happens in statistics, too


Page 3:

Missing variable problems

• Strategy

– estimate appropriate values for the missing variables

– plug these in, now estimate parameters

– re-estimate appropriate values for missing variables, continue

• e.g.

– guess which line gets which point

– now fit the lines

– now reallocate points to lines, using our knowledge of the lines

– now refit, etc.

• We’ve seen this line of thought before (k means)


Page 4:

Missing variables - strategy

• We have a problem with parameters and missing variables

• This suggests:

• Iterate until convergence

– replace missing variables with their expected values, given fixed values of parameters

– fix missing variables, choose parameters to maximize likelihood given fixed values of missing variables


• e.g., iterate till convergence

– allocate each point to a line with a weight, which is the probability of the point given the line

– refit lines to the weighted set of points

• Converges to local extremum

• Somewhat more general form is available


Page 5:

K-Means

• Choose a fixed number of clusters

• Choose cluster centers and point-cluster allocations to minimize error

• Can't do this by search, because there are too many possible allocations.


• Algorithm

– fix cluster centers; allocate points to closest cluster

– fix allocation; compute best cluster centers

• x could be any set of features for which we can compute a distance (be careful about scaling)


$$\Phi(\text{clusters}, \text{data}) = \sum_{i \in \text{clusters}} \left\{ \sum_{j \in \text{elements of } i\text{'th cluster}} \left\| x_j - \mu_i \right\|^2 \right\}$$

* From Marc Pollefeys COMP 256 2003
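Below is a minimal NumPy sketch of the alternation described above; the random initialisation, the iteration cap and the function name are illustrative assumptions rather than part of the original slides.

```python
import numpy as np

def kmeans(x, k, n_iters=100, seed=0):
    """Alternate the two steps above: allocate points to the closest centre,
    then recompute each centre as the mean of its allocated points."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]   # random initial centres
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iters):
        # fix cluster centres; allocate each point to the closest centre
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # fix allocation; compute the best (mean) centre for each cluster
        new_centers = np.array([x[labels == i].mean(axis=0) if np.any(labels == i)
                                else centers[i] for i in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```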

Page 6:

K-Means

Page 7:

K-Means

* From Marc Pollefeys COMP 256 2003

Page 8:

Image Segmentation by K-Means

• Select a value of K

• Select a feature vector for every pixel (color, texture, position, or a combination of these, etc.)

• Define a similarity measure between feature vectors (usually Euclidean distance).

• Apply K-Means Algorithm.

• Apply Connected Components Algorithm.

• Merge any components of size less than some threshold to an adjacent component that is most similar to it.


* From Marc Pollefeys COMP 256 2003
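A hedged sketch of this pipeline follows. It assumes scikit-learn's KMeans and SciPy's connected-component labelling are available; the colour/position feature choice, the position weighting, and the (omitted) merge step are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans       # assumed available
from scipy.ndimage import label          # connected components

def segment_image(img, k=5, pos_weight=0.5):
    """K-means image segmentation sketch: per-pixel features, K-means,
    then connected components.  img is an (h, w, 3) 8-bit colour image."""
    h, w, _ = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # feature vector per pixel: colour plus (scaled) position (careful about scaling)
    feats = np.column_stack([img.reshape(-1, 3).astype(float) / 255.0,
                             pos_weight * xs.ravel() / w,
                             pos_weight * ys.ravel() / h])
    km_labels = KMeans(n_clusters=k, n_init=10).fit_predict(feats).reshape(h, w)
    # split each K-means label into spatially connected components
    segments = np.zeros((h, w), dtype=int)
    next_id = 0
    for c in range(k):
        comp, n = label(km_labels == c)
        segments[comp > 0] = comp[comp > 0] + next_id
        next_id += n
    # (a full implementation would now merge components below a size threshold
    #  into their most similar adjacent component)
    return segments
```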

Page 9:

K-Means

• Is an approximation to EM

– Model (hypothesis space): mixture of N Gaussians

– Latent variables: correspondence of data and Gaussians

• We notice:

– Given the mixture model, it's easy to calculate the correspondence

– Given the correspondence, it's easy to estimate the mixture models


Page 10:

Generalized K-Means (EM)

Page 11:

Generalized K-Means

• Converges!

• Proof [Neal/Hinton, McLachlan/Krishnan]:

– E/M step does not decrease data likelihood

– Converges at a saddle point


Page 12:

Idea

• Data generated from a mixture of Gaussians

• Latent variables: correspondence between data items and Gaussians


Page 13:

Learning a Gaussian Mixture (with known covariance)

E-Step:

$$E[z_{ij}] = \frac{p(x = x_i \mid \mu = \mu_j)}{\sum_{n=1}^{k} p(x = x_i \mid \mu = \mu_n)} = \frac{\exp\!\left(-\tfrac{1}{2\sigma^2}(x_i - \mu_j)^2\right)}{\sum_{n=1}^{k} \exp\!\left(-\tfrac{1}{2\sigma^2}(x_i - \mu_n)^2\right)}$$

M-Step:

$$\mu_j \leftarrow \frac{\sum_{i=1}^{m} E[z_{ij}]\, x_i}{\sum_{i=1}^{m} E[z_{ij}]}$$
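A small sketch of these two updates for the one-dimensional, known-sigma, equal-weight case; the initialisation and the function name are assumptions.

```python
import numpy as np

def em_known_sigma(x, k, sigma, n_iters=50, seed=0):
    """EM for a 1-D mixture of k Gaussians with a known, shared sigma and
    equal mixing weights, following the E-step / M-step above."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False)            # initial means
    for _ in range(n_iters):
        # E-step: E[z_ij] proportional to exp(-(x_i - mu_j)^2 / (2 sigma^2))
        logp = -(x[:, None] - mu[None, :]) ** 2 / (2 * sigma ** 2)
        z = np.exp(logp - logp.max(axis=1, keepdims=True))
        z /= z.sum(axis=1, keepdims=True)
        # M-step: re-estimate each mean from the soft assignments
        mu = (z * x[:, None]).sum(axis=0) / z.sum(axis=0)
    return mu
```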

Page 14:

The Expectation/Maximization (EM) algorithm

• In general we may imagine that an image comprises L segments (labels)

• Within segment l the pixels (feature vectors) have a probability distribution represented by $p_l(x \mid \theta_l)$

• $\theta_l$ represents the parameters of the data in segment l

– Mean and variance of the greylevels

– Mean vector and covariance matrix of the colours

– Texture parameters


Page 15:


The Expectation/Maximization (EM) algorithm

Page 16:

The Expectation/Maximization (EM) algorithm

• Once again a chicken and egg problem arises

– If we knew $\{\theta_l : l = 1..L\}$ then we could obtain a labelling for each $x$ by simply choosing the label which maximizes $p_l(x \mid \theta_l)$

– If we knew the label for each $x$ we could obtain $\{\theta_l : l = 1..L\}$ by using a simple maximum likelihood estimator

• The EM algorithm is designed to deal with this type of problem, but it frames it slightly differently

– It regards segmentation as a missing (or incomplete) data estimation problem


Page 17:

The Expectation/Maximization (EM) algorithm

• The incomplete data are just the measured pixel greylevels or feature vectors

– We can define a probability distribution of the incomplete data as $p_i(x; \theta_1, \theta_2, \ldots, \theta_L)$

• The complete data are the measured greylevels or feature vectors plus a mapping function $f(\cdot)$ which indicates the labelling of each pixel

– Given the complete data (pixels plus labels) we can easily work out estimates of the parameters $\{\theta_l : l = 1..L\}$

– But from the incomplete data no closed-form solution exists


Page 18:

The Expectation/Maximization (EM) algorithm

• Once again we resort to an iterative strategy and hope that we get convergence

• The algorithm is as follows:


Initialize an estimate of $\{\theta_l : l = 1..L\}$

Repeat

Step 1 (E step): Obtain an estimate of the labels based on the current parameter estimates

Step 2 (M step): Update the parameter estimates based on the current labelling

Until convergence

Page 19:

The Expectation/Maximization (EM) algorithm

• A recent approach to applying EM to image segmentation is to assume the image pixels or feature vectors follow a mixture model

– Generally we assume that each component of the mixture model is a Gaussian

– A Gaussian mixture model (GMM)


$$p(x \mid \Theta) = \sum_{l=1}^{L} \alpha_l\, p_l(x \mid \theta_l), \qquad \sum_{l=1}^{L} \alpha_l = 1$$

$$p_l(x \mid \theta_l) = \frac{1}{(2\pi)^{d/2} \det(\Sigma_l)^{1/2}} \exp\!\left(-\tfrac{1}{2}(x - \mu_l)^{T} \Sigma_l^{-1}(x - \mu_l)\right)$$

Page 20:

The Expectation/Maximization (EM) algorithm

• Our parameter space for our distribution now includes the mean vectors and covariance matrices for each component in the mixture plus the mixing weights

• We choose a Gaussian for each component because the ML estimate of each parameter in the M-step becomes linear


$$\Theta = (\alpha_1, \ldots, \alpha_L,\; \mu_1, \ldots, \mu_L,\; \Sigma_1, \ldots, \Sigma_L)$$

Page 21:

The Expectation/Maximization (EM) algorithm

• Define a posterior probability $P(l \mid x_j, \theta_l)$ as the probability that pixel j belongs to region l given the value of the feature vector $x_j$

• Using Bayes' rule we can write the following equation

• This actually is the E-step of our EM algorithm, as it allows us to assign probabilities to each label at each pixel


$$P(l \mid x_j, \theta_l) = \frac{\alpha_l\, p_l(x_j \mid \theta_l)}{\sum_{k=1}^{L} \alpha_k\, p_k(x_j \mid \theta_k)}$$

Page 22:

The Expectation/Maximization (EM) algorithm

• The M step simply updates the parameter estimates using maximum likelihood estimation


$$\alpha_l^{(m+1)} = \frac{1}{n} \sum_{j=1}^{n} P(l \mid x_j, \theta_l^{(m)})$$

$$\mu_l^{(m+1)} = \frac{\sum_{j=1}^{n} x_j\, P(l \mid x_j, \theta_l^{(m)})}{\sum_{j=1}^{n} P(l \mid x_j, \theta_l^{(m)})}$$

$$\Sigma_l^{(m+1)} = \frac{\sum_{j=1}^{n} P(l \mid x_j, \theta_l^{(m)})\,\bigl(x_j - \mu_l^{(m)}\bigr)\bigl(x_j - \mu_l^{(m)}\bigr)^{T}}{\sum_{j=1}^{n} P(l \mid x_j, \theta_l^{(m)})}$$
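One possible implementation of this E-step/M-step pair for per-pixel feature vectors is sketched below; SciPy's multivariate normal density is assumed available, and the initialisation and the small regularisation term are illustrative choices.

```python
import numpy as np
from scipy.stats import multivariate_normal   # assumed available

def em_gmm(x, L, n_iters=50, seed=0):
    """GMM fitting by EM.  x: (n, d) per-pixel feature vectors, L: number of segments.
    Returns mixing weights, means, covariances and per-pixel posteriors P(l | x_j)."""
    n, d = x.shape
    rng = np.random.default_rng(seed)
    alpha = np.full(L, 1.0 / L)                        # mixing weights alpha_l
    mu = x[rng.choice(n, size=L, replace=False)]       # initial means mu_l
    cov = np.array([np.cov(x.T) + 1e-6 * np.eye(d) for _ in range(L)])
    for _ in range(n_iters):
        # E-step: posterior P(l | x_j) by Bayes' rule (previous page)
        p = np.column_stack([alpha[l] * multivariate_normal.pdf(x, mu[l], cov[l])
                             for l in range(L)])
        p /= p.sum(axis=1, keepdims=True)
        # M-step: maximum likelihood updates of weights, means and covariances
        nl = p.sum(axis=0)                             # soft counts per label
        alpha = nl / n
        mu = (p.T @ x) / nl[:, None]
        for l in range(L):
            diff = x - mu[l]
            cov[l] = (p[:, l, None] * diff).T @ diff / nl[l] + 1e-6 * np.eye(d)
    return alpha, mu, cov, p
```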

Page 23:

Figure from “Color and Texture Based Image Segmentation Using EM and Its Application to Content Based Image Retrieval”, S.J. Belongie et al., Proc. Int. Conf. Computer Vision, 1998 © 1998, IEEE

Segmentation with EM

Page 24:

EM Clustering: Results

Page 25:

Lines and robustness

• We have one line, and n points

• Some come from the line, some from “noise”

• This is a mixture model:


• We wish to determine

– line parameters

– p(comes from line)


$$P(\text{point} \mid \text{line and noise params}) = P(\text{point} \mid \text{line})\,P(\text{comes from line}) + P(\text{point} \mid \text{noise})\,P(\text{comes from noise})$$

$$= P(\text{point} \mid \text{line})\,\lambda + P(\text{point} \mid \text{noise})\,(1 - \lambda)$$

Page 26:

Estimating the mixture model

• Introduce a set of hidden variables, $\delta_i$, one for each point. They are one when the point is on the line, and zero when off.

• If these are known, the negative log-likelihood becomes (the line's parameters are $\theta$, $c$):


• Here K is a normalising constant and $k_n$ is the noise intensity (we'll choose this later).


$$Q_c(x; \Theta) = \sum_i \left[ \delta_i\, \frac{(x_i \cos\theta + y_i \sin\theta + c)^2}{2\sigma^2} + (1 - \delta_i)\, k_n \right] + K$$

Page 27:

Substituting for delta

• We shall substitute the expected value of $\delta_i$, for a given $\Theta$

• Recall $\Theta = (\theta, c, \lambda)$

• $E[\delta_i] = 1 \cdot P(\delta_i = 1 \mid \Theta, x_i) + 0 \cdot P(\delta_i = 0 \mid \Theta, x_i)$


• Notice that if $k_n$ is small and positive, then if the distance is small this value is close to 1, and if it is large, close to zero


$$P(\delta_i = 1 \mid \Theta, x_i) = \frac{P(x_i \mid \delta_i = 1, \Theta)\, P(\delta_i = 1)}{P(x_i \mid \delta_i = 1, \Theta)\, P(\delta_i = 1) + P(x_i \mid \delta_i = 0, \Theta)\, P(\delta_i = 0)}$$

$$= \frac{\lambda \exp\!\left(-\tfrac{1}{2\sigma^2}(x_i \cos\theta + y_i \sin\theta + c)^2\right)}{\lambda \exp\!\left(-\tfrac{1}{2\sigma^2}(x_i \cos\theta + y_i \sin\theta + c)^2\right) + (1 - \lambda)\exp(-k_n)}$$

Page 28:

Algorithm for line fitting

• Obtain some start point $\Theta^{(0)} = (\theta^{(0)}, c^{(0)}, \lambda^{(0)})$

• Now compute the $\delta_i$'s using the formula above

• Now compute the maximum likelihood estimate of $\Theta^{(1)}$

– $\theta$, $c$ come from fitting to the weighted points

– $\lambda$ comes by counting


• Iterate to convergence

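A sketch of this loop, with the line written as x cos θ + y sin θ + c = 0 and a weighted total-least-squares fit in the M-step; the values of sigma and k_n and the helper names are assumptions.

```python
import numpy as np

def weighted_line_fit(x, y, w):
    """Total-least-squares fit of x cos(theta) + y sin(theta) + c = 0 to weighted points."""
    xm, ym = np.average(x, weights=w), np.average(y, weights=w)
    dx, dy = x - xm, y - ym
    S = np.array([[np.sum(w * dx * dx), np.sum(w * dx * dy)],
                  [np.sum(w * dx * dy), np.sum(w * dy * dy)]])
    evals, evecs = np.linalg.eigh(S)
    nvec = evecs[:, 0]                    # normal direction: smallest-eigenvalue eigenvector
    theta = np.arctan2(nvec[1], nvec[0])
    c = -(np.cos(theta) * xm + np.sin(theta) * ym)
    return theta, c

def em_line(x, y, sigma=1.0, k_n=2.0, n_iters=50):
    """EM line fitting as above: the E-step computes the expected deltas, the
    M-step refits the line to the weighted points and re-estimates lambda by counting."""
    theta, c = weighted_line_fit(x, y, np.ones_like(x))      # start point
    lam = 0.5
    for _ in range(n_iters):
        d = x * np.cos(theta) + y * np.sin(theta) + c
        on = lam * np.exp(-d ** 2 / (2 * sigma ** 2))
        off = (1 - lam) * np.exp(-k_n)
        delta = on / (on + off)                              # E-step: E[delta_i]
        theta, c = weighted_line_fit(x, y, delta)            # M-step: refit line
        lam = delta.mean()                                   # M-step: p(comes from line)
    return theta, c, lam, delta
```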

Page 29:

Page 30:

The expected values of the deltas at the maximum (notice the one value close to zero).

Page 31:

Closeup of the fit

Page 32:

Choosing parameters

• What about the noise parameter, and the sigma for the line?

– several methods

• from first-principles knowledge of the problem (seldom really possible)

• play around with a few examples and choose (usually quite effective, as the precise choice doesn't matter much)

– notice that if $k_n$ is large, this says that points very seldom come from noise, however far from the line they lie

• this usually biases the fit, by pushing outliers into the line

• rule of thumb: it's better to fit to the better-fitting points, within reason; if this is hard to do, then the model could be a problem


Page 33:

Other examples

• Segmentation

– a segment is a Gaussian that emits feature vectors (which could contain colour; or colour and position; or colour, texture and position).

– segment parameters are mean and (perhaps) covariance

– if we knew which segment each point belonged to, estimating these parameters would be easy

– rest is on same lines as fitting line


• Fitting multiple lines

– rather like fitting one line, except there are more hidden variables

– easiest is to encode as an array of hidden variables, which represents a table with a one where the i'th point comes from the j'th line, zeros otherwise

– rest is on the same lines as above


Page 34:

Issues with EM

• Local maxima

– can be a serious nuisance in some problems

– no guarantee that we have reached the “right” maximum

• Starting

– k-means to cluster the points is often a good idea


Page 35:

Local maximum

Page 36:

which is an excellent fit to some points

Page 37:

and the deltas for this maximum

Page 38:

A dataset that is well fitted by four lines

Page 39:

Result of EM fitting, with one line (or at least, one available local maximum).

Page 40:

Result of EM fitting, with two lines (or at least, one available local maximum).

Page 41:

Seven lines can produce a rather logical answer

Page 42:

Motion segmentation with EM

• Model image pair (or video sequence) as consisting of regions of parametric motion

– affine motion is popular

• Now we need to

– determine which pixels belong to which region

– estimate parameters


• Likelihood

– assume:

• Straightforward missing variable problem, rest is calculation


$$\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix}$$

$$I(x, y, t) = I(x + v_x,\; y + v_y,\; t + 1) + \text{noise}$$
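As an illustration of the assumed model, the residual image for a single affine motion hypothesis can be computed by warping the next frame back; the parameterisation and the SciPy-based warp are assumptions of this sketch.

```python
import numpy as np
from scipy.ndimage import map_coordinates   # assumed available for warping

def motion_residual(I0, I1, params):
    """Residual I(x, y, t) - I(x + vx, y + vy, t + 1) for one affine motion model.
    params = (a, b, c, d, tx, ty) as in the matrix form above."""
    a, b, c, d, tx, ty = params
    h, w = I0.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    vx = a * xs + b * ys + tx
    vy = c * xs + d * ys + ty
    warped = map_coordinates(I1, [ys + vy, xs + vx], order=1, mode='nearest')
    return I0 - warped
```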

Page 43:

Three frames from the MPEG “flower garden” sequence

Figure from “Representing Images with Layers”, by J. Wang and E.H. Adelson, IEEE Transactions on Image Processing, 1994 © 1994, IEEE

Page 44:

Grey level shows region no. with highest probability

Segments and motion fields associated with them. Figure from “Representing Images with Layers”, by J. Wang and E.H. Adelson, IEEE Transactions on Image Processing, 1994 © 1994, IEEE

Page 45:

If we use multiple frames to estimate the appearance of a segment, we can fill in occlusions; so we can re-render the sequence with some segments removed.

Figure from “Representing Images with Layers”, by J. Wang and E.H. Adelson, IEEE Transactions on Image Processing, 1994 © 1994, IEEE

Page 46:

Some generalities

• Many, but not all, problems that can be attacked with EM can also be attacked with RANSAC

– need to be able to get a parameter estimate with a manageably small number of random choices

– RANSAC is usually better


• Didn’t present in the most general form

– in the general form, the likelihood may not be a linear function of the missing variables

– in this case, one takes an expectation of the likelihood, rather than substituting expected values of missing variables

– Issue doesn’t seem to arise in vision applications.

• Didn’t present in the most general form

– in the general form, the likelihood may not be a linear function of the missing variables

– in this case, one takes an expectation of the likelihood, rather than substituting expected values of missing variables

– Issue doesn’t seem to arise in vision applications.

Page 47:

Model Selection

• We wish to choose a model to fit to data

– e.g. is it a line or a circle?

– e.g. is this a perspective or orthographic camera?

– e.g. is there an aeroplane there or is it noise?


• Issue

– In general, models with more parameters will fit a dataset better, but are poorer at prediction

– This means we can't simply look at the negative log-likelihood (or fitting error)


Page 48:

Top is not necessarily a better fit than bottom (actually, almost always worse).

Page 49:

Page 50:

We can discount the fitting error with some term in the number of parameters in the model.

Page 51:

Discounts

• AIC (an information criterion)

– choose the model with the smallest value of $-2\,L(D; \theta^*) + 2p$

– p is the number of parameters


• BIC (Bayes information criterion)

– choose the model with the smallest value of $-2\,L(D; \theta^*) + p \log N$

– N is the number of data points

• Minimum description length

– same criterion as BIC, but derived in a completely different way

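A minimal sketch of these two discounts, taking the maximised log-likelihood as an input (the function names are assumptions):

```python
import numpy as np

def aic(log_likelihood, p):
    """AIC: -2 * (log-likelihood at the ML estimate) + 2 * (number of parameters)."""
    return -2.0 * log_likelihood + 2.0 * p

def bic(log_likelihood, p, n):
    """BIC: -2 * (log-likelihood at the ML estimate) + p * log(number of data points)."""
    return -2.0 * log_likelihood + p * np.log(n)

# Choose the candidate model with the smallest criterion value.
```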

Page 52:

Cross-validation

• Split the data set into two pieces, fit to one, and compute the negative log-likelihood on the other

• Average over multiple different splits

• Choose the model with the smallest value of this average


• The difference in averages for two different models is an estimate of the difference in KL divergence of the models from the source of the data

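A sketch of this procedure with a user-supplied fit / negative-log-likelihood pair; the 50/50 split, the number of splits, and the function names are assumptions.

```python
import numpy as np

def cv_score(data, fit, nll, n_splits=10, seed=0):
    """Average held-out negative log-likelihood over random 50/50 splits.
    fit(train) -> params and nll(params, test) -> float are supplied by the
    caller for whichever model family is being compared (hypothetical API)."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_splits):
        idx = rng.permutation(len(data))
        half = len(data) // 2
        train, test = data[idx[:half]], data[idx[half:]]
        scores.append(nll(fit(train), test))
    return np.mean(scores)

# Choose the model whose cv_score is smallest.
```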

Page 53:

Model averaging

• Very often, it is smarter to use multiple models for prediction than just one

• e.g. motion capture data

– there are a small number of schemes that are used to put markers on the body

– given we know the scheme S and the measurements D, we can estimate the configuration of the body X


• We want

• If it is obvious what the scheme is from the data, then averaging makes little difference

• If it isn’t, then not averaging underestimates the variance of X --- we think we have a more precise estimate than we do.


$$P(X \mid D) = P(X \mid S_1, D)\,P(S_1 \mid D) + P(X \mid S_2, D)\,P(S_2 \mid D) + P(X \mid S_3, D)\,P(S_3 \mid D)$$
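A toy numeric illustration of this averaging, with made-up scheme posteriors P(S_i | D) and per-scheme Gaussian estimates of a single coordinate of X; it shows how committing to one scheme understates the variance.

```python
import numpy as np

# Hypothetical numbers: three marker schemes S1..S3, each giving a Gaussian
# estimate of one body-configuration coordinate X, plus scheme posteriors.
means = np.array([1.0, 1.2, 2.0])      # E[X | S_i, D]
vars_ = np.array([0.05, 0.05, 0.05])   # Var[X | S_i, D]
p_s   = np.array([0.6, 0.3, 0.1])      # P(S_i | D)

# Model-averaged posterior P(X | D) = sum_i P(X | S_i, D) P(S_i | D)
mean_avg = np.sum(p_s * means)
var_avg = np.sum(p_s * (vars_ + means ** 2)) - mean_avg ** 2

print(mean_avg, var_avg)   # averaged variance (~0.136) exceeds any single model's (0.05)
```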