An Introduction to the EM Algorithm By Naala Brewer and Kehinde Salau Project Advisor – Prof. Randy Eubank Advisor – Prof. Carlos Castillo-Chavez MTBI, Arizona State University



Outline

•History of the EM Algorithm

•Theory behind the EM Algorithm

•Biological examples, including derivations and code in R, Matlab, and C++

•Graphs of iterations and convergence


Brief History of the EM Algorithm

•Method frequently referenced throughout field of statistics

•Term coined in 1977 paper by Arthur Dempster, Nan Laird, and Donald Rubin

Breakdown of the EM Task

•To compute maximum likelihood estimates (MLEs) of unknown parameters in probabilistic models that involve latent variables

•E-step: computes the conditional expectation of the complete-data log-likelihood, given the observed data and the current parameter estimate

•M-step: maximizes that expectation to obtain updated estimates of the unknown parameters

•Repeat until convergence

Generalization of the EM Algorithm

•X - Full sample (latent variable) ~ f(x; θ)

•Y - Observed sample (incomplete data) ~ f(y; θ), such that y(x) = y

•We define Q(θ;θp) = E[ln f(x;θ) | Y, θp]

•θp+1 is obtained by solving ∂Q(θ;θp)/∂θ = 0

Generalization (cont.)

•Iterations continue until |θp+1 - θp| or |Q(θp+1;θp) - Q(θp;θp)| is sufficiently small

•At convergence, θ is a stationary point of the likelihood, typically a local maximum

•The likelihood is nondecreasing with each iteration, because each M-step guarantees

Q(θp+1;θp) ≥ Q(θp;θp)

Binomial Distribution – Bin(n,p)

Example 1 – Household Model

•n - number of people, p - probability of getting the disease

•Derivation

•Graphs


Binomial Distribution - Derivation


Binomial Derivation (cont.)


Binomial Derivation (cont.)

Example 2 – Population of Animals

Rao (1965, pp. 368-369), Genetic Linkage Model

• Suppose 197 animals are distributed multinomially into four categories, y = (125, 18, 20, 34) = (y1, y2, y3, y4)

• A genetic model for the population specifies cell probabilities (½ + ¼π, ¼ – ¼π, ¼ – ¼π, ¼π)

• Represent y as incomplete data, y1=x1+x2, y2=x3, y3=x4, y4=x5, where the complete data x = (x1, ..., x5) have cell probabilities (½, ¼π, ¼ – ¼π, ¼ – ¼π, ¼π)


Multinomial Distribution-Derivation


Multinomial Derivation (cont.)


Multinomial Derivation (cont.)
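The E-step and M-step for this example can be written out as below. This is a sketch consistent with the update formulas used in the R, Matlab, and C++ code of this talk, assuming the complete data x = (x1, ..., x5) have cell probabilities (½, ¼π, ¼ – ¼π, ¼ – ¼π, ¼π); the superscript (k) for the iteration index is a notational assumption.

```latex
\begin{align*}
\ln f(x;\pi) &= \text{const} + (x_2 + x_5)\ln\pi + (x_3 + x_4)\ln(1-\pi) \\
\text{E-step:}\quad x_2^{(k)} &= E\!\left[x_2 \mid y, \pi^{(k)}\right]
  = y_1\,\frac{\tfrac14\pi^{(k)}}{\tfrac12 + \tfrac14\pi^{(k)}} \\
\text{M-step:}\quad \pi^{(k+1)} &= \frac{x_2^{(k)} + x_5}{x_2^{(k)} + x_3 + x_4 + x_5}
  = \frac{x_2^{(k)} + y_4}{x_2^{(k)} + y_2 + y_3 + y_4}
\end{align*}
```

Only x2 needs to be imputed, because x3, x4, and x5 are observed directly as y2, y3, and y4, and x1 does not enter the terms of the log-likelihood that depend on π.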

Multinomial Coding

Example 2 – Population of Animals

•R Coding

•Matlab Coding

•C++ Coding

R Coding

# Observed data: counts in the four categories
y <- c(125, 18, 20, 34)
# Initial value for the unknown parameter
pik <- 0.5
for (k in 1:10) {
  # E-step: expected count in the latent cell x2
  x2k <- y[1] * (0.25 * pik) / (0.5 + 0.25 * pik)
  # M-step: updated estimate of the parameter
  pik <- (x2k + y[4]) / (x2k + sum(y[2:4]))
  # Print the iterates, which settle toward the convergent values
  print(c(x2k, pik))
}

Matlab Coding

% Observed data: counts in the four categories
y = [125, 18, 20, 34];
% Initial value for the unknown parameter
pik = .5;
for k = 1:10
    % E-step: expected count in the latent cell x2 (no semicolon, so each iterate is displayed)
    x2k = y(1)*(.25*pik)/(.5 + .25*pik)
    % M-step: updated estimate of the parameter
    pik = (x2k + y(4))/(x2k + sum(y(2:4)))
end
% Convergent values
[x2k, pik]

C++ Coding

#include <iostream>

int main() {
    // Observed counts in the four categories, e.g. 125 18 20 34
    double y1, y2, y3, y4;
    // Current parameter estimate and expected count in the latent cell x2
    double pik, x2k;

    std::cout << "enter vector of values, there should be four inputs\n";
    std::cin >> y1 >> y2 >> y3 >> y4;
    std::cout << "enter value for pik\n";
    std::cin >> pik;

    for (int counter = 0; counter < 10; counter++) {
        // E-step: expected count in the latent cell x2
        x2k = y1 * (0.25 * pik) / (0.5 + 0.25 * pik);
        // M-step: updated estimate of the parameter
        pik = (x2k + y4) / (x2k + y2 + y3 + y4);
        std::cout << "x2k is " << x2k << " and pik is " << pik << std::endl;
    }
    return 0;
}


[Figure: Multinomial Distribution. Graph of convergence of the unknowns πk and x2k]

Example 3 – Failure Times

Flury and Zoppè (2000)

▫Suppose the lifetime of light bulbs follows an exponential distribution with mean θ

▫The failure times (u1,...,un) are known for n light bulbs

▫In another experiment, m light bulbs with lifetimes (v1,...,vm) are tested, but no individual failure times are recorded; only the number of bulbs, r, that have failed by time t0 is recorded


Exponential Distribution - Derivation


Exponential Derivation (cont.)

Example 3 – Failure Times: Graphs


Future Work

•More Biological Examples

References

[1] Dempster, A.P., Laird, N.M., Rubin, D.B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B (Methodological), Vol. 39, No. 1, pp. 1-38.

[2] Redner, R.A., Walker, H.F. (1984). Mixture Densities, Maximum Likelihood and the EM Algorithm. SIAM Review, Vol. 26, No. 2, pp. 195-239.

[3] Tanner, M.A. (1996). Tools for Statistical Inference. Third Edition. Springer-Verlag New York, Inc.