Maximum Likelihood and GLMs
Jonathan Pillow
Mathematical Tools for Neuroscience (NEU 314), Fall 2021
Lecture 20
Quiz
1) Compute the conditional P(x | y = 1)
2) Compute the mean E(y)
3) Compute P(x)P(y), the independent approximation to P(x,y)
4) Compute the entropy of P(x)
5) Write down a formula for mutual information, I(x,y).
P(x,y)        x = 1    x = 2
  y = 1        0.25     0.5
  y = 2        0.25     0
BONUS: Compute the mutual information between x and y.
(Feel free to use calculator for this one, or you can use the fact that the entropy of the distribution [1/3 2/3] is approximately 0.9)
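A quick numerical check of these quantities (a minimal numpy sketch; the array layout and variable names are my own, not from the slides):

```python
import numpy as np

# Joint distribution P(x,y) from the table above (rows: y = 1,2; columns: x = 1,2)
P = np.array([[0.25, 0.5],
              [0.25, 0.0]])

Px = P.sum(axis=0)          # marginal P(x) = [0.5, 0.5]
Py = P.sum(axis=1)          # marginal P(y) = [0.75, 0.25]

Px_given_y1 = P[0] / Py[0]  # 1) conditional P(x | y=1) = [1/3, 2/3]
Ey = np.array([1, 2]) @ Py  # 2) mean E(y) = 1.25
Pind = np.outer(Py, Px)     # 3) independent approximation P(x)P(y)
Hx = -np.sum(Px * np.log2(Px))  # 4) entropy of P(x) = 1 bit

# 5) mutual information I(x,y) = sum_{x,y} P(x,y) log2[ P(x,y) / (P(x)P(y)) ]
nz = P > 0                  # convention: 0 log 0 = 0
I = np.sum(P[nz] * np.log2(P[nz] / Pind[nz]))
print(Px_given_y1, Ey, Hx, I)   # I is about 0.31 bits, matching 1 - 0.75 * 0.9
```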
Estimation
[Diagram: parameter ("stimulus") s → model → measured dataset ("population response") r = (r_1, r_2, ..., r_N), shown as a bar plot of spike count per neuron #]
Maximum likelihood (ML) estimator: θ̂_ML = value of θ at which the likelihood is maximal
Maximum a posteriori (MAP) estimator: θ̂_MAP = value of θ at which the posterior p(θ|m) is maximal
Simple Example: Gaussian noise & prior
encoding model: m = θ + n
1. Likelihood: additive Gaussian noise n ~ N(0, σ²), so p(m|θ) = N(m; θ, σ²)
2. Prior: zero-mean Gaussian with mean 0 and variance A: p(θ) = N(0, A)
⟹ Posterior: p(θ|m) ∝ p(m|θ) p(θ)
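The slide's closing equation was an image; completing the product of Gaussians gives the standard conjugate result (a reconstruction from the definitions above, not the slide's own rendering):

```latex
p(\theta \mid m) \propto
\exp\!\left(-\frac{(m-\theta)^2}{2\sigma^2}\right)
\exp\!\left(-\frac{\theta^2}{2A}\right)
\quad\Longrightarrow\quad
\theta \mid m \sim \mathcal{N}\!\left(\frac{A}{A+\sigma^2}\,m,\;
\left(\frac{1}{\sigma^2}+\frac{1}{A}\right)^{-1}\right)
```

So θ̂_MAP = A/(A+σ²) · m: the measurement m, shrunk toward the prior mean 0 by an amount that grows with the noise variance σ².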
Observation model
[Figure, built up over three slides: the observation model p(m|θ) plotted over θ and m, both ranging from −8 to 8]
Likelihood: considering p(m|θ) as a function of θ
[Figure: for a fixed observed m, a slice through p(m|θ) gives the likelihood as a function of θ; axes from −8 to 8]
Prior
[Figure: the prior p(θ), a zero-mean Gaussian over θ; axes from −8 to 8]
Computing the posterior
posterior ∝ likelihood × prior
[Figure: the likelihood and prior curves over θ multiply pointwise to give the posterior]
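A numerical version of this multiplication (a minimal sketch; the grid, σ, A, and m values are illustrative choices, not from the slides):

```python
import numpy as np
from scipy.stats import norm

theta = np.linspace(-8, 8, 401)   # grid over theta, matching the slide axes
sigma, A = 2.0, 4.0               # noise SD and prior variance (illustrative)
m = 3.0                           # observed measurement (illustrative)

likelihood = norm.pdf(m, loc=theta, scale=sigma)      # p(m|theta) vs. theta
prior = norm.pdf(theta, loc=0.0, scale=np.sqrt(A))    # p(theta)
posterior = likelihood * prior                        # unnormalized posterior
posterior /= posterior.sum() * (theta[1] - theta[0])  # normalize on the grid

theta_map = theta[np.argmax(posterior)]
print(theta_map)   # ~1.5, matching the closed-form mean A/(A + sigma^2) * m
```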
Making a Bayesian Estimate:
[Figure: posterior ∝ likelihood × prior; the estimate m* falls between the measurement m and the prior mean, and its displacement from the likelihood peak is the bias]
High Measurement Noise: large bias
[Figure: a broad likelihood is pulled strongly toward the prior, so the posterior peak shows a larger bias]
Low Measurement Noise: small bias
[Figure: a narrow likelihood dominates the prior, so the posterior peak shows only a small bias]
Bayesian Estimation:
• Likelihood and prior combine to form posterior
• MAP estimate is always biased towards the prior (compared to the ML estimate)
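The size of the bias follows directly from the closed-form posterior mean given earlier (a short sketch; m and A reuse the illustrative values from the grid example):

```python
m, A = 3.0, 4.0                     # measurement and prior variance (illustrative)
for sigma in [0.5, 2.0, 8.0]:       # low -> high measurement noise
    theta_map = A / (A + sigma**2) * m       # closed-form MAP estimate
    print(f"sigma={sigma:4.1f}  MAP={theta_map:5.2f}  bias={m - theta_map:5.2f}")
```

As σ grows, the MAP estimate shrinks toward the prior mean 0, matching the two preceding slides.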
Application #1: Biases in Motion Perception
Which grating moves faster?
[Demo, shown twice: a fixation cross between two drifting gratings of different contrasts]
Explanation from Weiss, Simoncelli & Adelson (2002):
[Figure: prior, likelihood, and posterior over velocity at two noise levels; noisier measurements give a broader likelihood ⇒ the posterior has a larger shift toward 0 (prior = no motion)]
• In the limit of a zero-contrast grating, the likelihood becomes infinitely broad ⇒ the percept goes to zero motion.
• Claim: explains why people actually speed up when driving in fog!
Maximum Likelihood Estimation: two worked examples for spike-count encoding models
Example 1: linear Poisson neuron
encoding model: spike count y ~ Poiss(λ) with spike rate λ = θx (stimulus x, parameter θ)
Important distributions: Gaussian (continuous-valued) and Poisson (counts y = 0, 1, 2, ..., with parameter λ = mean of P(y)); others that may come up: Bernoulli, binomial, multinomial, exponential, gamma.
[Figure: example Gaussian density and Poisson distribution P(y)]
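To make the worked example concrete, here is a simulated dataset from this encoding model (a minimal sketch; the true θ, number of trials, and contrast range are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

theta_true = 1.2                      # illustrative "true" gain parameter
x = rng.uniform(0, 40, size=200)      # stimulus contrasts, one per trial
y = rng.poisson(theta_true * x)       # spike counts: y ~ Poiss(theta * x)
```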
conditional distribution p(y|x)
[Figure, built up over three slides: spike count y (0-60) vs. contrast x (0-40); vertical slices at fixed x show the conditional distribution p(y|x), whose mean and spread both grow with x]
Maximum Likelihood Estimation:
• given observed data, find the parameters θ that maximize
p(Y|X, θ) = ∏_i p(y_i | x_i, θ)
where Y = all spike counts, X = all stimuli, and each factor p(y_i | x_i, θ) is the single-trial probability
Q: what assumption are we making about the responses? A: conditional independence across trials!
Q: when do we call p(Y|X, θ) a likelihood? A: when considering it as a function of θ!
Maximum Likelihood Estimation:
• given observed data, find the θ that maximizes p(y|x)
• could in theory do this by turning a knob
[Figure, repeated over three slides: the cloud of spike count vs. contrast data overlaid with the model's conditional distribution p(y|x) at different knob settings of θ]
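The knob, in code: evaluate the likelihood on a grid of θ values and keep the best one (a sketch continuing the simulated x, y above; the sum of log-probabilities is used instead of the raw product for numerical stability, anticipating the next slides):

```python
import numpy as np
from scipy.stats import poisson

thetas = np.linspace(0.01, 2.5, 250)   # grid of candidate knob settings
loglik = np.array([poisson.logpmf(y, th * x).sum() for th in thetas])
theta_grid = thetas[np.argmax(loglik)]
print(theta_grid)                      # close to theta_true = 1.2
```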
Likelihood function: p(Y|X, θ) considered as a function of θ.
Because data are independent:
p(Y|X, θ) = ∏_i p(y_i | x_i, θ),  so  log p(Y|X, θ) = Σ_i log p(y_i | x_i, θ)
[Figure: the likelihood and the log-likelihood plotted against θ over [0, 2]; both peak at the same θ]
Do it: solve for θ̂_ML.
[Figure: the log-likelihood as a function of θ over [0, 2]]
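Carrying out the exercise (a worked derivation using the Poisson pmf P(y) = λ^y e^(−λ)/y! with rate λ = θx_i):

```latex
\log p(Y|X,\theta) = \sum_{i=1}^N \Big( y_i \log(\theta x_i) - \theta x_i - \log y_i! \Big),
\qquad
\frac{\partial}{\partial\theta} \log p = \sum_{i=1}^N \Big( \frac{y_i}{\theta} - x_i \Big) = 0
\;\Longrightarrow\;
\hat{\theta}_{ML} = \frac{\sum_i y_i}{\sum_i x_i}
```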
• Closed-form solution: θ̂_ML = (Σ_i y_i) / (Σ_i x_i), i.e. total spike count divided by total stimulus
(let's notice: this is kind of a weird result!)
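A one-line check against the grid search above (same simulated data):

```python
theta_ml = y.sum() / x.sum()   # total spikes / total stimulus
print(theta_ml)                # agrees with theta_grid up to the grid resolution
```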
Example 2: linear Gaussian neuron
encoding model: spike count y = θx + ε with Gaussian noise ε ~ N(0, σ²), i.e. spike rate θx (stimulus x, parameter θ) and p(y|x, θ) = N(y; θx, σ²)
encoding distribution
[Figure: spike count y (0-60) vs. contrast x (0-40) with the Gaussian conditional p(y|x); all slices have the same width (constant variance, unlike the Poisson case)]
Log-Likelihood (up to a constant): log p(Y|X, θ) = −(1/2σ²) Σ_i (y_i − θx_i)²
Do it: differentiate, set to zero, and solve for θ.
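Carrying out this final exercise (a worked derivation; the punchline, which I am adding rather than quoting from the slide, is that the Gaussian ML estimate is exactly the least-squares regression slope):

```latex
\frac{\partial}{\partial\theta} \log p(Y|X,\theta)
= \frac{1}{\sigma^2} \sum_{i=1}^N x_i \big( y_i - \theta x_i \big) = 0
\;\Longrightarrow\;
\hat{\theta}_{ML} = \frac{\sum_i x_i y_i}{\sum_i x_i^2}
= \frac{\mathbf{x}^\top \mathbf{y}}{\mathbf{x}^\top \mathbf{x}}
```

On the simulated data from Example 1 this is one line; it also lands near theta_true, since both models share the same mean E(y|x) = θx:

```python
theta_ls = (x @ y) / (x @ x)   # Gaussian ML = least-squares estimate
print(theta_ls)
```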