Post on 13-Aug-2020
Final Project: meet with me over the next couple of weeks to discuss possibilities
READ FOR NEXT WEEK: Zhang, W., & Luck, S.J. (2008). Discrete fixed-resolution representations in visual working memory. Nature, 453, 233-235. Along with the Lewandowsky & Farrell chapters
Using Models to Test Hypotheses
Prototype Model vs. Exemplar Model: compare nonnested models
Mixture Model vs. Prototype Model, and Mixture Model vs. Exemplar Model: compare nested models
Saturated Model
Null Model
Some issues regarding fit measures
• speed of computation
• consistency – allow you to recover the true parameters? – all we’ve discussed are consistent
• efficiency – minimum variance of parameter estimates? – SSE and %Var are inefficient – lnL, χ2, weighted SSE are efficient
• permit statistical tests
Exemplar Model vs. Mixture Model: compare nested models
Saturated Model: this has to be as good as any model can do … one parameter for each free data point
Null Model: this is not as bad as any model could do, but it’s a good floor
clearly, the exemplar model needs to fit better than the Null Model
logically, the mixture model MUST fit better than the exemplar model … does the exemplar model fit significantly worse?
Does a model have to account for all the variability in the observed data? Does it need to fit as well as the saturated model? Well, that’s perhaps the ultimate goal (and having no free parameters to boot). But accounting for some of the variability is what it means to have a theory. You explain some of the variability. Better models explain MORE of the variability. And some of the variability could simply be noise (of various sorts).
see Dell, G.S., Schwartz, M.F., Martin, N., Saffran, E.M., & Gagnon, D.A. (2000). The role of computational models in neuropsychological investigations of language: Reply to Ruml and Caramazza (2000). Psychological Review, 107, 635-645.
fit the exemplar model
fit the mixed model
does the exemplar model fit significantly worse than the mixed model?
this tests a hypothesis of whether people need to abstract a prototype on top of remembering specific exemplars
Likelihood (L), log Likelihood (ln L)
instead of SSE or r2
We will MAXIMIZE likelihood: “Maximum Likelihood Parameter Estimation”
ln L_F : ln L of the full model (e.g., mixed model)
ln L_R : ln L of the restricted model (e.g., exemplar model)
ln L_F − ln L_R : difference between fit to full versus restricted model
2 × [ln L_F − ln L_R] : need to multiply by 2 (because God said so)
G2 = 2 × [ln L_F − ln L_R] : log Likelihood ratio statistic, distributed as χ2 with df = NparmsF − NparmsR
If that statistic exceeds the critical χ2 with the specified df (at selected alpha level), then the restricted model fits significantly worse than the general model
EXAMPLE
ln L_F = −263.12   ln L_R = −293.82
G2 = 2 × [−263.12 − (−293.82)] = 61.40
df = 240 − 45 = 195
critical χ2(df = 195, α = .05) = 228.6
61.40 < 228.6, so the restricted model DOES NOT fit significantly worse …
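The arithmetic above can be checked in code. Here is a minimal sketch in Python (the course examples use MATLAB; the numbers below come straight from the slide). To stay dependency-free it uses the Wilson–Hilferty normal approximation for the critical χ2 value; `scipy.stats.chi2.ppf(0.95, df)` would give the same ~228.6.

```python
import math

# Values taken from the example slide (not real data)
lnL_full = -263.12        # ln L_F, full (mixed) model
lnL_restricted = -293.82  # ln L_R, restricted (exemplar) model
n_parms_full, n_parms_restricted = 240, 45

G2 = 2 * (lnL_full - lnL_restricted)    # log likelihood ratio statistic
df = n_parms_full - n_parms_restricted  # 240 - 45 = 195

# Critical chi-square at alpha = .05 via the Wilson-Hilferty approximation
z = 1.6449  # standard normal quantile for alpha = .05 (one-tailed)
critical = df * (1 - 2 / (9 * df) + z * math.sqrt(2 / (9 * df))) ** 3

worse = G2 > critical  # does the restricted model fit significantly worse?
print(f"G2 = {G2:.2f}, df = {df}, critical ~ {critical:.1f}, significantly worse: {worse}")
```

With these numbers, G2 = 61.40 falls well below the critical value, reproducing the slide's conclusion.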
Likelihood Ratio Testing
equivalently, G2 = 2 × ln(L_F / L_R)
What is Likelihood? I’ll start with just giving you the equations … more later (so you can do the homework)
ln L
We will stay concrete for now. There are n stimuli and m responses. For identification, each stimulus has a unique response. For categorization, groups of stimuli can have the same (category) response.
NOTE: I will be giving you a maximum likelihood equation that can be used with this kind of choice data. Maximum likelihood methods are extremely general (and can get rather complicated). Maximum likelihood techniques are not only used for evaluating computational models, but are also used widely in statistics.
In order to use the following form of maximum likelihood statistic, we need to have data in the form of response frequencies, not response probabilities (that limitation is only true for this example).
f_ij : observed frequency with which stimulus i is given response j
N_i : number of presentations of stimulus i
P(R_j | S_i) : predicted probability with which stimulus i is given response j
the likelihood of the full data set is the product of multinomial probabilities across stimuli:
L = ∏_i [ N_i! / (f_i1! f_i2! ⋯ f_im!) ] × P(R_1 | S_i)^f_i1 × P(R_2 | S_i)^f_i2 × ⋯ × P(R_m | S_i)^f_im
taking logs (ln of a product is the sum of the logs) gives:
ln L = Σ_i ln N_i! − Σ_i Σ_j ln f_ij! + Σ_i Σ_j f_ij ln P(R_j | S_i)
Why are we taking logs?
log(a × b) = log(a) + log(b)
log(a / b) = log(a) − log(b)
log(a^p) = p × log(a)
f(x) = e^log(f(x))
log(f(x)) is a “monotonic function” so max[f(x)] is the same as max[log(f(x))]
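A quick numerical illustration of the monotonicity point (a Python sketch; the N = 2 flips, x = 1 head binomial example here is made up for illustration):

```python
import math

# Binomial likelihood for N = 2 flips, x = 1 head, over a grid of p values
grid = [i / 100 for i in range(1, 100)]
f = [2 * p * (1 - p) for p in grid]      # L(p) = C(2,1) p^1 (1-p)^1
log_f = [math.log(v) for v in f]         # same curve on a log scale

# Because log is monotonic, both curves peak at the same p
best_f = grid[f.index(max(f))]
best_log = grid[log_f.index(max(log_f))]
print(best_f, best_log)  # both 0.5
```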
You want to MAXIMIZE the lnL … which is the same as MINIMIZING the –lnL …
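As a sketch of how the ln L formula above could be computed (in Python rather than MATLAB; the frequencies and predicted probabilities below are made up for illustration):

```python
import math

def ln_likelihood(freqs, pred):
    """ln L = sum_i ln N_i! - sum_i sum_j ln f_ij! + sum_i sum_j f_ij ln P(R_j|S_i)
    freqs[i][j]: observed frequency of response j to stimulus i
    pred[i][j]:  model-predicted probability of response j to stimulus i"""
    lnL = 0.0
    for f_i, p_i in zip(freqs, pred):
        N_i = sum(f_i)
        lnL += math.lgamma(N_i + 1)           # ln N_i!
        for f_ij, p_ij in zip(f_i, p_i):
            lnL -= math.lgamma(f_ij + 1)      # ln f_ij!
            if f_ij > 0:
                lnL += f_ij * math.log(p_ij)  # f_ij ln P(R_j|S_i)
    return lnL

# Hypothetical data: 2 stimuli, 2 responses, 10 presentations each
freqs = [[7, 3], [2, 8]]
good = ln_likelihood(freqs, [[0.7, 0.3], [0.2, 0.8]])  # predictions match the data
bad = ln_likelihood(freqs, [[0.5, 0.5], [0.5, 0.5]])   # predictions ignore the data
print(good > bad)  # True: better predictions give a higher ln L
```

An optimizer would then search the model's parameter space for the predicted probabilities that maximize this ln L (or minimize −ln L).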
First, let’s talk about probability
Prob(data|parm) probability of some data given the parameters of some model
knowing parameters → predict some outcome
imagine an unfair coin that gives heads with probability p = .6 and tails with probability q = 1 − p = .4
obviously, the probability of getting a head is .6
what is the probability of getting two heads on two flips? coin flips are independent
p(event A AND event B) = p(event A) × p(event B) if A and B are INDEPENDENT
p(head) × p(head) = .6 × .6 = .36
imagine an unfair coin that gives heads with probability p = .6 and tails with probability q = 1 − p = .4
what is the probability of getting one head and one tail on two flips?
p(head) × p(tail) = .6 × .4 = .24
+ p(tail) × p(head) = .4 × .6 = .24
total = .48
imagine an unfair coin that gives heads with probability p = .6 and tails with probability q = 1 − p = .4
what is the probability of getting x heads on N flips?
Prob(x | p) = (N choose x) p^x (1 − p)^(N−x) = [N! / (x! (N − x)!)] p^x (1 − p)^(N−x)
Binomial Distribution f(x; p) gives the probability of observing x “successes” for a Bernoulli process with probability p
Matlab Example
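The MATLAB example itself is not preserved in the transcript; a rough equivalent in Python (MATLAB's `binopdf` computes the same quantity):

```python
import math

def binom_prob(x, N, p):
    """Binomial probability of x heads in N flips: C(N, x) p^x (1-p)^(N-x)."""
    return math.comb(N, x) * p**x * (1 - p)**(N - x)

# The unfair coin from the slides: p = .6
print(binom_prob(2, 2, 0.6))  # two heads on two flips: .36
print(binom_prob(1, 2, 0.6))  # one head and one tail: .48
```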
coin flips have only two outcomes (heads or tails); that is, a flip of the coin can have only two mutually exclusive events … what about a situation with more than just two possible outcomes? what if an event has three possible outcomes? or four possible outcomes?
Multinomial Distribution
probability of outcome 1 is p1, probability of outcome 2 is p2, probability of outcome 3 is p3
we want to know the probability of observing x1 events with outcome 1, x2 events with outcome 2, and x3 events with outcome 3
Prob(x1, x2, x3 | p1, p2, p3) = [N! / (x1! x2! x3!)] p1^x1 p2^x2 p3^x3
more generally, for m possible outcomes:
Prob(x1, x2, …, xm | p1, p2, …, pm) = [N! / (x1! x2! ⋯ xm!)] p1^x1 p2^x2 ⋯ pm^xm
Matlab Example
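Again the MATLAB code is not in the transcript; a Python sketch of the multinomial probability (MATLAB's `mnpdf` is the analogous function; the three-outcome counts and probabilities below are made up for illustration):

```python
import math

def multinom_prob(xs, ps):
    """[N! / (x1! x2! ... xm!)] p1^x1 p2^x2 ... pm^xm"""
    N = sum(xs)
    coeff = math.factorial(N)
    prob = 1.0
    for x, p in zip(xs, ps):
        coeff //= math.factorial(x)  # divide out each x_i!
        prob *= p**x
    return coeff * prob

# Hypothetical: three outcomes with p = [.5, .3, .2]; observe counts [2, 1, 1] in N = 4 trials
print(multinom_prob([2, 1, 1], [0.5, 0.3, 0.2]))  # 12 * .25 * .3 * .2 = .18
```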
What is Likelihood?
prob(data|parm) probability of some data given the known parameters of some model
L(data|parm) likelihood of known data given particular candidate parameters of the model
knowing parameters → predict some outcome
observing some data → estimate some parameters that maximize the likelihood of the data
imagine we flip a coin 10 times and get 4 heads (N = 10, x = 4); what is the maximum likelihood estimate of p?
L(x | p) = Prob(x | p) = (N choose x) p^x (1 − p)^(N−x) = [N! / (x! (N − x)!)] p^x (1 − p)^(N−x)
now, x (and N) are fixed; we want to find the value of p that maximizes the likelihood L of the data
imagine we flip a coin 10 times and get 4 heads (N = 10, x = 4); what is the maximum likelihood estimate of p? USING CALCULUS
L(x | p) = Prob(x | p) = (N choose x) p^x (1 − p)^(N−x) = [N! / (x! (N − x)!)] p^x (1 − p)^(N−x)
ln L = ln(N choose x) + x ln p + (N − x) ln(1 − p)
d ln L / dp = 0 + x (1/p) + (N − x) (1/(1 − p)) (−1) = 0
x(1 − p) − (N − x) p = 0
x − xp − Np + xp = 0
p = x / N
so for N = 10 and x = 4, the maximum likelihood estimate is p = 4/10 = .4
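The calculus result can be verified numerically with a simple grid search (a Python sketch; the grid resolution is arbitrary):

```python
import math

# N = 10 flips, x = 4 heads; the calculus says the MLE is p = x/N = .4
N, x = 10, 4
grid = [i / 1000 for i in range(1, 1000)]  # candidate p values in (0, 1)

# The binomial coefficient does not depend on p, so it can be dropped
lnL = [x * math.log(p) + (N - x) * math.log(1 - p) for p in grid]

p_hat = grid[lnL.index(max(lnL))]  # p that maximizes ln L
print(p_hat)  # 0.4, matching x/N
```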