Dimension 1 - Vanderbilt University


TRANSCRIPT

Page 1: Dimension 1 - Vanderbilt University
Page 2: Dimension 1 - Vanderbilt University

Categorization

[Figure: stimuli plotted in a two-dimensional space (Dimension 1 × Dimension 2), divided into Category A and Category B]

Page 3: Dimension 1 - Vanderbilt University

$$P(R_j \mid i) = \frac{\beta_j E_j}{\sum_{K \in R} \beta_K E_K}$$

Now the evidences $E_j$ are evidence for category membership rather than evidence for identity

Page 4: Dimension 1 - Vanderbilt University

$$P(A \mid i) = \frac{\beta_A \cdot E_{A|i}}{\beta_A \cdot E_{A|i} + \beta_B \cdot E_{B|i}}$$
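As a quick worked example (values chosen purely for illustration): with equal biases $\beta_A = \beta_B = 0.5$ and evidences $E_{A|i} = 2$ and $E_{B|i} = 1$,

$$P(A \mid i) = \frac{0.5 \cdot 2}{0.5 \cdot 2 + 0.5 \cdot 1} = \frac{2}{3} \approx .67$$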

What are some ways categories could be represented?

What gives rise to the evidence values?

Page 5: Dimension 1 - Vanderbilt University

Prototypes

[Figure: Category A and Category B items in the two-dimensional space (Dimension 1 × Dimension 2), each category summarized by its prototype]

$E_{A|i}$ is proportional to similarity to the prototype of category A

Page 6: Dimension 1 - Vanderbilt University

Ideals

[Figure: Category A and Category B items in the two-dimensional space, each category with an ideal point]

$E_{A|i}$ is proportional to similarity to the ideal point of category A

Page 7: Dimension 1 - Vanderbilt University

Exemplars

[Figure: Category A and Category B exemplars in the two-dimensional space]

$E_{A|i}$ is proportional to similarity to the experienced exemplars of category A

Page 8: Dimension 1 - Vanderbilt University

Decision Boundaries

[Figure: the two-dimensional space split by a decision boundary between Category A and Category B]

$E_{A|i}$ is given by which side of the boundary exemplar i is on (the boundary can be noisy)

Page 9: Dimension 1 - Vanderbilt University

Rules

[Figure: the two-dimensional space split by a rule boundary]

$E_{A|i}$ is given by which side of the rule boundary exemplar i is on (the boundary can be noisy)

Page 10: Dimension 1 - Vanderbilt University
Page 11: Dimension 1 - Vanderbilt University

Exemplars

[Figure: Category A and Category B exemplars in the two-dimensional space]

$E_{A|i}$ is proportional to similarity to the experienced exemplars of category A

Page 12: Dimension 1 - Vanderbilt University

$E_{A|i}$ is proportional to similarity to the experienced exemplars of category A

Page 13: Dimension 1 - Vanderbilt University

$E_{A|i}$ is proportional to similarity to the experienced exemplars of category A

- similarity to the closest exemplar (nearest neighbor)
- average similarity to exemplars
- summed similarity to exemplars

$$E_{A|i} = \sum_{j=1}^{N_A} s_{ij} \qquad E_{A|i} = \frac{1}{N_A} \sum_{j=1}^{N_A} s_{ij}$$
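A minimal MATLAB sketch of the three pooling rules (the similarity values are made up for illustration):

% Similarities between probe i and the stored exemplars of category A.
s_iA = [0.90 0.40 0.10];   % hypothetical values

E_nearest = max(s_iA);     % similarity to closest exemplar (nearest neighbor)
E_average = mean(s_iA);    % average similarity to exemplars
E_summed  = sum(s_iA);     % summed similarity to exemplars (used in the GCM)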

Page 14: Dimension 1 - Vanderbilt University

Generalized Context Model of Categorization

Page 15: Dimension 1 - Vanderbilt University

Exemplars

[Figure: probe i among the Category A exemplars in the two-dimensional space]

$E_{A|i}$ is proportional to similarity to the experienced exemplars of category A

Page 16: Dimension 1 - Vanderbilt University

Exemplars

[Figure: probe i among the Category B exemplars in the two-dimensional space]

$E_{B|i}$ is proportional to similarity to the experienced exemplars of category B

Page 17: Dimension 1 - Vanderbilt University

$$P(A \mid i) = \frac{\beta_A \cdot E_{A|i}}{\beta_A \cdot E_{A|i} + \beta_B \cdot E_{B|i}}$$

Page 18: Dimension 1 - Vanderbilt University

$$E_{A|i} = \sum_{j=1}^{N_A} s_{ij}$$

$$P(A \mid i) = \frac{\beta_A \cdot E_{A|i}}{\beta_A \cdot E_{A|i} + \beta_B \cdot E_{B|i}}$$

Page 19: Dimension 1 - Vanderbilt University

$$E_{A|i} = \sum_{j=1}^{N_A} s_{ij}$$

$$P(A \mid i) = \frac{\beta_A \cdot E_{A|i}}{\beta_A \cdot E_{A|i} + \beta_B \cdot E_{B|i}}$$

$$P(A \mid i) = \frac{\beta_A \sum_{j=1}^{N_A} s_{ij}}{\beta_A \sum_{j=1}^{N_A} s_{ij} + \beta_B \sum_{j=1}^{N_B} s_{ij}}$$

Page 20: Dimension 1 - Vanderbilt University
Page 21: Dimension 1 - Vanderbilt University

$$P(A \mid i) = \frac{\beta_A \sum_{j=1}^{N_A} s_{ij}}{\beta_A \sum_{j=1}^{N_A} s_{ij} + \beta_B \sum_{j=1}^{N_B} s_{ij}}$$

$$P(R_j \mid i) = \frac{\beta_j s_{ij}}{\sum_{K \in R} \beta_K s_{iK}}$$

QUESTION: Can the same similarities that explain identification confusions also explain categorization confusions?

Page 22: Dimension 1 - Vanderbilt University

$$P(R_j \mid i) = \frac{\beta_j s_{ij}}{\sum_{K \in R} \beta_K s_{iK}}$$

QUESTION: Can the same similarities that explain identification confusions also explain categorization confusions?

Page 23: Dimension 1 - Vanderbilt University

$$P(R_j \mid i) = \frac{\beta_j s_{ij}}{\sum_{K \in R} \beta_K s_{iK}}$$

QUESTION: Can the same similarities that explain identification confusions also explain categorization confusions?

Shepard, Hovland, & Jenkins (1961) tested this prediction by first having people learn to identify each object with a unique name. They fitted the SCM to the observed data (more on this later) to obtain values of the biases and the $s_{ij}$ parameters. Next, they attempted to account for categorization data with the categorization model, using those same $s_{ij}$ parameters.

Page 24: Dimension 1 - Vanderbilt University

Shepard, Hovland, & Jenkins (1961) tested this prediction by first having people learn to identify each object with a unique name. They fitted the SCM to the observed data (more on this later) to obtain values of the biases and the $s_{ij}$ parameters. Next, they attempted to account for categorization data with the categorization model, using those same $s_{ij}$ parameters.

$$P(A \mid i) = \frac{\beta_A \sum_{j=1}^{N_A} s_{ij}}{\beta_A \sum_{j=1}^{N_A} s_{ij} + \beta_B \sum_{j=1}^{N_B} s_{ij}}$$

Page 25: Dimension 1 - Vanderbilt University

Shepard, Hovland, & Jenkins (1961)

[Figure: the six category structures, Types I through VI, defined over stimuli varying in size and shape, ordered from a single-dimension rule (Type I) through XOR (Type II) to unique identification (Type VI)]

Page 26: Dimension 1 - Vanderbilt University
Page 27: Dimension 1 - Vanderbilt University

Identification requires fine discriminations between similar stimuli … Categorization requires treating clearly discriminable stimuli as the same thing … So maybe it’s not surprising that the answer is no.

QUESTION: Can the same similarities that explain identification confusions also explain categorization confusions?

Page 28: Dimension 1 - Vanderbilt University

Identification requires fine discriminations between similar stimuli … Categorization requires treating clearly discriminable stimuli as the same thing … So maybe it’s not surprising that the answer is no.

QUESTION: Can the same similarities that explain identification confusions also explain categorization confusions?

Not so fast …

Page 29: Dimension 1 - Vanderbilt University

Generalized Context Model (GCM)

$$s_{ij} = e^{-c \cdot d_{ij}^{\,p}} = \exp\!\left(-c \cdot d_{ij}^{\,p}\right)$$

$$d_{ij} = \left( \sum_{m=1}^{M} w_m \left| i_m - j_m \right|^r \right)^{1/r}$$

$$P(A \mid i) = \frac{\beta_A \sum_{j=1}^{N_A} s_{ij}}{\beta_A \sum_{j=1}^{N_A} s_{ij} + \beta_B \sum_{j=1}^{N_B} s_{ij}}$$
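A compact MATLAB sketch of these three equations (the function names gcm_pA and wdist, the example coordinates, and the parameter values are mine, not the course code):

function pA = gcm_pA(i, A, B, w, c, p, r, betaA)
% GCM: probability of a Category A response to probe i.
% i: 1 x M probe coordinates; rows of A and B are exemplar coordinates.
% e.g., pA = gcm_pA([.5 .5], [0 0; 0 1], [1 0; 1 1], [.5 .5], 1, 1, 2, .5)
sA = exp(-c .* wdist(i, A, w, r).^p);   % similarity to each A exemplar
sB = exp(-c .* wdist(i, B, w, r).^p);   % similarity to each B exemplar
betaB = 1 - betaA;
pA = (betaA*sum(sA)) / (betaA*sum(sA) + betaB*sum(sB));

function d = wdist(i, X, w, r)
% Weighted Minkowski distance from probe i to each row of X.
d = (abs(X - i).^r * w(:)).^(1/r);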

Page 30: Dimension 1 - Vanderbilt University

[Figure: points $(i_1,i_2)$, $(j_1,j_2)$, and $(k_1,k_2)$ plotted in a space with axes dimension 1 and dimension 2]

$$d_{ij} = \left( \sum_{m=1}^{M} w_m \left| i_m - j_m \right|^r \right)^{1/r}$$

weighted general distance metric; $w_m$ is the weight on dimension m

Page 31: Dimension 1 - Vanderbilt University

[Figure: the same points $(i_1,i_2)$, $(j_1,j_2)$, $(k_1,k_2)$ in the two-dimensional space]

$$d_{ij} = \left( \sum_{m=1}^{M} w_m \left| i_m - j_m \right|^r \right)^{1/r}$$

weighted general distance metric; $w_m$ is the weight on dimension m

[Figure: as $w_2 \to 0$, distances along dimension 2 shrink and the points separate only along dimension 1]

Page 32: Dimension 1 - Vanderbilt University

[Figure: the same points $(i_1,i_2)$, $(j_1,j_2)$, $(k_1,k_2)$ in the two-dimensional space]

$$d_{ij} = \left( \sum_{m=1}^{M} w_m \left| i_m - j_m \right|^r \right)^{1/r}$$

weighted general distance metric; $w_m$ is the weight on dimension m

[Figure: as $w_1 \to 0$, distances along dimension 1 shrink and the points separate only along dimension 2]

Page 33: Dimension 1 - Vanderbilt University

Shepard, Hovland, & Jenkins (1961)

[Figure: the six category structures, Types I through VI, defined over stimuli varying in size and shape, ordered from a single-dimension rule (Type I) through XOR (Type II) to unique identification (Type VI)]

Page 34: Dimension 1 - Vanderbilt University
Page 35: Dimension 1 - Vanderbilt University
Page 36: Dimension 1 - Vanderbilt University

Parameter Fitting Techniques

how do we find the values of model parameters that maximize the fit of a model to observed data?

Page 37: Dimension 1 - Vanderbilt University

Measures of Fit

what do I mean by “fit”?

what are some ways you could measure fit?

Page 38: Dimension 1 - Vanderbilt University

Pearson Correlation

SSE

RMSE

% Variance Accounted For

Likelihood (next week)

Page 39: Dimension 1 - Vanderbilt University

Pearson Correlation

$$r_{obs,prd} = \frac{\sum (obs - \mu_{obs})(prd - \mu_{prd})}{\sqrt{\sum (obs - \mu_{obs})^2 \sum (prd - \mu_{prd})^2}}$$

Page 40: Dimension 1 - Vanderbilt University

Sum of Squared Error (SSE)

$$SSE_{obs,prd} = \sum (obs - prd)^2$$

Page 41: Dimension 1 - Vanderbilt University

Root Mean Squared Error (RMSE)

$$RMSE_{obs,prd} = \sqrt{\frac{\sum (obs - prd)^2}{N}}$$

Page 42: Dimension 1 - Vanderbilt University

% Variance Accounted For

$$\%Var = \frac{SSE_{null} - SSE_{model}}{SSE_{null}}$$

$$SSE_{null} = \sum_i (obs_i - \mu_{obs})^2 \qquad SSE_{model} = \sum_i (obs_i - prd_i)^2$$
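The four fit measures in MATLAB (a minimal sketch; the obs/prd vectors are toy values):

% Toy observed and predicted vectors, for illustration only.
obs = [0.77 0.78 0.83 0.64 0.61];
prd = [0.79 0.83 0.88 0.65 0.64];

R = corrcoef(obs, prd);  r = R(1,2);    % Pearson correlation
SSE  = sum((obs - prd).^2);             % sum of squared error
RMSE = sqrt(SSE / numel(obs));          % root mean squared error
SSEnull = sum((obs - mean(obs)).^2);    % null model predicts the mean
pctVar  = (SSEnull - SSE) / SSEnull;    % % variance accounted for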

Page 43: Dimension 1 - Vanderbilt University

Parameter Fitting Techniques

Minimize SSE
Maximize r
Maximize %Var

next week we'll talk about maximum likelihood; after that we'll talk about more complex measures

Page 44: Dimension 1 - Vanderbilt University

One approach: CALCULUS

Page 45: Dimension 1 - Vanderbilt University

DUMB MODEL (example)

d_ij    obs s_ij
0       1.000
1       0.368
2       0.135
3       0.050
4       0.018
5       0.007

$$s_{ij} = \alpha + \beta \cdot d_{ij}$$

find parameters (α and β) that minimize SSE between the obs $s_{ij}$ and the prd $s_{ij}$

Page 46: Dimension 1 - Vanderbilt University

$$SSE = \sum_k (obs_k - prd_k)^2 = \sum_k \big(obs_k - (\alpha + \beta d_k)\big)^2$$

$$\frac{\partial SSE}{\partial \alpha} = \sum_k 2\big(obs_k - \alpha - \beta d_k\big)(-1) = -2 \sum_k \big(obs_k - \alpha - \beta d_k\big)$$

$$\frac{\partial SSE}{\partial \beta} = \sum_k 2\big(obs_k - \alpha - \beta d_k\big)(-d_k) = -2 \sum_k \big(obs_k - \alpha - \beta d_k\big)\,d_k$$

set $\frac{\partial SSE}{\partial \alpha} = 0$ and $\frac{\partial SSE}{\partial \beta} = 0$ and solve for α and β

Page 47: Dimension 1 - Vanderbilt University


Page 48: Dimension 1 - Vanderbilt University


Why does this work?
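It works because SSE is a quadratic (bowl-shaped) function of α and β, so the single point where both partial derivatives vanish is the global minimum; setting them to zero yields two linear equations. MATLAB solves them directly (a sketch using the toy table values above):

% Closed-form least squares for s = alpha + beta*d (the "dumb model").
d   = (0:5)';
obs = [1.000 0.368 0.135 0.050 0.018 0.007]';

X  = [ones(size(d)) d];    % columns multiply alpha and beta
ab = X \ obs;              % backslash solves the normal equations
alpha = ab(1);  beta = ab(2);
prd = X * ab;
SSE = sum((obs - prd).^2);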

Page 49: Dimension 1 - Vanderbilt University

One approach: CALCULUS

in many situations this is nearly impossible: the mathematical problem is intractable

Page 50: Dimension 1 - Vanderbilt University

Other approaches: Search/Optimization Algorithms

require computer power

Page 51: Dimension 1 - Vanderbilt University

but first, a quick aside …

Page 52: Dimension 1 - Vanderbilt University

Illustration of one Common Modeling Technique
(1) start with a model
(2) set the free parameters to known values
(3) generate predictions from the model
(4) now treat those predictions as “data”
(5) fit the model to the “observed data”
(6) can you fit the model to the data? (you should)
(7) do you get the same parameters back? (it depends)

Page 53: Dimension 1 - Vanderbilt University

Illustration of one Common Modeling Technique
(1) start with a model
(2) set the free parameters to known values
(3) generate predictions from the model
(4) now treat those predictions as “data”
(5) fit the model to the “observed data”
(6) can you fit the model to the data? (you should)
(7) do you get the same parameters back? (it depends)

Why would you do this?
(a) test that your model-fitting program works right
(b) check that the parameters are “identifiable” (more later)
(c) compare models based on their “flexibility”

Imagine Model A can fit data generated by Model A and by Model B, but Model B can only really fit data generated by Model B; then perhaps Model A is too flexible.
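A minimal MATLAB sketch of steps (2) through (7), assuming a hypothetical model function predictFn(params) that returns predictions (all names and values here are illustrative):

% Parameter recovery: generate predictions from known parameters, refit.
truePar  = [1.5 0.3];                 % (2) known parameter values
fakeData = predictFn(truePar);        % (3)-(4) predictions treated as "data"

lossFn    = @(p) sum((fakeData - predictFn(p)).^2);   % SSE objective
recovered = fminsearch(lossFn, [1 1]);                % (5)-(6) fit the model
disp([truePar; recovered])            % (7) did we get the parameters back?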

Page 54: Dimension 1 - Vanderbilt University

Generalized Context Model (GCM)

$$s_{ij} = e^{-c \cdot d_{ij}^{\,p}} = \exp\!\left(-c \cdot d_{ij}^{\,p}\right)$$

$$d_{ij} = \left( \sum_{m=1}^{M} w_m \left| i_m - j_m \right|^r \right)^{1/r}$$

$$P(A \mid i) = \frac{\beta_A \sum_{j=1}^{N_A} s_{ij}}{\beta_A \sum_{j=1}^{N_A} s_{ij} + \beta_B \sum_{j=1}^{N_B} s_{ij}}$$

Page 55: Dimension 1 - Vanderbilt University

Categorization Task

unidimensional stimuli

e.g., proportion of white vs. black squares

MATLAB EXAMPLE

Page 56: Dimension 1 - Vanderbilt University

Categorization Task

two-dimensional stimuli

MATLAB EXAMPLE

Page 57: Dimension 1 - Vanderbilt University

how do we find the values of the model parameters that minimize SSE (or maximize r, or maximize %Var)?

Page 58: Dimension 1 - Vanderbilt University

GRID SEARCH

Page 59: Dimension 1 - Vanderbilt University

[Figure: a grid laid over parameter 1 × parameter 2]

calculate SSE at each combination of parameter 1 and parameter 2
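A minimal grid-search sketch in MATLAB (assumes a hypothetical objective sse(p1, p2); the ranges and grid size are arbitrary):

% Evaluate SSE at every combination of two parameters; keep the best.
p1grid = linspace(0, 5, 101);
p2grid = linspace(0, 5, 101);
best = Inf;
for p1 = p1grid
    for p2 = p2grid
        err = sse(p1, p2);        % hypothetical objective function
        if err < best
            best = err;
            bestPar = [p1 p2];
        end
    end
end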

Page 60: Dimension 1 - Vanderbilt University

Matlab: See grid search for simple 1-parameter categorization model

Page 61: Dimension 1 - Vanderbilt University

What might be some limitations of a grid search?

Page 62: Dimension 1 - Vanderbilt University

the finer the grid search, the more evaluations you need to run

How fine of a grid search do you run? What if the best-fitting parameters are between the ones you’ve tried?

Page 63: Dimension 1 - Vanderbilt University

How long does it take to run a grid search?

Page 64: Dimension 1 - Vanderbilt University

evaluation time for one set of parameters

x # of evaluations

How long does it take to run a grid search?

Page 65: Dimension 1 - Vanderbilt University

evaluation time for one set of parameters

x # of evaluations

How long does it take to run a grid search?

# steps of parm 1 × # steps of parm 2 × # steps of parm 3 × …

e.g., 1000 x 1000 x 1000 x 1000

Page 66: Dimension 1 - Vanderbilt University

1 nanosecond (10^-9 s) per evaluation
× 10^12 evaluations
= 10^3 seconds ≈ 17 min

Page 67: Dimension 1 - Vanderbilt University

100 seconds (10^2 s) per evaluation
× 10^12 evaluations
= 10^14 seconds ≈ 3 million years

Page 68: Dimension 1 - Vanderbilt University

Hill-climbing Algorithms

simple hill-climbing
Nelder-Mead Simplex

Hooke and Jeeves

“direct search methods”

Page 69: Dimension 1 - Vanderbilt University

Enrico Fermi and Nicholas Metropolis used one of the first digital computers, the Los Alamos Maniac, to determine which values of certain theoretical parameters (phase shifts) best fit experimental data (scattering cross sections). They varied one theoretical parameter at a time by steps of the same magnitude, and when no such increase or decrease in any one parameter further improved the fit to the experimental data, they halved the step size and repeated the process until the steps were deemed sufficiently small. Their simple procedure was slow but sure, and several of us used it on the Avidac computer at the Argonne National Laboratory for adjusting six theoretical parameters to fit the pion-proton scattering data we had gathered using the University of Chicago synchrocyclotron [7].

W. C. Davidon, Variable Metric Method for Minimization, Tech. Rep. 5990, Argonne National Laboratory, Argonne, IL, 1959.

Page 70: Dimension 1 - Vanderbilt University


these techniques only emerged 50 years ago (Calculus was invented 400 years ago)

Page 71: Dimension 1 - Vanderbilt University

Simple Hill Climbing

DEMONSTRATE

Page 72: Dimension 1 - Vanderbilt University
Page 73: Dimension 1 - Vanderbilt University

Simple Hill Climbing

how many points do you need to evaluate with each step?

2 parameters

[Figure: the 8 neighboring grid points (numbered 1 through 8) around the current point]

Page 74: Dimension 1 - Vanderbilt University

Simple Hill Climbing

how many points do you need to evaluate with each step?

N parameters: 3^N - 1 evaluations per step
5 parameters: 242 evaluations per step
10 parameters: 59,048 evaluations per step

this ends up being inefficient because you may need to take thousands of steps

Page 75: Dimension 1 - Vanderbilt University

“stupid” algorithm

SimpleHillClimb.m
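A sketch of what a simple hill climber along the lines of SimpleHillClimb.m might look like (this is my own illustrative version, not the class code, and it assumes a hypothetical objective loss(p)):

% Greedy hill climbing over 2 parameters: move to any better neighbor;
% when no neighbor improves, halve the step size and try again.
par  = [3 2];                 % arbitrary starting point
step = 0.5;
while step > 1e-4
    improved = false;
    for dx = [-step 0 step]
        for dy = [-step 0 step]
            cand = par + [dx dy];
            if loss(cand) < loss(par)     % hypothetical objective
                par = cand;
                improved = true;
            end
        end
    end
    if ~improved
        step = step / 2;      % Fermi & Metropolis: halve and repeat
    end
end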

Page 76: Dimension 1 - Vanderbilt University

More sophisticated algorithms

Kolda, T.G., Lewis, R.M., & Torczon, V. (2003) Optimization by direct search: New perspectives on some classical and modern methods. SIAM Review, 45, 385-482.

Page 77: Dimension 1 - Vanderbilt University

More sophisticated algorithms

DEMONSTRATE

e.g., Hooke and Jeeves, a pattern search method

Page 78: Dimension 1 - Vanderbilt University

More sophisticated algorithms

DEMONSTRATE

e.g., Nelder-Mead Simplex (fminsearch in MATLAB)

http://en.wikipedia.org/wiki/Nelder-Mead_method

http://www.scholarpedia.org/article/Nelder-Mead_algorithm

Page 79: Dimension 1 - Vanderbilt University

What is a simplex?

0 dimensions: point (1 vertex)
1 dimension: line (2 vertices)
2 dimensions: triangle (3 vertices)
3 dimensions: tetrahedron (4 vertices)
4 dimensions: pentachoron (5 vertices)
. . .
N dimensions: N-simplex (N+1 vertices)

basically just a generalization of a triangle to N dimensions

Page 80: Dimension 1 - Vanderbilt University

reflect

Page 81: Dimension 1 - Vanderbilt University

expand

Page 82: Dimension 1 - Vanderbilt University

contract

Page 83: Dimension 1 - Vanderbilt University

shrink

Page 84: Dimension 1 - Vanderbilt University
Page 85: Dimension 1 - Vanderbilt University

Matlab examples

Page 86: Dimension 1 - Vanderbilt University

More Matlab examples

Page 87: Dimension 1 - Vanderbilt University

Medin & Schaffer (1978)

Page 88: Dimension 1 - Vanderbilt University

stimulus   dims       P(A)obs   P(A)prd
A1         1 1 1 2    .77       .79
A2         1 2 1 2    .78       .83
A3         1 2 1 1    .83       .88
A4         1 1 2 1    .64       .65
A5         2 1 1 1    .61       .64
B1         1 1 2 2    .39       .45
B2         2 1 1 2    .41       .44
B3         2 2 2 1    .21       .23
B4         2 2 2 2    .15       .16
T1         1 2 2 1    .56       .62
T2         1 2 2 2    .41       .47
T3         1 1 1 1    .82       .85
T4         2 2 1 2    .40       .45
T5         2 1 2 1    .32       .34
T6         2 2 1 1    .53       .61
T7         2 1 2 2    .20       .22

Medin & Schaffer ‘78

Page 89: Dimension 1 - Vanderbilt University

[Diagram: the Medin & Schaffer table of stimuli and observed P(A) (previous page) is fed to the GCM, whose free parameters w1, w2, w3, w4, and c generate the predicted P(A); the predictions are compared to the data via SSE]

$$s_{ij} = \exp\!\left(-c \cdot d_{ij}^{\,p}\right) \qquad d_{ij} = \left( \sum_{m=1}^{M} w_m \left| i_m - j_m \right|^r \right)^{1/r}$$

$$P(A \mid i) = \frac{\beta_A \sum_{j=1}^{N_A} s_{ij}}{\beta_A \sum_{j=1}^{N_A} s_{ij} + \beta_B \sum_{j=1}^{N_B} s_{ij}}$$

Page 90: Dimension 1 - Vanderbilt University

Gradient-Based Techniques
when you can calculate (or approximate) derivatives

Page 91: Dimension 1 - Vanderbilt University

Simulated Annealing (a generalization of the Metropolis algorithm)
useful with noisy objective functions and with discrete parameter values

Genetic Search Algorithms
useful with discrete parameter values

Page 92: Dimension 1 - Vanderbilt University

possible project: explore different parameter search routines to see which best recovers parameters, and which does so most quickly

Page 93: Dimension 1 - Vanderbilt University
Page 94: Dimension 1 - Vanderbilt University

Homework Assignment

fit the SCM
fit the GCM

partly using code we used in class today and code from last week's assignment

I encourage people to work together conceptually, but each person should do their own programming.

Page 95: Dimension 1 - Vanderbilt University
Page 96: Dimension 1 - Vanderbilt University

Problems of local minima

importance of multiple starting positions

Page 97: Dimension 1 - Vanderbilt University
Page 98: Dimension 1 - Vanderbilt University

Genetic Algorithms and Simulated Annealing may solve these problems

Page 99: Dimension 1 - Vanderbilt University

Simulated Annealing

always accept the new candidate parameter vector if it gives a better fit, but also accept a new candidate parameter vector with probability P if it gives a WORSE fit

e.g., P = exp(-Δfit/T)
Δfit is the decrease in fit between current and candidate
T is the “temperature”, which decreases according to a schedule
as Δfit → 0, P → 1
T starts at ∞, so P starts at 1 (completely random)
T goes to 0, so P goes to 0 (pure hill climbing)

depending on the cooling schedule, simulated annealing can take orders of magnitude longer than a basic hill-climbing algorithm
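One annealing step in MATLAB (a sketch; loss(), the proposal width, and the cooling rate are illustrative choices):

% One simulated-annealing step on parameter vector par at temperature T.
cand = par + 0.1 * randn(size(par));    % random candidate near current point
dFit = loss(cand) - loss(par);          % > 0 means the candidate fits WORSE
if dFit <= 0 || rand < exp(-dFit / T)   % always accept better, sometimes worse
    par = cand;
end
T = 0.99 * T;                           % one possible cooling schedule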

Page 100: Dimension 1 - Vanderbilt University

Genetic Algorithms

multiple candidate parameter vectors are recombined or mutated, and only some offspring are retained, akin to natural selection
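A toy genetic-search sketch in MATLAB (loss() and every setting here are illustrative, not a serious implementation):

% Evolve a population of candidate parameter vectors.
nPop = 20;  nPar = 2;
pop  = randn(nPop, nPar);                 % random initial population
for gen = 1:100
    fits = zeros(nPop, 1);
    for k = 1:nPop
        fits(k) = loss(pop(k,:));         % hypothetical objective
    end
    [~, order] = sort(fits);              % lower loss = better fit
    parents = pop(order(1:nPop/2), :);    % retain the better half
    mates = parents(randperm(nPop/2), :); % pair parents at random
    kids  = (parents + mates) / 2 ...     % recombination
            + 0.1 * randn(nPop/2, nPar);  % mutation
    pop = [parents; kids];                % next generation
end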

Page 101: Dimension 1 - Vanderbilt University

Problems of local minima

importance of multiple starting positions

how do you know when you've tested enough starting points?

Page 102: Dimension 1 - Vanderbilt University

what starting points do you pick?
(1) based on “experience” with the model
(2) based on an initial coarse parameter search followed by a fine parameter search
(3) an initial “random search” first, like simulated annealing or genetic algorithms

Page 103: Dimension 1 - Vanderbilt University

how many starting points?
(1) do many starting points converge on the same optimal parameter values?
(2) need to consider the amount of time it takes to do a search from each starting point
(3) if the model fits “everything” you're okay, but it's harder to know that a model really blows it

Page 104: Dimension 1 - Vanderbilt University
Page 105: Dimension 1 - Vanderbilt University

How to use the programs

Page 106: Dimension 1 - Vanderbilt University

parInit = [3 2];
options = optimset('display', 'iter', 'MaxIter', 500);
[bestx,fval] = fminsearch(@mymodel,parInit,options);

passing a function as a parameter

Page 107: Dimension 1 - Vanderbilt University

parInit = [1.6 -1.6];
parInc  = [0.1 0.1];
parLow  = [-4 -4];
parHigh = [ 4  4];
[HOOK_fit,HOOK_pos,HOOK_path] = ...
    hook('mymodel',parInit,parLow,parHigh,parInc,parInc/10);

passing the name of a function; hook uses MATLAB's eval() function

Page 108: Dimension 1 - Vanderbilt University

parInit = [3 2];
options = optimset('display', 'iter', 'MaxIter', 500);
[bestx,fval] = fminsearch(@mymodel,parInit,options);

[Diagram: fminsearch() repeatedly calls mymodel() with candidate params; mymodel() uses the GCM equations to generate P(A)prd for the Medin & Schaffer stimuli and returns the fit (SSE against P(A)obs); fminsearch() then changes the params to try to decrease the fit]
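A sketch of what such a mymodel() objective could look like (gcm_predict() is a hypothetical helper implementing the GCM equations above; this version assumes five free parameters, c and the four weights, and the obs values are the P(A)obs column of the table):

function fit = mymodel(params)
% Objective for fminsearch: SSE between observed and GCM-predicted P(A).
c = params(1);
w = params(2:5);                 % attention weights w1..w4
obs = [.77 .78 .83 .64 .61 .39 .41 .21 .15 ...
       .56 .41 .82 .40 .32 .53 .20]';
prd = gcm_predict(c, w);         % hypothetical helper returning P(A)prd
fit = sum((obs - prd).^2);       % the value fminsearch tries to minimize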

Page 109: Dimension 1 - Vanderbilt University

Some things to consider when using these tools:
•  do you have continuous vs. discrete parameters?
   - discrete parameters may require grid search
•  need for multiple starting points because of local minima
   - did you use enough starting points?
   - do various starting points converge?
   - how long does each parameter search take?
•  where to place the starting points
   - based on experience with the model
   - preliminary exploration of the parameter space
•  has the maximum number of iterations been reached?
   - MaxIter and MaxFunEvals in MATLAB
   - should only be reached if a parameter is going off to ∞

Page 110: Dimension 1 - Vanderbilt University

Some things to consider when using these tools:
•  what is the initial step size in the search?
   - consider a large step size in step 1, a smaller step size in step 2
   - does the algorithm decrease the step size?
   - what is the step size for each parameter?
•  what is the range of valid values for each parameter?
   - does the search algorithm let you set min and max values?

Page 111: Dimension 1 - Vanderbilt University

parInit = [1.6 -1.6];
parInc  = [0.1 0.1];
parLow  = [-4 -4];
parHigh = [ 4  4];
[HOOK_fit,HOOK_pos,HOOK_path] = ...
    hook('mymodel',parInit,parLow,parHigh,parInc,parInc/10);

Hooke and Jeeves lets you specify the step size (parInc) separately for each parameter

Page 112: Dimension 1 - Vanderbilt University

parInit = [1.6 -1.6];
parInc  = [0.1 0.1];
parLow  = [-4 -4];
parHigh = [ 4  4];
[HOOK_fit,HOOK_pos,HOOK_path] = ...
    hook('mymodel',parInit,parLow,parHigh,parInc,parInc/10);

Hooke and Jeeves lets you specify the step size (parInc) separately for each parameter, and lets you specify the min (parLow) and max (parHigh) separately for each parameter

Page 113: Dimension 1 - Vanderbilt University

parInit = [3 2];
options = optimset('display', 'iter', 'MaxIter', 500);
[bestx,fval] = fminsearch(@mymodel,parInit,options);

MATLAB's fminsearch (Simplex) does not: all parameters are allowed to range between -∞ and +∞

ONE SOLUTION: use fminsearchbnd from the MATLAB Central File Exchange
http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=8277&objectType=file

Page 114: Dimension 1 - Vanderbilt University

parInit = [3 2];
options = optimset('display', 'iter', 'MaxIter', 500);
[bestx,fval] = fminsearch(@mymodel,parInit,options);

or program the constraints yourself:

function fit = mymodel(params)
p1 = params(1);
p2 = params(2);
p3 = params(3);
. . .
fit = sse;

Page 115: Dimension 1 - Vanderbilt University

parInit = [3 2];
options = optimset('display', 'iter', 'MaxIter', 500);
[bestx,fval] = fminsearch(@mymodel,parInit,options);

or program the constraints yourself:

function fit = mymodel(params)
p1 = params(1);   % what if this can only go between -∞ and +∞ ?
p2 = params(2);   % and this can only go between 1 and +∞ ?
p3 = params(3);   % and this can only go between 1 and 4 ?
. . .
fit = sse;

Page 116: Dimension 1 - Vanderbilt University

parInit = [3 2];
options = optimset('display', 'iter', 'MaxIter', 500);
[bestx,fval] = fminsearch(@mymodel,parInit,options);

or program the constraints yourself:

function fit = mymodel(params)
p1 = params(1);                     % between -∞ and +∞
p2 = 1 + params(2)^2;               % between 1 and +∞
p3 = 1 + 3*(sin(params(3))+1)/2;    % between 1 and 4
. . .
fit = sse;

Page 117: Dimension 1 - Vanderbilt University

parInit = [3 2];
options = optimset('display', 'iter', 'MaxIter', 500);
[bestx,fval] = fminsearch(@mymodel,parInit,options);

or program the constraints yourself:

function fit = mymodel(params)
p1 = params(1);                                % between -∞ and +∞
p2 = LOW + params(2)^2;                        % between LOW and +∞
p3 = HIGH - params(3)^2;                       % between -∞ and HIGH
p4 = LOW + (HIGH-LOW)*(sin(params(4))+1)/2;    % between LOW and HIGH
. . .
fit = sse;
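One practical note on this trick (my addition, not from the slides): fminsearch works in the unconstrained space, so a desired starting value has to be mapped back through the inverse transform when you build parInit. A sketch:

% To start p2 (= LOW + params(2)^2) at a desired value v >= LOW,
% pass the inverse-transformed value to fminsearch:
LOW = 1;  v = 2.5;
parInit2 = sqrt(v - LOW);   % then LOW + parInit2^2 == v inside mymodel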

Page 118: Dimension 1 - Vanderbilt University

parInit = [3 2];
options = optimset('display', 'iter', 'MaxIter', 500);
[bestx,fval] = fminsearch(@mymodel,parInit,options);

MATLAB's fminsearch (Simplex) does not let you set the step size (for any parameter); it may use an initial step size that is proportional to the value of the parameter, but this cannot be set by the user (it's probably okay)

there are some other search algorithms that assume the same step size for every parameter (e.g., subplex)

Page 119: Dimension 1 - Vanderbilt University
Page 120: Dimension 1 - Vanderbilt University

Some options the programs give you …

Max Iterations
check that Max Iterations is never hit
set it to a big number
sometimes searches can go off to infinity

Page 121: Dimension 1 - Vanderbilt University

Some options the programs give you …

Step Size / Min Step Size
rule of thumb: 1/100 of the expected parameter value

NOTE: some programs use the same step size for every parameter; you should rescale the parameter value within the model routine

One approach is to do an initial search with a large step size just to find a reasonable set of starting points, then switch to a smaller step

Page 122: Dimension 1 - Vanderbilt University

Some options the programs give you …

Step Size / Min Step Size
what if the step size is way too big?

Page 123: Dimension 1 - Vanderbilt University

Some options the programs give you …

Step Size / Min Step Size
what if the step size is way too big?

[Figure: successive steps jump from the starting point back and forth over the minimum]

Page 124: Dimension 1 - Vanderbilt University

Some options the programs give you …

Step Size / Min Step Size
what if the step size is way too big?

noisy objective functions, e.g., when predictions come from Monte Carlo simulations

Page 125: Dimension 1 - Vanderbilt University
Page 126: Dimension 1 - Vanderbilt University

More sophisticated algorithms

combining hill climbing with grid search, when some params are continuous and some are discrete

Page 127: Dimension 1 - Vanderbilt University