TRANSCRIPT
Categorization

[Figure: stimuli in a two-dimensional space (Dimension 1 × Dimension 2), partitioned into Category A and Category B]
$$P(R_j \mid i) = \frac{\beta_j \cdot E_j}{\sum_{K \in R} \beta_K \cdot E_K}$$
Now the evidence values E_j are evidence for category membership rather than evidence for identity.
$$P(A \mid i) = \frac{\beta_A \cdot E_{A|i}}{\beta_A \cdot E_{A|i} + \beta_B \cdot E_{B|i}}$$
What are some ways categories could be represented?
What gives rise to the evidence values?
Prototypes

E_{A|i} is proportional to similarity to the prototype of category A

[Figure: Dimension 1 × Dimension 2 space with each category's prototype marked]
Ideals

E_{A|i} is proportional to similarity to the ideal point of category A

[Figure: Dimension 1 × Dimension 2 space with each category's ideal point marked]
Exemplars

E_{A|i} is proportional to similarity to the experienced exemplars of category A

[Figure: Dimension 1 × Dimension 2 space with the stored exemplars of each category marked]
Decision Boundaries

E_{A|i} is given by which side of the boundary exemplar i is on (the boundary can be noisy)

[Figure: Dimension 1 × Dimension 2 space divided by a decision boundary]
Rules

E_{A|i} is given by which side of the rule boundary exemplar i is on (the boundary can be noisy)

[Figure: Dimension 1 × Dimension 2 space divided by a rule boundary]
Exemplars

E_{A|i} is proportional to similarity to the experienced exemplars of category A

[Figure: Dimension 1 × Dimension 2 space with the stored exemplars of each category marked]
- similarity to the closest exemplar (nearest neighbor)
- average similarity to exemplars
- summed similarity to exemplars
$$E_{A|i} = \sum_{j=1}^{N_A} s_{ij} \qquad \text{or equivalently} \qquad E_{A|i} = \sum_{j \in A} s_{ij}$$
Generalized Context Model of Categorization
Exemplars

E_{A|i} is proportional to similarity to the experienced exemplars of category A

[Figure: item i shown among the exemplars of category A in Dimension 1 × Dimension 2 space]
Exemplars

E_{B|i} is proportional to similarity to the experienced exemplars of category B

[Figure: item i shown among the exemplars of category B in Dimension 1 × Dimension 2 space]
$$P(A \mid i) = \frac{\beta_A \cdot E_{A|i}}{\beta_A \cdot E_{A|i} + \beta_B \cdot E_{B|i}} \qquad E_{A|i} = \sum_{j=1}^{N_A} s_{ij}$$
$$P(A \mid i) = \frac{\beta_A \cdot \sum_{j=1}^{N_A} s_{ij}}{\beta_A \cdot \sum_{j=1}^{N_A} s_{ij} + \beta_B \cdot \sum_{j=1}^{N_B} s_{ij}}$$
$$P(R_j \mid i) = \frac{\beta_j \cdot s_{ij}}{\sum_{K \in R} \beta_K \cdot s_{iK}}$$

QUESTION: Can the same similarities that explain identification confusions also explain categorization confusions?
Shepard, Hovland, & Jenkins (1961) tested this prediction by first having people learn to identify each object with a unique name. They fitted the SCM to the observed data (more on this later) to obtain values of the bias and s_ij parameters. Next, they attempted to account for categorization data using those s_ij parameters in the categorization model.
$$P(A \mid i) = \frac{\beta_A \cdot \sum_{j=1}^{N_A} s_{ij}}{\beta_A \cdot \sum_{j=1}^{N_A} s_{ij} + \beta_B \cdot \sum_{j=1}^{N_B} s_{ij}}$$
Shepard, Hovland, & Jenkins (1961)

[Figure: the six category types (I–VI) defined over stimuli varying in size and shape, ordered from single-dimension (Type I) through XOR (Type II) to unique identification (Type VI)]
QUESTION: Can the same similarities that explain identification confusions also explain categorization confusions?

Identification requires fine discriminations between similar stimuli … Categorization requires treating clearly discriminable stimuli as the same thing … So maybe it's not surprising that the answer is no.
Not so fast …
Generalized Context Model (GCM)
$$s_{ij} = \exp(-c \cdot d_{ij}^{\,p})$$

$$d_{ij} = \left( \sum_{m=1}^{M} w_m \, |i_m - j_m|^r \right)^{1/r}$$

$$P(A \mid i) = \frac{\beta_A \cdot \sum_{j=1}^{N_A} s_{ij}}{\beta_A \cdot \sum_{j=1}^{N_A} s_{ij} + \beta_B \cdot \sum_{j=1}^{N_B} s_{ij}}$$
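A minimal MATLAB sketch of the similarity and distance equations above (the stimulus coordinates and parameter values here are illustrative, not from the lecture):

% GCM similarity between two stimuli (illustrative values)
xi = [0.2 0.7];  xj = [0.5 0.1];       % two stimuli in a 2-D space
w  = [0.5 0.5];                        % attention weights on the dimensions
r  = 1;                                % r = 1: city-block; r = 2: Euclidean
p  = 1;                                % p = 1: exponential; p = 2: Gaussian
c  = 2;                                % scaling (sensitivity) parameter
d  = sum(w .* abs(xi - xj).^r)^(1/r);  % weighted general distance metric
s  = exp(-c * d^p);                    % similarity decreases with distance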
[Figure: three points (i1,i2), (j1,j2), (k1,k2) plotted in dimension 1 × dimension 2 space]
$$d_{ij} = \left( \sum_{m=1}^{M} w_m \, |i_m - j_m|^r \right)^{1/r}$$

the weighted general distance metric; w_m is the weight on dimension m
[Figure: as w2 → 0, the points (i1,i2), (j1,j2), (k1,k2) collapse onto dimension 1; differences on dimension 2 are ignored]
[Figure: as w1 → 0, the points collapse onto dimension 2; differences on dimension 1 are ignored]
Shepard, Hovland, & Jenkins (1961)

[Figure: the six category types (I–VI) again, over stimuli varying in size and shape]
Parameter Fitting Techniques
how do we find the values of model parameters that maximize the fit of a model to observed data?
Measures of Fit
what do I mean by “fit”?
what are some ways you could measure fit?
- Pearson Correlation
- SSE
- RMSE
- % Variance Accounted For
- Likelihood (next week)
Pearson Correlation
$$r_{obs,prd} = \frac{\sum (obs - \mu_{obs})(prd - \mu_{prd})}{\sqrt{\sum (obs - \mu_{obs})^2 \cdot \sum (prd - \mu_{prd})^2}}$$
Sum of Squared Error (SSE)
$$SSE_{obs,prd} = \sum (obs - prd)^2$$
Root Mean Squared Error (RMSE)
$$RMSE_{obs,prd} = \sqrt{\frac{\sum (obs - prd)^2}{N}}$$
% Variance Accounted For
$$\%Var = \frac{SSE_{null} - SSE_{model}}{SSE_{null}}$$

$$SSE_{null} = \sum_i (obs_i - \mu_{obs})^2 \qquad SSE_{model} = \sum_i (obs_i - prd_i)^2$$
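A minimal MATLAB sketch computing these fit measures (the obs and prd vectors here are illustrative):

% fit measures for observed values vs. model predictions
obs = [.77 .78 .83 .64]';  prd = [.79 .83 .88 .65]';  % illustrative values
tmp  = corrcoef(obs, prd);  r = tmp(1,2);   % Pearson correlation
sse  = sum((obs - prd).^2);                 % sum of squared error
rmse = sqrt(mean((obs - prd).^2));          % root mean squared error
sseNull = sum((obs - mean(obs)).^2);        % SSE of the null (mean) model
pctVar  = (sseNull - sse) / sseNull;        % % variance accounted for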
Parameter Fitting Techniques
minimize SSE, maximize r, or maximize %Var

next week we'll talk about maximum likelihood; after that we'll talk about more complex measures
One approach: CALCULUS
DUMB MODEL (example)
d_ij    obs s_ij    prd s_ij
0       1.000
1       0.368
2       0.135
3       0.050
4       0.018
5       0.007

$$s_{ij} = \alpha + \beta \, d_{ij}$$

find the parameters (α and β) that minimize SSE between the obs s_ij and prd s_ij
$$SSE = \sum_k (obs_k - prd_k)^2 = \sum_k \big(obs_k - (\alpha + \beta d_k)\big)^2$$

$$\frac{\partial SSE}{\partial \alpha} = \sum_k 2\,(obs_k - \alpha - \beta d_k)(-1) = -2 \sum_k (obs_k - \alpha - \beta d_k)$$

$$\frac{\partial SSE}{\partial \beta} = \sum_k 2\,(obs_k - \alpha - \beta d_k)(-d_k) = -2 \sum_k (obs_k - \alpha - \beta d_k)\, d_k$$

set both partial derivatives to zero and solve for α and β:

$$\frac{\partial SSE}{\partial \alpha} = 0 \qquad \frac{\partial SSE}{\partial \beta} = 0$$
Why does this work?
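Setting both partial derivatives to zero finds the minimum because SSE is a smooth quadratic (convex) function of α and β, so its only stationary point is the global minimum. A minimal MATLAB sketch of this closed-form fit, using the table values above:

% closed-form least-squares fit of s_ij = alpha + beta*d_ij
d   = (0:5)';                                   % distances d_ij
obs = [1.000 0.368 0.135 0.050 0.018 0.007]';   % observed similarities
X   = [ones(size(d)) d];    % design matrix: columns for alpha and beta
b   = X \ obs;              % backslash solves the least-squares problem
alpha = b(1);  beta = b(2);
prd = alpha + beta*d;
sse = sum((obs - prd).^2);
fprintf('alpha = %.3f, beta = %.3f, SSE = %.4f\n', alpha, beta, sse);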
One approach: CALCULUS
nearly impossible in many situations; the mathematical problem becomes intractable
Other approaches: Search/Optimization Algorithms
require computer power
but first, a quick aside …
Illustration of one Common Modeling Technique
(1) start with a model
(2) set the free parameters to known values
(3) generate predictions from the model
(4) now treat those predictions as "data"
(5) fit the model to the "observed data"
(6) can you fit the model to the data (you should)?
(7) do you get the same parameters back (depends)?

Why would you do this?
(a) test that your model-fitting program works right
(b) check that the parameters are "identifiable" (more later)
(c) compare models based on their "flexibility"
If Model A can fit data generated by Model A and by Model B, but Model B can only really fit data generated by Model B, then perhaps Model A is too flexible.
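A minimal MATLAB sketch of steps (2)–(7) for a hypothetical one-parameter model (the exponential similarity model here is a stand-in, not the course code):

% parameter recovery: generate "data" at a known parameter, then refit
trueC = 1.5;
d = (0:5)';
model = @(c) exp(-c .* d);          % toy one-parameter model
fakeData = model(trueC);            % treat predictions as "observed data"
sse = @(c) sum((fakeData - model(c)).^2);
recovered = fminsearch(sse, 0.5);   % fit the model to the "data"
fprintf('true c = %.3f, recovered c = %.3f\n', trueC, recovered);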
Generalized Context Model (GCM)
$$s_{ij} = \exp(-c \cdot d_{ij}^{\,p})$$

$$d_{ij} = \left( \sum_{m=1}^{M} w_m \, |i_m - j_m|^r \right)^{1/r}$$

$$P(A \mid i) = \frac{\beta_A \cdot \sum_{j=1}^{N_A} s_{ij}}{\beta_A \cdot \sum_{j=1}^{N_A} s_{ij} + \beta_B \cdot \sum_{j=1}^{N_B} s_{ij}}$$
Categorization Task
unidimensional stimuli
e.g., proportion of white vs. black squares
MATLAB EXAMPLE
Categorization Task
two-dimensional stimuli
MATLAB EXAMPLE
how do we find the values of the model parameters that minimize SSE (or maximize r, or maximize %Var)?
GRID SEARCH
[Figure: a grid over parameter 1 × parameter 2]

calculate SSE at each combination of parameter 1 and parameter 2
Matlab: See grid search for simple 1-parameter categorization model
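A minimal MATLAB sketch of a two-parameter grid search (the objective function mysse here is a stand-in, not the class model):

% evaluate SSE at every grid point and keep the best combination
p1grid = linspace(0, 5, 51);
p2grid = linspace(0, 5, 51);
mysse = @(p1,p2) (p1 - 1.3)^2 + (p2 - 2.7)^2;   % stand-in objective
best = Inf;
for a = 1:numel(p1grid)
    for b = 1:numel(p2grid)
        fit = mysse(p1grid(a), p2grid(b));
        if fit < best
            best = fit;  bestP = [p1grid(a) p2grid(b)];
        end
    end
end
fprintf('best SSE = %.4f at parameter 1 = %.2f, parameter 2 = %.2f\n', best, bestP);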
What might be some limitations of a grid search?
the finer the grid search, the more evaluations you need to run
How fine of a grid search do you run? What if the best-fitting parameters are between the ones you’ve tried?
How long does it take to run a grid search?
evaluation time for one set of parameters
x # of evaluations
# of evaluations = # steps of parm 1 × # steps of parm 2 × # steps of parm 3 × …

e.g., 1000 × 1000 × 1000 × 1000 = 10^12 evaluations

at 1 nanosecond (10^-9 s) per evaluation × 10^12 evaluations:
10^3 seconds ≈ 17 min

at 100 seconds (10^2 s) per evaluation × 10^12 evaluations:
10^14 seconds ≈ 3 million years
Hill-climbing Algorithms

simple hill climbing, Nelder-Mead Simplex, Hooke and Jeeves

"direct search methods"
Enrico Fermi and Nicholas Metropolis used one of the first digital computers, the Los Alamos Maniac, to determine which values of certain theoretical parameters (phase shifts) best fit experimental data (scattering cross sections). They varied one theoretical parameter at a time by steps of the same magnitude, and when no such increase or decrease in any one parameter further improved the fit to the experimental data, they halved the step size and repeated the process until the steps were deemed sufficiently small. Their simple procedure was slow but sure, and several of us used it on the Avidac computer at the Argonne National Laboratory for adjusting six theoretical parameters to fit the pion-proton scattering data we had gathered using the University of Chicago synchrocyclotron [7].
W. C. Davidon, Variable Metric Method for Minimization, Tech. Rep. 5990, Argonne National Laboratory, Argonne, IL, 1959.
these techniques only emerged about 50 years ago (calculus was invented roughly 350 years ago)
Simple Hill Climbing
DEMONSTRATE
Simple Hill Climbing
how many points do you need to evaluate with each step?
2 parameters

[Figure: the 8 neighboring grid points (numbered 1–8) surrounding the current point]
N parameters: 3^N − 1 evaluations per step
5 parameters: 3^5 − 1 = 242 evaluations per step
10 parameters: 3^10 − 1 = 59,048 evaluations per step
this ends up being inefficient because you may need to take thousands of steps

a "stupid" algorithm
SimpleHillClimb.m
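A minimal MATLAB sketch of the idea behind simple hill climbing (the actual SimpleHillClimb.m used in class may differ; mysse is a stand-in objective):

% evaluate the neighboring points; move to any better one; if none
% improves the fit, halve the step size; stop when the step is tiny
mysse = @(p) (p(1) - 1.3)^2 + (p(2) - 2.7)^2;   % stand-in objective
p = [0 0];                                      % starting point
step = 1;                                       % initial step size
while step > 1e-6
    improved = false;
    for di = -1:1
        for dj = -1:1
            cand = p + step*[di dj];            % one of the 8 neighbors
            if mysse(cand) < mysse(p)
                p = cand;  improved = true;
            end
        end
    end
    if ~improved
        step = step/2;   % shrink the step, as in the Fermi & Metropolis procedure
    end
end
fprintf('minimum near parameter values %.4f, %.4f\n', p);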
More sophisticated algorithms
Kolda, T.G., Lewis, R.M., & Torczon, V. (2003) Optimization by direct search: New perspectives on some classical and modern methods. SIAM Review, 45, 385-482.
More sophisticated algorithms
DEMONSTRATE
e.g., Hooke and Jeeves, a pattern search method
More sophisticated algorithms
DEMONSTRATE
e.g., Nelder-Mead Simplex (fminsearch in MATLAB)
http://en.wikipedia.org/wiki/Nelder-Mead_method
http://www.scholarpedia.org/article/Nelder-Mead_algorithm
What is a simplex?

0 dimensions: point (1 vertex)
1 dimension: line (2 vertices)
2 dimensions: triangle (3 vertices)
3 dimensions: tetrahedron (4 vertices)
4 dimensions: pentachoron (5 vertices)
. . .
N dimensions: N-simplex (N+1 vertices)

basically just a generalization of a triangle to N dimensions
the Nelder-Mead simplex operations: reflect, expand, contract, shrink
Matlab examples
More Matlab examples
Medin & Schaffer (1978)
stim  dims      P(A)obs  P(A)prd
A1    1 1 1 2   .77      .79
A2    1 2 1 2   .78      .83
A3    1 2 1 1   .83      .88
A4    1 1 2 1   .64      .65
A5    2 1 1 1   .61      .64
B1    1 1 2 2   .39      .45
B2    2 1 1 2   .41      .44
B3    2 2 2 1   .21      .23
B4    2 2 2 2   .15      .16
T1    1 2 2 1   .56      .62
T2    1 2 2 2   .41      .47
T3    1 1 1 1   .82      .85
T4    2 2 1 2   .40      .45
T5    2 1 2 1   .32      .34
T6    2 2 1 1   .53      .61
T7    2 1 2 2   .20      .22
$$s_{ij} = \exp(-c \cdot d_{ij}^{\,p}) \qquad d_{ij} = \left( \sum_{m=1}^{M} w_m \, |i_m - j_m|^r \right)^{1/r}$$

$$P(A \mid i) = \frac{\beta_A \sum_{j=1}^{N_A} s_{ij}}{\beta_A \sum_{j=1}^{N_A} s_{ij} + \beta_B \sum_{j=1}^{N_B} s_{ij}}$$
free parameters: w1, w2, w3, w4, c

the model generates the predicted P(A) values (P(A)prd) in the table above; fit is measured by the SSE between P(A)obs and P(A)prd
Gradient-Based Techniques
when you can calculate (or approximate) derivatives

Simulated Annealing (a generalization of the Metropolis algorithm)
with noisy objective functions and with discrete parameter values

Genetic Search Algorithms
with discrete parameter values
possible project: explore different parameter search routines to see which best recovers parameters and does it most quickly
Homework Assignment
fit the SCM; fit the GCM

partly using code we used in class today and code from last week's assignment

I encourage people to work together conceptually, but each person should do their own programming.
Problems of local minima
importance of multiple starting positions
Genetic Algorithms and Simulated Annealing may solve these problems
Simulated Annealing

always accept the new candidate parameter vector if it gives a better fit, but also accept a new candidate parameter vector with probability P if it gives a WORSE fit

e.g., P = exp(-Δfit/T)
Δfit is the decrease in fit between the current and candidate vectors
T is the "temperature", which decreases according to a schedule

as Δfit → 0, P → 1
T starts at ∞, so P starts at 1 (completely random)
T goes to 0, so P goes to 0 (pure hill climbing)

depending on the cooling schedule, simulated annealing can take orders of magnitude longer than a basic hill-climbing algorithm
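A minimal MATLAB sketch of the acceptance rule described above (the candidate-generation step and cooling schedule here are simplified assumptions):

% simulated annealing: sometimes accept a WORSE candidate
mysse = @(p) (p - 2)^2 + sin(5*p);   % stand-in objective with local minima
p = 0;  fit = mysse(p);
T = 1;                               % temperature
for step = 1:5000
    cand = p + 0.1*randn;            % random candidate near current point
    candFit = mysse(cand);
    deltaFit = candFit - fit;        % positive means the fit got WORSE
    if deltaFit < 0 || rand < exp(-deltaFit/T)
        p = cand;  fit = candFit;    % accept better always, worse with prob P
    end
    T = 0.999 * T;                   % cooling schedule
end
fprintf('ended at parameter %.3f with fit %.3f\n', p, fit);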
Genetic Algorithms

multiple candidate parameter vectors are recombined or mutated, and only some offspring are retained, akin to natural selection
Problems of local minima
importance of multiple starting positions

how do you know when you've tested enough starting points?

what starting points do you pick?
(1) based on "experience" with the model
(2) based on an initial coarse parameter search followed by a fine parameter search
(3) an initial "random search" first, like simulated annealing or genetic algorithms

how many starting points?
(1) do many starting points converge on the same optimal parameter values?
(2) need to consider the amount of time it takes to do a search from each starting point
(3) if the model fits "everything" you're okay, but it's harder to know that a model really blows it
How to use the programs
parInit = [3 2];
options = optimset('display', 'iter', 'MaxIter', 500);
[bestx,fval] = fminsearch(@mymodel, parInit, options);

passing a function handle (@mymodel) as a parameter
parInit = [1.6 -1.6];
parInc  = [0.1 0.1];
parLow  = [-4 -4];
parHigh = [ 4  4];
[HOOK_fit,HOOK_pos,HOOK_path] = ...
    hook('mymodel', parInit, parLow, parHigh, parInc, parInc/10);

passing the name of the function as a string; hook uses MATLAB's eval() function
fminsearch():

parInit = [3 2];
options = optimset('display', 'iter', 'MaxIter', 500);
[bestx,fval] = fminsearch(@mymodel, parInit, options);
[Figure: the fitting loop: the search routine passes params to mymodel(), which returns a fit value (SSE against the P(A)obs values in the table above); the routine then changes params to try to decrease the fit]
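A hedged MATLAB sketch of what such a mymodel() could look like for the Medin & Schaffer data (this body is an illustration, not the course's actual code; it assumes r = 1, p = 1, summed similarity, and equal category biases, and it uses implicit expansion, so it needs MATLAB R2016b or later):

% GCM objective: params = [w1 w2 w3 c], with w4 = 1 - w1 - w2 - w3
function sse = mymodel(params)
    stim = [1 1 1 2; 1 2 1 2; 1 2 1 1; 1 1 2 1; 2 1 1 1; ...   % A1-A5
            1 1 2 2; 2 1 1 2; 2 2 2 1; 2 2 2 2; ...            % B1-B4
            1 2 2 1; 1 2 2 2; 1 1 1 1; 2 2 1 2; ...            % T1-T4
            2 1 2 1; 2 2 1 1; 2 1 2 2];                        % T5-T7
    obs = [.77 .78 .83 .64 .61 .39 .41 .21 .15 ...
           .56 .41 .82 .40 .32 .53 .20]';
    w = [params(1:3) 1 - sum(params(1:3))];   % attention weights sum to 1
    c = params(4);                            % scaling parameter
    A = stim(1:5,:);  B = stim(6:9,:);        % category exemplars
    prd = zeros(size(obs));
    for k = 1:size(stim,1)
        dA = abs(A - stim(k,:)) * w';         % weighted city-block distances
        dB = abs(B - stim(k,:)) * w';
        EA = sum(exp(-c * dA));               % summed similarity to A
        EB = sum(exp(-c * dB));               % summed similarity to B
        prd(k) = EA / (EA + EB);              % equal-bias choice rule
    end
    sse = sum((obs - prd).^2);
end

it could then be fit with, e.g., fminsearch(@mymodel, [.25 .25 .25 1]).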
Some things to consider when using these tools:

• do you have continuous vs. discrete parameters?
  - discrete parameters may require a grid search

• need for multiple starting points because of local minima
  - did you use enough starting points?
  - do the various starting points converge?
  - how long does each parameter search take?

• where to place the starting points
  - based on experience with the model
  - preliminary exploration of the parameter space

• has the maximum number of iterations been reached?
  - MaxIter and MaxFunEvals in MATLAB
  - should only be reached if a parameter is going to ∞
Some things to consider when using these tools:

• what is the initial step size in the search?
  - consider a large step size in step 1
  - a smaller step size in step 2
  - does the algorithm decrease the step size?
  - what is the step size for each parameter?

• what is the range of valid values for each parameter?
  - does the search algorithm set min and max values?
parInit = [1.6 -1.6];
parInc  = [0.1 0.1];
parLow  = [-4 -4];
parHigh = [ 4  4];
[HOOK_fit,HOOK_pos,HOOK_path] = ...
    hook('mymodel', parInit, parLow, parHigh, parInc, parInc/10);

Hooke and Jeeves lets you specify the step size (parInc) separately for each parameter, and lets you specify the min (parLow) and max (parHigh) separately for each parameter
parInit = [3 2];
options = optimset('display', 'iter', 'MaxIter', 500);
[bestx,fval] = fminsearch(@mymodel, parInit, options);

MATLAB's fminsearch (Simplex) does not let you set min and max values; all parameters are allowed to range between -∞ and +∞

ONE SOLUTION: use fminsearchbnd from the MATLAB Central File Exchange
http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=8277&objectType=file
or program the constraints yourself:

function fit = mymodel(params)
p1 = params(1);
p2 = params(2);
p3 = params(3);
. . .
fit = sse;
what if p1 can range between -∞ and +∞,
but p2 can only go between 1 and +∞,
and p3 can only go between 1 and 4?
function fit = mymodel(params)
p1 = params(1);                        % between -∞ and +∞
p2 = 1 + params(2)^2;                  % between 1 and +∞
p3 = 1 + 3*(sin(params(3)) + 1)/2;     % between 1 and 4
. . .
fit = sse;
more generally:

function fit = mymodel(params)
p1 = params(1);                                   % between -∞ and +∞
p2 = LOW + params(2)^2;                           % between LOW and +∞
p3 = HIGH - params(3)^2;                          % between -∞ and HIGH
p4 = LOW + (HIGH-LOW)*(sin(params(4)) + 1)/2;     % between LOW and HIGH
. . .
fit = sse;
MATLAB's fminsearch (Simplex) does not let you set the step size (for any parameter); it may use an initial step size that is proportional to the value of the parameter, but this cannot be set by the user. It's probably okay.

there are some other search algorithms that assume the same step size for every parameter (e.g., subplex)
Some options the programs give you …
Max Iterations
- check that Max Iterations is never hit
- set it to a big number
- sometimes searches can go off to infinity
Some options the programs give you …
Step Size / Min Step Size
rule of thumb: 1/100 of the expected parameter value

NOTE: some programs use the same step size for every parameter; you should rescale the parameter value within the model routine

one approach is to do an initial search with a large step size just to find a reasonable set of starting points, then switch to a smaller step size
Some options the programs give you …

Step Size / Min Step Size
what if the step size is way too big?

[Figure: with too large a step size, the search can jump from the starting point right past the minimum]
More sophisticated algorithms

- for noisy objective functions (e.g., with Monte Carlo simulations)
- combining hill climbing with grid search, when some parameters are continuous and some are discrete