optimal control approaches to language · 2017. 6. 14. · howes, lewis & vera (2009,...

Optimal Control Approaches to LanguageHow the Architecture “Shows Through”

Richard L. Lewis Michael Shvartsman Satinder Singh

Psychology, Computer ScienceUniversity of Michigan

Soar Workshop 32(!)21 June 2012

Lewis (University of Michigan) Optimal Control Approaches 21 June 2012

The idea: Applying Newell & Simon’s scissors to language(plus utility maximization)

All aspects of linguistic processing and

behavior—from parsing strategies to

production strategies to control of short

and long-term memory to eye-movement

control—may be understood as the solution

to the constrained optimization problem

posed by the external task environment,

task structure, and internal processing

structure/constraints (e.g. representation

noise, knowledge).

arg max

organism structure

(constraints) task goals (

payoff)

envir

onmen

t st

ructu

re

behavior (optimal policies)

Howes, Lewis & Vera (2009,

Psychological Review)

We’ll pursue this via the application of state-of-the-art theoretical ideas

. . . from the 1940-60s: optimal control and optimal state estimation.Lewis (University of Michigan) Optimal Control Approaches 21 June 2012

Modeling eye-movements

Overview

1 What determines the nature of eye-movements in linguistictasks?

The task and modelPredictions vs. human behaviorHow architecture shapes adaptation

2 Conclusions and looking ahead


Modeling eye-movements Task and model

The List Lexical Decision Task

TASK ENVIRONMENT

MODEL

Control policies optimizing reward

predicted behavior under ACCURACY payoff

predicted behavior under BALANCED payoff

predicted behavior under SPEED payoff

behavior under ACCURACY payoff

behavior under BALANCED payoff

behavior under SPEED payoff

ACCURACY emphasis

payoff

BALANCEDpayoff

Varying architectural constraints

HUMANSin eye-tracking experiments

SPEED emphasis

payoff

2

3

4

5

6

lead hilt robe helm guru east1

varying architectural constraints

Version of this task first used by Schvanaveldt & Meyer (1973);Meyer & Schvanavedlt (1972)



Explicit payoffs: Motivating with cash

4 LEWIS, SHVARTSMAN & SINGH

Accuracy Balanced Speed

Incorrect penalty -150 -50 -25Speed bonus (persecond under 5s)

8 6.7 5.7

Table 1Quantitative payo↵s given to both model and hu-man participants. These payo↵ points translatedinto cash bonuses for the human participants.

Three distinct payo↵s

We evaluated both model and human participantsaccording to three di↵erent payo↵ functions (spec-ified in Table 1. The payo↵s were designed to im-pose di↵erent speed-accuracy tradeo↵s for a givenlevel of success, and were all defined in terms of abonus for speed and penalty for incorrect responses.The bonus was continuous at the millisecond level,starting at zero points for responses longer than 5sand rising by a di↵erent number of points per sec-ond for each payo↵.

An optimal control model

Main theoretical assumptions

We can now briefly state our three main theoret-ical assumptions:

1. Saccadic control is a “rise-to-threshold” sys-tem (Brodersen et al., 2008) conditioned on task-specific decision variables that reflect the accu-mulation and integration of noisy evidence overtime. We model the dynamic evidence accumula-tion as Bayesian sequential sampling, and in oursimple two-alternative task this is equivalent to a Se-quential Probability Ratio Test (Wald & Wolfowitz,1948).

2. The saccade thresholds are set to maximizetask-specific payo↵, but this is one part of a jointoptimization problem that includes all other policyparameters that determine behavior in the task. Inour model of the LLDT, this consists of a separatedecision variable and threshold that determines thetask-level response to the entire trial (but does notinclude architectural parameters, which are fixed).These two thresholds together determine how longthe model fixates on individual strings, how manystrings it reads, and when and how it responds.

3. The shape of the payo↵ surface (and thus itsmaxima) over the multi-dimensional policy space isdetermined jointly by the payo↵ function and prop-erties of the perceptual and oculomotor system, in-cluding saccade programming duration, eye-brain-lag, saccade execution duration, manual motor pro-gramming duration, and representational noise.

Overview

We provide a brief overview of a typical trialbefore focusing in on specific detail of each as-pect of the model specification. See Figure 2 fora schematic diagram of the full model, and Fig-ure 3 for simulated traces from two sample trials.On a given trial, the first fixation starts on the left-most string. During each fixation, noisy informationabout the fixated string is acquired at every timestep(with some delay, the eye-brain-lag, (VanRullen &Thorpe, 2001)). This noisy information is usedfor updating the model’s beliefs about the status ofthe current string as well as the trial as a whole.This continues until either the string-level or thetrial-level belief reaches some threshold, at whichpoint either a saccade is initiated (if the string-levelthreshold is reached), or a manual response is initi-ated (if the trial-level threshold is hit). We will re-fer to these thresholds as the saccade threshold anddecision threshold. Information acquisition contin-ues while the saccade or manual response is beingprogrammed and until the saccade begins execution(with some visual persistence o↵set). Once saccadeprogramming and execution is complete, the modelfixates on the following string (if there are stringsremaining), or initiates a response otherwise. Oncemotor programming and execution is complete, themodel receives point feedback (i.e. the payo↵) andthe trial is over.

Dynamics assumptions: Oculo-motor ar-chitecture and noise

The model’s sequential perceptual inferencemechanism is embedded in a simple oculomotorcontrol machine, drawing upon modern mathemati-cal models of oculomotor control in reading. Thedelays noted above (eye-brain-lag, saccade pro-gramming and execution times, and motor time) aredrawn from gamma distributions (chosen for con-venience because they are constrained to be posi-tive and have been previously used to model these



An adaptive model that performs the complete task

Manualresponse Posterior Update Lexicon +

Model of experiment

Experiment Environment

Critic

Feedback

Noisy sample from position

(Bounded) Optimal Control

Initiate saccade program

Reward Function

Stimulus

Oculomotor System

Initiate press

Button press indicating trial response

s kI(·)

saccade program saccade eye-brain lag ⇠ Gamma(50ms)

⇠ Gamma(250ms)

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

RN.match

SD= 1.75, threshold=0.95Samples

Posterio

r

RT dist. for RN.match

RT

Frequen

cy

300 500 700

0500

10001

500200

0

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

RN.mismatch


Posterio

r

RT dist. for RN.mismatch

RT

Frequen

cy

300 500 700

0500

1500

2500

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

YES.match


Posterio

r

RT dist. for YES.match

RT

Frequen

cy

300 500 700

0100

0200

0300

0

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

NRN.match


Posterio

r

RT dist. for NRN.match

RT

Frequen

cy

300 500 700

0100

0300

0

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

NRN.mismatch


Posterio

r

RT dist. for NRN.mismatch

RT

Frequen

cy

400 500 600 700

0500

1500

2500

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

RN.match


Posterio

r


RT

Frequen

cy

300 500 700

0500

1000

1500

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

RN.mismatch


Posterio

r


RT

Frequen

cy

300 500 700

0500

1500

2500

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

YES.match


Posterio

r


RT

Frequen

cy

300 500 700

0500

1500

2500

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

RN.mismatch


Posterio

r


RT

Frequen

cy

300 500 700

0500

10001

500200

0

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

YES.match


Posterio

r


RT

Frequen

cy

300 500 700

0500

1000

1500

2000

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

NRN.match


Posterio

r


RT

Frequen

cy

300 500 700

0500

1000

2000

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

NRN.mismatch


Posterio

r


RT

Frequen

cy

300 500 700

0500

1500

2500

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

RN.match


Posterio

r


RT

Frequen

cy

300 500 700

0100

0300

0

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

RN.mismatch


Posterio

r


RT

Frequen

cy

300 500 700

0500

1500

2500

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

YES.match


Posterio

rRT dist. for YES.match

RT

Frequen

cy

300 500 700

0500

1500

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

NRN.match


Posterio

r


RT

Frequen

cy

350 400 450 500

0500

1500

2500

saccade decision

trial decision

⇠ Gamma(40ms)⇠ Gamma(125ms)

optimal thresholds


Next, we update the string-level beliefs:

Prnew(S k = Wi|T ! N k, sk) =

=Pr(sk |S k = Wi,T ! N k)Prold(S k = Wi|T ! N k)

Prold(sk |T ! N k)

=Pr(sk |S k = Wi)Prold(S k = Wi|T ! N k)

Prold(sk |T ! N k)(4)

Prnew(S k = Ni|T = Nk, sk) =

=Pr(sk |S k = Ni,T = N k)Prold(S k = Ni|T = N k)

Prold(sk |T = N k)

=Pr(sk |S k = Ni)Prold(S k = Ni|T = N k)

Prold(sk |T = N k)(5)

Finally, we update the trial level beliefs:

Prnew(T =W|sk) =

=Prold(sk |T =W)Prold(T =W)

Prold(sk)=

=Prold(sk |T ! N k)Prold(T =W)

Prold(sk)

(6)

Prnew(T = N j!k |sk) =

=Prold(sk |T = N j)Prold(T = N j)

Prold(sk)

=Prold(sk |T ! N k)Prold(T = N j)

Prold(sk),

(7)

Prnew(T = N k |sk) =

=Prold(sk |T = N k)Prold(T = N k)

Prold(sk)

(8)

In order to make decisions, in addition to the proba-bility that the trial is a word trial or not that we computeabove, we also need the probability that the string at po-sition k is a word or nonword, i.e., Pr(S k ∈ ∪n

i=1Wi) andPr(S k ∈ ∪m

i=1Ni):

Pr(S k ∈ ∪ni=1Wi) =

n∑

i=1

Pr(S k = Wi|T ! N k)Pr(T ! Nk)

(9)

Pr(S k ∈ ∪mi=1Ni) = 1.0 − Pr(S k ∈ ∪n

i=1Wi) (10)

The full process then iterates, with each Prnew becom-ing the next Prold.

AppendixReferences

Anderson, J. R. (1990). The Adaptive Character ofThought. Hillsdale, NJ: Lawrence Erlbaum.

Ballard, D. H., & Hayhoe, M. M. (2009). Modelling therole of task in the control of gaze. Visual Cognition,17(6-7), 1185-1204.

Bicknell, K. (2010). Rational eye movements in read-ing combining uncertainty about previous words withcontextual probability. Proceedings of the 32nd An-nual Conference of the Cognitive Science Society.

Bicknell, K., & Levy, R. (2010). A rational model of eyemovement control in reading. Proceedings of the 48thAnnual Meeting of the Association for ComputationalLinguistics, 1168–1178.

Bicknell, K., & Levy, R. (2012). The utility of modelingword identification from visual input within models ofeye movements in reading. Visual Cognition, 1–18.

Bogacz, R., Brown, E., Moehlis, J., Holmes, P., & Cohen,J. D. (2006, October). The physics of optimal decisionmaking: A formal analysis of models of performancein two-alternative forced-choice tasks. PsychologicalReview, 113(4), 700-765.

Brodersen, K. H., Penny, W. D., Harrison, L. M., Dau-nizeau, J., Ruff, C. C., Duzel, E., et al. (2008). Inte-grated bayesian models of learning and decision mak-ing for saccadic eye movements. Neural Networks,21(9), 1247–1260.

Edwards, W. (1961, January). Behavioral decision theory.Annual Review of Psychology, 12, 473–98.

Edwards, W. (1965). Optimal strategies for seeking infor-mation: Models for statistics, choice reaction-times,and human information-processing. Journal of Math-ematical Psychology, 2(2), 312-329.

Engbert, R., Nuthmann, A., Richter, E., & Kliegl, R.(2005, October). Swift: A dynamical model of sac-cade generation during reading. Psychological Re-view, 112(4), 777-813.

Forster, K. (1979). Levels of processing and the struc-ture of the language processor. In W. E. Cooper &E. C. Walker (Eds.), Sentence processing: Psycholin-guistic studies presented to merrill garrett. Hillsdale,NJ: Lawrence Erlbaum.

Geisler, W. (1989). Sequential ideal-observer analysis ofvisual discriminations. Psychological Review, 96(2),267.

Grainger, J. (1990). Word frequency and neighborhoodfrequency effects in lexical decision and naming. Jour-nal of Memory and Language, 29(2), 228–244.

Green, D. M., & Swets, J. A. (1966). Signal DetectionTheory and Psychophysics. New York: Wiley.

Hale, J. T. (2011). What a rational parser would do. Cog-nitive Science, 35, 399–443.

Howes, A., Lewis, R. L., & Vera, A. H. (2009). Rationaladaptation under task and processing constraints: Im-plications for testing theories of cognition and action.Psychological Review, 116(4), 717–751.

Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996).

,... (see Appendix)

1

2

34

5

6

7

lead hilt robe helm guru east

See Legge et al (1997); Bicknell & Levy (2010); Ratcliff & Mckoon (2008); Norris(2009).



“Random walk” of Bayesian posterior update

accumulators that are assumed to gradually accrue noisysensory input until a threshold of activation is reached[19,20]. More recent models proposed in the field of compu-tational neuroscience explicitly model individual integratorneurons [21–23] or assume that each accumulator corre-sponds to a population of integrator neurons associatedwith a particular choice alternative [24,25]. Importantly,all these models assume that SAT is controlled by thedistance between the initial activity of the integrators (i.e.the baseline) and the response threshold. If this difference issmall, decisions are fast but error-prone; if the difference islarge, decisions are accurate but slow (Figure 1). Therefore,these models predict that neural changes associated withSAT should be visible in brain areas involved in decisionmaking (i.e. areas containing integrator neurons), ratherthan in areas specialized in stimulus encoding and motorexecution.

Mathematical models can account for SAT in two ways:either by changing the baseline of accumulators or bychanging their threshold (Figure 2). Although many mod-elling studies have assumed for simplicity that the base-line stays constant and that SAT is controlled by changingthe threshold, in almost all mathematical models thecrucial factor controlling SAT is the baseline–thresholddistance. In most models an increase in baseline and acorresponding decrease in threshold produce exactly thesame changes in simulated behavior. Thus, behavioraldata alone do not allow one to distinguish whether SATis controlled by changing the baseline, threshold, or both.To address this question, one needs to analyze neuralactivity.

fMRI studies of SAT: advantages and limitationsTo date, the most direct evidence concerning the neuralbasis of SAT comes from three recent BOLD-fMRI studiesin humans [4–6]. BOLD is an fMRI technique that revealsthe local changes in blood oxygenation that are closelycoupled with local increases in neural activation [26].Compared to cell recordings in animals, human fMRIhas distinct advantages as a method for studying SAT.First, unlike animals, human subjects can simply beinstructed to be fast or accurate. Furthermore, fMRI per-mits whole-brain coverage at a spatial resolution sufficientto delineate regional changes in activation. Whole-braincoverage is important for studying phenomena, like SAT,that are likely to be dependent on the interplay betweenvarious brain areas.

Two general limitations of fMRI are its low spatial andtemporal resolution. In contrast to neurophysiologicalrecordings, fMRI does not allow one to monitor the activityof specific integrator neurons as evidence accumulates overtime. In addition to these general limitations, fMRI studiesof SAT are also confronted with two specific challenges.First, as pointed out by van Veen et al. [6], the amplitude ofdecision-related BOLD responses is not only proportionalto the baseline–threshold distance but also to the durationof the decision process [26], and this duration is likely to belonger in the accuracy condition. Therefore it is hard toattribute changes in decision-related BOLD signalsunequivocally to changes in baseline–threshold distance.Second, it is difficult to examine response thresholds with

Figure 1. An accumulator model account of SAT. The figure shows a simulation ofa choice between two alternatives. The model includes two accumulators, whoseactivity is shown by blue lines. The inputs to both accumulators are noisy, but theinput to the accumulator shown in dark blue has a higher mean, because thisaccumulator represents the correct response. Lowering the threshold (horizontallines) leads to faster responses at the expense of an increase in error rate. In thisexample, the green threshold leads to a correct but relatively slow response,whereas the red threshold leads to an incorrect but relatively fast response.

Figure 2. Schematic illustration of changes in the activity of neural integrators associated with SAT. Horizontal axes indicate time, while vertical axes indicate firing rate.The blue lines illustrate the average activity of a neural integrator selective for the chosen alternative, and the dashed lines indicate baseline and threshold. (a) Accuracyemphasis is associated with a large baseline–threshold distance. (b,c) Speed emphasis can be caused either by increasing the baseline (panel b) or by lowering thethreshold (panel c); in formal models, these changes are often mathematically equivalent.

Review Trends in Neurosciences Vol.33 No.1

11

OPTIMAL CONTROL = setting thresholds optimally

OPTIMAL STATE ESTIMATION =evidence integration via Bayesian update

Graph from Bogacz (2009), TINS.



Bayesian priors and stimulus representation

Model maintains belief probabilities over:

(a) probability distribution over all possible strings in the currentlyfixated position;

(b) the probability of a nonword in each position; probability currenttrial is a word trial is 1− sum over these.

The prior over (a) is based on Brown Corpus frequency; prior over (b) is

probability of a nonword trial (0.5) divided by the # of positions (6).

The string stimulus is represented with a simple indicator vector coding

(length 26 × 4) (Norris, 2009). At each sample (10ms), Guassian noise

of mean zero and SD = 1.2 (more on this) is added to the true

representation.



Sample model behavior

0 500 1000 1500 2000

CORRECT WORD Trial

Saccade threshold=0.92000, Decision yes= 0.99900, Decision no= 0.99900, Noise=1.15, Sac. prog =125 msTrial Time (ms)

FIXATION

brag

FIXATION

none

FIXATION

mean

FIXATION

bloc

FIXATION

hilt

FIXATION

hair

EBL

EBL

EBL

EBL

EBL

EBLsampling sampling sampling sampling sampling sampling

prog prog prog prog prog prog

MOTOR All words!!



Sample model behavior

0 500 1000 1500 2000

INCORRECT WORD Trial

Saccade threshold=0.92000, Decision yes= 0.99900, Decision no= 0.99900, Noise=1.15, Sac. prog =125 msTrial Time (ms)

FIXATION

lard

FIXATION

sane

FIXATION

helm toot east lost

EBL

EBL

EBLsampling sampling sampling

prog prog prog

MOTOR Not all words!!



Generating predictions

Selecting a policy (pair of thresholds) “programs” themachine to perform the task.

Policy selected through payoff optimization—not data fitting.

Optimal OptimalPayoff condition Saccade Threshold Response Threshold

Accuracy 0.99 0.999Balanced 0.97 0.999

Speed 0.92 0.99

I.e, π∗speed = (0.92, 0.99) and so on. With fixed policy, machine generates

dozens of behavioral measures (e.g. RTs, errors, RTs for accurate vs./

inaccurate, fixation durations for words, nonwords, frequency effects, . . . )



Finding the sweet spot: Payoff as function of thresholds

π∗ = argmaxπ∈ΠEtrial∼ExperimentU(π, trial) (1)

0.80 0.85 0.90 0.95 1.00

14

16

18

20

22

24

26

ACC payoff vs. Saccade Threshold

Saccade Threshold

Accu

racy

Pay

off

(MODEL, noise=1.20)

●(0.9900, 0.9990)

0.80 0.85 0.90 0.95 1.00

14

16

18

20

22

24

26

BAL payoff vs. Saccade Threshold

Saccade Threshold

Bala

nce

Payo

ff

(MODEL, noise=1.20)

●(0.9700, 0.9990)

●

0.80 0.85 0.90 0.95 1.00

14

16

18

20

22

24

26

SPD payoff vs. Saccade Threshold

Saccade Threshold

Spee

d Pa

yoff

(MODEL, noise=1.20)

●(0.9200, 0.9900)

0.80 0.85 0.90 0.95 1.00

−10

0

10

20

ACC payoff vs. Decision Threshold

Decision Threshold

Accu

racy

Pay

off

(MODEL, noise=1.20)

●(0.9900, 0.9990)

0.80 0.85 0.90 0.95 1.00

−10

0

10

20

BAL payoff vs. Decision Threshold

Decision Threshold

Bala

nce

Payo

ff

(MODEL, noise=1.20)

●(0.9700, 0.9990)

0.80 0.85 0.90 0.95 1.00

−10

0

10

20

SPD payoff vs. Decision Threshold

Decision Threshold

Spee

d Pa

yoff

(MODEL, noise=1.20)

●(0.9200, 0.9900)Lewis (University of Michigan) Optimal Control Approaches 21 June 2012


Payoff as a function of behavior

0.80 0.85 0.90 0.95 1.00

200

250

300

350

SFD vs. Saccade Threshold

Saccade Threshold

Sing

le F

ixatio

n Du

ratio

n (m

s)

(MODEL, noise=1.20)

●

●

●

0.80 0.85 0.90 0.95 1.00

1000

1200

1400

1600

1800

2000

Trial RT vs. Saccade Threshold

Saccade Threshold

Tria

l Res

pons

e Ti

me

(ms)

(MODEL, noise=1.20)

●

●●

0.80 0.85 0.90 0.95 1.00

0.75

0.80

0.85

0.90

0.95

1.00

% Correct vs. Saccade Threshold

Saccade Threshold

% C

orre

ct

(MODEL, noise=1.20)

●

●●

200 250 300 350

18

20

22

24

26

Payoffs vs. SFD

Single Fixation Duration (ms)

Payo

ff

(MODEL, noise=1.20)

●

●

●

1300 1500 1700 1900

18

20

22

24

26

Payoffs vs. Trial RT

Trial Response Time (ms)

Payo

ff

(MODEL, noise=1.20)

●

●

●

0 2 4 6 8 10 12

18

20

22

24

26

Payoffs vs. Frequency Effect

Log Frequency Effect on SFD

Payo

ff

(MODEL, noise=1.20)

●

●

●


Modeling eye-movements Model vs. human

Model and human at level of trial

●

1000

1200

1400

1600

1800

2000

2200

Response Time for Word Trials

Tria

l Res

pons

e Ti

me

(ms)

(HUMAN)

Accuracy Balance Speed

●

●●

●

●

●

CORRECT trialINCORRECT trial

1000

1200

1400

1600

1800

2000

2200


Tria

l Res

pons

e Ti

me

(ms)

(MODEL, noise=1.20)


●

●●

●

●●●●

●

●

●

●●●●

●

●●

●

●●

●●

●●

●●

●

●

●

●

●

●●●●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

1000

1200

1400

1600

1800

2000

2200

Response Time for Nonword Trials

Tria

l Res

pons

e Ti

me

(ms)

(HUMAN)


●

●●

●

●●


1000

1200

1400

1600

1800

2000

2200


Tria

l Res

pons

e Ti

me

(ms)

(MODEL, noise=1.20)


●

●●

●

●●●●●●

●●●●●

●

●●

●

●●

●

●

●

●

●

●

●

●●●●

●●

●

●

●

●

●

●●

●●●●●●●●●●●●

●●

●

●

●

●

●

●

●

85

90

95

100

Percent Correct

Perc

ent C

orre

ct

(HUMAN)


●

●●

85

90

95

100

Percent Correct

Perc

ent C

orre

ct

(MODEL, noise=1.20)


●●●●

●●●●●●

●●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

1000

1200

1400

1600

1800

2000

2200


Tria

l Res

pons

e Ti

me

(ms)

(HUMAN)


●

●●

●

●

●


1000

1200

1400

1600

1800

2000

2200


Tria

l Res

pons

e Ti

me

(ms)

(MODEL, noise=1.20)


●

●●

●

●●●●

●

●

●

●●●●

●

●●

●

●●

●●

●●

●●

●

●

●

●

●

●●●●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

1000

1200

1400

1600

1800

2000

2200


Tria

l Res

pons

e Ti

me

(ms)

(HUMAN)


●

●●

●

●●


1000

1200

1400

1600

1800

2000

2200


Tria

l Res

pons

e Ti

me

(ms)

(MODEL, noise=1.20)


●

●●

●

●●●●●●

●●●●●

●

●●

●

●●

●

●

●

●

●

●

●

●●●●

●●

●

●

●

●

●

●●

●●●●●●●●●●●●

●●

●

●

●

●

●

●

●

85

90

95

100

Percent Correct

Perc

ent C

orre

ct

(HUMAN)


●

●●

85

90

95

100

Percent Correct

Perc

ent C

orre

ct

(MODEL, noise=1.20)


●●●●

●●●●●●

●●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●



Model and human at level of word/string

●

200

220

240

260

280

300

Single Fixation Duration

Sing

le F

ixatio

n Du

ratio

n (m

s)

(HUMAN)


●

●

●

200

220

240

260

280

300


Sing

le F

ixatio

n Du

ratio

n (m

s)

(MODEL, noise=1.20)


●●

●●

●

●

●

●

●

●

●

●●

●●●

●●●

●●●

●●

●●

●●

●

●

●

●

200

220

240

260

280

300

SFD by Frequency Class

Sing

le F

ixatio

n Du

ratio

n (m

s)

(HUMAN)


●

●

●●

●

●

LOW frequencyHIGH frequency

200

220

240

260

280

300


Sing

le F

ixatio

n Du

ratio

n (m

s)

(MODEL, noise=1.20)


●●

●●

●

●

●

●

●

●

●

●●

●●●

●●●

●●●

●●

●●

●●

●●

●●

●

●

●

●

●

●

●

●●

●●●

●●●

●●●

●●

●●

●●

●

●

●

●

●

●

●

3

4

5

6

7

8

9

Frequency Effect on SFD

Log

Freq

uenc

y Ef

fect

on

SFD

(ms)

(HUMAN)


●

●

●

3

4

5

6

7

8

9


Log

Freq

uenc

y Ef

fect

on

SFD

(ms)

(MODEL, noise=1.20)


●●

●●

●

●

●

●

●

●

●

●●

●●●

●●●

●●●

●●

●●

●●

●

●

●

●

200

220

240

260

280

300


Sing

le F

ixatio

n Du

ratio

n (m

s)

(HUMAN)


●

●

●

200

220

240

260

280

300


Sing

le F

ixatio

n Du

ratio

n (m

s)

(MODEL, noise=1.20)


●●

●●

●

●

●

●

●

●

●

●●

●●●

●●●

●●●

●●

●●

●●

●

●

●

●

200

220

240

260

280

300


Sing

le F

ixatio

n Du

ratio

n (m

s)

(HUMAN)


●

●

●●

●

●


200

220

240

260

280

300


Sing

le F

ixatio

n Du

ratio

n (m

s)

(MODEL, noise=1.20)


●●

●●

●

●

●

●

●

●

●

●●

●●●

●●●

●●●

●●

●●

●●

●●

●●

●

●

●

●

●

●

●

●●

●●●

●●●

●●●

●●

●●

●●

●

●

●

●

●

●

●

3

4

5

6

7

8

9


Log

Freq

uenc

y Ef

fect

on

SFD

(ms)

(HUMAN)


●

●

●

3

4

5

6

7

8

9


Log

Freq

uenc

y Ef

fect

on

SFD

(ms)

(MODEL, noise=1.20)


●●

●●

●

●

●

●

●

●

●

●●

●●●

●●●

●●●

●●

●●

●●

●

●

●



Model and human: Words vs. nonwords, position effects


Model and human at level of word/string

●

200

250

300

350

400

Word SFD by correctness

Sing

le F

ixatio

n Du

ratio

n (m

s)

(HUMAN)


●

●●

●

●●

CORRECT trialsINCORRECT trials

200

250

300

350

400


Sing

le F

ixatio

n Du

ratio

n (m

s)

(MODEL, noise=1.20)


●●

●●

●

●

●

●

●

●

●

●●

●●●

●●●

●●●

●●

●●

●●

●●

●●

●

●

●

●

●

●

●

●●

●●

●

●●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

200

250

300

350

400

Nonword SFD by correctness

Sing

le F

ixatio

n Du

ratio

n (m

s)

(HUMAN)


●

●

●

●

●

●


200

250

300

350

400


Sing

le F

ixatio

n Du

ratio

n (m

s)

(MODEL, noise=1.20)


●●

●●

●

●

●

●

●

●

●

●●

●●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●

●

●

●●

●●●●●●

●●●●●

●●●●

●

●

●

●

●

●

●

220

240

260

280

300

320

Word SFD by Position in List

Position in List

Sing

le F

ixatio

n Du

ratio

n (m

s)

(HUMAN)

2 3 4 5

●

●●

●

●● ●

●

●

● ●●

ACCURACYBALANCESPEED

220

240

260

280

300

320


Sing

le F

ixatio

n Du

ratio

n (m

s)

(MODEL, noise=1.20)

2 3 4 5

●●

●●

●●

●●

●●

●●

●●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●●●

●●●

●●●

●●

●●

●●

●●

●●●

●●●

●●●

●●

●●

●●

●

●●

●●●

●●●

●●●

●●

●●

●●

●

●●

●●●

●●●

●●●

●●

●●

●●

● ● ● ●

● ● ● ●

●●

●●

●

200

250

300

350

400


Sing

le F

ixatio

n Du

ratio

n (m

s)

(HUMAN)


●

●●

●

●●


200

250

300

350

400


Sing

le F

ixatio

n Du

ratio

n (m

s)

(MODEL, noise=1.20)


●●

●●

●

●

●

●

●

●

●

●●

●●●

●●●

●●●

●●

●●

●●

●●

●●

●

●

●

●

●

●

●

●●

●●

●

●●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

200

250

300

350

400


Sing

le F

ixatio

n Du

ratio

n (m

s)

(HUMAN)


●

●

●

●

●

●


200

250

300

350

400


Sing

le F

ixatio

n Du

ratio

n (m

s)

(MODEL, noise=1.20)


●●

●●

●

●

●

●

●

●

●

●●

●●●

●●●

●●●

●●

●●

●●

●●

●●

●●

●●

●

●

●

●●

●●●●●●

●●●●●

●●●●

●

●

●

●

●

●

●

220

240

260

280

300

320


Position in List

Sing

le F

ixatio

n Du

ratio

n (m

s)

(HUMAN)

2 3 4 5

●

●●

●

●● ●

●

●

● ●●


220

240

260

280

300

320


Sing

le F

ixatio

n Du

ratio

n (m

s)

(MODEL, noise=1.20)

2 3 4 5

●●

●●

●●

●●

●●

●●

●●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●●●

●●●

●●●

●●

●●

●●

●●

●●●

●●●

●●●

●●

●●

●●

●

●●

●●●

●●●

●●●

●●

●●

●●

●

●●

●●●

●●●

●●●

●●

●●

●●

● ● ● ●

● ● ● ●

●●

●●

Lewis (University of Michigan) Optimal Control Approaches 5 May 2012 42 / 54Lewis (University of Michigan) Optimal Control Approaches 21 June 2012

Modeling eye-movements How architecture shapes adaptation


The task and modelPredictions vs. human behaviorHow architecture shapes adaptation




Does the processing architecture matter?

The theoretical claim here is that eye-movement control is jointlyshaped by both task payoff and architecture. What is the evidencefor this?

Through modeling we can explore adaptation to differentarchitectures than the one hypothesized for the human oculomotorsystem.



The “minimal model”

Manualresponse Posterior Update Lexicon +

Model of experiment

Experiment Environment

Critic

Feedback

Noisy sample from position

(Bounded) Optimal Control

Initiate saccade program

Reward Function

Stimulus

Oculomotor System

Initiate press

Button press indicating trial response

s kI(·)

saccade program saccade eye-brain lag ⇠ Gamma(50ms)

⇠ Gamma(250ms)

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

RN.match


Posterio

r


RTFreq

uency

300 500 700

0500

10001

500200

0

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

RN.mismatch


Posterio

r


RT

Frequen

cy

300 500 700

0500

1500

2500

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

YES.match


Posterio

r


RT

Frequen

cy

300 500 700

0100

0200

0300

0

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

NRN.match


Posterio

r


RT

Frequen

cy

300 500 700

0100

0300

0

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

NRN.mismatch


Posterio

r


RT

Frequen

cy

400 500 600 700

0500

1500

2500

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

RN.match


Posterio

r


RT

Frequen

cy

300 500 700

0500

1000

1500

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

RN.mismatch


Posterio

r


RT

Frequen

cy

300 500 700

0500

1500

2500

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

YES.match


Posterio

r


RT

Frequen

cy

300 500 700

0500

1500

2500

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

RN.mismatch


Posterio

r


RT

Frequen

cy

300 500 700

0500

10001

500200

0

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

YES.match


Posterio

r


RT

Frequen

cy

300 500 700

0500

1000

1500

2000

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

NRN.match


Posterio

r


RT

Frequen

cy

300 500 700

0500

1000

2000

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

NRN.mismatch


Posterio

r


RT

Frequen

cy

300 500 700

0500

1500

2500

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

RN.match


Posterio

r


RT

Frequen

cy

300 500 700

0100

0300

0

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

RN.mismatch


Posterio

r


RT

Frequen

cy

300 500 700

0500

1500

2500

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

YES.match


Posterio

rRT dist. for YES.match

RT

Frequen

cy

300 500 700

0500

1500

0 10 20 30 40 50

0.00.2

0.40.6

0.81.0

NRN.match


Posterio

r


RT

Frequen

cy

350 400 450 500

0500

1500

2500

saccade decision

trial decision

⇠ Gamma(40ms)⇠ Gamma(125ms)

optimal thresholds


Next, we update the string-level beliefs:

Prnew(S k = Wi|T ! N k, sk) =

=Pr(sk |S k = Wi,T ! N k)Prold(S k = Wi|T ! N k)

Prold(sk |T ! N k)

=Pr(sk |S k = Wi)Prold(S k = Wi|T ! N k)

Prold(sk |T ! N k)(4)

Prnew(S k = Ni|T = Nk, sk) =

=Pr(sk |S k = Ni,T = N k)Prold(S k = Ni|T = N k)

Prold(sk |T = N k)

=Pr(sk |S k = Ni)Prold(S k = Ni|T = N k)

Prold(sk |T = N k)(5)

Finally, we update the trial level beliefs:

Prnew(T =W|sk) =

=Prold(sk |T =W)Prold(T =W)

Prold(sk)=

=Prold(sk |T ! N k)Prold(T =W)

Prold(sk)

(6)

Prnew(T = N j!k |sk) =

=Prold(sk |T = N j)Prold(T = N j)

Prold(sk)

=Prold(sk |T ! N k)Prold(T = N j)

Prold(sk),

(7)

Prnew(T = N k |sk) =

=Prold(sk |T = N k)Prold(T = N k)

Prold(sk)

(8)

In order to make decisions, in addition to the proba-bility that the trial is a word trial or not that we computeabove, we also need the probability that the string at po-sition k is a word or nonword, i.e., Pr(S k ∈ ∪n

i=1Wi) andPr(S k ∈ ∪m

i=1Ni):

Pr(S k ∈ ∪ni=1Wi) =

n∑

i=1

Pr(S k = Wi|T ! N k)Pr(T ! Nk)

(9)

Pr(S k ∈ ∪mi=1Ni) = 1.0 − Pr(S k ∈ ∪n

i=1Wi) (10)

The full process then iterates, with each Prnew becom-ing the next Prold.

AppendixReferences

Anderson, J. R. (1990). The Adaptive Character ofThought. Hillsdale, NJ: Lawrence Erlbaum.

Ballard, D. H., & Hayhoe, M. M. (2009). Modelling therole of task in the control of gaze. Visual Cognition,17(6-7), 1185-1204.

Bicknell, K. (2010). Rational eye movements in read-ing combining uncertainty about previous words withcontextual probability. Proceedings of the 32nd An-nual Conference of the Cognitive Science Society.

Bicknell, K., & Levy, R. (2010). A rational model of eyemovement control in reading. Proceedings of the 48thAnnual Meeting of the Association for ComputationalLinguistics, 1168–1178.

Bicknell, K., & Levy, R. (2012). The utility of modelingword identification from visual input within models ofeye movements in reading. Visual Cognition, 1–18.

Bogacz, R., Brown, E., Moehlis, J., Holmes, P., & Cohen,J. D. (2006, October). The physics of optimal decisionmaking: A formal analysis of models of performancein two-alternative forced-choice tasks. PsychologicalReview, 113(4), 700-765.

Brodersen, K. H., Penny, W. D., Harrison, L. M., Dau-nizeau, J., Ruff, C. C., Duzel, E., et al. (2008). Inte-grated bayesian models of learning and decision mak-ing for saccadic eye movements. Neural Networks,21(9), 1247–1260.

Edwards, W. (1961, January). Behavioral decision theory.Annual Review of Psychology, 12, 473–98.

Edwards, W. (1965). Optimal strategies for seeking infor-mation: Models for statistics, choice reaction-times,and human information-processing. Journal of Math-ematical Psychology, 2(2), 312-329.

Engbert, R., Nuthmann, A., Richter, E., & Kliegl, R.(2005, October). Swift: A dynamical model of sac-cade generation during reading. Psychological Re-view, 112(4), 777-813.

Forster, K. (1979). Levels of processing and the struc-ture of the language processor. In W. E. Cooper &E. C. Walker (Eds.), Sentence processing: Psycholin-guistic studies presented to merrill garrett. Hillsdale,NJ: Lawrence Erlbaum.

Geisler, W. (1989). Sequential ideal-observer analysis ofvisual discriminations. Psychological Review, 96(2),267.

Grainger, J. (1990). Word frequency and neighborhoodfrequency effects in lexical decision and naming. Jour-nal of Memory and Language, 29(2), 228–244.

Green, D. M., & Swets, J. A. (1966). Signal DetectionTheory and Psychophysics. New York: Wiley.

Hale, J. T. (2011). What a rational parser would do. Cog-nitive Science, 35, 399–443.

Howes, A., Lewis, R. L., & Vera, A. H. (2009). Rationaladaptation under task and processing constraints: Im-plications for testing theories of cognition and action.Psychological Review, 116(4), 717–751.

Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996).

,... (see Appendix)

1

2

34

5

6

7

lead hilt robe helm guru east



Payoff structure & predictions for the minimal model

0.80 0.85 0.90 0.95 1.00

−10

0

10

20


Saccade Threshold

Accu

racy

Pay

off

(MODEL, noise=1.90)

●(0.9990, 0.9999)

0.80 0.85 0.90 0.95 1.00

5

10

15

20

25


Saccade Threshold

Bala

nce

Payo

ff

(MODEL, noise=1.90)

●(0.9950, 0.9990)

●

0.80 0.85 0.90 0.95 1.00

5

10

15

20

25


Saccade Threshold

Spee

d Pa

yoff

(MODEL, noise=1.90)

●(0.9900, 0.9990)

0.93 0.95 0.97 0.99

−10

0

10

20


Decision Threshold

Accu

racy

Pay

off

(MODEL, noise=1.90)

●(0.9990, 0.9999)

0.93 0.95 0.97 0.99

−10

0

10

20


Decision Threshold

Bala

nce

Payo

ff

(MODEL, noise=1.90)

●(0.9950, 0.9990)

0.93 0.95 0.97 0.99

−10

0

10

20


Decision Threshold

Spee

d Pa

yoff

(MODEL, noise=1.90)

●(0.9900, 0.9990)

●

200

220

240

260

280

300


Sing

le F

ixatio

n Du

ratio

n (m

s)

(HUMAN)


●

●

●●

●

●


150

200

250

300


Sing

le F

ixatio

n Du

ratio

n (m

s)

(MODEL, noise=1.85)


●●●

●●●

●●●●

●●●●

●●●

●●●

●●●●

●●●●

●

●

●

●

●

●

●

3

4

5

6

7

8

9


Log

Freq

uenc

y Ef

fect

on

SFD

(ms)

(HUMAN)


●

●

●

20

21

22

23

24

25


Log

Freq

uenc

y Ef

fect

on

SFD

(ms)

(MODEL, noise=1.85)


●●●

●●●

●●●●

●●●●

●

●

●

●

220

240

260

280

300

320


Position in List

Sing

le F

ixatio

n Du

ratio

n (m

s)

(HUMAN)

2 3 4 5

●

●●

●

●● ●

●

●

● ●●


150

200

250

300


Sing

le F

ixatio

n Du

ratio

n (m

s)

(MODEL, noise=1.85)

2 3 4 5

●

●●

●

●●

●

●●

●

●●

●●●

●

●●

●

●●

●

●●

●

●●●

●

●●●

●

●●●

●

●

●●

●

●●●

●

●

●●

●

●●●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

Review word-level results



Payoff structure & predictions for the minimal model

0.80 0.85 0.90 0.95 1.00

−10

0

10

20


Saccade Threshold

Accu

racy

Pay

off

(MODEL, noise=1.90)

●(0.9990, 0.9999)

0.80 0.85 0.90 0.95 1.00

5

10

15

20

25


Saccade Threshold

Bala

nce

Payo

ff

(MODEL, noise=1.90)

●(0.9950, 0.9990)

●

0.80 0.85 0.90 0.95 1.00

5

10

15

20

25


Saccade Threshold

Spee

d Pa

yoff

(MODEL, noise=1.90)

●(0.9900, 0.9990)

0.93 0.95 0.97 0.99

−10

0

10

20


Decision Threshold

Accu

racy

Pay

off

(MODEL, noise=1.90)

●(0.9990, 0.9999)

0.93 0.95 0.97 0.99

−10

0

10

20


Decision Threshold

Bala

nce

Payo

ff

(MODEL, noise=1.90)

●(0.9950, 0.9990)

0.93 0.95 0.97 0.99

−10

0

10

20


Decision Threshold

Spee

d Pa

yoff

(MODEL, noise=1.90)

●(0.9900, 0.9990)

●

200

220

240

260

280

300


Sing

le F

ixatio

n Du

ratio

n (m

s)

(HUMAN)


●

●

●●

●

●


150

200

250

300


Sing

le F

ixatio

n Du

ratio

n (m

s)

(MODEL, noise=1.85)


●●●

●●●

●●●●

●●●●

●●●

●●●

●●●●

●●●●

●

●

●

●

●

●

●

3

4

5

6

7

8

9


Log

Freq

uenc

y Ef

fect

on

SFD

(ms)

(HUMAN)


●

●

●

20

21

22

23

24

25


Log

Freq

uenc

y Ef

fect

on

SFD

(ms)

(MODEL, noise=1.85)


●●●

●●●

●●●●

●●●●

●

●

●

●

220

240

260

280

300

320


Position in List

Sing

le F

ixatio

n Du

ratio

n (m

s)

(HUMAN)

2 3 4 5

●

●●

●

●● ●

●

●

● ●●


150

200

250

300


Sing

le F

ixatio

n Du

ratio

n (m

s)

(MODEL, noise=1.85)

2 3 4 5

●

●●

●

●●

●

●●

●

●●

●●●

●

●●

●

●●

●

●●

●

●●●

●

●●●

●

●●●

●

●

●●

●

●●●

●

●

●●

●

●●●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

Review word-level resultsLewis (University of Michigan) Optimal Control Approaches 21 June 2012


Model fit for the alternative architectures

1.2 1.4 1.6 1.8 2.0

2040

6080

Model Error vs. Noise for Architectural Variants

Noise

RM

SE

aga

inst

Hum

an S

FD

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●●

●

●

●

●

●

●●

●

●

●

●Complete architectureArchitecture with only saccade programmingArchitecture without saccade programmingMinimal model (noise only)


Conclusion




Conclusion

Summary

We applied optimal control and state estimation techniquesto pursue, computationally, an interesting theoretical idea.Doing so yielded two things: arg max

organism structure

(constraints) task goals (

payoff)

envir

onmen

t st

ructu

re

behavior (optimal policies)

1 What determines the nature of eye-movements in linguistic tasks?Answer: Eye-movement control is the solution to a constrainedoptimization problem posed by task structure and payoff, linguisticknowledge, and oculomotor processing architecture.

2 A novel empirical demonstration: Humans adapt their oculomotorcontrol at the level of single fixation durations to maximize payoff inlinguistic tasks, and do so in ways sensitive to the specificcontingencies of the task at hand.


Conclusion

Stronger ties between psycholinguistics and linguistics andother areas of cognitive science?

Bayesian memory & perception

reinforcement learning

decision making

rational analysis

bounded rationality

psycholinguistics:classic parsing strategiesrational approacheslanguage-as-action

syntactic theory

language evolution

MEMORY constraints(CHANNEL CAPACITY)

PERCEPTUAL/OCULAR-MOTOR

constraints

PARSING PROCESSES

SUBJECTIVE UTILITY FUNCTION

Bounded Optimal Control Analysis

ADAPTIVE, INTEGRATED CONTROL POLICIES

for PARSING, EYE-MOVEMENTS, ACTION and COGNITION

TASK ENVIRONMENT (linguistic stimuli)


Conclusion

Optimal control and syntactic theory and evolution??

The question we can pose is: What optimization problem(specifically, bounded optimal control problem) is human languagethe solution to?

This offers a perspective on language evolution/emergencethat complements existing approaches by placing emphasis onhow the details of cognitive architecture and utility shapelanguage, abstracting away from processes of evolution.

It perhaps offers another way to pursue the “StrongMinimalist” thesis of optimality in language (recent work byChomsky).

For more, see Bratman, Shvartsman, Lewis & Singh (2010) on my web

site.Lewis (University of Michigan) Optimal Control Approaches 21 June 2012

Conclusion

Grammar as bounded optimal policyBratman, Shvartsman, Lewis & Singh (2012)

Table 1: Summary of three sets of experiments and policies learned. See text for detailed description.

ENVIRONMENT AGENT MEMORY LEXICON SIZE (S) PROPERTIES OF EMERGENT LINGUISTIC SYSTEM

Two Rooms

one symbolworking memory+ one symbollong-termmemory

3

Association and systematic order, where in additionsingle symbols uttered in isolation denote specific box-key combinations. Can only achieve 75% success.

4 Association and systematic symbol order. SPEAKERfirst describes the box, then the key (see Figure 2b).

8Highly context-dependent and idiosyncratic symbolmeanings. For example key 2 is represented by sym-bol 4 if uttered before box, but symbol 5 after.

16 Each symbol denotes a box-key combination. For ex-ample symbol 5 means key 1 and box 1.

Two rooms

two symbolworking memory(no long-termmemory)

3 Similar to case with 3 symbols above.

4Complex lexical forms. Describes entire box-key com-bination with two symbols which can be observed si-multaneously by LISTENER effectively creating a 2-symbol length word (see Figure 3b).

One room

one symbolworking memory+ one symbollong-termmemory

3 Symbols act as direct orders to LISTENER, but other-wise policy is similar to the cases of 3 symbols above.

4

Association and symbol order, but no storing or re-trieving from long-term memory is necessary becauseLISTENER can act immediately upon hearing a symbol(see Figure 4b).

Experiment set 1: Exploring constraints on the lexicon.We explore four different lexicon sizes: S = 16, S = 8, S = 4,and S = 3. Figure 2 shows 30 independent learning trajecto-ries for each value of S. The high variance is due to the natureof the learning algorithm which may not converge for bothagents every trial (or may get stuck on a less-than-optimalpolicy)—but what we are interested in are the best policieslearned (because the mechanism used can be improved sig-nificantly beyond our initial implementation of Sarsa(l) withfixed parameters across all experiments).

The first four rows of Table 1 summarize the results. Herewe will discuss the resulting policies in more detail. For 16available symbols, as expected, a different symbol is associ-ated with each box-key combination and the agents arrive atperfect performance. With eight symbols, again the best per-forming policies use two-symbol utterances for each box-keycombination, but not always in the same order (i.e. for somecombinations keys are uttered first and in other boxes are ut-tered first). For the case of four symbols, the best performingpolicies communicate box and key in a particular order, witheach symbol able to refer to either box or key (see Figure 2b).Of particular interest is that the the agents settle on a consis-tent order across box-key combinations, but this order mightbe different over seperate experiments: the linear position is

necessary but the specific order is not. Finally, for the case ofonly three symbols the agents again learn a policy where lin-ear symbol order matters. Curiously, this alone should onlyafford success in 56% of combinations; some policies how-ever achieved 75% success. The policy succeeds in the addi-tional box-key combinations by associating each with a singlesymbol uttered in isolation. That is, with limitations in sym-bol size utterance length becomes informative in addition topositional information.

As we can see, this method of systematically altering onlya single constraint (lexicon size) yields broad variation in lin-guistic properties even in this extremely simple domain, in-cluding the denotation of symbols and the use of order in-formation. The case of three and four symbols suggests thatlimited memory (paired with environmental pressures) leadsto the systematic use of symbol order in optimal performance,especially when the lexicon size is limited.

Experiment set 2: Modified agent constraints. Here ouraim is to explore further what specific constraints led to thesystematic use of order in Experiment 1. We alter the con-straints on the agents by allowing the LISTENER two symbolsin working memory instead of one (and no long-term mem-ory). All the other dynamics of the Treasure Box Domainare kept constant. The actions of store and retrieve have new


Conclusion

Collaborators and sponsors

Michael ShvartsmanU of Michigan Psychology

Satinder SinghU of Mich Computer Sci

Andrew HowesU of Birmingham Comp Sci

Marc BermanU of Toronto Psych

Julie BolandU of Michigan Psychology

Sam EpsteinU of Michigan Linguistics

Miki ObataMie University Linguistics

John JonidesU of Michigan Psychology

Language & Cognitive Architecture Lab: Bryan Berend, Yasaman

Kazerooni, Emmanuel Kumar, Craig Sanders, Mehgha Shyam

National Science Foundation


optimal control approaches to language · 2017. 6. 14. · howes, lewis & vera (2009,...

Documents