modeling the modulatory effect of attention on human ... · modeling the modulatory effect of...
TRANSCRIPT
Modeling the Modulatory Effect of Attention on Human Spatial Vision
Laurent Itti Computer Science Department, Hedco Neuroscience Building HNB-30A, University of Southern California, Los Angeles, CA 90089-2520, U.S.A.
J oehen Braun nstitute of Neuroscience and School of Computing,
University of Plymouth, Plymouth Devon PL4 8AA, U.K.
Christof Koch Computation and Neural Systems Program, MC 139-74,
California Institute of Technology, Pasadena, CA 91125 , U.S.A.
Abstract
We present new simulation results , in which a computational model of interacting visual neurons simultaneously predicts the modulation of spatial vision thresholds by focal visual attention, for five dual-task human psychophysics experiments. This new study complements our previous findings that attention activates a winnertake-all competition among early visual neurons within one cortical hypercolumn. This "intensified competition" hypothesis assumed that attention equally affects all neurons, and yielded two singleunit predictions: an increase in gain and a sharpening of tuning with attention. While both effects have been separately observed in electrophysiology, no single-unit study has yet shown them simultaneously. Hence, we here explore whether our model could still predict our data if attention might only modulate neuronal gain, but do so non-uniformly across neurons and tasks. Specifically, we investigate whether modulating the gain of only the neurons that are loudest , best-tuned, or most informative about the stimulus, or of all neurons equally but in a task-dependent manner, may account for the data. We find that none of these hypotheses yields predictions as plausible as the intensified competition hypothesis, hence providing additional support for our original findings.
1 INTRODUCTION
Psychophysical studies as well as introspection indicate that we are not blind outside the focus of attention, and that we can perform simple judgments on objects not being attended to [1], though those judgments are less accurate than in the
presence of attention [2, 3]. While attention thus appears not to be mandatory for early vision, there is mounting experimental evidence from single-neuron electrophysiology [4, 5, 6, 7, 8, 9, 10], human psychophysics [11 , 12, 13, 14,3, 2, 15, 16] and human functional imaging experiments [17, 18, 19, 20, 21, 22, 23] that focal visual attention modulates, top-down, activity in early sensory processing areas. In the visual domain, this modulation can be either spatially-defined (i.e., neuronal activity only at the retinotopic location attended to is modulated) or feature-based (i.e., neurons with stimulus preference matching the stimulus attended to are enhanced throughout the visual field), or a combination of both [7, 10, 24].
Computationally, the modulatory effect of attention has been described as enhanced gain [8, 10], biased [4] or intensified [14, 2] competition, enhanced spatial resolution [3], sharpened neuronal tuning [5, 25] or as modulated background activity [19], effective stimulus strength [26] or noise [15]. One theoretical difficulty in trying to understand the modulatory effect of attention in computational terms is that, although attention profoundly alters visual perception, it is not equally important to all aspects of vision. While electrophysiology demonstrates "increased firing rates" with attention for a given task, psychophysics show "improved discrimination thresholds" on some other tasks, and functional magnetic resonance imaging (fMRI) reports "increased activation" for yet other tasks, the computational mechanism at the origin of these observations remains largely unknown and controversial.
While most existing theories are associated to a specific body of data, and a specific experimental task used to engage attention in a given experiment, we have recently proposed a unified computational account [2] that spans five such tasks (32 thresholds under two attentional conditions, i.e., 64 datapoints in total). This theory predicts that attention activates a winner-take-all competition among neurons tuned to different orientations within a single hyper column in primary visual cortex (area VI). It is rooted in new information-theoretic advances [27], which allowed us to quantitatively relate single-unit activity in a computational model to human psychophysical thresholds. A consequence of our "intensified competition hypothesis" is that attention both increases the gain of early visual neurons (by a factor 3.3), and sharpens their tuning for the orientation (by 40%) and spatial frequency (by 30%). While gain modulation has been observed in some of the single-unit studies mentioned above [8, 10] (although much smaller effects are typically reported, on the order of 10-15%, probably because these studies do not use dual-task paradigms and thus poorly engage the attention of the animal towards or away from the stimulus of interest), and tuning modulation has been observed in other single-unit studies [5, 25], both gain and tuning modulation have not been simultaneously observed in a single electrophysiological set of experiments [10].
In the present study, we thus investigate alternatives to our intensified competition hypothesis which only involve gain modulation. Our previous results [2] have shown that both increased gain and sharper tuning were necessary to simultaneously account for our five pattern discrimination tasks, if those modulatory effects were to equally affect all visual neurons at the location of the stimulus and to be equal for all tasks. Thus, we here extend our computational search space under two new hypotheses: First, we investigate whether attention might only modulate the gain of selected sub-populations of neurons (responding the loudest, best tuned, or most informative about the stimulus) in a task-independent manner. Second, we investigate whether attention might equally modulate the gain of all visual neurons responding to the stimulus, but in a task-dependent manner. Thus, the goal of the present study is to determine, using new computational simulations, whether the modulatory effect of attention on early visual processing might be explained by gain-only modulations, if such modulations are allowed to be sufficiently complex
(affecting only select visual neurons, or task-dependent). Although attention certainly affects most stages of visual processing, we here continue to focus on early vision, as it is widely justified by electrophysiological and fMRI evidence that some modulation does happen very early in the processing hierarchy [5, 8, 9, 23].
2 PSYCHOPHYSICAL DATA
Our recent study [2] measured psychophysical thresholds for three pattern discrimination tasks (contrast, orientation and spatial frequency discriminations), and two spatial masking tasks (32 thresholds) . We used a dual-task paradigm to measure thresholds either when attention was fully available to the task of interest (presented in the near periphery), or when it was poorly available because engaged elsewhere by a concurrent attention-demanding task (a letter discrimination task at the center of the display). The results are summarized in Fig. 1 and [2].
'I'
A c:: B C 0
4:: c:: c:: ~ .g .g .S
E ~ ~ .;: .S
.S ... .§
.~
[' . ~ Q .. 8 I • Fully Attended
I ...
~ 0 Poorly Attended ~ ",0.3 I I .~
.~ c:: Q) Q 6
Q - I > c:: 810.1 t ; CU oJ ... ::s 1:3 0.2 .g ~ 4j '" E W I I 0- £- I ~ : ~ : t~ ; ~ ~ 0.1 I I .s ! c:: c:: 2
~ f- !! a .~
10-2 /1 4:: 00 0 00 0 10'2 10" ~ 0.2 0.4 0.6 0.8 1 0.2 0.4 0.6 0.8 1
Mask contrast ~ Contrast Contrast
'I'
D E
I't I:J\ I:J\ .S .S '" .lC .lC
'" 0.4 h '" 0.4 I ~ • Fully Attended ~
:E 1;; :E 1;; ... l!! 0.3 o Poorly Attended ... l!! 0.3
'" C '" c E 8 0.2 1+ E 0 0.2 " c:: W c:: W
~ ~ 0.1 I
~ ~ 0.1 f- I I f-
0 0 20 40 60 80 0
0.5 2 Mask e - Target e C) Mask w / Target w
Figure 1: Psychophysical data from Lee et ai. Central targets appeared at 0 - 0.8° eccentricity and measured 0.4° across. Peripheral targets appeared for 250 ms at 4° eccentricity, in a circular aperture of 1.50 • They were either sinusoidal gratings (B, C) or vertical stripes whose luminance profile was given by the 6th derivative of a Gaussian (A, D, E) . Mask patterns were generated by superimposing 100 Gabor filters , positioned randomly within the circular aperture (A, D, E). Thresholds were established with an adaptive staircase method (80 trials per block). A complex pattern of effects is observed, with a strong modulation of orientation and spatial frequency discriminat ions (B, C) , smaller modulation of contrast discriminations (A) , and modulation of contrast masking that depends on stimulus configurations (D, E). These complex observations can be simultaneously accounted for by our computational model of one hypercolumn in primary visual cortex.
3 COMPUTATIONAL MODEL
Linear filters Divisive inhibition Decision
The model developed to quantitatively account for this data comprises three successive stages [14, 27]. In the first stage, a bank of Gabor-like linear filters (12 orientations and 5 spatial scales) analyzes a given visual location, similarly to a cortical hyper column. In the second stage, filters nonlinearly interact through both a selfexcitation component, and a divisive inhibition component that is derived
from a pool of similarly-tuned units. With E)."o being the linear response from a unit tuned to spatial period A and orientation (), the response R)."o after interactions is given by (see [27] for additional details):
R - (A.E)."o) ' + B ).,,0 - (S)O + L W)."o(A',()') (A.E)..!,O')O '
(1)
()..! ,O') EA x 8
where: W (A' ()') = (_ (log(A') -log(A))2 _ (()' - ())2) )., ,0, exp 2A2 2A2
)., 0 (2)
is a 2D Gaussian weighting function centered around (A, ()) whose widths are determined by the scalars Ao and A).,. The neurons are assumed to be noisy, with noise variance V{o given by a generalized Poisson model: V{o = (3(R)."o + <:).
The third stage relates activity in the population of interacting noisy units to behavioral discrimination performance. To allow us to quantitatively predict thresholds from neural activity for any task, our decision stage assumes that observers perform close to an unbiased efficient statistic, that is, the best possible estimator (in the statistical estimation sense) of the characteristics of the stimulus given the noisy neuronal responses. This methodology (described further in [27]) allows us to quantitatively compute thresholds in any behavioral situation, and eliminates the need for task-dependent assumptions about the decision strategy used by the observers.
4 RESULTS and DISCUSSION
The 10 free model parameters (Fig. 2) were automatically adjusted to best fit the psychophysical data from all experiments, using a multidimensional downhill simplex with simulated annealing overhead (see [27]) , running on our 16-CPU Linux Beowulf system (16 x 733 MHz, 4 GB RAM, 0.5 TB disk; see http://iLab . usc. edu/beo/). Parameters were simultaneously adjusted for both attentional conditions; that is, the total fit error was the sum of the error obtained with the baseline set of parameters on the poorly attended data, and of the error obtained with the same parameters plus some attentional perturbation on the fully attended data. Thus, no bias was given to any of the two attentional conditions.
For the "separate fits" (Fig. 2), all parameters were allowed to differ with attention [2], while only the interaction parameters b, 8) could differ in the "intensified competition" case. The "loudest filter" was the one responding loudest to the entire visual pattern presented (stimulus + mask), the "best-tuned filter" was that responding best to the stimulus component alone, and the "most informative filter" was that for which the Fisher information about the stimulus was highest (see
(I) -~ Ul CU:!:: Q.LL (I) en
"5 (1).-.- --..- -Ul (I) co. ,SE c 0 -0
-~ ... ",S :::l= OLL -I
" (I) c ... :::l (I)
!= UlLL (I) m
-c ~(I) Ul" CU c 1-8.
(I) c
Attentional manipulations
y
Top-down Attention
Top-down Attention
Top-down Attention [stimulus-dependent; only affects filter responding
most to whole (targr+maSk) stimulus]
~. Q~~ Top-down Attention
[stimulus-dependent; only affects filter beslluned to/arget stimulus]
~~Q~~ Top-down Attention
[stimulus-dependent; only affects filter most infonnative a~out target stimulus]
~~Q~Q Top-down Attention
[affects all f ilters, but differently for each task] .~ I ~.
... QQQ ... ,,,,,,. . 'y" .... - - - -......... / - -........ ..-
Model Parameters
y,o: Interaction strength
~: ~~; ~::~:~ ~~~~g S, £: Dark noise 11 Light noise S: Semisaturation
Fully Poorly Parameters - -
3.78 1.85 3.421 .80
0 0(0) 9.6913.19 " ,,(oct) 0.440.36 r Q ( " ) 23.01 23.90 :!: ,,(oct) 0.810.18 B / A 0.30 1.10 S / A 10.12 8.05 11 0.17 0.01 £ / Rmu, 0.03 0.11
***** Fully Poorly
Parameters - -
3.941.40 3.551.00
0 0 (0) 8.92 O,,(OCI) 0.41 I: 9 (") 20.80 I: ,,(oct) 0.31 B I A 0.96 S / A 5.39 P 0.56 EI R""" 0.02
***** Parameters
Parameters
Fully Poorly
2.22~56 1.03 ~
6.62 \'""i. 1 ~:~ \.'\ 1.09 0.21
12.16 0.65 0.01
* Fully Poorly
1.40~~
16:: '\\ 14.71 V-"\.
7.03 0.24 3.00
1.5e-9 8635.52
**** Parameters
°9 (" )
o .. (OCI) I: 9(") I: .. (OCI) B I A S I A ~ EI R"""
Fully Poorly
1.45~05
~:~~~
*
0.31 \~"\ 3~:!~ .. ~ ~:: '\,
3.4<-4 0.01
Parameters Fully Poorl y
(1" 9(") (I" ., (oct) I:9(") r ., (oct) B I A S / A , £/R""",
Discussion
- very good fit overall - all parameters biologically
plausible - attention significantly
modulates interactions and noise
***** - very good fit overall - all parameters biologically
plausible - modulation of orientation
thresholds slightly underestimated
- contrast masking with variable mask orientation not perfectly predicted
- no modulation of contrast detection threshold
- no modulation of orientation thresholds
- no modulation of period thresholds
- poor prediction of masking - filter tuning too narrow - gain modulation too large
- no modulation of orientation th resholds
- no modulation of period thresholds
- contrast discrimination and masking well fit
- only fit predicting broad pooling in spatial period
- noise parameters unrealistic
- no contrast discrimination "dipper"
- power-law rather than sigmoidal contrast response (S=O)
- modulation of orientation thresholds slightly underestimated
- noise parameter unrealistic
*** - very good fit overall - gain modulation unreal-
istically high, especially for orientation discrimination (filter gain when attending to orientation is > 20x poorly attended)
Figure 2: Attentional modulation hypot heses and corresponding model parameters. See next page for the corresponding model predictions on our five tasks, for the hypotheses shown. The middle column shows which parameters were allowed to differ with attention, and t he best-fit values for both attent ional conditions.
~ 810"
" ~
Contrast Increment
****
10 -, "----'--,--"'7----l o 10.2 10"
~ 8'0"
~ !!'-
10~
0; • §10"
" l 10~
~
~'0" " l
10'>
Mask contrast
**** - Fully attended
Poorly attended
10"2 10"
Mask contrast
*** - Fully attended
Poorly attended
10'> 10-'
Mask contrast
**** - Fully attended
Poorly attended
10" 10" Mask contrast
**
10.2 10"
Mask contrast
***
Spatial Frequency Discrimination
lM~ R 0.2 ! .0.1
°0 0.2 0.4 0.6 0.8 1
Contrast
*****
lM~ R 02 I . ~ 0.1 ••• •
°0 0.2 0.4 0.6 0.8 1
Contrast
****
!:~~ °0 0.2 0.4 0.6 0.8 1
Contrast
Orientation Discrimination
.~ C 6 I ! ~ ,
2
°0 0.2 0.4 0.6 0.8 1
Contrast
*****
.~ C 6 I I ~ ,
2 ••• •
°0 0.2 0.4 0.6 0.8 1
Contrast
***
.~ \ ~ I I I o 0.2 0.4 0.6 0.8 1
Contrast
f3~1!! c:BI R 0.2 f % 4 •
~ 0.1 • 2 '.. •
00 0.2 0.4 0.6 0.8 1 00 0.2 0.4 0.6 0.8 1
Contrast Contrast
f.3~ r R 0.2 j i 4 .
~ 01 '. • 2 •
0 0 0.2 0.4 0.6 0.8 1 00 0.2 0.4 0.6 0.8 1
Contrast Contrast
***** ***
~ 8~ ~ 0.3 ~ 6
!o,\~~ i, t I I
~0.1 .~ 2 ' ••
0 0 0.2 0.4 0.6 0.8 1 00 0.2 0.4 0.6 0.8 1
Contrast Contrast
***** *****
Contrast Masking, Variable Mask e
• Fully attended Poorly attended
°020406080 Mask6 - Target6(O)
**** • Fullyattended
Poorly attended
°020406080 Mask6 - Target6(O)
*** • Fullyattended
POOrly attended
60 80 Mask6 - Target6(O)
• Fully attended Poorly attended
°020406080 Mask O- TargetOr}
***** • Fully attended
Poorly attended
0020406080
Mask O- TargetOr}
***** • Fully attended
Poorly attended
0020406080 MaskO - TargetO(O}
*****
Contrast Masking, Variable Mask 00
0.4[8 ~ 0.3 I C ' 8 0.2 L
~ • + ~ 0.1
o 0.5 1 2 MaskwlTargetw
***** 0.4~ ~ 0.3
~ 0.2 j . . . ~ 0.1 t
o 0.5 1 2 MaskwlTargetw
**** 0.4~ t: I . I
" ! $ 0.1 •
o 0.5 1 2 MaskwlTargetw
* 0.4~ ~ 0 .. 3 I
8 02 I . Qj • + $ 0.1 t
o 0.5 1 2 MaskwfTarget (J)
**** 0.4[8 ~ 03 , I 8 0.2 '. Qj • + $ 0.1
o 0.5 1 2
MaskwfTarget (J)
***** 0.4[8 I 0.3 t o 0.2 • ~ • + ~ 0.1
o 0.5 1 2 MaskwlTargetw
***** Figure 3: Model predictions for the different attent ional modulation hypotheses studied. The different rows correspond to the different attentional manipulations studied, as labeled in the previous figure. Ratings (stars below the plots) were derived from the residual error of the fits .
[14, 27]). Finally, in the "task-dependent" case, the gain of all filters was affected equally (parameter ')'), but with three different values for the contrast (discrimination and masking), orientation and spatial frequency tasks. Overall, very good fits were obtained in the "separate fits" and "intensified competition" conditions (as previously reported) , as well as in the "most informative filter" and "taskdependent" conditions (Fig. 3) , while the two remaining hypotheses yielded very poor predictions of orientation and spatial frequency discriminations. In the "most informative filter" case, the dipper in the contrast increment thresholds was missing because the nonlinear response function of the neurons converged to a power law rather than the usually observed sigmoid [27]; thus, this hypothesis lost some of its appeal because of its lower biological plausibility. More importantly, a careful analysis of the very promising results for the "task-dependent" case also revealed their low biological plausibility, with a gain modulation in excess of 20-fold being necessary to explain the orientation discrimination data (Fig. 2).
In summary, we found that none of the simpler (gain only) attentional manipulations studied here could explain as well the psychophysical data as our previous manipulation, "intensified competition," which implied that attention both increases the gain and sharpens the tuning of early visual neurons. Two of the four new manipulations studied yielded good quantitative model predictions: affecting the gain of the filter most informative about the target stimulus, and affecting the gain of all filters in a task-dependent manner. In both cases, however, some of the internal model parameters associated with the fits were biologically unrealistic, thus reducing the plausibility of these two hypotheses. In all manipulations studied, the greatest difficulty was in trying to account for the orientation and spatial frequency discrimination data without unrealistically high gain changes (greater than 20-fold). Our results hence provide additional evidence for the hypothesis that sharpening of tuning may be necessary to account for these thresholds, as was originally suggested by our separate fits and our intensified competition hypothesis and has been recently supported by new investigations [16].
Acknowledgements
This research was supported by the National Eye Institute, the National Science Foundation, the NSF-supported ERC center at Caltech, the National Institutes for Mental Health, and startup funds from the Charles Lee Powell Foundation and the USC School of Engineering.
References [1] Braun J & Sagi D. P ercept Psychophys , 1990;48(1):45- 58.
[2] Lee DK, Itti L, Koch C et al. Nat Neurosci, 1999;2(4):375-81.
[3] Yeshurun Y & Carrasco M. Nature, 1998;396(6706) :72- 75 .
[4] Moran J & Desimone R. Science , 1985;229(4715) :782- 4. [5] Spitzer H, Desimone R & Moran J. Science, 1988;240(4850):338- 40 .
[6] Chelazzi L, Miller EK, Duncan J et al. Nature , 1993;363(6427):345- 7. [7] Motter BC. J Neurosci, 1994;14(4):2178-89.
[8] Treue S & Maunsell JH. Nature, 1996;382(6591):539- 41. [9] Luck SJ, Chelazzi L, Hillyard SA et al. J Neurophysiol, 1997;77(1) :24- 42.
[10] Treue S & Trujillo JCM. Nature, 1999;399(6736) :575- 579 .
[11] Nakayama K & Mackeben M. Vision Res, 1989;29(11) :1631- 47.
[12] Bonnel AM, Stein JF & Bertucci P. Q J Exp Psychol A, 1992;44(4):601- 26.
[13] Lee DK, Koch C & Braun J. Vision R es, 1997;37(17):2409- 18.
[14] Itti L, Braun J, Lee DK et al. In NIPS*ll. MIT Press, 1999; pp. 789- 795.
[15] Dosher BA & Lu ZL. Vision Res, 2000;40(10-12):1269- 1292. [16] Carrasco M, Penpeci-Talgar C & Eckstein M. Vision Res, 2000;40(10-12):1203- 1215. [17] Corbett a M, Miezin FM, Dobmeyer S et al. Science, 1990;248(4962):1556- 9. [18] Rees G, Frackowiak R & Frith C. Science, 1997;215(5301):835- 8. [19] Chawla D, Rees G & Friston KJ. Nat Neurosci, 1999;2(7):671- 676. [20] Brefczynski JA & DeYoe EA. Nat Neurosci, 1999;2(4):370- 374. [21] Corbetta M, Kincade JM, Ollinger JM et al. Nat Neurosci, 2000;3(3):292- 297. [22] Kanwisher N & Wojciulik E. Nat Rev Neurosci, 2000;1:91- 100. [23] Ress D, Backus BT & Heeger DJ. Nat Neurosci, 2000;3(9):940- 945. [24] Barcelo F, Suwazono S & Knight RT. Nat Neurosci, 2000;3(4) :399- 403. [25] Desimone R & Duncan J . Annu Rev Neurosci, 1995;18:193- 222 . [26] Reynolds JH, Pasternak T & Desimone R. Neuron, 2000;26(3):703- 714. [27] Itti L, Koch C & Braun J. J Opt Soc Am A, 2000;11(11):1899- 1917.