the role of t emporal structure in human v ision · 2010. 3. 30. · 10.1177/1534582305276839 beha...

22
10.1177/1534582305276839 BEHAVIORAL AND COGNITIVE NEUROSCIENCE REVIEWS Blake, Lee / VISUAL TEMPORAL STRUCTURE The Role of Temporal Structure in Human Vision Randolph Blake Department of Psychology, Vanderbilt University, Nashville, TN Sang-Hun Lee Department of Psychology, Seoul National University, Seoul, South Korea Gestalt psychologists identified several stimulus properties thought to underlie visual grouping and figure/ground segmen- tation, and among those properties was common fate: the ten- dency to group together individual objects that move together in the same direction at the same speed. Recent years have witnessed an upsurge of interest in visual grouping based on other time- dependent sources of visual information, including synchro- nized changes in luminance, in motion direction, and in figure/ ground relations. These various sources of temporal grouping information can be subsumed under the rubric temporal struc- ture. In this article, the authors review evidence bearing on the effectiveness of temporal structure in visual grouping. They start with an overview of evidence bearing on temporal acuity of human vision, covering studies dealing with temporal integra- tion and temporal differentiation. They then summarize psychophysical studies dealing with figure/ground segregation based on temporal phase differences in deterministic and sto- chastic events. The authors conclude with a brief discussion of neurophysiological implications of these results. Key Words: visual grouping, temporal structure, com- mon fate, temporal resolution, temporal integration, synchrony, figure/ground segmentation Visual perception, unlike most other cognitive activi- ties, seems automatic and effortless—open your eyes and the visual world immediately appears before you, popu- lated with meaningful objects and events. In fact, how- ever, the effortlessness of vision belies the complexity of the operations required to create our visual world— vision’s ease is an illusion. 
To borrow a metaphor from Hoffman (1998), we are all visual geniuses who are naively unaware of our immense talents. Only when con- fronted with the challenging steps involved in seeing (Marr, 1982) do we appreciate that perception is a near miracle: vision somehow manages to assemble and col- late data contained in the optical input to vision into rec- ognizable, three-dimensional objects arrayed within a cluttered, three-dimensional space. Why is vision inherently difficult? For one thing, the optical input is constantly changing as objects move about in the environment and as we ourselves move rela- tive to those objects. The dynamic character of vision means that we must be able to recognize objects from multiple perspectives and from different viewing dis- tances. For another, objects often appear in cluttered surroundings, which means that parts of objects often obscure, or occlude, one another. Figuring out which parts go with which objects—grouping, as it is called— represents another formidable challenge, as evidenced by the difficulty of dealing with occlusion in the case of machine vision. Visual perception is faced with other dif- ficulties, too, some having to do with the variable light- ing conditions illuminating the environment and others having to do with our inability to digest the entire visual scene in a single glimpse. How does perception overcome these difficulties? Fortunately, the natural environment contains regulari- ties that can be exploited by the processing machinery underlying visual perception. Some of these regularities arise from the nature of light and its interactions with the surfaces of opaque objects (e.g., “shadows are always attached to surfaces”). 
Others emerge from fundamen- tal principles associated with the nature of matter (e.g., “two objects cannot occupy the same location at the 21 Authors’ Note: We thank Sharon Guttman for comments on an earlier version of this article and David Bloom for help with preparing the manuscript. Supported by a NIH Grant (EY07760) to RB and a grant (M 103KV010021-04K2201-02140) from the Korea Institute of Science and Technology and Planning to SHL. Please address correspondence to Randolph Blake at [email protected]. Behavioral and Cognitive Neuroscience Reviews Volume 4 Number 1, March 2005 21-42 DOI: 10.1177/1534582305276839 © 2005 Sage Publications

Upload: others

Post on 09-Mar-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Role of T emporal Structure in Human V ision · 2010. 3. 30. · 10.1177/1534582305276839 BEHA VIORAL AND COGNITIVE NEUROSCIENCE REVIEWS Blake, Lee / VISUAL TEMPORAL STRUCTURE

10.1177/1534582305276839BEHAVIORAL AND COGNITIVE NEUROSCIENCE REVIEWSBlake, Lee / VISUAL TEMPORAL STRUCTURE

The Role of Temporal Structure in Human Vision

Randolph BlakeDepartment of Psychology, Vanderbilt University, Nashville, TN

Sang-Hun LeeDepartment of Psychology, Seoul National University, Seoul, South Korea

Gestalt psychologists identified several stimulus propertiesthought to underlie visual grouping and figure/ground segmen-tation, and among those properties was common fate: the ten-dency to group together individual objects that move together inthe same direction at the same speed. Recent years have witnessedan upsurge of interest in visual grouping based on other time-dependent sources of visual information, including synchro-nized changes in luminance, in motion direction, and in figure/ground relations. These various sources of temporal groupinginformation can be subsumed under the rubric temporal struc-ture. In this article, the authors review evidence bearing on theeffectiveness of temporal structure in visual grouping. They startwith an overview of evidence bearing on temporal acuity ofhuman vision, covering studies dealing with temporal integra-tion and temporal differentiation. They then summarizepsychophysical studies dealing with figure/ground segregationbased on temporal phase differences in deterministic and sto-chastic events. The authors conclude with a brief discussion ofneurophysiological implications of these results.

Key Words: visual grouping, temporal structure, com-mon fate, temporal resolution, temporal integration,synchrony, figure/ground segmentation

Visual perception, unlike most other cognitive activi-ties, seems automatic and effortless—open your eyes andthe visual world immediately appears before you, popu-lated with meaningful objects and events. In fact, how-ever, the effortlessness of vision belies the complexity ofthe operations required to create our visual world—vision’s ease is an illusion. To borrow a metaphor fromHoffman (1998), we are all visual geniuses who arenaively unaware of our immense talents. Only when con-fronted with the challenging steps involved in seeing(Marr, 1982) do we appreciate that perception is a nearmiracle: vision somehow manages to assemble and col-late data contained in the optical input to vision into rec-

ognizable, three-dimensional objects arrayed within acluttered, three-dimensional space.

Why is vision inherently difficult? For one thing, theoptical input is constantly changing as objects moveabout in the environment and as we ourselves move rela-tive to those objects. The dynamic character of visionmeans that we must be able to recognize objects frommultiple perspectives and from different viewing dis-tances. For another, objects often appear in clutteredsurroundings, which means that parts of objects oftenobscure, or occlude, one another. Figuring out whichparts go with which objects—grouping, as it is called—represents another formidable challenge, as evidencedby the difficulty of dealing with occlusion in the case ofmachine vision. Visual perception is faced with other dif-ficulties, too, some having to do with the variable light-ing conditions illuminating the environment and othershaving to do with our inability to digest the entire visualscene in a single glimpse.

How does perception overcome these difficulties?Fortunately, the natural environment contains regulari-ties that can be exploited by the processing machineryunderlying visual perception. Some of these regularitiesarise from the nature of light and its interactions with thesurfaces of opaque objects (e.g., “shadows are alwaysattached to surfaces”). Others emerge from fundamen-tal principles associated with the nature of matter (e.g.,“two objects cannot occupy the same location at the

21

Authors’ Note: We thank Sharon Guttman for comments on an earlierversion of this article and David Bloom for help with preparing themanuscript. Supported by a NIH Grant (EY07760) to RB and a grant(M 103KV010021-04K2201-02140) from the Korea Institute of Scienceand Technology and Planning to SHL. Please address correspondenceto Randolph Blake at [email protected].

Behavioral and Cognitive Neuroscience ReviewsVolume 4 Number 1, March 2005 21-42DOI: 10.1177/1534582305276839© 2005 Sage Publications

Page 2: The Role of T emporal Structure in Human V ision · 2010. 3. 30. · 10.1177/1534582305276839 BEHA VIORAL AND COGNITIVE NEUROSCIENCE REVIEWS Blake, Lee / VISUAL TEMPORAL STRUCTURE

same time”). Still others derive from geometrical regu-larities found in our natural environment, regularitiesthat bias co-occurrence statistics associated with edgesand boundaries (Geisler, Perry, Super, & Gallogly, 2001).These regularities can be—and evidently are—used byhuman visual systems to simplify the grouping of localfeatures into global forms and to promote the segrega-tion of those forms from their backgrounds. Indeed,grouping and figure/ground segregation have consti-tuted two of visual perception’s most enduring, widelystudied problems. Dating back to the Gestalt psycholo-gists, students of perception have compiled through theyears a growing list of grouping principles that seem tocapture key operations underlying human form percep-tion. Most of those principles focus on relations amonglocal “features” defined in terms of spatial discontinu-ities in luminance, color, and texture. These discontinu-ities constitute what can be termed spatial structure;together these various sources of spatial structure spec-ify the locations of edges and borders within the visualscene. They provide the raw ingredients, so to speak,for the processes of grouping and figure/ground segre-gation. These forms of spatial structure, including twoillustrated in Figure 1, can be construed as static sourcesof information in that their realization has no depend-ence on patterns of change over time. A “snapshot” ofthe visual scene is sufficient to portray contours specifiedby these spatial discontinuities.

As pointed out above, however, our visual world ishighly dynamic: objects move within the visual scene,and observers themselves are chronically moving theireyes and heads as they view objects. Consequently, theretinal input to vision typically includes dynamicchanges in the spatial structure of the retinal image. Onemight construe those temporal changes in visual signalsas obstacles in the registration of meaningful visual in-formation about objects, something biological visionmust overcome. After all, we are taught to hold a camerastill when taking photographs, for otherwise photo-graphs of moving objects will likely be blurred. In thecase of biological vision, however, visual changes overtime might actually enhance perception, by contributingto grouping and segmentation of visual features. TheGestalt psychologists foresaw this possibility and formal-ized it as one of their principles, grouping by common fate.For them, common fate involved an ensemble of ele-ments all moving in the same general direction at thesame speed, relative to a background of other elements.Textbook examples of common fate include a flock ofbirds flying overhead and a marching band of musicianswalking in lockstep. In recent years, however, the notionof common fate has been extended to dynamic stimuliother than motion, including unpredictable changes inthe shapes and contrasts of a subset of contours. In mod-

ern parlance, these changes create what is called temporalstructure. The focus of this article is the efficacy of tempo-ral structure as a grouping agent. But for temporal struc-ture to promote grouping, the dynamics portraying thatstructure must be reliably picked up and registered bythe visual nervous system. Does this, in fact, occur? If so,to what extent do spatial structure and temporal struc-ture interact to specify biologically relevant visual infor-

22 BEHAVIORAL AND COGNITIVE NEUROSCIENCE REVIEWS

Figure 1: Two Examples of Static Sources of Visual Information Pro-moting Figure/Ground Segmentation.

NOTE: The top panel shows shape defined by luminance and the bot-tom panel shows shape defined by texture.

Page 3: The Role of T emporal Structure in Human V ision · 2010. 3. 30. · 10.1177/1534582305276839 BEHA VIORAL AND COGNITIVE NEUROSCIENCE REVIEWS Blake, Lee / VISUAL TEMPORAL STRUCTURE

mation? These questions represent the central theme ofthis article.

In this article, we address the following questions: (a)How accurately and reliably does the visual system regis-ter temporal structure defining visual events? (b) Canthe visual system derive spatial structure solely on thebasis of temporal structure without discontinuities instatic properties? and (c) What properties of temporalstructure are critical for the visual system to group or seg-regate visual components? From the outset, we wish tostress the distinction between temporal structure in thestimulus domain and temporal structure in the neuraldomain. The former refers to time-varying changes inthe optical input to vision created by events in the world;the latter refers to temporal patterning in the trains ofaction potentials within ensembles of neurons. Muchrecent debate has centered on the existence and possi-ble functional significance of synchronized neural activ-ity (Engel & Singer, 2001), including the role of synchro-nization in visual feature binding (Treisman, 1999).Temporal structure among neural discharges could beevoked by external, stimulus-driven temporal structurebut may also arise from internally generated, dynamicinteractions among neurons. A review of the growing lit-erature on neural synchronization is beyond the scopeof our article; our focus is on temporal structure con-tained in the optical input to vision, structure that may ormay not evoke synchronized activity within the visualnervous system. Of course, evidence that external tem-poral structure can indeed promote perceptual group-ing (“binding” as some would term it), although notdefinitive, would be encouraging to advocates of theneural synchrony viewpoint.

With that disclaimer in place, we are ready to startwith a brief overview of temporal resolution in humanvision, to set the stage for considering temporal struc-ture and grouping.

TEMPORAL RESOLUTION IN HUMAN VISION

Repeating a point made above, the optical input tovision often contains rich temporal structure, producedby object motion, sudden illumination changes, andobserver-produced head and eye movements. Can thisdynamic temporal structure, which can be noisy andunpredictable, effectively contribute to spatial groupingand segmentation of visual features? Before attemptingto answer this question, it is important to ask whethertemporal structure is reliably registered by the visual ner-vous system. Does human vision have the requisite tem-poral resolution to exploit temporal structure containedin the optical input? Several lines of evidence, reviewedin the following paragraphs, suggest an affirmativeanswer to this question.

We shall begin by distinguishing between two seem-ingly conflicting demands faced by vision when con-fronted with dynamic visual events. On the one hand,those events may need to be integrated into a unifiedperceptual representation, a requirement we can termtemporal integration. There are many instances where theoptical input to vision is temporarily interrupted (e.g.,during eye blinks), yet we seamlessly piece togethervisual signals over time to maintain perceptual continu-ity. Temporal integration can also enhance visual sensi-tivity by summing weak neural signals over time (a capac-ity embodied in Bloch’s law). Yet in other situations,effective vision requires segregating visual events occur-ring closely in time, a requirement we can term temporaldifferentiation. When one object briefly and rapidly passesin front of another, we certainly don’t want to incorpo-rate the images of the two objects into some nonexistenthybrid object, and this requires being able to differenti-ate visual events occurring closely in time. Likewise, werely on temporal differentiation every time we read mes-sages rapidly flashed on at the same location on a videomonitor. So, then, how does vision achieve these twoseemingly incompatible demands? Let’s consider eachin turn.

Temporal Integration

The vision literature describes a number of phenom-ena that bear on the question of temporal integration.Here we shall summarize just a few, starting with visualmasking.

Figure 2a schematically illustrates the stimulus condi-tions defining backward masking. A “target” is brieflypresented and followed closely in time by “mask.” Underappropriate spatio-temporal conditions, the trailingmask can markedly reduce the visibility of the targetstimulus even though the two visual events do not over-lap in time (Breitmeyer & Ganz, 1976; Kahneman,1968). Accounts of masking typically assume that thevisual system retains an iconic representation of the tar-get that is susceptible to temporal integration with themask.1 Given this view, one should be able to estimate theintegration time by varying the interval between targetand mask to reveal the interval necessary to preclude thetarget’s reduced visibility. A number of studies point to acritical interval in the range 80-120 msec, with the spe-cific value depending on spatial frequency and retinallocation (Breitmeyer, 1978; Breitmeyer & Ganz, 1976;Corfield, Frosdick, & Campbell, 1978; Meyer & Maguire,1977).

In the case of masking, we infer something about tem-poral integration from the deterioration in target visibil-ity dependent on the asynchrony between presentationof mask and target. A complementary strategy, and onethat arguably taps temporal integration more directly, is

Blake, Lee / VISUAL TEMPORAL STRUCTURE 23

Page 4: The Role of T emporal Structure in Human V ision · 2010. 3. 30. · 10.1177/1534582305276839 BEHA VIORAL AND COGNITIVE NEUROSCIENCE REVIEWS Blake, Lee / VISUAL TEMPORAL STRUCTURE

the form-part integration task (Eriksen & Collins, 1967;Hogben & Di Lollo, 1974). Figure 2b shows the essenceof this procedure. An original visual pattern is dividedinto small, complementary components distributedbetween two individual images. These “half” images arethen presented successively, with the interval betweenimages (ISI) varied over trials. The observer’s task is toperform a perceptual judgment based on the originalpattern. This task can only be accomplished by integra-tion of the components into a global, complete figure;the requisite stimulus information cannot be ascer-tained from either half-image alone. Observers are ableto perform at above chance levels when the half-imagesare sequentially presented with ISIs up to about 120msec (Dixon & Di Lollo, 1994; Eriksen & Collins, 1967;Hogben & Di Lollo, 1974, 1985). This outcome implies

that the neural signals associated with the half-imagecomponents greatly outlast the duration of the visualstimuli themselves, with the visual system integratingthose signals to recreate a usable representation of thecomposite.

Another way to tap into temporal integration is to uti-lize sequential dichoptic (i.e., separate stimulation ofleft and right eyes) presentation of two half-images thattogether yield stereoscopic depth perception (see Fig-ure 2c). For this purpose, random-dot stereograms areparticularly useful, for the separate half-images containno hint of the shape of the region defined by retinal dis-parity (Julesz, 1971). To recognize this shape, neural sig-nals associated with the two monocular images must becombined binocularly, with the disparity between thoseimages then specifying the shape of the region seen indepth. We know, of course, that observers can easilyextract stereoscopic depth from a random dotstereogram when the two monocular images are pre-sented simultaneously. What happens, however, whenthose two images are presented sequentially, not simulta-neously? Several studies have shown that the visual sys-tem can extract stereoscopic depth from temporally sep-arated half-images of a random-dot stereogram, so longas the interval separating the two half-images does notexceed 50-70 msec (Julesz & White, 1969; Ross &Hogben, 1974). Binocular integration time using pic-tures containing recognizable monocular forms can bestretched a little beyond 100 msec (Efron, 1957; Ogle,1963).

The three examples described above represent just afew of the many visual phenomena all of which implythat neural signals triggered by visual stimulation outlastthe stimulus itself. Because the neural consequence ofvisual stimulation persists for some time after physicaltermination of the evoking stimulus, those persisting sig-nals are available for integration with signals arisingwithin about a tenth of a second after termination of thefirst stimulus.

One can imagine circumstances where temporal inte-gration on this time-scale would be advantageous, butsumming signals over time introduces a potentially seri-ous cost: one loses the ability to resolve events occurringvery closely in time, the result being a kind of neuralblurring of stimulus information (Burr, 1980). Speciesreliant on rapid unexpected events eschew temporalintegration altogether, as evidenced by their very highsensitivity to rapidly occurring events (i.e., they possessvery high temporal resolution). One classic example isthe fly, renowned for its ability to detect and react todynamics created by the complex optic flow generatedby flying. What is the evidence concerning temporal res-olution in human vision? We turn to this question in thenext section.

24 BEHAVIORAL AND COGNITIVE NEUROSCIENCE REVIEWS

Figure 2: Schematics of Procedures for Estimating Temporal Integra-tion.

NOTE: (A) Visual backward masking, where a “target” element (star inthis example) is followed closely in time by a “masker” that can renderthe target invisible when the interval separating the two (interstimulusinterval: ISI) is brief. (B) Form-part integration task, where two com-plementary parts of a component figure are presented sequentially,with a brief interval (ISI) separating the two. Observers make a percep-tual judgment that requires stimulus information from both compo-nents. In this example (modeled after Hogben & Di Lollo, 1974), one“cell” of a 5 × 5 matrix of black circles is empty—the cell with the miss-ing circle is conspicuous when the two half-images are presented withbrief ISIs. (C) Stereopsis from random-dot stereograms, with the twohalf-images presented sequentially to left and right eyes, with a brief ISIbetween presentations.

Page 5: The Role of T emporal Structure in Human V ision · 2010. 3. 30. · 10.1177/1534582305276839 BEHA VIORAL AND COGNITIVE NEUROSCIENCE REVIEWS Blake, Lee / VISUAL TEMPORAL STRUCTURE

Temporal Differentiation

If all visual information available to perception wereonly contained in “chunks” integrated over time, itwould be impossible to judge the temporal order ofevents occurring very closely in time (VanRullen & Koch,2003). Does human vision suffer this limitation? Toframe the question in the simplest possible terms, imag-ine a single stimulus that appears, briefly disappears, andthen reappears. Would the brief temporal gap even benoticed? The answer is yes—people can detect temporaloffsets as brief as 5 msec (Georgeson & Georgeson, 1985;Smith, Howell, & Stanley, 1982). Or consider another,slightly more complicated stimulus sequence, where astimulus appears for 100 msec at one location, disap-pears very briefly, and then reappears for 100 msec atanother, nearby location. If the two events occur within20 msec or so of one another, will they appear as a singleevent, namely, two stimuli occurring simultaneously attwo locations? The answer is no—one readily experi-ences compelling apparent motion, with a unitary stimu-lus seen to move from the initial location to the subse-quent one (Anstis, 1970). These two examples, as well asothers (e.g., Lappin & Bell, 1972), imply that the visualsystem retains very good information about when in timeevents occur relative to one another, for otherwisemotion or brief disappearance would not beexperienced.

Another, related line of evidence pointing to hightemporal resolution comes from a task called temporalorder detection (Exner, 1875; Hirsh & Sherrick, 1961;Sweet, 1953; Wertheimer, 1912; Westheimer & McKee,1977; Yund & Efron, 1974). Similar to an apparentmotion display, two stimuli appear asynchronously at dif-ferent locations in the visual field, and the observer’s taskis to indicate which one appeared first. Threshold esti-mates for this task range from about 20 msec, when thetwo stimuli are separated by several degrees of visualangle (Hirsh & Sherrick, 1961), down to 2 msec, whenthe stimuli are immediately next to one another (Sweet,1953; Westheimer & McKee, 1977).

A third task that taps into temporal resolutioninvolves detecting asynchrony between patterns flicker-ing at the same rate, with asynchrony gauged in terms oftemporal phase shift in the flicker of one pattern relativeto another. A representative example of this technique isprovided by an experiment by Motoyoshi (2004), themethods of which are summarized in Figure 3. He pre-sented four small grating patches arranged in a squareconfiguration, with the distance between patches variedover conditions. All gratings switched between verticaland horizontal, always at the same reversal rate; switcheswere produced by smoothly changing the contrast of thetwo orientations in a reciprocal fashion. Reversals in ori-entation of one of the four gratings was offset in time by

varying amounts relative to the other three (whichchanged orientation in synchrony), and on each trial,observers identified which one of the four gratings wasout of synch with the other three. Temporal resolutionon this task is thus indexed by the smallest temporal off-set between one pattern and three others. Motoyoshi’sresults showed that under optimal conditions (slowlyflickering patterns in close spatial proximity), observerscould resolve offsets on the order of 30 msec. Temporalresolution was, however, strongly dependent on bothtemporal frequency and spatial separation (see alsoForte, Hogben, & Ross, 1999), leading Motoyoshi to con-clude that perceptual synchrony is governed by mecha-nisms conjointly sensitive to spatial and temporal factors.Later in our review, we will see that the efficacy of tempo-ral structure in visual grouping is also strongly depend-ent on spatial interactions. It is worth noting, inciden-

Blake, Lee / VISUAL TEMPORAL STRUCTURE 25

Figure 3: Schematic of Temporal Resolution Task Devised byMotoyoshi (2004).

NOTE: (A) Four grating patches were arrayed in a square configura-tion (the spatial separation of the patches was varied over conditions).Within each patch, the contours switched back and forth repetitivelyfrom horizontal to vertical (with the transition occurring smoothly bysinusoidally modulating the contrast of the two orientations); theswitch rate was varied over blocks of trials. (B) and (C) Plots of changesin orientation over time (t). Orientation changes in one of the fourgrating patches (the “target” shown in light gray in panel B) wereslightly out of phase (shown in panel B as ∆φ) with the orientationchanges in the other three patches (all of which switched orientationin perfect synchrony, shown in black in panel B); the location of the tar-get within the array was varied randomly over trials, and following eachtrial, observers had to indicate which one of the four constituted thetarget.

Page 6: The Role of T emporal Structure in Human V ision · 2010. 3. 30. · 10.1177/1534582305276839 BEHA VIORAL AND COGNITIVE NEUROSCIENCE REVIEWS Blake, Lee / VISUAL TEMPORAL STRUCTURE

tally, that temporal resolution estimated using repetitiveflicker tends to be poorer than that indexed by discrete,nonrepetitive events. These two classes of stimuli havesubstantially different energy distributions within thetemporal frequency domain, and repetitive stimulationengages temporal summation in ways that discreteevents do not.

Besides the study of visual simultaneity (synchrony),there are also studies aimed at measuring the perceptionof event synchrony between different aspects of vision.Thus, for example, when the direction of motion of anarray of stimulus elements reverses periodically overtime and also the color of those elements changesbetween two values periodically over time, the motionchange has to occur slightly in advance of the colorchange for the two events to appear perceptually simul-taneous (Arnold, Clifford, & Wenderoth, 2001;Moutoussis & Zeki, 1997; Nishida & Johnston, 2002). Ina similar vein, when visual events and auditory events arepaired closely in time, the auditory events typically needto lag the visual events by a brief but reliable duration forthe two events to be perceived as simultaneous (e.g.,Exner, 1875). VanRullen and Koch (2003) recentlyreviewed this rich literature on perceptual simultaneity,so we will not cover those studies here. Suffice it to saythat this general question of perceptual timing has impli-cations for visual temporal resolution and, therefore,temporal structure and grouping.

In summary, results from studies measuring temporaldifferentiation indicate that the visual system can resolvesequential visual events with relatively high precision.Although estimates of resolution for temporal differenti-ation vary depending on the task and stimulus condi-tions, those values are considerably smaller than the val-ues characterizing temporal integration. How do weaccount for these seemingly large discrepancies? How isit, in other words, that human vision can realize hightemporal resolution while at the same time integratingvisual signals over relatively long durations?

One explanation appeals to the involvement of differ-ent visual channels, or pathways, distinguished by theirtemporal properties. As popularized several decadesago, this multichannel hypothesis posited the existenceof a sustained channel concerned with form analysis anda transient channel specialized for motion (Breitmeyer,1984; Burr, 1980; Watson, 1986). Given what we knowabout these putative channels, it is easy to imagine thatthe sustained channel could support temporal integra-tion and the transient channel temporal differentiation.Alternatively, one could envision both integration anddifferentiation being mediated by a single mechanismwhose neural response profile varied over time relativeto stimulus onset or stimulus offset. To our knowledge,this latter hypothesis has not been formalized, but there

are reasonable precedents for this idea in the literature.We know, for example, that the duration of a relativelyweak light flash can be adjusted to make that flash just asdetectable as a brief, bright light flash—this is the classicexample of temporal integration termed Bloch’s law. Yetthose equally detectable light flashes are nonethelesseasily discriminated from one another (Zacks, 1970).Evidently, then, observers have access not only to themagnitude of the neural response triggered by a stimu-lus but also to the distribution of that response over time,as determined by the temporal structure of the stimulus.We see no reason why the same kind of multicodingcould not be involved in the mediation of temporalintegration and temporal differentiation.

So, based on studies of temporal differentiation, we conclude that human vision possesses reasonably good temporal resolution. This, in turn, suggests that human vision should be able to register the temporal structure associated with dynamic visual events. But can human vision exploit that temporal structure to promote visual grouping of features possessing common temporal structure? This question brings us to the main theme of this article, spatial grouping based on temporal structure.

SPATIAL STRUCTURE FROM TEMPORAL STRUCTURE

Can common fate in the form of temporal structure promote spatial grouping and figure/ground segregation? The answer is most certainly yes, and in the following paragraphs, we describe the results that substantiate this affirmative answer. As a prelude to reviewing those studies, we should say a word about the different forms of temporal structure that have been used and about the various possible "carriers" of that temporal structure. The general concept of temporal structure is grounded in information theory and signal processing, and readers interested in the quantitative details underlying those conceptualizations are referred to de Coulon (1986) and/or Brook and Wynne (1988). For purposes of our literature review, however, it is sufficient to distinguish between two categories of temporal structure: deterministic and stochastic.

Deterministic temporal structure refers to predictable time-dependent changes in a stimulus along some visual dimension, meaning that the points in time at which change occurs can be specified a priori by some mathematical expression. To give a few examples, a spot of light flickering on and off repetitively constitutes an event with deterministic temporal structure. So, too, does a grating pattern whose spatial phase is reversed regularly over time or whose contrast is modulated up and down regularly over time. For each of these events, temporal structure can be summarized by a parameter specifying the flicker rate, the reversal rate, or the modulation rate. Deterministic temporal structure can also take more complex forms, such as that embodied in a frequency-modulated counterphase grating (i.e., repetitive phase shifts at a rate that varies predictably over time). Deterministic changes convey very little information, in the sense that these events involve no uncertainty about their time course once the rate of change has been determined. In its simplest form, repetitive change is monotonous.

Stochastic temporal structure refers to time-dependent changes in a stimulus that are unpredictable, because the points in time at which change occurs are determined by a random process. For stochastic temporal structures, we can only make statistical predictions about when change will occur, with those predictions based either on learning or on prior probabilities. Stochastic temporal structure "looks" very different from deterministic temporal structure,2 in the same way that a regularly textured figure looks quite different from a randomly textured one (see Figure 4). Stochastic temporal structure can be defined using what engineers and natural scientists term a point process—a time series denoting when state changes occur in a system. Applications utilizing point process modeling range from predicting natural catastrophes such as forest fires and earthquakes to controlling data traffic in electronic communication networks to analyzing the times and locations of criminal events. In the case of vision, point process models can be used to characterize unpredictable changes along any dimension for which stochastic changes occur, including luminance, contrast, and phase. For any given point process that defines stochastic temporal structure, we can derive measures of central tendency and variance, and we can express the degree of unpredictability of change in terms of entropy (a quantity related to the Poisson rate parameter generated in a given time series). We shall return to the notion of entropy shortly, when describing experimental results. For now, it is sufficient to point out that stochastic temporal structure is inherently unpredictable and, therefore, transmits more information than deterministic temporal structure when events do occur. Visual changes embodying stochastic temporal structure are not monotonous.
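The two categories can be illustrated with a toy point-process simulation (a sketch under arbitrary assumptions: the rates, duration, and random seed below are illustrative, and the Poisson process stands in for "a random process" generally):

```python
import random

def deterministic_changes(rate_hz, duration_s):
    """Change times of a deterministic process: perfectly periodic."""
    period = 1.0 / rate_hz
    return [i * period for i in range(int(duration_s * rate_hz))]

def stochastic_changes(rate_hz, duration_s, rng):
    """Change times of a Poisson point process with the same mean rate.

    Inter-change intervals are exponentially distributed, so individual
    change times cannot be predicted, only their statistics.
    """
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate_hz)  # random inter-change interval
        if t >= duration_s:
            return times
        times.append(t)

rng = random.Random(0)
det = deterministic_changes(rate_hz=10, duration_s=2.0)
sto = stochastic_changes(rate_hz=10, duration_s=2.0, rng=rng)

# Deterministic: every interval is (numerically) identical, so once the
# rate is known, each change carries no new information.
det_intervals = [b - a for a, b in zip(det, det[1:])]
assert max(det_intervals) - min(det_intervals) < 1e-9

# Stochastic: intervals vary, so the timing of each change is informative.
sto_intervals = [b - a for a, b in zip(sto, sto[1:])]
assert max(sto_intervals) > min(sto_intervals)
```

The contrast in interval variability is the temporal analog of the regular-versus-random textures in Figure 4.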

We believe this distinction between deterministic and stochastic temporal structure is potentially important. Unlike most engineered communications systems that use well-defined frequencies to transmit information, biological systems often have to deal with unpredictable environmental events that occur within noisy backgrounds. Elsewhere we have conjectured that grouping from stochastic temporal structure is robust because stochastic change conveys more information and more effectively engages the biological machinery from which vision is crafted (Blake & Lee, 2000).

Before proceeding to a review of experiments, we also should underscore that temporal structure, regardless of whether it is deterministic or stochastic, can be specified independently of the carrier(s) of that temporal structure. Thus, in the examples given above, we saw instances where temporal structure was defined by luminance change (flicker), by phase change, and by contrast change. One could also envision conditions where temporal structure was embodied in changes of the color of a test patch, changes in the orientation of a contour, or changes in the depth plane of a disparity-defined surface. Indeed, a relevant question concerning spatial grouping from temporal structure is the efficacy of different carriers of temporal structure. Given that different aspects of vision exhibit different degrees of temporal resolution (e.g., color is reputed to be temporally "sluggish"), we would expect different carriers to vary in their effectiveness as grouping cues when temporal structure is the defining feature. Some evidence to this effect has been reported (e.g., Guttman, Gilroy, & Blake, in press; Suzuki & Grabowecky, 2002).

With those distinctions in place, we now survey experiments bearing on the role of temporal structure in figure/ground segmentation and grouping, organizing them by the nature of the temporal structure involved. Within each section, the order of coverage follows a more or less chronological sequence.

DETERMINISTIC TEMPORAL STRUCTURE

Figure 4: Regular Versus Random Spatial Structure
NOTE: Both figures consist of black and white squares of the same size and the same average density. The two textures look quite different, however, owing to the spatial configuration ("regularity") of the squares. The analogs in the case of temporal structure are predictable, repetitive change in some stimulus feature (e.g., luminance) versus irregular, unpredictable change in that feature.

One of the simplest ways to manipulate temporal structure is to introduce a temporal phase lag between two sets of repetitively flickering elements. If human vision is sensitive to the phase lag, the two sets of elements should form separate, distinguishable groups. This is exactly the idea contained in the two-frame animation created by Rogers-Ramachandran and Ramachandran (1998), shown schematically in Figure 5a. Black and white spots are spatially distributed on a gray background, creating a texture border in one frame. In Frame 2, the luminance polarity of each spot is reversed. When these two frames are rapidly and repetitively alternated over time, the two groups of dots flicker in counterphase relative to one another. When the spots flickered at 15 Hz (producing a 33.3 msec phase difference between light and dark elements), observers could still perceive the texture boundary defined by luminance. However, they could not discern whether any two spots were flickering in phase or out of phase. In other words, all the spots appeared to be identical when flickered at 15 Hz, with the border still being distinct. Rogers-Ramachandran and Ramachandran dubbed this a "phantom" contour, because clear texture segregation was perceived even though the texture elements themselves were indistinguishable. Temporal information alone seemed to be creating segregation of the two regions.

In a similar vein, Fahle (1993) created temporal structure among flickering elements by manipulating the stimulus onset asynchrony between two nonoverlapping groups of flickering stimuli. The stimuli were arrays of regularly or randomly spaced dots (Figure 5b), with the dots within a small "target" region (denoted by a rectangle in Figure 5b) flickered synchronously at a given temporal frequency. The remaining dots outside of the target region were also flickered in synchrony at the same rate, but their temporal phase was delayed relative to that of the target dots. Observers judged the shape or the location of the region defined by "target" dots, and Fahle varied the time delay (phase lag) between the flickering target and surround dots. The threshold time delay varied depending on the flicker frequency, but under optimal conditions observers could perform the task with phase lags as brief as 6-7 msec. Fahle concluded that the visual system can segregate a visual scene into separate regions based on "purely temporal cues" because the dots in the figure and those in the background were homogeneous in static information and differed only in temporal phase. Kojima (1998) also confirmed this finding using spatial-frequency filtered random-dot textures. In Kojima's study, observers easily perceived figure from background even when the patterns composing the two subregions were delayed by only 13 msec.
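The logic of these phase-lag displays can be sketched as a pair of onset-time sequences (a minimal sketch; the 25 Hz rate and 7 msec delay are illustrative stand-ins, not Fahle's actual parameters):

```python
# Sketch of a phase-lag flicker display: all dots flicker at the same rate,
# but dots inside the "target" region lead the surround by a fixed delay.
# The 25 Hz rate and 7 ms delay below are illustrative values only.

def onset_times(rate_hz, n_cycles, delay_ms=0.0):
    """Onset times (ms) of a dot flickering at rate_hz, shifted by delay_ms."""
    period_ms = 1000.0 / rate_hz
    return [delay_ms + i * period_ms for i in range(n_cycles)]

target = onset_times(rate_hz=25, n_cycles=5)                # target dots
surround = onset_times(rate_hz=25, n_cycles=5, delay_ms=7)  # surround lags

# The two populations are identical in every static respect; the only
# difference is a constant temporal phase lag between their onsets.
lags = [s - t for t, s in zip(target, surround)]
assert all(abs(lag - 7.0) < 1e-9 for lag in lags)
```

In the actual experiments, that constant lag is the independent variable, and the threshold lag for seeing the target region is the measure of temporal grouping acuity.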

The results just summarized imply that temporal delays as brief as 5-15 msec can effectively promote spatial grouping and texture segregation. However, two other studies have reported seemingly contradictory results. Kiper et al. (1996) examined the role of temporal information in visual segmentation by asking whether onset asynchrony of texture elements can influence performance in texture segmentation and grouping. In their experiments, observers discriminated the orientation (vertical vs. horizontal) of a rectangular region containing line segments different in orientation from those in a surrounding region (Figure 5c). Spatial similarity and temporal similarity were manipulated as independent variables. The spatial factor was the angular difference in orientation between texture elements in target and surround regions. The temporal factor was the difference in onset time between target and surround elements. Kiper et al. reasoned that if temporal phase can be utilized by human vision for texture segmentation, performance should be enhanced when texture elements in the target region are presented out of phase with those in the surrounding region, because temporal phase provides additional information for the task. However, they found no influence of temporal asynchrony on texture segmentation; performance depended entirely on the magnitude of the difference in orientation between target and surround.

Figure 5: Schematics of Visual Displays Used to Examine the Role of Temporal Cues in Spatial Grouping.
NOTE: (A) The two frames of an animation devised by Rogers-Ramachandran and Ramachandran (1998). When these two frames are alternately presented at 15 Hz (30 frames per second), observers see a distinct, horizontal border even though the spots in the upper and lower halves of the display appear indistinguishable in lightness. (B) Example of two frames from an animation used by Fahle (1993). When these two frames are shown in succession with a slight delay between the offset of one and the onset of the other, observers readily perceive a rectangle. The minimal delay yielding shape perception depends on the rate at which the two frames are repetitively presented. (C) Texture arrays used by Kiper, Gegenfurtner, and Movshon (1996). The "target" region was defined by differences in orientation between "target" and "background" and/or by differences in temporal phase of target and background (i.e., background elements were presented slightly earlier or later than the target elements). (D) Display used by Fahle and Koch (1995), consisting of two sets of partially occluded circles that create the impression of two Kanizsa triangles, one on top of the other. Which triangle appears "closer" fluctuates over time. In their experiment, Fahle and Koch flickered the stimulus elements defining one triangle in synchrony while flickering the stimulus elements for the other triangle out of phase.

Also arguing against the efficacy of temporal structure as a grouping cue are results from a study by Fahle and Koch (1995). These investigators created an ambiguous stimulus, two overlapping Kanizsa triangles (Figure 5d), that could be seen in either of two configurations, and they examined whether flickering components of a given configuration could bias observers to perceive that configuration. Without flicker, observers experienced perceptual rivalry: one triangle seemed to occlude the other, with the "front" triangle switching between the two alternatives every few seconds. As expected, disrupting the spatial configuration of one of the two triangles led to the other, unperturbed triangle being seen predominantly in front. Disruptions in the temporal configuration, however, had no such effect. Over a range of flicker frequencies (10-75 Hz), temporal phase differences among the components of a given configuration did nothing to weaken that configuration's predominance. These results, together with those of Kiper et al. (1996), question whether temporal structure plays a prominent role in spatial grouping.

So at this point in the chronology, we have some results showing robust grouping by temporal synchrony and other results showing no effect of temporal synchrony on grouping. How do we resolve these conflicting results? A clue may come from the interactive relationship between spatial and temporal cues in spatial organization. In the studies showing that flicker does contribute to grouping (Fahle, 1993; Kojima, 1998; Rogers-Ramachandran & Ramachandran, 1998), the displays contained no coherent spatial information for segmentation—all stimulus elements in the displays were identical in terms of form, orientation, and color. Spatial grouping was defined solely by temporal phase lags among elements within distinct regions. On the other hand, prominent spatial cues were always present in the displays used in those studies that failed to find an effect of temporal phase and flicker. (In the study by Kiper et al., 1996, differences in orientation defined figure and ground; in the study by Fahle & Koch, 1995, luminance edges induced contours that conspicuously defined the two competing triangles.) Perhaps, then, the efficacy of temporal structure is constrained by the presence and strength of spatial structure.

This possibility was explicitly tested by Leonards, Singer, and Fahle (1996), who used a display modeled after the one employed by Kiper et al. (1996) (Figure 5c). Figure and ground regions could be defined by a difference in temporal phase alone, by a difference in orientation alone, or by both temporal phase and orientation differences. Observers were able to identify a "figure" when it was defined solely by temporal cues or when the temporal cues were consonant with the spatial cue. When, however, the two cues were in conflict, spatial cues dominated.

This interaction between spatial and temporal structure is also revealed in a series of experiments performed by Usher and Donnelly (1998). They created a square lattice display (Figure 6) in which elements could be grouped either into rows or into columns. When the elements in alternating rows (or columns) of the lattice were flickered asynchronously (out of phase), the display was perceived as rows (or columns) correspondingly. Temporal phase, in other words, determined global perceptual organization in this otherwise ambiguous display. Of particular relevance, the efficacy of asynchronous flicker was governed by the shape of the lattice elements: the strongest grouping effect from temporal structure was obtained when using circles rather than crosses in the display. In line with the findings of Leonards et al. (1996), this result probably means that the efficacy of temporal structure is constrained by the presence and strength of spatial structure—the lattice of crosses contains abundant collinearity, whereas the lattice of circles does not. In another experiment, Usher and Donnelly (1998) asked observers to detect a target of collinear line segments embedded in an array of otherwise randomly oriented line segments. Performance was better when target and background line segments were flickered asynchronously than when they were synchronized. Note that in this condition temporal and spatial structure were congruent with one another and that temporal structure enhanced the perception of spatial structure. Targets consisting of randomly oriented line segments, and thus defined solely by temporal structure (flicker asynchrony), could also be detected effectively, although the phase lag required for grouping was somewhat longer.

In yet another demonstration of grouping based on temporal information, Sekuler and Bennett (2001) devised a clever display consisting of a 10 × 10 array of squares whose individual luminance values varied randomly throughout the array (see Figure 7). The luminance values of all squares varied sinusoidally over time, such that each square went from light to dark in a smooth, repetitive fashion. One small, rectangular cluster of squares, the "target" region, increased and decreased in phase (meaning that the peaks and troughs of the sinusoidal modulations were aligned); all the remaining squares outside of this target region also modulated in phase with one another but not in phase with the target elements. In other words, the target was defined by a cluster of squares that became progressively lighter and darker together, relative to the direction of the luminance modulations in the background. Observers were able to identify the orientation of the target region even when the depth of modulation was less than 2%, an incredibly small amount of change over time. Sensitivity to phase differences in luminance modulation was best at modulation rates of 5 Hz and greater. Sekuler and Bennett also varied the phase lag between sinusoidal modulations of target and background, a manipulation that varies the time separating the peaks in the two sets of modulating elements. The task remained easy even with phase differences as small as 22 degrees, which translates into a time difference as brief as 6.5 msec at a flicker rate of 9.6 Hz. Sekuler and Bennett interpreted their findings as an indication that the Gestalt notion of common fate extends beyond motion to include synchronized luminance change.
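The phase-to-time conversion behind these numbers is simple to verify (a worked check of the values reported above):

```python
def phase_to_delay_ms(phase_deg, rate_hz):
    """Convert a phase difference (degrees) at a given modulation rate (Hz)
    into the corresponding time difference in milliseconds."""
    period_ms = 1000.0 / rate_hz      # one full modulation cycle
    return (phase_deg / 360.0) * period_ms

# Sekuler and Bennett's smallest effective phase difference:
# 22 degrees at 9.6 Hz corresponds to roughly 6.5 ms.
delay = phase_to_delay_ms(phase_deg=22, rate_hz=9.6)
assert abs(delay - 6.5) < 0.2
```

The same conversion explains why sensitivity expressed in degrees of phase translates into ever-briefer time differences as modulation rate increases.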

Figure 6: Schematic (Not Exact or to Scale) of One of the Displays Used by Usher and Donnelly (1998) in Their Study of Visual Synchrony and Grouping.
NOTE: (A) An ambiguous "composite" display seen either as rows of black circles or as columns of black circles; over trials, these two alternative perceptual outcomes are equally likely when the array is briefly flashed. (B) and (C) When components of the composite are presented sequentially in time (one frame immediately following the other), perception is biased in favor of rows (panel B) or columns (panel C), even though the successive exposures were sufficiently close in time to appear simultaneous.

Figure 7: Display Used by Sekuler and Bennett (2001) to Study Grouping by Common Luminance Changes.
NOTE: All squares within a matrix changed in luminance sinusoidally over time within a limited range of luminance values, with the range (and mean) varying among squares. A subset of contiguous squares within the matrix (which defined the "target") became lighter and darker together (two of the target squares are shown as A and B in the figure), whereas the remaining squares also became lighter and darker together but with a slight phase lag relative to the target squares. The average luminance within target and background was the same; only the time difference in luminance modulation distinguished target from background.
SOURCE: Figure reproduced with permission of Sekuler and Bennett (2001) © American Psychological Society.

The experiments described so far all utilized dynamic stimuli that, in principle, could activate motion mechanisms differentially within figure and ground regions.3

Cognizant of this possibility, Kandil and Fahle (2001) set out to design a figure/ground grouping display in which motion was explicitly present but unequivocally insufficient to support grouping. They also wanted to design animations that would prevent grouping based on luminance differences, a lingering concern in some of the studies described earlier in this section. To accomplish their goal, Kandil and Fahle created animations in which each frame contained a large number of regularly spaced dot pairs, or "colons" as they called them (see Figure 8). Each pair of dots flipped its angular orientation by 90 degrees, with these flips occurring repetitively and periodically every other frame of the animation. Dot pairs within a virtual "figure" region flipped on even-numbered frames, and dot pairs within the virtual "surround" flipped on odd-numbered frames. Segmentation was thus defined by a phase offset between the flip times of the figure and ground dot pairs. Observers were reliably able to identify the shape of the virtual figural region at flip frequencies just over 20 Hz (which corresponds to a timing difference of 22 msec between figure and ground events). Interestingly, this temporal acuity was approximately halved in observers older than 50 years of age, although Kandil and Fahle did not speculate on possible reasons for this performance decline. They also modified their animations so that they could vary the phase lag between figure and ground flips while holding flip frequency constant. With this maneuver, they found that young observers could group dot pairs based on synchronized motion down to differences as small as 11 msec. They interpreted their findings as definitive evidence for the effectiveness of "astonishingly" short temporal delays as a grouping cue, uncontaminated by luminance or motion artifacts.
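The correspondence between flip frequency and figure/ground timing difference follows from the design: figure and ground flips alternate evenly, so the offset is half the flip period (a sketch of that arithmetic; nothing here comes from Kandil and Fahle's actual stimulus code):

```python
def figure_ground_offset_ms(flip_rate_hz):
    """Timing offset between figure and ground events when the two
    populations flip in strict alternation at the same rate.

    Assumes ground flips fall exactly halfway between figure flips,
    as in the interleaved even/odd-frame design described above.
    """
    flip_period_ms = 1000.0 / flip_rate_hz
    return flip_period_ms / 2.0

# "Flip frequencies just over 20 Hz" corresponds to a timing
# difference near the reported 22 ms (about 22.7 Hz here):
assert abs(figure_ground_offset_ms(22.7) - 22.0) < 0.1
```

At exactly 20 Hz the offset would be 25 ms, so resolving the figure at rates "just over 20 Hz" is what pins the timing difference at roughly 22 ms.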
In a follow-up article, Kandil and Fahle (2003) explored boundary conditions for time-based figure/ground segmentation. They found that isoluminant stimuli (dot pairs defined solely by color) had slightly poorer temporal resolution compared to luminance-defined stimuli. They also tested conditions involving dichoptic presentation of stimulus elements (i.e., elements moved from one position in one eye to another position in the other eye), so the target could be perceived only if information from the two eyes were matched over time. Significantly, dichoptic stimulation abolished figure/ground segmentation, implying that monocular neural mechanisms mediate this form of spatial structure from motion.

Considered together, the results summarized in this section indicate that the temporal structure created by repetitive luminance flicker can promote grouping and segmentation, with the efficacy of temporal structure modulated by the existence and strength of spatial structure within the display. This conclusion is not limited to figure/ground organization defined by luminance flicker. A couple of studies have also examined grouping of spatially distributed features based on repetitive contrast modulation as the source of temporal structure. Those studies are summarized in the following paragraphs.

Figure 8: Schematic of the Display (Not Exact or to Scale) Used by Kandil and Fahle (2001) to Study Figure-Ground Segregation Based on Temporal Phase.
NOTE: On each frame of an animation, the observer saw an array of "colon"-shaped elements (pairs of dots whose virtual orientations varied irregularly throughout the array). Every other frame of the animation, each dot pair flipped 90 degrees in virtual orientation, with the point of rotation being the imaginary midpoint of the dot pair. Dot pairs within a "target" region flipped in synchrony, and dot pairs within the background flipped in synchrony out of phase with the target dot pairs (see Panel B for an example of one target dot pair and one background dot pair shown on four successive frames). The target dots and background dots flip at different points in time, providing a temporal cue for figure/ground segmentation.

In one study, Alais, Blake, and Lee (1998) created a display comprising four spatially distributed apertures, each of which contained a sinusoidal grating that, when viewed alone, always appeared to drift in the direction orthogonal to its orientation (Figure 9a). When all four gratings were viewed together, however, they intermittently grouped to form a unique global motion whose direction corresponded to the vector sum of the component motions (e.g., Lorenceau & Shiffrar, 1992). With extended viewing, then, the display appeared bistable: observers experienced perceptual fluctuations between global motion and local motion. To evaluate the role of temporal structure in grouping, Alais and colleagues independently modulated in time the contrast levels of the four gratings and assessed the influence of this time-varying event on the incidence of global motion. When contrast modulations were correlated (meaning that the timing and direction of contrast changes were the same among all four gratings, even though their absolute contrast values differed), the four gratings were much more likely to group into a single, coherent global object moving in the vector-sum direction; when contrast modulations were uncorrelated, local component motion was much more likely. Alais and colleagues obtained similar results in another experiment using two superimposed, drifting gratings (Figure 9b). Under appropriate conditions, this display, too, is perceptually bistable, appearing either as two transparent gratings drifting in different directions or as a coherent "plaid" pattern moving in the direction defined by the vector sum of the two component velocities (Adelson & Movshon, 1982). Again, the incidence of coherent motion was enhanced by correlated contrast modulations and was reduced by uncorrelated contrast modulations.
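The vector-sum computation underlying the global-motion percept can be written out explicitly (a sketch; the component directions and speeds below are hypothetical values chosen for illustration, not the parameters used by Alais et al.):

```python
import math

def vector_sum_direction(directions_deg, speeds):
    """Direction (degrees, counterclockwise from rightward) of the sum
    of component motion vectors given their directions and speeds."""
    vx = sum(s * math.cos(math.radians(d)) for d, s in zip(directions_deg, speeds))
    vy = sum(s * math.sin(math.radians(d)) for d, s in zip(directions_deg, speeds))
    return math.degrees(math.atan2(vy, vx)) % 360.0

# Hypothetical example: four gratings drifting up-left, up-right,
# up-left, up-right at equal speeds. The horizontal components cancel,
# so the grouped percept moves straight upward (90 degrees).
direction = vector_sum_direction([135, 45, 135, 45], [1.0, 1.0, 1.0, 1.0])
assert abs(direction - 90.0) < 1e-6
```

This is why the grouped display can appear to move in a direction that none of its components moves in individually.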

Temporal patterning of contrast modulation and grouping has also been examined using another form of bistable perception, binocular rivalry. When dissimilar patterns are imaged on corresponding areas of the two eyes, they compete, or rival, for perceptual dominance. When one views multiple pairs of rival targets spatially distributed within the visual field, dominance among those multiple targets can become entrained if those targets are similar in color, orientation, or motion (Alais & Blake, 1998; Kovacs, Papathomas, Yang, & Feher, 1997; Whittle, Bloor, & Pocock, 1968). Alais and Blake examined whether correlated contrast modulations of local rivalry patterns can also promote simultaneous dominance during piecemeal rivalry. Figure 10 shows schematics of the displays used in their study. During 60-sec viewing periods, observers viewed these dichoptic rival displays and pressed buttons to track conjoint dominance of the two gratings. Correlated contrast modulation between the two gratings promoted joint predominance of those gratings more than did uncorrelated modulations. Moreover, the effectiveness of correlated contrast modulation was dependent on the spatial configuration of the two gratings, being maximal when the two were collinear. This latter outcome is in line with the results described above, showing that the efficacy of temporal structure is affected by existing spatial structure.

Figure 9: Examples of Visual Stimuli Used by Alais et al. (1998) to Study the Role of Common Temporal Structure in Visual Grouping.
NOTE: (A) Four circular grating patches arrayed in a square configuration. Within each patch, contours of the grating drifted steadily in one direction (indicated by arrows). Over time, observers sometimes see the four separate directions of motion ("local" motion) and other times see the four motion vectors group and appear to form a single, partially occluded grating that drifts upward ("global" motion). Correlated contrast modulation of all four gratings enhances perception of global motion, implying that common temporal structure promotes grouping. (B) A plaid produced by superimposition of two sinusoidal gratings. The two gratings drift upward in a direction orthogonal to their contour orientations (indicated by the white arrows). Perceptually, the two gratings sometimes cohere and appear to move as a single "object" upward. Correlated contrast modulation of the two gratings enhances perception of coherent, global motion.

In a recent article, Suzuki and Grabowecky (2002) studied the role of temporal synchrony in grouping using overlapping visual features that flickered periodically, one being two pairs of orthogonally oriented bars and the other a set of small circles. The bars switched repetitively from diagonal left to diagonal right, and the set of circles flashed on periodically in synchrony with one of the two bar orientations. The rate of change in bar orientation was increased beyond the value at which the switches were perceptible—beyond this exchange rate, observers saw two superimposed sets of oriented bars. However, the two orientations waxed and waned in visibility, with diagonal left being dominant for several seconds only to be replaced by diagonal right for a few seconds. This outcome is not surprising (see Atkinson, Campbell, Fiorentini, & Maffei, 1973). What is remarkable is that the set of dots appeared to be affixed to the oriented bars with which those dots were synchronously flashing; when the orthogonally oriented bars were dominant, the dots appeared to form a separate cluster of features seen transparently in relation to the bars. This "binding" of the dots to the bars flashing in synchrony was not observed when the dots were equiluminant with the background, and thus defined by color alone. The failure of temporal structure to group color may well be attributable to the temporal sluggishness of the chromatic system.

To sum up so far: studies using luminance-defined temporal structure and contrast-defined temporal structure both support the idea that human vision can use temporal structure for spatial grouping. The efficacy of temporal structure is greatest when existing spatial cues define ambiguous spatial structure or when no spatial structure exists in the absence of temporal structure. When spatial and temporal cues are in conflict, the relative salience of the two cues determines the grouping outcome.

Grouping by Stochastic Temporal Structure

In many of the studies reviewed in the last section, local elements in one region of the display (the "figure") flickered on and off repetitively in phase while, at the same time, elements in the rest of the display (the "ground") flickered together but out of phase with the flickering figure elements. This means, in other words, that only figure elements were visible in some frames of the animated sequence and only ground elements were visible in other frames. This point is dramatized by slowing the flicker rate to the point where individual frames can be inspected—in this case, one can easily discern the figure from ground because the two regions are explicitly defined by spatial discontinuities in luminance, a potent cue for figure/ground segregation. As flicker rate increases, this luminance cue becomes less conspicuous because of temporal summation across frames; consequently, figure/ground organization becomes less salient although still discernible. Thought of in this way, one realizes that the studies reviewed above do not unequivocally test whether human vision can group visual features based strictly on temporal structure. In fact, what those studies were measuring is the upper temporal limit for differentiating sequential images, each of which contains spatial structure defined by luminance discontinuities.

Figure 10
NOTE: (A) Three examples of dichoptic stimuli (left-eye and right-eye images viewed through a mirror stereoscope) used by Alais and Blake (1999) to study spatial and temporal factors influencing predominance during binocular rivalry. Observers viewed a pair of half-images and pressed a button to indicate when both circular patches of grating were dominant simultaneously ("joint predominance"). The incidence of joint predominance was highest for the rival pair with collinear orientations (top panel) and lowest for the pair with orthogonal orientations (bottom panel). (B) When the contrast levels of both gratings were modulated over time in a correlated fashion (direction and magnitude identical for both), the incidence of joint predominance was elevated for all three grating configurations, relative to conditions where contrast levels were modulated in an uncorrelated fashion.

In this respect, those studies resemble the form/part integration task described earlier in this article.

Besides not unequivocally isolating temporal structure, all of those studies utilized periodic flicker, a highly predictable, deterministic signal. For several reasons, we question whether deterministic temporal sequences are the preferred choice for studying grouping. First, deterministic temporal structure is quite rare in the natural environment, where the visual system more often confronts irregular and unpredictable events. Second, from an information theoretic viewpoint, periodic flicker conveys very little information, which suggests that flicker may underestimate the ability of human vision to utilize temporal cues for grouping. Finally, flicker leaves little room for manipulating temporal relationships among local elements. When all elements undergo periodic fluctuation in luminance or in contrast, the only way to manipulate the temporal relationship among those elements is to vary their relative phase. But human vision may be sensitive not only to phase differences but also to coherence/incoherence in global temporal structure among local visual elements.

Is there an alternative means for producing temporal structure that bypasses these limitations inherent in deterministic visual signals? This requires creating animations in which individual frames contain no static cues whatsoever about spatial structure and in which temporal structure is unpredictable. How can this be done?

A key for overcoming these challenges can be found in an understanding of the "temporal structure" conveyed by time-varying signals. To illustrate, consider a single spot of light that changes in luminance irregularly over time. We may plot the actual luminance values as a function of time, producing a graph like the one shown in Figure 11a. Alternatively, we may produce a time series like the one in Figure 11b that conveys just the points in time at which changes in luminance happen, irrespective of the direction and magnitude of those changes. This latter plot is termed a "point process," and it portrays the temporal structure conveyed by this series of events. Now, if we can create an animation display in which groups of local elements differ only in their point processes, and not in any other stimulus properties when considered frame by frame, that display can be said to be devoid of static cues for grouping. Moreover, because the point process defines a stochastic event, the information content of the event would be greater, and perhaps more effective, than the information content of deterministic events like flicker.
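The relation between a luminance time series and its point process can be sketched in a few lines of code. This is only an illustration of the concept; the luminance values are arbitrary made-up numbers, not stimuli from any experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration: an irregular luminance time series for one spot
# (arbitrary units), sampled once per display frame.
luminance = np.cumsum(rng.choice([-1, 0, 0, 1], size=20)) + 10.0

# The point process keeps only *when* luminance changes, discarding the
# direction and magnitude of each change (cf. Figure 11b).
point_process = (np.diff(luminance) != 0).astype(int)

print(luminance)
print(point_process)  # 1 = change at that frame transition, 0 = no change
```

Two very different luminance traces can share the same point process; it is exactly this abstraction that strips the display of static luminance cues.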

With those considerations in mind, Lee and Blake (1999a) devised a novel class of visual displays in which global spatial form was defined exclusively by differences in point process among elements undergoing rapid irregular changes. In our original version of these animations, each individual frame contained an array of many small sinusoidal gratings, each windowed by a stationary, circular Gaussian envelope (Gabor patch; see Figure 12). All of the Gabor patches had the same contrast, and their orientations were randomly established. As the animation was played, grating contours within each small, stationary Gabor patch moved in one of two directions orthogonal to their orientation. Each grating reversed direction of motion irregularly over time according to a random (Poisson) process. The temporal structure of each Gabor element was described by a point process consisting of the points in time at which motion reversed direction (see Figure 12). All Gabor elements within a virtual figure region could be assigned the same point process, in which case each of those figural Gabors would reverse direction of motion simultaneously; elements in the virtual surround would all be assigned a single point process that differed from the one carried by the figural Gabors, meaning that all the surround Gabors would reverse direction of motion simultaneously but often at different times than the figural Gabors. When displays like these are viewed, even for just a few hundred milliseconds, the figure is easily distinguished from the background. Here, the figure region and the ground region differed only in their point processes and not in terms of the contrast or orientation of individual elements. Furthermore, there was no information about the figure in any two successive frames of the animation, because all contours moved from frame to frame. Therefore, only the differences in the two sets of point processes (point processes specifying when motion direction changes) distinguished figure from background.

Figure 11: Alternative Ways to Portray Irregular Changes in Luminance of a Stimulus Spot Over Time.

NOTE: (A) The actual level of luminance is plotted as a function of time; this kind of graph is comparable to the plots of deterministic change shown in Figure 7. (B) Time series showing the points in time at which the luminance level remains constant (open circles) and the points in time at which it changes (filled circles); the resulting time series is called a point process, and it specifies temporal structure irrespective of absolute luminance value.

Because all local elements throughout the display change directions of motion irregularly over time, we were able to manipulate two potentially important properties governing temporal structure. First, the "predictability" (or "randomness") of the temporal structure conveyed by individual elements was manipulated by changing the probability distribution designating the two alternative directions of motion. Borrowing a concept from information theory, this predictability was quantified by computing the entropy of temporal patterns among the elements. Time-varying signals with high entropy convey more dynamic or finer temporal structure, which means that systematic manipulation of the entropy of all the elements enabled us to measure how accurately human vision can register fine temporal structure. Second, the temporal relationship among elements in the figure region could be manipulated by varying the extent to which all possible pairs of those elements are correlated. Because the time points at which the elements change direction of motion could be represented by point processes, the temporal relationship among those elements could be quantified by computing the correlations among their point processes. By varying this index systematically, we were able to measure the efficiency with which people can utilize temporal structure. In our initial work (Lee & Blake, 1999a), we found that increases in entropy and increases in correlation among figure elements systematically enhanced the perceptual quality of spatial form created by temporal structure (as assessed by performance on a forced-choice form identification task). We took this to mean that human vision can register fine temporal structure with high fidelity and can efficiently construct spatial structure solely on the basis of the temporal relations among local elements distributed over space. Readers are encouraged to visit the following Web site to see demonstrations of spatial structure from stochastic temporal structure: http://www.psy.vanderbilt.edu/faculty/blake/TS/TS.html.
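The two manipulations just described, entropy and pairwise correlation, can be made concrete with a short sketch. The reversal probabilities and correlation level below are illustrative assumptions, not parameter values from Lee and Blake (1999a).

```python
import numpy as np

rng = np.random.default_rng(2)

def binary_entropy(p):
    """Entropy (bits) of a per-frame Bernoulli event (reverse vs. keep)."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def point_process(n, p, rng):
    return (rng.random(n) < p).astype(int)

# Predictability manipulation: the reversal probability sets the entropy of
# each element's point process (values here are illustrative).
for p in (0.05, 0.5):
    print(f"p={p}: entropy = {binary_entropy(p):.3f} bits/frame")

# Temporal-relationship manipulation: pairwise correlation between the point
# processes of two elements (Pearson r over frames).
a = point_process(1000, 0.2, rng)
noise = point_process(1000, 0.2, rng)
b = np.where(rng.random(1000) < 0.8, a, noise)  # partially correlated copy
r = np.corrcoef(a, b)[0, 1]
print(f"pairwise correlation r = {r:.2f}")
```

A reversal probability of 0.5 maximizes entropy (1 bit per frame), while mixing an element's point process with independent noise lowers the pairwise correlation toward zero; the psychophysical results above correspond to sweeping these two quantities.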

From our results, we concluded that perception of spatial structure in these displays required multiple computational steps: (a) registration of changes in direction within local regions of the visual field, (b) registration of the points in time at which those direction changes occur, and (c) identification of boundaries defined by discontinuities in temporal structure (a process that must operate globally over space). We noted that Steps a and b could plausibly be accomplished by transient neural signals generated by motion-selective neurons, but we offered no suggestions about how Step c might be accomplished. In some later work (Lee & Blake, 2001), we found evidence suggesting that grouping could be mediated, in part, by lateral connections among spatially neighboring neurons, with the strength of those connections dependent on the similarity in preferred orientation among those neurons.

Figure 12: Schematic of the Stochastic Temporal Structure Display Devised by Lee and Blake (1999a) To Study Spatial Grouping From Temporal Cues.

NOTE: (A) Animations are made using an array of Gabor patches (Gaussian-windowed sinusoidal gratings), with the orientation of the grating in each patch being randomly determined (thereby precluding any texture cue based on orientation). Over time, each small grating drifts in a direction orthogonal to the orientation of its contours, and at randomly determined times, the direction of motion reverses (e.g., upward drifting contours begin drifting downward). It is important to realize that the patch itself remains stationary—only the contours within the patch move, and there is no coherent motion within the array because motion directions are tied to the random orientations of the contours. All gratings within a virtual "target" region change their directions of motion at the same time, independently of changes in motion direction among the remaining "background" gratings falling outside this region. (B) Temporal structure in these stochastic animations is defined in terms of a "point process," shown here as a series of black dots denoting the points in time at which motion direction changes. Each row corresponds to a given Gabor patch, and it can be seen that motion direction changes within the target region, and within the background, occur in synchrony (i.e., their change times are perfectly correlated even though their directions of motion are random). The change times of the target and background regions are uncorrelated in this example, and when viewed on a video monitor, the target region is easily distinguished from the background. There are several variants of these stochastic displays, including ones where the background elements have uncorrelated point processes and the target elements have correlated point processes. The correlation between target and background point processes can also be manipulated to vary the salience of the target. Finally, it is possible to assign all elements throughout the array the same point process but introduce a temporal phase lag between target and background elements.

Shortly after we published our initial description of this novel technique and the results obtained using it, Adelson and Farid (1999) published a critique of the technique. In that critique, they questioned whether temporal structure per se was responsible for the visibility of a figure within these dynamic displays. They speculated that observers might instead rely on a luminance-based cue that could occasionally become available in these stochastic displays, owing to contrast summation (a possibility we wrote about but rejected in our original article). According to their argument, elements in the target region, but not the background region, may reverse directions several times in succession and, through temporal blurring, momentarily create a target region of heightened contrast. (Alternatively, of course, it could be elements in the surround that summate in relation to the target.) At other times during the sequence, target but not background elements might continue moving in the same direction for a number of successive frames, thereby creating through temporal blurring a washed-out region relative to the background. Adelson and Farid confirmed their intuitions by creating new animations from one of our original ones, this time making the contrast of each element in each frame a weighted average of a number of the immediately preceding frames, thereby mimicking the output of a lowpass temporal filter. Inspection of those hybrid animations did indeed reveal infrequent instances where overall contrast within the figure differed from contrast in the surround.
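The frame-averaging operation at the heart of this critique can be sketched as follows. This is a minimal stand-in, not Adelson and Farid's simulation: the frame data are random numbers, and the exponentially decaying weights and five-frame window are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical stand-in for an animation: frames x pixels contrast values.
frames = rng.random((100, 64))

# Pixel-wise weighted average over the K immediately preceding frames,
# mimicking a lowpass temporal filter.
K = 5
weights = np.exp(-np.arange(K) / 2.0)  # current frame weighted most heavily

blurred = np.empty_like(frames)
for t in range(frames.shape[0]):
    past = frames[max(0, t - K + 1):t + 1][::-1]        # current frame first
    blurred[t] = np.average(past, axis=0, weights=weights[:len(past)])

# Temporal blurring smooths frame-to-frame contrast fluctuations, which is
# what could occasionally make a run- or jitter-rich region stand out.
print(frames.std(axis=0).mean(), blurred.std(axis=0).mean())
```

Averaging over preceding frames reduces the temporal variance at each pixel, so any region whose elements happen to reverse repeatedly (or not at all) for several frames can transiently differ in average contrast from its surround.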

Adelson and Farid's (1999) simulation utilized the simplest possible means for computing the outputs from their filter: averaging luminance over time on a pixel-by-pixel basis, hardly what the visual system would do if lowpass temporal filtering were involved. Nonetheless, their analysis and simulation motivated us to perform several tests of their hypothesis (Lee & Blake, 1999b). First, we indexed animations from our experiment in terms of the magnitude of the theoretical luminance cue created by temporal blurring, using the Adelson and Farid lowpass filtering model. We then computed the correlation between this index and the actual psychophysical performance associated with given displays. We found no correlation between the two, implying that observers were not using this luminance cue even if it were indeed available. Second, we created new stochastic displays from which we removed multiple reversals ("jitter") and extended periods without reversals ("runs"), the putative culprits implicated by Adelson and Farid's analysis. We also introduced differences in average luminance among elements throughout the display, randomly and independently of temporal structure. Finally, we randomized the contrast of each moving element within the array on a frame-by-frame basis.

These maneuvers thoroughly eliminate any potential luminance-based cues, as verified by simulations using the Adelson and Farid lowpass filter. Despite these modifications, form from temporal structure was clearly visible. This latter observation is notable because randomizing contrast and luminance actually creates visual noise that would conflict with form created by temporal structure. Nonetheless, temporal structure was an effective cue for grouping.

Farid and Adelson (2001) next challenged the efficacy of temporal structure in grouping based on results using a modified temporal structure display. In their new animation, drifting dots within a "target" region simultaneously changed directions of motion over time, whereas dots within the remaining "background" region simultaneously changed their directions at times uncorrelated with target change times. Only when angular changes in direction of motion were large did the target dots perceptually group to form a figure whose shape could be accurately identified; angular changes of 120 degrees or less yielded near-chance performance despite the presence of temporal asynchrony between target and background regions.

Farid and Adelson (2001) reported that perceptual performance covaried with the strength of the output from a temporal filter whose response h(t) is given by:

h(t) = (kt/τ)^n · e^(−kt/τ) · [1/n! − (kt/τ)^2/(n + 2)!].

This equation defines a biphasic temporal filter, which is well suited for registering abrupt, transient visual changes. Indeed, in our original article, we hypothesized that just such "transient detectors" may be involved in signaling temporal structure in stochastic displays involving spatially distributed, time-varying events. Farid and Adelson characterized the output of these filters as providing a "temporal contrast cue" that obviates the need for positing a role for temporal synchrony in visual grouping. In our view, temporal synchrony in the stimulus is precisely what produces the structured output in the array of transient filters. Again, as in their previous publication (Adelson & Farid, 1999), Farid and Adelson used static pictures to portray the outputs of biphasic filters during a given instant in time. Such a portrayal gives the impression that these stochastic displays generate luminance contrast visible in the spatial domain. But the "contrast" cue arises in the temporal domain, which is exactly the point we made in our descriptions of these stimuli. Moreover, the energy associated with transients (i.e., the outputs of biphasic filters) fluctuates very rapidly over time, in a pattern strongly correlated (r > .8) with the temporal structure of the animation displays. This implies that dynamic, not static, temporal patterns of transient signals underlie figure/ground segregation in these displays. So, we believe Farid and Adelson have identified new conditions under which the strengths of the outputs of transient detectors are closely correlated with human observers' performance on a visual grouping task. In our view, this finding reinforces the hypothesis that human vision possesses the ability to group local features based on temporal structure.
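The biphasic character of the filter equation above can be verified numerically. The parameter values (k, τ, n) below are illustrative assumptions chosen to make the shape visible, not the values used by Farid and Adelson.

```python
import numpy as np
from math import factorial

# Biphasic temporal filter h(t) = (kt/τ)^n e^(−kt/τ)[1/n! − (kt/τ)^2/(n+2)!].
# Parameter values are assumptions for illustration only.
def biphasic_filter(t, k=1.0, tau=0.01, n=3):
    s = k * t / tau
    return s**n * np.exp(-s) * (1.0 / factorial(n) - s**2 / factorial(n + 2))

t = np.arange(0, 0.2, 0.001)   # 200 ms sampled at 1 ms resolution
h = biphasic_filter(t)

# Biphasic: an early positive lobe followed by a negative lobe, so the filter
# responds transiently to abrupt changes rather than to sustained levels.
print(h.max() > 0 and h.min() < 0)

# Response to a step input is a brief transient that then decays.
step = np.ones_like(t)
response = np.convolve(step, h)[:len(t)]
```

With these parameters the zero crossing falls at kt/τ = sqrt((n+1)(n+2)), after which the negative lobe takes over; convolving such a filter with a point process of motion reversals yields exactly the rapidly fluctuating transient energy discussed in the text.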

Shortly after the appearance of Adelson and Farid's articles, Morgan and Castet (2002) published an article describing two experiments using stochastic displays much like those schematized in Figure 12. In one experiment, small Gabor elements shifted phase randomly over time, with the elements in the target region changing direction at the same time and elements in the background region changing directions at random times relative to one another. Morgan and Castet found that performance was comparable regardless of whether all Gabor elements were equal in contrast or contrast was randomized across the entire array (a maneuver designed to "camouflage" a figure based on contrast grouping). This aspect of their article replicates the findings of Lee and Blake (1999b), further undermining the hypothesis that form from temporal structure results from contrast artifacts in these displays. In a second experiment, Morgan and Castet sought to test the hypothesis that differences in motion appearance between target and background regions were responsible for the visibility of the target region. By "motion appearance," they were referring to the runs and jitters discussed above—Morgan and Castet observed that targets comprising sequences with long stretches of unchanging motion direction produced "strong motion," whereas targets comprising sequences with a series of reversals appeared to "shiver." To minimize motion appearance as a cue for distinguishing target and background elements, Morgan and Castet created an array of Gabor elements all with the same point process (and, hence, all with the same temporal structure). Gabor elements within the target region all reversed direction at the same time (i.e., the point processes for all elements were in phase), and Gabor elements within the background had starting frames that were randomized among the elements (i.e., the phases of the point processes for all background elements were uncorrelated).
Based on inspection of these animations, Morgan and Castet concluded that coherent motion cues were not visible except in sequences that, by chance, had a string of runs followed by a string of jitter, or vice versa. They also observed that the target area was not visible, except in those unusual sequences with strings of runs and jitters. From these observations, they concluded that temporal structure is at best a weak cue operating only at relatively low temporal frequencies. Morgan and Castet acknowledged that their procedure, because of the limited duration of their sequences, increased the opportunity for synchronized changes within target and background regions, which is bound to compromise discriminability of the target region. They concluded their article by writing, "We agree (as is obvious on logical grounds) that synchrony of relatively low temporal frequency modulations of motion and contrast can be the basis for segregation . . . but we argue that the temporal grain is not at the 1000 Hz rate that would be required for synchrony of individual neural spikes" (Morgan & Castet, 2002, p. 516).

We have no quarrel with this conclusion, although we are puzzled by Morgan and Castet's failure to perceive spatial grouping when target and background were distinguished only by phase. That observation is inconsistent with work summarized earlier (e.g., Sekuler & Bennett, 2001) and with results obtained in our lab using forced-choice testing and the same kind of display as that used by Morgan and Castet. In his dissertation, Lee (2002) performed an experiment in which grating elements in the figure and background regions all had exactly the same point process (and, hence, the same first-order temporal statistics). The only cue defining the figure was a temporal phase shift of the "target" grating

Figure 13: Results From a Two-Alternative, Forced-Choice Experiment in Which Observers Were Required to Report the Orientation ("Vertical" vs. "Horizontal") of a "Target" Region Defined by Common Temporal Structure.

NOTE: The display was much like the one shown in Figure 12, except that all Gabor patches had identical point processes, with the only distinguishing feature being that the point process of the Gabor patches defining the target was phase-shifted in time relative to the point process of the Gabor patches defining the background. The graph plots percent correct (chance performance = 50%) as a function of the duration of the phase shift between target and background.

elements relative to the background elements. He found that observers performed well above chance levels on a 2AFC task with phase lags as brief as 16.7 msec (see Figure 13). Those results, which have been published in abstract form (Blake & Lee, 2002), indicate that synchrony per se can support figure/ground segmentation. This phase shift value also squares very nicely with the value of 15 msec found in other studies (e.g., Fahle, 1993; Leonards et al., 1996), including a very recently published article by Kandil and Fahle (2004) using phase-lagged, deterministic temporal structure.

As an aside, we are grateful for the interest in grouping and temporal structure expressed by Adelson and Farid (1999), Farid and Adelson (2001), and Morgan and Castet (2002)—their critiques of our technique and ideas have sharpened current thinking about the problem. There is no disagreement that temporal structure can support spatial grouping—the debate now centers on the nature of the mechanisms responsible for that grouping and the temporal "grain" of those mechanisms. It is correct to say that the articles by these two groups of investigators have drawn wider attention to the problem of temporal structure. In so doing, their views on our work have stimulated additional studies on the problem, studies that we turn to next.

Perception of shape from stochastic temporal structure is reminiscent of perception of depth and form from random dot stereograms (where form is defined solely by retinal disparity): For both information sources (temporal structure and disparity), the emergent form can take some time to materialize perceptually when one first views examples of these unusual displays. Practice helps. In the case of temporal structure, Aslin, Blake, and Chun (2002) showed that people get better at identifying form from temporal structure when given daily training sessions with feedback. Performance reached asymptotic levels within a week or less. Significantly, this improvement in the ability to identify the orientation of a target region defined by temporal structure did not transfer to a task requiring the identification of luminance-defined form. This failure of transfer was found even though both luminance- and temporal structure–defined animations comprised identical stimulus elements, and the task itself was the same. Evidently, the cue used during the temporal structure phase of the experiment had nothing to do with luminance contrast. What did observers actually learn while practicing this task? Aslin et al. entertained two possibilities. First, training might produce experience-dependent increases in the temporal resolution of neural elements registering changes in direction of motion. Recall that both Lee and Blake (1999a) and Farid and Adelson (2001) speculated that synchronized changes in motion direction of the sort used in these temporal structure displays might stimulate neurons selectively responsive to stimulus transients. Experience-dependent changes in the time constants of such neural elements could alter the fidelity with which temporal structure is registered. However, there is another component to the task, namely, the extraction of the spatial distribution of those stimulus elements embodying common temporal structure. Thus, training could also increase the efficiency with which synchronous activity is grouped across distributed neuronal populations representing different regions of visual space.

In some very recent work, Guttman, Gilroy, and Blake (in press) investigated the extent to which shape from stochastic temporal structure depends on the similarity among the "messengers" signaling that temporal structure. In their experiments, observers viewed arrays of Gabor patches in which figure and ground were designated by different temporal structures; the signals defining temporal structure could be changes in orientation, in spatial frequency, in phase, and/or in contrast. Results from several experiments showed that observers could perceive shape from temporal structure even when the defining events had to be combined across different messengers (i.e., changes within different dimensions). Moreover, mixing messengers of temporal structure proved to be cost-free: Grouping across messengers produced performance approximately the same as did grouping within a single messenger. These findings show that vision can abstract temporal structure regardless of the messenger of the dynamic event; a coherent spatial structure emerges from this abstracted temporal structure.

Before concluding our survey of work on temporal structure and spatial grouping, we want briefly to consider possible neurophysiological implications of the work summarized here. This we do in the following section.

IMPLICATIONS FOR NEUROPHYSIOLOGY

The psychophysical studies reviewed in the previous sections indicate that the human visual system possesses two important abilities: (a) the ability to resolve temporal asynchrony down to stimulus onsets differing by less than 5 msec and (b) the ability to utilize information about temporal structure among spatially distributed visual features for the extraction of spatial structure. This, in turn, naturally leads to questions about the neural substrates underlying those psychophysical abilities. How do visual neurons encode the fine temporal structure of dynamic visual stimuli? How does the brain compute the temporal relationship among neural populations registering visual features distributed over space?

Detailed consideration of these questions lies beyond the scope of this article, but we do want to offer a few tentative conclusions about possible neural mechanisms of temporal resolution and grouping from temporal structure. Our conclusions are tempered by the realization that there exists no consensus among neuroscientists concerning how stimulus information is coded within neural spike trains (Bullock, 1968). Within neurophysiology, two candidate neural coding schemes are typically contrasted, namely, rate coding and temporal correlation. According to the rate coding hypothesis, information is conveyed by the average firing rates of neurons (Barlow, 1972; Shadlen & Newsome, 1994); according to the temporal correlation hypothesis, it is the timing of individual action potentials that embodies stimulus information (Abeles, 1982; Mainen & Sejnowski, 1995; Softky & Koch, 1993). We shall consider the issue of temporal resolution and grouping within the context of these two possible codes.

Neural Coding of Temporal Visual Structure

The excellent temporal acuity/resolution evidenced by human vision (e.g., Westheimer & McKee, 1977) implies that neurons can modulate their responses in a manner time-locked to external visual events. How is this time-locked modulation of neural response achieved? How, in other words, do neurons carry information about the fine temporal structure portrayed by dynamic visual stimuli?

Temporal correlation is certainly a feasible possibility for registering temporal structure. Indeed, it has been shown that individual neurons can reliably reproduce essentially the same spike trains when the same time-varying stimulus is repeatedly presented (Bair & Koch, 1996; Berry, Warland, & Meister, 1997; Mainen & Sejnowski, 1995). It has also been shown that the fine temporal structure contained in external stimulation can be reconstructed solely on the basis of information given in the spike timings of single neurons (Bialek, Rieke, de Ruyter van Steveninck, & Warland, 1991; Buracas, Zador, DeWeese, & Albright, 1998).

Can the rate coding hypothesis also account for the fine temporal resolution of human vision? The discharge rate of a single neuron is unlikely to modulate reliably enough to encode the fine temporal structure of time-varying stimuli, because information about the timings of individual spikes is not preserved in firing rate coding (rate, by definition, involves integration over time). If we assume, however, that a given stimulus feature is redundantly encoded by an ensemble of neurons with similar receptive field properties, the observed temporal fidelity of human vision can be explained by the average firing rate among such an ensemble of neurons: The average firing rate of the ensemble can fluctuate in a time-locked fashion to time-varying stimuli with temporal precision of 5-10 msec, which corresponds to an effective sampling rate suggested by the rate coding hypothesis (Shadlen & Newsome, 1994). For a more complete discussion of rate coding versus neural synchrony, see Shadlen and Movshon (1999).
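This ensemble-averaging argument can be sketched in a few lines of simulation (hypothetical parameters throughout; nothing here is drawn from the cited models). A single Poisson neuron's spike count tracks a rapidly modulating stimulus poorly, whereas the pooled count of a redundant ensemble, read out in 5-msec bins, follows it closely:

```python
import numpy as np

rng = np.random.default_rng(1)
dt = 0.001                      # 1-msec simulation step
t = np.arange(0.0, 2.0, dt)
# Stimulus-driven firing rate, modulating at 20 Hz (assumed values)
rate = 40 + 30 * np.sin(2 * np.pi * 20 * t)      # spikes/s per neuron

n_neurons = 200                 # redundant ensemble, similar receptive fields
spikes = (rng.random((n_neurons, t.size)) < rate * dt).astype(float)

def bin_counts(x, bin_len=5):
    """Sum a 1-msec time series into 5-msec bins."""
    nbins = x.size // bin_len
    return x[:nbins * bin_len].reshape(nbins, bin_len).sum(axis=1)

target = bin_counts(rate * dt)             # stimulus time course, binned
single = bin_counts(spikes[0])             # one neuron's spike counts
pooled = bin_counts(spikes.sum(axis=0))    # ensemble spike counts

corr_single = float(np.corrcoef(single, target)[0, 1])
corr_pooled = float(np.corrcoef(pooled, target)[0, 1])
```

With these assumed numbers, `corr_pooled` approaches 1 while `corr_single` stays low: the ensemble average, but not any individual rate, carries the stimulus's fine temporal structure.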

Therefore, psychophysical performance itself does not favor one coding hypothesis over the other: both coding schemes can explain the temporal acuity of human vision.

Neural Representation of Spatial Structure From Temporal Structure

Several research groups have argued forcefully that synchrony in spiking activity among groups of visual neurons is responsible for grouping, or “binding,” of the local features activating those neurons. Engel and Singer (2001) provided a detailed description of this hypothesis, and they summarized much of the neurophysiological evidence favoring it. Roelfsema, Lamme, and Spekreijse (2004) presented evidence against the neural synchrony hypothesis, including multiunit recordings from the primary visual cortex of monkeys performing a grouping task. It is fair to say that neural synchrony's role in feature binding remains controversial and unresolved. Can we conclude that the psychophysical evidence showing spatial grouping based on temporal structure provides support for the neural synchronization hypothesis? For the reasons explained in the following paragraphs, we believe such a conclusion, though it may ultimately prove correct, would be premature.

According to the temporal correlation hypothesis, local visual elements flickering in synchrony are grouped together into a coherent percept because the action potentials of neurons responsive to those elements are synchronized in a stimulus-locked fashion. These synchronized spike trains constitute the glue forming a neural assembly representing a coherent percept. This synchronization hypothesis is based on two critical assumptions linking temporal modulation of external stimuli to changes in neural responses over time: (a) temporal modulation of external stimuli evokes neural responses whose spike timing is entrained, that is, synchronized, with the temporal structure associated with stimulus modulation; (b) those synchronized neural responses are similar to the synchronous activity of cortical neurons mediating perceptual grouping. This linking hypothesis thus requires endorsement of these two assumptions. Does the evidence warrant that endorsement? For the following reasons, we believe neither of those assumptions currently rests on sufficiently firm ground.



The first assumption is tantamount to endorsing the temporal correlation hypothesis, which claims that variations in the spike trains of neurons are locked to variations in external stimuli with high temporal precision. On this view, two external stimuli with identical temporal structure will produce action potentials in two neural populations that are highly correlated over time. According to the rate coding hypothesis, however, the responses of two neural populations driven by synchronously modulating external stimuli can be correlated not in spike timing but in averaged firing rate. Although the spike counts of a single neuron cannot reliably follow the variation of an external stimulus, the average firing rate within an ensemble of neurons representing the same stimulus can register instantaneous changes of that stimulus in a stimulus-locked manner. Thus, synchronized external stimulation could result either in correlated spike timings or in correlated firing rates among neural populations. Consequently, the ability of common temporal structure to promote grouping of local visual features could be mediated by synchronized spike timings or by synchronized firing rates. The psychophysical evidence is not definitive with respect to either neurophysiological hypothesis.
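The point that synchronized stimulation can yield correlated firing rates without correlated spike timing can be made concrete with a small simulation (an illustrative sketch with assumed parameters, not a model from the literature). Two independent Poisson ensembles driven by the same stimulus modulation show strongly correlated pooled firing rates, even though any two individual spike trains are essentially uncorrelated at millisecond resolution:

```python
import numpy as np

rng = np.random.default_rng(2)
dt = 0.001
t = np.arange(0.0, 2.0, dt)
# Shared stimulus modulation driving both populations (assumed values)
rate = 50 + 30 * np.sin(2 * np.pi * 20 * t)      # spikes/s per neuron

def poisson_population(n):
    """n independent Poisson neurons, all driven by the shared rate."""
    return (rng.random((n, t.size)) < rate * dt).astype(float)

pop_a, pop_b = poisson_population(100), poisson_population(100)

def pooled_rate(pop, bin_len=5):
    """Population spike count per 5-msec bin."""
    counts = pop.sum(axis=0)
    nbins = counts.size // bin_len
    return counts[:nbins * bin_len].reshape(nbins, bin_len).sum(axis=1)

# Pooled firing rates of the two populations are strongly correlated...
rate_corr = float(np.corrcoef(pooled_rate(pop_a), pooled_rate(pop_b))[0, 1])
# ...but two individual spike trains show almost no millisecond-scale correlation
spike_corr = float(np.corrcoef(pop_a[0], pop_b[0])[0, 1])
```

In this sketch, common temporal structure in the stimulus produces the rate-coding signature (correlated ensemble rates) without any spike-level synchrony, which is exactly why the psychophysics cannot adjudicate between the two codes.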

Even if Assumption 1 were to prove valid, there remains a serious problem with the second assumption. According to the neural synchronization hypothesis, the entrainment of neural discharges responsible for grouping of visual features is achieved by intrinsic circuitry within the brain, not necessarily by temporal modulation imposed by external stimulation (Singer & Gray, 1995). In the psychophysical experiments, of course, the synchronization of neural discharges (if such exists) arises from phase locking of individual neural responses to the temporal structure of external stimuli. It is premature to assume that externally induced synchronization of neural activity serves the same function as internally generated synchronization. Therefore, the conclusions derived from psychophysical demonstrations of grouping by temporal structure do not necessarily bear on the nature of synchronization embodied in the temporal correlation model. This is a point that we should have stressed more forcefully in our earlier work (Lee & Blake, 1999a), for it is the aspect of our work that has drawn the greatest criticism (Morgan & Castet, 2002).

CONCLUSION

So where do matters stand with respect to the capacity of temporal structure to promote spatial grouping? There is general agreement that synchronized visual events tend to be perceptually grouped, whether those events constitute flicker, changes in motion direction, or changes in contrast. In this respect, the work summarized in this article has expanded the domain of “common fate” beyond that envisioned by the Gestalt psychologists. Moreover, the efficacy of temporal structure depends on the spatial properties of the stimulus features that carry information about spatial structure. Common fate, in other words, interacts synergistically with other Gestalt grouping principles such as good continuation (e.g., Lee & Blake, 2001).

The major disagreements in the literature concern the mechanism(s) responsible for grouping from temporal structure and the temporal resolution underlying such grouping. We have no doubt that future work can and will resolve these debates.

Regardless of how these issues are resolved, we continue to believe that stochastic temporal structure of the sort introduced by us (Lee & Blake, 1999a) provides a robust means for probing perceptual grouping based on common fate; it may constitute “uncontrolled randomness” (to borrow the phrase coined by Morgan & Castet, 2002), but that is precisely its virtue. We say this because the optical input to vision is replete with complex, unpredictable temporal structure associated with the movement of objects within the environment. Our eyes and brains have evolved in a dynamic visual world, so it stands to reason that vision would evolve mechanisms to exploit this rich source of information.

In closing, we would like to reiterate a point made in an earlier essay on temporal structure and spatial grouping, a point having to do with the role of temporal structure in solving the problem of grouping, or “binding” as it is sometimes called, visual features into coherent object descriptions:

We are led to speculate whether the rich temporal structure characteristic of normal vision may, in fact, imprint its signature from the outset of neural processing. If this were truly the case, then concern about the binding problem would fade, for there would be no need for a mechanism to reassemble the bits and pieces comprising visual objects. Perhaps temporal structure insures that neural representations of object “components” remain conjoined from the very outset of visual processing. Construed in this way, the brain's job is rather different from that facing the King's horses and men who tried to put Humpty Dumpty back together. Instead of piecing together the parts of a visual puzzle, the brain may resonate to spatio-temporal structure contained in the optical input to vision. (Blake & Lee, 2000, p. 647)

NOTES

1. Not all models of masking invoke temporal integration. For example, a recent model by Enns and Di Lollo (2000) posits that masking results from a mismatch between stimulus representations in different modules within the visual stream.

2. To see demonstrations of stochastic temporal structure, navigate to the first author's Web site and follow the research links to “temporal structure.”



3. Motion detectors activated by luminance can respond to changes in luminance within neighboring spatial regions. Thus, flickering or luminance-modulated elements can activate motion detectors, thereby creating motion-defined boundaries (see Kandil & Fahle, 2003).

REFERENCES

Abeles, M. (1982). Role of the cortical neuron: Integrator or coincidence detector? Israel Journal of Medical Sciences, 18, 83-92.
Adelson, E. H., & Farid, H. (1999). Filtering reveals form in temporally structured displays. Science, 286, 2231a.
Adelson, E. H., & Movshon, J. A. (1982). Phenomenal coherence of moving visual patterns. Nature, 300, 523-525.
Alais, D., & Blake, R. (1998). Interactions between global motion and local binocular rivalry. Vision Research, 38, 637-644.
Alais, D., & Blake, R. (1999). Grouping visual features during binocular rivalry. Vision Research, 39, 4341-4353.
Alais, D., Blake, R., & Lee, S.-H. (1998). Visual features that covary together over time group together over space. Nature Neuroscience, 1, 163-168.
Anstis, S. M. (1970). Phi movement as a subtraction process. Vision Research, 10, 1411-1430.
Arnold, D. H., Clifford, C. W. G., & Wenderoth, P. (2001). Asynchronous processing in vision: Color leads motion. Current Biology, 11, 596-600.
Aslin, C., Blake, R., & Chun, M. M. (2002). Perceptual learning of temporal structure. Vision Research, 42, 3019-3030.
Atkinson, J., Campbell, F. W., Fiorentini, A., & Maffei, L. (1973). The dependence of monocular rivalry on spatial frequency. Perception, 2, 127-133.
Bair, W., & Koch, C. (1996). Temporal precision of spike trains in extrastriate cortex of the behaving macaque monkey. Neural Computation, 8, 1185-1202.
Barlow, H. B. (1972). Single units and sensation: A neuron doctrine for perceptual psychology? Perception, 1, 371-394.
Berry, M. J., Warland, D. K., & Meister, M. (1997). The structure and precision of retinal spike trains. Proceedings of the National Academy of Sciences USA, 94, 5411-5416.
Bialek, W., Rieke, F., de Ruyter van Steveninck, R. R., & Warland, D. (1991). Reading a neural code. Science, 252, 1854-1857.
Blake, R., & Lee, S. H. (2000). Temporal structure in the input to vision can promote spatial grouping. In S.-W. Lee, H. H. Bülthoff, & T. Poggio (Eds.), Biologically motivated computer vision (pp. 635-653). Berlin: Springer-Verlag.
Blake, R., & Lee, S. H. (2002). Temporal precision of visual grouping from temporal structure. Journal of Vision, 2, 233a. Retrieved from http://journalofvision.org/2/7/233/
Breitmeyer, B. G. (1978). Disinhibition in metacontrast masking of vernier acuity targets: Sustained channels inhibit transient channels. Vision Research, 18, 1401-1405.
Breitmeyer, B. G. (1984). Visual masking: An integrative approach. New York: Oxford University Press.
Breitmeyer, B. G., & Ganz, L. (1976). Implications of sustained and transient channels for theories of visual pattern masking, saccadic suppression, and information processing. Psychological Review, 83, 1-36.
Brook, D., & Wynne, R. J. (1988). Signal processing: Principles and applications. London: Edward Arnold.
Bullock, T. H. (1968). Representation of information in neurons and sites for molecular participation. Proceedings of the National Academy of Sciences USA, 60, 1058-1068.
Buracas, G. T., Zador, A. M., DeWeese, M. R., & Albright, T. D. (1998). Efficient discrimination of temporal patterns by motion-sensitive neurons in primate visual cortex. Neuron, 20, 959-969.
Burr, D. (1980). Motion smear. Nature, 284, 164-165.
Corfield, R., Frosdick, J. P., & Campbell, F. W. (1978). Greyout elimination: The roles of spatial waveform, frequency and phase. Vision Research, 18, 1305-1311.
de Coulon, F. (1986). Signal theory and processing. Dedham, MA: Artech House.
Dixon, P., & Di Lollo, V. (1994). Beyond visible persistence: An alternative account of temporal integration and segmentation in visual processing. Cognitive Psychology, 26, 33-63.
Efron, R. (1957). Stereoscopic vision I: Effect of binocular summation. British Journal of Ophthalmology, 41, 709-730.
Engel, A. K., & Singer, W. (2001). Temporal binding and the neural correlates of sensory awareness. Trends in Cognitive Sciences, 5, 16-25.
Enns, J. T., & Di Lollo, V. (2000). What's new in visual masking? Trends in Cognitive Sciences, 4, 345-352.
Eriksen, C. W., & Collins, J. F. (1967). Some temporal characteristics of visual pattern perception. Journal of Experimental Psychology, 74, 476-484.
Exner, S. (1875). Experimentelle Untersuchungen der einfachsten psychischen Processe. Pflügers Archiv für die gesamte Physiologie, 11, 403-432.
Fahle, M. (1993). Figure-ground discrimination from temporal information. Proceedings of the Royal Society of London, B, 254, 199-203.
Fahle, M., & Koch, C. (1995). Spatial displacement, but not temporal asynchrony, destroys figural binding. Vision Research, 35, 491-494.
Farid, H., & Adelson, E. H. (2001). Synchrony does not promote grouping in temporally structured displays. Nature Neuroscience, 4, 875-876.
Forte, J., Hogben, J. H., & Ross, J. (1999). Spatial limitations of temporal segmentation. Vision Research, 39, 4052-4061.
Geisler, W. S., Perry, J. S., Super, B. J., & Gallogly, D. P. (2001). Edge co-occurrence in natural images predicts contour grouping performance. Vision Research, 41, 711-724.
Georgeson, M. A., & Georgeson, J. M. (1985). On seeing temporal gaps between gratings: A criterion problem for measurement of visible persistence. Vision Research, 25, 1729-1733.
Guttman, S., Gilroy, L. A., & Blake, R. (in press). Mixed messengers, unified message: Spatial grouping from temporal structure. Vision Research.
Hirsh, I. J., & Sherrick, C. E., Jr. (1961). Perceived order in different sense modalities. Journal of Experimental Psychology, 62, 423-432.
Hoffman, D. D. (1998). Visual intelligence: How we create what we see. New York: W. W. Norton.
Hogben, J. H., & Di Lollo, V. (1974). Perceptual integration and perceptual segregation of brief visual stimuli. Vision Research, 14, 1059-1069.
Hogben, J. H., & Di Lollo, V. (1985). Suppression of visible persistence in apparent motion. Perception & Psychophysics, 38, 450-460.
Julesz, B. (1971). Foundations of cyclopean perception. Chicago: University of Chicago Press.
Julesz, B., & White, B. W. (1969). Short term visual memory and the Pulfrich phenomenon. Nature, 222, 639-641.
Kahneman, D. (1968). Method, findings, and theory in studies of visual masking. Psychological Bulletin, 70, 404-425.
Kandil, F. I., & Fahle, M. (2001). Purely temporal figure-ground segmentation. European Journal of Neuroscience, 13, 2004-2008.
Kandil, F. I., & Fahle, M. (2003). Mechanisms of time-based figure-ground segmentation. European Journal of Neuroscience, 18, 2874-2882.
Kandil, F. I., & Fahle, M. (2004). Figure-ground segregation can rely on differences in motion direction. Vision Research, 44, 3177-3182.
Kiper, D. C., Gegenfurtner, K. R., & Movshon, J. A. (1996). Cortical oscillatory responses do not affect visual segmentation. Vision Research, 36, 539-544.
Kojima, H. (1998). Figure/ground segregation from temporal delay is best at high spatial frequencies. Vision Research, 38, 3729-3734.
Kovacs, I., Papathomas, T. V., Yang, M., & Feher, A. (1997). When the brain changes its mind: Interocular grouping during binocular rivalry. Proceedings of the National Academy of Sciences USA, 93, 15508-15511.
Lappin, J. S., & Bell, H. H. (1972). Perceptual differentiation of sequential visual patterns. Perception & Psychophysics, 12(2A), 129-134.
Lee, S.-H. (2002). Temporally correlated visual input and perceptual grouping. Unpublished dissertation, Vanderbilt University, Nashville, TN.
Lee, S.-H., & Blake, R. (1999a). Visual form created solely from temporal structure. Science, 284, 1165-1168.



Lee, S.-H., & Blake, R. (1999b). Reply to Adelson and Farid. Science, 286, 2231.
Lee, S.-H., & Blake, R. (2001). Neural synergy in visual grouping: When good continuation meets common fate. Vision Research, 41, 2057-2064.
Leonards, U., Singer, W., & Fahle, M. (1996). The influence of temporal phase differences on texture segmentation. Vision Research, 36, 2689-2697.
Lorenceau, J., & Shiffrar, M. (1992). The influence of terminators on motion integration across space. Vision Research, 32, 263-273.
Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503-1506.
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: Freeman.
Meyer, G. E., & Maguire, W. M. (1977). Spatial frequency and the mediation of short-term visual storage. Science, 198, 524-525.
Morgan, M., & Castet, E. (2002). High temporal frequency synchrony is insufficient for perceptual grouping. Proceedings of the Royal Society of London, B, 269, 513-516.
Motoyoshi, I. (2004). The role of spatial interactions in perceptual synchrony. Journal of Vision, 4, 352-361. Retrieved from http://journalofvision.org/4/5/1/
Moutoussis, K., & Zeki, S. (1997). A direct demonstration of perceptual asynchrony in vision. Proceedings of the Royal Society of London, B, 264, 393-399.
Nishida, S., & Johnston, A. (2002). Marker correspondence, not processing latency, determines temporal binding of visual attributes. Current Biology, 12, 359-368.
Ogle, K. N. (1963). Stereoscopic depth perception and exposure delay between images to the two eyes. Journal of the Optical Society of America, 53, 1296-1304.
Roelfsema, P. R., Lamme, V. A. F., & Spekreijse, H. (2004). Synchrony and covariation of firing rates in the primary visual cortex during contour grouping. Nature Neuroscience, 7, 982-991.
Rogers-Ramachandran, D. C., & Ramachandran, V. S. (1998). Psychophysical evidence for boundary and surface systems in human vision. Vision Research, 38, 71-77.
Ross, J., & Hogben, J. H. (1974). Short-term memory in stereopsis. Vision Research, 14, 1195-1201.
Sekuler, A. B., & Bennett, P. J. (2001). Generalized common fate: Grouping by common luminance changes. Psychological Science, 12, 437-444.
Shadlen, M. N., & Movshon, J. A. (1999). Synchrony unbound: A critical evaluation of the temporal binding hypothesis. Neuron, 24, 67-77.
Shadlen, M. N., & Newsome, W. T. (1994). Noise, neural codes and cortical organization. Current Opinion in Neurobiology, 4, 569-579.
Singer, W., & Gray, C. M. (1995). Visual feature integration and the temporal correlation hypothesis. Annual Review of Neuroscience, 18, 555-586.
Smith, G., Howell, E. R., & Stanley, G. (1982). Spatial frequency and the detection of temporal discontinuity in superimposed and adjacent gratings. Perception & Psychophysics, 31, 293-297.
Softky, W., & Koch, C. (1993). The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs. Journal of Neuroscience, 13, 334-350.
Suzuki, S., & Grabowecky, M. (2002). Overlapping features can be parsed on the basis of rapid temporal cues that produce stable emergent percepts. Vision Research, 42, 2669-2692.
Sweet, A. L. (1953). Temporal discrimination by the human eye. American Journal of Psychology, 66, 185-198.
Treisman, A. (1999). Solutions to the binding problem: Progress through controversy and convergence. Neuron, 24, 105-110.
Usher, M., & Donnelly, N. (1998). Visual synchrony affects binding and segmentation in perception. Nature, 394, 179-182.
VanRullen, R., & Koch, C. (2003). Is perception discrete or continuous? Trends in Cognitive Sciences, 7, 207-213.
Watson, A. B. (1986). Temporal sensitivity. In K. R. Boff, L. Kaufman, & J. P. Thomas (Eds.), Handbook of perception and human performance (Vol. 1: Sensory processes and perception, pp. 6.1-6.43). New York: John Wiley.
Wertheimer, M. (1912). Experimentelle Studien über das Sehen von Bewegung. Zeitschrift für Psychologie und Physiologie der Sinnesorgane, 61, 161-265.
Westheimer, G., & McKee, S. P. (1977). Perception of temporal order in adjacent visual stimuli. Vision Research, 17, 887-892.
Whittle, P., Bloor, D., & Pocock, S. (1968). Some experiments on figural effects in binocular rivalry. Perception & Psychophysics, 4, 183-188.
Yund, E. W., & Efron, R. (1974). Dichoptic and dichotic micropattern discrimination. Perception & Psychophysics, 15, 383-390.
Zacks, J. L. (1970). Temporal summation phenomena at threshold: Their relation to visual mechanisms. Science, 170, 197-199.
