rotating columns: relating structure-from-motion...

12
Rotating columns: Relating structure-from-motion, accretion/ deletion, and figure/ground Vicky Froyen $ Department of Psychology, Center for Cognitive Science, Rutgers University, New Brunswick, NJ, USA Jacob Feldman # $ Department of Psychology, Center for Cognitive Science, Rutgers University, New Brunswick, NJ, USA Manish Singh # $ Department of Psychology, Center for Cognitive Science, Rutgers University, New Brunswick, NJ, USA We present a novel phenomenon involving an interaction between accretion deletion, figure-ground interpretation, and structure-from-motion. Our displays contain alternating light and dark vertical regions in which random-dot textures moved horizontally at constant speed but in opposite directions in alternating regions. This motion is consistent with all the light regions in front, with the dark regions completing amodally into a single large surface moving in the background, or vice versa. Surprisingly, the regions that are perceived as figural are also perceived as 3-D volumes rotating in depth (like rotating columns)— despite the fact that dot motion is not consistent with 3- D rotation. In a series of experiments, we found we could manipulate which set of regions is perceived as rotating volumes simply by varying known geometric cues to figure ground, including convexity, parallelism, symmetry, and relative area. Subjects indicated which colored regions they perceived as rotating. For our displays we found convexity to be a stronger cue than either symmetry or parallelism. We furthermore found a smooth monotonic decay of the proportion by which subjects perceive symmetric regions as figural, as a function of their relative area. Our results reveal an intriguing new interaction between accretion-deletion, figure-ground, and 3-D motion that is not captured by existing models. They also provide an effective tool for measuring figure-ground perception. Introduction The interpretation of images in three dimensions involves the determination of the relative depth ordering of surfaces, as well as the estimation of the 3- D shape of individual surfaces, both of which require integrating a wide array of separate cues. In this paper we describe a novel phenomenon in which image motion gives rise to a vivid 3-D interpretation that involves figure-ground interpretation and 3-D structure from motion but cannot completely be explained by existing models of these phenomena. Figure ground Figure-ground (f/g) interpretation, in which fore- ground regions are segmented from backgrounds, has been extensively studied in the decades since it was first noted by Rubin (1921). In f/g interpretation, the common boundary between two regions is perceived as ‘‘owned’’ by one of the regions, which means that it is perceived as the bounding contour of the figural region, while the ground region is perceived as extending behind the figure at a farther depth. Numerous factors have been identified that tend to promote figural status, including symmetry (Bahnsen, 1928; Kanizsa & Gerbino, 1976; Machilsen, Pauwels, & Wagemans, 2009), convexity (Kanizsa & Gerbino, 1976; Metzger, 1953), parallelism (Metzger, 1953), axiality and part structure (Feldman & Singh, 2006; Froyen, Feldman, & Singh, 2010; Hoffman & Singh, 1997), articulating motion (Barenholtz & Feldman, 2006), and many others (see Wagemans et al., 2012, for a review). The determination of figure and ground is, thus, an essential step in the creation of a 3-D interpretation of the image. Citation: Froyen,V., Feldman, J., & Singh, M. (2013). Rotating columns: Relating structure from motion, accretion/deletion, and figure/ground. Journal of Vision, 13(10):6, 1–12, http://www.journalofvision.org/content/13/10/6, doi:10.1167/13.10.6. Journal of Vision (2013) 13(10):6, 1–12 1 http://www.journalofvision.org/content/13/10/6 doi: 10.1167/13.10.6 ISSN 1534-7362 Ó 2013 ARVO Received December 23, 2012; published August 14, 2013

Upload: others

Post on 24-Jun-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

  • Rotating columns: Relating structure-from-motion, accretion/deletion, and figure/ground

    Vicky Froyen $Department of Psychology, Center for Cognitive Science,

    Rutgers University, New Brunswick, NJ, USA

    Jacob Feldman # $Department of Psychology, Center for Cognitive Science,

    Rutgers University, New Brunswick, NJ, USA

    Manish Singh# $

    Department of Psychology, Center for Cognitive Science,Rutgers University, New Brunswick, NJ, USA

    We present a novel phenomenon involving aninteraction between accretion deletion, figure-groundinterpretation, and structure-from-motion. Our displayscontain alternating light and dark vertical regions inwhich random-dot textures moved horizontally atconstant speed but in opposite directions in alternatingregions. This motion is consistent with all the lightregions in front, with the dark regions completingamodally into a single large surface moving in thebackground, or vice versa. Surprisingly, the regions thatare perceived as figural are also perceived as 3-Dvolumes rotating in depth (like rotating columns)—despite the fact that dot motion is not consistent with 3-D rotation. In a series of experiments, we found wecould manipulate which set of regions is perceived asrotating volumes simply by varying known geometriccues to figure ground, including convexity, parallelism,symmetry, and relative area. Subjects indicated whichcolored regions they perceived as rotating. For ourdisplays we found convexity to be a stronger cue thaneither symmetry or parallelism. We furthermore found asmooth monotonic decay of the proportion by whichsubjects perceive symmetric regions as figural, as afunction of their relative area. Our results reveal anintriguing new interaction between accretion-deletion,figure-ground, and 3-D motion that is not captured byexisting models. They also provide an effective tool formeasuring figure-ground perception.

    Introduction

    The interpretation of images in three dimensionsinvolves the determination of the relative depth

    ordering of surfaces, as well as the estimation of the 3-D shape of individual surfaces, both of which requireintegrating a wide array of separate cues. In this paperwe describe a novel phenomenon in which imagemotion gives rise to a vivid 3-D interpretation thatinvolves figure-ground interpretation and 3-D structurefrom motion but cannot completely be explained byexisting models of these phenomena.

    Figure ground

    Figure-ground (f/g) interpretation, in which fore-ground regions are segmented from backgrounds, hasbeen extensively studied in the decades since it wasfirst noted by Rubin (1921). In f/g interpretation, thecommon boundary between two regions is perceivedas ‘‘owned’’ by one of the regions, which means that itis perceived as the bounding contour of the figuralregion, while the ground region is perceived asextending behind the figure at a farther depth.Numerous factors have been identified that tend topromote figural status, including symmetry (Bahnsen,1928; Kanizsa & Gerbino, 1976; Machilsen, Pauwels,& Wagemans, 2009), convexity (Kanizsa & Gerbino,1976; Metzger, 1953), parallelism (Metzger, 1953),axiality and part structure (Feldman & Singh, 2006;Froyen, Feldman, & Singh, 2010; Hoffman & Singh,1997), articulating motion (Barenholtz & Feldman,2006), and many others (see Wagemans et al., 2012,for a review). The determination of figure and groundis, thus, an essential step in the creation of a 3-Dinterpretation of the image.

    Citation: Froyen, V., Feldman, J., & Singh, M. (2013). Rotating columns: Relating structure from motion, accretion/deletion, andfigure/ground. Journal of Vision, 13(10):6, 1–12, http://www.journalofvision.org/content/13/10/6, doi:10.1167/13.10.6.

    Journal of Vision (2013) 13(10):6, 1–12 1http://www.journalofvision.org/content/13/10/6

    doi: 10 .1167 /13 .10 .6 ISSN 1534-7362 � 2013 ARVOReceived December 23, 2012; published August 14, 2013

    mailto:[email protected]:[email protected]://ruccs.rutgers.edu/~jacob/http://ruccs.rutgers.edu/~jacob/mailto:[email protected]:[email protected]://ruccs.rutgers.edu/~manish/http://ruccs.rutgers.edu/~manish/mailto:[email protected]:[email protected]

  • Accretion deletion

    Most research on f/g has involved static images, butmotion provides a wealth of potential cues to figuralstatus and depth ordering. One salient motion cue todepth is the accretion and deletion of texture (Gibson,Kaplan, Reynolds, & Wheeler, 1969; Hegdé, Albright,& Stoner, 2004; Kaplan, 1969; Michotte, Thinès, &Crabbé, 1964; Mutch & Thompson, 1985; Thompson,Mutch, & Berzins, 1985). When a textured surfacemoves behind another object, the texture progres-sively disappears (deleting) behind the occludingsurface and, likewise, reappears (accreting) as thetexture emerges from behind it. Either of these casescan provide a vivid sense of relative depth, with theaccreting/deleting surface interpreted as behind. Thestrength of the accretion-deletion cue can be seen inits ability to override another depth cue, motionparallax (Ono, Rogers, Ohmi, & Ono, 1988). Accre-tion-deletion has also been shown to resolve thedirection of rotation ambiguity present in ortho-graphically projected rotating spheres (Ono et al.,1988). However, recently, it has been found that thestrength of the accretion-deletion cue, relative tostereo disparity, shows individual differences (Hil-dreth & Royden, 2011). While some subjects stronglyfavor the stereo cue when put into conflict withaccretion deletion, others favor the accretion-deletioncue.

    In this paper, we describe a new phenomenon thatarises from the interaction between accretion deletionand geometric (static) cues to figure ground. Incontrast to the standard interpretation of accretion/deletion, in which the accreting or deleting texture isperceived as behind, in our displays, certain accretingor deleting textures are perceived as figural, i.e., infront, and rotating in depth, so that the accretion anddeletion is attributed to self-occlusion of a rotating 3-D object.

    3-D interpretation in our displays

    Figure 1A shows a schematic of our basic display.The display is similar to a classical f/g stimulus withalternating light and dark vertical regions but with eachregion containing texture that is moving horizontally ata constant speed. The textures in alternating regionsmove in opposite directions. Hence, in a typical display,alternating regions might have (say) a dark texturemoving to the left, while the other regions would have alight texture moving to the right (see Figure 2). Theaccretion/deletion cues in such a display are ambiguousin that every boundary has texture accreting or deletingon both sides. For example, at a given boundary, thelight texture could be seen as disappearing behind thedark texture, or, alternatively, the dark texture could beseen as disappearing behind the light texture. However,these two percepts are not seen at the same time.

    Figure 1. Display setup and phenomenology: (A) The displays were created by adding motion in one direction to odd regions and in

    the other direction to even regions in classical figure-ground displays. (B) This could yield one of two percepts depending on which

    one was perceived as figural. The black ones were perceived as rotating in front of a white background which was seen as sliding

    behind them, or vice versa.

    Figure 2. Movie of the stimuli for Experiment 1: (A) convexity

    display, (B) symmetry display, (C) parallelism display, (D)

    unbiased display (see more demonstrations at http://ruccs.

    rutgers.edu/;jacob/demos/motionfg).

    Journal of Vision (2013) 13(10):6, 1–12 Froyen, Feldman, & Singh 2

    http://ruccs.rutgers.edu/~jacob/demos/motionfghttp://ruccs.rutgers.edu/~jacob/demos/motionfghttp://ruccs.rutgers.edu/~jacob/demos/motionfg

  • Rather, the interpretation tends to be determined bythe figure/ground assignment of the two regions, asdetermined by static geometric cues (e.g., convexity orsymmetry). The ground regions, which are invariablyall of one color, are interpreted as completing amodallybehind the figural regions to form one continuoustranslating sheet. The figural regions, on the otherhand, are perceived as distinct objects and in front.

    Surprisingly, however, the figural regions are alsoseen as 3-D volumes rotating in depth about a verticalaxis (Figure 1B). When subjects participating in thisstudy were asked to freely describe these displays,almost all (18/19) subjects spontaneously reportedseeing rotating columns over a flat translating texturedsheet, like rolling pins rotating over a flat sheet. Thisaspect of the percept is novel and, in two ways,inconsistent with conventional models. First, thetexture motion within each region has constantvelocity, in contrast to the sinusoidal speed profilewhich would be consistent with the geometry of 3-Drotation (assuming a locally cylindrical shape rotatingat constant angular velocity), and which is generallyunderstood as a prerequisite for the percept ofstructure-from-motion (Andersen & Bradley, 1998;Braunstein, 1962). Second, the percept of 3-D rotationis also obtained with asymmetric columns (e.g., convexcolumns in Figure 2A) which would—were theyphysically undergoing a 3-D rotation—continuouslychange their silhouette profile (2-D projection of theirbounding contour). In our displays, the boundariesbetween columns do not change at all over the coursemotion sequence, which is geometrically inconsistentwith 3-D rotation. Nevertheless, the percept of 3-Drotation survives this inconsistency.

    A natural way to understand this percept of 3-Drotation, despite its inconsistency with the constant-velocity profile of the texture, is that it allows the visualsystem to ‘‘explain’’ the accreting and deleting texturespresent on both sides of each boundary. On one side ofthe boundary, accretion deletion is attributed toocclusion behind another surface (the ‘‘standard’’interpretation of accretion deletion). Whereas, on theother side, the accreting or deleting texture is attributedto self-occlusion due to the 3-D rotation of avolumetric object. It is clear that, if the light-coloredregions (say) are being perceived as disappearingbehind adjacent surfaces, then the dark-colored regionscannot also be given the same interpretation. In otherwords, the light and dark columns cannot both bedisappearing behind each other. An interpretation of 3-D rotation on one set of regions solves this problem,since the rotating regions are now interpreted as beingin front, and the accretion/deletion of the texture isattributed to self-occlusion due this 3-D rotation. Thetexture wraps around the 3-D object; it disappears

    when the object turns away from the observer andappears when it turns towards the observer.

    This still leaves a two-way ambiguity however;namely, which set of regions—dark or light—should beassigned the interpretation of rotating in depth. This iswhere the geometric cues to figure and ground (such asconvexity) become relevant. Our observations withsuch displays suggest that the regions that are morelikely to be figural based on such static (geometric)cues, are also the ones that tend to be perceived asrotating in 3-D. The experiments reported in this paperdocument this observation empirically and, moreover,systematically manipulate geometric f/g cues—convex-ity, parallelism, symmetry, and relative area—in orderto examine their influence on the percept of 3-Drotation.

    Experiment 1

    Experiment 1 tested how the bistable perceptdepicted in Figure 1B was disambiguated by threegeometric cues to f/g interpretation: convexity, sym-metry, and parallelism. In addition to these three cues,an ‘‘unbiased’’ condition was added, containing ran-domly generated displays that were neutral with respectto the three cues manipulated in the other conditions.Subjects were shown classical f/g stimuli containingeither of three cues or the unbiased displays (Figure 2),to which motion was added as in Figure 1A. In order todetermine which of two possible percepts they had, theywere asked to indicate which regions, black or white,they perceived as rotating.

    Method

    Participants

    Nine Rutgers University students, naive to thepurpose of the experiment, participated for coursecredit.

    Stimuli

    The stimuli for this experiment were created in twostages. In the first stage, basic (static) f/g displays werecreated, containing alternating black and white stripeswith specific geometric properties. In the second stage,textural motion was added to these displays to generatethe motion stimuli (Figure 1). More specifically, in thefirst stage, basic f/g stimuli of 7.298 of visual angle(DVA) high and 9.68 DVA wide were created,containing eight alternating dark and light regions.Every other region in these stimuli was manipulated tocontain one of three f/g cues (convexity, symmetry, or

    Journal of Vision (2013) 13(10):6, 1–12 Froyen, Feldman, & Singh 3

  • parallelism) biasing subjects’ percepts to seeing thoseregions as figural, or the unbiased shapes (which wereequated for those three f/g cues). The convexitydisplays were created as follows. A convex boundaryconsisted of a series of half circles of random radii,resulting in a series of convex parts along the boundary.This was repeated for every boundary in such a waythat every other region consisted of convex parts.Furthermore, no two boundaries were the same, andboundaries were chosen so that every region had equalarea size. Using this procedure we were able to generatedisplays closely resembling classical convexity stimuli(e.g., Kanizsa & Gerbino, 1976) (see Figure 2A).

    The remaining f/g cues were implemented byrepresenting the boundaries as B-splines. This gave usthe advantage that we could easily control for overallcurvature. Curvature is closely related to convexity(more positive curvature means more convex) and, ifnot controlled, could confound the effects of the otherf/g cues implemented below. We operationalized thiscontrol by keeping the sum of signed curvature alongeach boundary equal to zero. A first cue that wemanipulated using this method was parallelism. B-splines for boundaries in parallel displays were definedby 12 control points and a polynomial degree of three.The y coordinates of the control points were atequidistant points along the height of the display. The xcoordinates of the two most bottom (x1, x2) and twomost top (x11, x12) control points were set so that everyboundary in the display was equidistant to each otherat these positions. All other control points wereassigned x coordinates that were randomly sampledfrom [x1 � 37.70, x1 þ .37.70] arcmin. To create theparallel display, four different B-spline boundaries werecreated. Parallel regions in these displays were thendefined as a translated set of each of those (see Figure2C). Boundaries in symmetric displays were generatedsimilarly, with the only difference being the number ofcontrol points, here 20. Symmetric regions in thesedisplays were then defined as a mirrored set of each ofthe four different B-spline boundaries created (seeFigure 2B). Lastly, we created a class of displays werefer to as ‘‘unbiased.’’ In these displays, there was nobias for odd or even regions to be preferred as figural,based on any of the three former cues, i.e., convexity,parallelism, and symmetry. These displays were createdby generating boundaries in the same way as theboundaries for the symmetric displays. In this case,though, only one unique boundary was generated for agiven display and a mirrored set of it was repeatedevery other region (see Figure 2D). Pilot data showedthat some of such randomly generated unbiaseddisplays still induced some figural bias. Therefore, wechose only those displays which showed no such figuralbias.

    For each of these four conditions, three uniquedisplays were generated, resulting in 12 displays withdifferent geometries. Furthermore, on half the trials,the odd regions were dark and the even were lightcolored, while in the other half, it was the other wayaround (counterbalanced and crossed with otherfactors). On half the trials, the displays were shownreflected over their vertical axis (counterbalanced andcrossed with other factors). Finally, on half of thetrials, the displays were shown reflected over theirhorizontal middle axis (counterbalanced and crossedwith other factors).

    In the second stage of stimulus generation, texturalmotion of was added to the regions in the displays,generated in the first stage, in such a way that all darkregions had identical motion in one direction, and alllight regions had identical motion in the oppositedirection. In both cases, constant-speed horizontalmotion was imparted to the random dot textures. Incase of the dark regions, the random dot texture wassampled from a beta distribution with parameters [a¼6, b ¼ 2], generating a dark texture with sparselyscattered light pixels. The texture for the light regions,on the other hand, was sampled from a betadistribution with parameters [a¼ 2, b¼ 6], generating alight texture with sparsely scattered dark pixels. Everypixel in these textures was 1.47 arcmin by 1.47 arcmin.Each of these textures could move either to the left orto the right. This was implemented as follows. For therightward motion, in each frame t the texture columns[2, N] were taken from texture columns [1, N � 1] inframe t� 1, and the first column in frame t wasresampled in the manner described above (similarly forleftward motion). This procedure was repeated at a rateof 40 frames/s, resulting in a motion with a speed of0.98 DVA/s. These moving textures were then assignedto the dark and light columns in the previouslygenerated f/g stimuli in such a way that, for example,all dark regions had motion in one direction while allwhite regions had motion in the opposite directions (seethe sample stimuli online). We also counterbalanced forthe direction of motion so that in half the trials the darkregions had leftward motion, while in the other halfthey had rightward motion (counterbalanced andcrossed with other factors).

    Design and procedure

    Subjects sat at 85 cm from a 21 in. CRT monitor(144 Hz, 1024 · 768 pixels) on which the displays werepresented using Psychtoolbox (Brainard, 1997;Kleiner et al., 2007) on a Windows XP PC. Subjectsran a total of 192 trials split into two blocks, i.e., 2(Color) · 2 (Horizontal Reflection) · 2 (VerticalReflection) · 2 (Motion Direction) · 4 (Geometric f/gCues) · 3 (Displays Per Cue). All conditions were

    Journal of Vision (2013) 13(10):6, 1–12 Froyen, Feldman, & Singh 4

  • counterbalanced for each subject, and trials wererandomized for each subject separately. The experi-ment started with 16 practice trials to acquaint thesubject with the displays and the task. Both in thepractice and main experiment, each trial consisted of800 ms of premask followed by 800 ms of the premaskwith a fixation cross added to it. The mask consistedof eight randomly generated semitransparent singleframes of unbiased displays overlaid on top of eachother. Following this, the actual experimental displaywas shown first for 2 s static on the first frame, then, 2s as a dynamic display with moving texture planes,without a fixation cross. Lastly, the subject waspresented with a post-mask screen, identical to thepremask, for a minimum of 800 ms. Once this post-mask was presented, the subject was asked to indicatewhich colored region they perceived as rotating bymeans of a keyboard response.

    Results and discussion

    Figure 3 shows the results plotted as the proportionof trials the subjects reported seeing regions containingone of the three geometric f/g cues as rotating. For theunbiased condition, data was reported as the propor-tion of times the subjects reported seeing the oddregions (e.g., the light regions in Figure 2D) as rotating.Responses were analyzed for the geometric f/g cuefactor only (i.e., unbiased, convexity, parallelism,symmetry). No other counterbalancing factors (e.g.,color) were found to yield any main effect orinteraction. All, except the unbiased, yielded responsesthat were significantly different from chance: convexity,t(8)¼ 10.08, p , 0.001 —05; parallelism, t(8)¼ 2.56, p, 0.05; symmetry, t(8)¼2.98, p , 0.05; unbiased, t(8)¼0.32, p¼ 0.76. A multilevel logistic regression showed asignificant main effect of geometric f/g cues whencompared to an unconditional means model (contain-

    Figure 3. Results for Experiment 1 indicating the proportion of trials the subjects reported seeing regions containing one of the three

    geometric f/g cues as rotating. For the unbiased condition, data was reported as the proportion of times the subjects reported seeing

    the odd regions as rotating. Error bars represent 61 SE as computed between subjects. Conditions significantly different from 0.5are indicated by *. The red line corresponds to chance level (0.5).

    Journal of Vision (2013) 13(10):6, 1–12 Froyen, Feldman, & Singh 5

  • ing only an intercept) by means of a likelihood-ratiotest (LR ¼ 248, df ¼ 12, p , 0.001). Tukey pairwisecomparisons revealed the following effects. Convexitymore strongly biased subjects percept towards seeingrotation in those regions manipulated than any of theother cues manipulated (convexity unbiased, p , 0.001;convexity symmetry, p , 0.001; convexity parallelism, p, 0.001). For all other pairwise comparisons, the nullhypothesis could not be rejected. The strong effect ofconvexity might be attributed to number of regionspresent in our displays. Peterson and Salvagio (2008)showed that, in case an f/g display consisted of eightregions (four convex and four concave), averaged overall subjects, convex regions were seen as figural inabout 91% of the time. This closely matches our data,in which the convex regions were seen as rotating 90%of the time, averaged over all subjects. Symmetry, onthe other hand, showed a weaker effect than convexity,a result reported before (Kanizsa & Gerbino, 1976).Others found subjects to see symmetric regions asfigural 75% of the time (Peterson & Gibson, 1994). Inour experiment, with very different displays andgeometrical configurations, we found, averaged over allsubjects, symmetric regions to be perceived as rotatingin 61% of the time. The weakest effect was found forparallelism, where, averaged over all subjects, parallelregions were only seen as rotating 58% of time.

    Experiment 2

    Experiment 2 further tested the influence of static f/gcues on subjects’ percepts of the displays by introducingarea size. Rather than studying the effect of area inisolation, that is in displays containing straightboundaries, it was put into interaction with symmetry.By doing so, we not only tested the effect of area sizebut also showed how different cues might be combined.Subjects were shown the basic symmetric displays fromExperiment 1 in which the width of the symmetricregions was varied through five discrete steps (Figure4). In order to determine which of two possible perceptsthey held (Figure 1B), subjects were asked to indicate

    which regions, black or white, they perceived asrotating.

    Method

    Participants

    Ten Rutgers University students, naive to thepurpose of the experiment, participated for coursecredit.

    Stimuli and procedure

    The stimuli for this experiment were created usingthe same two-step procedure as in Experiment 1. Fourdisplays each with different symmetric regions analo-gous to Experiment 1 were created. To manipulate therelative area of the symmetric regions in these displays,we adjusted the pairwise distances between adjacentboundaries. In contrast to Experiment 1, boundarieswere not placed in such a way that the x1 control pointof each is equidistant to each other. More precisely, wemanipulated five levels of area size, ranging from thesymmetric regions being narrower than the asymmetricregions to the symmetric regions being wider than theasymmetric regions. This was implemented by adjust-ing the positions of the boundaries so that the x1s of aboundaries that make up a symmetric region werepositioned such that their distance was [67, 75, 83, 92,100] arcmin, while the boundaries making up theasymmetric regions were set to [100, 92, 83, 75, 67]arcmin, respectively, to make up the five different arearatio displays. This brings us to a total of 20geometrically different displays, 4 (Symmetric Dis-plays) · 5 (Area Ratios)—see Figure 4. As inExperiment 1, displays were counterbalanced for color,vertical reflection, and horizontal reflection.

    Subjects were tested in the same environment usingthe same protocol as in Experiment 1. Each subject rana total of 320 trials, split into two blocks, i.e., 2 (Color)· 2 (Horizontal Reflection) · 2 (Vertical Reflection) ·2 (Motion Direction) · 4 (Area Ratios) · 4 (DisplaysSymmetric Displays). All conditions were counterbal-

    Figure 4. Stimuli for Experiment 2: Five manipulations of the relative area of the symmetric regions that were implemented. Values

    indicate the logarithm of the ratio of the width of the symmetric region (arcmin) over the width of the asymmetric region (arcmin).

    Journal of Vision (2013) 13(10):6, 1–12 Froyen, Feldman, & Singh 6

  • anced for each subject, and trials were randomized foreach subject.

    Results and discussion

    Figure 5 shows the results plotted as the proportion oftrials the subjects reported perceiving the symmetricregions as rotating. Responses were analyzed in terms ofthe logarithm of all five area ratios (the other,counterbalancing factors, such as color, were found not

    to yield any main nor interaction effect). A multilevellogistic regression showed a significant effect of log arearatio when compared to an unconditional means model(containing only an intercept) by means of a likelihood-ratio test (LR¼ 72, df¼ 3, p , 0.001). The narrower thesymmetric regions (relative to the asymmetric regions),the more likely they were seen as rotating (as indicatedby the estimated slope1 b1¼�1.1164, SE¼ 0.2351). Thiseffect, averaged over subjects, ranged from 77% for thenarrowest symmetric regions to 58% for the widestsymmetric regions. (Note that the latter corresponds to acue-conflict situation, where symmetry and relative area

    Figure 5. Results for Experiment 2: Each data point (horizontal jitter was added to these for presentation purposes) depicts, for an

    individual subject, the proportion of trials on which they saw the symmetric regions as rotating. The blue line depicts the logistic

    regression model, with the gray shading its 95% confidence interval. The red line corresponds to chance level (0.5).

    Journal of Vision (2013) 13(10):6, 1–12 Froyen, Feldman, & Singh 7

  • exert influences in opposite directions.) In the neutralcase, where area size was equal between symmetric andasymmetric regions, subjects on average perceived thesymmetric regions as rotating in 69% of trials, which wasslightly higher than we found in Experiment 1, 61%.This difference could be partially due to the difference inmanipulated conditions in the two experiments inducinga difference in mental set. Overall, the results extendearlier findings (Baylis & Driver, 1995; Koffka, 1935;Rubin, 1921) of the effects of area size on figure ground,and show how area size combines with symmetry in amonotomic fashion to strengthen or weaken subjects’bias towards seeing the symmetric regions as figural—and, in our case, rotating in 3-D.

    General discussion

    Interpreting images in three dimensions involvesestimating the relative depths between different surfaces,as well as the 3-D shape of each surface, by integrating awide array of cues. In this paper, we presented a novelphenomenon that provides insights into how theinteraction between accretion deletion (a depth-from-motion cue) and geometric f/g (static) cues give rise to 3-D interpretations of rotating surfaces. In our displays,textural motion was added to classic f/g displays (seeFigure 1). The accretion-deletion cue in these displayswas fully ambiguous, since accretion deletion wasoccurring on both sides of each boundary. We foundthat this motion-based ambiguity was resolved by staticcues to figure and ground. That is, the regions predictedto be figural by the static cues tended to be perceived asvolumetric objects rotating in 3-D, whereas the regionspredicted to be ground were perceived as moving behindthe figural regions, and amodally completing into asingle frontoparallel surface. Moreover, we found thatthe proportion by which subjects perceive regions asrotating was dependent on the strength of the figuralcues in those regions. On the methodological side, thisphenomenon provides a novel way of measuring f/gperception, since the percept of 3-D rotation closelytracks the strength of the f/g cue. On the theoretical side,our results highlight the need for studying depth-from-motion cues and geometric f/g cues as an ensemble. Theobserved interactions between depth from accretion-deletion and static f/g cues cannot be explained bystandard accounts of accretion deletion or 3-D structurefrom motion.

    Accretion deletion

    Our results show that accreting and deleting texturedsurfaces can be interpreted in two very distinct ways

    depending on their interpretation of relative depthbased on geometric f/g cues. One such interpretation isdirectly in line with its standard interpretation aspresented by Kaplan (1969): The textured region thatundergoes accretion or deletion is perceived as a surfacethat is moving behind another surface. A secondinterpretation, which we introduce here, arises when aregion undergoing accretion deletion is perceived asbeing in front based on geometric f/g cues. In this case,such the accretion and deletion of the textured surfaceis interpreted as arising from self-occlusion due torotation in depth of 3-D object. A similar secondaryinterpretation was informally reported by Kaplan(1969) for displays in which textures of approximatelyequal average luminance on both sides of a margin,unlike ours, were accreting and deleting. However, intheir case, subjects reported seeing both sides asrotating. This is in strong contrast to our case in whichonly for example the odd regions were perceived asrotating, while the other even regions were perceived ascontinuing behind. Major differences between our andtheir display are the introduction of luminanceboundaries and the introduction of multiple regions(two vs. eight in our displays). Further researchcurrently under way will show if those are the decidingfactors of the perception found in our displays.

    At a more basic level the current phenomenonsuggests that geometric cues to f/g and accretion-deletion cues are combined together to form a finalpercept, and should be studied and modeled as anensemble. Furthermore, it poses a challenge to currentmodels of depth from motion (e.g., Barnes & Mingolla,2013; Beck, Ognibeni, & Neumann, 2008; Berzhan-skaya, Grossberg, & Mingolla, 2007; Raudies &Neumann, 2010; Yonas, Craton, & Thompson, 1987).Specifically, most current models do encompass a so-called FORM module (e.g., Barnes & Mingolla, 2013;Beck et al., 2008; Berzhanskaya et al., 2007; Raudies &Neumann, 2010) in which luminance-based borders areanalyzed. However in those models, these luminance-based borders are not processed in such a way that theshape of these borders (i.e., geometric cues to f/g) caninfluence the relative depth of surfaces as computedfrom local motion cues, such as accretion deletion.

    3-D rotation

    The percept of 3-D rotation obtained in our displaysis inconsistent with our stimulus displays in two ways.First, in our displays all texture elements merelyundergo a constant-velocity translation in a specifieddirection, whereas a locally cylindrical structure wouldgenerate a sinusoidal speed profile in the image. Indeed,according to standard models of structure frommotion, an interpretation of rigid 3-D rotation requires

    Journal of Vision (2013) 13(10):6, 1–12 Froyen, Feldman, & Singh 8

  • a sinusoidal speed profile (Andersen & Bradley, 1998;Braunstein, 1962). Nevertheless, observers perceive the‘‘figural’’ regions as volumetric objects rotating indepth. Second, the 3-D rotation of an asymmetricobject results in continuously changing occludingcontours. In our displays, however, the contours alwaysremained fixed (in all conditions—three out of four ofwhich involved asymmetric regions). Yet, again, the‘‘figural’’ regions were perceived as rotating in 3-D. In ageneral sense, our results favor a Bayesian viewpoint inwhich image data that conflicts with a particular scenemodel generally only diminish its likelihood, ratherthan flatly ruling it out. Nevertheless existing Bayesianmodels of structure-from-motion (e.g., Hogervorst &Eagle, 1998) cannot handle complex situations, such asour displays, that involve multiple surfaces at distinctdepths, and certainly do not take figural status intoaccount. Hence while our data tend to favor a Bayesianview of motion and depth interpretation, a model thatcan explain our results does not yet exist (though ourongoing research aims to develop one).

    Why is the percept of 3-D rotation generated despitethese inconsistencies with the image data? As noted inthe Introduction, since accretion deletion is occurringon both sides of each boundary, a ‘‘standard’’interpretation of accretion deletion (of disappearancebehind an adjacent surface) is not possible on bothsides of the boundary. An interpretation of 3-Drotation on one side allows the visual system to solvethis problem, allowing it to ‘‘explain’’ the accreting anddeleting textures on both sides of each boundary in amutually consistent way. On one side accretion deletionis attributed to occlusion behind adjacent surfaces,whereas, on the other side, the accreting or deletingtexture is attributed to self-occlusion due to the 3-Drotation of a volumetric object.

    Figure-ground geometric cues

    In this paper, we showed how our inherentlyambiguous motion displays were disambiguated bygeometric cues to f/g. We found different cues to havedifferent strengths in biasing subjects’ percepts ofseeing particular regions as rotating in 3-D. For ourdisplays, we found convexity to yield the strongest bias,in that convex regions were most likely seen as rotatingvolumes. The effects of parallelism and symmetry weresomewhat less strong. We did not replicate the strongeffect of symmetry found by others (e.g., Peterson &Gibson, 1994). This failure to replicate was due to thefact that our displays differed from theirs in manyrespects, especially in the geometry of the boundaries.In other work (Froyen, Tanrıkulu, Singh, & Feldman,in press) we found that the degree of symmetry could bemodulated by manipulating the undulations along the

    boundary. Hence, ‘‘symmetry’’ is not a binary cue—either present or absent—that has one particularstrength. Rather the strength of a cue is dependent onthe geometry of the entire region, and its complement.Experiment 2 further emphasizes this by showing theinfluence of area size on figural status can besystematically manipulated. We found a smoothmonotonic decay of the proportion by which subjectsperceive symmetric regions as figural, as a function ofthe relative area of those regions. This finding expandson earlier findings (Baylis & Driver, 1995; Koffka,1935; Rubin, 1921), that narrower regions were morelikely to be perceived as figural. Furthermore, weshowed that if two cues are combined in the samedisplay, it is not the case that one will simply dominatethe other. For example when the asymmetric regionswere narrower than the symmetric regions, we did notfind that subjects were biased to seeing the asymmetricregion as figural. Rather, we found that a linear modelworked well in capturing the results as the log odds ofsubjects’ responses (of seeing the symmetric regions asfigural) as a linear combination of both symmetry andarea size. This linear combination suggests that the cuesof symmetry and relative area combine in a mannerconsistent with ‘‘weak fusion’’ (Baek & Sajda, 2003;Clark & Yuille, 1990; Landy, Maloney, Johnston, &Young, 1995). It will be interesting in future work totest this hypothesis more directly.

    A methodological tool for measuring figureground

    Our results showed that geometric cues to f/gdisambiguate the accretion-deletion cues in our dis-plays (Figure 1), and biased the percept of 3-D rotationin proportion to their strength. These displays thus alsoprovide a novel way of measuring f/g perception, sincethe percept of 3-D rotation tends to tracks the strengthof the f/g geometric cue. Traditionally, f/g perceptshave been measured simply by asking the subject whichregion they perceive as figural. Such direct explicitreport tasks are, however, prone to individual biases,high-level interpretations, and extraperceptual factors(Struber & Stadler, 1999). Less prone to such factorswas the visual short-term memory task proposed byDriver and Baylis (1996), based on the assumption thatfigural regions have a stronger memory trace thanground regions. Typically, a subject would be presentedwith an f/g display and, subsequently, with a testdisplay, containing one of the regions from the f/gdisplay. A subject would then be found to be faster atrecognizing the region in the test display if it were partof the figural than ground regions. Even though thistask was used by many researchers as the goldstandard, it has been remarked that the test display

    Journal of Vision (2013) 13(10):6, 1–12 Froyen, Feldman, & Singh 9

  • itself might be influencing the short-term memory effect(Hulleman & Humphreys, 2004). Hulleman and Hum-phreys (2004) proposed yet another method based onvisual search. In their displays, one region would besymmetric while others were not. If this symmetricregion was also figural, subjects would more easily findit than when it was to be ground. However effective,this one symmetric region might confound the displayswhen another cue is to be measured. From a totallydifferent tradition is the on-off method (Hoffman &Singh, 1997; Stevens & Brookes, 1988), which in therecent years has been applied to traditional alternatingblack-white regioned f/g stimuli (Peterson & Salvagio,2008). Subjects are presented with a red dot on top ofone of the regions and asked if it was on top of thefigural region or not. The phenomenon in the currentpaper, as a methodology, overcomes many of the issuesas present in the direct methods (explicit report, on-offmethod), because of the indirect nature of the questionasked to subjects (‘‘Which colored regions do youperceive as rotating?’’). Such an indirect question wasmade possible by the perceptual coupling (Hochberg &Peterson, 1987) between the rotation percept and the f/g percept, present in our stimuli. Furthermore, thebistable nature of our stimulus makes it easy for theobserver to report their observation. The usefulness ofmethods employing such stimuli has been pointed outbefore (e.g., Backus, 2009). Lastly, no second imageneed to be shown as a target (visual short-term memorymethod), nor does the geometry of the displays need tobe changed (visual search method) that might influenceour f/g perception.

    Conclusion

    We presented a novel phenomenon, involving theinteraction between depth from accretion-deletion andgeometric (static) cues to f/g. In our displays, texturedregions on both sides of a boundary were accreting anddeleting, with the same speed but in opposite directions.This yielded an inherently ambiguous percept, whichwas disambiguated by geometric f/g cues. The resultingpercept was surprising in that one set of regions (thosemore likely to be assigned as figural based on thegeometric cues) was perceived as 3-D columns rotatingin depth, while the ground regions were seen astranslating, and amodally completing, behind them.The results showed that the likelihood of beingperceived as rotating in 3-D was determined by thestrength of the geometric f/g cues. On the methodo-logical side, this phenomenon provides a novel way ofmeasuring f/g perception, since the percept of 3-Drotation closely tracks the strength of the f/g cue. Onthe theoretical side, our results highlight the need for

    studying geometric f/g cues and depth-from-motioncues in tandem. The observed interactions betweenaccretion deletion and geometric f/g cues cannot beexplained by standard accounts of accretion deletion orstructure from motion.

    Keywords: perceptual organization, figure-ground,structure-from-motion, accretion-deletion

    Acknowledgments

    This research was supported by NIH EY021494 to J.F. and M. S., and NSF DGE 0549115 (Rutgers IGERTin Perceptual Science). We are grateful to John Wilder,Ö. Daglar Tanrıkulu, and two anonymous reviewersfor their many helpful comments. We thank LorileiAlley for her help at various stages of this study.

    Commercial relationships: none.Corresponding Author: Vicky Froyen.Email: [email protected]: Department of Psychology, Center for Cog-nitive Science, Rutgers University, New Brunswick, NJ,USA.

    Footnote

    1b1¼ log{[p(rot ¼ 1jR ¼ x)p(rot¼ 0jR ¼ X þ 1)]/[p(rot¼ 1jR ¼ X þ 1)p(rot ¼ 0jR ¼ X)]}, where rot¼‘‘symmetric regions are seen as rotating,’’ and R ¼log(area ratio).

    References

    Andersen, R., & Bradley, D. (1998). Perception ofthree-dimensional structure from motion. Trends inCognitive Sciences, 2(6), 222–228.

    Backus, B. T. (2009). The mixture of bernoulli experts:A theory to quantify reliance on cues in dichoto-mous perceptual decisions. Journal of Vision, 9(1):6,1–19, http://www.journalofvision.org/content/9/1/6, doi:10.1167/9.1.6. [PubMed]

    Baek, K., & Sajda, P. (2003 October). A probabilisticnetwork model for integrating visual cues andinferring intermediate-level representations. Paperpresented at the Third International Workshop onStatistical and Computational Theories of Vision,Nice, France. (pp. 3–10).

    Bahnsen, P. (1928). Eine untersuchung uber symmetrieund assymmetrie bei visuellen wahrnehmungen [A

    Journal of Vision (2013) 13(10):6, 1–12 Froyen, Feldman, & Singh 10

    http://www.journalofvision.org/content/9/1/6http://www.journalofvision.org/content/9/1/6http://www.ncbi.nlm.nih.gov/pubmed/19271876

  • study about symmetry and asymmetry in visualpercepts]. Zeitschrift für Psychology, 108, 129–154.

    Barenholtz, E., & Feldman, J. (2006). Determination ofvisual figure and ground in dynamically deformingshapes. Cognition, 101(3), 530–544.

    Barnes, T., & Mingolla, E. (2013). A neural model ofvisual figure-ground segregation from kinetic oc-clusion. Neural Networks, 37, 141–164.

    Baylis, G., & Driver, J. (1995). One-sided edgeassignment in vision: 1. figure-ground segmentationand attention to objects. Current Directions inPsychological Science, 4(5), 140–146.

    Beck, C., Ognibeni, T., & Neumann, H. (2008). Objectsegmentation from motion discontinuities andtemporal occlusions—A biologically inspired mod-el. PLoS ONE, 3(11), e3807.

    Berzhanskaya, J., Grossberg, S., & Mingolla, E. (2007).Laminar cortical dynamics of visual form andmotion interactions during coherent object motionperception. Spatial Vision, 20(4), 337–395.

    Brainard, D. H. (1997). The psychophysics toolbox.Spatial Vision, 10, 433–436.

    Braunstein, M. (1962). Depth perception in rotatingdot patterns: Effects of numerosity and perspective.Journal of Experimental Psychology, 64(4), 415–420.

    Clark, J., & Yuille, A. (1990). Data fusion for sensoryinformation processing systems (Vol. 105). Boston,MA: Kluwer Academic Publishers.

    Driver, J., & Baylis, G. (1996). Edge-assignment andfigure-ground segmentation in short-term visualmatching. Cognitive psychology, 31(3), 248–306.

    Feldman, J., & Singh, M. (2006). Bayesian estimationof the shape skeleton. Proceedings of the NationalAcademy of Sciences, 103, 18014–18019.

    Froyen, V., Feldman, J., & Singh, M. (2010). ABayesian framework for figure-ground interpreta-tion. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, & A. Culotta (Eds.), Advances inneural information processing systems (Vol. 3, pp.631–639). Vancouver, British Columbia, Canada:Curran Associates.

    Froyen, V., Tanrıkulu, O. D., Singh, M., & Feldman, J.(in press). Stereoslant: A novel method for mea-suring figure-ground assignment. Manuscript sub-mitted for publication.

    Gibson, J., Kaplan, G., Reynolds, H., & Wheeler, K.(1969). The change from visible to invisible.Attention, Perception, & Psychophysics, 5(2), 113–116.

    Hegdé, J., Albright, T., & Stoner, G. (2004). Second-order motion conveys depth-order information.

    Journal of Vision, 4(10):1, 838–842, http://www.journalofvision.org/content/4/10/1, doi:10.1167/4.10.1. [PubMed]

    Hildreth, E., & Royden, C. (2011). Integrating multiplecues to depth order at object boundaries. Attention,Perception, & Psychophysics, 73(7), 1–18.

    Hochberg, J., & Peterson, M. A. (1987). Piecemealorganization and cognitive components in objectperception: Perceptually coupled responses tomoving objects. Journal of Experimental Psycholo-gy: General, 116(4), 370.

    Hoffman, D. D., & Singh, M. (1997). Salience of visualparts. Cognition, 63(1), 29–78.

    Hogervorst, M., & Eagle, R. (1998). Biases in three-dimensional structure-from-motion arise fromnoise in the early visual system. Proceedings of theRoyal Society of London. Series B: BiologicalSciences, 265(1406), 1587–1593.

    Hulleman, J., & Humphreys, G. W. (2004). A new cueto figure-ground coding: Top-bottom polarity.Vision Research, 44(24), 2779–2791.

    Kanizsa, G., & Gerbino, W. (1976). Vision and artifact.In M. Henle (Ed.), Convexity and symmetry infigure-ground organization (pp. 25–32). New York,NY: Springer.

    Kaplan, G. (1969). Kinetic disruption of optical texture:The perception of depth at an edge. Attention,Perception, & Psychophysics, 6(4), 193–198.

    Kleiner, M., Brainard, D., Pelli, D., Ingling, A.,Murray, R., & Broussard, C. (2007). What is new inpsychtoolbox 3. Perception, 36(14 ECVP AbstractSuppl.).

    Koffka, K. (1935). Principles of gestalt psychology.London, UK: Lund Humphries.

    Landy, M., Maloney, L., Johnston, E., & Young, M.(1995). Measurement and modeling of depth cuecombination: In defense of weak fusion. VisionResearch, 35(3), 389–412.

    Machilsen, B., Pauwels, M., & Wagemans, J. (2009).The role of vertical mirror-symmetry in visualshape detection. Journal of Vision, 9(12):11, 1–11,http://www.journalofvision.org/content/9/12/11,doi:10.1167/9.12.11. [PubMed]

    Metzger, F. (1953). Gesetze des sehens [Laws of seeing].Frankfurt-am-Main, Germany: Waldemar Kramer.

    Michotte, A., Thinès, G., & Crabbé, G. (1964). Lescomplements amodaux des structures perceptives[Amodal completion of perceptual structures].Louvain, Belgium: Publications Universitaires deLouvain.

    Mutch, K., & Thompson, W. (1985). Analysis ofaccretion and deletion at boundaries in dynamic

    Journal of Vision (2013) 13(10):6, 1–12 Froyen, Feldman, & Singh 11

    http://www.journalofvision.org/content/4/10/1http://www.journalofvision.org/content/4/10/1http://www.ncbi.nlm.nih.gov/pubmed/15595889http://www.journalofvision.org/content/9/12/11http://www.ncbi.nlm.nih.gov/pubmed/20053102

  • scenes. IEEE Transactions on Pattern Analysis andMachine Intelligence, 7(2), 133–138.

    Ono, H., Rogers, B., Ohmi, M., & Ono, M. (1988).Dynamic occlusion and motion parallax in depthperception. Perception, 17(2), 255–266.

    Peterson, M. A., & Gibson, B. S. (1994). Must figure-ground organization precede object recognition?An assumption in peril. Psychological Science, 5(5),253–259.

    Peterson, M. A., & Salvagio, E. (2008). Inhibitorycompetition in figure-ground perception: Contextand convexity. Journal of Vision, 8(16):4, 1–13,http://www.journalofvision.org/content/8/16/4,doi:10.1167/8.16.4. [PubMed]

    Raudies, F., & Neumann, H. (2010). A neural model ofthe temporal dynamics of figure–ground segrega-tion in motion perception. Neural Networks, 23(2),160–176.

    Rubin, E. (1921). Visuell wahrgenommene figuren:studien in psychologischer analyse [Visually per-ceived figures: studies in psychological analysis].Kobenhaven, Denmark: Glydenalske Boghandel.

    Stevens, K. A., & Brookes, A. (1988). The concave cuspas a determiner of figure-ground. Perception, 17(1),35–42.

    Struber, D., & Stadler, M. (1999). Differences in top-down influences on the reversal rate of differentcategories of reversible figures. Perception, 28(10),1185–1196.

    Thompson, W., Mutch, K., & Berzins, V. (1985).Dynamic occlusion analysis in optical flow fields.IEEE Transactions on Pattern Analysis and Ma-chine Intelligence, 7(4), 374–383.

    Wagemans, J., Elder, J. H., Kubovy, M., Palmer, S.E., Peterson, M. A., Singh, M., et al. (2012). Acentury of gestalt psychology in visual perception:I. perceptual grouping and figure-ground orga-nization. Psychological Bulletin, 138(6), 1172–1217.

    Yonas, A., Craton, L., & Thompson, W. (1987).Relative motion: Kinetic information for the orderof depth at an edge. Attention, Perception, &Psychophysics, 41(1), 53–59.

    Journal of Vision (2013) 13(10):6, 1–12 Froyen, Feldman, & Singh 12

    http://www.journalofvision.org/content/8/16/4http://www.ncbi.nlm.nih.gov/pubmed/19146271

    Introductionf01f02Experiment 1f03Experiment 2f04f05General discussionConclusionn1Andersen1Backus1Baek1Bahnsen1Barenholtz1Barnes1Baylis1Beck1Berzhanskaya1Brainard1Braunstein1Clark1Driver1Feldman1Froyen1Froyen2Gibson1Hegde1Hildreth1Hochberg1Hoffman1Hogervorst1Hulleman1Kanizsa1Kaplan1Kleiner1Koffka1Landy1Machilsen1Metzger1Michotte1Mutch1Ono1Peterson1Peterson2Raudies1Rubin1Stevens1Struber1Thompson1Wagemans1Yonas1