
Biol Cybern (2010) 103:199–212
DOI 10.1007/s00422-010-0389-3

Recovery of surface pose from texture orientation statistics under perspective projection

Paul A. Warren · Pascal Mamassian

Received: 22 April 2009 / Accepted: 31 March 2010 / Published online: 28 April 2010
© Springer-Verlag 2010

Abstract In a seminal paper, Witkin (1981) derived a model of surface slant and tilt recovery based on the statistics of the orientations of texture elements (texels) on a planar surface. This model made use of basic mathematical properties of probability distributions to formulate a posterior distribution on slant and tilt given a set of image orientations under orthographic projection. One problem with the Witkin model was that it produced a posterior distribution with multiple maxima, reflecting the inherent ambiguity in scene reconstruction under orthographic projection. In the present article, we extend Witkin’s method to incorporate the effects of perspective projection. An identical approach is used to that of Witkin; however, the model now reflects the effects of perspective projection on texel orientation. Performance of the new model is compared against that of Witkin’s model in a basic surface pose recovery task using both a maximum a posteriori (MAP) decision rule and a rule based on the expected value of the posterior distribution. The resultant posterior of the new model is shown to have only one maximum and thereby the ambiguity in scene interpretation is resolved. Furthermore, the model performs better than Witkin’s model using both MAP and expected value decision rules. The results are discussed in the context of human slant estimation.

Electronic supplementary material The online version of this article (doi:10.1007/s00422-010-0389-3) contains supplementary material, which is available to authorized users.

P. A. Warren (✉)
School of Psychological Sciences, Manchester University, Manchester, M13 9PL, UK
e-mail: [email protected]

P. Mamassian
Laboratoire Psychologie de la Perception (CNRS UMR 8158), Université Paris Descartes, 45 rue des Saints-Pères, 75006 Paris, France

Keywords Shape from texture · Computational vision · Human vision

1 Introduction

A fundamental problem faced by any (animal or artificial) visual system is to recover the 3D structure of the scene from the 2D information available in the image. One particular problem of interest involves estimation of the shape of objects in the scene. This is equivalent to the recovery of the local pose (i.e., the direction of the surface normal) of small planar regions on the object surface. Undertaking such a process over many such regions allows the visual system to build up a global estimate of object shape.

In theory, local surface normals can be recovered from a number of different information sources including surface shading (e.g., see Horn and Brooks 1989) and the properties of texture markings on the surface (e.g., see Witkin 1981; Gårding 1993a; Stone 1993). Furthermore, the ability of human observers to recover surface shape and orientation from shading and texture has been demonstrated (Mingolla and Todd 1986; Koenderink et al. 1992; Rosenholtz and Malik 1997) and contrasted with performance of ideal observers (Blake and Marinos 1990; Blake et al. 1993; Buckley et al. 1995; Knill 1998; Rosenholtz and Malik 1997).

In the present article, we focus on the problem of recovering planar surface pose, characterised by the slant and tilt of the surface normal (see Fig. 1), from texture information. In particular, we investigate the recovery of surface pose from the image projection of discrete markings or texture elements (texels) on the surface. Several algorithms have been developed which can recover surface shape without the need for texels, e.g., those based on deformations of oriented spectral content such as Li and Zaidi (2000), deformations in


Fig. 1 Illustration of surface pose parameters: slant, σ, and tilt, τ. Note that slant as defined in the text is equivalent to the more traditional definition of slant as the angle between the line of sight and the normal to the surface. Planes are shown slanted about (a) a horizontal and (b) a vertical axis. In both a and b, the top figure shows a view of the plane from the side (a) or from the top (b). The bottom figure shows a schematic illustration from the point of view of the observer and indicates the tilt and slant axes
[Figure 1 labels: τ = 0°; τ = 90°; tilt axis (direction of slant); slant axis; σ; n; line of sight]

affine structure, e.g., Malik and Rosenholtz (1997) or geometric distortions of closed contours, e.g., Brady and Yuille (1984). However, there is evidence that regular local texture elements form an important part of human slant recovery (Velisavljevic and Elder 2006). Furthermore, when texels are projected into the image a number of simple but powerful effects on the structure of the texture occur, which provide useful information about surface pose. These effects can be summarised as:

(i) Size: as the slant of the plane is increased, the projected texels become relatively smaller and larger in the portions of the plane which are located, respectively, farther from and nearer to the camera or eye.

(ii) Density: as the slant of the plane is increased, the projected texels become relatively more and less dense in the portions of the plane which are located, respectively, farther from and nearer to the camera or eye.

(iii) Compression/foreshortening: as the slant of the plane is increased, the aspect ratio of texels varies systematically so that the extent of the texel in the direction of the tilt axis is compressed relative to that in the orthogonal direction.

In this study, we are particularly interested in the effects of compression and how they can be used to recover the pose (slant and tilt) of a planar surface. Both size and density cues can be used to recover surface pose; however, they rely upon the rate of change of texture information across the scene, and as such they rely upon gradients in texture information (Gibson 1950). Furthermore, for small surfaces the size and density cues are of limited use. The effect of foreshortening is different from size and density cues in that information about surface pose can be recovered from any single texel on the surface and therefore does not rely upon a gradient of information across the scene or the size of the surface.

One consequence of compression is that, under the assumption that all texel orientations are equally likely on the surface (the isotropy assumption), the projected orientations in the image approach the orientation of the axis of slant. This tendency is illustrated in Fig. 2 for a plane covered in texels whose orientations are selected randomly from a uniform distribution. As the slant about a horizontal axis increases, the distribution of orientations in the image becomes more peaked around the horizontal direction.

Witkin (1981) provided an elegant algorithm which, under the isotropy assumption, takes advantage of the systematic relationship between the slant and tilt of the plane and the distribution of discrete texel orientations in the image to recover surface orientation. Witkin’s approach rests upon the idea of having a statistical model of the texture on the surface. Recovering the surface pose then becomes a problem of statistical inference; slant (σ) and tilt (τ) are parameters to be estimated given observed data in the image. Witkin provides a statistical estimator (Eq. 1) for (σ, τ), based on the posterior distribution p(σ, τ|θ′), describing the probability that the surface has slant σ and tilt τ given the n-vector of image texel orientations θ′ = {θ′_1, θ′_2, ..., θ′_n}:


Fig. 2 Schematic illustration of effects of foreshortening on the distribution of texel orientations in the image. The top row shows a fronto-parallel surface on which are placed a number of texels at random orientations as illustrated in the approximately uniform orientation distribution. The middle row shows the effect of slanting this surface about a horizontal axis. A peak is formed in the distribution of texel orientations at a location which characterises the axis of tilt. The bottom row demonstrates that as the slant is increased the peak in the distribution becomes more concentrated around a direction which depends on the tilt
[Figure 2 panels: τ = 90°, σ = 0°; τ = 90°, σ = 45°; τ = 90°, σ = 75°. Each histogram plots Probability (0 to .02) against Projected Orientation (deg), from −90 to 90]

$$p(\sigma, \tau \mid \boldsymbol{\theta}') = \prod_{i=1}^{n} \frac{\pi^{-2}\sin\sigma\cos\sigma}{\cos^2(\theta'_i - \tau) + \sin^2(\theta'_i - \tau)\cos^2\sigma} \tag{1}$$

This method is by no means the only approach to recovery of surface pose from texture information in the image. Kanatani (1984) proposed an alternative continuous contour-based approach based on counting intersections of contours with straight lines in the image plane. Brady and Yuille (1984) applied another approach for closed contours on a surface, suggesting that surface pose could be recovered by maximizing compactness (a metric related to the enclosed area and perimeter length of the contour) in the back projection onto the surface. Furthermore, Blake and Marinos (1990) and Gårding (1993a) have developed iterative methods to recover approximations to the maximum likelihood estimates of slant and tilt as described by Witkin (1981).

One problem with the Witkin algorithm, and with the majority of alternative approaches to recovering surface pose, is that the geometric model is based upon orthographic (parallel) projection, so that orientations in the image are independent of texel location on the surface. Under this projection model, the image data obtained for a surface with pose parameters (σ, τ) are identical to those obtained when the plane has pose (σ, τ + π) (see Eq. 1). As a consequence, when such models are shown a set of randomly oriented texels, then simply by chance (i.e., depending on the set of texels generated) an error of around 180° could arise in the recovered tilt.
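The tilt ambiguity can be checked numerically. The following Python sketch evaluates the logarithm of Eq. 1 for a given slant and tilt and compares the value at tilt τ with the value at τ − π (which is the same direction as τ + π). The function name `witkin_log_posterior` and the sample orientations are illustrative assumptions rather than part of the original method description.

```python
import math

def witkin_log_posterior(sigma, tau, thetas):
    """Log of Eq. 1 (orthographic model); angles in radians, 0 < sigma < pi/2."""
    total = 0.0
    for theta in thetas:
        phi = theta - tau
        denom = math.cos(phi) ** 2 + math.sin(phi) ** 2 * math.cos(sigma) ** 2
        total += math.log(math.sin(sigma) * math.cos(sigma) / (math.pi ** 2 * denom))
    return total

# Hypothetical image orientations clustered around the horizontal (0 rad).
thetas = [-0.10, -0.03, 0.0, 0.04, 0.12]
sigma = math.radians(60)

a = witkin_log_posterior(sigma, math.pi / 2, thetas)            # tilt = 90 deg
b = witkin_log_posterior(sigma, math.pi / 2 - math.pi, thetas)  # tilt = -90 deg
print(a, b)  # the two values agree up to rounding
```

Because τ enters Eq. 1 only through cos² and sin² of (θ′ − τ), both of which have period π, the two tilts cannot be told apart from orientation data alone under this model.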

The perspective projection case has been considered by Gårding (1993b). An algorithm (Weak Isotropy Surface Perception or WISP) was proposed incorporating a perspective projection model and using standard results from circular statistics. The algorithm relies upon an initial assumption of strong isotropy which is then relaxed to assume only weak isotropy (isotropy of texture on the average) in an iterative procedure. However, this approach still recovers two rival interpretations of the surface pose which are then manually selected. The reason for this ambiguity is that under the WISP model, the texels are all treated as if they occur at the centre of the image. Thus, in spite of perspective projection information being present in the shape of the texel in the image, the position of those texels is not taken into account. Consequently, this model suffers from a more complex analogue of the ambiguity arising under orthographic projection.

In the present article, we extend Witkin’s original algorithm directly to take into account a perspective projection model. We derive a generalization of Eq. 1 for perspective projection which provides an estimator for (σ, τ) based on p(σ, τ|θ′, u, v), the probability that the surface has slant σ and tilt τ given the n-vector of image orientations θ′ and the 2D locations of the line segments on the plane (u, v). Throughout this article, we will use the coordinate system described in Fig. 3.

The model is compared to the orthographic projection method of Witkin (1981) using two different decision rules. When either a maximum a posteriori (MAP) or expected value (EXP) decision rule is imposed upon the posterior, the estimate is seen to have a single value (i.e., the tilt ambiguity is removed). The reason for investigating performance using the EXP decision rule is that it has been suggested that human performance in tasks of this kind might be better represented by sampling on a trial by trial basis from the posterior distribution (Mamassian and Landy 1998). Consequently, the expected value of the posterior might be more appropriate for representation of human slant estimation performance.

The results of our model simulations are discussed in the context of human slant estimation. We also discuss the similarities between our model and some human data collected in a slant estimation experiment (presented in the supplementary materials). These experiments were designed to explore


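Fig. 3 Coordinate system used throughout this study
[Figure 3 labels: viewing distance d; reference axes X, Y, Z; surface axes U and V (Y = V); slant σ; u sin σ; u cos σ; point P(u, v) = P(x, y, z); image coordinates x′, y′]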

the common pattern of slant underestimation found in previous studies (e.g., Gibson 1950; Gruber and Clark 1956; Braunstein 1968) and to see if this is consistent with the results of our model. Furthermore, we investigated whether human participants are sensitive to texel orientation information under perspective projection.

2 Methods: deriving the estimator

As in Witkin (1981) we will split the derivation into sections corresponding to the geometric and statistical models before finally characterising the estimator for slant and tilt. At each stage we will demonstrate how the present model is a generalisation of the Witkin model.

As noted in the introduction, the Witkin (1981) algorithm is based upon an orthographic projection model. Consequently, the location of line elements on the plane is not factored into the estimator. In the first section we derive some basic results for perspective projection of surface texels to the image plane.

2.1 The projection model—Case 1: τ = 0

We begin by deriving the function describing how locations and angles on the surface are transformed under perspective projection for the zero tilt case (i.e., rotation about the Y-axis; see Fig. 1).

Let the triple {X, Y, Z} define an orthogonal reference coordinate frame with origin at a distance d from the observer (Fig. 3). Let P be a point on a vertical plane passing through the origin of this frame and slanted about the Y-axis by an angle σ. Let {U, V} define the coordinate system attached to this plane. If we define the image plane to be the X-Y plane then by similar triangles we have:

$$x = \frac{u\,d\cos\sigma}{d + u\sin\sigma}, \qquad y = \frac{v\,d}{d + u\sin\sigma} \tag{2}$$

where d is the viewing distance. This equation provides the coordinates (x, y) of point P in the image plane under perspective projection.
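As a minimal illustration, Eq. 2 can be written as a short Python function. The name `project_point`, the example coordinates and the guard against points at or behind the observer are assumptions added for this sketch; the value d = 57 cm is the viewing distance used later in the simulations.

```python
import math

def project_point(u, v, sigma, d):
    """Eq. 2: image-plane coordinates (x, y) of the surface point (u, v).

    sigma is the slant about the Y-axis (radians); d is the viewing
    distance, in the same units as u and v.
    """
    depth = d + u * math.sin(sigma)  # distance of the point from the eye along Z
    if depth <= 0:
        # Assumption for this sketch: such points cannot be imaged.
        raise ValueError("point lies at or behind the observer")
    return u * d * math.cos(sigma) / depth, v * d / depth

d = 57.0  # viewing distance in cm
print(project_point(10.0, 5.0, 0.0, d))               # fronto-parallel plane
print(project_point(10.0, 5.0, math.radians(60), d))  # slanted plane, far side
```

With zero slant the image coordinates equal the surface coordinates; with positive slant a point on the far side of the plane (u > 0) moves towards the image centre in both x and y.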

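Now let (u_1, v_1)^T and (u_2, v_2)^T be two such points on the surface and let the line between them define angle θ with the U-axis. By Eq. 2, the projections of these points in the image are given by

$$\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = \begin{pmatrix} \dfrac{u_1 d\cos\sigma}{d + u_1\sin\sigma} \\[2ex] \dfrac{v_1 d}{d + u_1\sin\sigma} \end{pmatrix}, \qquad \begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} \dfrac{u_2 d\cos\sigma}{d + u_2\sin\sigma} \\[2ex] \dfrac{v_2 d}{d + u_2\sin\sigma} \end{pmatrix}.$$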

Thus, relative to the X-axis, θ′, the projection of angle θ in the image, is given by:

$$\tan\theta' = \frac{y_2 - y_1}{x_2 - x_1} = \frac{d(v_2 - v_1) + \sin\sigma\,(v_2 u_1 - v_1 u_2)}{d\cos\sigma\,(u_2 - u_1)} \tag{3}$$

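However, since the points on the surface define a straight line, the following relationships hold, where α is the distance between the two points: u_2 = u_1 + α cos θ and v_2 = v_1 + α sin θ, so that:

$$\begin{cases} u_2 - u_1 = \alpha\cos\theta \\ v_2 - v_1 = \alpha\sin\theta \\ v_2 u_1 - v_1 u_2 = \alpha(u_1\sin\theta - v_1\cos\theta). \end{cases}$$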

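Substituting these into Eq. 3, we obtain θ′ from

$$\tan\theta' = \frac{\tan\theta + \frac{1}{d}\sin\sigma\,(u_1\tan\theta - v_1)}{\cos\sigma} = \frac{\tan\theta\left(1 + \frac{u_1}{d}\sin\sigma\right) - \frac{v_1}{d}\sin\sigma}{\cos\sigma}. \tag{4}$$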

Note that for the cases in which (i) d is large or (ii) the line is close to the origin (u and v are small) Eq. 4 is approximated by

$$\tan\theta' = \frac{\tan\theta}{\cos\sigma},$$


which is the basic orthographic projection model from Witkin (1981).
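The derivation of Eq. 4 can be checked numerically. The sketch below, which assumes hypothetical helper names and test values, computes the projected angle in two ways: directly from Eq. 4, and by projecting both endpoints of a short segment with Eq. 2 and measuring the angle as in Eq. 3. It also evaluates Eq. 4 at a very large viewing distance to illustrate the orthographic limit.

```python
import math

def project_point(u, v, sigma, d):
    """Eq. 2: perspective projection of surface point (u, v) for zero tilt."""
    depth = d + u * math.sin(sigma)
    return u * d * math.cos(sigma) / depth, v * d / depth

def projected_angle(theta, u1, v1, sigma, d):
    """Eq. 4: image orientation of a line through (u1, v1) with surface angle theta."""
    k = math.sin(sigma) / d
    return math.atan((math.tan(theta) * (1 + u1 * k) - v1 * k) / math.cos(sigma))

def angle_from_endpoints(theta, u1, v1, sigma, d, alpha=2.0):
    """Eq. 3: project both endpoints of a segment of length alpha, then measure."""
    u2 = u1 + alpha * math.cos(theta)
    v2 = v1 + alpha * math.sin(theta)
    x1, y1 = project_point(u1, v1, sigma, d)
    x2, y2 = project_point(u2, v2, sigma, d)
    return math.atan((y2 - y1) / (x2 - x1))

# Hypothetical values: angles in radians, lengths in cm.
theta, u1, v1, sigma = 0.6, 8.0, -3.0, 1.0

direct = projected_angle(theta, u1, v1, sigma, 57.0)
via_points = angle_from_endpoints(theta, u1, v1, sigma, 57.0)
far_view = projected_angle(theta, u1, v1, sigma, 1e9)
orthographic = math.atan(math.tan(theta) / math.cos(sigma))
print(direct, via_points)
print(far_view, orthographic)
```

The first pair of values should agree to rounding error for any segment length α, since α cancels in Eq. 3; the second pair should agree closely because the position-dependent terms scale with 1/d.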

2.2 The projection model—Case 2: arbitrary τ

To introduce the effect of surface tilt on projected image orientation we now pick arbitrary coordinate axes for the image plane (namely X_τ and Y_τ) with X_τ in the direction of the tilt axis, which is inclined at an angle of τ relative to the X-axis. We also define θ′ as the angle between the X-axis and a projected line orientation, together with θ′_τ as the angle between the X_τ-axis and the projected line orientation. Then, by our definition we know that

$$\theta'_\tau = \theta' - \tau$$

Also, using (4) we can calculate the angle θ′_τ as

$$\theta'_\tau = \tan^{-1}\left[\frac{\tan\theta_\tau\left(1 + \frac{u}{d}\sin\sigma\right) - \frac{v}{d}\sin\sigma}{\cos\sigma}\right]$$

where θ_τ is the orientation of the line element on the surface prior to projection relative to the projection of the X_τ-axis onto the surface. Consequently, we obtain Eq. 5 as an expression for the projected orientation of a line segment on the surface as a function of the surface’s slant and tilt and the line’s orientation and position on the surface.

$$\theta' = f(\theta_\tau; \sigma, \tau, u, v) = \tan^{-1}\left[\frac{\tan\theta_\tau\left(1 + \frac{u}{d}\sin\sigma\right) - \frac{v}{d}\sin\sigma}{\cos\sigma}\right] + \tau \tag{5}$$

Note that for the cases in which (i) d is large or (ii) the line is close to the origin (u and v are small) Eq. 5 is approximated by:

$$\theta' = \tan^{-1}\left[\frac{\tan\theta_\tau}{\cos\sigma}\right] + \tau$$

which is Eq. 1 from Witkin (1981).
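To make the effect of Eq. 5 on an isotropic texture concrete, the following Python sketch projects a set of evenly spaced surface orientations and reports the share that lands within 15° of the slant axis (the image direction perpendicular to the tilt axis). The function names, the 15° window and the choice of texels at the origin (u = v = 0, so that the position terms vanish) are assumptions made for illustration.

```python
import math

def project_orientation(theta_tau, sigma, tau, u, v, d):
    """Eq. 5: image orientation theta' of a texel with surface orientation theta_tau."""
    k = math.sin(sigma) / d
    t = (math.tan(theta_tau) * (1 + u * k) - v * k) / math.cos(sigma)
    return math.atan(t) + tau

def fraction_near_slant_axis(sigma, tau, n=1800, window=math.radians(15)):
    """Share of isotropic texels at the origin projecting within `window` of the slant axis."""
    count = 0
    for i in range(n):
        theta_tau = -math.pi / 2 + (i + 0.5) * math.pi / n  # evenly spaced, isotropic
        rel = project_orientation(theta_tau, sigma, tau, 0.0, 0.0, 57.0) - tau
        # The slant axis is perpendicular to the tilt axis, i.e. at rel = +/- 90 deg.
        if math.pi / 2 - abs(rel) < window:
            count += 1
    return count / n

fractions = {s: fraction_near_slant_axis(math.radians(s), math.radians(90))
             for s in (0, 45, 75)}
for slant_deg, share in fractions.items():
    print(slant_deg, share)
```

For zero slant the share is simply the width of the window relative to the half-turn (30°/180°), and it grows as slant increases, which is the concentration of orientations around the slant axis sketched in Fig. 2.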

2.3 Statistical model (isotropy and independence)

We use the statistical model of Witkin (1981). Each of the texels on the surface, with orientation θ_τ, projects to a line in the image, with orientation θ′. The distributions of orientations on the surface and across the whole image plane are related by the geometric transformation in Eq. 5.

By assuming a particular form for the distribution of orientations on the surface it is possible to quantify the likelihood that an observed image orientation occurred for each possible slant and tilt pair. In Witkin (1981) it is assumed that the distribution of orientations on the surface is uniform (the isotropy assumption).

In addition, assuming that the line orientations on the surface are independent from one another (the independence assumption), it is possible to quantify the combined likelihood that the set of observed image orientations occurred for each possible slant and tilt pair.

Since Witkin’s method was based upon an orthographic projection model there was no dependence on texel location on the plane. Under the perspective projection model used in the present study we make the additional assumption that the location of texels is uniformly distributed on the surface (the homogeneity assumption).

A final assumption is that all surface normals describing the orientation of the plane are equally likely. Consequently, we do not implement an explicit prior belief that floor or ceiling planes are more likely in the world. This enables us to assess the performance of our estimator without bias due to prior assumptions.

2.4 Deriving the estimator for surface slant and tilt

2.4.1 Step 1: a single texel

We wish, first of all, to characterise the likelihood L(σ, τ; θ′, u, v) that each slant and tilt pair generated an observed single image orientation θ′ from the surface location (u, v), under the geometric and statistical models introduced above. It should be noted at this point that the likelihood appears to be a mixed function of the post-projection image orientation and the pre-projection position on the surface (u, v). Note, however, that u and v are simply parameters in the likelihood and not the variables of interest. In fact, by simply inverting Eq. 2, the position of the texel in the image could also be substituted in Eq. 6, so that all parameters are expressed post-projection. We choose not to do this solely for the sake of parsimony. This likelihood function can be determined by treating the conditional probability distribution p(θ′|σ, τ; u, v) as a function of slant and tilt; note that the semi-colon indicates that u and v are simply parameters. To derive this quantity we will use the isotropy assumption which constrains the distribution of θ_τ to be

$$p(\theta_\tau) = \frac{1}{\pi}.$$

We will also use the geometric model expressed in Eq. 5 which determines θ′ as a function of θ_τ (and vice versa) with σ, τ, u and v as parameters. Finally, we use the standard relation for determining the probability distribution function (p.d.f.) of a function, ξ, of a random variable, x, with known distribution:

$$\mathrm{pdf}(\xi(x)) = \mathrm{pdf}(x)\,\frac{dx}{d\xi(x)}$$

Replacing x with θ_τ and ξ with f (the function from Eq. 5), we can derive the required p.d.f., p(θ′|σ, τ; u, v), of


a projected orientation given the slant and tilt of the surface and the surface position of the texel:

$$p(\theta' \mid \sigma, \tau; u, v) = p(\theta_\tau)\,\frac{d\theta_\tau}{d\theta'} = \frac{1}{\pi}\,\frac{d\theta_\tau}{d\theta'}.$$

In order to carry out the differentiation we require θ_τ, expressed as a function of θ′, which is obtained by inverting Eq. 5:

$$\theta_\tau = g(\theta'; \sigma, \tau, u, v) = \tan^{-1}\left[\frac{\tan(\theta' - \tau)\cos\sigma + \frac{v}{d}\sin\sigma}{1 + \frac{u}{d}\sin\sigma}\right].$$

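To calculate the derivative we set the term inside the square bracket equal to γ and use the chain rule:

$$\frac{d\theta_\tau}{d\theta'} = \frac{d\theta_\tau}{d\gamma}\,\frac{d\gamma}{d\theta'} = \frac{1}{1 + \gamma^2}\,\frac{d\gamma}{d\theta'}.$$

Using the quotient rule to evaluate the remaining derivative and simplifying we obtain:

$$\frac{d\theta_\tau}{d\theta'} = \frac{\sec^2(\theta' - \tau)\cos\sigma\left(1 + \frac{u}{d}\sin\sigma\right)}{\left(1 + \frac{u}{d}\sin\sigma\right)^2 + \left(\tan(\theta' - \tau)\cos\sigma + \frac{v}{d}\sin\sigma\right)^2}.$$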

As a consequence we obtain the required likelihood function:

$$L(\sigma, \tau; \theta', u, v) = p(\theta' \mid \sigma, \tau; u, v) = \frac{1}{\pi}\,\frac{d\theta_\tau}{d\theta'} = \frac{\pi^{-1}\sec^2(\theta' - \tau)\cos\sigma\left(1 + \frac{u}{d}\sin\sigma\right)}{\left(1 + \frac{u}{d}\sin\sigma\right)^2 + \left(\tan(\theta' - \tau)\cos\sigma + \frac{v}{d}\sin\sigma\right)^2}. \tag{6}$$

Note that for the cases in which (i) d is large or (ii) the line is close to the origin (u and v are small) Eq. 6 is approximated by the Witkin (1981) likelihood function.
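Two properties of Eq. 6 can be verified numerically: it should equal 1/π times the derivative of the inverse mapping g, and, being a density over image orientation, it should integrate to one over a half-turn of θ′ whenever 1 + (u/d) sin σ is positive. The Python sketch below checks both. The function names and test values are hypothetical, and the integral uses a simple midpoint rule.

```python
import math

def surface_orientation(theta_img, sigma, tau, u, v, d):
    """Inverse of Eq. 5: surface orientation g for image orientation theta_img."""
    k = math.sin(sigma) / d
    return math.atan((math.tan(theta_img - tau) * math.cos(sigma) + v * k) / (1 + u * k))

def likelihood(theta_img, sigma, tau, u, v, d):
    """Eq. 6: density of image orientation theta_img for a texel at (u, v)."""
    k = math.sin(sigma) / d
    a = 1 + u * k
    t = math.tan(theta_img - tau)
    b = t * math.cos(sigma) + v * k
    # sec^2 is written as 1 + tan^2.
    return (1 + t * t) * math.cos(sigma) * a / (math.pi * (a * a + b * b))

# Hypothetical test values: angles in radians, lengths in cm.
sigma, tau, u, v, d = 1.0, 0.2, 5.0, -4.0, 57.0

# Check 1: Eq. 6 against a central-difference derivative of g, divided by pi.
theta_img, h = 0.4, 1e-6
numeric = (surface_orientation(theta_img + h, sigma, tau, u, v, d)
           - surface_orientation(theta_img - h, sigma, tau, u, v, d)) / (2 * h * math.pi)
analytic = likelihood(theta_img, sigma, tau, u, v, d)
print(numeric, analytic)

# Check 2: midpoint-rule integral of Eq. 6 over one half-turn of theta'.
n = 2000
total = sum(likelihood(tau - math.pi / 2 + (i + 0.5) * math.pi / n, sigma, tau, u, v, d)
            for i in range(n)) * math.pi / n
print(total)
```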

2.4.2 Step 2: multiple independent texels

To obtain a joint density function for n image texel orientations defined in the vector θ′ = {θ′_1, θ′_2, ..., θ′_n}, we use the assumption of independence stated in the statistical model section. Let u = {u_1, u_2, ..., u_n} and v = {v_1, v_2, ..., v_n} be vectors containing the positions of the texels; then

$$L(\sigma, \tau; \boldsymbol{\theta}', \mathbf{u}, \mathbf{v}) = \prod_{i=1}^{n} L(\sigma, \tau; \theta'_i, u_i, v_i). \tag{7}$$

It can be seen from Eqs. 6 and 7 that for the cases in which (i) d is large or (ii) the texels were close to the origin before projection (i.e., u_i and v_i are small) Eq. 7 reduces to Eq. 1.

2.4.3 Step 3: application of Bayes’ rule

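From Bayes’ rule, we know that the posterior density function, p(σ, τ|θ′, u, v), allowing estimation of the slant and tilt parameters, can be obtained as:

$$p(\sigma, \tau \mid \boldsymbol{\theta}'; \mathbf{u}, \mathbf{v}) = \frac{p(\sigma, \tau)\,p(\boldsymbol{\theta}' \mid \sigma, \tau; \mathbf{u}, \mathbf{v})}{\iint p(\sigma, \tau)\,p(\boldsymbol{\theta}' \mid \sigma, \tau; \mathbf{u}, \mathbf{v})\,d\sigma\,d\tau}$$

where the integration is performed over the σ and τ ranges and p(σ, τ) is the joint prior probability of each slant, tilt pair.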

To obtain p(σ, τ) we use the assumption that all surface normals are equally likely (see statistical model section). It can be shown (e.g., see Witkin 1981) that p(σ, τ) is then given by:

$$p(\sigma, \tau) = \frac{\sin\sigma}{\pi}.$$

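Consequently, the relative likelihood (i.e., the posterior distribution before normalisation) is given by

$$p(\sigma, \tau)\,p(\boldsymbol{\theta}' \mid \sigma, \tau; \mathbf{u}, \mathbf{v}) = \frac{\sin\sigma}{\pi}\prod_{i=1}^{n}\frac{\pi^{-1}\sec^2(\theta'_i - \tau)\cos\sigma\left(1 + \frac{u_i}{d}\sin\sigma\right)}{\left(1 + \frac{u_i}{d}\sin\sigma\right)^2 + \left(\tan(\theta'_i - \tau)\cos\sigma + \frac{v_i}{d}\sin\sigma\right)^2}. \tag{8}$$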

Note, once again, that for the cases in which (i) d is large or (ii) u and v are small, Eq. 8 is approximated by Eq. 3 from Witkin (1981). Normalising this expression with respect to its integral over the slant and tilt ranges will yield the required posterior distribution.
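The steps from Eq. 6 to Eq. 8 can be sketched in Python as a grid evaluation: sum the log-likelihoods of the individual texels (Eq. 7), add the log prior, and normalise over a discrete set of slant and tilt values. Everything named here (the functions, the 5° grid, the noise-free synthetic texels) is an assumption made for illustration. In particular, the sketch treats the surface positions (u_i, v_i) as known and fixed rather than recomputing them from image positions for each candidate pose, so it scans tilt over a half-turn only and does not attempt to reproduce the disambiguation of tilt discussed in the text.

```python
import math

def log_likelihood(theta_img, sigma, tau, u, v, d):
    """Log of Eq. 6 for one texel."""
    k = math.sin(sigma) / d
    a = 1 + u * k
    t = math.tan(theta_img - tau)
    b = t * math.cos(sigma) + v * k
    return math.log((1 + t * t) * math.cos(sigma) * a / (math.pi * (a * a + b * b)))

def log_posterior(sigma, tau, texels, d):
    """Log of Eq. 8 (unnormalised); texels is a list of (theta_img, u, v)."""
    total = math.log(math.sin(sigma) / math.pi)      # prior p(sigma, tau)
    for theta_img, u, v in texels:                   # Eq. 7: independent texels
        total += log_likelihood(theta_img, sigma, tau, u, v, d)
    return total

def posterior_on_grid(texels, d, step_deg=5):
    """Normalise Eq. 8 over a regular grid; returns {(slant_deg, tilt_deg): mass}."""
    cells = [(s, t) for s in range(step_deg, 90, step_deg)
             for t in range(0, 180, step_deg)]
    logs = {c: log_posterior(math.radians(c[0]), math.radians(c[1]), texels, d)
            for c in cells}
    peak = max(logs.values())                        # subtract before exp for stability
    weights = {c: math.exp(lp - peak) for c, lp in logs.items()}
    norm = sum(weights.values())
    return {c: w / norm for c, w in weights.items()}

def synthetic_texels(sigma, tau, d, n_orient=24):
    """Hypothetical noise-free data: isotropic orientations at nine surface positions."""
    texels = []
    k = math.sin(sigma) / d
    for u in (-8.0, 0.0, 8.0):
        for v in (-8.0, 0.0, 8.0):
            for i in range(n_orient):
                theta_tau = -math.pi / 2 + (i + 0.5) * math.pi / n_orient
                t = (math.tan(theta_tau) * (1 + u * k) - v * k) / math.cos(sigma)
                texels.append((math.atan(t) + tau, u, v))   # Eq. 5
    return texels

d = 57.0
texels = synthetic_texels(math.radians(60), 0.0, d)   # world slant 60 deg, tilt 0 deg
posterior = posterior_on_grid(texels, d)
map_cell = max(posterior, key=posterior.get)
print(map_cell)
```

Working with logarithms avoids numerical underflow when the product in Eq. 8 runs over many texels; the largest log value is subtracted before exponentiating for the same reason.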

3 Simulations

In this section, we present the results of simulations which compare the performance of the present method (referred to as the perspective projection method) and that from Witkin (1981) (referred to as the orthographic projection method). The models will ‘view’ images containing a number of oriented line segments obtained under perspective projection from a planar surface. One hundred trials were conducted for each pair of 5 slant (15°, 30°, 45°, 60°, 75°) and 4 tilt (0°, 45°, 90°, 135°) values defining the surface normal direction. For the sake of clarity, these are referred to as ‘world’ slants and tilts (σ_w, τ_w), in contrast to the ‘estimated’ slants and tilts (σ_e, τ_e) obtained from the estimator.

On the surface, the lines were randomly positioned (following a 2D uniform distribution) and orientated (following a uniform distribution). Note that for the simulations undertaken, for both the orthographic and the perspective model, the texels were projected into the image using perspective projection. Consequently, the two models have access to the same data, but for the perspective model the estimator for surface pose uses a p.d.f. for texel orientation in the image which takes location into account. The competing methods will return estimates of surface slant and tilt over the ranges


Fig. 4 Contour plots derived from posterior distribution for orthographic projection method model of Witkin (1981) for a range of world slants and tilts (each subplot corresponds to a different world slant and tilt pair). Note that there are always two approximately equal peaks in the distribution reflecting the inherent ambiguity in this model between planes which have tilts differing by 180°. Different random samples of texels on the surface will lead to one peak being favoured over the other at random
[Figure 4, ORTHOGRAPHIC PROJECTION METHOD: grid of subplots titled by world pose, from Slant = 15, Tilt = 0 to Slant = 75, Tilt = 135. Horizontal axis: Tilt (−180 to 180); vertical axis: Slant (0 to 90)]

[0, π/2] and [−π, π], respectively. These ranges define a hemisphere of possible surface normals.

Simulations are conducted for a scenario in which the viewing distance is 57 cm and the observer views a square plane (with side length 2.24 m) through a circular aperture with a diameter of 20° of visual angle. On the plane, 10,000 line segments are positioned at random (homogeneity assumption) with random orientation (isotropy assumption). Texel density on the surface is therefore approximately 0.2 texels/cm². Elements falling outside the 20° field of view after projection were not ‘seen’ by the model. The size of this plane was chosen so that, for this viewing distance, even in the largest slant condition simulated, the projected texels filled the aperture (i.e., the top edge of the plane never entered the aperture).
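A stimulus of this kind could be generated along the following lines. This Python sketch is not the authors' simulation code: it covers the zero-tilt case only, represents each texel by its centre position and orientation rather than by a drawn line segment, assumes the aperture is centred on the line of sight, and skips surface points that would lie at or behind the observer. The function name, the seed and the default parameter names are hypothetical.

```python
import math
import random

def make_stimulus(sigma, d=57.0, side=224.0, n_texels=10000, aperture_deg=20.0, seed=0):
    """Return [(x, y, theta_img), ...] for texels visible through the aperture (zero tilt)."""
    rng = random.Random(seed)
    # Radius of the circular aperture measured on the image plane (cm).
    radius = d * math.tan(math.radians(aperture_deg / 2))
    k = math.sin(sigma) / d
    visible = []
    for _ in range(n_texels):
        u = rng.uniform(-side / 2, side / 2)             # homogeneity: uniform position
        v = rng.uniform(-side / 2, side / 2)
        theta = rng.uniform(-math.pi / 2, math.pi / 2)   # isotropy: uniform orientation
        depth = d + u * math.sin(sigma)
        if depth <= 0:                                   # assumption: skip points behind the eye
            continue
        x = u * d * math.cos(sigma) / depth              # Eq. 2
        y = v * d / depth
        if math.hypot(x, y) > radius:                    # outside the field of view
            continue
        t = (math.tan(theta) * (1 + u * k) - v * k) / math.cos(sigma)   # Eq. 4
        visible.append((x, y, math.atan(t)))
    return visible

texels = make_stimulus(math.radians(45))
limit = 57.0 * math.tan(math.radians(10))
print(len(texels), limit)
```

With 10,000 texels on a 224 cm square the surface density is 10,000 / 224² ≈ 0.2 texels/cm², matching the figure quoted above, and only the small fraction that projects inside the aperture is passed on to the estimator.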

One final aspect of the model to be specified is the decision rule employed to convert the posterior distribution into an estimate of slant and tilt. Witkin (1981) used the MAP solution and accordingly we will present the same solution in what follows. However, we will also show results using an alternative decision rule (referred to as EXP) in which the spherical expectation is calculated over the slant and tilt hemisphere (e.g., see Mardia and Jupp 1999). This decision rule was also used since there is some suggestion in the literature that it would better describe the behaviour of a human decision maker who samples from the posterior distribution (see Mamassian and Landy 1998).
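The two decision rules can be sketched for a posterior that has already been discretised into cells. The Python code below takes the MAP estimate as the cell with the largest mass and, as one plausible reading of the spherical expectation, takes the EXP estimate as the mean direction of the unit surface normals weighted by posterior mass. That interpretation, the function names and the toy two-peak posterior are assumptions of this sketch, not details confirmed by the text.

```python
import math

def map_estimate(cells):
    """cells: list of (sigma, tau, mass) in radians; return the most probable (sigma, tau)."""
    sigma, tau, _ = max(cells, key=lambda c: c[2])
    return sigma, tau

def exp_estimate(cells):
    """Mean direction of the surface normal under the posterior (assumed form of EXP)."""
    # Unit normal for slant s and tilt t: (sin s cos t, sin s sin t, cos s).
    nx = sum(m * math.sin(s) * math.cos(t) for s, t, m in cells)
    ny = sum(m * math.sin(s) * math.sin(t) for s, t, m in cells)
    nz = sum(m * math.cos(s) for s, t, m in cells)
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    sigma = math.acos(max(-1.0, min(1.0, nz / norm)))   # clamp against rounding
    return sigma, math.atan2(ny, nx)

# Hypothetical bimodal posterior: equal mass at tilts 0 and 180 deg, slant 60 deg.
bimodal = [(math.radians(60), 0.0, 0.5), (math.radians(60), math.pi, 0.5)]
# Hypothetical unimodal posterior: all mass at slant 60 deg, tilt 45 deg.
unimodal = [(math.radians(60), math.radians(45), 1.0)]

print(map_estimate(bimodal))
print(exp_estimate(bimodal))    # slant collapses towards zero
print(exp_estimate(unimodal))   # recovers the single peak
```

Averaging normals rather than averaging the angles directly respects the circular nature of tilt. It also shows why two opposed peaks pull the expected slant towards zero, whereas a single peak is returned unchanged.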

3.1 Results—Orthographic projection method

Figure 4 shows contour plots for the set of posterior distributions averaged over the 100 trials, for each of the (σ_w, τ_w) conditions. Care should be taken when examining these plots since we are representing a circular quantity (tilt) on a linear scale; consequently, the left and right ends of the tilt axis should be regarded as equivalent. First note that the likelihood becomes more peaked as the slant increases and consequently the variability in the slant and tilt estimate decreases with the real surface slant. Note also that there appear to be two peaks for each posterior. This result reflects the fact that under an assumption of orthographic projection image data obtained for a plane in the (σ_w, τ_w) orientation is identical to that when the plane is in the (σ_w, τ_w + π) orientation (see Eq. 1).

Figure 5a shows the estimated slant values from the orthographic method over the range of world slants using the MAP decision rule. The circles show each of the estimated slant values over the world slants and tilts range. The unbroken line represents the mean estimated slant. These results are similar to those in Fig. 5 from Witkin (1981); the model performs best at intermediate and large world slants but tends to over-estimate slant particularly at low world slant values.

Figure 5b also shows a polar plot of the world (open symbols) and estimated (filled symbols) slant and tilt pairs using the MAP decision rule. The increasing size of the symbols reflects the increasing slant conditions and allows easier differentiation between the estimates. Note that for the larger values of world slant the estimated slants are quite accurate (they lie on a circle with the same radius as the world slant in question). However, often the estimated tilt is around 180° away from the world tilt.

Figure 5c, d shows the corresponding results for the EXP decision rule. Note in Fig. 5c that the estimated slant values show marked underestimation of the corresponding world slants. This is due to the fact that the posterior has two


Fig. 5 Recovered slant and tilt estimates for the Witkin (1981) model using an orthographic projection method. a Slant estimates for MAP decision rule. b Mean slant and tilt estimates for MAP decision rule. c Slant estimates for EXP decision rule. d Mean slant and tilt estimates for EXP decision rule. In a and c, each circle corresponds to the slant recovered from one of 100 individual trials, the line connects the mean values over these 100 trials. In b and d, open shapes correspond to the simulated range of surface slants and tilts in the world. Closed symbols represent the recovered mean slant and tilt estimates. Note that the recovered tilt is often around 180° in error (which can be seen particularly clearly for the 0° tilt case)
[Figure 5 panels: a and c plot Estimated Slant (degs) against World Slant (degs), both from 0 to 90; b and d are polar plots of Tilt (degs) and Slant (degs) with World and Estimated symbols]

peaks and consequently the spherical expectation in the slant dimension is between the peaks (and hence close to zero slant). The underestimation of slant can also be seen in the polar plot of Fig. 5d in which many of the estimated slants lie at the origin. Since the largest peak will sometimes be at the appropriate value of tilt and sometimes be 180° away from this value (due to random fluctuations in the projected orientations), Fig. 5d also shows the tendency for the method to misestimate tilt by 180°.

In this section, we have shown that the estimates obtained for the orthographic method of Witkin (1981) are relatively accurate for recovery of slant (provided slant is sufficiently large) but suffer from large misestimates of tilt. In the next section, we contrast the performance of the perspective method developed in this article.

3.2 Results—perspective projection method

Figure 6 shows contour plots for the set of posterior distributions averaged over the 100 trials, for each of the (σ_w, τ_w) conditions. Comparison with Fig. 4 indicates that there is a clear tendency for only one dominant peak using this method relative to the orthographic method. This is due to the fact that the perspective method developed here assesses the likelihood of observed orientations taking into account the line location on the plane. Consequently, only one of the peaks is favoured. Note also that, once again, as the slant is increased, the reliability of the estimate appears to improve (the spread of the peak decreases).

Figure 7a shows the estimated slant values from the per-spective method over the range of world slants using theMAP decision rule. The circles show the estimated slantvalues over all world slants and tilts. The unbroken linerepresents the mean estimated slant. These results show asimilar tendency to those in Fig. 5 from Witkin (1981) andsimilar to those seen in Fig. 5a of the present article; themodel performs best at higher world slants but tends toover-estimate slant at low world slant values. Note, how-ever, that for the orthographic model the overestimate per-sists even in the highest slant condition tested. For theperspective model the overestimate vanishes at high slantvalues.

Figure 7b shows a polar plot of the world (open symbols) and estimated (filled symbols) slant and tilt pairs for the MAP decision rule. The distinction between these results and those from the orthographic method is now rather striking. The perspective method appears much better at estimating the world tilt.


Fig. 6 Contour plots derived from posterior distribution for the new perspective projection method model for a range of world slants and tilts (each subplot corresponds to a different world slant and tilt pair)... Note that now there is one dominant peak, thereby the ambiguity seen in Figs. 4 and 6 is reduced

[Figure 6 panels omitted. Figure title: PERSPECTIVE PROJECTION METHOD. Panel titles recovered: Slant = 75 with Tilt = 0, 45, 90 and 135. Axes: Tilt and Slant.]

Fig. 7 Recovered slant and tilt estimates for the new perspective projection model. a. Slant estimates for MAP decision rule. b. Mean slant and tilt estimates for MAP decision rule. c. Slant estimates for EXP decision rule. d. Mean slant and tilt estimates for EXP decision rule. In a and c, each circle corresponds to the slant recovered from one of 100 individual trials, the line connects the mean values over these 100 trials. In b and d, open shapes correspond to the simulated range of surface slants and tilts in the world. Closed symbols represent the recovered mean slant & tilt estimates. Note that the recovered tilt is now much closer to the world tilt which corresponds to the down-weighting of the second peak in the posterior distribution in Fig. 6

[Figure 7 panels omitted. a, c: World Slant (degs) against Estimated Slant (degs), axes 0 to 90. b, d: polar plots of Tilt (degs) with rings at 20, 40, 60 and 80; legend: World, Estimated.]


Figure 7c, d shows the corresponding results for the EXP decision rule. Note in Fig. 7c that the estimated slant values show much smaller under-estimates of slant relative to the results from the orthographic method (cf. Fig. 5c). This is due to the fact that the second peak in the posterior distribution is considerably smaller under the perspective method and consequently the spherical expectation is closer to the highest peak.


Fig. 8 Recovered slant and tilt estimates for the new perspective projection model when field of view is increased to 40°. a. Slant estimates for MAP decision rule. b. Mean slant and tilt estimates for MAP decision rule. c. Slant estimates for EXP decision rule. d. Mean slant and tilt estimates for EXP decision rule. In a and c, each circle corresponds to the slant recovered from one of 100 individual trials, the line connects the mean values over these 100 trials. In b and d, open shapes correspond to the simulated range of surface slants and tilts in the world. Closed symbols represent the recovered mean slant & tilt estimates. Note that errors in slant and tilt are now much smaller for the EXP decision rule (although they remain fairly similar to those seen in Fig. 7 for the MAP rule)

[Figure 8 panels omitted. a, c: World Slant (degs) against Estimated Slant (degs), axes 0 to 90. b, d: polar plots of Tilt (degs) with rings at 20, 40, 60 and 80; legend: World, Estimated.]


Relative to the orthographic simulation (Fig. 5d), Fig. 7d reveals much better approximations to the world slant and tilt values using the EXP decision rule under the perspective method. The tilts are accurately recovered and the slants show a similar pattern of underestimation to that shown in Fig. 7c.

In a further simulation, we show that the perspective method can produce very accurate results when the field of view is increased. We repeated the simulation shown in Figs. 6 and 7 but increased the field of view to 40°. To account for the increase in the texels now falling inside the aperture we reduced the number of texels on the surface to 2,500.

At larger fields of view the perspective information is greater and so we would expect the perspective method to perform particularly well in this condition. This expectation is confirmed to some extent in Fig. 8; when compared to the simulations conducted with small field of view and illustrated in Fig. 7 the MAP decision rule performs slightly better and the EXP decision rule performs much better. With these results in mind, it is worth noting that human observers have been shown to increasingly use perspective information at larger fields of view in the estimation of structure from motion (e.g., Eagle and Hogervorst 1999; Hogervorst and Eagle 2000).

3.3 Model comparison

We compared the orthographic and perspective projection methods and the EXP and MAP decision rules in a full factorial 2×2 design. We ran the simulations described in the previous sections (with 20° field of view) 10 times, recovering mean slant and tilt estimates over the 100 trials for each of the 20 world slant and tilt pairs and for each of the four models. For each of the 10 simulations we then calculated the RMS error in both estimated slant and tilt over the world slant and tilt pairs. Mean RMS slant and tilt error over the 10 simulations are shown in Fig. 9a and b, respectively. Note the error bars represent 95% confidence intervals and are smaller than the symbols used in Fig. 9a.
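The RMS error calculation can be sketched as follows. This is a minimal illustration rather than the code used for the simulations: it assumes that tilt errors are wrapped onto the circle before squaring (so that 170° and −175° differ by 15°, not 345°), and the estimates shown are made-up values, not data from the simulations.

```python
import math

def wrap_deg(diff):
    """Wrap an angular difference into the interval [-180, 180)."""
    return (diff + 180.0) % 360.0 - 180.0

def rms_error(estimates, truths, circular=False):
    """Root-mean-square error over paired estimates and true values (degrees)."""
    errors = []
    for est, true in zip(estimates, truths):
        err = est - true
        errors.append(wrap_deg(err) if circular else err)
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical mean estimates for three world slant and tilt pairs.
world_slant, est_slant = [15.0, 45.0, 75.0], [21.0, 48.0, 75.0]
world_tilt, est_tilt = [0.0, 90.0, 170.0], [5.0, 90.0, -175.0]

slant_rms = rms_error(est_slant, world_slant)
tilt_rms = rms_error(est_tilt, world_tilt, circular=True)
```

Under the wrapping assumption a tilt error can never exceed 180°, so a method that regularly confuses a tilt with its opposite produces a large RMS tilt error.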

From Fig. 9 it is clear that the perspective models outperform the orthographic models. As noted in the previous section this is particularly clear for tilt estimation for which RMS errors are around 100° in the orthographic case.

Two 2-factor (projection model × decision rule) ANOVAs, one for slant and one for tilt estimation, were conducted. For slant estimation, there was a significant main effect of both projection model (F(1, 9) = 53926.4, P < 0.0001) and decision rule (F(1, 9) = 24640.2, P < 0.0001) and an interaction between these factors (F(1, 9) = 22388.8, P < 0.0001). Repeated measures t tests were also conducted to assess the differences between the four conditions containing one common factor level. The results are presented in Table 1. Note that for both the MAP and EXP decision rules, the perspective model has significantly smaller error (although this improvement is not materially important in the case of the MAP decision rule). Note also that there is no significant difference in performance in the perspective condition between the MAP and EXP decision rules.

Fig. 9 Comparison of models using orthographic and perspective methods and the MAP and EXP decision rules. Mean RMS error in estimated a. slant and b. tilt over 10 replications. Error bars (which are smaller than the symbols in a.) represent 95% confidence intervals

[Figure 9 panels omitted. a. Slant Estimation and b. Tilt Estimation; horizontal axis: Model (Orthographic, Perspective); vertical axis: RMS Error (degrees); legend: EXP, MAP.]

Table 1 Results of four repeated measures t-tests undertaken to assess differences between estimated slants in conditions sharing one common component (decision rule or projection method)

Comparison                       t         df   P-value (2-tailed)
(ORTH, EXP) vs. (ORTH, MAP)      363.861   9    P < 0.0001
(PERSP, EXP) vs. (PERSP, MAP)    1.639     9    P = 0.1356
(ORTH, EXP) vs. (PERSP, EXP)     233.179   9    P < 0.0001
(ORTH, MAP) vs. (PERSP, MAP)     4.644     9    P = 0.0012

Table 2 Results of four repeated measures t-tests undertaken to assess differences between estimated tilts in conditions sharing one common component (decision rule or projection method)

Comparison                       t         df   P-value (2-tailed)
(ORTH, EXP) vs. (ORTH, MAP)      −8.289    9    P < 0.0001
(PERSP, EXP) vs. (PERSP, MAP)    −2.252    9    P = 0.0508
(ORTH, EXP) vs. (PERSP, EXP)     27.487    9    P < 0.0001
(ORTH, MAP) vs. (PERSP, MAP)     31.777    9    P < 0.0001
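The repeated measures t statistic used in these comparisons can be sketched as below. This is an illustrative implementation using only the Python standard library. The function name and the two samples are hypothetical and are not values from the simulations. The P-value is omitted because it requires the t distribution, which the standard library does not provide.

```python
import math
import statistics

def paired_t(a, b):
    """Repeated measures (paired) t statistic and degrees of freedom."""
    if len(a) != len(b):
        raise ValueError("samples must be the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return mean_d / (sd_d / math.sqrt(n)), n - 1

# Hypothetical RMS slant errors (degrees) from 10 replications of two models.
model_a = [30.1, 29.8, 30.4, 30.0, 29.9, 30.2, 30.3, 29.7, 30.0, 30.1]
model_b = [14.9, 15.2, 15.0, 15.1, 14.8, 15.3, 15.0, 14.9, 15.1, 15.0]

t_stat, df = paired_t(model_a, model_b)
```

With 10 paired replications the test has 9 degrees of freedom, which matches the df column in the tables.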

For tilt estimation there was a significant main effect of both projection model (F(1, 9) = 1054.4, P < 0.0001) and decision rule (F(1, 9) = 37.124, P < 0.0002) and an interaction between these factors (F(1, 9) = 15.7, P < 0.005). Similar to the analysis for slant, four repeated measures t tests were also conducted and the results are presented in Table 2. The results clearly indicate that the perspective model is better than the orthographic model and that this improvement is seen irrespective of whether the MAP or EXP decision rule is used.

Figure 10 shows the distributions of absolute errors in estimation of slant and tilt for each of the 4 models discussed. Error frequencies were calculated over the 20 world slant and tilt pairs used in previous simulations and over 100 trials (i.e., 2,000 errors in total). The distributions of errors in slant estimation are shown in Fig. 10a. These are similar across models except for the model using the orthographic method and the EXP decision rule (bottom left panel) which tends to produce larger errors in recovered slant (this is expected from Fig. 5c). The distributions of errors in tilt estimation are shown in Fig. 10b. Note that the two models using the orthographic method have peaks in their error distributions at 0° and 180° due to the inherent ambiguity in tilt recovery under this scheme. The tilt error distributions for the two perspective projection models also have a peak at zero but, in contrast to the orthographic case, the likelihood of obtaining an error in tilt estimation of 180° is greatly reduced.
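A histogram of absolute tilt errors of this kind could be built as in the following sketch. The bin width, the folding of errors into the range 0° to 180°, and the example estimates are all assumptions made for illustration.

```python
from collections import Counter

def abs_tilt_error(est_deg, true_deg):
    """Absolute tilt error in degrees, folded into [0, 180]."""
    diff = abs(est_deg - true_deg) % 360.0
    return 360.0 - diff if diff > 180.0 else diff

def histogram(errors, bin_width=30.0, max_error=180.0):
    """Count errors per bin; an error equal to max_error falls in the last bin."""
    n_bins = int(max_error / bin_width)
    counts = Counter(min(int(e // bin_width), n_bins - 1) for e in errors)
    return [counts.get(i, 0) for i in range(n_bins)]

# Hypothetical estimates for a world tilt of 45 degrees: some trials recover the
# correct tilt, others land on the opposite peak of the posterior.
true_tilt = 45.0
estimates = [44.0, 47.0, 40.0, -135.0, -130.0, 50.0, 225.0, -140.0]
errors = [abs_tilt_error(e, true_tilt) for e in estimates]
counts = histogram(errors)
```

For these made-up estimates the counts fall only in the first and last bins, which is the two-peak shape described for the orthographic models.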

Taken together, the data presented in Figs. 9 and 10 clearly indicate that the perspective projection method outlined in this article performs better than the orthographic method.


Fig. 10 Comparison of absolute errors in slant and tilt obtained using the orthographic and perspective methods and the MAP and EXP decision rules. a. Histogram of absolute slant errors. b. Histogram of absolute tilt errors

[Figure 10 panels omitted. a. Slant Error (Degrees), 0 to 90, against Frequency. b. Tilt Error (Degrees), 0 to 180, against Frequency. Each part has four panels: (ORTH, MAP), (PERSP, MAP), (ORTH, EXP), (PERSP, EXP).]

4 Discussion

4.1 Summary

We have derived a new model for the recovery of planar surface slant and tilt under perspective projection of oriented line segments. The model is a generalisation of the orthographic method presented in Witkin (1981) and relies upon assumptions of isotropy and homogeneity of texture elements on the plane. The new perspective model was compared to the Witkin (1981) model in a slant and tilt estimation task for a range of simulated surface pose parameters. The new perspective method outperformed the Witkin model and in particular it did not suffer from the ambiguity in surface tilt due to the orthographic projection process. An expected value decision rule rather than the MAP solution was also investigated. When the perspective projection method developed in this article was used, the model incorporating the expected value decision rule was found to have some similarities with human slant estimation in a human behavioural study (see supplementary materials).

4.2 Why are there errors?

Although the methods implemented produced reasonable estimates of slant and tilt (using either decision rule), there are still large and systematic deviations from the true value of surface orientation. It should, however, be noted that the performance is rather good bearing in mind that we are only considering the orientation of markings on the surface. There are many other sources of texture information available in the scene, as noted in the introduction (e.g., the density, size etc.). Adding further independent likelihood functions for these information sources would lead to smaller errors in estimated surface orientation.

4.3 Orthographic vs. perspective method

The new model based on perspective projection was shown to produce a more accurate approximation of surface pose than that of Witkin (1981). The results shown in Figs. 5a, b and 7a, b for the orthographic and perspective methods, respectively, and using the MAP decision rule, indicate that the perspective method is superior in terms of recovery of pairs of world slant and tilt values. In particular the perspective method is greatly superior in terms of estimation of tilt since the orthographic method regularly recovers tilt with a 180° error (Fig. 10).

In addition, the perspective method is also shown to outperform the orthographic method when a spherical expectation decision rule was implemented (Figs. 5c, d; 7c, d). Using this decision rule, the orthographic method greatly underestimates slant (in fact recovered slant is close to zero for values of world slant less than around 45° since the expectation lies approximately between the two peaks). Furthermore, the recovered tilt is still subject to systematic errors of around 180° caused by the inherent ambiguity present in the assumption of orthographic projection (Fig. 10b). In contrast, for the perspective method the spherical expectation decision rule produces a less pronounced underestimate of slant and accurate estimates of tilt. These findings are summarised in Figs. 9 and 10 which compare the four models implemented directly.

The results of the experiment undertaken in the supplementary materials indicate that human observers appear to be sensitive to the information arising from the perspective projection of orientated line segments. In particular, we found evidence for a significant increase in perceived slant when image orientations were defined by perspective orientation relative to orthographic projection (see supplemental materials). Consequently, we suggest that models of human surface pose recovery should incorporate a perspective projection process.


4.4 MAP vs. EXP decision rule

The MAP decision rule has the advantageous property that it minimises the expected error of the estimate. In this sense it is the optimal decision rule for our estimator to implement. However, it has been suggested that when human observers are faced with similar estimation problems and there is no explicit cost involved in making an incorrect decision then it is more appropriate to model the response by sampling from the posterior distribution (Mamassian and Landy 1998; Mamassian et al. 2002). In this case the spherical expectation should reflect the mean of the observer's responses.
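The relation between posterior sampling and the expectation can be illustrated with a short simulation. It is a sketch under stated assumptions: the three-hypothesis posterior is invented, the slant and tilt to surface normal convention is assumed, and the mean response is taken to be the direction of the average sampled normal.

```python
import math
import random

def normal(slant_deg, tilt_deg):
    """Unit surface normal for a slant/tilt pair (assumed convention)."""
    s, t = math.radians(slant_deg), math.radians(tilt_deg)
    return (math.sin(s) * math.cos(t), math.sin(s) * math.sin(t), math.cos(s))

def mean_direction(pairs, weights=None):
    """(slant, tilt) in degrees of the weighted mean normal of (slant, tilt) pairs."""
    weights = weights or [1.0] * len(pairs)
    mx = my = mz = 0.0
    for (slant, tilt), w in zip(pairs, weights):
        x, y, z = normal(slant, tilt)
        mx, my, mz = mx + w * x, my + w * y, mz + w * z
    return (math.degrees(math.atan2(math.hypot(mx, my), mz)),
            math.degrees(math.atan2(my, mx)))

# Hypothetical discretised posterior over three (slant, tilt) hypotheses.
hypotheses = [(20.0, 0.0), (40.0, 0.0), (60.0, 0.0)]
posterior = [0.2, 0.5, 0.3]

# MAP rule: always report the single most probable hypothesis.
map_estimate = hypotheses[posterior.index(max(posterior))]

# Sampling rule: each simulated response is one draw from the posterior.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
responses = rng.choices(hypotheses, weights=posterior, k=5000)
mean_response = mean_direction(responses)

# Expectation computed directly from the posterior weights, for comparison.
exact = mean_direction(hypotheses, posterior)
```

The mean of the sampled responses approaches the directly computed expectation as the number of draws grows, whereas the MAP estimate stays at the most probable hypothesis.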

Furthermore, it is possible to process slant and tilt independently to recover the MAP solution (i.e., first find the tilt at which the posterior is maximised and then examine all values of slant for this tilt value to find the MAP solution). In contrast the EXP solution cannot be recovered in this way since it requires non-independent processing of slant and tilt. Consequently, unlike the EXP solution, use of a MAP solution is not consistent with the findings of Seyama et al. (2000). In this study, the authors tested whether adapting to slanted surfaces depended on the tilt difference between the adapting and test surfaces. They found that the slant aftereffect did in fact depend on tilt difference, suggesting non-independent processing of slant and tilt.
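One reading of the two-stage MAP search described above is sketched here on a gridded posterior. The toy posterior, the grid spacing and the variable names are hypothetical. The sketch interprets "the tilt at which the posterior is maximised" as the tilt whose best value over slant is largest.

```python
import math

def toy_posterior(slant, tilt):
    """Hypothetical unnormalised posterior with one peak at slant 50, tilt 90."""
    return math.exp(-((slant - 50.0) / 15.0) ** 2 - ((tilt - 90.0) / 40.0) ** 2)

slants = list(range(0, 91, 5))
tilts = list(range(-180, 180, 15))
grid = {(s, t): toy_posterior(s, t) for s in slants for t in tilts}

# Stage 1: for each tilt, take the highest posterior value over all slants,
# then keep the tilt whose best value is largest.
best_tilt = max(tilts, key=lambda t: max(grid[(s, t)] for s in slants))

# Stage 2: search over slant only, with tilt fixed at the stage 1 value.
best_slant = max(slants, key=lambda s: grid[(s, best_tilt)])

# Joint search over the full grid, for comparison.
joint_map = max(grid, key=grid.get)
```

On this grid the two-stage search and the joint search return the same slant and tilt pair. An expectation, by contrast, needs the posterior weight at every slant and tilt combination and so cannot be split into two one-dimensional searches.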

Commonly, human observers tend to underestimate surface slant when only texture information is present (e.g., see Gibson 1950; Braunstein 1968). As shown in Fig. 7 (and see Fig. S2 in supplemental materials), results obtained using the perspective method with a spherical expectation decision rule exhibit the moderate underestimation of slant associated with human observers. As a consequence this scheme may be able to explain some of this systematic underestimation.

4.5 Invalid assumptions for human observers?

The key assumption at the heart of both the orthographic and perspective method maximum likelihood estimators is that surface texture orientations are isotropic. Indeed there is some evidence that human observers make this assumption (Rosenholtz and Malik 1997; Knill 2003; Warren and Mamassian 2003); however, the assumption is clearly violated when, for example, one looks at a field of grass. Clearly, in such a situation the predominant orientation is vertical. This assumption, then, may seem inappropriate for a model of human slant recovery. Knill (2003), however, has proposed and provided evidence for a framework in which prior assumptions such as isotropy and homogeneity are ‘switched off’ by human observers when there is sufficient information in the image to suggest that the assumption is invalid. This results in a more general (less constrained) analysis of the world. Consequently, the fact that we can find examples of surfaces which violate key assumptions of the model presented here does not necessarily rule out its potential contribution to the description of human slant recovery.

References

Blake A, Marinos C (1990) Shape from texture: estimation, isotropy and moments. Artif Intell 45:323–380

Blake A, Bülthoff HH, Sheinberg DL (1993) Shape from texture: ideal observers and human psychophysics. Vis Res 33:1723–1737

Brady M, Yuille A (1984) An extremum principle for shape from contour. IEEE Trans Pattern Anal Mach Intell 6:288–301

Braunstein ML (1968) Motion and texture as sources of slant information. J Exp Psychol 78:247–253

Buckley D, Frisby JP, Blake A (1995) Does the human visual system implement an ideal observer theory of slant from texture? Vis Res 36:1163–1176

Eagle RA, Hogervorst MA (1999) The role of perspective information in the recovery of 3D structure-from-motion. Vis Res 39:1713–1722

Gårding J (1993a) Direct estimation of shape from texture. IEEE Trans Pattern Anal Mach Intell 15:1202–1208

Gårding J (1993b) Shape from texture and contour by weak isotropy. J Artif Intell 64:243–297

Gibson JJ (1950) The perception of the visual world. Houghton Mifflin, Boston

Gruber HE, Clark WC (1956) Perception of slanted surfaces. Percept Mot Skills 6:97–106

Hogervorst MA, Eagle RA (2000) The role of perspective effects and accelerations in perceived three-dimensional structure-from-motion. J Exp Psychol Hum Percept Perform 26:934–955

Horn BKP, Brooks MJ (1989) Shape from shading. MIT Press, Cambridge, Mass

Kanatani K (1984) Detection of surface orientation and motion from texture by a stereological technique. Artif Intell 23:213–237

Knill DC (1998) Surface orientation from texture: ideal observers, generic observers and the information content of texture cues. Vis Res 38:1655–1682

Knill DC (2003) Mixture models and the probabilistic structure of depth cues. Vis Res 43:831–854

Koenderink JJ, van Doorn AJ, Kappers AML (1992) Surface perception in pictures. Percept Psychophys 52:487–496

Li A, Zaidi Q (2000) Perception of three-dimensional shape from texture is based on patterns of oriented energy. Vis Res 40:217–242

Malik J, Rosenholtz R (1997) Computing local surface orientation and shape from texture for curved surfaces. Int J Comput Vis 23:149–168

Mamassian P, Landy MS (1998) Observer biases in the 3D interpretation of line drawings. Vis Res 38:2817–2832

Mamassian P, Landy MS, Maloney LT (2002) Bayesian modelling of visual perception. In: Rao R, Olshausen B, Lewicki M (Eds.) Probabilistic models of the brain: perception and neural function. MIT Press, Cambridge, MA, pp. 13–36

Mardia KV, Jupp PE (1999) Directional statistics, 2nd ed. Wiley, Chichester

Mingolla E, Todd JT (1986) Perception of solid shape from shading. Biol Cybern 53:137–151

Rosenholtz R, Malik J (1997) Surface orientation from texture: isotropy or homogeneity (or both)? Vis Res 37:2283–2293

Seyama J, Takeuchi T, Sato T (2000) Tilt dependency of slant aftereffect. Vis Res 40:349–357


Stone JV (1993) Shape from local and global analysis of texture. Phil Trans R Soc Lond B 339:53–65

Velisavljevic L, Elder JH (2006) Texture properties affecting the accuracy of surface attitude judgements. Vis Res 46:2166–2191

Warren PA, Mamassian P (2003) The dependence of slant perception on texture orientation statistics. J Vis, 3, abstract 847

Witkin AP (1981) Recovering surface shape and orientation from texture. Artif Intell 17:17–45
