
International Journal of Computer Vision 28(3), 261-277 (1998). © 1998 Kluwer Academic Publishers. Manufactured in The Netherlands.

Estimating the Focus of Expansion in Analog VLSI

IGNACIO S. MCQUIRK*
Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139
[email protected]

BERTHOLD K.P. HORN
Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139
[email protected]

HAE-SEUNG LEE
Microsystems Technology Laboratories, Massachusetts Institute of Technology, Cambridge, MA 02139
[email protected]

JOHN L. WYATT, JR.
Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139
[email protected]

Received October 9, 1996; Revised September 8, 1997; Accepted September 9, 1997

Abstract. In the course of designing an integrated system for locating the focus of expansion (FOE) from a sequence of images taken while a camera is translating, a variety of direct motion vision algorithms based on image brightness gradients have been studied (McQuirk, 1991, 1996b). The location of the FOE is the intersection of the translation vector of the camera with the image plane, and hence gives the direction of camera motion. This paper describes two approaches that appeared promising for analog very large scale integrated (VLSI) circuit implementation. In particular, two algorithms based on these approaches are compared with respect to bias, robustness to noise, and suitability for realization in analog VLSI. From these results, one algorithm was chosen for implementation. This paper also briefly discusses the real-time analog CMOS/CCD VLSI architecture realized in the FOE chip.

Keywords: focus of expansion, motion vision, passive navigation, analog VLSI

1. Introduction

In recent years, some attention has been given to the potential use of custom analog VLSI chips for early vision processing problems such as optical flow (Tanner

Funding for this work was provided by the National Science Foundation and DARPA under contracts MIP-8814612 and MIP-9117724. Ignacio S. McQuirk was supported by an NSF graduate fellowship and a Cooperative Research Fellowship from AT&T Bell Labs.
*Present address: Maxim Integrated Products, Sunnyvale, CA 94086.

and Mead, 1986), smoothing and segmentation (Yang and Chiang, 1990; Keast and Sodini, 1993), orientation (Standley, 1991), depth from stereo (Hakkarainen and Lee, 1993), edge detection (Dron, 1993) and alignment (Umminger and Sodini, 1995). The key features of early vision tasks such as these are that they involve performing simple, low-accuracy operations at each pixel in an image or pair of images, typically resulting in a low-level description of a scene useful for higher level vision. This type of processing is often well suited to implementation in analog VLSI, resulting in compact,


high speed, and low power solutions. Through a close coupling of processing circuitry with image sensors, these chips can exploit the inherent parallelism often exhibited by early vision algorithms, allowing for an efficient match between form and function. This paper details some of the algorithms as well as the analog VLSI implementation developed in the application of this focal-plane processing approach to the early vision task of passive navigation.

An important goal of motion vision is to estimate the 3D motion of a camera based only on the measured time-varying images. Traditionally, there have been two basic approaches to this problem. In feature-based methods, an estimate of motion and scene structure is found by establishing the correspondence of prominent features such as edges, lines, etc., in an image sequence (Jain, 1983; Dron, 1993). In motion field-based methods, the optical flow (Horn and Schunk, 1981) is used to approximate the projection of the three-dimensional motion vectors onto the image plane, and from this an estimate of camera motion and scene depth can be found (Bruss and Horn, 1983). Both the optical flow calculation and the correspondence problem have proven to be difficult in terms of reliability and, more importantly, implementation. In keeping with our paradigm of local, low-level, parallel computation, we have explored methods which directly utilize image brightness information to recover motion (Horn, 1990; McQuirk, 1991).

The introduction of the focus of expansion (FOE) for the case of pure translation simplifies the general motion problem substantially. The FOE is the intersection of the translation vector of the camera with the image plane. This is the image point towards which the camera is moving, as shown pictorially in Fig. 1. With a positive component of velocity along the optic axis, image features will appear to move away from the FOE and expand, with those closer to the FOE moving slowly and those further away moving more rapidly. Through knowledge of the camera parameters, the FOE gives the direction of 3D camera translation. Once the location of the FOE has been ascertained, we can estimate distances to points in the scene being imaged. While there is an ambiguity in scale, it is possible to calculate the ratio of distance to speed. This allows one to determine the time-to-impact between the camera and objects in the scene. Applications for such a device include the control of moving vehicles, systems warning of imminent collision, obstacle avoidance in mobile robotics, and aids for the blind.

Figure 1. Illustration of the passive navigation scenario, showing the definition of the focus of expansion as the intersection in the camera frame of reference of the camera velocity vector with the image plane.

A variety of direct methods for estimating the FOE were explored for implementation in analog VLSI; two of the more promising algorithms considered are presented in this paper. We chose one for actual realization in an integrated system, and the architecture used for this FOE chip will also be described.

2. The Brightness-Change Constraint Equation

The brightness-change constraint equation (BCCE) forms the foundation of various algorithms for rigid body motion vision (Negahdaripour and Horn, 1987a; Horn and Weldon, 1988) and is also the basis for the variants that we have explored for potential implementation in analog VLSI. This equation relates the observed brightness gradients in the image with the motion of the camera and the depth map of the scene. It is derived from the three basic assumptions:

• A pin-hole model of image formation.
• Rigid body motion in a fixed environment.
• Instantaneously constant scene brightness.

We proceed by formalizing each assumption in turn.


Figure 2. Viewer-centered coordinate system and perspective projection.

A Pin-Hole Model of Image Formation

Following (Bruss and Horn, 1983; Negahdaripour and Horn, 1987a; Horn and Weldon, 1988), a viewer-based coordinate system with a pin-hole model of image formation is adopted as depicted in Fig. 2. A world point

$$\mathbf{R} = (X, Y, Z)^T \qquad (1)$$

is mapped to an image point

$$\mathbf{r} = (x, y, f)^T \qquad (2)$$

using a ray passing through the center of projection placed at the origin of the coordinate system. The image plane Z = f, where f is the principal distance, is positioned in front of the center of projection for convenience. The optic axis is the perpendicular from the center of projection to the image plane and is parallel to the Z-axis. The x- and y-axes of the image plane are also parallel to the X- and Y-axes and emanate from the principal point (0, 0, f) in the image plane.

The world point R and the image point r are geometrically related by the perspective projection equation (Horn, 1986):

$$\mathbf{r} = f \, \frac{\mathbf{R}}{\mathbf{R} \cdot \hat{\mathbf{z}}} \qquad (3)$$

Rigid Body Motion in a Fixed Environment

Assuming that the camera moves relative to a fixed environment with translational velocity t = (t_x, t_y, t_z)^T and rotational velocity ω = (ω_x, ω_y, ω_z)^T, the motion of a world point R relative to the camera satisfies:

$$\frac{d\mathbf{R}}{dt} = -\mathbf{t} - (\boldsymbol{\omega} \times \mathbf{R}) \qquad (4)$$

Instantaneously Constant Scene Brightness

A common method used to relate the apparent motion of image points to the measured brightness E(x, y) is through the constant brightness assumption. We assume that the brightness of a surface patch remains constant as the camera moves, implying that the total derivative of brightness is zero:

$$\frac{dE}{dt} = E_t + \nabla E \cdot \frac{d\mathbf{r}}{dt} = 0 \qquad (5)$$

where

$$E_t = \frac{\partial E}{\partial t}, \qquad \nabla E = (E_x, E_y, 0)^T = \left(\frac{\partial E}{\partial x}, \frac{\partial E}{\partial y}, 0\right)^T$$

In practice, the constant brightness assumption has been shown to be valid for a large class of image sequences (Horn and Schunk, 1981).

By differentiating the perspective projection equation and substituting both the rigid body and constant brightness assumptions, we find the general brightness-change constraint equation (BCCE) for both translation and rotation:

$$E_t + \frac{\mathbf{v} \cdot \boldsymbol{\omega}}{f} + \frac{\mathbf{s} \cdot \mathbf{t}}{\mathbf{R} \cdot \hat{\mathbf{z}}} = 0 \qquad (6)$$

The s and v vectors are strictly properties of the image brightness gradients along with the x and y position in the image:

$$\mathbf{s} = \begin{pmatrix} -f E_x \\ -f E_y \\ x E_x + y E_y \end{pmatrix}, \qquad \mathbf{v} = \begin{pmatrix} f^2 E_y + y (x E_x + y E_y) \\ -f^2 E_x - x (x E_x + y E_y) \\ f (y E_x - x E_y) \end{pmatrix} \qquad (7)$$
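As a concrete check on these definitions, the following minimal Python/NumPy sketch evaluates s, v, and the left-hand side of the BCCE at a single image point. The function names and calling convention are our own illustration, not part of the original system.

```python
import numpy as np

def s_vector(x, y, Ex, Ey, f):
    """s from Eq. (7): a function of the spatial gradients and position only."""
    g = x * Ex + y * Ey
    return np.array([-f * Ex, -f * Ey, g])

def v_vector(x, y, Ex, Ey, f):
    """v from Eq. (7)."""
    g = x * Ex + y * Ey
    return np.array([f**2 * Ey + y * g,
                     -f**2 * Ex - x * g,
                     f * (y * Ex - x * Ey)])

def bcce_residual(Et, x, y, Ex, Ey, f, t, omega, Z):
    """Left-hand side of the BCCE, Eq. (6); zero for data consistent
    with the model. Z is the depth R . z-hat of the world point
    imaged at (x, y)."""
    s = s_vector(x, y, Ex, Ey, f)
    v = v_vector(x, y, Ex, Ey, f)
    return Et + np.dot(v, omega) / f + np.dot(s, t) / Z
```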

In order to investigate the case for translation only, we set the camera rotation ω to zero and define the FOE, r_0, as the intersection of the translational velocity vector t with the image plane. Using the perspective projection equation, we find:

$$\mathbf{r}_0 = (x_0, y_0, f)^T = f \, \frac{\mathbf{t}}{t_z} \qquad (8)$$


Simplifying the constraint equation in this case results in:

$$E_t + \frac{t_z}{\mathbf{R} \cdot \hat{\mathbf{z}}} \left(\nabla E \cdot (\mathbf{r} - \mathbf{r}_0)\right) = 0 \qquad (9)$$

and rewriting this gives our final result for the BCCE under translation only:

$$\tau E_t + (x - x_0) E_x + (y - y_0) E_y = 0 \qquad (10)$$

The time-to-impact τ = Z/t_z is the ratio of the depth to the velocity parallel to the optic axis. This is a measure of the time until the plane parallel to the image plane and passing through the center of projection intersects the corresponding world point. The time-to-collision of the camera is the time-to-impact at the focus of expansion.
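Wherever E_t is nonzero, Eq. (10) can be inverted pointwise for τ. A minimal sketch (the function name and tolerance are our own):

```python
def time_to_impact(x, y, x0, y0, Ex, Ey, Et, eps=1e-9):
    """Solve Eq. (10) for tau = Z/t_z at a single pixel. Undefined
    where Et vanishes, i.e., at a stationary point."""
    if abs(Et) < eps:
        return None  # no depth information available here
    return -((x - x0) * Ex + (y - y0) * Ey) / Et
```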

Examining Eq. (10), we note that the time-to-impact τ is a function of x and y, while the FOE is a global parameter. Given a time-to-impact map, or assuming a special form of scene geometry such as a plane (Negahdaripour and Horn, 1986), the problem is overdetermined and we can recover the FOE using a least-squares minimization approach. However, we were interested in the more general case where the time-to-impact map and the motion are both unknown. In this situation, the problem is underdetermined and a more creative method must be found.

3. Two Algorithms Suitable for Analog VLSI

The paramount consideration that we used when examining algorithms for estimating the FOE was the feasibility of implementing them in analog VLSI. Adhering to the focal-plane processing approach in the analog domain necessitates algorithms which use low-level computations operating locally and in parallel on image brightness. Furthermore, in order to enhance the likelihood of actual implementation, any algorithm we propose must be exceedingly simple. The BCCE gives a useful low-level relationship between the location of the focus of expansion in the image plane and the observed variation of image brightness, and we would like to exploit this to estimate the FOE. Unfortunately, this relation also includes the unknown time-to-impact τ. In order to still use the BCCE without explicit knowledge of τ, two approaches were proposed. In the first approach, image points where the time variation of the brightness, E_t, is zero are identified. Ideally, the FOE would be at the intersection of the tangents to the iso-brightness contours at these so-called "stationary" points. In the second approach, the observation is made that when given an incorrect estimate of the location of the FOE, solving the BCCE for τ gives rise to depth estimates with incorrect sign. However, depth is positive, and thus an estimate for the FOE can be found which minimizes the number of negative depth values.

3.1. The Stationary-Points Algorithm

Image points where E_t = 0 provide important constraints on the direction of translation; they are referred to as stationary points (Horn and Weldon, 1988). With E_t = 0, the first term of the BCCE drops out and the constraint at the stationary points becomes one of orthogonality between the measured s and the translation vector t:

$$\mathbf{s} \cdot \mathbf{t} = 0$$

Previous approaches utilized these special constraints to estimate t directly, as opposed to finding the FOE. A least-squares minimization sum over the stationary points can be formed with an additional term utilizing a Lagrange multiplier to insure that the magnitude of t is normalized to unity. This normalization is necessary in order to account for the inherent scale factor ambiguity in t. The solution to this minimization problem is itself an eigenvector/eigenvalue problem: the estimate for t which minimizes the sum is the eigenvector corresponding to the smallest eigenvalue (Negahdaripour and Horn, 1987a; Horn and Weldon, 1988). Calculation of eigenvectors and eigenvalues in analog hardware is possible, but difficult (Horn, 1990).
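In software, by contrast, this eigenvector formulation is compact. A minimal NumPy sketch (our own illustration; the paper gives no code) of the unit-norm minimizer of the sum of (s · t)² over the stationary points:

```python
import numpy as np

def translation_from_stationary_points(S):
    """Estimate t up to scale from the s vectors measured at the
    stationary points, stacked as the rows of S. The minimizer of
    sum (s . t)^2 subject to ||t|| = 1 is the eigenvector of S^T S
    belonging to the smallest eigenvalue."""
    M = S.T @ S                # 3x3 symmetric moment matrix
    w, V = np.linalg.eigh(M)   # eigenvalues in ascending order
    return V[:, 0]             # unit eigenvector for the smallest one
```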

To find a solution more amenable to implementation, we can instead perform a similar minimization now in terms of the FOE, and this leads to a simple linear problem. The constraint at each stationary point becomes a line:

$$\nabla E \cdot (\mathbf{r} - \mathbf{r}_0) = (x - x_0) E_x + (y - y_0) E_y = 0$$

Figure 3 demonstrates the simple geometry of a stationary point. For illustrative purposes, we have constructed a Mondrian image consisting of a dark circular disk on a white background. As such, the image brightness gradient ∇E points everywhere outward from the disk. A stationary point occurs on the disk when the brightness gradient is perpendicular to the vector emanating from the FOE. The focus is located at the intersection of the tangents to the brightness gradient at these points.


Figure 3. An illustration of the simple geometry in the image plane associated with stationary points, using a Mondrian image consisting of a dark disk. The FOE is located at the intersection of the tangents at the stationary points.

Of course, in a real image these tangent lines will not precisely intersect. In such a case, we can then find a solution by minimizing the sum of the squares of the perpendicular distances from r_0 to these constraint lines, weighted by the norm of the brightness gradient ||∇E||:

$$\min_{\mathbf{r}_0} \sum_{\mathbf{r} \in I} W(E_t) \left(\nabla E \cdot (\mathbf{r} - \mathbf{r}_0)\right)^2 \qquad (12)$$

Here the sum is over the entire image I and a weighting function W(E_t) is used to allow only the contributions of those constraints that are considered to correspond to stationary points. Clearly, W(·) is constructed to weigh information more heavily at image points where E_t ≈ 0. The additional weighting by the brightness gradient magnitude is useful, because both noise and distortion reduce our confidence in the direction of ∇E when ||∇E|| is small.

The solution r_0 to this quadratic minimization problem satisfies the matrix equation:

$$\left(\sum_{\mathbf{r} \in I} W(E_t)\, \nabla E\, \nabla E^T\right) \mathbf{r}_0 = \sum_{\mathbf{r} \in I} W(E_t)\, \nabla E\, \nabla E^T\, \mathbf{r} \qquad (13)$$

Note that if the direction of the brightness gradient at the stationary points is uniformly distributed, then the condition number of the matrix premultiplying r_0 is near unity, and hence the solution of the matrix equation is well-behaved.

It is important to observe that the actual functional form of the weighting with E_t is not essential, as long as the weight is small for large E_t and large for small E_t. Thus, in practice we are able to use a simple function such as a cutoff on the absolute value of E_t:

$$W(E_t, \eta) = \begin{cases} 1 & \text{if } |E_t| < \eta \\ 0 & \text{otherwise} \end{cases} \qquad (14)$$

where the width of the function is set by the parameter η.
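Taken together, Eqs. (12)-(14) reduce to solving a 2 × 2 linear system. A minimal NumPy sketch of this estimator (the array names and pixel-coordinate convention are our own assumptions, not the chip's):

```python
import numpy as np

def foe_stationary_points(Ex, Ey, Et, eta):
    """Estimate the FOE by solving the 2x2 linear system of Eq. (13)
    with the cutoff weighting function of Eq. (14). Ex, Ey, Et are
    gradient images on a common pixel grid; assumes enough stationary
    points are selected so that the matrix A is nonsingular."""
    rows, cols = Ex.shape
    ys, xs = np.mgrid[0:rows, 0:cols].astype(float)
    W = (np.abs(Et) < eta).astype(float)        # Eq. (14)
    A = np.array([[np.sum(W * Ex * Ex), np.sum(W * Ex * Ey)],
                  [np.sum(W * Ex * Ey), np.sum(W * Ey * Ey)]])
    b = np.array([np.sum(W * (Ex * Ex * xs + Ex * Ey * ys)),
                  np.sum(W * (Ex * Ey * xs + Ey * Ey * ys))])
    return np.linalg.solve(A, b)                # estimate (x0, y0)
```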

Posing the problem in terms of the FOE leads to a simple linear equation, and it is this which is quite appealing for realization in analog VLSI. However, a drawback to this approach is that our estimate of the FOE relies on information garnered from the stationary points, and these points usually form a small subset of the overall image. The small number of points contributing to the solution, as well as the selection of these points via the weighting function, raises the question of noise immunity.

It is also important to note that the algorithm can fail if the range of τ is too large. The conflict here is in differentiating between stationary points which constrain the location of the FOE and stationary points due to distant backgrounds which do not. For example, all points along a horizon in the image have Z → ∞, resulting in E_t ≈ 0 even though they need not satisfy ∇E · (r − r_0) = 0. In such a situation, the solution will be strongly biased by these points, all of which will be selected by the weighting function to contribute to the solution. This difficulty, which is characteristic of all methods utilizing information obtained from stationary points, can certainly be alleviated using more sophisticated processing. However, this kind of higher-level understanding is outside our single-chip analog VLSI framework, and hence was eliminated from consideration.

3.2. The Depth-is-Positive Algorithm

The depth-is-positive approach was formulated in an attempt to address the problems associated with the stationary-points algorithm. This method for estimating the FOE is based on the idea that the depth calculated from the BCCE with the correct location of the FOE should be positive (Negahdaripour and Horn, 1987b). Since the BCCE only involves the ratio of depth to forward velocity, there is an overall sign


Figure 4. An illustration of the constraint provided by imposing positive depth at an image point. The tangent to the image brightness gradient divides the image plane into permissible and forbidden half-planes.

ambiguity since this velocity can be either positive or negative, and in the latter case the focus of expansion would become a focus of constriction. However, if we assume a priori that we have forward motion, then we can require that the estimated τ found by solving the BCCE be positive:

$$\mathrm{sign}(\tau) = \mathrm{sign}\left(E_t\, \nabla E \cdot (\mathbf{r}_0 - \mathbf{r})\right) > 0$$

Returning to our simple Mondrian image, Fig. 4 illustrates the constraint line found by imposing positive depth. For each image point, the tangent to the brightness gradient can be drawn. If the FOE estimate is placed on one side of this line, τ will evaluate positive, while on the other side it will evaluate negative. Hence, each image point constrains the FOE to lie in a permissible half-plane, and the true FOE must therefore lie in the region formed by the intersection of all the permissible half-planes of the points in the image. Each constraint provided by imposing positive depth is weaker than that provided by a stationary point. However, the depth-is-positive constraint applies at all image points with nonzero brightness gradient ∇E, not just a select few where E_t = 0, and this observation holds out the possibility that our solution can potentially rely on substantially more points and may be more robust as a consequence.
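The half-plane test itself is a one-line computation. A hypothetical sketch (ours, not the chip's circuitry) of the pointwise test:

```python
def in_permissible_half_plane(x, y, Ex, Ey, Et, x0, y0):
    """Depth-is-positive test at one pixel: the computed depth is
    positive iff sign(Et * grad E . (r0 - r)) > 0, per the text."""
    return Et * ((x0 - x) * Ex + (y0 - y) * Ey) > 0
```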

To cast the depth-is-positive constraint in terms of a minimization problem, we can formulate an error sum using only the sign of the calculated depth:

$$\min_{\mathbf{r}_0} \sum_{\mathbf{r} \in I} u\left(-\tau(\mathbf{r}_0)\right) = \min_{\mathbf{r}_0} \sum_{\mathbf{r} \in I} u\left(E_t\, \nabla E \cdot (\mathbf{r} - \mathbf{r}_0)\right) \qquad (15)$$

where u(t) is the unit step function. The solution to this problem attempts to find the location of the FOE that minimizes the number of image points which give negative depth values. This is a difficult problem to solve since the sum is not convex. To ameliorate this difficulty, we can attempt to include some convexity in addition to the sign information in the minimization sum. To motivate the form of this sum, we can make the following observation. If we use an incorrect value r_0' for the FOE and solve for the resulting τ' given by the BCCE, we find that:

$$\frac{\tau'}{\tau} = \frac{\nabla E \cdot (\mathbf{r} - \mathbf{r}_0')}{\nabla E \cdot (\mathbf{r} - \mathbf{r}_0)} \qquad (16)$$

Hence, not only can we get negative depth values for an incorrect FOE location, but they can also be large in magnitude. Thus it seems reasonable to augment the error sum of Eq. (15) to:

$$\min_{\mathbf{r}_0} \sum_{\mathbf{r} \in I} \left(E_t\, \tau'(\mathbf{r}_0)\right)^2 u\left(-\tau'(\mathbf{r}_0)\right) = \min_{\mathbf{r}_0} \sum_{\mathbf{r} \in I} \left(\nabla E \cdot (\mathbf{r} - \mathbf{r}_0)\right)^2 u\left(E_t\, \nabla E \cdot (\mathbf{r} - \mathbf{r}_0)\right), \qquad (17)$$

where not only do we attempt to minimize the number of negative depth values, but we also attempt to minimize their magnitudes as well. The explicit weighting with E_t in the left-hand side of Eq. (17) is intended to enhance noise robustness, since we naturally have a higher confidence in our measurements if E_t is large. However, for the case of a distant background, nonidealities such as camera noise will give small nonzero values of E_t for many pixels where in principle E_t = 0, and a substantial fraction of these will introduce erroneous terms into the right-hand side of Eq. (17). The conflict here is that in the low-noise case we especially value constraints from points where E_t = 0, as is evidenced by the stationary-points algorithm itself, but in the high-noise case small values of measured E_t may come from the distant background and give negative depth values inappropriately.

Note the partial similarity in form between the stationary-points Eq. (12) and the depth-is-positive


Eq. (17). Clearly, any even power in this sum would suffice to give the desired convexity. Hence, the choice of a quadratic is rather arbitrary and in fact is motivated solely by its simplicity, a necessary feature from our implementation standpoint.

To see that the sum in Eq. (17) is convex, recall that the sum of convex functions is itself a convex function (Fleming, 1965). Each term in (17) consists of the product of a rank-1 quadratic form on the plane and a function which takes the two values zero and unity, with the transition between zero and unity occurring along the line where the quadratic form vanishes. Thus, the graph of each term vanishes on one half-plane (or everywhere for the special case E_t = 0) and is quadratic on the other half-plane. Each term is therefore convex (though not strictly convex) and thus the sum must be as well. Note that each term is also continuously differentiable, and thus the sum is also continuously differentiable.

Though each term is not strictly convex, the sum usually is. To see this, note that the sum of a set of scalar functions with positive semidefinite Hessian matrices is a function with a Hessian matrix which is positive definite everywhere except at points where some nonzero vectors lie in the null space of the Hessian matrix of every term in the sum, generally a sparse or empty set of points.

Since the minimization sum is convex, any local minimum is a global minimum. Of course, the final estimate for the location of the FOE must be found iteratively, as there is no closed-form solution. This is not necessarily a drawback, since a system implementing this minimization can utilize an iterative feedback loop to find the solution.
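As a software sketch of such an iteration (a plain gradient descent of our own choosing, standing in for the chip's feedback loop), the convex sum of Eq. (17) and its gradient can be evaluated directly from the gradient arrays:

```python
import numpy as np

def foe_depth_is_positive(Ex, Ey, Et, r0=(0.0, 0.0), iters=200):
    """Minimize the convex sum of Eq. (17) by gradient descent. Each
    pixel contributes (grad E . (r - r0))^2 only where the computed
    depth would be negative, i.e., where Et * grad E . (r - r0) > 0."""
    rows, cols = Ex.shape
    ys, xs = np.mgrid[0:rows, 0:cols].astype(float)
    x0, y0 = r0
    step = 0.5 / np.sum(Ex * Ex + Ey * Ey)      # conservative, stable gain
    for _ in range(iters):
        d = (xs - x0) * Ex + (ys - y0) * Ey     # grad E . (r - r0)
        active = (Et * d > 0)                   # u(Et * grad E . (r - r0))
        gx = -2.0 * np.sum(active * d * Ex)     # d(sum)/dx0
        gy = -2.0 * np.sum(active * d * Ey)     # d(sum)/dy0
        x0 -= step * gx
        y0 -= step * gy
    return np.array([x0, y0])
```

Because the sum is convex and continuously differentiable, any such descent that converges reaches the global minimum.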

3.3. Algorithm Performance

In order to compare the performance of our two approaches, we took raw image data during a motion transient and processed it with each algorithm. Figure 5 shows a sample series of images taken from the 64 × 64 embedded imager on the FOE chip during a motion transient. A simple scene consisting of a grid of black disks on a white background was constructed, resulting in Mondrian-like images such as the ones that we have described. Camera motion was always forward towards the target, with the orientation of the camera

Figure 5. A sample motion sequence with the FOE placed in the upper-left hand corner, as indicated by the cross. Starting at the top-left, the sequence in the figure proceeds from left to right.


viewing direction relative to the motion set precisely via rotation stages. This allowed the placement of the FOE anywhere inside the field of view.

Of course, the mapping from the 3D motion produced in the lab to the resulting location of the FOE requires explicit knowledge of the camera parameters, most notably the location of the principal point in the image plane as well as the principal distance. These were obtained using an internal camera calibration technique based on rotation (Stein, 1993). Under pure rotation, the position of a point in the image after rotation depends only on the camera parameters and the location of the point in the image before rotation. In order to estimate these parameters, a series of images for various rotations of the camera about two independent axes were taken. Feature detection to locate the centers of the disks in the images was performed and the resulting correspondences from unrotated to rotated images noted. This correspondence information was then fed to the nonlinear optimization code of (Stein, 1993), which estimates the principal distance f, the location of the optic axis (c_x, c_y), the radial distortion parameters (K_1, K_2), and the axes of rotation used. Table 1 shows typical calibration parameters for the FOE chip found using this method.

Table 1. Summary of FOE chip camera calibration parameters.

Imaging parameter                Calibrated value
Principal distance f             79.86 pixels
Optic axis location c_x          31.33 pixels
Optic axis location c_y          33.39 pixels
Radial distortion K_1            5.96E-5/pixel^2
Radial distortion K_2            1.03E-8/pixel^4
Rotation axes                    0.86°

With these parameters, we can predict the location of the FOE in the image plane. A series of experiments was performed wherein the FOE was placed in a grid across the image plane and 24 frames of raw image data were acquired during each of the associated motion transients. Figure 6 shows the results of both the stationary-points algorithm and the depth-is-positive


Figure 6. Algorithmic results using real image data. The actual FOE was strobed over the image plane; its position is indicated by 'x'. The results of processing by the stationary-points algorithm are shown as 'o' and those of the depth-is-positive algorithm as '+'.


algorithm. First, centered differencing was used to estimate the brightness gradients E_x, E_y, and E_t. Of course, because we have Mondrian images, the regions where the brightness gradient ∇E is nonzero naturally occur only on the boundary of the disks in the image. In fact, the majority of the image has gradients near zero, and in order to prevent these from strongly biasing the solution, a threshold on the image brightness gradient E_x² + E_y² was used in practice to segment these out of the computation.
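A minimal sketch of this preprocessing (our own NumPy rendering, assuming the 2 × 2 × 2 cube-of-pixels differencing described in Section 4):

```python
import numpy as np

def gradients_2x2x2(E0, E1):
    """Estimate Ex, Ey, Et from two successive frames with first
    differences averaged over a 2x2x2 cube of pixels; one estimate
    is produced per cube."""
    def dx(E):
        return 0.5 * ((E[:-1, 1:] - E[:-1, :-1]) + (E[1:, 1:] - E[1:, :-1]))
    def dy(E):
        return 0.5 * ((E[1:, :-1] - E[:-1, :-1]) + (E[1:, 1:] - E[:-1, 1:]))
    Ex = 0.5 * (dx(E0) + dx(E1))
    Ey = 0.5 * (dy(E0) + dy(E1))
    dE = E1 - E0
    Et = 0.25 * (dE[:-1, :-1] + dE[:-1, 1:] + dE[1:, :-1] + dE[1:, 1:])
    return Ex, Ey, Et

def gradient_mask(Ex, Ey, thresh):
    """Segment out near-zero gradients with a threshold on Ex^2 + Ey^2."""
    return (Ex * Ex + Ey * Ey) > thresh
```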

For estimating the location of the FOE using the stationary-points algorithm, the solution of Eq. (13) obtained by matrix inversion was utilized, and for estimation using the depth-is-positive algorithm, an iterative Newton's method was employed. The locations predicted by the calibration are denoted by 'x', while the mean locations of the FOE estimate over the image sequences found using the stationary-points algorithm are shown as 'o' and the depth-is-positive algorithm as '+'. Both techniques produce good results near the optic axis. However, as the location of the FOE nears the image boundary, the error in the estimation increases dramatically. A variety of effects come into play when the FOE is near the image boundary, typically resulting in a deviation of the estimate towards the center.

In the stationary-points algorithm, the simple constant threshold of the weighting function biases the solution to favor constraint lines due to stationary points near the FOE. The purpose of this function is to select pixels about each stationary point to contribute to the solution, and hence these pixels form bands about the stationary points in the image. In general, the overall range of the time derivative E_t increases the further away from the FOE a feature is. However, E_t is still zero when ∇E · (r − r_0) = 0. Thus, E_t must go through zero more rapidly for the more distant stationary points. This effect is further exacerbated by the foreshortening of image features away from the FOE. As a result, the cutoff function selects fewer points for inclusion in a band the further away from the FOE its corresponding stationary point is. Hence, bands nearby the FOE have more pixels contributing to the error sum than bands further away, and thus are effectively given more weight.

Noise will alter the directions of the measured image gradient, and this angular error results in a rotation of the constraint line. For a given angular error in the gradient, the further away from the FOE the stationary point is, the larger the resulting distance between the FOE and the constraint line. Hence, the constraint information provided by distant stationary points is more noise sensitive than that given by nearby ones, and so the increased weighting of constraints from nearby stationary points is in fact a desirable effect.

The most significant contribution to the observed bias in the stationary-points algorithm is due to these bands. Each stationary point by itself should only provide a 1D constraint. However, inclusion of points in a band about a stationary point augments this constraint. Figure 7 shows the simple geometry of a band about the stationary point.

Overall, we would ideally like each band to behave as a single constraint line given by its stationary point; only distance perpendicular to this line would contribute to the error sum. However, each point in the band provides a constraint line, and clearly the least-squares solution, when we neglect the contributions of the other bands in the image, falls inside the region bounded by the band itself and the constraint lines


Figure 7. Simple geometry of a band about a stationary point.


provided by the points at the band edges. In effect, each band not only penalizes perpendicular distance to the effective constraint line as desired, but also the distance away from the stationary point along the constraint line. In practice, the image data had bands placed evenly about the center of the image, and so the attraction of the bands tended to cancel when the FOE was near the center. When the FOE is placed away from the center, the bands are no longer evenly placed around the FOE due to the finite extent of the image, and their combined attraction draws the solution towards the center.

The depth-is-positive algorithm also displays band-like structure. The original conception of this approach was to rely on data away from the stationary points to form a solution, in order to enhance robustness with respect to noise and distant backgrounds. This turns out not to be the case, as stationary points are indeed crucial to the depth-is-positive algorithm as well. If we use a test location r_0' for the FOE which is different than the true location and observe for what r negative depth values actually occur, we find that they cluster about the stationary points in bands, as shown in Fig. 8. As the test location approaches that of the actual FOE, the widths of these bands shrink about the stationary points. In practice, as the noise power in the image increases, so does the effective size of the bands and hence the overall bias.

We would like to make a quantitative comparison of the performance of the two approaches in order to choose one for implementation. From a circuit design standpoint, and given the architecture chosen for the


Figure 8. When a test FOE differing from the true FOE is used to calculate τ, the resulting negative depth values are clustered around the stationary points.

FOE chip, the complexity in terms of transistor count for the two methods turned out to be roughly the same, and thus was insufficient to choose one or the other for realization in hardware. Comparing the two algorithms solely using the data of Fig. 6 can be misleading. For the stationary-points algorithm, larger widths in the weighting function lead to more data contributing to the overall solution and hence increased robustness. On the other hand, we have seen that the bands selected around the stationary points by the weighting function with larger widths lead to more bias. Thus the selection of the width of the function embodies a tradeoff between robustness and accuracy. In order to examine this tradeoff more quantitatively, synthetic images were generated which closely match the measured ones, so that we could explicitly corrupt the images E with additive white Gaussian noise n to get the resulting noisy images E':

$$E' = E + n \qquad (18)$$

We define the signal-to-noise ratio SNR as

$$\mathrm{SNR} = 10 \log_{10}\left(\frac{\sigma_E^2}{\sigma_n^2}\right) \qquad (19)$$

where we have used the sample variance. The performance metric we constructed for purposes of comparison penalizes both bias in the solution as well as degradation due to noise. In practice, we summed the squared error between the predicted location found by the calibration technique and the mean location found by each algorithm, along with the noise variance in the solution. Since the bias is spatially dependent, we averaged the results over the 81 locations in the image plane of the FOE used in our experiments, resulting in an overall metric δ intended to quantify algorithm performance:

$$\delta = \frac{1}{|L|} \sum_{\mathbf{r}_0 \in L} \left( \left\| \bar{\hat{\mathbf{r}}}_0 - \mathbf{r}_0 \right\|^2 + \sigma_{\hat{\mathbf{r}}_0}^2 \right) \qquad (20)$$

where L is the set of FOE placements used in the experiments.

For the stationary-points algorithm, δ is obviously a function of both the signal-to-noise ratio and the weighting function width η. However, δ shows a marked minimum with respect to η, and hence we can find the optimal width to use in practice. Figure 9 shows δ for the depth-is-positive algorithm and the optimal δ for the stationary-points algorithm, both as a function of SNR.
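The noise-injection and scoring procedure is straightforward to express in software. A hedged sketch of our reading of Eqs. (18)-(20); the function names and data layout are our own:

```python
import numpy as np

def add_noise_at_snr(E, snr_db, rng=np.random.default_rng(0)):
    """Corrupt an image with additive white Gaussian noise (Eq. 18),
    scaled so that Eq. (19) yields the requested SNR in dB."""
    sigma2 = np.var(E) / (10.0 ** (snr_db / 10.0))
    return E + rng.normal(0.0, np.sqrt(sigma2), E.shape)

def performance_metric(estimates_by_location):
    """Eq. (20): average, over the FOE placements, of the squared bias
    of the mean estimate plus the variance of the estimates. The input
    maps each true FOE location (x0, y0) to an array of its estimates."""
    total = 0.0
    for r0, est in estimates_by_location.items():
        est = np.asarray(est, dtype=float)
        bias2 = np.sum((est.mean(axis=0) - np.asarray(r0)) ** 2)
        var = np.sum(est.var(axis=0))
        total += bias2 + var
    return total / len(estimates_by_location)
```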


Figure 9. Algorithmic results using synthetic image data, comparing the performance of the stationary-points algorithm (x) versus the depth-is-positive algorithm (+).

One important feature that one should note from these two curves is that δ does not go to zero even in the limit of noise-free image data. The reason for this is two-fold. Bias always remains, especially with the FOE near the image edge, even in the absence of noise. Furthermore, the imaging and finite-differencing process itself results in a variation in the estimated image gradient directions during a motion transient. The contribution due to this variation is always present, and can be thought of as an equivalent "noise" source.

For large signal-to-noise ratios, the optimal stationary-points behavior is slightly better than depth-is-positive, while for small signal-to-noise ratios the converse is true. Overall, the curves appear markedly similar, and do not in and of themselves provide a definitive means for choosing one algorithm over the other for implementation. However, due to the flexibility in explicitly setting the accuracy versus noise robustness tradeoff, and because typical SNRs observed from the FOE chip were in the 40 dB range, the stationary-points algorithm was chosen for implementation.

4. The FOE Chip

Having chosen the stationary-points algorithm for implementation, one could design a system to calculate the individual elements of the 2 × 2 linear system of Eq. (13) and solve for the location of the FOE off-chip by matrix inversion. This would require the on-chip calculation of five complex quantities over the entire image, and this makes such an approach prohibitively expensive. Instead, we can design a system to estimate the location of the FOE using a feedback technique such as gradient descent. By using this kind of approach, we can trade off the complexity of the required circuitry with the time required to perform the computation.

Given a convex error function f(a) of a parameter vector a = (a_0, ..., a_{k−1})^T, we can minimize f via the set of differential equations:

$$\frac{d\mathbf{a}}{dt} = -\beta\, \nabla_{\mathbf{a}} f(\mathbf{a}) \qquad (21)$$

where β is a positive definite matrix. This system of differential equations will relax to a local minimum of f(a) if one exists.


The stationary-points minimization sum of Eq. (12) is convex. In fact, unless all the ∇E terms with nonvanishing W are exactly parallel, the sum is strictly convex and hence cannot have any local minimum other than the global minimum. Applying gradient descent to our particular problem results in:

$$\frac{d\mathbf{r}_0}{dt} = \beta \sum_{\mathbf{r} \in I} W(E_t, \eta)\, \nabla E\, \nabla E^T (\mathbf{r} - \mathbf{r}_0) \qquad (22)$$

To implement this system, one could use the approach of (Tanner and Mead, 1986) and design a pixel-parallel chip consisting of an n × n array of analog processors. With a photo-transistor as the imaging device, each processor would estimate the brightness gradient in time using a differentiator, and the brightness gradient in space using finite differencing with adjacent pixel processors. Based on these measured image gradients, the processor at position (x, y) in the array would calculate two currents proportional to the term inside the summation of Eq. (22). Each processor then injects these currents into global busses for the voltages x_0 and y_0, respectively, thereby accomplishing the required summation over the entire image. For a capacitor, the derivative of the voltage is proportional to the injected current, so if we terminate the busses with capacitances C_x and C_y, we naturally implement Eq. (22).

The major difficulty with this elegant solution is that of area. The output currents that the processors calculate require four multiplies in addition to the cutoff weighting function. Including all of this circuitry in addition to the photo-transistor creates a very large pixel processor size and, given the constraints of limited silicon area, the number of pixels that we would be able to put on a single chip would be quite small as a result. The actual number of pixels contributing to our computation is already small to begin with, because the number of stationary points in the image is typically only a fraction of the total number of pixels in the image. A large number of pixels is clearly desirable to enhance the robustness of the computation. Additionally, a fully parallel implementation would be inefficient, again because only a small number of processors would be contributing at any one time, with the rest idle.

To increase the number of pixels and make more efficient use of area, the solution that was decided upon was to multiplex the system using a column-parallel processing scheme. Instead of computing the full frame of terms in our summation in parallel, we calculate a column of them at a time, and process the column sums sequentially. Of course, we can no longer use the simple time derivative in the right-hand side of Eq. (22). We can instead use a forward difference approximation, resulting in a simple proportional feedback system:

$$\mathbf{r}_0^{(i+1)} = \mathbf{r}_0^{(i)} + h \sum_{\mathbf{r} \in I} W(E_t, \eta)\, \nabla E\, \nabla E^T (\mathbf{r} - \mathbf{r}_0^{(i)})$$

where h is the feedback gain. This system is now a discrete-time analog system as opposed to the continuous-time analog system discussed earlier. This implementation method will allow us to put more pixels on the chip at the expense of taking longer for the iteration to converge to the desired solution. It is interesting to note that if we had implemented the depth-is-positive algorithm instead of the stationary-points algorithm, then the expression in the sum would merely replace W(E_t, η) with u(E_t ∇E^T(r − r_0)).

For the stationary-points algorithm, the equation that our system should solve is the 2 × 2 matrix problem:

$$A \mathbf{r}_0 = \mathbf{b} \qquad (23)$$

where we have defined

$$A = \sum_{\mathbf{r} \in I} W(E_t, \eta)\, \nabla E\, \nabla E^T, \qquad (24)$$

$$\mathbf{b} = \sum_{\mathbf{r} \in I} W(E_t, \eta)\, \nabla E\, \nabla E^T\, \mathbf{r}. \qquad (25)$$

We can rewrite our solution method into the following form:

$$\mathbf{r}_0^{(i+1)} = \mathbf{r}_0^{(i)} + h\left(\mathbf{b} - A \mathbf{r}_0^{(i)}\right). \qquad (26)$$

This is the Richardson method, the simplest iterative technique for solving a matrix equation (Varga, 1962). The transient solution to this equation is:

$$\mathbf{r}_0^{(i)} = A^{-1}\mathbf{b} + (I - hA)^{i}\, \mathbf{e}_0, \qquad (27)$$

where A^{−1}b is the desired solution and e_0 is the initial error. Clearly, in order for this system to be stable, we require that the error iterates go to zero. We must therefore guarantee that the spectral radius of the iteration matrix is less than unity. Examining the eigenvalues λ' of the iteration matrix, we find that they are related to the eigenvalues λ of the matrix A by

$$\lambda' = 1 - h\lambda. \qquad (28)$$


Since A is symmetric and positive semidefinite (typically definite in practice), we know that the eigenvalues of A are real and positive. Requiring the spectral radius of the iteration matrix to be less than unity results in the following requirement on h for stability:

$$0 < h < \frac{2}{\lambda_{\max}} \qquad (29)$$

Additionally, we can choose the optimal h to minimize the convergence time of the iteration. This h_opt solves:

$$h_{\mathrm{opt}} = \arg\min_{h}\left(\max\left(|1 - h\lambda_{\min}|,\, |1 - h\lambda_{\max}|\right)\right) = \frac{2}{\lambda_{\min} + \lambda_{\max}} = \frac{2}{\sum_{\mathbf{r} \in I} W(E_t, \eta)\, \|\nabla E\|^2}$$

At a minimum, we therefore require our system to calculate three quantities: the matrix residual b − A r_0^(i), the weighted squared image gradient Σ_{r∈I} W(E_t, η)||∇E||², and a fourth quantity, Σ_{r∈I} |E_t|, useful in practice for setting the width η of the weighting function (McQuirk, 1991).
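In software, the whole loop of Eqs. (23)-(26) with this optimal gain fits in a few lines. The following NumPy sketch is our own rendering, mirroring the proportional feedback that the test system closes off-chip in DSP:

```python
import numpy as np

def foe_richardson(Ex, Ey, Et, eta, iters=30):
    """Solve A r0 = b (Eqs. 23-25) with the Richardson iteration of
    Eq. (26), using the optimal gain h = 2 / trace(A)."""
    rows, cols = Ex.shape
    ys, xs = np.mgrid[0:rows, 0:cols].astype(float)
    W = (np.abs(Et) < eta).astype(float)        # Eq. (14)
    A = np.array([[np.sum(W * Ex * Ex), np.sum(W * Ex * Ey)],
                  [np.sum(W * Ex * Ey), np.sum(W * Ey * Ey)]])
    b = np.array([np.sum(W * (Ex * Ex * xs + Ex * Ey * ys)),
                  np.sum(W * (Ex * Ey * xs + Ey * Ey * ys))])
    # trace(A) = lambda_min + lambda_max = sum W ||grad E||^2
    h = 2.0 / np.trace(A)
    r0 = np.zeros(2)
    for _ in range(iters):
        r0 = r0 + h * (b - A @ r0)              # Eq. (26)
    return r0
```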

The approach that was decided upon to implement the discrete-time system we have described uses charge-coupled devices (CCDs) as image sensors. If we expose a CCD to light over a short period of time, it stores up a charge packet which is linearly proportional to the incident light during this integration time. Arrays of CCDs can be manipulated as analog shift registers as well as imaging devices. This allows us to easily multiplex a system which uses CCDs. Since we intend to process image data in the voltage/current domain, we must convert the image charge to voltage, and this can be done nondestructively through a floating gate amplifier. Thus, we can shift our image data out of a CCD array column-serial and perform our calculations one column at a time. Instead of n² computational elements corresponding to the parallelism of a continuous-time system, we now only have n. Clearly, we can increase our pixel resolution significantly and design more robust circuitry to perform the computations as a result.

The system architecture used in the FOE chip is shown in Fig. 10.


Figure 10. Block diagram representation of the system architecture of the actual FOE chip.


(More explicit circuit data can be found in (McQuirk, 1997), and a complete discussion of FOE circuit and system performance can be found in (McQuirk, 1996b), which is also available as (McQuirk, 1996a).) It is composed of four main sections: the CCD imager with storage and an input/output serial shift register, the array of floating gate amplifiers for transducing image charge to voltage, the CMOS array of analog signal processors for computing the required column sums, and the position encoder providing (x, y) encoding in voltage to the CMOS array as data is processed.

The input/output CCD shift register at the left side of the block diagram allows us to disable the imager and insert off-chip data into the computation. This shift register can also clock data out of the CCD imager, letting us see the images that the system is computing with. Thus we have four basic testing modes: (i) computer-simulated algorithm on synthetic data, (ii) computer-simulated algorithm on raw image data taken from the imager, (iii) chip processing of synthetic data input from off-chip, and (iv) chip processing of raw image data acquired in the on-chip imager. With these four testing modes, we can separately evaluate algorithm performance and system performance.

The function of the interline CCD imager with storage is to acquire the two images in time necessary to estimate the brightness gradients. The imager is an interline structure which uses exposed photogates to integrate image charge and covered shift registers to both store the resulting images and shift them out. Once two images have been acquired successively in time, we shift them to the right one column at a time to an array of floating gate amplifiers. The floating gate amplifiers transduce the image charges into voltages which are then applied to the analog signal processors. As input, these processors also require the current estimate in voltage of the location of the FOE driven in from off-chip, r_0^(i) = (x_0^(i), y_0^(i))^T, and the present r = (x, y)^T position of the data, provided in voltages by the position encoder at the far right of the diagram.

The encoder uses the voltage on a resistive chain to encode the y position up the array. A CMOS digital shift register is utilized to select the appropriate x value over time as columns are processed. A logic 1 is successively shifted up the shift register as each column is processed, enabling a pass transistor which sets x to the value of the voltage on the resistor chain at that stage. In this manner, x increases in the stair-step fashion necessary as the columns of data are shifted through the system.

From the image data, the pixel position, and the FOE estimate, the processors in the array compute the four desired output currents, which are summed up the column in current and sent off-chip. The block diagram for the analog processors is shown in Fig. 11.

To estimate the three brightness gradients, eight input voltages representing the 2 × 2 × 2 cube of pixels needed for the centered differencing are input to the processor. Four MOS source-coupled pairs are used to transduce these voltages into differential currents, which are then added and subtracted using current mirroring to form the brightness gradients. An absolute


Figure 11. Block diagram indicating the structure of the analog row processor in the CMOS processing array.


value circuit computes |E_t|, and a copy of this signal is then summed up in current along with all the contributions of the other processors in the array. The resulting overall current forms the first main output of the chip, Σ_{r∈I} |E_t|.

Another copy of |E_t| is subtracted from a reference current and injected into a single-ended latch. The result of this latch is the weighting function decision. In line with the brightness gradient currents E_x, E_y are pass-gate switches whose state is controlled by the weighting decision. If the processor is not to contribute to the error sum because |E_t| > η, these gates are turned off by the output of the latch, preventing signal flow to the rest of the processor and thus enforcing the weighting decision. The weighted brightness gradients are copied using current mirrors three times: the first is used for the pair of current-mode squarers needed to compute the squared gradient magnitude. This signal is summed up in current along with all the contributions of the other processors in the array, and the resulting overall current forms the second main output of the chip, Σ_{r∈I} W(E_t, η)||∇E||².

The second gradient copy is used by the first layer of multipliers to compute W(E_t, η)∇E · (r − r_0). The multipliers used on the FOE chip are all simple MOS versions of the standard four-quadrant Gilbert bipolar multipliers (Gilbert, 1968). These multipliers have as input both a differential voltage and a differential current. The output is a differential current which approximates a multiplication of the two inputs. The dot product resulting from the first layer of multipliers is transduced from differential current to differential voltage using an MOS circuit with devices operating in the triode region. This signal is then used as the voltage input to a second layer of multipliers whose other input is the third gradient copy. The resulting two final differential currents are the contribution to the matrix equation residual for that processor. They are summed in current along with the contributions from the other processors in the array, and form the final two outputs from the chip.

To complete the iterative feedback loop, we sum the output currents from the column of analog processors as the image data is shifted out a column at a time from the imager. Once a whole frame of data has been accumulated, we use the residual to update the FOE estimate using the proportional feedback loop. While this certainly could be done on the chip, it was done off-chip in DSP for testing flexibility. Due to the difficulty in re-circulating image data on-chip, we further acquire new image pairs for each successive iteration of the feedback loop. Since the locations of the stationary points in the image move as the camera translates, using new image pairs at each iteration can be viewed as producing additional noise for the algorithm to handle. We could alleviate this problem by moving the imager off-chip and adding a frame buffer, but the original architectural goal was a single-chip system.

A series of experiments was performed in which the FOE was placed in a grid across the image plane. For testing purposes, image data corresponding to real motion was generated using a flexible fiber-optic image guide. One end of the image guide was held fixed over the chip, while the other end was attached to a carrier whose position and velocity on a linear track were controlled through a DC motor system. The orientation of the viewing direction of the image guide relative to the motion direction of the carrier was precisely set using two rotation stages, allowing for the positioning of the FOE anywhere in the image plane. Figure 12 shows the final results from the FOE chip, comparing the mean output of the proportional feedback loop enclosing the chip (shown as '*') with the results of the algorithm on the raw image data (shown as 'o').

5. Summary

This paper discussed the application of integrated analog focal-plane processing to realize a real-time system for estimating the direction of camera motion. The focus of expansion is the intersection of the camera translation vector with the image plane and captures this motion information. Knowing the direction of camera translation has obvious import for the control of autonomous vehicles, or in any situation where the relative motion is unknown. The mathematical framework for our approach, resulting in a simplified brightness-change constraint equation, was developed. Several promising algorithms for estimating the FOE based on this constraint and suitable for analog VLSI were discussed, including the one chosen for final implementation. A special-purpose VLSI chip with an embedded CCD imager and column-parallel analog signal processing was constructed to realize the desired algorithm. The difference between the output of the FOE chip enclosed in a simple proportional feedback loop and the location predicted by the stationary-points algorithm operating on raw image data was less than 3% full scale. Table 2 summarizes the performance


Table 2. Summary of FOE chip and system performance parameters.

Process                          Orbit n-well 2 μm CCD/BiCMOS
Chip dimensions                  9200 μm × 7900 μm
Imager topology                  64 × 64 interline CCD
Technology                       Double-poly buried channel CCD
Illumination                     Front-face
Image sensor                     CCD gate
Charge packet size               600,000 e−
Acquisition time                 Tested to 1 ms
Quantum efficiency               30% @ 637 nm
Dark current                     <10 nA/cm² @ 30°C
Transfer inefficiency            <7 × 10⁻⁵ @ 500 kHz
FGA output sensitivity           2 μV/e−
System frame rate                30 frames/s
Peak on-chip power dissipation   170 mW
System settling time             20-30 iterations typical
FOE location inaccuracy          <3% full scale over 80% of the field of view


Figure 12. Comparison of the results from the stationary-points algorithm using raw image data (shown as 'o') and the output from the FOEchip (shown as '*').


achieved by the FOE chip in conjunction with the test system. A detailed discussion of FOE circuit and system performance can be found in (McQuirk, 1996b), which is also available as (McQuirk, 1996a).

Acknowledgments

The authors would like to thank Chris Umminger and Steve Decker for their many helpful comments and criticisms.

References

Bruss, A. and Horn, B. 1983. Passive navigation. Computer Vision, Graphics, and Image Processing, 21(1):3-20.

Dron, L. 1993. The multi-scale veto model: A two-stage analog network for edge detection and image reconstruction. International Journal of Computer Vision, 11(1):45-61.

Fleming, W. 1965. Functions of Several Variables. Addison Wesley: Reading, MA.

Gilbert, B. 1968. A precise four-quadrant multiplier with subnanosecond response. IEEE Journal of Solid-State Circuits, SC-3(4):365-373.

Hakkarainen, J. and Lee, H. 1993. A 40 x 40 CCD/CMOS absolute-value-of-difference processor for use in a stereo vision system. IEEE Journal of Solid-State Circuits, 28(7):799-807.

Horn, B. 1986. Robot Vision. MIT Press: Cambridge, MA.

Horn, B. 1990. Parallel networks for machine vision. In Artificial Intelligence at MIT: Expanding Frontiers, P. Winston and S.A. Shellard (Eds.), MIT Press: Cambridge, MA, vol. 2, chap. 43, pp. 530-573.

Horn, B. and Schunk, B. 1981. Determining optical flow. Artificial Intelligence, 16(1-3):185-203.

Horn, B. and Weldon, E. 1988. Direct methods for recovering motion. International Journal of Computer Vision, 2(1):51-76.

Jain, R. 1983. Direct computation of the focus of expansion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 5(1):58-64.

Keast, C. and Sodini, C. 1993. A CCD/CMOS-based imager with integrated focal plane signal processing. IEEE Journal of Solid-State Circuits, 28(4):431-437.

McQuirk, I. 1991. Direct methods for estimating the focus of expansion. Master's Thesis, Massachusetts Institute of Technology, Cambridge, MA.

McQuirk, I. 1996a. An analog VLSI chip for estimating the focus of expansion. AI Technical Report 1577, The MIT Artificial Intelligence Laboratory, Cambridge, MA.

McQuirk, I. 1996b. An analog VLSI chip for estimating the focus of expansion. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA.

McQuirk, I. 1997. An analog VLSI chip for estimating the focus of expansion. In 1997 ISSCC Digest of Technical Papers, pp. 24-25.

Negahdaripour, S. and Horn, B. 1986. Direct passive navigation: Analytical solution for planes. In Proceedings of the IEEE International Conference on Robotics and Automation, Washington, DC, IEEE Computer Society Press.

Negahdaripour, S. and Horn, B. 1987a. Direct passive navigation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9(1):168-176.

Negahdaripour, S. and Horn, B. 1987b. Using depth-is-positive constraint to recover translational motion. In Proceedings of the Workshop on Computer Vision, Miami Beach, FL.

Standley, D. 1991. An object position and orientation IC with embedded imager. IEEE Journal of Solid-State Circuits, 26(12):1853-1859.

Stein, G.P. 1993. Internal camera calibration using rotation and geometric shapes. Master's Thesis, Massachusetts Institute of Technology, Cambridge, MA.

Tanner, J. and Mead, C. 1986. An integrated optical motion sensor. In VLSI Signal Processing II, Proceedings of the ASSP Conference on VLSI Signal Processing, University of California, Los Angeles, pp. 59-76.

Umminger, C. and Sodini, C. 1995. An integrated analog sensor for automatic alignment. IEEE Journal of Solid-State Circuits, 30(12):1382-1390.

Varga, R. 1962. Matrix Iterative Analysis. Prentice-Hall: Englewood Cliffs, NJ.

Yang, W. and Chiang, A. 1990. A full fill-factor CCD imager with integrated signal processors. In 1990 ISSCC Digest of Technical Papers, pp. 218-219.