
IEEE Workshop on Video Registration (with ICCV’01), Vancouver, Canada, July 13, 2001


Error Characteristics of Parallel-Perspective Stereo Mosaics

Zhigang Zhu, Allen R. Hanson, Howard Schultz, Edward M. Riseman
Department of Computer Science, University of Massachusetts at Amherst, MA 01003

E-mail: [email protected]

Abstract

This paper analyzes different aspects of the error characteristics of parallel-perspective stereo mosaics generated from an airborne video camera moving through a complex three-dimensional scene. First, we show that, in theory, a stereo pair of parallel-perspective mosaics is a good representation of an extended scene, and that the adaptive baseline inherent to the geometry permits depth accuracy independent of absolute depth. Second, in practice, we have proposed a 3D mosaicing technique, PRISM (parallel-ray interpolation for stereo mosaicing), that uses interframe matching to interpolate the camera position between the original exposure centers of video frames taken at discrete spatial steps. By analyzing the errors introduced by a 2D mosaicing method, we explain why the "3D mosaicing" solution is important for generating smooth and accurate mosaics while preserving stereoscopic information. We further examine whether this ray interpolation step introduces extra errors in depth recovery from stereo mosaics by comparing it with the typical perspective stereo formulation. Third, the error characteristics of parallel stereo mosaics from cameras with different configurations of focal lengths and image resolutions are analyzed. Results are given for mosaic construction from aerial video data of real scenes and for 3D reconstruction from these mosaics. We conclude that (1) stereo mosaics generated with the PRISM method have significantly smaller errors in 3D recovery (even if the accuracy is not depth independent) owing to the adaptive baseline geometry; and (2) a longer focal length is better, since stereo matching becomes more accurate.

1. Introduction

There have been attempts in a variety of applications to add 3D information into an image-based mosaic representation. Creating stereo mosaics from two rotating cameras was proposed by Huang & Hung [1], and from a single off-center rotating camera by Ishiguro, et al [2], Peleg & Ben-Ezra [3], and Shum & Szeliski [4]. In these kinds of stereo mosaics, however, the viewpoint, and therefore the parallax, is limited to images taken from a very small area. Recently our work [5,6,7] has focused on parallel-perspective stereo mosaics from a dominantly translating camera, which is the prevalent sensor motion during aerial surveys. A rotating camera can be easily controlled to achieve the desired motion. In contrast, the translation of a camera over a large distance is much harder to control in real vision applications such as robot navigation [8] and environmental monitoring [6, 9]. We have previously shown [5-7] that image mosaicing from a translating camera raises a different set of problems from that of circular projections of a rotating camera. These include suitable mosaic representations, the generation of a seamless image mosaic under a rather general motion with motion parallax, and the epipolar geometry associated with multiple-viewpoint geometry.

In this paper we give a thorough analysis of various aspects of the error characteristics of 3D reconstruction from parallel-perspective stereo mosaics generated from real video sequences. It has been shown independently by Chai and Shum [10] and by Zhu, et al [5,6] that parallel-perspective stereo is superior to both conventional perspective stereo and the recently developed multi-perspective stereo for 3D reconstruction, in that the adaptive baseline inherent to the parallel-perspective geometry permits depth accuracy independent of absolute depth. However, this conclusion is obtained in an ideal case, i.e., when enough samples of parallel projection rays from a "virtual camera" with ideal 1D or 2D motion can be generated from a complete scene model. In practical stereo mosaicing from a real video sequence, we need to consider how the errors in the final mosaics depend on camera motion types, frame rates, focal lengths, and scene depths. The analysis of the error characteristics of 3D reconstruction from real stereo mosaics is the focus of this paper.

First, we will show why efficient "3D mosaicing" techniques are important for accurate 3D reconstruction from stereo mosaics. Standard 2D mosaicing techniques based on 2D image transformations, such as manifold projection [11], cannot generate a seamless mosaic in the presence of large motion parallax, particularly for surfaces that are highly irregular or have large height differences. Moreover, the perspective distortion that causes the geometric seams will introduce errors in 3D reconstruction using the parallel-perspective geometry of stereo mosaics. For generating image mosaics with parallax, several techniques have been proposed that explicitly estimate the camera motion and residual parallax [9,12,13]. These approaches, however, are computationally intensive, and since the final mosaic is represented in a reference perspective view, there can be serious occlusion problems due to large viewpoint differences between the single reference view and the rest of the views in the image sequence.

We have proposed a novel "3D mosaicing" technique called PRISM (parallel ray interpolation for stereo mosaicing) [7] to efficiently convert a sequence of perspective images with 6 DOF motion into parallel-perspective stereo mosaics. In the PRISM approach, global image rectification eliminates rotation effects and is followed by a fine local transformation that accounts for the interframe motion parallax due to the 3D structure of the scene. The result is a stereo pair of mosaics that embodies 3D information of the scene with an optimal baseline. This paper further examines (1) whether the PRISM process of image rectification followed by ray interpolation introduces extra errors in the subsequent depth recovery step; and (2) whether the final disparity equation of the stereo mosaics really means that the depth recovery accuracy is independent of the focal length and absolute depths. To show the advantages of stereo mosaics, depth recovery accuracy is analyzed and compared with the typical perspective stereo formulation. Results are given for mosaic construction from aerial video data of real scenes and for 3D reconstruction from these mosaics. Several important conclusions for generating and using stereo mosaics are drawn from our theoretical and experimental analysis.

2. Parallel-Perspective Stereo Geometry

Fig. 1 illustrates the basic idea of parallel-perspective stereo mosaics. Let us first assume that the motion of the camera is an ideal 1D translation, the optical axis is perpendicular to the motion, and the frames are dense enough. We can generate two spatio-temporal images by extracting two columns of pixels (perpendicular to the motion) at the front and rear edges of each frame in motion. The mosaic images thus generated are similar to parallel-perspective images captured by a linear pushbroom camera [14], which has perspective projection in the direction perpendicular to the motion and parallel projection in the motion direction. In contrast to the common pushbroom aerial image, these mosaics are obtained from two different oblique viewing angles of a single camera's field of view, one set of rays looking forward and the other set looking backward, so that a stereo pair of left and right mosaics can be generated as the sensor moves forward, capturing the inherent 3D information.

Fig. 1. Parallel-perspective stereo geometry. Both mosaics are built on the fixation plane, but their unit is the pixel: each pixel represents $H/F$ world distance.

2.1. Parallel-perspective stereo model

Without loss of generality, we assume that two vertical 1-column slit windows have $d_y/2$ offsets to the left and right of the center of the image, respectively (Fig. 1). The "left eye" view (left mosaic) is generated from the front slit window, while the "right eye" view (right mosaic) is generated from the rear slit window. The parallel-perspective projection model of the stereo mosaics thus generated can be represented by the following equations [6]

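$$
\begin{aligned}
x_l &= x_r = \frac{F X}{Z} \\
y_r &= \frac{F Y}{H} + \left(\frac{Z}{H} - 1\right)\frac{d_y}{2} \\
y_l &= \frac{F Y}{H} - \left(\frac{Z}{H} - 1\right)\frac{d_y}{2}
\end{aligned}
\tag{1}
$$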

where $F$ is the focal length of the camera and $H$ is the height of a fixation plane (e.g., the average height of the terrain). Eq. (1) gives the relation between a pair of 2D points (one from each mosaic), $(x_l, y_l)$ and $(x_r, y_r)$, and the corresponding 3D point $(X, Y, Z)$. It serves a function similar to the classical pinhole perspective camera model. A generalized model under 3D translation is given in [7]. The depth can be computed from Eq. (1) as

$$ Z = H\,\frac{b_y}{d_y} = H\left(1 + \frac{\Delta y}{d_y}\right) \tag{2} $$

where

$$ b_y = d_y + \Delta y = \frac{F B_y}{H} \tag{3} $$

is the "scaled" version of the baseline $B_y$, and $\Delta y = y_r - y_l$ is the "mosaic displacement"¹ in the stereo mosaics. The displacement $\Delta y$ is a function of the depth variation of the scene around the fixation plane $H$. Since a fixed angle between the two viewing rays is selected for generating the stereo mosaics, the "disparities" ($d_y$) of all points are fixed; instead, a geometry of optimal/adaptive baselines ($b_y$) for all the points is created. In other words, for any point in the left mosaic, searching for the match point in the right mosaic means finding an original frame in which this match pair has a pre-defined disparity (given by the distance between the two slit windows) and hence has an adaptive baseline depending on the depth of the point (Fig. 1).

¹ We use "displacement" instead of "disparity" since it is related to the baseline in a two-view perspective stereo system.
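The model of Eqs. (1)-(3) can be exercised with a short round-trip sketch: project a 3D point into both mosaics, then recover its depth and adaptive baseline from the displacement. The function names and all numeric values below are hypothetical and chosen only for illustration; they are not taken from the paper's data.

```python
def project_to_mosaics(X, Y, Z, F, H, dy):
    """Eq. (1): mosaic coordinates of the 3D point (X, Y, Z)."""
    x = F * X / Z                       # perspective in x (same in both mosaics)
    shift = (Z / H - 1.0) * dy / 2.0    # depth-dependent shift in y
    left = (x, F * Y / H - shift)
    right = (x, F * Y / H + shift)
    return left, right


def depth_from_displacement(delta_y, H, dy):
    """Eq. (2): Z = H * (1 + delta_y / dy)."""
    return H * (1.0 + delta_y / dy)


def adaptive_baseline(delta_y, F, H, dy):
    """Eq. (3): b_y = dy + delta_y (pixels) and B_y = b_y * H / F (world units)."""
    b_y = dy + delta_y
    return b_y, b_y * H / F


# Hypothetical values, for illustration only.
F, H, dy = 1000.0, 300.0, 192.0         # pixels, meters, pixels
(xl, yl), (xr, yr) = project_to_mosaics(10.0, 50.0, 270.0, F, H, dy)
Z = depth_from_displacement(yr - yl, H, dy)
b_y, B_y = adaptive_baseline(yr - yl, F, H, dy)
print(Z, b_y, B_y)
```

With these inputs the recovered depth equals the depth that was projected, and the world baseline $B_y$ agrees with $Z d_y / F$, which is the sense in which the baseline adapts to the depth of each point.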

2.2. Depth resolution of stereo mosaics

In a pair of parallel-perspective stereo mosaics, depth varies linearly with the image displacement $\Delta y$ (Eq. (2)). Since $\Delta y$ is measured in discrete images, we assume that the image localization resolution is $\partial y$ pixels (usually $\partial y \le 1$) in the stereo mosaics, so that $\Delta y = 0, \pm\partial y, \pm 2\partial y, \ldots$. The depth resolution in parallel-perspective stereo is a constant value (Fig. 2):

$$ \partial Z = \frac{H}{d_y}\,\partial y = \text{constant} \tag{4} $$

This is in contrast to two-view perspective stereo, where the depth error of a point is proportional to the square of the depth (Eq. (a-6) in the Appendix).

(a) parallel stereo (b) perspective stereo

Fig. 2 Depth resolution of stereo mosaics

Ideally, for parallel-perspective stereo, the depth resolution is independent of the absolute depths of scene points and of the focal length of the camera used to generate the stereo mosaics. In addition, the image resolution in the $y$ direction is the same no matter how far away the scene points are. The reason is that, owing to the parallel projection in the $y$ direction, it is parallel rays rather than converging rays that intersect at the 3D scene points (Fig. 2).
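The contrast between the two panels of Fig. 2 can be sketched numerically. The parallel-perspective case follows Eq. (4). For the perspective case, the snippet uses the standard two-view relation $\partial Z = Z^2/(F B)\,\partial d$ for a fixed baseline $B$, which is assumed here as a stand-in for the Appendix formula. All parameter values are hypothetical.

```python
def parallel_depth_resolution(H, dy, loc=1.0):
    """Eq. (4): depth step for a localization step of `loc` mosaic pixels."""
    return H / dy * loc


def perspective_depth_resolution(Z, F, B, loc=1.0):
    """Standard fixed-baseline two-view stereo (an assumption, not the paper's
    Appendix text): depth step grows with the square of the depth Z."""
    return Z * Z / (F * B) * loc


# Hypothetical setup: H in meters, dy and F in pixels, B in meters.
H, dy, F, B = 300.0, 192.0, 1000.0, 50.0
for Z in (200.0, 300.0, 400.0):
    print(Z, parallel_depth_resolution(H, dy), perspective_depth_resolution(Z, F, B))
```

In this sketch the parallel-perspective column stays the same for every depth, while doubling the depth quadruples the perspective depth step.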

3. Stereo Mosaicing from Real Video

In the PRISM approach for large-scale 3D scene modeling from real video, the computation of "matches" is efficiently distributed over three steps: camera pose estimation, image mosaicing and 3D reconstruction. In estimating camera poses (for image rectification), only sparse tie points widely distributed in the two images are needed. In generating stereo mosaics, matches are only performed for parallel-perspective rays between small overlapping regions of successive frames. In using stereo mosaics for 3D recovery, matches are only carried out between the two final mosaics. This section gives a brief summary of the techniques in the three steps, as the basis for the error analysis in the following section. Detailed algorithms and discussions can be found in [6,7].

3.1. Image rectification

The stereo mosaicing mechanism can be generalized to the case of 3D translation if the 3D curved motion track has a dominant translational motion for generating a parallel projection in that direction [7]. Under 3D translation, seamless stereo mosaics can be generated in the same way as in the case of 1D translation. The only difference is that the viewpoints of the mosaics form a 3D curve instead of a 1D straight line. Further, the motion of the camera can be generalized to a 6 DOF motion with some reasonable constraints on the values and rates of change of the motion parameters of the camera [6,7] (Fig. 3a), which are satisfied by a sensor mounted in a light aircraft with normal turbulence. Two steps are necessary to generate a rectified image sequence that exhibits only 3D translation, from which we can generate seamless mosaics:

1) Camera orientation estimation. Assuming an internally pre-calibrated camera, the extrinsic camera parameters (camera orientations) can be determined from our aerial instrumentation system (GPS, INS and a laser profiler) [15] and a bundle adjustment technique [16]. The details are beyond the scope of this paper, but the main point here is that we do not need to carry out dense matching between two successive frames. Instead, only sparse tie points widely distributed in the two images are needed to estimate the camera orientations.

Fig. 3. Image rectification. (a) Original and (b) rectified image sequence.

2) Image rectification. A 2D projective transformation is applied to each frame in order to eliminate the rotational components (Fig. 3b). In fact, we only need to apply this kind of transformation to the two narrow slices in each frame that will contribute incrementally to each of the stereo mosaics. The 3D motion track formed by the viewpoints of the moving camera will have a dominant motion direction ($Y$) that is perpendicular to the optical axis of the "rectified" images.

[Fig. 3 panel labels: (a) original image frames, 3D path of the camera with 3D rotation + 3D translation; (b) rectified image frames, path of the camera with 3D translation only; dominant motion direction Y in both panels.]

[Fig. 2 panel labels: (a) multiple viewpoints, $\partial Z_2 = \partial Z_1$; (b) two viewpoints $O_l$ and $O_r$, $\partial Z_2 > \partial Z_1$.]

3.2. Ray interpolation

How can we generate a seamless mosaic from the video of a translating camera in a computationally efficient way? The key to our approach lies in the parallel-perspective representation and an interframe ray interpolation approach. For each of the left and right mosaics, we only need to take a front (or rear) slice of a certain width (determined by the interframe motion) from each frame, perform local registration between the overlapping slices of successive frames (Fig. 4), and then generate parallel interpolated rays between two known discrete perspective views for the left (or right) mosaic.

Since we will use the mathematical model of ray interpolation in the following error analysis, let us examine this idea more rigorously in the case of 2D translation after image rectification, when the translational component in the $Z$ direction is small [6]. We take the left mosaic as an example (Fig. 4). First we define the central column of the front (or rear) mosaicing slice in each frame as a fixed line, which is determined by the camera's location at each frame and the pre-selection of the front (or rear) slice window (Fig. 4, Fig. 5). An interpretation plane (IP) of the fixed line is a plane passing through the nodal point and the fixed line. By the definition of parallel-perspective stereo mosaics, the IPs of the fixed lines for the left (or right) mosaic are parallel to each other. Suppose that $(S_x, S_y)$ is the translational vector of the camera between the previous (1st) frame with viewpoint $(T_x, T_y)$ and the current (2nd) frame with viewpoint $(T_x+S_x, T_y+S_y)$ (Fig. 4). We need to interpolate parallel rays between the two fixed lines of the 1st and the 2nd frames. For each point $(x_1, y_1)$ (to the right of the 1st fixed line $y_0 = d_y/2$) in frame $(T_x, T_y)$, which will contribute to the left mosaic, we can find a corresponding point $(x_2, y_2)$ (to the left of the 2nd fixed line) in frame $(T_x+S_x, T_y+S_y)$. We assume that $(x_1, y_1)$ and $(x_2, y_2)$ are represented in their own frame coordinate systems, and that their rays intersect at a 3D point $(X, Y, Z)$. Then the parallel reprojected viewpoint $(T_{xi}, T_{yi})$ of the correspondence pair can be computed as

$$ T_{yi} = T_y + \frac{y_1 - d_y/2}{y_1 - y_2}\,S_y, \qquad T_{xi} = T_x + \frac{S_x}{S_y}\,(T_{yi} - T_y) \tag{5} $$

where $T_{yi}$ is calculated in a synthetic IP that passes through the point $(X, Y, Z)$ and is parallel to the IPs of the fixed lines of the first and second frames, and $T_{xi}$ is calculated so that all the viewpoints between $(T_x, T_y)$ and $(T_x+S_x, T_y+S_y)$ lie on a straight line. Note that Eq. (5) also holds for the two fixed lines: when $y_1 = d_y/2$ (the first fixed line), we have $(T_{xi}, T_{yi}) = (T_x, T_y)$, and when $y_2 = d_y/2$ (the second fixed line), we have $(T_{xi}, T_{yi}) = (T_x+S_x, T_y+S_y)$. We assume that normally the interframe motion is large enough to have $y_1 - 1 \ge d_y/2 \ge y_2 + 1$. A super-dense image sequence could generate a pair of stereo mosaics with super-resolution, but this will not be discussed in this paper.

Fig. 4. View interpolation by ray re-projection

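The reprojected ray of the point $(X, Y, Z)$ from the interpolated viewpoint $(T_{xi}, T_{yi})$ is

$$ (x_i, y_i) = \left[\,x_1 - \frac{S_x}{S_y}\left(y_1 - \frac{d_y}{2}\right),\ \frac{d_y}{2}\,\right] \tag{6} $$

and the mosaicing coordinates of this point are

$$ (x_m, y_m) = \left[\,t_{xi} + x_1 - \frac{S_x}{S_y}\left(y_1 - \frac{d_y}{2}\right),\ t_{yi} + \frac{d_y}{2}\,\right] \tag{7} $$

where

$$ t_{xi} = \frac{F\,T_{xi}}{H}, \qquad t_{yi} = \frac{F\,T_{yi}}{H} \tag{8} $$

are the "scaled" translational components of the interpolated view. Note that the interpolated rays are also parallel-perspective, with perspective projection in the $x$ direction and parallel projection in the $y$ direction.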
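The interpolation of Eqs. (5)-(8) can be sketched in a few lines. The function below is an illustration under stated assumptions, not the PRISM implementation: its name and arguments are hypothetical, and the synthetic check assumes rectified pinhole frames in which a point projects to $y = F(Y - T_y)/Z$ and $x = F(X - T_x)/Z$.

```python
def interpolate_ray(x1, y1, y2, T, S, F, H, dy):
    """Parallel-ray interpolation for the left mosaic, Eqs. (5)-(8).

    (x1, y1): point in the 1st frame, to the right of its fixed line y = dy/2.
    y2:       y coordinate of the matching point in the 2nd frame.
    T, S:     viewpoint (Tx, Ty) of the 1st frame and interframe translation (Sx, Sy).
    """
    Tx, Ty = T
    Sx, Sy = S
    half = dy / 2.0
    # Eq. (5): interpolated viewpoint on the segment between the two frames.
    Tyi = Ty + (y1 - half) / (y1 - y2) * Sy
    Txi = Tx + Sx / Sy * (Tyi - Ty)
    # Eq. (6): the point reprojected from the interpolated viewpoint.
    xi = x1 - Sx / Sy * (y1 - half)
    yi = half
    # Eqs. (7)-(8): mosaic coordinates, using the scaled translation F*T/H.
    xm = F * Txi / H + xi
    ym = F * Tyi / H + yi
    return (Txi, Tyi), (xi, yi), (xm, ym)


# Synthetic check with hypothetical numbers: project a known 3D point
# into two rectified frames (assumed pinhole model), then interpolate.
F, H, dy = 1000.0, 300.0, 192.0
X, Y, Z = 5.0, 30.0, 280.0
T, S = (0.0, 0.0), (2.0, 4.0)
x1, y1 = F * (X - T[0]) / Z, F * (Y - T[1]) / Z
y2 = F * (Y - T[1] - S[1]) / Z
view, ray, mosaic = interpolate_ray(x1, y1, y2, T, S, F, H, dy)
print(view, ray, mosaic)
```

In this synthetic case the interpolated viewpoint is the one from which the point is seen exactly on the fixed line $y = d_y/2$, and the resulting mosaic $y$ coordinate agrees with $y_l$ from Eq. (1).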

3.3. 3D reconstruction from stereo mosaics

In the general case, the viewpoints of both the left and right mosaics will be on the same smooth 3D motion track. Therefore the corresponding point in the right mosaic of any point in the left mosaic will be on an epipolar curve determined by the coordinates of the left point and the 3D motion track. We have derived the epipolar geometry of the stereo mosaics generated from a rectified image sequence exhibiting 3D translation with the $y$ component dominant [7]. Under 2D translation $(T_x, T_y)$, the corresponding point $(x_r, y_r)$ in the right-view mosaic of any point $(x_l, y_l)$ in the left-view mosaic will be constrained to an epipolar curve

$$ \Delta x = b_x(y_l, \Delta y)\,\frac{\Delta y}{\Delta y + d_y}, \qquad \Delta x = x_r - x_l, \quad \Delta y = y_r - y_l \tag{9} $$

where $b_x(y_l, \Delta y) = [\,t_{xl}(y_l + d_y + \Delta y) - t_{xl}(y_l)\,]$ is the baseline function of $y_l$ and $\Delta y$, and $t_{xl}(y_l)$ is the "scaled" $x$ translational component (as in Eq. (3) or (8)) of the original frames corresponding to column $y_l$ in the left mosaic. Hence $\Delta x$ is a nonlinear function of the position $y_l$ as well as the displacement $\Delta y$, which is quite different from the epipolar geometry of two-view perspective stereo. The reason is that image columns with different $y_l$ in parallel-perspective mosaics are projected from different viewpoints. In the ideal case where the viewpoints of the stereo mosaics form a 1D straight line, the epipolar curves become horizontal lines.
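The epipolar curve of Eq. (9) can be sketched as a function of the motion track. In the snippet below, the track is passed in as a callable `t_xl`, which is a hypothetical interface chosen for illustration; the two example tracks are likewise made up.

```python
def epipolar_dx(y_l, delta_y, t_xl, dy):
    """Eq. (9): x displacement of the match for column y_l and displacement delta_y.

    t_xl: callable returning the scaled x translation of the frame that
          contributed column y to the left mosaic (hypothetical interface).
    """
    b_x = t_xl(y_l + dy + delta_y) - t_xl(y_l)   # baseline function b_x(y_l, delta_y)
    return b_x * delta_y / (delta_y + dy)


dy = 192.0
# Ideal 1D straight-line track along y: no x translation at all.
straight = epipolar_dx(100.0, -29.0, lambda y: 0.0, dy)
# Hypothetical track drifting linearly in x as the camera advances.
drift = epipolar_dx(100.0, -29.0, lambda y: 0.1 * y, dy)
print(straight, drift)
```

For the straight track the $x$ displacement vanishes for every candidate $\Delta y$, which corresponds to the horizontal epipolar lines of the ideal case. For any other track, evaluating the function over the expected range of $\Delta y$ traces the curve along which a match could be searched.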

The depth maps of the stereo mosaics were obtained by using the Terrest system, designed for perspective stereo matching [17], without modification. The Terrest system was designed to account for the illumination differences and perspective distortion of stereo images with widely separated views by using normalized correlation and multi-resolution un-warping. Further work is needed to apply the epipolar curve constraints to the search for correspondence points in Terrest in order to speed up the matching process. Currently we perform matches with 2D search regions estimated from the motion track and the maximum depth variations of a scene.

In parallel-perspective stereo mosaics, since a fixed angle between the two sets of viewing rays is selected, the disparities of all points are pre-selected (by mosaicing) and fixed; instead, a geometry of optimal/adaptive baselines for all the points is created. From the parallel-perspective stereo geometry, the depth accuracy is independent of the depth of a point and of the image resolution. However, there are two classes of issues that need to be carefully studied in stereo mosaics from real video sequences. First, 3D recovery from stereo mosaics needs two matching steps: interframe matching (and ray interpolation) to generate the mosaics, and correspondence between the stereo mosaics to generate a depth map. Does the ray interpolation step introduce extra errors? Second, the final disparity equation of the stereo mosaics does not include the focal length. Does this mean that the depth recovery accuracy from stereo mosaics is really independent of the focal length of the camera that captures the original video? We will discuss these two issues in the following sections.

4. Error Analysis of Ray Interpolation

4.1. Comparison of 3D vs. 2D mosaicing

First, we use a real example to show why "3D mosaicing" is so important for 3D reconstruction from stereo mosaics. Fig. 5 shows the local match and ray interpolation of a successive frame pair of a UMass campus scene, where the interframe motion is $(s_x, s_y) = (27, 48)$ pixels, and points on the top of a tall building (the Campus Center) have about 4 pixels of additional motion parallax. As we will see next, these geometric misalignments, especially of linear structures, will be clearly visible to human eyes. Moreover, the perspective distortion that causes the geometric seams will introduce errors in 3D reconstruction using the parallel-perspective geometry of stereo mosaics. In the example of the stereo mosaics of the UMass campus scene [18], the distance between the front and the rear slice windows is $d_y = 192$ pixels, and the average height of the aerial camera above the ground is $H = 300$ meters (m). The relative $y$ displacement of the building roof (with respect to the ground) in the stereo mosaics is about $\Delta y = -29$ pixels. Using Eq. (2), we can compute that the "absolute" depth of the roof from the camera is $Z = 254.68$ m, and the "relative" height of the roof above the ground is $\Delta Z = 45.31$ m. A 4-pixel misalignment in the stereo mosaics will introduce a depth (height) error of $\delta Z = 6.25$ m, even though the stereo mosaics have a rather large "disparity" ($d_y = 192$). While the relative error of the "absolute" depth of the roof ($\delta Z/Z$) is only about 2.45%, the relative error of its "relative" height ($\delta Z/\Delta Z$) is as high as 13.8%. This clearly shows that geometrically seamless mosaicing is very important for accurate 3D estimation as well as for good visual appearance. This is especially true when sub-pixel accuracy in depth recovery is needed [17].
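The arithmetic of the campus example can be reproduced directly from Eq. (2). The sketch below uses only the values quoted above ($H$, $d_y$, $\Delta y$ and the 4-pixel misalignment); the variable names are illustrative.

```python
H = 300.0          # camera height above the ground, meters
dy = 192.0         # distance between the two slit windows, pixels
delta_y = -29.0    # displacement of the roof relative to the ground, pixels
misalign = 4.0     # misalignment left by 2D mosaicing, pixels

Z = H * (1.0 + delta_y / dy)        # Eq. (2): depth of the roof from the camera
roof_height = H - Z                 # height of the roof above the ground
depth_error = H / dy * misalign     # Eq. (2) applied to the misalignment

rel_abs = depth_error / Z               # error relative to the absolute depth
rel_height = depth_error / roof_height  # error relative to the roof height

print(f"Z = {Z:.4f} m, roof height = {roof_height:.4f} m")
print(f"depth error = {depth_error:.2f} m")
print(f"relative errors: {100 * rel_abs:.2f}% and {100 * rel_height:.1f}%")
```

The unrounded depth and height are 254.6875 m and 45.3125 m, which the text quotes to two decimals. The same misalignment that looks negligible against the absolute depth becomes a substantial fraction of the building height.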

In principle, we need to match all the points between the two fixed lines of the successive frames to generate a complete parallel-perspective mosaic. In an effort to reduce the computational complexity, we have designed a fast 3D mosaicing algorithm [7] based on the proposed PRISM method. It only requires matches between a set of point pairs in two successive images around their stitching line, which is defined as a virtual line in the middle of the two fixed lines (see Fig. 5). The pair of matching curves in the two frames is then mapped into the mosaic as a stitching curve by using the ray interpolation equation (7). The rest of the points are generated by warping a set of triangulated regions defined by the control points on the matching curve (which correspond to the stitching curve) and the fixed line in each of the two frames. Here we assume that each triangle is small enough to be treated as a planar region.

Because it uses sparse control points and image warping, the proposed 3D mosaicing algorithm only approximates the parallel-perspective geometry in stereo mosaics (Fig. 9), but it is good enough when the interframe motion is small (e.g., Fig. 10 and Fig. 11). Moreover, the proposed 3D mosaicing algorithm can be easily extended to use more feature points (and thus smaller triangles) in the overlapping slices, so that each triangle really covers a planar patch or a patch that is visually indistinguishable from a planar patch, or to perform pixel-wise dense matches to achieve true parallel-perspective geometry.

While we are still working on 3D camera orientation estimation using our instrumentation and bundle adjustment [15], Fig. 9 shows mosaic results where camera orientations were estimated by registering the planar ground surface of the scene via dominant motion analysis. Nevertheless, the effect of seamless mosaicing is clearly shown in this example. The results of 3D mosaicing (parallel-perspective mosaicing) and 2D mosaicing (multi-perspective mosaicing) can be compared by looking along the many building boundaries associated with depth changes in the entire 4160x1536 mosaics at our web site [18]. Since it is hard to see subtle errors in the 2D mosaics at the size of Fig. 9a, Fig. 9b and Fig. 9c show close-up windows of the 2D and 3D mosaics for the portion of the scene with the tall Campus Center building. In Fig. 9b the multi-perspective mosaic via 2D mosaicing has obvious seams along the stitching boundaries between two frames. This can be observed in the region indicated by circles, where some fine structures (parts of a white blob and two rectangles) are missing due to misalignments. As expected, the parallel-perspective mosaic via 3D mosaicing (Fig. 9c) does not exhibit these problems.

4.2. Errors from ray interpolation

In theory, the adaptive baseline inherent to the parallel-perspective geometry permits depth accuracy independent of absolute depth. However, in practice, two questions related to local match and ray interpolation need to be answered. First, since we use motion parallax between two successive frames, will the small baseline between frames introduce large errors in ray interpolation? Second, is there any resolution gain or loss due to the change from the perspective projection of the original frames to the parallel projection of the stereo mosaics for different depths?

The answer to the second question is relatively simple: a simple transformation of perspective frames to parallel-perspective mosaics does introduce resolution changes in the images (Fig. 6). Recall that we build the mosaics on a fixation plane of depth $H$. This means that the image resolution in the stereo mosaics is the same as in the original frames only for points on the plane $H$. For regions whose depths are less than $H$, a simple parallel ray re-sampling process will result in resolution loss. On the other hand, regions whose depths are larger than $H$ could have their resolution enhanced by sub-pixel interpolation. This tells us that if we select the fixation plane above all the scene points, we can make full use of the image resolution of the original video frames. However, if we still want to keep the fixation plane between the scene points, we can preserve the image resolution for the nearer points by a super-sampling process. For the points below the fixation plane, resolution could be better enhanced by using sub-pixel interpolation between a pair of frames as illustrated in Fig. 6, assuming that we perform a sub-pixel match for the ray interpolation. For example, for a point P that lies between two points P1 and P2 on the grid of the image O1, we find its match between points Q1 and Q2 on the grid of the image O2. Then the color of the point P can be better interpolated by using points Q1 and P2, since they are closer to the point P in space.

In order to answer the first question, we formulate the problem as follows (under 1D translation): given an accurate point $y_3 = -d_y/2$ in view $O_3$ that contributes to the right mosaic, we try to find a match point $y_i = +d_y/2$ in a view that contributes to the left mosaic with parallel-perspective projection (Fig. 7). Note that we express these points in their corresponding frame coordinate systems instead of the mosaicing coordinate system for ease of notation; the mappings from these points to the mosaicing coordinates are straightforward. The point $y_i$ is usually reprojected from an interpolated view $O_i$ generated from a match point pair $y_1$ and $y_2$ in two existing consecutive views $O_1$ and $O_2$. The localization error of the point $y_i$ depends on the errors in matching and localizing points $y_1$ and $y_2$. Analysis (see Appendix) shows that even if the depth from two successive views $O_1$ and $O_2$ cannot give us good 3D information (as shown by the large pink error region in Fig. 7, Eq. (a-6)), the localization error of the interpolated point (i.e., the left ray of $O_i$) is quite small (Eq. (a-5)). It turns out that the depth error of stereo mosaics is bounded by the errors of two pairs of stereo views, $O_1+O_3$ and $O_2+O_3$, both with almost the same "optimal" baseline configuration as the stereo mosaics. Using Eq. (a-6), and also considering the resolution changes in the mosaics discussed above, the depth estimation error of stereo mosaics can be derived as

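$$
\partial Z \;
\begin{cases}
= \dfrac{H}{d_y}\,\partial y, & \text{if } Z \le H \\[6pt]
\in \left(\dfrac{H}{d_y}\,\partial y,\ \dfrac{Z}{d_y}\,\partial y\right), & \text{if } Z > H
\end{cases}
\tag{9}
$$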

where the pixel localization error $\partial y$ is measured in the mosaics rather than in the original frames as in Eq. (a-6). So the resolution loss ($Z<H$) and enhancement ($Z>H$) are reflected in Eq. (9). Comparing Eq. (9) with Eq. (4), it can be seen that the depth error of the "real" stereo mosaics generated by ray interpolation is related to the actual depth ($Z$) of the point instead of just the average depth $H$. Therefore, in practice the depth accuracy is not independent of absolute depth. Nevertheless, parallel-perspective stereo mosaics still provide a stereo geometry with a pre-selected and fixed disparity and adaptive baselines for all the points of different depths. Here are four conclusions that are very important to the generation and applications of stereo mosaics (refer to the equations in the Appendix):

Conclusion 1. In theory, the depth accuracy of parallel-perspective stereo is independent of absolute depths; however, in practice, the depth error of the stereo mosaics is roughly proportional to the absolute depth of a point.

Conclusion 2. Parallel-perspective stereo is more accurate than two-view perspective stereo. Parallel-perspective stereo mosaics provide a stereo geometry with adaptive baselines for all the points of different depths, and the depth error grows no faster than a linear function of absolute depth. In contrast, two-view perspective stereo has a fixed baseline, and the depth error is a second-order function of absolute depth.

Conclusion 3. Ray interpolation does not introduce extra errors into depth estimation from parallel-perspective stereo mosaics. The accuracy of depth estimation using stereo mosaics via ray interpolation is comparable to the case of two-view perspective stereo with the same "adaptive" baseline configurations (if possible). Stereo mosaics provide a convenient way to achieve such configurations.

Conclusion 4. The ray interpolation accuracy is independent of the magnitude of the interframe motion. This means that stereo mosaics with the same degree of accuracy can be generated from sparse image sequences as well as dense ones, provided that the interframe matches are correct.

5. Error Analysis versus Focal Lengths

5.1. Selecting focal length and image resolution

It is well known that in stereo vision, a large baseline gives better accuracy in 3D recovery. The geometric property of parallel-perspective stereo mosaics also indicates that a larger angle between the two sets of rays of the stereo mosaics will give larger baselines ($B_y$ in Fig. 1), and hence better 3D accuracy. This seems to tell us that a wide-angle lens (with a shorter focal length) could give larger baselines and hence better stereo mosaic geometry than a telephoto lens (with a longer focal length). However, one must consider several factors that affect the generation of the stereo mosaics and the correspondence between them to see that this argument is not necessarily true.

First we assume that the camera has the same number of pixels no matter what the focal length (and the field of view) is. A simple fact is that a wider field of view (FOV), i.e., a shorter focal length, always means lower image resolution (defined as the number of pixels per meter of the footprint on the terrain). Our question is: given the same distance between the two slit windows, $d_y$ (in pixels), which focal length gives better depth resolution, the wide-angle lens or the telephoto lens?

In stereo mosaics, the error in the depth estimate comes from the localization error of the stereo displacement ∆y, which consists of two parts: the mosaic registration error δb1 and the stereo match error δb2. The first part mainly comes from the baseline estimation (i.e., "calibration") error δB, by the following equation:

$$\delta b_1 = \frac{F}{H}\,\delta B \qquad (10)$$

where H is the depth of the fixation plane in generating the mosaics. From Eq. (2), the part of the depth error due to the mosaic registration error is

$$\delta Z_1 = \frac{H}{d_y}\,\delta b_1 = \frac{F}{d_y}\,\delta B \qquad (11)$$

Second, the depth estimation error due to the stereo match error δb2 depends on how large a δb2-pixel footprint is on the ground. Since the image resolution of a point at depth H in an image of focal length F is F/H (pixels/meter), the size of the footprint on the ground will be (Fig. 8)

$$\delta Y = \frac{H}{F}\,\delta b_2 \qquad (12)$$

Obviously, shorter focal lengths produce larger footprints, hence lower spatial resolution. This part of the depth error can be expressed as

$$\delta Z_2 = \frac{F}{d_y}\,\delta Y = \frac{H}{d_y}\,\delta b_2 \qquad (13)$$
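The compensation between footprint size and baseline in Eqs. (12) and (13) can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, the values of H, dy and the two focal lengths are those reported in Section 5.2, and the one-pixel match error is an assumed value chosen only for illustration.

```python
def footprint_m(F_pixels, H_m, db2_pixels):
    """Eq. (12): ground footprint (meters) of a stereo match error of db2 pixels."""
    return H_m / F_pixels * db2_pixels


def depth_error_from_match_m(F_pixels, H_m, dy_pixels, db2_pixels):
    """Eq. (13): depth error (meters) caused by the stereo match error."""
    dY = footprint_m(F_pixels, H_m, db2_pixels)
    return F_pixels / dy_pixels * dY


H, dy = 385.0, 160.0   # flight height (m) and slit distance (pixels), Section 5.2
db2 = 1.0              # ASSUMED match error in pixels, for illustration only

for name, F in (("telephoto", 2946.0), ("wide-angle", 461.0)):
    dY = footprint_m(F, H, db2)
    dZ = depth_error_from_match_m(F, H, dy, db2)
    print(f"{name}: footprint = {dY:.3f} m, depth error = {dZ:.3f} m")
```

Under these assumptions the footprints differ (about 0.13 m versus 0.84 m), while the depth error is H/dy times the match error in both cases, because F cancels in Eq. (13).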

Note that the same depth accuracy in terms of stereo matching is achieved for different focal lengths, since the larger baseline in the case of the wider field of view exactly compensates for the larger footprint on the ground with parallel projections (Fig. 8). The total depth error is

$$\delta Z = \frac{H}{d_y}\,(\delta b_1 + \delta b_2) \qquad (14)$$

or

$$\delta Z = \frac{F}{d_y}\,\delta B + \frac{H}{d_y}\,\delta b_2 \qquad (15)$$

The difference between these two forms is explained in the following:

(1) If the registration error in generating mosaics is independent of the focal length, which could be the case when the relative camera orientation is directly estimated from interframe image registration and bundle adjustments, then Eq. (14) shows that the depth error is independent of the focal length (Fig. 8). However, since a smaller focal length (wide FOV) means a larger angle between the two sets of left and right rays of the stereo mosaics (given the same distance between the slit windows), it will introduce a larger matching error δb2 due to occlusion, perspective distortion and illumination changes across widely separated view angles.

(2) If the absolute camera orientation (and hence the baseline By) is estimated from instrumentation other than image registration, the registration error in generating mosaics will be proportional to the focal length (Eq. (10)). This means the same baseline error will introduce a larger mosaic registration error if a larger focal length is used. In this case, Eq. (15) should be used to estimate the depth error, which indicates that, given the baseline estimation error δB, a larger focal length will introduce a larger error in the first part due to the multiplication by F, and a smaller error in the second part due to the smaller stereo match error δb2. Although it is hard to give an explicit function of the stereo match error versus focal length (and view difference), it is roughly true that the second part is dominant when a normal focal length is used. In this case, a shorter focal length (and wider view direction difference) in generating stereo mosaics will introduce a larger match error due to lower image resolution, significantly larger occlusion and more obvious illumination differences. On the other hand, too long a focal length will result in baselines that are too short, hence too large an amplification of the calibration error in the images. Therefore, it is possible to find an optimal focal length if we can specify a stereo matching error function versus the focal length (field of view), considering the texture and depth variation of the terrain and the size of the stereo match primitives in stereo images. Qualitatively, we have the following conclusion:

Conclusion 5. Ideally, the depth estimation error of stereo mosaics is independent of the focal length of the camera that generates the stereo mosaics. However, in practice a longer focal length will give better 3D reconstruction from the stereo mosaics, due to the finer image resolution, less occlusion and fewer lighting problems, provided that a reasonably good baseline geometry can be constructed.
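The two forms of the total depth error, Eqs. (14) and (15), can be compared with a small numerical sketch. It is illustrative only: the function names are hypothetical, the baseline error of 0.1 m and the match error of 1 pixel are assumed values, and the match error is held fixed for both lenses even though the text argues that in practice it grows as the focal length shrinks.

```python
def registration_error_eq10(F, H, dB):
    """Eq. (10): mosaic registration error (pixels) from a baseline error dB (m)."""
    return F / H * dB


def depth_error_eq14(H, dy, db1, db2):
    """Eq. (14): total depth error from registration and match errors (pixels)."""
    return H / dy * (db1 + db2)


def depth_error_eq15(F, H, dy, dB, db2):
    """Eq. (15): total depth error from baseline error dB (m) and match error db2."""
    return F / dy * dB + H / dy * db2


H, dy = 385.0, 160.0   # values from Section 5.2
dB = 0.1               # ASSUMED baseline (calibration) error in meters
db2 = 1.0              # ASSUMED stereo match error in pixels, same for both lenses

for name, F in (("telephoto", 2946.0), ("wide-angle", 461.0)):
    calib_part = F / dy * dB    # first term of Eq. (15), grows with F
    match_part = H / dy * db2   # second term of Eq. (15), no explicit F
    total = depth_error_eq15(F, H, dy, dB, db2)
    print(f"{name}: calibration part = {calib_part:.3f} m, "
          f"match part = {match_part:.3f} m, total = {total:.3f} m")
```

With a fixed baseline error the calibration term scales with F, which is case (2) above. Substituting Eq. (10) into Eq. (14) reproduces Eq. (15), so the two functions agree when db1 is derived from dB.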

5.2. Experimental analysis

As an example, Fig. 10 and Fig. 11 compare real examples of 3D recovery from stereo mosaics generated from a telephoto camera and a wide-angle camera for the same forest scene. The average height of the airplane is H = 385 m, and the distance between the two slit windows for both the telephoto and wide-angle stereo mosaics is dy = 160 pixels. The focal length of the telephoto camera is Ftele = 2946 pixels and that of the wide-angle camera is Fwide = 461 pixels (both were estimated by a simple calibration using the GPS/INS/laser range information with the camera, and the results from image registration). In both cases, the size of the original frames is 720 (x) * 480 (y) pixels, where the camera moved in the vertical (y) direction. By a simple calculation, the image resolution of the telephoto camera is 7.65 pixels/meter and that of the wide-angle camera is 1.20 pixels/meter.
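The "simple calculation" of image resolution, together with the baseline and ray geometry implied by the two focal lengths, can be sketched as follows. The resolution F/H is taken from Section 5.1. The baseline and angle formulas are assumptions made for this sketch: a pinhole model with slit windows placed symmetrically at +dy/2 and -dy/2, and the relation Z = F·By/dy evaluated at the fixation plane Z = H. The function names are hypothetical.

```python
import math


def image_resolution(F, H):
    """Pixels per meter on the ground at depth H (Section 5.1: F/H)."""
    return F / H


def fixation_baseline(F, H, dy):
    """ASSUMED: baseline (m) for a point on the fixation plane, from Z = F*By/dy."""
    return dy * H / F


def slit_ray_angle_deg(F, dy):
    """ASSUMED pinhole model: angle between the rays through slits at +/- dy/2."""
    return math.degrees(2.0 * math.atan(dy / (2.0 * F)))


H, dy = 385.0, 160.0
for name, F in (("telephoto", 2946.0), ("wide-angle", 461.0)):
    print(f"{name}: resolution = {image_resolution(F, H):.2f} pixels/m, "
          f"baseline = {fixation_baseline(F, H, dy):.1f} m, "
          f"ray angle = {slit_ray_angle_deg(F, dy):.1f} deg")
```

Under these assumptions the wide-angle configuration has a baseline of roughly 134 m against roughly 21 m for the telephoto case, and a ray separation of about 20 degrees against about 3 degrees, which is the trade-off between geometry and matching difficulty discussed in Section 5.1.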

The depth maps of the stereo mosaics were obtained by using the Terrest system, based on a hierarchical sub-pixel dense correlation method [17]. Fig. 10c and Fig. 11c show the derived "depth" maps (i.e., displacement maps) from the pairs of telephoto and wide-angle parallel-perspective stereo mosaics of the forest scene. In the depth maps, mosaic displacements are encoded as brightness so that higher elevations (i.e., closer to the camera) are brighter. It should be noted here that the parallel-perspective stereo mosaics were created by the proposed 3D mosaicing algorithm, with the camera orientation parameters estimated by the same dominant motion analysis as in Fig. 9. Here, the fixation plane is a "virtual" plane with an average distance (H = 385 m) from the scene to the camera. Fig. 10d and Fig. 11d show the distributions of the mosaic displacements of the real stereo mosaics in Fig. 10 and Fig. 11. It can be seen that the ∆y displacement distribution of the telephoto stereo mosaics has almost a zero mean, which indicates that the numbers of points above and below the virtual fixation plane are very close. In the depth map of the wide-angle mosaics, more points on tree canopies can be seen. For both cases, most of the pixels have displacements within -10.0 pixels to +10.0 pixels. Using Eq. (2) we can estimate that the range of depth variations of the forest scene (from the fixation plane) is from -24.0 m (tree canopy) to 24.0 m (the ground).
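The conversion from mosaic displacement to depth variation can be reproduced with a short sketch. It assumes that Eq. (2) has the form Z = H(1 + ∆y/dy), which is consistent with the error relation δZ = (H/dy)·δb used in Eq. (11) and with the numbers quoted above; the function name is hypothetical.

```python
def depth_offset_from_fixation(delta_y, H, dy):
    """ASSUMED form of Eq. (2): Z - H = H * delta_y / dy (meters).

    Negative displacements lie above the fixation plane (closer to the
    camera); positive displacements lie below it.
    """
    return H * delta_y / dy


H, dy = 385.0, 160.0   # values from Section 5.2
for delta_y in (-10.0, 0.0, 10.0):
    offset = depth_offset_from_fixation(delta_y, H, dy)
    print(f"displacement {delta_y:+.1f} pixels -> depth offset {offset:+.2f} m")
```

A displacement of 10 pixels corresponds to 385 × 10 / 160 ≈ 24 m, matching the range stated in the text.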

Fig. 12 and Fig. 13 show close-up windows of the stereo mosaics and the depth maps for both telephoto and wide-angle cameras. By comparison, the telephoto stereo mosaics have much better spatial resolution of the trees and the ground, and have rather similar appearance in the left and right views. In contrast, the left and right wide-angle stereo mosaics have much larger differences in illumination and occlusion, as well as much lower spatial resolution. The large illumination differences in the wide-angle video are due to the sunlight direction, which always made the bottom part of a frame brighter (and sometimes oversaturated) than the top part (Fig. 13d). From the experimental results, we can see that better 3D results are obtained from the telephoto stereo mosaics than from the wide-angle stereo mosaics.

6. Conclusions

In the proposed stereo mosaicing approach for large-scale 3D scene modeling, the computation of "match" is efficiently distributed in three steps: camera pose estimation, image mosaicing and 3D reconstruction. In estimating camera poses, only sparse tie points widely distributed in the two images are needed. In generating stereo mosaics, matches are only performed for view interpolation between small overlapping regions of successive frames. In using stereo mosaics for 3D recovery, matches are only carried out between the two final mosaics, which is equivalent to finding a matching frame for every point in one of the mosaics with a fixed disparity.

In terms of depth recovery accuracy, parallel-perspective stereo mosaics provide adaptive baselines and fixed disparity. We have obtained several important conclusions. Ray interpolation between two successive views is actually very similar to image rectification, thus the accuracy of the two-step match mechanism (mosaicing + stereo match) for 3D recovery from stereo mosaics is comparable to that of a perspective stereo with the same adaptive/optimal baseline configurations. We also show that the ray interpolation approach works equally well for both dense and sparse image sequences in terms of accuracy in depth estimation. Finally, given the number of pixels in the original frames, the error of depth reconstruction is somewhat related to the focal length (and the image resolution) of the camera that captures the video frames. Although further study is needed to investigate what the best focal length is for a certain spatial relation of the camera and the terrain, it seems that stereo mosaics using a telephoto lens (with narrower FOV, higher image resolution and less perspective distortion) give better 3D reconstruction results than those using a wide-angle lens. In fact, we can extract multiple (>2) mosaics with small viewing angle differences between each pair of nearby mosaics [18]. Multi-disparity stereo mosaics could be a natural solution for the problem of matching across large oblique viewing angles.

Acknowledgements

This work is partially supported by NSF EIA-9726401, and NSF CNPq EIA9970046. The authors would like to thank the anonymous reviewers for their valuable comments and suggestions.

References

[1]. H.-C. Huang and Y.-P. Hung, Panoramic stereo imaging system with automatic disparity warping and seaming, Graphical Models and Image Processing, 60(3): 196-208, 1998.

[2]. H. Ishiguro, M. Yamamoto, and Tsuji, Omni-directional stereo for making global map, ICCV'90, 540-547.

[3]. S. Peleg, M. Ben-Ezra, Stereo panorama with a single camera, CVPR'99: 395-401.

[4]. H. Shum and R. Szeliski, Stereo reconstruction from multiperspective panoramas, Proc. IEEE ICCV99, 14-21, 1999.

[5]. Z. Zhu, A. R. Hanson, H. Schultz, F. Stolle, E. M. Riseman, Stereo mosaics from a moving video camera for environmental monitoring, Int. Workshop on Digital and Computational Video, 1999, Tampa, Florida, pp 45-54.

[6]. Z. Zhu, E. M. Riseman, A. R. Hanson, Theory and practice in making seamless stereo mosaics from airborne video, Technical Report #01-01, CS Dept., UMass-Amherst, Jan. 2001 (http://www.cs.umass.edu/~zhu/UM-CS-2001-001.pdf).

[7]. Z. Zhu, E. M. Riseman, A. R. Hanson, Parallel-perspective stereo mosaics, In ICCV’01, Vancouver, Canada, July 2001.

[8]. J. Y. Zheng and S. Tsuji, Panoramic representation for route recognition by a mobile robot, IJCV 9(1), 1992, 55-76.

[9]. R. Kumar, P. Anandan, M. Irani, J. Bergen and K. Hanna, Representation of scenes from collections of images, In IEEE Workshop on Presentation of Visual Scenes, 1995: 10-17.

[10]. J. Chai and H.-Y. Shum, Parallel projections for stereo reconstruction, CVPR’00: II 493-500.

[11]. S. Peleg, J. Herman, Panoramic mosaics by manifold projection, CVPR'97: 338-343.

[12]. H. S. Sawhney, Simplifying motion and structure analysis using planar parallax and image warping, ICPR'94: 403-408.

[13]. R. Szeliski and S. B. Kang, Direct methods for visual scene reconstruction, In IEEE Workshop on Presentation of Visual Scenes, 1995: 26-33.

[14]. R. Gupta, R. Hartley, Linear pushbroom cameras, IEEE Trans PAMI, 19(9), Sep. 1997: 963-975.

[15]. H. Schultz, A. Hanson, E. Riseman, F. Stolle, Z. Zhu, A system for real-time generation of geo-referenced terrain models, SPIE Symposium on Enabling Technologies for Law Enforcement, Boston MA, Nov 5-8, 2000.

[16]. C. C. Slama (Ed.), Manual of Photogrammetry, Fourth Edition, American Society of Photogrammetry, 1980.

[17]. H. Schultz, Terrain reconstruction from widely separated images, In SPIE, Orlando, FL, 1995.

[18]. Z. Zhu, PRISM: Parallel ray interpolation for stereo mosaics, http://www.cs.umass.edu/~zhu/StereoMosaic.html

Appendix: Error analysis of ray interpolation

We formulate the problem as follows: Given an accurate point y3 = -dy/2 in view O3 that contributes to the right mosaic, we try to find a match point yi = +dy/2 in a view that contributes to the left mosaic with parallel-perspective projection (Fig. 7). The point yi is usually from an interpolated view Oi generated from a match point pair y1 and y2 in existing consecutive views O1 and O2. Suppose the interframe baseline between O1 and O2 is Sy, and the baseline between O1 and O3 is By. First we can write out equations of the depth errors for the two-view stereos O1+O3 and O2+O3, both with almost the same baseline configurations as the adaptive baseline between Oi and O3. The depth from the pair of stereo views O1 and O3 is

$$Z = \frac{F B_y}{y_1 - y_3} = \frac{F B_y}{y_1 + d_y/2}$$

and the depth estimation error is

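$$\frac{\partial Z}{\partial y_{1,3}} = \frac{Z}{y_1 + d_y/2} \qquad \text{(a-1)}$$

where y1 is slightly greater than dy/2 by a small value δy1:

$$y_1 = d_y/2 + |\delta y_1| \qquad \text{(a-2)}$$

Similarly, the depth from the pair of stereo views O2 and O3 is

$$Z = \frac{F (B_y - S_y)}{y_2 - y_3} = \frac{F (B_y - S_y)}{y_2 + d_y/2}$$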

and the depth estimation error is

$$\frac{\partial Z}{\partial y_{2,3}} = \frac{Z}{y_2 + d_y/2} \qquad \text{(a-3)}$$

where y2 is slightly smaller than dy/2 by a small value δy2:

$$y_2 = d_y/2 - |\delta y_2| \qquad \text{(a-4)}$$

Using Eq. (5) we can calculate the relative translational component Syi of the interpolated view Oi to the first view:

$$S_{yi} = \frac{y_1 - d_y/2}{y_1 - y_2}\, S_y \qquad \text{(a-4)}$$



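The localization error of Syi, which determines the mosaicing accuracy (Eq. (7)), depends on the errors in matching and localizing points y1 and y2. It can be derived by differentiating the expression for Syi in Eq. (a-4) with respect to both y1 and y2:

$$|\partial S_{yi}| = S_y \left[ \frac{(y_1 - y_2) - (y_1 - d_y/2)}{(y_1 - y_2)^2}\,|\partial y_1| + \frac{y_1 - d_y/2}{(y_1 - y_2)^2}\,|\partial y_2| \right]$$

By assuming that ∂y1 = ∂y2 ≡ ∂y, the two numerators sum to y1 - y2, so that |∂Syi| = Sy ∂y / (y1 - y2). Using Z = F Sy / (y1 - y2), we can conclude that

$$\frac{\partial S_{yi}}{\partial y} = \frac{Z}{F} \qquad \text{(a-5)}$$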

where F is the focal length. It is interesting to note that the interpolation accuracy is independent of the magnitude of the interframe motion Sy. For comparison, the depth error from the two consecutive frames O1+O2 is

$$\frac{\partial Z}{\partial y_{1,2}} = \frac{Z}{y_1 - y_2} = \frac{Z^2}{F S_y} \qquad \text{(a-6)}$$
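The contrast between Eq. (a-5) and Eq. (a-6) can be illustrated numerically. The sketch below is only a check under stated assumptions: the function names and the helper that places y1 and y2 around dy/2 are hypothetical, the interframe motions are arbitrary test values, and F, Z and dy reuse the telephoto values of Section 5.2.

```python
def interpolated_shift(y1, y2, dy, Sy):
    """Translation S_yi of the interpolated view Oi relative to O1."""
    return (y1 - dy / 2) / (y1 - y2) * Sy


def interpolation_sensitivity(y1, y2, dy, Sy):
    """Sum of |dS_yi/dy1| and |dS_yi/dy2|, as in the derivation of Eq. (a-5)."""
    d1 = Sy * (dy / 2 - y2) / (y1 - y2) ** 2
    d2 = Sy * (y1 - dy / 2) / (y1 - y2) ** 2
    return abs(d1) + abs(d2)


def two_frame_depth_sensitivity(Z, F, Sy):
    """Eq. (a-6): dZ/dy for stereo between the consecutive frames O1 and O2."""
    return Z ** 2 / (F * Sy)


def match_points(Z, F, dy, Sy, frac=0.5):
    """HYPOTHETICAL helper: y1 > dy/2 > y2 with interframe disparity F*Sy/Z."""
    disparity = F * Sy / Z
    y1 = dy / 2 + frac * disparity
    return y1, y1 - disparity


F, Z, dy = 2946.0, 385.0, 160.0
for Sy in (0.5, 2.0, 8.0):   # ASSUMED interframe motions in meters
    y1, y2 = match_points(Z, F, dy, Sy)
    print(f"Sy = {Sy}: interpolation sensitivity = "
          f"{interpolation_sensitivity(y1, y2, dy, Sy):.4f}, "
          f"two-frame depth sensitivity = "
          f"{two_frame_depth_sensitivity(Z, F, Sy):.2f}")
```

In this sketch the interpolation sensitivity stays at Z/F for every Sy, while the two-frame depth sensitivity grows as Sy shrinks.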

Apparently, smaller interframe motion will introduce a much larger depth estimation error (see the pink region in Fig. 7). The depth estimation from stereo mosaics can be written as

$$Z = \frac{F (B_y - S_{yi})}{y_i - y_3} = \frac{F}{d_y}\,(B_y - S_{yi})$$

where we insert y3 = -dy/2 and yi = +dy/2. This equation is equivalent to Eq. (2). Using Eq. (a-5), the depth estimation error of stereo mosaics can be expressed by

$$\frac{\partial Z}{\partial y_{i,3}} = \frac{Z}{d_y} \qquad \text{(a-7)}$$

It turns out that the depth error of stereo mosaics is bounded by the errors of the two-view stereos O1+O3 and O2+O3, both with almost the same adaptive baselines as the stereo mosaics, i.e.

$$\frac{\partial Z}{\partial y_{1,3}} \le \frac{\partial Z}{\partial y_{i,3}} \le \frac{\partial Z}{\partial y_{2,3}} \qquad \text{(a-8)}$$
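The ordering in Eq. (a-8) follows directly from Eqs. (a-1) to (a-4) and (a-7), and can be verified with a few lines of code. This is a minimal sketch: the function name is hypothetical and the offsets δy1 and δy2 are assumed values chosen only to exercise the inequality.

```python
def depth_sensitivities(Z, dy, delta_y1, delta_y2):
    """Return dZ/dy for the pairs O1+O3, Oi+O3 and O2+O3, in that order."""
    y1 = dy / 2 + abs(delta_y1)   # Eq. (a-2): y1 slightly greater than dy/2
    y2 = dy / 2 - abs(delta_y2)   # Eq. (a-4): y2 slightly smaller than dy/2
    s13 = Z / (y1 + dy / 2)       # Eq. (a-1)
    si3 = Z / dy                  # Eq. (a-7)
    s23 = Z / (y2 + dy / 2)       # Eq. (a-3)
    return s13, si3, s23


# Z and dy from Section 5.2; the offsets are ASSUMED for illustration.
s13, si3, s23 = depth_sensitivities(Z=385.0, dy=160.0, delta_y1=2.0, delta_y2=1.8)
print(f"O1+O3: {s13:.4f}, Oi+O3: {si3:.4f}, O2+O3: {s23:.4f}")
```

Because y1 + dy/2 is at least dy and y2 + dy/2 is at most dy, the stereo mosaic value Z/dy always lies between the two perspective values.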

Fig. 5. Examples of local match and triangulation for the left mosaic. Close-up windows of (a) the previous and (b) the current frame. The green crosses show the initially selected points (which are evenly distributed along the ideal stitching line) in the previous frame and their initial matches in the current frame by using the global transformation. The blue and red crosses show the correct match pairs by feature selection and correlation (red matches red, blue matches blue). The fixed lines, stitching lines/curves and the triangulation results are shown in yellow.

Fig. 6. Resolution changes from perspective projection (solid rays from O1) to parallel projection (dashed rays). In a simple ray interpolation where each pixel in the mosaics is only from a single frame, resolution remains the same for plane H, reduces to half (gray dots) for plane H/2, and could be two times (the original black dots plus the interpolated white dots) for plane 2H. With image interpolation from more than one frame, image resolution can be better enhanced by sub-pixel interpolation (see text). This figure shows the case where parallel rays are perpendicular to the motion. However, the same principle applies for the left and right views of the stereo mosaics.

Fig. 7. Error analysis of view interpolation. While depth estimation for two consecutive frames is subject to large error, the localization error of the interpolated ray for stereo mosaics turns out to be very small.

Fig. 8. Depth error versus focal lengths (and fields of view). Note that rays are parallel due to parallel projections, which gives the same depth accuracy with different focal lengths since the larger baseline in the case of the wider field of view compensates for the larger footprint on the ground. However, in practice, stereo mosaics from a telephoto camera have better depth accuracy because of better stereo match.

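[Fig. 5 labels: (a) fixed line; stitching line / matching curve 1. (b) fixed line; stitching curve / matching curve 2, with about 4 pixels of additional motion parallax on the top.]

[Fig. 7 labels: views O1, O2, Oi and O3; image coordinates y1, y2, yi, dy/2 and -dy/2; scene point (X,Y,Z); left ray from O1; left ray from O2; right ray from O3.]

[Fig. 8 labels: H; telephoto stereo mosaics (δYtele, δZtele); wide angle stereo mosaics (δYwide, δZwide); δZwide = δZtele.]

[Fig. 6 labels: O1 and O2; planes H/2, H and 2H; points P, P1, P2, Q1, Q2.]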

Fig. 9. Parallel-perspective mosaics of the UMass campus scene from an airborne camera. The (parallel-perspective) mosaic (a) is the left mosaic generated from a sub-sampled "sparse" image sequence (every 10 frames of total 1000 frames) using the proposed 3D mosaicing algorithm. The bottom two zoom sub-images show how 3D mosaicing deals with large motion parallax of a tall building: (b) 2D mosaic result with obvious seams (c) 3D mosaic result without seam.

[Fig. 9 labels: seams due to misalignment; seamless after local match.]

Fig. 10. Stereo mosaics and 3D reconstruction of a 166-frame telephoto video sequence. The size of each of the original mosaics is 7056*944 pixels. (a) left mosaics (b) right mosaics (c) depth map (displacement ∆y from 33 to -42 is encoded as brightness from 0 to 255) (d) depth (displacement) distribution (canopies above the fixation plane: negative displacement; points below the fixation plane: positive displacement)

[Fig. 10(d) histogram axes: horizontal axis ∆y (pixels), from -60 to 40; vertical axis H (number of pixels), from 0 to 180000.]

Fig. 12. Zoom regions of the telephoto stereo mosaics in Fig. 10 show high resolution of the trees and good appearance similarity in (a) the left and (b) the right mosaics, and hence produce (c) good 3D results.

Fig. 11. Stereo mosaics and 3D reconstruction of a 344-frame wide angle video sequence. The size of each of the original mosaics is 1680*832 pixels. (a) left mosaics (b) right mosaics (c) depth map (displacement ∆y from 33 to -42 is encoded as brightness from 0 to 255) (d) depth (displacement) distribution (canopies above the fixation plane: negative displacement; points below the fixation plane: positive displacement)

[Fig. 11(d) histogram axes: horizontal axis ∆y (pixels), from -60 to 40; vertical axis H (number of pixels), from 0 to 20000.]

Fig. 13. Zoom regions of the wide angle stereo mosaics in Fig. 11 show much lower resolution of trees and largely different illuminations, perspective distortions and occlusions in (a) the left and (b) the right mosaics, and hence produce (c) less accurate 3D results. (d) The camera moves toward the sun so the bottom part is always brighter (and sometimes over-saturated) than the top part of each frame due to the sunlight reflection. It is an unusual case that you take a photo both along and against the direction of light. The right mosaic comes from the bottom part while the left mosaic comes from the top part of video frames.

[Fig. 13(d) labels: sun; camera; left mosaic; right mosaic; terrain.]
