

Propagation Algorithm for Image-Based Rendering Using Full or Partial Depth Information

Ha T. Nguyen, Student Member, IEEE, and Minh N. Do, Member, IEEE

Abstract— We propose a new approach, called the Propagation Algorithm, for the image-based rendering (IBR) application that synthesizes novel images, as taken by virtual cameras at arbitrary viewpoints, using a set of acquired images. The Propagation Algorithm proactively propagates all available information from actual cameras to virtual cameras, using the depth information at all or some of the intensity pixels. This process turns the IBR problem into a two-dimensional interpolation problem at the virtual image planes. Each virtual image can thus be efficiently rendered at once for all image pixels. Experimental results show that the Propagation Algorithm produces accurate rendering, even around object boundaries – where most of the existing IBR techniques fail.

Index Terms— Image-Based Rendering, Plenoptic Function, Range Camera, Depth Information, Ray Tracing, Nonuniform Interpolation, Image Segmentation.

I. INTRODUCTION

Image-based rendering (IBR) is an emerging technology that has been developed as an alternative to traditional model-based techniques for image synthesis. IBR synthesizes novel (or virtual) images, as taken by virtual cameras at arbitrary viewpoints, using a set of acquired images. With the advantages of being photorealistic and having lower complexity than model-based techniques, IBR has many potential applications such as remote reality and telepresence [22], [25].

Situated at the crossroads of computer vision, computer graphics, and signal processing, IBR has attracted a lot of attention in recent years. Shum et al. [22] classify IBR techniques in a continuum based on the level of prior knowledge of the scene geometry. At one end of the continuum, 3D warping [13], [14], layered-depth images (LDIs) [21], and LDI trees [2] require the full geometrical information. In view interpolation [3], view morphing [20], and joint view triangulation [11], feature correspondence between images is used as implicit geometrical information. At the other end of the continuum, light field rendering [10] and the Lumigraph [7] use many images to compensate for the lack of explicit scene geometry.

Although many IBR techniques have been proposed, there is still room to improve the rendering quality and computational complexity [22]. One of the reasons is that many of the existing IBR techniques essentially follow the traditional framework, in which an intermediate step called image-based

Ha T. Nguyen is with the Department of Electrical and Computer Engineering and the Coordinated Science Laboratory, University of Illinois, Urbana IL 61801 (email: [email protected]).

M. N. Do is with the Department of Electrical and Computer Engineering, the Coordinated Science Laboratory, and the Beckman Institute, University of Illinois, Urbana IL 61801 (email: [email protected]).

This work was supported by the National Science Foundation under Grant ITR-0312432.

Fig. 1. Virtual images rendered by the Propagation Algorithm. (a): Teddy image using the Full Depth Propagation Algorithm (full depth). (b): Teddy image using the Partial Depth Propagation Algorithm (depth at about 3% of the intensity pixels).

modeling reconstructs the scene geometry, explicitly or implicitly, using the acquired images. In other words, existing IBR techniques render virtual images through a reconstruction of the scene geometry, even though what we really want is not the 3D scene geometry but only the 2D virtual images. Once the scene geometry is available, the ray-tracing approach is usually used to render virtual images. For each virtual pixel, one draws a ray through the camera center and the virtual pixel and finds the point where this ray hits an object surface in the scene. The intensity of this point is then interpolated using information from neighboring actual pixels. The ray-tracing approach is computationally complex and has limited rendering quality around object boundaries [6], [24].

In this paper, we present a new approach for the IBR problem. We name this approach the Propagation Algorithm [15]. Our approach differs from existing methods in two points:

1) We assume that depth information is available for all or some of the image pixels (in our experiments in Section IV, depth at only about 3% of the intensity pixels is enough for acceptable rendering quality).

2) We do not use the ray-tracing approach and pixel-by-pixel intensity interpolation on virtual images. Instead, we proactively propagate all available information to the virtual image plane first, remove occluded points in the continuous domain, and then do the intensity interpolation only once for all the pixels.

For the first point, the depth assumption is widely made in the literature, for both synthesized and realistic images [6], [21], [2], [13], [14], [20], [11], [3]. For synthesized images, the depth information can be computed because of the availability of the scene geometry and the camera model. For realistic images, the assumption is justified by the availability of range


Fig. 2. The pinhole camera model. Pixel x, situated at the intersection of the image plane and the ray connecting the camera center O and the 3D point X, is the image of X. Note that the depth D(x) is defined by the distance from X to the camera image plane.

camera technology¹ (like 3D range sensing [9], [17]) and by the existence of feature correspondence techniques [19] and techniques to reconstruct the scene geometry from pixel correspondences [12], [5], [8]. The second point is motivated by the fact that the IBR problem is a nonuniform interpolation problem [25]. However, our algorithm does not interpolate the generally high-dimensional plenoptic function [1] or locally interpolate pixel intensities (using nearby cameras and/or nearby pixels); rather, it follows the classical 2D interpolation framework at the virtual image plane. This framework collects the available information first, then interpolates the intensity function at the virtual camera at once for all pixels. The experimental results show that the Propagation Algorithm improves the rendering quality, especially around the object boundaries.

This paper is organized as follows. Section II sets up the problem of IBR with depth information. Section III describes the Propagation Algorithm and experimental results for the case where all the pixels are associated with depth information. Section IV presents the algorithm when we have depth information for only a small subset of the intensity pixels. Finally, Section V concludes the paper and discusses future work.

II. PROBLEM STATEMENT

In this paper, we consider the pinhole camera model (see Fig. 2), although the proposed algorithm should work for other camera models as well. We furthermore suppose that: i) the scene surfaces are Lambertian, i.e., images of a 3D point in different cameras have the same intensity, and ii) the intensity and depth functions are sufficiently smooth to be interpolated from a finite set of samples.

Each camera C is specified by an intrinsic calibration matrix K and its location, specified by a rotation matrix R and a translation vector T [12]. The sets I and D are the set of intensity pixels and the set of depth pixels at camera C, respectively. A capital letter X denotes a point in 3D world coordinates, and a lowercase letter x denotes the image of X in 2D image coordinates. The intensity and depth of x are I(x) and D(x),

¹Though it requires a registration process prior to the Propagation Algorithm to merge the intensity and depth information together.

Inputs:
  Of the actual cameras {Ci}, i = 1, ..., N:
    Ki, Ri, Ti : calibration information,
    {I(x) : x ∈ Ii} : actual images,
    {D(x) : x ∈ Di} : depth images.
  Of the virtual camera C:
    Π : projection matrix,
    I : collection of desired pixels.
Output:
  {I(x) : x ∈ I} : rendered image at C.

Fig. 3. The inputs and output of the Propagation Algorithm for IBR. The inputs are the (intrinsic and extrinsic) calibration, intensity, and depth information at the actual cameras, and the calibration information at the virtual camera. The output is the virtual image at the virtual camera.

respectively. As the camera is assumed to be a pinhole camera, the image formation process can be described using the following projection equation [12]:

D(x) \begin{pmatrix} x \\ 1 \end{pmatrix} = [KR,\; KT] \begin{pmatrix} X \\ 1 \end{pmatrix}.   (1)

We denote by x̄ = (x, 1)^T and X̄ = (X, 1)^T the homogeneous coordinate vectors of x and X, and by Π = [KR, KT] the projection matrix of C [12]. Note that x̄ and X̄ are defined up to a scale; that is, for all t ∈ R, t ≠ 0, the vectors (tx, t)^T and (tX, t)^T in homogeneous coordinates also correspond to x and X in Euclidean coordinates, respectively. In this paper, unless we specify otherwise, the homogeneous vectors have the last coordinate equal to 1. Eq. (1) then becomes a linear equation in homogeneous coordinates.

The inputs of our algorithm are the calibration information Ki, Ri, and Ti, the intensity images {I(x) : x ∈ Ii}, and the depth information {D(x) : x ∈ Di} of the actual cameras {Ci}, i = 1, ..., N, together with a projection matrix Π and a collection of desired pixels I of the virtual camera C. This implies that we need all the cameras to be calibrated, in addition to the intensity and depth information. The output of the algorithm is the virtual image {I(x) : x ∈ I}. We consider the case where we render only one virtual image. When a collection of virtual images is desired, a natural extension of the Propagation Algorithm with some level of optimization can be implemented.
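To make these inputs concrete, the following is a minimal sketch (not the authors' code) of the per-camera data of Fig. 3, written in Python with NumPy; the container name Camera and the helper projection_matrix are illustrative choices of ours.

    # Minimal sketch of the per-camera inputs in Fig. 3 (illustrative only).
    # Depth may be partial: pixels without depth can be stored as NaN.
    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class Camera:
        K: np.ndarray          # 3x3 intrinsic calibration matrix
        R: np.ndarray          # 3x3 rotation matrix
        T: np.ndarray          # translation vector of shape (3,)
        intensity: np.ndarray  # H x W intensity image, {I(x) : x in I_i}
        depth: np.ndarray      # H x W depth image, {D(x) : x in D_i}

        def projection_matrix(self) -> np.ndarray:
            # Pi = [KR, KT], a 3x4 matrix acting on homogeneous 3D points.
            return np.hstack([self.K @ self.R, (self.K @ self.T).reshape(3, 1)])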

III. PROPAGATION ALGORITHM WITH FULL DEPTH INFORMATION

In this section we consider the full depth case, i.e., Ii = Di for all i = 1, 2, . . . , N. That means for every actual pixel we know both the intensity and the depth. In the first two subsections, we review the traditional ray-tracing approach and present an overview of the Propagation Algorithm to highlight its new features and differences from the traditional approach. In the next three subsections, we detail the three steps of the Propagation Algorithm: Information Propagation, Occlusion Removal, and Intensity Interpolation. The last subsection presents some experimental results.


Fig. 4. Illustration of the differences between the ray-tracing and propagation approaches. (a): Ray-tracing approach. (b): Propagation Algorithm. The setting has two actual cameras at u0 and u1, and a virtual camera at u. Arrows indicate the flow of information. The ray-tracing approach finds the actual depth at each pixel by searching all possible candidates along the ray through the camera center and the virtual pixel; it hence requires high computational expense. The Propagation Algorithm collects all available information at the virtual image plane before actually processing the information.

A. Traditional ray-tracing approach

In the traditional ray-tracing approach, we render virtual images pixel by pixel. For each virtual pixel, we first try to recover the depth associated with this virtual pixel, i.e., where the ray through the camera center and the virtual pixel hits an object surface in the scene. In reconstruction-based IBR techniques, this requires the reconstruction of the scene geometry, which is very costly or even unrealistic. To avoid the geometrical reconstruction, techniques like local color consistency (comparing the intensity of corresponding pixels [24]) or depth matching (comparing the depth of corresponding pixels [6]) have been proposed to find the depth. These techniques, instead of finding the intersection of the virtual ray with the object surface, test a collection of points along the virtual ray and return the one that produces the best consistency of intensity and/or depth. For example, in Fig. 4(a) we test all points along the dashed line through the center of the virtual camera u (P is such a point).

After obtaining the depth or the intersection point, we trace back to the actual cameras to get the corresponding pixels. The rendered intensity is then computed using the corresponding pixels found in the actual images. Ray-tracing techniques are hence computationally complex, and even with the help of depth information, the rendering quality is still limited because of depth discontinuities around object boundaries [6]. Also, because the images contain just discrete samples of the intensity function, exact pixel correspondences hardly exist. Thus, the use of approximate pixel correspondences may distort object images in the rendered images.

The differences between the ray-tracing approach and the approach of the Propagation Algorithm are shown in Fig. 4. In the ray-tracing approach, the flow of information is from the virtual camera back to the actual cameras whenever we need the (intensity) information. On the contrary, in the Propagation Algorithm, the flow of information is from the actual cameras to the virtual camera. That is, we collect all the available information from the actual cameras and send it to the virtual camera before we do any further processing.

Fig. 5. Three steps of the Propagation Algorithm: Information Propagation, Occlusion Removal, and Intensity Interpolation.

B. Overview of Propagation Algorithm

The key idea of the Propagation Algorithm is proactivity. Using the depth information we can proactively propagate all available information from the actual cameras to the virtual camera. Then we propose a simple occlusion removal technique to keep only the points relevant to the rendering of the virtual image. These steps are realized in the continuous domain at the virtual image plane, which is also a key feature differentiating the Propagation Algorithm from existing IBR methods. Finally, using the points remaining after the occlusion removal step, we interpolate the intensity at once for all virtual pixels.

To illustrate the main idea, we use the simple 2D light field setting [10], [7], [6] as depicted in Fig. 7. In this setting, all the cameras lie on the camera line d = 0, and all the image lines are on the line d = f. When we say an "image" we mean a 1D signal (e.g., scan-lines in 2D images). The Propagation Algorithm contains three steps as shown in Fig. 5, illustrated in Fig. 6, and detailed in the next subsections. The following is the main idea:

1. Information Propagation. In this step we propagate all the available information to the virtual camera image plane. This can be done using the depth information of the pixels.

In the setting of Fig. 7, the intensity information at pixels p0 and q0 of the actual camera u0 is propagated to the points p and q of the virtual camera u, respectively. Fig. 6(a) shows a scene viewed from the virtual camera at u = 4. The circles are propagated from the actual camera at u0 = 2, and the dots are propagated from the actual camera at u1 = 6.

Fig. 6. Illustration of the Propagation Algorithm for the 2D light field example. (a): the scene's depth viewed from the virtual camera at u = 4 and propagated points from the actual cameras at u0 = 2 (circles) and u1 = 6 (dots). (b): removed points (x-markers) and remaining points (dots). (c): remaining points (dots) and linearly interpolated virtual pixels (circles) compared with the actual ones (line). Note that (c) has intensity as the vertical axis, instead of depth as in (a) and (b).

2. Occlusion Removal. In this step we remove all the points in whose neighborhood there is another point with noticeably smaller depth; these points are likely occluded at the virtual camera.

For example, in Fig. 7 the point p is propagated from the pixel p0, but it should then be removed because it is occluded by the object OBJ. In Fig. 6(b), occluded points are shown as x-markers, and remaining points as dots.

3. Intensity Interpolation. In this step we interpolate the remaining information using an appropriate kernel function, depending on the characteristics of the intensity function.

In Fig. 6(c) we show the remaining points as dots, the interpolated intensity values (the virtual image) as circles, and the actual image as a line.

There are several advantages of this three-step framework. First, such proactive use of the available information allows us to change the number of cameras during the rendering process (e.g., in a real-time system) without considerably modifying the algorithm. Second, by doing the intensity interpolation at the end, we do not calculate the intensity on a pixel-by-pixel basis, but rather we get the whole virtual image at once. This approach reduces the computational complexity and provides flexibility in the resolution of the virtual image. Third, our approach also allows us to obtain the intensity for pixels near image borders, where ray-tracing techniques fail because of the lack of pixel correspondences. Fourth, we avoid image distortion in the virtual image as we consistently perform the rendering process in the continuous domain. Finally, in the case of multiple virtual cameras, the Information Propagation step can be done once for all virtual cameras, before we independently render each virtual camera. This common process helps speed up the rendering, because we reduce the data access load at the actual cameras.

C. Information propagation

The idea of the Information Propagation step is that we collect all the available information at the virtual camera before actually processing it. For each actual pixel, the depth information allows us to recover the corresponding point in 3D world coordinates. With this information we can exactly compute the image position of the 3D point on the virtual image plane (which may not be at a pixel position) and its depth relative to the virtual camera. The intensity, depth, and position relative to the virtual camera are propagated to the virtual camera C without any further processing. In Fig. 7, if point P were visible at the virtual camera u, then its image would be at position p with the same intensity as pixel p0. The corresponding depths at p0 and p are also the same. Note that the depth is defined as the distance of a surface point to the image plane, and not to the camera center (see Fig. 2). We say that the intensity and depth information of pixel p0 of the actual camera C0 are propagated to point p of the virtual camera C.

Fig. 7. A simple 2D light field setting with occlusion. In this setting, an "image" is a line on the image focal line.

We now show how to recover a 3D point X from its image x at a (virtual or actual) camera C and the corresponding depth, and conversely, how to compute the depth of a 3D point X relative to a camera C. Let us start from the projection equation (1):

(1) \;\Leftrightarrow\; \begin{pmatrix} D(x)\,x̄ \\ 1 \end{pmatrix} = \begin{pmatrix} KR & KT \\ 0 & 1 \end{pmatrix} X̄
    \;\Leftrightarrow\; X̄ = \begin{pmatrix} KR & KT \\ 0 & 1 \end{pmatrix}^{-1} \begin{pmatrix} D(x)\,x̄ \\ 1 \end{pmatrix}
    \;\Leftrightarrow\; X̄ = \begin{pmatrix} R^{-1}K^{-1} & -R^{-1}T \\ 0 & 1 \end{pmatrix} \begin{pmatrix} D(x)\,x̄ \\ 1 \end{pmatrix}
    \;\Leftrightarrow\; X = D(x)\, R^{-1} K^{-1} x̄ - R^{-1} T.   (2)

Eq. (2) is well-defined because K and R are invertible. This equation shows that the 3D point X can be explicitly recovered from its image x if the depth D(x) is known in addition to the calibration information K, R, and T. To compute the depth of X relative to a camera C, we once again derive from Eq. (1). The third coordinate of (1) gives:

D(x) = π3(Π X̄),   (3)

in which π3 is the operator returning the third coordinate of a vector in R^3: π3((a, b, c)^T) = c.
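As a concrete check of these two relations, the following is a small Python/NumPy sketch of Eqs. (2) and (3); the function names backproject and depth_at are our own, not the paper's.

    # Back-projection (Eq. 2) and relative depth (Eq. 3); illustrative sketch.
    import numpy as np

    def backproject(x, D_x, K, R, T):
        # Eq. (2): X = D(x) R^{-1} K^{-1} x_bar - R^{-1} T, with x_bar = (x, 1)^T.
        x_bar = np.array([x[0], x[1], 1.0])
        R_inv = np.linalg.inv(R)
        return D_x * R_inv @ np.linalg.inv(K) @ x_bar - R_inv @ T

    def depth_at(X, Pi):
        # Eq. (3): D(x) = pi_3(Pi X_bar), the third coordinate of the projection.
        X_bar = np.append(X, 1.0)
        return (Pi @ X_bar)[2]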

Fig. 8 summarizes the Information Propagation step in pseudocode. This step can be done for each actual camera independently, so our algorithm can work with an arbitrary number of actual cameras. This number can be modified at any time while the algorithm is running, without changing the algorithm. Note that at this point we still work in the continuous domain and do not pixelize the image positions.

    % Initialize the set of propagated points
    P = ∅;
    For each actual camera Ci
        For each pixel x0 ∈ Ii
            X = D(x0) Ri^{-1} Ki^{-1} x̄0 − Ri^{-1} Ti;
            x = Π X̄;
            I(x) = I(x0);
            D(x) = π3(Π X̄);
            % Update the set of propagated points
            P = P + {x};
        Endfor
    Endfor

Fig. 8. Pseudocode of the Information Propagation step. All actual pixels are propagated from the actual cameras {Ci}, i = 1, ..., N, to the virtual camera C using the depth information. No further processing is needed at this point.
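Under the same assumptions as above (NumPy, the hypothetical Camera container and the backproject helper), the whole Information Propagation step could look like the sketch below; as in Fig. 8, the propagated positions are kept continuous and are not pixelized.

    # Sketch of the Information Propagation step (Fig. 8); not the authors' code.
    import numpy as np

    def propagate(actual_cameras, Pi_virtual):
        positions, depths, intensities = [], [], []
        for cam in actual_cameras:
            H, W = cam.intensity.shape[:2]
            for v in range(H):
                for u in range(W):
                    D_x = cam.depth[v, u]
                    if not np.isfinite(D_x):        # skip pixels without depth
                        continue
                    X = backproject((u, v), D_x, cam.K, cam.R, cam.T)
                    proj = Pi_virtual @ np.append(X, 1.0)
                    positions.append(proj[:2] / proj[2])   # continuous image position
                    depths.append(proj[2])                 # depth relative to the virtual camera
                    intensities.append(cam.intensity[v, u])
        return np.array(positions), np.array(depths), np.array(intensities)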

In practice, some depth error may result from the processing of the depth image (for example, lowpass filtering). Around the boundaries, this error is usually large compared to the error due to the quantization process or the depth measurement. Thus, we may find some outliers after the Information Propagation step; these are points pushed far away from their actual positions. An outlier can cause other points around it, with larger depth, to be removed in the next step. To overcome this problem, we detect these situations by, for example, comparing the intensity of points with their neighbors. A large change of intensity in all directions is a good sign of an outlier, in contrast to changes in only some directions at image edges.

D. Occlusion removal

In Fig. 7 the pixel p0 is propagated to the point p, although the surface point P is occluded by the object OBJ. This subsection presents how we can prevent occluded points like p from being taken into account in image rendering. The main idea of this step is based on the following observation: if a 3D point is occluded at the virtual camera, such as P in Fig. 7, then there should be another point, such as Q in Fig. 7, in the neighborhood of P with a smaller depth.

In the Occlusion Removal step, we remove all the points in whose ε-distance neighborhood there exists at least one other point with σ-smaller depth. Fig. 9 illustrates the main idea of the Occlusion Removal step: when we consider point A, we create a removal zone (the shaded zone), and all other points falling in this zone are removed. This process is performed for all points after the Information Propagation step. Fig. 10 shows the pseudocode of this step.

Fig. 9. Illustration of the Occlusion Removal step. Point C is considered occluded by point A, and therefore is removed. Point B is not removed by looking at point A alone.

    % Remove occluded points from P to get Q
    Q = P;
    For each point x ∈ P
        If ∃ y ∈ P : ||x − y|| ≤ ε and D(x) − D(y) ≥ σ
            Q = Q − {x};
        Endif
    Endfor

Fig. 10. Pseudocode of the Occlusion Removal step. We remove all the points in whose ε-distance neighborhood there is at least one other point with σ-smaller depth; these points are likely occluded at the virtual camera. In our experiments, we use ε = 0.6 and σ = 0.05 D(y).

The parameters σ and ε can be tuned to suit the characteristics of the scene and/or the application. A larger σ tends to keep more points, especially on object surfaces where the depth changes quickly, though at the risk of including points from hidden objects. A larger ε removes more points and keeps only points that are visible with high confidence, though the price is that the image quality, especially around boundaries, may be reduced.
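A direct implementation of Fig. 10 compares every pair of propagated points. The sketch below (our implementation choice, not the paper's) applies the same rule but uses a k-d tree to query the ε-neighborhoods, with the ε and σ = 0.05 D(y) values quoted above; it returns a boolean mask selecting the set Q of kept points.

    # Sketch of the Occlusion Removal step (Fig. 10) using scipy's cKDTree.
    import numpy as np
    from scipy.spatial import cKDTree

    def remove_occluded(positions, depths, eps=0.6, sigma_ratio=0.05):
        tree = cKDTree(positions)
        keep = np.ones(len(positions), dtype=bool)
        for i, neighbors in enumerate(tree.query_ball_point(positions, r=eps)):
            for j in neighbors:
                # Remove x_i if some eps-neighbor y_j is closer by sigma = 0.05 D(y_j).
                if depths[i] - depths[j] >= sigma_ratio * depths[j]:
                    keep[i] = False
                    break
        return keep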

E. Intensity interpolation

The Intensity Interpolation step renders the virtual image using the points Q remaining after the Occlusion Removal step, that is, when all the information relevant to the rendering of the virtual image is at our disposal. As we assume that the intensity function is smooth enough, we can interpolate the intensity function using the intensity information of the points in Q. The virtual image is then simply the value of this function at the pixel positions (the pixelization process). Fig. 6(c) illustrates the main idea of the Intensity Interpolation step, and Fig. 11 shows its pseudocode.

One issue in the Intensity Interpolation step is the choice of the kernel function. In the case of full depth, the remaining points Q are usually very dense on the virtual camera image plane. In practice, a simple tool like linear interpolation can provide good rendering quality at a small computational cost, and this is what we use in our experiments.

    % Interpolate the intensity function
    I_interpolate = Interpolate(Q, {I(x) : x ∈ Q}, linear);
    % Pixelize I_interpolate to get the virtual image
    For each x ∈ I
        I(x) = I_interpolate(x);
    Endfor

Fig. 11. Pseudocode of the Intensity Interpolation step. We interpolate the intensity function using the points remaining after the Occlusion Removal step. The virtual image is then obtained by a pixelization of the intensity function.

There is a notable difference between 1D and 2D nonuniform interpolation. In 1D interpolation, the presence of a total order in R allows us to interpolate the function values using the relevant samples, for example by linearly interpolating between consecutive samples in linear interpolation. In the 2D case, there is no total order in R^2. We first need to partition the plane R^2 using some triangulation before performing the interpolation. For example, in the case of 2D linear interpolation, one usually uses the Delaunay triangulation to partition the plane; then, for each triangle of the triangulation, we interpolate the function values linearly using the values at the three triangle vertices.
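This Delaunay-based piecewise-linear interpolation of scattered samples is what scipy.interpolate.griddata provides, so a sketch of the Intensity Interpolation step for one color channel could be the following (our implementation choice; the nearest-neighbor fallback outside the convex hull of Q is also ours):

    # Sketch of the Intensity Interpolation step (Fig. 11) for one channel.
    import numpy as np
    from scipy.interpolate import griddata

    def interpolate_intensity(positions, intensities, width, height):
        u, v = np.meshgrid(np.arange(width), np.arange(height))
        # Delaunay triangulation + piecewise-linear interpolation of the points in Q.
        image = griddata(positions, intensities, (u, v), method='linear')
        # Pixels outside the convex hull of Q come back as NaN; fill them with
        # the value of the nearest remaining point instead.
        nearest = griddata(positions, intensities, (u, v), method='nearest')
        return np.where(np.isnan(image), nearest, image)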

F. Experimental results

In our experiments, we use the stereo data acquired by Scharstein et al. [18]². In this data, the disparity maps are given instead of the depth, though one can easily be obtained from the other.
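The paper does not spell out this conversion; for a rectified stereo pair such as the Middlebury data, the standard relation is, up to the calibration constants of the setup (focal length f and baseline B, which we only assume here),

    D(x) = f · B / disparity(x),

so larger disparities correspond to smaller depths.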

The configuration used in the experiments is purely translational. We have two actual cameras at u0 = 2 and u1 = 6, each associated with an intensity image and a disparity map. The actual intensity image at u = 4 is also given and is used as ground truth to evaluate the rendered image of a virtual camera at u = 4. Because the images are in color, we render the virtual image for each color channel (RGB) separately.

Fig. 13 demonstrates our result for the Teddy images using parameter values ε = 0.6 and σ = 0.05 D(y) in the Occlusion Removal step, and linear interpolation in the Intensity Interpolation step. We can see that the proposed algorithm produces a very good result, even around boundaries where most existing IBR techniques fail.

IV. PROPAGATION ALGORITHM WITH PARTIAL DEPTH INFORMATION

The Full Depth Propagation Algorithm relies on the availability of depth information. The more depth information we have, the denser the propagated points at the virtual image plane, and the better the rendering quality we can expect. However, in practice, we do not always have the full depth information at our disposal. Current range camera technology has a lower resolution than intensity camera technology [9], [16]. If the depth is obtained by correspondence techniques [19], we only have the depth at pixels at easy-to-handle features (such as edges and corners). Thus, an algorithm for IBR using partial depth information is important to investigate.

While the feature correspondence problem remains one of the main bottlenecks in the field of computer vision, range camera technology is an emerging alternative for stereo applications. It provides more accurate and detailed information about the scene geometry. Hence, in this paper we focus on the setting where the depth information is obtained by range cameras, and leave the case of partial-depth IBR with nonuniform depth samples for future research. That is, in this section, we consider the case where the depth information is available at a uniform grid of pixels, i.e., a downsampled version of the full depth used in Section III.

²The data is available at http://cat.middlebury.edu/stereo/newdata.html.

Fig. 12. Input images of the Full Depth Propagation Algorithm. (a): intensity image at u0 = 2, (b): intensity image at u1 = 6, (c): depth image at u0 = 2, and (d): depth image at u1 = 6.

Fig. 13. Rendered image of the Full Depth Propagation Algorithm. (a): the ground truth image taken at u = 4, and (b): the rendered image at u = 4.

Fig. 14. The scene at u0 = 2 and the depth-available pixels (highlighted). The depth is available on a regular grid (dots). In our experiments, the grid spacing is 6 intensity pixels in both directions (about 3% of the number of intensity pixels).

Fig. 14 highlights the pixels with available depth. In our experiments, the depth is available on a grid with a spacing of k = 6 intensity pixels in both the vertical and horizontal directions, i.e., one depth sample per k² = 36 intensity pixels. In this setting, the amount of depth information is about 3% of the total number of pixels.

A. Preprocessing or postprocessing

As the Full Depth Propagation Algorithm works well, as shown in Section III, there are two natural approaches to the IBR problem with partial depth information: preprocessing and postprocessing. In the preprocessing approach, we interpolate the depth function first, then provide it to the Information Propagation step of the Full Depth Propagation Algorithm. In the postprocessing approach, we propagate only the depth-available pixels, remove occluded points, and then interpolate the intensity function.

Of these two approaches, we adopt the preprocessing one for several reasons. First, the depth is much more regular than the intensity, and hence easier to handle (see Fig. 15), since the object surfaces of the 3D scene can be approximated by a union of planar pieces. Thus, simple techniques such as bilinear interpolation and nearest neighbor interpolation work well to interpolate the depth. Furthermore, the number of discontinuities is usually smaller in depth images than in intensity images. The second reason is that this approach makes use of the intensity information at the depth-unavailable pixels, while the postprocessing approach simply discards this information.

Fig. 15. The depth image at u0 = 2. The depth is more regular than the intensity, and hence easier to interpolate. Simple techniques such as nearest neighbor interpolation and bilinear interpolation work well to interpolate the depth.

B. Nearest neighbor interpolation and bilinear interpolation of the depth image

To interpolate the depth images from the samples available on a uniform grid, there are two simple ways to approximate the depth in 2D: nearest neighbor interpolation and bilinear interpolation. Nearest neighbor interpolation uses the depth value of the nearest depth-available point to interpolate the depth function. This technique is similar to piecewise constant interpolation in one dimension. It approximates the regions around the object boundaries decently.

Bilinear interpolation, for each pixel position, linearly interpolates the depth value from the four depth samples at the corners of the grid square containing the pixel in question. It is equivalent to a concatenation of two linear interpolations, in the vertical and the horizontal direction, and is thus the 2D analogue of classical 1D linear interpolation. It works best on the object surfaces, which can be approximated as a union of planar pieces of the 3D scene.
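Both upsamplers are standard; a sketch using SciPy (our choice of library, assuming the coarse depth samples lie on a regular grid with spacing k) is:

    # Nearest neighbor / bilinear upsampling of a coarse depth grid (sketch).
    import numpy as np
    from scipy.interpolate import RegularGridInterpolator

    def upsample_depth(depth_samples, k, out_shape, method='linear'):
        # method='nearest' -> nearest neighbor, method='linear' -> bilinear.
        gh, gw = depth_samples.shape
        interp = RegularGridInterpolator(
            (np.arange(gh) * k, np.arange(gw) * k), depth_samples,
            method=method, bounds_error=False, fill_value=None)
        v, u = np.mgrid[0:out_shape[0], 0:out_shape[1]]
        return interp(np.stack([v, u], axis=-1))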

Fig. 16 shows the rendered images with the depth interpolated using nearest neighbor interpolation and bilinear interpolation in the preprocessing step. Although the rendering quality is acceptable on object surfaces, we can notice artifacts around the object boundaries. Nearest neighbor interpolation produces blocky edges, while bilinear interpolation results in blurred edges.

C. Segment-wise depth interpolation

Though both techniques are simple, they can produce decent results (see Fig. 16). Nevertheless, the rendering quality around the object boundaries is still limited. Nearest neighbor interpolation and bilinear interpolation do not make full use of the knowledge in the intensity images: the similarity between the intensity values of neighboring pixels is not efficiently exploited.

Fig. 16. Rendered images using simple techniques to interpolate the depth. (a): the rendered image using nearest neighbor interpolation of the depth, and (b): the rendered image using bilinear interpolation of the depth. The distortion of the rendered images around the boundaries is still noticeable (blocky edges for nearest neighbor interpolation and blurred edges for bilinear interpolation). See Fig. 20 for the rendered image of the proposed Partial Depth Propagation Algorithm.

Fig. 17. Pixel positions at an actual camera image plane. Dots are intensity pixels; circles are pixels with depth information in addition to the intensity information. The downsampling rate in both directions in this example is k = 6, as in our experiments. The boxed unit square S illustrates how we divide the image plane into squares to interpolate the depth in the algorithm; x is a pixel whose depth is to be interpolated.

In order to improve the rendering quality, in this subsection we propose a new technique, called segment-wise depth interpolation, which incorporates both the available intensity and depth information to approximate the depth. The proposed technique uses an image segmentation technique to segment the image into different regions. The idea is that the depth image will be interpolated using only depth-available pixels on the same object surface.

To begin, we use an image segmentation technique to obtain a segmentation of the depth images. We then interpolate the depth square by square, each square being a unit square of the depth grid, as illustrated in Fig. 17. For each unit square S, if all four of its vertices belong to the same depth segment, we use bilinear interpolation to interpolate the depth of all the other pixels inside S. Otherwise, S lies on the boundary of different depth segments (and hence likely on different object surfaces). In that case, for each particular pixel x inside S, we determine which vertices of S belong to the same intensity segment as x; nearest neighbor interpolation is then used to approximate the depth of x using these vertices. Fig. 18 shows the pseudocode of the proposed segment-wise depth interpolation.

In practice, some pixels belonging to tiny intensity segments (usually segments of texture on object surfaces) may be found not to share a segment with any vertex of S. These pixels are marked and filled with the depth value of their neighbors at the end, using for example morphological imaging techniques [23]. Some pixels lie on the segment between two depth-available pixels, i.e., on the common edge of two unit squares; they can be assigned to either of the squares they belong to.

Note that approximating the depth is much less complex than trying to reconstruct the scene geometry, because we only deal with 2D intensity and depth images, instead of 3D objects. Fig. 19 shows the reconstructed depth of the proposed technique for the case where the depth images are downsampled versions of the intensity images with rate k = 6 in both directions. That is, we need only one depth pixel for every k² = 36 intensity pixels, or equivalently, we need the depth information at less than 3% of the number of intensity pixels.

D. Experimental results

In our experiments, we use the image segmentation technique called mean shift, proposed by Comaniciu et al. [4], and its software EDISON³.

Fig. 19 shows the reconstructed depth images using the segment-wise interpolation technique. These depth images are then provided to the Information Propagation step of the Full Depth Propagation Algorithm to render the virtual image.

³The software is available at http://www.caip.rutgers.edu/riul/research/code/EDISON/.


    % Segmentation of intensity and depth images
    segm_i = Segmentation({I(x) : x ∈ I});
    segm_d = Segmentation({D(x) : x ∈ D});
    For each unit square S of depth image D(x)
        If all the vertices of S belong to the same segment of segm_d
            V = set of vertices of S;
            D(x)|_S = Interpolate(V, {D(x) : x ∈ V}, bilinear);
        Else
            For each pixel x inside S
                V = set of vertices of S sharing the same segm_i segment with x;
                D(x) = Interpolate(V, {D(x) : x ∈ V}, nearest);
            Endfor
        Endif
    Endfor

Fig. 18. Pseudocode of the segment-wise interpolation of the depth. The depth is interpolated for each unit square of the depth image. Bilinear interpolation is used on object surfaces (within segments of the depth image); nearest neighbor interpolation is used around the object boundaries.
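A compact Python sketch of this per-square rule follows; intensity_labels and depth_labels stand for segmentation label maps (e.g., from mean shift), k is the grid spacing, and the names and the distance measure are illustrative choices of ours, not the paper's.

    # Segment-wise depth interpolation for one unit square (sketch of Fig. 18).
    import numpy as np

    def interpolate_square(depth, intensity_labels, depth_labels, v0, u0, k):
        corners = [(v0, u0), (v0, u0 + k), (v0 + k, u0), (v0 + k, u0 + k)]
        corner_depths = [depth[c] for c in corners]
        if len({depth_labels[c] for c in corners}) == 1:
            # All four corners lie on one depth segment: bilinear interpolation.
            for dv in range(k + 1):
                for du in range(k + 1):
                    a, b = dv / k, du / k
                    depth[v0 + dv, u0 + du] = (
                        (1 - a) * (1 - b) * corner_depths[0]
                        + (1 - a) * b * corner_depths[1]
                        + a * (1 - b) * corner_depths[2]
                        + a * b * corner_depths[3])
        else:
            # Square straddles a depth boundary: use the nearest corner that
            # shares the pixel's intensity segment; pixels with no such corner
            # are left for the morphological fill mentioned in the text.
            for dv in range(k + 1):
                for du in range(k + 1):
                    p = (v0 + dv, u0 + du)
                    same = [i for i, c in enumerate(corners)
                            if intensity_labels[c] == intensity_labels[p]]
                    if same:
                        dist = [np.hypot(corners[i][0] - p[0], corners[i][1] - p[1])
                                for i in same]
                        depth[p] = corner_depths[same[int(np.argmin(dist))]]
        return depth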

Fig. 19. The reconstructed depth using the segment-wise depth interpolation technique. (a): interpolated depth at u0 = 2, and (b): interpolated depth at u1 = 6.

Fig. 20 shows the rendered image of the Partial Depth Propagation Algorithm using depth at about 3% of the pixels (k = 6), together with the rendered image using the Full Depth Propagation Algorithm and the ground truth image. We note that the Partial Depth Propagation Algorithm does improve the rendering quality around object boundaries compared to nearest neighbor and bilinear depth interpolation. Its rendering quality is even comparable to that of the Full Depth Propagation Algorithm.

V. CONCLUSION AND DISCUSSION

In this paper we present a new approach, called the Propagation Algorithm, for the IBR problem. The algorithm includes three steps: Information Propagation, Occlusion Removal, and Intensity Interpolation. By assuming the availability of the depth information at all or some image pixels, the algorithm produces an excellent improvement in rendering quality, especially around the object boundaries. Moreover, the approach provides a framework for a rigorous and systematic investigation of the rendering quality by turning the IBR problem from a 7D interpolation problem of the plenoptic function into a classical 2D interpolation problem.

The proposed Propagation Algorithm turns the IBR problem into a classical nonuniform sampling and interpolation problem at the virtual image plane. Moreover, the dimension of the problem is considerably reduced, to 2D (from 7D for the general plenoptic function [1] or 4D in the light field setting [10]). In some practical cases, such as the purely translational configuration, we can even reduce the problem to 1D interpolation by rendering the virtual images scan-line by scan-line.

The approach also opens new room to bring more of a signal processing flavor into the IBR problem. An error in intensity results in a sample-value error of the intensity function, and an error in depth results in jitter of the sample positions. Thus, we can analyze the rendering error in a more systematic and rigorous way.

Finally, as already discussed in Section IV, the case where the depth is available at a nonuniform set of pixels is left for future research. This is the case where the depth is recovered by stereo reconstruction techniques using pixel correspondences [12], [5], [8].

REFERENCES

[1] E. H. Adelson and J. R. Bergen, "The plenoptic function and the elements of early vision," in Computational Models of Visual Processing, M. Landy and J. A. Movshon, Eds. MIT Press, 1991, pp. 3–20.
[2] C.-F. Chang, G. Bishop, and A. Lastra, "LDI tree: A hierarchical representation for image-based rendering," in Proc. of SIGGRAPH, 1999, pp. 291–298.
[3] S. E. Chen and L. Williams, "View interpolation for image synthesis," in Proc. of SIGGRAPH, 1995, pp. 29–38.
[4] D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space analysis," IEEE Trans. Patt. Anal. and Mach. Intell., pp. 603–619, May 2002.
[5] O. D. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint. Cambridge, MA: MIT Press, 1993.
[6] Z.-F. Gan, S.-C. Chan, K.-T. Ng, K.-L. Chan, and H.-Y. Shum, "On the rendering and post-processing of simplified dynamic light fields with depth," in Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc., 2004.
[7] S. Gortler, R. Grzeszczuk, R. Szeliski, and M. Cohen, "The Lumigraph," in Proc. of SIGGRAPH, 1996, pp. 43–54.
[8] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. Cambridge University Press, ISBN: 0521540518, 2004.
[9] M. Hebert, "Active and passive range sensing for robotics," in Proc. of the IEEE Int. Conf. on Robotics and Automation, 2000, pp. 102–110.
[10] M. Levoy and P. Hanrahan, "Light field rendering," in Proc. of SIGGRAPH, 1996, pp. 31–40.
[11] M. Lhuillier and L. Quan, "Image-based rendering by joint view triangulation," IEEE Trans. Circ. and Syst. for Video Tech., vol. 13, no. 11, pp. 1051–1062, November 2003.
[12] Y. Ma, J. Kosecka, S. Soatto, and S. Sastry, An Invitation to 3-D Vision. Springer, Nov. 2003.
[13] W. Mark, L. McMillan, and G. Bishop, "Post-rendering 3D warping," in Proc. I3D Graphics Symp., 1997, pp. 7–16.
[14] L. McMillan, "An image-based approach to three-dimensional computer graphics," Ph.D. dissertation, University of North Carolina at Chapel Hill, 1997.
[15] H. T. Nguyen and M. N. Do, "Image-based rendering with depth information using the propagation algorithm," in Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc., Philadelphia, 2005.
[16] V. Popescu, E. Sacks, and G. Bahmutov, "Interactive point-based modeling from dense color and sparse depth," in Eurographics Symposium on Point-Based Graphics, 2004.
[17] S. B. Gokturk, H. Yalcin, and C. Bamji, "A time-of-flight depth sensor – system description, issues and solutions," Conference on Computer Vision and Pattern Recognition Workshop, vol. 3, pp. 35–43, 2004.
[18] D. Scharstein and R. Szeliski, "High-accuracy stereo depth maps using structured light," IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 195–202, June 2003.
[19] D. Scharstein, R. Szeliski, and R. Zabih, "A taxonomy and evaluation of dense two-frame stereo correspondence algorithms," IEEE Workshop on Stereo and Multi-Baseline Vision, pp. 131–140, December 2001.
[20] S. M. Seitz and C. M. Dyer, "View morphing," in Proc. of SIGGRAPH, 1996, pp. 21–30.
[21] J. Shade, S. Gortler, L. He, and R. Szeliski, "Layered depth images," in Proc. of SIGGRAPH, 1998.
[22] H.-Y. Shum, S. B. Kang, and S.-C. Chan, "Survey of image-based representations and compression techniques," IEEE Trans. Circ. and Syst. for Video Tech., vol. 13, pp. 1020–1037, Nov. 2003.
[23] P. Soille, Morphological Image Analysis: Principles and Applications. Springer-Verlag, 1999.
[24] C. Zhang and T. Chen, "A self-reconfigurable camera array," in Eurographics Symposium on Rendering, 2004.
[25] ——, "A survey on image-based rendering – representation, sampling and compression," EURASIP Signal Processing: Image Communication, pp. 1–28, Jan. 2004.


Fig. 20. Virtual images at u = 4 rendered by the Propagation Algorithm. (a): ground truth Teddy image at u = 4, (b): rendered image using the Full Depth Propagation Algorithm, and (c): rendered image using depth at about 3% of the pixels, rendered by the Partial Depth Propagation Algorithm.

Fig. 21. Virtual images at u = 4 rendered by the Propagation Algorithm. (a): ground truth Cones image at u = 4, (b): rendered image using the Full Depth Propagation Algorithm, and (c): rendered image using depth at about 3% of the pixels, rendered by the Partial Depth Propagation Algorithm.