optimized stereo reconstruction using 3d graphics hardware

8/4/2019 Optimized Stereo Reconstruction Using 3D Graphics Hardware

1/49

Optimized Stereo Reconstruction

Using 3D Graphics Hardware


2/49

Abstract

In this paper we describe significantenhancements of our recently developed

stereo reconstruction software exploiting

features of commodity 3D graphics hardware


3/49

The aim of our research is the acceleration of

Dense reconstructions from stereo images

The implementation of our hardware basedmatching procedure avoids loss of accuracy

due to the limited precision of calculations

performed in 3D graphics processors


4/49

In this work we target on improving the time

to obtain reconstructions for images with small

resolution, and we covered other enhancements

to increase the speed for highly detailed

reconstructions.

Our optimized implementation is able to generate

up to 130 000 depth values per second on standard

PC hardware


5/49

1 Introduction

The automatic creation of digital 3D models

from images of real objects comprise a main

step in the generation of virtual environments

and is still a challenging research topic


6/49

Many stereo reconstruction algorithms toobtain models from images were

proposed,and almost all approaches are

based on traditional computations

performed in the main CPU


7/49

Very recently few methods were proposed,that exploit the parallelism and special-

purpose circuits found in current 3D graphics

hardware to accelerate the reconstruction

process


8/49

Although the method is still significantlyslower than the planesweeping approach

proposed by Yang et al. [19] (using a five

camera setup), our approach provides

goodresults with a calibrated stereo setup.


9/49

2 Related Work

2.1 Reconstruction from Stereo Images Stereomatching is the process offinding correspond-

ing or homologous points in two or more

images and is an essential task in computer

vision.


10/49

We distinguish between feature and areabased methods.

Feature based methods incline to sparse

but accurate results, whereas area based

methods deliver dense point clouds.


11/49

Feature based methods

The former methods use local features thatare extracted from intensity or color

distribution. These extracted feature vectors

are utilized to solve the correspondence

problem


12/49

Area based methods

Area based methods employ a similarityconstraint, where corresponding points are

assumed to have similar intensity or color

within a local window.


13/49

The Triclops vision system

consists of a hardware setup with three

cameras and appropriate software for

realtime stereo matching. The system is able

to generate depth images at a rate of about

20Hz for images up to 320x240 pixels on

current PC hardware


14/49

The Triclops vision system(cont..)

The software exploits the particularorientation of the cameras and MMX/SSE

instructions available on current CPUs.In

contrast to the Triclops system our approachcan handle images from cameras with

arbitrary relative orientation


15/49

Yang et al. [19] developed a fast stereoreconstruction method performed in 3D

hardware by utilizing a plane sweep approach

to find correct depth values.The number ofiterations is linear in the requested resolution

of depth values, therefore this method

islimited to rather coarse depth estimation


16/49

2.2 Accelerated Computations in

Graphics Hardware

Even before programmable graphics hardwarewas available, the fixed function pipeline of 3D

graphics processors was utilized to accelerate

numerical calculations [5, 6] and even toemulate programmable shading


17/49

Recently several authors identified currentgraph-ics hardware as a kind of auxiliaryprocessing unitto perform SIMD (single

instruction, multiple data)operationsefficiently. Thompson et al. [17] implementedseveral non-graphical algorithms to run onprogrammable graphics hardware and profiled

the execution times against CPU basedimplementa-tions.


18/49

3. Overview of Our Matching

Procedure

The original implementation of our 3Dgraphics hard-ware based matching procedure

is described in [20].The matching algorithm

essentially warps the left and the right imageonto the current triangular 3D

meshhypothesis according to the relative

orientation be-tween the images


19/49

The projection of the mesh into the left imageconstitutes always a regular grid (i.e. the left

image is never distorted). The warped images

are subtracted and the sum of absolutediffenences is calculated in a window around

the mesh vertices


20/49

The main steps of the algorithm, namelyimage warping and differencing, sum of

absolute differences calculation, and finding

the optimal depth value are performed on thegraphics processing unit. The main CPU

executes the control flow and updates the

current mesh hypothesis


21/49

In order to minimize transfer between hostand graphics memory most changes applied to

the current mesh geometry are only

appliedvirtually without requiring real updateof 3D vertices.For details we refer to our

earlier work


22/49

The control flow of the matching

algorithm

1 .The outermost loop adds the hierarchical

approach to our method and reduces the

probability of local minima e.g. imposed by

repetitivestructures. In every iteration themesh and image resolutions are doubled. The

refined mesh is obtained by linear (and

optionally median) filtering of the coarser one


23/49


algorithm

The inner loop chooses a set of independentvertices to be modified and updates the depth

val-ues of these vertices after performing the

inner-most loop. In order to avoid influence ofdepthchanges of neighboring vertices, only

one quarterof the vertices can be modified

simultaneously


24/49


algorithm

The innermost loop evaluates depth variations

for candidate vertices selected in the enclosing

loop. The best depth value is determined by re-peated image warping and error calculation wrt.

the tested depth hypothesis. The body of this

loop runs entirely on the 3D graphics processingunit (GPU).


25/49

4 Amortized Difference Image

Generation

For larger image resolutions (e.g. 1024 1024)rendering of the corresponding meshgenerated by the sampling points takes a

considerable amount of time.In the1megapixel case the mesh consists ofapproximately 131 000 triangles, which mustbe rendered for

every depth value (several hundred times intotal)


26/49

Especially on mobile graphic boards mesh

processing implies a severe performance penalty:

stereo matching of two 256256 images has

comparable speed ona desktop GPU and on a usually slower mobile

GPU

intended for laptop PCs, but matching 1-megapixelimages requires two times longer on the mobile

GPU


27/49

In order to reduce the number of meshdrawings up to four depth values are

evaluated in one pass.We use multitexturing

facilities to generate four texture coordinatesfor different depth values within the vertex

program


28/49

The fragment shader calculates the absolutedifferences for these deformations simultane-

ously and stores the results in the four color

channels(red, green, blue and alpha)


29/49

Note that the mesh hypothesis is updatedinfrequently and the actually evaluated mesh

is generated within the vertex shader by

deforming the incoming vertices according tothe current displacement


30/49

5. Chunked Image Transforms

In contrast to Yang and Pollefeys [18] wecalculate the error within a window explicitly

using multiple passes


31/49

In every pass four adjacent pixels areaccumulated and the result is written to a

temporary off-screen frame buffer (usually

called pixel buffer or P-buffer for short)


32/49

It is possible to set pixel buffers as destinationfor rendering operations (write access)or to

bind a pixel buffer as a texture (read

access),but a combined read and write accessis not available.In the default setting the

window size is 88, therefore 3 passes are

required


33/49

Note that we use specific encoding ofsummed values to avoid overflow due to the

limited accuracy of one color channel. We

refer to [20] for detail


34/49

Executing this multipass pipeline to obtain the

sum of absolute differences within a window

requires several P-buffer activations to select

the correct target buffer for writing. These

switches turned out to be relatively expensive

(about 0.15ms per switch


35/49

Incombination with the large number ofswitches the total time spent within these

operations comprise a significant fraction of

the overall matching time (about50% for256256 images). If the number of these

operations can be optimized, one can expect

substantial increase in performance of thematching procedure


36/49

Instead of directly executing the pipeline inthe innermost loop (requiring 5 P-buffer

switches) we reorganize the loops to

accumulate several intermediate results inone larger buffer with temporary results

arranged in tiles (see Figure 2)


37/49


38/49

Therefore P-buffer switches are amortizedover several iterations of the innermost loop.

This flexibility in the control flow is completely

transparent and needs not to be codedexplicitly within the software. Those stages in

the pipeline waiting for the input buffer to

become ready are skipped automatically


39/49

6. Minimum Determination Us-

ing the Depth Test

Thc hin ban u ca chng ti, chng tis dng mt mnhchng trnh cp nht cc li ti thiu

v ti uchiu su gi tr tm thy cho n nay.Cch tip cn ny hot ng trn cc thin thoi di ng ha, nhng n l

kh chm do kch hot b m cnthit P-(k t khi vic tnh ton ti thiu cth khng c thc hin ti ch)


40/49

Z-buffer tests allow conditional updates of theframe buffer in-place, but only very currentgraphics cards support user-defined

assignment of z-values within the fragmentshader

(e.g. by using the ARB_FRAGMENT_PROGRAMOpenGL extension). Older hardware always

interpolates z- values from the given geometry(vertices)


41/49

Using the depthtest provided by 3D graphicshardware, the given index of the currently

evaluated depth variation and the

corresponding sum of absolute differences iswritten into the destination buffer, if the

incoming erroris smaller than the minimum

already stored in the buffer


42/49

Therefore a point-wise optimum for evaluateddepth values for mesh vertices can be

computed easily and efficiently


43/49

7. Disparity Estimation

A rather straightforward extension of ourepipolarmatching procedure is disparity

estimation for two uncalibatrated images

without known relative orientation. In orderto address this setting the 3D meshhypothesis

(i.e. a 2D to 1D (depth) mapping) is replaced

by a 2D to 2D (texture coordinates)mapping,but the basic algorithm remains the

same.


44/49

Instead of an 1D search along the epipolar linethe procedure performs a 2D search for

appropriate disparity values


45/49

In this setting the rather complex vertexshader to generate texture coordinates for the

right image used for epipolar matching can be

omitted. In every iteration the current 2Ddisparity map hypothesis is modified in

vertical and horizontal direction and evaluated

(Figure 4).


46/49

With this approach it is possible to findcorresponding regions in typical small base-

line settings (i.e. only small change in position

and rotation between the two images)


47/49

8. Results

We run timing experiments on a desktop PCwith an Athlon XP 2700 and an ATI Radeon

9700 and on a laptop PC with a mobile Athlon

XP 2200 and an ATI Radeon 9000 Mobility.Table 1 depicts timingresults for those two

hardware setups and illustratesthe effect of

the various optimizations mentioned intheprevious sections


48/49

The timings were obtainedfor epipolar stereomatching of the dataset shownin Figure 5. The

total number of evaluated meshhypothesis is

224 (256x256), 280 (512x512) and336(1024x1024). In the highest resolution

(1024x1024)each vertex is actually tested with

84 depth valuesout of a range ofapproximately 600 possible values


49/49

Because of limitations in graphics hardwarewe arecurrently restricted to images with

power of two dimensions.

optimized stereo reconstruction using 3d graphics hardware

Documents