
CS229 • December 2013

Synthesizing images with out-of-plane transformations using stereo images

Ruitang Chen, Zhuodong He, Dai Shen

Stanford University

Abstract

Using a pair of stereo cameras, we can acquire depth information and thus synthesize images with 3D transformations. However, the resulting images often look uncanny because of noise in the data. In this project, we attempt to repair the generated stereo images. One way to address this problem is via unsupervised feature learning using linear autoencoders, also known as Reconstruction Independent Component Analysis (RICA). This algorithm has been shown to outperform other autoencoder algorithms by penalizing a lack of diversity among features. We improve the algorithm by adding an extension layer, and compare its performance with an image inpainting technique based on the fast marching method. Our results show that the inpainting method holds good promise.

I. Introduction

In computer vision, the process of obtaining data has often been time-consuming and expensive. Moreover, research outcomes often heavily depend on both the quality and quantity of the data used. Consequently, novel ways of generating data have become an increasingly hot research topic. However, generated data often contain corruptions, which are inevitable when simulating reality from the data at hand. Many of these corruptions fall into the category of inpainting problems. Inpainting problems occur when pixel values are missing, or when images contain undesirable noise that hinders the use of the data in other applications. This project aims to address this issue through a Denoising Autoencoder (DA), specifically in the context of stereo view transformations.

One of the biggest motivations for choosing a DA is that it employs deep learning, which can capture complicated nonlinear structures hidden in the images. A DA is a neural network that aims to reconstruct the original image from a noisy version of it by finding the hidden structure in the image. In our case, the training data are obtained from two stereo cameras mounted on top of a car. Images from the left camera are transformed and compared to those from the right. Some pixel values are lost due to data inaccuracies (loss of depth information or occlusion), which produces noise. The DA is then used to restore the lost pixels and reconstruct the original images.

The specific DA algorithm we consider is Reconstruction Independent Component Analysis (RICA), which will be explained in detail in the next section. We also propose a nonlinear extension of RICA which improves its performance. Finally, we implement the Telea inpainting algorithm, which outperforms RICA in the denoising process.

II. Methods

I. 3D transformation

The data set is obtained from KITTI [3]. Images are taken from cameras mounted on the two extreme sides of a car and are rectified in advance. Depth maps are generated using the Semi-Global Block Matching (SGBM) algorithm [4], which uses pixel-wise cost calculation. Disparity is the horizontal offset between corresponding pixels in the two images; it is inversely proportional to depth, which is derived by the formula:
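As a toy illustration of the pixel-wise matching cost behind SGBM, the sketch below is a plain sum-of-absolute-differences block matcher, not the full SGBM with its path-aggregated smoothness penalties; `sad_disparity` and its parameters are our own names for illustration:

```python
import numpy as np

def sad_disparity(left, right, max_disp=16, win=2):
    """Toy block matcher: for each pixel in the left image, pick the
    disparity whose (2*win+1)^2 patch in the right image has the lowest
    sum of absolute differences. `left`/`right` are rectified 2-D
    grayscale arrays; real SGBM adds smoothness terms on top of this cost.
    """
    h, w = left.shape
    disp = np.zeros((h, w))
    pad = win
    L = np.pad(left, pad, mode="edge")
    R = np.pad(right, pad, mode="edge")
    for y in range(h):
        for x in range(w):
            patch_l = L[y:y + 2 * pad + 1, x:x + 2 * pad + 1]
            best_cost, best_d = np.inf, 0
            # d <= x so the shifted patch stays inside the image
            for d in range(min(max_disp, x + 1)):
                patch_r = R[y:y + 2 * pad + 1, x - d:x - d + 2 * pad + 1]
                cost = np.abs(patch_l - patch_r).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

On synthetic data where the right image is the left shifted by a known offset, the recovered disparity matches that offset away from the image borders.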



depth = T ∗ f /disparity(x, y), (1)

where T is the baseline distance between the two cameras and f is the focal length of the camera. Hence we are able to obtain real-world depth data and map each pixel on the image to a 3D location according to the formula:

    | x |   | fx  0   Cu |   | X/Z |
    | y | = | 0   fy  Cv | · | Y/Z |,    (2)
    | 1 |   | 0   0   1  |   |  1  |

where fx, fy are the focal lengths, Cu, Cv are the coordinates of the principal point of the camera, and (X, Y, Z) is the real-world coordinate.
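Equations (1) and (2) can be sketched numerically as follows; the calibration numbers below are illustrative placeholders, not the actual KITTI calibration, and the helper names are our own:

```python
import numpy as np

# Illustrative calibration values (placeholders, not KITTI's real numbers)
fx, fy = 721.5, 721.5      # focal lengths in pixels
Cu, Cv = 609.6, 172.9      # principal point
T, f = 0.54, 721.5         # baseline (m) and focal length (px)

def pixel_to_3d(u, v, disparity):
    """Recover the 3-D point for pixel (u, v) from its disparity:
    depth from Eq. (1), then invert the projection of Eq. (2)."""
    Z = T * f / disparity          # Eq. (1)
    X = (u - Cu) * Z / fx
    Y = (v - Cv) * Z / fy
    return np.array([X, Y, Z])

def project_to_pixel(P):
    """Project a 3-D point back to the image plane via Eq. (2)."""
    K = np.array([[fx, 0.0, Cu],
                  [0.0, fy, Cv],
                  [0.0, 0.0, 1.0]])
    X, Y, Z = P
    x, y, w = K @ np.array([X / Z, Y / Z, 1.0])
    return x, y
```

A round trip (pixel to 3-D and back) recovers the original pixel coordinates, which is a quick sanity check on the two formulas.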

Figure 1: A disparity map.

A new image at a different viewing angle can be obtained by applying a rotation matrix to the original real-world coordinates, such as this rotation about the z-axis:

    | cos θ  −sin θ  0 |
    | sin θ   cos θ  0 |.    (3)
    | 0       0      1 |

After this linear transformation, we use the same method to project the real-world points back onto the 2D image plane.
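The z-axis rotation of Eq. (3) applied to a set of 3-D points can be sketched as follows (`rotate_z` is our own helper name; the rotated points would then be reprojected with Eq. (2)):

```python
import numpy as np

def rotate_z(points, theta):
    """Apply the z-axis rotation matrix of Eq. (3) to an (N, 3)
    array of real-world points, one point per row."""
    c, s = np.cos(theta), np.sin(theta)
    Rz = np.array([[c, -s, 0.0],
                   [s,  c, 0.0],
                   [0.0, 0.0, 1.0]])
    # rows are points, so multiply by the transpose: (Rz @ p)^T = p^T @ Rz^T
    return points @ Rz.T
```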

Figure 2: A left image that has been transformed to match that of the right. Many pixels lost their values during the transformation.

However, in this process some pixel values are lost due to inaccurate calculation of depth information. Hence we try to repair these black holes in the images. We propose three methods: RICA, deep RICA, and inpainting.

II. RICA

Model. In this section, we introduce the basic model of RICA, which serves as the foundation of our denoising method. Compared to a traditional autoencoder, RICA is able to train on data with overcomplete features (number of features ≫ dimensionality of the data) and is insensitive to whitening (a preprocessing step that decorrelates the data). Therefore, this algorithm has been shown to outperform many other autoencoder algorithms. Suppose we have an unlabeled training set x(1), x(2), ..., which in our case are vectors obtained from the corrupted images, and let y(1), y(2), ... be the corresponding original right images. We modify the problem to training on x(1), x(2), ... to find a matrix W such that WᵀW restores x(i) to y(i).

Hence, RICA yields the following unconstrained optimization problem:

    min_W  (λ/m) ∑_{i=1}^{m} ‖WᵀWx(i) − y(i)‖₂²  +  ∑_{i=1}^{m} ∑_{j=1}^{n} g(Wj x(i)),

where g is a nonlinear convex function, in our case the sparsity cost

    g(Wj x(i)) = λ (ε + (Wj x(i))²)^(1/2).

Here W ∈ ℝ^{k×n} is the weight matrix, where k is the number of features. W is the encoding matrix, so the encoding step is Wx(i). The activation function of RICA is a linear function of Wx(i). Later we propose a deep version of RICA, which uses an extra layer f(Wx(i)). For convenience, we define

    H(W) = ∑_{i=1}^{m} ‖WᵀWx(i) − y(i)‖₂²,

    G(W) = ∑_{i=1}^{m} ∑_{j=1}^{n} g(Wj x(i)),



    J(W) = (λ/m) H(W) + G(W).

Process of RICA

1. Initialize the parameters (λ, ε, number of hidden units).

2. Use the backpropagation algorithm to calculate a(2), a(3), δ(1), δ(2), δ(3), where a(l) denotes the activation of layer l.

3. Calculate ∇W H(W), ∇W G(W), and ∇W J(W).

4. Use the unconstrained optimizer L-BFGS to find W* = argmin_W J(W).

5. Encode each corrupted image by applying Wx(i) and reconstruct it as WᵀWx(i).

6. Calculate the squared error ‖WᵀWx(i) − y(i)‖₂² to validate accuracy.
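The cost J(W) and its gradient from the steps above can be sketched in NumPy as follows. This is a minimal version under our own conventions (`rica_cost_grad` is our name; the gradient of H uses the identity ∇H = 2W(XEᵀ + EXᵀ) with E = WᵀWX − Y); the full pipeline would hand this pair to L-BFGS, while here we only check it with plain gradient steps:

```python
import numpy as np

def rica_cost_grad(W, X, Y, lam=0.05, eps=1e-2):
    """J(W) = (lam/m) H(W) + G(W) and its gradient.

    W: (k, n) weight matrix; X, Y: (n, m) corrupted / target patch
    vectors stored as columns. g(z) = lam * sqrt(eps + z**2) is the
    smooth L1 sparsity cost from the text.
    """
    m = X.shape[1]
    WX = W @ X                       # encoding step, k x m
    E = W.T @ WX - Y                 # reconstruction residual, n x m
    H = np.sum(E ** 2)
    S = np.sqrt(eps + WX ** 2)       # so g(Wj x) = lam * S
    J = (lam / m) * H + lam * np.sum(S)
    grad = ((lam / m) * 2.0 * W @ (X @ E.T + E @ X.T)   # grad of H term
            + lam * (WX / S) @ X.T)                     # grad of G term
    return J, grad
```

A finite-difference check against one entry of the gradient, plus a few gradient-descent steps that reduce J, gives quick confidence in the derivation.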

Data processing. Using the KITTI dataset, we obtain a disparity map for each pair of images. Then we artificially generate the right image based on the camera calibration data. Due to limited memory, W cannot be high-dimensional, so we divide each picture into 32 pixel × 32 pixel patches as training examples. We reshape the image patches into vectors x(1), x(2), ...; the target vectors y(1), y(2), ... come from the original right images in the KITTI data set. In this project, we run RICA on 25 pictures of 300 patches each.

III. Deep RICA

Due to the linearity of RICA, the data may be underfit. We therefore propose adding an additional hidden layer to the neural network. For each input x(i), we preprocess it and compute the new input as sigmoid(Ux(i)), and we also apply backpropagation to the additional layer. We train on the same data set as RICA.
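The nonlinear preprocessing layer can be sketched as below (`deep_rica_input` is our own name; in the full model U would be a trainable matrix updated by backpropagation):

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def deep_rica_input(U, X):
    """Deep-RICA preprocessing: feed sigmoid(U x(i)) into RICA in
    place of the raw patch vectors X (stored as columns)."""
    return sigmoid(U @ X)
```

The output is squashed into (0, 1), giving RICA a nonlinear transform of the raw patches rather than the patches themselves.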

Effect of the parameter λ: λ plays an important role in the calculation of the cost. Very small values of λ give inadequate weight to the reconstruction error term H(W). For our study, we chose λ = 0.05.

Effect of the number of hidden units (features): Too few hidden units give poor predictions, whereas too many may lead to overfitting. We run our algorithm with k = 400.

IV. Inpainting

Here we employ the Telea algorithm for inpainting. To inpaint a region Ω, we select a boundary point p and take a small neighborhood Bε(p) of size ε of the known image around p. The inpainted value at p should be determined by the values of the known image points close to p, i.e., in Bε(p). For ε small enough, we consider a first-order approximation Iq(p) of the image at point p, given the image value I(q) and gradient ∇I(q) at a point q.

The inpainting method is described by:

    Iq(p) = I(q) + ∇I(q) · (p − q),    (4)

    I(p) = [ ∑_{q∈Bε(p)} ω(p, q) (I(q) + ∇I(q) · (p − q)) ] / [ ∑_{q∈Bε(p)} ω(p, q) ].    (5)

The above equation explains how to inpaint a point on the boundary of the to-be-inpainted region as a function of the known image pixels. To inpaint the whole region Ω, we iteratively apply Equation (5) to all the discrete pixels of ∂Ω and advance the boundary inside Ω until the whole region has been painted. Inpainting points in increasing order of their distance from ∂Ω ensures that pixels closest to the known image are filled in first, thus mimicking manual inpainting techniques. To implement this, we use the fast marching method, which solves the Eikonal equation ‖∇T‖ = 1 on Ω, with T = 0 on ∂Ω. The solution T is the distance map from the pixels of Ω to the boundary.
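The local estimate of Eqs. (4)-(5) for a single boundary pixel can be sketched as follows. This is only the local step, under our own simplifications: the weight ω(p, q) here is a plain inverse squared distance (Telea's actual weight also has direction and level-set terms), the fast-marching ordering is omitted, and the gradient is assumed computable from the known pixels:

```python
import numpy as np

def inpaint_pixel(img, known, p, eps=2):
    """Estimate I(p) as the weighted average of first-order predictions
    from known neighbors q in B_eps(p), per Eqs. (4)-(5).
    img: 2-D array; known: boolean mask of valid pixels; p: (row, col)."""
    gy, gx = np.gradient(img)            # gradients along rows, cols
    r0, c0 = p
    num, den = 0.0, 0.0
    for r in range(max(r0 - eps, 0), min(r0 + eps + 1, img.shape[0])):
        for c in range(max(c0 - eps, 0), min(c0 + eps + 1, img.shape[1])):
            if not known[r, c] or (r, c) == (r0, c0):
                continue
            w = 1.0 / ((r - r0) ** 2 + (c - c0) ** 2)   # simplified omega
            # Eq. (4): first-order prediction of I(p) from q = (r, c)
            est = img[r, c] + gy[r, c] * (r0 - r) + gx[r, c] * (c0 - c)
            num += w * est
            den += w
    return num / den                      # Eq. (5): weighted average
```

On a smooth (linear) image, the first-order predictions are exact, so the filled-in value matches the true one.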

Data processing. In the noisy images we synthesized, we define the to-be-inpainted region, i.e. the mask, as the set of all black pixels, and then repair each image with the Telea algorithm.



III. Results

The following are some of the images after rotation transformations. They show that rotations of more than 45 degrees are accompanied by too many distortions and occlusion effects to be of practical use.

Here (a) is the image rotated upwards by 10° and repaired by Telea, (b) is the image rotated downwards by 10° and repaired by Telea, (c) is the image rotated to the right by 30° and repaired by Telea, (d) is the image rotated to the right by 60° and repaired by Telea, (e) is the image repaired by linear RICA, and (f) is the image repaired by nonlinear RICA. Images repaired by deep RICA have smaller squared error than those repaired by linear RICA. Telea generates rather clear images up to 30°; the image rotated by 60° is comparatively distorted.

We validate the repaired images by computing the mean squared error

    ∑_{i=1}^{m} ‖x(i) − y(i)‖₂²

against the original images.
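This validation metric can be written directly as follows (`validation_error` is our own helper name; rows hold the image vectors):

```python
import numpy as np

def validation_error(restored, originals):
    """Squared-error metric of the text: sum over images i of
    ||x(i) - y(i)||_2^2, with one image vector per row."""
    restored = np.asarray(restored, dtype=float)
    originals = np.asarray(originals, dtype=float)
    return float(np.sum((restored - originals) ** 2))
```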

Table 1: Mean squared error of repaired images (×10³)

    Telea    RICA    deep-RICA
    2.2      2.4     2.4

IV. Discussion

I. Advantages and Limitations

One of the limitations of RICA is that it relies on common inner structures across the corrupted pictures. However, in our dataset, noise tends to concentrate in certain areas (e.g. the sky) where disparity information is inaccurate due to the large distance. Noise of this kind more often than not bears few similarities, if any at all. In fact, RICA works most effectively on Gaussian noise, which is not a very accurate model of our data.

Deep RICA is a better model, with a lower mean squared error than linear RICA. However, the added complexity leads to a longer run time, especially for large data sets; on our data set, the training time is still acceptable. The Telea algorithm generates repaired images with the lowest mean squared error. Instead of finding common features across images, this method is based on local optimization. Since an inpainting mask tells the algorithm which pixels correspond to noise, the method only works in the predefined region. It is also more cost-effective to implement: inpainting needs no other image information for training, it simply repairs an image based on the image itself. However, in cases where the noise region is undefined, inpainting is impossible. Also, the inpainting distance is a parameter that changes with image conditions, and for a large data set it is impractical to tune.

By comparison, RICA, which requires nopredefinition of corrupted pixels, has broaderapplications. However, in synthesizing 2D im-ages with out-of-plane transformations, Teleamay be a better choice.



V. Conclusion

Through the 3D transformation and image restoration process, we are able to generate images from different orientations and locations, allowing us to acquire data much faster and more efficiently. For future work, we would like to extrapolate images beyond the camera positions, which would be more difficult than our current interpolated images.

VI. Acknowledgement

We would like to thank Mr. Tao Wang, a PhD student in the Stanford CS department, for helping us with the project.

References

[1] Q. V. Le, A. Karpenko, J. Ngiam, and A. Y. Ng. ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning. NIPS, 2011.

[2] A. Telea. An Image Inpainting Technique Based on the Fast Marching Method. Journal of Graphics Tools, 2004.

[3] The KITTI Vision Benchmark Suitehttp://www.cvlibs.net/datasets/kitti/eval_stereo_flow.php?benchmark=stereo

[4] Semi-Global Block-Matching Algorithmhttp://zone.ni.com/reference/en-XX/help/372916M-01/nivisionconceptsdita/guid-53310181-e4af-4093-bba1-f80b8c5da2f4/

[5] Image Denoising and Inpainting with DeepNeural Networkshttp://hebb.mit.edu/courses/9.641/lectures/Xie%20Xu%20Chen%20image%20denoising%20inpainting%20deep%20neural%20networks%2012.pdf
