
Stereo and Multiview Sequence Processing

Outline

Stereopsis
Stereo Imaging Principle
Disparity Estimation
Intermediate View Synthesis
Stereo Sequence Coding

Stereopsis

Retinal disparity: the horizontal distance between the corresponding left and right image points of the superimposed retinal images.

The disparity is zero for the point on which the two eyes converge.

Stereopsis: the sense of depth that the mind derives by combining two views taken from different perspectives.

Stereo Imaging Principle (1) Arbitrary Camera Configuration

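[Figure: arbitrary camera configuration with world coordinate system C_w and left and right camera coordinate systems C_l and C_r]

A 3-D point $\mathbf{X} = [X, Y, Z]^T$ in world coordinates has the left and right camera coordinates

$$\mathbf{X}_l = [\mathbf{R}]_l\,\mathbf{X} + \mathbf{T}_l, \qquad \mathbf{X}_r = [\mathbf{R}]_r\,\mathbf{X} + \mathbf{T}_r \qquad (12.2.1)$$

where $[\mathbf{R}]_l$ and $[\mathbf{R}]_r$ are orthonormal, so that $[\mathbf{R}]^{-1} = [\mathbf{R}]^T$.

Perspective projection:

$$x_i = F_i\,\frac{X_i}{Z_i}, \qquad y_i = F_i\,\frac{Y_i}{Z_i}, \qquad i = l, r$$

Eliminating $\mathbf{X}$:

$$\mathbf{X}_r = [\mathbf{R}]_{rl}\,\mathbf{X}_l + \mathbf{T}_{rl}, \quad \text{where } [\mathbf{R}]_{rl} = [\mathbf{R}]_r[\mathbf{R}]_l^T, \quad \mathbf{T}_{rl} = \mathbf{T}_r - [\mathbf{R}]_r[\mathbf{R}]_l^T\,\mathbf{T}_l$$

Substituting $\mathbf{X}_i = (Z_i/F_i)\,[x_i, y_i, F_i]^T$:

$$\frac{Z_r}{F_r}\begin{bmatrix}x_r\\ y_r\\ F_r\end{bmatrix} = [\mathbf{R}]_{rl}\,\frac{Z_l}{F_l}\begin{bmatrix}x_l\\ y_l\\ F_l\end{bmatrix} + \mathbf{T}_{rl}$$

Given $\mathbf{x}_l$ and $\mathbf{x}_r$, this gives three equations in the two unknowns $Z_l$ and $Z_r$. Solving for $Z_r$ and $Z_l$ yields $X_l, X_r, Y_l, Y_r$ from the projection equations, and then $(X, Y, Z)$ from the camera-coordinate relations.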
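The recovery procedure above can be sketched in pure Python. This is an illustrative sketch, not the slides' own method: the helper names `project` and `triangulate` are hypothetical, correspondences are assumed exact, and the three scalar equations in the two unknowns $Z_l$, $Z_r$ are solved in the least-squares sense, which is one reasonable choice among several.

```python
def mat_vec(M, v):
    """3x3 matrix (list of rows) times a 3-vector."""
    return [sum(M[i][k] * v[k] for k in range(3)) for i in range(3)]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(M):
    return [[M[j][i] for j in range(3)] for i in range(3)]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def project(R, T, F, X):
    """World point -> camera coordinates X_c = R X + T -> image point (F X_c/Z_c, F Y_c/Z_c)."""
    Xc = [p + t for p, t in zip(mat_vec(R, X), T)]
    return (F * Xc[0] / Xc[2], F * Xc[1] / Xc[2])

def triangulate(R_l, T_l, F_l, p_l, R_r, T_r, F_r, p_r):
    """Recover the world point (X, Y, Z) from corresponding image points p_l and p_r."""
    R_rl = mat_mul(R_r, transpose(R_l))                      # [R]_rl = [R]_r [R]_l^T
    T_rl = [t - v for t, v in zip(T_r, mat_vec(R_rl, T_l))]  # T_rl = T_r - [R]_rl T_l
    a = [p_l[0] / F_l, p_l[1] / F_l, 1.0]                     # X_l = Z_l * a
    b = [p_r[0] / F_r, p_r[1] / F_r, 1.0]                     # X_r = Z_r * b
    u = mat_vec(R_rl, a)
    # Z_r * b - Z_l * u = T_rl: three equations, two unknowns -> 2x2 normal equations
    bb, bu, uu = dot(b, b), dot(b, u), dot(u, u)
    bt, ut = dot(b, T_rl), dot(u, T_rl)
    det = bb * uu - bu * bu        # zero only if the two viewing rays are parallel
    Z_l = (bu * bt - bb * ut) / det
    X_l = [Z_l * ai for ai in a]
    # Orthonormality inverts X_l = [R]_l X + T_l as X = [R]_l^T (X_l - T_l)
    return mat_vec(transpose(R_l), [x - t for x, t in zip(X_l, T_l)])
```

Only $Z_l$ is needed to return to world coordinates; $Z_r$ follows the same way if required. With noisy correspondences the least-squares solution is a compromise between the two viewing rays.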

Parallel Camera Configuration


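[Figure: parallel camera configuration; left and right cameras C_l and C_r separated by the baseline B, world coordinate system C_w midway between them; 3-D view and X-Z view (Y = 0)]

$$X_l = X + \frac{B}{2}, \quad X_r = X - \frac{B}{2}; \quad Y_l = Y_r = Y; \quad Z_l = Z_r = Z \qquad (12.2.6)$$

$$x_l = F\,\frac{X + B/2}{Z}, \quad x_r = F\,\frac{X - B/2}{Z}, \quad y_l = y_r = y = F\,\frac{Y}{Z} \qquad (12.2.7)$$

$$d_x = x_l - x_r = \frac{FB}{Z} \qquad (12.2.8)$$

$$X = \frac{B\,(x_l + x_r)}{2\,d_x}, \quad Y = \frac{B\,y}{d_x}, \quad Z = \frac{FB}{d_x} \qquad (12.2.9)$$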

Results of eq. 12.2.8: it is the basis for deriving depth from disparity information. The disparity of a 3-D point (X, Y, Z) is independent of its X and Y coordinates and is inversely proportional to its Z value.

The range of the disparity increases with the baseline B, the distance between the two cameras.

$d_x > 0$ for every point in front of the cameras ($Z > 0$).
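A short sketch of depth recovery in the parallel configuration, assuming image coordinates are measured from each camera's principal point and the left and right cameras sit at X = -B/2 and X = +B/2. The helper names `project_parallel` and `reconstruct_parallel` are hypothetical.

```python
def project_parallel(X, Y, Z, F, B):
    """Image coordinates (x_l, x_r, y) of a 3-D point seen by parallel cameras."""
    return F * (X + B / 2) / Z, F * (X - B / 2) / Z, F * Y / Z

def reconstruct_parallel(x_l, x_r, y, F, B):
    """Recover (X, Y, Z) from a correspondence via the disparity d_x = x_l - x_r = F*B/Z."""
    d_x = x_l - x_r
    if d_x <= 0:
        raise ValueError("expected d_x > 0 for a point in front of parallel cameras")
    return B * (x_l + x_r) / (2 * d_x), B * y / d_x, F * B / d_x
```

Doubling the depth halves the disparity, and shifting the point in X leaves the disparity unchanged, matching the observations above.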

Converging Camera Configuration


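[Figure: converging camera configuration; cameras C_l and C_r separated by the baseline B, each rotated inward by $\theta$ about the Y axis, world coordinate system C_w midway between them; 3-D view and X-Z view (Y = 0)]

$$[\mathbf{R}]_l = \begin{bmatrix}\cos\theta & 0 & -\sin\theta\\ 0 & 1 & 0\\ \sin\theta & 0 & \cos\theta\end{bmatrix}, \quad \mathbf{T}_l = \begin{bmatrix}(B/2)\cos\theta\\ 0\\ (B/2)\sin\theta\end{bmatrix},$$

$$[\mathbf{R}]_r = \begin{bmatrix}\cos\theta & 0 & \sin\theta\\ 0 & 1 & 0\\ -\sin\theta & 0 & \cos\theta\end{bmatrix}, \quad \mathbf{T}_r = \begin{bmatrix}-(B/2)\cos\theta\\ 0\\ (B/2)\sin\theta\end{bmatrix} \qquad (12.2.10)$$

$$x_l = F\,\frac{(X + B/2)\cos\theta - Z\sin\theta}{(X + B/2)\sin\theta + Z\cos\theta}, \qquad y_l = F\,\frac{Y}{(X + B/2)\sin\theta + Z\cos\theta},$$

$$x_r = F\,\frac{(X - B/2)\cos\theta + Z\sin\theta}{-(X - B/2)\sin\theta + Z\cos\theta}, \qquad y_r = F\,\frac{Y}{-(X - B/2)\sin\theta + Z\cos\theta} \qquad (12.2.11)$$

(cf. 12.2.2 and 12.2.4)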
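The converging-camera projection can be evaluated directly. The sketch below uses the hypothetical helper `project_converging` and assumes the left camera sits at X = -B/2 and the right at X = +B/2, each rotated inward by θ about the Y axis. It checks two consequences: θ = 0 reduces to the parallel case, and for θ ≠ 0 the depths differ by Z_l - Z_r = 2X sin θ, so points off the plane X = 0 acquire a vertical disparity y_l ≠ y_r.

```python
import math

def project_converging(X, Y, Z, F, B, theta):
    """Return (x_l, y_l, x_r, y_r) for the converging configuration."""
    c, s = math.cos(theta), math.sin(theta)
    z_l = (X + B / 2) * s + Z * c           # depth in the left camera frame
    z_r = -(X - B / 2) * s + Z * c          # depth in the right camera frame
    x_l = F * ((X + B / 2) * c - Z * s) / z_l
    x_r = F * ((X - B / 2) * c + Z * s) / z_r
    return x_l, F * Y / z_l, x_r, F * Y / z_r
```

Unlike the parallel case, the correspondence search in a converging setup therefore cannot in general be restricted to the same image row.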


Epipolar Geometry

Epipolar constraint: for any imaged point that falls on the left epipolar line, its corresponding pixel in the right image must lie on the right epipolar line.

Fundamental matrix: the relation between an image point and its epipolar line can be characterized by a 3 × 3 matrix, $[\mathbf{F}]$.

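[Figure: epipolar geometry; 3-D point X, camera centers C_l and C_r, image points x_l and x_r, epipoles e_l and e_r, epipolar lines ep_l and ep_r]

Epipolar plane: the plane through X, C_l, and C_r. ep_l, ep_r: epipolar lines. e_l: left epipole. e_r: right epipole.

$$\tilde{\mathbf{x}}_r^T\,[\mathbf{F}]\,\tilde{\mathbf{x}}_l = 0, \qquad \tilde{\mathbf{x}}_l^T\,[\mathbf{F}]^T\,\tilde{\mathbf{x}}_r = 0, \qquad \tilde{\mathbf{x}} = [\mathbf{x}^T, 1]^T$$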


Parallel cameras: the epipoles are at infinity and the epipolar lines are parallel. For any given point, the left and right epipolar lines associated with it are horizontal lines with the same y coordinate as the point.

This simplifies disparity estimation: the search for a corresponding point reduces to a 1-D search along the same row.

[Figure: parallel cameras C_l and C_r with epipoles e_l and e_r at infinity]
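For cameras described as in the arbitrary configuration (camera coordinates $[\mathbf{R}]\mathbf{X} + \mathbf{T}$, perspective projection with focal length F), $[\mathbf{F}]$ can be built with the standard construction $[\mathbf{F}] = K_r^{-T}[\mathbf{T}_{rl}]_\times[\mathbf{R}]_{rl}K_l^{-1}$ with $K_i = \mathrm{diag}(F_i, F_i, 1)$. That construction, the principal-point-centred image coordinates, and the helper names below are assumptions of this sketch rather than content of the slides. The right epipolar line of $\mathbf{x}_l$ has coefficients $[\mathbf{F}]\tilde{\mathbf{x}}_l$; the test also confirms that parallel cameras give horizontal epipolar lines.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def mat_vec(M, v):
    return [sum(M[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(M):
    return [[M[j][i] for j in range(3)] for i in range(3)]

def skew(t):
    """[t]_x, so that skew(t) applied to v equals the cross product t x v."""
    return [[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]]

def project(R, T, F, X):
    Xc = [p + t for p, t in zip(mat_vec(R, X), T)]
    return (F * Xc[0] / Xc[2], F * Xc[1] / Xc[2])

def fundamental(R_l, T_l, F_l, R_r, T_r, F_r):
    R_rl = mat_mul(R_r, transpose(R_l))
    T_rl = [t - v for t, v in zip(T_r, mat_vec(R_rl, T_l))]
    E = mat_mul(skew(T_rl), R_rl)              # essential matrix: X_r^T E X_l = 0
    K_l_inv = [[1 / F_l, 0, 0], [0, 1 / F_l, 0], [0, 0, 1]]
    K_r_inv = [[1 / F_r, 0, 0], [0, 1 / F_r, 0], [0, 0, 1]]
    return mat_mul(transpose(K_r_inv), mat_mul(E, K_l_inv))

def right_epipolar_line(Fm, p_l):
    """Coefficients (a, b, c) of the right-image line a*x + b*y + c = 0."""
    return tuple(mat_vec(Fm, [p_l[0], p_l[1], 1.0]))
```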

Disparity Estimation (1) Constraints on Disparity Distribution

Epipolar constraint.

Unidirectionality with parallel cameras: with the parallel camera configuration, the disparity vector (DV) has only a horizontal component, which is always positive.

Ordering constraint: let $x_{r,1}$ and $x_{r,2}$ be two points on the same horizontal line in the right image. Then

$$x_{r,1} < x_{r,2} \;\Rightarrow\; x_{l,1} < x_{l,2} \;\Rightarrow\; d_{x,2} > d_{x,1} + x_{r,1} - x_{r,2},$$

because

$$x_{l,2} > x_{l,1} \;\Rightarrow\; x_{l,2} - x_{r,2} > x_{l,1} - x_{r,2} \;\Rightarrow\; d_{x,2} > (x_{l,1} - x_{r,1}) + x_{r,1} - x_{r,2}.$$
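The ordering constraint turns into a simple lower bound on the next disparity along a scan line. A minimal sketch with hypothetical helper names, assuming the right-image points are sorted by x and disparities follow $d_x = x_l - x_r$; checking consecutive pairs suffices, since the condition is equivalent to $x_l$ increasing along the line.

```python
def ordering_lower_bound(d_prev, xr_prev, xr_cur):
    """Exclusive lower bound on d_x at xr_cur, given disparity d_prev at xr_prev < xr_cur."""
    return d_prev + xr_prev - xr_cur

def satisfies_ordering(xr, d):
    """True if disparities d at strictly increasing right-image positions xr keep the order."""
    return all(d[i] > ordering_lower_bound(d[i - 1], xr[i - 1], xr[i])
               for i in range(1, len(xr)))
```

In a search, the bound prunes candidate disparities before any matching error is computed.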

Disparity Estimation (2)

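Models for the Disparity Function

A simple case: the surface of the imaged scene is approximated by a plane,

$$Z(X, Y) = aX + bY + c. \qquad (12.3.1)$$

Substituting $X = Z x_r/F + B/2$ and $Y = Z y_r/F$ (from 12.2.6 and 12.2.7):

$$Z(x_r, y_r) = a\left(\frac{Z(x_r, y_r)\,x_r}{F} + \frac{B}{2}\right) + b\,\frac{Z(x_r, y_r)\,y_r}{F} + c \qquad (12.3.2)$$

$$Z(x_r, y_r) = \frac{c + aB/2}{1 - a\,x_r/F - b\,y_r/F} \qquad (12.3.3)$$

$$d_x(x_r, y_r) = \frac{FB}{Z(x_r, y_r)} = \frac{B}{c + aB/2}\,\left(F - a\,x_r - b\,y_r\right) \qquad (12.3.4)$$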
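A numerical check of the planar-surface model above, with hypothetical helpers `plane_depth` and `plane_disparity` and parallel cameras assumed. The test reconstructs a 3-D point from the depth formula, confirms it lies on the plane, confirms its disparity matches the closed form, and confirms the disparity is affine (the value at a midpoint is the mean of the endpoint values).

```python
def plane_depth(a, b, c, F, B, x_r, y_r):
    """Depth Z at right-image point (x_r, y_r) for the plane Z = aX + bY + c."""
    return (c + a * B / 2) / (1 - a * x_r / F - b * y_r / F)

def plane_disparity(a, b, c, F, B, x_r, y_r):
    """Disparity d_x = F*B/Z on the plane; affine in (x_r, y_r)."""
    return B * (F - a * x_r - b * y_r) / (c + a * B / 2)
```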

The disparity function is affine in the image coordinates $(x_r, y_r)$ when the surface is a plane.

In general, the scene is divided into small patches such that each patch is approximately planar, so the planar condition holds within each patch. The disparity estimation problem then reduces to the estimation of three affine parameters for each patch, or equivalently, the estimation of the disparity ($d_x$ only) at the three corner points of each patch, that is, at the nodal points. The disparity function within each patch can then be interpolated from the nodal points using the affine model.
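The nodal interpolation described above amounts to fitting $d_x(x, y) = p_0 + p_1 x + p_2 y$ through the three corner disparities of a patch and evaluating it inside. A sketch with hypothetical names, solving the 2 × 2 system by Cramer's rule:

```python
def affine_from_nodes(nodes, disparities):
    """Fit d(x, y) = p0 + p1*x + p2*y through three nodal points.

    nodes: three (x, y) corner points of a patch; disparities: d_x at those nodes.
    """
    (x1, y1), (x2, y2), (x3, y3) = nodes
    d1, d2, d3 = disparities
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    if det == 0:
        raise ValueError("nodal points are collinear")
    p1 = ((d2 - d1) * (y3 - y1) - (d3 - d1) * (y2 - y1)) / det
    p2 = ((x2 - x1) * (d3 - d1) - (x3 - x1) * (d2 - d1)) / det
    p0 = d1 - p1 * x1 - p2 * y1
    return p0, p1, p2

def interpolate_disparity(params, x, y):
    """Evaluate the affine disparity model at (x, y) inside the patch."""
    p0, p1, p2 = params
    return p0 + p1 * x + p2 * y
```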

Disparity Estimation (3) Block-Based Approach

The disparity function within each block is described by a constant or a low-order polynomial, determined by minimizing the error between the two views after warping based on the estimated disparity function. The minimization can be solved by exhaustive or gradient-descent search, subject to the constraints on disparity distribution listed above.

The search range usually needs to be much larger than in motion estimation. A constant disparity per block is appropriate only for a flat surface parallel to the image plane (since $d_x = FB/Z$, constant disparity means constant depth); this model works well when the block size is small.
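A minimal exhaustive-search sketch of the constant-disparity block model for parallel cameras. The name `block_match_disparity`, the sum of absolute differences (SAD) as the error measure, and the default block size and search range are assumptions for illustration; the search runs only along the same row, using the unidirectionality constraint $d_x = x_l - x_r \ge 0$.

```python
def block_match_disparity(left, right, block=8, max_d=32):
    """Constant-disparity block matching for parallel cameras.

    left, right: 2-D lists of intensities with equal size. For each block of the
    right image, search d = 0..max_d along the same row and keep the d that
    minimizes the SAD against the left block at x + d.
    """
    h, w = len(right), len(right[0])
    disparity = []
    for by in range(0, h - block + 1, block):
        row = []
        for bx in range(0, w - block + 1, block):
            best_d, best_err = 0, float("inf")
            for d in range(max_d + 1):
                if bx + d + block > w:          # candidate block would leave the left image
                    break
                err = sum(abs(right[by + i][bx + j] - left[by + i][bx + d + j])
                          for i in range(block) for j in range(block))
                if err < best_err:
                    best_d, best_err = d, err
            row.append(best_d)
        disparity.append(row)
    return disparity
```

Gradient-descent search or a polynomial disparity model would replace the inner loop; the exhaustive version is simply the easiest to verify.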


Disparity Estimation (4) Two-Dimensional Mesh-Based Approach

[Figure: corresponding mesh elements B_{m,l} and B_{m,r} in the left and right images]

The nodal displacements are found by minimizing the disparity-compensated prediction error between corresponding elements, summed over the four elements attached to each node. With the parallel set-up, only horizontal disparities need to be searched.

[Figure panels: original left; original right; regular mesh on the left image; corresponding mesh on the right image; right image predicted by BMA (32.03 dB); right image predicted by the mesh (27.48 dB)]

Despite its lower dB figure, the mesh-based scheme yields a visually more accurate prediction.
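A full 2-D implementation needs element warping, so the sketch below is a deliberately simplified 1-D analogue on one scan line; it is an assumption for illustration, not the slides' algorithm. Nodes sit at fixed right-image positions, the disparity is interpolated linearly between adjacent nodes, and each node's horizontal disparity is chosen by exhaustive search over a candidate grid to minimize the squared disparity-compensated prediction error summed over its attached elements (two in 1-D, versus four in the 2-D mesh). All names are hypothetical.

```python
def sample(row, x):
    """Linearly interpolate an intensity row at real position x (clamped to the row)."""
    x = min(max(x, 0.0), len(row) - 1.0)
    i = min(int(x), len(row) - 2)
    t = x - i
    return (1 - t) * row[i] + t * row[i + 1]

def element_error(left, right, nodes, disp, k):
    """Squared prediction error over element k: right-image pixels nodes[k] <= x < nodes[k+1]."""
    x0, x1 = nodes[k], nodes[k + 1]
    err = 0.0
    for x in range(x0, x1):
        t = (x - x0) / (x1 - x0)
        d = (1 - t) * disp[k] + t * disp[k + 1]      # disparity interpolated from the nodes
        err += (right[x] - sample(left, x + d)) ** 2  # right pixel predicted from left at x + d
    return err

def total_error(left, right, nodes, disp):
    return sum(element_error(left, right, nodes, disp, k) for k in range(len(nodes) - 1))

def local_error(left, right, nodes, disp, attached):
    return sum(element_error(left, right, nodes, disp, k) for k in attached)

def mesh_disparity_1d(left, right, nodes, candidates, sweeps=5):
    """Node-by-node exhaustive search; each node only affects its attached elements."""
    disp = [0.0] * len(nodes)
    for _ in range(sweeps):
        for n in range(len(nodes)):
            attached = [k for k in (n - 1, n) if 0 <= k < len(nodes) - 1]
            best_d = disp[n]
            best_e = local_error(left, right, nodes, disp, attached)
            for d in candidates:
                disp[n] = d
                e = local_error(left, right, nodes, disp, attached)
                if e < best_e:
                    best_d, best_e = d, e
            disp[n] = best_d
    return disp
```

Starting each node's search from its current value guarantees that the total error never increases from sweep to sweep; it does not guarantee a global minimum.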

Disparity Estimation (5) Intra-Line Edge Matching Using Dynamic Programming

The stereo matching process can be considered as finding a path in a graph.

[Figure: matching graph; axes: left scan line (number of edge points in the left image) and right scan line (number of edge points in the right image)]
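A sketch of scan-line edge matching by dynamic programming, under assumptions not stated in the slides: each edge point carries a scalar feature (for example, intensity), a match costs the absolute feature difference and is allowed only when $d_x = x_l - x_r \ge 0$, and leaving an edge unmatched (occluded) costs a fixed penalty. The minimum-cost monotone path through the (left edge, right edge) grid automatically satisfies the ordering constraint. `dp_edge_match` is a hypothetical name.

```python
def dp_edge_match(left_edges, right_edges, occlusion_cost=10.0):
    """Match edge points on corresponding scan lines.

    left_edges, right_edges: lists of (x, feature) sorted by x.
    Returns (total cost, list of matched (x_l, x_r) pairs).
    """
    m, n = len(left_edges), len(right_edges)
    cost = [[0.0] * (n + 1) for _ in range(m + 1)]
    step = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i > 0 and j > 0:
                (xl, fl), (xr, fr) = left_edges[i - 1], right_edges[j - 1]
                if xl - xr >= 0:                      # unidirectionality (parallel cameras)
                    options.append((cost[i - 1][j - 1] + abs(fl - fr), "match"))
            if i > 0:
                options.append((cost[i - 1][j] + occlusion_cost, "skip_left"))
            if j > 0:
                options.append((cost[i][j - 1] + occlusion_cost, "skip_right"))
            cost[i][j], step[i][j] = min(options)
    # Backtrack from the end of both scan lines to recover the matched pairs.
    i, j, matches = m, n, []
    while i > 0 or j > 0:
        if step[i][j] == "match":
            matches.append((left_edges[i - 1][0], right_edges[j - 1][0]))
            i, j = i - 1, j - 1
        elif step[i][j] == "skip_left":
            i -= 1
        else:
            j -= 1
    matches.reverse()
    return cost[m][n], matches
```

The occlusion penalty trades off matching cost against leaving edges unmatched; larger values force more matches.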


Joint Structure and Motion Estimation

The surface of the imaged object is modeled with a 3-D mesh, which projects to 2-D meshes in the left and right images.

Intermediate View Synthesis (1)

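Naïve approach: linear interpolation without considering disparity,

$$\psi_c(\mathbf{x}) = w_l(\mathbf{x})\,\psi_l(\mathbf{x}) + w_r(\mathbf{x})\,\psi_r(\mathbf{x}), \qquad w_l(\mathbf{x}) = \frac{D_{cr}}{D_{cl} + D_{cr}}, \quad w_r(\mathbf{x}) = 1 - w_l(\mathbf{x}),$$

where $D_{cl}$ is the baseline distance from the central view to the left view and $D_{cr}$ the distance from the central view to the right view. This approach yields blurred images.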
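A one-row illustration of why the naïve approach blurs: a step edge seen at different positions in the two views is blended into two half-height edges. The function name and the 1-D rows are assumptions for illustration.

```python
def naive_intermediate(psi_l, psi_r, D_cl, D_cr):
    """psi_c(x) = w_l * psi_l(x) + w_r * psi_r(x), ignoring disparity."""
    w_l = D_cr / (D_cl + D_cr)
    w_r = 1.0 - w_l
    return [w_l * a + w_r * b for a, b in zip(psi_l, psi_r)]
```

With the central view midway ($D_{cl} = D_{cr}$), pixels between the two edge positions come out at half intensity: a ghosted, blurred edge instead of a sharp one.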

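Intermediate View Synthesis (2)

Disparity-compensated interpolation:

$$\psi_c(\mathbf{x}) = w_l(\mathbf{x})\,\psi_l(\mathbf{x} + \mathbf{d}_{cl}(\mathbf{x})) + w_r(\mathbf{x})\,\psi_r(\mathbf{x} + \mathbf{d}_{cr}(\mathbf{x}))$$

$$w_l(\mathbf{x}) = \begin{cases} \dfrac{D_{cr}}{D_{cl} + D_{cr}}, & \text{if } \mathbf{x} \text{ is visible in both views,}\\ 1, & \text{if } \mathbf{x} \text{ is visible only in the left view,}\\ 0, & \text{if } \mathbf{x} \text{ is visible only in the right view,}\end{cases} \qquad w_r(\mathbf{x}) = 1 - w_l(\mathbf{x})$$

[Figure: a point x in the central view and its corresponding positions x + d_cl(x) in the left view and x + d_cr(x) in the right view]

This assumes that $\mathbf{d}_{cl}(\mathbf{x})$ and $\mathbf{d}_{cr}(\mathbf{x})$, the disparities from the central view to the left and right views, are known. In reality, only $\mathbf{d}_{lr}(\mathbf{x})$ can be estimated, and it is not easy to generate $\mathbf{d}_{cl}(\mathbf{x})$ and $\mathbf{d}_{cr}(\mathbf{x})$ from $\mathbf{d}_{lr}(\mathbf{x})$.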
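A 1-D sketch of disparity-compensated interpolation, assuming $d_{cl}$ and $d_{cr}$ are known per pixel (horizontal only), that non-integer positions are sampled by linear interpolation, and that visibility is given as per-pixel labels "both", "left", or "right". The helper names are hypothetical.

```python
def sample(row, x):
    """Linearly interpolate an intensity row at real position x (clamped to the row)."""
    x = min(max(x, 0.0), len(row) - 1.0)
    i = min(int(x), len(row) - 2)
    t = x - i
    return (1 - t) * row[i] + t * row[i + 1]

def dc_intermediate(psi_l, psi_r, d_cl, d_cr, visibility, D_cl, D_cr):
    """psi_c(x) = w_l psi_l(x + d_cl(x)) + w_r psi_r(x + d_cr(x)) with visibility-based weights."""
    out = []
    for x, vis in enumerate(visibility):
        if vis == "both":
            w_l = D_cr / (D_cl + D_cr)
        elif vis == "left":
            w_l = 1.0
        else:                       # visible only in the right view
            w_l = 0.0
        w_r = 1.0 - w_l
        out.append(w_l * sample(psi_l, x + d_cl[x]) + w_r * sample(psi_r, x + d_cr[x]))
    return out
```

With correct disparities, both terms sample the same scene point, so the step edge that the naïve blend smears stays sharp.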

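Intermediate View Synthesis (3)

The problem is solved if $\mathbf{d}_{lr}(\mathbf{x})$ is estimated by the mesh-based approach: the position of each node n in the central view is interpolated from its corresponding positions in the left and right views,

$$\mathbf{x}_{c,n} = \frac{D_{cr}}{D_{cl} + D_{cr}}\,\mathbf{x}_{l,n} + \frac{D_{cl}}{D_{cl} + D_{cr}}\,\mathbf{x}_{r,n},$$

which defines a mesh in the central view whose elements correspond to those in the left and right views.

[Figure: corresponding nodes x_{l,n}, x_{c,n}, and x_{r,n} in the left, central, and right views]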
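Given corresponding nodal positions from the mesh-based estimate, the central-view nodes and the nodal disparities $\mathbf{d}_{cl} = \mathbf{x}_{l,n} - \mathbf{x}_{c,n}$ and $\mathbf{d}_{cr} = \mathbf{x}_{r,n} - \mathbf{x}_{c,n}$ follow directly. A minimal sketch (hypothetical name `central_nodes`); interpolating these nodal disparities within each element, as the mesh does, is left out.

```python
def central_nodes(nodes_l, nodes_r, D_cl, D_cr):
    """Interpolate central-view nodes and return (nodes_c, d_cl, d_cr) per node."""
    w_l = D_cr / (D_cl + D_cr)
    w_r = D_cl / (D_cl + D_cr)
    nodes_c, d_cl, d_cr = [], [], []
    for (xl, yl), (xr, yr) in zip(nodes_l, nodes_r):
        xc, yc = w_l * xl + w_r * xr, w_l * yl + w_r * yr
        nodes_c.append((xc, yc))
        d_cl.append((xl - xc, yl - yc))   # displacement from the central node to the left node
        d_cr.append((xr - xc, yr - yc))   # displacement from the central node to the right node
    return nodes_c, d_cl, d_cr
```

A central view closer to the left camera (small $D_{cl}$) places its nodes closer to the left-view positions.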

Stereo Sequence Coding (1)

Multiview profile of MPEG-2

The left-view sequence $S_l$ is coded first. For the right-view sequence, each frame is predicted from the corresponding frame in $S_l$ based on an estimated disparity field, and the disparity field and the prediction error image are coded.

[Figure: picture types; right view: P B B B; left view: I B B P]


Incomplete 3-D representation of multiview sequences: an augmented texture map, a region segmentation, and disparity information for each region.

The augmented texture map is formed by putting the texture maps of all the different regions into one augmented image.

[Figure panels: original left; original right; augmented texture; disparity map]


Mixed-resolution coding: based on properties of the human visual system (HVS), the resolution of one of the two images can be considerably reduced when the image is presented for a short time.

One of the left and right sequences is coded at a high resolution, while the other is first down-sampled spatially and temporally and then coded.

[Figure: left sequence at high resolution, right sequence at low resolution]
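A sketch of the spatial and temporal down-sampling applied to the low-resolution view before coding. The factor of 2 in each dimension and the box-filter averaging are hypothetical choices for illustration; practical systems would use a proper anti-aliasing filter. `downsample_sequence` is a hypothetical name.

```python
def downsample_sequence(frames, spatial=2, temporal=2):
    """Reduce the resolution of one view before coding.

    frames: list of 2-D lists. Keeps every `temporal`-th frame and replaces each
    non-overlapping spatial x spatial block by its mean (a simple box filter).
    """
    out = []
    for frame in frames[::temporal]:
        h = len(frame) // spatial * spatial
        w = len(frame[0]) // spatial * spatial
        out.append([[sum(frame[y + i][x + j] for i in range(spatial) for j in range(spatial))
                     / spatial ** 2
                     for x in range(0, w, spatial)]
                    for y in range(0, h, spatial)])
    return out
```

With both factors set to 2, the reduced view carries one eighth of the samples of the full-resolution view.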


3-D object-based coding

[Figure: block diagram of a 3-D object-based coder. Input: left and right sequences. Stages: object segmentation; motion and structure estimation; shape and motion parameter coding; reference texture image extraction; reference texture image coding; coded view synthesis; synthesis error image coding. Outputs: shape and motion bits, texture bits, synthesis error bits.]

Instead of deriving 2-D motion and disparity for performing MCP and DCP (motion- and disparity-compensated prediction), 3-D structure and motion parameters are estimated from the stereo or multiple views.

The structure, motion, and surface texture of each object are coded, instead of individual image frames. At the decoder, the desired views are synthesized.

Advantages: accurate 3-D estimation; with the 3-D information derived from the stereo pair, any intermediate view can be generated; the coded 3-D information enables manipulation of the imaged object and scene.

[Figure panels: wire-framed object; nodal positions; nodal displacement vectors; segmentation map (I3D)]


3-D model-based coding

It is very difficult to derive the 3-D structure of the objects in a scene automatically. An alternative is to build a generic model for each potential object, which is practical when only a few objects are in the scene, e.g., in teleconferencing applications, where pre-designed generic face and body models can be used.
