Source: soe.rutgers.edu/~meer/teachtoo/estimation-2.pdf


ESTIMATION PRINCIPLES

in computer vision

Thank you for the slides. They come mostly from the following sources.

Marc Pollefeys, U. of North Carolina

Ramani Duraiswami, U. of Maryland


Derivative of a matrix

Note: J(x) = ∂f(x)/∂x; we use J(x)^T, which will be (d × n).
Note: if it is a square matrix, this also gives a scalar, the Jacobian determinant.

Jacobian and Hessian

Note: J here is the GRADIENT of f and is transposed (1×d). The variable x is d-dimensional, so it is (1×d)(d×1) = 1. H = H^T since it is a symmetric matrix.

Least Squares, SVD, Pseudoinverse

• Ax = b, where A is m×n, x is n×1 and b is m×1.
• A = U S V^T, where U is m×m, S is m×n and V is n×n.
• U S V^T x = b, so S V^T x = U^T b.
• If A has rank r, then r singular values are significant.

  V^T x = diag(σ_1^-1, …, σ_r^-1, 0, …, 0) U^T b
  x = V diag(σ_1^-1, …, σ_r^-1, 0, …, 0) U^T b

  x = Σ_{i=1}^{r} (u_i^T b / σ_i) v_i,    σ_r > ε, σ_{r+1} ≤ ε

• Pseudoinverse: A^+ = V diag(σ_1^-1, …, σ_r^-1, 0, …, 0) U^T
  – A^+ is an n×m matrix.
  – If rank(A) = n then A^+ = (A^T A)^-1 A^T.
  – If A is square (and nonsingular), A^+ = A^-1.

Note: A has rank r unperturbed, like here!!

Well Posed Problems
• Hadamard postulated that for a problem to be "well posed":
  1. A solution must exist.
  2. It must be unique.
  3. Small changes to the input data should cause small changes to the solution.
• Many problems in science and computer vision result in "ill-posed" problems.
  – Numerically it is common to have condition 3 violated.

• Recall from the SVD:

  x = Σ_{i=1}^{r} (u_i^T b / σ_i) v_i,    σ_r > ε, σ_{r+1} ≤ ε

• If the σ_i are close to zero, small changes in the "data" vector b cause big changes in x.
• Converting an ill-posed problem to a well-posed one is called regularization.

Note: Jacques Salomon Hadamard (December 8, 1865 – October 17, 1963).

Regularization
• The pseudoinverse provides one means of regularization.
• Another is to solve (A^T A + ε I) x = A^T b, which gives

  x = Σ_{i=1}^{n} [σ_i / (ε + σ_i^2)] (u_i^T b) v_i

• The solution of the regular (unregularized) problem requires minimizing ||Ax - b||^2.
• This corresponds to minimizing

  ||Ax - b||^2 + ε ||x||^2

  – Philosophy: pay a "penalty" of O(ε) to ensure the solution does not blow up.
  – In practice we may know that the data has an uncertainty of a certain magnitude … so it makes sense to optimize with this constraint.
• Ill-posed problems are also called "ill-conditioned".

Note: in computer vision.

Derivative

• In 1-D:

  df/dx = lim_{h→0} [f(x+h) - f(x)] / h

• Taylor series: for a continuous function

  f(x+h) = f(x) + h df/dx + (h^2/2!) d^2f/dx^2 + … + (h^n/n!) d^nf/dx^n + …
  f(x-h) = f(x) - h df/dx + (h^2/2!) d^2f/dx^2 - … + (-1)^n (h^n/n!) d^nf/dx^n + …

  (all derivatives evaluated at x)

[Figure: a smooth curve f(x) plotted against x]

• Geometric interpretation: approximate a smooth curve by the values of its tangent, curvature, etc.

Note: the gradient vector points toward the direction of maximum change.
Note: second derivatives...
Note: an example of local interpolation: DERIVATIVES, POLYNOMIAL INTERPOLATION, SPLINES.

Remarks
• Mean value theorem:
  – f(b) - f(a) = (b - a) df/dx|_c, for some a < c < b
  – There is at least one point between a and b on the curve where the slope matches that of the straight line joining the two points.

[Figure: curve f(x) vs. x illustrating the mean value theorem]

• df/dx = 0 represents a minimum, maximum or saddle point of the curve y = f(x)
  – d^2f/dx^2 > 0: minimum; d^2f/dx^2 < 0: maximum
  – d^2f/dx^2 = 0: saddle point

Finite differences

• Approximate derivatives at points by using values of a function known at certain neighboring points

• Truncate Taylor series and obtain an expression for the derivatives

• Forward differences: use the value at the point and at points forward of it.
• Backward differences: use the value at the point and at points behind it.

  Forward:   df/dx|_x = [f(x+h) - f(x)]/h - (h/2) d^2f/dx^2|_x + O(h^2)
  Backward:  df/dx|_x = [f(x) - f(x-h)]/h + (h/2) d^2f/dx^2|_x + O(h^2)

Finite Differences
• Central differences
  – Higher order approximation.
  – Averaging the forward and backward difference formulas, the (h/2) d^2f/dx^2 terms cancel:

    df/dx|_x = [f(x+h) - f(x-h)]/(2h) + O(h^2)

  – However we need data on both sides.
  – Not possible for data on the edge of an image.
  – Not possible in time-dependent problems (we have data at the current time and the previous one).

Note: add the two together...

Approximation
• Order of the approximation: O(h), O(h^2).
• Sidedness: one-sided, central, etc.
• The points around the point where the derivative is calculated that are involved are called the "stencil" of the approximation.

• Second derivative (central):

    d^2f/dx^2|_x = [f(x+h) - 2 f(x) + f(x-h)] / h^2 + O(h^2)

• One-sided difference of O(h^2):

    df/dx|_x = [-3 f(x) + 4 f(x+h) - f(x+2h)] / (2h) + O(h^2)

Note: h^2 (correction to the formula).
Note: the one-sided O(h^2) formula is the forward df (points x, x+h) minus (h/2) times the forward d^2f (points x, x+h, x+2h).
Note: df(forward) - df(backward) = 0.

Polynomial interpolation
• Instead of playing with Taylor series we can obtain fits using polynomial expansions.
  – 3 points fit a quadratic ax^2 + bx + c.
    • Can calculate the 1st and 2nd derivatives.
  – 4 points fit a cubic, etc.
• Given x_1, x_2, x_3, x_4 and values f_1, f_2, f_3, f_4:

    [ 1  x_1  x_1^2  x_1^3 ] [ a_0 ]   [ f_1 ]
    [ 1  x_2  x_2^2  x_2^3 ] [ a_1 ] = [ f_2 ]
    [ 1  x_3  x_3^2  x_3^3 ] [ a_2 ]   [ f_3 ]
    [ 1  x_4  x_4^2  x_4^3 ] [ a_3 ]   [ f_4 ]

• Vandermonde system: fast algorithms for solution.
• If there is more data than the degree, we can get a least squares solution.
• Matlab functions: polyfit, polyval.

Remarks
• Can use the fitted polynomial to calculate derivatives.
• If the equation is solved analytically this provides expressions for the derivatives.
• The equation can become quite ill-conditioned, especially if the equations are not normalized.
  ax^2 + bx + c can also be written as a*(x - x_0)^2 + b*(x - x_0) + c*.
  – Find the polynomial through x_0 - h, x_0, x_0 + h:

    [ 1  -h  h^2 ] [ a_0 ]   [ f_-1 ]
    [ 1   0   0  ] [ a_1 ] = [ f_0  ]
    [ 1   h  h^2 ] [ a_2 ]   [ f_1  ]

  – a_0 = f_0,  a_1 = (f_1 - f_-1)/(2h),  a_2 = (f_-1 - 2 f_0 + f_1)/(2h^2)
  – Gives the expected values of the derivatives.

Note: a_1: first derivative; a_2: second derivative; a_0: "mean".

Polynomial interpolation
• Results from algebra:
  – The polynomial of degree n through n+1 points is unique.
  – The polynomials of degree less than n form an n-dimensional space; 1, x, x^2, …, x^{n-1} form a basis.
    • Any other polynomial can be represented as a combination of these basis elements.
  – Other sets of independent polynomials can also form bases.
• To fit a polynomial through x_0, …, x_n with values f_0, …, f_n:
  – Use the Lagrangian basis l_k:

    l_k(x) = Π_{i=0, i≠k}^{n} (x - x_i)/(x_k - x_i),    k = 0, …, n

  – p(x) = a_0 l_0 + a_1 l_1 + … + a_n l_n.
  – Then a_i = f_i.
  – Many polynomial bases: Chebyshev, Legendre, Laguerre, …
  – Bernstein, Bookstein, …

Note: the product runs from i = 0 to i = n, skipping i = k.
Note: the Lagrangian polynomial is a better solution than going through the Vandermonde matrix; better interpolation functions.
Note: p(x_i) = f_i by definition, i.e. p(x_i) = Σ_k f_k l_k(x_i) = f_i, because l_k = 1 at x_k and 0 at the other nodes; p(x) has degree n as a sum of degree-n factors.

Increasing n
• As n increases we can increase the polynomial degree.
• However the function in between is very poorly interpolated.
• Becomes ill-posed.
• For large n the interpolant blows up.
• Idea:
  – Taylor series provides good local approximations.
  – Use local approximations.
• Splines

Spline interpolation
• Piecewise polynomial approximation
  – E.g. interpolation in a table.
  – Given x_k, x_{k+1}, f_k and f_{k+1}, evaluate f at a point x such that x_k < x < x_{k+1}:

    f(x) = f_k (x_{k+1} - x)/(x_{k+1} - x_k) + f_{k+1} (x - x_k)/(x_{k+1} - x_k),   x_k ≤ x ≤ x_{k+1}
         = 0,  otherwise

• Construct approximations of this type on each subinterval. This method uses Lagrangian interpolants.
• The endpoints are called breakpoints.
• For a higher polynomial degree we need more conditions,
  • e.g. specify values at points inside the interval [x_k, x_{k+1}].
• Specifying function and derivative values at the end points x_k, x_{k+1} leads to cubic Hermite interpolation.


Cubic Spline
• Splines: name given to a flexible piece of wood used by draftsmen to draw curves through points.
  – Bend the wood piece so that it passes through the known points and draw a line through it.
  – The most commonly used interpolant is the cubic spline.
  – Provides continuity of the function, 1st and 2nd derivatives at the breakpoints.
  – Given n+1 points we have n intervals.
  – Each polynomial has four unknown coefficients.
    • Specifying function values provides 2 equations.
    • Two derivative continuity equations provide two more.

Given the points {x_i, f_i}, i = 1, …, n+1, the piecewise cubics satisfy:

  P(x_i) = f_i,                        i = 1, …, n+1
  P'_{i-1}(x_i) = P'_i(x_i),           i = 2, …, n
  P''_{i-1}(x_i) = P''_i(x_i),         i = 2, …, n

• Left with two free conditions. Usually chosen so that the second derivatives are zero at the ends.

Note: P(x) = A f_j + B f_{j+1} + C f''_j + D f''_{j+1}; A and B are linear in x (Lagrange interpolation), C and D are cubic in x through the second derivative and through A, B.
Note: P(x) is the spline on x_i ≤ x ≤ x_{i+1}.
Note: relating f'' and x at i-1, i, i+1 to f and x at i-1, i, i+1 gives n-1 linear equations; the f'' are the unknowns.
Note: at i = 1 and i = n+1.
Note: taking this into account we obtain:
Note: A = [x_{j+1} - x] / [x_{j+1} - x_j],  B = 1 - A,  C = (1/6)(A^3 - A)(x_{j+1} - x_j)^2,  D = (1/6)(B^3 - B)(x_{j+1} - x_j)^2.

Interpolating along a curve

• The curve can be given as x(s) and y(s).
• Given x_i, y_i, s_i.
• Can fit splines for x and y.
• Can compute tangents, curvature and the normal based on this fit.
• Things like intensity can vary along the curve; can also fit I(s).

[Figure: points sampled along a curve C]

Typical Optimization Problems
• Model fitting
  – Fit a straight line or polynomial through data:
    y_i = Σ_j a_j x_i^j
  – Fit a sum of cosines, exponentials, etc.:
    y_i = Σ_j a_j φ_j(x_i)
  Model: the φ_j; parameters: the a_j; data: (x_i, y_i).

• Determine a transformation
  – Determine a homography matrix:  x' = H x
  – Determine the fundamental matrix:  x'^T F x = 0

[Figure: points x in one image mapped by H to points x' in the other]

Algebraic Distance
• Algebraic system Ax = b.
• Approximate solution x^.
• Residual ||A x^ - b||.
• The residual is also called the algebraic distance.
• Algorithms that seek to reduce the residual are called "minimum residual" algorithms.

[Figure: data points in the x, y plane]

Note: error only in y! Not used by now...
Note: x = [x y]^T !!!
Note: this is even less than the algebraic distance.
Note: x' can be any point given at the beginning!!

Scaling
• Try to avoid any one equation being overly represented.
• Scale each equation:
  – Scale by the largest coefficient so that it becomes 1: a_i1 / a_11.
  – Scale so that the sum of the squared coefficients is 1: a_11^2 + a_12^2 + … + a_1n^2 = 1.
• Scaling also has the benefit of avoiding round-off.

Note: for example, in fundamental matrix estimation scaling could be beneficial.

Weighted Least Squares
• Multiplying an equation by a number will increase its weight or influence in the cost function.
• Not always a bad thing:
  – May want to weight different equations differently.
• How to select weights?
  – Number of observations.
  – Reliability of measurement.
    • Measured variances.
• How good is the least squares solution? How "probable" are the parameter estimates?
• Bring in notions of statistics.

Note: Maximum Likelihood parameter estimation (with a Gaussian distribution) is just one of many ways...
Note: but the weights also introduce additional dependence!
Note: in computer vision one can be successful only with robust processing (removing the outliers).

Cost functions for image based data
• Notation
  – Measured value of a point: x~
  – True value of a point: x
  – Estimated value of a point: x^
  – The transformation or model is denoted H.
    • Model: y = H(x) and x = H^-1(y).
• Symmetric error functions
  – Case 1: error only in one image.
    • Could arise if we are imaging a calibration pattern with known coordinates and trying to determine the camera calibration.
  – The appropriate error function is:

    find H^ that minimizes Σ_j d(x~'_j, H^ x_j)^2

Note: error in all coordinates; here only two coordinates.
Note: of the two images; not used in general.
Note: in fact this is a heteroscedastic process, since each point in 2D has a different variance.
Note: full rank!

Cost Functions

• Ideally there is a real cost being minimized.
  – E.g. dollars or distance travelled.
  – Then each equation makes sense.
• Statistical measures.
• Review concepts of metrics.

Note: Hartley & Zisserman, first edition numbering.

Constrained optimization
• We have to optimize f(x) subject to g(x) = 0.
  – Makes sense if g(x) = 0 leaves a few degrees of freedom (N - M).
• Approach 1 (eliminate constraints)
  – Eliminate variables using the constraint equations and solve a reduced problem f(x*) = 0.
  – Not practical, except for simple problems.
• Approach 2 (penalty function)
  – Construct a new minimization function f(x) + P g(x) where P >> 1.
  – If the constraint is violated the minimization function increases rapidly, forcing the optimization routine to solutions where it is not violated.
• Approach 3 (Lagrange multipliers)
  – The solution has to lie on the surface g(x) = 0.
  – Can't have ∇f = 0 anymore.
  – However we require ∇f parallel to ∇g.

Lagrangian
• Consider the Lagrangian function

  L(x, λ) = f + λ g
  ∇_x L(x, λ) = ∇f + λ ∇g,     ∂L(x, λ)/∂λ = g

• Extremize the Lagrangian:

  ∇_x L(x, λ) = ∇f + λ ∇g = 0,     ∂L(x, λ)/∂λ = g(x) = 0

• So this gives us both the constraint equation and the way to optimize the function on the surface.

Optimization Techniques
• Different problem here:
  – Given a set of locations x_i where one has measured a fitness function χ²/f(x), find the vector of parameters x that minimizes it.
• For the case where the function was linear we already have methods such as SVD to solve the linear system.
• Here we are concerned with systems where the equation is not so simple.
  – In particular f may be a nonlinear function of the parameters x.
• Differential calculus provides us with ways of estimating extrema:
  – The minimum (max) of f occurs at ∇f = 0, or
  – ∇f is in the direction of increasing f, or
  – Given an interval where ∇f has opposite signs at the boundary, there must be a point inside where ∇f is zero.
• However calculus is local.
  – So these methods can only guarantee a local extremum.

Note: almost all the models in computer vision describing 3D relations, 3D-to-2D relations or 2D-to-2D relations are nonlinear, mainly because of the projection transformation. Examples: epipolar geometry, camera calibration, etc.
Note: if the matrix of second derivatives H is positive definite, x is a local minimum; if H is negative definite, x is a local maximum.

Bisection methods
• Given a function f at three points a, b, c with a < b < c, and a way to evaluate f at a new point:
  – Given 2 initial guesses f(a) and f(b): if f(a) > f(b), move in the direction from a to b and choose a new parameter c.
  – Find a triplet [a, b, c] so that f(c) > f(b) and f(a) > f(b).
  – Choose a new point between a and b, or between b and c.
  – Repeat until the points a, b and c are sufficiently close.

Note: no derivatives!

Parabolic bracketing

Note: in multiple dimensions this cannot work, especially when more than one extremum is close.

Bracketing a minimum in multiple dimensions

• The smallest region bounded by a group of points in
  – 1D is bounded by two points (a line segment)
  – 2D is bounded by three points (a triangle)
  – 3D by four points (a tetrahedron)
  – in ND by N+1 points (a simplex)
• Can find a direction of a decreasing function in
  – 1D by the line from the point with the higher value to the one with the lower value
  – 2D by joining the point with the highest value through the point with the average value on the opposite side of the triangle
  – and so on for ND
• However we cannot guarantee a bracket of a minimum in ND.

Note: no derivatives!
Note: affine space.

Downhill Simplex Method (Nelder-Mead)
• Reflection: project along the direction of decrease with a step of size 1.
• Reflection and expansion: if the decrease is large, try a step of size 2.
• Contraction: if the result of reflection is bad, try a simple reduction within the simplex.
• Multiple contraction: if the result of contraction does not give a better result than the lowest point.
• Conclude: when the volume of the simplex falls below tolerance.

Note: J. A. Nelder and R. Mead. A simplex method for function minimization. Computer Journal, 7:308-313, 1965.
Note: mean minus the highest sample!
Note: tetrahedron.
Note: relative to the lowest point.

Newton iteration

Taylor approximation:

  f(P_0 + Δ) ≈ f(P_0) + J Δ,        J = ∂X/∂P   (the Jacobian matrix)

  X - f(P_1) = X - f(P_0 + Δ) ≈ X - f(P_0) - J Δ = e_0 - J Δ

  ⇒ J^T J Δ = J^T e_0  ⇒  Δ = (J^T J)^-1 J^T e_0        (normal equations)

  P_{i+1} = P_i + Δ_i

  With a measurement covariance Σ:  Δ = (J^T Σ^-1 J)^-1 J^T Σ^-1 e_0

Note: X is the measurement vector and P is the parameter vector, to be found; X = f(P) + e_0, and the vector e_0 has to be minimized.
Note: J is defined here as at the beginning, not transposed as we use it in computation; you just have to be consistent with the notation.
Note: P_1 = P_0 + Δ.
Note: J^+ (the pseudoinverse) is more correct.
Note: iterate till convergence; this is a local minimum, and it depends on P_0 too.
Note: minimizing this expression.
Note: if the function f is not too nonlinear, J^T J is a good approximation of the Hessian matrix and the method is the Gauss-Newton technique.
Note: if X is weighted with a Gaussian covariance Σ_X:  J^T Σ_X^-1 J Δ_i = J^T Σ_X^-1 e_i.

Gradient Descent
• We have a function f and an estimate of its gradient ∇f.
• Decrease f by stepping along the direction of -∇f:

  Begin: initialize x, tol, k = 0
    do k ← k + 1
       x ← x - h_k ∇f
    until ||h_k ∇f|| < tol
    return x
  end

• Determining h is not easy.
  – Called the "learning rate" in AI.
  – Hard to determine h.
    • If h is too small the algorithm will be too slow to converge. If it is too large the procedure will diverge.
• Can select it using a line search or using a Newton method.

Note: slow convergence due to zig-zagging.
Note: more complicated steepest descent techniques also exist. We will not describe them.

Hartley & Zisserman, Appendix 6: Iterative Estimation Methods

Levenberg-Marquardt method is essentially a Gauss-Newton method that transitions smoothly to gradient descent when the Gauss-Newton updates fail.

To summarize, we have so far considered three methods of minimization of a cost function g(P) = ||e(P)||2/2:

(i) Newton. Update equation:

  g_PP Δ = -g_P

where g_PP = e_P^T e_P + e_PP^T e and g_P = e_P^T e. Newton iteration is based on the assumption of an approximately quadratic cost function near the minimum, and will show rapid convergence if this condition is met. The disadvantage of this approach is that the computation of the Hessian may be difficult. In addition, far from the minimum the assumption of quadratic behaviour is probably invalid, so a lot of extra work is done with little benefit.

(ii) Gauss-Newton. Update equation:

  e_P^T e_P Δ = -e_P^T e

This is equivalent to Newton iteration in which the Hessian is approximated by e_P^T e_P. Generally this is a good approximation, particularly close to a minimum, or when e is nearly linear in P.

(iii) Gradient descent. Update equation:

  λ Δ = -e_P^T e = -g_P

The Hessian in Newton iteration is replaced by a multiple of the identity matrix. Each update is in the direction of most rapid local decrease of the function value. The value of λ may be chosen adaptively, or by a line search in the downward gradient direction. Generally, gradient descent by itself is not recommended, but in conjunction with Gauss-Newton it yields the commonly used Levenberg-Marquardt method.

A6.2 Levenberg-Marquardt iteration

The Levenberg-Marquardt (abbreviated LM) iteration method is a slight variation on the Gauss-Newton iteration method. The normal equations J^T J Δ = -J^T e are replaced by the augmented normal equations (J^T J + λI) Δ = -J^T e, for some value of λ that varies from iteration to iteration. Here I is the identity matrix. A typical initial value of λ is 10^-3 times the average of the diagonal elements of N = J^T J.

If the value of Δ obtained by solving the augmented normal equations leads to a reduction in the error, then the increment is accepted and λ is divided by a factor (typically 10) before the next iteration. On the other hand, if the value of Δ leads to an increased error, then λ is multiplied by the same factor and the augmented normal equations are solved again, this process continuing until a value of Δ is found that gives rise to a decreased error. This process of repeatedly solving the augmented normal equations for different values of λ until an acceptable Δ is found constitutes one iteration of the LM algorithm. An implementation of the LM algorithm is given in [Press-88].

Note: g_P ~ Jacobian (gradient), g_PP ~ Hessian.
Note: minimization in Δ.
Note: e(P) = f(P) - X; reversed (in sign) compared to before!

Levenberg-Marquardt

Normal equations:            J^T J Δ = N Δ = J^T e_0
Augmented normal equations:  N' Δ = J^T e_0,   with  N' = J^T J + λ diag(J^T J)

Initial value: λ_0 = 10^-3
Success (error decreases): accept, λ_{i+1} = λ_i / 10, next iteration.
Failure (error increases): λ_{i+1} = 10 λ_i, solve again.

λ small ~ Gauss-Newton (quadratic convergence)
λ large ~ gradient descent (guaranteed decrease)

Note: in the book the right-hand side is -J^T e_0 !!!
Note: the Gauss-Newton iteration is changed only a little.

Levenberg-Marquardt

Requirements for minimization
• Function to compute f.
• Start value P_0.
• Optionally, a function to compute J (but a numerical Jacobian is OK, too).

Note: in the Numerical Recipes book too, N = J^T J and the entries are N'_ii = (1 + λ) N_ii, N'_ij = N_ij; for small λ this is Gauss-Newton, for large λ the off-diagonal entries become insignificant.
Note: if λ is large (>> 1) it will still be a downhill direction, but not the same as pure gradient descent, since each parameter p_i is minimized separately.
Note: if one entry J_i = ∂f/∂p_i ≈ 0, the corresponding entry in N and N' is also zero and N will be singular. Rare occurrence!
Note: the maximum between |10^-4 x_i| and 10^-6.

Sparse Levenberg-Marquardt

• N^3 complexity for solving Δ = N'^-1 J^T e_0.
• Prohibitive for large problems (100 views, 10,000 points ~ 30,000 unknowns).
• Partition the parameters:
  • partition A: the set of parameters describing the cameras (a);
  • partition B: the set of parameters describing the points (b), only dependent on A and itself.

Note: Δ = N'^-1 J^T e_0 only if N' is nonsingular, otherwise use N'^+.
Note: 3D recovery of points: 11(12)·100 + 3·10,000 unknowns.
Note: two views for a 2D homography: 8(9) parameters plus 2n (x, y) coordinates in the first image, i.e. 2n + 8; the errors in the second image come from the first!
Note: the parameter vector is P = (a^T, b^T)^T; X is the measurement vector and Σ_X the covariance matrix of X.
Note: the measurement vector and Σ_X are in real coordinates or variances, e.g. x, y.

ε = X - X^; we seek to minimize ||ε||^2_{Σ_X}.

A = [∂X^/∂a], B = [∂X^/∂b], and the normal equations are

  J^T Σ_X^-1 J Δ = J^T Σ_X^-1 ε,   with J = [A | B]

U, V and W (written with a * in the book because the diagonal is multiplied by 1 + λ) are the matrices (A^T Σ_X^-1 A), …

Note: written out in blocks,

  [ A^T Σ_X^-1 A   A^T Σ_X^-1 B ] [ Δ_a ]   [ A^T Σ_X^-1 ε ]
  [ B^T Σ_X^-1 A   B^T Σ_X^-1 B ] [ Δ_b ] = [ B^T Σ_X^-1 ε ]

Note: so U = A^T Σ_X^-1 A and V = B^T Σ_X^-1 B become [...]*; we don't write [...]* but leave them as [...].

Sparse bundle adjustment

residuals: [shown as equations on the slide]
normal equations: [shown on the slide]
with [the block definitions shown on the slide]

note: tie points should be in partition A

Note: measured camera points.
Note: camera part (A, a): the derivative with respect to one camera; point part (B, b): the derivative with respect to a point.
Note: these are without the covariance Σ_X^-1 !!!
Note: k is the number of cameras, i the number of points.
Note: U and V here are without the "*".

Sparse bundle adjustment

normal equations: [block system shown on the slide]

modified normal equations: [shown on the slide]

solve in two parts: the cameras part, then the points part

Note: multiply with this matrix.
Note: in the book, Algorithm A6.1 gives the partitioned Levenberg-Marquardt method.

Sparse bundle adjustment

• Covariance estimation

  Y = W V^-1
  Σ_a = (U - W V^-1 W^T)^+
  Σ_b = Y^T Σ_a Y + V^-1
  Σ_ab = -Σ_a Y

Note: Σ_P = (J^T Σ_X^-1 J)^+; the pseudo-inverse will be covered later in the course.
Note: ^T (a transpose correction).
Note: e.g. no variation is allowed in the directions perpendicular to the constraint surface.
Note: assuming V is invertible (the points part).
Note: uses the formula: if a matrix G is invertible, (G H G^T)^+ = G^-T H^+ G^-1, valid if null(H) G^T = null(H) G^-1; start with J^T Σ_X^-1 J and transform it into G H G^T. See the book.

Sparse bundle adjustment

  min Σ_{k=1}^{m} Σ_{i=1}^{n} D(m_ki, P^_k M^_i)^2        (needed for the non-linear minimization)

The Jacobian of this cost has a sparse block structure, and so does N = J^T J:

[Figure: block structure of N for 3 cameras and 4 points; camera blocks U1, U2, U3 and the point block V on the diagonal, W and W^T off the diagonal; parameters P1, P2, P3 and the points M]

  11(12)×m camera parameters and 3×n point parameters (in general much larger); rows grouped by the image points of view 1, view 2, …

Note: cameras / points.
Note: the covariance matrix Σ_X is block-diagonal, diag[Σ_1 … Σ_n].
Note: in the book, Algorithms A6.3 and A6.4 give the partitioned Levenberg-Marquardt method which works in computer vision.
Note: X_ij is the i-th point in the j-th camera; in 2D, X^_ij depends only on the j-th camera, a_j; in 3D, X^_ij depends only on the i-th point, b_i.
Note: part of the normal equations for 3 cameras and 4 points.

Sparse bundle adjustment
• Eliminate the dependence of the camera/motion parameters on the structure parameters. Note that in general 3n >> 11m.

    [ I   -W V^-1 ]       [ U - W V^-1 W^T   0 ]
    [ 0      I    ] × N = [       W^T        V ]

    (block sizes: 11×m for the cameras, 3×n for the points)

• Allows much more efficient computations: e.g. 100 views, 10,000 points: solve ±1000×1000, not ±30,000×30,000.
• Often still band diagonal: use sparse linear algebra algorithms.

Note: ±1000 ≈ 11 × (number of cameras).
Note: separating the two parts.
Note: see HZ (2nd edition), Sec. A6.7.
Note: the normal equations matrix is symmetric.