Source: soe.rutgers.edu/~meer/teachtoo/estimation-2.pdf


ESTIMATION PRINCIPLES

in computer vision

Thank you for the slides. They come mostly from the following sources.

Marc Pollefeys, U. of North Carolina

Ramani Duraiswami, U. of Maryland


Derivative of a matrix

Note: J(x) = ∂f(x)/∂x; we use J(x)^T, which will be (d × n).
Note: if it is a square matrix, this also gives a scalar, the Jacobian determinant.

Jacobian and Hessian

Note: J here is the GRADIENT of f and is transposed (1×d). The variable x is d-dimensional, so it is (1×d)(d×1) = 1. H = H^T since it is a symmetric matrix.

Least Squares, SVD, Pseudoinverse

• Ax = b, where A is m×n, x is n×1 and b is m×1.
• A = U S V^T, where U is m×m, S is m×n and V is n×n.
• U S V^T x = b, so S V^T x = U^T b.
• If A has rank r, then r singular values are significant.

  V^T x = diag(σ_1^-1, …, σ_r^-1, 0, …, 0) U^T b
  x = V diag(σ_1^-1, …, σ_r^-1, 0, …, 0) U^T b

  x = Σ_{i=1}^{r} (u_i^T b / σ_i) v_i,    σ_r > ε, σ_{r+1} ≤ ε

• Pseudoinverse: A^+ = V diag(σ_1^-1, …, σ_r^-1, 0, …, 0) U^T
  – A^+ is an n×m matrix.
  – If rank(A) = n then A^+ = (A^T A)^-1 A^T.
  – If A is square (and nonsingular), A^+ = A^-1.

Note: A has rank r unperturbed, like here!!

Well Posed Problems
• Hadamard postulated that for a problem to be "well posed":
  1. A solution must exist.
  2. It must be unique.
  3. Small changes to the input data should cause small changes to the solution.
• Many problems in science and computer vision result in "ill-posed" problems.
  – Numerically it is common to have condition 3 violated.

• Recall from the SVD:

  x = Σ_{i=1}^{r} (u_i^T b / σ_i) v_i,    σ_r > ε, σ_{r+1} ≤ ε

• If the σ_i are close to zero, small changes in the "data" vector b cause big changes in x.
• Converting an ill-posed problem to a well-posed one is called regularization.

Note: Jacques Salomon Hadamard (December 8, 1865 – October 17, 1963).

Regularization
• The pseudoinverse provides one means of regularization.
• Another is to solve (A^T A + ε I) x = A^T b, which gives

  x = Σ_{i=1}^{n} [σ_i / (ε + σ_i^2)] (u_i^T b) v_i

• The solution of the regular (unregularized) problem requires minimizing ||Ax - b||^2.
• This corresponds to minimizing

  ||Ax - b||^2 + ε ||x||^2

  – Philosophy: pay a "penalty" of O(ε) to ensure the solution does not blow up.
  – In practice we may know that the data has an uncertainty of a certain magnitude … so it makes sense to optimize with this constraint.
• Ill-posed problems are also called "ill-conditioned".

Note: in computer vision.

Derivative

• In 1-D:

  df/dx = lim_{h→0} [f(x+h) - f(x)] / h

• Taylor series: for a continuous function

  f(x+h) = f(x) + h df/dx + (h^2/2!) d^2f/dx^2 + … + (h^n/n!) d^nf/dx^n + …
  f(x-h) = f(x) - h df/dx + (h^2/2!) d^2f/dx^2 - … + (-1)^n (h^n/n!) d^nf/dx^n + …

  (all derivatives evaluated at x)

[Figure: a smooth curve f(x) plotted against x]

• Geometric interpretation: approximate a smooth curve by the values of its tangent, curvature, etc.

Note: the gradient vector points toward the direction of maximum change.
Note: second derivatives...
Note: an example of local interpolation: DERIVATIVES, POLYNOMIAL INTERPOLATION, SPLINES.

Remarks
• Mean value theorem:
  – f(b) - f(a) = (b - a) df/dx|_c, for some a < c < b
  – There is at least one point between a and b on the curve where the slope matches that of the straight line joining the two points.

[Figure: curve f(x) vs. x illustrating the mean value theorem]

• df/dx = 0 represents a minimum, maximum or saddle point of the curve y = f(x)
  – d^2f/dx^2 > 0: minimum; d^2f/dx^2 < 0: maximum
  – d^2f/dx^2 = 0: saddle point

Finite differences

• Approximate derivatives at points by using values of a function known at certain neighboring points

• Truncate Taylor series and obtain an expression for the derivatives

• Forward differences: use the value at the point and at points forward of it.
• Backward differences: use the value at the point and at points behind it.

  Forward:   df/dx|_x = [f(x+h) - f(x)]/h - (h/2) d^2f/dx^2|_x + O(h^2)
  Backward:  df/dx|_x = [f(x) - f(x-h)]/h + (h/2) d^2f/dx^2|_x + O(h^2)

Finite Differences
• Central differences
  – Higher order approximation.
  – Averaging the forward and backward difference formulas, the (h/2) d^2f/dx^2 terms cancel:

    df/dx|_x = [f(x+h) - f(x-h)]/(2h) + O(h^2)

  – However we need data on both sides.
  – Not possible for data on the edge of an image.
  – Not possible in time-dependent problems (we have data at the current time and the previous one).

Note: add the two together...

Approximation
• Order of the approximation: O(h), O(h^2).
• Sidedness: one-sided, central, etc.
• The points around the point where the derivative is calculated that are involved are called the "stencil" of the approximation.

• Second derivative (central):

    d^2f/dx^2|_x = [f(x+h) - 2 f(x) + f(x-h)] / h^2 + O(h^2)

• One-sided difference of O(h^2):

    df/dx|_x = [-3 f(x) + 4 f(x+h) - f(x+2h)] / (2h) + O(h^2)

Note: h^2 (correction to the formula).
Note: the one-sided O(h^2) formula is the forward df (points x, x+h) minus (h/2) times the forward d^2f (points x, x+h, x+2h).
Note: df(forward) - df(backward) = 0.

Polynomial interpolation
• Instead of playing with Taylor series we can obtain fits using polynomial expansions.
  – 3 points fit a quadratic ax^2 + bx + c.
    • Can calculate the 1st and 2nd derivatives.
  – 4 points fit a cubic, etc.
• Given x_1, x_2, x_3, x_4 and values f_1, f_2, f_3, f_4:

    [ 1  x_1  x_1^2  x_1^3 ] [ a_0 ]   [ f_1 ]
    [ 1  x_2  x_2^2  x_2^3 ] [ a_1 ] = [ f_2 ]
    [ 1  x_3  x_3^2  x_3^3 ] [ a_2 ]   [ f_3 ]
    [ 1  x_4  x_4^2  x_4^3 ] [ a_3 ]   [ f_4 ]

• Vandermonde system: fast algorithms for solution.
• If there is more data than the degree, we can get a least squares solution.
• Matlab functions: polyfit, polyval.

Remarks
• Can use the fitted polynomial to calculate derivatives.
• If the equation is solved analytically this provides expressions for the derivatives.
• The equation can become quite ill-conditioned, especially if the equations are not normalized.
  ax^2 + bx + c can also be written as a*(x - x_0)^2 + b*(x - x_0) + c*.
  – Find the polynomial through x_0 - h, x_0, x_0 + h:

    [ 1  -h  h^2 ] [ a_0 ]   [ f_-1 ]
    [ 1   0   0  ] [ a_1 ] = [ f_0  ]
    [ 1   h  h^2 ] [ a_2 ]   [ f_1  ]

  – a_0 = f_0,  a_1 = (f_1 - f_-1)/(2h),  a_2 = (f_-1 - 2 f_0 + f_1)/(2h^2)
  – Gives the expected values of the derivatives.

Note: a_1: first derivative; a_2: second derivative; a_0: "mean".

Polynomial interpolation
• Results from algebra:
  – The polynomial of degree n through n+1 points is unique.
  – The polynomials of degree less than n form an n-dimensional space; 1, x, x^2, …, x^{n-1} form a basis.
    • Any other polynomial can be represented as a combination of these basis elements.
  – Other sets of independent polynomials can also form bases.
• To fit a polynomial through x_0, …, x_n with values f_0, …, f_n:
  – Use the Lagrangian basis l_k:

    l_k(x) = Π_{i=0, i≠k}^{n} (x - x_i)/(x_k - x_i),    k = 0, …, n

  – p(x) = a_0 l_0 + a_1 l_1 + … + a_n l_n.
  – Then a_i = f_i.
  – Many polynomial bases: Chebyshev, Legendre, Laguerre, …
  – Bernstein, Bookstein, …

Note: the product runs from i = 0 to i = n, skipping i = k.
Note: the Lagrangian polynomial is a better solution than going through the Vandermonde matrix; better interpolation functions.
Note: p(x_i) = f_i by definition, i.e. p(x_i) = Σ_k f_k l_k(x_i) = f_i, because l_k = 1 at x_k and 0 at the other nodes; p(x) has degree n as a sum of degree-n factors.

Increasing n
• As n increases we can increase the polynomial degree.
• However the function in between is very poorly interpolated.
• Becomes ill-posed.
• For large n the interpolant blows up.
• Idea:
  – Taylor series provides good local approximations.
  – Use local approximations.
• Splines

Spline interpolation
• Piecewise polynomial approximation
  – E.g. interpolation in a table.
  – Given x_k, x_{k+1}, f_k and f_{k+1}, evaluate f at a point x such that x_k < x < x_{k+1}:

    f(x) = f_k (x_{k+1} - x)/(x_{k+1} - x_k) + f_{k+1} (x - x_k)/(x_{k+1} - x_k),   x_k ≤ x ≤ x_{k+1}
         = 0,  otherwise

• Construct approximations of this type on each subinterval. This method uses Lagrangian interpolants.
• The endpoints are called breakpoints.
• For a higher polynomial degree we need more conditions,
  • e.g. specify values at points inside the interval [x_k, x_{k+1}].
• Specifying function and derivative values at the end points x_k, x_{k+1} leads to cubic Hermite interpolation.


Cubic Spline
• Splines: name given to a flexible piece of wood used by draftsmen to draw curves through points.
  – Bend the wood piece so that it passes through the known points and draw a line through it.
  – The most commonly used interpolant is the cubic spline.
  – Provides continuity of the function, 1st and 2nd derivatives at the breakpoints.
  – Given n+1 points we have n intervals.
  – Each polynomial has four unknown coefficients.
    • Specifying function values provides 2 equations.
    • Two derivative continuity equations provide two more.

Given the points {x_i, f_i}, i = 1, …, n+1, the piecewise cubics satisfy:

  P(x_i) = f_i,                        i = 1, …, n+1
  P'_{i-1}(x_i) = P'_i(x_i),           i = 2, …, n
  P''_{i-1}(x_i) = P''_i(x_i),         i = 2, …, n

• Left with two free conditions. Usually chosen so that the second derivatives are zero at the ends.

Note: P(x) = A f_j + B f_{j+1} + C f''_j + D f''_{j+1}; A and B are linear in x (Lagrange interpolation), C and D are cubic in x through the second derivative and through A, B.
Note: P(x) is the spline on x_i ≤ x ≤ x_{i+1}.
Note: relating f'' and x at i-1, i, i+1 to f and x at i-1, i, i+1 gives n-1 linear equations; the f'' are the unknowns.
Note: at i = 1 and i = n+1.
Note: taking this into account we obtain:
Note: A = [x_{j+1} - x] / [x_{j+1} - x_j],  B = 1 - A,  C = (1/6)(A^3 - A)(x_{j+1} - x_j)^2,  D = (1/6)(B^3 - B)(x_{j+1} - x_j)^2.

Interpolating along a curve

• The curve can be given as x(s) and y(s).
• Given x_i, y_i, s_i.
• Can fit splines for x and y.
• Can compute tangents, curvature and the normal based on this fit.
• Things like intensity can vary along the curve; can also fit I(s).

[Figure: points sampled along a curve C]

Typical Optimization Problems
• Model fitting
  – Fit a straight line or polynomial through data:
    y_i = Σ_j a_j x_i^j
  – Fit a sum of cosines, exponentials, etc.:
    y_i = Σ_j a_j φ_j(x_i)
  Model: the φ_j; parameters: the a_j; data: (x_i, y_i).

• Determine a transformation
  – Determine a homography matrix:  x' = H x
  – Determine the fundamental matrix:  x'^T F x = 0

[Figure: points x in one image mapped by H to points x' in the other]

Algebraic Distance
• Algebraic system Ax = b.
• Approximate solution x^.
• Residual ||A x^ - b||.
• The residual is also called the algebraic distance.
• Algorithms that seek to reduce the residual are called "minimum residual" algorithms.

[Figure: data points in the x, y plane]

Note: error only in y! Not used by now...
Note: x = [x y]^T !!!
Note: this is even less than the algebraic distance.
Note: x' can be any point given at the beginning!!

Scaling
• Try to avoid any one equation being overly represented.
• Scale each equation:
  – Scale by the largest coefficient so that it becomes 1: a_i1 / a_11.
  – Scale so that the sum of the squared coefficients is 1: a_11^2 + a_12^2 + … + a_1n^2 = 1.
• Scaling also has the benefit of avoiding round-off.

Note: for example, in fundamental matrix estimation scaling could be beneficial.

Weighted Least Squares
• Multiplying an equation by a number will increase its weight or influence in the cost function.
• Not always a bad thing:
  – May want to weight different equations differently.
• How to select weights?
  – Number of observations.
  – Reliability of measurement.
    • Measured variances.
• How good is the least squares solution? How "probable" are the parameter estimates?
• Bring in notions of statistics.

Note: Maximum Likelihood parameter estimation (with a Gaussian distribution) is just one of many ways...
Note: but the weights also introduce additional dependence!
Note: in computer vision one can be successful only with robust processing (removing the outliers).

Cost functions for image based data
• Notation
  – Measured value of a point: x~
  – True value of a point: x
  – Estimated value of a point: x^
  – The transformation or model is denoted H.
    • Model: y = H(x) and x = H^-1(y).
• Symmetric error functions
  – Case 1: error only in one image.
    • Could arise if we are imaging a calibration pattern with known coordinates and trying to determine the camera calibration.
  – The appropriate error function is:

    find H^ that minimizes Σ_j d(x~'_j, H^ x_j)^2

Note: error in all coordinates; here only two coordinates.
Note: of the two images; not used in general.
Note: in fact this is a heteroscedastic process, since each point in 2D has a different variance.
Note: full rank!

Cost Functions

• Ideally there is a real cost being minimized.
  – E.g. dollars or distance travelled.
  – Then each equation makes sense.
• Statistical measures.
• Review concepts of metrics.

Note: Hartley & Zisserman, first edition numbering.

Constrained optimization
• We have to optimize f(x) subject to g(x) = 0.
  – Makes sense if g(x) = 0 leaves a few degrees of freedom (N - M).
• Approach 1 (eliminate constraints)
  – Eliminate variables using the constraint equations and solve a reduced problem f(x*) = 0.
  – Not practical, except for simple problems.
• Approach 2 (penalty function)
  – Construct a new minimization function f(x) + P g(x) where P >> 1.
  – If the constraint is violated the minimization function increases rapidly, forcing the optimization routine to solutions where it is not violated.
• Approach 3 (Lagrange multipliers)
  – The solution has to lie on the surface g(x) = 0.
  – Can't have ∇f = 0 anymore.
  – However we require ∇f parallel to ∇g.

Lagrangian
• Consider the Lagrangian function

  L(x, λ) = f + λ g
  ∇_x L(x, λ) = ∇f + λ ∇g,     ∂L(x, λ)/∂λ = g

• Extremize the Lagrangian:

  ∇_x L(x, λ) = ∇f + λ ∇g = 0,     ∂L(x, λ)/∂λ = g(x) = 0

• So this gives us both the constraint equation and the way to optimize the function on the surface.

Optimization Techniques
• Different problem here:
  – Given a set of locations x_i where one has measured a fitness function χ²/f(x), find the vector of parameters x that minimizes it.
• For the case where the function was linear we already have methods such as SVD to solve the linear system.
• Here we are concerned with systems where the equation is not so simple.
  – In particular f may be a nonlinear function of the parameters x.
• Differential calculus provides us with ways of estimating extrema:
  – The minimum (max) of f occurs at ∇f = 0, or
  – ∇f is in the direction of increasing f, or
  – Given an interval where ∇f has opposite signs at the boundary, there must be a point inside where ∇f is zero.
• However calculus is local.
  – So these methods can only guarantee a local extremum.

Note: almost all the models in computer vision describing 3D relations, 3D-to-2D relations or 2D-to-2D relations are nonlinear, mainly because of the projection transformation. Examples: epipolar geometry, camera calibration, etc.
Note: if the matrix of second derivatives H is positive definite, x is a local minimum; if H is negative definite, x is a local maximum.

Bisection methods
• Given a function f at three points a, b, c with a < b < c, and a way to evaluate f at a new point:
  – Given 2 initial guesses f(a) and f(b): if f(a) > f(b), move in the direction from a to b and choose a new parameter c.
  – Find a triplet [a, b, c] so that f(c) > f(b) and f(a) > f(b).
  – Choose a new point between a and b, or between b and c.
  – Repeat until the points a, b and c are sufficiently close.

Note: no derivatives!

Parabolic bracketing

Note: in multiple dimensions this cannot work, especially when more than one extremum is close.

Bracketing a minimum in multiple dimensions

• The smallest region bounded by a group of points in
  – 1D is bounded by two points (a line segment)
  – 2D is bounded by three points (a triangle)
  – 3D by four points (a tetrahedron)
  – in ND by N+1 points (a simplex)
• Can find a direction of a decreasing function in
  – 1D by the line from the point with the higher value to the one with the lower value
  – 2D by joining the point with the highest value through the point with the average value on the opposite side of the triangle
  – and so on for ND
• However we cannot guarantee a bracket of a minimum in ND.

Note: no derivatives!
Note: affine space.

Downhill Simplex Method (Nelder-Mead)
• Reflection: project along the direction of decrease with a step of size 1.
• Reflection and expansion: if the decrease is large, try a step of size 2.
• Contraction: if the result of reflection is bad, try a simple reduction within the simplex.
• Multiple contraction: if the result of contraction does not give a better result than the lowest point.
• Conclude: when the volume of the simplex falls below tolerance.

Note: J. A. Nelder and R. Mead. A simplex method for function minimization. Computer Journal, 7:308-313, 1965.
Note: mean minus the highest sample!
Note: tetrahedron.
Note: relative to the lowest point.

Newton iteration

Taylor approximation:

  f(P_0 + Δ) ≈ f(P_0) + J Δ,        J = ∂X/∂P   (the Jacobian matrix)

  X - f(P_1) = X - f(P_0 + Δ) ≈ X - f(P_0) - J Δ = e_0 - J Δ

  ⇒ J^T J Δ = J^T e_0  ⇒  Δ = (J^T J)^-1 J^T e_0        (normal equations)

  P_{i+1} = P_i + Δ_i

  With a measurement covariance Σ:  Δ = (J^T Σ^-1 J)^-1 J^T Σ^-1 e_0

Note: X is the measurement vector and P is the parameter vector, to be found; X = f(P) + e_0, and the vector e_0 has to be minimized.
Note: J is defined here as at the beginning, not transposed as we use it in computation; you just have to be consistent with the notation.
Note: P_1 = P_0 + Δ.
Note: J^+ (the pseudoinverse) is more correct.
Note: iterate till convergence; this is a local minimum, and it depends on P_0 too.
Note: minimizing this expression.
Note: if the function f is not too nonlinear, J^T J is a good approximation of the Hessian matrix and the method is the Gauss-Newton technique.
Note: if X is weighted with a Gaussian covariance Σ_X:  J^T Σ_X^-1 J Δ_i = J^T Σ_X^-1 e_i.

Gradient Descent
• We have a function f and an estimate of its gradient ∇f.
• Decrease f by stepping along the direction of -∇f:

  Begin: initialize x, tol, k = 0
    do k ← k + 1
       x ← x - h_k ∇f
    until ||h_k ∇f|| < tol
    return x
  end

• Determining h is not easy.
  – Called the "learning rate" in AI.
  – Hard to determine h.
    • If h is too small the algorithm will be too slow to converge. If it is too large the procedure will diverge.
• Can select it using a line search or using a Newton method.

Note: slow convergence due to zig-zagging.
Note: more complicated steepest descent techniques also exist. We will not describe them.

Hartley & Zisserman, Appendix 6: Iterative Estimation Methods

Levenberg-Marquardt method is essentially a Gauss-Newton method that transitions smoothly to gradient descent when the Gauss-Newton updates fail.

To summarize, we have so far considered three methods of minimization of a cost function g(P) = ||e(P)||2/2:

(i) Newton. Update equation:

  g_PP Δ = -g_P

where g_PP = e_P^T e_P + e_PP^T e and g_P = e_P^T e. Newton iteration is based on the assumption of an approximately quadratic cost function near the minimum, and will show rapid convergence if this condition is met. The disadvantage of this approach is that the computation of the Hessian may be difficult. In addition, far from the minimum the assumption of quadratic behaviour is probably invalid, so a lot of extra work is done with little benefit.

(ii) Gauss-Newton. Update equation:

  e_P^T e_P Δ = -e_P^T e

This is equivalent to Newton iteration in which the Hessian is approximated by e_P^T e_P. Generally this is a good approximation, particularly close to a minimum, or when e is nearly linear in P.

(iii) Gradient descent. Update equation:

  λ Δ = -e_P^T e = -g_P

The Hessian in Newton iteration is replaced by a multiple of the identity matrix. Each update is in the direction of most rapid local decrease of the function value. The value of λ may be chosen adaptively, or by a line search in the downward gradient direction. Generally, gradient descent by itself is not recommended, but in conjunction with Gauss-Newton it yields the commonly used Levenberg-Marquardt method.

A6.2 Levenberg-Marquardt iteration

The Levenberg-Marquardt (abbreviated LM) iteration method is a slight variation on the Gauss-Newton iteration method. The normal equations J^T J Δ = -J^T e are replaced by the augmented normal equations (J^T J + λI) Δ = -J^T e, for some value of λ that varies from iteration to iteration. Here I is the identity matrix. A typical initial value of λ is 10^-3 times the average of the diagonal elements of N = J^T J.

If the value of Δ obtained by solving the augmented normal equations leads to a reduction in the error, then the increment is accepted and λ is divided by a factor (typically 10) before the next iteration. On the other hand, if the value of Δ leads to an increased error, then λ is multiplied by the same factor and the augmented normal equations are solved again, this process continuing until a value of Δ is found that gives rise to a decreased error. This process of repeatedly solving the augmented normal equations for different values of λ until an acceptable Δ is found constitutes one iteration of the LM algorithm. An implementation of the LM algorithm is given in [Press-88].

Note: g_P ~ Jacobian (gradient), g_PP ~ Hessian.
Note: minimization in Δ.
Note: e(P) = f(P) - X; reversed (in sign) compared to before!

Levenberg-Marquardt

Normal equations:            J^T J Δ = N Δ = J^T e_0
Augmented normal equations:  N' Δ = J^T e_0,   with  N' = J^T J + λ diag(J^T J)

Initial value: λ_0 = 10^-3
Success (error decreases): accept, λ_{i+1} = λ_i / 10, next iteration.
Failure (error increases): λ_{i+1} = 10 λ_i, solve again.

λ small ~ Gauss-Newton (quadratic convergence)
λ large ~ gradient descent (guaranteed decrease)

Note: in the book the right-hand side is -J^T e_0 !!!
Note: the Gauss-Newton iteration is changed only a little.

Levenberg-Marquardt

Requirements for minimization
• Function to compute f.
• Start value P_0.
• Optionally, a function to compute J (but a numerical Jacobian is OK, too).

Note: in the Numerical Recipes book too, N = J^T J and the entries are N'_ii = (1 + λ) N_ii, N'_ij = N_ij; for small λ this is Gauss-Newton, for large λ the off-diagonal entries become insignificant.
Note: if λ is large (>> 1) it will still be a downhill direction, but not the same as pure gradient descent, since each parameter p_i is minimized separately.
Note: if one entry J_i = ∂f/∂p_i ≈ 0, the corresponding entry in N and N' is also zero and N will be singular. Rare occurrence!
Note: the maximum between |10^-4 x_i| and 10^-6.

Sparse Levenberg-Marquardt

• N^3 complexity for solving Δ = N'^-1 J^T e_0.
• Prohibitive for large problems (100 views, 10,000 points ~ 30,000 unknowns).
• Partition the parameters:
  • partition A: the set of parameters describing the cameras (a);
  • partition B: the set of parameters describing the points (b), only dependent on A and itself.

Note: Δ = N'^-1 J^T e_0 only if N' is nonsingular, otherwise use N'^+.
Note: 3D recovery of points: 11(12)·100 + 3·10,000 unknowns.
Note: two views for a 2D homography: 8(9) parameters plus 2n (x, y) coordinates in the first image, i.e. 2n + 8; the errors in the second image come from the first!
Note: the parameter vector is P = (a^T, b^T)^T; X is the measurement vector and Σ_X the covariance matrix of X.
Note: the measurement vector and Σ_X are in real coordinates or variances, e.g. x, y.

ε = X - X^; we seek to minimize ||ε||^2_{Σ_X}.

A = [∂X^/∂a], B = [∂X^/∂b], and the normal equations are

  J^T Σ_X^-1 J Δ = J^T Σ_X^-1 ε,   with J = [A | B]

U, V and W (written with a * in the book because the diagonal is multiplied by 1 + λ) are the matrices (A^T Σ_X^-1 A), …

Note: written out in blocks,

  [ A^T Σ_X^-1 A   A^T Σ_X^-1 B ] [ Δ_a ]   [ A^T Σ_X^-1 ε ]
  [ B^T Σ_X^-1 A   B^T Σ_X^-1 B ] [ Δ_b ] = [ B^T Σ_X^-1 ε ]

Note: so U = A^T Σ_X^-1 A and V = B^T Σ_X^-1 B become [...]*; we don't write [...]* but leave them as [...].

Sparse bundle adjustment

residuals: [shown as equations on the slide]
normal equations: [shown on the slide]
with [the block definitions shown on the slide]

note: tie points should be in partition A

Note: measured camera points.
Note: camera part (A, a): the derivative with respect to one camera; point part (B, b): the derivative with respect to a point.
Note: these are without the covariance Σ_X^-1 !!!
Note: k is the number of cameras, i the number of points.
Note: U and V here are without the "*".

Sparse bundle adjustment

normal equations: [block system shown on the slide]

modified normal equations: [shown on the slide]

solve in two parts: the cameras part, then the points part

Note: multiply with this matrix.
Note: in the book, Algorithm A6.1 gives the partitioned Levenberg-Marquardt method.

Sparse bundle adjustment

• Covariance estimation

  Y = W V^-1
  Σ_a = (U - W V^-1 W^T)^+
  Σ_b = Y^T Σ_a Y + V^-1
  Σ_ab = -Σ_a Y

Note: Σ_P = (J^T Σ_X^-1 J)^+; the pseudo-inverse will be covered later in the course.
Note: ^T (a transpose correction).
Note: e.g. no variation is allowed in the directions perpendicular to the constraint surface.
Note: assuming V is invertible (the points part).
Note: uses the formula: if a matrix G is invertible, (G H G^T)^+ = G^-T H^+ G^-1, valid if null(H) G^T = null(H) G^-1; start with J^T Σ_X^-1 J and transform it into G H G^T. See the book.

Sparse bundle adjustment

  min Σ_{k=1}^{m} Σ_{i=1}^{n} D(m_ki, P^_k M^_i)^2        (needed for the non-linear minimization)

The Jacobian of this cost has a sparse block structure, and so does N = J^T J:

[Figure: block structure of N for 3 cameras and 4 points; camera blocks U1, U2, U3 and the point block V on the diagonal, W and W^T off the diagonal; parameters P1, P2, P3 and the points M]

  11(12)×m camera parameters and 3×n point parameters (in general much larger); rows grouped by the image points of view 1, view 2, …

Note: cameras / points.
Note: the covariance matrix Σ_X is block-diagonal, diag[Σ_1 … Σ_n].
Note: in the book, Algorithms A6.3 and A6.4 give the partitioned Levenberg-Marquardt method which works in computer vision.
Note: X_ij is the i-th point in the j-th camera; in 2D, X^_ij depends only on the j-th camera, a_j; in 3D, X^_ij depends only on the i-th point, b_i.
Note: part of the normal equations for 3 cameras and 4 points.

Sparse bundle adjustment
• Eliminate the dependence of the camera/motion parameters on the structure parameters. Note that in general 3n >> 11m.

    [ I   -W V^-1 ]       [ U - W V^-1 W^T   0 ]
    [ 0      I    ] × N = [       W^T        V ]

    (block sizes: 11×m for the cameras, 3×n for the points)

• Allows much more efficient computations: e.g. 100 views, 10,000 points: solve ±1000×1000, not ±30,000×30,000.
• Often still band diagonal: use sparse linear algebra algorithms.

Note: ±1000 ≈ 11 × (number of cameras).
Note: separating the two parts.
Note: see HZ (2nd edition), Sec. A6.7.
Note: the normal equations matrix is symmetric.