
Computer Vision: Models, Learning and Inference

Convexity; Least Squares; Robustness; Some Statistics

Oren Freifeld and Ron Shapira-Weber

Computer Science, Ben-Gurion University

April 7, 2019


Outline

1 Convexity
2 Linear Model, Least Squares and Weighted Least Squares
3 Outliers, “Wrong” Assumptions, and Robustness
4 Some Statistics and M-Estimators

Convexity

Definition (convex function from I ⊂ R to R)

Let I ⊂ R be an interval. f : I → R is called convex if, ∀x₁, x₂ ∈ I, ∀t ∈ (0, 1),

f(t x₁ + (1 − t) x₂) ≤ t f(x₁) + (1 − t) f(x₂).

In words: any chord connecting two points on the graph of f is never below the graph of f.

Example

f(x) = x² is convex.

Example

f(x) = |x| is convex.


Convexity

Definition (strictly-convex function from I ⊂ R to R)

Let I ⊂ R be an interval. f : I → R is called strictly convex if, ∀x₁ ≠ x₂ ∈ I, ∀t ∈ (0, 1),

f(t x₁ + (1 − t) x₂) < t f(x₁) + (1 − t) f(x₂)

In words: any chord connecting two points on the graph of f lies strictly above the graph of f (except at its endpoints).

A strictly-convex function is convex; the converse is false.

Example

f(x) = x² is strictly convex.

Counterexample

f(x) = |x| is convex but not strictly convex.


Convexity

Fact

A differentiable f : I → R is convex ⇐⇒

f(x) ≥ f(y) + f′(y)(x − y) ∀x, y ∈ I

and it is strictly convex ⇐⇒

f(x) > f(y) + f′(y)(x − y) ∀x, y ∈ I, x ≠ y

A differentiable convex function f is never below its tangents.

A differentiable strictly convex function f is strictly above its tangents (except at the points of tangency).

Fact

f is convex and differentiable & f′(x) = 0 ⇒ x is a global minimum of f.

Fact

f′′ exists & f′′ ≥ 0 (resp. f′′ > 0) on I ⇒ f is convex (resp. strictly convex).



Convexity

Definition (convex combination of points)

A linear combination Σ_{i=1}^m α_i x_i of {x_i}_{i=1}^m ⊂ Rⁿ is called convex if all the α_i's are nonnegative and Σ_{i=1}^m α_i = 1.

Definition (convex set)

S ⊂ Rⁿ is called convex if it is closed under convex combinations.

Convexity

Definition (convex function from a convex subset of Rn to R)

Let S ⊂ Rⁿ be convex. f : S → R is called convex if, ∀x₁, x₂ ∈ S, ∀t ∈ (0, 1),

f(t x₁ + (1 − t) x₂) ≤ t f(x₁) + (1 − t) f(x₂)

and it is called strictly convex if, ∀x₁ ≠ x₂ ∈ S, ∀t ∈ (0, 1),

f(t x₁ + (1 − t) x₂) < t f(x₁) + (1 − t) f(x₂)


Convexity

Fact

Let S ⊂ Rⁿ be convex. A differentiable f : S → R is convex ⇐⇒

f(x) ≥ f(y) + ∇f(y)ᵀ(x − y) ∀x, y ∈ S

and it is strictly convex ⇐⇒

f(x) > f(y) + ∇f(y)ᵀ(x − y) ∀x, y ∈ S, x ≠ y

Fact

f is convex and differentiable & ∇f(x) = 0 ⇒ x is a global minimum of f.


Convexity

Definition (Hessian)

The Hessian matrix of a twice-differentiable f : Rⁿ → R is

∇²f =
[ ∂²f/∂x₁²     ∂²f/∂x₁∂x₂   ⋯   ∂²f/∂x₁∂xₙ
  ∂²f/∂x₂∂x₁   ∂²f/∂x₂²     ⋯   ∂²f/∂x₂∂xₙ
  ⋮            ⋮            ⋱   ⋮
  ∂²f/∂xₙ∂x₁   ∂²f/∂xₙ∂x₂   ⋯   ∂²f/∂xₙ²   ]   (1)

Convexity

Fact

A twice-differentiable f : S → R is convex ⇐⇒ its Hessian matrix is SPSD on the interior of S. If the Hessian is SPD on the interior of S, then f is strictly convex (the converse is false; e.g., f(x) = x⁴ is strictly convex although f′′(0) = 0).

Convexity

Fact

A weighted sum, with nonnegative weights, of convex functions is convex. A weighted sum, with nonnegative weights, of strictly convex functions is strictly convex (if we ignore the degenerate case in which all the weights are zero).

Example

If (x_i)_{i=1}^N ⊂ R and (w_i)_{i=1}^N ⊂ R≥0 (and the w_i's do not depend on the x_i's), then

f : R → R, f(μ) = Σ_{i=1}^N w_i (μ − x_i)²

is strictly convex (provided not all the weights are zero), while

g : R → R, g(μ) = Σ_{i=1}^N w_i |μ − x_i|

is convex.

Convexity

Norms

Fact

Every norm is a convex function.

Example

Let p ≥ 1. f : Rⁿ → R≥0, f : x ↦ ‖x‖_{ℓp} is convex.

Example

Let Q be an n × n SPD matrix. f : Rⁿ → R≥0, f : x ↦ ‖x‖_Q ≜ (xᵀQx)^{1/2} is convex.

Convexity

Some Operations that Preserve Convexity of Functions

Nonnegative weighted sum of convex functions is convex

Example

f : Rⁿ → R≥0, f : x ↦ ‖x‖²_{ℓ2} ≜ Σ_{i=1}^n x_i²

Composition with an affine function.

Example

f : Rⁿ → R≥0, f : x ↦ ‖Ax + b‖_{ℓp}, A ∈ R^{m×n}, b ∈ R^m

Pointwise maximum and supremum.

Example

f(x) = max_{i∈{1,2,3}} f_i(x) = max{f₁(x), f₂(x), f₃(x)}

Convexity

Convex Optimization

Convex functions are “easy” to optimize.

Sometimes they have closed-form solutions for their minimizer(s).

Exercise

f : R → R, f : x ↦ ax² + bx + c, a ∈ R>0, b, c ∈ R

Show that arg min_x f(x) = −b/(2a) (as you saw in high school).
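A short sketch of the argument, using the convexity facts above:

\[
f'(x) = 2ax + b = 0 \;\Longrightarrow\; x^\star = -\frac{b}{2a},
\qquad
f''(x) = 2a > 0 \;\Longrightarrow\; f \text{ is strictly convex},
\]

so the stationary point x* = −b/(2a) is the unique global minimizer.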

But, more importantly, even when they have no closed-form minimizers, optimization is still “easy”.

Textbook: Boyd and Vandenberghe’s Convex Optimization. Also check out Boyd’s video lectures on Stanford’s YouTube channel.

Convexity

Convex Optimization

The take-home message: while learning convex optimization is very useful, unless you want to specialize in that field, it often suffices (for CV/ML people) to focus on how to:

recognize that the problem is convex; or
transform a non-convex problem into a convex one; or
approximate a non-convex problem using a convex one; or
solve a non-convex problem by iterating between convex subproblems.

E.g., this is how Convex Optimization is taught by Boyd at Stanford.


Linear Model, Least Squares and Weighted Least Squares

Linear Model

Measurements: {y_i}_{i=1}^N ⊂ R^d

H_i is a known matrix, possibly dependent on i.

Residual of a linear model:

r_i = H_i θ − y_i   (2)

where r_i is d × 1, H_i is d × k, θ is k × 1, and y_i is d × 1.

When we discussed optical flow for gray-scale images, d was 1.


Linear Model, Least Squares and Weighted Least Squares

Least-Squares Estimation in a Linear Model

θ_LS ≜ arg min_θ Σ_{i=1}^N ‖r_i‖² = arg min_θ ‖r‖², where

H ≜ [H₁ᵀ ⋯ H_Nᵀ]ᵀ ∈ R^{(Nd)×k}
y ≜ [y₁ᵀ ⋯ y_Nᵀ]ᵀ ∈ R^{(Nd)×1}
r ≜ [r₁ᵀ ⋯ r_Nᵀ]ᵀ = Hθ − y ∈ R^{(Nd)×1}   (3)

The cost function, from R^k to R≥0, is convex; it is strictly convex if rank(H) = k. A minimizer satisfies

HᵀH θ_LS = Hᵀy   (4)

and is unique if rank(H) = k; i.e., if HᵀH is invertible.
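A minimal numpy sketch of solving the normal equations; the synthetic H, θ, and noise level here are assumptions for illustration only:

import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(100, 3))                    # stacked model matrix, (Nd) x k with d = 1
theta_true = np.array([1.0, -2.0, 0.5])
y = H @ theta_true + 0.1 * rng.normal(size=100)  # noisy measurements

# Solves H^T H theta = H^T y; lstsq is numerically safer than forming (H^T H)^{-1}.
theta_ls, *_ = np.linalg.lstsq(H, y, rcond=None)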

Linear Model, Least Squares and Weighted Least Squares

Weighted Least-Squares

θ_WLS ≜ arg min_θ Σ_{i=1}^N w_i ‖r_i‖² = arg min_θ ‖W^{1/2}(Hθ − y)‖²   (5)

W = diag(w₁, …, w₁, w₂, …, w₂, …, w_N, …, w_N)   (each w_i repeated d times)

W^{1/2} = diag(√w₁, …, √w₁, √w₂, …, √w₂, …, √w_N, …, √w_N)   (6)

A minimizer satisfies

HᵀWH θ_WLS = HᵀWy   (7)
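The same sketch extends to weighted least squares by scaling each row of H and y by √w_i; the weights here are hypothetical:

import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(100, 3))
y = H @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
w = rng.uniform(0.5, 2.0, size=100)   # nonnegative weights, one per measurement

# ||W^{1/2}(H theta - y)||^2 is an ordinary LS problem in the scaled variables.
sw = np.sqrt(w)
theta_wls, *_ = np.linalg.lstsq(H * sw[:, None], y * sw, rcond=None)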

Outliers, “Wrong” Assumptions, and Robustness

Least-Squares Estimation is not Robust

[Figure: a line fit to data containing outliers; the least-squares fit (LS) is dragged away from the true line by the outliers.]

Also true for weighted least squares.

Outliers, “Wrong” Assumptions, and Robustness

Robust Least-Squares

Measurements: {y_i}_{i=1}^N ⊂ R^d

Residual: r_i = H_i θ − y_i (with r_i d × 1, H_i d × k, θ k × 1, y_i d × 1)

Want:

arg min_θ Σ_i ρ(‖r_i‖)

where ρ : R → R≥0 is differentiable.

ψ(x) = dρ(x)/dx is called the influence function.

If ρ is of the form x ↦ x², then this is least squares.

Outliers, “Wrong” Assumptions, and Robustness

Robust Error Function

Example (Geman-McClure robust error function)

ρ(x) = x²/(x² + σ²)

[Figure: the Geman-McClure robust error function ρ(x) = x²/(x² + σ²) for σ ∈ {0.5, 1, 5, 10, 20}; it saturates at 1, so large residuals incur a bounded cost.]

Outliers, “Wrong” Assumptions, and Robustness

Influence Function: Derivative of the Error Function

Example (Derivative of the Geman-McClure robust error function)

ψ(x) = dρ(x)/dx = (2σ²x)/(x² + σ²)²

[Figure: the influence function ψ(x) = (2σ²x)/(x² + σ²)² for σ ∈ {0.5, 1, 5, 10, 20}; the influence of large residuals decays to zero.]
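A small sketch for evaluating ρ and ψ numerically (the function names are ours):

import numpy as np

def gm_rho(x, sigma):
    # Geman-McClure robust error: saturates at 1 for |x| >> sigma.
    return x**2 / (x**2 + sigma**2)

def gm_psi(x, sigma):
    # Its derivative, the influence function: tends to 0 as |x| grows.
    return 2 * sigma**2 * x / (x**2 + sigma**2)**2

x = np.linspace(-30, 30, 601)
rho, psi = gm_rho(x, 4.0), gm_psi(x, 4.0)   # e.g. sigma = 4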

Outliers, “Wrong” Assumptions, and Robustness

Definition (quasiconvex function)

Let S ⊂ Rⁿ be a convex set. f : S → R is called quasiconvex if all its sublevel sets

S_α = {x | f(x) ≤ α},

for α ∈ R, are convex.

Example

Geman-McClure robust error function: x ↦ x²/(x² + σ²)

If S ⊂ R and f : S → R is quasiconvex, then f is unimodal.

A quasiconvex function is usually still easy to minimize. But…


Outliers, “Wrong” Assumptions, and Robustness

Fact

A weighted sum, with nonnegative weights, of quasiconvex functions is usually not quasiconvex.

Example

If (x_i)_{i=1}^N ⊂ R and (w_i)_{i=1}^N ⊂ R≥0 (and the w_i's do not depend on the x_i's), then

f : R → R, f(μ) = Σ_{i=1}^N w_i ρ(μ − x_i),

where ρ is the Geman-McClure robust error function, is usually not unimodal.
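A tiny numeric illustration of the multimodality (the data values below are made up):

import numpy as np

def gm_rho(x, sigma):
    return x**2 / (x**2 + sigma**2)

x_data = np.array([-10.0, -9.0, 9.5, 10.0])     # two well-separated clusters
mu = np.linspace(-20, 20, 4001)
cost = sum(gm_rho(mu - xi, 1.0) for xi in x_data)
# Interior local minima: cost dips below both neighbors -> more than one mode.
is_min = (cost[1:-1] < cost[:-2]) & (cost[1:-1] < cost[2:])
print(mu[1:-1][is_min])   # two minima, one near each cluster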

Outliers, “Wrong” Assumptions, and Robustness

Minimizing a Nonconvex Robust Cost Function

⇒ when ρ(x) is more robust than |x|, the optimization is hard…

Examples of possible approaches:

Assuming ρ is differentiable, one can try gradient-based methods, possibly with graduated optimization, such as:
graduated non-convexity (minimize (1 − t) f_convex + t f); or
“annealing” (e.g., gradually decrease σ in the GM robust error function).

Iterative Reweighted Least Squares (IRLS)

Outliers, “Wrong” Assumptions, and Robustness

Results for a Gradient-Based Method in the Global Approach

[Figures from Michael Black’s PhD thesis, 1992]


Outliers, “Wrong” Assumptions, and Robustness

Iterative Reweighted Least-Squares

Idea: use w(x) = ψ(x)/x

Start with an LS solution, then alternate between computing weights based on the residual errors, and computing a WLS solution using the fixed weights from the previous iteration.

Initialization: solve

HᵀH θ_IRLS^[0] = Hᵀy   (8)

Alternate:

1. Set w_i^[k] = ψ(‖r_i^[k]‖) / ‖r_i^[k]‖, where r_i^[k] = H_i θ_IRLS^[k] − y_i.

2. Solve

HᵀW^[k]H θ_IRLS^[k+1] = HᵀW^[k]y   (9)
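A minimal numpy sketch of IRLS for scalar residuals (d = 1) with Geman-McClure weights; the function names and the choice of ρ are ours:

import numpy as np

def gm_weight(r, sigma):
    # w(x) = psi(x)/x for the Geman-McClure rho: 2*sigma^2 / (x^2 + sigma^2)^2.
    return 2 * sigma**2 / (r**2 + sigma**2)**2

def irls(H, y, sigma=1.0, n_iters=15):
    theta, *_ = np.linalg.lstsq(H, y, rcond=None)   # LS initialization, eq. (8)
    for _ in range(n_iters):
        r = H @ theta - y                           # residuals at the current iterate
        sw = np.sqrt(gm_weight(r, sigma))           # weights from the previous iterate
        theta, *_ = np.linalg.lstsq(H * sw[:, None], y * sw, rcond=None)  # WLS step, eq. (9)
    return theta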

Outliers, “Wrong” Assumptions, and Robustness

Iterative Reweighted Least-Squares

E_RLS = Σ_i ρ(‖r_i‖)

∇_θ E_RLS = Σ_i ψ(‖r_i‖) ∇_θ‖r_i‖  (want = 0)   (10)

E_IRLS = Σ_i w(r_i) ‖r_i‖²   (11)

E_IRLS^[k+1] = Σ_i w(r_i^[k]) ‖r_i‖² = Σ_i (ψ(‖r_i^[k]‖) / ‖r_i^[k]‖) ‖r_i‖² ≈ Σ_i ψ(‖r_i^[k]‖) ‖r_i‖

0 = ∇_θ E_IRLS^[k+1] ≈ Σ_i ψ(‖r_i^[k]‖) ∇_θ‖r_i‖


Outliers, “Wrong” Assumptions, and Robustness

[Figure sequence: the least-squares fit from before, followed by IRLS fits at iterations 1-15; compared with the true line, the IRLS estimate progressively down-weights the outliers and converges to a robust fit.]

Outliers, “Wrong” Assumptions, and Robustness

Recall Lucas-Kanade

Σ_{i=1}^N g(x, x_i) (∇_x I(x_i, t) [u(x); v(x)] + I_t(x_i, t))²
= ‖W^{1/2} ε‖² = εᵀWε
= Σ_{i=1}^N (√g(x, x_i) ∇_x I(x_i, t) [u(x); v(x)] + √g(x, x_i) I_t(x_i, t))²   (12)

with H_i ≜ √g(x, x_i) ∇_x I(x_i, t) (1 × 2), θ ≜ [u(x); v(x)] (2 × 1), and −y_i ≜ √g(x, x_i) I_t(x_i, t) ∈ R.

W = diag(g(x, x₁), …, g(x, x_N))  (not the same W from IRLS)

W^{1/2} = diag(√g(x, x₁), …, √g(x, x_N))   (13)

[u(x); v(x)]_WLS ≜ arg min_{u(x), v(x)} εᵀWε   (14)

Outliers, “Wrong” Assumptions, and Robustness

Robust Lucas-Kanade

E_RLK = Σ_{i=1}^N ρ(√g(x, x_i) ∇_x I(x_i, t) [u(x); v(x)] + √g(x, x_i) I_t(x_i, t))   (15)

with the same H_i (1 × 2), θ (2 × 1), and r_i ∈ R as above.

Outliers, “Wrong” Assumptions, and Robustness

Robust Affine Lucas-Kanade

[u(x_i); v(x_i)] = A(x, x_i) θ   (16)

where

A(x, x_i) ≜ [ x_i − x   y_i − y   1   0   0   0
              0   0   0   x_i − x   y_i − y   1 ]   (2 × 6)

(here (x_i, y_i) denote the coordinates of the point x_i and (x, y) those of x) and θ ≜ [θ₁ θ₂ θ₃ θ₄ θ₅ θ₆]ᵀ.

E_RALK = Σ_{i=1}^N ρ(√g(x, x_i) ∇_x I(x_i, t) A(x, x_i) θ + √g(x, x_i) I_t(x_i, t))   (17)

with H_i ≜ √g(x, x_i) ∇_x I(x_i, t) A(x, x_i) (1 × 6) and r_i ∈ R.

Outliers, “Wrong” Assumptions, and Robustness

Random Sample Consensus (RANSAC)

A non-deterministic algorithm.

For generality, write the model as f(x_i, θ) = y_i.

Stick to the least-squares formulation.

Algorithm: given data {x_i, y_i}_{i=1}^N, alternate between the following two steps (see the sketch after this list):

Pick a very small random subset of the data. Its cardinality should suffice for estimating the parameters (e.g., you can’t estimate a line from a single point). Fit a least-squares model to it.

For each data point in the original set, compute ‖f(x_i, θ̂) − y_i‖. Using a threshold, reject outliers.

Once a certain iteration achieves enough inliers (points not rejected), the algorithm stops.
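A hypothetical minimal RANSAC sketch for fitting a line y = a·x + b (minimal sample size 2); the threshold and stopping parameters are assumptions:

import numpy as np

def ransac_line(x, y, n_trials=100, thresh=1.0, min_inliers=30, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(n_trials):
        i, j = rng.choice(len(x), size=2, replace=False)     # minimal random subset
        A = np.stack([x[[i, j]], np.ones(2)], axis=1)
        ab, *_ = np.linalg.lstsq(A, y[[i, j]], rcond=None)   # fit the subset
        inliers = np.abs(ab[0] * x + ab[1] - y) < thresh     # threshold the residuals
        if inliers.sum() >= min_inliers:                     # enough inliers: refit & stop
            A_in = np.stack([x[inliers], np.ones(inliers.sum())], axis=1)
            ab, *_ = np.linalg.lstsq(A_in, y[inliers], rcond=None)
            return ab
    return None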

Outliers, “Wrong” Assumptions, and Robustness

RANSAC and Optical Flow

For HS-type models, RANSAC is inapplicable.

For LK-type models, it fits only if the neighborhood is sufficiently large; for example, if we try to find a single affine flow for the entire image.

Some Statistics and M-Estimators

Definition (sample mean for R-valued data)

The sample mean of {x_i}_{i=1}^N ⊂ R is

x̄ ≜ (1/N) Σ_{i=1}^N x_i   (18)

Definition (sample mean for Rⁿ-valued data)

The sample mean of {x_i}_{i=1}^N ⊂ Rⁿ is

x̄ ≜ (1/N) Σ_{i=1}^N x_i   (19)


Some Statistics and M-Estimators

The Sample Mean as a Minimizer

Fact

The sample mean minimizes the sum of squared Euclidean distances:

x̄ = arg min_{μ∈R} Σ_{i=1}^N (x_i − μ)².   (20)

Some Statistics and M-Estimators

Example

Let x1, x2 ∈ R. Then

x̄ = (x₁ + x₂)/2 = arg min_{μ∈R} (x₁ − μ)² + (x₂ − μ)².   (21)

Some Statistics and M-Estimators

The Sample Mean as a Minimizer

Fact

The sample mean minimizes the sum of squared scaled Euclidean distances:

x̄ = arg min_{μ∈R} Σ_{i=1}^N (x_i − μ)²/σ²,  σ > 0.   (22)

Proof.

Σ_{i=1}^N (x_i − μ)²/σ² = (1/σ²) Σ_{i=1}^N (x_i − μ)²

arg min_{μ∈R} (1/σ²) Σ_{i=1}^N (x_i − μ)² = arg min_{μ∈R} Σ_{i=1}^N (x_i − μ)² = x̄


Some Statistics and M-Estimators

Example

Let x1, x2 ∈ R. Then

x̄ = (x₁ + x₂)/2 = arg min_{μ∈R} (x₁ − μ)²/σ² + (x₂ − μ)²/σ².   (23)

Some Statistics and M-Estimators

[Figure sequence: the per-point costs (μ − x_i)²/σ² and their sum Σ_i (x_i − μ)²/σ², whose minimizer is the sample mean, for σ = 2 and σ = 4 with N = 2, 3, 4, 5, and 7 points; with N = 7 (which includes outliers), the minimizer is dragged away from the bulk of the data: the sample mean is not robust to outliers.]

Some Statistics and M-Estimators

The Sample Mean as a Minimizer

Fact

x̄ = arg min_{μ∈Rⁿ} Σ_{i=1}^N ‖(1/σ)(x_i − μ)‖²_{ℓ2}.   (24)

Outline of the proof:

(i) Show that ∇_μ E(μ), the gradient of E(μ) = Σ_{i=1}^N ‖(1/σ)(x_i − μ)‖²_{ℓ2} w.r.t. μ = [μ₁ … μₙ]ᵀ, is proportional to

[Σ_{i=1}^N (μ₁ − x_{i,1})  …  Σ_{i=1}^N (μₙ − x_{i,n})]   (25)

where x_{i,j} is the j-th entry of x_i.

(ii) Solve ∇_μ E(μ) = 0.


Some Statistics and M-Estimators

More generally, let Q be an SPD n-by-n matrix.

Fact

x̄ = arg min_{μ∈Rⁿ} Σ_{i=1}^N ‖x_i − μ‖²_Q   (26)

where ‖x_i − μ‖²_Q = (x_i − μ)ᵀQ(x_i − μ).

If Q = σ⁻²I_{n×n} ∝ I_{n×n}, this reduces to the previous problem:

arg min_{μ∈Rⁿ} Σ_{i=1}^N ‖(1/σ)(x_i − μ)‖²_{ℓ2}

Some Statistics and M-Estimators

Informal Definition (Statistic)

A statistic is a function that depends only on the data.

Trivially, a function of a statistic is also a statistic.

Example

S₁(x₁, …, x_N; N) = Σ_{i=1}^N x_i ∈ Rⁿ is a statistic of (x₁, …, x_N) ⊂ Rⁿ. The same goes for x̄ = (1/N) Σ_{i=1}^N x_i = (1/N) S₁(x₁, …, x_N; N).

Example

S₂(x₁, …, x_N; N) = Σ_{i=1}^N x_i x_iᵀ ∈ R^{n×n} is a statistic of (x₁, …, x_N) ⊂ Rⁿ. The same goes for (1/N) S₂(x₁, …, x_N; N) and ((1/N) Σ_{i=1}^N x_i x_iᵀ) − x̄x̄ᵀ.


Some Statistics and M-Estimators

Definition (k-th order statistic)

The k-th order statistic of {x_i}_{i=1}^N ⊂ R is the k-th-smallest value among {x_i}_{i=1}^N. It is denoted by x_(k). Thus, x_(1) ≤ x_(2) ≤ … ≤ x_(N).

Example

x_(1) = min{x₁, …, x_N} and x_(N) = max{x₁, …, x_N}.

Definition (order statistics)

The ordered N-tuple of the sorted values,

(x_(1), x_(2), …, x_(N)),   (27)

is called the order statistics of {x_i}_{i=1}^N.


Some Statistics and M-Estimators

For simplicity, let us assume N is odd.

Definition (Sample median)

If N is odd, then the sample median of {x_i}_{i=1}^N ⊂ R is

x_((N+1)/2)   (28)

If N is even, there are several different ways to define the sample median; when N is large, they usually become similar.


Some Statistics and M-Estimators

The Sample Median as a Minimizer

Fact

The sample median minimizes the sum of scaled ℓ1 distances:

x_((N+1)/2) = arg min_{m∈R} Σ_{i=1}^N |x_i − m|/σ,  σ > 0.   (29)

We may take this as the definition of the sample median (in which case, we don’t need to worry about whether N is even or odd).
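A quick numeric check of this minimization view (the data, including the outlier, are made up):

import numpy as np

x = np.array([1.0, 2.0, 7.0, 9.0, 100.0])    # hypothetical data with one outlier
m_grid = np.linspace(-20, 120, 14001)
cost = np.abs(x[:, None] - m_grid[None, :]).sum(axis=0)   # sum_i |x_i - m|
print(m_grid[cost.argmin()], np.median(x))   # both ~7.0: the sample median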

Some Statistics and M-Estimators

[Figure sequence: the per-point costs |x_i − μ|/σ and their sum Σ_i |x_i − μ|/σ, whose minimizer is the sample median, for σ = 2 and σ = 4 with N = 2, 3, 4, 5, and 7 points; with N = 7 (which includes outliers), the minimizer barely moves: the sample median is more robust to outliers.]

Some Statistics and M-Estimators

The Sample Median is More Robust Than the Sample Mean

Since ℓ1 is more robust than ℓ2, this interpretation in terms of optimization problems explains why the sample median is more robust than the sample mean.

Some Statistics and M-Estimators

Median Filtering

Replace pixel (i, j) with the median of the values in, say, a 5-by-5 neighborhood around it. This is a nonlinear operation.

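A minimal sketch using SciPy’s median filter; the synthetic image and the salt-and-pepper noise model are assumptions for illustration:

import numpy as np
from scipy.ndimage import median_filter

rng = np.random.default_rng(0)
img = rng.uniform(size=(64, 64))              # a synthetic gray-scale image
mask = rng.uniform(size=img.shape) < 0.05     # corrupt 5% of the pixels
img[mask] = rng.integers(0, 2, size=mask.sum()).astype(float)  # salt & pepper
denoised = median_filter(img, size=5)         # median over a 5-by-5 neighborhood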

Some Statistics and M-Estimators

Trimmed Average

The median is also an extreme case of the truncated (or trimmed) average:

(1/(N − 2N₀)) Σ_{i=N₀+1}^{N−N₀} x_(i)   (30)

We will return to the trimmed average later when we discuss Robust PCA (PCA is a dimensionality-reduction technique which we will discuss as well).
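A small sketch of the trimmed average (the data are made up):

import numpy as np

def trimmed_mean(x, n0):
    # Discard the n0 smallest and n0 largest values, then average the rest.
    xs = np.sort(x)
    return xs[n0:len(x) - n0].mean()

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
print(trimmed_mean(x, 1))   # 3.0: the outlier 100 is trimmed away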

Some Statistics and M-Estimators

M-Estimators

More generally, estimators that are defined as minimizers of sums of functions of the data are called M-estimators. These include, among other things:

Σ_{i=1}^N ρ(r_i), which we saw previously.

Maximum-Likelihood Estimators (MLE):

arg max_θ Π_{i=1}^N p(x_i; θ) = arg min_θ Σ_{i=1}^N −log p(x_i; θ)

MLEs enjoy many desired properties and satisfy various asymptotic optimality criteria, but may suffer from outliers. Some robust estimators achieve near-optimality when there are no outliers, and suffer little in their presence.

M-estimators can be defined even when the space is nonlinear; we will see some examples.


Some Statistics and M-Estimators

The Sample Mean as an MLE

Fact

The sample mean maximizes the likelihood under a Gaussian likelihood model:

x̄ = arg max_{μ∈R} Π_{i=1}^N N(x_i; μ, σ²).   (31)

Remark

∫_R ⋯ ∫_R Π_{i=1}^N N(x_i; μ, σ²) dx₁ ⋯ dx_N = 1

∫_R N(x_i; μ, σ²) dμ = 1 ∀i, but

∫_R Π_{i=1}^N N(x_i; μ, σ²) dμ ≠ 1 ∀N > 1

Some Statistics and M-Estimators

Example

Let (x₁, x₂) ∼ N(x₁; μ, σ²) N(x₂; μ, σ²). Then

x̄ = (x₁ + x₂)/2 = arg max_μ N(x₁; μ, σ²) N(x₂; μ, σ²)   (32)

Some Statistics and M-Estimators

[Figure sequence: the per-point Gaussian likelihoods N(x_i; μ, σ²) and their product, whose maximizer is the sample mean, for σ = 2 and σ = 4 with N = 2, 3, 4, 5, and 7 points; with N = 7 (which includes outliers), the maximizer is dragged away: the Gaussian MLE is not robust to outliers.]

Some Statistics and M-Estimators

More generally, let Σ be an SPD n-by-n matrix.

Fact

The inverse of an SPD matrix is also SPD.

Fact

x̄ = arg max_{μ∈Rⁿ} Π_{i=1}^N (2π)^{−n/2} |Σ|^{−1/2} exp(−(1/2) ‖x_i − μ‖²_{Σ⁻¹})   (33)

where ‖x_i − μ‖²_{Σ⁻¹} = (x_i − μ)ᵀ Σ⁻¹ (x_i − μ); the quantity ‖x_i − μ‖_{Σ⁻¹} is called the Mahalanobis distance, and |Σ| is the determinant of Σ.

Some Statistics and M-Estimators

The Sample Median as an MLE

Fact

The sample median maximizes the likelihood under a Laplace-distribution likelihood model:

x_((N+1)/2) = arg max_{μ∈R} Π_{i=1}^N f(x_i; μ, σ),   (34)

where

f(x; μ, σ) = (1/(2σ)) exp(−|x − μ|/σ)

Some Statistics and M-Estimators

[Figure sequence: the per-point Laplace likelihoods (2σ)⁻¹ exp(−|x_i − μ|/σ) and their product, whose maximizer is the sample median, for σ = 2 and σ = 4 with N = 2, 3, 4, 5, and 7 points; with N = 7 (which includes outliers), the maximizer barely moves: more robust to outliers than the Gaussian.]

Some Statistics and M-Estimators

Definition (sample correlation matrix)

(1/N) Σ_{i=1}^N x_i x_iᵀ ∈ R^{n×n} is called the sample correlation matrix of {x_i}_{i=1}^N ⊂ Rⁿ.

Definition (sample covariance matrix)

(1/N) Σ_{i=1}^N (x_i − x̄)(x_i − x̄)ᵀ is called the sample covariance matrix of {x_i}_{i=1}^N ⊂ Rⁿ.

Fact

(1/N) Σ_{i=1}^N (x_i − x̄)(x_i − x̄)ᵀ = ((1/N) Σ_{i=1}^N x_i x_iᵀ) − x̄x̄ᵀ

(1/N) Σ_{i=1}^N (x_i − x̄)(x_i − x̄)ᵀ = (1/N) Σ_{i=1}^N (x_i − x̄) x_iᵀ = (1/N) Σ_{i=1}^N x_i (x_i − x̄)ᵀ
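A quick numeric verification of the first identity (the synthetic data are an assumption for illustration):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                      # N = 500 samples in R^3
xbar = X.mean(axis=0)
cov_centered = (X - xbar).T @ (X - xbar) / len(X)  # (1/N) sum (x_i - xbar)(x_i - xbar)^T
cov_identity = X.T @ X / len(X) - np.outer(xbar, xbar)
print(np.allclose(cov_centered, cov_identity))     # True: the two forms agree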


Some Statistics and M-Estimators

Fact

Both (1/N) Σ_{i=1}^N x_i x_iᵀ and (1/N) Σ_{i=1}^N (x_i − x̄)(x_i − x̄)ᵀ are symmetric n × n matrices and their eigenvalues are always nonnegative. ⇒ they are SPSD. If their eigenvalues are positive, then they are also SPD.

Some Statistics and M-Estimators

Version Log

7/4/2019, ver 1.00.
