Computer Vision: Models, Learning and Inference
Convexity; Least Squares; Robustness; Some Statistics
Oren Freifeld and Ron Shapira-Weber
Computer Science, Ben-Gurion University
April 7, 2019
www.cs.bgu.ac.il/~cv192/ Needful Things (ver. 1.00) Apr 7, 2019 1 / 82
1 Convexity
2 Linear Model, Least Squares and Weighted Least Squares
3 Outliers, “Wrong” Assumptions, and Robustness
4 Some Statistics and M-Estimators
Convexity
Definition (convex function from $I \subset \mathbb{R}$ to $\mathbb{R}$)
Let $I \subset \mathbb{R}$ be an interval. $f : I \to \mathbb{R}$ is called convex if, $\forall x_1, x_2 \in I$, $\forall t \in (0, 1)$,
$$f(t x_1 + (1 - t) x_2) \le t f(x_1) + (1 - t) f(x_2) .$$
In words: any chord connecting two points on the graph of $f$ is never below the graph of $f$.
Example
$f(x) = x^2$ is convex.
Example
f(x) = |x| is convex.
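The chord inequality in the definition can be probed numerically on random samples. The following is a minimal sketch in Python with NumPy; the helper name `satisfies_chord_inequality`, the interval $[-5, 5]$, and the tolerance are illustrative assumptions. Passing the check on samples is only evidence of convexity, not a proof, whereas a single violation does disprove it.

```python
import numpy as np

def satisfies_chord_inequality(f, lo, hi, n_trials=10_000, seed=0, tol=1e-12):
    """Check f(t*x1 + (1-t)*x2) <= t*f(x1) + (1-t)*f(x2) on random samples from [lo, hi]."""
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(lo, hi, n_trials)
    x2 = rng.uniform(lo, hi, n_trials)
    t = rng.uniform(0.0, 1.0, n_trials)
    lhs = f(t * x1 + (1 - t) * x2)        # value of f at the convex combination
    rhs = t * f(x1) + (1 - t) * f(x2)     # height of the chord at the same point
    return bool(np.all(lhs <= rhs + tol)) # tol absorbs floating-point round-off

square_ok = satisfies_chord_inequality(lambda x: x**2, -5.0, 5.0)
abs_ok = satisfies_chord_inequality(np.abs, -5.0, 5.0)
sin_ok = satisfies_chord_inequality(np.sin, -5.0, 5.0)  # sin is not convex on [-5, 5]
print(square_ok, abs_ok, sin_ok)
```

The sine function is included only as a contrast case: it is concave on part of the interval, so some sampled chord falls below the graph.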
Convexity
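Definition (strictly convex function from $I \subset \mathbb{R}$ to $\mathbb{R}$)
Let $I \subset \mathbb{R}$ be an interval. $f : I \to \mathbb{R}$ is called strictly convex if, $\forall x_1, x_2 \in I$ with $x_1 \ne x_2$, $\forall t \in (0, 1)$,
$$f(t x_1 + (1 - t) x_2) < t f(x_1) + (1 - t) f(x_2) .$$
In words: any chord connecting two distinct points on the graph of $f$ is, between its endpoints, strictly above the graph of $f$.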
A strictly-convex function is convex; the converse is false.
Example
$f(x) = x^2$ is strictly convex.
Counterexample
$f(x) = |x|$ is convex but not strictly convex: for $x_1, x_2$ of the same sign, the chord coincides with the graph.
Convexity
Fact
A differentiable $f : I \to \mathbb{R}$ is convex $\iff$
$$f(x) \ge f(y) + f'(y)(x - y) \quad \forall x, y \in I$$
and it is strictly convex $\iff$
$$f(x) > f(y) + f'(y)(x - y) \quad \forall x, y \in I,\ x \ne y$$
A differentiable convex function $f$ is never below its tangents.
A differentiable strictly convex function $f$ is above its tangents (except at the point of tangency).
Fact
$f$ is convex and differentiable & $f'(x) = 0$ $\Rightarrow$ $x$ is a global minimizer of $f$.
Fact
$f''$ exists & $f'' \ge 0$ (resp. $f'' > 0$) on $I$ $\Rightarrow$ $f$ is convex (resp. strictly convex).
Convexity
Definition (convex combination (of points))
$\sum_{i=1}^m \alpha_i \mathbf{x}_i$, a linear combination of $\{\mathbf{x}_i\}_{i=1}^m \subset \mathbb{R}^n$, is called convex if all the $\alpha_i$'s are nonnegative and $\sum_{i=1}^m \alpha_i = 1$.
Definition (convex set)
$S \subset \mathbb{R}^n$ is called convex if it is closed under convex combinations.
Convexity
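Definition (convex function from a convex subset of $\mathbb{R}^n$ to $\mathbb{R}$)
Let $S \subset \mathbb{R}^n$ be convex. $f : S \to \mathbb{R}$ is called convex if, $\forall \mathbf{x}_1, \mathbf{x}_2 \in S$, $\forall t \in (0, 1)$,
$$f(t \mathbf{x}_1 + (1 - t) \mathbf{x}_2) \le t f(\mathbf{x}_1) + (1 - t) f(\mathbf{x}_2)$$
and it is called strictly convex if, $\forall \mathbf{x}_1, \mathbf{x}_2 \in S$ with $\mathbf{x}_1 \ne \mathbf{x}_2$, $\forall t \in (0, 1)$,
$$f(t \mathbf{x}_1 + (1 - t) \mathbf{x}_2) < t f(\mathbf{x}_1) + (1 - t) f(\mathbf{x}_2)$$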
Convexity
Fact
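Let $S \subset \mathbb{R}^n$ be convex. A differentiable $f : S \to \mathbb{R}$ is convex $\iff$
$$f(\mathbf{x}) \ge f(\mathbf{y}) + \nabla f(\mathbf{y})(\mathbf{x} - \mathbf{y}) \quad \forall \mathbf{x}, \mathbf{y} \in S$$
and it is strictly convex $\iff$
$$f(\mathbf{x}) > f(\mathbf{y}) + \nabla f(\mathbf{y})(\mathbf{x} - \mathbf{y}) \quad \forall \mathbf{x}, \mathbf{y} \in S,\ \mathbf{x} \ne \mathbf{y}$$
Fact
$f$ is convex and differentiable & $\nabla f(\mathbf{x}) = 0$ $\Rightarrow$ $\mathbf{x}$ is a global minimizer of $f$.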
Convexity
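Definition (Hessian)
The Hessian matrix of a twice-differentiable $f : \mathbb{R}^n \to \mathbb{R}$ is
$$
\begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1\, \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\, \partial x_n} \\
\dfrac{\partial^2 f}{\partial x_2\, \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2\, \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n\, \partial x_1} & \dfrac{\partial^2 f}{\partial x_n\, \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}
\tag{1}
$$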
Convexity
Fact
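Let $S \subset \mathbb{R}^n$ be convex. A twice-differentiable $f : S \to \mathbb{R}$ is convex $\iff$ its Hessian matrix is SPSD on the interior of $S$. If the Hessian is SPD on the interior of $S$, then $f$ is strictly convex; the converse is false (e.g., $f(x) = x^4$ is strictly convex but $f''(0) = 0$).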
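A Hessian test of this kind can be carried out numerically through the eigenvalues of the (symmetric) Hessian. The sketch below is illustrative only: the helper names `is_spsd` and `is_spd`, the tolerance, and the two small matrices are assumptions chosen for the example. It uses a quadratic cost of the form $\|A\mathbf{x} - \mathbf{b}\|^2$, whose Hessian is the constant matrix $2A^TA$.

```python
import numpy as np

def is_spsd(M, tol=1e-10):
    """Symmetric positive semi-definite test via eigenvalues (up to a tolerance)."""
    M = 0.5 * (M + M.T)                      # symmetrize against round-off
    return bool(np.all(np.linalg.eigvalsh(M) >= -tol))

def is_spd(M, tol=1e-10):
    """Symmetric positive definite test via eigenvalues (up to a tolerance)."""
    M = 0.5 * (M + M.T)
    return bool(np.all(np.linalg.eigvalsh(M) > tol))

# f(x) = ||A x - b||^2 has the constant Hessian 2 A^T A (b does not affect it).
A_full = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])       # rank 2
A_deficient = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])  # rank 1

hess_full = 2.0 * A_full.T @ A_full
hess_deficient = 2.0 * A_deficient.T @ A_deficient

print(is_spd(hess_full))                                   # SPD: strictly convex cost
print(is_spsd(hess_deficient), is_spd(hess_deficient))     # SPSD only: convex cost
```

With full column rank the Hessian is SPD, so the cost is strictly convex; with a rank-deficient matrix it is only SPSD, so the cost is convex but has a whole line of minimizers.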
Convexity
Fact
A weighted sum, with nonnegative weights, of convex functions is convex. A weighted sum, with nonnegative weights, of strictly convex functions is strictly convex (if we ignore the degenerate case that all the weights are zero).
Example
If $(x_i)_{i=1}^N \subset \mathbb{R}$ and $(w_i)_{i=1}^N \subset \mathbb{R}_{\ge 0}$ (and the $w_i$'s do not depend on the $x_i$'s) then
$$f : \mathbb{R} \to \mathbb{R}, \quad f(\mu) = \sum_{i=1}^N w_i (\mu - x_i)^2$$
is strictly convex (provided at least one $w_i$ is positive), while
$$g : \mathbb{R} \to \mathbb{R}, \quad g(\mu) = \sum_{i=1}^N w_i |\mu - x_i|$$
is convex.
Convexity
Norms
Fact
Every norm is a convex function.
Example
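Let $p \ge 1$. $f : \mathbb{R}^n \to \mathbb{R}_{\ge 0}$, $f : \mathbf{x} \mapsto \|\mathbf{x}\|_{\ell_p}$ is convex.
Example
Let $Q$ be an $n \times n$ SPD matrix. $f : \mathbb{R}^n \to \mathbb{R}_{\ge 0}$, $f : \mathbf{x} \mapsto \|\mathbf{x}\|_Q \triangleq (\mathbf{x}^T Q \mathbf{x})^{1/2}$ is convex.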
Convexity
Some Operations that Preserve Convexity of Functions
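A nonnegative weighted sum of convex functions is convex.
Example
$$f : \mathbb{R}^n \to \mathbb{R}_{\ge 0}, \quad f : \mathbf{x} \mapsto \|\mathbf{x}\|_{\ell_2}^2 \triangleq \sum_{i=1}^n x_i^2$$
Composition with an affine function.
Example
$$f : \mathbb{R}^n \to \mathbb{R}_{\ge 0}, \quad f : \mathbf{x} \mapsto \|A\mathbf{x} + \mathbf{b}\|_{\ell_p}, \quad A \in \mathbb{R}^{m \times n},\ \mathbf{b} \in \mathbb{R}^m$$
Pointwise maximum and supremum.
Example
$$f(\mathbf{x}) = \max_{i \in \{1,2,3\}} \{f_1(\mathbf{x}), f_2(\mathbf{x}), f_3(\mathbf{x})\}$$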
Convexity
Convex Optimization
Convex functions are “easy” to optimize.
Sometimes they have closed-form solutions for their minimizer(s).
Exercise
$$f : \mathbb{R} \to \mathbb{R}, \quad f : x \mapsto a x^2 + b x + c, \quad a \in \mathbb{R}_{>0},\ b, c \in \mathbb{R}$$
Show that $\arg\min_x f(x) = -\frac{b}{2a}$ (as you saw in high school).
But, more importantly, even when they have no closed-form minimizers, minimizing them is still “easy”.
Textbook: Boyd and Vandenberghe’s Convex Optimization. Also check out Boyd’s video lectures on Stanford’s YouTube channel.
Convexity
Convex Optimization
The take-home message: while learning convex optimization is very useful, unless you want to specialize in that field, it often suffices (for CV/ML people) to focus on how to:
- recognize the problem is convex; or
- transform a non-convex problem to a convex one; or
- approximate a non-convex problem using a convex one; or
- solve a non-convex problem by iterating between convex subproblems.
E.g., this is how Convex Optimization is taught by Boyd at Stanford.
Linear Model, Least Squares and Weighted Least Squares
Linear Model
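Measurements: $\{\mathbf{y}_i\}_{i=1}^N \subset \mathbb{R}^d$
$H_i$ is a known matrix, possibly dependent on $i$.
Residual of a linear model:
$$\underbrace{\mathbf{r}_i}_{d \times 1} = \underbrace{H_i}_{d \times k}\, \underbrace{\boldsymbol{\theta}}_{k \times 1} - \underbrace{\mathbf{y}_i}_{d \times 1} \tag{2}$$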
When we discussed optical flow for gray-scale images, d was 1.
Linear Model, Least Squares and Weighted Least Squares
Least-Squares Estimation in a Linear Model
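$$\boldsymbol{\theta}_{LS} \triangleq \arg\min_{\boldsymbol{\theta}} \underbrace{\sum_{i=1}^N \|\mathbf{r}_i\|^2}_{\|\mathbf{r}\|^2}$$
$$H \triangleq \begin{bmatrix} H_1^T & \cdots & H_N^T \end{bmatrix}^T \in \mathbb{R}^{(Nd) \times k}$$
$$\mathbf{y} \triangleq \begin{bmatrix} \mathbf{y}_1^T & \cdots & \mathbf{y}_N^T \end{bmatrix}^T \in \mathbb{R}^{(Nd) \times 1}$$
$$\mathbf{r} \triangleq \begin{bmatrix} \mathbf{r}_1^T & \cdots & \mathbf{r}_N^T \end{bmatrix}^T = H\boldsymbol{\theta} - \mathbf{y} \in \mathbb{R}^{(Nd) \times 1} \tag{3}$$
The cost function, from $\mathbb{R}^k$ to $\mathbb{R}_{\ge 0}$, is convex; it is strictly convex if $\operatorname{rank}(H) = k$.
A minimizer satisfies
$$H^T H \boldsymbol{\theta}_{LS} = H^T \mathbf{y} \tag{4}$$
and is unique if $\operatorname{rank}(H) = k$; i.e., if $H^T H$ is invertible.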
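The normal equations can be solved directly when $H$ has full column rank. Below is a minimal NumPy sketch on synthetic data; the sizes `N` and `k`, the ground-truth vector `theta_true`, and the noise level are made-up values for illustration, and $d = 1$ is assumed so that each $H_i$ is a single row of $H$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 100, 3                                  # hypothetical problem size (d = 1)
H = rng.normal(size=(N, k))                    # stacked design matrix
theta_true = np.array([1.0, -2.0, 0.5])        # made-up ground truth
y = H @ theta_true + 0.01 * rng.normal(size=N)

# Solve the normal equations H^T H theta = H^T y (unique when rank(H) = k).
theta_ne = np.linalg.solve(H.T @ H, H.T @ y)

# np.linalg.lstsq minimizes ||H theta - y||^2 without forming H^T H explicitly,
# which is usually the numerically safer route.
theta_lstsq, *_ = np.linalg.lstsq(H, y, rcond=None)

# At a minimizer the gradient 2 H^T (H theta - y) vanishes.
grad = 2.0 * H.T @ (H @ theta_ne - y)
print(theta_ne, theta_lstsq, np.abs(grad).max())
```

Both routes give the same estimate here; forming $H^TH$ squares the condition number, so a factorization-based solver such as `lstsq` tends to be preferred when $H$ is ill-conditioned.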
Linear Model, Least Squares and Weighted Least Squares
Weighted Least-Squares
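$$\boldsymbol{\theta}_{WLS} \triangleq \arg\min_{\boldsymbol{\theta}} \underbrace{\sum_{i=1}^N w_i \|\mathbf{r}_i\|^2}_{\|W^{1/2}(H\boldsymbol{\theta} - \mathbf{y})\|^2} \tag{5}$$
$$W = \operatorname{diag}(\underbrace{w_1, \ldots, w_1}_{d \text{ times}}, \underbrace{w_2, \ldots, w_2}_{d \text{ times}}, \ldots, \underbrace{w_N, \ldots, w_N}_{d \text{ times}})$$
$$W^{1/2} = \operatorname{diag}(\underbrace{\sqrt{w_1}, \ldots, \sqrt{w_1}}_{d \text{ times}}, \underbrace{\sqrt{w_2}, \ldots, \sqrt{w_2}}_{d \text{ times}}, \ldots, \underbrace{\sqrt{w_N}, \ldots, \sqrt{w_N}}_{d \text{ times}}) \tag{6}$$
$$H^T W H \boldsymbol{\theta}_{WLS} = H^T W \mathbf{y} \tag{7}$$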
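The weighted normal equations can be sketched in a few lines of NumPy. In this illustration the function name `weighted_least_squares`, the synthetic line data, and the random weights are assumptions; $d = 1$ is assumed, so $W$ is simply $\operatorname{diag}(w_1, \ldots, w_N)$, and the weights are taken to be strictly positive.

```python
import numpy as np

def weighted_least_squares(H, y, w):
    """Solve H^T W H theta = H^T W y with W = diag(w), for scalar residuals (d = 1)."""
    HtW = H.T * w                        # equals H.T @ np.diag(w) without building W
    return np.linalg.solve(HtW @ H, HtW @ y)

rng = np.random.default_rng(1)
H = np.column_stack([np.linspace(-1.0, 1.0, 20), np.ones(20)])   # line model: slope, intercept
y = H @ np.array([3.0, -1.0]) + 0.05 * rng.normal(size=20)
w = rng.uniform(0.5, 2.0, size=20)       # made-up positive weights

theta_wls = weighted_least_squares(H, y, w)

# Equivalent view: ordinary least squares on the rescaled system W^{1/2} H, W^{1/2} y.
sqrt_w = np.sqrt(w)
theta_rescaled, *_ = np.linalg.lstsq(sqrt_w[:, None] * H, sqrt_w * y, rcond=None)
print(theta_wls, theta_rescaled)
```

The second computation mirrors the identity $\sum_i w_i\|\mathbf{r}_i\|^2 = \|W^{1/2}(H\boldsymbol{\theta} - \mathbf{y})\|^2$: scaling each row by $\sqrt{w_i}$ turns the weighted problem into an ordinary one.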
Outliers, “Wrong” Assumptions, and Robustness
Least-Squares Estimation is not Robust
[Figure: plot titled "Least Squares", comparing the true model ("true") with the least-squares fit ("LS").]
Also true for weighted least squares.
Outliers, “Wrong” Assumptions, and Robustness
Robust Least-Squares
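Measurements: $\{\mathbf{y}_i\}_{i=1}^N \subset \mathbb{R}^d$
Residual:
$$\underbrace{\mathbf{r}_i}_{d \times 1} = \underbrace{H_i}_{d \times k}\, \underbrace{\boldsymbol{\theta}}_{k \times 1} - \underbrace{\mathbf{y}_i}_{d \times 1}$$
Want
$$\arg\min_{\boldsymbol{\theta}} \sum_i \rho(\|\mathbf{r}_i\|)$$
where $\rho : \mathbb{R} \to \mathbb{R}_{\ge 0}$ is differentiable.
$\psi(x) = \frac{d\rho(x)}{dx}$ is called the influence function.
If $\rho$ is of the form $x \mapsto x^2$ then this is least squares.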
Outliers, “Wrong” Assumptions, and Robustness
Robust Error Function
Example (Geman-McClure robust error function)
$$\rho(x) = \frac{x^2}{x^2 + \sigma^2}$$
[Figure: Geman-McClure robust error function $\rho(x) = x^2/(x^2 + \sigma^2)$, plotted for $x \in [-30, 30]$ with $\sigma \in \{0.5, 1, 5, 10, 20\}$.]
Outliers, “Wrong” Assumptions, and Robustness
Influence Function: Derivative of the Error Function
Example (Derivative of the Geman-McClure robust error function)
$$\psi(x) = \frac{d}{dx}\rho(x) = \frac{2\sigma^2 x}{(x^2 + \sigma^2)^2}$$
[Figure: influence function $\psi(x) = (2\sigma^2 x)/(x^2 + \sigma^2)^2$, plotted for $x \in [-30, 30]$ with $\sigma \in \{0.5, 1, 5, 10, 20\}$.]
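The closed-form influence function can be sanity-checked against a finite-difference derivative of $\rho$. The sketch below is illustrative: the function names `gm_rho` and `gm_psi`, the grid, the choice $\sigma = 5$, and the step size `h` are assumptions made for the example.

```python
import numpy as np

def gm_rho(x, sigma):
    """Geman-McClure robust error function rho(x) = x^2 / (x^2 + sigma^2)."""
    return x**2 / (x**2 + sigma**2)

def gm_psi(x, sigma):
    """Its derivative (the influence function): 2 sigma^2 x / (x^2 + sigma^2)^2."""
    return 2.0 * sigma**2 * x / (x**2 + sigma**2) ** 2

x = np.linspace(-30.0, 30.0, 601)
sigma = 5.0
h = 1e-5                                        # finite-difference step (assumed)

# Central difference approximation of d rho / dx.
psi_numeric = (gm_rho(x + h, sigma) - gm_rho(x - h, sigma)) / (2.0 * h)
max_abs_err = np.max(np.abs(psi_numeric - gm_psi(x, sigma)))
print(max_abs_err)

# rho saturates below 1 and psi decays for large |x|: large residuals have little influence.
print(gm_rho(30.0, sigma), gm_psi(30.0, sigma))
```

Unlike the least-squares case, where $\psi(x) = 2x$ grows without bound, this $\psi$ decays toward zero for large $|x|$, which is the source of the robustness to outliers.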
Outliers, “Wrong” Assumptions, and Robustness
Definition (quasiconvex function)
Let $S \subset \mathbb{R}^n$ be a convex set. $f : S \to \mathbb{R}$ is called quasiconvex if all its sublevel sets
$$S_\alpha = \{\mathbf{x} \mid f(\mathbf{x}) \le \alpha\} ,$$
for $\alpha \in \mathbb{R}$, are convex.
Example
Geman-McClure robust error function:
$$x \mapsto \frac{x^2}{x^2 + \sigma^2}$$
If $S \subset \mathbb{R}$ and $f : S \to \mathbb{R}$ is quasiconvex then $f$ is unimodal.
A quasiconvex function is usually still easy to minimize. But . . .
Outliers, “Wrong” Assumptions, and Robustness
Fact
A weighted sum, with nonnegative weights, of quasiconvex functions is usually not quasiconvex.
Example
If $(x_i)_{i=1}^N \subset \mathbb{R}$ and $(w_i)_{i=1}^N \subset \mathbb{R}_{\ge 0}$ (and the $w_i$'s do not depend on the $x_i$'s) then
$$f : \mathbb{R} \to \mathbb{R}, \quad f(\mu) = \sum_{i=1}^N w_i\, \rho(\mu - x_i) ,$$
where $\rho$ is the Geman-McClure robust error function, is usually not unimodal.
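A two-point example already shows the loss of unimodality. The following sketch evaluates $f(\mu)$ on a grid and looks for grid points that are lower than both neighbors; the data points $0$ and $10$, the unit weights, $\sigma = 1$, and the grid are hypothetical values chosen for illustration.

```python
import numpy as np

def gm_rho(x, sigma):
    """Geman-McClure robust error function."""
    return x**2 / (x**2 + sigma**2)

x_data = np.array([0.0, 10.0])        # two well-separated (hypothetical) data points
w = np.array([1.0, 1.0])              # nonnegative weights
sigma = 1.0

mu = np.linspace(-5.0, 15.0, 2001)
# f(mu) = sum_i w_i * rho(mu - x_i), evaluated on the whole grid at once.
f = (w[:, None] * gm_rho(mu[None, :] - x_data[:, None], sigma)).sum(axis=0)

# Interior grid points that are strictly lower than both of their neighbors.
is_local_min = (f[1:-1] < f[:-2]) & (f[1:-1] < f[2:])
local_minimizers = mu[1:-1][is_local_min]
print(local_minimizers)
```

Each data point creates its own basin, so the cost has more than one local minimum; with $\rho(x) = x^2$ the same sum would have the single minimizer given by the weighted mean.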
Outliers, “Wrong” Assumptions, and Robustness
Minimizing a Nonconvex Robust Cost Function
⇒ When $\rho(x)$ is more robust than $|x|$, optimization is hard. . .
Examples of possible approaches:
- Assuming $\rho$ is differentiable, one can try gradient-based methods, possibly with graduated optimization such as
  - graduated non-convexity (minimize $(1 - t) f_{\text{convex}} + t f$ while gradually increasing $t$ from 0 to 1); or
  - “annealing” (e.g., gradually decrease $\sigma$ in the GM robust error function)
- Iterative Reweighted Least Squares (IRLS)
Figure taken from Wikipedia
Convexity
Results for a Gradient-Based Method in the Global Approach
Figure from Michael Black’s PhD, 1992
Convexity
Iterative Reweighted Least-Squares
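Idea: use $w(x) = \psi(x)/x$
Start with an LS solution, then alternate between computing weights based on the residual errors, and a WLS solution using fixed weights from the previous iteration.
Initialization: solve
$$H^T H \boldsymbol{\theta}^{[0]}_{IRLS} = H^T \mathbf{y} \tag{8}$$
Alternate
1. Set $w_i^{[k]} = \dfrac{\psi\left(\left\|\mathbf{r}_i^{[k]}\right\|\right)}{\left\|\mathbf{r}_i^{[k]}\right\|}$ where $\mathbf{r}_i^{[k]} = H_i \boldsymbol{\theta}^{[k]}_{IRLS} - \mathbf{y}_i$
2. Solve
$$H^T W^{[k]} H \boldsymbol{\theta}^{[k+1]}_{IRLS} = H^T W^{[k]} \mathbf{y} \tag{9}$$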
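The alternation above can be sketched compactly for scalar residuals. This is an illustrative implementation under stated assumptions, not the code used for the slides: it assumes $d = 1$, uses the Geman-McClure function (for which $w(x) = \psi(x)/x = 2\sigma^2/(x^2+\sigma^2)^2$ is well defined even at $x = 0$), runs a fixed number of iterations instead of testing convergence, and uses made-up line data with five gross outliers. The names `gm_weight` and `irls` are hypothetical.

```python
import numpy as np

def gm_weight(r_norm, sigma):
    """w(x) = psi(x) / x for the Geman-McClure function; finite at x = 0."""
    return 2.0 * sigma**2 / (r_norm**2 + sigma**2) ** 2

def irls(H, y, sigma=1.0, n_iters=20):
    """IRLS for scalar residuals (d = 1): H is N x k, y has length N."""
    theta = np.linalg.solve(H.T @ H, H.T @ y)       # initialization: plain LS
    for _ in range(n_iters):
        r = H @ theta - y                           # residuals at the current estimate
        w = gm_weight(np.abs(r), sigma)             # step 1: weights from residuals
        HtW = H.T * w                               # H^T W with W = diag(w)
        theta = np.linalg.solve(HtW @ H, HtW @ y)   # step 2: WLS with fixed weights
    return theta

# Synthetic line y = 2x + 1 with small noise and five gross outliers (made-up data).
rng = np.random.default_rng(0)
x = np.linspace(-10.0, 10.0, 50)
H = np.column_stack([x, np.ones_like(x)])
y = 2.0 * x + 1.0 + 0.1 * rng.normal(size=x.size)
y[[5, 15, 25, 35, 45]] += 40.0

theta_ls = np.linalg.solve(H.T @ H, H.T @ y)
theta_irls = irls(H, y, sigma=1.0)
print("LS  :", theta_ls)
print("IRLS:", theta_irls)
```

In this setup the outliers pull the LS intercept far from 1, while the IRLS estimate stays close to the slope and intercept used to generate the inliers. Since the robust cost is nonconvex, the result generally depends on the initialization and on $\sigma$.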
Convexity
Iterative Reweighted Least-Squares
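$$E_{RLS} = \sum_i \rho(\|\mathbf{r}_i\|)$$
$$\nabla_{\boldsymbol{\theta}} E_{RLS} = \sum_i \psi(\|\mathbf{r}_i\|)\, \nabla_{\boldsymbol{\theta}} \|\mathbf{r}_i\| \overset{\text{want}}{=} 0 \tag{10}$$
$$E_{IRLS} = \sum_i w(\mathbf{r}_i) \|\mathbf{r}_i\|^2 \tag{11}$$
$$E^{[k+1]}_{IRLS} = \sum_i w\left(\mathbf{r}_i^{[k]}\right) \|\mathbf{r}_i\|^2 = \sum_i \frac{\psi\left(\left\|\mathbf{r}_i^{[k]}\right\|\right)}{\left\|\mathbf{r}_i^{[k]}\right\|} \|\mathbf{r}_i\|^2 \approx \sum_i \psi\left(\left\|\mathbf{r}_i^{[k]}\right\|\right) \|\mathbf{r}_i\|$$
$$0 = \nabla_{\boldsymbol{\theta}} E^{[k+1]}_{IRLS} \approx \sum_i \psi\left(\left\|\mathbf{r}_i^{[k]}\right\|\right) \nabla_{\boldsymbol{\theta}} \|\mathbf{r}_i\|$$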
[Figure sequence: a plot titled "Least Squares" (legend: true, LS), followed by plots titled "Iterative Reweighted Least Squares, iter 1" through "iter 15" (legend: true, IRLS), each comparing the true model with the current estimate.]
Convexity
Recall Lucas-Kanade
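$$\sum_{i=1}^N g(\mathbf{x}, \mathbf{x}_i) \left( \nabla_{\mathbf{x}} I(\mathbf{x}_i, t) \begin{bmatrix} u(\mathbf{x}) \\ v(\mathbf{x}) \end{bmatrix} + I_t(\mathbf{x}_i, t) \right)^2 = \left\| W^{\frac{1}{2}} \boldsymbol{\varepsilon} \right\|^2 = \boldsymbol{\varepsilon}^T W \boldsymbol{\varepsilon} = \sum_{i=1}^N \Bigg( \underbrace{\sqrt{g(\mathbf{x}, \mathbf{x}_i)}\, \nabla_{\mathbf{x}} I(\mathbf{x}_i, t)}_{H_i:\ 1 \times 2} \underbrace{\begin{bmatrix} u(\mathbf{x}) \\ v(\mathbf{x}) \end{bmatrix}}_{\boldsymbol{\theta}:\ 2 \times 1} + \underbrace{\sqrt{g(\mathbf{x}, \mathbf{x}_i)}\, I_t(\mathbf{x}_i, t)}_{-y_i \in \mathbb{R}} \Bigg)^2 \tag{12}$$
$$W = \operatorname{diag}\left(\begin{bmatrix} g(\mathbf{x}, \mathbf{x}_1) & \ldots & g(\mathbf{x}, \mathbf{x}_N) \end{bmatrix}\right) \quad \text{(not the same } W \text{ from IRLS)}$$
$$W^{\frac{1}{2}} = \operatorname{diag}\left(\begin{bmatrix} \sqrt{g(\mathbf{x}, \mathbf{x}_1)} & \ldots & \sqrt{g(\mathbf{x}, \mathbf{x}_N)} \end{bmatrix}\right) \tag{13}$$
$$\begin{bmatrix} u(\mathbf{x}) \\ v(\mathbf{x}) \end{bmatrix}_{WLS} \triangleq \arg\min_{u(\mathbf{x}), v(\mathbf{x})} \boldsymbol{\varepsilon}^T W \boldsymbol{\varepsilon} \tag{14}$$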
Convexity
Robust Lucas-Kanade
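$$E_{RLK} = \sum_{i=1}^N \rho \Bigg( \underbrace{ \underbrace{\sqrt{g(\mathbf{x}, \mathbf{x}_i)}\, \nabla_{\mathbf{x}} I(\mathbf{x}_i, t)}_{H_i:\ 1 \times 2} \underbrace{\begin{bmatrix} u(\mathbf{x}) \\ v(\mathbf{x}) \end{bmatrix}}_{\boldsymbol{\theta}:\ 2 \times 1} + \underbrace{\sqrt{g(\mathbf{x}, \mathbf{x}_i)}\, I_t(\mathbf{x}_i, t)}_{-y_i \in \mathbb{R}} }_{r_i \in \mathbb{R}} \Bigg) \tag{15}$$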
Convexity
Robust Affine Lucas-Kanade
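$$\begin{bmatrix} u(\mathbf{x}_i) \\ v(\mathbf{x}_i) \end{bmatrix} = \underbrace{\begin{bmatrix} x_i - x & y_i - y & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & x_i - x & y_i - y & 1 \end{bmatrix}}_{\triangleq A(\mathbf{x}, \mathbf{x}_i)} \underbrace{\begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \\ \theta_4 \\ \theta_5 \\ \theta_6 \end{bmatrix}}_{\triangleq \boldsymbol{\theta}} \tag{16}$$
$$E_{RALK} = \sum_{i=1}^N \rho \Bigg( \underbrace{ \underbrace{\sqrt{g(\mathbf{x}, \mathbf{x}_i)}\, \nabla_{\mathbf{x}} I(\mathbf{x}_i, t)\, A(\mathbf{x}, \mathbf{x}_i)}_{H_i:\ 1 \times 6} \boldsymbol{\theta} + \underbrace{\sqrt{g(\mathbf{x}, \mathbf{x}_i)}\, I_t(\mathbf{x}_i, t)}_{-y_i \in \mathbb{R}} }_{r_i \in \mathbb{R}} \Bigg) \tag{17}$$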
Convexity
Random Sample Consensus (RANSAC)
A non-deterministic algorithm.
For generality, write the model as $f(\mathbf{x}_i,\boldsymbol{\theta}) = \mathbf{y}_i$.
Stick to the least-squares formulation.
Algorithm: given data $\{\mathbf{x}_i,\mathbf{y}_i\}_{i=1}^N$, alternate between:
(i) Pick a very small random subset of the data. Its cardinality should suffice for estimating the parameters (e.g., you can’t estimate a line from a single point). Fit a least-squares model to it.
(ii) For each data point in the original set, compute $\big\|f(\mathbf{x}_i,\hat{\boldsymbol{\theta}}) - \mathbf{y}_i\big\|$, where $\hat{\boldsymbol{\theta}}$ is the fitted parameter vector. Using a threshold, reject outliers.
Once a certain iteration achieves enough inliers (points not rejected), the algorithm stops.
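The two alternating steps can be sketched for the simplest case, fitting a line y = a·x + b. This is a minimal sketch under stated assumptions: the threshold, iteration cap, stopping count, and the final least-squares refit on the consensus set are illustrative choices, and `ransac_line` is a hypothetical name.

```python
import numpy as np

def ransac_line(x, y, n_iters=200, threshold=0.5, min_inliers=None, seed=0):
    """Fit y = a*x + b by RANSAC; returns (a, b, inlier_mask)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    if min_inliers is None:
        min_inliers = n                     # no early stop unless all points agree
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(n, size=2, replace=False)     # minimal subset for a line
        if x[idx[0]] == x[idx[1]]:
            continue                        # degenerate sample, skip it
        a, b = np.polyfit(x[idx], y[idx], deg=1)
        inliers = np.abs(a * x + b - y) < threshold    # threshold test on residuals
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
        if best_inliers.sum() >= min_inliers:
            break                           # enough inliers: stop
    # Refit by least squares on the largest consensus set found.
    a, b = np.polyfit(x[best_inliers], y[best_inliers], deg=1)
    return a, b, best_inliers

x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0
y[::5] += 20.0                              # ten gross outliers
a_hat, b_hat, mask = ransac_line(x, y, min_inliers=40)
```

Because the subsets are random, different seeds can visit different candidate models; here any sample made of two inliers already reproduces the true line.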
![Page 54: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/54.jpg)
Convexity
RANSAC and Optical Flow
For HS-type models, RANSAC is inapplicable.
For LK-type models, it fits only if the neighborhood is sufficiently large, for example, if we try to find a single affine flow for the entire image.
![Page 55: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/55.jpg)
Some Statistics and M-Estimators
Definition (sample mean for $\mathbb{R}$-valued data)
The sample mean of $\{x_i\}_{i=1}^N \subset \mathbb{R}$ is
$$
\bar{x} \triangleq \frac{1}{N}\sum_{i=1}^{N} x_i \tag{18}
$$
Definition (sample mean for $\mathbb{R}^n$-valued data)
The sample mean of $\{\mathbf{x}_i\}_{i=1}^N \subset \mathbb{R}^n$ is
$$
\bar{\mathbf{x}} \triangleq \frac{1}{N}\sum_{i=1}^{N} \mathbf{x}_i \tag{19}
$$
![Page 57: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/57.jpg)
Some Statistics and M-Estimators
The Sample Mean as a Minimizer
Fact
The sample mean minimizes the sum of squared Euclidean distances:
$$
\bar{x} = \arg\min_{\mu\in\mathbb{R}} \sum_{i=1}^{N} (x_i - \mu)^2 . \tag{20}
$$
![Page 58: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/58.jpg)
Some Statistics and M-Estimators
Example
Let $x_1, x_2 \in \mathbb{R}$. Then
$$
\bar{x} = \frac{x_1 + x_2}{2} = \arg\min_{\mu\in\mathbb{R}}\ (x_1 - \mu)^2 + (x_2 - \mu)^2 . \tag{21}
$$
![Page 59: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/59.jpg)
Some Statistics and M-Estimators
The Sample Mean as a Minimizer
Fact
The sample mean minimizes the sum of squared scaled Euclidean distances:
$$
\bar{x} = \arg\min_{\mu\in\mathbb{R}} \sum_{i=1}^{N} \frac{(x_i - \mu)^2}{\sigma^2}, \qquad \sigma > 0 . \tag{22}
$$
Proof.
$$
\sum_{i=1}^{N} \frac{(x_i - \mu)^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^{N} (x_i - \mu)^2
$$
$$
\arg\min_{\mu\in\mathbb{R}} \frac{1}{\sigma^2}\sum_{i=1}^{N} (x_i - \mu)^2 = \arg\min_{\mu\in\mathbb{R}} \sum_{i=1}^{N} (x_i - \mu)^2 = \bar{x}
$$
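The invariance of the minimizer to σ can be checked numerically with a brute-force grid search. The data values and the grid below are arbitrary choices for illustration; the grid is chosen so that it contains the sample mean.

```python
import numpy as np

x = np.array([-3.0, 1.0, 4.0, 10.0])        # sample mean is 3.0
mu_grid = np.linspace(-30, 30, 6001)         # step 0.01

def scaled_sq_cost(mu, x, sigma):
    """sum_i (x_i - mu)^2 / sigma^2, evaluated for every mu in the grid."""
    return np.sum((x[:, None] - mu[None, :]) ** 2, axis=0) / sigma**2

argmins = [mu_grid[np.argmin(scaled_sq_cost(mu_grid, x, s))]
           for s in (1.0, 2.0, 4.0)]
```

Changing σ rescales the cost curve but leaves the location of its minimum at the sample mean.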
![Page 61: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/61.jpg)
Some Statistics and M-Estimators
Example
Let $x_1, x_2 \in \mathbb{R}$. Then
$$
\bar{x} = \frac{x_1 + x_2}{2} = \arg\min_{\mu\in\mathbb{R}}\ \frac{(x_1 - \mu)^2}{\sigma^2} + \frac{(x_2 - \mu)^2}{\sigma^2} . \tag{23}
$$
![Page 62: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/62.jpg)
Some Statistics and M-Estimators
σ = 2:
[Figure: four panels over $\mu \in [-30, 30]$: $(\mu - x_1)^2/\sigma^2$; $(\mu - x_2)^2/\sigma^2$; both functions together; and their sum $\sum_i (x_i - \mu)^2/\sigma^2$, with the sample mean marked.]
![Page 63: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/63.jpg)
Some Statistics and M-Estimators
σ = 4:
[Figure: four panels over $\mu \in [-30, 30]$: $(\mu - x_1)^2/\sigma^2$; $(\mu - x_2)^2/\sigma^2$; both functions together; and their sum $\sum_i (x_i - \mu)^2/\sigma^2$, with the sample mean marked.]
![Page 64: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/64.jpg)
Some Statistics and M-Estimators
σ = 4, N = 3:
[Figure: two panels over $\mu \in [-30, 30]$: all 3 functions; and their sum $\sum_i (x_i - \mu)^2/\sigma^2$, with the sample mean marked.]
![Page 65: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/65.jpg)
Some Statistics and M-Estimators
σ = 4, N = 4:
[Figure: two panels over $\mu \in [-30, 30]$: all 4 functions; and their sum $\sum_i (x_i - \mu)^2/\sigma^2$, with the sample mean marked.]
![Page 66: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/66.jpg)
Some Statistics and M-Estimators
σ = 4, N = 5:
[Figure: two panels over $\mu \in [-30, 30]$: all 5 functions; and their sum $\sum_i (x_i - \mu)^2/\sigma^2$, with the sample mean marked.]
![Page 67: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/67.jpg)
Some Statistics and M-Estimators
σ = 4, N = 7 (not robust to outliers):
[Figure: two panels over $\mu \in [-30, 30]$: all 7 functions; and their sum $\sum_i (x_i - \mu)^2/\sigma^2$, with the sample mean marked.]
![Page 68: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/68.jpg)
Some Statistics and M-Estimators
The Sample Mean as a Minimizer
Fact
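$$
\bar{\mathbf{x}} = \arg\min_{\boldsymbol{\mu}\in\mathbb{R}^n} \sum_{i=1}^{N} \left\|\frac{1}{\sigma}(\mathbf{x}_i - \boldsymbol{\mu})\right\|_{\ell_2}^2 . \tag{24}
$$
Outline of the proof:
(i) Show that $\nabla_{\boldsymbol{\mu}}E(\boldsymbol{\mu})$, the gradient of $E(\boldsymbol{\mu}) = \sum_{i=1}^{N}\left\|\frac{1}{\sigma}(\mathbf{x}_i - \boldsymbol{\mu})\right\|_{\ell_2}^2$ w.r.t. $\boldsymbol{\mu} = \begin{bmatrix}\mu_1 & \ldots & \mu_n\end{bmatrix}^T$, is proportional to
$$
\begin{bmatrix}\sum_{i=1}^{N}(\mu_1 - x_{i,1}) & \ldots & \sum_{i=1}^{N}(\mu_n - x_{i,n})\end{bmatrix} \tag{25}
$$
where $x_{i,j}$ is the j-th entry of $\mathbf{x}_i$.
(ii) Solve $\nabla_{\boldsymbol{\mu}}E(\boldsymbol{\mu}) = \mathbf{0}$.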
![Page 70: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/70.jpg)
Some Statistics and M-Estimators
More generally, let Q be an SPD n-by-n matrix.
Fact
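$$
\bar{\mathbf{x}} = \arg\min_{\boldsymbol{\mu}\in\mathbb{R}^n} \sum_{i=1}^{N} \|\mathbf{x}_i - \boldsymbol{\mu}\|_{\mathbf{Q}}^2 \tag{26}
$$
where $\|\mathbf{x}_i - \boldsymbol{\mu}\|_{\mathbf{Q}}^2 = (\mathbf{x}_i - \boldsymbol{\mu})^T\mathbf{Q}(\mathbf{x}_i - \boldsymbol{\mu})$.
If $\mathbf{Q} = \sigma^{-2}\mathbf{I}_{n\times n} \propto \mathbf{I}_{n\times n}$, this reduces to the previous problem:
$$
\arg\min_{\boldsymbol{\mu}\in\mathbb{R}^n} \sum_{i=1}^{N} \left\|\frac{1}{\sigma}(\mathbf{x}_i - \boldsymbol{\mu})\right\|_{\ell_2}^2
$$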
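A quick numerical check of the SPD-weighted case: the gradient of the cost with respect to µ is 2Q Σ_i (µ − x_i), which vanishes at the sample mean because Q is invertible. The random data and the way Q is constructed below are assumptions made only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))                 # N = 20 points in R^3 (rows)
B = rng.normal(size=(3, 3))
Q = B @ B.T + 3 * np.eye(3)                  # SPD by construction

def q_cost(mu, X, Q):
    """sum_i (x_i - mu)^T Q (x_i - mu)."""
    D = X - mu
    return np.einsum("ij,jk,ik->", D, Q, D)

x_bar = X.mean(axis=0)
grad_at_mean = 2 * Q @ (len(X) * x_bar - X.sum(axis=0))
cost_at_mean = q_cost(x_bar, X, Q)
perturbed_costs = [q_cost(x_bar + d, X, Q) for d in 0.1 * np.eye(3)]
```

Moving µ away from the sample mean by any step d raises the cost by N·dᵀQd, which is positive for SPD Q.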
![Page 71: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/71.jpg)
Some Statistics and M-Estimators
Informal Definition (Statistic)
A statistic is a function that depends only on the data.
Trivially, a function of a statistic is also a statistic.
Example
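$S_1(\mathbf{x}_1,\ldots,\mathbf{x}_N;N) = \sum_{i=1}^{N}\mathbf{x}_i \in \mathbb{R}^n$ is a statistic of $(\mathbf{x}_1,\ldots,\mathbf{x}_N) \subset \mathbb{R}^n$.
The same goes for $\bar{\mathbf{x}} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{x}_i = \frac{1}{N}S_1(\mathbf{x}_1,\ldots,\mathbf{x}_N;N)$.
Example
$S_2(\mathbf{x}_1,\ldots,\mathbf{x}_N;N) = \sum_{i=1}^{N}\mathbf{x}_i\mathbf{x}_i^T \in \mathbb{R}^{n\times n}$ is a statistic of $(\mathbf{x}_1,\ldots,\mathbf{x}_N) \subset \mathbb{R}^n$. The same goes for $\frac{1}{N}S_2(\mathbf{x}_1,\ldots,\mathbf{x}_N;N)$ and $\left(\frac{1}{N}\sum_{i=1}^{N}\mathbf{x}_i\mathbf{x}_i^T\right) - \bar{\mathbf{x}}\bar{\mathbf{x}}^T$.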
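The two statistics and the quantities derived from them can be computed in a few lines. The data here is random and serves only as an illustration; the last expression is compared against NumPy's biased (divide-by-N) covariance.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))       # rows are x_1, ..., x_N in R^n
N = X.shape[0]

S1 = X.sum(axis=0)                  # sum_i x_i, shape (n,)
S2 = X.T @ X                        # sum_i x_i x_i^T, shape (n, n)

x_bar = S1 / N                                      # sample mean
cov_from_stats = S2 / N - np.outer(x_bar, x_bar)    # (1/N) S2 - x_bar x_bar^T
```

Since both results are functions of S1, S2, and N only, they are statistics as well.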
![Page 74: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/74.jpg)
Some Statistics and M-Estimators
Definition (k-th order statistic)
The k-th order statistic of $\{x_i\}_{i=1}^N \subset \mathbb{R}$ is the k-th-smallest value among $\{x_i\}_{i=1}^N$. It is denoted by $x_{(k)}$. Thus, $x_{(1)} \le x_{(2)} \le \ldots \le x_{(N)}$.
Example
$x_{(1)} = \min\{x_1,\ldots,x_N\}$ and $x_{(N)} = \max\{x_1,\ldots,x_N\}$.
Definition (order statistics)
The ordered N-tuple of the sorted values,
$$
(x_{(1)}, x_{(2)}, \ldots, x_{(N)}) , \tag{27}
$$
is called the order statistics of $\{x_i\}_{i=1}^N$.
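In code, the order statistics are simply the sorted sample. The helper below uses 1-based k to match the notation x_(k); its name and the sample values are hypothetical.

```python
import numpy as np

x = np.array([7.0, -2.0, 5.0, 0.0, 3.0])
order_stats = np.sort(x)            # (x_(1), ..., x_(N))

def kth_order_statistic(x, k):
    """Return x_(k) for 1-based k."""
    return np.sort(x)[k - 1]

smallest = kth_order_statistic(x, 1)
largest = kth_order_statistic(x, len(x))
```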
![Page 77: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/77.jpg)
Some Statistics and M-Estimators
For simplicity, let us assume N is odd.
Definition (Sample median)
If N is odd, then the sample median of $\{x_i\}_{i=1}^N \subset \mathbb{R}$ is
$$
x_{\left(\frac{N+1}{2}\right)} \tag{28}
$$
If N is even, there are several different ways to define the sample median; when N is large, they usually become similar.
![Page 79: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/79.jpg)
Some Statistics and M-Estimators
The Sample Median as a Minimizer
Fact
The sample median minimizes the sum of scaled $\ell_1$ distances:
$$
x_{\left(\frac{N+1}{2}\right)} = \arg\min_{m\in\mathbb{R}} \sum_{i=1}^{N} \frac{|x_i - m|}{\sigma}, \qquad \sigma > 0 . \tag{29}
$$
We may take this as the definition of the sample median (in which case, we don’t need to worry if N is even or odd).
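The same kind of grid search used for squared distances can illustrate the ℓ1 case. The sample below has odd N, so the minimizer is unique; the data values and grid are arbitrary choices for this sketch.

```python
import numpy as np

x = np.array([-3.0, 1.0, 4.0, 10.0, 25.0])   # N = 5, median is 4.0
m_grid = np.linspace(-30, 30, 6001)           # step 0.01

def scaled_abs_cost(m, x, sigma):
    """sum_i |x_i - m| / sigma, evaluated for every m in the grid."""
    return np.sum(np.abs(x[:, None] - m[None, :]), axis=0) / sigma

argmins = [m_grid[np.argmin(scaled_abs_cost(m_grid, x, s))]
           for s in (1.0, 2.0, 4.0)]
```

As with the mean, the scale σ changes the height of the cost curve but not where its minimum lies.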
![Page 80: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/80.jpg)
Some Statistics and M-Estimators
σ = 2:
[Figure: four panels over $\mu \in [-30, 30]$: $|\mu - x_1|/\sigma$; $|\mu - x_2|/\sigma$; both functions together; and their sum $\sum_i |x_i - \mu|/\sigma$, with the sample median marked.]
![Page 81: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/81.jpg)
Some Statistics and M-Estimators
σ = 4:
[Figure: four panels over $\mu \in [-30, 30]$: $|\mu - x_1|/\sigma$; $|\mu - x_2|/\sigma$; both functions together; and their sum $\sum_i |x_i - \mu|/\sigma$, with the sample median marked.]
![Page 82: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/82.jpg)
Some Statistics and M-Estimators
σ = 4, N = 3:
[Figure: two panels over $\mu \in [-30, 30]$: all 3 functions; and their sum $\sum_i |x_i - \mu|/\sigma$, with the sample median marked.]
![Page 83: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/83.jpg)
Some Statistics and M-Estimators
σ = 4, N = 4:
[Figure: two panels over $\mu \in [-30, 30]$: all 4 functions; and their sum $\sum_i |x_i - \mu|/\sigma$, with the sample median marked.]
![Page 84: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/84.jpg)
Some Statistics and M-Estimators
σ = 4, N = 5:
[Figure: two panels over $\mu \in [-30, 30]$: all 5 functions; and their sum $\sum_i |x_i - \mu|/\sigma$, with the sample median marked.]
![Page 85: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/85.jpg)
Some Statistics and M-Estimators
σ = 4, N = 7 (more robust to outliers):
[Figure: two panels over $\mu \in [-30, 30]$: all 7 functions; and their sum $\sum_i |x_i - \mu|/\sigma$, with the sample median marked.]
![Page 86: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/86.jpg)
Some Statistics and M-Estimators
The Sample Median is More Robust Than the Sample Mean
Since $\ell_1$ is more robust than $\ell_2$, this interpretation in terms of optimization problems explains why the sample median is more robust than the sample mean.
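A tiny experiment makes the difference concrete: replace a single sample with a gross outlier and compare how far each estimate moves. The numbers below are made up for illustration.

```python
import numpy as np

clean = np.array([9.0, 10.0, 10.5, 11.0, 9.5, 10.2, 9.8])
corrupted = clean.copy()
corrupted[0] = 1000.0               # one gross outlier

mean_shift = abs(corrupted.mean() - clean.mean())
median_shift = abs(np.median(corrupted) - np.median(clean))
```

Here the mean moves by roughly 141.6, while the median moves only from 10.0 to 10.2.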
![Page 87: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/87.jpg)
Some Statistics and M-Estimators
Median Filtering
Replace pixel (i, j) with the median of the values in, say, a 5-by-5 neighborhood around it. This is a nonlinear operation.
Images taken from Wikipedia
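A minimal NumPy sketch of median filtering on a grayscale image follows. It assumes NumPy 1.20 or later (for `sliding_window_view`); replicating edge pixels at the border is an assumption of this sketch, since the slide does not say how borders are handled.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter(image, size=5):
    """Replace each pixel by the median of its size-by-size neighborhood."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")            # border handling: assumption
    windows = sliding_window_view(padded, (size, size)) # shape (H, W, size, size)
    return np.median(windows, axis=(-2, -1))

image = np.full((9, 9), 100.0)
image[4, 4] = 255.0                 # isolated "salt" pixel
filtered = median_filter(image, size=5)
```

The isolated bright pixel is removed entirely, whereas a linear (mean) filter would only spread it over the neighborhood.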
![Page 88: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/88.jpg)
Some Statistics and M-Estimators
Trimmed Average
The median is also an extreme case of the truncated (or trimmed) average, which discards the $N_0$ smallest and $N_0$ largest values and averages the remaining $N - 2N_0$ order statistics:
$$
\frac{1}{N - 2N_0}\sum_{i=N_0+1}^{N-N_0} x_{(i)} \tag{30}
$$
We will return to the trimmed average later when we discuss Robust PCA (PCA is a dimensionality-reduction technique which we will discuss as well).
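A short sketch of the trimmed average, assuming it drops the N0 smallest and N0 largest order statistics and averages the rest. The function name and the data are hypothetical.

```python
import numpy as np

def trimmed_mean(x, n0):
    """Average of the order statistics after dropping n0 values at each end."""
    x_sorted = np.sort(x)
    n = len(x_sorted)
    if 2 * n0 >= n:
        raise ValueError("n0 must satisfy 2 * n0 < N")
    return x_sorted[n0:n - n0].mean()

x = np.array([9.0, 10.0, 10.5, 11.0, 9.5, 10.2, 1000.0])   # N = 7
no_trim = trimmed_mean(x, 0)        # equals the sample mean
some_trim = trimmed_mean(x, 1)      # drops 9.0 and 1000.0
full_trim = trimmed_mean(x, 3)      # only x_(4) remains: the median
```

With N0 = 0 this is the sample mean, and for odd N with N0 = (N − 1)/2 it is the sample median, which is the extreme case mentioned above.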
![Page 89: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/89.jpg)
Some Statistics and M-Estimators
M-Estimators
More generally, estimators that are defined as minimizers of sums of functions of the data are called M-estimators. These include, among other things:
Minimizers of $\sum_{i=1}^{N}\rho(r_i)$, which we saw previously.
Maximum-Likelihood Estimators (MLE):
$$
\arg\max_{\boldsymbol{\theta}} \prod_{i=1}^{N} p(\mathbf{x}_i;\boldsymbol{\theta}) = \arg\min_{\boldsymbol{\theta}} \sum_{i=1}^{N} -\log p(\mathbf{x}_i;\boldsymbol{\theta})
$$
MLEs enjoy many desirable properties and satisfy various asymptotic optimality criteria, but may suffer from outliers. Some robust estimators achieve near-optimality when there are no outliers, and suffer little in their presence.
M-estimators can be defined even when the space is nonlinear; we will see some examples.
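To make the definition concrete, here is a brute-force location M-estimator: it minimizes Σ_i ρ(x_i − m) over a grid of candidates m for an arbitrary ρ. The grid search, the Huber function with threshold k = 1, and the data are illustrative assumptions, not part of the slides.

```python
import numpy as np

def m_estimate_location(x, rho, grid):
    """Grid-search minimizer of sum_i rho(x_i - m) over candidate locations m."""
    cost = np.sum(rho(x[:, None] - grid[None, :]), axis=0)
    return grid[np.argmin(cost)]

def huber(r, k=1.0):
    """Huber function: quadratic near zero, linear in the tails."""
    a = np.abs(r)
    return np.where(a <= k, 0.5 * r**2, k * (a - 0.5 * k))

x = np.array([9.0, 9.5, 9.8, 10.0, 10.2, 10.5, 40.0])   # one outlier
grid = np.linspace(0.0, 50.0, 5001)                      # step 0.01

est_square = m_estimate_location(x, np.square, grid)     # close to the mean
est_abs = m_estimate_location(x, np.abs, grid)           # close to the median
est_huber = m_estimate_location(x, huber, grid)
```

The quadratic ρ is dragged toward the outlier, while the absolute-value and Huber choices stay near the bulk of the data.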
![Page 93: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/93.jpg)
Some Statistics and M-Estimators
The Sample Mean as an MLE
Fact
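The sample mean maximizes the likelihood under a Gaussian likelihood model:
$$
\bar{x} = \arg\max_{\mu\in\mathbb{R}} \prod_{i=1}^{N} \mathcal{N}(x_i;\mu,\sigma^2) . \tag{31}
$$
Remark
$$
\int_{\mathbb{R}}\cdots\int_{\mathbb{R}} \prod_{i=1}^{N} \mathcal{N}(x_i;\mu,\sigma^2)\, dx_1\cdots dx_N = 1
$$
$$
\int_{\mathbb{R}} \mathcal{N}(x_i;\mu,\sigma^2)\, d\mu = 1 \quad \forall i, \qquad\text{but}\qquad \int_{\mathbb{R}} \prod_{i=1}^{N} \mathcal{N}(x_i;\mu,\sigma^2)\, d\mu \neq 1 \quad \forall N > 1
$$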
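Both the fact and the remark can be checked numerically on a grid of µ values. The data, σ, and grid below are arbitrary choices; the integrals over µ are approximated by simple Riemann sums, and the log-likelihood is used for the arg max to avoid multiplying many small numbers.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / np.sqrt(2 * np.pi * sigma**2)

x = np.array([-3.0, 1.0, 4.0, 10.0])         # sample mean is 3.0
sigma = 4.0
mu_grid = np.linspace(-30, 30, 6001)          # step 0.01
d_mu = mu_grid[1] - mu_grid[0]

# Log-likelihood of all samples as a function of mu.
log_lik = np.sum(np.log(gaussian_pdf(x[:, None], mu_grid[None, :], sigma)), axis=0)
mu_mle = mu_grid[np.argmax(log_lik)]

# Integrals over mu: a single factor versus the full product.
integral_one = np.sum(gaussian_pdf(x[0], mu_grid, sigma)) * d_mu
integral_all = np.sum(np.exp(log_lik)) * d_mu
```

The single-factor integral is close to 1, whereas the product, viewed as a function of µ, integrates to a much smaller number: the likelihood is not a density in µ.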
![Page 94: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/94.jpg)
Some Statistics and M-Estimators
Example
Let $(x_1, x_2) \sim \mathcal{N}(x_1;\mu,\sigma^2)\,\mathcal{N}(x_2;\mu,\sigma^2)$.
$$
\bar{x} = \frac{x_1 + x_2}{2} = \arg\max_{\mu}\ \mathcal{N}(x_1;\mu,\sigma^2)\,\mathcal{N}(x_2;\mu,\sigma^2) \tag{32}
$$
![Page 95: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/95.jpg)
Some Statistics and M-Estimators
σ = 2:
[Figure: four panels over $\mu \in [-30, 30]$: the Gaussian likelihood of $x_1$ as a function of µ; the Gaussian likelihood of $x_2$ as a function of µ; both functions together; and their product $(2\pi\sigma^2)^{-n/2}\prod_i \exp(-0.5\,(x_i - \mu)^2/\sigma^2)$, with the sample mean marked.]
![Page 96: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/96.jpg)
Some Statistics and M-Estimators
σ = 4:
[Figure: four panels over $\mu \in [-30, 30]$: the Gaussian likelihood of $x_1$ as a function of µ; the Gaussian likelihood of $x_2$ as a function of µ; both functions together; and their product $(2\pi\sigma^2)^{-n/2}\prod_i \exp(-0.5\,(x_i - \mu)^2/\sigma^2)$, with the sample mean marked.]
![Page 97: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/97.jpg)
Some Statistics and M-Estimators
σ = 4, N = 3:
[Figure: two panels over $\mu \in [-30, 30]$: all 3 functions; and their product $(2\pi\sigma^2)^{-n/2}\prod_i \exp(-0.5\,(x_i - \mu)^2/\sigma^2)$, with the sample mean marked.]
![Page 98: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/98.jpg)
Some Statistics and M-Estimators
σ = 4, N = 4:
[Figure: two panels over $\mu \in [-30, 30]$: all 4 functions; and their product $(2\pi\sigma^2)^{-n/2}\prod_i \exp(-0.5\,(x_i - \mu)^2/\sigma^2)$, with the sample mean marked.]
![Page 99: Computer Vision: Models, Learning and Inference Convexity ...cv192/wiki.files/CV192...CV/ML people) to focus on how to: recognize the problem is convex; or transform a non-convex problem](https://reader033.vdocument.in/reader033/viewer/2022060421/5f17fc16444a1a2ec2078f73/html5/thumbnails/99.jpg)
Some Statistics and M-Estimators
σ = 4, N = 5:
[Figure: two panels plotted against µ ∈ [−30, 30]. Panel 1: all 5 likelihood terms. Panel 2: the product of all terms, $\prod_i (2\pi\sigma^2)^{-1/2}\exp(-0.5\,(x_i-\mu)^2/\sigma^2)$, with the sample mean marked.]
Some Statistics and M-Estimators
σ = 4, N = 7 (not robust to outliers):
[Figure: two panels plotted against µ ∈ [−30, 30]. Panel 1: all 7 likelihood terms. Panel 2: the product of all terms, $\prod_i (2\pi\sigma^2)^{-1/2}\exp(-0.5\,(x_i-\mu)^2/\sigma^2)$, with the sample mean marked.]
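The behaviour shown in these plots can be reproduced numerically. The following is a minimal sketch, not the code used to generate the slides: the data values, the grid, and the function name `gaussian_log_likelihood` are assumptions chosen for illustration. It works with the log-likelihood rather than the product itself, since the product becomes extremely small as N grows.

```python
import numpy as np

def gaussian_log_likelihood(mu, x, sigma):
    """Log of prod_i N(x_i; mu, sigma^2), evaluated at every candidate mu."""
    mu = np.atleast_1d(mu)[:, None]            # shape (M, 1)
    resid = x[None, :] - mu                    # shape (M, N)
    n_samples = x.size
    const = -0.5 * n_samples * np.log(2.0 * np.pi * sigma**2)
    return const - 0.5 * np.sum(resid**2, axis=1) / sigma**2

sigma = 4.0
# Hypothetical data (not the slides' data); the last value plays the outlier.
x = np.array([-3.0, -1.0, 0.0, 1.0, 2.0, 4.0, 25.0])
grid = np.linspace(-30.0, 30.0, 6001)          # candidate values of mu, step 0.01

mu_hat = grid[np.argmax(gaussian_log_likelihood(grid, x, sigma))]

print("grid maximizer:         ", mu_hat)
print("sample mean (all 7):    ", x.mean())       # 28 / 7 = 4.0
print("sample mean (6 inliers):", x[:-1].mean())  # 3 / 6 = 0.5
```

In this toy data set a single outlier moves the maximizer from 0.5 to 4.0, which is the lack of robustness the N = 7 slide points at.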
Some Statistics and M-Estimators
More generally, let Σ be an SPD n-by-n matrix.
Fact
The inverse of an SPD matrix is also SPD.
Fact
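$$\bar{x} = \arg\max_{\mu\in\mathbb{R}^n} \prod_{i=1}^{N} \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}\|x_i-\mu\|^2_{\Sigma^{-1}}\right) \qquad (33)$$
where $\bar{x}$ is the sample mean, $\|x_i-\mu\|^2_{\Sigma^{-1}} = (x_i-\mu)^T\Sigma^{-1}(x_i-\mu)$ is the squared Mahalanobis distance between $x_i$ and $\mu$, and $|\Sigma|$ is the determinant of $\Sigma$.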
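Both facts can be checked numerically. The sketch below is only an illustration under stated assumptions: the SPD matrix `Sigma`, the synthetic data `X`, and the helper `log_likelihood` are hypothetical and not taken from the slides. It confirms that the inverse of an SPD matrix has positive eigenvalues, and that moving µ away from the sample mean always lowers the log of the likelihood in (33).

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 3, 50                                   # dimension and number of samples

# Build an SPD matrix: A A^T is SPSD, and adding n*I makes it SPD.
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)
Sigma_inv = np.linalg.inv(Sigma)

# Eigenvalues of the (symmetrized) inverse; all should be positive.
inv_eigs = np.linalg.eigvalsh((Sigma_inv + Sigma_inv.T) / 2.0)

# Synthetic data; each row of X is one sample x_i.
X = rng.multivariate_normal(mean=np.array([1.0, -2.0, 0.5]), cov=Sigma, size=N)

def log_likelihood(mu):
    """Log of the product in (33) for a candidate mean mu."""
    d = X - mu
    maha_sq = np.einsum("ij,jk,ik->i", d, Sigma_inv, d)   # squared Mahalanobis distances
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * N * (n * np.log(2.0 * np.pi) + logdet) - 0.5 * maha_sq.sum()

x_bar = X.mean(axis=0)
best = log_likelihood(x_bar)
# Random perturbations of the sample mean should never do better.
perturbed = [log_likelihood(x_bar + 0.1 * rng.normal(size=n)) for _ in range(100)]

print("eigenvalues of inverse:", inv_eigs)
print("sample mean is best:   ", all(p < best for p in perturbed))
```

Note that the maximizer does not depend on the particular choice of Σ: the sum of squared Mahalanobis distances equals its value at the sample mean plus $N\,\|\bar{x}-\mu\|^2_{\Sigma^{-1}}$, and the latter is zero only at $\mu = \bar{x}$ because $\Sigma^{-1}$ is SPD.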
Some Statistics and M-Estimators
The Sample Median as an MLE
Fact
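The sample median maximizes the likelihood under a Laplace-distribution likelihood model:
$$\operatorname{median}(x_1,\dots,x_N) = \arg\max_{\mu\in\mathbb{R}} \prod_{i=1}^{N} f(x_i;\mu,\sigma) \qquad (34)$$
where
$$f(x;\mu,\sigma) = \frac{1}{2\sigma}\exp\left(-\frac{|x-\mu|}{\sigma}\right)$$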
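One way to see why (34) holds is to take the logarithm of the likelihood. The steps below are a short sketch of that standard argument, added for illustration; they are not part of the original slides.

```latex
\begin{align*}
\log \prod_{i=1}^{N} f(x_i;\mu,\sigma)
  &= -N\log(2\sigma) - \frac{1}{\sigma}\sum_{i=1}^{N} |x_i-\mu| , \\
\arg\max_{\mu\in\mathbb{R}} \prod_{i=1}^{N} f(x_i;\mu,\sigma)
  &= \arg\min_{\mu\in\mathbb{R}} \sum_{i=1}^{N} |x_i-\mu| , \\
\frac{d}{d\mu}\sum_{i=1}^{N} |x_i-\mu|
  &= \#\{i : x_i < \mu\} - \#\{i : x_i > \mu\}
  \quad (\mu \notin \{x_1,\dots,x_N\}) .
\end{align*}
```

The sum of absolute deviations is convex and piecewise linear in µ, and its slope changes sign where the number of samples below µ equals the number above, which is the sample median. For even N every point between the two middle samples is a maximizer.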
Some Statistics and M-Estimators
[Figure: four panels, each plotted against µ ∈ [−30, 30]. Panel 1: the likelihood term of x1, $(2\sigma)^{-1}\exp(-|\mu-x_1|/\sigma)$. Panel 2: the likelihood term of x2, $(2\sigma)^{-1}\exp(-|\mu-x_2|/\sigma)$. Panel 3: both functions. Panel 4: the product of both functions, $\prod_i (2\sigma)^{-1}\exp(-|x_i-\mu|/\sigma)$, with the sample median marked.]
Some Statistics and M-Estimators
σ = 4, N = 3:
[Figure: two panels plotted against µ ∈ [−30, 30]. Panel 1: all 3 likelihood terms. Panel 2: the product of all terms, $\prod_i (2\sigma)^{-1}\exp(-|x_i-\mu|/\sigma)$, with the sample median marked.]
Some Statistics and M-Estimators
σ = 4, N = 4:
[Figure: two panels plotted against µ ∈ [−30, 30]. Panel 1: all 4 likelihood terms. Panel 2: the product of all terms, $\prod_i (2\sigma)^{-1}\exp(-|x_i-\mu|/\sigma)$, with the sample median marked.]
Some Statistics and M-Estimators
σ = 4, N = 5:
[Figure: two panels plotted against µ ∈ [−30, 30]. Panel 1: all 5 likelihood terms. Panel 2: the product of all terms, $\prod_i (2\sigma)^{-1}\exp(-|x_i-\mu|/\sigma)$, with the sample median marked.]
Some Statistics and M-Estimators
σ = 4, N = 7 (more robust to outliers than the Gaussian):
[Figure: two panels plotted against µ ∈ [−30, 30]. Panel 1: all 7 likelihood terms. Panel 2: the product of all terms, $\prod_i (2\sigma)^{-1}\exp(-|x_i-\mu|/\sigma)$, with the sample median marked.]
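The Laplace case can be checked numerically in the same way. This is a minimal sketch under stated assumptions: the data values, the grid, and the function name `laplace_log_likelihood` are hypothetical and not taken from the slides. It maximizes the Laplace log-likelihood over a grid of µ and compares how far one outlier moves the median and the mean.

```python
import numpy as np

def laplace_log_likelihood(mu, x, sigma):
    """Log of prod_i (2 sigma)^-1 exp(-|x_i - mu| / sigma) for every candidate mu."""
    mu = np.atleast_1d(mu)[:, None]            # shape (M, 1)
    abs_dev = np.abs(x[None, :] - mu)          # shape (M, N)
    return -x.size * np.log(2.0 * sigma) - np.sum(abs_dev, axis=1) / sigma

sigma = 4.0
# Hypothetical data (not the slides' data); the last value plays the outlier.
x = np.array([-3.0, -1.0, 0.0, 1.0, 2.0, 4.0, 25.0])
grid = np.linspace(-30.0, 30.0, 6001)          # candidate values of mu, step 0.01

mu_hat = grid[np.argmax(laplace_log_likelihood(grid, x, sigma))]

print("grid maximizer:       ", mu_hat)
print("median with outlier:  ", np.median(x))        # 1.0
print("median, inliers only: ", np.median(x[:-1]))   # 0.5
print("mean with outlier:    ", x.mean())            # 4.0
print("mean, inliers only:   ", x[:-1].mean())       # 0.5
```

For this toy data the outlier shifts the median by 0.5 but shifts the mean by 3.5, in line with the slide's remark that the Laplace model is more robust to outliers than the Gaussian one.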
Some Statistics and M-Estimators
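Definition (sample correlation matrix)
$\frac{1}{N}\sum_{i=1}^{N} x_i x_i^T \in \mathbb{R}^{n\times n}$ is called the sample correlation matrix of $\{x_i\}_{i=1}^{N} \subset \mathbb{R}^n$.
Definition (sample covariance matrix)
$\frac{1}{N}\sum_{i=1}^{N} (x_i-\bar{x})(x_i-\bar{x})^T$ is called the sample covariance matrix of $\{x_i\}_{i=1}^{N} \subset \mathbb{R}^n$.
Fact
$$\frac{1}{N}\sum_{i=1}^{N} (x_i-\bar{x})(x_i-\bar{x})^T = \left(\frac{1}{N}\sum_{i=1}^{N} x_i x_i^T\right) - \bar{x}\bar{x}^T$$
$$\frac{1}{N}\sum_{i=1}^{N} (x_i-\bar{x})(x_i-\bar{x})^T = \frac{1}{N}\sum_{i=1}^{N} (x_i-\bar{x})x_i^T = \frac{1}{N}\sum_{i=1}^{N} x_i(x_i-\bar{x})^T$$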
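These identities are easy to confirm numerically. The sketch below uses synthetic data; the array `X` (one sample per row) and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 200, 4
# Synthetic samples with a nonzero mean; row i of X is x_i.
X = rng.normal(size=(N, n)) + np.array([1.0, 2.0, 3.0, 4.0])

x_bar = X.mean(axis=0)                  # sample mean
D = X - x_bar                           # row i is x_i - x_bar

R = X.T @ X / N                         # sample correlation matrix, (1/N) sum x_i x_i^T
C = D.T @ D / N                         # sample covariance matrix

C_from_R = R - np.outer(x_bar, x_bar)   # first identity
C_left = D.T @ X / N                    # (1/N) sum (x_i - x_bar) x_i^T
C_right = X.T @ D / N                   # (1/N) sum x_i (x_i - x_bar)^T

print(np.allclose(C, C_from_R), np.allclose(C, C_left), np.allclose(C, C_right))
# NumPy's estimator agrees once it is told to divide by N rather than N - 1.
print(np.allclose(C, np.cov(X, rowvar=False, bias=True)))
```

The second identity holds because the centered samples sum to zero, so the cross term $\frac{1}{N}\sum_i (x_i-\bar{x})\bar{x}^T$ vanishes.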
Some Statistics and M-Estimators
Fact
Both $\frac{1}{N}\sum_{i=1}^{N} x_i x_i^T$ and $\frac{1}{N}\sum_{i=1}^{N} (x_i-\bar{x})(x_i-\bar{x})^T$ are symmetric $n\times n$ matrices and their eigenvalues are always nonnegative.
⇒ They are SPSD. If all their eigenvalues are strictly positive, then they are also SPD.
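Nonnegativity follows because, for any vector v, $v^T\left(\frac{1}{N}\sum_i (x_i-\bar{x})(x_i-\bar{x})^T\right)v = \frac{1}{N}\sum_i \left(v^T(x_i-\bar{x})\right)^2 \ge 0$, and the same argument applies to the sample correlation matrix. The sketch below illustrates the difference between SPSD and SPD on synthetic data; the helper `sample_covariance` and the sample sizes are assumptions chosen for illustration. With fewer samples than dimensions the sample covariance has zero eigenvalues, so it is SPSD but not SPD.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_covariance(X):
    """(1/N) sum_i (x_i - x_bar)(x_i - x_bar)^T, where row i of X is x_i."""
    D = X - X.mean(axis=0)
    return D.T @ D / X.shape[0]

n = 5
C_many = sample_covariance(rng.normal(size=(100, n)))   # N = 100 > n
C_few = sample_covariance(rng.normal(size=(3, n)))      # N = 3 < n, rank at most N - 1 = 2

# eigvalsh is meant for symmetric matrices and returns eigenvalues in ascending order.
eig_many = np.linalg.eigvalsh(C_many)
eig_few = np.linalg.eigvalsh(C_few)

print("N > n eigenvalues:", eig_many)   # all strictly positive here: SPD
print("N < n eigenvalues:", eig_few)    # smallest ones are zero up to round-off: SPSD only
```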
Some Statistics and M-Estimators
Version Log
7/4/2019, ver 1.00.