
The steepest descent method for computing Riemannian center of mass on Hadamard manifolds

Joao Xavier da Cruz Neto

Federal University of Piauí (UFPI)

Workshop on Optimization on Manifolds

Joint work with: G.C. Bento (UFG), J.C.O. Souza (UFPI), P.R. Oliveira (UFRJ) and S.D. Bitar (UFAM)

August 9, 2019

J.X. Cruz Neto Chemnitz - German August 9, 2019 1 / 39

Summary

1 Influence of the Curvature in the Convergence of Algorithms

2 Influence of the Kurdyka-Łojasiewicz property in the Convergence of Algorithms

3 Riemannian center of mass on Riemannian manifolds

4 Numerical experiments


Steepest descent method (SDM)

We recall the steepest descent method for solving the following minimization problem

$$\min_{x \in M} f(x), \tag{1}$$

where $f : M \to \mathbb{R}$ is continuously differentiable and its gradient is Lipschitz with constant $L > 0$. The steepest descent method generates a sequence as follows:


Steepest descent method

Algorithm 1 (Steepest Descent Method)

Initialization: Choose $x_0 \in M$;

Stopping rule: Given $x_k$, if $x_k$ is a critical point of $f$, then set $x_{k+p} = x_k$ for all $p \in \mathbb{N}$. Otherwise, compute the iterative step;

Iterative step: Take as the next iterate any $x_{k+1} \in M$ such that

$$x_{k+1} = \exp_{x_k}\left(-t_k \operatorname{grad} f(x_k)\right), \tag{2}$$

where $t_k$ is some positive stepsize.
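Iteration (2) can be sketched on a very small example. The block below is only an illustration under assumptions that are not in the slides: it takes the one-dimensional Hadamard manifold $M = (0, \infty)$ with metric $\langle u, v\rangle_x = uv/x^2$, for which $\exp_x(v) = x e^{v/x}$ and $\operatorname{grad} f(x) = x^2 f'(x)$. The function names, the test function and the hand-picked constant stepsize are all hypothetical.

```python
import math

def exp_map(x, v):
    """Exponential map on (0, inf) with the assumed metric <u, v>_x = u*v / x**2."""
    return x * math.exp(v / x)

def riemannian_grad(df, x):
    """Riemannian gradient: Euclidean derivative scaled by the inverse metric x**2."""
    return x * x * df(x)

def grad_norm(x, g):
    """Norm of the tangent vector g at x in the assumed metric."""
    return abs(g) / x

def steepest_descent(df, x0, step, tol=1e-12, max_iter=1000):
    """Iterate x_{k+1} = exp_{x_k}(-step * grad f(x_k)) with a constant stepsize."""
    x = x0
    for _ in range(max_iter):
        g = riemannian_grad(df, x)
        if grad_norm(x, g) <= tol:  # numerical stand-in for "x_k is a critical point"
            break
        x = exp_map(x, -step * g)
    return x

# Hypothetical test function: f(x) = 0.5 * sum_i w_i * ln(x / a_i)**2,
# whose minimizer is the weighted geometric mean of the a_i.
points = [1.0, 4.0, 16.0]
weights = [1 / 3, 1 / 3, 1 / 3]

def df(x):
    """Euclidean derivative f'(x) of the test function."""
    return sum(w * math.log(x / a) for w, a in zip(weights, points)) / x

center = steepest_descent(df, x0=10.0, step=0.5)
```

In this example the iterates approach the geometric mean of the three points, which is 4.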


Steepest descent method

We consider the following two possibilities for the stepsize rule:

We choose the sequence $\{t_k\}$ as follows: Given $\delta_1, \delta_2 > 0$ such that $L\delta_1 + \delta_2 < 1$, where $L$ is the Lipschitz constant associated with the gradient map of $f$, take $\{t_k\}$ such that

$$t_k \in \left(\delta_1, \frac{2}{L}(1 - \delta_2)\right), \quad \forall k \ge 0. \tag{3}$$

A sequence $\{t_k\}$ as in (3) is called a fixed stepsize rule.


Steepest descent method

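Let $\{t_k\}$ be a sequence obtained by

$$t_k := \max\left\{ 2^{-j} : j \in \mathbb{N},\ f\left(\exp_{x_k}\left(-2^{-j} \operatorname{grad} f(x_k)\right)\right) \le f(x_k) - \alpha 2^{-j} \left\|\operatorname{grad} f(x_k)\right\|^2 \right\},$$

with $\alpha \in (0, 1)$. This stepsize rule is the so-called Armijo search.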
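The Armijo search can be sketched as a simple halving loop. This is a minimal illustration, not the authors' implementation: it assumes that $j$ starts at 0, passes the exponential map in as a function, and is demonstrated on the flat Hadamard manifold $\mathbb{R}^2$, where $\exp_x(v) = x + v$. The names `armijo_step` and `euclidean_exp` and the quadratic test function are hypothetical.

```python
import numpy as np

def armijo_step(f, grad_f, exp_map, x, alpha=0.25, max_halvings=60):
    """Return the largest t = 2**-j (j = 0, 1, ...) satisfying the Armijo condition."""
    g = grad_f(x)
    # Euclidean squared norm; on a general manifold use the Riemannian norm at x.
    g_norm_sq = float(np.dot(g, g))
    fx = f(x)
    t = 1.0
    for _ in range(max_halvings):
        if f(exp_map(x, -t * g)) <= fx - alpha * t * g_norm_sq:
            return t
        t *= 0.5
    raise RuntimeError("Armijo search did not terminate")

def euclidean_exp(x, v):
    """Exponential map of flat R^n (assumed setting for this demo)."""
    return x + v

def f(x):
    """Hypothetical test function f(x) = 2 * ||x||^2."""
    return 2.0 * float(np.dot(x, x))

def grad_f(x):
    return 4.0 * x

x = np.array([1.0, -2.0])
t = armijo_step(f, grad_f, euclidean_exp, x)
x_next = euclidean_exp(x, -t * grad_f(x))
```

For this test function the trial steps 1 and 1/2 fail the sufficient decrease test and the step 1/4 is accepted.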


Full convergence results are obtained in:

Cruz Neto, J.X., Lima, L.L. and Oliveira, P.R., Geodesic algorithms in Riemannian geometry. Balkan J. Geom. Appl., 3 (1998), pp. 89-100.

There, the authors assume that $M$ has nonnegative sectional curvature and $f$ is a convex function.


Proximal Point Method

For a starting point $x_0 \in M$, the proximal point method for solving the optimization problem (1) generates a sequence $\{x_k\} \subset M$ in the following form:

$$x_{k+1} \in \operatorname*{argmin}_{y \in M} \left\{ f(y) + \frac{\lambda_k}{2} d^2(y, x_k) \right\}, \tag{4}$$

where $\{\lambda_k\}$ is a sequence of positive numbers such that $0 < a \le \lambda_k \le b$.

Ferreira, O.P., Oliveira, P.R.: Proximal point algorithm on Riemannian manifold. Optimization 51, 257-270 (2002)

The authors assume that $M$ is a Hadamard manifold and $f$ is a convex function.
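As a worked illustration of update (4), under assumptions that are not in the slides, take the flat Hadamard manifold $M = \mathbb{R}^n$ with $d(y, x) = \|y - x\|$ and the hypothetical objective $f(y) = \frac{1}{2}\|y - a\|^2$ for a fixed $a \in \mathbb{R}^n$. The subproblem is then a strictly convex quadratic, and setting its gradient to zero gives the iterate in closed form:

```latex
\[
(y - a) + \lambda_k (y - x_k) = 0
\quad\Longrightarrow\quad
x_{k+1} = \frac{a + \lambda_k x_k}{1 + \lambda_k},
\qquad
x_{k+1} - a = \frac{\lambda_k}{1 + \lambda_k}\,(x_k - a).
\]
```

Since $\lambda_k \le b$, the factor $\lambda_k/(1+\lambda_k)$ is at most $b/(1+b) < 1$, so in this toy case the iterates contract towards the minimizer $a$ at a linear rate.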


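Kurdyka-Łojasiewicz property

Definition

A proper and lower semicontinuous function $f : M \to \mathbb{R} \cup \{+\infty\}$ is said to satisfy the Kurdyka-Łojasiewicz property at $\bar{x} \in \operatorname{dom} \partial f$ iff there exist $\eta \in\, ]0, +\infty]$, a neighbourhood $U$ of $\bar{x}$, and a continuous concave function $\varphi : [0, \eta[\, \to \mathbb{R}_+$, such that

$$\varphi(0) = 0, \quad \varphi \in C^1(0, \eta), \quad \varphi'(s) > 0, \quad s \in\, ]0, \eta[;$$

$$\varphi'\left(f(x) - f(\bar{x})\right) \operatorname{dist}\left(0, \partial f(x)\right) \ge 1, \quad x \in U \cap [f(\bar{x}) < f < f(\bar{x}) + \eta].$$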
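A simple instance may help to read the definition. The example below is an illustration chosen here, not one from the slides: take $M = \mathbb{R}^n$, $f(x) = \frac{1}{2}\|x\|^2$, $\bar{x} = 0$, $U = \mathbb{R}^n$, $\eta = +\infty$ and the desingularizing function $\varphi(s) = \sqrt{2s}$, which is continuous, concave, vanishes at 0 and has positive derivative on $]0, +\infty[$.

```latex
\[
\varphi'(s) = \frac{1}{\sqrt{2s}},
\qquad
\varphi'\bigl(f(x) - f(0)\bigr)\,\|\nabla f(x)\|
= \frac{1}{\sqrt{\|x\|^2}}\,\|x\| = 1 \ge 1
\quad \text{for all } x \neq 0 .
\]
```

So the inequality holds, with equality, at every point where $f(x) > f(\bar{x})$.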


Analytic manifolds

For each $x \in M$, the distance function on $M$ with base point $x$ is defined by $d_x := d(x, \cdot)$.

Theorem

Let $M$ be a finite dimensional, connected, complete, real analytic Riemannian manifold and $x \in M$. Then, $d_x$ is a subanalytic function.

Martin Tamm, Subanalytic sets in the calculus of variation, Acta Math. 146 (1981), no. 3-4, 167–199.

Therefore, $d_x$ satisfies the Kurdyka-Łojasiewicz property.


Stiefel manifolds

An important class of analytic and compact manifolds whose sectional curvature need not have constant sign is the set $V_p(\mathbb{R}^n) := \{X \in \mathbb{R}^{n \times p} \mid X^T X = I_p\}$ of the $n \times p$ orthonormal matrices.

T. Rapcsak, Sectional curvatures in nonlinear optimization, J. Global Optim. 40 (2008), no. 1-3, 375–388.


Morse functions

Theorem

Let $M$ be a manifold and denote by $C^r(M, \mathbb{R})$ the set of all $C^r$ functions $g : M \to \mathbb{R}$. The collection of all the Morse functions $f : M \to \mathbb{R}$ forms a dense and open set in $C^r(M, \mathbb{R})$, $2 \le r \le +\infty$.

See Hirsch, Theorem 1.2, page 147:

M. W. Hirsch. 1976. Differential Topology. Springer-Verlag, New York.

Theorem

If $f : M \to \mathbb{R}$ is a Morse function, then $f$ satisfies the Kurdyka-Łojasiewicz property.

Cruz Neto, J. X., Oliveira, P. R., Soares Junior, P. A., Soubeyran, A.: Learning how to play Nash and Alternating minimization method for structured nonconvex problems on Riemannian manifolds. J. Convex Anal. 20, 395-438 (2013)


Proximal Point Method

Proximal Point Method

Assume that $x_0 \in \operatorname{dom} f$, $x \in M$ is an accumulation point of the sequence $\{x_k\}$, and $f$ satisfies the Kurdyka-Łojasiewicz property at $x$. Then, $f(x_k) \to f(x)$ and the sequence $\{x_k\}$ converges to $x$, which is a critical point of $f$.

G. C. Bento, J. X. Cruz Neto, and P. R. Oliveira, A new approach to the proximal point method: convergence on general Riemannian manifolds, J. Optim. Theory Appl. 168 (2016), no. 3, 743–755.


Steepest Descent Method

Let $M$ be a Hadamard manifold, $x_0 \in \operatorname{dom} f$, $x^* \in M$ an accumulation point of the sequence $\{x_k\}$, and assume that $f$ satisfies the Kurdyka-Łojasiewicz property at $x^*$. Then, $\{x_k\}$ converges to $x^*$, which is a critical point of $f$.


Computing Riemannian center of mass on Riemannian manifolds

Consider the problem of computing the (global) Riemannian $L^p$ center of mass of the data set $\{a_i\}_{i=1}^n \subset M$ on a Riemannian manifold with respect to weights $0 \le w_i \le 1$ such that $\sum_{i=1}^n w_i = 1$. The Riemannian $L^p$ center of mass is defined as the solution set of the following problem

$$\min_{x \in M} f_p(x) := \frac{1}{p} \sum_{i=1}^n w_i d^p(x, a_i), \tag{5}$$

for $1 \le p < \infty$. If $p = \infty$, the center of mass is defined as the set of minimizers of $\max_i d(x, a_i)$ in $M$.


Computing Riemannian center of mass on Riemannian manifolds

The problem of computing the Riemannian center of mass has been extensively studied in both theory and applications:

Kristaly, A., Morosanu, G., Roth, A.: Optimal placement of a deposit between markets: Riemannian-Finsler geometrical approach. J. Optim. Theory Appl. 139(2), 263-276 (2008)

Afsari, B., Tron, R., Vidal, R.: On the convergence of gradient descent for finding the Riemannian center of mass. SIAM J. Control Optim. 51, 2230-2260 (2013)

Bacak, M.: Computing medians and means in Hadamard spaces. SIAM J. Optim. 24, 1542-1566 (2014)

Bento, G. C., Bitar, S., Cruz Neto, J. X., Oliveira, P. R., Souza, J. C. O.: Computing Riemannian center of mass on Hadamard manifolds. J. Optim. Theory Appl. (to appear 2019)


Computing Riemannian center of mass on Hadamard manifolds

Proposition 1

Let $\{a_i\}_{i=1}^n \subset M$ be the data set and let $\gamma$ be a unit speed geodesic such that $\gamma(0) = x$, where $x \ne a_i$, for $i = 1, \dots, n$. Then, there exists a constant $\alpha \ge 0$ such that

$$\alpha \le \frac{d^2}{dt^2} (f_1 \circ \gamma)(t) \Big|_{t=0}.$$

Furthermore, if the points $a_1, \dots, a_n$ are not collinear, then $\alpha > 0$.

Recall that the points $a_1, \dots, a_n$ are said to be collinear if they reside on the same geodesic, i.e., there exist $y \in M$, $v \in T_y M$ and $t_i \in \mathbb{R}$, $i = 1, \dots, n$, such that $a_i = \exp_y t_i v$, for each $i = 1, \dots, n$.


Computing Riemannian center of mass on Hadamard manifolds

Theorem

Let $M$ be a simply connected, complete Riemannian manifold of nonpositive sectional curvature. Assume the points $P_i \in M$, $i = 1, \dots, n$, belong to a geodesic $\sigma : [0, 1] \to M$ such that $P_i = \sigma(t_i)$ with $0 \le t_1 \le \dots \le t_n \le 1$. Then:

1 the unique minimum point for $f_1$ is $P_{n/2}$ whenever $n$ is odd;

2 the minimum points for $f_1$ are situated on $\sigma$, between $P_{n/2}$ and $P_{n/2+1}$, whenever $n$ is even.

Kristaly, A., Morosanu, G., Roth, A.: Optimal placement of a deposit between markets: Riemannian-Finsler geometrical approach. J. Optim. Theory Appl. 139(2), 263-276 (2008)
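The flat case gives a quick sanity check of this statement. The sketch below is an illustration under assumptions not taken from the slides: it uses the real line (a Hadamard manifold whose only geodesic is the line itself), equal weights, and a brute-force grid search. The data values are hypothetical.

```python
def f1(x, points):
    """Unweighted sum of distances on the real line (flat, one-dimensional case)."""
    return sum(abs(x - p) for p in points)

# Odd number of points, sorted along the geodesic: the middle point should win.
odd_points = [0.1, 0.2, 0.5, 0.7, 0.9]
grid = [i / 1000 for i in range(1001)]
best = min(grid, key=lambda x: f1(x, odd_points))

# Even number of points: f1 is constant between the two middle points.
even_points = [0.1, 0.2, 0.7, 0.9]
value_left = f1(0.3, even_points)
value_right = f1(0.6, even_points)
```

Here the grid minimizer coincides with the middle data point 0.5, and in the even case every point between the two middle data points gives the same objective value.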


Computing Riemannian center of mass on Hadamard manifolds

Proposition 2

Let $C$ be a compact set such that $a_i \notin C$, for each $i = 1, \dots, n$. Then, the vector field $\operatorname{grad} f_1 : M \to TM$ is Lipschitz continuous on $C$.


Computing Riemannian center of mass on Hadamard manifolds

Proposition 3

The following statements hold:

(a) The function $f_1(x) = \sum_{i=1}^n w_i d(x, a_i)$ is convex;

(b) The problem (5), for $p = 1$, always has a solution. Furthermore, if the points $a_1, \dots, a_n$ are not collinear, then the solution is unique;

(c) Let $i_0 \in \{1, \dots, n\}$ be an index such that $f_1(a_{i_0}) = \min_{i=1,\dots,n} f_1(a_i)$. Then, $a_{i_0}$ is a minimizer of $f_1$ on $M$ if and only if

$$\left\| \sum_{i=1,\, i \ne i_0}^n w_i \frac{\exp^{-1}_{a_{i_0}} a_i}{d(a_i, a_{i_0})} \right\| \le w_{i_0}.$$
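The test in item (c) is easy to evaluate once the inverse exponential map is available. The sketch below is an illustration only, assuming the flat Hadamard manifold $\mathbb{R}^d$, where $\exp^{-1}_a b = b - a$, and pairwise distinct data points. The function name and the data are hypothetical.

```python
import numpy as np

def is_data_point_minimizer(points, weights, i0):
    """Evaluate the condition of item (c) at a_{i0} in R^d, where exp^{-1}_a(b) = b - a."""
    a0 = points[i0]
    total = np.zeros_like(a0)
    for i, (a, w) in enumerate(zip(points, weights)):
        if i == i0:
            continue
        diff = a - a0  # inverse exponential map in the flat case
        total += w * diff / np.linalg.norm(diff)  # assumes a != a0
    return bool(np.linalg.norm(total) <= weights[i0])

points = np.array([[0.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])

# Heavy weight on the first point: the pull of the others cannot move the minimizer.
heavy = is_data_point_minimizer(points, [0.7, 0.1, 0.1, 0.1], i0=0)

# Light weight on the first point: the condition fails, so a_{i0} is not a minimizer.
light = is_data_point_minimizer(points, [0.1, 0.3, 0.3, 0.3], i0=0)
```

In both weight choices the first point has the smallest value of $f_1$ among the data points, so it is a valid choice of $i_0$ in the sense of item (c).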


Computing Riemannian center of mass on Hadamard manifolds

Proposition 4

The following statements hold:

(a) The function $f_2(x) = \frac{1}{2} \sum_{i=1}^n w_i d^2(x, a_i)$ is strictly convex and continuously differentiable with its gradient Lipschitz on compact sets;

(b) The problem of computing the Riemannian $L^2$ center of mass always has a unique solution.

Proposition 5

The function $f_2(x) = \frac{1}{2} \sum_{i=1}^n w_i d^2(x, a_i)$ satisfies the Kurdyka-Łojasiewicz inequality at every point of $M$.


Computing Riemannian center of mass on Hadamard manifolds

We have that for any $x_0 \in M$, $x_k \in L_f(x_0)$ for all $k \in \mathbb{N}$, and $L_f(x_0)$ is a nonempty and compact set. Then, we consider a direction $d_q$ and $t_q$ small enough such that $f(\exp_{a_q} t_q d_q) < f(a_q)$, where $q$ denotes the index in $\{1, \dots, n\}$ such that $f(a_q) = \min_{i=1,\dots,n} f(a_i)$. Setting $x_0 := \exp_{a_q} t_q d_q$, we have that $a_i \notin L_f(x_0)$, for each $i = 1, \dots, n$.

Theorem 2

The sequence $\{x_k\}$ converges to the unique Riemannian $L^1$ center of mass of the data set $\{a_i\}_{i=1}^n$ as long as the points $a_i$, for $i = 1, \dots, n$, are not collinear.

Theorem 3

The sequence $\{x_k\}$ converges to the unique Riemannian $L^2$ center of mass of the data set $\{a_i\}_{i=1}^n$.


Numerical experiments

Let $M := (S^m_{++}, \langle \cdot, \cdot \rangle)$ be the Riemannian manifold endowed with the Riemannian metric induced by the Euclidean Hessian of $\Psi(X) = -\ln \det X$,

$$\langle U, V \rangle = \operatorname{tr}\left(V \Psi''(X) U\right) = \operatorname{tr}\left(V X^{-1} U X^{-1}\right), \quad X \in M, \quad U, V \in T_X M,$$

where $S^m_{++}$ is the cone of the $m \times m$ symmetric positive definite matrices. In this case, for any $X, Y \in M$ the unique geodesic joining those two points is given by:

$$\gamma(t) = X^{1/2} \left(X^{-1/2} Y X^{-1/2}\right)^t X^{1/2}, \quad t \in [0, 1].$$


Numerical experiments

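Thus, for each $X \in M$, $\exp_X : T_X M \to M$ and $\exp_X^{-1} : M \to T_X M$ are given, respectively, by

$$\exp_X V = X^{1/2} e^{X^{-1/2} V X^{-1/2}} X^{1/2}, \qquad \exp_X^{-1} Y = X^{1/2} \ln\left(X^{-1/2} Y X^{-1/2}\right) X^{1/2}.$$

The Riemannian distance is given by

$$d^2(X, Y) = \operatorname{tr}\left(\ln^2\left(X^{-1/2} Y X^{-1/2}\right)\right) = \sum_{i=1}^{m} \ln^2 \lambda_i\left(X^{-1/2} Y X^{-1/2}\right), \tag{6}$$

where $\lambda_i(\cdot)$ denotes the $i$-th eigenvalue.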
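The formulas above translate almost directly into NumPy. The block below is a minimal sketch, not the code used for the reported experiments: it evaluates matrix functions through symmetric eigendecompositions, uses the standard identity $\operatorname{grad} f_2(X) = -\sum_i w_i \exp_X^{-1} A_i$, and runs the steepest descent iteration with a hand-picked constant stepsize. All function names, the stepsize, the tolerance and the starting point are assumptions made for illustration.

```python
import numpy as np

def _sym_fun(A, fun):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    w, Q = np.linalg.eigh(A)
    return (Q * fun(w)) @ Q.T

def _inv_sqrt(X):
    return _sym_fun(X, lambda w: 1.0 / np.sqrt(w))

def spd_exp(X, V):
    """exp_X V = X^{1/2} exp(X^{-1/2} V X^{-1/2}) X^{1/2}."""
    Xh, Xih = _sym_fun(X, np.sqrt), _inv_sqrt(X)
    return Xh @ _sym_fun(Xih @ V @ Xih, np.exp) @ Xh

def spd_log(X, Y):
    """exp_X^{-1} Y = X^{1/2} ln(X^{-1/2} Y X^{-1/2}) X^{1/2}."""
    Xh, Xih = _sym_fun(X, np.sqrt), _inv_sqrt(X)
    return Xh @ _sym_fun(Xih @ Y @ Xih, np.log) @ Xh

def spd_dist(X, Y):
    """Riemannian distance from formula (6), using eigenvalues of X^{-1/2} Y X^{-1/2}."""
    Xih = _inv_sqrt(X)
    lam = np.linalg.eigvalsh(Xih @ Y @ Xih)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

def l2_center(mats, weights, step=0.5, tol=1e-10, max_iter=500):
    """Steepest descent for f_2 with a constant stepsize (assumed, not tuned)."""
    X = mats[0].copy()  # assumed starting point
    for _ in range(max_iter):
        G = -sum(w * spd_log(X, A) for w, A in zip(weights, mats))
        Xih = _inv_sqrt(X)
        if np.linalg.norm(Xih @ G @ Xih) <= tol:  # Riemannian norm of the gradient
            break
        X = spd_exp(X, -step * G)
        X = 0.5 * (X + X.T)  # remove rounding asymmetry
    return X

# Hypothetical data: two commuting matrices, whose L2 center is their geometric mean.
mats = [np.diag([1.0, 4.0]), np.diag([4.0, 1.0])]
center = l2_center(mats, [0.5, 0.5])
```

For commuting matrices the center reduces to the entrywise geometric mean, which gives a simple check of the implementation; general data sets need no change to the code.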


Numerical experiments

In our simulations, we consider different scenarios taking into account three parameters: the number $n$ of matrices in the data set $\{Q_i\}_{i=1}^n$, the size $m \times m$ of the matrices and the stopping tolerance $\varepsilon > 0$. The random matrices used in our tests are generated with a uniform (well conditioned) and a non-uniform (ill conditioned) distribution of the eigenvalues of each matrix of the data set. The ill conditioned data set is generated so that the non-uniform distribution satisfies $\lambda_{\max} / \lambda_{\min} > 10^2$, where $\lambda_{\max}$ and $\lambda_{\min}$ denote the largest and the smallest eigenvalues of each matrix of the data set, respectively.


Numerical experiments

The next figures plot some results for $m \times m$ matrices ($m = 5, 10, 20, 40$) for different data sets ($n = 25, 50, 75, 100$).

Figure: Uniform, $m = 5$ and $\varepsilon = 10^{-8}$

Figure: Non-uniform, $m = 5$ and $\varepsilon = 10^{-8}$

Figure: Uniform, $m = 10$ and $\varepsilon = 10^{-8}$

Figure: Non-uniform, $m = 10$ and $\varepsilon = 10^{-8}$

Figure: Uniform, $m = 20$ and $\varepsilon = 10^{-8}$

Figure: Non-uniform, $m = 20$ and $\varepsilon = 10^{-8}$

Figure: Uniform, $m = 40$ and $\varepsilon = 10^{-8}$

Figure: Non-uniform, $m = 40$ and $\varepsilon = 10^{-8}$

Thank you for your attention!
