The steepest descent method for computing Riemannian center of mass on Hadamard manifolds
Joao Xavier da Cruz Neto∗
Federal University of Piauí (UFPI)
Workshop on Optimization on Manifolds
Joint work with: G.C. Bento (UFG), J.C.O. Souza (UFPI), P.R. Oliveira (UFRJ) and S.D. Bitar (UFAM)
August 9, 2019
J.X. Cruz Neto, Chemnitz, Germany, August 9, 2019
Summary
1 Influence of the Curvature in the Convergence of Algorithms
2 Influence of the Kurdyka-Łojasiewicz property in the Convergence of Algorithms
3 Riemannian center of mass on Riemannian manifolds
4 Numerical experiments
Steepest descent method (SDM)
We recall the steepest descent method for solving the minimization problem
$$\min_{x \in M} f(x), \tag{1}$$
where $f : M \to \mathbb{R}$ is continuously differentiable and its gradient is Lipschitz continuous with constant $L > 0$. The steepest descent method generates a sequence as follows:
Steepest descent method
Algorithm 1 (Steepest Descent Method)
Initialization: Choose $x_0 \in M$;
Stopping rule: Given $x_k$, if $x_k$ is a critical point of $f$, then set $x_{k+p} = x_k$ for all $p \in \mathbb{N}$. Otherwise, compute the iterative step;
Iterative step: Take as the next iterate any $x_{k+1} \in M$ such that
$$x_{k+1} = \exp_{x_k}\left(-t_k \operatorname{grad} f(x_k)\right), \tag{2}$$
where $t_k$ is some positive stepsize.
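As a minimal sketch of Algorithm 1, the loop below takes the gradient, the exponential map and the stepsize as callables. The function name `steepest_descent`, the tolerance `tol` and the Euclidean test problem are assumptions made for illustration only; they are not taken from the talk.

```python
import numpy as np

def steepest_descent(x0, grad, exp, stepsize, tol=1e-10, max_iter=1000):
    """Iterate x_{k+1} = exp_{x_k}(-t_k grad f(x_k)), as in (2)."""
    x = x0
    for k in range(max_iter):
        g = grad(x)
        # Stopping rule: x_k is (numerically) a critical point.
        # Assumption: the Euclidean norm is the right norm on T_x M here.
        if np.linalg.norm(g) <= tol:
            break
        x = exp(x, -stepsize(k) * g)
    return x

# Illustration on M = R^2, a flat Hadamard manifold where exp_x(v) = x + v.
target = np.array([1.0, -2.0])
x_star = steepest_descent(
    x0=np.zeros(2),
    grad=lambda x: x - target,      # f(x) = 0.5 * ||x - target||^2, so L = 1
    exp=lambda x, v: x + v,
    stepsize=lambda k: 0.5,         # constant stepsize below 2 / L
)
```

On a curved manifold only `exp`, `grad` and the norm used in the stopping test would change; the loop itself stays the same.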
Steepest descent method
We consider the following two stepsize rules.
Fixed stepsize rule: Given $\delta_1, \delta_2 > 0$ such that $L\delta_1 + \delta_2 < 1$, where $L$ is the Lipschitz constant associated with the gradient map of $f$, take $\{t_k\}$ such that
$$t_k \in \left(\delta_1, \frac{2}{L}(1 - \delta_2)\right), \quad \forall k \ge 0. \tag{3}$$
A sequence $\{t_k\}$ as in (3) is said to follow the fixed stepsize rule.
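The condition $L\delta_1 + \delta_2 < 1$ gives $\delta_1 < (1 - \delta_2)/L$, so the interval in (3) is never empty. A small sketch of how admissible stepsizes could be picked follows; the helper name `fixed_stepsize_interval` and the numerical values are hypothetical.

```python
def fixed_stepsize_interval(L, delta1, delta2):
    """Return the open interval (delta1, 2 (1 - delta2) / L) from rule (3)."""
    if not (L > 0 and delta1 > 0 and delta2 > 0):
        raise ValueError("L, delta1 and delta2 must be positive")
    if L * delta1 + delta2 >= 1:
        raise ValueError("rule (3) requires L * delta1 + delta2 < 1")
    return delta1, 2.0 * (1.0 - delta2) / L

# Illustrative values only: L = 4, delta1 = 0.1, delta2 = 0.2.
lo, hi = fixed_stepsize_interval(L=4.0, delta1=0.1, delta2=0.2)
t = 0.5 * (lo + hi)   # any t_k strictly between lo and hi is admissible
```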
Steepest descent method
Let $\{t_k\}$ be the sequence obtained by
$$t_k := \max\left\{ 2^{-j} : j \in \mathbb{N},\ f\left(\exp_{x_k}\left(-2^{-j} \operatorname{grad} f(x_k)\right)\right) \le f(x_k) - \alpha 2^{-j} \left\|\operatorname{grad} f(x_k)\right\|^2 \right\},$$
with $\alpha \in (0, 1)$. This stepsize rule is the so-called Armijo search.
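The Armijo search can be sketched as a backtracking loop that halves a trial stepsize until the sufficient decrease test holds. The sketch assumes that $j$ starts at 0, that tangent vectors are stored as NumPy arrays with the Euclidean inner product, and that the cap `max_halvings` is acceptable; all names are hypothetical.

```python
import numpy as np

def armijo_stepsize(f, exp, x, g, alpha=0.25, max_halvings=60):
    """Largest t = 2^{-j}, j = 0, 1, ..., such that
    f(exp_x(-t g)) <= f(x) - alpha * t * ||g||^2, where g = grad f(x)."""
    fx = f(x)
    g_norm_sq = float(np.dot(g, g))
    t = 1.0
    for _ in range(max_halvings):
        if f(exp(x, -t * g)) <= fx - alpha * t * g_norm_sq:
            return t
        t *= 0.5
    raise RuntimeError("no admissible stepsize found")

# Illustration on R^2 with f(x) = 2 ||x||^2, whose gradient is 4 x.
f = lambda x: 2.0 * float(np.dot(x, x))
x = np.array([1.0, 0.0])
t = armijo_stepsize(f, lambda p, v: p + v, x, 4.0 * x)
```

In this example the trial values 1 and 1/2 fail the test and 1/4 is accepted.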
Full convergence results are obtained in:
Cruz Neto, J.X., Lima, L.L. and Oliveira, P.R., Geodesic algorithms in Riemannian geometry. Balkan J. Geom. Appl., 3 (1998), pp. 89-100.
The authors assume that $M$ has nonnegative sectional curvature and $f$ is a convex function.
Proximal Point Method
For a starting point $x_0 \in M$, the proximal point method for solving the optimization problem (1) generates a sequence $\{x_k\} \subset M$ as follows:
$$x_{k+1} \in \operatorname{argmin}_{y \in M} \left\{ f(y) + \frac{\lambda_k}{2} d^2(y, x_k) \right\}, \tag{4}$$
where $\{\lambda_k\}$ is a sequence of positive numbers such that $0 < a \le \lambda_k \le b$.
Ferreira, O.P., Oliveira, P.R.: Proximal point algorithm on Riemannian manifolds. Optimization 51, 257-270 (2002)
The authors assume that $M$ is a Hadamard manifold and $f$ is a convex function.
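To make iteration (4) concrete, here is a sketch on the simplest Hadamard manifold, the real line, with the convex function $f(y) = |y - a|$. In this special case the subproblem has a closed-form solution (soft thresholding with threshold $1/\lambda_k$). The function names and the constant choice $\lambda_k = 2$ are assumptions for illustration.

```python
def prox_abs(x, a, lam):
    """Minimizer over y in R of |y - a| + (lam / 2) * (y - x)^2."""
    z = x - a
    shrink = max(abs(z) - 1.0 / lam, 0.0)   # soft thresholding of x - a
    return a + (shrink if z >= 0 else -shrink)

def proximal_point(x0, a, lam, num_iter):
    """Run iteration (4) with constant lambda_k = lam and return all iterates."""
    xs = [x0]
    for _ in range(num_iter):
        xs.append(prox_abs(xs[-1], a, lam))
    return xs

iterates = proximal_point(x0=5.0, a=1.0, lam=2.0, num_iter=10)
```

Each step moves the iterate by $1/\lambda_k = 0.5$ toward $a$ until the minimizer $a = 1$ is reached, after which the sequence is stationary.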
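Kurdyka-Łojasiewicz property
Definition
A proper and lower semicontinuous function $f : M \to \mathbb{R} \cup \{+\infty\}$ is said to satisfy the Kurdyka-Łojasiewicz property at $\bar{x} \in \operatorname{dom} \partial f$ if there exist $\eta \in \,]0, +\infty]$, a neighbourhood $U$ of $\bar{x}$, and a continuous concave function $\varphi : [0, \eta[ \to \mathbb{R}_+$ such that
$$\varphi(0) = 0, \quad \varphi \in C^1(0, \eta), \quad \varphi'(s) > 0, \ s \in \,]0, \eta[;$$
$$\varphi'\left(f(x) - f(\bar{x})\right) \operatorname{dist}(0, \partial f(x)) \ge 1, \quad x \in U \cap [f(\bar{x}) < f < f(\bar{x}) + \eta],$$
where $[f(\bar{x}) < f < f(\bar{x}) + \eta]$ denotes the set $\{x \in M : f(\bar{x}) < f(x) < f(\bar{x}) + \eta\}$.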
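A short worked example may help to read the definition. Take $M = \mathbb{R}$, $f(x) = \tfrac{1}{2}x^2$ and the critical point $\bar{x} = 0$; the desingularizing function $\varphi(s) = \sqrt{2s}$ below is one possible choice, picked here for illustration and not taken from the talk.

```latex
\begin{aligned}
&f(x) = \tfrac{1}{2}x^2, \quad \bar{x} = 0, \quad \varphi(s) = \sqrt{2s}, \quad \eta = +\infty, \quad U = \mathbb{R}, \\
&\varphi(0) = 0, \qquad \varphi'(s) = \frac{1}{\sqrt{2s}} > 0 \ \text{ for } s > 0, \qquad \varphi \text{ concave on } [0, +\infty[, \\
&\varphi'\big(f(x) - f(\bar{x})\big)\, \operatorname{dist}(0, \partial f(x)) = \frac{1}{\sqrt{x^2}}\, |x| = 1 \ge 1 \quad \text{for all } x \ne 0.
\end{aligned}
```

So the inequality holds with equality on the whole set $[0 < f < +\infty] = \mathbb{R} \setminus \{0\}$.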
Analytic manifolds
For each $x \in M$, the distance function on $M$ with base point $x$ is defined by $d_x := d(x, \cdot)$.
Theorem
Let $M$ be a finite dimensional, connected, complete, real analytic Riemannian manifold and $x \in M$. Then, $d_x$ is a subanalytic function.
Martin Tamm, Subanalytic sets in the calculus of variation, Acta Math. 146 (1981), no. 3-4, 167-199.
Therefore, $d_x$ satisfies the Kurdyka-Łojasiewicz property.
Stiefel manifolds
An important class of compact analytic manifolds whose sectional curvature may change sign is the Stiefel manifold $V_p(\mathbb{R}^n) := \{X \in \mathbb{R}^{n \times p} : X^T X = I_p\}$ of $n \times p$ matrices with orthonormal columns.
T. Rapcsak, Sectional curvatures in nonlinear optimization, J. Global Optim. 40 (2008), no. 1-3, 375-388.
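As a small illustration of the defining constraint $X^T X = I_p$, the sketch below produces a point of the Stiefel manifold from the reduced QR factorization of a random matrix and checks membership numerically. The helper names, the seed and the tolerance are assumptions.

```python
import numpy as np

def random_stiefel_point(n, p, seed=0):
    """Return an n x p matrix with orthonormal columns (requires n >= p)."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n, p)))   # reduced QR: q is n x p
    return q

def on_stiefel(x, tol=1e-10):
    """Check the constraint X^T X = I_p up to the tolerance tol."""
    p = x.shape[1]
    return bool(np.allclose(x.T @ x, np.eye(p), atol=tol))

X = random_stiefel_point(5, 2)
```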
Morse functions
Theorem
Let $M$ be a manifold and denote by $C^r(M, \mathbb{R})$ the set of all $C^r$ functions $g : M \to \mathbb{R}$. The collection of all Morse functions $f : M \to \mathbb{R}$ forms a dense and open set in $C^r(M, \mathbb{R})$, $2 \le r \le +\infty$.
See Hirsch, Theorem 1.2, page 147:
M. W. Hirsch. 1976. Differential Topology. Springer-Verlag, New York.
Theorem
If $f : M \to \mathbb{R}$ is a Morse function, then $f$ satisfies the Kurdyka-Łojasiewicz property.
Cruz Neto, J. X., Oliveira, P. R., Soares Junior, P. A., Soubeyran, A.: Learning how to play Nash and alternating minimization method for structured nonconvex problems on Riemannian manifolds. J. Convex Anal. 20, 395-438 (2013)
Proximal Point Method
Assume that $x_0 \in \operatorname{dom} f$, $x \in M$ is an accumulation point of the sequence $\{x_k\}$, and $f$ satisfies the Kurdyka-Łojasiewicz property at $x$. Then, $f(x_k) \to f(x)$ and the sequence $\{x_k\}$ converges to $x$, which is a critical point of $f$.
G. C. Bento, J. X. Cruz Neto, and P. R. Oliveira, A new approach to the proximal point method: convergence on general Riemannian manifolds, J. Optim. Theory Appl. 168 (2016), no. 3, 743-755.
Steepest Descent Method
Let $M$ be a Hadamard manifold, $x_0 \in \operatorname{dom} f$, $x^* \in M$ an accumulation point of the sequence $\{x_k\}$, and assume that $f$ satisfies the Kurdyka-Łojasiewicz property at $x^*$. Then, $\{x_k\}$ converges to $x^*$, which is a critical point of $f$.
Computing Riemannian center of mass on Riemannian manifolds
Consider the problem of computing the (global) Riemannian $L^p$ center of mass of the data set $\{a_i\}_{i=1}^n \subset M$ on a Riemannian manifold $M$ with respect to weights $0 \le w_i \le 1$ such that $\sum_{i=1}^n w_i = 1$. The Riemannian $L^p$ center of mass is defined as the solution set of the problem
$$\min_{x \in M} f_p(x) := \frac{1}{p} \sum_{i=1}^n w_i d^p(x, a_i), \tag{5}$$
for $1 \le p < \infty$. If $p = \infty$, the center of mass is defined as the set of minimizers of $\max_i d(x, a_i)$ in $M$.
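The objective in (5) only needs a distance function, so it can be sketched independently of the manifold. In the sketch below the name `lp_objective` and the callable argument `dist` are hypothetical, and the illustration uses the real line with $d(x, a) = |x - a|$.

```python
import math

def lp_objective(x, points, weights, p, dist):
    """f_p(x) = (1/p) * sum_i w_i d(x, a_i)^p; for p = inf, max_i d(x, a_i)."""
    if math.isinf(p):
        return max(dist(x, a) for a in points)
    return sum(w * dist(x, a) ** p for w, a in zip(weights, points)) / p

# Illustration on the real line with two equally weighted data points.
line_dist = lambda x, a: abs(x - a)
points, weights = [0.0, 2.0], [0.5, 0.5]
f1_mid = lp_objective(1.0, points, weights, 1, line_dist)
f2_mid = lp_objective(1.0, points, weights, 2, line_dist)
finf_mid = lp_objective(1.0, points, weights, math.inf, line_dist)
```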
Computing Riemannian center of mass on Riemannian manifolds
The problem of computing the Riemannian center of mass has been extensively studied in both theory and applications:
Kristaly, A., Morosanu, G., Roth, A.: Optimal placement of a deposit between markets: Riemannian-Finsler geometrical approach. J. Optim. Theory Appl. 139(2), 263-276 (2008)
Afsari, B., Tron, R., Vidal, R.: On the convergence of gradient descent for finding the Riemannian center of mass. SIAM J. Control Optim. 51, 2230-2260 (2013)
Bacak, M.: Computing medians and means in Hadamard spaces. SIAM J. Optim. 24, 1542-1566 (2014)
Bento, G. C., Bitar, S., Cruz Neto, J. X., Oliveira, P. R., Souza, J. C. O.: Computing Riemannian center of mass on Hadamard manifolds. J. Optim. Theory Appl. (to appear 2019)
Computing Riemannian center of mass on Hadamard manifolds
Proposition 1
Let $\{a_i\}_{i=1}^n \subset M$ be the data set and let $\gamma$ be a unit speed geodesic such that $\gamma(0) = x$, where $x \ne a_i$ for $i = 1, \dots, n$. Then, there exists a constant $\alpha \ge 0$ such that
$$\alpha \le \frac{d^2}{dt^2}(f_1 \circ \gamma)(t)\Big|_{t=0}.$$
Furthermore, if the points $a_1, \dots, a_n$ are not collinear, then $\alpha > 0$.
Recall that the points $a_1, \dots, a_n$ are said to be collinear if they lie on the same geodesic, i.e., there exist $y \in M$, $v \in T_yM$ and $t_i \in \mathbb{R}$, $i = 1, \dots, n$, such that $a_i = \exp_y(t_i v)$ for each $i = 1, \dots, n$.
Computing Riemannian center of mass on Hadamard manifolds
Theorem
Let $M$ be a simply connected, complete Riemannian manifold of nonpositive sectional curvature. Assume the points $P_i \in M$, $i = 1, \dots, n$, belong to a geodesic $\sigma : [0, 1] \to M$ such that $P_i = \sigma(t_i)$ with $0 \le t_1 \le \dots \le t_n \le 1$. Then:
1 the unique minimum point for $f_1$ is $P_{n/2}$ whenever $n$ is odd;
2 the minimum points for $f_1$ are situated on $\sigma$, between $P_{n/2}$ and $P_{n/2+1}$, whenever $n$ is even.
Kristaly, A., Morosanu, G., Roth, A.: Optimal placement of a deposit between markets: Riemannian-Finsler geometrical approach. J. Optim. Theory Appl. 139(2), 263-276 (2008)
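The flat case gives a quick sanity check of this statement: on the real line, which is a Hadamard manifold whose geodesics are segments, an odd number of equally weighted points has its unique $f_1$ minimizer at the middle data point. The data, the equal weights and the grid search below are illustrative assumptions and are no substitute for the proof.

```python
import numpy as np

def f1(x, points, weights):
    """f_1(x) = sum_i w_i |x - a_i| on the real line."""
    return float(np.sum(weights * np.abs(x - points)))

points = np.array([0.0, 1.0, 3.0, 7.0, 10.0])        # n = 5 collinear points
weights = np.full(points.size, 1.0 / points.size)    # equal weights
grid = np.linspace(0.0, 10.0, 1001)                  # step 0.01, contains 3.0
values = np.array([f1(x, points, weights) for x in grid])
best = grid[np.argmin(values)]                       # grid minimizer of f_1
```

Here `best` coincides with the middle point 3.0, which is also `np.median(points)`.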
Computing Riemannian center of mass on Hadamard manifolds
Proposition 2
Let $C$ be a compact set such that $a_i \notin C$ for each $i = 1, \dots, n$. Then, the vector field $\operatorname{grad} f_1 : M \to TM$ is Lipschitz continuous on $C$.
Computing Riemannian center of mass on Hadamard manifolds
Proposition 3
The following statements hold:
(a) The function $f_1(x) = \sum_{i=1}^n w_i d(x, a_i)$ is convex;
(b) Problem (5), for $p = 1$, always has a solution. Furthermore, if the points $a_1, \dots, a_n$ are not collinear, then the solution is unique;
(c) Let $i_0 \in \{1, \dots, n\}$ be an index such that $f_1(a_{i_0}) = \min_{i=1,\dots,n} f_1(a_i)$. Then, $a_{i_0}$ is a minimizer of $f_1$ on $M$ if and only if
$$\left\| \sum_{i=1,\, i \ne i_0}^n w_i \frac{\exp^{-1}_{a_{i_0}} a_i}{d(a_i, a_{i_0})} \right\| \le w_{i_0}.$$
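The test in item (c) is easy to evaluate once $\exp^{-1}$ is available. The sketch below assumes the Euclidean plane, where $\exp^{-1}_{a} b = b - a$, and pairwise distinct data points; the function name and the data are hypothetical.

```python
import numpy as np

def data_point_is_minimizer(points, weights, i0):
    """Evaluate the inequality of Proposition 3(c) at a_{i0} in Euclidean space."""
    a0 = points[i0]
    total = np.zeros_like(a0)
    for i, (a, w) in enumerate(zip(points, weights)):
        if i == i0:
            continue
        diff = a - a0                      # exp^{-1}_{a0}(a) in the flat case
        total += w * diff / np.linalg.norm(diff)
    return bool(np.linalg.norm(total) <= weights[i0])

# The origin plus four points placed symmetrically around it, equal weights.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
w = np.full(5, 0.2)
origin_ok = data_point_is_minimizer(pts, w, 0)   # unit vectors cancel out
corner_ok = data_point_is_minimizer(pts, w, 1)   # norm of the sum exceeds 0.2
```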
Computing Riemannian center of mass on Hadamard manifolds
Proposition 4
The following statements hold:
(a) The function $f_2(x) = \frac{1}{2} \sum_{i=1}^n w_i d^2(x, a_i)$ is strictly convex and continuously differentiable, with its gradient Lipschitz on compact sets;
(b) The problem of computing the Riemannian $L^2$ center of mass always has a unique solution.
Proposition 5
The function $f_2(x) = \frac{1}{2} \sum_{i=1}^n w_i d^2(x, a_i)$ satisfies the Kurdyka-Łojasiewicz inequality at every point of $M$.
Computing Riemannian center of mass on Hadamard manifolds
For any $x_0 \in M$, we have $x_k \in L_f(x_0)$ for all $k \in \mathbb{N}$, where $L_f(x_0) := \{x \in M : f(x) \le f(x_0)\}$ is a nonempty and compact set. We then consider a direction $d_q$ and $t_q$ small enough such that $f(\exp_{a_q} t_q d_q) < f(a_q)$, where $q$ denotes the index in $\{1, \dots, n\}$ such that $f(a_q) = \min_{i=1,\dots,n} f(a_i)$. Setting $x_0 := \exp_{a_q} t_q d_q$, we have that $a_i \notin L_f(x_0)$ for each $i = 1, \dots, n$.
Theorem 2
The sequence $\{x_k\}$ converges to the unique Riemannian $L^1$ center of mass of the data set $\{a_i\}_{i=1}^n$ as long as the points $a_i$, for $i = 1, \dots, n$, are not collinear.
Theorem 3
The sequence $\{x_k\}$ converges to the unique Riemannian $L^2$ center of mass of the data set $\{a_i\}_{i=1}^n$.
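The initialization described above can be sketched in the Euclidean plane, where $\exp_x(v) = x + v$ and $\operatorname{grad} f_1(x) = \sum_i w_i (x - a_i)/\|x - a_i\|$ away from the data points. The starting point is obtained by leaving the best data point $a_q$ along a descent direction, so the iterates never return to a data point. The function names, the Armijo parameter `alpha`, the tolerance and the test data are assumptions, and the sketch presumes pairwise distinct, non-collinear points.

```python
import numpy as np

def f1(x, points, weights):
    return float(np.sum(weights * np.linalg.norm(points - x, axis=1)))

def grad_f1(x, points, weights):
    diffs = x - points
    norms = np.linalg.norm(diffs, axis=1)
    return np.sum(weights[:, None] * diffs / norms[:, None], axis=0)

def l1_center(points, weights, alpha=0.25, tol=1e-6, max_iter=10_000):
    # q: index of the data point with the smallest objective value.
    q = int(np.argmin([f1(a, points, weights) for a in points]))
    others = np.arange(len(points)) != q
    g = grad_f1(points[q], points[others], weights[others])
    if np.linalg.norm(g) <= weights[q]:
        return points[q].copy()            # a_q already minimizes f_1
    d = -g / np.linalg.norm(g)             # descent direction d_q at a_q
    t = 1.0
    while f1(points[q] + t * d, points, weights) >= f1(points[q], points, weights):
        t *= 0.5                           # shrink t_q until f decreases
    x = points[q] + t * d                  # x_0 lies strictly below every f(a_i)
    for _ in range(max_iter):
        g = grad_f1(x, points, weights)
        if np.linalg.norm(g) <= tol:
            break
        t, fx = 1.0, f1(x, points, weights)
        while t > 1e-12 and f1(x - t * g, points, weights) > fx - alpha * t * (g @ g):
            t *= 0.5                       # Armijo backtracking
        x = x - t * g
    return x

corners = np.array([[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]])
center = l1_center(corners, np.full(4, 0.25))
```

For the four corners of a square with equal weights, symmetry places the $L^1$ center at the origin, which is what the iteration approaches.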
Numerical experiments
Let $M := (S^m_{++}, \langle \cdot, \cdot \rangle)$ be the Riemannian manifold endowed with the Riemannian metric induced by the Euclidean Hessian of $\Psi(X) = -\ln \det X$,
$$\langle U, V \rangle = \operatorname{tr}\left(V \Psi''(X) U\right) = \operatorname{tr}\left(V X^{-1} U X^{-1}\right), \quad X \in M, \ U, V \in T_XM,$$
where $S^m_{++}$ is the cone of $m \times m$ symmetric positive definite matrices.
In this case, for any $X, Y \in M$ the unique geodesic joining these two points is given by
$$\gamma(t) = X^{1/2} \left(X^{-1/2} Y X^{-1/2}\right)^t X^{1/2}, \quad t \in [0, 1].$$
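The geodesic formula can be evaluated with symmetric eigendecompositions only. The following is a minimal sketch; the helper names `spd_power` and `spd_geodesic` are hypothetical.

```python
import numpy as np

def spd_power(a, t):
    """A^t for a symmetric positive definite matrix A, via eigendecomposition."""
    w, v = np.linalg.eigh(0.5 * (a + a.T))   # symmetrize against rounding
    return (v * w ** t) @ v.T

def spd_geodesic(x, y, t):
    """gamma(t) = X^{1/2} (X^{-1/2} Y X^{-1/2})^t X^{1/2}."""
    x_half = spd_power(x, 0.5)
    x_inv_half = spd_power(x, -0.5)
    return x_half @ spd_power(x_inv_half @ y @ x_inv_half, t) @ x_half

X = np.array([[2.0, 1.0], [1.0, 2.0]])
Y = np.array([[3.0, 0.0], [0.0, 1.0]])
midpoint = spd_geodesic(np.eye(2), np.diag([4.0, 9.0]), 0.5)
```

With $X = I$ the geodesic reduces to $Y^t$, so the midpoint between $I$ and $\operatorname{diag}(4, 9)$ is $\operatorname{diag}(2, 3)$.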
Numerical experiments
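Thus, for each $X \in M$, $\exp_X : T_XM \to M$ and $\exp_X^{-1} : M \to T_XM$ are given, respectively, by
$$\exp_X V = X^{1/2} e^{X^{-1/2} V X^{-1/2}} X^{1/2}, \qquad \exp_X^{-1} Y = X^{1/2} \ln\left(X^{-1/2} Y X^{-1/2}\right) X^{1/2}.$$
The Riemannian distance satisfies
$$d^2(X, Y) = \operatorname{tr}\left(\ln^2\left(X^{-1/2} Y X^{-1/2}\right)\right) = \sum_{i=1}^m \ln^2 \lambda_i\left(X^{-1/2} Y X^{-1/2}\right), \tag{6}$$
where $\lambda_i(\cdot)$ denotes the $i$-th eigenvalue.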
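These three formulas translate directly into code based on symmetric eigendecompositions. The sketch below is one possible implementation; the helper names are hypothetical and no attempt is made at numerical robustness for badly conditioned inputs.

```python
import numpy as np

def _apply(a, fun):
    """Apply a scalar function to a symmetric matrix through its eigenvalues."""
    w, v = np.linalg.eigh(0.5 * (a + a.T))   # symmetrize against rounding
    return (v * fun(w)) @ v.T

def _roots(x):
    """Return X^{1/2} and X^{-1/2}."""
    return _apply(x, np.sqrt), _apply(x, lambda w: 1.0 / np.sqrt(w))

def spd_exp(x, v):
    """exp_X V = X^{1/2} exp(X^{-1/2} V X^{-1/2}) X^{1/2}."""
    xh, xih = _roots(x)
    return xh @ _apply(xih @ v @ xih, np.exp) @ xh

def spd_log(x, y):
    """exp_X^{-1} Y = X^{1/2} ln(X^{-1/2} Y X^{-1/2}) X^{1/2}."""
    xh, xih = _roots(x)
    return xh @ _apply(xih @ y @ xih, np.log) @ xh

def spd_dist(x, y):
    """d(X, Y) from (6): root of the sum of squared log-eigenvalues."""
    _, xih = _roots(x)
    m = xih @ y @ xih
    lam = np.linalg.eigvalsh(0.5 * (m + m.T))
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

X = np.array([[2.0, 1.0], [1.0, 2.0]])
Y = np.array([[3.0, 0.0], [0.0, 1.0]])
round_trip = spd_exp(X, spd_log(X, Y))       # should reproduce Y
```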
Numerical experiments
In our simulations, we consider different scenarios determined by three parameters: the number $n$ of matrices in the data set $\{Q_i\}_{i=1}^n$, the size $m \times m$ of the matrices and the stopping rule $\varepsilon > 0$. The random matrices used in our tests are generated with a uniform (well conditioned) and a non-uniform (ill conditioned) distribution of the eigenvalues of each matrix of the data set. The ill conditioned data set is generated as follows:
Hence, the non-uniform distribution satisfies $\lambda_{\max}/\lambda_{\min} > 10^2$, where $\lambda_{\max}$ and $\lambda_{\min}$ denote the largest and the smallest eigenvalues of each matrix of the data set, respectively.
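A compact sketch of how the steepest descent method with Armijo search could be run for the $L^2$ center of mass on $S^m_{++}$ is given below. It relies on the standard identity $\operatorname{grad} f_2(X) = -\sum_i w_i \exp_X^{-1} Q_i$. The function names, the starting point $Q_1$, the parameter `alpha` and the use of the gradient norm in the stopping test are assumptions; this is not the authors' experimental code.

```python
import numpy as np

def _apply(a, fun):
    w, v = np.linalg.eigh(0.5 * (a + a.T))
    return (v * fun(w)) @ v.T

def spd_exp(x, v):
    xh, xih = _apply(x, np.sqrt), _apply(x, lambda w: 1.0 / np.sqrt(w))
    return xh @ _apply(xih @ v @ xih, np.exp) @ xh

def spd_log(x, y):
    xh, xih = _apply(x, np.sqrt), _apply(x, lambda w: 1.0 / np.sqrt(w))
    return xh @ _apply(xih @ y @ xih, np.log) @ xh

def norm_sq(x, v):
    """<V, V>_X = tr(V X^{-1} V X^{-1})."""
    xi = np.linalg.inv(x)
    return float(np.trace(v @ xi @ v @ xi))

def f2(x, mats, weights):
    # d^2(X, Q) equals the squared norm of exp_X^{-1} Q in T_X M.
    return 0.5 * sum(w * norm_sq(x, spd_log(x, q)) for w, q in zip(weights, mats))

def l2_center(mats, weights, eps=1e-8, alpha=0.25, max_iter=500):
    x = mats[0]                                    # assumed starting point
    for _ in range(max_iter):
        g = -sum(w * spd_log(x, q) for w, q in zip(weights, mats))
        g2 = norm_sq(x, g)
        if np.sqrt(g2) <= eps:                     # assumed stopping test
            break
        t, fx = 1.0, f2(x, mats, weights)
        while t > 1e-12 and f2(spd_exp(x, -t * g), mats, weights) > fx - alpha * t * g2:
            t *= 0.5                               # Armijo backtracking
        x = spd_exp(x, -t * g)
    return x

Q = [np.diag([1.0, 4.0]), np.diag([4.0, 1.0])]
center = l2_center(Q, [0.5, 0.5])
```

For these two commuting matrices with equal weights the center is the geodesic midpoint, which is the entrywise geometric mean $\operatorname{diag}(2, 2)$.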
Numerical experiments
The next figures plot some results for $m \times m$ matrices ($m = 5, 10, 20, 40$) and different data set sizes ($n = 25, 50, 75, 100$).
Figure: Uniform, $m = 5$ and $\varepsilon = 10^{-8}$
Figure: Non-uniform, $m = 5$ and $\varepsilon = 10^{-8}$
Figure: Uniform, $m = 10$ and $\varepsilon = 10^{-8}$
Figure: Non-uniform, $m = 10$ and $\varepsilon = 10^{-8}$
Figure: Uniform, $m = 20$ and $\varepsilon = 10^{-8}$
Figure: Non-uniform, $m = 20$ and $\varepsilon = 10^{-8}$
Figure: Uniform, $m = 40$ and $\varepsilon = 10^{-8}$
Figure: Non-uniform, $m = 40$ and $\varepsilon = 10^{-8}$
Thank you for your attention!