Optimization on manifolds and data processing
Rodolphe Sepulchre
Department of Electrical Engineering and Computer Science
University of Liège, Belgium
Collaborators: P.A. Absil (U Louvain and U Cambridge)
Robert Mahony (Australian National U)
Michel Journée (U Liège)
Andrew Teschendorff (U Cambridge)
Principal manifolds workshop – Leicester – August 2006
Algorithms on manifolds
Principal manifolds: lines (or surfaces) passing through the middle of the data distribution.
Question: how to define and compute such things when the data are not points in R^n but points on abstract manifolds?
Motivation: SYMMETRY. In many problems, data represent geometric objects that are invariant under certain transformations.
A three-step approach
An optimization-based formulation of the computational problem
Generalization of optimization algorithms on abstract manifolds
Exploit flexibility and additional structure to build numerically efficient algorithms
Optimization Algorithms on Matrix Manifolds, book in preparation. P.-A. Absil, R. Mahony, R. Sepulchre.
Applications
Eigenvalue problems (invariant subspace calculation, PCA, SVD, ...)
Statistical problems (matrix approximations, ICA, ...)
Pose estimation and motion recovery
. . .
Outline
Part I: a quick illustration of the three steps
Part II: ICA and gene expression data analysis
Eigenvalue problems as optimization
Let A be an n × n symmetric matrix. Find an eigenvalue λ ∈ R and an eigenvector y ∈ R^n such that Ay = λy.
FACT: Eigenvectors are critical points of the Rayleigh quotient

f : R^n_* → R : f(y) = (y^T A y) / (y^T y)

The global minimum is attained at the leftmost eigenvector, i.e., the one associated with the smallest eigenvalue.
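This fact is easy to check numerically. The following sketch (a random 5 × 5 symmetric matrix, NumPy; illustrative, not part of the original slides) verifies that the Rayleigh-quotient gradient vanishes at every eigenvector and that the minimum value is the smallest eigenvalue.

```python
import numpy as np

# Numerical check: eigenvectors are critical points of
# f(y) = (y' A y) / (y' y), and min f equals the smallest eigenvalue.
rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = (M + M.T) / 2  # symmetric

def rayleigh(y):
    return (y @ A @ y) / (y @ y)

def rayleigh_grad(y):
    # grad f(y) = 2/(y'y) * (A y - f(y) y)
    return 2.0 / (y @ y) * (A @ y - rayleigh(y) * y)

eigvals, eigvecs = np.linalg.eigh(A)  # eigenvalues in ascending order
v_min = eigvecs[:, 0]                 # leftmost eigenvector

for i in range(n):
    assert np.allclose(rayleigh_grad(eigvecs[:, i]), 0.0)  # critical points
assert np.isclose(rayleigh(v_min), eigvals[0])             # minimum value
```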
Manifolds associated to eigenvectors
SYMMETRY: f(μy) = f(y) for all μ ∈ R_*
⇒ critical points are not isolated in R^n.
REMEDY: impose a normalization constraint ‖y‖ = 1
⇒ Optimization on the sphere S^{n−1},
or treat the equivalence class yR_* as one point of the projective space P^{n−1} = {yR_* : y ∈ R^n_*}
⇒ Optimization on the projective space P^{n−1}.
Generalized eigenvalue problems
Let A, B be n × n symmetric matrices with B positive definite. Find (λ, y) such that Ay = λBy.
The cost function is now defined over the full-rank n × p matrices:

f(Y) = trace((Y^T A Y)(Y^T B Y)^{-1})

Y_* is a global minimizer of f iff the columns of Y_* span the leftmost p-dimensional invariant subspace of B^{-1}A.
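A small numerical check of this cost on random test matrices (illustrative only): at a basis of the leftmost invariant subspace the cost equals the sum of the p smallest generalized eigenvalues, and f(YM) = f(Y) for any invertible M.

```python
import numpy as np

# Check f(Y) = trace((Y'AY)(Y'BY)^{-1}) on random matrices:
# its value at the leftmost p-dimensional invariant subspace is the sum
# of the p smallest generalized eigenvalues, and f(Y M) = f(Y).
rng = np.random.default_rng(1)
n, p = 6, 2
M0 = rng.standard_normal((n, n))
A = (M0 + M0.T) / 2           # symmetric
C = rng.standard_normal((n, n))
B = C @ C.T + n * np.eye(n)   # symmetric positive definite

def f(Y):
    return np.trace((Y.T @ A @ Y) @ np.linalg.inv(Y.T @ B @ Y))

# Generalized eigenpairs of (A, B) via B^{-1} A (real since A = A', B > 0)
w, V = np.linalg.eig(np.linalg.solve(B, A))
order = np.argsort(w.real)
w, V = w.real[order], V.real[:, order]

Y_star = V[:, :p]                  # basis of the leftmost invariant subspace
Mp = rng.standard_normal((p, p))   # a generic (invertible) p x p matrix

assert np.isclose(f(Y_star), w[:p].sum())    # minimum value of the cost
assert np.isclose(f(Y_star @ Mp), f(Y_star)) # symmetry f(YM) = f(Y)
```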
Manifolds for invariant subspaces
SYMMETRY: f(Y M) = f(Y) for every full-rank p × p matrix M
⇒ critical points are not isolated in R^{n×p}.
REMEDY: impose the normalization constraint Y^T Y = I_p
⇒ Optimization on the Stiefel manifold St(p, n),
or treat the equivalence class Y GL(p) as one point of the Grassmann manifold Gr(p, n) of p-dimensional subspaces of R^n.
Important matrix manifolds
S^{n−1} and St(p, n) are examples of embedded manifolds in vector spaces.
P^{n−1} and Gr(p, n) are examples of quotient manifolds of vector spaces.
The linear structure of the total vector space is very helpful for computations!
A three-step approach
An optimization-based formulation of the computational problem
Generalization of optimization algorithms on abstract manifolds
Exploit flexibility and additional structure to build numerically efficient algorithms
How different is an algorithm in a vector space and on a manifold?
Illustration: line-search algorithm
Line search in a vector space
x_{k+1} = x_k + t_k η_k

The vector η_k is a search direction; the scalar t_k dictates the step length.
≈ a discretized version of the continuous-time gradient descent flow

ẋ = −grad f(x)
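A minimal sketch of such a line search on a convex quadratic (NumPy, exact step length along the steepest-descent direction; the matrix and sizes are made up for illustration):

```python
import numpy as np

# Line search in a vector space on the quadratic f(x) = (1/2) x' A x
# with A symmetric positive definite.  Each step is
# x_{k+1} = x_k + t_k * eta_k, with eta_k the steepest-descent direction
# and t_k the exact minimizer along that line (available for quadratics).
rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)  # symmetric positive definite

f = lambda x: 0.5 * x @ A @ x

x = rng.standard_normal(n)
for _ in range(500):
    g = A @ x                  # grad f(x)
    if g @ g < 1e-20:          # stop once the gradient is negligible
        break
    eta = -g                   # search direction eta_k
    t = (g @ g) / (g @ A @ g)  # exact step length t_k along eta_k
    x = x + t * eta            # x_{k+1} = x_k + t_k * eta_k

assert f(x) < 1e-12  # the iterates approach the minimizer x = 0
```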
Line search on a manifold
Let M be an abstract Riemannian manifold.

x_{k+1} = Exp_{x_k}(t_k ξ) = γ(t_k; x_k, ξ)

Start at x_k; choose a direction ξ in the tangent space T_{x_k}M; follow the geodesic through x_k tangent to ξ for t_k units. (Luenberger, 73; Gabay, 82.)
Conceptually elegant and useful; numerically impractical.
Optimization on manifolds
Newton method (Smith 93, Mahony 94)
Conjugate gradients (Edelman 96)
Trust region method (Absil et al. 04)
. . .
Translation of the corresponding vector-space algorithms, together with their convergence theory.
A three-step approach
An optimization-based formulation of the computational problem
Generalization of optimization algorithms on abstract manifolds
Exploit flexibility and additional structure to build numerically efficient algorithms
Does this approach lead to competitive numerical algorithms?
Illustration: line-search algorithm
Retractions
x_{k+1} = R_{x_k}(t_k ξ)

The convergence theory of line-search methods still holds if the exponential mapping is replaced by ANY mapping R : TM → M satisfying R_x(0_x) = x and DR_x(0_x) = id_{T_xM}.
Examples of retractions
Use the linear structure of the total space:
On S^{n−1}: R_x(ξ) = (x + ξ) / ‖x + ξ‖
On Gr(p, n): R_{span(Y)}(ξ) = span(Y + ξ_Y), with ξ_Y the horizontal lift of ξ
Good retractions may turn the algorithm into a numericallyefficient procedure.
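The sphere retraction can be plugged into a gradient line search for the Rayleigh quotient. The sketch below uses a fixed step size and a random symmetric matrix (an illustration under invented data, not a tuned solver):

```python
import numpy as np

# Riemannian line search on the sphere S^{n-1} for the Rayleigh quotient,
# with the retraction R_x(xi) = (x + xi) / ||x + xi|| in place of the
# exponential map.
rng = np.random.default_rng(3)
n = 6
M = rng.standard_normal((n, n))
A = (M + M.T) / 2  # symmetric

x = rng.standard_normal(n)
x /= np.linalg.norm(x)

for _ in range(3000):
    egrad = 2 * A @ x                      # Euclidean gradient of x'Ax
    rgrad = egrad - (x @ egrad) * x        # project onto the tangent space T_x S
    xi = -0.1 * rgrad                      # step t_k * eta_k in T_x S
    x = (x + xi) / np.linalg.norm(x + xi)  # retract back onto the sphere

lam_min = np.linalg.eigvalsh(A).min()
assert np.isclose(x @ A @ x, lam_min, atol=1e-4)  # reaches the leftmost eigenvector
```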
State of the art
Brute-force trust-region algorithms applied to the Rayleigh quotient cost on Gr(p, n) (Absil et al., 04) compete with the best available numerical algorithms for large-scale problems.
Some benefits of the approach
A solid framework for convergence analysis;
A geometric interpretation of existing heuristics;
Sometimes, new and competitive algorithms.
More in Optimization Algorithms on Matrix Manifolds, Princeton University Press, 2007. P.-A. Absil, R. Mahony, R. Sepulchre.
Extracting Independent Components of Gene Expression Data
Michel Journée, Rodolphe Sepulchre, Pierre-Antoine Absil
Department of Electrical Engineering and Computer Science
University of Liège, Belgium
Workshop on Principal Manifolds, Leicester, August 2006
Independent Component Analysis
• Blind source separation based on the statistical independence of the sources.
• It assumes a linear, instantaneous and noisy mixture of sources,
x = Hs + v, H ∈ Rn×p.
➠ Given the observations x, identify the mixing matrix H and the independent sources s.
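A toy instance of this model (sources, mixing matrix and sizes all invented for the demo), separated by a one-unit fixed-point iteration in the FastICA style after whitening:

```python
import numpy as np

# The ICA model x = H s + v on synthetic data, separated by a one-unit
# fixed-point iteration (FastICA-style, tanh nonlinearity) after whitening.
rng = np.random.default_rng(4)
T = 20000
s = np.vstack([np.sign(rng.standard_normal(T)),  # sub-Gaussian source
               rng.laplace(size=T)])             # super-Gaussian source
H = rng.standard_normal((2, 2))
x = H @ s + 0.01 * rng.standard_normal((2, T))   # noisy linear mixture

# Prewhitening: z has (approximately) identity covariance
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
z = (E / np.sqrt(d)).T @ x

# One-unit fixed-point iteration for a single independent component
w = rng.standard_normal(2)
w /= np.linalg.norm(w)
for _ in range(100):
    wz = w @ z
    w = (z * np.tanh(wz)).mean(axis=1) - (1 - np.tanh(wz) ** 2).mean() * w
    w /= np.linalg.norm(w)

# The recovered component matches one source up to sign and scale
y = w @ z
best_corr = max(abs(np.corrcoef(y, s[0])[0, 1]), abs(np.corrcoef(y, s[1])[0, 1]))
assert best_corr > 0.9
```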
Outline
• ICA algorithms are optimization algorithms on manifolds.
• The application of ICA to gene expression data raises central issues. (Cost function, manifold, optimization algorithm?)
The basic ICA algorithm
1. Assume a linear demixing model: z = W^T x, W ∈ R^{n×p}.
2. Measure the statistical independence of the estimated sources zi (⇒ contrast).
3. Select the W ∗ that maximizes that measure.
➠ Two main features: the contrast and the optimization algorithm.
The contrast
• Definition:
A function γ(·) : W ∈ M → γ(W) ∈ R that measures the statistical independence of the z_i.
• Different types of contrast:
➠ Based on mutual information (MI is zero at independence and strictly positive otherwise).
➠ Diagonalization of the rth-order cumulant tensor (usually r=4).
➠ Joint approximate diagonalization of a set of matrices (SOBI, JADE, etc.).
➠ The constrained covariance: sup_{f,g} cov(f(z_1), g(z_2)).
➠ ...
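A toy illustration of a fourth-order (kurtosis-based) contrast: for a whitened two-source mixture, the rotation angle maximizing the sum of squared kurtoses approximately undoes the mixing rotation. Sources, angles and sizes are invented for the demo.

```python
import numpy as np

# A fourth-order contrast on 2x2 rotations: for whitened x,
# gamma(W) = kurt(z_1)^2 + kurt(z_2)^2 with z = W' x peaks when the
# rotation undoes the mixing.
rng = np.random.default_rng(5)
T = 50000
s = np.vstack([rng.uniform(-1, 1, T),  # sub-Gaussian source
               rng.laplace(size=T)])   # super-Gaussian source
s = (s - s.mean(axis=1, keepdims=True)) / s.std(axis=1, keepdims=True)

def kurt(u):
    return (u ** 4).mean() - 3 * (u ** 2).mean() ** 2

def rot(a):
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

def gamma(theta, x):
    z = rot(theta).T @ x
    return kurt(z[0]) ** 2 + kurt(z[1]) ** 2

phi = 0.7           # mixing rotation (white data stay white under rotations)
x = rot(phi) @ s

thetas = np.linspace(0.0, np.pi / 2, 500)
best = thetas[np.argmax([gamma(th, x) for th in thetas])]
assert abs(best - phi) < 0.05  # the contrast peaks at the demixing angle
```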
The optimization algorithm
• Optimization on a matrix manifold: W∗ = argmax_{W ∈ M} γ(W).
• Which manifold M ?
Inherent symmetries of ICA:
➠ Continuous symmetry: W ∼ WΛ, with Λ an invertible diagonal matrix.
➠ Discrete symmetry: W ∼ WP , with P a permutation matrix.
Choice of a manifold
• Optimization on the orthogonal group:
O_p = {Y ∈ R^{p×p} : Y^T Y = I_p}.
➠ Jacobi algorithms (JADE, SOBI, RADICAL), KernelICA.
• Optimization on the orthogonal Stiefel manifold:
St(n, p) = {Y ∈ R^{n×p} : Y^T Y = I_p}.
➠ FastICA (one-unit algorithm used in a deflation scheme).
• Optimization on the oblique manifold [P.-A. Absil and K.A. Gallivan, 2006]:
OB(n, p) = {Y ∈ R^{n×p} : diag(Y^T Y) = I_p}.
➠ Trust region optimization.
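The defining constraints of these three manifolds can be sketched with random representative points (illustrative; not any particular library's API):

```python
import numpy as np

# Random representative points on the three manifolds above, checking
# their defining constraints.
rng = np.random.default_rng(6)
n, p = 5, 3

# Orthogonal group O_p: square matrices with orthonormal columns
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
assert np.allclose(Q.T @ Q, np.eye(p))

# Orthogonal Stiefel manifold St(n, p): n x p, orthonormal columns
Y, _ = np.linalg.qr(rng.standard_normal((n, p)))  # reduced QR gives n x p
assert Y.shape == (n, p) and np.allclose(Y.T @ Y, np.eye(p))

# Oblique manifold OB(n, p): n x p with unit-norm columns only
Z = rng.standard_normal((n, p))
Z = Z / np.linalg.norm(Z, axis=0)        # normalize each column
assert np.allclose(np.diag(Z.T @ Z), 1)  # diag(Z'Z) = I_p
# (columns of Z need not be mutually orthogonal)
```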
Prewhitening in ICA
• ICA is usually used in conjunction with PCA.
[block diagram: x → PCA → ICA → z]
• Motivations for prewhitening:
➠ Good conditioning of the ICA problem.
➠ Reduction of the dimension of the ICA problem.
➠ Restriction of the ICA optimization to the orthogonal Stiefel manifold (prewhitening-based algorithms).
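A prewhitening sketch (all sizes and data invented; real microarray matrices are far larger): project the observations onto the top p principal components and rescale to identity covariance.

```python
import numpy as np

# Prewhitening: project observations onto the top p principal components
# and rescale so the reduced data have identity covariance.
rng = np.random.default_rng(7)
n, p, T = 10, 3, 5000
H = rng.standard_normal((n, p))
s = rng.laplace(size=(p, T))
x = H @ s + 0.05 * rng.standard_normal((n, T))  # n-dimensional observations

xc = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(xc))    # eigendecomposition of the covariance
top = np.argsort(d)[::-1][:p]        # p leading principal directions
W_pca = E[:, top] / np.sqrt(d[top])  # whitening matrix, n x p
z = W_pca.T @ xc                     # reduced, whitened data, p x T

assert z.shape == (p, T)
assert np.allclose(np.cov(z), np.eye(p))  # identity covariance after whitening
```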
Discussion about prewhitening
• The prewhitening step is biased in the presence of noise and outliers.
Optimization on orthogonal manifolds is not able to compensate for these errors.
Optimization on non-orthogonal manifolds is more accurate.
• Optimization algorithms on orthogonal manifolds are usually better conditioned.
Optimization on non-orthogonal manifolds might be less robust.
• The compromise between performance and robustness is rarely discussed in the literature, especially for high-dimensional problems.
Outline
• ICA algorithms are optimization algorithms on manifolds.
• The application of ICA to gene expression data raises central issues. (Cost function, manifold, optimization algorithm?)
What are gene expression data?
• Gene expression denotes the relevance of a specific gene to the biological functions to be fulfilled in the cell.
• DNA microarrays are intensively used in biochemistry and biomedicine to estimate gene expression levels.
• They provide a huge amount of data (typically ∼10,000 genes and ∼100 experiments).
➠ Dimensionality reduction methods are needed for the analysis of these data.
Dimensionality reduction by ICA: Motivation
• Each biological function relies on a subset of genes (expression mode).
• Gene expression levels result from several biological processes that take place independently.
• Gene expression is assumed to be a linear function of the expression modes.
➠ Independence and linearity are the basic requisites for ICA.¹

¹ First application of ICA to microarrays: W. Liebermeister, Linear modes of gene expression determined by independent component analysis, Bioinformatics 18 (2002), 51–60.
ICA for the analysis of gene expression data
[diagram: X = H · S — the gene-expression data matrix X (genes × experiments, ∼10⁴ × ∼10²) factors into H (genes × expression modes) times S (expression modes × experiments)]
Preliminary results
• Application of standard ICA algorithms to breast cancer databases.²
• Performance: ICA seems to outperform PCA in relating expression modes to biological pathways (i.e., groups of genes that participate together when a certain biological function is required).

² In collaboration with A.E. Teschendorff, Department of Oncology, University of Cambridge.
Challenges
Standard ICA algorithms are not well adapted to gene expression data (i.e., few experiments, many observations, lots of outliers and noise).
➠ New algorithmic developments are needed, i.e., cost functions, manifolds and optimization algorithms specially dedicated to this kind of data set.
Conclusion
• ICA performs dimensionality reduction by assuming that the observations arise from several independent sources.
• ICA algorithms are optimization-based algorithms on manifolds.
• ICA seems promising for the analysis of microarrays but raises central robustness and performance issues.