MERL - A Mitsubishi Electric Research Laboratory
http://www.merl.com
Incremental singular value decomposition of
uncertain data with missing values
Matthew Brand
TR-2002-24 May 2002
Abstract
We introduce an incremental singular value decomposition (SVD) of incomplete data. The SVD
is developed as data arrives, and can handle arbitrary missing/untrusted values, correlated
uncertainty across rows or columns of the measurement matrix, and user priors. Since incomplete
data does not uniquely specify an SVD, the procedure selects one having minimal rank. For
a dense $p \times q$ matrix of low rank $r$, the incremental method has time complexity $O(pqr)$
and space complexity $O((p+q)r)$, better than highly optimized batch algorithms such as
MATLAB's svd(). In cases of missing data, it produces factorings of lower rank and residual than
batch SVD algorithms applied to standard missing-data imputations. We show applications in
computer vision and audio feature extraction. In computer vision, we use the incremental SVD to
develop an efficient and unusually robust subspace-estimating flow-based tracker, and to handle
occlusions/missing points in structure-from-motion factorizations.
First circulated spring 2001.
This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in
part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies
include the following: a notice that such copying is by permission of Mitsubishi Electric Information Technology Center America; an
acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying,
reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Information
Technology Center America. All rights reserved.
Copyright © Mitsubishi Electric Information Technology Center America, 2002
201 Broadway, Cambridge, Massachusetts 02139
Proceedings of the 2002 European Conference on Computer Vision (ECCV 2002, Copenhagen), Springer
Lecture Notes in Computer Science volume 2350.
Incremental singular value decomposition of uncertain data with
missing values
Matthew Brand
Mitsubishi Electric Research Labs, 201 Broadway, Cambridge, MA 02139, USA
To appear, Proc. European Conference on Computer Vision (ECCV), May 2002.
1 Introduction
Many natural phenomena can be faithfully modeled with multilinear functions, or closely approximated as
such. Examples include the combination of lighting and pose [20] and shape and motion [12,3] in image
formation, mixing of sources in acoustic recordings [6], and word associations in collections of documents
[1,23]. Multilinearity means that a matrix of such a phenomenon's measured effects can be factored into
low-rank matrices of (presumed) causes. The celebrated singular value decomposition (SVD) [8] provides a
bilinear factoring of a data matrix $M$,
\[
U_{p \times r}\,\operatorname{diag}(s_{r \times 1})\,V^\top_{r \times q} \;\xleftarrow{\ \mathrm{SVD}_r\ }\; M_{p \times q}, \qquad r \le \min(p,q) \tag{1}
\]
where $U$ and $V$ are unitary orthogonal matrices whose columns give a linear basis for $M$'s columns and rows,
respectively. For low-rank phenomena, $r_{\mathrm{true}} \ll \min(p,q)$, implying a parsimonious explanation of the data.
Since $r_{\mathrm{true}}$ is often unknown, it is common to wastefully compute a large $r_{\mathrm{approx}} > r_{\mathrm{true}}$ SVD and estimate an
appropriate smaller value $r_{\mathrm{empirical}}$ from the distribution of singular values in $s$. All but the $r_{\mathrm{empirical}}$ largest
singular values in $s$ are then zeroed to give a thin, truncated SVD that closely approximates the data. This
forms the basis of a broad range of algorithms for data analysis, dimensionality reduction, compression,
noise-suppression, and extrapolation.
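For concreteness, a thin truncated SVD of a complete matrix can be computed with standard batch tools; the following minimal numpy sketch (the synthetic data and the rank-selection threshold are illustrative assumptions, not part of the method above) zeroes all but the leading singular values:

```python
import numpy as np

# Synthetic low-rank data: M = A B plus a little noise.
rng = np.random.default_rng(0)
p, q, r_true = 100, 80, 5
M = rng.standard_normal((p, r_true)) @ rng.standard_normal((r_true, q))
M += 1e-3 * rng.standard_normal((p, q))

# Batch SVD, then estimate r_empirical from the singular value spectrum
# (a simple relative-magnitude heuristic, for illustration only).
U, s, Vt = np.linalg.svd(M, full_matrices=False)
r_emp = int(np.sum(s > 0.01 * s[0]))

# Thin truncated SVD: keep only the r_emp leading singular triplets.
U_r, s_r, Vt_r = U[:, :r_emp], s[:r_emp], Vt[:r_emp, :]
print(r_emp, np.linalg.norm(M - U_r @ np.diag(s_r) @ Vt_r))
```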
The SVD is usually computed by a batch $O(pq^2 + p^2q + q^3)$-time algorithm [8], meaning that all the data
must be processed at once, and SVDs of very large datasets are essentially infeasible. Lanczos methods yield
thin SVDs in $O(pqr^2)$ time [8], but $r_{\mathrm{true}}$ should be known in advance since Lanczos methods are known to be
inaccurate for the smaller singular values [1]. A more pressing problem is that the SVD requires complete data,
whereas in many experimental settings some parts of the measurement matrix may be missing, contaminated,
or otherwise untrusted. Consequently, a single missing value forces the modeler to discard an entire row or
column of the data matrix prior to the SVD. The missing value may be imputed from neighboring values, but
such imputations typically mislead the SVD away from the most parsimonious (low-rank) decompositions.
We consider how an SVD may be updated by adding rows and/or columns of data, which may be missing
values and/or contaminated with correlated (colored) noise. The size of the data matrix need not be known:
the SVD is developed as the data comes in and handles missing values in a manner that minimizes rank.
The resulting algorithms have better time and space complexity than full-data batch SVD methods and can
produce more informative results (more parsimonious factorings of incomplete data). In the case of dense
low-rank matrices, the time complexity is linear in the size and the rank of the data, $O(pqr)$, while the
space complexity is sublinear, $O((p+q)r)$.
2 Related work
SVD updating has a literature spread over three decades [5,4,1,10,7,23] and is generally based on Lanczos
methods, symmetric eigenvalue perturbations, or identities similar to equation 2 below. Zha and Simon [23]
use such an identity, but their update is approximate and requires a dense SVD. Chandrasekaran et al. [7]
begin similarly, but their update is limited to single vectors and is vulnerable to loss of orthogonality. Levy
and Lindenbaum [14] exploit the relationship between the QR-decomposition and the SVD to incrementally
compute the left singular vectors in $O(pqr^2)$ time; if $p$, $q$, and $r$ are known in advance and $p \ge q \gg r$, then
the expected complexity falls to $O(pqr)$. However, this method is also vulnerable to loss of orthogonality, and
results have only been reported for matrices having a few hundred columns.
None of this literature contemplates missing or uncertain values, except insofar as they can be treated
as zeros (e.g., [1]), which is arguably incorrect. In batch-SVD contexts, missing values are usually handled
via subspace imputation, using an expectation-maximization-like procedure: perform an SVD of all complete
columns, regress incomplete columns against the SVD to estimate missing values, then re-factor and re-impute
the completed data until a fixpoint is reached (e.g., [21]). This is extremely slow (quartic time) and only works
if very few values are missing. It has the further demerit that the imputation does not minimize effective rank.
Other heuristics simply fill missing values with row- or column-means [19].
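To make the imputation baseline concrete, here is a minimal sketch of such an EM-like loop, in the common variant that re-factors the whole filled-in matrix rather than regressing column by column; the function name, rank argument, and tolerances are our own illustrative choices:

```python
import numpy as np

def svd_impute(M, mask, r, n_iter=100, tol=1e-8):
    """EM-like subspace imputation: alternate a rank-r factoring of the
    filled matrix with re-estimation of the entries where mask is False."""
    X = np.where(mask, M, 0.0)                          # initial fill
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X_low = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]   # rank-r reconstruction
        X_new = np.where(mask, M, X_low)                # re-impute missing cells
        if np.linalg.norm(X_new - X) < tol:             # fixpoint reached
            return X_new
        X = X_new
    return X
```

Each iteration costs a full batch SVD, which is why the overall procedure is so slow.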
In the special case where a matrix $M$ is nearly dense, its normalized scatter matrix $\Gamma_{m,n} \doteq \langle M_{i,m} M_{i,n} \rangle_i$
may be fully dense due to fill-in. In that case $\Gamma$'s eigenvectors are $M$'s right singular vectors [13]. However,
this method does not lead to the left singular vectors, and it often does not work at all because $\Gamma$ is frequently
incomplete as well, with undefined eigenvectors.
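For a complete matrix, the scatter-matrix relationship is easy to check numerically; in this sketch (dimensions and names are illustrative) the eigenvectors of the scatter matrix match the right singular vectors up to sign:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((50, 8))

# Normalized scatter matrix: Gamma[m, n] = mean over i of M[i, m] * M[i, n].
Gamma = (M.T @ M) / M.shape[0]

# Its eigenvectors span the same directions as M's right singular vectors.
eigvals, eigvecs = np.linalg.eigh(Gamma)     # eigenvalues sorted ascending
_, _, Vt = np.linalg.svd(M, full_matrices=False)

# Leading eigenvector vs. leading right singular vector (equal up to sign).
print(abs(eigvecs[:, -1] @ Vt[0, :]))        # ~1.0
```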
3 Updating an SVD
We begin with an existing rank-$r$ SVD as in equation 1. We have a matrix $C_{p \times c}$ whose columns contain
additional multivariate measurements. Let $L \doteq U \backslash C = U^\top C$ be the projection of $C$ onto the orthogonal basis
$U$, also known as its eigen-coding. Let $H \doteq (I - UU^\top)C = C - UL$ be the component of $C$ orthogonal
to the subspace spanned by $U$. ($I$ is the identity matrix.) Finally, let $J$ be an orthogonal basis of $H$ and let
$K \doteq J \backslash H = J^\top H$ be the projection of $C$ onto the subspace orthogonal to $U$. For example, $JK \xleftarrow{\ \mathrm{QR}\ } H$ could be
a QR-decomposition of $H$. Consider the following identity:
\[
\begin{bmatrix} U & J \end{bmatrix}
\begin{bmatrix} \operatorname{diag}(s) & L \\ 0 & K \end{bmatrix}
\begin{bmatrix} V & 0 \\ 0 & I \end{bmatrix}^\top
=
\begin{bmatrix} U & (I - UU^\top)C/K \end{bmatrix}
\begin{bmatrix} \operatorname{diag}(s) & U^\top C \\ 0 & K \end{bmatrix}
\begin{bmatrix} V & 0 \\ 0 & I \end{bmatrix}^\top
=
\begin{bmatrix} U \operatorname{diag}(s) V^\top & C \end{bmatrix}
=
\begin{bmatrix} M & C \end{bmatrix}
\tag{2}
\]
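The identity can be verified numerically; the sketch below (with arbitrary illustrative dimensions) builds $L$, $H$, $J$, and $K$ as defined above and confirms that the product reproduces $[M\ \ C]$:

```python
import numpy as np

rng = np.random.default_rng(2)
p, q, r, c = 40, 30, 5, 3

# An existing rank-r SVD of M.
M = rng.standard_normal((p, r)) @ rng.standard_normal((r, q))
U, s, Vt = np.linalg.svd(M, full_matrices=False)
U, s, V = U[:, :r], s[:r], Vt[:r, :].T

# New columns C, eigen-coding L, orthogonal component H, and QR of H.
C = rng.standard_normal((p, c))
L = U.T @ C
H = C - U @ L
J, K = np.linalg.qr(H)

# Left-hand side of equation 2.
left = np.hstack([U, J])
Q = np.block([[np.diag(s), L], [np.zeros((c, r)), K]])
right = np.block([[V, np.zeros((q, c))], [np.zeros((c, r)), np.eye(c)]])
print(np.allclose(left @ Q @ right.T, np.hstack([M, C])))   # True
```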
Like an SVD, the left and right matrices in the product are unitary and orthogonal. The middle matrix, which
we denote $Q$, is diagonal with a $c$-column border. To update the SVD we must diagonalize $Q$. Let
\[
U' \operatorname{diag}(s') V'^\top \xleftarrow{\ \mathrm{SVD}\ } Q \tag{3}
\]
Fig. 1. A vector is decomposed into components within and orthogonal to an SVD-derived subspace. The
parallel component causes the singular vectors to be rotated (see figure 2), while the orthogonal component
increases the rank of the SVD.
\[
U'' \leftarrow \begin{bmatrix} U & J \end{bmatrix} U'; \qquad s'' \leftarrow s'; \qquad V'' \leftarrow \begin{bmatrix} V & 0 \\ 0 & I \end{bmatrix} V' \tag{4}
\]
Then the updated SVD is
\[
U'' \operatorname{diag}(s'') V''^\top = \begin{bmatrix} U \operatorname{diag}(s) V^\top & C \end{bmatrix} = \begin{bmatrix} M & C \end{bmatrix}. \tag{5}
\]
The whole update procedure takes $O((p+q)r^2 + pc^2)$ time¹, spent mostly in the subspace rotations of
equation 4. To add rows, one simply swaps $U$ for $V$ and $U'$ for $V'$.
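A minimal numpy sketch of this column update (equations 2 through 5) follows; the function name is ours, and no truncation, missing-value handling, or re-orthogonalization is included:

```python
import numpy as np

def svd_update_cols(U, s, V, C):
    """Append columns C to a thin SVD U diag(s) V^T per equations 2-5;
    the rank grows by the number of new columns."""
    r, c = len(s), C.shape[1]
    L = U.T @ C                          # eigen-coding of C in the basis U
    H = C - U @ L                        # component of C orthogonal to span(U)
    J, K = np.linalg.qr(H)               # orthogonal basis J, projection K

    # Diagonalize the bordered-diagonal middle matrix Q (equation 3).
    Q = np.block([[np.diag(s), L], [np.zeros((c, r)), K]])
    Up, sp, Vpt = np.linalg.svd(Q, full_matrices=False)

    # Rotate the subspaces (equation 4).
    U_new = np.hstack([U, J]) @ Up
    V_big = np.block([[V, np.zeros((V.shape[0], c))],
                      [np.zeros((c, r)), np.eye(c)]])
    return U_new, sp, V_big @ Vpt.T
```

Truncating the returned singular values (and the corresponding columns of the two bases) after each update keeps the factorization thin.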
In practice, some care must be taken to counter numerical error that may make $J$ and $U$ not quite orthog-
onal. We found that applying modified Gram-Schmidt orthogonalization to $U$ when the inner product of its
first and last columns is more than some small $\epsilon$ away from zero makes the algorithm numerically robust. A
much more efficient scheme will be developed below.
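As a sketch of that guard (the tolerance default is an assumption), the test and the modified Gram-Schmidt sweep might look like this:

```python
import numpy as np

def reorthogonalize_if_needed(U, eps=1e-10):
    """Re-orthogonalize U by modified Gram-Schmidt when the inner product
    of its first and last columns drifts more than eps away from zero."""
    if abs(U[:, 0] @ U[:, -1]) <= eps:
        return U
    U = U.copy()
    for i in range(U.shape[1]):
        for j in range(i):
            U[:, i] -= (U[:, j] @ U[:, i]) * U[:, j]   # project out column j
        U[:, i] /= np.linalg.norm(U[:, i])             # renormalize
    return U
```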
Fig. 2. Visualization of the SVD update in equation 2. The quasi-diagonal $Q$ matrix at left is diagonalized and
the subspaces are counter-rotated to preserve equality.
3.1 Automatic truncation
Define $v \doteq \det(K^\top K)$, which is the volume of $C$ that is orthogonal to $U$. If $v$