TRANSCRIPT
[Slide 1]
Source: archive.dimacs.rutgers.edu/Workshops/UnifyingTheory/Slides/recht.pdf
Low-rank Matrix Completion via Convex Optimization
Ben Recht
Center for the Mathematics of Information
Caltech
[Slide 2]
Recommender Systems
[Slide 3]
Netflix Prize
• One million big ones!
• Given 100 million ratings on a scale of 1 to 5, predict 3 million ratings to highest accuracy
• 17,770 total movies × 480,189 total users
• Over 8 billion total ratings
• How to fill in the blanks?
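A quick numeric check of the sizes quoted above (a minimal sketch; the variable names are mine):

```python
# Back-of-the-envelope check of the Netflix matrix dimensions above.
movies, users = 17770, 480189
known_ratings = 100_000_000

total_entries = movies * users              # the "over 8 billion" figure
fraction_observed = known_ratings / total_entries

print(f"total entries:     {total_entries:,}")
print(f"fraction observed: {fraction_observed:.4f}")  # only ~1.2% of cells are known
```

So the completion problem is to fill in roughly 99% of the matrix from about 1% of its entries.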
[Slide 4]
Abstract Setup: Matrix Completion
• How do you fill in the missing data?
[Figure: partially observed matrix X — Xij known for black cells, Xij unknown for white cells; rows index movies, columns index users]

X = L R*
(k × n) = (k × r)(r × n)
kn entries vs. r(k+n) entries
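The entry-count comparison can be checked numerically; a minimal numpy sketch with illustrative sizes (k, n, r here are my choices, not from the slides):

```python
import numpy as np

# A rank-r matrix X = L @ R with L (k x r) and R (r x n) is described by
# r(k+n) parameters, far fewer than the kn entries of X itself.
k, n, r = 200, 300, 5

rng = np.random.default_rng(0)
L = rng.standard_normal((k, r))
R = rng.standard_normal((r, n))
X = L @ R                                   # a k x n matrix of rank r

print("rank of X:            ", np.linalg.matrix_rank(X))  # r = 5
print("entries in X:         ", k * n)                     # kn = 60000
print("parameters in (L, R): ", r * (k + n))               # r(k+n) = 2500
```

This parameter count r(k+n) is the reason low rank makes completion from few entries plausible at all.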
[Slide 5]
Low-rank Matrix Completion
• How do you fill in the missing data?
[Figure: partially observed matrix X — Xij known for black cells, Xij unknown for white cells]
[Slide 6]
[Diagram: applications of rank constraints]
• Controller design (plant G, controller K), model reduction, system identification: constraints involving the rank of the Hankel operator, matrix, or singular values
• Multitask learning: rank of the matrix of classifiers
• Euclidean embedding: rank of the Gram matrix
• Recommender systems: rank of the data matrix
[Slide 7]
Affine Rank Minimization
• PROBLEM: Find the matrix of lowest rank that satisfies/approximates the underdetermined linear system A(X) = b
• NP-HARD:
  – Reduces to finding solutions to polynomial systems
  – Hard to approximate
  – Exact algorithms are awful
[Slide 8]
Proposed Heuristic
• Proposed by Fazel (2002).
• Nuclear norm is the “numerical rank” in numerical analysis
• The “trace heuristic” from controls if X is p.s.d.

Affine Rank Minimization:  minimize rank(X)  subject to  A(X) = b

Convex Relaxation:  minimize ||X||*  subject to  A(X) = b
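Since the rank counts the nonzero singular values and the nuclear norm sums them, the relaxation is easy to illustrate numerically (a minimal sketch; the example matrix is mine):

```python
import numpy as np

# rank(X) counts the nonzero singular values; the nuclear norm ||X||* sums them.
# This is the matrix analogue of replacing the l0 "norm" with the l1 norm.
rng = np.random.default_rng(1)
X = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 8))  # rank-3 matrix

s = np.linalg.svd(X, compute_uv=False)
rank = int(np.sum(s > 1e-10))
nuclear_norm = float(np.sum(s))

print("singular values:", np.round(s, 3))
print("rank:           ", rank)            # 3
print("nuclear norm:   ", nuclear_norm)
```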
[Slide 9]
Parsimonious Models

• Search for the best linear combination of the fewest atoms
• Model: x = Σi ci ai  (atoms ai, model weights ci)
• “rank” = fewest atoms needed to describe the model
[Slide 10]
• 2×2 symmetric matrices [[x, y], [y, z]] plotted in 3d
• rank 1: x² + z² + 2y² = 1
• Convex hull: the unit ball of the nuclear norm
[Slide 11]
• 2×2 matrices plotted in 3d
• Projection onto the x-z plane is the l1 ball
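A small check of this projection: for the symmetric 2×2 matrix [[x, y], [y, z]] with y = 0, the matrix is diagonal, its singular values are |x| and |z|, and the nuclear norm is exactly the l1 norm of (x, z):

```python
import numpy as np

# Diagonal (y = 0) case: singular values are |x| and |z|, so the nuclear
# norm reduces to the l1 norm of (x, z) -- the projection described above.
x, z = 0.6, -0.8
M = np.array([[x, 0.0], [0.0, z]])

s = np.linalg.svd(M, compute_uv=False)
print("singular values:", s)        # [0.8, 0.6]
print("nuclear norm:   ", s.sum())  # 1.4 = |x| + |z|
```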
[Slide 12]
[Figure: nuclear norm ball meeting the affine set A(X) = b; axes are weights w1, w2]
[Slide 13]
• 2×2 matrices plotted in 3d
• Not polyhedral…

So how do we compute it? And when does it work?
[Slide 14]
Equivalent Formulations
• Semidefinite embedding:
• Low rank parametrization:
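Written out, these are the standard identities from the trace-heuristic literature (the factor width in the second must be at least rank(X)):

```latex
% Semidefinite embedding:
\|X\|_* \;=\; \min_{W_1, W_2}\ \tfrac{1}{2}\left(\operatorname{tr} W_1 + \operatorname{tr} W_2\right)
\quad \text{s.t.} \quad
\begin{bmatrix} W_1 & X \\ X^\top & W_2 \end{bmatrix} \succeq 0

% Low-rank parametrization:
\|X\|_* \;=\; \min_{L, R}\ \tfrac{1}{2}\left(\|L\|_F^2 + \|R\|_F^2\right)
\quad \text{s.t.} \quad X = L R^\top
```

The first is a semidefinite program; the second is nonconvex but has far fewer variables, which is what makes the gradient method on the next slide practical.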
[Slide 15]
Computationally: Gradient Descent!
• “Method of multipliers”
• Schedule for the penalty parameter controls the noise in the data
• Same global minimum as nuclear norm
• Dual certificate for the optimal solution
• When will this fail, and when might it succeed?
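A minimal numpy sketch of gradient descent on the low-rank parametrization (the problem sizes, step size, penalty weight mu, and spectral initialization are my choices, not from the slides):

```python
import numpy as np

# Gradient descent on the factored objective
#   minimize ||P_Omega(L @ R - X)||_F^2 + (mu/2)(||L||_F^2 + ||R||_F^2),
# which shares its global minimum with nuclear-norm minimization when the
# factor width is large enough.
rng = np.random.default_rng(0)
k, n, r, p = 40, 50, 3, 0.5
X = rng.standard_normal((k, r)) @ rng.standard_normal((r, n))  # hidden rank-3 X
mask = rng.random((k, n)) < p                                  # observed entries

# Initialize (L, R) from the top-r SVD of the rescaled observed matrix.
U, s, Vt = np.linalg.svd(mask * X / p, full_matrices=False)
L = U[:, :r] * np.sqrt(s[:r])
R = np.sqrt(s[:r])[:, None] * Vt[:r]

mu, step = 1e-4, 0.01
for _ in range(2000):
    residual = mask * (L @ R - X)          # gradient only sees observed cells
    L, R = (L - step * (residual @ R.T + mu * L),
            R - step * (L.T @ residual + mu * R))

rel_err = np.linalg.norm(L @ R - X) / np.linalg.norm(X)
print(f"relative error on the full matrix: {rel_err:.4f}")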
[Slide 16]
First theory result
• If m > c0 r(k+n−r) log(kn), the heuristic succeeds for most A
• Number of measurements: c0 (constant) · r(k+n−r) (intrinsic dimension of rank-r matrices) · log(kn) (log of the ambient dimension)
• Approach: show that a random A is nearly an isometry on the manifold of low-rank matrices
• Stable to noise in the measurement vector b, and returns as good an answer as a truncated SVD of the true X

Recht, Fazel, and Parrilo. 2007.
[Slide 17]
Low-rank Matrix Completion
• How do you fill in the missing data?
[Figure: partially observed matrix X — Xij known for black cells, Xij unknown for white cells]
[Slide 18]
Which matrices?
• Any subset of entries that misses the (1,1) component tells you nothing!
• Still need to see the entire first row
• Want each entry to provide nearly the same amount of information
[Slide 19]
Incoherence
• Let U be a subspace of Rⁿ of dimension r and PU be the orthogonal projection onto U. Then the coherence of U (with respect to the standard basis ei) is defined to be

  μ(U) = (n/r) · max over i of ||PU ei||²

• μ(U) ≥ 1 — e.g., span of r columns of the Fourier transform
• μ(U) ≤ n/r — e.g., any subspace that contains a standard basis element
• μ(U) = O(1) — e.g., a subspace sampled from the uniform distribution with r > log n
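The definition above can be computed directly; a minimal sketch illustrating both extremes (the function and variable names are mine):

```python
import numpy as np

def coherence(Q):
    """Coherence mu(U) = (n/r) * max_i ||P_U e_i||^2, where the columns of
    the n x r matrix Q are an orthonormal basis for U."""
    n, r = Q.shape
    # ||P_U e_i||^2 is the squared norm of the i-th row of Q.
    leverage = np.sum(Q**2, axis=1)
    return (n / r) * leverage.max()

rng = np.random.default_rng(0)
n, r = 500, 10

# Uniformly random subspace: coherence is O(1) with high probability.
Q_random, _ = np.linalg.qr(rng.standard_normal((n, r)))
# Subspace containing standard basis vectors: maximally coherent, mu = n/r.
Q_spiky = np.eye(n)[:, :r]

print("random subspace:", coherence(Q_random))  # small, O(1)-ish
print("spiky subspace: ", coherence(Q_spiky))   # n/r = 50
```

Low coherence is exactly the condition ruling out the “spiky” examples on the previous slide, where single entries carry all the information.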
[Slide 20]
Matrix Completion
• Suppose X is k × n (k ≤ n), has rank r, and has row and column spaces with coherence bounded above by μ. Then the nuclear norm heuristic recovers X from most subsets of entries with cardinality at least C μ n^(5/4) r log n.
• If, in addition, r ≤ μ^(-1) n^(1/5), then C μ n^(6/5) r log n entries suffice.
Candès and Recht. 2008
[Slide 21]
Proof Tools
• Convex Analysis
  – KKT Conditions: find a dual certificate proving the minimum nuclear norm solution is the hidden low-rank matrix
  – Compressed Sensing: use an ansatz for the multiplier and bound its norm
• Probability on Banach Spaces
  – Moment bounds for norms of matrix-valued random variables [Rudelson]
  – Decoupling [Bourgain–Tzafriri, de la Peña et al.]: indicator variables can be treated as independent
  – Non-commutative Khintchine Inequality [Lust-Piquard]: tightly bound the operator norm in terms of the largest entry
[Slide 22]
• Gradient descent on low-rank nuclear norm parameterization
• Mixture of hundreds of models, including nuclear norm
[Slide 23]
Parsimonious Modeling: A road map
• Open Problems in rank minimization: optimal bounds, noise performance, faster algorithms, more mining of connections with compressed sensing
• Expanding the parsimony catalog: dynamical systems, nonlinear models, tensors, completely positive matrices, Jordan Algebras, and beyond
• Automatic parsimonious programming: computational complexity of norms; algorithm and proof generation
• Broad applied impact: data mining time series in biology, medicine, social networks, and human computer interfaces
[Slide 24]
Acknowledgements
• See http://www.ist.caltech.edu/~brecht/publications.html for all references
• Results developed in collaboration with Emmanuel Candès, John Doyle, Babak Hassibi, and Weiyu Xu at Caltech, Ali Rahimi at Intel Research, Harvey Cohen, Lawrence Recht, and John Whitin at Stanford, Maryam Fazel at U Washington, and Pablo Parrilo and the RealNose team at MIT.