low-rank matrix approximations in python by christian thurau pydata 2014

Low-rank matrix approximations with Python

Christian Thurau

Table of Contents

1 Intro

2 The Basics

3 Matrix approximation

4 Some methods

5 Matrix Factorization with Python

6 Example & Conclusion


For Starters...

Observations

• Data matrix factorization has become an important tool in information retrieval, data mining, and pattern recognition

• Nowadays, typical data matrices are HUGE

• Examples include:
  • Gene expression data and microarrays
  • Digital images
  • Term by document matrices
  • User ratings for movies, products, ...
  • Graph adjacency matrices


Matrix Factorization

• given a matrix

V

• determine matrices

W and H

• such that

V = WH or V ≈ WH

• characteristics such as entries, shape, and rank of V, W, and H will depend on the application context


The Basics

matrix factorization allows for:

• solving linear equations

• transforming data

• compressing data

matrix factorization facilitates subsequent processing in:

• information retrieval

• pattern recognition

• data mining


Low-rank Matrix Approximations

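• Approximate V

$$V \approx WH$$

• where

$$V \in \mathbb{R}^{m \times n}, \quad W \in \mathbb{R}^{m \times k}, \quad H \in \mathbb{R}^{k \times n}$$

• and

$$\operatorname{rank}(W) \ll \operatorname{rank}(V), \quad k \ll \min(m, n)$$

[Figure: V = WH shown as a product of matrix blocks]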
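A minimal NumPy sketch of the shapes involved. It is not from the slides: the sizes m, n, k and the random factors are illustrative assumptions. V is built from W and H here, so it has rank k exactly and the factorization is exact; for real data one only gets V ≈ WH.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 200, 300, 5          # illustrative sizes with k << min(m, n)

W = rng.random((m, k))         # m x k factor
H = rng.random((k, n))         # k x n factor
V = W @ H                      # m x n matrix of rank k by construction

rank_V = np.linalg.matrix_rank(V)
stored_full = m * n            # numbers needed to store V directly
stored_factors = k * (m + n)   # numbers needed to store W and H
print(V.shape, rank_V, stored_full, stored_factors)
```

Storing the two factors takes k(m + n) values instead of mn, which is where the compression comes from.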


Matrix Approximation

• If

V = WH

• then

$$v_{i,j} = w_{i,*}\, h_{*,j} = \sum_{x=1}^{k} w_{i,x}\, h_{x,j}$$

[Figure: V = WH shown as a product of matrix blocks]
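The entry-wise identity can be checked numerically. This is a small sketch with made-up random factors (the sizes and the indices i, j are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.random((4, 2))   # m = 4, k = 2
H = rng.random((2, 3))   # k = 2, n = 3
V = W @ H

i, j = 2, 1
# row i of W times column j of H ...
row_times_col = W[i, :] @ H[:, j]
# ... written out as the explicit sum over the k inner indices
explicit_sum = sum(W[i, x] * H[x, j] for x in range(W.shape[1]))
print(V[i, j], row_times_col, explicit_sum)
```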


Matrix Approximation

• More importantly:

$$v_{*,j} = W h_{*,j} = \sum_{x=1}^{k} w_{*,x}\, h_{x,j}$$

• therefore

W ↔ "basis" matrix

H ↔ coefficient matrix

[Figure: V = WH, expanded as a sum of terms]
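The "basis and coefficients" reading can be sketched in NumPy as follows. The random factors and sizes are assumptions for illustration: column j of V is a combination of the columns of W, weighted by column j of H, and the whole product is a sum of k rank-one terms.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 5, 4, 3
W = rng.random((m, k))   # columns of W act as basis vectors
H = rng.random((k, n))   # columns of H hold the coefficients
V = W @ H

j = 1
column_direct = W @ H[:, j]
# the same column as a weighted sum of the k basis vectors
column_as_combination = sum(H[x, j] * W[:, x] for x in range(k))
# the full matrix as a sum of k rank-one outer products
rank_one_sum = sum(np.outer(W[:, x], H[x, :]) for x in range(k))
```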


On Matrix Factorization Methods

• matrix factorization ↔ data transformation

• matrix rank reduction ↔ data compression

• Common form: V = WH

• Broad range of methods:
  • K-means clustering
  • SVD/PCA
  • Non-negative Matrix Factorization
  • Archetypal Analysis
  • Binary matrix factorization
  • CUR decomposition
  • ...

• Each method yields a unique view on data . . .

• . . . and is suited for different tasks


K-means Clustering [1]

• Baseline clustering method

• Constrained quadratic optimization problem:

$$\min_{W,H} \lVert V - WH \rVert^2 \quad \text{s.t.} \quad h_{k,i} \in \{0, 1\}, \quad \sum_k h_{k,i} = 1$$

• Find W, H using expectation maximization

• Optimal k-means partitioning is NP-hard

• Goal: group similar data points

• Interesting: K-means clustering is matrix factorization

[1] J.B. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations", Berkeley Symposium on Mathematical Statistics and Probability, 1967
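A rough sketch of k-means written as a factorization, using the usual alternating assign/update loop in plain NumPy. This is an illustration under assumptions, not PyMF's implementation: the function name `kmeans_factorize`, the random initialisation from data points, and the toy data are all made up here. Data points are the columns of V, W holds the cluster means, and H is the one-hot assignment matrix.

```python
import numpy as np

def kmeans_factorize(V, k, niter=10, seed=0):
    """Hypothetical helper: returns W (m x k means) and H (k x n one-hot)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # initialise W with k distinct data points (columns of V)
    W = V[:, rng.choice(n, size=k, replace=False)].astype(float)
    for _ in range(niter):
        # squared distances between every mean and every point: shape (k, n)
        d2 = ((W[:, :, None] - V[:, None, :]) ** 2).sum(axis=0)
        labels = d2.argmin(axis=0)
        # assignment step: exactly one 1 per column of H
        H = np.zeros((k, n))
        H[labels, np.arange(n)] = 1.0
        # update step: each column of W becomes the mean of its members
        for c in range(k):
            members = V[:, labels == c]
            if members.shape[1] > 0:   # keep the old mean if a cluster is empty
                W[:, c] = members.mean(axis=1)
    return W, H

# toy data: two well-separated groups of three points each
V = np.array([[0.0, 0.1, 0.2, 5.0, 5.1, 5.2],
              [0.0, 0.2, 0.1, 5.0, 5.2, 5.1]])
W, H = kmeans_factorize(V, k=2)
V_approx = W @ H   # every point replaced by the mean of its cluster
```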


K-means Clustering is Matrix Factorization!

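$$
X = \begin{pmatrix}
x_{1,1} & x_{1,2} & x_{1,3} & \cdots & x_{1,n} \\
x_{2,1} & x_{2,2} & x_{2,3} & \cdots & x_{2,n} \\
x_{3,1} & x_{3,2} & x_{3,3} & \cdots & x_{3,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_{m,1} & x_{m,2} & x_{m,3} & \cdots & x_{m,n}
\end{pmatrix},
\quad
B = \begin{pmatrix}
b_{1,1} & b_{1,2} & b_{1,3} \\
b_{2,1} & b_{2,2} & b_{2,3} \\
b_{3,1} & b_{3,2} & b_{3,3} \\
\vdots & \vdots & \vdots \\
b_{n,1} & b_{n,2} & b_{n,3}
\end{pmatrix},
\quad
A = \begin{pmatrix}
0 & 1 & 1 & \cdots & 0 \\
1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix}
$$

• i.e. for X ∈ R^{m×n}, B ∈ R^{n×3}, and A ∈ R^{3×n} as above, the product

$$XBA = MA$$

realizes an assignment

$$x_i \to m_j, \quad \text{where } m_j = X b_j$$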
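A small numeric sketch of the product XBA = MA. The data, the labels, and in particular the choice of B as an averaging matrix (column j holds 1/|cluster j| for the members of cluster j) are assumptions made for illustration; the slide itself only states m_j = X b_j.

```python
import numpy as np

# m = 2 features, n = 5 data points as columns (made-up values)
X = np.array([[0.0, 0.2, 5.0, 5.2, 9.0],
              [1.0, 1.2, 4.0, 4.2, 8.0]])
labels = np.array([0, 0, 1, 1, 2])   # assumed cluster index of each point
k, n = 3, X.shape[1]

# A: k x n one-hot assignment matrix
A = np.zeros((k, n))
A[labels, np.arange(n)] = 1.0

# B: n x k averaging matrix, column j averages the members of cluster j
B = A.T / A.sum(axis=1)

M = X @ B            # columns m_j = X b_j are the cluster means
X_approx = M @ A     # each x_i is replaced by the mean of its cluster
```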


Example: K-means

[Figure: an image approximated as a combination of cluster means with coefficients 0.0, 0.0, ..., 1.0, ..., 0.0]

• Similar images are grouped into k groups

• Approximate data by mapping each data point onto the mean of a cluster region


Python Matrix Factorization Toolbox (PyMF) [2]

• Started in 2010 at Fraunhofer IAIS/University of Bonn

• Vast number of different methods!

• Supports hdf5/h5py and sparse matrices

How to factorize a data matrix V :

>>> import pymf
>>> import numpy as np
>>> data = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])
>>> mdl = pymf.kmeans.Kmeans(data, num_bases=2)
>>> mdl.factorize(niter=10)  # optimize for WH
>>> V_approx = np.dot(mdl.W, mdl.H)  # V = WH

[2] http://github.com/cthurau/pymf

Python Matrix Factorization Toolbox (PyMF) [2]

• Restarted development a few weeks back ;)

• Looking for contributors!

How to map data onto W :

>>> import pymf
>>> import numpy as np
>>> test_data = np.array([[1.0], [0.3]])
>>> mdl_test = pymf.kmeans.Kmeans(test_data, num_bases=2)
>>> mdl_test.W = mdl.W  # mdl.W -> existing basis W
>>> mdl_test.factorize(compute_w=False)
>>> test_data_approx = np.dot(mdl.W, mdl_test.H)


PCA

Principal Component Analysis (PCA) [3]

• SVD/PCA are baseline matrix factorization methods

• Optimize:

$$\min_{W,H} \lVert V - WH \rVert^2 \quad \text{s.t.} \quad W^T W = I$$

• Restrict W to singular vectors of V (orthogonal matrix)

• Can (usually does) violate non-negativity

• Goal: best possible matrix approximation for a given k

• Great for compression or filtering out noise!

[3] K. Pearson, "On Lines and Planes of Closest Fit to Systems of Points in Space", Philosophical Magazine, 1901
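A sketch of the SVD route to such a factorization in plain NumPy, not PyMF's code. The random data and k = 2 are assumptions; the data is also not mean-centred here, which a full PCA would normally do first. W takes the first k left singular vectors, so W^T W = I, and H = W^T V.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((6, 8))    # made-up data, columns are data points
k = 2

U, s, Vt = np.linalg.svd(V, full_matrices=False)
W = U[:, :k]                      # orthonormal basis: W.T @ W = I
H = s[:k, None] * Vt[:k, :]       # coefficients, equal to W.T @ V
V_approx = W @ H

# Frobenius error of the rank-k truncation
err = np.linalg.norm(V - V_approx)
# it equals the energy in the discarded singular values
err_from_singular_values = np.sqrt((s[k:] ** 2).sum())
```

No other rank-k factorization gives a smaller Frobenius error, which is why the truncated SVD works well for compression and denoising.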


Example PCA

>>> from pymf.pca import PCA
>>> mdl = PCA(data, num_bases=2)
>>> mdl.factorize()
>>> V_approx = np.dot(mdl.W, mdl.H)

• Usage for data analysis questionable

• Basis vectors usually not interpretable

[Figure: V ≈ V_approx, with basis vectors W = ...]

Non-negative Matrix Factorization [4]

• For V ≥ 0, a constrained quadratic optimization problem:

$$\min_{W,H} \lVert V - WH \rVert^2 \quad \text{s.t.} \quad W \ge 0, \quad H \ge 0$$

• A globally optimal solution provably exists; algorithms guaranteed to find it remain elusive; exact NMF is NP-hard

• Often W converges to partial representations

• Active area of research

• Goal: reconstruct data by independent parts

[4] D.D. Lee and H.S. Seung, "Learning the Parts of Objects by Non-Negative Matrix Factorization", Nature, 401(6755), 1999
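One common heuristic for this problem is the multiplicative update scheme usually attributed to Lee and Seung. The sketch below is a plain NumPy illustration under assumptions and not necessarily what PyMF does internally: the function name `nmf_multiplicative`, the iteration count, and the small constant `eps` (which guards against division by zero) are choices made here. It finds a local solution only.

```python
import numpy as np

def nmf_multiplicative(V, k, niter=200, seed=0, eps=1e-9):
    """Hypothetical helper: multiplicative updates for min ||V - WH||^2."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps   # strictly positive start
    H = rng.random((k, n)) + eps
    for _ in range(niter):
        # element-wise updates keep all entries non-negative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])
W, H = nmf_multiplicative(V, k=2)
error = np.linalg.norm(V - W @ H)
```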


Example NMF

>>> from pymf.nmf import NMF
>>> mdl = NMF(data, num_bases=2, iter=50)
>>> mdl.factorize()


• Additive combination of parts

• Interesting options for data analysis

[Figure: V ≈ V_approx, with basis vectors W = ...]

Archetypal Analysis [5]

• Convexity constrained quadratic optimization problem:

$$\min_{W,H} \lVert V - VWH \rVert^2 \quad \text{s.t.} \quad w_{l,i} \ge 0,\ \sum_l w_{l,i} = 1, \quad h_{k,i} \ge 0,\ \sum_k h_{k,i} = 1$$

• Reconstruct data by its archetypes, i.e. convex combinations of polar opposites

• Yields novel and intuitive insights into data

• Great for interpretable data representations!

• O(n²), but: efficient approximations for large data exist

[5] A. Cutler and L. Breiman, "Archetypal Analysis", Technometrics 36(4), 1994
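The structure of the model V ≈ VWH can be illustrated on a toy case. This is not a solver: the data, the choice of the three triangle corners as archetypes, and the use of barycentric coordinates for H are assumptions that only work because every point lies inside the triangle. A real archetypal analysis alternates constrained least-squares steps for W and H.

```python
import numpy as np

# columns are 2-D data points; the first three are the corners of a triangle
V = np.array([[0.0, 4.0, 0.0, 1.0, 2.0, 1.0],
              [0.0, 0.0, 4.0, 1.0, 1.0, 2.0]])
n, k = V.shape[1], 3

# W (n x k): each archetype is a convex combination of data points;
# here each one simply selects a single corner point
W = np.zeros((n, k))
W[[0, 1, 2], [0, 1, 2]] = 1.0
Z = V @ W                      # the archetypes, one per column

# H (k x n): convex weights of each point with respect to the archetypes,
# obtained by solving Z h = v together with sum(h) = 1
lhs = np.vstack([Z, np.ones((1, k))])
rhs = np.vstack([V, np.ones((1, n))])
H = np.linalg.solve(lhs, rhs)

V_approx = V @ W @ H
```

Each column of H sums to one and is non-negative, so it can be read as mixing proportions over the archetypes.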


Example Archetypal Analysis

>>> from pymf.aa import AA
>>> mdl = AA(data, num_bases=2, iter=50)
>>> mdl.factorize()


• Existent data points as basis vectors

• Convex combination allows a probabilistic interpretation

[Figure: V ≈ V_approx, with basis vectors W = ...]

Method Summary

• Common form: V = WH (or V = VWH)

Method    W constraint              H constraint                    Outcome
PCA       -                         -                               compressed V
K-means   -                         H ∈ {0, 1}, ∑_k h_{k,i} = 1     groups
NMF       W ≥ 0                     H ≥ 0                           parts
AA        W ≥ 0, ∑_l w_{l,i} = 1    H ≥ 0, ∑_k h_{k,i} = 1          opposites

• Doesn’t only work for images ;)

• More complex constraints usually result in more complex solvers

• Active area of research deals with approximations for large data


Large matrices: PyMF and h5py

>>> import h5py
>>> import numpy as np
>>> from pymf.sivm import SIVM  # uses [6]
>>> file = h5py.File('myfile.hdf5', 'w')
>>> file['dataset'] = np.random.random((100,1000))
>>> file['W'] = np.random.random((100,10))
>>> file['H'] = np.random.random((10,1000))
>>> sivm_mdl = SIVM(file['dataset'], num_bases=10)
>>> sivm_mdl.W = file['W']
>>> sivm_mdl.H = file['H']
>>> sivm_mdl.factorize()

[6] Thurau, Kersting, and Bauckhage, "Simplex volume maximization for descriptive web scale matrix factorization", CIKM 2010


[7] Science, 2010: Vol. 330

Take Home Message

• Most clustering and data analysis methods are matrix approximations

• Imposed constraints shape the factorization

• Imposed constraints yield different views on data

• One of the most effective and versatile tools for data exploration!

• Python implementation → http://github.com/cthurau/pymf


Thank you for your attention!

