Numerical Linear Algebra Methods for Data Mining. Yousef Saad, Department of Computer Science and Engineering, University of Minnesota. William & Mary, Oct. 31, 2014.


Page 1:

Numerical Linear Algebra Methods for Data Mining

Yousef Saad
Department of Computer Science and Engineering
University of Minnesota

William & Mary, Oct. 31, 2014

Page 2:

Introduction: a few factoids

► Data is growing exponentially at an “alarming” rate:

• 90% of the data in the world today was created in the last two years

• Every day, 2.3 million terabytes (2.3 × 10^18 bytes) are created

► Mixed blessing: opportunities & big challenges.

► The trend is re-shaping & energizing many research areas ...

► ... including my own: numerical linear algebra

Page 3:

Introduction: What is data mining?

Set of methods and tools to extract meaningful information or patterns from data. Broad area: data analysis, machine learning, pattern recognition, information retrieval, ...

► Tools used: linear algebra; statistics; graph theory; approximation theory; optimization; ...

► This talk: a brief introduction – emphasis on the linear algebra viewpoint

► + our initial work on materials.

► Focus on “dimension reduction methods”

Page 4:

Major tool of Data Mining: Dimension reduction

► The goal is not so much to reduce size (& cost) but to:

• Reduce noise and redundancy in the data before performing a task [e.g., classification as in digit/face recognition]

• Discover important ‘features’ or ‘parameters’

The problem: Given X = [x_1, · · · , x_n] ∈ R^{m×n}, find a low-dimensional representation Y = [y_1, · · · , y_n] ∈ R^{d×n} of X

► Achieved by a mapping Φ : x ∈ R^m −→ y ∈ R^d so that:

Φ(x_i) = y_i ,  i = 1, · · · , n

Page 5:

[Figure: the m × n data matrix X mapped to the d × n reduced matrix Y; each column x_i is mapped to y_i]

► Φ may be linear: y_i = W^T x_i , i.e., Y = W^T X , ...

► ... or nonlinear (implicit).

► The mapping Φ is required to: Preserve proximity? Maximize variance? Preserve a certain graph?

Page 6:

Example: Principal Component Analysis (PCA)

In Principal Component Analysis, W is computed to maximize the variance of the projected data:

max_{W ∈ R^{m×d}, W^T W = I}  Σ_{i=1..n} ‖ y_i − (1/n) Σ_{j=1..n} y_j ‖²₂ ,   y_i = W^T x_i.

► Leads to maximizing

Tr [ W^T (X − μe^T)(X − μe^T)^T W ] ,   μ = (1/n) Σ_{i=1..n} x_i

► Solution: W = { d dominant eigenvectors } of the covariance matrix ≡ set of d leading left singular vectors of X̄ = X − μe^T
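A minimal NumPy sketch of this computation, assuming the data points are stored as the columns of X (the function name pca_project is ours):

```python
import numpy as np

def pca_project(X, d):
    """Minimal PCA sketch: X is m x n, one data point per column.
    Returns W (m x d) and the reduced data Y = W^T X (d x n)."""
    mu = X.mean(axis=1, keepdims=True)      # mu = (1/n) sum_i x_i
    Xbar = X - mu                           # centered data, X - mu e^T
    U, s, Vt = np.linalg.svd(Xbar, full_matrices=False)
    W = U[:, :d]                            # d leading left singular vectors
    return W, W.T @ X                       # projector and projected data
```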

Page 7:

Unsupervised learning

“Unsupervised learning”: methods that do not exploit known labels

► Example of digits: perform a 2-D projection

► Images of the same digit tend to cluster (more or less)

► Such 2-D representations are popular for visualization

► Can also try to find natural clusters in the data, e.g., in materials

► Basic clustering technique: K-means (a minimal sketch follows the figure)

[Figure: 2-D PCA projection of digits 5–7, and a sketch of materials classes: superhard, photovoltaic, superconductors, catalytic, ferromagnetic, thermo-electric, multi-ferroics]
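K-means is only named above, so here is a minimal sketch of Lloyd's iteration, the standard K-means algorithm (all names are ours):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Minimal k-means (Lloyd's algorithm) sketch.
    X: n x m array, one data point per row. Returns labels and centroids."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]        # initial centroids
    for _ in range(iters):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)  # squared distances
        labels = d.argmin(axis=1)                           # nearest centroid
        newC = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                         else C[j] for j in range(k)])      # recompute centroids
        if np.allclose(newC, C):
            break
        C = newC
    return labels, C
```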

Page 8:

Example: Digit images (a random sample of 30)

[Figure: the 30 sample digit images]

Page 9:

2-D ’reductions’:

[Figure: four 2-D embeddings of digits 0–4: PCA, LLE, K-PCA, ONPP]

Page 10:

Supervised learning: classification

Problem: Given labels (say “A” and “B”) for each item of a given set, find a mechanism to classify an unlabelled item into either the “A” or the “B” class.

► Many applications.

► Example: distinguish SPAM and non-SPAM messages

► Can be extended to more than 2 classes.

Page 11:

Supervised learning: classification

► Best illustration: the handwritten digit recognition example

Given: a set of labeled samples (training set), and an (unlabeled) test image.

Problem: find the label of the test image

[Figure: training data (digit 0, digit 1, digit 2, ..., digit 9) and a test digit ‘?’, both passed through dimension reduction]

► Roughly speaking: we seek dimension reduction so that recognition is ‘more effective’ in the low-dimensional space

Page 12:

Basic method: K-nearest neighbors (KNN) classification

► Idea of a voting system: get distances between the test sample and the training samples

► Get the k nearest neighbors (here k = 8)

► The predominant class among these k items is assigned to the test sample (“∗” here) – a minimal sketch follows the figure

[Figure: test sample ‘∗’ with its k = 8 nearest neighbors among training points of two classes]
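A minimal sketch of the voting rule just described (the function name and signature are ours):

```python
import numpy as np
from collections import Counter

def knn_classify(x, X_train, labels, k=8):
    """Minimal KNN sketch: assign to x the predominant label among
    its k nearest training samples (Euclidean distance)."""
    d = np.linalg.norm(X_train - x, axis=1)   # distances to training samples
    nearest = np.argsort(d)[:k]               # indices of k nearest neighbors
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]
```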

Page 13:

Supervised learning: Linear classification

Linear classifiers: Find a hyperplane which best separates the data in classes A and B.

[Figure: two point classes separated by a linear classifier]

► Note: The world is non-linear. Often this is combined with kernels – which amounts to changing the inner product

Page 14:

Linear classifiers and Fisher’s LDA

► Idea for two classes: Find a hyperplane which best separates the data in classes A and B.

[Figure: linear classifier separating classes A and B]

Page 15:

A harder case:

[Figure: ‘Spectral Bisection (PDDP)’ – a 2-D data set that is not linearly separable]

► Use kernels to transform

Page 16:

[Figure: ‘Projection with Kernels – σ² = 2.7463’: the data transformed with a Gaussian kernel]

Page 17:

Fisher’s Linear Discriminant Analysis (LDA)

Goal: Use label information to define a good projector, i.e., one that can ‘discriminate’ well between the given classes

► Define the “between scatter”: a measure of how well separated two distinct classes are.

► Define the “within scatter”: a measure of how well clustered items of the same class are.

► Objective: make the “between scatter” measure large and the “within scatter” measure small.

Idea: Find a projector that maximizes the ratio of the “between scatter” measure over the “within scatter” measure

Page 18:

Define:

S_B = Σ_{k=1..c} n_k (μ^(k) − μ)(μ^(k) − μ)^T ,

S_W = Σ_{k=1..c} Σ_{x_i ∈ X_k} (x_i − μ^(k))(x_i − μ^(k))^T

where:

• μ = mean(X)
• μ^(k) = mean(X_k)
• X_k = the k-th class
• n_k = |X_k|

(a code sketch follows the figure)

[Figure: classes X_1, X_2, X_3 with their cluster centroids μ^(k) and the global centroid μ]
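A minimal sketch of assembling S_B and S_W from labeled data (the names are ours):

```python
import numpy as np

def scatter_matrices(X, y):
    """Minimal sketch of Fisher's scatter matrices.
    X: m x n data (one point per column), y: length-n class labels.
    Returns S_B (between scatter) and S_W (within scatter), both m x m."""
    mu = X.mean(axis=1, keepdims=True)            # global centroid
    m = X.shape[0]
    SB = np.zeros((m, m)); SW = np.zeros((m, m))
    for k in np.unique(y):
        Xk = X[:, y == k]                         # k-th class
        muk = Xk.mean(axis=1, keepdims=True)      # class centroid
        SB += Xk.shape[1] * (muk - mu) @ (muk - mu).T
        D = Xk - muk
        SW += D @ D.T
    return SB, SW
```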

Page 19:

► Consider the 2nd moments for a vector a:

a^T S_B a = Σ_{k=1..c} n_k |a^T (μ^(k) − μ)|² ,

a^T S_W a = Σ_{k=1..c} Σ_{x_i ∈ X_k} |a^T (x_i − μ^(k))|²

► a^T S_B a ≡ the weighted variance of the projected μ^(k)’s

► a^T S_W a ≡ the weighted sum of variances of the projected classes X_k’s

► LDA projects the data so as to maximize the ratio of these two numbers:

max_a  a^T S_B a / a^T S_W a

► Optimal a = the eigenvector associated with the largest eigenvalue of S_B u_i = λ_i S_W u_i .
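A minimal sketch of this solution using SciPy's generalized symmetric eigensolver; it assumes S_W is positive definite (in practice a small regularization S_W + εI is a common fix):

```python
from scipy.linalg import eigh

def lda_direction(SB, SW):
    """Minimal sketch: the optimal 1-D LDA projector is the eigenvector of
    the generalized problem S_B u = lambda S_W u with the largest lambda.
    Requires S_W positive definite (regularize if needed)."""
    vals, vecs = eigh(SB, SW)    # generalized eigenproblem, ascending order
    return vecs[:, -1]           # eigenvector for the largest eigenvalue
```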

Page 20:

LDA – Extension to arbitrary dimensions

► Criterion: maximize the ratio of two traces:   Tr [U^T S_B U] / Tr [U^T S_W U]

► Constraint: U^T U = I (orthogonal projector).

► Reduced dimension data: Y = U^T X.

Common viewpoint: hard to maximize, therefore ...

► ... alternative: solve instead the (‘easier’) problem:

max_{U^T S_W U = I} Tr [U^T S_B U]

► Solution: the largest eigenvectors of S_B u_i = λ_i S_W u_i .

Page 21:

LDA – Extension to arbitrary dimensions (cont.)

► Consider the original problem:

max_{U ∈ R^{n×p}, U^T U = I}  Tr [U^T A U] / Tr [U^T B U]

Let A, B be symmetric & assume that B is positive semi-definite with rank(B) > n − p. Then Tr [U^T A U] / Tr [U^T B U] has a finite maximum value ρ∗. The maximum is reached for a certain U∗ that is unique up to unitary transforms of the columns.

► Consider the function:

f(ρ) = max_{V^T V = I} Tr [V^T (A − ρB) V]

► Call V(ρ) the maximizer for an arbitrary given ρ.

► Note: V(ρ) = a set of eigenvectors – not unique

Page 22:

► Define G(ρ) ≡ A − ρB and its n eigenvalues:

μ_1(ρ) ≥ μ_2(ρ) ≥ · · · ≥ μ_n(ρ) .

► Clearly:

f(ρ) = μ_1(ρ) + μ_2(ρ) + · · · + μ_p(ρ) .

► Can express this differently. Define the eigenprojector:

P(ρ) = V(ρ) V(ρ)^T

► Then:

f(ρ) = Tr [V(ρ)^T G(ρ) V(ρ)]
     = Tr [G(ρ) V(ρ) V(ρ)^T]
     = Tr [G(ρ) P(ρ)].

Page 23:

► Recall [e.g., Kato ’65] that:

P(ρ) = −(1 / 2πi) ∮_Γ (G(ρ) − zI)^{−1} dz

where Γ is a smooth curve enclosing the p eigenvalues of interest

► Hence:

f(ρ) = −(1 / 2πi) Tr ∮_Γ G(ρ) (G(ρ) − zI)^{−1} dz = ...

     = −(1 / 2πi) Tr ∮_Γ z (G(ρ) − zI)^{−1} dz

► With this, one can prove:

1. f is a non-increasing function of ρ;
2. f(ρ) = 0 iff ρ = ρ∗;
3. f′(ρ) = −Tr [V(ρ)^T B V(ρ)]

Page 24:

Can now use Newton’s method:

ρ_new = ρ − Tr [V(ρ)^T (A − ρB) V(ρ)] / ( −Tr [V(ρ)^T B V(ρ)] ) = Tr [V(ρ)^T A V(ρ)] / Tr [V(ρ)^T B V(ρ)]

► Newton’s method to find the zero of f ≡ a fixed point iteration with:

g(ρ) = Tr [V(ρ)^T A V(ρ)] / Tr [V(ρ)^T B V(ρ)]

► Idea: Compute V(ρ) by a Lanczos-type procedure

► Note: a standard problem – [not generalized] → inexpensive!

► See T. Ngo, M. Bellalij, and Y.S. 2010 for details
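A minimal sketch of the resulting fixed-point iteration; a dense eigensolver stands in here for the Lanczos-type procedure of the actual method:

```python
import numpy as np

def trace_ratio(A, B, p, tol=1e-10, maxit=100):
    """Minimal sketch of the trace-ratio fixed-point iteration:
    rho_new = Tr[V^T A V] / Tr[V^T B V], where V = V(rho) spans the
    p dominant eigenvectors of G(rho) = A - rho*B."""
    rho = 0.0
    for _ in range(maxit):
        w, Q = np.linalg.eigh(A - rho * B)   # eigenvalues in ascending order
        V = Q[:, -p:]                        # p dominant eigenvectors
        rho_new = np.trace(V.T @ A @ V) / np.trace(V.T @ B @ V)
        if abs(rho_new - rho) < tol * max(1.0, abs(rho)):
            return rho_new, V
        rho = rho_new
    return rho, V
```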

Page 25:

GRAPH-BASED TECHNIQUES

Page 26:

Graph-based methods

► Start with a graph of the data, e.g.: the graph of k nearest neighbors (k-NN graph)

Want: Perform a projection which preserves the graph in some sense

► Define a graph Laplacean:

L = D − W

[Figure: neighboring data points x_i, x_j and their images y_i, y_j]

e.g.:  w_ij = 1 if j ∈ Adj(i), 0 else;   D = diag(d_ii) ,  d_ii = Σ_{j≠i} w_ij

with Adj(i) = the neighborhood of i (excluding i)
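A minimal sketch of building such a k-NN graph Laplacean (dense NumPy for clarity; a real implementation would use sparse matrices):

```python
import numpy as np

def knn_laplacian(X, k):
    """Minimal sketch: build the k-NN graph of the columns of X (m x n)
    and return the graph Laplacean L = D - W (symmetrized 0/1 weights)."""
    n = X.shape[1]
    D2 = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)  # pairwise sq. dists
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D2[i])[1:k+1]   # k nearest, excluding i itself
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)                # make the graph undirected
    return np.diag(W.sum(axis=1)) - W
```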

Page 27:

Example: The Laplacean eigenmaps approach

Laplacean Eigenmaps [Belkin–Niyogi ’01] *minimizes*

F(Y) = Σ_{i,j=1..n} w_ij ‖y_i − y_j‖²   subject to   Y D Y^T = I

Motivation: if ‖x_i − x_j‖ is small (original data), we want ‖y_i − y_j‖ to be small as well (low-dim. data)

► The original data is used indirectly through its graph

► Leads to an n × n sparse eigenvalue problem [in ‘sample’ space] – a minimal sketch follows the figure

[Figure: neighboring points x_i, x_j mapped to nearby y_i, y_j]
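A minimal sketch, assuming the weight matrix W is already built: with the constraint Y D Y^T = I, the embedding comes from the generalized problem L y = λ D y, keeping eigenvectors for the smallest nonzero eigenvalues (dense solver for clarity; D must be nonsingular, i.e., no isolated nodes):

```python
import numpy as np
from scipy.linalg import eigh

def laplacean_eigenmaps(W, d):
    """Minimal eigenmaps sketch: given the symmetric weight matrix W,
    solve L y = lambda D y and discard the trivial constant eigenvector
    (lambda = 0). Returns the d x n embedding Y with Y D Y^T = I."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    vals, vecs = eigh(L, D)       # generalized eigenproblem, ascending
    return vecs[:, 1:d+1].T       # skip the constant eigenvector
```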

Page 28:

Locally Linear Embedding (Roweis–Saul ’00)

► Very similar to Eigenmaps – but ...

► ... the graph Laplacean is replaced by an ‘affinity’ graph

Graph: Each x_i is written as a convex combination of its k nearest neighbors:

x_i ≈ Σ w_ij x_j ,   Σ_{j ∈ Adj(i)} w_ij = 1

► Optimal weights computed (‘local calculation’) by minimizing ‖x_i − Σ w_ij x_j‖ for i = 1, · · · , n

[Figure: point x_i and its neighbors x_j]

► Mapped data (Y) computed by minimizing Σ ‖y_i − Σ w_ij y_j‖²
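A minimal sketch of the local weight computation via the usual Gram-system formulation (the regularization constant and all names are our choices):

```python
import numpy as np

def lle_weights(X, neighbors):
    """Minimal sketch of the local LLE step: for each column x_i of X,
    solve for weights w_ij (summing to 1) that best reconstruct x_i
    from its neighbors. `neighbors[i]` lists the neighbor indices of i."""
    n = X.shape[1]
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[:, neighbors[i]] - X[:, [i]]               # shift to origin
        G = Z.T @ Z                                      # local Gram matrix
        G += 1e-8 * np.trace(G) * np.eye(G.shape[0])     # regularize if singular
        w = np.linalg.solve(G, np.ones(len(neighbors[i])))
        W[i, neighbors[i]] = w / w.sum()                 # enforce sum(w) = 1
    return W
```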

Page 29:

Implicit vs explicit mappings

► In PCA the mapping Φ from the high-dimensional space (R^m) to the low-dimensional space (R^d) is explicitly known:

y = Φ(x) ≡ V^T x

► In Eigenmaps and LLE we only know

y_i = Φ(x_i), i = 1, · · · , n

► The mapping Φ is complex, i.e.,

► difficult to get Φ(x) for an arbitrary x not in the sample.

► Inconvenient for classification

► “The out-of-sample extension” problem

Page 30:

ONPP (Kokiopoulou and YS ’05)

► Orthogonal Neighborhood Preserving Projections

► A linear (orthogonal) version of LLE obtained by writing Y in the form Y = V^T X

► Same graph as LLE. Objective: preserve the affinity graph (as in LLE) *but* with the constraint Y = V^T X

► Problem solved to obtain the mapping:

min_V  Tr [ V^T X (I − W^T)(I − W) X^T V ]   s.t.  V^T V = I

► In LLE, replace V^T X by Y
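A minimal dense sketch of the resulting eigenproblem: the columns of V are the eigenvectors of M = X(I − W^T)(I − W)X^T associated with the d smallest eigenvalues:

```python
import numpy as np
from scipy.linalg import eigh

def onpp(X, W, d):
    """Minimal ONPP sketch: X is m x n, W holds the LLE affinity weights.
    Returns the orthogonal projector V (m x d) and Y = V^T X."""
    n = X.shape[1]
    E = np.eye(n) - W
    M = X @ E.T @ E @ X.T         # X (I - W^T)(I - W) X^T, size m x m
    vals, vecs = eigh(M)          # ascending eigenvalues
    V = vecs[:, :d]               # minimizers of the trace
    return V, V.T @ X
```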

Page 31:

Face Recognition – background

Problem: We are given a database of images [arrays of pixel values], and a test (new) image.

Page 32:

Face Recognition – background

Problem: We are given a database of images [arrays of pixel values], and a test (new) image.

Question: Does this new image correspond to one of those in the database?

Page 33:

Example: Eigenfaces [Turk-Pentland, ’91]

► Idea identical with the one we saw for digits:

– Consider each picture as a (1-D) column of all pixels

– Put together into an array A of size (# pixels) × (# images).

[Figure: face images stacked as the columns of the matrix A]

– Do an SVD of A and perform the comparison with any test image in low-dimensional space
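A minimal sketch of this pipeline; subtracting the mean face before the SVD is a common variant and is our choice here, as is nearest-neighbor matching in the reduced space:

```python
import numpy as np

def eigenfaces(A, d):
    """Minimal eigenfaces sketch: A is (#pixels) x (#images), one image
    per column. Returns the reduced database and a projection function
    for test images."""
    mu = A.mean(axis=1, keepdims=True)               # mean face
    U, s, Vt = np.linalg.svd(A - mu, full_matrices=False)
    Ud = U[:, :d]                                    # d leading left sing. vectors
    project = lambda img: Ud.T @ (img.reshape(-1, 1) - mu)
    return Ud.T @ (A - mu), project

# Matching: compare project(test_image) to the reduced database columns,
# e.g. by nearest neighbor in the d-dimensional space.
```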

Page 34:

Graph-based methods in a supervised setting

Graph-based methods can be adapted to supervised mode. Idea: Build G so that nodes in the same class are neighbors. If c = # classes, G consists of c cliques.

► The weight matrix W is block-diagonal:  W = diag(W_1, W_2, . . . , W_c)

► Note: rank(W) = n − c.

► As before, the graph Laplacean: L_c = D − W

► Can be used for ONPP and other graph-based methods

► Improvement: add a repulsion Laplacean [Kokiopoulou, YS ’09]

Page 35:

[Figure: three classes, each forming a clique]

Leads to an eigenvalue problem with the matrix:

L_c − ρ L_R

• L_c = class Laplacean
• L_R = repulsion Laplacean
• ρ = a parameter

Test: ORL database – 40 subjects, 10 sample images each. Example:

# of pixels: 112 × 92; total # of images: 400

Page 36:

[Figure: ‘ORL – TrainPerClass=5’ – recognition rates (0.80–0.98) vs. reduced dimension for onpp, pca, olpp-R, laplace, fisher, onpp-R, olpp]

► Observation: some values of ρ yield better results than using the optimum ρ obtained from maximizing the trace ratio

Page 37:

LINEAR ALGEBRA METHODS: EXAMPLES

Page 38:

IR: Use of the Lanczos algorithm (J. Chen, YS ’09)

► Lanczos algorithm = projection method on the Krylov subspace Span{v, Av, · · · , A^{m−1}v}

► Can get singular vectors with Lanczos, & use them in LSI

► Better: use the Lanczos vectors directly for the projection

► K. Blom and A. Ruhe [SIMAX, vol. 26, 2005] perform a Lanczos run for each query [expensive].

► Proposed: one Lanczos run with a random initial vector. Then use the Lanczos vectors in place of the singular vectors.

► In short: results comparable to those of the SVD at a much lower cost.
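A minimal sketch of one Lanczos (Golub–Kahan bidiagonalization) run from a random initial vector, producing orthonormal bases whose columns can play the role of singular vectors in the projection (full reorthogonalization for simplicity; all names are ours):

```python
import numpy as np

def gkl_vectors(A, steps, seed=0):
    """Minimal Golub-Kahan-Lanczos sketch started from a random vector.
    Returns orthonormal bases P (left) and Q (right) after `steps` steps."""
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(A.shape[1]); q /= np.linalg.norm(q)
    Q = [q]; P = []
    beta, p_prev = 0.0, np.zeros(A.shape[0])
    for j in range(steps):
        p = A @ Q[j] - beta * p_prev
        alpha = np.linalg.norm(p)
        p /= alpha
        P.append(p)
        q = A.T @ p - alpha * Q[j]
        Qm = np.array(Q)
        q -= Qm.T @ (Qm @ q)        # full reorthogonalization against Q
        beta = np.linalg.norm(q)
        if beta < 1e-12:
            break
        Q.append(q / beta)
        p_prev = p
    return np.array(P).T, np.array(Q[:len(P)]).T
```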

Page 39:

Tests: IR

Information retrieval datasets:

          # Terms   # Docs   # queries   sparsity
MED        7,014     1,033        30       0.735
CRAN       3,763     1,398       225       1.412

[Figure: preprocessing times vs. iterations for the Med and Cran datasets, comparing ‘lanczos’ and ‘tsvd’]

Page 40:

Average retrieval precision

[Figure: average precision vs. iterations for the Med and Cran datasets, comparing ‘lanczos’, ‘tsvd’, ‘lanczos-rf’, ‘tsvd-rf’]

Retrieval precision comparisons

Page 41:

Updating the SVD (E. Vecharynski and YS’13)

► In applications, the data matrix X is often updated

► Example: in Information Retrieval (IR), one can add documents, add terms, change weights, ...

Problem: Given the partial SVD of X, how to get a partial SVD of X_new?

► Will illustrate only with updates of the form X_new = [X, D] (documents added in IR)

Page 42:

Updating the SVD: Zha-Simon algorithm

► Assume A ≈ U_k Σ_k V_k^T and A_D = [A, D] ,  D ∈ R^{m×p}

► Compute D_k = (I − U_k U_k^T) D and its QR factorization:

[U_p, R] = qr(D_k, 0) ,   R ∈ R^{p×p} ,  U_p ∈ R^{m×p}

Note:

A_D ≈ [U_k, U_p] H_D [ V_k  0 ; 0  I_p ]^T ,   H_D ≡ [ Σ_k  U_k^T D ; 0  R ]

► Zha–Simon (’99): Compute the SVD of H_D & get an approximate SVD from the above equation

► It turns out this is a Rayleigh–Ritz projection method for the SVD [E. Vecharynski & YS 2013]

► Can show optimality properties as a result
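A minimal dense sketch of the update (the function name is ours; the SVD of the small matrix H_D is computed in full for simplicity):

```python
import numpy as np

def zha_simon_update(Uk, Sk, Vk, D):
    """Minimal sketch of the Zha-Simon SVD update for A_D = [A, D],
    given the rank-k partial SVD A ~ Uk @ diag(Sk) @ Vk.T."""
    k, p = Sk.size, D.shape[1]
    Dk = D - Uk @ (Uk.T @ D)                  # (I - Uk Uk^T) D
    Up, R = np.linalg.qr(Dk)                  # thin QR factorization
    H = np.block([[np.diag(Sk), Uk.T @ D],    # H_D = [Sigma_k, Uk^T D; 0, R]
                  [np.zeros((p, k)), R]])
    F, s, Gt = np.linalg.svd(H)
    U_new = np.hstack([Uk, Up]) @ F[:, :k]    # updated left factor
    n = Vk.shape[0]
    Vbig = np.block([[Vk, np.zeros((n, p))],
                     [np.zeros((p, k)), np.eye(p)]])
    V_new = Vbig @ Gt.T[:, :k]                # updated right factor
    return U_new, s[:k], V_new
```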

Page 43:

Updating the SVD

► When the number of updates is large this becomes costly.

► Idea: replace U_p by a low-dimensional approximation:

► Use U of the form U = [U_k, Z_l] instead of U = [U_k, U_p]

► Z_l must capture the range of D_k = (I − U_k U_k^T) D

► Simplest idea: best rank-l approximation using the SVD.

► Can also use Lanczos vectors from the Golub–Kahan–Lanczos algorithm.

Page 44:

An example

► LSI with the MEDLINE collection: m = 7,014 (terms), n = 1,033 (docs), k = 75 (dimension), t = 533 (initial # of docs), n_q = 30 (queries)

► Adding blocks of 25 docs at a time

► The number of singular triplets of (I − U_k U_k^T) D used by the SVD projection (“SV”) is 2.

► For the GKL approach (“GKL”), 3 GKL vectors are used

► These two methods are compared to Zha–Simon (“ZS”).

► We show the average precision, then the time

Page 45:

[Figure: ‘Retrieval accuracy, p = 25’ – average precision vs. number of documents for ZS, SV, GKL]

► Experiments show: the gain in accuracy is rather consistent

Page 46:

[Figure: ‘Updating time, p = 25’ – time (sec.) vs. number of documents for ZS, SV, GKL]

► Times can be significantly better for large sets

Page 47:

APPLICATION TO MATERIALS

Page 48:

Data mining for materials: Materials Informatics

► Huge potential in exploiting two trends:

1. Improvements in efficiency and capabilities of computational methods for materials

2. Recent progress in data mining techniques

► Current practice: “One student, one alloy, one PhD” [see the special MRS issue on materials informatics] → slow ...

► Data mining can help speed up the process, e.g., by exploring in smarter ways

Issue 1: Who will do the work? Few researchers are familiar with both worlds

Page 49:

Issue 2: databases, and more generally sharing, are not too common in materials

“The inherently fragmented and multidisciplinary nature of the materials community poses barriers to establishing the required networks for sharing results and information. One of the largest challenges will be encouraging scientists to think of themselves not as individual researchers but as part of a powerful network collectively analyzing and using data generated by the larger community. These barriers must be overcome.”

NSTC report to the White House, June 2011.

► Materials Genome Initiative [NSF]

Page 50:

Unsupervised learning

► 1970s: Unsupervised learning “by hand”: find coordinates that will cluster materials according to structure

► 2-D projection from physical knowledge

► ‘Anomaly detection’: helped find that the compound CuF does not exist

see: J. R. Chelikowsky, J. C. Phillips, Phys. Rev. B 19 (1978).

Page 51:

Question: Can modern data mining achieve a similar diagrammatic separation of structures?

► Should use only information from the two constituent atoms

► Experiment: 67 binary ‘octets’.

► Use PCA – exploit only data from the 2 constituent atoms:

1. Number of valence electrons;

2. Ionization energies of the s-states of the ion core;

3. Ionization energies of the p-states of the ion core;

4. Radii for the s-states as determined from model potentials;

5. Radii for the p-states as determined from model potentials.

Page 52:

► Result:

[Figure: 2-D PCA projection of the 67 octets; labeled compounds include BN, SiC, MgS, ZnO, MgSe, CuCl, MgTe, CuBr, CuI, CdTe, AgI; the legend marks the crystal structure types (Z, W, D, R)]

Page 53:

Supervised learning: classification

Problem: classify an unknown binary compound into its crystal structure class

► 55 compounds, 6 crystal structure classes

► “leave-one-out” experiment

Case 1: Use features 1:5 for atom A and 2:5 for atom B. No scaling is applied.

Case 2: Features 2:5 from each atom + scale features 2 to 4 by the square root of the # of valence electrons (feature 1)

Case 3: Features 1:5 for atom A and 2:5 for atom B. Scale features 2 and 3 by the square root of the # of valence electrons.

Page 54:

Three methods tested:

1. PCA classification: project and do the identification in the space of reduced dimension (Euclidean distance in the low-dim. space).

2. KNN: K-nearest neighbor classification.

3. Orthogonal Neighborhood Preserving Projection (ONPP) – a graph-based method [see Kokiopoulou, YS, 2005]

Recognition rates for the 3 methods using different features:

Case     KNN     ONPP    PCA
Case 1   0.909   0.945   0.945
Case 2   0.945   0.945   1.000
Case 3   0.945   0.945   0.982
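A minimal sketch of the “leave-one-out” protocol behind the table above; the classify callback is a hypothetical stand-in for any of the three methods, e.g. a wrapper around the KNN sketch on page 12:

```python
import numpy as np

def leave_one_out_rate(X, y, classify):
    """Minimal leave-one-out sketch: hold out each sample in turn,
    'train' on the rest, and report the recognition rate.
    X: one data point per row; y: labels; classify(x, X_train, y_train)."""
    n = len(y)
    hits = 0
    for i in range(n):
        mask = np.arange(n) != i                  # all samples except i
        hits += classify(X[i], X[mask], y[mask]) == y[i]
    return hits / n
```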

Page 55:

Recent work

► Some data is becoming available

Page 56:

Recent work

► Exploit band structures – in the same way we use images.

► For now we do clustering.

Page 57:

► Work in progress

► 3-way clustering obtained with dimension reduction + k-means →

► Working on unraveling the info & exploring classification with the data

[Figure: the 3-way clustering shown in a 2-D projection]

Page 58:

Conclusion

► Many interesting new matrix problems in areas that involve the effective mining of data

► Among the most pressing issues is that of reducing computational cost – [SVD, SDP, ..., too costly]

► Many online resources available

► Huge potential in areas like materials science, though inertia has to be overcome

► On the + side: the materials genome project is starting to energize the field

► To a researcher in computational linear algebra: a big tide of change in types of problems, algorithms, frameworks, culture, ...

Page 59:

► But change should be welcome

► In the words of “Who Moved My Cheese?” [Spencer Johnson, 2002]:

“If you do not change, you can become extinct!”

Page 60:

“If you do not change, you can become extinct!”

“The quicker you let go of old cheese, the sooner you find new cheese.”

Thank you!
