latent semantic indexing by singular value decomposition


Upload: clara-lester

Post on 17-Jan-2016


TRANSCRIPT

Page 1: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

LATENT SEMANTIC INDEXING

BY SINGULAR VALUE DECOMPOSITION

Page 2: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

PROBLEMS IN LEXICAL MATCHING

Synonymy - widespread use of synonyms - decreased recall.

Polysemy - retrieval of irrelevant documents - poor precision.

Noise - Boolean search on specific words - retrieval of documents unrelated in content.

Page 3: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

MOTIVATION FOR LSI

To find and fit a useful model of the relationships between terms and documents.

To find out what terms are "really" implied by a query.

LSI allows the user to search for concepts rather than specific words.

LSI can retrieve documents related to a user's query even when the query and the documents share no terms.

Page 4: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

EXAMPLE

Q: “Light waves.”

D1: “Particle and wave models of light.”

D2: “Surfing on the waves under star lights.”

D3: “Electro-magnetic models for photons.”
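A minimal sketch of how exact word matching behaves on this example, assuming a naive tokenizer without stemming (the `tokens` helper is illustrative and not part of the slides):

```python
import re

def tokens(text):
    """Lowercase a string and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

query = "Light waves."
docs = {
    "D1": "Particle and wave models of light.",
    "D2": "Surfing on the waves under star lights.",
    "D3": "Electro-magnetic models for photons.",
}

# Exact-word overlap between the query and each document (no stemming).
overlap = {name: tokens(query) & tokens(text) for name, text in docs.items()}
for name, shared in overlap.items():
    print(name, sorted(shared))
```

Under these assumptions D2, which is about surfing, matches the query as strongly as D1, while D3 is on topic but shares no word with the query. This is the behaviour LSI is meant to improve on.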

Page 5: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

HOW LSI WORKS

LSI uses a multidimensional vector space in which all documents and terms are placed.

Each dimension of that space corresponds to a concept present in the collection.

Thus the underlying topics of a document are encoded in a vector.

Related terms shared by a document and a query pull the document and query vectors close to each other.

Page 6: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

DRAWBACK!

Computing the LSI model by truncated SVD is costly.

Its execution efficiency lags far behind that of the simpler Boolean models, especially on large data sets.

Page 7: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

SVD

The key to working with the SVD of any rectangular matrix $A$ is to consider $AA^T$ and $A^TA$.

The columns of $U$, which is t by t, are eigenvectors of $AA^T$.

The columns of $V$, which is d by d, are eigenvectors of $A^TA$.

The singular values on the diagonal of $S$, which is t by d, are the positive square roots of the nonzero eigenvalues of both $AA^T$ and $A^TA$.
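These statements can be checked by substituting the factorization $A = USV^T$, with orthogonal $U$ and $V$, into the two products. A short worked derivation:

```latex
\begin{aligned}
A A^T &= (U S V^T)(U S V^T)^T = U S V^T V S^T U^T = U \,(S S^T)\, U^T, \\
A^T A &= (U S V^T)^T (U S V^T) = V S^T U^T U S V^T = V \,(S^T S)\, V^T.
\end{aligned}
```

Here $SS^T$ (t by t) and $S^TS$ (d by d) are diagonal with the squared singular values $s_i^2$ on the diagonal, so the columns of $U$ and $V$ are eigenvectors of $AA^T$ and $A^TA$ with eigenvalues $s_i^2$.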

Page 8: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

SVD

Eigenvalue-eigenvector factorization: $A = USV^T$

- $UU^T = I$

- $VV^T = I$

- $S$ holds the singular values on its diagonal

Page 9: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

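SVD-PROPERTY

The diagonal entries of $S$ are ordered by magnitude: $s_1 \ge s_2 \ge \dots \ge s_r > s_{r+1} = \dots = s_{\min(t,d)} = 0$, where $r$ is the rank of $A$.

The truncated matrix $A_k$ is the best rank-k approximation to $A$ in the least-squares (Frobenius norm) sense.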

Page 10: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

COMPUTING SVD

$T = AA^T$ and $D = A^TA$: eigenvector and eigenvalue computation for $T$ and $D$.
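A minimal NumPy sketch of this route, under two assumptions that are not from the slides: the small matrix is made up for illustration, and only $D = A^TA$ is decomposed, with the columns of $U$ recovered as $u_i = A v_i / s_i$ (which gives the t by d "thin" $U$ rather than the full t by t matrix).

```python
import numpy as np

# Hypothetical 3-term by 2-document matrix, used only for illustration.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])

D = A.T @ A                          # d x d, symmetric positive semidefinite
eigvals, V = np.linalg.eigh(D)       # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]    # reorder so the largest eigenvalue comes first
eigvals, V = eigvals[order], V[:, order]

s = np.sqrt(np.clip(eigvals, 0.0, None))   # singular values = sqrt of eigenvalues
U = (A @ V) / s                      # column i is A v_i / s_i (all s_i > 0 here)

A_rebuilt = U @ np.diag(s) @ V.T
print(np.allclose(A, A_rebuilt))
print(np.allclose(s, np.linalg.svd(A, compute_uv=False)))
```

In practice a library routine such as `np.linalg.svd` is usually preferable, because explicitly forming $A^TA$ squares the condition number of the problem.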

Page 11: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

COMPUTING SVD(2)

Page 12: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

TRUNCATED-SVD

Create a rank-k approximation to $A$, with $k \le r_A$ (the rank of $A$):

$A_k = U_k S_k V_k^T$
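The truncation can be sketched with NumPy as follows. The matrix and the helper name `truncated_svd` are assumptions made for illustration. The last lines compare the Frobenius-norm error of $A_k$ with the discarded singular values.

```python
import numpy as np

def truncated_svd(A, k):
    """Return U_k, s_k, Vt_k: the k leading singular triplets of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted descending
    return U[:, :k], s[:k], Vt[:k, :]

# Hypothetical 4-term by 3-document count matrix, for illustration only.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

k = 2
U_k, s_k, Vt_k = truncated_svd(A, k)
A_k = U_k @ np.diag(s_k) @ Vt_k              # A_k = U_k S_k V_k^T

s_all = np.linalg.svd(A, compute_uv=False)
error = np.linalg.norm(A - A_k, "fro")       # Frobenius-norm approximation error
discarded = np.sqrt(np.sum(s_all[k:] ** 2))  # norm of the dropped singular values
print(np.linalg.matrix_rank(A_k, tol=1e-10), error, discarded)
```

The two printed error figures should agree: the approximation error of $A_k$ is carried entirely by the singular values that were dropped.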

Page 13: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

TRUNCATED-SVD

Using truncated SVD, the underlying latent structure is represented in a reduced k-dimensional space.

Noise in word usage is largely eliminated.

Page 14: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

LSI-PROCEDURE

Obtain the term-document matrix.

Compute the SVD.

Truncate the SVD into the reduced k-dimensional LSI space.

- k-dimensional semantic structure

- similarity in the reduced space:

  - term-term

  - term-document

  - document-document
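The three steps and the three kinds of similarity can be sketched as below. The matrix is hypothetical, and the similarities follow one common convention (inner products of coordinates scaled by the singular values); the slides do not state which measure was used.

```python
import numpy as np

# Step 1: hypothetical term-document matrix (rows = terms, columns = documents).
A = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 1.0]])

# Step 2: compute the SVD.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Step 3: truncate to k dimensions.
k = 2
U_k, S_k, V_k = U[:, :k], np.diag(s[:k]), Vt[:k, :].T

term_coords = U_k @ S_k                   # one row per term
doc_coords = V_k @ S_k                    # one row per document

term_term = term_coords @ term_coords.T   # equals A_k A_k^T
doc_doc = doc_coords @ doc_coords.T       # equals A_k^T A_k
term_doc = U_k @ S_k @ V_k.T              # equals A_k itself
print(term_term.shape, term_doc.shape, doc_doc.shape)
```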

Page 15: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

QUERY PROCESSING

Map the query to the reduced k-space: $q' = q^T U_k S_k^{-1}$

Retrieve documents or terms within a proximity of the query.

- cosine

- best m
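A minimal sketch of query projection followed by cosine ranking. The matrix, the query, and the helper names `project` and `cosine` are assumptions for illustration. The query is chosen to equal the first column of the matrix, so it should land on that document.

```python
import numpy as np

def project(vec, U_k, s_k):
    """Map a term-space vector into k-space: vec^T U_k S_k^{-1}."""
    return (vec @ U_k) / s_k

def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 4-term by 3-document matrix, for illustration only.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T   # rows of V_k are document coordinates

q = np.array([1.0, 1.0, 0.0, 0.0])             # query containing terms 0 and 1
q_hat = project(q, U_k, s_k)

scores = [cosine(q_hat, V_k[j]) for j in range(V_k.shape[0])]
m = 2
best_m = np.argsort(scores)[::-1][:m]          # indices of the m closest documents
print(best_m, scores)
```

A cosine threshold can be used instead of a fixed m by keeping every document whose score exceeds the chosen cutoff.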

Page 16: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

UPDATING

Folding-in: $d' = d^T U_k S_k^{-1}$

- similar to query projection

SVD re-computation
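Folding-in can be sketched as appending the projected document to the existing document coordinates. The matrix, the new document, and the helper `fold_in_document` are hypothetical. Note that $U_k$ and $S_k$ are left untouched, which is what makes folding-in cheap compared with recomputing the SVD.

```python
import numpy as np

def fold_in_document(d, U_k, s_k, V_k):
    """Append the k-space coordinates d^T U_k S_k^{-1} as a new row of V_k."""
    d_hat = (d @ U_k) / s_k
    return np.vstack([V_k, d_hat])

# Hypothetical 4-term by 3-document matrix, for illustration only.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T

d_new = np.array([0.0, 1.0, 1.0, 0.0])   # hypothetical new document with terms 1 and 2
V_k_updated = fold_in_document(d_new, U_k, s_k, V_k)
print(V_k.shape, "->", V_k_updated.shape)
```

The existing document coordinates are unchanged by this step, so the folded-in document is placed relative to the old latent structure. Recomputing the SVD on the enlarged matrix lets the new document influence that structure as well.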

Page 17: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

EXAMPLE:COLLECTION

Label  Course Title
C1     Parallel Programming Languages Systems
C2     Parallel Processing for Noncommercial Applications
C3     Algorithm Design for Parallel Computers
C4     Networks and Algorithms for Parallel Computation
C5     Application of Computer Graphics
C6     Database Theory
C7     Distributed Database Systems
C8     Topics in Database Management Systems
C9     Data Organization and Management
C10    Network Theory
C11    Computer Organization
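A sketch of how a term-document matrix could be built from these titles. The stem list is taken from the word labels that appear in the slides' plots. The prefix-matching rule and the binary weights are assumptions, since the slides do not spell out how the matrix was constructed.

```python
import numpy as np

# Stems as they appear in the plot labels of the slides.
terms = ["parallel", "comput", "systems", "algorithm", "network",
         "application", "database", "theory", "management", "organization"]

courses = {
    "C1": "Parallel Programming Languages Systems",
    "C2": "Parallel Processing for Noncommercial Applications",
    "C3": "Algorithm Design for Parallel Computers",
    "C4": "Networks and Algorithms for Parallel Computation",
    "C5": "Application of Computer Graphics",
    "C6": "Database Theory",
    "C7": "Distributed Database Systems",
    "C8": "Topics in Database Management Systems",
    "C9": "Data Organization and Management",
    "C10": "Network Theory",
    "C11": "Computer Organization",
}

def term_document_matrix(terms, titles):
    """Binary matrix: entry (i, j) is 1 if some word of title j starts with term i."""
    A = np.zeros((len(terms), len(titles)))
    for j, title in enumerate(titles):
        words = title.lower().split()
        for i, term in enumerate(terms):
            if any(word.startswith(term) for word in words):
                A[i, j] = 1.0
    return A

A = term_document_matrix(terms, list(courses.values()))
print(A.shape)
print(A.astype(int))
```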

Page 18: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

A VERSUS $A_2$

Page 19: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

OBSERVATIONS

Lower entry values. Higher values. Negative Entries.

Page 20: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

MAPPING

[Figure: two-dimensional plot of the words (parallel, comput, systems, algorithm, network, application, database, theory, management, organization) and the courses C1 to C11. Legend: words, courses.]

Page 21: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

EXAMPLE:QUERY AND NEW TERMS

Query: computer database organizations

$q^T$ = [ 0 1 0 0 0 0 1 0 0 1 ]

Update:

Label  Course Title
C12    Parallel Programming for Scientific Computations
C13    Data Structures for Parallel Programming
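The query and the update can be worked through end to end as sketched below, with k = 2. The binary prefix-matching construction of the matrix is an assumption, so the scores it prints need not match the figures on the slides. The new courses are folded in using the existing ten terms only.

```python
import numpy as np

terms = ["parallel", "comput", "systems", "algorithm", "network",
         "application", "database", "theory", "management", "organization"]
titles = [
    "Parallel Programming Languages Systems",
    "Parallel Processing for Noncommercial Applications",
    "Algorithm Design for Parallel Computers",
    "Networks and Algorithms for Parallel Computation",
    "Application of Computer Graphics",
    "Database Theory",
    "Distributed Database Systems",
    "Topics in Database Management Systems",
    "Data Organization and Management",
    "Network Theory",
    "Computer Organization",
]

def to_vector(text):
    """Binary term vector: 1 where some word of `text` starts with the term stem."""
    words = text.lower().split()
    return np.array([float(any(w.startswith(t) for w in words)) for t in terms])

def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

A = np.column_stack([to_vector(t) for t in titles])   # 10 terms x 11 courses
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T

# Query projection: q' = q^T U_k S_k^{-1}, then cosine ranking of the courses.
q = to_vector("computer database organizations")
q_hat = (q @ U_k) / s_k
ranking = sorted(((cosine(q_hat, V_k[j]), f"C{j + 1}")
                  for j in range(len(titles))), reverse=True)
for score, label in ranking:
    print(f"{label}: {score:.3f}")

# Folding-in of the two new courses: d' = d^T U_k S_k^{-1}.
new_titles = {"C12": "Parallel Programming for Scientific Computations",
              "C13": "Data Structures for Parallel Programming"}
folded = {label: (to_vector(title) @ U_k) / s_k
          for label, title in new_titles.items()}
print(folded)
```

With this construction `q` reproduces the query vector given on the slide. Words such as "programming" and "data" are outside the ten-term vocabulary and are ignored here.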

Page 22: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

QUERY

[Figure: two-dimensional plot of the words and the courses C1 to C11, with the query Q added and a region labeled "relevance space". Legend: words, courses.]

Page 23: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

COMPARISON WITH LEXICAL MATCHING

Page 24: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

FOLD-IN

[Figure: two-dimensional plot of the words and the courses C1 to C11, with the folded-in courses C12 and C13 and the new terms programming and data. Legend: words, courses.]

Page 25: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

RECOMPUTED SPACE

Page 26: LATENT SEMANTIC INDEXING BY SINGULAR VALUE DECOMPOSITION

SOME APPLICATIONS

Information Retrieval

Information Filtering

Relevance Feedback

Cross-language retrieval