1
Latent Semantic Mapping: Dimensionality Reduction via Globally Optimal Continuous Parameter Modeling
Jerome R. Bellegarda
2
Outline
• Introduction
• LSM
• Applications
• Conclusions
3
Introduction
• LSA in IR:
– Words of queries and documents
– Recall and precision
• Assumption: there is some underlying latent semantic structure in the data
– Latent structure is conveyed by correlation patterns
– Documents: bag-of-words model
• LSA improves separability among different topics
4
Introduction
5
Introduction
• Success of LSA:
– Word clustering
– Document clustering
– Language modeling
– Automated call routing
– Semantic inference for spoken interface control
• These solutions all leverage LSA’s ability to expose global relationships in context and meaning
6
Introduction
• Three unique factors for LSA:
– The mapping of discrete entities
– The dimensionality reduction
– The intrinsically global outlook
• The terminology is changed to latent semantic mapping (LSM) to convey increased reliance on these general properties
7
Latent Semantic Mapping
• LSA defines a mapping between two discrete sets and a continuous vector space:
– M: an inventory of M individual units, such as words
– N: a collection of N meaningful compositions of units, such as documents
– L: a continuous vector space
– r_i: unit in M
– c_j: composition in N
8
Feature Extraction
• Construction of a matrix W of co-occurrences between units and compositions
• The (i, j) cell of W:

$$w_{i,j} = (1 - \varepsilon_i)\,\frac{c_{i,j}}{n_j}$$

– c_{i,j}: the number of times r_i occurs in c_j
– n_j: the total number of units present in c_j
– ε_i: the normalized entropy of r_i in the collection N
9
Feature Extraction
• The normalized entropy of r_i (see below)
• A value of ε_i close to 0 means that the unit is present only in a few specific compositions
• The global weight 1 − ε_i is therefore a measure of the indexing power of the unit r_i
$$\varepsilon_i = -\frac{1}{\log N}\sum_{j=1}^{N}\frac{c_{i,j}}{t_i}\,\log\frac{c_{i,j}}{t_i}, \qquad t_i = \sum_{j=1}^{N} c_{i,j}$$

$$0 \le \varepsilon_i \le 1,$$ with equality if and only if c_{i,j} = t_i and c_{i,j} = t_i / N, respectively.
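As a concrete illustration, here is a minimal NumPy sketch that builds W from a raw M×N count matrix using exactly these two formulas; the function and variable names are illustrative, not from the slides.

```python
import numpy as np

def lsm_weighted_matrix(C):
    """Return W with w_ij = (1 - eps_i) * c_ij / n_j, plus the entropies eps_i."""
    C = np.asarray(C, dtype=float)
    _, N = C.shape
    t = C.sum(axis=1, keepdims=True)   # t_i: total occurrences of unit r_i
    n = C.sum(axis=0, keepdims=True)   # n_j: total units in composition c_j
    P = np.divide(C, t, out=np.zeros_like(C), where=t > 0)
    with np.errstate(divide="ignore"):
        logP = np.where(P > 0, np.log(P), 0.0)
    eps = -(P * logP).sum(axis=1, keepdims=True) / np.log(N)  # normalized entropy
    W = (1.0 - eps) * np.divide(C, n, out=np.zeros_like(C), where=n > 0)
    return W, eps.ravel()
```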
10
Singular Value Decomposition
• The M×N unit-composition matrix W defines two vector representations for the units and the compositions
• r_i: a row factor of dimension N
• c_j: a column factor of dimension M
• Impractical:
– M, N can be extremely large
– The vectors r_i, c_j are typically sparse
– The two spaces are distinct from each other
11
Singular Value Decomposition
• Employ the SVD:

$$W \approx \hat{W} = U S V^{T}$$

– U: M×R left singular matrix with row vectors u_i
– S: R×R diagonal matrix of singular values s_1 ≥ s_2 ≥ … ≥ s_R > 0
– V: N×R right singular matrix with row vectors v_j
– U, V are column-orthonormal: U^T U = V^T V = I_R
– R ≪ min(M, N)
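A minimal sketch of the rank-R truncation using NumPy's standard SVD routine, assuming W comes from the feature-extraction step above (the helper name is illustrative).

```python
import numpy as np

def truncated_svd(W, R):
    """Rank-R SVD: returns U (MxR), S (RxR), V (NxR) and the approximation W_hat."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)  # s sorted s_1 >= s_2 >= ...
    U, S, V = U[:, :R], np.diag(s[:R]), Vt[:R, :].T   # keep the R largest singular values
    return U, S, V, U @ S @ V.T                       # W_hat = U S V^T
```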
12
Singular Value Decomposition
13
Singular Value Decomposition
• Ŵ captures the major structural associations in W and ignores higher-order effects
• The closeness of vectors in L:
– Unit-unit comparison
– Composition-composition comparison
– Unit-composition comparison
14
Closeness Measure
• WW^T: co-occurrences between units
• W^TW: co-occurrences between compositions
• r_i, r_j: units which have similar patterns of occurrence across the compositions
• c_i, c_j: compositions which have similar patterns of occurrence across the units
15
Closeness Measure
• Unit-unit comparisons:

$$W W^{T} = U S^{2} U^{T}$$

• Cosine measure:

$$K(r_i, r_j) = \cos(u_i S,\, u_j S) = \frac{u_i S^{2} u_j^{T}}{\|u_i S\|\,\|u_j S\|}$$

• Distance in $[0, \pi]$:

$$D(r_i, r_j) = \cos^{-1} K(r_i, r_j)$$
16
Unit-Unit Comparisons
17
Closeness Measure
• Composition-composition comparisons:

$$W^{T} W = V S^{2} V^{T}$$

• Cosine measure:

$$K(c_i, c_j) = \cos(v_i S,\, v_j S) = \frac{v_i S^{2} v_j^{T}}{\|v_i S\|\,\|v_j S\|}$$

• Distance in $[0, \pi]$:

$$D(c_i, c_j) = \cos^{-1} K(c_i, c_j)$$
18
Closeness Measure
• Unit-composition comparisons:

$$\hat{W} = U S V^{T}$$

• Cosine measure:

$$K(r_i, c_j) = \cos(u_i S^{1/2},\, v_j S^{1/2}) = \frac{u_i S v_j^{T}}{\|u_i S^{1/2}\|\,\|v_j S^{1/2}\|}$$

• Distance in $[0, \pi]$:

$$D(r_i, c_j) = \cos^{-1} K(r_i, c_j)$$
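The three comparisons differ only in which scaled vectors enter the cosine. A minimal sketch, assuming U, S, V come from the truncated SVD above (all function names are illustrative):

```python
import numpy as np

def _cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def unit_unit(U, S, i, j):      # K(r_i, r_j) = cos(u_i S, u_j S)
    return _cos(U[i] @ S, U[j] @ S)

def comp_comp(V, S, i, j):      # K(c_i, c_j) = cos(v_i S, v_j S)
    return _cos(V[i] @ S, V[j] @ S)

def unit_comp(U, V, S, i, j):   # K(r_i, c_j) = cos(u_i S^1/2, v_j S^1/2)
    Sh = np.sqrt(S)             # S is diagonal, so element-wise sqrt works
    return _cos(U[i] @ Sh, V[j] @ Sh)

def distance(K):                # D = arccos(K), a value in [0, pi]
    return float(np.arccos(np.clip(K, -1.0, 1.0)))
```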
19
LSM Framework Extension
• Observe a new composition c̃_p, p > N; the tilde reflects the fact that the composition was not part of the original N
• c̃_p, a column vector of dimension M, can be thought of as an additional column of the matrix W
• U and S do not change:

$$\tilde{c}_p = U S \tilde{v}_p^{T}$$
20
LSM Framework Extension
• c̃_p: pseudo-composition
• ṽ_p: pseudo-composition vector, obtained by folding c̃_p into L:

$$\tilde{v}_p = \tilde{c}_p^{T}\, U S^{-1}$$

• If the addition of c̃_p causes the major structural associations in W to shift in some substantial manner, the singular vectors will become inadequate
• Otherwise, ṽ_p behaves like a genuine composition vector
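A one-line sketch of this folding-in step, assuming U and S come from the original decomposition and c_tilde is an M-dimensional column weighted the same way as the columns of W (names are illustrative):

```python
import numpy as np

def fold_in(c_tilde, U, S):
    """Map a new composition into L: v_tilde = c_tilde^T U S^{-1}."""
    return c_tilde @ U @ np.linalg.inv(S)  # S is diagonal, so inversion is cheap
```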
21
LSM Framework Extension
• It would then be necessary to re-compute the SVD to find a proper representation for c̃_p
22
Salient Characteristics of LSM
• A single vector embedding for both units and compositions in the same continuous vector space L
• A relatively low dimensionality, which makes operations such as clustering meaningful and practical
• An underlying structure reflecting globally meaningful relationships, with natural similarity metrics to measure the distance between units, between compositions or between units and compositions in L
23
Applications
• Semantic classification
• Multi-span language modeling
• Junk e-mail filtering
• Pronunciation modeling
• TTS Unit Selection
24
Semantic Classification
• Semantic classification refers to determining which one of several predefined topics a given document is most closely aligned with
• The centroid of each cluster can be viewed as the semantic representation of that topic in LSM space: a semantic anchor
• A newly observed word sequence is classified by computing the distance between the document and each semantic anchor, and picking the minimum of

$$D(c_i, c_j) = \cos^{-1} K(c_i, c_j)$$
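A minimal sketch of nearest-anchor classification, assuming each topic's training compositions are given as groups of rows of V; all names are illustrative:

```python
import numpy as np

def semantic_anchors(topic_vectors):
    """Centroid of each topic's composition vectors = its semantic anchor."""
    return {topic: np.mean(vs, axis=0) for topic, vs in topic_vectors.items()}

def classify(v_tilde, S, anchors):
    """Pick the anchor with minimum LSM distance to the new document vector."""
    def dist(a, b):
        a, b = a @ S, b @ S  # compare scaled vectors, as in K(c_i, c_j)
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        return np.arccos(np.clip(cos, -1.0, 1.0))
    return min(anchors, key=lambda topic: dist(v_tilde, anchors[topic]))
```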
25
Semantic Classification
• Domain knowledge is automatically encapsulated in the LSM space in a data-driven fashion
• For desktop interface control: semantic inference
26
Semantic Inference
27
Multi-Span Language Modeling
• In a standard n-gram, the history is the string

$$H_{q-1}^{(n)} = r_{q-1}\, r_{q-2} \cdots r_{q-n+1}$$

• In LSM language modeling, the history is the current document up to word r_{q-1}:

$$H_{q-1}^{(l)} = \tilde{c}_{q-1}$$

• Pseudo-document, continually updated as q increases (here r_i denotes the unit vector of the newly observed word):

$$\tilde{c}_q = \frac{1}{n_q}\left[(n_q - 1)\,\tilde{c}_{q-1} + (1 - \varepsilon_i)\, r_i\right], \qquad \tilde{v}_q = \tilde{c}_q^{T}\, U S^{-1}$$
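A minimal sketch of this incremental update; since the slide's formula is garbled in the source, this follows the reconstruction above and should be read as an approximation of the intent (names are illustrative):

```python
import numpy as np

def update_pseudo_document(c_tilde, n_q, i, eps, U, S):
    """Fold word r_i (vocabulary index i) into the pseudo-document at length n_q."""
    c_new = c_tilde * (n_q - 1) / n_q     # rescale the previous weighted counts
    c_new[i] += (1.0 - eps[i]) / n_q      # add the new word with weight 1 - eps_i
    v_new = c_new @ U @ np.linalg.inv(S)  # re-map: v_tilde_q = c_tilde_q^T U S^{-1}
    return c_new, v_new
```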
28
Multi-Span Language Modeling
• An Integrated n-gram + LSM formulation for the overall language model probability:
– Different syntactic constructs can be used to carry the same meaning (content words)
$$\Pr(r_q \mid H_{q-1}^{(n+l)}) = \Pr(r_q \mid H_{q-1}^{(n)}, H_{q-1}^{(l)}) = \frac{\Pr(r_q \mid r_{q-1}\, r_{q-2} \cdots r_{q-n+1})\,\Pr(\tilde{c}_{q-1} \mid r_q)}{\sum_{r_i \in M} \Pr(r_i \mid r_{q-1}\, r_{q-2} \cdots r_{q-n+1})\,\Pr(\tilde{c}_{q-1} \mid r_i)}$$
29
Multi-Span Language Modeling

$$\begin{aligned}
\Pr(r_q \mid H_{q-1}^{(n+l)}) &= \Pr(r_q \mid H_{q-1}^{(n)}, H_{q-1}^{(l)}) \\
&= \frac{\Pr(r_q, H_{q-1}^{(l)} \mid H_{q-1}^{(n)})}{\sum_{r_i \in M} \Pr(r_i, H_{q-1}^{(l)} \mid H_{q-1}^{(n)})} \\
&= \frac{\Pr(r_q \mid H_{q-1}^{(n)})\,\Pr(H_{q-1}^{(l)} \mid r_q, H_{q-1}^{(n)})}{\sum_{r_i \in M} \Pr(r_i \mid H_{q-1}^{(n)})\,\Pr(H_{q-1}^{(l)} \mid r_i, H_{q-1}^{(n)})} \\
&= \frac{\Pr(r_q \mid r_{q-1} \cdots r_{q-n+1})\,\Pr(\tilde{c}_{q-1} \mid r_q, r_{q-1} \cdots r_{q-n+1})}{\sum_{r_i \in M} \Pr(r_i \mid r_{q-1} \cdots r_{q-n+1})\,\Pr(\tilde{c}_{q-1} \mid r_i, r_{q-1} \cdots r_{q-n+1})} \\
&= \frac{\Pr(r_q \mid r_{q-1} \cdots r_{q-n+1})\,\Pr(\tilde{c}_{q-1} \mid r_q)}{\sum_{r_i \in M} \Pr(r_i \mid r_{q-1} \cdots r_{q-n+1})\,\Pr(\tilde{c}_{q-1} \mid r_i)}
\end{aligned}$$

Assume that the probability of the document history given the current word is not affected by the immediate context preceding it.
30
Multi-Span Language Modeling
$$\begin{aligned}
\Pr(r_q \mid H_{q-1}^{(n+l)}) &= \frac{\Pr(r_q \mid r_{q-1} \cdots r_{q-n+1})\,\Pr(\tilde{c}_{q-1} \mid r_q)}{\sum_{r_i \in M} \Pr(r_i \mid r_{q-1} \cdots r_{q-n+1})\,\Pr(\tilde{c}_{q-1} \mid r_i)} \\
&= \frac{\Pr(r_q \mid r_{q-1} \cdots r_{q-n+1})\,\dfrac{\Pr(\tilde{c}_{q-1},\, r_q)}{\Pr(r_q)}}{\sum_{r_i \in M} \Pr(r_i \mid r_{q-1} \cdots r_{q-n+1})\,\dfrac{\Pr(\tilde{c}_{q-1},\, r_i)}{\Pr(r_i)}} \\
&= \frac{\Pr(r_q \mid r_{q-1} \cdots r_{q-n+1})\,\dfrac{\Pr(r_q \mid \tilde{c}_{q-1})\,\Pr(\tilde{c}_{q-1})}{\Pr(r_q)}}{\sum_{r_i \in M} \Pr(r_i \mid r_{q-1} \cdots r_{q-n+1})\,\dfrac{\Pr(r_i \mid \tilde{c}_{q-1})\,\Pr(\tilde{c}_{q-1})}{\Pr(r_i)}}
\end{aligned}$$

The Pr(c̃_{q-1}) terms cancel between numerator and denominator, so only the ratio Pr(r | c̃_{q-1}) / Pr(r) matters.
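In code, the last line amounts to rescaling each n-gram probability by Pr(r_i | c̃_{q-1}) / Pr(r_i) and renormalizing over the vocabulary. A minimal sketch, assuming the three distributions are available as NumPy arrays indexed by vocabulary position (array names are illustrative):

```python
import numpy as np

def integrated_prob(q_idx, p_ngram, p_lsm, p_uni):
    """Pr(r_q | H) proportional to Pr(r_q | n-gram) * Pr(r_q | c_tilde) / Pr(r_q)."""
    scores = p_ngram * p_lsm / p_uni     # numerator of the final expression, per word
    return scores[q_idx] / scores.sum()  # denominator: sum over all r_i in M
```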
31
Junk E-mail Filtering
• It can be viewed as a degenerate case of semantic classification (two categories):
– Legitimate
– Junk
• M: an inventory of words and symbols
• N: a binary collection of e-mail messages
• Two semantic anchors
32
Pronunciation Modeling
• Also called grapheme-to-phoneme conversion (GPC)
• Orthographic anchors – (one for each in-vocabulary word)
• Orthographic neighborhood
– In-vocabulary words with high closeness to the out-of-vocabulary word (see the sketch below)
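A minimal sketch of building such a neighborhood, assuming an LSM space where units are letter sequences and compositions are in-vocabulary spellings with anchor vectors taken from V; this setup and every name below are assumptions for illustration:

```python
import numpy as np

def orthographic_neighborhood(oov_counts, U, S, anchors, k=5):
    """Return the k in-vocabulary words closest to a folded-in OOV spelling."""
    v = oov_counts @ U @ np.linalg.inv(S)  # fold the OOV spelling into L
    def closeness(word):
        a, b = anchors[word] @ S, v @ S    # cosine of scaled vectors, as in K(c_i, c_j)
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return sorted(anchors, key=closeness, reverse=True)[:k]
```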
33
Pronunciation Modeling
34
Conclusions
• Descriptive Power
– Forgoing local constraints is not acceptable in some situations
• Domain Sensitivity
– Depends on the quality of the training data
– Polysemy
• Updating the LSM Space
– Re-computing the SVD on the fly is not practical
• The success of LSM stems from its three salient characteristics