TRANSCRIPT

Local Linear Matrix Factorization for Document Modeling

Lu Bai, Jiafeng Guo, Yanyan Lan, Xueqi Cheng
Institute of Computing Technology, Chinese Academy of Sciences
bailu@software.ict.ac.cn
Outline
• Introduction
• Our approach
• Experimental results
• Conclusion
Introduction
Applications: classification, ranking, recommendation
Background
Low dimensional representations can be produced by decomposing the document-word matrix into low-rank matrices. Preserving local geometric relations can improve the low dimensional representation by
• Smoothing the low dimensional representation
• Improving the model's generalization
• Avoiding over-fitting

Dᵀ ∈ ℝ^(N×M) ≈ θβ, with θ ∈ ℝ^(N×K) and β ∈ ℝ^(K×M)

D : document-word matrix
θ : document-topic matrix
β : topic-term matrix
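The decomposition above can be sketched with classic multiplicative-update NMF (a minimal illustration, not the paper's solver; the toy matrix and K are made up):

```python
import numpy as np

def nmf(D, K, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative N x M matrix D into theta (N x K) and
    beta (K x M) with Lee-Seung multiplicative updates, so D ~ theta @ beta."""
    rng = np.random.default_rng(seed)
    N, M = D.shape
    theta = rng.random((N, K)) + eps
    beta = rng.random((K, M)) + eps
    for _ in range(n_iter):
        beta *= (theta.T @ D) / (theta.T @ theta @ beta + eps)   # fix theta, update beta
        theta *= (D @ beta.T) / (theta @ beta @ beta.T + eps)    # fix beta, update theta
    return theta, beta

# toy document-word count matrix: 4 documents, 6 words, roughly 2 latent topics
D = np.array([[3., 2., 0., 0., 1., 0.],
              [2., 3., 1., 0., 0., 0.],
              [0., 0., 2., 3., 0., 1.],
              [0., 1., 3., 2., 0., 2.]])
theta, beta = nmf(D, K=2)
print(np.linalg.norm(D - theta @ beta))  # Frobenius reconstruction error
```

The multiplicative updates keep θ and β non-negative by construction, matching the NMF constraint.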
Previous work
No local geometric regularization
• None or global regularization only
• e.g. SVD, PLSA, LDA, NMF, etc.
• Over-fitting & poor generalization

Pairwise neighborhood smoothing
• Increasing the low dimensional affinity over nearby document pairs
• e.g. LapPLSA, LTM, DTM, etc.
• Losing the geometric information among pairs, especially under unbalanced document distributions

Heuristic similarity measures & neighbor counts
• Empirical similarity thresholds and neighbor numbers
• e.g. LapPLSA, LTM
• An improper similarity measure or number of neighbors hurts the representation

Our goal: a new low dimensional representation mining method that better exploits the geometric relationships among documents
Our approach
Basic ideas
• Factorizing the document-word matrix in the NMF way: mining low dimensional semantic representations
• Modeling each document's relationship to its neighbors with a local linear combination: preserving rich local geometric information
• Regularizing the local linear combination weights with the ℓ1 norm: selecting neighbors without a similarity measure or threshold
Local Linear Matrix Factorization (LLMF)

Factorizing the document-term matrix as in NMF:

min_{θ,β ≥ 0} ‖Dᵀ − θβ‖²_F + λ_θ‖θ‖²_F + λ_β‖β‖²_F

where λ_θ and λ_β are used for reducing over-fitting.

Factorizing the matrix with neighbors (picking document neighbors and learning salient combination weights):

min_W ‖X − WX‖²_F + λ_W Σᵢ ‖wᵢ‖₁,  with Wᵢᵢ = 0

where X denotes the normalized document-word matrix (normalization avoids the bias of long documents), W denotes the linear combination weights, and λ_W weights the ℓ1 norm of W.
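The neighbor step can be illustrated with a small ℓ1-regularized least-squares solver. This sketch uses generic proximal gradient (ISTA) rather than the paper's solver, and the toy rows and λ are made up:

```python
import numpy as np

def local_weights(X, i, lam=0.05, n_iter=500):
    """Sparse weights w minimizing ||X[i] - others.T @ w||^2 + lam*||w||_1,
    where `others` holds all rows of X except row i. The l1 penalty drives
    most weights to zero, so the nonzero entries pick the neighbors."""
    others = np.delete(X, i, axis=0)
    x = X[i]
    G = others @ others.T
    b = others @ x
    step = 1.0 / (2.0 * np.linalg.eigvalsh(G)[-1] + 1e-9)  # 1/L step size
    w = np.zeros(others.shape[0])
    for _ in range(n_iter):
        grad = 2.0 * (G @ w - b)                    # gradient of the smooth part
        w = w - step * grad
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # soft threshold
    return w

# docs 0 and 1 are near-duplicates; docs 2 and 3 use disjoint words
X = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 0.9, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 0.8]])
w = local_weights(X, 0)
print(w)  # weight on doc 1 is large; weights on docs 2 and 3 are zero
```

No similarity threshold or neighbor count appears anywhere: sparsity alone decides which documents participate in the reconstruction.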
Cont’
Combining the matrix factorization and the local neighbor factorization (with the learned weights W fixed), the final objective function is

min_{θ,β ≥ 0} ‖Dᵀ − θβ‖²_F + α‖θ − Wθ‖²_F + λ_θ‖θ‖²_F + λ_β‖β‖²_F

where α weighs the local linear smoothing term against the factorization loss.
Graphic Model of LLMF
LLMF vs Others
Compared with models without geometric information (e.g. NMF, PLSA, LDA): LLMF smoothes each document's representation with its neighbors.

Compared with models with geometric constraints (e.g. LapPLSA, LTM): LLMF is free of similarity measures and neighborhood thresholds, and it is more robust in preserving local geometric structure under unbalanced data distributions.
[Figure: document A is reconstructed from its neighbors with local linear weights φ_AB, φ_AC, φ_AD among documents A–F]
Model fitting
Estimating W first: the objective is not differentiable because of the ℓ1 norm, so it is solved with OWL-QN.

Estimating θ and β: the objective is bi-convex in θ and β, so they are solved with coordinate gradient descent.
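A rough numpy sketch of the second stage (fitting θ and β with W fixed), using plain alternating projected gradient steps rather than the paper's exact coordinate scheme; the objective form, α, λ, and the learning rate here are illustrative assumptions:

```python
import numpy as np

def fit_theta_beta(D, W, K, alpha=1.0, lam=0.1, lr=5e-3, n_iter=500, seed=0):
    """Minimize ||D - theta @ beta||_F^2 + alpha*||theta - W @ theta||_F^2
    + lam*(||theta||_F^2 + ||beta||_F^2) with theta, beta >= 0, alternating
    projected gradient steps (the objective is bi-convex in theta, beta)."""
    rng = np.random.default_rng(seed)
    N, M = D.shape
    theta = rng.random((N, K))
    beta = rng.random((K, M))
    L = np.eye(N) - W                      # residual of the local linear smoothing
    losses = []
    for _ in range(n_iter):
        R = theta @ beta - D
        g_theta = 2.0 * (R @ beta.T + alpha * (L.T @ (L @ theta)) + lam * theta)
        theta = np.maximum(theta - lr * g_theta, 0.0)   # project onto theta >= 0
        g_beta = 2.0 * (theta.T @ (theta @ beta - D) + lam * beta)
        beta = np.maximum(beta - lr * g_beta, 0.0)
        losses.append(np.linalg.norm(theta @ beta - D)**2
                      + alpha * np.linalg.norm(L @ theta)**2
                      + lam * (np.linalg.norm(theta)**2 + np.linalg.norm(beta)**2))
    return theta, beta, losses

# toy W: each document's single neighbor gets weight 1
W = np.array([[0., 1., 0., 0.],
              [1., 0., 0., 0.],
              [0., 0., 0., 1.],
              [0., 0., 1., 0.]])
D = np.array([[3., 2., 0., 0., 1., 0.],
              [2., 3., 1., 0., 0., 0.],
              [0., 0., 2., 3., 0., 1.],
              [0., 1., 3., 2., 0., 2.]])
theta, beta, losses = fit_theta_beta(D, W, K=2)
print(losses[0], losses[-1])  # the objective decreases over the iterations
```

The α‖θ − Wθ‖² term pulls each row of θ toward the local linear combination of its neighbors' rows, which is the smoothing effect described above.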
Experimental Settings
Data sets: 20news & la1 (from Weka), with word stemming and stop word removal

Data set  Num. of documents  Num. of words  Num. of categories
20news    18,744             26,214         20
la1       2,850              13,195         5
Cont'

Baseline methods: PLSA, LDA, NMF, LapPLSA

Parameter settings: the low dimension K, the weights λ_θ and λ_β for the Frobenius norms, and the weight λ_W for the ℓ1 norm

Document classification: LIBSVM with a linear kernel; training set : testing set = 3 : 2
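The classification protocol can be sketched as follows, with scikit-learn's LinearSVC standing in for LIBSVM with a linear kernel; the synthetic low-dimensional features and cluster layout are made up for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# stand-in for learned document representations theta: three "topic" clusters
rng = np.random.default_rng(0)
centers = 3.0 * np.eye(3, 5)           # three well-separated class centroids in 5-D
theta = np.vstack([rng.normal(loc=c, scale=0.3, size=(60, 5)) for c in centers])
labels = np.repeat([0, 1, 2], 60)

# training set : testing set = 3 : 2, as on the slide
X_tr, X_te, y_tr, y_te = train_test_split(
    theta, labels, train_size=0.6, random_state=0, stratify=labels)

clf = LinearSVC().fit(X_tr, y_tr)       # linear-kernel SVM classifier
acc = clf.score(X_te, y_te)
print(f"accuracy: {acc:.3f}")
```

A linear kernel is the natural choice here: the low-dimensional topic features are what the SVM should be judged on, not a richer implicit feature map.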
Experimental Results
Topics Learned by LLMF over the Two Datasets
† Topic labels are assigned manually according to the top words in each topic
Cont': Document classification

LapPLSA and LLMF are better than NMF, PLSA, and LDA.

LLMF achieves the highest accuracy of all the methods on both datasets.

LLMF is consistently better than pure NMF across different regularization weights.
Conclusion
Conclusions
We propose a novel method, LLMF, for learning low dimensional representations of documents with local linear constraints.
LLMF can better capture the rich geometric information among documents than methods based on independent pairwise relationships.
Experiments on the 20news and la1 benchmarks show that the proposed approach learns better semantic representations than the baseline methods.

Future work
We plan to extend LLMF to parallel and distributed settings.
It is also promising to apply LLMF to recommendation systems.
References
D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. JMLR, 3:993–1022, 2003.
D. Cai, X. He, and J. Han. Locally consistent concept factorization for document clustering. TKDE, 23(6):902–913, 2011.
D. Cai, Q. Mei, J. Han, and C. Zhai. Modeling hidden topics on document manifold. In CIKM '08, pages 911–920, New York, NY, USA, 2008. ACM.
T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42:177–196, 2001.
S. Huh and S. E. Fienberg. Discriminative topic modeling based on manifold learning. In KDD '10, pages 653–662, New York, NY, USA, 2010. ACM.
Thanks!!
Q&A