Clustering tagged documents with labeled and unlabeled documents
Intelligent Database Systems Lab
Presenter: Jian-Ren Chen
Authors: Chien-Liang Liu*, Wen-Hoar Hsaio, Chia-Hoang Lee, Chun-Hsien Chen
2013, IPM
Outline
- Motivation
- Objectives
- Methodology
- Experiments
- Conclusions
- Comments
Motivation
Tags provide semantic information about resources, and this information can help machines perform classification or clustering tasks more accurately.
Probabilistic latent semantic analysis (PLSA)
- aspect model
- statistical clustering model
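In the usual notation (documents d, words w, latent variable z, counts n(d, w); this notation is ours, since the slides' formulas are not preserved), the two PLSA variants differ in where the latent variable attaches:

```latex
% Aspect model: each occurrence of word w in document d has its own latent aspect z
P(d, w) = \sum_{z} P(z)\, P(d \mid z)\, P(w \mid z)

% Statistical clustering model: all words of document d come from a single latent cluster z
% (likelihood of d's word counts, up to the multinomial coefficient)
P(\{n(d,w)\}_{w}) \propto \sum_{z} P(z) \prod_{w} P(w \mid z)^{\,n(d,w)}
```

The aspect model lets one document mix several aspects, whereas the clustering model places each document in exactly one cluster, which makes it a natural fit for document clustering.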
Objectives
This study employs Constrained-PLSA to cluster tagged documents using only a small number of labeled seed documents.
Constrained-PLSA is based on the statistical clustering model rather than the aspect model.
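A minimal sketch of how a few labeled seeds could constrain EM for the statistical clustering model, assuming (the slides do not spell this out) that each seed document's cluster posterior is fixed to its known label y_d, and writing N for the number of documents:

```latex
% E-step, unlabeled document d
P(z \mid d) = \frac{P(z) \prod_{w} P(w \mid z)^{\,n(d,w)}}
                   {\sum_{z'} P(z') \prod_{w} P(w \mid z')^{\,n(d,w)}}

% E-step, seed document d with known label y_d (assumed constraint)
P(z \mid d) = \mathbb{1}[z = y_d]

% M-step
P(z) = \frac{1}{N} \sum_{d} P(z \mid d), \qquad
P(w \mid z) = \frac{\sum_{d} P(z \mid d)\, n(d,w)}{\sum_{d} \sum_{w'} P(z \mid d)\, n(d,w')}
```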
Methodology - PLSA
PLSA models the terms (keywords) and the documents of the collection and estimates its parameters with EM, alternating an E-step and an M-step.
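The slide's EM equations did not survive extraction. As a hedged reference point, the sketch below implements the textbook EM for the PLSA aspect model in the form P(w|d) = Σ_z P(z|d) P(w|z). The function names, the random initialization, and the plain-list layout (counts[d][w] = n(d, w)) are our own assumptions, not the presenter's or the authors' code.

```python
import math
import random


def _normalize(row):
    """Scale non-negative values to sum to 1 (uniform if they are all zero)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def plsa_aspect(counts, n_topics, n_iter=50, seed=0):
    """Fit the PLSA aspect model P(w|d) = sum_z P(z|d) P(w|z) with EM.

    counts: list of rows, counts[d][w] = n(d, w) over a shared vocabulary.
    Returns (p_z_d, p_w_z): P(z|d) as n_docs x n_topics and P(w|z) as
    n_topics x n_words; every row sums to 1.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random, strictly positive starting distributions.
    p_z_d = [_normalize([rng.random() + 0.1 for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [_normalize([rng.random() + 0.1 for _ in range(n_words)]) for _ in range(n_topics)]

    for _ in range(n_iter):
        # Expected counts gathered in the E-step and reused by the M-step.
        exp_w_z = [[0.0] * n_words for _ in range(n_topics)]
        exp_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) P(w|z).
                post = _normalize([p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)])
                for z in range(n_topics):
                    exp_w_z[z][w] += n_dw * post[z]
                    exp_z_d[d][z] += n_dw * post[z]
        # M-step: P(w|z) and P(z|d) are the normalized expected counts.
        p_w_z = [_normalize(row) for row in exp_w_z]
        p_z_d = [_normalize(row) for row in exp_z_d]
    return p_z_d, p_w_z


def aspect_log_likelihood(counts, p_z_d, p_w_z):
    """sum_{d,w} n(d,w) log sum_z P(z|d) P(w|z), the quantity EM climbs."""
    return sum(
        n_dw * math.log(sum(pz * p_w_z[z][w] for z, pz in enumerate(p_z_d[d])))
        for d, row in enumerate(counts)
        for w, n_dw in enumerate(row)
        if n_dw > 0
    )


docs = [[4, 2, 0, 0], [2, 1, 0, 0], [0, 0, 1, 3], [0, 0, 2, 6]]
p_z_d, p_w_z = plsa_aspect(docs, n_topics=2)
```

Because EM never lowers the likelihood, `aspect_log_likelihood` doubles as a cheap sanity check: evaluated after successive iterations, it should be non-decreasing up to floating-point error.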
Methodology - Constrained-PLSA
E-step
M-step
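A minimal sketch of one way to constrain clustering-model EM with labeled seeds, assuming (not confirmed by the slides) that each seed document's posterior P(z|d) stays clamped to its known label while unlabeled documents are updated freely. The function `constrained_cluster_em`, the uniform start for unlabeled documents, and the add-alpha smoothing of P(w|z) are hypothetical choices for illustration; the paper's exact Constrained-PLSA updates may differ.

```python
import math


def _log(x):
    """Natural log that maps 0 to -inf instead of raising."""
    return math.log(x) if x > 0 else float("-inf")


def constrained_cluster_em(counts, n_clusters, seeds, n_iter=30, alpha=0.01):
    """Seeded EM for the clustering (mixture-of-multinomials) model.

    counts: rows of term counts, counts[d][w] = n(d, w).
    seeds:  {doc_index: cluster_index} for labeled documents; their posterior
            P(z|d) is kept one-hot for the whole run (the constraint).
    alpha:  add-alpha smoothing for P(w|z); must be > 0 so logs stay finite
            (an assumption, not taken from the slides).
    Returns (assignments, p_z_d).
    """
    n_docs, n_words = len(counts), len(counts[0])
    # Seeds start one-hot; unlabeled documents start uniform, so the first
    # M-step is steered by the labeled seeds.
    p_z_d = [
        [1.0 if z == seeds[d] else 0.0 for z in range(n_clusters)]
        if d in seeds
        else [1.0 / n_clusters] * n_clusters
        for d in range(n_docs)
    ]
    for _ in range(n_iter):
        # M-step: P(z) and smoothed P(w|z) from the current posteriors.
        log_p_z = [_log(sum(row[z] for row in p_z_d) / n_docs) for z in range(n_clusters)]
        log_p_w_z = []
        for z in range(n_clusters):
            expected = [
                alpha + sum(p_z_d[d][z] * counts[d][w] for d in range(n_docs))
                for w in range(n_words)
            ]
            total = sum(expected)
            log_p_w_z.append([math.log(x / total) for x in expected])
        # E-step: P(z|d) proportional to P(z) * prod_w P(w|z)^n(d,w), in log space.
        for d in range(n_docs):
            if d in seeds:
                continue  # constraint: labeled documents keep their label
            scores = [
                log_p_z[z] + sum(n * lp for n, lp in zip(counts[d], log_p_w_z[z]))
                for z in range(n_clusters)
            ]
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            p_z_d[d] = [wt / total for wt in weights]
    assignments = [max(range(n_clusters), key=lambda z: row[z]) for row in p_z_d]
    return assignments, p_z_d


docs = [[5, 1, 0, 0], [4, 2, 0, 0], [0, 0, 5, 1], [0, 1, 4, 2], [6, 0, 0, 1], [0, 0, 3, 3]]
labels, _ = constrained_cluster_em(docs, n_clusters=2, seeds={0: 0, 2: 1})
print(labels)  # [0, 0, 1, 1, 0, 1]
```

Starting the unlabeled documents from a uniform posterior makes the first M-step depend on the seeds alone, which is how a handful of labels pulls each cluster toward its intended class; with a purely random start, EM can settle where some unlabeled documents sit in the wrong seed's cluster.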
Experiments - Data set A (CiteULike)
Experiments (Data set A)
Experiments - Data set B (CiteULike)
Experiments (Data set B)
Conclusions
• The "tags as words" representation scheme performs more stably than the "words + tags" representation scheme.
• Unsupervised learning methods fail to work properly on the data set with noisy information, whereas Constrained-PLSA works properly and stably even when only a small amount of labeled data is available.
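The two representation schemes can be sketched as follows, under our own reading of their names: "tags as words" builds each document's term vector from its tags alone, while "words + tags" pools the document's words and its tags into one vocabulary. The function `build_counts` and its scheme strings are hypothetical, and the slides do not say how the authors tokenized or filtered terms.

```python
from collections import Counter


def build_counts(docs, scheme):
    """Turn tagged documents into term-count rows over a shared vocabulary.

    docs:   list of (words, tags) pairs, each a list of strings.
    scheme: "tags_as_words" uses only the tags as terms;
            "words_plus_tags" uses the words and the tags together.
    Returns (vocab, rows) with rows[d][i] = count of vocab[i] in document d.
    """
    bags = []
    for words, tags in docs:
        if scheme == "tags_as_words":
            terms = list(tags)
        elif scheme == "words_plus_tags":
            terms = list(words) + list(tags)
        else:
            raise ValueError(f"unknown scheme: {scheme!r}")
        bags.append(Counter(terms))
    vocab = sorted(set().union(*bags))
    # Counter returns 0 for terms a document does not contain.
    return vocab, [[bag[t] for t in vocab] for bag in bags]


docs = [(["graph", "clustering"], ["ml", "clustering"]), (["protein"], ["bio"])]
vocab, rows = build_counts(docs, "tags_as_words")
# vocab == ['bio', 'clustering', 'ml'], rows == [[0, 1, 1], [1, 0, 0]]
```

Either output is a plain document-term count matrix, so it can be fed unchanged to any count-based model such as PLSA.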
Comments
• Advantages
- Constrained-PLSA outperforms the other methods.
• Disadvantage
- The experiments involve too much manual processing.
• Applications
- Text mining
- Tagged document clustering