![Page 1: Semantic Wordfication of Document Collections Presenter: Yingyu Wu](https://reader035.vdocument.in/reader035/viewer/2022062408/56649f225503460f94c3a825/html5/thumbnails/1.jpg)
Semantic Wordfication of Document Collections
Presenter: Yingyu Wu
![Page 2: Semantic Wordfication of Document Collections Presenter: Yingyu Wu](https://reader035.vdocument.in/reader035/viewer/2022062408/56649f225503460f94c3a825/html5/thumbnails/2.jpg)
Outline
• Introduction
• ProjCloud Technique
• Results and Comparisons
• Discussion and Limitations
• Conclusion
![Page 3: Semantic Wordfication of Document Collections Presenter: Yingyu Wu](https://reader035.vdocument.in/reader035/viewer/2022062408/56649f225503460f94c3a825/html5/thumbnails/3.jpg)
Introduction
• Word Cloud
![Page 4: Semantic Wordfication of Document Collections Presenter: Yingyu Wu](https://reader035.vdocument.in/reader035/viewer/2022062408/56649f225503460f94c3a825/html5/thumbnails/4.jpg)
• Two issues with word clouds:
• (1) Existing methods do not yet provide an intuitive visual representation that links words in the layout to the documents they are meant to represent.
• (2) Building word clouds inside general polygons while preserving the semantic relationships between words.
![Page 5: Semantic Wordfication of Document Collections Presenter: Yingyu Wu](https://reader035.vdocument.in/reader035/viewer/2022062408/56649f225503460f94c3a825/html5/thumbnails/5.jpg)
• Contributions:
• A novel word cloud-based visualization technique named ProjCloud.
• (1) It combines multidimensional projection and word clouds, which makes it possible to visualize the similarity among documents together with their corresponding word clouds, extending the exploratory capabilities of word clouds.
• (2) A new approach for building word clouds inside polygons while still preserving the semantic relationships among keywords.
• (3) A mechanism based on spectral sorting that arranges words according to their semantic relationships and highlights the most important words in the cloud.
![Page 6: Semantic Wordfication of Document Collections Presenter: Yingyu Wu](https://reader035.vdocument.in/reader035/viewer/2022062408/56649f225503460f94c3a825/html5/thumbnails/6.jpg)
ProjCloud Technique
• Overview of the sequence of steps
![Page 7: Semantic Wordfication of Document Collections Presenter: Yingyu Wu](https://reader035.vdocument.in/reader035/viewer/2022062408/56649f225503460f94c3a825/html5/thumbnails/7.jpg)
• Steps:
• (1) The document collection is mapped into the visual space using a multidimensional projection technique (LSP, Least Square Projection).
• (2) Points in the visual space are clustered, and each cluster is represented by a polygon. Clustering can be done automatically or interactively by the user.
• (3) Keywords (the most frequent words) are extracted, and their relevance is computed to guide the semantics-preserving placement of words.
• (4) Scaling: keywords are sized based on their relevance and on the area of the containing polygon.
• (5) An optimization algorithm generates the word cloud.
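Step (3) says only that the keywords are the most frequent words. A minimal Python sketch of that step, assuming each document already carries a cluster label from step (2); the tokenizer, the stopword list, and the function names are hypothetical choices for illustration, not the ProjCloud implementation:

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration; the slides do not specify one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cluster_keywords(documents, labels, n_keywords=5):
    """Return the n most frequent non-stopword terms for each cluster label.

    documents: list of strings; labels: cluster id of each document.
    """
    counts = {}
    for doc, label in zip(documents, labels):
        counter = counts.setdefault(label, Counter())
        counter.update(t for t in tokenize(doc) if t not in STOPWORDS)
    return {label: [word for word, _ in counter.most_common(n_keywords)]
            for label, counter in counts.items()}
```

For example, `cluster_keywords(docs, [0, 0, 1, 1], n_keywords=2)` returns one short keyword list per cluster, which would then feed the relevance computation below.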
![Page 8: Semantic Wordfication of Document Collections Presenter: Yingyu Wu](https://reader035.vdocument.in/reader035/viewer/2022062408/56649f225503460f94c3a825/html5/thumbnails/8.jpg)
![Page 9: Semantic Wordfication of Document Collections Presenter: Yingyu Wu](https://reader035.vdocument.in/reader035/viewer/2022062408/56649f225503460f94c3a825/html5/thumbnails/9.jpg)
Keyword Relevance and Semantic Relation
• Let M be the document × term frequency matrix.
• A covariance matrix C is computed from M.
• Build a graph G in which each node corresponds to a keyword and an edge eij connects two keywords (Wi and Wj) if and only if the covariance Cij is among the k largest.
• Assuming that edge eij has weight Cij, the Fiedler vector assigns a scalar value ai to each keyword Wi that minimizes:

$$\sum_{e_{ij} \in G} C_{ij}\,(a_i - a_j)^2 \quad \text{subject to} \quad \sum_i a_i = 0,\ \ \sum_i a_i^2 = 1$$

• If Cij is large (Wi and Wj are closely related), minimizing this sum forces Wi and Wj to receive similar values ai and aj.
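The graph and Fiedler-vector construction above can be sketched with NumPy. This is a minimal sketch under stated assumptions: "k largest" is applied per keyword (per row of C), edges with non-positive covariance are dropped so the graph Laplacian stays positive semidefinite, the graph is assumed connected, and `keyword_graph` and `fiedler_vector` are hypothetical names.

```python
import numpy as np


def keyword_graph(M, k=2):
    """Build the keyword covariance C and the weighted adjacency matrix W of G.

    M is a documents x terms frequency matrix. Assumptions (not from the
    slides): "k largest" is applied per keyword (row of C), and only edges
    with positive covariance are kept so the Laplacian stays PSD.
    """
    C = np.cov(M, rowvar=False)  # terms are columns -> (n_terms, n_terms)
    n = C.shape[0]
    W = np.zeros_like(C)
    for i in range(n):
        row = C[i].copy()
        row[i] = -np.inf  # ignore the variance C[i, i]
        for j in np.argsort(row)[::-1][:k]:  # k largest covariances of W_i
            if j != i and C[i, j] > 0:
                W[i, j] = W[j, i] = C[i, j]
    return C, W


def fiedler_vector(W):
    """Return the Fiedler vector a of the graph with adjacency matrix W.

    a minimizes the sum over edges of W[i, j] * (a_i - a_j)**2 subject to
    sum(a) = 0 and ||a|| = 1; it is the eigenvector of the Laplacian
    L = D - W for the second-smallest eigenvalue (assumes G is connected).
    """
    L = np.diag(W.sum(axis=1)) - W
    _, eigvecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    return eigvecs[:, 1]
```

Because strongly covarying keywords get close entries in `a`, sorting keywords by `a` tends to place related words next to each other, which is the property the spectral sorting relies on.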
![Page 10: Semantic Wordfication of Document Collections Presenter: Yingyu Wu](https://reader035.vdocument.in/reader035/viewer/2022062408/56649f225503460f94c3a825/html5/thumbnails/10.jpg)
• The most relevant keyword:
• Let Cij be the largest covariance in C, and Wi and Wj the corresponding words.
• The most relevant keyword is Wi if the average covariance between Wi and Wk (k = 1, 2, ..., n) is larger than the average covariance of Wj; otherwise it is Wj.
• Once the most relevant keyword Wr is found, the keywords are sorted in increasing order according to |ai − ar|.
• In ProjCloud, the order given by the Fiedler vector dictates the position of words in the cloud.
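The selection rule for the most relevant keyword Wr and the resulting ordering can be sketched as follows. This assumes the sort key is the distance |a_i − a_r| between each keyword's Fiedler value and that of Wr, and that "average covariance" is the plain mean of a row of C (including the keyword itself, as in k = 1, ..., n); the function names are hypothetical.

```python
import numpy as np


def most_relevant_keyword(C):
    """Index r of the most relevant keyword, following the rule on the slide.

    Take the pair (i, j) with the largest off-diagonal covariance and keep
    whichever of the two has the larger average covariance with all keywords.
    """
    off = C.astype(float)  # astype returns a copy, so C is not modified
    np.fill_diagonal(off, -np.inf)  # exclude the variances C[i, i]
    i, j = np.unravel_index(np.argmax(off), off.shape)
    mean_cov = C.mean(axis=1)  # average covariance of each keyword with W_1..W_n
    return int(i) if mean_cov[i] >= mean_cov[j] else int(j)


def order_keywords(C, a):
    """Sort keyword indices by |a_i - a_r| (assumed key), so W_r comes first."""
    r = most_relevant_keyword(C)
    return [int(x) for x in np.argsort(np.abs(a - a[r]), kind="stable")]
```

Here `a` would be the Fiedler vector of the keyword graph, so the first keyword is Wr and the following ones are those whose Fiedler values lie closest to it.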
![Page 11: Semantic Wordfication of Document Collections Presenter: Yingyu Wu](https://reader035.vdocument.in/reader035/viewer/2022062408/56649f225503460f94c3a825/html5/thumbnails/11.jpg)
• Sizing keywords:
• (1) Each keyword is represented by its bounding box.
• (2) The size of each keyword is set to its relevance value, scaled to fit the interval [fmin, fmax], initially [12, 50].
• (3) If the total area of all keyword bounding boxes is smaller than the area of polygon P, fmax is increased and the values are rescaled. This process is repeated until the sum of the keyword areas exceeds the area of P.
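A minimal sketch of the sizing loop described above. It assumes a linear mapping of relevance onto [fmin, fmax], a hypothetical bounding-box model (each character about 0.6 × font size wide, box height equal to the font size), and a fixed increment for fmax; none of these details are given on the slide, and the function names are hypothetical.

```python
def scale_sizes(relevance, f_min, f_max):
    """Linearly map relevance values onto font sizes in [f_min, f_max]."""
    lo, hi = min(relevance), max(relevance)
    span = hi - lo
    return [f_max if span == 0 else f_min + (r - lo) / span * (f_max - f_min)
            for r in relevance]


def box_area(word, size, aspect=0.6):
    """Hypothetical bounding box: each glyph ~aspect*size wide, box is size tall."""
    return (aspect * size * len(word)) * size


def fit_sizes(words, relevance, polygon_area, f_min=12.0, f_max=50.0,
              step=1.0, max_iter=10_000):
    """Grow f_max and rescale until the total box area exceeds polygon_area."""
    sizes = scale_sizes(relevance, f_min, f_max)
    for _ in range(max_iter):  # guard against a non-terminating loop
        total = sum(box_area(w, s) for w, s in zip(words, sizes))
        if total > polygon_area:
            break
        f_max += step
        sizes = scale_sizes(relevance, f_min, f_max)
    return sizes
```

Only fmax grows, so the least relevant keyword stays at fmin while the most relevant ones are enlarged until the keywords collectively cover the polygon.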
![Page 12: Semantic Wordfication of Document Collections Presenter: Yingyu Wu](https://reader035.vdocument.in/reader035/viewer/2022062408/56649f225503460f94c3a825/html5/thumbnails/12.jpg)
• The optimization problem
![Page 13: Semantic Wordfication of Document Collections Presenter: Yingyu Wu](https://reader035.vdocument.in/reader035/viewer/2022062408/56649f225503460f94c3a825/html5/thumbnails/13.jpg)
Results
![Page 14: Semantic Wordfication of Document Collections Presenter: Yingyu Wu](https://reader035.vdocument.in/reader035/viewer/2022062408/56649f225503460f94c3a825/html5/thumbnails/14.jpg)
![Page 15: Semantic Wordfication of Document Collections Presenter: Yingyu Wu](https://reader035.vdocument.in/reader035/viewer/2022062408/56649f225503460f94c3a825/html5/thumbnails/15.jpg)
Comparisons
![Page 16: Semantic Wordfication of Document Collections Presenter: Yingyu Wu](https://reader035.vdocument.in/reader035/viewer/2022062408/56649f225503460f94c3a825/html5/thumbnails/16.jpg)
![Page 17: Semantic Wordfication of Document Collections Presenter: Yingyu Wu](https://reader035.vdocument.in/reader035/viewer/2022062408/56649f225503460f94c3a825/html5/thumbnails/17.jpg)
Discussion and Limitations
• ProjCloud depends heavily on the clustering process.
• If the clustering performs poorly, the word clouds become hard to fit into their polygons and hard to read.
• Empty space can remain between clusters.
![Page 18: Semantic Wordfication of Document Collections Presenter: Yingyu Wu](https://reader035.vdocument.in/reader035/viewer/2022062408/56649f225503460f94c3a825/html5/thumbnails/18.jpg)
Conclusion
![Page 19: Semantic Wordfication of Document Collections Presenter: Yingyu Wu](https://reader035.vdocument.in/reader035/viewer/2022062408/56649f225503460f94c3a825/html5/thumbnails/19.jpg)
Thank you