![Page 1: Comparing Twitter Summarization Algorithms for Multiple Post Summaries David Inouye and Jugal K. Kalita SocialCom 2011 2013 May 10 Hyewon Lim](https://reader035.vdocument.in/reader035/viewer/2022062518/56649ccf5503460f9499bcdc/html5/thumbnails/1.jpg)
Comparing Twitter Summarization Algorithms for Multiple Post Summaries
David Inouye and Jugal K. Kalita
SocialCom 2011
2013 May 10
Hyewon Lim
Outline
– Introduction
– Related Work
– Problem Definition
– Selected Approaches for Twitter Summaries
– Experimental Setup
– Results and Analysis
– Conclusion
Introduction
Motivation of the summarizer
Introduction
Prior work
– “A torch extinguished: Ted Kennedy dead at 77.”
– “A legend gone: Ted Kennedy died of brain cancer.”
– “Ted Kennedy was a leader.”
– “Ted Kennedy died today.”
Best final summary: Ted Kennedy died
(B. Sharifi et al., “Automatic Summarization of Twitter Topics”)
Introduction
We create summaries that contain multiple posts
– A single topic can span several sub-topics or themes
Related Work
Text summarization
– Reduce the amount of content to read
– Reduce the number of features required for classifying or clustering
Multi-document summarization
– Potential redundancy
Algorithms
– SumBasic, Centroid, LexRank, TextRank, MEAD, …
Related Work
SumBasic
Centroid
– “A torch extinguished: Ted Kennedy dead at 77.”
– “A legend gone: Ted Kennedy died of brain cancer.”
– “Ted Kennedy was a leader.”
– “Ted Kennedy died today.”
Ted Kennedy died
(D. R. Radev et al., “Centroid-based summarization of multiple documents”)
Related Work
LexRank
– Adjacency matrix for computing the relative importance of sentences
TextRank
– Find the most highly ranked sentences using PageRank
Compatibility of systems of linear constraints over the set of natural numbers. Criteria of compatibility of a system of linear Diophantine equations, strict inequations, and nonstrict inequations are considered. Upper bounds for components of a minimal set of solutions and algorithms of construction of minimal generating sets of solutions for all types of systems are given. These criteria and the corresponding algorithms for constructing a minimal supporting set of solutions can be used in solving all the considered types systems and systems of mixed types.
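The graph-based idea behind LexRank and TextRank can be illustrated with a minimal sketch: sentences are nodes, edge weights are pairwise similarities, and a PageRank-style iteration ranks the nodes. The function names, the cosine similarity over raw word counts, the damping factor of 0.85, and the fixed iteration count are assumptions made for this illustration; they are not taken from the slides or from either original algorithm's exact formulation.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with a weighted PageRank over a similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    bags = [Counter(s.lower().split()) for s in sentences]
    # Weighted adjacency matrix: similarity between every pair of sentences.
    sim = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to the edge weight.
            incoming = sum(scores[j] * sim[j][i] / out_weight[j]
                           for j in range(n) if out_weight[j] > 0)
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores
```

A sentence that shares no words with any other sentence receives no incoming score and keeps only the base value `(1 - damping) / n`, so it ranks last.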
Problem Definition
Given
– A topic keyword or phrase T
– Length k for the summary
Output
– A set of representative posts S with a cardinality of k such that
  1) ∀s ∈ S, T is in the text of s
  2) ∀s_i, ∀s_j ∈ S, s_i ≁ s_j (no two selected posts are similar)
Selected Approaches for Twitter Summaries
TF-IDF
– (Term frequency) * (Inverse document frequency)
A microblog post is not a traditional document
– Define a single document that encompasses all the posts => IDF↓
– Define each post as a document => TF↓
[Illustration: occurrences of a term A in one long combined document versus in individual short posts]
Selected Approaches for Twitter Summaries
Hybrid TF-IDF
– Define a document as a single post
– When computing the term frequencies, assume the document is the entire collection of posts
Select the top k most weighted posts
– Cosine similarity for avoiding redundancy
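The hybrid scheme above can be sketched as follows: term frequency is counted over the whole collection, document frequency is counted per post, and posts are picked greedily while skipping near-duplicates. This is a minimal sketch, not the authors' implementation. The names `hybrid_tfidf_weights` and `summarize`, the base-2 logarithm, the length normalization with a minimum length `min_len`, and the similarity cut-off `max_sim` are all assumptions introduced for illustration.

```python
import math
from collections import Counter


def tokenize(post):
    return post.lower().split()


def hybrid_tfidf_weights(posts):
    """TF over the entire collection, IDF with each post as a document."""
    token_lists = [tokenize(p) for p in posts]
    all_tokens = [t for tokens in token_lists for t in tokens]
    tf = Counter(all_tokens)                                    # collection-wide counts
    df = Counter(t for tokens in token_lists for t in set(tokens))  # posts containing t
    total, n_posts = len(all_tokens), len(posts)
    return {w: (tf[w] / total) * math.log2(n_posts / df[w]) for w in tf}


def cosine(a, b):
    """Cosine similarity between two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def summarize(posts, k, min_len=5, max_sim=0.5):
    """Pick up to k high-weight posts, skipping posts too similar to earlier picks."""
    weights = hybrid_tfidf_weights(posts)
    scored = []
    for post in posts:
        tokens = tokenize(post)
        # Assumed normalization: divide by length, with a floor so very short posts
        # are not favoured.
        score = sum(weights[t] for t in tokens) / max(min_len, len(tokens))
        scored.append((score, post, tokens))
    scored.sort(key=lambda item: item[0], reverse=True)
    chosen = []
    for _, post, tokens in scored:
        if all(cosine(tokens, other) <= max_sim for _, other in chosen):
            chosen.append((post, tokens))
        if len(chosen) == k:
            break
    return [post for post, _ in chosen]
```

A word that occurs in every post gets weight zero, since its inverse document frequency is log2(1) = 0, which is the intended effect for the topic keyword itself.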
Selected Approaches for Twitter Summaries
Cluster summarizer
1. Cluster the tweets into k clusters based on a similarity measure
2. Summarize each cluster by picking the most weighted post
Bisecting k-means++ algorithm
– Bisecting k-means
– k-means++: chooses the next centroid c_i, selecting c_i = v' ∈ V with probability D(v')² / Σ_{v ∈ V} D(v)², where D(v) is the distance from v to the closest centroid already chosen
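The k-means++ seeding step can be sketched in a few lines: the first centroid is drawn uniformly, and each further centroid is drawn with probability proportional to its squared distance from the nearest centroid chosen so far (the standard k-means++ weighting). This sketch covers only the seeding, not the bisecting procedure. The function name `kmeanspp_seeds`, the use of Euclidean distance on numeric tuples, and the optional `rng` parameter are assumptions for illustration; tweets would first need to be turned into vectors.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points, k, rng=None):
    """Choose k initial centroids using distance-squared weighting."""
    rng = rng or random.Random()
    centroids = [rng.choice(points)]  # first centroid: uniform at random
    while len(centroids) < k:
        # D(v)^2: squared distance from each point to its closest chosen centroid.
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a centroid; fall back to a uniform draw.
            centroids.append(rng.choice(points))
            continue
        # Points far from all current centroids are more likely to be picked.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids
```

Because an already chosen point has weight zero, the weighted draw cannot select it again while other distinct points remain.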
Selected Approaches for Twitter Summaries
k-means++
[Figures: k-means; outlier problem; k-means++]
(http://blog.sragent.pe.kr/)
Selected Approaches for Twitter Summaries
Algorithms to compare results
– Baseline
  Random summarizer
  Most recent summarizer
– SumBasic
  Depends only on the frequency of words
– MEAD
  Comparison between the more structured document domain and Twitter
– Graph-based method
  LexRank
  TextRank
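Since SumBasic relies only on word frequency, it is short enough to sketch. The version below is a simplified variant written for illustration: it scores each sentence by the average probability of its words, picks the best one, then squares the probabilities of the words just used so that later picks favour new content. It omits the original algorithm's step of restricting candidates to sentences that contain the currently most probable word. The function name `sumbasic` and the whitespace tokenization are assumptions.

```python
from collections import Counter


def sumbasic(sentences, k):
    """Greedy frequency-based selection of up to k sentences (simplified SumBasic)."""
    tokenized = [s.lower().split() for s in sentences]
    counts = Counter(w for tokens in tokenized for w in tokens)
    total = sum(counts.values())
    prob = {w: c / total for w, c in counts.items()}

    chosen = []
    remaining = [i for i, tokens in enumerate(tokenized) if tokens]
    while remaining and len(chosen) < k:
        # Sentence score: average probability of its words.
        best = max(remaining,
                   key=lambda i: sum(prob[w] for w in tokenized[i]) / len(tokenized[i]))
        chosen.append(best)
        remaining.remove(best)
        # Down-weight words already covered to reduce redundancy.
        for w in set(tokenized[best]):
            prob[w] = prob[w] ** 2
    return [sentences[i] for i in chosen]
```

With the input `["ted kennedy died", "ted kennedy died", "a legend gone"]` and `k = 2`, the second pick is "a legend gone" rather than the duplicate, because the probabilities of "ted", "kennedy" and "died" were squared after the first pick.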
Experimental Setup
Data collection
– 5 consecutive days
– Top ten currently trending topics every day
– Approximately 1500 tweets for each topic
ROUGE
– Automated summary vs. manual summaries
Choice of k
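To make the ROUGE comparison concrete, the sketch below computes unigram-overlap precision, recall and F-measure between an automated summary and one manual summary. The slides do not say which ROUGE variant or tooling was used, so the choice of ROUGE-1, the function name `rouge_1`, and the whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Unigram-overlap precision, recall and F-measure (ROUGE-1 style)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: a word counts at most as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

For the candidate "ted kennedy died" against the reference "ted kennedy died today", all three candidate words match, giving precision 1.0 and recall 0.75.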
Results and Analysis
Average F-measure, precision and recall
Results and Analysis
Average score for human evaluation
Results and Analysis
Paired two-sided t-test
[Chart: recall and precision for LexRank, TextRank, Hybrid TF-IDF and SumBasic; vertical axis from 0 to 0.35]
Conclusion
The best techniques for summarizing Twitter topics
– Simple word frequency
– Redundancy reduction
Simple algorithms seem to perform well
– Not clear that added complexity will improve the quality of the summaries
Extension
– Extrinsic evaluations (e.g., user survey)
– Dynamically discovering a good value for k for k-means
– Detect named entities and events in the documents