Post on 22-Dec-2015
Text Summarization
Jagadish M (07305050)
Annervaz K M (07305063)
Joshi Prasad (07305047)
Ajesh Kumar S (07305065)
Shalini Gupta (07305R02)
Introduction
Summary: Brief but accurate representation of the contents of a document
Goal: Take an information source, extract the most important content from it and present it to the user in a condensed form and in a manner sensitive to the user’s needs.
Compression: The amount of text to present, i.e. the ratio of the length of the summary to the length of the source.
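As a small worked illustration of the compression definition (the symbols $c$, $|S|$ and $|D|$ are notation introduced here, not taken from the slides), with $|S|$ the summary length and $|D|$ the source length:

```latex
c = \frac{|S|}{|D|}, \qquad
\text{e.g. } |D| = 1000 \text{ words},\ |S| = 100 \text{ words}
\;\Rightarrow\; c = 0.1
```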
MSWord AutoSummarize
Presentation Outline
  Motivation
  Different Genres
  Simple Statistical Techniques
  Degree Centrality
  LexRank
  Lexical/Co-reference Chains
  Rhetorical Structure Theory
  WordNet Based Methods
  DUC/TAC
Motivation
Abstracts for Scientific and other articles
News summarization (mostly Multiple document summarization)
Classification of articles and other written data
Web pages for search engines
Web access from PDAs, cell phones
Question answering and data gathering
Genres
Indicative vs. informative: used for quick categorization vs. content processing.
Extract vs. abstract: lists fragments of text vs. re-phrases content coherently.
Generic vs. query-oriented: provides author’s view vs. reflects user’s interest.
Background vs. just-the-news: assumes reader’s prior knowledge is poor vs. up-to-date.
Single-document vs. multi-document source: based on one text vs. fuses together many texts.
Statistical scoring
Scoring techniques
  Word frequencies throughout the text (Luhn58)
  Position in the text (Edmundson69)
  Title method (Edmundson69)
  Cue phrases in sentences (Edmundson69)
Luhn58
Important words occur fairly frequently
Earliest work in field
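The frequency idea can be sketched as follows. This is a simplified, frequency-only illustration and not Luhn's full method (which also looks at how significant words cluster within a sentence); the stop-word list, the function names and the sample sentences are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "is", "are", "in", "and", "to", "on"}

def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def frequency_scores(sentences):
    """Score each sentence by the average document frequency of its content words."""
    freq = Counter(w for s in sentences for w in tokenize(s) if w not in STOP_WORDS)
    scores = []
    for s in sentences:
        words = [w for w in tokenize(s) if w not in STOP_WORDS]
        scores.append(sum(freq[w] for w in words) / len(words) if words else 0.0)
    return scores

def summarize(sentences, k):
    """Return the k highest-scoring sentences in their original order."""
    scores = frequency_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

sentences = [
    "Summarization systems extract important sentences.",
    "Important sentences contain frequent words.",
    "The weather was pleasant.",
]
print(summarize(sentences, 2))
```

The third sentence shares no frequent words with the others, so it is the one dropped when two sentences are requested.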
Statistical Approaches (contd.)
Degree Centrality
LexRank
Continuous LexRank
Degree Centrality
Problem Formulation
  Represent each sentence by a vector
  Denote each sentence as the node of a graph
  Cosine similarity determines the edges between nodes
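A minimal sketch of this formulation, assuming plain word-count vectors (LexRank itself uses an idf-weighted cosine, so the raw counts here are a simplification). The function names are chosen for the example.

```python
import math
import re
from collections import Counter

def bow(sentence):
    """Bag-of-words vector of a sentence as a word -> count mapping."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similarity_matrix(sentences):
    """Pairwise cosine similarities; entry [i][j] is the edge weight between sentences i and j."""
    vectors = [bow(s) for s in sentences]
    return [[cosine(a, b) for b in vectors] for a in vectors]

sim = similarity_matrix(["the cat sat", "the cat ran", "dogs bark"])
for row in sim:
    print([round(x, 2) for x in row])
```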
Degree Centrality
Since we are interested in significant similarities, we can eliminate some low values in this matrix by defining a threshold.
Degree Centrality
Compute the degree of each sentence
Pick the nodes (sentences) with high degrees
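The thresholding and degree steps can be sketched as below. The similarity matrix and the threshold value are hypothetical, and the sketch does not count a sentence's similarity to itself, which is a choice made here rather than something the slides specify.

```python
def degree_centrality(sim, threshold):
    """For each sentence, count the other sentences whose similarity reaches the threshold."""
    n = len(sim)
    return [
        sum(1 for j in range(n) if j != i and sim[i][j] >= threshold)
        for i in range(n)
    ]

def top_sentences(sim, threshold, k):
    """Indices of the k sentences with the highest degree, in document order."""
    degrees = degree_centrality(sim, threshold)
    ranked = sorted(range(len(sim)), key=lambda i: degrees[i], reverse=True)
    return sorted(ranked[:k])

# Hypothetical cosine similarity matrix for four sentences.
sim = [
    [1.0, 0.3, 0.2, 0.05],
    [0.3, 1.0, 0.15, 0.0],
    [0.2, 0.15, 1.0, 0.4],
    [0.05, 0.0, 0.4, 1.0],
]
print(degree_centrality(sim, 0.1))
print(top_sentences(sim, 0.1, 2))
```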
Degree Centrality
Disadvantage in Degree Centrality approach
LexRank
The centrality vector p gives the LexRank score of each sentence (similar to PageRank), defined by:
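For reference, a restatement of the definition in Erkan and Radev (2004), listed in the references; the notation may differ from the slide. Here $\mathrm{adj}[u]$ is the set of sentences adjacent to $u$ in the thresholded similarity graph, $\deg(v)$ is the degree of $v$, and $A$ is the adjacency matrix of that graph:

```latex
p(u) = \sum_{v \in \mathrm{adj}[u]} \frac{p(v)}{\deg(v)}
\qquad\Longleftrightarrow\qquad
\mathbf{p} = \mathbf{B}^{T}\mathbf{p},
\quad B(i,j) = \frac{A(i,j)}{\sum_{k} A(i,k)}
```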
What Should B Satisfy?
Stochastic matrix and Markov chain property
Irreducible
Aperiodic
Perron-Frobenius Theorem
An irreducible and aperiodic Markov chain is guaranteed to converge to a stationary distribution
Reducibility
Aperiodicity
LexRank
B is a stochastic matrix
Is it an irreducible and aperiodic matrix?
Damping (Page et al. 1998)
Matrix Form of p for Damping
Solve for p using Power method
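A sketch of the power method, assuming the damped form given by Erkan and Radev (2004), p = [dU + (1 - d)B]^T p, where U is the matrix with every entry 1/N and d is the damping factor. The adjacency matrix, the value d = 0.15, the tolerance and the function name are assumptions for the example.

```python
def lexrank(adjacency, d=0.15, tol=1e-8, max_iter=1000):
    """Power iteration for p = [d*U + (1-d)*B]^T p on a sentence graph."""
    n = len(adjacency)
    # Row-normalize the adjacency matrix to obtain the stochastic matrix B.
    B = []
    for row in adjacency:
        total = sum(row)
        B.append([x / total for x in row] if total else [1.0 / n] * n)
    p = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # Entry j of [d*U + (1-d)*B]^T p, with U(i, j) = 1/n.
        p_next = [
            d / n * sum(p) + (1 - d) * sum(B[i][j] * p[i] for i in range(n))
            for j in range(n)
        ]
        if max(abs(a - b) for a, b in zip(p, p_next)) < tol:
            return p_next
        p = p_next
    return p

# Hypothetical thresholded similarity graph (self-loops included).
adjacency = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 1],
]
scores = lexrank(adjacency)
print([round(s, 3) for s in scores])
```

Damping mixes in a small uniform jump probability, which makes the chain irreducible and aperiodic, so the iteration converges regardless of the shape of the graph.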
Continuous LexRank
Linguistic/Semantic Methods
Co-reference/Lexical Chains
Rhetorical Analysis
Co-reference/Lexical Chains
Assumption/Observation: important parts of a text are more closely related to each other semantically
Co-reference / Lexical Chains (Object-Action, Part-of relation, Semantically related)
Important sentences are traversed by a larger number of such chains
Co-reference/Lexical Chains
Mr. Kenny is the person that invented the anesthetic machine which uses micro-computers to control the rate at which an anesthetic is pumped into the blood. Such machines are nothing new. But his device uses two micro-computers to achieve much closer monitoring of the pump feeding the anesthetic into the patient
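A sketch of chain-based scoring on this passage. The chains below are written by hand for illustration and are an assumption; a real system would derive them from a thesaurus such as WordNet and a co-reference resolver.

```python
import re

sentences = [
    "Mr. Kenny is the person that invented the anesthetic machine which uses "
    "micro-computers to control the rate at which an anesthetic is pumped into the blood.",
    "Such machines are nothing new.",
    "But his device uses two micro-computers to achieve much closer monitoring "
    "of the pump feeding the anesthetic into the patient.",
]

# Hypothetical, hand-built chains (chain name -> member words).
chains = {
    "machine": {"machine", "machines", "device", "micro-computers", "pump"},
    "person": {"kenny", "person", "his"},
    "anesthetic": {"anesthetic"},
}

def chains_per_sentence(sentences, chains):
    """Count how many chains have at least one member word in each sentence."""
    counts = []
    for s in sentences:
        words = set(re.findall(r"[a-z\-]+", s.lower()))
        counts.append(sum(1 for members in chains.values() if words & members))
    return counts

print(chains_per_sentence(sentences, chains))
```

Under these assumed chains the first and third sentences are each traversed by three chains and the second by only one, so the second would be the first candidate to drop.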
Rhetorical Structure Theory
Mann & Thompson 88
Rhetorical Relation
  Between two non-overlapping text snippets
  Nucleus - Core idea, writer's purpose
  Satellite - Referred in context to nucleus for justifying, evidencing, contradicting etc.
Rhetorical Structure Theory
The nucleus of a rhetorical relation is comprehensible independent of the satellite, but not vice versa
Not all rhetorical relations are nucleus-satellite relations; Contrast is a multinuclear relation
Example: evidence [The truth is that the pressure to smoke in 'junior high' is greater than it will be any other time of one’s life:][ we know that 3,000 teens start smoking each day.]
Rhetorical Structure Theory
Rhetorical Parsing
  Breaks the text into elementary units
  Uses cue phrases (discourse markers) and a notion of semantic similarity in order to hypothesize rhetorical relations
Rhetorical relations can be assembled into rhetorical structure trees (RS-trees) by recursively applying individual relations across the whole text
[Figure: RS-tree of the text below. Node labels as extracted: 2 Elaboration; 2 Elaboration; 8 Example; 2 Background, Justification; 3 Elaboration; 8 Concession; 10 Antithesis; 4, 5 Contrast; 5 Evidence; Cause.]
(1) With its distant orbit (50 percent farther from the sun than Earth) and slim atmospheric blanket,
(2) Mars experiences frigid weather conditions.
(3) Surface temperatures typically average about -60 degrees Celsius (-76 degrees Fahrenheit) at the equator and can dip to -123 degrees C near the poles.
(4) Only the midday sun at tropical latitudes is warm enough to thaw ice on occasion,
(5) but any liquid water formed in this way would evaporate almost instantly
(6) because of the low atmospheric pressure.
(7) Although the atmosphere holds a small amount of water, and water-ice clouds sometimes develop,
(8) most Martian weather involves blowing dust and carbon monoxide.
(9) Each winter, for example, a blizzard of frozen carbon dioxide rages over one pole, and a few meters of this dry-ice snow accumulate as previously frozen carbon dioxide evaporates from the opposite polar cap.
(10) Yet even on the summer pole, where the sun remains in the sky all day long, temperatures never warm enough to melt frozen water.
RST Based Summarization
Multiple RS-trees
A built RS-tree captures relations in the text and can be used for high quality summarization
Picking up the ‘K’ nodes nearest to the root
Disadvantages
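One possible reading of picking the K nodes nearest to the root, in the spirit of Marcu (1998): each elementary unit is promoted upward through nucleus links and is scored by the shallowest tree node that promotes it. The tree encoding, the function names and the toy tree are assumptions for this sketch, not the authors' implementation.

```python
# A leaf is an int (the unit number). An internal node is a tuple
# (relation, [(role, child), ...]) where role is "N" (nucleus) or "S" (satellite).

def promotion_set(node):
    """Units promoted to this node: a leaf promotes itself, an internal node its nuclei."""
    if isinstance(node, int):
        return {node}
    _, children = node
    units = set()
    for role, child in children:
        if role == "N":
            units |= promotion_set(child)
    return units

def unit_depths(node, depth=0, depths=None):
    """Map each unit to the depth of the shallowest node that promotes it."""
    if depths is None:
        depths = {}
    for unit in promotion_set(node):
        depths[unit] = min(depths.get(unit, depth), depth)
    if not isinstance(node, int):
        for _, child in node[1]:
            unit_depths(child, depth + 1, depths)
    return depths

def select_units(tree, k):
    """The k units closest to the root, returned in text order."""
    depths = unit_depths(tree)
    ranked = sorted(depths, key=lambda u: (depths[u], u))
    return sorted(ranked[:k])

# Hypothetical four-unit tree: unit 2 is the nucleus of the whole text.
tree = ("Elaboration", [
    ("N", ("Justification", [("S", 1), ("N", 2)])),
    ("S", ("Contrast", [("N", 3), ("N", 4)])),
])
print(unit_depths(tree))
print(select_units(tree, 2))
```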
WordNet based Approach for Summarization
Preprocessing of text
Constructing sub-graph from WordNet
Synset Ranking
Sentence Selection
Principal Component Analysis
Preprocessing
Break text into sentences
Apply POS tagging
Identify collocations in the text
Remove the stop words
Sequence is important
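A sketch of why the sequence matters, assuming tiny hypothetical stop-word and collocation lists. POS tagging is omitted because it needs a trained tagger, and the regex sentence splitter is naive (it would break on abbreviations such as "Mr.").

```python
import re

STOP_WORDS = {"the", "of", "a", "is", "in", "from", "his"}   # hypothetical list
COLLOCATIONS = {("point", "of", "view")}                      # hypothetical list

def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def merge_collocations(tokens):
    """Join every known collocation into a single underscore-separated token."""
    out, i = [], 0
    while i < len(tokens):
        for colloc in COLLOCATIONS:
            if tuple(tokens[i:i + len(colloc)]) == colloc:
                out.append("_".join(colloc))
                i += len(colloc)
                break
        else:
            out.append(tokens[i])
            i += 1
    return out

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def preprocess(text):
    """Sentence splitting, then collocations, then stop-word removal, in that order."""
    result = []
    for sentence in split_sentences(text):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        result.append(remove_stop_words(merge_collocations(tokens)))
    return result

print(preprocess("The story is told from his point of view. The view is biased."))
# Reversing the last two steps loses the collocation, because "of" is a stop word:
print(merge_collocations(remove_stop_words(["point", "of", "view"])))
```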
Constructing sub-graph from WordNet
Mark all the words and collocations in the WordNet graph which are present in the text
Traverse the generalization edges up to a fixed depth, and mark the synsets you visit
Construct a graph, containing only the marked synsets
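The three steps above can be sketched on a toy hierarchy. The `HYPERNYMS` map is a hypothetical stand-in for WordNet's generalization edges, and each word is treated as a single synset, which ignores word-sense ambiguity.

```python
# Hypothetical miniature hierarchy: word/synset -> its generalizations (hypernyms).
HYPERNYMS = {
    "dog": ["canine"],
    "cat": ["feline"],
    "canine": ["carnivore"],
    "feline": ["carnivore"],
    "carnivore": ["mammal"],
    "mammal": ["animal"],
    "animal": ["organism"],
}

def build_subgraph(words, max_depth):
    """Mark the text's words plus every synset reached within max_depth generalization steps."""
    nodes = set(HYPERNYMS) | {h for hs in HYPERNYMS.values() for h in hs}
    frontier = {w for w in words if w in nodes}   # words that exist in the graph
    marked = set(frontier)
    for _ in range(max_depth):
        frontier = {h for w in frontier for h in HYPERNYMS.get(w, [])} - marked
        marked |= frontier
    # Keep only the edges whose two endpoints are both marked.
    edges = {
        (child, parent)
        for child in marked
        for parent in HYPERNYMS.get(child, [])
        if parent in marked
    }
    return marked, edges

marked, edges = build_subgraph(["dog", "cat", "ball"], max_depth=2)
print(sorted(marked))
print(sorted(edges))
```

With a depth limit of 2 the traversal stops at "carnivore", so the very general synsets above it are left out of the sub-graph.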
Synset Ranking
Rank synsets based on their relevance to text
Construct a rank vector R with one entry per node of the graph, each initialized to $1/\sqrt{n}$, where n is the number of nodes in the graph
Create an authority matrix A, with $A(i,j) = 1/\mathrm{num\_of\_predecessors}(j)$ if j is a child of i, and 0 otherwise
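A sketch of the initialization only, on a hypothetical toy graph. It assumes that the predecessors of a node are its direct parents in the sub-graph; the names `CHILDREN`, `initial_rank` and `authority_matrix` are illustrative.

```python
import math

# Hypothetical toy graph: CHILDREN[i] lists the children of synset i.
CHILDREN = {
    "animal": ["dog", "cat"],
    "pet": ["dog", "cat"],
    "dog": [],
    "cat": [],
}

def initial_rank(nodes):
    """Rank vector with every entry equal to 1/sqrt(n), so that it has unit length."""
    n = len(nodes)
    return [1.0 / math.sqrt(n)] * n

def authority_matrix(nodes, children):
    """A[i][j] = 1 / (number of parents of j) if j is a child of i, else 0."""
    index = {name: k for k, name in enumerate(nodes)}
    predecessors = {name: 0 for name in nodes}
    for parent in nodes:
        for child in children[parent]:
            predecessors[child] += 1
    A = [[0.0] * len(nodes) for _ in nodes]
    for parent in nodes:
        for child in children[parent]:
            A[index[parent]][index[child]] = 1.0 / predecessors[child]
    return A

nodes = ["animal", "pet", "dog", "cat"]
R = initial_rank(nodes)
A = authority_matrix(nodes, CHILDREN)
print(R)
for row in A:
    print(row)
```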
Synset Ranking
Update the R vector iteratively as,
Higher value implies better rank and higher relevance
Sentence Selection
Construct a matrix, M with m rows and n columns
m is number of sentences and n is number of nodes
For each sentence S_i
  Traverse graph G, starting with words present in S_i and following generalization edges
  Find set of reachable synsets, SY_i
  For each sy_ij ∈ SY_i
    set M[S_i][sy_ij] to rank of sy_ij calculated in previous step
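The loop above can be sketched as follows. The hierarchy and the rank values are hypothetical placeholders standing in for the sub-graph and the synset ranks from the previous steps.

```python
# Hypothetical generalization edges and synset ranks (placeholders for earlier steps).
HYPERNYMS = {
    "dog": ["canine"],
    "cat": ["feline"],
    "canine": ["carnivore"],
    "feline": ["carnivore"],
}
RANK = {"dog": 0.1, "cat": 0.1, "canine": 0.3, "feline": 0.3, "carnivore": 0.6}

def reachable(words):
    """Synsets reachable from the sentence's words by following generalization edges."""
    seen = set()
    stack = [w for w in words if w in RANK]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(HYPERNYMS.get(node, []))
    return seen

def build_matrix(sentences, nodes):
    """M has one row per sentence and one column per synset; entries are synset ranks."""
    column = {name: j for j, name in enumerate(nodes)}
    M = [[0.0] * len(nodes) for _ in sentences]
    for i, words in enumerate(sentences):
        for synset in reachable(words):
            M[i][column[synset]] = RANK[synset]
    return M

nodes = sorted(RANK)  # column order: canine, carnivore, cat, dog, feline
sentences = [["the", "dog", "barks"], ["a", "cat", "sleeps"]]
M = build_matrix(sentences, nodes)
for row in M:
    print(row)
```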
Principal Component Analysis
Apply PCA on matrix M to get a set of principal components (eigenvectors)
The eigenvalue of each eigenvector is a measure of the relevance of that eigenvector to the meaning of the text
Sort the eigenvectors according to their eigenvalues
For each eigenvector, find its projection on each sentence
Principal Component Analysis
Select the top $n_{\mathrm{numselect}}$ sentences for each eigenvector
$n_{\mathrm{numselect}}$ is proportional to the eigenvalue of the eigenvector
$n_{\mathrm{numselect}} = \lambda_i / \sum_j \lambda_j$, where $\lambda_i$ is the eigenvalue corresponding to eigenvector $i$
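A sketch of the selection step, assuming the eigenvalues and eigenvectors are already available. Several details are assumptions made here: the share is scaled by a summary budget and rounded to get a sentence count, projections are compared by absolute value (an eigenvector's sign is arbitrary), and a sentence already chosen is not picked again. The numbers are illustrative and not computed from a real document.

```python
def allocate(eigenvalues, budget):
    """Sentences per eigenvector: its share of the total eigenvalue mass times the budget."""
    total = sum(eigenvalues)
    return [round(budget * lam / total) for lam in eigenvalues]

def select_sentences(M, eigenvalues, eigenvectors, budget):
    """Pick sentences with the largest projections, visiting eigenvectors by eigenvalue."""
    order = sorted(range(len(eigenvalues)), key=lambda i: eigenvalues[i], reverse=True)
    quotas = allocate(eigenvalues, budget)
    chosen = []
    for i in order:
        # Projection of eigenvector i on each sentence row of M.
        proj = [abs(sum(a * b for a, b in zip(row, eigenvectors[i]))) for row in M]
        ranked = sorted(range(len(M)), key=lambda s: proj[s], reverse=True)
        picked = 0
        for s in ranked:
            if picked == quotas[i]:
                break
            if s not in chosen:
                chosen.append(s)
                picked += 1
    return sorted(chosen)

# Hypothetical sentence-by-synset matrix with two synset columns.
M = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.8]]
eigenvalues = [4.0, 2.0]                   # illustrative values
eigenvectors = [[1.0, 0.0], [0.0, 1.0]]    # illustrative values
print(allocate(eigenvalues, 3))
print(select_sentences(M, eigenvalues, eigenvectors, 3))
```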
Document Understanding Conference (DUC)
Text Analysis Conference (TAC)
Interest and activity aimed at building powerful multi-purpose information systems
Evaluation results of various summarization techniques
www-nlpir.nist.gov/projects/duc/data.html
Human Summary of Our Presentation :)
What is Text Summarization?
Why Text Summarization?
Methods of Summarization
  LexRank
  Lexical Chains
  Rhetorical Structure Theory
  WordNet Based
Challenges ahead..
Ensuring text coherency
  Sentences may have dangling anaphors
Summarizing non-textual data
Handling multiple sources effectively
High reduction rates are needed
Achieving human quality summarization!!
References
Erkan, G. and D. R. Radev. 2004. LexRank: Graph-based Lexical Centrality as Salience in Text Summarization. Journal of Artificial Intelligence Research 22, 457–479.
Barzilay, R. and M. Elhadad. 1997. Using Lexical Chains for Text Summarization. In Proceedings of the Workshop on Intelligent Scalable Text Summarization at the ACL/EACL Conference, 10–17. Madrid, Spain.
Mann, W.C. and S.A. Thompson. 1988. Rhetorical Structure Theory: Toward a Functional Theory of Text Organization. Text 8(3), 243–281. Also available as USC/Information Sciences Institute Research Report RR-87-190.
Baldwin, B. and T. Morton. 1998. Coreference-Based Summarization. In T. Firmin Hand and B. Sundheim (eds). TIPSTER-SUMMAC Summarization Evaluation. Proceedings of the TIPSTER Text Phase III Workshop. Washington.
Marcu, D. 1998. Improving Summarization Through Rhetorical Parsing Tuning. Proceedings of the Workshop on Very Large Corpora. Montreal, Canada.
Ramakrishnan and Bhattacharya. 2003. Text Representation with WordNet Synsets. Eighth International Conference on Applications of Natural Language to Information Systems (NLDB2003).
Bellare, Anish S., Atish S., Loiwal, Bhattacharya, Mehta, Ramakrishnan. 2004. Generic Text Summarization using WordNet.
Inderjeet Mani and Mark T. Maybury (eds). Advances in Automatic Text Summarization. MIT Press, 1999. ISBN 0-262-13359-8.
www.wikipedia.com
Thank You