TRANSCRIPT
Neural Summarization by Extracting Sentences
and Words
Jianpeng Cheng and Mirella Lapata
ACL 2016
Presenter: Tomonori Kodaira
1
Intro
• Task: Single document summarization (extracting sentences or words)
• Model: a neural network-based hierarchical document reader (encoder) and an attention-based content extractor.
• Data: DailyMail news
2
Problem Formulation
• Sentence Extraction: create a summary from document D by selecting a subset of j sentences (predicting a label y_L ∈ {0, 1} for each sentence).
• Word Extraction: a language generation task with the output vocabulary restricted to the original document.
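Written out, the two formulations above can be sketched as training objectives (notation follows the slide: y_L for sentence labels, D for the document; the exact factorization is an assumption consistent with this setup):

```latex
% Sentence extraction: maximize the likelihood of the binary labels
\max_{\theta} \sum_{i=1}^{m} \log p\!\left(y_L^{(i)} \mid D, \theta\right),
\qquad y_L^{(i)} \in \{0, 1\}

% Word extraction: generate summary words w_1 \dots w_k with the
% output vocabulary restricted to words of the document D
\max_{\theta} \sum_{t=1}^{k} \log p\!\left(w_t \mid D, w_{<t}, \theta\right),
\qquad w_t \in D
```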
3
Data
• They create two large-scale datasets.
4
Data (sentence extraction)
• They retrieved hundreds of thousands of news articles and their corresponding highlights from DailyMail.
• They designed a rule-based system that determines whether a document sentence matches a highlight (Woodsend and Lapata, 2010).
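The matching idea can be sketched as follows. This is a toy overlap heuristic, not the actual rule-based system of Woodsend and Lapata (2010); the threshold and the word-overlap criterion are illustrative assumptions.

```python
def matches_highlight(sentence, highlight, threshold=0.5):
    """Hypothetical heuristic: a document sentence 'matches' a
    highlight if enough of the highlight's words also appear in
    the sentence. (The paper uses a rule-based system instead.)"""
    sent_words = set(sentence.lower().split())
    hl_words = set(highlight.lower().split())
    if not hl_words:
        return False
    overlap = len(sent_words & hl_words) / len(hl_words)
    return overlap >= threshold

def label_sentences(doc_sentences, highlights):
    """Assign a binary extraction label to every document sentence."""
    return [int(any(matches_highlight(s, h) for h in highlights))
            for s in doc_sentences]
```

A sentence receives label 1 when any highlight matches it, which yields the y_L ∈ {0, 1} supervision for sentence extraction.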
5
Data (word extraction dataset)
• In cases where all highlight words come from the original document, the pair is added to the dataset.
• For OOV words, they check if a neighbor, represented by pre-trained embeddings, is in the original document.
• If they cannot find any substitutes, they discard the pair.
• The resulting word extraction dataset contains 170K articles.
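The OOV substitution step could look like the sketch below: for each OOV highlight word, search the document for the word whose pre-trained embedding is most similar. The cosine-similarity criterion and the helper names are assumptions; the slide only says they check embedding neighbors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def nearest_in_document(oov_vec, doc_words, embeddings):
    """For an OOV highlight word, return the document word whose
    pre-trained embedding is closest; None means no substitute was
    found, in which case the article/highlight pair is discarded."""
    best, best_sim = None, -1.0
    for w in doc_words:
        if w in embeddings:
            sim = cosine(oov_vec, embeddings[w])
            if sim > best_sim:
                best, best_sim = w, sim
    return best
```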
6
Neural Summarization Model
• Key components:
• neural network-based hierarchical document reader
• attention-based hierarchical content extractor.
7
8
Document Reader (Convolutional Sentence Encoder)
• a kernel K ∈ R^{c×d} of width c is applied to the sentence's word embedding matrix W ∈ R^{n×d}
• sum these sentence vectors (one per kernel width)
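A minimal numpy sketch of this encoder, assuming (as is standard for such convolutional encoders, though the authors' exact code is not shown here) f filters per kernel width, a tanh nonlinearity, and max-over-time pooling:

```python
import numpy as np

def conv_sentence_encoder(W, kernel_banks):
    """Sketch of a convolutional sentence encoder.
    W: word-embedding matrix of one sentence, shape (n, d).
    kernel_banks: one array per kernel width c, shape (f, c, d),
    i.e. f filters K in R^{c x d}. Each bank yields an f-dim
    vector via max-over-time pooling; vectors from the different
    widths are summed to give the sentence encoding."""
    n, d = W.shape
    vectors = []
    for bank in kernel_banks:
        f, c, _ = bank.shape
        # activation of filter j at each window position i
        feats = np.array([[np.tanh(np.sum(W[i:i + c] * bank[j]))
                           for i in range(n - c + 1)]
                          for j in range(f)])   # shape (f, n-c+1)
        vectors.append(feats.max(axis=1))       # max over time
    return np.sum(vectors, axis=0)              # sum across widths
```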
Document Reader (Recurrent Document Encoder)
• a Long Short-Term Memory (LSTM) activation unit for ameliorating the vanishing gradient problem when training on long sequences (Hochreiter and Schmidhuber, 1997)
9
Sentence Extractor
• Their sentence extractor applies attention to directly extract salient sentences after reading them.
• At the beginning of training, they set p_{t−1} to the true label of the previous sentence; as training goes on, they gradually shift its value to the predicted label.
10
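This curriculum can be sketched as a scheduled-sampling-style rule. The linear schedule below is an assumption; the slide only says the signal shifts gradually from the true label to the predicted one.

```python
import random

def extraction_signal(true_label, predicted_prob, epoch, total_epochs):
    """Choose the previous-sentence signal p_{t-1} during training:
    early on, feed the true label; as training proceeds, feed the
    model's own predicted label with increasing probability.
    (The linear decay schedule is a hypothetical choice.)"""
    use_truth_prob = max(0.0, 1.0 - epoch / total_epochs)
    if random.random() < use_truth_prob:
        return float(true_label)            # teacher signal
    return float(predicted_prob >= 0.5)     # model's own prediction
```

At epoch 0 the true label is always used; by the final epoch only the model's prediction is fed back, matching test-time conditions.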
Word Extractor
• a sequential labeling model
• use n-gram features collected from the document to rerank candidate summaries obtained via beam decoding.
• incorporate the features in a log-linear reranker whose feature weights are optimized with minimum error rate training (Och, 2003)
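A toy version of the reranking step is sketched below. The feature set (n-gram overlap with the document plus decoder log-probability) and the weights are illustrative; the paper's features and MERT-optimized weights are not reproduced here.

```python
def ngrams(tokens, n):
    """Set of n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def rerank(candidates, doc_tokens, weights):
    """Toy log-linear reranker over beam-decoded candidates.
    Each candidate is (tokens, decoder_logprob); its score is a
    weighted sum of the log-probability and n-gram overlap counts
    with the document. Returns the highest-scoring candidate."""
    def score(cand):
        tokens, logprob = cand
        s = weights.get("logprob", 1.0) * logprob
        for n in (1, 2, 3):
            overlap = len(ngrams(tokens, n) & ngrams(doc_tokens, n))
            s += weights.get(f"{n}gram", 0.0) * overlap
        return s
    return max(candidates, key=score)
```

In the paper the feature weights are tuned with minimum error rate training (Och, 2003) rather than set by hand.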
11
Experimental Setup
• Datasets:
• two datasets created from DailyMail news: 90% for training, 5% for validation and 5% for testing
• DUC-2002 single document summarization task.
12
• Parameters:
• Adam (learning rate 0.01)
• The two momentum parameters: 0.99 and 0.999.
• batch size of 20 documents
• The sizes of the word, sentence, and document embeddings: 150, 300, and 750 (word embeddings are pre-trained)
• Kernel sizes {1, 2, 3, 4, 5, 6, 7}
• dropout 0.5
• The depth of each LSTM module: 1
13
Experimental Setup
• LEAD (leading three sents.)
• LREG (logistic regression)
• ILP
• NN-ABS (Rush et al. 2015)
• TGRAPH (Parveen et al., 2015)
• URANK (Wan, 2010)
• NN-SE (Sentence extractor)
• NN-WE (Word extractor)
14
Results
15
Results
• They evaluate the generated summaries by eliciting human judgments for 20 randomly sampled DUC 2002 test documents.
• Subjects were asked to rank the summaries from best to worst (with ties allowed).
• They collect 5 responses per document.
16
Conclusion
• They developed two classes of models based on sentence and word extraction.
• Future Work:
• combining their model with a tree-based algorithm (Cohn and Lapata, 2009)
• or a phrase-based algorithm (Lebret et al., 2015)
17