TRANSCRIPT
Neural Summarization by Extracting Sentences
and Words
Jianpeng Cheng and Mirella Lapata
ACL 2016
Presenter: Tomonori Kodaira
1
Intro
• Task: Single document summarization (extracting sentences or words)
• Model: a neural network-based hierarchical document reader (encoder) and an attention-based content extractor.
• Data: DailyMail news
2
Problem Formulation
• Sentence Extraction: create a summary from document D by selecting a subset of j sentences (predicting a label y_L ∈ {0, 1} for each sentence).
• Word Extraction: a language generation task with the output vocabulary restricted to the original document.
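Written out, the two formulations above can be sketched as training objectives (notation follows the slide: y_L for sentence labels, D for the document; the exact factorization is an assumption consistent with this setup):

```latex
% Sentence extraction: maximize the likelihood of the binary labels
\max_{\theta} \sum_{i=1}^{m} \log p\!\left(y_L^{(i)} \mid D, \theta\right),
\qquad y_L^{(i)} \in \{0, 1\}

% Word extraction: generate summary words w_1 \dots w_k with the
% output vocabulary restricted to words of the document D
\max_{\theta} \sum_{t=1}^{k} \log p\!\left(w_t \mid D, w_{<t}, \theta\right),
\qquad w_t \in D
```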
3
Data
• They create two large-scale datasets.
4
Data (sentence extraction)
• They retrieved hundreds of thousands of news articles and their corresponding highlights from DailyMail.
• They designed a rule-based system that determines whether a document sentence matches a highlight (Woodsend and Lapata, 2010).
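The matching idea can be sketched as follows. This is a toy overlap heuristic, not the actual rule-based system of Woodsend and Lapata (2010); the threshold and the word-overlap criterion are illustrative assumptions.

```python
def matches_highlight(sentence, highlight, threshold=0.5):
    """Hypothetical heuristic: a document sentence 'matches' a
    highlight if enough of the highlight's words also appear in
    the sentence. (The paper uses a rule-based system instead.)"""
    sent_words = set(sentence.lower().split())
    hl_words = set(highlight.lower().split())
    if not hl_words:
        return False
    overlap = len(sent_words & hl_words) / len(hl_words)
    return overlap >= threshold

def label_sentences(doc_sentences, highlights):
    """Assign a binary extraction label to every document sentence."""
    return [int(any(matches_highlight(s, h) for h in highlights))
            for s in doc_sentences]
```

A sentence receives label 1 when any highlight matches it, which yields the y_L ∈ {0, 1} supervision for sentence extraction.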
5
Data (word extraction dataset)
• In cases where all highlight words come from the original document, the pair is added to the dataset.
• For OOV words, they check if a neighbor, represented by pre-trained embeddings, is in the original document.
• If they cannot find any substitutes, they discard the pair.
• The resulting word extraction dataset contains 170K articles.
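The OOV substitution step could look like the sketch below: for each OOV highlight word, search the document for the word whose pre-trained embedding is most similar. The cosine-similarity criterion and the helper names are assumptions; the slide only says they check embedding neighbors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def nearest_in_document(oov_vec, doc_words, embeddings):
    """For an OOV highlight word, return the document word whose
    pre-trained embedding is closest; None means no substitute was
    found, in which case the article/highlight pair is discarded."""
    best, best_sim = None, -1.0
    for w in doc_words:
        if w in embeddings:
            sim = cosine(oov_vec, embeddings[w])
            if sim > best_sim:
                best, best_sim = w, sim
    return best
```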
6
Neural Summarization Model
• Key components:
• neural network-based hierarchical document reader
• attention-based hierarchical content extractor.
7
8
Document Reader (Convolutional Sentence Encoder)
• a kernel K ∈ R^{c×d} of width c is applied to the sentence's word embedding matrix W ∈ R^{n×d}
• sum these sentence vectors (one per kernel width)
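A minimal numpy sketch of this encoder, assuming (as is standard for such convolutional encoders, though the authors' exact code is not shown here) f filters per kernel width, a tanh nonlinearity, and max-over-time pooling:

```python
import numpy as np

def conv_sentence_encoder(W, kernel_banks):
    """Sketch of a convolutional sentence encoder.
    W: word-embedding matrix of one sentence, shape (n, d).
    kernel_banks: one array per kernel width c, shape (f, c, d),
    i.e. f filters K in R^{c x d}. Each bank yields an f-dim
    vector via max-over-time pooling; vectors from the different
    widths are summed to give the sentence encoding."""
    n, d = W.shape
    vectors = []
    for bank in kernel_banks:
        f, c, _ = bank.shape
        # activation of filter j at each window position i
        feats = np.array([[np.tanh(np.sum(W[i:i + c] * bank[j]))
                           for i in range(n - c + 1)]
                          for j in range(f)])   # shape (f, n-c+1)
        vectors.append(feats.max(axis=1))       # max over time
    return np.sum(vectors, axis=0)              # sum across widths
```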
Document Reader (Recurrent Document Encoder)
• a Long Short-Term Memory (LSTM) activation unit for ameliorating the vanishing gradient problem when training on long sequences (Hochreiter and Schmidhuber, 1997)
9
Sentence Extractor
• Their sentence extractor applies attention to directly extract salient sentences after reading them.
• At the beginning of training, they set p_{t−1} to the true label of the previous sentence; as training goes on, they gradually shift its value to the predicted label.
10
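This curriculum can be sketched as a scheduled-sampling-style rule. The linear schedule below is an assumption; the slide only says the signal shifts gradually from the true label to the predicted one.

```python
import random

def extraction_signal(true_label, predicted_prob, epoch, total_epochs):
    """Choose the previous-sentence signal p_{t-1} during training:
    early on, feed the true label; as training proceeds, feed the
    model's own predicted label with increasing probability.
    (The linear decay schedule is a hypothetical choice.)"""
    use_truth_prob = max(0.0, 1.0 - epoch / total_epochs)
    if random.random() < use_truth_prob:
        return float(true_label)            # teacher signal
    return float(predicted_prob >= 0.5)     # model's own prediction
```

At epoch 0 the true label is always used; by the final epoch only the model's prediction is fed back, matching test-time conditions.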
Word Extractor
• a sequential labeling model
• use n-gram features collected from the document to rerank candidate summaries obtained via beam decoding.
• incorporate the features in a log-linear reranker whose feature weights are optimized with minimum error rate training (Och, 2003)
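A toy version of the reranking step is sketched below. The feature set (n-gram overlap with the document plus decoder log-probability) and the weights are illustrative; the paper's features and MERT-optimized weights are not reproduced here.

```python
def ngrams(tokens, n):
    """Set of n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def rerank(candidates, doc_tokens, weights):
    """Toy log-linear reranker over beam-decoded candidates.
    Each candidate is (tokens, decoder_logprob); its score is a
    weighted sum of the log-probability and n-gram overlap counts
    with the document. Returns the highest-scoring candidate."""
    def score(cand):
        tokens, logprob = cand
        s = weights.get("logprob", 1.0) * logprob
        for n in (1, 2, 3):
            overlap = len(ngrams(tokens, n) & ngrams(doc_tokens, n))
            s += weights.get(f"{n}gram", 0.0) * overlap
        return s
    return max(candidates, key=score)
```

In the paper the feature weights are tuned with minimum error rate training (Och, 2003) rather than set by hand.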
11
Experimental Setup
• Datasets:
• two datasets created from DailyMail news: 90% for training, 5% for validation and 5% for testing
• DUC-2002 single document summarization task.
12
• Parameters:
• Adam (learning rate 0.01)
• The two momentum parameters: 0.99 and 0.999.
• batch size of 20 documents
• The sizes of the word, sentence, and document embeddings: 150, 300, and 750 (word embeddings are pre-trained)
• Kernel sizes {1, 2, 3, 4, 5, 6, 7}
• dropout 0.5
• The depth of each LSTM module: 1
13
Experimental Setup
• LEAD (leading three sents.)
• LREG (logistic regression)
• ILP
• NN-ABS (Rush et al. 2015)
• TGRAPH (Parveen et al., 2015)
• URANK (Wan, 2010)
• NN-SE (Sentence extractor)
• NN-WE (Word extractor)
14
Results
15
Results
• They evaluate the generated summaries by eliciting human judgments for 20 randomly sampled DUC 2002 test documents.
• Subjects were asked to rank the summaries from best to worst (with ties allowed).
• They collect 5 responses per document.
16
Conclusion
• They developed two classes of models based on sentence and word extraction.
• Future Work:
• combining their model with a tree-based algorithm (Cohn and Lapata, 2009)
• or a phrase-based algorithm (Lebret et al., 2015)
17