
Modern History for Text Summarization and Beyond

Pengfei Liu, Postdoc at LTI, CMU (pfliu.com)

Outline

Overview of Modern History

Some Highlighted Topics

Our Recent Work


Preparation: Research Papers

Summarization Papers:

• Year: 2013-Now

• Conference: ACL / EMNLP / NAACL / ICML / ICLR / AAAI / IJCAI / NeurIPS

Preparation: Research Concepts

A rich set of task settings!

Holistic Analysis

[Chart: number of papers (# Papers) per year at ACL / EMNLP / NAACL, 2013-2019]

• 2013-2015: increasing!
• 2015-2016: trough!
• 2016-2019: rapid increase!

Fine-grained Analysis: Overview (2013-19)

2013-2014: Task-setting, Generation way

1) No NNs!
2) First RL paper
3) Multimodal

Top10 Cited Papers

2015: Neural Arch., Task-setting, Generation way

1) Finally, NNs arrived!

2) New dataset for NN!

Top10 Cited Papers

2016: Architecture, Task-setting, Generation way

1) Relatively more NNs
2) Few papers

3) Important Techniques!


Top10 Cited Papers

Key techniques and datasets: CNNDM, Copy, Coverage, Neural Ext

2017: Neural Arch.

1) GNNs arrived!
2) VAE arrived!
3) More NNs, but not booming … lagging behind …


Top10 Cited Papers

2018: booming!

1) Booming of NNs
2) Booming of RL!
3) Booming of abstractive methods!

Top10 Cited Papers

2019

1) Booming of pre-training
2) Booming of GNNs
3) Booming of evaluation

Top10 Cited Papers

Lessons from History

• The development of deep NNs for summarization lagged behind other tasks

• Summarization requires some customized techniques (e.g., copy)

• Being technique-ready alone is not enough … the dataset also matters!

• Success requires a good match between "techniques" and "datasets"

Outline

Overview of Modern History

Some Highlighted Topics

Our Recent Work


Highlighted topics: Factuality, Semantic equivalence, Pre-trained models

Evaluation Metric: Factuality

• Event factuality prediction
  • the degree to which an event mentioned in a sentence has happened
• Social media
  • fake news detection
• Dialogue
  • consistency
• Machine translation
  • semantic divergence

• Summarization

Factuality in Text Summarization

• Motivation: whether the facts in a generated summary are supported by the source document

• Goal: given a document d and a generated summary g = {s_1, s_2, …, s_m}, learn a fact-checker function Fact(d, g)

Example:
Document: "Born in Honolulu, Hawaii, Obama is a US Citizen"
Summary sentence: "Obama is American"
Does the document support the sentence?
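To make the interface concrete, here is a minimal sketch of Fact(d, g), assuming an off-the-shelf NLI model (roberta-large-mnli) as a stand-in fact checker; this is an illustrative assumption, not the method of any particular paper.

```python
# Minimal sketch of the Fact(d, g) interface. An off-the-shelf NLI model
# (roberta-large-mnli) stands in as the fact checker: each summary sentence
# is scored by its entailment probability given the document, and the
# sentence-level scores are averaged. This is an illustration, not FactCC.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

def fact(document: str, summary_sentences: list) -> float:
    scores = []
    for s in summary_sentences:
        inputs = tokenizer(document, s, return_tensors="pt", truncation=True)
        probs = torch.softmax(model(**inputs).logits, dim=-1)
        scores.append(probs[0, 2].item())  # index 2 = "entailment" for this model
    return sum(scores) / len(scores)

print(fact("Born in Honolulu, Hawaii, Obama is a US Citizen.", ["Obama is American."]))
```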

A Case Study on “FactCC” (Kryscinski et al.)

Dataset       FactCC score
cnndm         0.413981
xsum          0.198641
arxiv         0.438764
pubmed        0.475539
bigpatent_b   0.637429
tifu_long     0.413868

Setting:

• Evaluating "FactCC" on (document, reference) pairs, all of which should ideally be judged consistent (100%)
• Finding: the model is not good at predicting positive pairs (sketched below)
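A sketch of this sanity check: every (document, reference) pair should be labeled consistent, so the fraction of positive predictions directly measures performance on positive pairs. `load_pairs` and `checker` are hypothetical stand-ins, not FactCC's actual API.

```python
# Sanity check: a factuality checker run on (document, reference) pairs
# should label (nearly) all of them consistent. `checker` and `load_pairs`
# are hypothetical stand-ins for a trained model and a dataset loader.
def positive_pair_accuracy(checker, pairs):
    hits = sum(1 for doc, ref in pairs if checker(doc, ref) == "CONSISTENT")
    return hits / len(pairs)

pairs = load_pairs("cnndm")  # [(document, reference_summary), ...]
print(positive_pair_accuracy(checker, pairs))  # e.g. 0.414 observed for CNNDM
```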

A Case Study on "FactCC" (Kryscinski et al.)

Setting: cross-dataset evaluation
• Extractive models achieve higher factuality scores (but not 100%)
• For abstractive models, using an in-domain training set is not a guarantee.

Factuality in Text Summarization

• Challenges

• How to define a fact? (sentence? triple?)

• How to evaluate the effectiveness of a proposed factuality checker?

• Source documents are too long!!!

• Negative predictive power

More Recent Progress …

• Asking and Answering Questions to Evaluate the Factual Consistency of Summaries (Wang et al. 2020, ACL)

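At a high level, the QA-based protocol of Wang et al. (2020) generates questions from the summary, answers them against both the summary and the source document, and scores answer agreement. The sketch below captures that pipeline; the three helper functions are hypothetical stand-ins for the paper's question-generation and QA models.

```python
# High-level sketch of QA-based factual consistency checking:
# 1) generate questions from the summary,
# 2) answer each question using the summary and using the document,
# 3) score how often the two answers agree.
# generate_questions, answer, and answers_match are hypothetical stand-ins.
def qa_consistency(document: str, summary: str) -> float:
    questions = generate_questions(summary)  # question-generation model
    agreement = [
        answers_match(answer(q, summary), answer(q, document))  # QA model + token F1
        for q in questions
    ]
    return sum(agreement) / len(agreement)
```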

Evaluation Metric: Semantic equivalence

• Growing Trends:

• BLEURT: Learning Robust Metrics for Text Generation (Sellam et al. 2020, ACL)

• BERTScore: Evaluating Text Generation with BERT (Zhang et al. 2019)

• MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance (Zhao et al. 2019)

Shared characteristics of these metrics:

• Pre-trained (+ fine-tuned)
• Semantic matching
• Learnable
• Fine-grained
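As one concrete example, BERTScore is available as a public package (pip install bert-score); a minimal usage sketch:

```python
# Minimal usage of the public bert-score package: candidate summaries are
# scored against references by greedy matching of contextual embeddings.
from bert_score import score

candidates = ["Obama is American."]
references = ["Born in Honolulu, Hawaii, Obama is a US Citizen."]
P, R, F1 = score(candidates, references, lang="en")
print(F1.mean().item())
```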

Growing Trend: Fine-grained Evaluation

• Fine-grained Meta-evaluation

• Evaluation metrics behave similarly in the average-scoring range but strongly disagree in the higher-scoring range (Peyrard et al. 2019)

• Calculating correlations over different top-K subsets of systems (Ma et al. 2019)
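A sketch of the top-K idea, assuming one human score and one metric score per system (synthetic data here for illustration): the metric-human correlation is recomputed as K shrinks toward only the strongest systems.

```python
# Fine-grained meta-evaluation sketch: recompute the metric-human
# correlation over progressively smaller top-K subsets of systems.
# The scores below are synthetic placeholders.
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)
human = rng.normal(size=20)                      # one human score per system
metric = human + rng.normal(scale=0.5, size=20)  # a correlated automatic metric

order = np.argsort(-human)                       # systems ranked by human score
for k in (20, 10, 5, 3):
    top = order[:k]
    tau, _ = kendalltau(human[top], metric[top])
    print(f"top-{k}: tau = {tau:.3f}")
```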

Unsupervised Pre-trained Models

[Diagram: pre-trained models feed both tracks of summarization — NLU models (BERT family) for extractive, NLG models (BART) for abstractive]

Unsupervised Pre-trained Models

• Extractive

• Extractive -> Abstractive

• Exploring Task-specific Loss

Outline

Overview of Modern History

Some Highlighted Topics

Our Recent Work

Current Work

Interpretable Analysis

Model Refining

(1) Explicitly modelling inter-sentence interaction MATTERS!

(2) Summary-level optimization is rewarding

(1) Heterogeneous Graph Neural Networks for Extractive Document Summarization (ACL 2020)

(2) Extractive Summarization as Text Matching (ACL 2020)

Searching for Effective Neural Extractive Summarization: What Works and What’s Next (ACL 2019)

Interpretable Analysis: Why?

[Chart: change trend of scores on GLUE]

[Chart: change trend of ROUGE-1 on a text summarization dataset, across systems from Nallapati et al. through Xu et al. 2019]

The performance on many NLP tasks has begun to plateau, with the curves gradually flattening.

Interpretable Analysis: Why?

• Superior performance but low interpretability

• If the pros and cons are unknown, how could we:

• make suitable choices under different scenarios

• design more powerful methods

Interpretable Analysis: How?

• Explaining the prediction behavior

• Understanding the functionality of a neural component

• Revealing what remains opaque in pre-trained models

Methodology for Understanding NLP-oriented Models

Training-Testing environment:

• Training environment: different models are first generated with different specifications

• Testing environment: each model is then evaluated on a variety of observed testbeds

Bring it into Text Summarization

• Training environment
• Testing environment

Searching for Effective Neural Extractive Summarization: What Works and What’s Next (ACL 2019)


Bring it into Text Summarization

Beyond ROUGE

• Other metrics:
  • Positional bias
  • Repetition
  • Sentence length
• Evaluation testbeds:
  • Cross-domain evaluation (eight domains)
  • Sentence shuffling

→ observe from different aspects (a sketch of two such diagnostics follows)
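The sketch below computes two of these diagnostics from the indices of extracted sentences and the output tokens; the exact definitions used in the paper may differ.

```python
# Two diagnostic metrics beyond ROUGE, as assumed definitions:
# positional bias = mean relative position of the selected sentences
# (0 = document start); repetition = fraction of repeated trigrams.
from collections import Counter

def positional_bias(extracted_indices, doc_lengths):
    # extracted_indices: per-document lists of selected sentence indices
    rel = [i / n for idxs, n in zip(extracted_indices, doc_lengths) for i in idxs]
    return sum(rel) / len(rel)

def repetition_rate(summary_tokens):
    trigrams = Counter(zip(summary_tokens, summary_tokens[1:], summary_tokens[2:]))
    return 1 - len(trigrams) / max(1, sum(trigrams.values()))

print(positional_bias([[0, 1, 2]], [30]))  # a lead-biased extractor scores low
```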

Takeaways

1. Explicitly modelling inter-sentence interaction MATTERS!
2. The gap between datasets heavily influences cross-dataset generalization.
3. Summary-level optimization is rewarding.


Heterogeneous Graph Summarizer

Heterogeneous Graph Neural Networks for Extractive Document Summarization (ACL 2020)

From words and sentences in sequential order to a graph structure.

Graph:
• Nodes: words, sentences, documents
• Edges: tf-idf weights
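A construction sketch for such a graph, using scikit-learn for tf-idf and networkx for the graph; the paper's exact featurization and node initialization may differ.

```python
# Sketch of the word-sentence graph: sentence nodes and word nodes,
# connected by edges weighted with tf-idf. Details from the paper
# (document nodes, embedding-based node initialization) are omitted.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "Born in Honolulu, Hawaii, Obama is a US citizen.",
    "Obama served as the 44th president of the United States.",
]
vec = TfidfVectorizer()
tfidf = vec.fit_transform(sentences)  # shape: (num_sentences, vocab_size)

G = nx.Graph()
for j, word in enumerate(vec.get_feature_names_out()):
    for i in range(len(sentences)):
        w = tfidf[i, j]
        if w > 0:  # connect sentence i and word j if the word occurs in it
            G.add_edge(("sent", i), ("word", word), weight=float(w))
```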

Advantages

1) Aware of overlapping information

2) Both word and sentence representations keep being updated

3) Flexibly extendable
  • Relay node: entity
  • Satellite node: document

Results


Matching-based Summarization

Extractive Summarization as Text Matching (ACL 2020)

Previous paradigm: extracting sentences as a sequence labeling problem


1) A paradigm shift in the way we build neural extractive summarization systems

2) A good summary should be more semantically similar to the source document than unqualified summaries

3) Bypasses the difficulty of summary-level optimization (e.g. RL) through contrastive learning

Optimization Principles
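Principles 2) and 3) suggest a contrastive, summary-level objective. Below is a sketch in that spirit: candidate summaries pre-sorted by ROUGE are ranked by their similarity to the document, with a rank-dependent margin. Shapes and the margin value are assumptions, not the paper's exact formulation.

```python
# Contrastive, summary-level ranking sketch: a better candidate (lower
# index, i.e. higher ROUGE) should be more similar to the document than
# a worse one, by a margin that grows with the rank gap.
import torch
import torch.nn.functional as F

def candidate_ranking_loss(doc_emb, cand_embs, margin=0.01):
    # doc_emb: (d,) document embedding; cand_embs: (n, d) candidate
    # summary embeddings, pre-sorted by ROUGE (best first).
    sims = F.cosine_similarity(doc_emb.unsqueeze(0), cand_embs)  # (n,)
    n = sims.size(0)
    loss = sims.new_zeros(())
    for i in range(n):
        for j in range(i + 1, n):
            # candidate i should outscore candidate j by (j - i) * margin
            loss = loss + F.relu(sims[j] - sims[i] + (j - i) * margin)
    return loss
```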

Experiment

Beyond a SOTA result

Theoretical Analysis

• On what types of datasets is the expected gain of the summary-level approach over the sentence-level approach large?

• And how can that expected gain be characterized? (see the sketch below)
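One way to make the quantity concrete (an assumed formalization, not the paper's): compare the best size-k candidate summary against a greedy sentence-level selection under some scoring function, e.g. ROUGE against the reference.

```python
# Assumed formalization of the summary-level gain: best whole-summary
# candidate vs. greedily picking the k individually best sentences.
# `score` is a stand-in for e.g. ROUGE(candidate, reference).
from itertools import combinations

def summary_level_gain(sentences, reference, k, score):
    best = max(combinations(sentences, k),
               key=lambda c: score(" ".join(c), reference))
    greedy = sorted(sentences, key=lambda s: score(s, reference),
                    reverse=True)[:k]
    return score(" ".join(best), reference) - score(" ".join(greedy), reference)
```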

Outlook

• The power of the matching framework has not been fully exploited

• Build connections with learnable evaluation metrics:
  • BERTScore
  • MoverScore
  • BLEURT

Thank you
