Post on 28-May-2020
Modern History for Text Summarization and Beyond
Pengfei Liu
Postdoc at LTI of CMU
pfliu.com
Outline
Overview of Modern History
Some Highlighted Topics
Our Recent Work
Preparation: Research Papers
Summarization Papers:
• Year: 2013-Now
• Conference: ACL / EMNLP / NAACL / ICML / ICLR / AAAI / IJCAI / NeurIPS
Preparation: Research Concepts
A rich variety of task settings!
Holistic Analysis
[Figure: number of papers per year, 2013-2019, at ACL, EMNLP, and NAACL]
2013-2015: increasing!
2015-2016: trough!
2016-2019: rapidly increasing!
Fine-grained Analysis: Overview (2013-19)
2013-2014: Task-setting, Generation way
1) No NNs!
2) First RL Paper
3) Multimodal
Top10 Cited Papers
2015: Neural Arch., Task-setting, Generation way
1) Finally, NNs arrived!
2) New dataset for NN!
Top10 Cited Papers
2016: Architecture, Task-setting, Generation way
1) Relatively more NNs
2) Few papers
3) Important Techniques!
[Figure: number of papers per year, 2013-2019, at ACL, EMNLP, and NAACL]
Top10 Cited Papers
Labels on the list: CNNDM, Copy, Coverage, Copy, Neural Ext
2017: Neural Arch.
1) GNNs arrived!
2) VAE arrived!
3) More NNs but not booming … lag behind …
[Figure: number of papers per year, 2013-2019, at ACL, EMNLP, and NAACL]
Top10 Cited Papers
2018: booming!
1) Booming of NNs
2) Booming of RL!
3) Booming of Abs!
Top10 Cited Papers
2019
1) Booming of Pre-training
2) Booming of GNNs
3) Booming of Evaluation
Top10 Cited Papers
Lessons from History
• The development of deep NNs for summarization lags behind other tasks
• Summarization tasks require some customized techniques (e.g. copy)
• Having the techniques ready is not enough … datasets also matter!
• Progress needs a good match between “techniques” and “datasets”
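The "copy" technique mentioned above can be illustrated with a minimal sketch in the style of a pointer-generator mixture: the final word distribution combines a vocabulary distribution with the attention weights over source tokens. The function name, the toy numbers, and the dictionary-based representation are assumptions made for illustration, not taken from the talk.

```python
def copy_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix generation and copy distributions (pointer-generator style).

    p_gen: probability of generating from the vocabulary, in [0, 1].
    vocab_dist: dict mapping word -> probability under the generator.
    attention: attention weights over source positions (sum to 1).
    source_tokens: source words aligned with `attention`.
    """
    # Generation part: scale the vocabulary distribution by p_gen.
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}
    for token, weight in zip(source_tokens, attention):
        # Copy part: accumulate attention over every position holding the
        # token, so out-of-vocabulary source words can still be produced.
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


# Toy, hand-picked numbers (hypothetical).
vocab_dist = {"the": 0.5, "president": 0.3, "spoke": 0.2}
source_tokens = ["obama", "spoke", "obama"]
attention = [0.6, 0.1, 0.3]
final_dist = copy_distribution(0.4, vocab_dist, attention, source_tokens)
```

In this toy case "obama" is not in the vocabulary, yet it receives most of the probability mass because it is attended to in the source.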
Outline
Overview of Modern History
Some Highlighted Topics
Our Recent Work
• Factuality
• Semantic equivalence
• Pre-trained models
Evaluation Metric: Factuality
• Event factuality prediction
  • the degree to which an event mentioned in a sentence has happened
• Social Media
  • fake news detection
• Dialog
  • consistency
• Machine Translation
  • semantic divergence
• Summarization
Factuality in Text Summarization
• Motivation
  • Whether the facts in generated summaries are covered by the source document
• Goal
  • Given a document $d = \{s_1, s_2, \dots, s_m\}$ and a generated summary $g$, the purpose is to learn a fact-checker function $\mathrm{Fact}(d, g)$
Example:
• Document sentence: “Born in Honolulu, Hawaii, Obama is a US Citizen”
• Generated summary: “Obama is American”
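To make the interface of a fact checker Fact(d, g) concrete, here is a deliberately naive sketch that scores each summary sentence by its best token overlap with any document sentence. This lexical-overlap baseline is an assumption chosen for illustration; it is not FactCC or any method from the talk, and it treats the first Obama sentence as the document and the second as the summary.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def fact_score(document_sentences, summary_sentences):
    """Toy Fact(d, g): mean over summary sentences of the best fraction of
    summary tokens that also appear in a single document sentence."""
    if not summary_sentences:
        return 0.0
    scores = []
    for summary_sentence in summary_sentences:
        summary_tokens = tokenize(summary_sentence)
        if not summary_tokens:
            scores.append(0.0)
            continue
        best = max(
            (len(summary_tokens & tokenize(s)) / len(summary_tokens)
             for s in document_sentences),
            default=0.0,
        )
        scores.append(best)
    return sum(scores) / len(scores)


document = ["Born in Honolulu, Hawaii, Obama is a US Citizen"]
summary = ["Obama is American"]
score = fact_score(document, summary)
```

Only "obama" and "is" overlap, so the toy score stays below 1 even though the summary is factually supported. That gap between lexical overlap and inference is one reason a learned checker is of interest.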
A Case Study on “FactCC” (Kryscinski et al.)
Dataset      cnndm     xsum      arxiv     pubmed    bigpatent_b  tifu_long
FactCC       0.413981  0.198641  0.438764  0.475539  0.637429     0.413868
Setting:
• Evaluating “FactCC” on (document, reference) pairs (ideally, 100%)
• Not good at predicting positive pairs.
A Case Study on “FactCC” (Kryscinski et al.)
Setting:
• Cross-dataset
• Extractive model: higher factuality score (but not 100%)
• Abstractive model: using an in-domain training set is not a guarantee.
Factuality in Text Summarization
• Challenges
  • How to define a fact? (sentence? triple?)
  • How to evaluate the effectiveness of your proposed factuality checker?
  • Source documents are too long!!!
  • Negative predictive power
More Recent Progress …
• Asking and Answering Questions to Evaluate the Factual Consistency of Summaries (Wang et al. 2020, ACL)
Evaluation Metric: Semantic equivalence
• Growing Trends:
  • Pre-trained (+ fine-tuned)
  • Semantic Matching
  • Learnable
  • Fine-grained
• Examples:
  • BLEURT: Learning Robust Metrics for Text Generation (Sellam et al. 2020, ACL)
  • BERTScore: Evaluating Text Generation with BERT (Zhang et al. 2019)
  • MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance (Zhao et al. 2019)
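As a rough illustration of the semantic-matching idea behind metrics such as BERTScore, the sketch below greedily matches each token embedding to its most similar counterpart by cosine similarity and combines precision and recall into F1. The 2-D vectors are hand-made stand-ins for contextual embeddings (an assumption); a real metric would obtain them from a pre-trained model, and this sketch omits refinements such as idf weighting.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_match_scores(reference_vecs, candidate_vecs):
    """Return (precision, recall, f1) under greedy cosine matching."""
    # Recall: every reference token looks for its best candidate match.
    recall = sum(
        max(cosine(r, c) for c in candidate_vecs) for r in reference_vecs
    ) / len(reference_vecs)
    # Precision: every candidate token looks for its best reference match.
    precision = sum(
        max(cosine(c, r) for r in reference_vecs) for c in candidate_vecs
    ) / len(candidate_vecs)
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical token embeddings: two reference tokens, one candidate token.
reference_vecs = [[1.0, 0.0], [0.0, 1.0]]
candidate_vecs = [[1.0, 0.0]]
precision, recall, f1 = greedy_match_scores(reference_vecs, candidate_vecs)
```

Here the single candidate token matches one reference token perfectly, so precision is high while recall reflects the uncovered reference token.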
Growing Trend: Fine-grained Evaluation
• Fine-grained Meta-evaluation
• Evaluation metrics behave similarly in the average-scoring range but strongly disagree in the higher-scoring range (Peyrard et al. 2019)
• Calculating correlation in different top-K systems (Ma et al. 2019)
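The top-K idea can be sketched as follows: compute the correlation between a metric and human judgments over all systems, then again over only the K systems that humans rank best. The scores are invented for illustration, and selecting the top K by human score with Pearson correlation is an assumption about the setup rather than a detail from the slides.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y) if std_x and std_y else 0.0


def top_k_correlation(human_scores, metric_scores, k):
    """Correlation restricted to the k systems with the best human scores."""
    best = sorted(
        range(len(human_scores)), key=lambda i: human_scores[i], reverse=True
    )[:k]
    return pearson(
        [human_scores[i] for i in best], [metric_scores[i] for i in best]
    )


# Invented system-level scores (one entry per system).
human_scores = [0.9, 0.8, 0.7, 0.4, 0.2, 0.1]
metric_scores = [0.50, 0.52, 0.51, 0.30, 0.20, 0.10]
all_systems = top_k_correlation(human_scores, metric_scores, k=6)
top_three = top_k_correlation(human_scores, metric_scores, k=3)
```

With these made-up numbers the metric tracks human judgment well across all six systems but fails to rank the three strongest ones, which is the kind of disagreement a fine-grained meta-evaluation is meant to expose.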
Unsupervised Pre-trained Models
[Figure labels: Pre-trained: NLU (BERT-family), NLG (BART); Summarization: Extractive, Abstractive]
• Extractive
• Extractive -> Abstractive
• Exploring Task-specific Loss
Outline
Overview of Modern History
Some Highlighted Topics
Our Recent Work
Current Work
Interpretable Analysis
Model Refining
(1) Explicitly modelling inter-sentence interaction MATTERS!
(2) Summary-level optimization is rewarding
(1) Heterogeneous Graph Neural Networks for Extractive Document Summarization (ACL 2020)
(2) Extractive Summarization as Text Matching (ACL 2020)
Searching for Effective Neural Extractive Summarization: What Works and What’s Next (ACL 2019)
Interpretable Analysis: Why?
[Figure: change trend of scores on GLUE]
[Figure: change trend of R-1 on a text summarization dataset (systems: Nallapati et …, Narayan et …, Chen et …, Zhou et …, Xu et al. 2019)]
The performance on many NLP tasks has begun to plateau, with the curves gradually flattening.
Interpretable Analysis: Why?
• Superior performance but low interpretability
• If the pros and cons are unknown, how could we:
• make suitable choices under different scenarios
• design more powerful methods
Interpretable Analysis: How?
• Explaining the prediction behavior
• Understanding the functionality of a neural component
• Revealing what is hidden inside pre-trained models
Methodology for Understanding NLP-oriented Models
Training-Testing environment:
• Training environment: different models are first generated with different specifications
• Testing environment: a model should be evaluated with different observed testbeds
Bring it into Text Summarization
• Training environment
• Testing environment
Searching for Effective Neural Extractive Summarization: What Works and What’s Next (ACL 2019)
Bring it into Text Summarization
Beyond “ROUGE”
• Other metrics:
  • Positional bias
  • Repetition
  • Sentence length
• Evaluation testbeds:
  • Cross-domain evaluation (eight domains)
  • Sentence shuffling
Observe models from different aspects
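The extra metrics and the sentence-shuffling testbed listed above can be approximated with a few small helpers. The exact definitions here (repeated n-gram rate, mean relative position of selected sentences, whitespace tokenization) are simplifying assumptions for illustration and may differ from those used in the ACL 2019 paper.

```python
import random


def repetition_rate(summary_tokens, n=2):
    """Fraction of n-grams in the summary that repeat an earlier n-gram."""
    ngrams = [
        tuple(summary_tokens[i:i + n])
        for i in range(len(summary_tokens) - n + 1)
    ]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


def average_sentence_length(sentences):
    """Mean number of whitespace-separated tokens per sentence."""
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


def mean_relative_position(selected_indices, num_sentences):
    """Positional bias proxy: 0.0 means the model always picks the first
    sentence, 1.0 means it always picks the last one."""
    if not selected_indices or num_sentences < 2:
        return 0.0
    return sum(i / (num_sentences - 1) for i in selected_indices) / len(
        selected_indices
    )


def shuffle_sentences(sentences, seed=0):
    """Sentence-shuffling testbed: return a shuffled copy of the document."""
    rng = random.Random(seed)
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled


doc = ["first sentence here", "second one", "third", "fourth", "fifth"]
lead_bias = mean_relative_position([0, 1], len(doc))
shuffled_doc = shuffle_sentences(doc, seed=13)
```

A model that relies on position rather than content would be expected to lose more under `shuffle_sentences`, since the leading sentences are no longer the most informative ones.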
Takeaways
• Explicitly modelling inter-sentence interaction MATTERS!
• The gap between datasets heavily influences cross-dataset generalization
• Summary-level optimization is rewarding
Heterogeneous Graph Summarizer
Heterogeneous Graph Neural Networks for Extractive Document Summarization (ACL 2020)
[Figure labels: words, sentences, sequential order, graph structure]
Graph:
• Node: word, sentence, document
• Edge: tf-idf
Advantages
1) Aware of overlapping information
2) Both words and sentences keep themselves updated
3) Flexibly extended
• Relay node: entity
• Satellite node: document
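A minimal sketch of the graph construction described above: word nodes and sentence nodes are connected by edges weighted with tf-idf, so sentences that share words become linked through those word nodes. The function name, whitespace tokenization, and the smoothed idf variant are assumptions for illustration; this only builds the edges and does not implement the graph neural network itself.

```python
import math
from collections import Counter


def build_word_sentence_edges(sentences):
    """Return {(word, sentence_index): tf-idf weight} for a document,
    treating each sentence as a 'document' when computing idf."""
    tokenized = [s.lower().split() for s in sentences]
    num_sentences = len(tokenized)
    # Number of sentences each word occurs in.
    sentence_freq = Counter(w for tokens in tokenized for w in set(tokens))
    edges = {}
    for index, tokens in enumerate(tokenized):
        for word, count in Counter(tokens).items():
            tf = count / len(tokens)
            # Smoothed idf (an assumed variant) keeps weights positive.
            idf = math.log((1 + num_sentences) / (1 + sentence_freq[word])) + 1.0
            edges[(word, index)] = tf * idf
    return edges


sentences = ["obama was born in hawaii", "obama is american"]
edges = build_word_sentence_edges(sentences)
# Words linked to more than one sentence carry the overlapping information.
shared_words = {
    w for (w, _) in edges
    if sum(1 for (w2, _) in edges if w2 == w) > 1
}
```

In this toy document only "obama" connects both sentences, which is how a word node can act as a bridge for inter-sentence interaction.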
Results
Matching-based Summarization
Extractive Summarization as Text Matching (ACL 2020)
Extracting sentences as a sequence labeling problem
1) paradigm shift with regard to the way we build neural extractive summarization systems
2) a good summary should be more semantically similar to the source document than the unqualified summaries
3) bypasses the difficulty of summary-level optimization (e.g. RL) by contrastive learning
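The matching view above can be sketched as scoring whole candidate summaries against the document and training with a margin-based ranking loss, so that better candidates end up closer to the document than worse ones. Bag-of-words cosine stands in for a learned semantic encoder, and the simplified pairwise loss, the function names, and the margin value are assumptions for illustration rather than the paper's exact formulation.

```python
import math
from collections import Counter


def bow_cosine(text_a, text_b):
    """Cosine similarity between bag-of-words vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
    norm = math.sqrt(sum(v * v for v in counts_a.values())) * math.sqrt(
        sum(v * v for v in counts_b.values())
    )
    return dot / norm if norm else 0.0


def margin_ranking_loss(document, ranked_candidates, margin=0.1):
    """Candidates are assumed sorted best-first (e.g. by ROUGE against the
    reference). A pair is penalized when the worse candidate is not at least
    (rank gap * margin) less similar to the document than the better one."""
    scores = [bow_cosine(document, c) for c in ranked_candidates]
    loss = 0.0
    for i in range(len(scores)):
        for j in range(i + 1, len(scores)):
            loss += max(0.0, scores[j] - scores[i] + (j - i) * margin)
    return loss


def select_summary(document, candidates):
    """Inference: pick the candidate summary most similar to the document."""
    return max(candidates, key=lambda c: bow_cosine(document, c))


document = "obama was born in hawaii and is a us citizen"
candidates = [
    "obama is a us citizen born in hawaii",
    "obama was born",
    "the weather is nice",
]
well_ordered_loss = margin_ranking_loss(document, candidates)
reversed_loss = margin_ranking_loss(document, candidates[::-1])
chosen = select_summary(document, candidates)
```

Because each candidate is scored as a whole summary, the selection is made at the summary level without labeling sentences one by one or resorting to reinforcement learning.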
Optimization Principles
Experiment
Beyond a SOTA result
Theoretical Analysis
• On what types of datasets is the expected gain of the summary-level approach over the sentence-level approach large?
• And how to characterize the expected gain?
Outlook
• The power of the matching framework has not been fully exploited
• Build the connections with learnable evaluation metrics:
  • BERTScore
  • MoverScore
  • BLEURT
Thank you