SQuAD: 100,000+ Questions for Machine Comprehension of Text

Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang

Published in EMNLP 2016

Presented by Jiaming Shen, April 17, 2018




SQuAD = Stanford Question Answering Dataset


Online challenge: https://rajpurkar.github.io/SQuAD-explorer/


Overall contribution

• A benchmark dataset with:

• Proper difficulty

• Principled curation process

• Detailed data analysis


Outline

• What are the QA datasets prior to SQuAD?

• What does SQuAD look like?

• How is SQuAD created?

• What are the properties of SQuAD?

• How well can we do on SQuAD?


What are the QA datasets prior to SQuAD?


Related Datasets


Type I: Complex reading comprehension datasets

Type II: Open-domain QA datasets

Type III: Cloze datasets


Type I: Complex Reading Comprehension Datasets

• Require commonsense knowledge; very challenging

• Dataset sizes are too small


Type II: Open-domain QA Datasets

• Open-domain QA: answer a question from a large collection of documents.

• WikiQA: only sentence selection

• TREC-QA: free-form answer -> hard to evaluate


Type III: Cloze Datasets


• Automatically generated -> large scale

• Limitations are described in ACL 2016 Best Paper.
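
To illustrate why cloze datasets scale so easily, here is a minimal sketch of automatic cloze generation: one entity in a sentence is replaced with a placeholder, and the removed entity becomes the answer. The function name `make_cloze`, the `@placeholder` token, and the example sentence are assumptions for illustration, not the pipeline of any specific dataset.

```python
def make_cloze(sentence: str, entity: str, placeholder: str = "@placeholder"):
    """Turn a sentence into a (question, answer) cloze pair by masking one entity."""
    if entity not in sentence:
        raise ValueError(f"entity {entity!r} does not occur in the sentence")
    # Mask only the first occurrence so the rest of the context is kept.
    question = sentence.replace(entity, placeholder, 1)
    return question, entity


# Hypothetical sentence and entity, chosen only for illustration.
question, answer = make_cloze(
    "Stanford released a new reading comprehension dataset.", "Stanford"
)
print(question)
print(answer)
```

No human writes the question here, which is what makes the approach cheap and large scale.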


What does SQuAD look like?


SQuAD Dataset Format

Figure: an example showing a passage and one QA pair.


SQuAD Dataset Format

• One passage can have multiple question-answer pairs.

• In total, 100,000+ QA pairs from 23,215 passages.
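
The passage-to-QA-pair nesting can be sketched in code. The example below walks a tiny hand-written sample; the sample text and the helper name `iter_qa_pairs` are made up for illustration, and the field names (`data`, `paragraphs`, `context`, `qas`, `answers`, `answer_start`) are assumed to follow the publicly released v1.1 JSON files.

```python
sample = {
    "version": "1.1",
    "data": [
        {
            "title": "Example_Article",
            "paragraphs": [
                {
                    "context": "SQuAD was released by Stanford in 2016.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "Who released SQuAD?",
                            "answers": [{"text": "Stanford", "answer_start": 22}],
                        },
                        {
                            "id": "q2",
                            "question": "When was SQuAD released?",
                            "answers": [{"text": "2016", "answer_start": 34}],
                        },
                    ],
                }
            ],
        }
    ],
}


def iter_qa_pairs(dataset):
    """Yield (context, question, answer_text) for every QA pair in the dataset."""
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    start = answer["answer_start"]
                    text = answer["text"]
                    # Each answer is a span of the passage, located by character offset.
                    assert context[start:start + len(text)] == text
                    yield context, qa["question"], text


for context, question, answer in iter_qa_pairs(sample):
    print(f"{question} -> {answer}")
```

One passage (`context`) is shared by all of its questions, and every answer is a span inside that passage.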


How is SQuAD created?


SQuAD Dataset Collection

• Consists of three steps:

• Step 1: Passage curation

• Step 2: Question-answer collection

• Step 3: Additional answers collection


Step 1: Passage Curation

• Select the top 10,000 articles of English Wikipedia based on Wikipedia’s internal PageRank scores.

• Randomly sample 536 of those 10,000 articles.

• Extract passages longer than 500 characters from all 536 articles -> 23,215 paragraphs.

• Train/dev/test sets are split at the article level.

• The train and dev sets are released; the test set is held out.
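
A minimal sketch of an article-level split, in which no article contributes paragraphs to more than one partition. The 80/10/10 ratio comes from the SQuAD paper rather than from these slides, and the function name `split_by_article`, the seed, and the placeholder article titles are assumptions.

```python
import random


def split_by_article(articles, train_frac=0.8, dev_frac=0.1, seed=0):
    """Partition a list of articles into train/dev/test without splitting any article."""
    shuffled = list(articles)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_dev = int(len(shuffled) * dev_frac)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # the remainder goes to test
    return train, dev, test


# Placeholder titles standing in for the 536 sampled articles.
articles = [f"article_{i}" for i in range(536)]
train, dev, test = split_by_article(articles)
print(len(train), len(dev), len(test))
```

Splitting by article rather than by paragraph keeps paragraphs on the same topic from leaking between the training and evaluation sets.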


Step 2: Question-Answer Collection

• Uses crowdsourcing.

• Crowdworkers must have a 97% HIT acceptance rate, at least 1,000 completed HITs, and be located in the USA or Canada.

• Workers spend 4 minutes on each paragraph, asking up to 5 questions and highlighting each answer in the text.



Step 3: Additional Answers Collection

• For each question in dev/test datasets, get at least two additional answers.

• Why do we do this?

• Make evaluation more robust.

• Assess human performance.


What are the properties of SQuAD?


Data Analysis

• Diversity in answers

• Reasoning for answering questions

• Syntactic divergence


Diversity in Answers

• 67.4% of answers are non-entities, and many answers are not even nouns -> can be challenging.


Reasoning for answering questions


Syntactic divergence

• Syntactic divergence is the minimum edit distance (between unlexicalized dependency paths) over all possible anchors (word-lemma pairs).
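
The definition above can be sketched as a plain edit distance over dependency-label sequences, minimized over anchors. This is a sketch under assumptions: the paths are supplied by hand instead of coming from a dependency parser, unit costs are used for insertion, deletion, and substitution, and the names `edit_distance` and `syntactic_divergence` and the toy labels are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of dependency labels."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # insert y
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def syntactic_divergence(paths_by_anchor):
    """Minimum edit distance over all anchors.

    paths_by_anchor maps each anchor to a pair
    (question_path, sentence_path) of unlexicalized dependency paths.
    """
    return min(edit_distance(q, s) for q, s in paths_by_anchor.values())


# Toy, hand-written paths for two hypothetical anchors.
paths = {
    "first": (["xcomp", "nsubjpass", "det"], ["amod", "nmod", "nsubjpass"]),
    "store": (["det"], ["nmod", "nsubjpass"]),
}
print(syntactic_divergence(paths))
```

A divergence of 0 means the question and the answer sentence share the same dependency structure around the anchor; larger values mean more syntactic rewriting is needed to align them.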


Syntactic divergence

• Histogram of syntactic divergence


How well can we do on SQuAD?


“Baseline” method

• Candidate answer generation: use a constituency parser.

• Feature extraction

• Train a logistic regression model
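
A rough sketch of the prediction step of such a baseline: a logistic-regression-style scorer gives each candidate span a linear score, normalizes the scores with a softmax, and returns the most probable candidate. The candidate list, the two features, and the weights below are all hypothetical stand-ins; in the baseline described on the slide, the candidates come from a constituency parser and the weights are learned from training data.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(candidates, features, weights):
    """Return the most probable candidate together with all probabilities."""
    scores = [sum(w * f for w, f in zip(weights, feats)) for feats in features]
    probs = softmax(scores)
    best = max(range(len(candidates)), key=probs.__getitem__)
    return candidates[best], probs


# Hypothetical candidates with two made-up features each:
# (word overlap between the question and the candidate's sentence, span length in tokens).
candidates = ["Stanford", "a new dataset", "2016"]
features = [(2.0, 1.0), (1.0, 3.0), (0.0, 1.0)]
weights = (1.5, -0.2)  # illustrative values, not learned

answer, probs = predict(candidates, features, weights)
print(answer)
```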


Roles of the feature groups:

• Help to pick the correct sentence

• Resolve lexical variations

• Resolve syntactic variations


Evaluation

• After ignoring punctuation and articles, the following two metrics are used:

• Exact Match

• Macro-averaged F1 score: the maximum F1 over all ground-truth answers for each question, averaged over all questions
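
A minimal sketch of both metrics, following the normalization named above (dropping punctuation and the articles a/an/the). Lowercasing and whitespace collapsing are assumptions beyond what the slide states, and the function names are illustrative, not the official evaluation script.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, strip punctuation, drop articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1(prediction, truth):
    """Token-level F1 between a prediction and one ground-truth answer."""
    pred_tokens = normalize(prediction).split()
    true_tokens = normalize(truth).split()
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)


def score_question(prediction, truths, metric):
    """Take the maximum of the metric over all ground-truth answers."""
    return max(metric(prediction, t) for t in truths)


def evaluate(predictions, references):
    """Average the per-question EM and F1 over all questions."""
    n = len(predictions)
    em = sum(score_question(p, r, exact_match) for p, r in zip(predictions, references)) / n
    f = sum(score_question(p, r, f1) for p, r in zip(predictions, references)) / n
    return em, f


# Made-up predictions and references, for illustration only.
predictions = ["The Stanford University", "in 2016"]
references = [["Stanford University", "Stanford"], ["2016", "the year 2016"]]
em, f1_score = evaluate(predictions, references)
print(em, f1_score)
```

Taking the maximum over several reference answers is what the additional answers collected for the dev and test sets make possible.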


Experiment results


• Overall results

For the SQuAD v1.1 test set: EM 82.304, F1 91.221


Experiment results


• Performance stratified by answer type


Experiment results


• Performance stratified by syntactic divergence


Experiment results


• Performance with feature ablations


Summary

• SQuAD is a machine reading style QA dataset.

• SQuAD consists of 100,000+ QA pairs.

• SQuAD is constructed through crowdsourcing.

• SQuAD drives the field forward.


Thanks Q & A
