squad:100,000+ questions for machine comprehension of text · 2018. 4. 17. · squad:100,000+...
TRANSCRIPT
![Page 1: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/1.jpg)
SQuAD:100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang Published in EMNLP 2016
Presented by Jiaming Shen April 17, 2018
1
![Page 2: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/2.jpg)
SQuAD = Stanford Question Answering Dataset
2
Online challenge: https://rajpurkar.github.io/SQuAD-explorer/
![Page 3: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/3.jpg)
Overall contribution
• A benchmark dataset with:
• Proper difficulty
• Principled curation process
• Detailed data analysis
3
![Page 4: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/4.jpg)
Outline
• What are the QA datasets prior to SQuAD?
• What does SQuAD look like?
• How is SQuAD created?
• What are the properties of SQuAD?
• How well we can do on SQuAD?
4
![Page 5: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/5.jpg)
What are the QA datasets prior to SQuAD?
5
![Page 6: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/6.jpg)
Related Datasets
6
Type I: Complex reading comprehension datasets
Type II: Open-domain QA datasets
Type III: Cloze datasets
![Page 7: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/7.jpg)
Type I: Complex Reading Comprehension Datasets
• Require commonsense knowledge, very challenge • Dataset size too small
7
![Page 8: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/8.jpg)
Type II: Open-domain QA Datasets
• Open-domain QA: answer a question from a large collection of documents.
• WikiQA: only sentence selection
• TREC-QA: free-form answer -> hard to evaluate
8
![Page 9: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/9.jpg)
Type III: Cloze Datasets
9
• Automatically generated -> large scale
• Limitations are described in ACL 2016 Best Paper.
![Page 10: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/10.jpg)
What does SQuAD look like?
10
![Page 11: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/11.jpg)
SQuAD Dataset Format
A passage
One QA pair
11
![Page 12: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/12.jpg)
SQuAD Dataset Format
• One passage can have multiple question-answer pairs.
• Totally 100,000+ QA pairs from 23,215 passages.
12
![Page 13: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/13.jpg)
How is SQuAD created?
13
![Page 14: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/14.jpg)
SQuAD Dataset Collection
• Consisting three steps:
• Step1: Passage curation
• Step2: Question-answer collection
• Step3: Additional answers collection
14
![Page 15: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/15.jpg)
Step 1: Passage Curation• Select top 10000 articles of English Wikipedia based on
Wikipedia’s internal PageRanks scores.
• Random sample 536 articles out of 10000 articles.
• Extract passages longer than 500 characters from all 536 articles -> 23,115 paragraphs.
• Train/dev/test datasets are split in the article level.
• Train/dev datasets are released and test dataset is holdout.
15
![Page 16: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/16.jpg)
Step 2: Question-Answer Collection
• Using crowdsourcing technique
• Crowd-workers with 97% HIT acceptance rate, larger than 1000 HITs, and located in USA/Canada.
• Spend 4 minutes on each paragraph and asking up to 5 questions with answer highlighted in the text.
16
![Page 17: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/17.jpg)
Step 2: Question-Answer Collection
17
![Page 18: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/18.jpg)
Step 3: Additional Answers Collection
• For each question in dev/test datasets, get at least two additional answers.
• Why we do this?
• Make evaluation more robust.
• Assess human performance.
18
![Page 19: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/19.jpg)
What are the properties of SQuAD?
19
![Page 20: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/20.jpg)
Data Analysis
• Diversity in answers
• Reasoning for answering questions
• Syntactic divergence
20
![Page 21: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/21.jpg)
Diversity in Answers• 67.4% non-entity answers, and many answers are not
even noun -> Can be challenging.
21
![Page 22: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/22.jpg)
Reasoning for answering questions
22
![Page 23: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/23.jpg)
Syntactic divergence• Syntactic divergence is the minimum edit distance
(belong all unlexicalized dependency path) over all possible anchors (word-lemma pairs).
23
![Page 24: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/24.jpg)
Syntactic divergence• Histogram of syntactic divergence
24
![Page 25: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/25.jpg)
How well we can do on SQuAD?
25
![Page 26: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/26.jpg)
“Baseline” method• Candidate answer generation: use constituency parser. • Feature extraction • Train a Logistic Regression Model
26
![Page 27: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/27.jpg)
27
Help to pick the correct sentence
Resolve lexical variations
Resolve syntactic variations
![Page 28: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/28.jpg)
Evaluation• After ignoring punctuations and articles, using the
following two metrics:
• Exact Match
• Macro-averaged F1 score: maximum F1 over all of the ground truth answers
28
![Page 29: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/29.jpg)
Experiment results
29
• Overall results
For SQuAD v1.1 Test: EM: 82.304 F1: 91.221
![Page 30: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/30.jpg)
Experiment results
30
• Performance stratified by answer type
![Page 31: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/31.jpg)
Experiment results
31
• Performance stratified by syntactic divergence
![Page 32: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/32.jpg)
Experiment results
32
• Performance with feature ablations
![Page 33: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/33.jpg)
Summary
• SQuAD is a machine reading style QA dataset.
• SQuAD consists of 100,000+ QA pairs.
• SQuAD is constructed based on crowdsourcing.
• SQuAD drives the field forward.
33
![Page 34: SQuAD:100,000+ Questions for Machine Comprehension of Text · 2018. 4. 17. · SQuAD:100,000+ Questions for Machine Comprehension of Text Pranav Rajpurkar, Jian Zhang, Konstantin](https://reader035.vdocument.in/reader035/viewer/2022071514/6135ac910ad5d206764786ec/html5/thumbnails/34.jpg)
Thanks Q & A
34