large scale qa-srl parsing › anthology › attachments › p18... · identify verbs with pos +...

91
Large Scale QA-SRL Parsing Nicholas FitzGerald, Julian Michael, Luheng He, and Luke Zettlemoyer ACL 2018 http://qasrl.org/

Upload: others

Post on 23-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Large Scale QA-SRL ParsingNicholas FitzGerald, Julian Michael, Luheng He, and Luke Zettlemoyer

ACL 2018

http://qasrl.org/

Page 2: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

John surreptitiously ate the burrito at 2am.

Subject Manner Verb Object Time

Semantic Role Labelling

Page 3: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

John surreptitiously ate the burrito at 2am.

Subject Manner Verb Object Time

Semantic Role Labelling

• Applied to improve state-of-the-art in NLP tasks such as Question Answering [Shen 2007] and Machine Translation [Liu and Gildea, 2010]

Page 4: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

John surreptitiously ate the burrito at 2am.

Subject Manner Verb Object Time

Semantic Role Labelling

• Applied to improve state-of-the-art in NLP tasks such as Question Answering [Shen 2007] and Machine Translation [Liu and Gildea, 2010]

• Commonly used interface to facilitate Data Exploration and Information Extraction [Stanovsky et al 2018] [Chiticariu et al. 2018]

Page 5: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

John surreptitiously ate the burrito at 2am.

Subject Manner Verb Object Time

Semantic Role Labelling

• Applied to improve state-of-the-art in NLP tasks such as Question Answering [Shen 2007] and Machine Translation [Liu and Gildea, 2010]

• Commonly used interface to facilitate Data Exploration and Information Extraction [Stanovsky et al 2018] [Chiticariu et al. 2018]

• Considerable interest in general-purpose SRL parsers

Page 6: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

QA-SRL

John surreptitiously ate the burrito at 2am.

Who ate s

omething?

How was

someth

ing eaten

?

What was

eaten

?

When w

as so

mething ea

ten?

Subject Manner Verb Object Time

[He et al. 2015]

Page 7: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

QA-SRL

John surreptitiously ate the burrito at 2am.

Who ate s

omething?

How was

someth

ing eaten

?

What was

eaten

?

When w

as so

mething ea

ten?

Subject Manner Verb Object Time

[He et al. 2015]

QA-SRL 1.0 • Small dataset • Trained annotators • Only explored sub-problems

Page 8: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Goal

A high-quality, large-scale parser for QA-SRL

Page 9: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

In 1950 Alan M. Turing published "Computing machinery and intelligence" in Mind, in which he proposed that machines could be

tested for intelligence using questions and answers.

published

When was something published? In 1950

Who published something? Alan M. Turing

What was published? “Computing Machinery and Intelligence”

Where was something published? in Mind

proposed

When did someone propose something? In 1950

Who proposed something? Alan M. Turing

What did someone propose?that machines could be

tested for intelligent using questions and answers

What did someone propose something in? “Computing Machinery and Intelligence”

testedWhat can be tested? machines

What can something be tested for? intelligence

How can something be tested? using questions and answers

usingWhat was being used? questions and answers

Why was something being used? tested for intelligence

Page 10: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Challenges1. Scale up QA-SRL data annotation

Page 11: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

1. Scale up QA-SRL data annotation

Challenges

Page 12: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

1. Scale up QA-SRL data annotation

Challenges

75k sentence dataset in 9 days

Page 13: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Challenges1. Scale up QA-SRL data annotation

2. Train a QA-SRL Parser

Produced

What produced something? A much larger super eruption

Where did something produce something? in Colorado

What did something produce? over 5,000 cubic kilometers of material

A much larger super eruption in Colorado produced over 5,000

cubic kilometers of material.

appeared

Where didn’t someone appear to do something? In the video

Who didn’t appear to do something? the perpetrators

When did someone appear? never

What didn’t someone appear to do?look at the camera

to look at the camera

look

Where didn't someone look at something? In the video

Who didn’t look? the perpetrators

What didn’t someone look at? the camera

In the video, the perpetrators never appeared to look

at the camera.

met

Who met someone?Some of the vegetarians

vegetarians

Who met? he

What did someone meet? members of the Theosophical Society

founded

What had been founded?members of the Theosophical Society

the Theosophical Society

When was something founded?in 1875

1875

Why has something been founded? to further universal brotherhood

devotedWhat was devoted to something? members of the Theosophical Society

What was something devoted to? the study of Buddhist and Hindu literature

Some of the vegetarians he met were members

of the Theosophical Society, which had been founded in 1875 to

further universal brotherhood, and which

was devoted to the study of Buddhist and

Hindu literature.

Page 14: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Challenges1. Scale up QA-SRL data annotation

2. Train a QA-SRL Parser

3. Improve Recall

Overgenerate Validate

+11% data+ 2% Fscore

Page 15: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

In 1950 Alan M. Turing published "Computing machinery and intelligence" in Mind, in which he proposed that machines could be

tested for intelligence using questions and answers.

published

When was something published? In 1950

Who published something? Alan M. Turing

What was published? “Computing Machinery and Intelligence”

Where was something published? in Mind

proposed

When did someone propose something? In 1950

Who proposed something? Alan M. Turing

What did someone propose?that machines could be

tested for intelligent using questions and answers

What did someone propose something in? “Computing Machinery and Intelligence”

testedWhat can be tested? machines

What can something be tested for? intelligence

How can something be tested? using questions and answers

usingWhat was being used? questions and answers

Why was something being used? tested for intelligence

Page 16: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Large-scale QA-SRL Parsing

1. Scale up QA-SRL data annotation

2. Train a QA-SRL Parser

3. Improve Recall

Page 17: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Easier AnnotationUCCA

[Abend and Rapaport 2013]~6k sentences 4 Trained Annotators

Semantic Proto-roles[Reisinger et al. 2015]

~7k sentences MTurk

Groningen Meaning Bank[Basile et al. 2012]

~40k sentencesTrained annotators/

GWAP

QASRL 1.0[He et al. 2015]

~3k sentences Trained annotators

QA-SRL 2.0 75k sentences MTurk

Page 18: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

QA-SRL

Questions:

Wh Aux Subj Verb Obj Prep Obj2WhoWhatWhereWhenWhyHow

∅did

didn’tmightwill…

∅someonesomething

∅stempast

past participlepresent

∅someonesomething

∅ontoby

from…

∅someonesomething

Page 19: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

QA-SRL

Questions:

John surreptitiously ate the burrito at 2am.

Wh Aux Subj Verb Obj Prep Obj2WhoWhatWhereWhenWhyHow

∅did

didn’tmightwill…

∅someonesomething

∅stempast

past participlepresent

∅someonesomething

∅ontoby

from…

∅someonesomething

Page 20: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Wh Aux Subj Verb Obj Prep Obj2WhoWhatWhereWhenWhyHow

∅did

didn’tmightwill…

∅someonesomething

∅stempast

past participlepresent

∅someonesomething

∅ontoby

from…

∅someonesomething

QA-SRL

Questions:

John surreptitiously ate the burrito at 2am.

Who ate something?

Page 21: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Wh Aux Subj Verb Obj Prep Obj2WhoWhatWhereWhenWhyHow

∅did

didn’tmightwill…

∅someonesomething

∅stempast

past participlepresent

∅someonesomething

∅ontoby

from…

∅someonesomething

QA-SRL

Questions:

John surreptitiously ate the burrito at 2am.

Who ate something?

What did someone eat?

Page 22: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

QA-SRL

Questions:

John surreptitiously ate the burrito at 2am.

Who ate something?

What did someone eat?

Page 23: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

John surreptitiously ate the burrito at 2am.

QA-SRL

Questions:

Who ate something?

What did someone eat?

Answers:

John

Page 24: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

John surreptitiously ate the burrito at 2am.

QA-SRL

Questions:

Who ate something?

What did someone eat?

Answers:

John

the burrito

Page 25: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Annotation Pipelinex John surreptitiously ate the burrito at 2am

Predicate detection Identify verbs with POS + heuristics

Page 26: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Annotation Pipelinex John surreptitiously ate the burrito at 2am

Question annotation

Predicate detection Identify verbs with POS + heuristics

One worker writes as many QA-SRL questions as possible,and provides the answer

Page 27: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Annotation Pipelinex John surreptitiously ate the burrito at 2am

Validation

Question annotation

Predicate detection Identify verbs with POS + heuristics

One worker writes as many QA-SRL questions as possible,and provides the answer

2 workers are shows questions, provide answers or mark as invalid

Page 28: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Question Annotation• Efficiency

• Recall

Page 29: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Question Annotation• Efficiency

• Autocomplete

Page 30: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Question Annotation• Efficiency

• Autocomplete

• Recall

• Autosuggest

Page 31: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Question Annotation• Efficiency

• Autocomplete

• Recall

• Autosuggest

• Financial Incentives

Page 32: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Validation Interface

Page 33: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Dataset

• 1 annotator provides questions

• 2 annotators validate -> 3 spans / question

• Question invalid if any annotator marks invalid

• Additional 3 validators for small dense dev and test set

Page 34: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Dataset

[He et al 2015] This work

3000 sentences 75k sentences

Several weeks 9 days

~50c / verb 33c / verb

2.43 questions / verb 2.05 questions / verb

Page 35: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

In 1950 Alan M. Turing published "Computing machinery and intelligence" in Mind, in which he proposed that machines could be

tested for intelligence using questions and answers.

published

Who published something? Alan M. Turing

What was published? “Computing Machinery and Intelligence”

When was something published? In 1950

Where was something published? in Mind

proposed

Who proposed something? Alan M. Turing

What did someone propose?that machines could be

tested for intelligent using questions and answers

When did someone propose something? In 1950

What did someone propose something in? “Computing Machinery and Intelligence”

testedWhat can be tested? machines

What can something be tested for? intelligence

How can something be tested? using questions and answers

usingWhat was being used? questions and answers

Why was something being used? tested for intelligence

Page 36: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

In 1950 Alan M. Turing published "Computing machinery and intelligence" in Mind, in which he proposed that machines could be

tested for intelligence using questions and answers.

published

Who published something? Alan M. Turing

What was published? “Computing Machinery and Intelligence”

When was something published? In 1950

Where was something published? in Mind

proposed

Who proposed something? Alan M. Turing

What did someone propose?that machines could be

tested for intelligent using questions and answers

When did someone propose something? In 1950

What did someone propose something in? “Computing Machinery and Intelligence”

testedWhat can be tested? machines

What can something be tested for? intelligence

How can something be tested? using questions and answers

usingWhat was being used? questions and answers

Why was something being used? tested for intelligence

Page 37: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

In 1950 Alan M. Turing published "Computing machinery and intelligence" in Mind, in which he proposed that machines could be

tested for intelligence using questions and answers.

published

Who published something? Alan M. Turing

What was published? “Computing Machinery and Intelligence”

When was something published? In 1950

Where was something published? in Mind

proposed

Who proposed something? Alan M. Turing

What did someone propose?that machines could be

tested for intelligent using questions and answers

When did someone propose something? In 1950

What did someone propose something in? “Computing Machinery and Intelligence”

testedWhat can be tested? machines

What can something be tested for? intelligence

How can something be tested? using questions and answers

usingWhat was being used? questions and answers

Why was something being used? tested for intelligence

Page 38: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Large-scale QA-SRL Parsing

1. Scale up QA-SRL data annotation

2. Train a QA-SRL Parser

3. Improve Recall

Page 39: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

QA-SRL Parser

[He et al 2018]

[He et al 2017]

Page 40: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

QA-SRL Parser

[He et al 2018]

[He et al 2017]

Page 41: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

QA-SRL Parser

• Unlabeled Argument Detection

• Question generation

Page 42: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

QA-SRL Parsingx John surreptitiously ate the burrito at 2pm

Page 43: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

QA-SRL Parsingx John surreptitiously ate the burrito at 2pm

Predicate detection 0 0 1 0 0 0 0

Page 44: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

QA-SRL Parsingx John surreptitiously ate the burrito at 2pm

Argument detection “John” “surreptitiously” “the burrito” “at 2pm”

Predicate detection 0 0 1 0 0 0 0

Page 45: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

QA-SRL Parsingx John surreptitiously ate the burrito at 2pm

Argument detection “John” “surreptitiously” “the burrito” “at 2pm”

Question generation “Who ate something?” “How did someone

eat something?”“What did someone

eat?”“When did someone

eat something?”

Predicate detection 0 0 1 0 0 0 0

Page 46: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

QA-SRL Parsingx John surreptitiously ate the burrito at 2pm

Argument detection “John” “surreptitiously” “the burrito” “at 2pm”

Question generation “Who ate something?” “How did someone

eat something?”“What did someone

eat?”“When did someone

eat something?”

Predicate detection 0 0 1 0 0 0 0Automatic Heuristics

(same as data)

Page 47: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

QA-SRL Parsingx John surreptitiously ate the burrito at 2pm

Argument detection “John” “surreptitiously” “the burrito” “at 2pm”

Question generation “Who ate something?” “How did someone

eat something?”“What did someone

eat?”“When did someone

eat something?”

Predicate detection 0 0 1 0 0 0 0Automatic Heuristics

(same as data)

1. BIO Model 2. Span-based Model

Page 48: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

QA-SRL Parsingx John surreptitiously ate the burrito at 2pm

Argument detection “John” “surreptitiously” “the burrito” “at 2pm”

Question generation “Who ate something?” “How did someone

eat something?”“What did someone

eat?”“When did someone

eat something?”

Predicate detection 0 0 1 0 0 0 0Automatic Heuristics

(same as data)

1. BIO Model 2. Span-based Model

1. Local 2. Sequential

Page 49: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Argument Detection - BIO Model

John surreptitiously ate the burrito at 2pm

0 0 1 0 0 0 0

• Alternating Bi-LSTM with Highway Connections and Recurrent Dropout [He et al 2017]

• Input includes predicate indicator

Page 50: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Argument Detection - BIO Model

John surreptitiously ate the burrito at 2pm

B B O B I B I

MLP+Softmax

0 0 1 0 0 0 0

Page 51: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Argument Detection - BIO Model

John surreptitiously ate the burrito at 2pm

B B O B I B I

MLP+Softmax

0 0 1 0 0 0 0

“John” “surreptitiously” “the burrito” “at 2pm”

Page 52: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Argument Detection - Span Model

John surreptitiously ate the burrito at 2pm

0 0 1 0 0 0 0

Form a representation of every possible span

Page 53: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Argument Detection - Span Model

John surreptitiously ate the burrito at 2pm

“John surreptitiously”

0 0 1 0 0 0 0

Form a representation of every possible span

Page 54: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Argument Detection - Span Model

John surreptitiously ate the burrito at 2pm

“John” “John surreptitiously”

0 0 1 0 0 0 0

Form a representation of every possible span

Page 55: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Argument Detection - Span Model

John surreptitiously ate the burrito at 2pm

“John” “John surreptitiously” “the burrito”“surreptitiouslyate the”

“the burrito at” “at 2pm”

0 0 1 0 0 0 0

Form a representation of every possible span

Page 56: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Argument Detection - Span Model

John surreptitiously ate the burrito at 2pm

“John” “John surreptitiously” “the burrito”“surreptitiouslyate the”

“the burrito at” “at 2pm”

0.9 0.1 0.2 0.8 0.1 0.75MLP+

sigmoid

0 0 1 0 0 0 0

Page 57: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Argument Detection - Span Model

John surreptitiously ate the burrito at 2pm

“John” “John surreptitiously” “the burrito”“surreptitiouslyate the”

“the burrito at” “at 2pm”

0.9 0.1 0.2 0.8 0.1 0.75MLP+

sigmoid

0 0 1 0 0 0 0

Tunable Threshold

Page 58: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Argument Detection

• 4 layer Alternating Bi-LSTM with Highway Connections and Recurrent Dropout [He et al 2017]

• Trained to maximize log-likelihood

Page 59: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Argument DetectionSp

an F

-sco

re

70

75

80

85

90

Exact Match IOU >= 0.5

88.1

82.2

85.8

81.383.1

72.2

BIOSpan (t=0.5)Span (t=t*)

Page 60: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Question GenerationWh Aux Subj Verb Obj Prep Obj2WhoWhatWhereWhenWhyHow

∅did

didn’tmightwill…

∅someonesomething

∅stempast

past participlepresent

∅someonesomething

∅ontoby

from…

∅someonesomething

Page 61: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Question Generation

• Two models:

• Local slot prediction

• Sequential Model (LSTM)

Wh Aux Subj Verb Obj Prep Obj2WhoWhatWhereWhenWhyHow

∅did

didn’tmightwill…

∅someonesomething

∅stempast

past participlepresent

∅someonesomething

∅ontoby

from…

∅someonesomething

Page 62: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Question Generation - Local

John surreptitiously ate the burrito at 2pm

“John” “surreptitiously” “the burrito” “at 2pm”

0 0 1 0 0 0 0

Page 63: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Question Generation - Local

Wh Aux Subj Verb Obj1 Prep Obj2

“the burrito”

WhoWhat

Where…

“What”

Page 64: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Question Generation - Local

Wh Aux Subj Verb Obj1 Prep Obj2

“the burrito”

isdid

might…

“What” “did”

Page 65: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Question Generation - Local

Wh Aux Subj Verb Obj1 Prep Obj2

“the burrito”

“What” “did” “someone” “past-tense” Ø Ø Ø

Page 66: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Question Generation - SequentialWh Aux Subj Verb Obj1 Prep Obj2

“the burrito”

“What”

Page 67: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Question Generation - SequentialWh Aux Subj Verb Obj1 Prep Obj2

“the burrito”

“What”

“What”

“did”

Page 68: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Question Generation - SequentialWh Aux Subj Verb Obj1 Prep Obj2

“the burrito”

“What”

“What”

“did”

“did” “someone” “past-tense” “Ø” “Ø”

“someone” “past-tense” “Ø” “Ø” “Ø”

Page 69: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Question Generation

• 4 layer Alternating Bi-LSTM with Highway Connections and Recurrent Dropout [He et al 2017]

• Trained to maximize log-likelihood

Page 70: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Evaluation Questions

• Paraphrasing

“Who ate something? “Who was something eaten by?

Page 71: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Evaluation Questions

“Who ate something? “Who was something eaten by?

• Paraphrasing

• Given gold span:

• Slot-level accuracy

• Exact Match (full question)

Page 72: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Question Generation

30

45

60

75

90

Slot Accuracy Exact Match

47.2

82.9

44.2

83.2LocalSequential

Page 73: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Full Parsing Accuracy

39

40

41

42

43

44

Exact match f-score (Span & Question)

42.4

40.6

Span + LocalSpan + Sequential (t=0.5)

Page 74: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Large-scale QA-SRL Parsing

1. Scale up QA-SRL data annotation

2. Train a QA-SRL Parser

3. Improve Recall

Page 75: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

In 1950 Alan M. Turing published "Computing machinery and intelligence" in Mind, in which he proposed that machines could be

tested for intelligence using questions and answers.

published

Who published something? Alan M. Turing

What was published? “Computing Machinery and Intelligence”

When was something published? In 1950

Where was something published? in Mind

proposed

Who proposed something? Alan M. Turing

What did someone propose?that machines could be

tested for intelligent using questions and answers

When did someone propose something? In 1950

What did someone propose something in? “Computing Machinery and Intelligence”

testedWhat can be tested? machines

What can something be tested for? intelligence

How can something be tested? using questions and answers

usingWhat was being used? questions and answers

Why was something being used? tested for intelligence

Page 76: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

In 1950 Alan M. Turing published "Computing machinery and intelligence" in Mind, in which he proposed that machines could be

tested for intelligence using questions and answers.

published

Who published something? Alan M. Turing

What was published? “Computing Machinery and Intelligence”

When was something published? In 1950

Where was something published? in Mind

proposed

Who proposed something? Alan M. Turing

What did someone propose?that machines could be

tested for intelligent using questions and answers

When did someone propose something? In 1950

What did someone propose something in? “Computing Machinery and Intelligence”

testedWhat can be tested? machines

What can something be tested for? intelligence

How can something be tested? using questions and answers

usingWhat was being used? questions and answers

Why was something being used? tested for intelligence

Impacts training and evaluation

Page 77: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

In 1950 Alan M. Turing published "Computing machinery and intelligence" in Mind, in which he proposed that machines could be

tested for intelligence using questions and answers.

published

Who published something? Alan M. Turing

What was published? “Computing Machinery and Intelligence”

When was something published? In 1950

Where was something published? in Mind

proposed

Who proposed something? Alan M. Turing

What did someone propose?that machines could be

tested for intelligent using questions and answers

When did someone propose something? In 1950

What did someone propose something in? “Computing Machinery and Intelligence”

testedWhat can be tested? machines

What can something be tested for? intelligence

How can something be tested? using questions and answers

usingWhat was being used? questions and answers

Why was something being used? tested for intelligence

Fill in Gaps

Page 78: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Annotation Pipelinex John surreptitiously ate the burrito at 2pm

Validation

Question generation

Predicate detection Identify verbs with POS + heuristics

One worker writes as many QA-SRL questions as possible,and provides the answer

Workers are shows questions, provide answers or mark as invalid

Page 79: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Annotation Pipelinex John surreptitiously ate the burrito at 2pm

Validation

Question generation

Predicate detection Identify verbs with POS + heuristics

One worker writes as many QA-SRL questions as possible,and provides the answer

Workers are shows questions, provide answers or mark as invalid

QA-SRL Parser

Page 80: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Data Expansion

• Overgenerate questions with low span threshold

• Validate

• +46,715 valid QA-pairs (~11% after filtering)

Page 81: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Data Expansion

30

45

60

75

90

Span Detection (F-score) Question Generation (Exact Match) Full Parsing (Exact Match)

49.150.8

84.6

47.250.5

83.7OriginalExpanded

Page 82: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Evaluation• Exact Match for Question Generation is overly harsh

• Paraphrasing

• Missing Questions

• Penalizes correct predictions missing from data

“Who ate something? “Who was something eaten by?

Page 83: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Human Evaluation

• Validate model predictions with 6 annotators

• Generated Question valid if 5 out of 6 annotators provided answers

• Predicted span correct if exactly matches any annotator’s answer

Page 84: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Human Eval - QuestionsSpanSeq + Expand

SpanSeq

SpanLocal

Decreasing t

Page 85: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Human Eval - Questions

82.64

SpanSeq + Expand

SpanSeq

SpanLocal

Decreasing t

Page 86: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Human Eval - Arguments

77.71

SpanSeq + Expand

SpanSeq

SpanLocal

Page 87: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

metWho met someone? Some of the vegetarians

Who met? heWhat did someone meet? members of the Theosophical Society

foundedWhat had been founded?

members of the Theosophical Societythe Theosophical Society

When was something founded? in 1875Why has something been founded? to further universal brotherhood

devotedWhat was devoted to something? members of the Theosophical SocietyWhat was something devoted to? the study of Buddhist and Hindu literature

Example OutputSome of the vegetarians he met were members of the Theosophical Society, which had been founded in 1875 to further universal brotherhood, and which

was devoted to the study of Buddhist and Hindu literature.

Page 88: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Conclusion

• Large crowdsourced dataset of QA-SRL annotations

• High quality QA-SRL Parser

• Techniques for data expansion

Page 89: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Code and Data available at: http://qasrl.org

Page 90: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Code and Data available at: http://demo.qasrl.org

Page 91: Large Scale QA-SRL Parsing › anthology › attachments › P18... · Identify verbs with POS + heuristics. Annotation Pipeline ... • Question invalid if any annotator marks invalid

Thank You!

http://qasrl.org/