Post on 15-Dec-2015
Question Generation (QG) from Text
Resources: Question Classification Schemes (Graesser et al.); Automatic Factual Question Generation from Text (Chapter 3), Michael Heilman
Fact-based Questions
Questions test a learner's factual knowledge, e.g., When did Alexander invade India? Who invented the smallpox vaccine?
They do not involve higher-order cognitive skills such as inference
Question Generation Framework
Overgenerate-and-rank framework
CMU Question Generator: http://www.ark.cs.cmu.edu/mheilman/questions/
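The overgenerate-and-rank idea can be sketched as a small pipeline: stage 1 turns each source sentence into derived sentences, stage 2 turns each derived sentence into many candidate questions, and stage 3 scores the candidates so the best come first. The Python below is a minimal sketch under that reading; the three stage functions are hypothetical placeholders passed in as arguments, not the CMU Question Generator's API.

```python
from typing import Callable, Iterable, List, Tuple


def overgenerate_and_rank(
    sentences: Iterable[str],
    simplify: Callable[[str], List[str]],   # stage 1 (hypothetical): source -> derived sentences
    transduce: Callable[[str], List[str]],  # stage 2 (hypothetical): derived -> candidate questions
    score: Callable[[str], float],          # stage 3 (hypothetical): question -> acceptability
) -> List[Tuple[float, str]]:
    """Over-generate candidate questions, then return (score, question) pairs best-first."""
    candidates: List[str] = []
    for source in sentences:
        for derived in simplify(source):
            candidates.extend(transduce(derived))
    # Sort on the score only; Python's sort is stable, so ties keep generation order.
    return sorted(((score(q), q) for q in candidates), key=lambda pair: pair[0], reverse=True)


# Toy usage: identity simplifier, hard-coded transducer, shorter questions score higher.
ranked = overgenerate_and_rank(
    ["John met Sally."],
    simplify=lambda s: [s],
    transduce=lambda d: ["Did John meet Sally?", "Who met Sally?"],
    score=lambda q: -len(q),
)
print(ranked[0][1])  # Who met Sally?
```

Keeping the stages as plain functions makes it easy to swap in a real simplifier or ranker without touching the control flow.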
Definitions
Source sentence: sentence taken directly from the input document
Derived sentence: declarative sentence derived in stage 1
Answer phrase: possible answer to generated questions
Question phrase: phrase containing the question word replacing an answer phrase
Basic Tools
Tregex: mark clauses or phrases for NLP transformation (simplification, compression); mark answer phrases
Tsurgeon: delete clauses or phrases for NLP transformation
Resources: Tregex and Tsurgeon: tools for querying and manipulating tree data structures, Levy and Andrew. Web: http://nlp.stanford.edu/software/tregex.shtml
What is Tregex?
A Java program for identifying patterns in trees, like regular expressions for strings. Simple example: NP < NN
[Figure: parse tree of “The firm stopped using crocodilite in its cigarette filters”]
tregex.sh “NP < NN” treeFilename
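The same search can be mimicked in a few lines of Python, which makes the meaning of NP < NN concrete: keep every NP node that has an NN among its immediate children. The bracketed parse below is an assumed Penn Treebank-style analysis of the example sentence, and `parse`, `nodes`, `leaves` and `match_parent` are hypothetical helpers, not part of Tregex.

```python
class Tree:
    """A constituency-tree node; words are leaves (nodes with no children)."""
    def __init__(self, label, children=()):
        self.label, self.children, self.parent = label, list(children), None
        for c in self.children:
            c.parent = self


def parse(s):
    """Parse a bracketed string such as "(NP (NN dog))" into a Tree."""
    toks, pos = s.replace("(", " ( ").replace(")", " ) ").split(), 0

    def read():
        nonlocal pos
        tok, pos = toks[pos], pos + 1
        if tok != "(":
            return Tree(tok)                      # a word (leaf)
        label, pos, kids = toks[pos], pos + 1, []
        while toks[pos] != ")":
            kids.append(read())
        pos += 1                                  # consume ")"
        return Tree(label, kids)

    return read()


def nodes(t):
    """Yield every node in pre-order (depth-first, left to right)."""
    yield t
    for c in t.children:
        yield from nodes(c)


def leaves(t):
    return [t.label] if not t.children else [w for c in t.children for w in leaves(c)]


def match_parent(root, a, b):
    """Phrasal nodes labelled a that have a child labelled b (Tregex: a < b)."""
    return [n for n in nodes(root)
            if n.children and n.label == a and any(c.label == b for c in n.children)]


# Assumed parse of the example sentence (illustrative, not actual parser output).
TREE = ("(ROOT (S (NP (DT The) (NN firm)) (VP (VBD stopped) (S (VP (VBG using) "
        "(NP (NN crocodilite)) (PP (IN in) (NP (PRP$ its) (NN cigarette) (NNS filters))))))))")

for np in match_parent(parse(TREE), "NP", "NN"):
    print(" ".join(leaves(np)))
# The firm
# crocodilite
# its cigarette filters
```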
Syntax (Node Descriptions)
The basic units of Tregex are node descriptions
Descriptions match node labels of a tree:
▪ Literal string to match: NP
▪ Disjunction of literal strings separated by ‘|’: NP|PP|VP
▪ Regular expression (Java 5 regex): /NN.?/ matches NN, NNP, NNS
▪ Wildcard symbol: __ (two underscores) matches any node
Descriptions can be negated with !: !NP
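Each node description is essentially a predicate on a node label. The Python sketch below (the function name `matches` is hypothetical) covers the five forms listed above. It assumes that /regex/ descriptions are matched as an unanchored search, which is why patterns such as /^NN/ add an explicit ^; check the Tregex documentation before relying on that detail.

```python
import re


def matches(desc: str, label: str) -> bool:
    """Does a Tregex-style node description match a node label? (sketch)"""
    if desc.startswith("!"):                          # negation: !NP
        return not matches(desc[1:], label)
    if desc == "__":                                  # wildcard: any node
        return True
    if len(desc) > 1 and desc.startswith("/") and desc.endswith("/"):
        # regular expression, assumed to be an unanchored search: /NN.?/
        return re.search(desc[1:-1], label) is not None
    return label in desc.split("|")                   # literal (NP) or disjunction (NP|PP|VP)


for label in ["NN", "NNP", "NNS", "VBZ"]:
    print(label, matches("/NN.?/", label))
# NN True
# NNP True
# NNS True
# VBZ False
```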
Syntax (Relations)
Relationships between tree nodes can be specified
There are many different relations. Here are a few:
Symbol     Description
A < B      A is the parent of B
A << B     A is an ancestor of B
A $ B      A and B are sisters
A $+ B     B is the next sister of A
A <i B     B is the ith child of A
A <: B     B is the only child of A
A <<# B    B is a head of phrase A
A <<- B    B is the rightmost descendant of A
A .. B     A precedes B in depth-first traversal of the tree
http://nlp.stanford.edu/manning/courses/ling289/Tregex.html
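Several of these relations can be written as small predicates over a tree whose nodes keep a parent pointer. The sketch below is a hypothetical Python illustration, not Tregex's implementation; the index in <i is taken as 1-based, an assumption of this sketch, and head (<<#) and precedence (..) relations are left out.

```python
class Tree:
    def __init__(self, label, children=()):
        self.label, self.children, self.parent = label, list(children), None
        for c in self.children:
            c.parent = self


def parse(s):
    """Parse a bracketed string such as "(NP (NN dog))" into a Tree."""
    toks, pos = s.replace("(", " ( ").replace(")", " ) ").split(), 0

    def read():
        nonlocal pos
        tok, pos = toks[pos], pos + 1
        if tok != "(":
            return Tree(tok)                      # a word (leaf)
        label, pos, kids = toks[pos], pos + 1, []
        while toks[pos] != ")":
            kids.append(read())
        pos += 1                                  # consume ")"
        return Tree(label, kids)

    return read()


def nodes(t):
    yield t
    for c in t.children:
        yield from nodes(c)


def is_parent(a, b):          # A < B
    return b.parent is a


def is_ancestor(a, b):        # A << B
    p = b.parent
    while p is not None:
        if p is a:
            return True
        p = p.parent
    return False


def are_sisters(a, b):        # A $ B
    return a is not b and a.parent is not None and a.parent is b.parent


def next_sister(a, b):        # A $+ B: B is the next sister of A
    if a.parent is None:
        return False
    sibs = a.parent.children
    i = sibs.index(a)
    return i + 1 < len(sibs) and sibs[i + 1] is b


def ith_child(a, b, i):       # A <i B: B is the i-th child of A (1-based here)
    return 0 < i <= len(a.children) and a.children[i - 1] is b


def only_child(a, b):         # A <: B
    return len(a.children) == 1 and a.children[0] is b


def first(root, label):
    return next(n for n in nodes(root) if n.label == label)


root = parse("(S (NP (DT the) (NN dog)) (VP (VBZ barks)))")
np, nn, vp, vbz = (first(root, x) for x in ("NP", "NN", "VP", "VBZ"))
print(is_parent(root, np), is_parent(root, nn))    # True False
print(is_ancestor(root, nn), next_sister(np, vp))  # True True
```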
Building complex expressions
Relations can be strung together for “and”; all relations are relative to the first node in the string
▪ NP < NN $ VP: “An NP over an NN and with sister VP”
▪ The & symbol is optional: NP < NN & $ VP
Nodes can be grouped with parentheses
▪ NP < (NN < dog): “An NP over an NN that is over ‘dog’”
▪ Not the same as NP < NN < dog, which means an NP over both an NN and ‘dog’
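The difference between NP < (NN < dog) and NP < NN < dog is easiest to see on a concrete tree such as (X (NP (NN dog)) (VP (VBZ barks))). In this hypothetical Python sketch the parenthesized version checks a grandchild, while the ungrouped version attaches both relations to the NP itself and therefore finds no match.

```python
class Tree:
    def __init__(self, label, children=()):
        self.label, self.children, self.parent = label, list(children), None
        for c in self.children:
            c.parent = self


def parse(s):
    """Parse a bracketed string such as "(NP (NN dog))" into a Tree."""
    toks, pos = s.replace("(", " ( ").replace(")", " ) ").split(), 0

    def read():
        nonlocal pos
        tok, pos = toks[pos], pos + 1
        if tok != "(":
            return Tree(tok)                      # a word (leaf)
        label, pos, kids = toks[pos], pos + 1, []
        while toks[pos] != ")":
            kids.append(read())
        pos += 1                                  # consume ")"
        return Tree(label, kids)

    return read()


def nodes(t):
    yield t
    for c in t.children:
        yield from nodes(c)


def np_over_nn_over_dog(n):
    """NP < (NN < dog): the NP has an NN child, and that NN has a 'dog' child."""
    return n.label == "NP" and any(
        c.label == "NN" and any(g.label == "dog" for g in c.children) for c in n.children)


def np_over_nn_and_dog(n):
    """NP < NN < dog: both relations hang off the NP, so it needs an NN child and a 'dog' child."""
    return (n.label == "NP" and any(c.label == "NN" for c in n.children)
            and any(c.label == "dog" for c in n.children))


root = parse("(X (NP (NN dog)) (VP (VBZ barks)))")
print(sum(map(np_over_nn_over_dog, nodes(root))))   # 1
print(sum(map(np_over_nn_and_dog, nodes(root))))    # 0
```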
Building complex expressions
Ex: NP < (NN < dog) $ (VP <<# (barks > VBZ)): “An NP both over an NN over ‘dog’ and with a sister VP headed by ‘barks’ under VBZ”
[Figure: matching tree (X (NP (NN dog)) (VP (VBZ barks)))]
Other Operators on Relations
Operators can be combined via “or” with |
▪ Ex: NP < NN | < NNS: “An NP over NN or over NNS”
By default, & takes precedence over |
▪ Ex: NP < NNS | < NN & $ VP: “NP over NNS OR both over NN and w/ sister VP”
▪ Equivalent operators are left-associative
Any relation can be negated with the “!” prefix
▪ Ex: NP !<< NNP: “An NP that does not dominate NNP”
Grouping relations
To specify operation order, use [ and ]
▪ Ex: NP [ < NNS | < NN ] $ VP: “An NP either over NNS or NN, and w/ sister VP”
Grouped relations can be negated: just put ! before the [
Already we can build very complex expressions!
▪ NP <- /NN.?/ > (PP <<# (IN ![ < of | < on])): “An NP with rightmost child matching /NN.?/ under a PP headed by some preposition (IN) that is not either ‘of’ or ‘on’”
A Complex Expression
NP <- /NN.?/ > (PP <<# (IN ![ < of | < on]))
[Figure: a matching PP headed by IN ‘about’, over an NP whose rightmost child is NNS]
Named Nodes
Sometimes we want to find which nodes matched particular sub-expressions
▪ Ex: /NN.?/ $- JJ|DT: what was the modifier that preceded the noun?
Name nodes with =; if the expression matches, we can retrieve the matching sub-expression by name
▪ Ex: /NN.?/ $- JJ|DT=premod: the subtree with root matching JJ|DT is stored in a map under key “premod”
Note: named nodes are not allowed in the scope of negation
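Named nodes amount to returning a map of bindings alongside each match. The sketch below hand-codes /NN.?/ $- JJ|DT=premod in Python and returns each noun together with a {"premod": node} dictionary; the helper names and the example tree are hypothetical, and this is not Tregex's Java API.

```python
import re


class Tree:
    def __init__(self, label, children=()):
        self.label, self.children, self.parent = label, list(children), None
        for c in self.children:
            c.parent = self


def parse(s):
    """Parse a bracketed string such as "(NP (NN dog))" into a Tree."""
    toks, pos = s.replace("(", " ( ").replace(")", " ) ").split(), 0

    def read():
        nonlocal pos
        tok, pos = toks[pos], pos + 1
        if tok != "(":
            return Tree(tok)                      # a word (leaf)
        label, pos, kids = toks[pos], pos + 1, []
        while toks[pos] != ")":
            kids.append(read())
        pos += 1                                  # consume ")"
        return Tree(label, kids)

    return read()


def nodes(t):
    yield t
    for c in t.children:
        yield from nodes(c)


def leaves(t):
    return [t.label] if not t.children else [w for c in t.children for w in leaves(c)]


def premod_matches(root):
    """Hand-coded /NN.?/ $- JJ|DT=premod: list of (noun, {"premod": modifier}) pairs."""
    results = []
    for n in nodes(root):
        if not n.children or n.parent is None or not re.search(r"NN.?", n.label):
            continue
        sibs = n.parent.children
        i = sibs.index(n)
        if i > 0 and sibs[i - 1].label in ("JJ", "DT"):   # $-: noun right after a JJ|DT sister
            results.append((n, {"premod": sibs[i - 1]}))
    return results


root = parse("(S (NP (DT the) (JJ old) (NN dog)) (VP (VBZ chases) (NP (NNS cats))))")
for noun, names in premod_matches(root):
    print(leaves(noun), "premod =", leaves(names["premod"]))   # ['dog'] premod = ['old']
```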
Optional Nodes
Sometimes we want to try to match a sub-expression to retrieve named nodes if they exist, but still match the root if the sub-expression fails
Use the optional relation prefix ‘?’
▪ Ex: NP < (NN ?$- JJ=premod) $+ CC $++ NP
▪ Matches an NP over NN with sisters CC and NP
▪ If NN is preceded by JJ, we can retrieve the JJ using the key “premod”
▪ If there is no JJ, the expression will still match
Cannot be combined with negation
Tsurgeon
What? Performs operations on a grammatical tree
How? Based on Tregex syntax
Where? Javanlp: trees.tregex.tsurgeon
How? Tregex
• utility for identifying patterns in trees (like regular expressions for strings)
• node descriptions and relationships between nodes
NP < /^NN/
[Figure: parse tree of “The firm stopped using crocodilite in its cigarette filters”, searched with NP < /^NN/]
Tsurgeon syntax
Define a pattern to be matched on the trees
VBZ=vbz $+ NP
Define one or several operation(s)
relabel vbz VBZ_TRANSITIVE
Delete
Tree: (ROOT (SBARQ (SQ (NP (NNS Cats)) (VP (VBP do) (VP (WHNP what) (VB eat)))) (PUNCT ?)))
Pattern: PUNCT=punct > SBARQ
Operation: delete punct
delete <name1>…<nameN>: delete the named node(s) and everything below them
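A minimal Python sketch of delete, assuming a simple tree class with parent pointers (hypothetical helpers, not the Tsurgeon API). Matches for PUNCT=punct > SBARQ are collected first and only then detached, so the tree is never modified while it is being searched.

```python
class Tree:
    def __init__(self, label, children=()):
        self.label, self.children, self.parent = label, list(children), None
        for c in self.children:
            c.parent = self


def parse(s):
    """Parse a bracketed string such as "(NP (NN dog))" into a Tree."""
    toks, pos = s.replace("(", " ( ").replace(")", " ) ").split(), 0

    def read():
        nonlocal pos
        tok, pos = toks[pos], pos + 1
        if tok != "(":
            return Tree(tok)                      # a word (leaf)
        label, pos, kids = toks[pos], pos + 1, []
        while toks[pos] != ")":
            kids.append(read())
        pos += 1                                  # consume ")"
        return Tree(label, kids)

    return read()


def nodes(t):
    yield t
    for c in t.children:
        yield from nodes(c)


def to_str(t):
    """Render a Tree back into bracketed notation."""
    if not t.children:
        return t.label
    return "(" + t.label + " " + " ".join(to_str(c) for c in t.children) + ")"


def delete(node):
    """Tsurgeon-style delete (sketch): detach the node and everything below it."""
    node.parent.children.remove(node)
    node.parent = None


root = parse("(ROOT (SBARQ (SQ (NP (NNS Cats)) (VP (VBP do) (VP (WHNP what) (VB eat)))) (PUNCT ?)))")
# PUNCT=punct > SBARQ: collect all matches before editing the tree.
targets = [n for n in nodes(root)
           if n.label == "PUNCT" and n.parent is not None and n.parent.label == "SBARQ"]
for punct in targets:
    delete(punct)
print(to_str(root))
# (ROOT (SBARQ (SQ (NP (NNS Cats)) (VP (VBP do) (VP (WHNP what) (VB eat))))))
```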
Excise
Tree: (ROOT (SBARQ (SQ (NP (NNS Cats)) (VP (VBP do) (VP (WHNP what) (VB eat))))))
Pattern: SBARQ=sbarq > ROOT
Operation: excise sbarq sbarq
Result: (ROOT (SQ (NP (NNS Cats)) (VP (VBP do) (VP (WHNP what) (VB eat)))))
excise <name1> <name2>: name1 is name2 or dominates name2. All children of name2 go into the parent of name1, where name1 was.
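Excise can be sketched in Python in the same way: splice the children of name2 into the parent of name1 at name1's position. The helpers below are hypothetical, not the Tsurgeon API; the run replays the excise sbarq sbarq example, where name1 and name2 are the same SBARQ node.

```python
class Tree:
    def __init__(self, label, children=()):
        self.label, self.children, self.parent = label, list(children), None
        for c in self.children:
            c.parent = self


def parse(s):
    """Parse a bracketed string such as "(NP (NN dog))" into a Tree."""
    toks, pos = s.replace("(", " ( ").replace(")", " ) ").split(), 0

    def read():
        nonlocal pos
        tok, pos = toks[pos], pos + 1
        if tok != "(":
            return Tree(tok)                      # a word (leaf)
        label, pos, kids = toks[pos], pos + 1, []
        while toks[pos] != ")":
            kids.append(read())
        pos += 1                                  # consume ")"
        return Tree(label, kids)

    return read()


def nodes(t):
    yield t
    for c in t.children:
        yield from nodes(c)


def to_str(t):
    if not t.children:
        return t.label
    return "(" + t.label + " " + " ".join(to_str(c) for c in t.children) + ")"


def excise(top, bottom):
    """Sketch of excise: `top` is or dominates `bottom`; bottom's children take top's place."""
    parent = top.parent
    i = parent.children.index(top)
    kids = list(bottom.children)
    for c in kids:
        c.parent = parent
    parent.children[i:i + 1] = kids
    top.parent = None


root = parse("(ROOT (SBARQ (SQ (NP (NNS Cats)) (VP (VBP do) (VP (WHNP what) (VB eat))))))")
sbarq = next(n for n in nodes(root) if n.label == "SBARQ" and n.parent is root)  # SBARQ=sbarq > ROOT
excise(sbarq, sbarq)
print(to_str(root))
# (ROOT (SQ (NP (NNS Cats)) (VP (VBP do) (VP (WHNP what) (VB eat)))))
```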
Insert
Tree: (ROOT (SQ (NP (NNS Cats)) (VP (VBP do) (VP (WHNP what) (VB eat)))))
Pattern: SQ=sq > ROOT !<- /PUNCT/
Operation: insert (PUNCT .) >-1 sq   (insert <tree> <position>)
Result: (ROOT (SQ (NP (NNS Cats)) (VP (VBP do) (VP (WHNP what) (VB eat))) (PUNCT .)))
Position for ‘insert’ and ‘move’
insert <name> <position>
insert <tree> <position>
<position> := <relation> <name>
<relation>:
$+    the left sister of the named node
$-    the right sister of the named node
>i    the i_th daughter of the named node
>-i   the i_th daughter, counting from the right, of the named node
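The position grammar maps naturally onto an index into a children list. The hypothetical Python `insert` below handles the four relations ($+, $-, >i, >-i), treating i as 1-based (an assumption of this sketch), and replays the insert (PUNCT .) >-1 sq example. The pattern's !<- /PUNCT/ guard is approximated by checking that the last child is not labelled PUNCT.

```python
class Tree:
    def __init__(self, label, children=()):
        self.label, self.children, self.parent = label, list(children), None
        for c in self.children:
            c.parent = self


def parse(s):
    """Parse a bracketed string such as "(NP (NN dog))" into a Tree."""
    toks, pos = s.replace("(", " ( ").replace(")", " ) ").split(), 0

    def read():
        nonlocal pos
        tok, pos = toks[pos], pos + 1
        if tok != "(":
            return Tree(tok)                      # a word (leaf)
        label, pos, kids = toks[pos], pos + 1, []
        while toks[pos] != ")":
            kids.append(read())
        pos += 1                                  # consume ")"
        return Tree(label, kids)

    return read()


def nodes(t):
    yield t
    for c in t.children:
        yield from nodes(c)


def to_str(t):
    if not t.children:
        return t.label
    return "(" + t.label + " " + " ".join(to_str(c) for c in t.children) + ")"


def insert(new, relation, target):
    """Place `new` relative to `target`; relation is '$+', '$-', '>i' or '>-i'."""
    if relation in ("$+", "$-"):                      # become a sister of target
        parent = target.parent
        i = parent.children.index(target) + (1 if relation == "$-" else 0)
    elif relation.startswith(">-"):                   # i-th daughter counting from the right
        parent = target
        i = len(target.children) - int(relation[2:]) + 1
    elif relation.startswith(">"):                    # i-th daughter counting from the left
        parent = target
        i = int(relation[1:]) - 1
    else:
        raise ValueError(f"unsupported relation: {relation!r}")
    parent.children.insert(i, new)
    new.parent = parent


root = parse("(ROOT (SQ (NP (NNS Cats)) (VP (VBP do) (VP (WHNP what) (VB eat)))))")
# SQ=sq > ROOT !<- /PUNCT/: an SQ under ROOT whose last child is not punctuation
sq = next(n for n in nodes(root)
          if n.label == "SQ" and n.parent is root and n.children[-1].label != "PUNCT")
insert(parse("(PUNCT .)"), ">-1", sq)
print(to_str(root))
# (ROOT (SQ (NP (NNS Cats)) (VP (VBP do) (VP (WHNP what) (VB eat))) (PUNCT .)))
```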
Move
Tree: (ROOT (SQ (NP (NNS Cats)) (VP (VBP do) (VP (WHNP what) (VB eat))) (PUNCT .)))
Pattern: VP < (/^WH/=wh $++ /^VB/=vb)
Operation: move vb $+ wh
move <name> <position>: moves the named node into the specified position
Result: (ROOT (SQ (NP (NNS Cats)) (VP (VBP do) (VP (VB eat) (WHNP what))) (PUNCT .)))
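Move can be viewed as a delete followed by an insert of the same subtree. The Python sketch below (hypothetical helpers; only the sister positions $+ and $- are handled) hand-codes the pattern VP < (/^WH/=wh $++ /^VB/=vb) and reproduces the move vb $+ wh result.

```python
import re


class Tree:
    def __init__(self, label, children=()):
        self.label, self.children, self.parent = label, list(children), None
        for c in self.children:
            c.parent = self


def parse(s):
    """Parse a bracketed string such as "(NP (NN dog))" into a Tree."""
    toks, pos = s.replace("(", " ( ").replace(")", " ) ").split(), 0

    def read():
        nonlocal pos
        tok, pos = toks[pos], pos + 1
        if tok != "(":
            return Tree(tok)                      # a word (leaf)
        label, pos, kids = toks[pos], pos + 1, []
        while toks[pos] != ")":
            kids.append(read())
        pos += 1                                  # consume ")"
        return Tree(label, kids)

    return read()


def nodes(t):
    yield t
    for c in t.children:
        yield from nodes(c)


def to_str(t):
    if not t.children:
        return t.label
    return "(" + t.label + " " + " ".join(to_str(c) for c in t.children) + ")"


def move(node, relation, target):
    """Sketch of move: detach `node`, re-insert it as a sister of `target` ('$+' left, '$-' right)."""
    node.parent.children.remove(node)
    parent = target.parent
    i = parent.children.index(target) + (1 if relation == "$-" else 0)
    parent.children.insert(i, node)
    node.parent = parent


root = parse("(ROOT (SQ (NP (NNS Cats)) (VP (VBP do) (VP (WHNP what) (VB eat))) (PUNCT .)))")
# VP < (/^WH/=wh $++ /^VB/=vb), then: move vb $+ wh
for vp in [n for n in nodes(root) if n.label == "VP"]:
    kids = vp.children
    wh = next((k for k in kids if re.search(r"^WH", k.label)), None)
    if wh is None:
        continue
    vb = next((k for k in kids[kids.index(wh) + 1:] if re.search(r"^VB", k.label)), None)
    if vb is not None:
        move(vb, "$+", wh)
print(to_str(root))
# (ROOT (SQ (NP (NNS Cats)) (VP (VBP do) (VP (VB eat) (WHNP what))) (PUNCT .)))
```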
Adjoin syntax
adjoin <auxiliary_tree> <name>
Adjoins the specified auxiliary tree into the named node. The daughters of the target node will become the daughters of the foot of the auxiliary tree.
Ex: adjoin (VP (ADVP (RB usually)) VP@) vp   (VP@ marks the foot)
Adjoin
Pattern: VP=vp > SQ !> (__ << usually)
Operation: adjoin (VP (ADVP (RB usually)) VP@) vp
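A Python sketch of adjoin under the description above: the auxiliary tree takes the target's place and the target's daughters move under the foot, marked by a label ending in @. The example question tree is hypothetical, and the !> (__ << usually) guard, which stops the rule from firing twice, is hand-coded as a check on the parent's words.

```python
class Tree:
    def __init__(self, label, children=()):
        self.label, self.children, self.parent = label, list(children), None
        for c in self.children:
            c.parent = self


def parse(s):
    """Parse a bracketed string such as "(NP (NN dog))" into a Tree."""
    toks, pos = s.replace("(", " ( ").replace(")", " ) ").split(), 0

    def read():
        nonlocal pos
        tok, pos = toks[pos], pos + 1
        if tok != "(":
            return Tree(tok)                      # a word (leaf)
        label, pos, kids = toks[pos], pos + 1, []
        while toks[pos] != ")":
            kids.append(read())
        pos += 1                                  # consume ")"
        return Tree(label, kids)

    return read()


def nodes(t):
    yield t
    for c in t.children:
        yield from nodes(c)


def leaves(t):
    return [t.label] if not t.children else [w for c in t.children for w in leaves(c)]


def to_str(t):
    if not t.children:
        return t.label
    return "(" + t.label + " " + " ".join(to_str(c) for c in t.children) + ")"


def adjoin(aux_str, target):
    """Sketch of adjoin: the auxiliary tree replaces `target`; target's daughters go under the foot."""
    aux = parse(aux_str)
    foot = next(n for n in nodes(aux) if n.label.endswith("@"))
    foot.label = foot.label[:-1]                  # drop the foot marker: VP@ -> VP
    foot.children = target.children
    for c in foot.children:
        c.parent = foot
    parent = target.parent
    parent.children[parent.children.index(target)] = aux
    aux.parent = parent
    target.parent, target.children = None, []


root = parse("(ROOT (SQ (VBZ Does) (NP (NNP John)) (VP (VB eat) (NP (NNS apples)))))")
# VP=vp > SQ !> (__ << usually)
vp = next(n for n in nodes(root) if n.label == "VP" and n.parent is not None
          and n.parent.label == "SQ" and "usually" not in leaves(n.parent))
adjoin("(VP (ADVP (RB usually)) VP@)", vp)
print(" ".join(leaves(root)))   # Does John usually eat apples
```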
Stage 1: NLP Transformation
Input: arbitrary text
Output: simple, concise and declarative sentences
Example: Extracting from Appositives
Input: Putin, the Russian Prime Minister, visited Moscow.
Desired Output: Putin was the Russian Prime Minister.
[Figure: parse tree of the input, with Putin marked as (noun), the Russian Prime Minister as (appositive), and visited as (mainverb)]
Tregex pattern for the appositive:
NP < (NP=noun !$-- NP $+ (/,/ $++ NP|PP=appositive !$CC|CONJP)) >> (ROOT << /^VB.*/=mainverb)
A new clause is built from the matched nodes: Putin (noun) + visited (VBD, mainverb) + the Russian Prime Minister (appositive)
The main verb is then replaced by was, the singular past tense form of be:
(ROOT (S (NP Putin) (VP (VBD was) (NP the Russian Prime Minister))))
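The appositive rule can be sketched end to end in Python: find an NP whose children start with NP , NP (noun, comma, appositive), then join the noun, a form of be and the appositive. This is a simplified, hypothetical version of the Tregex pattern above (it ignores PP appositives and the CC|CONJP check), the tree is an assumed parse of the input sentence, and choosing the form of be from the main verb's tense and the noun's number is an assumption of the sketch.

```python
class Tree:
    def __init__(self, label, children=()):
        self.label, self.children, self.parent = label, list(children), None
        for c in self.children:
            c.parent = self


def parse(s):
    """Parse a bracketed string such as "(NP (NN dog))" into a Tree."""
    toks, pos = s.replace("(", " ( ").replace(")", " ) ").split(), 0

    def read():
        nonlocal pos
        tok, pos = toks[pos], pos + 1
        if tok != "(":
            return Tree(tok)                      # a word (leaf)
        label, pos, kids = toks[pos], pos + 1, []
        while toks[pos] != ")":
            kids.append(read())
        pos += 1                                  # consume ")"
        return Tree(label, kids)

    return read()


def nodes(t):
    yield t
    for c in t.children:
        yield from nodes(c)


def leaves(t):
    return [t.label] if not t.children else [w for c in t.children for w in leaves(c)]


# Form of "be" keyed by (tense, plural?): an assumption of this sketch.
BE = {("past", False): "was", ("past", True): "were",
      ("present", False): "is", ("present", True): "are"}


def appositive_sentence(root):
    """Return 'noun + be + appositive.' for the first NP shaped (NP NP , NP ...), else None."""
    for np in nodes(root):
        kids = [c.label for c in np.children]
        if np.label == "NP" and kids[:3] == ["NP", ",", "NP"]:
            noun, appositive = np.children[0], np.children[2]
            mainverb = next(n for n in nodes(root) if n.children and n.label.startswith("VB"))
            tense = "past" if mainverb.label in ("VBD", "VBN") else "present"
            plural = noun.children[-1].label in ("NNS", "NNPS")
            return " ".join(leaves(noun) + [BE[(tense, plural)]] + leaves(appositive)) + "."
    return None


tree = parse("(ROOT (S (NP (NP (NNP Putin)) (, ,) (NP (DT the) (JJ Russian) (NNP Prime) "
             "(NNP Minister)) (, ,)) (VP (VBD visited) (NP (NNP Moscow))) (. .)))")
print(appositive_sentence(tree))   # Putin was the Russian Prime Minister.
```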
Implementation
Representation: phrase structure trees from the Stanford Parser
Syntactic rules are written in the Tregex tree-searching language; Tregex operators encode tree relations such as dominance and sisterhood
Manipulations over the nodes identified by a Tregex pattern are performed with Tsurgeon
Encoding Linguistic Knowledge
Given an input sentence A that is assumed true, we aim to extract sentences B that are also true.
Our operations are informed by two phenomena:
• semantic entailment
• presupposition
Semantic Entailment
A entails B: B is true whenever A is true (Levinson 1983).
A: However, Jefferson did not believe the Embargo Act, which restricted trade with Europe, would hurt the American economy.
Simplification by Removing Modifiers
Entailment holds when certain types of modifiers are removed, e.g., discourse markers and non-restrictive relative clauses.
A: However, Jefferson did not believe the Embargo Act, which restricted trade with Europe, would hurt the American economy.
(discourse marker: However; non-restrictive relative clause: which restricted trade with Europe)
B: Jefferson did not believe the Embargo Act would hurt the American economy.
Extracting from Conjunctions
In most clausal and verbal conjunctions, the individual conjuncts are entailed.
A: Mr. Putin built his reputation in part on his success at suppressing terrorism, so the attacks could be considered a challenge to his stature.
B1: Mr. Putin built his reputation in part on his success at suppressing terrorism.
B2: The attacks could be considered a challenge to his stature.
Extracting from Presuppositions
In some constructions, B is true regardless of whether the main clause of sentence A is true.
• i.e., B is presupposed to be true.
A: Hamilton did not like Jefferson, the third U.S. President.
B: Jefferson was the third U.S. President.
(The main clause of A is negated, yet B still holds.)
Presupposition Triggers
Many presuppositions have clear syntactic or lexical associations.
Trigger                             Example
non-restrictive appositives         Jefferson, the third U.S. President, …
non-restrictive relative clauses    Jefferson, who was the third U.S. President, …
participial modifiers               Jefferson, being the third U.S. President, …
temporal subordinate clauses        Before Jefferson was the third U.S. President, …
Extracted sentence (B): Jefferson was the third U.S. President.
Stage 1 Algorithms
extractSimplifiedSentences
▪ Input: constituency parse tree
▪ Output: set of trees representing simplified sentences
▪ Uses: extractHelper
extractHelper
▪ Input: one parse tree
▪ Output: trees split over conjunctions, keeping only outputs that have a subject and a finite main verb
Algorithm: extractSimplifiedSentences(t)
T_result ← ∅
T_extracted ← {t} ∪ new sentence trees extracted from t for the following: non-restrictive appositives; non-restrictive relative clauses; subordinate clauses with a subject and finite verb; participial phrases that modify noun phrases, verb phrases, or clauses
for each t' ∈ T_extracted do
    T_result ← T_result ∪ extractHelper(t')
end for
return T_result

Algorithm: extractHelper(t)
T_result ← ∅
move any leading prepositional phrases and quotations in t to be the last children of the main verb phrase
remove the following from t: noun modifiers offset by commas; leading modifiers of the main clause
if t is conjoined with a conjunction then
    T_conjuncts ← extract new sentence trees for each conjunct in t
    for all t_conj ∈ T_conjuncts do
        T_result ← T_result ∪ extractHelper(t_conj)
    end for
else if t has a subject and finite main verb then
    T_result ← T_result ∪ {t}
end if
return T_result
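The conjunction step of extractHelper can be sketched in Python: recursively split an S whose children include a CC and two or more S conjuncts, and keep only the pieces that have a subject NP and a finite verb. The tree is an assumed, shortened parse of the earlier Putin example (tagging so as CC), and the FINITE tag set is a simplification chosen for this sketch rather than the original rule.

```python
class Tree:
    def __init__(self, label, children=()):
        self.label, self.children, self.parent = label, list(children), None
        for c in self.children:
            c.parent = self


def parse(s):
    """Parse a bracketed string such as "(NP (NN dog))" into a Tree."""
    toks, pos = s.replace("(", " ( ").replace(")", " ) ").split(), 0

    def read():
        nonlocal pos
        tok, pos = toks[pos], pos + 1
        if tok != "(":
            return Tree(tok)                      # a word (leaf)
        label, pos, kids = toks[pos], pos + 1, []
        while toks[pos] != ")":
            kids.append(read())
        pos += 1                                  # consume ")"
        return Tree(label, kids)

    return read()


def leaves(t):
    return [t.label] if not t.children else [w for c in t.children for w in leaves(c)]


FINITE = {"VBD", "VBZ", "VBP", "MD"}   # finite verb tags (simplifying assumption)


def has_subject_and_finite_verb(s):
    labels = [c.label for c in s.children]
    if "NP" not in labels or "VP" not in labels:
        return False
    vp = s.children[labels.index("VP")]
    return any(c.label in FINITE for c in vp.children)


def split_conjunctions(s):
    """Sketch of extractHelper's conjunction step on an S node."""
    conjuncts = [c for c in s.children if c.label == "S"]
    if len(conjuncts) >= 2 and any(c.label == "CC" for c in s.children):
        return [t for c in conjuncts for t in split_conjunctions(c)]
    return [s] if has_subject_and_finite_verb(s) else []


TREE = ("(ROOT (S (S (NP (NNP Putin)) (VP (VBD built) (NP (PRP$ his) (NN reputation)))) "
        "(, ,) (CC so) (S (NP (DT the) (NNS attacks)) (VP (MD could) (VP (VB be) "
        "(VP (VBN considered) (NP (DT a) (NN challenge))))))))")

for piece in split_conjunctions(parse(TREE).children[0]):
    print(" ".join(leaves(piece)))
# Putin built his reputation
# the attacks could be considered a challenge
```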
Stage 2: Question Transducer
Input: declarative sentences derived in stage 1
Output: set of grammatically correct questions
▪ Well-defined syntactic transformations
▪ Identification of answer phrases for WH-movement
▪ Marking of unmovable chunks
▪ etc.
Stage 2: Question Transducer
Declarative Sentence → Mark Unmovable Phrases → Generate Possible Question Phrase* → (Decompose Main Verb) → (Invert Subject and Auxiliary) → Insert Question Phrase → Perform Post-processing → Question
Stage 2: Question Transducer
Mark phrases that cannot be answer phrases
Select an answer phrase, and generate a set of question phrases for it
Decompose the main verb
Invert the subject and auxiliary verb
Remove the answer phrase and insert one of the question phrases at the beginning of the main clause
Post-process to ensure proper formatting
Stage 2: Question Transducer Exceptions
Yes-no questions
▪ No answer phrase to remove and no question phrase to insert
Answer phrase is the subject of the declarative sentence
▪ John met Sally → Who met Sally?
▪ Decomposition of the main verb and subject-auxiliary inversion are not necessary
▪ The subject is removed and replaced by a question phrase in the same position
Stage 2: Question Transducer
Question generation involves WH-movement
▪ To generate WH questions
▪ The target answer phrase is transformed into a WH phrase and moved to the front (WH-fronting)
▪ Are all phrases movable?
Subject-auxiliary inversion
▪ To generate decision (yes-no) questions
▪ The positions of the subject and auxiliary verb are swapped
Marking Unmovable Phrases: An example
Darwin studied how species evolve.
▪ ‘Species’ is a potential answer phrase
▪ *What did Darwin study how evolve?
Mark phrases that should not undergo WH-movement using Tregex patterns
▪ Constraints over the phrases
▪ Phrases under a clause with a WH complementizer cannot undergo WH-movement
▪ SBAR < /^WH.*P/ << NP|ADJP|VP|ADVP|PP=unmv
Marking Unmovable Phrases
Clauses (i.e., “S” nodes) that are under verb phrases and are signalled as adjuncts by being offset by commas
Pattern: VP < (S=unmv $,, /,/)
Input sentence: James hurried, barely catching the bus.
Question to avoid: *What did James hurry?
A $,, B: A is a sister of B and follows B
Generating Question Phrases
Iterate over possible answer phrases and generate a question for each
▪ Skipped for decision (yes-no) questions
An answer phrase is one of the following:
▪ Noun phrase (“NP”): Abraham Lincoln
▪ Prepositional phrase (“PP”): in 1801
▪ Subordinate clause (“SBAR”): that Thomas Jefferson was the 3rd U.S. President
Generating Question Phrases
Mapping answer phrases to question phrases: supersense tagger
▪ Labels word tokens with high-level semantic classes, e.g., noun.person, noun.location
Richard/B-noun.person Nixon/I-noun.person visited/B-verb.social China/B-noun.location to/O improve/B-verb.change diplomacy/B-noun.communication ./O
Generating Question Phrases
WH-word | Conditions | Examples
Who | tag of head = noun.person, or a personal pronoun | Abraham Lincoln, him, the 16th president
What | tag of head is neither noun.time nor noun.person | The White House, the building
Where | object of a PP tagged noun.location, with preposition on, in, at, over or to | in Japan, to a small town
When | tag of head = noun.time | Wednesday, next year, 1929
Whose NP | head word tagged noun.person and answer phrase modified with a possessive | John’s car, the president’s visit to Asia, the companies’ profits
How many NP | answer phrase modified by a cardinal number or quantifier phrase | 10 books, two hundred years
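The table reads as a small decision procedure. The Python sketch below encodes a simplified version of it; the pronoun list, the split between NP and PP answer phrases, and returning only whose NP for possessives are assumptions of this sketch. NP in the returned strings is a placeholder to be filled from the answer phrase (e.g., how many books).

```python
PERSONAL_PRONOUNS = {"i", "me", "you", "he", "him", "she", "her", "we", "us", "they", "them"}
LOCATIVE_PREPOSITIONS = {"on", "in", "at", "over", "to"}


def question_phrases(head_tag, head_word="", preposition=None, possessive=False, has_number=False):
    """Candidate WH phrases for one answer phrase (simplified reading of the table).

    head_tag: supersense of the answer phrase's head (for a PP, the head of its object)
    preposition: the PP's preposition, or None when the answer phrase is an NP
    """
    phrases = []
    if preposition is not None:                          # PP answer phrase
        if head_tag == "noun.location" and preposition.lower() in LOCATIVE_PREPOSITIONS:
            phrases.append("where")
        if head_tag == "noun.time":
            phrases.append("when")
        return phrases
    if possessive and head_tag == "noun.person":         # John's car -> whose car
        return ["whose NP"]
    if head_tag == "noun.person" or head_word.lower() in PERSONAL_PRONOUNS:
        phrases.append("who")
    elif head_tag == "noun.time":
        phrases.append("when")
    else:
        phrases.append("what")
    if has_number:                                       # 10 books -> how many books
        phrases.append("how many NP")
    return phrases


print(question_phrases("noun.person", "Lincoln"))              # ['who']
print(question_phrases("noun.location", "Japan", "in"))        # ['where']
print(question_phrases("noun.artifact", "books", has_number=True))  # ['what', 'how many NP']
```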
Decomposition of Main Verb
Situation: subject-auxiliary inversion
Condition: no auxiliary verb or modal is present
Action: main verb → auxiliary do + base form of the main verb
Example: John saw Mary → John did see Mary → Who did John see?
Decomposition of Main Verb
Identifying main verbs that need to be decomposed:
ROOT < (S=clause < (VP=mainvp [ < (/VB.?/=tensed !< is|was|were|am|are|has|have|had|do|does|did) | < /VB.?/=tensed !< VP ]))
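A token-level Python sketch of the decomposition step, assuming POS-tagged input rather than a parse tree. It treats every form of be, have and do as an auxiliary, which is coarser than the second disjunct of the Tregex pattern above, and it relies on a tiny hypothetical lemma table instead of a real morphological analyser.

```python
DO_FORM = {"VBD": "did", "VBZ": "does", "VBP": "do"}
# Forms the pattern above refuses to decompose (be, have and do).
NO_DECOMPOSE = {"is", "was", "were", "am", "are", "has", "have", "had", "do", "does", "did"}
# Tiny hypothetical lemma table; a real system would use a morphological analyser.
BASE_FORM = {"saw": "see", "met": "meet", "laid": "lay", "resigned": "resign", "eats": "eat"}


def decompose_main_verb(tagged):
    """tagged: list of (word, POS) pairs for one declarative clause."""
    for i, (word, tag) in enumerate(tagged):
        if tag == "MD" or word.lower() in NO_DECOMPOSE:
            return list(tagged)                   # auxiliary or modal present: nothing to do
        if tag in DO_FORM:                        # first tensed main verb
            base = BASE_FORM.get(word.lower(), word)
            return tagged[:i] + [(DO_FORM[tag], tag), (base, "VB")] + tagged[i + 1:]
    return list(tagged)


out = decompose_main_verb([("John", "NNP"), ("saw", "VBD"), ("Mary", "NNP")])
print(" ".join(w for w, _ in out))   # John did see Mary
```

The decomposed form is what subject-auxiliary inversion then operates on (did John see Mary → Who did John see?).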
Subject-Auxiliary Inversion
ROOT=root < (S=clause <+(/VP.*/) (VP < /(MD|VB.?)/=aux < (VP < /VB.?/=verb)))
[Figure: example trees with the clause, aux and verb nodes marked]
A <+(C) B: A dominates B via an unbroken chain of nodes matching C
Subject-Auxiliary Inversion
ROOT=root < (S=clause <+(/VP.*/) (VP < (/VB.?/=copula < is|are|was|were|am) !< VP))
Copula: word used to link the subject of a sentence with a predicate (a subject complement)
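Subject-auxiliary inversion can be sketched as moving the first child of the clause's VP in front of the subject, provided it is an auxiliary (it takes a VP complement) or a copula (no VP complement). The hypothetical Python below only inspects the top VP instead of following the full <+(/VP.*/) chain, and it relabels the clause SQ; capitalization and the question mark are left to post-processing.

```python
class Tree:
    def __init__(self, label, children=()):
        self.label, self.children, self.parent = label, list(children), None
        for c in self.children:
            c.parent = self


def parse(s):
    """Parse a bracketed string such as "(NP (NN dog))" into a Tree."""
    toks, pos = s.replace("(", " ( ").replace(")", " ) ").split(), 0

    def read():
        nonlocal pos
        tok, pos = toks[pos], pos + 1
        if tok != "(":
            return Tree(tok)                      # a word (leaf)
        label, pos, kids = toks[pos], pos + 1, []
        while toks[pos] != ")":
            kids.append(read())
        pos += 1                                  # consume ")"
        return Tree(label, kids)

    return read()


def leaves(t):
    return [t.label] if not t.children else [w for c in t.children for w in leaves(c)]


COPULAS = {"is", "are", "was", "were", "am"}


def invert_subject_aux(root):
    """Move the auxiliary or copula of ROOT < S=clause in front of the subject; False if none."""
    clause = root.children[0]
    vp = next(c for c in clause.children if c.label == "VP")
    aux = vp.children[0]
    takes_vp = any(c.label == "VP" for c in vp.children)          # aux < ... (VP < verb)
    is_verb = aux.label == "MD" or aux.label.startswith("VB")
    copula = aux.label.startswith("VB") and leaves(aux)[0].lower() in COPULAS and not takes_vp
    if not (is_verb and (takes_vp or copula)):
        return False                      # e.g. "John saw Mary": decompose the main verb first
    vp.children.remove(aux)
    clause.children.insert(0, aux)
    aux.parent = clause
    clause.label = "SQ"
    return True


for s in ["(ROOT (S (NP (NNP John)) (VP (VBZ has) (VP (VBN met) (NP (NNP Sally))))))",
          "(ROOT (S (NP (NNP Jefferson)) (VP (VBD was) (NP (DT the) (JJ third) (NNP President)))))",
          "(ROOT (S (NP (NNP John)) (VP (VBD saw) (NP (NNP Mary)))))"]:
    t = parse(s)
    print(invert_subject_aux(t), " ".join(leaves(t)))
# True has John met Sally
# True was Jefferson the third President
# False John saw Mary
```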
WH-Fronting
Pattern: S < (NP=np $+ VP); operation: delete np
Pattern: S=start < VP=vp; operations: relabel start SBARQ, relabel vp SQ
Pattern: SBARQ < SQ=ins; operation: insert (WHNP (WP Who)) $+ ins
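The three rules can be replayed in Python on John met Sally, where the answer phrase is the subject, so no decomposition or inversion is needed. The helpers are hypothetical stand-ins for Tsurgeon's delete, relabel and insert, not its API.

```python
class Tree:
    def __init__(self, label, children=()):
        self.label, self.children, self.parent = label, list(children), None
        for c in self.children:
            c.parent = self


def parse(s):
    """Parse a bracketed string such as "(NP (NN dog))" into a Tree."""
    toks, pos = s.replace("(", " ( ").replace(")", " ) ").split(), 0

    def read():
        nonlocal pos
        tok, pos = toks[pos], pos + 1
        if tok != "(":
            return Tree(tok)                      # a word (leaf)
        label, pos, kids = toks[pos], pos + 1, []
        while toks[pos] != ")":
            kids.append(read())
        pos += 1                                  # consume ")"
        return Tree(label, kids)

    return read()


def nodes(t):
    yield t
    for c in t.children:
        yield from nodes(c)


def to_str(t):
    if not t.children:
        return t.label
    return "(" + t.label + " " + " ".join(to_str(c) for c in t.children) + ")"


def wh_front_subject(root, wh_tree="(WHNP (WP Who))"):
    s = next(n for n in nodes(root) if n.label == "S")
    labels = [c.label for c in s.children]
    i = labels.index("NP")                            # raises ValueError if there is no NP
    if i + 1 >= len(labels) or labels[i + 1] != "VP":
        raise ValueError("pattern S < (NP=np $+ VP) does not match")
    s.children.pop(i).parent = None                   # delete np
    vp = s.children[i]                                # the VP now sits where the NP was
    s.label, vp.label = "SBARQ", "SQ"                 # relabel start SBARQ; relabel vp SQ
    wh = parse(wh_tree)
    s.children.insert(s.children.index(vp), wh)       # insert (WHNP (WP Who)) $+ ins
    wh.parent = s
    return root


root = wh_front_subject(parse("(ROOT (S (NP (NNP John)) (VP (VBD met) (NP (NNP Sally)))))"))
print(to_str(root))
# (ROOT (SBARQ (WHNP (WP Who)) (SQ (VBD met) (NP (NNP Sally)))))
```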
Other Transformations
The Whole Picture
A Simple Run
Input: Sir Isaac Newton's book "Mathematical Principles of Natural Philosophy", first published in 1687, laid the foundations for classical mechanics.
Simplification Phase
TREE-I
Simplification Phase
TREE-II
Mark Unmovables
Marking Answer Phrases
Subject-Auxiliary Inversion
WH-Fronting for PP-1
Tregex: ROOT=root < (SQ=qclause << /^(NP|PP|SBAR)-0/=answer < VP=predicate)
Phrase to move: (PP (IN in) (NP (CD 1687)))
WH-Movement for PP-1
Insert WH subtree: (WHNP (WHADVP (WRB when)))
Generated Questions
1. Whose book ``Mathematical Principles of Natural Philosophy'' was first published in 1687?
2. What laid the foundations for classical mechanics?
3. What did Sir Isaac Newton's book ``Mathematical Principles of Natural Philosophy'' lay?
4. When was Sir Isaac Newton's book ``Mathematical Principles of Natural Philosophy'' first published?
5. Did Sir Isaac Newton's book ``Mathematical Principles of Natural Philosophy'' lay the foundations for classical mechanics?
6. Whose book ``Mathematical Principles of Natural Philosophy'' laid the foundations for classical mechanics?
7. Was Sir Isaac Newton's book ``Mathematical Principles of Natural Philosophy'' first published in 1687?
8. What was first published in 1687?
Another run
Input: Arvind Kejriwal, the AAP leader, resigned from the post of CM.
Appositive tree
Simplification Phase
TREE-I, TREE-II
Mark Unmovables
Marking Answer Phrases
WH-Fronting for NP-0
Tregex: ROOT=root < (SQ=qclause << /^(NP|PP|SBAR)-0/=answer < VP=predicate)
Phrase to move: (NP (NNP Arvind) (NNP Kejriwal))
WH-Movement for NP-0
Insert WH subtree: (WHNP (WHNP (WRB who)))
Decomposition of Main Verb
Subject-Auxiliary Inversion
WH-Movement for PP-1
Generated Questions
1. Who resigned from the post of CM?
2. What did Arvind Kejriwal resign from?
3. Who was Arvind Kejriwal?
4. Who was the AAP leader?
5. Did Arvind Kejriwal resign from the post of CM?
6. Was Arvind Kejriwal the AAP leader?
Stage 3: Question Ranking
Acceptability of a question q is modeled as a weighted sum of its features, w · f(q):
▪ f(q) returns a vector of real-valued numbers pertaining to different aspects of the question
▪ w is a vector of weights, one for each feature of a question
Learning the weight vector
▪ Penalized linear regression (ridge regression)
Stage 3: Question Ranking
Question features
▪ Length features: length of the question, the source sentence, and the answer phrase
▪ WH words: Boolean feature for whether the question is a WH question
▪ N-gram log likelihood of the question
▪ Grammatical features
▪ Transformation features
▪ etc.
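A compact Python sketch of the ranking stage: a toy feature vector (bias, length in tokens, WH indicator), ridge regression solved in closed form as w = (XᵀX + λI)⁻¹Xᵀy, and candidates sorted by predicted score. The features, the ratings and the regularization value are all made up for illustration; they are not the original system's features or data, and for brevity the bias weight is penalized too.

```python
WH_WORDS = {"who", "what", "where", "when", "whose", "how", "which", "why"}


def features(question):
    """Toy feature vector: [bias, length in tokens, is-WH]. Hypothetical, not the original set."""
    toks = question.rstrip("?").split()
    return [1.0, float(len(toks)), 1.0 if toks[0].lower() in WH_WORDS else 0.0]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ridge(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam * I)^-1 X^T y."""
    d = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(d)]
    return solve(A, b)


def score(w, question):
    return sum(wi * fi for wi, fi in zip(w, features(question)))


# Hypothetical training data: questions with made-up acceptability ratings.
train = [("Who met Sally?", 4.5), ("What did John see?", 4.0),
         ("Did John meet Sally?", 3.0), ("Was Jefferson the third U.S. President?", 2.0)]
X = [features(q) for q, _ in train]
y = [r for _, r in train]
w = ridge(X, y, lam=0.01)

candidates = ["Did John meet Sally?", "Who met Sally?"]
ranked = sorted(candidates, key=lambda q: score(w, q), reverse=True)
print(ranked)   # ['Who met Sally?', 'Did John meet Sally?']
```

The λ penalty keeps the weights small when features are correlated (here, length and the WH indicator), which is the usual motivation for ridge over plain least squares.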
Term Project Evaluation
Term project evaluation includes a presentation (10 min) and a demonstration (20 min)
Date: 18.04.2015 (Saturday) from 9:30 am: Groups 1-4
Date: 18.04.2015 (Saturday) from 2:30 pm: Groups 5-9