What is the Jeopardy Model?
A Quasi-Synchronous Grammar for Question Answering

Mengqiu Wang, Noah A. Smith and Teruko Mitamura
Language Technologies Institute, Carnegie Mellon University
The task
- High-efficiency document retrieval
- High-precision answer ranking

Q: Who is the leader of France?

1. Bush later met with French president Jacques Chirac.
2. Henri Hadjenberg, who is the leader of France's Jewish community, …
3. …

1. Henri Hadjenberg, who is the leader of France's Jewish community, …
2. Bush later met with French president Jacques Chirac. (as of May 16, 2007)
3. …
Challenges
- High-efficiency document retrieval
- High-precision answer ranking

Q: Who is the leader of France?

1. Bush later met with French president Jacques Chirac.
2. Henri Hadjenberg, who is the leader of France's Jewish community, …
3. …
Semantic Transformations

Q: "Who is the leader of France?"
A: Bush later met with French president Jacques Chirac.
Syntactic Transformations

[Figure: dependency trees for "Who is the leader of France?" and "Bush met with French president Jacques Chirac", with the relevant modifier (mod) arcs marked.]
Syntactic Variations

[Figure: dependency trees for "Who is the leader of France?" and "Henri Hadjenberg, who is the leader of France's Jewish community", with the relevant modifier (mod) arcs marked.]
Two key phenomena in QA
- Semantic transformation: leader → president
- Syntactic transformation: leader of France → French president

We model P(A | Q).
Existing work in QA
- Semantics: use WordNet as a thesaurus for expansion.
- Syntax: use dependency parse trees, but merely transform the feature space into a dependency-parse feature space; there are no fundamental changes in the algorithms (edit distance, classifiers, similarity measures).
Where else have we seen these transformations?
- Machine translation (especially syntax-based MT)
- Paraphrasing
- Sentence compression
- Textual entailment

There, we model P(E | F).
Noisy-channel

Machine Translation: P(E | F) ∝ P(F | E) · P(E)
  translation model · language model

Question Answering: P(A | Q) ∝ P(Q | A) · P(A)
  Jeopardy model · retrieval model
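As a concrete sketch of this decomposition, candidates can be ranked by the product of the two models. Here p_q_given_a and p_a are hypothetical callables standing in for the Jeopardy model and the retrieval model:

```python
import math

def rank_answers(question, candidates, p_q_given_a, p_a):
    """Rank candidate answer sentences by P(A | Q), which is
    proportional to P(Q | A) * P(A)."""
    def log_score(answer):
        # Log space avoids underflow for very small probabilities.
        return math.log(p_q_given_a(question, answer)) + math.log(p_a(answer))
    return sorted(candidates, key=log_score, reverse=True)
```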
What is Jeopardy!?

From wikipedia.org: Jeopardy! is a popular international television quiz game show (#2 on the list of the 50 Greatest Game Shows of All Time). Three contestants select clues in the form of an answer, to which they must supply correct responses in the form of a question. The concept of "questioning answers" is original to Jeopardy!.

Hence the name for P(Q | A): the Jeopardy model.
Jeopardy Model

We make use of a formalism called quasi-synchronous grammar [D. Smith & Eisner '06], originally developed for MT.
Quasi-Synchronous Grammars

Based on key observations in MT:
- translated sentences often have some isomorphic syntactic structure, but usually not in its entirety;
- the strictness of the isomorphism may vary across words or syntactic rules.

Key idea: unlike stricter, more rigid synchronous grammars (e.g. SCFG), QG defines a monolingual grammar for the target tree, "inspired" by the source tree.
Quasi-Synchronous Grammars
- In other words, we model the generation of the target tree, influenced by the source tree (and their alignment).
- QA can be thought of as extremely free translation within the same language.
- The linkage between question and answer trees in QA is looser than in MT, which gives QG an even bigger edge.
Jeopardy Model
- Works on labeled dependency parse trees.
- Learns the hidden structure (the alignment between the Q and A trees) by summing out ALL possible alignments.
- One particular alignment tells us both the syntactic configurations and the word-to-word semantic correspondences.

An example…
[Figure: the question parse tree for "Who is the leader of France?" (who/WP (qword), is/VB, leader/NN, the/DT, France/NNP (location); arcs: root, subj, obj, det, of) shown alongside the answer parse tree for "Bush met with French president Jacques Chirac" (Bush/NNP (person), met/VBD, French/JJ (location), president/NN, Jacques Chirac/NNP (person); arcs: root, subj, with, nmod, nmod), with an alignment linking the two trees. Both trees hang from a dummy $ root.]
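For concreteness, here is one minimal way to represent these labeled dependency nodes in code. The class and field names are illustrative (not from the paper), and the later sketches in this transcript reuse them:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)   # eq=False keeps nodes hashable by identity
class DepNode:
    """One word in a labeled dependency parse."""
    word: str       # surface form, e.g. "leader"
    pos: str        # part-of-speech tag, e.g. "NN"
    ne: str         # named-entity tag, e.g. "location" or "noNE"
    deprel: str     # label of the arc from the parent, e.g. "obj"
    parent: Optional["DepNode"] = None
    children: List["DepNode"] = field(default_factory=list)

def attach(parent: DepNode, child: DepNode) -> DepNode:
    """Link child under parent and return the child."""
    child.parent = parent
    parent.children.append(child)
    return child

# The question tree for "Who is the leader of France?":
root   = DepNode("$", "$", "noNE", "")
is_    = attach(root, DepNode("is", "VB", "noNE", "root"))
who    = attach(is_, DepNode("who", "WP", "qword", "subj"))
leader = attach(is_, DepNode("leader", "NN", "noNE", "obj"))
the    = attach(leader, DepNode("the", "DT", "noNE", "det"))
france = attach(leader, DepNode("France", "NNP", "location", "of"))
```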
[Alignment walkthrough, step 1: the question root "is" is generated aligned to the answer word "met", contributing the factors]
P(root | root) · P(noNE | noNE) · P(VB | VBD)
Our model makes local Markov assumptions to allow efficient computation via dynamic programming (details in the paper): given its parent, a word is independent of all other words (including its siblings).
[Step 2: "who" aligns to "Bush", contributing]
P(subj | parent-child) · P(qword | person) · P(WP | NNP)
[Step 3: "leader" aligns to "president", contributing]
P(obj | grandparent-child) · P(noNE | noNE) · P(NN | NN)
[Step 4: "the" aligns to the same answer word as its parent ("president"), contributing]
P(det | same-word) · P(noNE | noNE) · P(DT | N)
[Step 5: "France" aligns to "French", contributing]
P(of | parent-child) · P(location | location) · P(NNP | JJ)
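Putting the walkthrough together: below is a minimal sketch (my paraphrase, not the paper's code) of the inside dynamic program that the Markov assumption enables. The probability tables are hypothetical learned multinomials, and config(), sketched after the configuration list further down, classifies the relation between two answer-tree nodes:

```python
# Hypothetical multinomial lookup tables, learned from data:
P_DEPREL = {}   # P(question dependency label | syntactic configuration)
P_NE = {}       # P(question NE tag | aligned answer word's NE tag)
P_POS = {}      # P(question POS tag | aligned answer word's POS tag)

def p_align(q_child, a_node, a_parent_align):
    """Local factor for aligning question word q_child to answer word
    a_node, given that q_child's parent aligned to a_parent_align,
    mirroring the three factors in the walkthrough above."""
    cfg = config(a_parent_align, a_node)   # e.g. "parent-child"
    return (P_DEPREL[cfg][q_child.deprel]
            * P_NE[a_node.ne][q_child.ne]
            * P_POS[a_node.pos][q_child.pos])

def p_question_given_answer(q_root, a_root, answer_nodes):
    """P(Q | A): sum over ALL alignments of the question tree into the
    answer tree, in O(|Q| * |A|^2) time thanks to memoization."""
    memo = {}

    def inside(q, a):
        # P(question subtree rooted at q | q is aligned to answer node a).
        if (q, a) not in memo:
            total = 1.0
            for child in q.children:
                # Given its parent's alignment, a child is independent of
                # its siblings, so each child contributes an independent
                # sum over every answer word it could align to.
                total *= sum(p_align(child, a2, a) * inside(child, a2)
                             for a2 in answer_nodes)
            memo[(q, a)] = total
        return memo[(q, a)]

    return inside(q_root, a_root)   # the two $ roots align by convention
```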
6 types of syntactic configurations
- Parent-child
- Same-word
- Grandparent-child
- Child-parent
- Siblings
- C-command
(same as [D. Smith & Eisner '06])

[Figure, built up over several animation frames: the example Q/A trees annotated with instances of the parent-child, same-word, and grandparent-child configurations.]
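A hedged sketch of the config() helper used in the dynamic-program sketch above. The configuration names follow the slide's list; the boundary cases, especially c-command, follow my reading of [D. Smith & Eisner '06] rather than the paper's code:

```python
def config(a_parent, a_child):
    """Classify the answer-tree relation between the node that a question
    word's parent aligned to (a_parent) and the node that the question
    word itself aligned to (a_child)."""
    if a_child is a_parent:
        return "same-word"
    if a_child.parent is a_parent:
        return "parent-child"
    if a_parent.parent is a_child:
        return "child-parent"
    if a_child.parent is not None and a_child.parent.parent is a_parent:
        return "grandparent-child"
    if a_child.parent is not None and a_child.parent is a_parent.parent:
        return "siblings"
    # Catch-all; strictly, c-command requires a sister of a_parent to
    # dominate a_child, with any remaining cases in an "other" bucket.
    return "c-command"
```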
Modeling alignment: base model

Each alignment decision is scored by a product of multinomials, e.g.
P(of | parent-child) · P(location | location) · P(N | N)

[Figure: the aligned Q/A trees from the example, repeated.]
Modeling alignment (cont.)
- Base model: the multinomials above.
- Log-linear model: lexical-semantic features from WordNet (identity, hypernym, synonym, entailment, etc.).
- Mixture model: combines the two.
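Below is a minimal sketch of the log-linear lexical-semantics component and the mixture, assuming NLTK's WordNet interface. The three features and the normalization over a question-side vocabulary are illustrative simplifications, not the paper's exact model:

```python
import math
from nltk.corpus import wordnet as wn   # assumes the WordNet data is installed

def f_identity(q, a):
    return 1.0 if q.lower() == a.lower() else 0.0

def f_synonym(q, a):
    """Fires if the two words share any WordNet synset."""
    return 1.0 if set(wn.synsets(q)) & set(wn.synsets(a)) else 0.0

def f_hypernym(q, a):
    """Fires if some sense of q is an ancestor (hypernym) of some sense
    of a, as with leader -> president in the running example."""
    q_senses = set(wn.synsets(q))
    return 1.0 if any(q_senses & set(s.closure(lambda x: x.hypernyms()))
                      for s in wn.synsets(a)) else 0.0

FEATURES = [f_identity, f_synonym, f_hypernym]   # the paper also uses
                                                 # entailment, etc.

def p_loglinear(q_word, a_word, weights, vocab):
    """P(q | a) under the log-linear model, normalized over a
    (hypothetical) question-side vocabulary list."""
    def u(q):
        return math.exp(sum(w * f(q, a_word)
                            for w, f in zip(weights, FEATURES)))
    return u(q_word) / sum(u(q) for q in vocab)

def p_mixture(q_word, a_word, lam, p_base, p_ll):
    """Mixture of the base multinomial model and the log-linear model,
    with learned coefficient lam in [0, 1]."""
    return lam * p_base(q_word, a_word) + (1 - lam) * p_ll(q_word, a_word)
```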
Parameter estimation

Things to be learned:
- the multinomial distributions in the base model
- the log-linear model's feature weights
- the mixture coefficient

Training involves summing out the hidden structures and is thus non-convex; we solve it using conditional Expectation-Maximization.
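To make the EM idea concrete, here is a toy, runnable illustration that re-estimates just the mixture coefficient, holding both component models fixed. The paper's conditional EM is richer: it jointly learns the multinomials and feature weights, using the inside algorithm sketched earlier to sum over hidden alignments in the E-step:

```python
def em_mixture_coefficient(word_pairs, p_base, p_ll, iters=50, lam=0.5):
    """Estimate the mixture weight lam for
    p(q, a) = lam * p_base(q, a) + (1 - lam) * p_ll(q, a)."""
    for _ in range(iters):
        # E-step: posterior responsibility of the base model for each
        # observed (question word, answer word) pair.
        resp = [lam * p_base(q, a)
                / (lam * p_base(q, a) + (1 - lam) * p_ll(q, a))
                for q, a in word_pairs]
        # M-step: the new coefficient is the average responsibility.
        lam = sum(resp) / len(resp)
    return lam
```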
Experiments
- TREC 8-12 data set for training
- TREC 13 questions for development and testing
Candidate answer generation
- For each question, we take all documents from the TREC document pool and extract sentences that contain at least one non-stopword keyword from the question.
- For computational reasons (parsing speed, etc.), we only keep answer sentences of at most 40 words.
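A minimal sketch of this filter; the stopword list is abbreviated for illustration:

```python
# Abbreviated stopword list for illustration; a real system uses a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "is", "are", "was",
             "who", "what", "when", "where", "and", "or", "to", "for"}

def candidate_sentences(question, documents, max_len=40):
    """Return sentences that share at least one non-stopword keyword with
    the question and are at most max_len words long (longer sentences are
    dropped for parsing speed)."""
    norm = lambda w: w.lower().strip(".,?!\"'")
    keywords = {norm(w) for w in question.split()} - STOPWORDS
    candidates = []
    for doc in documents:            # each document: a list of sentences
        for sent in doc:
            words = [norm(w) for w in sent.split()]
            if len(words) <= max_len and keywords & set(words):
                candidates.append(sent)
    return candidates
```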
Dataset statistics
- Manually labeled 100 questions for training: 348 positive Q/A pairs in total.
- 84 questions for development: 1,415 Q/A pairs in total (on average 3.1 positive and 17.1 negative per question).
- 100 questions for testing: 1,703 Q/A pairs in total (on average 3.6 positive and 20.0 negative per question).
- Automatically labeled another 2,193 questions to create a noisy training set, for evaluating model robustness.
Experiments (cont.)

Each question and answer sentence is tokenized, POS-tagged (MXPOST), parsed (MSTParser), and labeled with named-entity tags (IdentiFinder).
Baseline systems (replications)
- [Cui et al., SIGIR '05]: the algorithm behind one of the best-performing systems in TREC evaluations. It uses a mutual-information-inspired score computed over dependency trees and a single fixed alignment between them.
- [Punyakanok et al., NLE '04]: measures the similarity between Q and A by computing tree edit distance.

Both baselines are high-performing, syntax-based, and among the most straightforward to replicate. We further enhanced both algorithms by augmenting them with WordNet.
Results

[Table: Mean Average Precision and Mean Reciprocal Rank of the top answer, for our model and the baselines; the slide highlights gains of 28.2%, 23.9%, 41.2%, and 30.3%. Our model is statistically significantly better than the 2nd-best score in each column.]
Summing vs. Max
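The transcript drops this slide's figure. The comparison is between marginalizing over all alignments and scoring only the single best alignment; in the inside-algorithm sketch above, that difference is a one-token swap (my illustration):

```python
# Summing: marginal probability over all alignments of each child.
total *= sum(p_align(child, a2, a) * inside(child, a2) for a2 in answer_nodes)
# Max: Viterbi approximation, keeping only the best alignment.
total *= max(p_align(child, a2, a) * inside(child, a2) for a2 in answer_nodes)
```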
Conclusion
- We developed a probabilistic model for QA based on quasi-synchronous grammar.
- Experimental results showed that our model is more accurate and robust than state-of-the-art syntax-based QA models.
- The mixture model is shown to be powerful, and the log-linear model allows us to use arbitrary features.
- The approach provides a general framework for many other NLP applications (compression, textual entailment, paraphrasing, etc.).
Future Work
- Higher-order Markovization, both horizontal and vertical, would let us look at more context, at the expense of higher computational cost.
- More features from external resources, e.g. a paraphrase database.
- Extending the model to cross-lingual QA: avoid the paradigm of translation as pre- or post-processing; a lexical or phrase translation probability table fits naturally into our model, handling translation inherently.
- Taking parsing uncertainty into account.
Thank you!
Questions?