1
Dependency Tree-to-Dependency Tree Machine Translation
November 4, 2011
Presented by: Jeffrey Flanigan (CMU)
Lori Levin, Jaime Carbonell
In collaboration with: Chris Dyer, Noah Smith, Stephan Vogel
2
Problem
Swahili: Watoto ni kusoma vitabu.
Gloss: children aux-pres read books
English: Children are reading books.
MT (Phrase-based): Children are reading books.

Swahili: Watoto ni kusoma vitabu tatu mpya.
Gloss: children aux-pres read books three new
English: Children are reading three new books.
MT (Phrase-based): Children are three new books.

Why?
Phrase table: Pr(reading books | kusoma vitabu), Pr(books | kusoma vitabu)
Language model must choose between: "Children are three new reading books." and "Children are reading books three new."
3
Problem: Grammatical Encoding Missing
Swahili: Nimeona samaki waliokula mashua.
Gloss: I-found fish who-ate boat
English: I found the fish that ate the boat.
MT System: I found that eating fish boat.
Predicate-argument structure was corrupted.
4
Grammatical Relations
I found the fish that ate the boat.

[Dependency tree of the sentence, with the relations ROOT, SUBJ, OBJ, DET, RCMOD, DOBJ, and REF marked on the arcs]

⇒ Dependency trees on source and target!
5
Approach
Source Sentence → [undo grammatical encoding (parse)] → Source Dependency Tree → [translate] → Target Dependency Tree → [grammatical encoding (choose surface form, linearize)] → Target Sentence

All stages are statistical.
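The three-stage pipeline can be sketched as a composition of functions. This is a toy illustration only: `parse`, `translate`, and `linearize` here are hand-coded stand-ins (a hard-wired tree, a toy lexicon, and a fixed ordering heuristic), not the statistical models the slides describe.

```python
# A dependency tree is (head_word, [(relation, subtree), ...]).

def parse(sentence):
    """Stand-in parser: returns a hand-built tree for the demo sentence."""
    assert sentence == "umwaana arasoma ibitabo"
    return ("arasoma", [("NSUBJ", ("umwaana", [])), ("DOBJ", ("ibitabo", []))])

def translate(tree):
    """Stand-in node-by-node transfer using a toy lexicon."""
    lex = {"arasoma": "is reading", "umwaana": "child", "ibitabo": "books"}
    head, kids = tree
    return (lex[head], [(rel, translate(k)) for rel, k in kids])

def linearize(tree):
    """Stand-in linearizer: NSUBJ dependents before the head, others after."""
    head, kids = tree
    before = [linearize(k) for rel, k in kids if rel == "NSUBJ"]
    after = [linearize(k) for rel, k in kids if rel != "NSUBJ"]
    return " ".join(before + [head] + after)

print(linearize(translate(parse("umwaana arasoma ibitabo"))))
# -> child is reading books
```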
6
Extracting the Rules: Extract all consistent tree fragment pairs
[Figure: word-aligned dependency trees. English "Children are reading three new books": reading → Children (NSUBJ), are (AUX), books (DOBJ); books → three (NUM), new (AMOD). Kinyarwanda "Abaana barasoma ibitabo bitatu bishya": barasoma → Abaana (NSUBJ), ibitabo (DOBJ); ibitabo → bitatu (NUM), bishya (AMOD).]

Example extracted pairs (SOURCE SIDE ↔ TARGET SIDE):
• [1] are reading [2] (NSUBJ, AUX, DOBJ) ↔ [1] barasoma [2] (NSUBJ, DOBJ)
• [1] are reading books (NSUBJ, AUX, DOBJ) ↔ [1] barasoma ibitabo (NSUBJ, DOBJ)
• three new [1] (NUM, AMOD) ↔ bitatu bishya [1] (NUM, AMOD)
• Children ↔ Abaana
• Children [1] (NUM) ↔ Abaana [1] (NUM)
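The consistency test behind this extraction is analogous to phrase-pair consistency in phrase-based SMT: no alignment link may leave the fragment pair. A minimal sketch, assuming trees have been flattened to node indices and ignoring the connectivity constraints (covered later in the backup slides):

```python
def consistent(src_nodes, tgt_nodes, alignments):
    """A fragment pair is consistent if every alignment link touching
    either side stays inside the pair, and the pair is aligned at all."""
    touched = [(s, t) for s, t in alignments
               if s in src_nodes or t in tgt_nodes]
    return (len(touched) > 0 and
            all(s in src_nodes and t in tgt_nodes for s, t in touched))

# English: 0=Children 1=are 2=reading 3=three 4=new 5=books
# Kinyarwanda: 0=Abaana 1=barasoma 2=ibitabo 3=bitatu 4=bishya
align = [(0, 0), (1, 1), (2, 1), (3, 3), (4, 4), (5, 2)]

print(consistent({1, 2}, {1}, align))     # "are reading" <-> "barasoma": True
print(consistent({3, 4}, {3, 4}, align))  # "three new" <-> "bitatu bishya": True
print(consistent({2}, {1}, align))        # "reading" alone <-> "barasoma": False
```

The third call fails because "barasoma" is also aligned to "are", which lies outside the candidate fragment.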
7
Translating
Extension of phrase-based SMT:
• Linear strings → Dependency trees
• Phrase pairs → Tree fragment pairs
• Language model → Dependency language model

Search is top-down on the target side, using a beam-search decoder.
8
Translation Example
[Figure: decoding. Input source tree: arasoma → umwaana (NSUBJ), ibitabo (DOBJ).]

Inventory of rules (tree fragment pairs):
• [1] arasoma [2] (NSUBJ, DOBJ) ↔ [1] is reading [2] (NSUBJ, AUX, DOBJ), P(e|f) = .5
• umwaana ↔ child, P(e|f) = .8
• umwaana ↔ the child (DET) and umwaana ↔ a child (DET), one shown with P(e|f) = .1
• ibitabo ↔ books, P(e|f) = .7

Input: umwaana arasoma ibitabo
Output: The child is reading books
9
Translation Example
[Figure: the root rule has been applied, producing the partial target tree reading → [3] (NSUBJ), is (AUX), [4] (DOBJ), with sites [3] and [4] linked to umwaana and ibitabo. Rule inventory as on the previous slide.]

Score = w1·ln(.5) + w2·ln(Pr(reading | ROOT)) + w2·ln(Pr(is | reading, AUX))

Language model on target dependency tree
10
Translation Example
[Figure: the rule ibitabo ↔ books has filled the DOBJ site. Rule inventory as before.]

Score = w1·ln(.5) + w1·ln(.7) + w2·ln(Pr(reading | ROOT)) + w2·ln(Pr(is | reading, AUX)) + w2·ln(Pr(books | reading, DOBJ))
11
Translation Example
[Figure: "the child" fills the NSUBJ site; the complete target tree yields "The child is reading books". Rule inventory as before.]

Score(Translation) = w1·ln(.5) + w1·ln(.7) + w1·ln(.8) + w2·ln(Pr(reading | ROOT)) + w2·ln(Pr(is | reading, AUX)) + w2·ln(Pr(books | reading, DOBJ)) + w2·ln(Pr(child | reading, NSUBJ)) + w2·ln(Pr(the | (child, DET), (reading, ROOT)))
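The score on this slide is a weighted sum of log rule probabilities and log dependency-LM probabilities. A minimal sketch of the computation: the rule probabilities .5, .7, .8 are from the slide, but the feature weights and the dependency-LM values below are made-up placeholders, and the conditioning contexts are simplified.

```python
import math

# Score = w1 * sum(ln P(e|f) over rules used)
#       + w2 * sum(ln P(word | head, relation) over target tree)
w1, w2 = 1.0, 0.5                  # illustrative feature weights
rule_probs = [0.5, 0.7, 0.8]       # P(e|f) of the three rules, from the slide
dep_lm = {                         # hypothetical dependency-LM probabilities
    ("reading", "ROOT"): 0.2,
    ("is", ("reading", "AUX")): 0.9,
    ("books", ("reading", "DOBJ")): 0.3,
    ("child", ("reading", "NSUBJ")): 0.4,
    ("the", ("child", "DET")): 0.6,
}

score = (w1 * sum(math.log(p) for p in rule_probs) +
         w2 * sum(math.log(p) for p in dep_lm.values()))
print(round(score, 3))  # -> -3.446
```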
12
Linearization
• Generate projective trees
• A* search
• Left to right with target LM
• Admissible heuristic: highest-scoring completion without LM

[Dependency tree: is → He (NSUBJ), strong (COP); strong → enough (ADVMOD)]
13
Linearization
[Same tree: is → He (NSUBJ), strong (COP); strong → enough (ADVMOD)]

Hypothesis: He is enough strong
Score = Pr(He | START) · Pr(<NSUBJ,HEAD,COP> | is) · Pr(<ADVMOD,HEAD> | strong)
14
Linearization
[Same tree as above]

Hypothesis: He is enough strong
Score = Pr(He | START) · Pr(is | He, START) · Pr(<NSUBJ,HEAD,COP> | is) · Pr(<ADVMOD,HEAD> | strong)
15
Linearization
[Same tree as above]

Hypothesis: He is strong enough
Score = Pr(He | START) · Pr(is | He, START) · Pr(strong | He, is) · Pr(<NSUBJ,HEAD,COP> | is) · Pr(<HEAD,ADVMOD> | strong)
16
Linearization
[Same tree as above]

Final: He is strong enough
Score = Pr(He) · Pr(is | He) · Pr(strong | He, is) · Pr(enough | strong, is) · Pr(<NSUBJ,HEAD,COP> | is) · Pr(<HEAD,ADVMOD> | strong)
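The search above can be illustrated with a brute-force stand-in for the A* linearizer: enumerate orderings of the tree's words and pick the best under a toy bigram LM. The bigram probabilities below are invented so that "strong enough" beats "enough strong"; the real system uses A* with an order model and an admissible heuristic rather than exhaustive enumeration.

```python
from itertools import permutations

# Toy bigram LM; pairs not listed get a small default probability.
bigram = {("He", "is"): 0.9, ("is", "strong"): 0.8, ("strong", "enough"): 0.7,
          ("is", "enough"): 0.2, ("enough", "strong"): 0.1}

def lm_score(words):
    """Product of bigram probabilities over adjacent word pairs."""
    p = 1.0
    for a, b in zip(words, words[1:]):
        p *= bigram.get((a, b), 0.01)
    return p

# All orderings of the words of the tree is(He, strong(enough)),
# ignoring projectivity for this tiny flat example.
candidates = list(permutations(["He", "is", "strong", "enough"]))
best = max(candidates, key=lm_score)
print(" ".join(best))  # -> He is strong enough
```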
17
Comparison To Major Approaches

• Old-style Analysis-Transfer-Generate. Similarities: separate analysis, transfer, and generation models. Differences: statistical, rules are learned.
• Synchronous CFGs [Chiang 2005] [Zollmann et al. 2006]. Similarities: model of grammatical encoding. Differences: allows adjunction and head switching.
• Tree transducers [Graehl & Knight 2004]. Similarities: model of grammatical encoding. Differences: different decoding.
• Quasi-synchronous grammars [Gimpel & Smith 2009]. Similarities: dependency trees on source and target. Differences: different rules and decoding.
• Synchronous tree-insertion grammars [DeNeefe & Knight 2009]. Similarities: allows adjuncts. Differences: allows head switching.
• Dependency treelets [Quirk et al. 2005] [Shen et al. 2008]. Similarities: dependency trees on source and target. Differences: word order not in rules; separate linearization procedure.
• String-to-dependency MT [Shen et al. 2008]. Similarities: target dependency language model. Differences: dependency trees on both source and target.
• Dependency tree-to-dependency tree MT (JHU Summer Workshop 2002) [Čmejrek et al. 2003] [Eisner 2003]. Similarities: dependency trees on source and target; linearization step. Differences: different learning of rules and decoding procedure.
18
Conclusion
• Separate translation from reordering
• Dependency trees capture grammatical relations
• Can extend phrase-based MT to dependency trees
• Complements ISI’s approach nicely
Work in progress!
19
Backup Slides
20
Allowable Rules
• Nodes consistent with alignments
• All variables aligned
• Nodes ∪ variables ∪ arcs ∪ alignments = connected graph

Optional Constraints
• Nodes on source connected
• Nodes on target connected
• Nodes on source and target connected

Decoding Constraint
• Target tree connected
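The connected-graph condition can be checked with an ordinary graph traversal. A sketch under stated assumptions: source nodes, target nodes, variables, arcs, and alignment links are treated as one undirected graph; the node names below are illustrative, with [1] and [1]' standing for the source- and target-side substitution sites.

```python
from collections import deque

def connected(nodes, edges):
    """True if every node is reachable from any other (one component)."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        for m in adj[queue.popleft()] - seen:
            seen.add(m)
            queue.append(m)
    return seen == nodes

# Fragment pair "three new [1]" <-> "bitatu bishya [1]":
nodes = {"three", "new", "[1]", "bitatu", "bishya", "[1]'"}
edges = [("[1]", "three"), ("[1]", "new"),         # source arcs (NUM, AMOD)
         ("[1]'", "bitatu"), ("[1]'", "bishya"),   # target arcs
         ("three", "bitatu"), ("new", "bishya"),   # word alignments
         ("[1]", "[1]'")]                          # aligned variables
print(connected(nodes, edges))  # -> True
```

Dropping the variable alignment and the word alignments would split the graph into a source and a target component, and the rule would be rejected.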
21
Head Switching Example
[Figure: word-aligned dependency trees. French "Le bébé vient de tomber": vient → bébé (NSUBJ, with DET Le), de (PREP) → tomber (POBJ). English "The child just fell": fell → child (NSUBJ, with DET The), just (ADVMOD).]

[Example extracted pairs showing head switching: the French head "vient" (with "de") corresponds to the English adverb "just" (ADVMOD), while the French dependent "tomber" corresponds to the English head "fell". Substitution sites [1] (subject) and [2] (the switched head) are shared across both sides.]
22
Moving Up the Triangle
Propositional Semantic Dependencies
Deep Syntactic Dependencies
Surface Syntactic Dependencies
23
Comparison to Synchronous Phrase Structure Rules
• Training dataset:
  Kinyarwanda: Abaana baasoma igitabo gishya kyose cyaa Karooli .
  English: The children are reading all of Charles ’s new book .
• Test sentence:
  Kinyarwanda: Abaana baasoma igitabo cyaa Karooli gishya kyose .
• Synchronous decoders (SAMT, Hiero, etc.) produce:
  The children are reading book ’s Charles new all of .
  The children are reading book Charles ’s all of new .

Problem: Grammatical encoding tied to word order.