TRANSCRIPT
PRE-ORDERING DEPENDENCY SUBTREES
FOR PHRASE-BASED SMT
Intern: Arianna Bisazza. Mentors: Alex Ceausu, John Tinsley
Dependency subtree pre-ordering
What if… we can’t (or don’t want to) change the decoding process, but we have dependency parses available?
One way to go: pre-order the input parse trees, then translate normally.
Main research problems: how to pre-order (the ordering model) and what to pre-order (rule selection).
Dependency subtree pre-ordering
“Die Budapester Staat anwaltschaft hat ihre Ermittlungen zum Vorfall eingeleitet.” (word-by-word: the Budapest Prosecutor’s Office has its investigation into the incident initiated)
[Figure: dependency parse of the sentence. The tree is rooted at hat|VAFIN; its children are anwaltschaft|NN (SB), eingeleitet|VVPP (OC) and .|$. (PUNC). anwaltschaft governs die|ART, Budapester|NN and Staat|NN (all NK); eingeleitet governs Ermittlungen|NN (OA), which carries the modifiers ihre|PPOSAT (NK) and zum|APPRART Vorfall|NN (MNR/NK).]
Permute subtrees (a node + its children)
Each subtree processed independently
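The subtree-by-subtree permutation described above could be sketched as follows. This is a minimal illustration, not the talk's implementation: a tree is a nested list whose unique string element is the head label (written with a leading "_" as on the slides), and `toy_model` is an invented ordering rule that fronts a past-participle head, mirroring German-to-English verb placement.

```python
def head_label(subtree):
    """The head of a subtree is its unique string element."""
    return next(x for x in subtree if isinstance(x, str))

def preorder(subtree, order_model):
    """Pre-order a dependency tree bottom-up: each subtree (a head plus its
    children) is permuted independently. `order_model` maps the local label
    sequence to a permutation of its indices."""
    items = [preorder(x, order_model) if isinstance(x, list) else x
             for x in subtree]
    labels = [x if isinstance(x, str) else head_label(x).lstrip("_")
              for x in items]
    perm = order_model(labels)
    return [items[i] for i in perm]

def flatten(subtree):
    """Read off the node sequence of a (pre-ordered) tree."""
    out = []
    for x in subtree:
        if isinstance(x, list):
            out.extend(flatten(x))
        else:
            out.append(x)
    return out

def toy_model(labels):
    """Illustrative rule: move a past-participle head (VVPP) in front of its
    dependents; leave every other subtree unchanged."""
    order = list(range(len(labels)))
    for i, lab in enumerate(labels):
        if lab.startswith("_") and lab.endswith("VVPP"):
            order = [i] + [j for j in order if j != i]
    return order

# the example sentence's tree, using rel|POS labels as on the slide
sent = [["_SB|NN"], "_ROOT|VAFIN", [["_OA|NN"], "_OC|VVPP"], ["_PUNC|$."]]
reordered = flatten(preorder(sent, toy_model))
# the OA|NN / _OC|VVPP subtree is swapped; all other subtrees keep their order
```

Because each subtree is processed independently, the model only ever decides over short local sequences, never over the whole sentence.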
Pre-ordering model (1) – MLE
Baseline model: maximum-likelihood estimation (MLE, relative-frequency based). Subtree representation: relation type and POS tag.
OA|NN _OC|VVPP → _OC|VVPP OA|NN   Prob = 0.75
OA|NN _OC|VVPP → OA|NN _OC|VVPP   Prob = 0.25
Limitations:
- ambiguity due to coarse word classification (only a few relation/POS tags)
- coverage: many unseen or low-count subtrees
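The relative-frequency model could be sketched like this. The training counts below are invented to reproduce the 0.75 / 0.25 example from the slide; unseen subtrees fall through unchanged, which is exactly the coverage limitation noted above.

```python
from collections import Counter, defaultdict

# Invented training observations: (original node sequence, reordered sequence).
training = [
    (("OA|NN", "_OC|VVPP"), ("_OC|VVPP", "OA|NN")),
    (("OA|NN", "_OC|VVPP"), ("_OC|VVPP", "OA|NN")),
    (("OA|NN", "_OC|VVPP"), ("_OC|VVPP", "OA|NN")),
    (("OA|NN", "_OC|VVPP"), ("OA|NN", "_OC|VVPP")),
]
counts = defaultdict(Counter)
for src, reordered in training:
    counts[src][reordered] += 1

def mle_reorder(seq):
    """Most likely reordering of `seq` by relative frequency; subtrees never
    seen in training are passed through unchanged (the coverage problem)."""
    seq = tuple(seq)
    if seq not in counts:
        return seq
    return counts[seq].most_common(1)[0][0]

# relative frequency of the swapped order, as in the slide's example
prob = counts[("OA|NN", "_OC|VVPP")][("_OC|VVPP", "OA|NN")] / 4
```
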
Pre-ordering model (2) – SMT
Idea: learn to reorder by SMT! Train a phrase-based system on pairs of original/pre-ordered source-language node sequences (subtrees).
Advantages:
- generalization: all node sequences can be processed
- model flexibility: different features can be represented as “factors”
- different model weights can be tuned by MERT
ORIGINAL:
SB|NN _ROOT|VAFIN OC|VVPP PUNC|$.
NK|ART NK|NN NK|NN _SB|NN
OA|NN _OC|VVPP
...

PRE-ORDERED:
SB|NN _ROOT|VAFIN OC|VVPP PUNC|$.
NK|ART NK|NN NK|NN _SB|NN
_OC|VVPP OA|NN
...
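Concretely, the reordering system is trained like any phrase-based SMT system, on a line-aligned “bitext”: one side holds the original node sequence of each subtree, the other its pre-ordered version. A sketch (sequences taken from the slide; the file names mentioned in the comment are illustrative):

```python
# Each pair is (original node sequence, pre-ordered node sequence).
# Identity pairs are kept too, so the model also learns when NOT to reorder.
pairs = [
    ("SB|NN _ROOT|VAFIN OC|VVPP PUNC|$.", "SB|NN _ROOT|VAFIN OC|VVPP PUNC|$."),
    ("NK|ART NK|NN NK|NN _SB|NN",         "NK|ART NK|NN NK|NN _SB|NN"),
    ("OA|NN _OC|VVPP",                    "_OC|VVPP OA|NN"),
]
orig_side = "\n".join(src for src, _ in pairs)
preord_side = "\n".join(tgt for _, tgt in pairs)
# these two line-aligned texts would be written to e.g. train.orig and
# train.preord and fed to a phrase-based trainer such as Moses
```
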
Pre-ordering model (2) – SMT
Possible models:
- original-to-preordered phrase table
- “target” (pre-ordered) n-gram language models
- lexicalized reordering models at the level of relation type, POS tags or words, etc.
All models are log-linearly combined; weights are tuned by MERT, optimizing the reordering score (KRS).
Each feature type is represented as a factor, for example:

ORIGINAL:
SB|NN|anwaltschaft _ROOT|VAFIN|hat OC|VVPP|eingeleitet PUNC|$.|.
NK|ART|die NK|NN|Budapester NK|NN|Staat _SB|NN|anwaltschaft
OA|NN|Ermittlungen _OC|VVPP|eingeleitet
...
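The two ingredients could be sketched as follows: splitting a node into its factors, and combining model probabilities log-linearly (the weights would be what MERT tunes; the probabilities and weights below are invented).

```python
import math

# A node string carries its factors: relation | POS | surface word.
node = "SB|NN|anwaltschaft"
rel, pos, word = node.split("|")

def loglinear_score(model_probs, weights):
    """score(candidate) = sum_i w_i * log p_i(candidate) — the standard
    log-linear combination used by phrase-based decoders."""
    return sum(w * math.log(p) for p, w in zip(model_probs, weights))

# e.g. phrase-table prob 0.5 and reordering-model prob 0.25, weights 1.0 / 0.5
score = loglinear_score([0.5, 0.25], [1.0, 0.5])
```
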
Evaluation
Training/dev/test: 495K/2.5K/2.5K sentences from the WMT-12 De-En training data; 1.6M/8K/9K training subtrees (rooted at verb nodes).
Method              Add. models        BLEU   KRS    ACC    UNK
MLE-rel             --                 57.77  71.01  46.35  9.53
MLE-relPOS          --                 55.00  71.03  45.75  24.33
SMT-relPOS (Moses)  LM(rel)+LM(POS)    58.84  72.57  48.45  --
                    +lexreo(relPOS)    60.24  73.05  49.37  --
                    +lexreo(words)     59.87  72.92  49.05  --
                    +lexreo(nodeSpan)  59.72  72.94  49.03  --
4.8M/23K/24K training subtrees (all with >1 node):

Method              Add. models      BLEU   KRS    ACC    UNK
MLE-relPOS          --               63.38  77.27  63.08  11.91
SMT-relPOS (Moses)  +lexreo(relPOS)  66.74  78.24  64.69  --
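The KRS columns above measure how closely the predicted word order matches the reference order. A simplified Kendall-tau-style stand-in (not the exact KRS formulation, which works on word alignments) could look like:

```python
from itertools import combinations

def kendall_reordering_score(pred, ref):
    """Fraction of token pairs placed in the same relative order in `pred`
    as in `ref` (both are sequences of unique token ids); 1.0 = identical
    order, 0.0 = fully reversed."""
    pos = {tok: i for i, tok in enumerate(pred)}
    pairs = list(combinations(range(len(ref)), 2))
    agree = sum(pos[ref[a]] < pos[ref[b]] for a, b in pairs)
    return agree / len(pairs)
```
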
Selective pre-ordering
Not all subtrees need to be pre-ordered (especially in language pairs like German-English). How to select them?
Approach: compute the average distortion gain on the training data, then pre-order only subtrees with a high distortion gain.
Pre-ordering performance with two different thresholds:
Selection            %subtrees  Method  Add. models      BLEU   KRS    ACC    UNK
None (all subtrees)  100%       MLE     --               63.38  77.27  63.08  11.91
                                SMT     +lexreo(relPOS)  66.74  78.24  64.69  --
HRPd15f3             13%        MLE     --               55.59  72.64  40.09  30.43
                                SMT     +lexreo(relPOS)  60.68  75.02  44.98  --
SRPd20f3             3%         MLE     --               45.17  60.96  24.34  8.70
                                SMT     +lexreo(relPOS)  51.70  65.21  28.76  --
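The distortion-gain selection could be sketched like this. Both the displacement measure and the training cases are illustrative, not the exact quantities behind the thresholds above.

```python
from collections import defaultdict

def distortion(perm):
    """Total displacement of each node from its target-order position —
    a simple proxy for the distortion a phrase-based decoder would pay."""
    return sum(abs(i, p) if False else abs(i - p) for i, p in enumerate(perm))

# invented training cases: (subtree type, target-order permutation observed)
cases = [
    ("OC|VVPP", (1, 0)),     # object/participle swap: distortion 2
    ("OC|VVPP", (1, 0)),
    ("NK|NN",   (0, 1, 2)),  # noun phrase already in target order: gain 0
]
gains = defaultdict(list)
for subtree_type, perm in cases:
    gains[subtree_type].append(distortion(perm))
avg_gain = {t: sum(g) / len(g) for t, g in gains.items()}

# pre-order only subtree types whose average gain clears a threshold
selected = {t for t, g in avg_gain.items() if g >= 2.0}
```

Raising the threshold shrinks the selected set, which is how the different %subtrees settings in the table arise.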
MT experiments
Using WMT-12 De-En training and test data
Input         DL  newstest2009    newstest2010
                  BLEU   KRS      BLEU   KRS
Original      4   19.43  62.70    20.96  66.00
Reo.all       4   20.25  62.20    21.88  65.48
Reo.verbRoot  4   20.27  62.27    21.97  65.48
Reo.HRPd15f3  4   19.70  62.51    21.26  65.91
Reo.SRPd20f3  4   19.55  62.67    21.08  65.95
Original      8   20.18  63.14    21.68  66.15
Reo.all       8   20.34  61.85    21.97  65.00
Reo.verbRoot  8   20.38  62.00    22.09  65.06
Reo.HRPd15f3  8   20.35  62.67    21.82  65.89
Reo.SRPd20f3  8   20.25  62.99    21.73  66.03
(DL = distortion limit)
MT output examples (1)
ORI: nach dem steilen Abfall am Morgen konnte die Prager Börse die Verluste korrigieren .
REO: nach dem steilen Abfall am Morgen die Prager Börse konnte die Verluste korrigieren .
REF: after a sharp drop in the morning , the Prague Stock Market corrected its losses .
BASE: after the sharp falls on the morning , the Prague Stock Exchange to correct the losses .
NEW: after the sharp falls on the morning the Prague Stock Exchange was able to correct the losses .
MT output examples (2)
ORI: … über einen Plan , der funktionieren wird und der auf dem Markt auch wirksam sein muss .
REO: … über einen Plan , der wird funktionieren und der muss sein auch wirksam auf dem Markt .
REF: … on a plan which will function and which also must be effective on the market .
BASE: … on a plan that will work and on the market also needs to be effective .
NEW: … on a plan that will work and must also be effective on the market .
MT output examples (3)
ORI: die Kongress Abgeordneten müssen nämlich noch einige Details der Vereinbarung aushandeln , ehe sie die Endfassung des Gesetzes veröffentlichen und darüber abstimmen dürfen .
REO: die Kongress Abgeordneten müssen nämlich aushandeln , ehe sie veröffentlichen die Endfassung des Gesetzes und dürfen darüber abstimmen noch einige Details der Vereinbarung .
REF: that is , the members of congress have to complete some details of the agreement before they can make the final version of the law public and vote on it .
BASE: members of Congress : some details must still negotiate the agreement before they publish the final version of the law and able to vote on it .
NEW: members of Congress must negotiate before they publish the final version of the law and must still vote on some details of the agreement .
Conclusions & TODO’s
Pre-ordering with an SMT-like system always outperforms the MLE baseline, but the gains are small
Evaluation issue: reference reorderings are very noisy!
When the input is pre-ordered, BLEU improves but KRS decreases… more error analysis is needed!
Possible reason: the SMT system must be re-trained (or at least tuned) on pre-ordered data
More thresholds for rule selection should be tested… other suggestions?
Thanks for your attention!