fast%andaccurate% preordering%for%smt%using...

21
Fast and Accurate Preordering for SMT using Neural Networks Adrià de Gispert, Gonzalo Iglesias and Bill Byrne SDL Research,Cambridge (UK)

Upload: others

Post on 11-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Fast%andAccurate% Preordering%for%SMT%using ...mi.eng.cam.ac.uk/~wjb31/PUBS/Preordering.NAACL2015.pdf3 Preordering%for%SMT 私たちは、すっかり西安が好きになりまし

Fast and Accurate Preordering for SMT using Neural NetworksAdrià de Gispert, Gonzalo Iglesias and Bill ByrneSDL Research, Cambridge (UK)

Page 2: Fast%andAccurate% Preordering%for%SMT%using ...mi.eng.cam.ac.uk/~wjb31/PUBS/Preordering.NAACL2015.pdf3 Preordering%for%SMT 私たちは、すっかり西安が好きになりまし

2

Preordering for SMT

Transform the source sentence into target-­like order

Doubly beneficial:– Better translation models à improved quality– Faster decoding is possible (less distortion required)

Particularly interesting for real-­life commercial systems

Page 3: Fast%andAccurate% Preordering%for%SMT%using ...mi.eng.cam.ac.uk/~wjb31/PUBS/Preordering.NAACL2015.pdf3 Preordering%for%SMT 私たちは、すっかり西安が好きになりまし

3

Preordering for SMT

私たちは、すっかり西安が好きになりました。

We have come to quite like Xi’an .

Hard

Translation

Page 4: Fast%andAccurate% Preordering%for%SMT%using ...mi.eng.cam.ac.uk/~wjb31/PUBS/Preordering.NAACL2015.pdf3 Preordering%for%SMT 私たちは、すっかり西安が好きになりまし

4

Preordering for SMT

私たちは、すっかり西安が好きになりました。

We quite Xi’an like to come have .

Easier

Translation

We have come to quite like Xi’an .

Preordering

Page 5: Fast%andAccurate% Preordering%for%SMT%using ...mi.eng.cam.ac.uk/~wjb31/PUBS/Preordering.NAACL2015.pdf3 Preordering%for%SMT 私たちは、すっかり西安が好きになりまし

5

Approaches to Preordering

Rule-­based, using language-­specific hand-­crafted rules

Statistical, learnt from:– Hand-­aligned data (rarely available)– Automatically-­aligned data– Automatically-­aligned source-­parsed data ß this work

Page 6: Fast%andAccurate% Preordering%for%SMT%using ...mi.eng.cam.ac.uk/~wjb31/PUBS/Preordering.NAACL2015.pdf3 Preordering%for%SMT 私たちは、すっかり西安が好きになりまし

6

Approaches to Preordering (2)

From automatically-­aligned source-­parsed data:– Genzel (2010). Extracts corpus-­level preordering rule sequences• all sentences get identical preordering treatment;; no lexical info used

– Jehl et al. (2014). Feature-­rich Logistic Regression model• predicts when to swap two nodes in the source tree;; incorporates lexical information;; needs feature engineering

– This work: use a Neural Network to learn the node-­swapping model• superior modeling capabilities, better regression model• no need for feature engineering• faster decoding

Page 7: Fast%andAccurate% Preordering%for%SMT%using ...mi.eng.cam.ac.uk/~wjb31/PUBS/Preordering.NAACL2015.pdf3 Preordering%for%SMT 私たちは、すっかり西安が好きになりまし

7

Preordering as node-­pair swapping (Jehl et al. 2014)

Features: POS, dependency labels, identity and class of the head word (parent node), left/right-­most word (children nodes)Combinations: bigrams and trigrams of some of the above ß too many features with lexical info!

Page 8: Fast%andAccurate% Preordering%for%SMT%using ...mi.eng.cam.ac.uk/~wjb31/PUBS/Preordering.NAACL2015.pdf3 Preordering%for%SMT 私たちは、すっかり西安が好きになりまし

8

Preordering as node-­pair swapping (Jehl et al. 2014)

Features: POS, dependency labels, identity and class of the head word (parent node), left/right-­most word (children nodes)Combinations: bigrams and trigrams of some of the above ß too many features with lexical info!

We replace steps 6-­7 by feed-­forward Neural Network

-­ Trained with NPLM on ~100M labeled samples from aligned parallel corpus: swap if it decreases number of crossings -­ 4 layers: in, 2 hidden, out softmax with 2 output values-­ same basic features: no explicit combinations needed

Page 9: Fast%andAccurate% Preordering%for%SMT%using ...mi.eng.cam.ac.uk/~wjb31/PUBS/Preordering.NAACL2015.pdf3 Preordering%for%SMT 私たちは、すっかり西安が好きになりまし

9

Preordering as node-­pair swapping (Jehl et al. 2014)

Search Permutation

We have come [TO]

[SENT]

to quite [LIKE]

like Xi’an

Given a node with 3 children: s1 s2 s3

Score each possible start:-­ with s1: (1-­p(s1,s2)) . (1-­p(s1,s3))-­ with s2: p(s1,s2) . (1-­p(s2,s3))-­ with s3: p(s1,s3) . p(s2,s3)

Continue exploring space of permutations with depth-­first branch-­and-­bound search until global optimum is found

Page 10: Fast%andAccurate% Preordering%for%SMT%using ...mi.eng.cam.ac.uk/~wjb31/PUBS/Preordering.NAACL2015.pdf3 Preordering%for%SMT 私たちは、すっかり西安が好きになりまし

10

Intrinsic Evaluation: Crossing score

Page 11: Fast%andAccurate% Preordering%for%SMT%using ...mi.eng.cam.ac.uk/~wjb31/PUBS/Preordering.NAACL2015.pdf3 Preordering%for%SMT 私たちは、すっかり西安が好きになりまし

11

Translation experiments

Phrase-­based decoder, averaging 3 MERT runs English into Japanese, Korean, Chinese, Arabic and Hindi– over 100M words training corpora (except Hindi: 9M)– General domain, web-­crawled

Two test sets:– in domain: same as parallel data– mixed domain: equally represent 10 domains (news, health, sport, science, business, chat, ...)

2014 WMT English-­Hindi task

Page 12: Fast%andAccurate% Preordering%for%SMT%using ...mi.eng.cam.ac.uk/~wjb31/PUBS/Preordering.NAACL2015.pdf3 Preordering%for%SMT 私たちは、すっかり西安が好きになりまし

12

Translation Quality Evaluation: BLEU

distortion limit

Page 13: Fast%andAccurate% Preordering%for%SMT%using ...mi.eng.cam.ac.uk/~wjb31/PUBS/Preordering.NAACL2015.pdf3 Preordering%for%SMT 私たちは、すっかり西安が好きになりまし

13

Translation Quality Evaluation: BLEU

distortion limit

We achieve the best performance among all preorderers

Scores with distortion 3 are:-­ better or equal than baseline with distortion 10, but about 3 times faster-­ much better than fast baseline with distortion 3

Page 14: Fast%andAccurate% Preordering%for%SMT%using ...mi.eng.cam.ac.uk/~wjb31/PUBS/Preordering.NAACL2015.pdf3 Preordering%for%SMT 私たちは、すっかり西安が好きになりまし

14

Faster decoding

Preordering allows much faster decoding: narrower beams can be used as task is more monotonic

Good preordering: same performance than with wide beams, but way faster

Page 15: Fast%andAccurate% Preordering%for%SMT%using ...mi.eng.cam.ac.uk/~wjb31/PUBS/Preordering.NAACL2015.pdf3 Preordering%for%SMT 私たちは、すっかり西安が好きになりまし

15

Example

Syntactic Analysis

We have come [TO]

[SENT]

to quite [LIKE]

like Xi’an

Preordering

Page 16: Fast%andAccurate% Preordering%for%SMT%using ...mi.eng.cam.ac.uk/~wjb31/PUBS/Preordering.NAACL2015.pdf3 Preordering%for%SMT 私たちは、すっかり西安が好きになりまし

16

Example

Syntactic Analysis

We have come [TO]

[SENT]

to quite [LIKE]

Xi’an like

Preordering

Page 17: Fast%andAccurate% Preordering%for%SMT%using ...mi.eng.cam.ac.uk/~wjb31/PUBS/Preordering.NAACL2015.pdf3 Preordering%for%SMT 私たちは、すっかり西安が好きになりまし

17

Example

Syntactic Analysis

We have come [TO]

[SENT]

quite [LIKE] to

Xi’an like

Preordering

Page 18: Fast%andAccurate% Preordering%for%SMT%using ...mi.eng.cam.ac.uk/~wjb31/PUBS/Preordering.NAACL2015.pdf3 Preordering%for%SMT 私たちは、すっかり西安が好きになりまし

18

Example

Syntactic Analysis

We [TO] come have

[SENT]

quite [LIKE] to

Xi’an like

Preordering

Page 19: Fast%andAccurate% Preordering%for%SMT%using ...mi.eng.cam.ac.uk/~wjb31/PUBS/Preordering.NAACL2015.pdf3 Preordering%for%SMT 私たちは、すっかり西安が好きになりまし

19

Conclusions

We use a Neural Network to model the node-­swapping Logistic Regression model for Preordering

Accurate:– automatically learns non-­linear feature combinations– beats previous models in crossing score and translation performance

Fast:– feed-­forward network is more efficient than explicit feature combination– BLEU scores improve at very fast decoding conditions

Page 20: Fast%andAccurate% Preordering%for%SMT%using ...mi.eng.cam.ac.uk/~wjb31/PUBS/Preordering.NAACL2015.pdf3 Preordering%for%SMT 私たちは、すっかり西安が好きになりまし

20

Page 21: Fast%andAccurate% Preordering%for%SMT%using ...mi.eng.cam.ac.uk/~wjb31/PUBS/Preordering.NAACL2015.pdf3 Preordering%for%SMT 私たちは、すっかり西安が好きになりまし

Copyright © 2008-­2014 SDL plc. All rights reserved. All company names, brand names, trademarks, service marks, images and logos are the property of their respective owners.

This presentation and its content are SDL confidential unless otherwise specified, and may not be copied, used or distributed except as authorised by SDL.

Global Customer Experience Management

Thank you for your attention !