Philipp Koehn, Chief Scientist, Omniscien Technologies
Professor of Computer Science, Johns Hopkins University
Philipp Koehn is Professor of Computer Science at Johns Hopkins University and Chief Scientist for Omniscien Technologies. He also holds the Chair for Machine Translation in the School of Informatics at the University of Edinburgh. Philipp is a leader in the field of statistical machine translation research, with over 100 publications. He is the author of the seminal textbook in the field. Under his leadership, the open-source Moses system became the de facto standard toolkit for machine translation in research and commercial deployment.
Philipp led international research projects such as EuroMatrix and CASMACAT. His research has been funded by the European Union, DARPA, Google, Facebook, Amazon, Bloomberg, and several other funding agencies.
Philipp received his PhD in 2003 from the University of Southern California and was a postdoctoral research associate at MIT. He was a finalist for the European Patent Office's European Inventor Award in 2013 and received the Award of Honor from the International Association for Machine Translation in 2015.
At Omniscien Technologies, Philipp has refined machine translation technology for use in real-world deployments and helped to develop methods for data acquisition and refinement. Philipp continues to drive innovation and technological development at Omniscien Technologies.
AI, MT and Language Processing Symposium
The recent trend of using deep learning to solve a wide variety of problems in Artificial Intelligence has also reached machine translation, establishing a new state-of-the-art approach for this application. This approach is not yet settled by any means: new neural architectures are being proposed, and ideas from such diverse fields as computer vision, game playing, and speech recognition can be applied to machine translation as well.
At the practical end, we are just learning about the deployment challenges of this technology, since old methods, for example for integrating terminology databases or for domain adaptation, no longer apply.
This presentation will give an overview of the latest developments in research and what this means for practical deployment.
Research in Translation – What is Exciting and Shows Promise Ahead?
Philipp Koehn
Overview
• Evolution of Machine Translation
• Deep Learning
• Neural Machine Translation
• Challenges
• Looking Forward
Evolution of Machine Translation
Machine Translation Paradigms
• Various approaches:
  • Rule-based (1970s)
  • Word-based (1990s)
  • Phrase-based (2000s)
  • Syntax-based (2010s)
  • Neural-based (2016+)
[Diagram: Vauquois triangle, from source to target via lexical transfer, syntax transfer, semantic transfer, or interlingua]
Hype and Reality
Better Machine Learning
• Probabilistic models (1990s)
• Increased use of machine learning (2000s)
• Neural networks (since the mid-2010s)
Deep Learning
Two Objectives
Fluency
• Translation must be fluent in the target language
• Need model that assigns a language score to each sentence
Adequacy
• Translation must have same meaning as source sentence
• Need model that assigns a translation score to each sentence
Learning from Data
• Detect patterns in aligned segment pairs
Machine Learning
• Key to success:
  • Analyze the problem
  • Feature engineering
• For instance, machine translation:
  • What features are relevant for word order?
  • What features are relevant for lexical translation?
[Diagram: input → features → output]
Neural Learning
• Promise: no more feature engineering
• Several processing steps; features are discovered automatically
[Diagram: input → hidden layer → output]
Deep Learning
• More layers
• More complex feature interactions
[Diagram: input → hidden → hidden → hidden → output]
Neural Machine Translation
word2vec
• Task: Predict word in the middle
Neural Network Solution
• Learn mapping with a neural network
Map Word to Embedding
• Vector representation of a word
• Mathematically:
  • a matrix multiplication
  • followed by a non-linear activation function
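A minimal sketch of this step in Python/NumPy (the sizes and the tanh activation are illustrative assumptions):

```python
# Mapping a one-hot word vector to an embedding: a matrix
# multiplication followed by a non-linear activation.
import numpy as np

vocab_size, embed_dim = 10000, 512
E = np.random.randn(vocab_size, embed_dim) * 0.01  # embedding matrix

def embed(word_id: int) -> np.ndarray:
    one_hot = np.zeros(vocab_size)
    one_hot[word_id] = 1.0
    # one_hot @ E simply selects row word_id of E
    return np.tanh(one_hot @ E)
```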
Visualizing Neural Relationships and Features
Relationships are built much like in the human brain: collections of concepts and vocabulary.
Visualizing Neural Relationships and Features
Distance indicates closeness of relationships. Groupings are formed.
Visualizing Neural Relationships and Features
Groups are directly and indirectly interrelated, e.g. Sports + Broadcasting and Entertainment.
Neural Machine Translation
• Recall: two models
  • Language model … to ensure fluent output
  • Translation model … to ensure adequate translations
Language Models
• Sequential language models: predict the next word, one word at a time
  I … → I like … → I like to … → I like to learn … → I like to learn about … → I like to learn about machine … → I like to learn about machine translation .
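The sequential process on this slide can be sketched as a loop (predict_next is an assumed placeholder for any language model that returns the most likely next word given the history):

```python
# Generate a sentence word by word: each prediction conditions on the
# full history produced so far.
def generate(predict_next, max_len=20):
    sentence = ["<s>"]
    while sentence[-1] != "</s>" and len(sentence) < max_len:
        sentence.append(predict_next(sentence))
    return " ".join(sentence[1:-1])
```

The next slides show a recurrent neural network filling the role of predict_next.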
Recurrent Neural Language Model
• Predict the first word of a sentence (same model as before, just drawn top-down)
[Diagram: given word <s> → embedding → hidden state → predicted word “the”]
Recurrent Neural Language Model
• Predict the second word of a sentence
• Re-use the hidden state from the first word prediction
[Diagram: given words <s> the → embeddings → hidden states → predicted words “the house”]
Recurrent Neural Language Model
[Diagram: given words <s> the house is big . → embeddings → hidden states → predicted words the house is big . </s>]
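One step of such a recurrent language model, as a minimal Python/NumPy sketch (untrained random weights and assumed sizes, for illustration only):

```python
import numpy as np

embed_dim, hidden_dim, vocab_size = 64, 128, 10000
E = np.random.randn(vocab_size, embed_dim) * 0.01     # word embeddings
W_x = np.random.randn(embed_dim, hidden_dim) * 0.01   # input weights
W_h = np.random.randn(hidden_dim, hidden_dim) * 0.01  # recurrent weights
W_o = np.random.randn(hidden_dim, vocab_size) * 0.01  # output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_step(word_id, h):
    x = E[word_id]                  # embedding of the given word
    h = np.tanh(x @ W_x + h @ W_h)  # new hidden state carries the history
    p = softmax(h @ W_o)            # distribution over the predicted word
    return h, p
```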
Encoder-Decoder Model
• We predicted the words of a sentence
• Why not also predict their translations?
Encoder-Decoder Model
[Diagram: the recurrent model reads the source sentence “the house is big . </s>” and then continues by predicting the target words “das Haus ist groß . </s>”]
• Obviously madness
• Proposed by Google (Sutskever et al., 2014)
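A minimal sketch of the encoder-decoder idea, reusing rnn_step and the sizes from the previous sketch (greedy decoding and the special-token ids are simplifying assumptions):

```python
import numpy as np

BOS_ID, EOS_ID = 1, 2  # assumed ids for sentence-start and sentence-end

def translate(source_ids, max_len=50):
    h = np.zeros(hidden_dim)
    for word_id in source_ids:     # encode: read the source sentence
        h, _ = rnn_step(word_id, h)
    output, word_id = [], BOS_ID
    for _ in range(max_len):       # decode: keep predicting target words
        h, p = rnn_step(word_id, h)
        word_id = int(p.argmax())  # greedy choice of the next word
        if word_id == EOS_ID:
            break
        output.append(word_id)
    return output
```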
Attention Mechanism
• What is missing?
• Alignment of source words to target words
• Solution: attention mechanism
Neural Machine Translation, 2016
[Diagram: input word embeddings → left-to-right and right-to-left recurrent NNs → alignment → input context → hidden state → output words]
• State of the art
• Used by Google, WIPO, Systran, Omniscien…
The same diagram, walked through step by step:
• Input sentence
• Encode with word embeddings
• Output sentence
• Each word predicted by an embedding
• Embedding predicted from the input context
• Input context selected by word alignment
• Input context: a weighted sum of the input embeddings (see the sketch below)
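A minimal sketch of that weighted sum (dot-product scoring is assumed here for brevity; the original attention mechanism of Bahdanau et al. computes the scores with a small feed-forward network):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention(decoder_state, encoder_states):
    # encoder_states: (source_len, dim); decoder_state: (dim,)
    scores = encoder_states @ decoder_state  # one score per source word
    weights = softmax(scores)                # "soft alignment" distribution
    context = weights @ encoder_states       # input context = weighted sum
    return context, weights
```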
Benefits
• Each output word is predicted from:
  • an encoding of the full input sentence
  • all previously produced output words
• Word embeddings allow generalization:
  • “cat” and “cats” have similar representations
  • “house” and “home” have similar representations
WMT 2016 Evaluation (News, English-German)
[Chart: evaluation results, with systems grouped into Neural MT and Statistical MT]
Challenges
Benefits of Neural Machine Translation
• Evidence of overall better translation quality
• Ability to generalize better from the training data
• Better handling of sentence-level context
• Better fluency
Neural Machine Translation is Data-Hungry
[Chart: BLEU score (0–30) versus corpus size (roughly 10^5–10^7 sentences, 10^7–10^9 words), comparing phrase-based SMT, phrase-based SMT with a big language model, and neural MT]
Neural Machine Translation Failures
Adequacy or Fluency?
• Language model may take over
• Output unrelated to input
Fluency vs. Adequacy Errors
• Input
Ich will Kuchen essen
• Fluency error (more common in SMT)
I want cake eat
• Adequacy error (more common in NMT)
I want to cook chicken
Limited Vocabulary
• Words are encoded as high-dimensional vectors
• This only allows for a limited vocabulary size:
  • words are split into subwords (see the sketch below)
  • maybe even split into characters?
  • fall-back to dictionaries / phrase-based models
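A minimal sketch of subword splitting in the spirit of byte-pair encoding (the merge table below is a toy assumption, not a trained model):

```python
# Apply a learned list of merges, in order, to split a word into
# subword units.
merges = [("l", "o"), ("lo", "w"), ("e", "r")]

def bpe(word: str) -> list:
    symbols = list(word)
    for a, b in merges:
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == a and symbols[i + 1] == b:
                symbols[i:i + 2] = [a + b]  # merge the adjacent pair
            else:
                i += 1
    return symbols

print(bpe("lower"))  # ['low', 'er']
```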
NMT More Susceptible to Noisy Training Data
• More harmed by:
  • Alignment errors
  • Bad language
  • Wrong language on the target side
• Severely harmed by untranslated source text (over-learns to copy)
• Data cleaning more important
NMT is Worse Out-of-Domain
• In nearly all cases, SMT was better than NMT when content was out of domain.
• More data is required for NMT to meet domain-specific needs
• When sufficient data is available, NMT will usually be better than SMT for typical sentences
Deployment Challenges for Neural MT
• Speed:
  • training takes weeks
  • decoding slower than traditional SMT
• Hardware requirements:
  • GPUs needed ($2,000 each)
  • Google even has specialized hardware
• Process is not transparent
• Practically impossible to find out “why wrong?”
• Mistakes cannot be easily fixed
Neural Machine Translation – A Mystery?
• Decisions of statistical systems are often hard to understand
• Neural: even harder
  input → MAGIC → output
• New studies reveal the inner workings:
  • Attention mechanism
  • Word sense disambiguation
Attention States
• Attention mechanism plays the role of “word alignment”
• “Soft alignment”: distributed over several input words
Word Sense Disambiguation
[Figure: deep embedding of the word “right” in the encoder]
NMT vs SMT: What We Know By Now
• In ideal conditions, NMT much better
• Different types of error (fluency vs. adequacy)
• NMT more susceptible to noise
• NMT less robust (out-of-domain, low-resource, etc.)
=> Hybrid approach of Omniscien Technologies
Looking Forward
Attention Sequence-to-Sequence Model
• Based on recurrent neural networks
• Attention mechanism (alignment)
• Standard Approach 2015-2017
Deeper Models
• More layers in encoder and decoder
• Models more complex relationships between words
• Significantly higher performance
Google’s “Transformer” Model
• Self-attention
• Encoder: Input words inform each other
• Decoder: Attention on some previous output words
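A minimal sketch of a single self-attention head (Python/NumPy, assumed shapes; the full Transformer adds multiple heads, position information, and feed-forward layers):

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (sentence_len, dim) word representations
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # every word scores every word
    return softmax(scores) @ V               # words inform each other
```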
Facebook’s Convolutional Model
• Hierarchical (“convolutions”) instead of sequential
• Faster (but more limited context)
• In encoder and decoder
Synthesizing Data
• Neural machine translation trained on parallel data
• Improve with monolingual data (see the sketch below):
  • Back-translate target-language text into the source language
  • Add it as training data
• Can be iterated (“dual learning”)
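A minimal sketch of back-translation (the model objects and their translate/train methods are assumed placeholders, not a real API):

```python
def back_translate(mono_target, reverse_model, parallel_data):
    # Translate monolingual target-language text back into the source
    # language to synthesize additional parallel training data.
    synthetic = [(reverse_model.translate(t), t) for t in mono_target]
    return parallel_data + synthetic

# Iterating this in both directions is the idea behind "dual learning".
```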
Domain Adapted Models
• Various techniques explored for customization
• One simple, effective method (sketched below):
  • Train a general system on all available data
  • Fine-tune on in-domain data
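A minimal sketch of that method (the model API is an assumed placeholder; the lower learning rate for the fine-tuning phase is a common choice, not prescribed by the slide):

```python
def domain_adapt(model, general_data, in_domain_data):
    model.train(general_data, epochs=10, lr=1e-3)   # general system
    model.train(in_domain_data, epochs=2, lr=1e-4)  # fine-tune in-domain
    return model
```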
Terminology
• Terminology, brand names with fixed translations
  Input: Der neue Neurolierer XVQ-72 ist lieferbar.
  Fixed term: Neurolizer XVQ-72
• XML markup:
  Der neue <x translation="Neurolizer XVQ-72">Neurolierer XVQ-72</x> ist lieferbar.
• Use attention states to detect the insertion point (illustrated below)
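As a simplified illustration of the markup (a regex substitution over the marked-up string; a real system instead uses the attention states to place the fixed term in the MT output):

```python
import re

def apply_terminology(marked: str) -> str:
    # Replace each marked span with its fixed translation.
    return re.sub(r'<x translation="([^"]*)">.*?</x>', r"\1", marked)

s = 'Der neue <x translation="Neurolizer XVQ-72">Neurolierer XVQ-72</x> ist lieferbar.'
print(apply_terminology(s))
# Der neue Neurolizer XVQ-72 ist lieferbar.
```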
Dynamic Software Environment
• Major players have released deep learning frameworks:
  • TensorFlow (Google)
  • PyTorch (Facebook)
  • MXNet (Amazon)
• The Theano framework has been discontinued
• Also: dedicated NMT implementations (faster)
• Quick turn-around from research into deployment
Hardware Developments
New GPUs from NVIDIA in 2018
• Faster, more memory
• Enable deeper models